date:20141106

[Qemu-devel] Adding SMP support for Sparc Target

2014-11-06 Thread Damien Hilloulin


Hello everyone,

I'm a newcomer to QEMU, and my goal is to port an existing system
simulator that uses another emulator to QEMU.
Some work has already been done, and Sparc has been the main target so
far because of its simplicity (and because we have very good support
for Sparc with the other emulator).
QEMU is great, open-source (contrary to the other emulator we have been 
using in the past), and that's why we are aiming at using it.


However, it seems that the Sparc targets don't really support SMP/CMT
as of now. So I am considering two possibilities:
- adding SMP support in QEMU for the Sparc targets (and contributing it to
QEMU :) )
- switching our focus to another arch which supports SMP/CMT in QEMU, e.g.
x86 or ARM (a lot of work for us :( )...


So that's why I'm writing to you: to judge how feasible it is to add
SMP/CMT support for the Sparc targets.
I'm not a Sparc guru at all, so I don't really know what the
steps to add this support would be.
Do you have any idea how to approach it? Do you know where I can
find some good documentation that could help me?
How much time would it take for someone with 3 or 4 hours per day to
dedicate to it?



Thanks for any help,

Best Regards,
Damien.

Re: [Qemu-devel] [PATCH v3] qemu-log: add log category for MMU info

2014-11-06 Thread Antony Pavlov


  Running barebox on qemu-system-mips* with '-d unimp' floods
  stderr with a huge number of mips_cpu_handle_mmu_fault() messages:
  
    mips_cpu_handle_mmu_fault address=b80003fd ret 0 physical 180003fd prot 3
    mips_cpu_handle_mmu_fault address=a0800884 ret 0 physical 00800884 prot 3
    mips_cpu_handle_mmu_fault pc a080cd80 ad b80003fd rw 0 mmu_idx 0
  
  So it's very difficult to find the LOG_UNIMP messages.
  
  The mips_cpu_handle_mmu_fault() messages appear when ANY
  logging is enabled! That's not very handy.
  
  Adding a separate log category for *_cpu_handle_mmu_fault()
  logging fixes the problem.
  
  Signed-off-by: Antony Pavlov address@hidden
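
The idea of a separate log category can be illustrated with a small
standalone sketch. This is not QEMU's logging code: the SKETCH_* bits, the
sketch_loglevel variable and the helper names are hypothetical, chosen only
to show how a per-category mask keeps MMU chatter out of the output when
only 'unimp' logging is enabled.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical category bits; the values are illustrative only. */
enum {
    SKETCH_LOG_UNIMP = 1 << 0,
    SKETCH_LOG_MMU   = 1 << 1,
};

/* Bitmask of the categories that are currently enabled. */
static int sketch_loglevel;

/* Emit the message only if its category is enabled.
 * Returns 1 if something was printed, 0 otherwise. */
static int sketch_log_mask(int mask, const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    if (!(sketch_loglevel & mask)) {
        return 0;
    }
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    return 1;
}

/* Enable only the 'unimp' category and count the messages that get out. */
static int sketch_demo_logging(void)
{
    int emitted = 0;

    sketch_loglevel = SKETCH_LOG_UNIMP;
    emitted += sketch_log_mask(SKETCH_LOG_MMU,
                               "mmu fault address=%x\n", 0xb80003fdu);
    emitted += sketch_log_mask(SKETCH_LOG_UNIMP,
                               "unimplemented register access\n");
    return emitted;
}
```

With only the 'unimp' bit set, the MMU message is dropped and the
unimplemented-feature message is the only one that reaches stderr.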
 
 Have you benchmarked the performance delta with this patch applied? Just
 boot up a random small PPC guest that shuts down immediately and time
 it with and without the patch applied.

Here is my simple benchmark.

I have used buildroot with qemu_ppc64_pseries_defconfig configuration.

After a successful rootfs image build, I patched inittab to halt after boot:

--- a/output/images/rootfs.ext2/etc/inittab   2014-11-06 10:21:25.024198993 +0300
+++ b/output/images/rootfs.ext2/etc/inittab   2014-11-06 10:20:57.089421643 +0300
@@ -23,10 +23,11 @@
 # now run any rc scripts
 ::sysinit:/etc/init.d/rcS
 
-::askfirst:-/bin/sh
+::sysinit:/sbin/halt
+#::askfirst:-/bin/sh
 
 # Put a getty on the serial port
-hvc0::respawn:/sbin/getty -L  hvc0 115200 vt100 # GENERIC_SERIAL
+#hvc0::respawn:/sbin/getty -L  hvc0 115200 vt100 # GENERIC_SERIAL
 
 # Stuff to do for the 3-finger salute
 ::ctrlaltdel:/sbin/reboot


Here is my qemu cmdline:

buildroot$ time ~/qemu.git/ppc64-softmmu/qemu-system-ppc64 -M pseries -cpu POWER7 -m 256 \
    -kernel output/images/vmlinux -append 'console=hvc0 root=/dev/sda' \
    -drive file=output/images/rootfs.ext2,if=scsi,index=0 \
    -serial stdio -nographic -monitor none


I use my 'not-very-busy' AMD Opteron 6176 SE-based server for testing.

Three 'time' outputs for qemu with the 'qemu-log: add log category for MMU
info' patch:

real    0m39.744s
user    0m36.940s
sys     0m1.216s

real    0m39.552s
user    0m37.200s
sys     0m0.924s

real    0m39.585s
user    0m37.340s
sys     0m0.704s


Three 'time' outputs for qemu without the 'qemu-log: add log category for MMU
info' patch:

real    0m39.732s
user    0m37.484s
sys     0m0.756s

real    0m40.077s
user    0m37.920s
sys     0m0.744s

real    0m39.766s
user    0m37.304s
sys     0m1.032s


So the performance delta is less than experimental error.
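
As a rough check of that conclusion, the arithmetic can be written out. The
sketch below only uses the six 'real' times listed above; taking max minus
min as a stand-in for "experimental error" is my assumption, not a rigorous
statistical test.

```c
#include <assert.h>
#include <stddef.h>

/* 'real' times in seconds, copied from the runs above. */
static const double with_patch[]    = { 39.744, 39.552, 39.585 };
static const double without_patch[] = { 39.732, 40.077, 39.766 };

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
    }
    return sum / (double)n;
}

/* Largest minus smallest sample: a crude measure of run-to-run noise. */
static double spread(const double *v, size_t n)
{
    double lo, hi;

    assert(n > 0);
    lo = hi = v[0];
    for (size_t i = 1; i < n; i++) {
        if (v[i] < lo) {
            lo = v[i];
        }
        if (v[i] > hi) {
            hi = v[i];
        }
    }
    return hi - lo;
}
```

The means differ by roughly 0.23 s, while the unpatched runs alone vary by
about 0.35 s between themselves, so the difference is inside the noise.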

-- 
Best regards,
  Antony Pavlov

Re: [Qemu-devel] [PATCH v7 03/16] hw/vfio/pci: introduce VFIODevice

2014-11-06 Thread Eric Auger

On 11/05/2014 06:35 PM, Alex Williamson wrote:
 Hi Eric,
 
 On Fri, 2014-10-31 at 14:05 +, Eric Auger wrote:
 Introduce the VFIODevice struct that is going to be shared by
 VFIOPCIDevice and VFIOPlatformDevice.

 Additional fields will be added there later on for review
 convenience.

 The group's device_list becomes a list of VFIODevice.

 This requires reworking the reset_handler, which becomes generic and
 calls VFIODevice ops that are specialized in each parent object.
 Functions that iterate over this list must also take into account that
 the devices can be something other than VFIOPCIDevice. The type is used
 to discriminate between them.

 We take this opportunity to change the prototypes of
 vfio_unmask_intx, vfio_mask_intx and vfio_disable_irqindex, which now
 apply to VFIODevice. They are renamed to *_irqindex.
 The index is passed as a parameter in anticipation of their use for
 platform IRQs.
 
 I cringe when reviewers tell me this, so I apologize in advance, but
 there are logically at least 4 separate things happening in this patch:
 
 1) VFIODevice
 2) VFIODeviceOps
 3) irqindex conversions
 4) strcmp(name) vs comparing :bb:dd.f
 
 I don't really see any dependencies between them, and
 I think they'd also be easier to review as 4 separate patches.  More
 below...

Hi Alex,

no problem I am going to split it.
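
For readers following the design discussed above (a common base struct
embedded in each specialized device, a type field, and per-type ops), here
is a minimal standalone sketch. All Sketch* names are hypothetical, and the
rule "a device without FLR needs a reset" is invented purely to give the
callback something to compute; none of this is the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { SKETCH_TYPE_PCI = 0, SKETCH_TYPE_PLATFORM = 1 };

typedef struct SketchDevice SketchDevice;

/* Callbacks specialized by each parent object. */
typedef struct SketchDeviceOps {
    bool (*compute_needs_reset)(SketchDevice *dev);
} SketchDeviceOps;

/* Common part, shared by all device flavours. */
struct SketchDevice {
    SketchDevice *next;          /* link in the group's device list */
    int type;                    /* discriminates the parent object */
    bool needs_reset;
    const SketchDeviceOps *ops;
};

/* Specialized device embedding the common part. */
typedef struct SketchPCIDevice {
    int config_size;
    SketchDevice base;
    bool has_flr;
} SketchPCIDevice;

/* Recover the parent object from a pointer to its embedded member. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static bool sketch_pci_compute_needs_reset(SketchDevice *dev)
{
    SketchPCIDevice *pdev;

    assert(dev->type == SKETCH_TYPE_PCI);
    pdev = sketch_container_of(dev, SketchPCIDevice, base);
    dev->needs_reset = !pdev->has_flr;   /* invented rule, see above */
    return dev->needs_reset;
}

static const SketchDeviceOps sketch_pci_ops = {
    .compute_needs_reset = sketch_pci_compute_needs_reset,
};

/* Generic walk: only touches the base struct and its ops. */
static int sketch_count_needing_reset(SketchDevice *head)
{
    int n = 0;

    for (SketchDevice *d = head; d != NULL; d = d->next) {
        if (d->ops->compute_needs_reset(d)) {
            n++;
        }
    }
    return n;
}

/* Two PCI devices on one list; only the one without FLR is counted. */
static int sketch_demo_reset_walk(void)
{
    SketchPCIDevice a = {
        .base = { .type = SKETCH_TYPE_PCI, .ops = &sketch_pci_ops },
        .has_flr = true,
    };
    SketchPCIDevice b = {
        .base = { .type = SKETCH_TYPE_PCI, .ops = &sketch_pci_ops },
        .has_flr = false,
    };

    a.base.next = &b.base;
    return sketch_count_needing_reset(&a.base);
}
```

The generic walk never needs to know the concrete device type; only the
callback does, which is why the list can hold something other than PCI
devices later on.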
 

 Signed-off-by: Eric Auger eric.au...@linaro.org

 ---

 v4-v5:
 - fix style issues
 - in vfio_initfn, rework allocation of vdev->vbasedev.name and
   replace snprintf by g_strdup_printf
 ---
  hw/vfio/pci.c | 241 +++---
  1 file changed, 147 insertions(+), 94 deletions(-)

 diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
 index 93181bf..0531744 100644
 --- a/hw/vfio/pci.c
 +++ b/hw/vfio/pci.c
 @@ -48,6 +48,11 @@
  #define VFIO_ALLOW_KVM_MSI 1
  #define VFIO_ALLOW_KVM_MSIX 1
  
 +enum {
 +VFIO_DEVICE_TYPE_PCI = 0,
 +VFIO_DEVICE_TYPE_PLATFORM = 1,
 
 VFIO_DEVICE_TYPE_PLATFORM gets dropped in patch 8 and re-added in patch
 9.  Let's remove it here and let its first appearance be in patch 9.

yes sure. My bad.
 
 +};
 +
  struct VFIOPCIDevice;
  
  typedef struct VFIOQuirk {
 @@ -185,9 +190,27 @@ typedef struct VFIOMSIXInfo {
  void *mmap;
  } VFIOMSIXInfo;
  
 +typedef struct VFIODeviceOps VFIODeviceOps;
 +
 +typedef struct VFIODevice {
 +QLIST_ENTRY(VFIODevice) next;
 +struct VFIOGroup *group;
 +char *name;
 +int fd;
 +int type;
 +bool reset_works;
 +bool needs_reset;
 +VFIODeviceOps *ops;
 +} VFIODevice;
 +
 +struct VFIODeviceOps {
 +bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
 +int (*vfio_hot_reset_multi)(VFIODevice *vdev);
 +};
 +
  typedef struct VFIOPCIDevice {
  PCIDevice pdev;
 -int fd;
 +VFIODevice vbasedev;
  VFIOINTx intx;
  unsigned int config_size;
  uint8_t *emulated_config_bits; /* QEMU emulated bits, little-endian */
 @@ -203,20 +226,16 @@ typedef struct VFIOPCIDevice {
  VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
  VFIOVGA vga; /* 0xa, 0x3b0, 0x3c0 */
  PCIHostDeviceAddress host;
 -QLIST_ENTRY(VFIOPCIDevice) next;
 -struct VFIOGroup *group;
  EventNotifier err_notifier;
  uint32_t features;
  #define VFIO_FEATURE_ENABLE_VGA_BIT 0
  #define VFIO_FEATURE_ENABLE_VGA (1 << VFIO_FEATURE_ENABLE_VGA_BIT)
  int32_t bootindex;
  uint8_t pm_cap;
 -bool reset_works;
  bool has_vga;
  bool pci_aer;
  bool has_flr;
  bool has_pm_reset;
 -bool needs_reset;
  bool rom_read_failed;
  } VFIOPCIDevice;
  
 @@ -224,7 +243,7 @@ typedef struct VFIOGroup {
  int fd;
  int groupid;
  VFIOContainer *container;
 -QLIST_HEAD(, VFIOPCIDevice) device_list;
 +QLIST_HEAD(, VFIODevice) device_list;
  QLIST_ENTRY(VFIOGroup) next;
  QLIST_ENTRY(VFIOGroup) container_next;
  } VFIOGroup;
 @@ -277,7 +296,7 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
  /*
   * Common VFIO interrupt disable
   */
 -static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
 +static void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
  {
  struct vfio_irq_set irq_set = {
  .argsz = sizeof(irq_set),
 @@ -287,37 +306,37 @@ static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
  .count = 0,
  };
  
 -ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
 +ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
  }
  
  /*
   * INTx
   */
 -static void vfio_unmask_intx(VFIOPCIDevice *vdev)
 +static void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
  {
  struct vfio_irq_set irq_set = {
  .argsz = sizeof(irq_set),
  .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
 -.index = VFIO_PCI_INTX_IRQ_INDEX,
 +.index = index,
  .start = 0,
  .count = 1,
  };
 
 We're turning these into a generic function, but the function assumes a
 single start/count.  Do we want to reflect that

Re: [Qemu-devel] [PATCH v7 12/16] hw/arm/sysbus-fdt: enable vfio-calxeda-xgmac dynamic instantiation

2014-11-06 Thread Eric Auger

On 11/05/2014 11:23 PM, Alexander Graf wrote:
 
 
 On 05.11.14 13:31, Eric Auger wrote:
 On 11/05/2014 11:59 AM, Alexander Graf wrote:


 On 31.10.14 15:05, Eric Auger wrote:
 vfio-calxeda-xgmac now can be instantiated using the -device option.
 The node creation function generates a very basic dt node composed
 of the compat, reg and interrupts properties

 Signed-off-by: Eric Auger eric.au...@linaro.org

 ---

 v6 - v7:
 - compat string re-formatting removed since compat string is not exposed
   anymore as a user option
 - VFIO IRQ kick-off removed from sysbus-fdt and moved to VFIO platform
   device
 ---
  hw/arm/sysbus-fdt.c | 88 +
  1 file changed, 88 insertions(+)

 diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
 index d5476f1..f8b310b 100644
 --- a/hw/arm/sysbus-fdt.c
 +++ b/hw/arm/sysbus-fdt.c
 @@ -27,6 +27,8 @@
  #include "hw/platform-bus.h"
  #include "sysemu/sysemu.h"
  #include "hw/platform-bus.h"
 +#include "hw/vfio/vfio-platform.h"
 +#include "hw/vfio/vfio-calxeda-xgmac.h"
  
  /*
   * internal struct that contains the information to create dynamic
 @@ -54,8 +56,11 @@ typedef struct NodeCreationPair {
  int (*add_fdt_node_fn)(SysBusDevice *sbdev, void *opaque);
  } NodeCreationPair;
  
 +static int add_basic_vfio_fdt_node(SysBusDevice *sbdev, void *opaque);
 +
  /* list of supported dynamic sysbus devices */
  NodeCreationPair add_fdt_node_functions[] = {
 +{TYPE_VFIO_CALXEDA_XGMAC, add_basic_vfio_fdt_node},
  {"", NULL}, /*last element*/
  };

 Can you maybe place the list somewhere smartly to make sure we don't
 need forward declarations? Either put it in between the generic and
 device specific code or at the end of the file with a single forward
 declaration for the array?

 sure

  
 @@ -86,6 +91,89 @@ static int add_fdt_node(SysBusDevice *sbdev, void *opaque)
  }
  
  /**
 + * add_basic_vfio_fdt_node - generates the most basic node for a VFIO node
 + *
 + * set properties are:
 + * - compatible string
 + * - regs
 + * - interrupts
 + */
 +static int add_basic_vfio_fdt_node(SysBusDevice *sbdev, void *opaque)
 +{
 +PlatformBusFdtData *data = opaque;
 +PlatformBusDevice *pbus = data->pbus;
 +void *fdt = data->fdt;
 +const char *parent_node = data->pbus_node_name;
 +int compat_str_len;
 +char *nodename;
 +int i, ret;
 +uint32_t *irq_attr;
 +uint64_t *reg_attr;
 +uint64_t mmio_base;
 +uint64_t irq_number;
 +VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
 +VFIODevice *vbasedev = &vdev->vbasedev;
 +Object *obj = OBJECT(sbdev);
 +
 +mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
 +
 +nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
 +   vbasedev->name,
 +   mmio_base);
 +
 +qemu_fdt_add_subnode(fdt, nodename);
 +
 +compat_str_len = strlen(vdev->compat) + 1;
 +qemu_fdt_setprop(fdt, nodename, "compatible",
 +  vdev->compat, compat_str_len);

 What if there are multiple compatibles?
 My purpose here was absolutely not to come back again on a proposal
 where we could have a generic node creation. I understand that it is not
 realistic. I rather tried to put some common property creation in this
 function but you're right even the interrupt prop depend on the device.

 About your question, I think the specialized VFIO device would set its
 compat string including the various substrings. This was done in the
 past for PL330 which required arm,pl330;arm,primecell.


 +
 +reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
 +
 +for (i = 0; i < vbasedev->num_regions; i++) {
 +mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, i);
 +reg_attr[4*i] = 1;

 What is the 1 here?
 address-cells? Since the bus is < 4GB, one 32-bit cell is enough to specify
 the base address. But since you already put #size-cells in the parent
 node, maybe I can remove it.
 
 I'm confused. Shouldn't the reg look like [ addr size ... ]?
 
   http://www.devicetree.org/Device_Tree_Usage#Memory_Mapped_Devices
 
 The number of cells is defined separately via #address-cells or #size-cells.

Hi Alex,

Sorry, my answer was misleading: I was mixing up
qemu_fdt_setprop_sized_cells_from_array usage and the syntax of the
produced dts. The 1 values correspond to the number of cells used for
the addr value and for the size value, respectively. The arguments of
qemu_fdt_setprop_sized_cells_from_array are (size, value) pairs; see
below as a reminder. The fact that the platform bus node has the attributes
#size-cells = <0x1> and #address-cells = <0x1> forces me to use 1. As a
result, the guest dt will look like:

/ {
    #address-cells = <1>;
    #size-cells = <1>;

    ...

    serial@101f0000 {
        compatible = "arm,pl011";
        reg = <0x101f0000 0x1000>;
    ../..

I hope this clarifies.
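
To make the (size, value) pairing concrete, here is a standalone sketch of
how such pairs could be flattened into 32-bit cells. It is not the QEMU
implementation: sketch_pack_sized_cells is a hypothetical helper, the base
address 0x10000000 is made up, and the cells are kept in host byte order
whereas a real FDT blob stores them big-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Flatten npairs (ncells, value) pairs into 32-bit cells.
 * Returns the number of cells written, or -1 if ncells is not 1 or 2,
 * if a value does not fit in one cell, or if the output is too small.
 */
static int sketch_pack_sized_cells(const uint64_t *pairs, size_t npairs,
                                   uint32_t *cells, size_t max_cells)
{
    size_t out = 0;

    for (size_t i = 0; i < npairs; i++) {
        uint64_t ncells = pairs[2 * i];
        uint64_t value = pairs[2 * i + 1];

        if (ncells != 1 && ncells != 2) {
            return -1;
        }
        if (ncells == 1 && value > UINT32_MAX) {
            return -1;
        }
        if (out + ncells > max_cells) {
            return -1;
        }
        if (ncells == 2) {
            cells[out++] = (uint32_t)(value >> 32);   /* high word first */
        }
        cells[out++] = (uint32_t)value;
    }
    return (int)out;
}

/* One region with #address-cells = 1 and #size-cells = 1. */
static int sketch_demo_reg(void)
{
    uint32_t cells[2];
    const uint64_t pairs[] = { 1, 0x10000000, 1, 0x1000 };
    int n = sketch_pack_sized_cells(pairs, 2, cells, 2);

    assert(n == 2);
    assert(cells[0] == 0x10000000 && cells[1] == 0x1000);
    return n;
}
```

With one cell each for address and size, a region becomes two cells, which
is the "addr size" shape that shows up in the reg property of the dts.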

Best Regards

Eric

 * qemu_fdt_setprop_sized_cells_from_array:
 * @fdt: device tree blob
 * @node_path: node to set property on
 * @property:

Re: [Qemu-devel] [PATCH] ui/input: strictly check console in finding input handler

2014-11-06 Thread Markus Armbruster

Gerd Hoffmann kra...@redhat.com writes:

 On Mi, 2014-11-05 at 00:49 +0800, Amos Kong wrote:
 qemu_input_find_handler() prefers a handler associated with con.
 But if none exists, it takes any. This patch added a parameter
 to strictly check console, in case we want to input event to
 special console.
 
 'input-send-event' has a parameter to assign special console,
 so we should enable strict checking in finding handler.

 I don't think we want to do that by default.  It only matters in case of a
 multiseat setup where you actually have multiple input devices of the
 same kind.  Which isn't a very typical use case.

 Options I see are:

   (a) Turn console into an optional parameter, do strict checking in
   case it is present.
   (b) Add an optional 'strict' parameter.

Current behavior (please correct misunderstandings):

The guest must be running.
input-send-event parameter 'console' is mandatory.
The console identified by its value must exist.
If this console can accept the event, send it there.
Else, a console that can accept the event must exist.  Send it to
one of them.  Which one exactly isn't specified.

Behavior with (a):

The guest must be running.
input-send-event parameter 'console' is optional.
If it's present, the console identified by its value must exist, and
must be able to accept the event.  Send it there.
Else, a console that can accept the event must exist.  Send it to
one of them.  Which one exactly isn't specified.

"Must" means "or else the command fails".

I think that's a clear improvement.  It's actually what I expected from
the command documentation, until I read the code.
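
The lookup rule of behavior (a) can be sketched in a few lines of
standalone C. The SketchHandler type, the SKETCH_EV_* bits and the function
names are hypothetical and do not match the real ui/input code; the guest
state and console existence checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds a handler may accept. */
enum {
    SKETCH_EV_KEY = 1 << 0,
    SKETCH_EV_REL = 1 << 1,
};

typedef struct SketchHandler {
    int console;        /* console the handler is bound to */
    unsigned events;    /* bitmask of event kinds it accepts */
} SketchHandler;

/*
 * Behavior (a): when a console is given, only a handler on that console
 * may take the event; without one, any capable handler will do.
 * Returns the handler index, or -1 when the command must fail.
 */
static int sketch_find_handler(const SketchHandler *h, size_t n,
                               bool has_console, int console,
                               unsigned event)
{
    assert(h != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!(h[i].events & event)) {
            continue;
        }
        if (!has_console || h[i].console == console) {
            return (int)i;
        }
    }
    return -1;
}

/* Console 0 has a keyboard; console 1 has keyboard and relative pointer. */
static int sketch_demo_lookup(bool has_console, int console, unsigned event)
{
    static const SketchHandler handlers[] = {
        { .console = 0, .events = SKETCH_EV_KEY },
        { .console = 1, .events = SKETCH_EV_KEY | SKETCH_EV_REL },
    };

    return sketch_find_handler(handlers, 2, has_console, console, event);
}
```

A pointer event aimed explicitly at console 0 fails here, while the same
event sent without a console falls through to the handler on console 1.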

Re: [Qemu-devel] [PATCH 2/4] Qemu-Xen-vTPM: Register Xen stubdom vTPM frontend driver

2014-11-06 Thread Xu, Quan

 -Original Message-
 From: Stefano Stabellini [mailto:stefano.stabell...@eu.citrix.com]
 Sent: Monday, November 03, 2014 7:54 PM
 To: Xu, Quan
 Cc: qemu-devel@nongnu.org; xen-de...@lists.xen.org;
 stefano.stabell...@eu.citrix.com
 Subject: Re: [PATCH 2/4] Qemu-Xen-vTPM: Register Xen stubdom vTPM
 frontend driver

 On Sun, 2 Nov 2014, Quan Xu wrote:
  This driver transfers any request/response between the TPM xenstubdoms
  driver and the Xen vTPM stubdom, and facilitates communication between
  the Xen vTPM stubdom domain and the vTPM xenstubdoms driver

  Signed-off-by: Quan Xu quan...@intel.com

 Please describe what changes you made to xen_backend.c and why.
 The commit message should contain info on all the changes made by the
 patch below.

The following is the code flow when QEMU is running with Xen.
##code process
[...]
 xen_hvm_init()
-->xen_be_register()
-->xenstore_scan()
-->xen_be_check_state()

-->xen_vtpm_register()

Ideally, I would register 'vtpm' like the other backends:

+ xen_be_register("console", &xen_console_ops);
+ xen_be_register("vkbd", &xen_kbdmouse_ops);
+ xen_be_register("qdisk", &xen_blkdev_ops);

but there are 2 reasons why I add xen_vtpm_register() instead
of using xen_be_register().

1. The TPM backend is running in a Xen stubDom, not in Domain 0.
Some functions do not work there, for example 'setup watch' and
'look for backend' in xenstore_scan().

2. There is a thread running in the Xen stubDom [event_thread()] that
handles the backend status when the frontend is initialized. It is not
compatible with xen_be_check_state(), which always tries
to modify the status of the backend.

There is always a tradeoff: if I forced this case into
xen_be_register(), there might be a lot of 'if ... else ...' branches,
which would break the code architecture. I also want to leverage the
existing source code with minimal modification. I add a
'DEVOPS_FLAG_STUBDOM_BE' flag in include/hw/xen/xen_backend.h to indicate
that the device backend is a Xen stubDom.

 Please also describe what the Xen vTPM stubdom is, what the
 vTPM xenstubdoms driver is and how they communicate with each other.

In a previous email, I explained what the Xen vTPM stubdom is and what the
vTPM xenstubdoms driver is.

Let me describe how they communicate with each other.

||  ^   |
|v  |   |
|vTPM  |
| XenStubdoms driver |  (new ..)
+-+
 |  ^
 v  |
+-+
|  xen_vtpmdev_ops |  (new ..)
+-+

xen_vtpmdev_ops is initialized with the following process:
 xen_hvm_init()
[...]
--xen_vtpm_register()
  [...]
-- vtpm_alloc()
-- vtpm_initialise()

## 
vTPM XenStubdoms driver is initialized by Qemu command line options,
-tpmdev xenstubdoms,id=xenvtpm0 -device tpm-tis,tpmdev=xenvtpm0

They communicate with each other in the following function:
.. hw/tpm/tpm_xenstubdoms.c
static int tpm_xenstubdoms_unix_transfer(const TPMLocality *locty_data)
{
[...]
xen_vtpm_send(locty_data->w_buffer.buffer, locty_data->w_offset);
xen_vtpm_recv(locty_data->r_buffer.buffer, rlen);
[...]
} 

 Where does the vTPM backend live?

Xen stubDom. 

   hw/xen/Makefile.objs |   1 +
   hw/xen/xen_backend.c | 182 ++-
   hw/xen/xen_stubdom_vtpm.c| 333 +++
   include/hw/xen/xen_backend.h |  11 ++
   include/hw/xen/xen_common.h  |   6 +
   xen-hvm.c|  13 ++
   6 files changed, 544 insertions(+), 2 deletions(-)
   create mode 100644 hw/xen/xen_stubdom_vtpm.c

  diff --git a/hw/xen/Makefile.objs b/hw/xen/Makefile.objs
  index a0ca0aa..724df8d 100644
  --- a/hw/xen/Makefile.objs
  +++ b/hw/xen/Makefile.objs
  @@ -1,5 +1,6 @@
   # xen backend driver support
   common-obj-$(CONFIG_XEN_BACKEND) += xen_backend.o xen_devconfig.o
  +common-obj-$(CONFIG_TPM_XENSTUBDOMS) += xen_stubdom_vtpm.o

   obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
   obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o xen_pt_msi.o
  diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
  index b2cb22b..45a5778 100644
  --- a/hw/xen/xen_backend.c
  +++ b/hw/xen/xen_backend.c
  @@ -194,6 +194,32 @@ int xen_be_set_state(struct XenDevice *xendev, enum xenbus_state state)
   return 0;
   }

  +/*get stubdom backend*/
  +static char *xen_stubdom_be(const char *type, int dom, int dev)
  +{
  +char *val, *domu;
  +char path[XEN_BUFSIZE];
  +unsigned int len, ival;
  +
  +/*front domu*/
  +domu = xs_get_domain_path(xenstore, dom);
  +snprintf(path, sizeof(path), "%s/device/%s/%d/backend-id",
  + domu, type, dev);
  +g_free(domu);
  +
  +val = xs_read(xenstore, 0, path, &len);
  +if (!val || 1 != sscanf(val, "%d", &ival)) {
  +

Re: [Qemu-devel] [PATCH] 9pfs: changed to use event_notifier instead of qemu_pipe

2014-11-06 Thread Michael S. Tsirkin

On Thu, Nov 06, 2014 at 10:38:34AM +0900, SeokYeon Hwang wrote:
 Please review this patch.
 
 Thanks.

Thanks for the patch!
Sorry, this patch missed the devel freeze.
Please resubmit after 2.2 is out.

  -Original Message-
  From: SeokYeon Hwang [mailto:syeon.hw...@samsung.com]
  Sent: Friday, October 31, 2014 5:04 PM
  To: qemu-devel@nongnu.org
  Cc: aneesh.ku...@linux.vnet.ibm.com; SeokYeon Hwang
  Subject: [PATCH] 9pfs: changed to use event_notifier instead of qemu_pipe
  
  Changed to use event_notifier instead of qemu_pipe.
  It is necessary for porting 9pfs to Windows and MacOS.
  
  Signed-off-by: SeokYeon Hwang syeon.hw...@samsung.com
  ---
   hw/9pfs/virtio-9p-coth.c | 29 +++--
  hw/9pfs/virtio-9p-coth.h |  4 ++--
   2 files changed, 9 insertions(+), 24 deletions(-)
  
  diff --git a/hw/9pfs/virtio-9p-coth.c b/hw/9pfs/virtio-9p-coth.c
  index ae6cde8..8185c53 100644
  --- a/hw/9pfs/virtio-9p-coth.c
  +++ b/hw/9pfs/virtio-9p-coth.c
  @@ -14,6 +14,7 @@
  
   #include "fsdev/qemu-fsdev.h"
   #include "qemu/thread.h"
  +#include "qemu/event_notifier.h"
   #include "block/coroutine.h"
   #include "virtio-9p-coth.h"
  
  @@ -26,15 +27,11 @@ void co_run_in_worker_bh(void *opaque)
   g_thread_pool_push(v9fs_pool.pool, co, NULL);
   }
  
  -static void v9fs_qemu_process_req_done(void *arg)
  +static void v9fs_qemu_process_req_done(EventNotifier *e)
   {
  -char byte;
  -ssize_t len;
   Coroutine *co;
  
  -do {
  -len = read(v9fs_pool.rfd, &byte, sizeof(byte));
  -} while (len == -1 && errno == EINTR);
  +event_notifier_test_and_clear(e);
  
   while ((co = g_async_queue_try_pop(v9fs_pool.completed)) != NULL) {
   qemu_coroutine_enter(co, NULL);
  @@ -43,22 +40,18 @@ static void v9fs_qemu_process_req_done(void *arg)
  
   static void v9fs_thread_routine(gpointer data, gpointer user_data)
   {
  -ssize_t len;
  -char byte = 0;
   Coroutine *co = data;
  
   qemu_coroutine_enter(co, NULL);
  
   g_async_queue_push(v9fs_pool.completed, co);
  -do {
  -len = write(v9fs_pool.wfd, &byte, sizeof(byte));
  -} while (len == -1 && errno == EINTR);
  +
  +event_notifier_set(&v9fs_pool.e);
   }
  
   int v9fs_init_worker_threads(void)
   {
   int ret = 0;
  -int notifier_fds[2];
   V9fsThPool *p = &v9fs_pool;
   sigset_t set, oldset;
  
  @@ -66,10 +59,6 @@ int v9fs_init_worker_threads(void)
   /* Leave signal handling to the iothread.  */
   pthread_sigmask(SIG_SETMASK, &set, &oldset);
  
  -if (qemu_pipe(notifier_fds) == -1) {
  -ret = -1;
  -goto err_out;
  -}
   p->pool = g_thread_pool_new(v9fs_thread_routine, p, -1, FALSE, NULL);
   if (!p->pool) {
   ret = -1;
  @@ -84,13 +73,9 @@ int v9fs_init_worker_threads(void)
   ret = -1;
   goto err_out;
   }
  -p->rfd = notifier_fds[0];
  -p->wfd = notifier_fds[1];
  -
  -fcntl(p->rfd, F_SETFL, O_NONBLOCK);
  -fcntl(p->wfd, F_SETFL, O_NONBLOCK);
  +event_notifier_init(&p->e, 0);
  
  -qemu_set_fd_handler(p->rfd, v9fs_qemu_process_req_done, NULL, NULL);
  +event_notifier_set_handler(&p->e, v9fs_qemu_process_req_done);
   err_out:
   pthread_sigmask(SIG_SETMASK, &oldset, NULL);
   return ret;
  diff --git a/hw/9pfs/virtio-9p-coth.h b/hw/9pfs/virtio-9p-coth.h
  index 86d5ed4..4f51b25 100644
  --- a/hw/9pfs/virtio-9p-coth.h
  +++ b/hw/9pfs/virtio-9p-coth.h
  @@ -21,8 +21,8 @@
   #include <glib.h>
  
   typedef struct V9fsThPool {
  -int rfd;
  -int wfd;
  +EventNotifier e;
  +
   GThreadPool *pool;
   GAsyncQueue *completed;
   } V9fsThPool;
  --
  2.1.0
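
To show what the notifier abstraction in the patch above hides from its
callers, here is a standalone sketch of a set / test-and-clear notifier
built on a POSIX pipe. This is an assumption about one possible backing,
not QEMU's event_notifier code, and the Sketch* names are hypothetical. A
Windows port would need a different backing behind the same three
operations, which is the point of moving 9pfs away from raw pipe fds.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

typedef struct SketchNotifier {
    int rfd;    /* read end, polled by the main loop */
    int wfd;    /* write end, used by worker threads */
} SketchNotifier;

/* Create the pipe pair and make both ends non-blocking. */
static int sketch_notifier_init(SketchNotifier *n)
{
    int fds[2];

    assert(n != NULL);
    if (pipe(fds) == -1) {
        return -errno;
    }
    fcntl(fds[0], F_SETFL, O_NONBLOCK);
    fcntl(fds[1], F_SETFL, O_NONBLOCK);
    n->rfd = fds[0];
    n->wfd = fds[1];
    return 0;
}

/* Signal the notifier; retry if interrupted by a signal. */
static void sketch_notifier_set(SketchNotifier *n)
{
    char byte = 0;
    ssize_t len;

    do {
        len = write(n->wfd, &byte, sizeof(byte));
    } while (len == -1 && errno == EINTR);
}

/* Drain pending signals. Returns 1 if at least one was pending. */
static int sketch_notifier_test_and_clear(SketchNotifier *n)
{
    char buf[64];
    ssize_t len;
    int seen = 0;

    do {
        len = read(n->rfd, buf, sizeof(buf));
        if (len > 0) {
            seen = 1;
        }
    } while (len > 0 || (len == -1 && errno == EINTR));
    return seen;
}

/* Two sets collapse into one pending notification, then it is cleared. */
static int sketch_demo_notifier(void)
{
    SketchNotifier n;
    int before, after_set, after_clear;

    if (sketch_notifier_init(&n) != 0) {
        return -1;
    }
    before = sketch_notifier_test_and_clear(&n);
    sketch_notifier_set(&n);
    sketch_notifier_set(&n);
    after_set = sketch_notifier_test_and_clear(&n);
    after_clear = sketch_notifier_test_and_clear(&n);
    close(n.rfd);
    close(n.wfd);
    return before == 0 && after_set == 1 && after_clear == 0;
}
```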

Re: [Qemu-devel] [PATCH] pci: fixed mismatch of error-handling between pci_qdev_init() and qdev

2014-11-06 Thread Markus Armbruster

SeokYeon Hwang syeon.hw...@samsung.com writes:

 -Original Message-
 From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo
 Bonzini
 Sent: Wednesday, November 05, 2014 11:55 PM
 To: Michael S. Tsirkin
 Cc: Markus Armbruster; SeokYeon Hwang; qemu-devel@nongnu.org
 Subject: Re: [Qemu-devel] [PATCH] pci: fixed mismatch of error-handling
 between pci_qdev_init() and qdev

 On 05/11/2014 14:28, Michael S. Tsirkin wrote:
   I think bypassing the question by converting to realize makes the
   most sense...

  I'm fine with doing that but Markus's patches wouldn't yet have solved
  the problem by themselves since init is still around, right?

  This probably means fixing this bug can't justify merging the realize
  patchset after freeze.

 Yes, I agree.  I meant that the API is not very well defined.  I would
 handle everything else on a case-by-case basis, by reviewing each init
 function that is converted to realize.

 Since the patch was for an out-of-tree device, it can wait for 2.3 anyway.

 Paolo

 I cannot fully understand your conversation.

You appear to have a PCIDeviceClass method init() returning a positive
value.  Doesn't work.  Only values <= 0 do.

Your proposed fix is to make its caller treat a positive value like a
negative one.

Paolo points out that init()'s contract is unclear.  His preferred way
of clarifying it is to convert PCI from init() to realize(), which has a
sufficiently clear contract.

Doesn't help you now.  My "pci: Partial conversion to realize" series
will help you once it lands, but only if you convert your device.

You obviously want a solution earlier.  The one you proposed implicitly
clarifies the PCIDeviceClass init() contract to "zero means success,
anything else failure".  I don't think that's a good idea, because it
makes PCIDeviceClass's init() differ from DeviceClass's.  There,
non-negative value means success, negative means failure (see
device_realize()).

Fix your device not to return positive values instead.
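
A minimal sketch of that advice, with hypothetical names (this is neither
the out-of-tree device nor QEMU's qdev code): the init function reports
failure with a negative errno value, and the caller treats only negative
values as failure, which is the convention described above for
DeviceClass.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device init: 0 on success, negative errno on failure. */
static int sketch_device_init(int config_ok)
{
    if (!config_ok) {
        return -EINVAL;   /* not 1: a positive value would read as success */
    }
    return 0;
}

/* Caller side: only a negative return value counts as failure. */
static int sketch_init_failed(int rc)
{
    return rc < 0;
}

/* Self-check of the convention for the two outcomes of the init above. */
static void sketch_check_convention(void)
{
    assert(sketch_init_failed(sketch_device_init(0)));
    assert(!sketch_init_failed(sketch_device_init(1)));
}
```

Under this convention a device that returns 1 to signal an error is taken
to have succeeded, which is the mismatch this thread is about.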

You could additionally fix pci_qdev_init() to treat positive numbers as
success, for consistency with device_realize(), but that requires
auditing all existing PCIDeviceClass init() methods.  Waste of your
time, because they all go away when we convert to realize().

 But, I think this patch is still worth before all 'init()' convert to
 'realize()'.
 Moreover, It has no side effect at all.

I don't like it, because it makes PCIDeviceClass's init() inconsistent
with DeviceClass's.

Re: [Qemu-devel] [PATCH] pci: fixed mismatch of error-handling between pci_qdev_init() and qdev

2014-11-06 Thread Michael S. Tsirkin

On Thu, Nov 06, 2014 at 11:26:01AM +0900, SeokYeon Hwang wrote:

  -Original Message-
  From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo
  Bonzini
  Sent: Wednesday, November 05, 2014 11:55 PM
  To: Michael S. Tsirkin
  Cc: Markus Armbruster; SeokYeon Hwang; qemu-devel@nongnu.org
  Subject: Re: [Qemu-devel] [PATCH] pci: fixed mismatch of error-handling
  between pci_qdev_init() and qdev

  On 05/11/2014 14:28, Michael S. Tsirkin wrote:
I think bypassing the question by converting to realize makes the
most sense...

   I'm fine with doing that but Markus's patches wouldn't yet have solved
   the problem by themselves since init is still around, right?

   This probably means fixing this bug can't justify merging the realize
   patchset after freeze.

  Yes, I agree.  I meant that the API is not very well defined.  I would
  handle everything else on a case-by-case basis, by reviewing each init
  function that is converted to realize.

  Since the patch was for an out-of-tree device, it can wait for 2.3 anyway.

  Paolo

 I cannot fully understand your conversation.
 But, I think this patch is still worth before all 'init()' convert to
 'realize()'.
 Moreover, It has no side effect at all.

 Thanks.

The root cause is API misuse: functions that return int
should return a negative code on failure, either 0 or >= 0 on success.
In rare cases, we use int as bool, so 0 on failure, 1 on success.

Your device returned 1 on failure, this broke things.
So don't do this then :)

The question would be: are there existing devices that return a positive
return code on init? If there are, it's a bug, but the best fix might be
your patch - easier than fixing many devices.

If there aren't, the patch isn't needed.

-- 
MST

Re: [Qemu-devel] [PATCH] seccomp: change configure to avoid arm 32 to break

2014-11-06 Thread Eduardo Otubo

On Wed, Nov 05, 2014 at 03:35:09PM -0500, Paul Moore wrote:
 On Wednesday, November 05, 2014 08:08:06 PM Peter Maydell wrote:
  On 5 November 2014 19:46, Paul Moore pmo...@redhat.com wrote:
   On Wednesday, November 05, 2014 05:08:20 PM Peter Maydell wrote:
   On 5 November 2014 16:47, Eduardo Otubo wrote:
Right now seccomp is breaking the compilation of Qemu on armv7l due
to libseccomp's current lack of support for this arch. This problem is
already fixed in libseccomp upstream but no release date for that is
scheduled so far. This patch disables support for seccomp on armv7l
temporarily until libseccomp does a new release. Then I'll remove the
hack and update the libseccomp dependency in the configure script.

Related bug: https://bugs.launchpad.net/qemu/+bug/1363641
   
   ...
   
   (How are upstream proposing to fix this anyway? I couldn't
   figure that out from the mailing list thread.)
   
   The problem was that the released version of libseccomp has some holes
   in the internal syscall table for 32-bit ARM with respect to all of the
   other supported architectures.  The current libseccomp upstream has some
   additional tooling and checks to ensure that the different ABI syscall
   tables are kept in sync to prevent something like this from happening in
   the future.
  
  OK. So should we make QEMU say "if x86_64 or i386, require
  seccomp 2.1 or better, else require 2.2 or better"?

I don't think it's worth pointing to a non-existent version right now;
it might confuse people.

 
 I would probably just limit QEMU/seccomp to x86_64 and x86.  Once we have the 
 new release that fixes everything we can start worrying about versions and 
 different ABIs.

That's fine with me, since it is a temporary fix. I'll just go and rewrite
this patch, then.

Paul, do you have any plans for a new libseccomp release?

Regards,

-- 
Eduardo Otubo
ProfitBricks GmbH

Re: [Qemu-devel] [PATCH] error: fixed error_set_errno() to deal with a negative type of os_error.

2014-11-06 Thread Markus Armbruster

SeokYeon Hwang syeon.hw...@samsung.com writes:

 -Original Message-
 From: SeokYeon Hwang [mailto:syeon.hw...@samsung.com]
 Sent: Wednesday, November 05, 2014 10:13 PM
 To: 'Paolo Bonzini'; 'Max Reitz'; 'qemu-devel@nongnu.org'
 Cc: 'arm...@redhat.com'; 'paolo.bonz...@gmail.com'
 Subject: RE: [PATCH] error: fixed error_set_errno() to deal with a
 negative type of os_error.

  -Original Message-
  From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of
  Paolo Bonzini
  Sent: Wednesday, November 05, 2014 9:45 PM
  To: Max Reitz; SeokYeon Hwang; qemu-devel@nongnu.org
  Cc: arm...@redhat.com; paolo.bonz...@gmail.com
  Subject: Re: [PATCH] error: fixed error_set_errno() to deal with a
  negative type of os_error.

  On 05/11/2014 12:11, Max Reitz wrote:

   Of course I understand, but this patch doesn't make matters worse,
   as long as there are not systems which have negative values for
   errno (which I think we generally assume not to exist throughout
 qemu).
   That's why I'm fine with it. We should fix the callers but I don't
   see why we shouldn't apply this patch as well.

   A similar issue already came up and led to commit b276d2499, where
   callers of error_setg_errno() assumed that it would not clobber
   errno, so we fixed some of the callers but also applied that commit
   which just saves errno because there's no reason not to.

  I think side effect are a different matter than misuse of QEMU.

  There are only 157 calls to error_setg_errno; 67 use errno as the
  argument, and 4 use an explicit errno value (one of them is the wrong
  -EBUSY).  The other 86 seem correct and should not be hard to audit.

  Let's instead add an assertion check to error_setg_errno.

  Paolo

 I have expected to come out several opinions about this patch.

 The use of negative errno on strerror() was obviously wrong. But that
 does not mean it is wrong to use the negative errno on
 error_set_errno().
 The reason that I chose this one among the solutions is to change function
 specification. I think it seems good to us to respect the tradition of the
 developers that use negative errno.

 But if error_set_errno() has a strict specification, so that we must not
 change its spec, I agree with Paolo's opinion.

 I think we have 2 options.

 1. error_set_errno() is just utility for developer's convenience.
 Why can't we supply more convenience to developer ??
 - My first opinion.

 2. It is not just utility function for convenience or we cannot change
 its spec because it is well-known function.
 - If this is right, I'm ready to post 2nd patch that applied Paolo's
 opinion.

 What do you think about it??

3. Passing a negative value to an errno parameter is wrong.  It's
probably a harmless sign error, but it *could* be a logic error.  We
should not sweep programming errors under the rug.

Please assert(os_error >= 0).  Help with auditing callers is welcome.
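
The request above can be sketched in a few lines of C. This is only an
illustration under stated assumptions: report_errno(), try_open_device() and
open_or_report() are hypothetical names, not QEMU functions, and the message
format is invented. The sketch shows a helper that asserts a non-negative
errno value, and a caller that negates a "-errno" style return value before
passing it on.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for an error_set_errno()-style helper; this is
 * not QEMU's implementation. It formats "<msg>: <strerror(os_errno)>".
 */
static void report_errno(char *buf, size_t len, int os_errno, const char *msg)
{
    /* Callers must pass a positive errno value such as EBUSY, never -EBUSY. */
    assert(os_errno >= 0);

    if (os_errno != 0) {
        snprintf(buf, len, "%s: %s", msg, strerror(os_errno));
    } else {
        snprintf(buf, len, "%s", msg);
    }
}

/* A callee following the common "return -errno on failure" convention. */
static int try_open_device(void)
{
    return -EBUSY;
}

/* The caller flips the sign before handing the value to the helper. */
static void open_or_report(char *buf, size_t len)
{
    int ret = try_open_device();

    if (ret < 0) {
        report_errno(buf, len, -ret, "could not open device");
    }
}
```

With the assertion in place, a caller that forgets the negation fails
immediately in a debug build instead of producing a misleading message.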

Re: [Qemu-devel] [PATCH] pci: fixed mismatch of error-handling between pci_qdev_init() and qdev

2014-11-06 Thread Michael S. Tsirkin

On Thu, Nov 06, 2014 at 10:20:32AM +0100, Markus Armbruster wrote:
 SeokYeon Hwang syeon.hw...@samsung.com writes:

  -Original Message-
  From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo
  Bonzini
  Sent: Wednesday, November 05, 2014 11:55 PM
  To: Michael S. Tsirkin
  Cc: Markus Armbruster; SeokYeon Hwang; qemu-devel@nongnu.org
  Subject: Re: [Qemu-devel] [PATCH] pci: fixed mismatch of error-handling
  between pci_qdev_init() and qdev

  On 05/11/2014 14:28, Michael S. Tsirkin wrote:
I think bypassing the question by converting to realize makes the
most sense...

   I'm fine with doing that but Markus's patches wouldn't yet have solved
   the problem by themselves since init is still around, right?

   This probably means fixing this bug can't justify merging the realize
   patchset after freeze.

  Yes, I agree.  I meant that the API is not very well defined.  I would
  handle everything else on a case-by-case basis, by reviewing each init
  function that is converted to realize.

  Since the patch was for an out-of-tree device, it can wait for 2.3 anyway.

  Paolo

  I cannot fully understand your conversation.

 You appear to have a PCIDeviceClass method init() returning a positive
 value.  Doesn't work.  Only values <= 0 do.

 Your proposed fix is to make its caller treat a positive value like a
 negative one.

 Paolo points out that init()'s contract is unclear.  His preferred way
 of clarifying it is to convert PCI from init() to realize(), which has a
 sufficiently clear contract.

 Doesn't help you now.  My "pci: Partial conversion to realize" series
 will help you once it lands, but only if you convert your device.

 You obviously want a solution earlier.  The one you proposed implicitly
 clarifies the PCIDeviceClass init() contract to "zero means success,
 anything else failure".  I don't think that's a good idea, because it
 makes PCIDeviceClass's init() differ from DeviceClass's.  There,
 non-negative value means success, negative means failure (see
 device_realize()).

 Fix your device not to return positive values instead.

 You could additionally fix pci_qdev_init() to treat positive numbers as
 success, for consistency with device_realize(), but that requires
 auditing all existing PCIDeviceClass init() methods.  Waste of your
 time, because they all go away when we convert to realize().

  But I think this patch is still worthwhile until all 'init()' methods are
  converted to 'realize()'.
  Moreover, it has no side effects at all.

 I don't like it, because it makes PCIDeviceClass's init() inconsistent
 with DeviceClass's.

I agree with Markus here. A positive return value should not indicate an
error.

-- 
MST
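
The two return-value conventions discussed in this thread can be compared with
a small C sketch. Everything here is hypothetical (the init_fn type and the
three sample callbacks are not QEMU code); it only illustrates that the
conventions disagree exactly when an init callback returns a positive value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical init callback type; not the real PCIDeviceClass signature. */
typedef int (*init_fn)(void);

/* Convention described for DeviceClass: negative fails, anything else succeeds. */
static bool init_succeeded(int rc)
{
    return rc >= 0;
}

/* Convention the rejected patch would imply: only zero succeeds. */
static bool init_succeeded_zero_only(int rc)
{
    return rc == 0;
}

static int init_ok(void)       { return 0; }
static int init_fails(void)    { return -1; }
static int init_positive(void) { return 1; }  /* the out-of-tree device case */

/* Returns true when both conventions agree on the outcome of fn(). */
static bool conventions_agree(init_fn fn)
{
    int rc = fn();

    return init_succeeded(rc) == init_succeeded_zero_only(rc);
}
```

Only init_positive() is judged differently, which is why fixing the device to
stop returning positive values sidesteps the question.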

Re: [Qemu-devel] [RFC PATCH] virtio-mmio: support for multiple irqs

2014-11-06 Thread Michael S. Tsirkin

On Tue, Nov 04, 2014 at 05:35:12PM +0800, Shannon Zhao wrote:
 As the current virtio-mmio only supports a single irq,
 some advanced features such as vhost-net with irqfd
 are not supported, and net performance is not
 the best without vhost-net and irqfd support.
 
 This patch lets virtio-mmio request multiple
 irqs like virtio-pci. With this patch and qemu assigning
 multiple irqs to the virtio-mmio device, it's ok to use
 vhost-net with irqfd on arm/arm64.
 
 As arm doesn't support msi-x now, we use GSI for
 multiple irqs. In this patch we use vm_try_to_find_vqs
 to check whether multiple irqs are supported, like
 virtio-pci.
 
 Is this the right direction? Are there other ways to
 make virtio-mmio support multiple irqs? Hoping for feedback.
 Thanks.
 
 Signed-off-by: Shannon Zhao zhaoshengl...@huawei.com


So how does guest discover whether host device supports multiple IRQs?
Could you please document the new interface?
E.g. send a patch for virtio spec.
I think this really should be controlled by hypervisor, per device.
I'm also tempted to make this a virtio 1.0 only feature.



 ---
  drivers/virtio/virtio_mmio.c |  234 
 --
  1 files changed, 203 insertions(+), 31 deletions(-)
 
 diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
 index c600ccf..2b7d935 100644
 --- a/drivers/virtio/virtio_mmio.c
 +++ b/drivers/virtio/virtio_mmio.c
 @@ -122,6 +122,15 @@ struct virtio_mmio_device {
   /* a list of queues so we can dispatch IRQs */
   spinlock_t lock;
   struct list_head virtqueues;
 +
 + /* multiple irq support */
 + int single_irq_enabled;
 + /* Number of available irqs */
 + unsigned num_irqs;
 + /* Used number of irqs */
 + int used_irqs;
 + /* Name strings for interrupts. */
 + char (*vm_vq_names)[256];
  };
  
  struct virtio_mmio_vq_info {
 @@ -229,33 +238,53 @@ static bool vm_notify(struct virtqueue *vq)
   return true;
  }
  
 +/* Handle a configuration change: Tell driver if it wants to know. */
 +static irqreturn_t vm_config_changed(int irq, void *opaque)
 +{
 + struct virtio_mmio_device *vm_dev = opaque;
 + struct virtio_driver *vdrv = container_of(vm_dev->vdev.dev.driver,
 + struct virtio_driver, driver);
 +
 + if (vdrv && vdrv->config_changed)
 + vdrv->config_changed(&vm_dev->vdev);
 + return IRQ_HANDLED;
 +}
 +
  /* Notify all virtqueues on an interrupt. */
 -static irqreturn_t vm_interrupt(int irq, void *opaque)
 +static irqreturn_t vm_vring_interrupt(int irq, void *opaque)
  {
   struct virtio_mmio_device *vm_dev = opaque;
   struct virtio_mmio_vq_info *info;
 - struct virtio_driver *vdrv = container_of(vm_dev->vdev.dev.driver,
 - struct virtio_driver, driver);
 - unsigned long status;
 + irqreturn_t ret = IRQ_NONE;
   unsigned long flags;
 +
 + spin_lock_irqsave(&vm_dev->lock, flags);
 + list_for_each_entry(info, &vm_dev->virtqueues, node) {
 + if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
 + ret = IRQ_HANDLED;
 + }
 + spin_unlock_irqrestore(&vm_dev->lock, flags);
 +
 + return ret;
 +}
 +
 +/* Notify all virtqueues and handle a configuration
 + * change on an interrupt. */
 +static irqreturn_t vm_interrupt(int irq, void *opaque)
 +{
 + struct virtio_mmio_device *vm_dev = opaque;
 + unsigned long status;
   irqreturn_t ret = IRQ_NONE;
  
   /* Read and acknowledge interrupts */
   status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS);
   writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK);
  
 - if (unlikely(status & VIRTIO_MMIO_INT_CONFIG)
 -  && vdrv && vdrv->config_changed) {
 - vdrv->config_changed(&vm_dev->vdev);
 - ret = IRQ_HANDLED;
 - }
 + if (unlikely(status & VIRTIO_MMIO_INT_CONFIG))
 + return vm_config_changed(irq, opaque);
  
 - if (likely(status & VIRTIO_MMIO_INT_VRING)) {
 - spin_lock_irqsave(&vm_dev->lock, flags);
 - list_for_each_entry(info, &vm_dev->virtqueues, node)
 - ret |= vring_interrupt(irq, info->vq);
 - spin_unlock_irqrestore(&vm_dev->lock, flags);
 - }
 + if (likely(status & VIRTIO_MMIO_INT_VRING))
 + return vm_vring_interrupt(irq, opaque);
  
   return ret;
  }
 @@ -284,18 +313,98 @@ static void vm_del_vq(struct virtqueue *vq)
   kfree(info);
  }
  
 -static void vm_del_vqs(struct virtio_device *vdev)
 +static void vm_free_irqs(struct virtio_device *vdev)
  {
 + int i;
   struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
 +
 + if (vm_dev->single_irq_enabled) {
 + free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
 + vm_dev->single_irq_enabled = 0;
 + }
 +
 + for (i = 0; i < vm_dev->used_irqs; ++i)
 + free_irq(platform_get_irq(vm_dev->pdev, i), vm_dev);
 +
 + vm_dev->num_irqs = 0;
 +

Re: [Qemu-devel] [PATCH] pci: fixed mismatch of error-handling between pci_qdev_init() and qdev

2014-11-06 Thread SeokYeon Hwang

 -Original Message-
 From: Markus Armbruster [mailto:arm...@redhat.com]
 Sent: Thursday, November 06, 2014 6:21 PM
 To: SeokYeon Hwang
 Cc: 'Paolo Bonzini'; 'Michael S. Tsirkin'; qemu-devel@nongnu.org
 Subject: Re: [Qemu-devel] [PATCH] pci: fixed mismatch of error-handling
 between pci_qdev_init() and qdev

 SeokYeon Hwang syeon.hw...@samsung.com writes:

  -Original Message-
  From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of
  Paolo Bonzini
  Sent: Wednesday, November 05, 2014 11:55 PM
  To: Michael S. Tsirkin
  Cc: Markus Armbruster; SeokYeon Hwang; qemu-devel@nongnu.org
  Subject: Re: [Qemu-devel] [PATCH] pci: fixed mismatch of
  error-handling between pci_qdev_init() and qdev

  On 05/11/2014 14:28, Michael S. Tsirkin wrote:
I think bypassing the question by converting to realize makes the
most sense...

   I'm fine with doing that but Markus's patches wouldn't yet have
   solved the problem by themselves since init is still around, right?

   This probably means fixing this bug can't justify merging the
   realize patchset after freeze.

  Yes, I agree.  I meant that the API is not very well defined.  I
  would handle everything else on a case-by-case basis, by reviewing
  each init function that is converted to realize.

  Since the patch was for an out-of-tree device, it can wait for 2.3
 anyway.

  Paolo

  I cannot fully understand your conversation.

 You appear to have a PCIDeviceClass method init() returning a positive
 value.  Doesn't work.  Only values <= 0 do.

 Your proposed fix is to make its caller treat a positive value like a
 negative one.

 Paolo points out that init()'s contract is unclear.  His preferred way of
 clarifying it is to convert PCI from init() to realize(), which has a
 sufficiently clear contract.

 Doesn't help you now.  My "pci: Partial conversion to realize" series
 will help you once it lands, but only if you convert your device.

 You obviously want a solution earlier.  The one you proposed implicitly
 clarifies the PCIDeviceClass init() contract to "zero means success,
 anything else failure".  I don't think that's a good idea, because it
 makes PCIDeviceClass's init() differ from DeviceClass's.  There,
 non-negative value means success, negative means failure (see
 device_realize()).

 Fix your device not to return positive values instead.

 You could additionally fix pci_qdev_init() to treat positive numbers as
 success, for consistency with device_realize(), but that requires auditing
 all existing PCIDeviceClass init() methods.  Waste of your time, because
 they all go away when we convert to realize().

  But I think this patch is still worthwhile until all 'init()' methods are
  converted to 'realize()'.
  Moreover, it has no side effects at all.

 I don't like it, because it makes PCIDeviceClass's init() inconsistent
 with DeviceClass's.

I understand completely.
Thank you for your kind explanation.

Re: [Qemu-devel] [PATCH] 9pfs: changed to use event_notifier instead of qemu_pipe

2014-11-06 Thread SeokYeon Hwang

 -Original Message-
 From: Michael S. Tsirkin [mailto:m...@redhat.com]
 Sent: Thursday, November 06, 2014 6:17 PM
 To: SeokYeon Hwang
 Cc: qemu-devel@nongnu.org; aneesh.ku...@linux.vnet.ibm.com;
 pbonz...@redhat.com; Peter Maydell
 Subject: Re: [PATCH] 9pfs: changed to use event_notifier instead of
 qemu_pipe
 
 On Thu, Nov 06, 2014 at 10:38:34AM +0900, SeokYeon Hwang wrote:
  Please review this patch.
 
  Thanks.
 
 Thanks for the patch!
 Sorry, this patch missed the devel freeze.
 Please resubmit after 2.2 is out.
 

Ok, Thanks.

   -Original Message-
   From: SeokYeon Hwang [mailto:syeon.hw...@samsung.com]
   Sent: Friday, October 31, 2014 5:04 PM
   To: qemu-devel@nongnu.org
   Cc: aneesh.ku...@linux.vnet.ibm.com; SeokYeon Hwang
   Subject: [PATCH] 9pfs: changed to use event_notifier instead of
   qemu_pipe
  
   Changed to use event_notifier instead of qemu_pipe.
   It is necessary for porting 9pfs to Windows and MacOS.
  
   Signed-off-by: SeokYeon Hwang syeon.hw...@samsung.com
   ---
hw/9pfs/virtio-9p-coth.c | 29 +++--
   hw/9pfs/virtio-9p-coth.h |  4 ++--
2 files changed, 9 insertions(+), 24 deletions(-)
  
   diff --git a/hw/9pfs/virtio-9p-coth.c b/hw/9pfs/virtio-9p-coth.c
   index
   ae6cde8..8185c53 100644
   --- a/hw/9pfs/virtio-9p-coth.c
   +++ b/hw/9pfs/virtio-9p-coth.c
   @@ -14,6 +14,7 @@
  
#include "fsdev/qemu-fsdev.h"
#include "qemu/thread.h"
   +#include "qemu/event_notifier.h"
#include "block/coroutine.h"
#include "virtio-9p-coth.h"
  
   @@ -26,15 +27,11 @@ void co_run_in_worker_bh(void *opaque)
g_thread_pool_push(v9fs_pool.pool, co, NULL);  }
  
   -static void v9fs_qemu_process_req_done(void *arg)
   +static void v9fs_qemu_process_req_done(EventNotifier *e)
{
   -char byte;
   -ssize_t len;
Coroutine *co;
  
   -do {
   -len = read(v9fs_pool.rfd, &byte, sizeof(byte));
   -} while (len == -1 && errno == EINTR);
   +event_notifier_test_and_clear(e);
  
while ((co = g_async_queue_try_pop(v9fs_pool.completed)) != NULL)
 {
qemu_coroutine_enter(co, NULL); @@ -43,22 +40,18 @@ static
   void v9fs_qemu_process_req_done(void *arg)
  
static void v9fs_thread_routine(gpointer data, gpointer user_data)  {
   -ssize_t len;
   -char byte = 0;
Coroutine *co = data;
  
qemu_coroutine_enter(co, NULL);
  
g_async_queue_push(v9fs_pool.completed, co);
   -do {
   -len = write(v9fs_pool.wfd, &byte, sizeof(byte));
   -} while (len == -1 && errno == EINTR);
   +
   +event_notifier_set(&v9fs_pool.e);
}
  
int v9fs_init_worker_threads(void)
{
int ret = 0;
   -int notifier_fds[2];
V9fsThPool *p = &v9fs_pool;
sigset_t set, oldset;
  
   @@ -66,10 +59,6 @@ int v9fs_init_worker_threads(void)
/* Leave signal handling to the iothread.  */
pthread_sigmask(SIG_SETMASK, &set, &oldset);
  
   -if (qemu_pipe(notifier_fds) == -1) {
   -ret = -1;
   -goto err_out;
   -}
p->pool = g_thread_pool_new(v9fs_thread_routine, p, -1, FALSE,
 NULL);
if (!p->pool) {
ret = -1;
   @@ -84,13 +73,9 @@ int v9fs_init_worker_threads(void)
ret = -1;
goto err_out;
}
   -p->rfd = notifier_fds[0];
   -p->wfd = notifier_fds[1];
   -
   -fcntl(p->rfd, F_SETFL, O_NONBLOCK);
   -fcntl(p->wfd, F_SETFL, O_NONBLOCK);
   +event_notifier_init(&p->e, 0);
  
   -qemu_set_fd_handler(p->rfd, v9fs_qemu_process_req_done, NULL,
 NULL);
   +event_notifier_set_handler(&p->e, v9fs_qemu_process_req_done);
err_out:
pthread_sigmask(SIG_SETMASK, &oldset, NULL);
return ret;
   diff --git a/hw/9pfs/virtio-9p-coth.h b/hw/9pfs/virtio-9p-coth.h
   index
   86d5ed4..4f51b25 100644
   --- a/hw/9pfs/virtio-9p-coth.h
   +++ b/hw/9pfs/virtio-9p-coth.h
   @@ -21,8 +21,8 @@
#include <glib.h>
  
typedef struct V9fsThPool {
   -int rfd;
   -int wfd;
   +EventNotifier e;
   +
GThreadPool *pool;
GAsyncQueue *completed;
} V9fsThPool;
   --
   2.1.0

Re: [Qemu-devel] [RFC PATCH] virtio-mmio: support for multiple irqs

2014-11-06 Thread Shannon Zhao

On 2014/11/6 17:34, Michael S. Tsirkin wrote:
 On Tue, Nov 04, 2014 at 05:35:12PM +0800, Shannon Zhao wrote:
 As the current virtio-mmio only supports a single irq,
 some advanced features such as vhost-net with irqfd
 are not supported, and net performance is not
 the best without vhost-net and irqfd support.

 This patch lets virtio-mmio request multiple
 irqs like virtio-pci. With this patch and qemu assigning
 multiple irqs to the virtio-mmio device, it's ok to use
 vhost-net with irqfd on arm/arm64.

 As arm doesn't support msi-x now, we use GSI for
 multiple irqs. In this patch we use vm_try_to_find_vqs
 to check whether multiple irqs are supported, like
 virtio-pci.

 Is this the right direction? Are there other ways to
 make virtio-mmio support multiple irqs? Hoping for feedback.
 Thanks.

 Signed-off-by: Shannon Zhao zhaoshengl...@huawei.com
 
 
 So how does guest discover whether host device supports multiple IRQs?

The guest uses vm_try_to_find_vqs to check whether it can get multiple IRQs,
just as virtio-pci uses vp_try_to_find_vqs. Within the function
vm_request_multiple_irqs, the guest checks whether the number of IRQs the host
device provides is equal to the number we want.

for (i = 0; i < nirqs; i++) {
    irq = platform_get_irq(vm_dev->pdev, i);
    if (irq == -ENXIO)
        goto error;
}

If we can't get the expected number of IRQs, we return an error and this
attempt fails. The guest will then try two IRQs and finally a single IRQ, like
virtio-pci.
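
The fallback order described above can be sketched as a small self-contained C
example. This is a rough sketch, not the patch's code: struct mock_pdev,
mock_get_irq(), probe_irqs() and choose_irq_count() are hypothetical names, the
mock merely imitates the platform_get_irq() check shown above, and the first
attempt asking for one IRQ per virtqueue plus one for configuration changes is
an assumption borrowed from how virtio-pci is usually described.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Mock of a platform device: 'available' is how many IRQ resources the
 * host exposes. Hypothetical, for illustration only.
 */
struct mock_pdev {
    int available;
};

/* Imitates platform_get_irq(): an IRQ number, or -ENXIO if none at index. */
static int mock_get_irq(const struct mock_pdev *pdev, int index)
{
    assert(index >= 0);
    return index < pdev->available ? 32 + index : -ENXIO;
}

/* Returns 0 if all 'nirqs' IRQs can be looked up, -ENXIO otherwise. */
static int probe_irqs(const struct mock_pdev *pdev, int nirqs)
{
    int i;

    for (i = 0; i < nirqs; i++) {
        if (mock_get_irq(pdev, i) == -ENXIO) {
            return -ENXIO;
        }
    }
    return 0;
}

/*
 * Try nvqs + 1 IRQs first (assumed: one per virtqueue plus one for config
 * changes), then two IRQs, then a single shared IRQ.
 * Returns the number of IRQs to use, or -ENXIO if even one is unavailable.
 */
static int choose_irq_count(const struct mock_pdev *pdev, int nvqs)
{
    const int candidates[] = { nvqs + 1, 2, 1 };
    size_t i;

    for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        if (probe_irqs(pdev, candidates[i]) == 0) {
            return candidates[i];
        }
    }
    return -ENXIO;
}
```

For a device with two virtqueues, a host exposing three IRQs gets the full
per-queue layout, while a host exposing only one falls back to the single
shared interrupt.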

 Could you please document the new interface?
 E.g. send a patch for virtio spec.

Ok, I'll send it later. Thank you very much :)

Shannon

 I think this really should be controlled by hypervisor, per device.
 I'm also tempted to make this a virtio 1.0 only feature.
 
 
 
 ---
  drivers/virtio/virtio_mmio.c |  234 
 --
  1 files changed, 203 insertions(+), 31 deletions(-)

 diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
 index c600ccf..2b7d935 100644
 --- a/drivers/virtio/virtio_mmio.c
 +++ b/drivers/virtio/virtio_mmio.c
 @@ -122,6 +122,15 @@ struct virtio_mmio_device {
  /* a list of queues so we can dispatch IRQs */
  spinlock_t lock;
  struct list_head virtqueues;
 +
 +/* multiple irq support */
 +int single_irq_enabled;
 +/* Number of available irqs */
 +unsigned num_irqs;
 +/* Used number of irqs */
 +int used_irqs;
 +/* Name strings for interrupts. */
 +char (*vm_vq_names)[256];
  };
  
  struct virtio_mmio_vq_info {
 @@ -229,33 +238,53 @@ static bool vm_notify(struct virtqueue *vq)
  return true;
  }
  
 +/* Handle a configuration change: Tell driver if it wants to know. */
 +static irqreturn_t vm_config_changed(int irq, void *opaque)
 +{
 +struct virtio_mmio_device *vm_dev = opaque;
 +struct virtio_driver *vdrv = container_of(vm_dev->vdev.dev.driver,
 +struct virtio_driver, driver);
 +
 +if (vdrv && vdrv->config_changed)
 +vdrv->config_changed(&vm_dev->vdev);
 +return IRQ_HANDLED;
 +}
 +
  /* Notify all virtqueues on an interrupt. */
 -static irqreturn_t vm_interrupt(int irq, void *opaque)
 +static irqreturn_t vm_vring_interrupt(int irq, void *opaque)
  {
  struct virtio_mmio_device *vm_dev = opaque;
  struct virtio_mmio_vq_info *info;
 -struct virtio_driver *vdrv = container_of(vm_dev->vdev.dev.driver,
 -struct virtio_driver, driver);
 -unsigned long status;
 +irqreturn_t ret = IRQ_NONE;
  unsigned long flags;
 +
 +spin_lock_irqsave(&vm_dev->lock, flags);
 +list_for_each_entry(info, &vm_dev->virtqueues, node) {
 +if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
 +ret = IRQ_HANDLED;
 +}
 +spin_unlock_irqrestore(&vm_dev->lock, flags);
 +
 +return ret;
 +}
 +
 +/* Notify all virtqueues and handle a configuration
 + * change on an interrupt. */
 +static irqreturn_t vm_interrupt(int irq, void *opaque)
 +{
 +struct virtio_mmio_device *vm_dev = opaque;
 +unsigned long status;
  irqreturn_t ret = IRQ_NONE;
  
  /* Read and acknowledge interrupts */
  status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS);
  writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK);
  
 -if (unlikely(status & VIRTIO_MMIO_INT_CONFIG)
 - && vdrv && vdrv->config_changed) {
 -vdrv->config_changed(&vm_dev->vdev);
 -ret = IRQ_HANDLED;
 -}
 +if (unlikely(status & VIRTIO_MMIO_INT_CONFIG))
 +return vm_config_changed(irq, opaque);
  
 -if (likely(status & VIRTIO_MMIO_INT_VRING)) {
 -spin_lock_irqsave(&vm_dev->lock, flags);
 -list_for_each_entry(info, &vm_dev->virtqueues, node)
 -ret |= vring_interrupt(irq, info->vq);
 -spin_unlock_irqrestore(&vm_dev->lock, flags);
 -}
 +if (likely(status & VIRTIO_MMIO_INT_VRING))
 +return vm_vring_interrupt(irq, opaque);
  
  return

Re: [Qemu-devel] [Qemu-ppc] [PATCH v3 4/4] target-ppc: Handle ibm, nmi-register RTAS call

2014-11-06 Thread Aravinda Prasad



On Wednesday 05 November 2014 09:16 PM, Tom Musta wrote:
 On 11/5/2014 2:32 AM, Alexander Graf wrote:


 On 05.11.14 08:13, Aravinda Prasad wrote:
 This patch adds FWNMI support in qemu for powerKVM
 guests by handling the ibm,nmi-register rtas call.
 Whenever OS issues ibm,nmi-register RTAS call, the
 machine check notification address is saved and the
 machine check interrupt vector 0x200 is patched to
 issue a private hcall.

 This patch also handles the cases when multi-processors
 experience machine check at or about the same time.
 As per PAPR, subsequent processors serialize waiting
 for the first processor to issue the ibm,nmi-interlock call.
 The second processor retries if the first processor which
 received a machine check is still reading the error log
 and is yet to issue ibm,nmi-interlock call.

 Signed-off-by: Aravinda Prasad aravi...@linux.vnet.ibm.com
 ---
  hw/ppc/spapr_hcall.c|   16 +++
  hw/ppc/spapr_rtas.c |   93 
 +++
  include/hw/ppc/spapr.h  |   17 +++
  pc-bios/spapr-rtas/spapr-rtas.S |   38 
  4 files changed, 163 insertions(+), 1 deletion(-)

 diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
 index 8f16160..eceb5e5 100644
 --- a/hw/ppc/spapr_hcall.c
 +++ b/hw/ppc/spapr_hcall.c
 @@ -97,6 +97,9 @@ struct rtas_mc_log {
  struct rtas_error_log err_log;
  };
  
 +/* Whether machine check handling is in progress by any CPU */
 +bool mc_in_progress;
 +
  static void do_spr_sync(void *arg)
  {
  struct SPRSyncState *s = arg;
 @@ -678,6 +681,19 @@ static target_ulong h_report_mc_err(PowerPCCPU *cpu, 
 sPAPREnvironment *spapr,
  cpu_synchronize_state(CPU(ppc_env_get_cpu(env)));
  
  /*
 + * Only one VCPU can process machine check NMI at a time. Hence
 + * set the lock mc_in_progress. Once the VCPU finishes processing
 + * NMI, it executes ibm,nmi-interlock and mc_in_progress is unset
 + * in ibm,nmi-interlock handler. Meanwhile if other VCPUs encounter
 + * NMI we return 0 asking the VCPU to retry h_report_mc_err
 + */
 +if (mc_in_progress == 1) {

 Please don't depend on bools being numbers. Use true / false. For if()s,
 just don't use == at all - it makes it more readable.

 +return 0;
 +}
 +
 +mc_in_progress = 1;
 +
 +/*
   * We save the original r3 register in SPRG2 in 0x200 vector,
   * which is patched during call to ibm.nmi-register. Original
   * r3 is required to be included in error log
 diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
 index 2ec2a8e..71c7662 100644
 --- a/hw/ppc/spapr_rtas.c
 +++ b/hw/ppc/spapr_rtas.c
 @@ -36,6 +36,9 @@
  
  #include libfdt.h
  
 +#define BRANCH_INST_MASK  0xFC00
 +extern bool mc_in_progress;

 Please put this into the spapr struct.

 +
  static void rtas_display_character(PowerPCCPU *cpu, sPAPREnvironment 
 *spapr,
 uint32_t token, uint32_t nargs,
 target_ulong args,
 @@ -290,6 +293,90 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
  rtas_st(rets, 0, ret);
  }
  
 +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
 +  sPAPREnvironment *spapr,
 +  uint32_t token, uint32_t nargs,
 +  target_ulong args,
 +  uint32_t nret, target_ulong rets)
 +{
 +int i;
 +uint32_t ori_inst = 0x6063;
 +uint32_t branch_inst = 0x4802;
 +target_ulong guest_machine_check_addr;
 +uint32_t trampoline[TRAMPOLINE_INSTS];
 +int total_inst = sizeof(trampoline) / sizeof(uint32_t);

 ARRAY_SIZE(trampoline), though I don't quite understand why you need a
 variable that contains the same value as a constant (TRAMPOLINE_INSTS).

 But since you're moving all of those bits into variable fields on the
 rtas blob itself as we discussed in the last version, I guess this code
 will go away anyways ;).

 +PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
 +
 +/* Store the system reset and machine check address */
 +guest_machine_check_addr = rtas_ld(args, 1);

 Load or Store? I don't find the comment particularly useful either ;).

 +
 +/*
 + * Read the trampoline instructions from RTAS Blob and patch
 + * the KVMPPC_H_REPORT_MC_ERR hcall number and the guest
 + * machine check address before copying to 0x200 vector
 + */
 +cpu_physical_memory_read(spapr->rtas_addr + RTAS_TRAMPOLINE_OFFSET,
 + trampoline, sizeof(trampoline));
 +
 +/* Safety Check */

 Same for this comment.

 +QEMU_BUILD_BUG_ON(sizeof(trampoline) > MC_INTERRUPT_VECTOR_SIZE);
 +
 +/* Update the KVMPPC_H_REPORT_MC_ERR value in trampoline */
 +ori_inst |= KVMPPC_H_REPORT_MC_ERR;
 +memcpy(&trampoline[TRAMPOLINE_ORI_INST_INDEX], &ori_inst,
 +sizeof(ori_inst));

 Why memcpy a u32 into a u32 array?
 
 Additionally,

Re: [Qemu-devel] [PATCH v3] qemu-log: add log category for MMU info

2014-11-06 Thread Alexander Graf




 On 06.11.2014 at 09:11, Antony Pavlov antonynpav...@gmail.com wrote:
 
 
 Running barebox on qemu-system-mips* with '-d unimp' overloads
 stderr by very very many mips_cpu_handle_mmu_fault() messages:
 
  mips_cpu_handle_mmu_fault address=b80003fd ret 0 physical 180003fd 
 prot 3
  mips_cpu_handle_mmu_fault address=a0800884 ret 0 physical 00800884 
 prot 3
  mips_cpu_handle_mmu_fault pc a080cd80 ad b80003fd rw 0 mmu_idx 0
 
 So it's very difficult to find LOG_UNIMP message.
 
 The mips_cpu_handle_mmu_fault() messages appears on enabling ANY
 logging! It's not very handy.
 
 Adding separate log category for *_cpu_handle_mmu_fault()
 logging fixes the problem.
 
 Signed-off-by: Antony Pavlov address@hidden
 
 Have you benchmarked the performance delta with this patch applied? Just
 boot up a random small PPC guest that shuts down immediately and time
 it with and without the patch applied.
 
 Here is my simple benchmark.
 
 I have used buildroot with qemu_ppc64_pseries_defconfig configuration.
 
 After a successful rootfs image build I patched inittab to halt after boot:
 
 --- a/output/images/rootfs.ext2/etc/inittab   2014-11-06 
 10:21:25.024198993 +0300
 +++ b/output/images/rootfs.ext2/etc/inittab   2014-11-06 
 10:20:57.089421643 +0300
 @@ -23,10 +23,11 @@
 # now run any rc scripts
 ::sysinit:/etc/init.d/rcS
 
 -::askfirst:-/bin/sh
 +::sysinit:/sbin/halt
 +#::askfirst:-/bin/sh
 
 # Put a getty on the serial port
 -hvc0::respawn:/sbin/getty -L  hvc0 115200 vt100 # GENERIC_SERIAL
 +#hvc0::respawn:/sbin/getty -L  hvc0 115200 vt100 # GENERIC_SERIAL
 
 # Stuff to do for the 3-finger salute
 ::ctrlaltdel:/sbin/reboot
 
 
 Here is my qemu cmdline:
 
 buildroot$ time ~/qemu.git/ppc64-softmmu/qemu-system-ppc64 -M pseries -cpu 
 POWER7 -m 256 -kernel output/images/vmlinux -append 'console=hvc0 
 root=/dev/sda' -drive file=output/images/rootfs.ext2,if=scsi,index=0 -serial 
 stdio  -nographic -monitor none
 
 
 I use my 'not-very-busy' AMD Opteron 6176 SE-based server for testing.
 
 Three 'time' command outputs; qemu with the "qemu-log: add log category for
 MMU info" patch:
 
real    0m39.744s
user    0m36.940s
sys     0m1.216s
 
real    0m39.552s
user    0m37.200s
sys     0m0.924s
 
real    0m39.585s
user    0m37.340s
sys     0m0.704s
 
 
 Three 'time' command outputs; qemu without the "qemu-log: add log category
 for MMU info" patch:
 
real    0m39.732s
user    0m37.484s
sys     0m0.756s
 
real    0m40.077s
user    0m37.920s
sys     0m0.744s
 
real    0m39.766s
user    0m37.304s
sys     0m1.032s
 
 
 So the performance delta is less than experimental error.

Ok, works for me then :).

Acked-by: Alexander Graf ag...@suse.de

Alex
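
The conclusion about the timings can be checked with a short C sketch. The six
"real" values are taken from the benchmark quoted above; the helper names
mean(), spread() and delta_within_noise() are hypothetical, and using the
max-minus-min spread of the runs as a rough noise estimate is an assumption of
this sketch, not something the thread specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Wall-clock ("real") times in seconds from the quoted benchmark. */
static const double with_patch[]    = { 39.744, 39.552, 39.585 };
static const double without_patch[] = { 39.732, 40.077, 39.766 };

/* Arithmetic mean of n samples. */
static double mean(const double *x, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum / (double)n;
}

/* Largest minus smallest sample: a crude estimate of run-to-run noise. */
static double spread(const double *x, size_t n)
{
    double lo = x[0], hi = x[0];
    size_t i;

    for (i = 1; i < n; i++) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    return hi - lo;
}

/* Returns 1 if the difference of the means is smaller than the noise. */
static int delta_within_noise(void)
{
    double delta = mean(without_patch, 3) - mean(with_patch, 3);
    double noise_a = spread(with_patch, 3);
    double noise_b = spread(without_patch, 3);
    double noise = noise_a > noise_b ? noise_a : noise_b;

    if (delta < 0.0) {
        delta = -delta;
    }
    return delta < noise;
}
```

The means differ by roughly 0.23 s while the unpatched runs alone vary by about
0.35 s, so three runs per configuration cannot distinguish the two builds.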

Re: [Qemu-devel] [RFC PATCH 6/8] generic_pci: generate dt node after devices init

2014-11-06 Thread alvise rigo

On Wed, Nov 5, 2014 at 1:26 PM, Claudio Fontana
claudio.font...@huawei.com wrote:
 On 11.07.2014 09:21, Alvise Rigo wrote:
 Taking advantage of the finalize_dt QEMUMachine function, the mach-virt
 machine now completes the device tree creation after all the
 generic devices have been instantiated. This allows the
 interrupt-map node to be generated according to the devices attached to the
 PCI bus. These devices can be specified either as command line arguments (like
 -device lsi53c895a addr=0x5 ...) or explicitly inside the machine
 definition with the pci_create_simple method.

 Fill also the generic_pci state with the offsets and sizes of the memory
 regions needed by the PCI system. The offset in the machine address
 space of these regions (config, IO and memory) is specified by the
 mach-virt platform.

 If this is mach-virt specific, why is this called generic_pci..?
 Is it generic ARM pci?

The overall implementation shouldn't be mach-virt specific, but at the
moment I don't see any other use case for it.
Maybe in the future this controller could come in handy to support
other ARM platforms with PCI support.



 TODO:
 - Part of the ranges device node is still hardcoded.

 Also this would benefit from extensive commenting.
 Especially the exact meaning of the interrupt mapping, and if it is ARM 
 specific,
 how the interrupts end up being mapped to gic irqs.
 [see below]


 Signed-off-by: Alvise Rigo a.r...@virtualopensystems.com
 ---
  hw/arm/virt.c | 80 +++--
  hw/pci-host/generic-pci.c | 94 
 +++
  include/hw/pci-host/pci_generic.h | 20 +
  3 files changed, 160 insertions(+), 34 deletions(-)

 diff --git a/hw/arm/virt.c b/hw/arm/virt.c
 index a433902..e182282 100644
 --- a/hw/arm/virt.c
 +++ b/hw/arm/virt.c
 @@ -358,44 +358,35 @@ static void create_pci_host(const VirtBoardInfo *vbi, 
 qemu_irq *pic)
  PCIBus *pci_bus;
  DeviceState *dev;
  SysBusDevice *busdev;
 -uint32_t gic_phandle;
 -char *nodename;
 -int i;
 +PCIVPBState *ps;
 +int i, count = 0;
  hwaddr cfg_base = vbi->memmap[VIRT_PCI_CFG].base;
 -hwaddr cfg_size = vbi->memmap[VIRT_PCI_CFG].size;
  hwaddr io_base = vbi->memmap[VIRT_PCI_IO].base;
 -hwaddr io_size = vbi->memmap[VIRT_PCI_IO].size;
  hwaddr mem_base = vbi->memmap[VIRT_PCI_MEM].base;
 -hwaddr mem_size = vbi->memmap[VIRT_PCI_MEM].size;
 -
 -nodename = g_strdup_printf(/pci@% PRIx64, cfg_base);
 -qemu_fdt_add_subnode(vbi-fdt, nodename);
 -qemu_fdt_setprop_string(vbi-fdt, nodename, compatible,
 -pci-host-cam-generic);
 -qemu_fdt_setprop_string(vbi-fdt, nodename, device_type, pci);
 -qemu_fdt_setprop_cell(vbi-fdt, nodename, #address-cells, 0x3);
 -qemu_fdt_setprop_cell(vbi-fdt, nodename, #size-cells, 0x2);
 -qemu_fdt_setprop_cell(vbi-fdt, nodename, #interrupt-cells, 0x1);
 -
 -qemu_fdt_setprop_sized_cells(vbi-fdt, nodename, reg, 2, cfg_base,
 -   2, cfg_size);
 -
 -qemu_fdt_setprop_sized_cells(vbi-fdt, nodename, ranges,
 -1, 0x0100, 2, 0x, 2, io_base, 2, io_size,
 -1, 0x0200, 2, 0x1200, 2, mem_size, 2, mem_size);
 -
 -gic_phandle = qemu_fdt_get_phandle(vbi-fdt, /intc);
 -qemu_fdt_setprop_sized_cells(vbi-fdt, nodename, interrupt-map-mask,
 -1, 0xf800, 1, 0x0, 1, 0x0, 1, 0x7);
 -qemu_fdt_setprop_sized_cells(vbi-fdt, nodename, interrupt-map,
 -1, 0x, 2, 0x, 1, 0x1, 1, gic_phandle, 1, 0, 1, 0x4, 1, 
 0x1,
 -1, 0x0800, 2, 0x, 1, 0x1, 1, gic_phandle, 1, 0, 1, 0x5, 1, 
 0x1,
 -1, 0x1000, 2, 0x, 1, 0x1, 1, gic_phandle, 1, 0, 1, 0x6, 1, 
 0x1,
 -1, 0x1800, 2, 0x, 1, 0x1, 1, gic_phandle, 1, 0, 1, 0x7, 1, 
 0x1);

  dev = qdev_create(NULL, "generic_pci");
  busdev = SYS_BUS_DEVICE(dev);
 +ps = PCI_GEN(dev);
 +
 +/* Set the mapping data in the device structure where:
 + * ptr[i] = base address
 + * ptr[i+1] = size
 + * i = 0 => config
 + * i = 2 => I/O
 + * i = 4 => memory
 + * These values are needed by the specific device code.
 + * */
 +hwaddr *ptr = g_malloc(6 * sizeof(hwaddr));
 +for (i = VIRT_PCI_CFG; i <= VIRT_PCI_MEM; i++) {
 +ptr[count++] = vbi->memmap[i].base;
 +ptr[count++] = vbi->memmap[i].size;
 +}
 +ps->dt_data.fdt = vbi->fdt;
 +ps->dt_data.irq_base = vbi->irqmap[VIRT_PCI_CFG];
 +ps->dt_data.addr_mapping = ptr;
 +
  qdev_init_nofail(dev);
 +
  sysbus_mmio_map(busdev, 0, cfg_base); /* PCI config */
  sysbus_mmio_map(busdev, 1, io_base);  /* PCI I/O */
  sysbus_mmio_map(busdev, 2, mem_base); /* PCI memory window */
 @@ -407,8 +398,6 @@ static void create_pci_host(const VirtBoardInfo *vbi, 
 qemu_irq *pic)
  pci_bus = (PCIBus *)qdev_get_child_bus(dev, "pci");
  pci_create_simple(pci_bus, -1, "pci-ohci");

Re: [Qemu-devel] [Qemu-ppc] [PATCH v3 4/4] target-ppc: Handle ibm, nmi-register RTAS call

2014-11-06 Thread Alexander Graf




 On 06.11.2014 at 11:00, Aravinda Prasad aravi...@linux.vnet.ibm.com wrote:
 
 
 
 On Wednesday 05 November 2014 09:16 PM, Tom Musta wrote:
 On 11/5/2014 2:32 AM, Alexander Graf wrote:
 
 
 On 05.11.14 08:13, Aravinda Prasad wrote:
 This patch adds FWNMI support in qemu for powerKVM
 guests by handling the ibm,nmi-register rtas call.
 Whenever OS issues ibm,nmi-register RTAS call, the
 machine check notification address is saved and the
 machine check interrupt vector 0x200 is patched to
 issue a private hcall.
 
 This patch also handles the cases when multi-processors
 experience machine check at or about the same time.
 As per PAPR, subsequent processors serialize waiting
 for the first processor to issue the ibm,nmi-interlock call.
 The second processor retries if the first processor which
 received a machine check is still reading the error log
 and is yet to issue ibm,nmi-interlock call.
 
 Signed-off-by: Aravinda Prasad aravi...@linux.vnet.ibm.com
 ---
 hw/ppc/spapr_hcall.c|   16 +++
 hw/ppc/spapr_rtas.c |   93 
 +++
 include/hw/ppc/spapr.h  |   17 +++
 pc-bios/spapr-rtas/spapr-rtas.S |   38 
 4 files changed, 163 insertions(+), 1 deletion(-)
 
 diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
 index 8f16160..eceb5e5 100644
 --- a/hw/ppc/spapr_hcall.c
 +++ b/hw/ppc/spapr_hcall.c
 @@ -97,6 +97,9 @@ struct rtas_mc_log {
 struct rtas_error_log err_log;
 };
 
 +/* Whether machine check handling is in progress by any CPU */
 +bool mc_in_progress;
 +
 static void do_spr_sync(void *arg)
 {
 struct SPRSyncState *s = arg;
 @@ -678,6 +681,19 @@ static target_ulong h_report_mc_err(PowerPCCPU *cpu, 
 sPAPREnvironment *spapr,
 cpu_synchronize_state(CPU(ppc_env_get_cpu(env)));
 
 /*
 + * Only one VCPU can process machine check NMI at a time. Hence
 + * set the lock mc_in_progress. Once the VCPU finishes processing
 + * NMI, it executes ibm,nmi-interlock and mc_in_progress is unset
 + * in ibm,nmi-interlock handler. Meanwhile if other VCPUs encounter
 + * NMI we return 0 asking the VCPU to retry h_report_mc_err
 + */
 +if (mc_in_progress == 1) {
 
 Please don't depend on bools being numbers. Use true / false. For if()s,
 just don't use == at all - it makes it more readable.
 
 +return 0;
 +}
 +
 +mc_in_progress = 1;
 +
 +/*
  * We save the original r3 register in SPRG2 in 0x200 vector,
  * which is patched during call to ibm.nmi-register. Original
  * r3 is required to be included in error log
 diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
 index 2ec2a8e..71c7662 100644
 --- a/hw/ppc/spapr_rtas.c
 +++ b/hw/ppc/spapr_rtas.c
 @@ -36,6 +36,9 @@
 
 #include <libfdt.h>
 
 +#define BRANCH_INST_MASK  0xFC00
 +extern bool mc_in_progress;
 
 Please put this into the spapr struct.
 
 +
 static void rtas_display_character(PowerPCCPU *cpu, sPAPREnvironment 
 *spapr,
uint32_t token, uint32_t nargs,
target_ulong args,
 @@ -290,6 +293,90 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
 rtas_st(rets, 0, ret);
 }
 
 +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
 +  sPAPREnvironment *spapr,
 +  uint32_t token, uint32_t nargs,
 +  target_ulong args,
 +  uint32_t nret, target_ulong rets)
 +{
 +int i;
 +uint32_t ori_inst = 0x6063;
 +uint32_t branch_inst = 0x4802;
 +target_ulong guest_machine_check_addr;
 +uint32_t trampoline[TRAMPOLINE_INSTS];
 +int total_inst = sizeof(trampoline) / sizeof(uint32_t);
 
 ARRAY_SIZE(trampoline), though I don't quite understand why you need a
 variable that contains the same value as a constant (TRAMPOLINE_INSTS).
 
 But since you're moving all of those bits into variable fields on the
 rtas blob itself as we discussed in the last version, I guess this code
 will go away anyways ;).
 
 +PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
 +
 +/* Store the system reset and machine check address */
 +guest_machine_check_addr = rtas_ld(args, 1);
 
 Load or Store? I don't find the comment particularly useful either ;).
 
 +
 +/*
 + * Read the trampoline instructions from RTAS Blob and patch
 + * the KVMPPC_H_REPORT_MC_ERR hcall number and the guest
 + * machine check address before copying to 0x200 vector
 + */
 +cpu_physical_memory_read(spapr->rtas_addr + RTAS_TRAMPOLINE_OFFSET,
 + trampoline, sizeof(trampoline));
 +
 +/* Safety Check */
 
 Same for this comment.
 
 +QEMU_BUILD_BUG_ON(sizeof(trampoline) > MC_INTERRUPT_VECTOR_SIZE);
 +
 +/* Update the KVMPPC_H_REPORT_MC_ERR value in trampoline */
 +ori_inst |= KVMPPC_H_REPORT_MC_ERR;
 +memcpy(&trampoline[TRAMPOLINE_ORI_INST_INDEX], &ori_inst,
 +

[Qemu-devel] [PATCH] target-mips: fix for missing delay slot in BC1EQZ and BC1NEZ

2014-11-06 Thread Leon Alrae

New R6 COP1 conditional branches currently don't have a delay slot. Fix this
by setting the MIPS_HFLAG_BDS32 flag, which is required for branches that have a
4-byte delay slot.

Signed-off-by: Leon Alrae leon.al...@imgtec.com
---
 target-mips/translate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 2117ce8..e83c50a 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -8104,6 +8104,7 @@ static void gen_compute_branch1_r6(DisasContext *ctx, 
uint32_t op,
 MIPS_DEBUG("%s: cond %02x target " TARGET_FMT_lx, opn,
ctx->hflags, btarget);
 ctx->btarget = btarget;
+ctx->hflags |= MIPS_HFLAG_BDS32;
 
 out:
 tcg_temp_free_i64(t0);
-- 
2.1.0

Re: [Qemu-devel] [Qemu-ppc] [PATCH v3 4/4] target-ppc: Handle ibm, nmi-register RTAS call

2014-11-06 Thread Aravinda Prasad



On Thursday 06 November 2014 03:59 PM, Alexander Graf wrote:
 
 
 
 On 06.11.2014 at 11:00, Aravinda Prasad aravi...@linux.vnet.ibm.com wrote:




[...]


 And, perhaps this was discussed in an earlier patch, but couldn't you just 
 do:

li 3,KVMPPC_H_REPORT_MC_ERR

 here and avoid the patching altogether?

 The KVMPPC_H_REPORT_MC_ERR definition is not visible in spapr-rtas.S; either I can
 define it in spapr-rtas.S as already done for KVMPPC_H_RTAS or patch it
 in ibm,nmi-register call.
 
 Could you include the header?

hmm. ok.

 

 It is very unlikely that the KVMPPC_H_REPORT_MC_ERR will be changed, but
 I prefer to patch it to avoid maintaining it in both places. What do you
 think?
 
 Hypercall numbers need to be stable anyway in case we migrate from an older 
 qemu version, so it must not change.

ok

 
 
 Alex
 




 +sc  1   /* Issue H_CALL */
 +cmpdi   cr0,3,0
 +beq cr0,1b  /* retry KVMPPC_H_REPORT_MC_ERR */
 +mtsprg  2,4
 +ld  4,0(3)
 +mtsrr0  4   /* Restore srr0 */
 +ld  4,8(3)
 +mtsrr1  4   /* Restore srr1 */
 +ld  4,16(3)
 +mtcrf   0,4 /* Restore cr */
 +addi    3,3,24
 +mfsprg  4,2
 +/*
 + * Branch to address registered by OS. The branch address is
 + * patched in the ibm,nmi-register rtas call.
 + */
 +ba  0x0
 +b   .

 -- 
 Regards,
 Aravinda

 

-- 
Regards,
Aravinda
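
The serialization discussed in this thread (one VCPU handles the machine check, the others are told to retry until ibm,nmi-interlock is issued) together with the review comments (keep the flag in the machine state, test the bool directly) can be sketched roughly as below. This is only an illustration under assumptions: DemoSpaprState, demo_report_mc_err and demo_nmi_interlock are hypothetical names, not QEMU's, and the non-zero return value is a placeholder for whatever the real hcall would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the machine state; only the flag matters here. */
typedef struct DemoSpaprState {
    bool mc_in_progress;   /* true while one VCPU is handling a machine check */
} DemoSpaprState;

/*
 * Returns 0 to ask the calling VCPU to retry, or a non-zero placeholder
 * when this VCPU now owns machine check handling.
 * Assumption: the caller already holds whatever lock serializes hcalls,
 * so no atomics are used in this sketch.
 */
static uint64_t demo_report_mc_err(DemoSpaprState *s)
{
    if (s->mc_in_progress) {      /* test the bool directly, no "== 1" */
        return 0;
    }
    s->mc_in_progress = true;
    return 1;
}

/* Models the interlock call: the owning VCPU is done with the error log. */
static void demo_nmi_interlock(DemoSpaprState *s)
{
    s->mc_in_progress = false;
}
```

A second caller gets 0 until demo_nmi_interlock() has run, which matches the retry loop in the trampoline quoted above.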

Re: [Qemu-devel] [PATCH] target-mips: fix for missing delay slot in BC1EQZ and BC1NEZ

2014-11-06 Thread Yongbok Kim


On 06/11/2014 10:29, Leon Alrae wrote:

New R6 COP1 conditional branches currently don't have a delay slot. Fix this
by setting the MIPS_HFLAG_BDS32 flag, which is required for branches that have a
4-byte delay slot.

Signed-off-by: Leon Alrae leon.al...@imgtec.com
---
  target-mips/translate.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 2117ce8..e83c50a 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -8104,6 +8104,7 @@ static void gen_compute_branch1_r6(DisasContext *ctx, 
uint32_t op,
  MIPS_DEBUG("%s: cond %02x target " TARGET_FMT_lx, opn,
 ctx->hflags, btarget);
  ctx->btarget = btarget;
+ctx->hflags |= MIPS_HFLAG_BDS32;
  
  out:

  tcg_temp_free_i64(t0);


Reviewed-by: Yongbok Kim yongbok@imgtec.com

Regards,
Yongbok Kim

Re: [Qemu-devel] [RFC PATCH] virtio-mmio: support for multiple irqs

2014-11-06 Thread Michael S. Tsirkin

On Thu, Nov 06, 2014 at 05:54:54PM +0800, Shannon Zhao wrote:
 On 2014/11/6 17:34, Michael S. Tsirkin wrote:
  On Tue, Nov 04, 2014 at 05:35:12PM +0800, Shannon Zhao wrote:
  As the current virtio-mmio only supports a single irq,
  some advanced features such as vhost-net with irqfd
  are not supported, and net performance is not
  the best without vhost-net and irqfd support.
 
  This patch lets virtio-mmio request multiple
  irqs like virtio-pci. With this patch and qemu assigning
  multiple irqs to the virtio-mmio device, it's ok to use
  vhost-net with irqfd on arm/arm64.
 
  As arm doesn't support msi-x now, we use GSI for
  multiple irqs. In this patch we use vm_try_to_find_vqs
  to check whether multiple irqs are supported, like
  virtio-pci.
 
  Is this the right direction? Are there other ways to
  make virtio-mmio support multiple irqs? Hoping for feedback.
  Thanks.
 
  Signed-off-by: Shannon Zhao zhaoshengl...@huawei.com
  
  
  So how does guest discover whether host device supports multiple IRQs?
 
 The guest uses vm_try_to_find_vqs to check whether it can get multiple IRQs,
 like virtio-pci uses vp_try_to_find_vqs. Within the function
 vm_request_multiple_irqs, the guest checks whether the number of IRQs the host
 device gives is equal to the number we want.

OK, but how does the host specify the number of IRQs for a device?
For pci this is done through the MSI-X capability register.

   for (i = 0; i < nirqs; i++) {
   irq = platform_get_irq(vm_dev->pdev, i);
   if (irq == -ENXIO)
   goto error;
   }
 
 If we can't get the expected number of IRQs, return an error and this try
 fails. Then the guest will try two IRQs, then a single IRQ, like virtio-pci.
 
  Could you please document the new interface?
  E.g. send a patch for virtio spec.
 
 Ok, I'll send it later. Thank you very much :)
 
 Shannon
 
  I think this really should be controlled by hypervisor, per device.
  I'm also tempted to make this a virtio 1.0 only feature.
  
  
  
  ---
   drivers/virtio/virtio_mmio.c |  234 
  --
   1 files changed, 203 insertions(+), 31 deletions(-)
 
  diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
  index c600ccf..2b7d935 100644
  --- a/drivers/virtio/virtio_mmio.c
  +++ b/drivers/virtio/virtio_mmio.c
  @@ -122,6 +122,15 @@ struct virtio_mmio_device {
 /* a list of queues so we can dispatch IRQs */
 spinlock_t lock;
 struct list_head virtqueues;
  +
  +  /* multiple irq support */
  +  int single_irq_enabled;
  +  /* Number of available irqs */
  +  unsigned num_irqs;
  +  /* Used number of irqs */
  +  int used_irqs;
  +  /* Name strings for interrupts. */
  +  char (*vm_vq_names)[256];
   };
   
   struct virtio_mmio_vq_info {
  @@ -229,33 +238,53 @@ static bool vm_notify(struct virtqueue *vq)
 return true;
   }
   
  +/* Handle a configuration change: Tell driver if it wants to know. */
  +static irqreturn_t vm_config_changed(int irq, void *opaque)
  +{
  +  struct virtio_mmio_device *vm_dev = opaque;
  +  struct virtio_driver *vdrv = container_of(vm_dev->vdev.dev.driver,
  +  struct virtio_driver, driver);
  +
  +  if (vdrv && vdrv->config_changed)
  +  vdrv->config_changed(&vm_dev->vdev);
  +  return IRQ_HANDLED;
  +}
  +
   /* Notify all virtqueues on an interrupt. */
  -static irqreturn_t vm_interrupt(int irq, void *opaque)
  +static irqreturn_t vm_vring_interrupt(int irq, void *opaque)
   {
 struct virtio_mmio_device *vm_dev = opaque;
 struct virtio_mmio_vq_info *info;
  -  struct virtio_driver *vdrv = container_of(vm_dev->vdev.dev.driver,
  -  struct virtio_driver, driver);
  -  unsigned long status;
  +  irqreturn_t ret = IRQ_NONE;
 unsigned long flags;
  +
  +  spin_lock_irqsave(&vm_dev->lock, flags);
  +  list_for_each_entry(info, &vm_dev->virtqueues, node) {
  +  if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
  +  ret = IRQ_HANDLED;
  +  }
  +  spin_unlock_irqrestore(&vm_dev->lock, flags);
  +
  +  return ret;
  +}
  +
  +/* Notify all virtqueues and handle a configuration
  + * change on an interrupt. */
  +static irqreturn_t vm_interrupt(int irq, void *opaque)
  +{
  +  struct virtio_mmio_device *vm_dev = opaque;
  +  unsigned long status;
 irqreturn_t ret = IRQ_NONE;
   
 /* Read and acknowledge interrupts */
     status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS);
     writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK);
   
  -  if (unlikely(status & VIRTIO_MMIO_INT_CONFIG)
  -   && vdrv && vdrv->config_changed) {
  -  vdrv->config_changed(&vm_dev->vdev);
  -  ret = IRQ_HANDLED;
  -  }
  +  if (unlikely(status & VIRTIO_MMIO_INT_CONFIG))
  +  return vm_config_changed(irq, opaque);
   
  -  if (likely(status & VIRTIO_MMIO_INT_VRING)) {
  -  spin_lock_irqsave(&vm_dev->lock, flags);
  -  list_for_each_entry(info, &vm_dev->virtqueues, node)
  -  ret |=

[Qemu-devel] [PATCH v2 0/2] migration: Add a new feature to do live migration

2014-11-06 Thread Li Liang

This feature can help to reduce the data transferred about 60%, and the
migration time can also be reduced about 70%.

Summary of changes from v1 to v2:

- Changed the decompression thread limit from 64 to 255
- Fixed some spelling mistakes
- Added test results to the document
- Fixed the version mistake in qapi-schema.json
- Added documentation of the 'compress' flag
- Reordered the series so the document is proposed first
- Fixed comments

-- 
1.9.1
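
As a rough cross-check of the percentages quoted in this cover letter, the reductions can be recomputed from the test tables in docs/multiple-compression-threads.txt (patch 1/2). The helper below is only a sketch; demo_reduction_pct is a hypothetical name, and the inputs are the figures copied from those tables.

```c
#include <assert.h>

/* Percentage by which "after" is smaller than "before". */
static double demo_reduction_pct(double before, double after)
{
    return (before - after) / before * 100.0;
}

/*
 * Figures from the "Speed limit: 32MB/s" table:
 *   transferred ram: 877054 kB -> 260641 kB
 *   total time:      26561 ms  -> 7920 ms
 * Figures from the "Speed limit: No" table:
 *   transferred ram: 876761 kB -> 262301 kB
 *   total time:      7611 ms   -> 2888 ms
 */
static double demo_limited_data(void)   { return demo_reduction_pct(877054, 260641); }
static double demo_limited_time(void)   { return demo_reduction_pct(26561, 7920); }
static double demo_unlimited_data(void) { return demo_reduction_pct(876761, 262301); }
static double demo_unlimited_time(void) { return demo_reduction_pct(7611, 2888); }
```

With these inputs the transferred data drops by roughly 70% in both runs, while the time saving is roughly 70% with the 32MB/s limit and roughly 62% without it.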

[Qemu-devel] [v2 1/2] docs: Add a doc about multiple compression threads

2014-11-06 Thread Li Liang

Give some details about the multiple compression threads and how
to use it in live migration.

Signed-off-by: Li Liang liang.z...@intel.com
---
 docs/multiple-compression-threads.txt | 128 ++
 1 file changed, 128 insertions(+)
 create mode 100644 docs/multiple-compression-threads.txt

diff --git a/docs/multiple-compression-threads.txt 
b/docs/multiple-compression-threads.txt
new file mode 100644
index 000..a5e53de
--- /dev/null
+++ b/docs/multiple-compression-threads.txt
@@ -0,0 +1,128 @@
+Use multiple (de)compression threads in live migration
+======================================================
+Copyright (C) 2014 Li Liang liang.z...@intel.com
+
+
+Contents:
+=========
+* Introduction
+* When to use
+* Performance
+* Usage
+* TODO
+
+Introduction
+============
+Instead of sending the guest memory directly, this solution will
+compress the ram page before sending, after receiving, the data will
+be decompressed. Using compression in live migration can help
+to reduce the data transferred about 60%, this is very useful when the
+bandwidth is limited, and the migration time can also be reduced about
+70% in a typical case.
+
+The process of compression will consume additional CPU cycles, and the
+extra CPU cycles will increase the migration time. On the other hand,
+the amount of data transferred will reduced, this factor can reduce
+the migration time. If the process of the compression is quick
+enough, then the total migration time can be reduced, and multiple
+compression threads can be used to accelerate the compression process.
+
+The decompression speed of zlib is at least 4 times as quickly as
+compression, if the source and destination CPU have equal speed,
+keeping the compression thread count 4 times the decompression
+thread count can avoid CPU waste.
+
+Compression level can be used to control the compression speed and the
+compression ratio. High compression ratio will take more time, level 0
+stands for no compression, level 1 stands for the best compression
+speed, and level 9 stands for the best compression ratio. Users can
+select a level number between 0 and 9.
+
+
+When to use the multiple compression threads in live migration
+==============================================================
+Compression of data will consume lot of extra CPU cycles, in a system
+with high overhead of CPU, avoid using this feature. When the network
+bandwidth is very limited and the CPU resource is adequate, use the
+multiple compression threads will be very helpful. If both the CPU and
+the network bandwidth are adequate, use multiple compression threads
+can still help to reduce the migration time.
+
+Performance
+===========
+Test environment:
+
+CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
+Socket Count: 2
+Ram: 128G
+NIC: Intel I350 (10/100/1000Mbps)
+Host OS: CentOS 7 64-bit
+Guest OS: Ubuntu 12.10 64-bit
+Parameter: qemu-system-x86_64 -enable-kvm -m 1024
+ /share/ia32e_ubuntu12.10.img -monitor stdio
+
+There is no additional application is running on the guest when doing
+the test.
+
+
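+Speed limit: 32MB/s
+---------------------------------------------------------------
+                    | original  | compress thread: 8
+                    |   way     | decompress thread: 2
+                    |           | compression level: 1
+---------------------------------------------------------------
+total time(msec):   |   26561   |  7920
+---------------------------------------------------------------
+transferred ram(kB):|  877054   | 260641
+---------------------------------------------------------------
+throughput(mbps):   |  270.53   | 269.68
+---------------------------------------------------------------
+total ram(kB):      |  1057604  | 1057604
+---------------------------------------------------------------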
+
+
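+Speed limit: No
+---------------------------------------------------------------
+                    | original  | compress thread: 15
+                    |   way     | decompress thread: 4
+                    |           | compression level: 1
+---------------------------------------------------------------
+total time(msec):   |   7611    |  2888
+---------------------------------------------------------------
+transferred ram(kB):|  876761   | 262301
+---------------------------------------------------------------
+throughput(mbps):   |  943.78   | 744.27
+---------------------------------------------------------------
+total ram(kB):      |  1057604  | 1057604
+---------------------------------------------------------------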
+
+Usage
+=====
+1. Verify the destination QEMU version is able to support the multiple
+compression threads migration:
+{qemu} info migrate_capabilities
+{qemu} ... compress: off ...
+
+2. Activate compression on the source:
+{qemu} migrate_set_capability compress on
+
+3. Set the compression thread count on source:
+{qemu} migrate_set_compress_threads 10
+
+4. Set the compression level on

[Qemu-devel] [v2 2/2] migration: Implement multiple compression threads

2014-11-06 Thread Li Liang

Instead of sending the guest memory directly, this solution compresses
each RAM page before sending it; after receiving, the data is
decompressed.
This feature can reduce the amount of data transferred by about
60%, which is very useful when the network bandwidth is limited,
and the migration time can also be reduced by about 70%. The
feature is off by default; see the document
docs/multiple-compression-threads.txt for information on how to use it.

Reviewed-by: Eric Blake ebl...@redhat.com
Signed-off-by: Li Liang liang.z...@intel.com
---
 arch_init.c   | 435 --
 hmp-commands.hx   |  56 ++
 hmp.c |  57 ++
 hmp.h |   6 +
 include/migration/migration.h |  12 +-
 include/migration/qemu-file.h |   1 +
 migration.c   |  99 ++
 monitor.c |  21 ++
 qapi-schema.json  |  88 -
 qmp-commands.hx   | 131 +
 10 files changed, 890 insertions(+), 16 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 88a5ba0..a27d87b 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -24,6 +24,7 @@
 #include <stdint.h>
 #include <stdarg.h>
 #include <stdlib.h>
+#include <zlib.h>
 #ifndef _WIN32
 #include <sys/types.h>
 #include <sys/mman.h>
@@ -126,6 +127,7 @@ static uint64_t bitmap_sync_count;
 #define RAM_SAVE_FLAG_CONTINUE 0x20
 #define RAM_SAVE_FLAG_XBZRLE   0x40
 /* 0x80 is reserved in migration.h start with 0x100 next */
+#define RAM_SAVE_FLAG_COMPRESS_PAGE0x100
 
 static struct defconfig_file {
 const char *filename;
@@ -332,6 +334,177 @@ static uint64_t migration_dirty_pages;
 static uint32_t last_version;
 static bool ram_bulk_stage;
 
+#define COMPRESS_BUF_SIZE (TARGET_PAGE_SIZE + 16)
+#define MIG_BUF_SIZE (COMPRESS_BUF_SIZE + 256 + 16)
+struct MigBuf {
+int buf_index;
+uint8_t buf[MIG_BUF_SIZE];
+};
+
+typedef struct MigBuf MigBuf;
+
+static void migrate_put_byte(MigBuf *f, int v)
+{
+f->buf[f->buf_index] = v;
+f->buf_index++;
+}
+
+static void migrate_put_be16(MigBuf *f, unsigned int v)
+{
+migrate_put_byte(f, v >> 8);
+migrate_put_byte(f, v);
+}
+
+static void migrate_put_be32(MigBuf *f, unsigned int v)
+{
+migrate_put_byte(f, v >> 24);
+migrate_put_byte(f, v >> 16);
+migrate_put_byte(f, v >> 8);
+migrate_put_byte(f, v);
+}
+
+static void migrate_put_be64(MigBuf *f, uint64_t v)
+{
+migrate_put_be32(f, v >> 32);
+migrate_put_be32(f, v);
+}
+
+static void migrate_put_buffer(MigBuf *f, const uint8_t *buf, int size)
+{
+int l;
+
+while (size > 0) {
+l = MIG_BUF_SIZE - f->buf_index;
+if (l > size) {
+l = size;
+}
+memcpy(f->buf + f->buf_index, buf, l);
+f->buf_index += l;
+buf += l;
+size -= l;
+}
+}
+
+static size_t migrate_save_block_hdr(MigBuf *f, RAMBlock *block,
+ram_addr_t offset, int cont, int flag)
+{
+size_t size;
+
+migrate_put_be64(f, offset | cont | flag);
+size = 8;
+
+if (!cont) {
+migrate_put_byte(f, strlen(block->idstr));
+migrate_put_buffer(f, (uint8_t *)block->idstr,
+strlen(block->idstr));
+size += 1 + strlen(block->idstr);
+}
+return size;
+}
+
+static int migrate_qemu_add_compress(MigBuf *f,  const uint8_t *p,
+int size, int level)
+{
+uLong  blen = COMPRESS_BUF_SIZE;
+if (compress2(f->buf + f->buf_index + sizeof(int), &blen, (Bytef *)p,
+size, level) != Z_OK) {
+error_report("Compress Failed!\n");
+return 0;
+}
+migrate_put_be32(f, blen);
+f->buf_index += blen;
+return blen + sizeof(int);
+}
+
+enum {
+COM_DONE = 0,
+COM_START,
+};
+
+static int  compress_thread_count;
+static int  decompress_thread_count;
+
+struct compress_param {
+int state;
+MigBuf migbuf;
+RAMBlock *block;
+ram_addr_t offset;
+bool last_stage;
+int ret;
+int bytes_sent;
+uint8_t *p;
+int cont;
+bool bulk_stage;
+};
+
+typedef struct compress_param compress_param;
+compress_param *comp_param;
+
+struct decompress_param {
+int state;
+void *des;
+uint8 compbuf[COMPRESS_BUF_SIZE];
+int len;
+};
+typedef struct decompress_param decompress_param;
+
+static decompress_param *decomp_param;
+bool incomming_migration_done;
+static bool quit_thread;
+
+static int save_compress_ram_page(compress_param *param);
+
+
+static void *do_data_compress(void *opaque)
+{
+compress_param *param = opaque;
+while (!quit_thread) {
+if (param->state == COM_START) {
+save_compress_ram_page(param);
+param->state = COM_DONE;
+ } else {
+ g_usleep(1);
+ }
+}
+
+return NULL;
+}
+
+
+void migrate_compress_threads_join(MigrationState *s)
+{
+int i;
+if (!migrate_use_compress()) {
+return;
+}
+quit_thread = true;
+for (i = 0; i < compress_thread_count; i++) {
+

[Qemu-devel] [PATCH 1/2] virtio: serial: expose a 'guest_writable' callback for users

2014-11-06 Thread Amit Shah

Users of virtio-serial may want to know when a port becomes writable.  A
port can stop accepting writes if the guest port is open but not being
read from.  In this case, data gets queued up in the virtqueue, and
after the vq is full, writes to the port do not succeed.

When the guest reads off a vq element, and adds a new one for the host
to put data in, we can tell users the port is available for more writes,
via the new ->guest_writable() callback.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/char/virtio-serial-bus.c   | 31 +++
 include/hw/virtio/virtio-serial.h | 11 +++
 2 files changed, 42 insertions(+)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index c6870f1..bea7a17 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -465,6 +465,37 @@ static void handle_output(VirtIODevice *vdev, VirtQueue 
*vq)
 
 static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
 {
+/*
+ * Users of virtio-serial would like to know when guest becomes
+ * writable again -- i.e. if a vq had stuff queued up and the
+ * guest wasn't reading at all, the host would not be able to
+ * write to the vq anymore.  Once the guest reads off something,
+ * we can start queueing things up again.  However, this call is
+ * made for each buffer addition by the guest -- even though free
+ * buffers existed prior to the current buffer addition.  This is
+ * done so as not to maintain previous state, which will need
+ * additional live-migration-related changes.
+ */
+VirtIOSerial *vser;
+VirtIOSerialPort *port;
+VirtIOSerialPortClass *vsc;
+
+vser = VIRTIO_SERIAL(vdev);
+port = find_port_by_vq(vser, vq);
+
+if (!port) {
+return;
+}
+vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
+
+/*
+ * If guest_connected is false, this call is being made by the
+ * early-boot queueing up of descriptors, which is just noise for
+ * the host apps -- don't disturb them in that case.
+ */
+if (port->guest_connected && port->host_connected && vsc->guest_writable) {
+vsc->guest_writable(port);
+}
 }
 
 static uint32_t get_features(VirtIODevice *vdev, uint32_t features)
diff --git a/include/hw/virtio/virtio-serial.h 
b/include/hw/virtio/virtio-serial.h
index a679e54..6c7b3b8 100644
--- a/include/hw/virtio/virtio-serial.h
+++ b/include/hw/virtio/virtio-serial.h
@@ -98,6 +98,17 @@ typedef struct VirtIOSerialPortClass {
 /* Guest is now ready to accept data (virtqueues set up). */
 void (*guest_ready)(VirtIOSerialPort *port);
 
+/*
+ * Guest has enqueued a buffer for the host to write into.
+ * Called each time a buffer is enqueued by the guest;
+ * irrespective of whether there already were free buffers the
+ * host could have consumed.
+ *
+ * This is dependent on both, the guest and host ends being
+ * connected.
+ */
+void (*guest_writable)(VirtIOSerialPort *port);
+
 /*
  * Guest wrote some data to the port. This data is handed over to
  * the app via this callback.  The app can return a size less than
-- 
1.9.3
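
To illustrate how a user of the new callback might hook in, here is a minimal standalone sketch. It does not use QEMU's types: DemoPort, DemoPortClass, demo_guest_writable and demo_handle_input are hypothetical stand-ins, and the guard only mirrors the condition used in handle_input() in the patch above (both ends connected, callback present).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoPort DemoPort;

/* Hypothetical stand-in for a port class with an optional callback. */
typedef struct DemoPortClass {
    void (*guest_writable)(DemoPort *port);   /* may be NULL */
} DemoPortClass;

struct DemoPort {
    const DemoPortClass *klass;
    bool guest_connected;
    bool host_connected;
    int wakeups;          /* how many times the callback fired */
};

/* Example user callback: a real user would retry its blocked write here. */
static void demo_guest_writable(DemoPort *port)
{
    port->wakeups++;
}

/* Called for each buffer the guest adds; notifies only when it is useful. */
static void demo_handle_input(DemoPort *port)
{
    if (port->guest_connected && port->host_connected &&
        port->klass->guest_writable) {
        port->klass->guest_writable(port);
    }
}
```

Because the callback fires on every buffer addition, a user along these lines should treat it as a hint to retry a write rather than as a guarantee of a state change.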

[Qemu-devel] [PULL] virtio-serial: crash fix, guest_writable()

2014-11-06 Thread Amit Shah

The following changes since commit 6e76d125f244e10676b917208f2a074729820246:

  Update version for v2.2.0-rc0 release (2014-11-05 15:21:04 +)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/qemu/amit/virtio-serial.git 
tags/vser-2.2.0-queue

for you to fetch changes up to 745d32d12f10badaafd26088c616025ebfe223fe:

  virtio-serial: avoid crash when port has no name (2014-11-06 16:43:35 +0530)


Couple of patches for 2.2.0: one fixes a crash, and the other adds the
guest_writable() API.


Amit Shah (1):
  virtio: serial: expose a 'guest_writable' callback for users

Marc-André Lureau (1):
  virtio-serial: avoid crash when port has no name

 hw/char/virtio-serial-bus.c   | 33 -
 include/hw/virtio/virtio-serial.h | 11 +++
 2 files changed, 43 insertions(+), 1 deletion(-)

[Qemu-devel] [PATCH 2/2] virtio-serial: avoid crash when port has no name

2014-11-06 Thread Amit Shah

From: Marc-André Lureau marcandre.lur...@gmail.com

It seems name is not mandatory, and the following command line (based
on one generated by current libvirt) will crash qemu at start:

qemu-system-x86_64 \
-device virtio-serial-pci \
-device virtserialport,name=foo \
-device virtconsole

Program received signal SIGSEGV, Segmentation fault.
__strcmp_ssse3 () at ../sysdeps/x86_64/strcmp.S:210
210movlpd(%rsi), %xmm2
Missing separate debuginfos, use: debuginfo-install
python-libs-2.7.5-13.fc20.x86_64
(gdb) bt
 #0  __strcmp_ssse3 () at ../sysdeps/x86_64/strcmp.S:210
 #1  0x5566bdc6 in find_port_by_name (name=0x0) at 
/home/elmarco/src/qemu/hw/char/virtio-serial-bus.c:67

Signed-off-by: Marc-André Lureau marcandre.lur...@gmail.com
Reviewed-by: Amos Kong ak...@redhat.com
Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/char/virtio-serial-bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index bea7a17..c191283 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -902,7 +902,7 @@ static void virtser_port_device_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
-if (find_port_by_name(port->name)) {
+if (port->name != NULL && find_port_by_name(port->name)) {
 error_setg(errp, "virtio-serial-bus: A port already exists by name %s",
port->name);
 return;
-- 
1.9.3
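
The failure mode fixed here (strcmp() called with a NULL name) and the guard that avoids it can be reproduced in a small standalone sketch. The names demo_names, demo_find_port_by_name and demo_name_conflicts are hypothetical, and the extra NULL check on stored names is an assumption of this sketch rather than something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of existing port names; unnamed ports store NULL. */
static const char *demo_names[] = { "foo", NULL, "bar" };

/* Must not be called with name == NULL: strcmp() would dereference it. */
static bool demo_find_port_by_name(const char *name)
{
    for (size_t i = 0; i < sizeof(demo_names) / sizeof(demo_names[0]); i++) {
        /* Sketch-only precaution: skip ports that have no name. */
        if (demo_names[i] != NULL && strcmp(demo_names[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Same shape as the fixed check: only look up ports that have a name. */
static bool demo_name_conflicts(const char *name)
{
    return name != NULL && demo_find_port_by_name(name);
}
```

An unnamed port never conflicts in this sketch, so a device such as the virtconsole in the command line above would pass the check instead of crashing.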

Re: [Qemu-devel] [v2 1/2] docs: Add a doc about multiple compression threads

2014-11-06 Thread Eric Blake

On 11/06/2014 12:08 PM, Li Liang wrote:
 Give some details about the multiple compression threads and how
 to use it in live migration.
 
 Signed-off-by: Li Liang liang.z...@intel.com
 ---
  docs/multiple-compression-threads.txt | 128 
 ++
  1 file changed, 128 insertions(+)
  create mode 100644 docs/multiple-compression-threads.txt
 
 diff --git a/docs/multiple-compression-threads.txt 
 b/docs/multiple-compression-threads.txt
 new file mode 100644
 index 000..a5e53de
 --- /dev/null
 +++ b/docs/multiple-compression-threads.txt
 @@ -0,0 +1,128 @@
 +Use multiple (de)compression threads in live migration
 +=
 +Copyright (C) 2014 Li Liang liang.z...@intel.com

Asserting copyright without also mentioning an open license is awkward
in open source (IANAL, but as I understand it, in some areas, asserting
a copyright without also granting disclaimers merely gets the default
non-open status where the file cannot be copied at all; the license is
essential to make it obvious that the copyright holder INTENDS for the
file to be copied in some circumstances).  Thus, you need to explicitly
call out GPLv2+ (even if it can be argued it was implied by the
top-level LICENSE) or some other compatible license to be safe.

 +
 +
 +Contents:
 +=
 +* Introduction
 +* When to use
 +* Performance
 +* Usage
 +* TODO
 +
 +Introduction
 +
 +Instead of sending the guest memory directly, this solution will
 +compress the ram page before sending, after receiving, the data will

s/sending,/sending;/

 +be decompressed. Using compression in live migration can help
 +to reduce the data transferred about 60%, this is very useful when the
 +bandwidth is limited, and the migration time can also be reduced about
 +70% in a typical case.
 +
 +The process of compression will consume additional CPU cycles, and the
 +extra CPU cycles will increase the migration time. On the other hand,
 +the amount of data transferred will reduced, this factor can reduce
 +the migration time. If the process of the compression is quick
 +enough, then the total migration time can be reduced, and multiple
 +compression threads can be used to accelerate the compression process.
 +
 +The decompression speed of zlib is at least 4 times as quickly as

s/quickly/quick/

 +compression, if the source and destination CPU have equal speed,
 +keeping the compression thread count 4 times the decompression
 +thread count can avoid CPU waste.
 +
 +Compression level can be used to control the compression speed and the
 +compression ratio. High compression ratio will take more time, level 0
 +stands for no compression, level 1 stands for the best compression
 +speed, and level 9 stands for the best compression ratio. Users can
 +select a level number between 0 and 9.
 +
 +
 +When to use the multiple compression threads in live migration
 +==
 +Compression of data will consume lot of extra CPU cycles, in a system

s/lot of//
s/cycles,/cycles; so/

 +with high overhead of CPU, avoid using this feature. When the network
 +bandwidth is very limited and the CPU resource is adequate, use the

s/use the/use of/

 +multiple compression threads will be very helpful. If both the CPU and
 +the network bandwidth are adequate, use multiple compression threads

s/use/use of/

 +can still help to reduce the migration time.
 +
 +Performance
 +===
 +Test environment:
 +
 +CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
 +Socket Count: 2
 +Ram: 128G
 +NIC: Intel I350 (10/100/1000Mbps)
 +Host OS: CentOS 7 64-bit
 +Guest OS: Ubuntu 12.10 64-bit
 +Parameter: qemu-system-x86_64 -enable-kvm -m 1024
 + /share/ia32e_ubuntu12.10.img -monitor stdio
 +
 +There is no additional application is running on the guest when doing
 +the test.
 +
 +
 +Speed limit: 32MB/s
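 +---------------------------------------------------------------
 +                    | original  | compress thread: 8
 +                    |   way     | decompress thread: 2
 +                    |           | compression level: 1
 +---------------------------------------------------------------
 +total time(msec):   |   26561   |  7920
 +---------------------------------------------------------------
 +transferred ram(kB):|  877054   | 260641
 +---------------------------------------------------------------
 +throughput(mbps):   |  270.53   | 269.68
 +---------------------------------------------------------------
 +total ram(kB):      |  1057604  | 1057604
 +---------------------------------------------------------------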
 +
 +
 +Speed limit: No
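 +---------------------------------------------------------------
 +                    | original  | compress thread: 15
 +                    |   way     | decompress thread: 4
 +                    |           | compression level: 1
 +---------------------------------------------------------------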
 +total time(msec):

Re: [Qemu-devel] [PATCH v2 10/20] target-mips: add MSA I8 format instructions

2014-11-06 Thread Yongbok Kim


On 05/11/2014 17:43, Richard Henderson wrote:

On 10/29/2014 02:41 AM, Yongbok Kim wrote:

+void helper_msa_shf_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
+   uint32_t ws, uint32_t imm)
+{
+wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
+wr_t *pws = &(env->active_fpu.fpr[ws].wr);
+wr_t wx, *pwx = &wx;
+uint32_t i;
+
+switch (df) {
+case DF_BYTE:
+for (i = 0; i < DF_ELEMENTS(DF_BYTE); i++) {
+pwx->b[i] = pws->b[SHF_POS(i, imm)];
+}
+break;
+case DF_HALF:
+for (i = 0; i < DF_ELEMENTS(DF_HALF); i++) {
+pwx->h[i] = pws->h[SHF_POS(i, imm)];
+}
+}
+break;

Why pass DF to decode at runtime?  It's better to fully decode this at
translate time and call the correct function.


r~

Hi Richard,

Agreed. DF is already known at translation time.
I do have a plan to improve the efficiency of the MSA implementation.

Regards,
Yongbok
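
Richard's suggestion can be illustrated with a minimal, self-contained sketch. It is not the MSA code from the patch: `vec128_t`, `shf_fn`, `select_shf` and the local `SHF_POS` definition are hypothetical stand-ins chosen for illustration. The only point is that the data format is resolved once, when the instruction is translated, so the per-element loops no longer contain a `switch (df)`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector register. */
typedef union {
    int8_t  b[16];
    int16_t h[8];
    int32_t w[4];
} vec128_t;

/* Assumed shuffle index: every group of four elements is permuted by the
 * four 2-bit fields of imm. */
#define SHF_POS(i, imm) (((i) & ~3u) + (((imm) >> (2 * ((i) & 3u))) & 3u))

enum { DF_BYTE, DF_HALF, DF_WORD, DF_COUNT };

typedef void shf_fn(vec128_t *dst, const vec128_t *src, uint32_t imm);

/* One specialised helper per data format; no runtime decode inside. */
static void shf_b(vec128_t *dst, const vec128_t *src, uint32_t imm)
{
    vec128_t tmp;
    for (uint32_t i = 0; i < 16; i++) {
        tmp.b[i] = src->b[SHF_POS(i, imm)];
    }
    *dst = tmp; /* copy at the end so dst may alias src */
}

static void shf_h(vec128_t *dst, const vec128_t *src, uint32_t imm)
{
    vec128_t tmp;
    for (uint32_t i = 0; i < 8; i++) {
        tmp.h[i] = src->h[SHF_POS(i, imm)];
    }
    *dst = tmp;
}

static void shf_w(vec128_t *dst, const vec128_t *src, uint32_t imm)
{
    vec128_t tmp;
    for (uint32_t i = 0; i < 4; i++) {
        tmp.w[i] = src->w[SHF_POS(i, imm)];
    }
    *dst = tmp;
}

/* Meant to be called once per instruction by the translator: the data
 * format picks the helper, and only that helper runs at execution time. */
static shf_fn *select_shf(unsigned df)
{
    static shf_fn *const table[DF_COUNT] = { shf_b, shf_h, shf_w };
    return df < DF_COUNT ? table[df] : NULL;
}
```

In a real translator the selected helper would be emitted as the call target for the translated instruction; here it is simply returned as a function pointer.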

Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it

2014-11-06 Thread Markus Armbruster

Max Reitz mre...@redhat.com writes:

 On 2014-11-04 at 19:45, Markus Armbruster wrote:
 I'll try to explain all solutions fairly.  Isn't easy when you're as
 biased towards one of them as I am.  Please bear with me.


 = The trust boundary between image contents and meta-data =

 A disk image consists of image contents and meta-data.

 Example: all of a raw image's contents is image contents.  Leaves just
 file name and attributes for meta-data.

 Example: QCOW2 meta-data includes header, header extensions, L1 table,
 L2 tables, ...  The meta-data defines where in the image the actual
 contents is stored.

 A guest can access the image contents, not the meta-data.

 Image contents you've let an untrusted guest write is untrusted.

 Therefore, there's a trust boundary between image contents and
 meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
 contents.  The exact location of the trust boundary depends on the image
 format.


 = How we instruct QEMU what to trust =

 By configuring QEMU to use an image, the user instructs QEMU to trust
 the image's meta-data.

 When the user's configuration specifies the image format explicitly, the
 trust boundary is clear.

 Else, the trust boundary is ambiguous when more than one format is
 possible.

 QEMU resolves this ambiguity by picking the first format with the
 highest score.  Raw format is always possible, and always has the
 lowest score.


 = How this lets the guest escape isolation =

 Unfortunately, this lets the guest shift the trust boundary and escape
 isolation, as follows:

 * Expose a raw image to the guest (whether you specify the format=raw or
let QEMU guess it doesn't matter).  The complete contents becomes
untrusted.

 * Reuse the image *without* specifying the raw format.  QEMU guesses the
format based on untrusted image contents.  Now QEMU guesses a format
chosen by the guest, with meta-data chosen by the guest.  By
controlling image meta-data, the malicious guest can access arbitrary
files as QEMU, enlarge its storage, and more.  A non-malicious guest
can accidentally DoS itself, by writing a pattern probing recognizes.
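
The mechanism described above can be sketched in a few lines of C. This is an illustrative model only, not QEMU's block layer: `guess_format`, `probe_qcow2`, `probe_raw` and the score values are assumptions made for the example. It shows why probing is unsafe for raw images: the first bytes of the image, which the guest may write freely, decide which format driver gets to interpret the rest as meta-data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int probe_fn(const uint8_t *buf, size_t len);

struct format {
    const char *name;
    probe_fn *probe;
};

/* qcow2 images begin with the magic bytes 'Q' 'F' 'I' 0xfb.
 * The score of 100 is an assumption for this sketch. */
static int probe_qcow2(const uint8_t *buf, size_t len)
{
    static const uint8_t magic[4] = { 'Q', 'F', 'I', 0xfb };
    return (len >= 4 && memcmp(buf, magic, 4) == 0) ? 100 : 0;
}

/* Raw is always possible and always scores lowest. */
static int probe_raw(const uint8_t *buf, size_t len)
{
    (void)buf;
    (void)len;
    return 1;
}

/* Pick the first format with the highest score. */
static const char *guess_format(const uint8_t *buf, size_t len)
{
    static const struct format formats[] = {
        { "qcow2", probe_qcow2 },
        { "raw",   probe_raw   },
    };
    const char *best = NULL;
    int best_score = 0;

    for (size_t i = 0; i < sizeof(formats) / sizeof(formats[0]); i++) {
        int score = formats[i].probe(buf, len);
        if (score > best_score) {
            best_score = score;
            best = formats[i].name;
        }
    }
    return best;
}
```

A guest that writes the four magic bytes to sector 0 of a raw disk flips the result from "raw" to "qcow2" on the next probe, which is exactly the shift of the trust boundary described above.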

 Thank you for bringing that to my attention. This means that I'm even
 more in favor of using Kevin's patches because in fact they don't
 break anything.

They break things differently.  The difference may or may not matter.

Example: innocent guest writes a recognized pattern.

  Now: next restart fails, guest DoSed itself until host operator gets
  around to adding format=raw to the configuration.  Consequence:
  downtime (probably lengthy), but no data corruption.

  With Kevin's patch: write fails, guest may or may not handle the
  failure gracefully.  Consequences can range from guest complains to
  its logs (who cares) via guest stops whatever it's doing and refuses
  to continue until its hardware gets fixed (downtime as above) to
  data corruption.

Example: innocent guest first writes a recognized pattern, then
overwrites it with a non-recognized pattern.

  Now: works.

  With Kevin's patch: as above.

Again, I'm not claiming the differences are serious in practice, only
that they exist.

 This is CVE-2008-2004.


 = Aside: other trust boundaries =

 Of course, this is not the only trust boundary that matters.  For
 instance, there's normally one between your host and somebody else's
 computers.  Telling QEMU to trust meta-data of some image you got from
 the internet violates it.  There's nothing QEMU can do about that.


 = Insecure usage is easy, secure usage is hard =

 The oldest stratum of user interfaces doesn't let you specify the image
 format.  Use of raw images with these is insecure by design.  These
 interfaces are still recommended for human users.

 Example of insecure usage: -hda foo.img, where foo.img is raw.

 With the next generation of interfaces, specifying the image format is
 optional.  Use of raw images with these is insecure by default.

 Example of insecure usage: -drive file=foo.img,index=0,media=cdrom,
 where foo.img is raw.  The -hda above is actually sugar for this.

 Equivalent secure usage: add format=raw.

 Note that specifying just the top image's format is not enough, you also
 have to specify any backing images' formats.  QCOW2 can optionally store
 the backing image format in the image.  The other COW formats can't.

 Well, they can, with json:. *cough*

Point coughingly taken.

 Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
 with a raw backing file.

 Yesterday I found out that doesn't seem possible. You apparently can
 only use VMDK with VMDK backing files.

I figure you're referring to this code in vmdk_create():

    if (strcmp(bs->drv->format_name, "vmdk")) {
        bdrv_unref(bs);
        ret = -EINVAL;
        goto exit;
    }

Other than that, we only have
 qcow1 and qed as COW formats which should not be used anyway.

qemu-doc.texi calls

Re: [Qemu-devel] [PATCH v7 12/16] hw/arm/sysbus-fdt: enable vfio-calxeda-xgmac dynamic instantiation

2014-11-06 Thread Alexander Graf



On 06.11.14 09:57, Eric Auger wrote:
 On 11/05/2014 11:23 PM, Alexander Graf wrote:


 On 05.11.14 13:31, Eric Auger wrote:
 On 11/05/2014 11:59 AM, Alexander Graf wrote:


 On 31.10.14 15:05, Eric Auger wrote:
 vfio-calxeda-xgmac now can be instantiated using the -device option.
 The node creation function generates a very basic dt node composed
 of the compat, reg and interrupts properties

 Signed-off-by: Eric Auger eric.au...@linaro.org

 ---

 v6 - v7:
 - compat string re-formatting removed since compat string is not exposed
   anymore as a user option
 - VFIO IRQ kick-off removed from sysbus-fdt and moved to VFIO platform
   device
 ---
  hw/arm/sysbus-fdt.c | 88 
 +
  1 file changed, 88 insertions(+)

 diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
 index d5476f1..f8b310b 100644
 --- a/hw/arm/sysbus-fdt.c
 +++ b/hw/arm/sysbus-fdt.c
 @@ -27,6 +27,8 @@
  #include "hw/platform-bus.h"
  #include "sysemu/sysemu.h"
  #include "hw/platform-bus.h"
 +#include "hw/vfio/vfio-platform.h"
 +#include "hw/vfio/vfio-calxeda-xgmac.h"
  
  /*
   * internal struct that contains the information to create dynamic
 @@ -54,8 +56,11 @@ typedef struct NodeCreationPair {
  int (*add_fdt_node_fn)(SysBusDevice *sbdev, void *opaque);
  } NodeCreationPair;
  
 +static int add_basic_vfio_fdt_node(SysBusDevice *sbdev, void *opaque);
 +
  /* list of supported dynamic sysbus devices */
  NodeCreationPair add_fdt_node_functions[] = {
 +    {TYPE_VFIO_CALXEDA_XGMAC, add_basic_vfio_fdt_node},
      {"", NULL}, /*last element*/
  };

 Can you maybe place the list somewhere smartly to make sure we don't
 need forward declarations? Either put it in between the generic and
 device specific code or at the end of the file with a single forward
 declaration for the array?

 sure

  
 @@ -86,6 +91,89 @@ static int add_fdt_node(SysBusDevice *sbdev, void 
 *opaque)
  }
  
  /**
 + * add_basic_vfio_fdt_node - generates the most basic node for a VFIO 
 node
 + *
 + * set properties are:
 + * - compatible string
 + * - regs
 + * - interrupts
 + */
 +static int add_basic_vfio_fdt_node(SysBusDevice *sbdev, void *opaque)
 +{
 +    PlatformBusFdtData *data = opaque;
 +    PlatformBusDevice *pbus = data->pbus;
 +    void *fdt = data->fdt;
 +    const char *parent_node = data->pbus_node_name;
 +    int compat_str_len;
 +    char *nodename;
 +    int i, ret;
 +    uint32_t *irq_attr;
 +    uint64_t *reg_attr;
 +    uint64_t mmio_base;
 +    uint64_t irq_number;
 +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
 +    VFIODevice *vbasedev = &vdev->vbasedev;
 +    Object *obj = OBJECT(sbdev);
 +
 +    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
 +
 +    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
 +                               vbasedev->name,
 +                               mmio_base);
 +
 +    qemu_fdt_add_subnode(fdt, nodename);
 +
 +    compat_str_len = strlen(vdev->compat) + 1;
 +    qemu_fdt_setprop(fdt, nodename, "compatible",
 +                     vdev->compat, compat_str_len);

 What if there are multiple compatibles?
 My purpose here was absolutely not to come back again on a proposal
 where we could have a generic node creation. I understand that it is not
 realistic. I rather tried to put some common property creation in this
 function but you're right even the interrupt prop depend on the device.

 About your question, I think the specialized VFIO device would set its
 compat string including the various substrings. This was done in the
 past for PL330 which required arm,pl330;arm,primecell.


 +
 +    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
 +
 +    for (i = 0; i < vbasedev->num_regions; i++) {
 +        mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, i);
 +        reg_attr[4*i] = 1;

 What is the 1 here?
 address-cells? since the bus is < 4GB, 1 32b reg is required to specify
 the base address. But since you put #size-cells already in the parent
 node maybe I can remove it.

 I'm confused. Shouldn't the reg look like [ addr size ... ]?

   http://www.devicetree.org/Device_Tree_Usage#Memory_Mapped_Devices

 The number of cells is defined separately via #address-cells or #size-cells.
 
 Hi Alex,
 
 sorry my answer was misleading and I was mixing
 qemu_fdt_setprop_sized_cells_from_array usage and produced dts syntax.
 1 values effectively correspond to the number of cells respectively
 used for addr value and size value. Args of
 qemu_fdt_setprop_sized_cells_from_array are pairs (size, value), see
 below as a reminder. The fact platform bus node has attributes
 #size-cells = <0x1>, and #address-cells = <0x1> forces me to use 1. As a
 result the guest dt will look as
 
 / {
     #address-cells = <1>;
     #size-cells = <1>;
 
     ...
 
     serial@101f {
         compatible = "arm,pl011";
         reg = <0x101f 0x1000 >;
 ../..
 
 I hope this clarifies.
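
The (size, value) pair convention Eric describes can be sketched as follows. This is not the QEMU helper itself: `pack_sized_cells` is a hypothetical function written for this example, and the address and size in the usage note are made-up values. Each pair says how many 32-bit cells the following value occupies, and the cells are laid out big-endian as a device tree blob expects.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Pack an array of (ncells, value) pairs into big-endian 32-bit cells.
 * Returns the number of bytes written, or -1 if a pair is invalid or the
 * output buffer is too small. Hypothetical helper for illustration only.
 */
static int pack_sized_cells(uint8_t *out, size_t out_len,
                            const uint64_t *pairs, size_t npairs)
{
    size_t pos = 0;

    for (size_t i = 0; i < npairs; i++) {
        uint64_t ncells = pairs[2 * i];
        uint64_t value = pairs[2 * i + 1];

        if (ncells < 1 || ncells > 2) {
            return -1;              /* only 1 or 2 cells per value */
        }
        if (ncells == 1 && (value >> 32) != 0) {
            return -1;              /* value does not fit in one cell */
        }
        if (out_len - pos < ncells * 4) {
            return -1;              /* output buffer too small */
        }
        /* Emit the most significant byte first. */
        for (int shift = (int)(ncells * 32) - 8; shift >= 0; shift -= 8) {
            out[pos++] = (uint8_t)(value >> shift);
        }
    }
    return (int)pos;
}
```

With `#address-cells = <1>` and `#size-cells = <1>` on the parent node, a region is described by the pairs `{ 1, addr, 1, size }`, which yields two cells, i.e. 8 bytes of `reg` data.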
 
 Best Regards
 
 Eric
 
  * qemu_fdt_setprop_sized_cells_from_array:
  * @fdt: device

Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it

2014-11-06 Thread Markus Armbruster

Eric Blake ebl...@redhat.com writes:

 On 11/05/2014 09:38 AM, Max Reitz wrote:

 Note that specifying just the top image's format is not enough, you also
 have to specify any backing images' formats.  QCOW2 can optionally store
 the backing image format in the image.  The other COW formats can't.
 
 Well, they can, with json:. *cough*
 
 Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
 with a raw backing file.
 
 Yesterday I found out that doesn't seem possible. You apparently can
 only use VMDK with VMDK backing files. Other than that, we only have
 qcow1 and qed as COW formats which should not be used anyway.

 Actually, qed requires the backing format to be recorded (it is
 non-optional) and is therefore immune to probing problems of backing
 files.  That's one thing it got right.

If I read the code correctly:

QED has a feature bit QED_F_BACKING_FORMAT_NO_PROBE.

It is changed when you set the backing file format.  Setting format to
raw sets the flag, anything else (including nothing) clears the flag.
The actual non-raw format is not recorded.

Creating an image counts as setting the backing file format.

If the flag is set, open uses "raw" for the backing file (no probing).

If it's unset, open probes, and the probe may yield raw.
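
The behaviour described above fits in a small model. This is a sketch of the logic as read from the message, not the QED driver: the helper names are hypothetical and the bit value chosen for `QED_F_BACKING_FORMAT_NO_PROBE` is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Feature bit named in the message; the numeric value is an assumption. */
#define QED_F_BACKING_FORMAT_NO_PROBE UINT64_C(0x04)

/* Setting the backing format: only "raw" sets the flag, anything else
 * (including no format at all) clears it. The non-raw name is not stored. */
static uint64_t qed_set_backing_format(uint64_t features, const char *fmt)
{
    if (fmt != NULL && strcmp(fmt, "raw") == 0) {
        return features | QED_F_BACKING_FORMAT_NO_PROBE;
    }
    return features & ~QED_F_BACKING_FORMAT_NO_PROBE;
}

/* Opening: with the flag set the backing file is raw, no probing.
 * Otherwise the probe result is used, and that result may itself be raw. */
static const char *qed_backing_format_on_open(uint64_t features,
                                              const char *probed)
{
    return (features & QED_F_BACKING_FORMAT_NO_PROBE) ? "raw" : probed;
}
```

The model makes the asymmetry visible: a raw backing file is protected from probing, while a non-raw backing file is still probed on every open.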

Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-06 Thread Peter Maydell

On 5 November 2014 09:58, Claudio Fontana claudio.font...@huawei.com wrote:
 Please correct me if I am wrong, my understanding at the moment is that
 for X86 there is an ACPI implementation in hw/acpi, with the table generation
 happening in hw/i386/acpi-build.c .
 Couldn't there be some unification where part of the infrastructure for
 ACPI is reused, with arch-specific code specializing for X86 and ARM?
 Why are ACPI tables created for X86, but cannot be created likewise for ARM?

Because then for ARM boards we'd be creating a description of the
hardware twice, once in device tree and once in ACPI, which seems
like unnecessary duplication.

 We need ACPI guest support in QEMU for AArch64 over here, with all features
 (including the ability to run ACPI code and add specific tables), for
 ACPI-based guests.

The plan for providing ACPI to guests is that we run a UEFI BIOS
blob which is what is responsible for providing ACPI and UEFI
runtime services to guests which need them. (The UEFI blob finds
out about its hardware by looking at a device tree that QEMU
passes it, but that's a detail between QEMU and its bios blob).
This pretty much looks like what x86 QEMU used to do with ACPI
for a very long time, so we know it's a feasible approach.

thanks
-- PMM

[Qemu-devel] [PATCH 6/7] iotests: padded parallels image test

2014-11-06 Thread Denis V. Lunev

Unfortunately, old guest OSes do not align partitions to page size by
default. This is true for Windows 2003 and Windows XP.

Padding is a value which should be added to the guest LBA to obtain the
sector number inside the image. This results in a shifted image.
   0123offset inside image (in 512 byte sectors)
  +---
  +.012guest data (512 byte sectors)
  +---
The information about this is available in DiskDescriptor.xml ONLY. There
is no such data in the image header.

This patch adds a very simple padded image and the corresponding
XML disk descriptor, both created in an authentic way.

Signed-off-by: Denis V. Lunev d...@openvz.org
Acked-by: Roman Kagan rka...@parallels.com
Reviewed-by: Jeff Cody jc...@redhat.com
CC: Kevin Wolf kw...@redhat.com
CC: Stefan Hajnoczi stefa...@redhat.com
---
 tests/qemu-iotests/076|   6 ++
 tests/qemu-iotests/076.out|   4 
 tests/qemu-iotests/sample_images/parallels-padded.xml.bz2 | Bin 0 - 377 bytes
 tests/qemu-iotests/sample_images/parallels-v2-padded.bz2  | Bin 0 - 139 bytes
 4 files changed, 10 insertions(+)
 create mode 100644 tests/qemu-iotests/sample_images/parallels-padded.xml.bz2
 create mode 100644 tests/qemu-iotests/sample_images/parallels-v2-padded.bz2

diff --git a/tests/qemu-iotests/076 b/tests/qemu-iotests/076
index 636fc58..766b359 100755
--- a/tests/qemu-iotests/076
+++ b/tests/qemu-iotests/076
@@ -81,6 +81,12 @@ _use_sample_img parallels-v2.bz2
 _use_sample_img parallels-simple.xml.bz2
 { $QEMU_IO -c "read -P 0x11 0 64k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
 
+echo
+echo "== Read from a valid v2 image opened through xml with padding =="
+_use_sample_img parallels-v2-padded.bz2
+_use_sample_img parallels-padded.xml.bz2
+{ $QEMU_IO -c "read -P 0x11 0 64k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/076.out b/tests/qemu-iotests/076.out
index 628d9bf..46680d8 100644
--- a/tests/qemu-iotests/076.out
+++ b/tests/qemu-iotests/076.out
@@ -23,4 +23,8 @@ read 65536/65536 bytes at offset 0
 == Read from a valid v2 image opened through xml ==
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== Read from a valid v2 image opened through xml with padding ==
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done
diff --git a/tests/qemu-iotests/sample_images/parallels-padded.xml.bz2 
b/tests/qemu-iotests/sample_images/parallels-padded.xml.bz2
new file mode 100644
index 
..90e481b88e1f4d65256d0511a8509114c6358052
GIT binary patch
literal 377
zcmV-0fzoUT4*^jL0KkKS+JP`HwKUw~9lPy_$xpWsdazwh08FaddH8X6Fz^phqa
z0kr@C000`P{z){AUpa9SS5-9{U$c4nhyVtdfqMRN`P%Vs7e5mydg1cvP`s
zYo%7Ss_C5i%-bDqP)#HmEBiP+TU)8WyoiurHMw!l_ZF18_3m3N^^QwEd8=8hkN
zLO3;VtVoT0C3B518KN#oIV4@)7DfOSbkchYBQVO$9`K;#(G?m(FfikRxRq(!eB`z
zPfQaAWzYbTAj%3PZH#S=yl!A7-v`x0Cm|7Ih97TD6vNg%Klm)F8;i7;OxS)eka
zB8A?w-LbP2oQ=RX}Frn6(gA)fHtcO2o1qAWlwhDKQigsdQwI23XvR~T0E)#6G
zUDR|Oi30I4$}VPcTnCQ(CKJjW0g+;mFwDaQlohzp(E|mg47s2j;GNR^b?hm3@s%x
XY}H|5IAssg=l(9_ig2MJVKN1r8+(|=

literal 0
HcmV?d1

diff --git a/tests/qemu-iotests/sample_images/parallels-v2-padded.bz2 
b/tests/qemu-iotests/sample_images/parallels-v2-padded.bz2
new file mode 100644
index 
..80948ce689e2c5c2335d6e6b36965fe0b2305613
GIT binary patch
literal 139
zcmV;60CfLCT4*^jL0KkKStWeumB1p|H%DFU;t162mk{B2!JYJ)8f2AOHXuAOMPl
zQl^8-X^;SIHmDUv6HN?Ehp1#26B`Us=32n6t4O~g)zWI0C`4`x5NN3Kna)uT+{)Q
t08GFDLI4D2paA;UmwRX=_?Gd@%RJirP-3P26T+uTcBnnPYyj78F)siB

literal 0
HcmV?d1

-- 
1.9.1

[Qemu-devel] [PATCH 1/7] configure: add dependency from libxml2

2014-11-06 Thread Denis V. Lunev

This dependency is required for adequate support of Parallels images.
Typically the disk consists of several images which are glued together by
an XML disk descriptor. The XML also holds several important parameters
which are not available in the image header.

The patch also adds a clause to checkpatch.pl so that it understands libxml2
types. Without it there is the following false positive:
ERROR: need consistent spacing around '*' (ctx:WxB)
#210: FILE: block/parallels.c:232:
+   !xmlStrcmp(root->name, (const xmlChar *)"Parallels_disk_image"))

Signed-off-by: Denis V. Lunev d...@openvz.org
Acked-by: Roman Kagan rka...@parallels.com
CC: Jeff Cody jc...@redhat.com
CC: Kevin Wolf kw...@redhat.com
CC: Stefan Hajnoczi stefa...@redhat.com
CC: Paolo Bonzini pbonz...@redhat.com
CC: Michael Tokarev m...@tls.msk.ru
---
 block/Makefile.objs   |  2 ++
 configure | 28 
 scripts/checkpatch.pl |  1 +
 3 files changed, 31 insertions(+)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 04b0e43..3040c33 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -38,3 +38,5 @@ ssh.o-libs := $(LIBSSH2_LIBS)
 archipelago.o-libs := $(ARCHIPELAGO_LIBS)
 qcow.o-libs        := -lz
 linux-aio.o-libs   := -laio
+parallels.o-cflags := $(LIBXML2_CFLAGS)
+parallels.o-libs   := $(LIBXML2_LIBS)
diff --git a/configure b/configure
index 2f17bf3..391a64d 100755
--- a/configure
+++ b/configure
@@ -335,6 +335,7 @@ libssh2=""
 vhdx=""
 quorum=""
 numa=""
+libxml2=""
 
 # parse CC options first
 for opt do
@@ -1129,6 +1130,10 @@ for opt do
   ;;
   --enable-numa) numa="yes"
   ;;
+  --disable-libxml2) libxml2="no"
+  ;;
+  --enable-libxml2) libxml2="yes"
+  ;;
   *)
       echo "ERROR: unknown option $opt"
       echo "Try '$0 --help' for more information"
@@ -1400,6 +1405,8 @@ Advanced options (experts only):
   --enable-quorum          enable quorum block filter support
   --disable-numa           disable libnuma support
   --enable-numa            enable libnuma support
+  --disable-libxml2        disable libxml2 (for Parallels image format)
+  --enable-libxml2         enable libxml2 (for Parallels image format)
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -4097,6 +4104,20 @@ if test -z "$zero_malloc" ; then
     fi
 fi
 
+# check for libxml2
+if test "$libxml2" != "no" ; then
+    if $pkg_config --exists libxml-2.0; then
+        libxml2="yes"
+        libxml2_cflags=$($pkg_config --cflags libxml-2.0)
+        libxml2_libs=$($pkg_config --libs libxml-2.0)
+    else
+        if test "$libxml2" = "yes"; then
+            feature_not_found "libxml2" "Install libxml2 devel"
+        fi
+        libxml2="no"
+    fi
+fi
+
 # Now we've finished running tests it's OK to add -Werror to the compiler flags
 if test "$werror" = "yes"; then
     QEMU_CFLAGS="-Werror $QEMU_CFLAGS"
@@ -4340,6 +4361,7 @@ echo "Quorum            $quorum"
 echo "lzo support       $lzo"
 echo "snappy support    $snappy"
 echo "NUMA host support $numa"
+echo "libxml2           $libxml2"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -4846,6 +4868,12 @@ if test "$rdma" = "yes" ; then
   echo "CONFIG_RDMA=y" >> $config_host_mak
 fi
 
+if test "$libxml2" = "yes" ; then
+  echo "CONFIG_LIBXML2=y" >> $config_host_mak
+  echo "LIBXML2_CFLAGS=$libxml2_cflags" >> $config_host_mak
+  echo "LIBXML2_LIBS=$libxml2_libs" >> $config_host_mak
+fi
+
 # Hold two types of flag:
 #   CONFIG_THREAD_SETNAME_BYTHREAD  - we've got a way of setting the name on
 # a thread we have a handle to
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 053e432..3ddb097 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -242,6 +242,7 @@ our @typeList = (
qr{${Ident}_t},
qr{${Ident}_handler},
qr{${Ident}_handler_fn},
+   qr{xml${Ident}},
 );
 our @modifierList = (
qr{fastcall},
-- 
1.9.1

[Qemu-devel] [PATCH 3/7] iotests, parallels: quote TEST_IMG in 076 test to be path-safe

2014-11-06 Thread Denis V. Lunev

suggested by Jeff Cody

Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Jeff Cody jc...@redhat.com
CC: Kevin Wolf kw...@redhat.com
CC: Stefan Hajnoczi stefa...@redhat.com
---
 tests/qemu-iotests/076 | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/076 b/tests/qemu-iotests/076
index ed2be35..0139976 100755
--- a/tests/qemu-iotests/076
+++ b/tests/qemu-iotests/076
@@ -49,31 +49,31 @@ nb_sectors_offset=$((0x24))
 echo
 echo "== Read from a valid v1 image =="
 _use_sample_img parallels-v1.bz2
-{ $QEMU_IO -c "read -P 0x11 0 64k" $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read -P 0x11 0 64k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== Negative catalog size =="
 _use_sample_img parallels-v1.bz2
 poke_file "$TEST_IMG" "$catalog_entries_offset" "\xff\xff\xff\xff"
-{ $QEMU_IO -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read 0 512" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== Overflow in catalog allocation =="
 _use_sample_img parallels-v1.bz2
 poke_file "$TEST_IMG" "$nb_sectors_offset" "\xff\xff\xff\xff"
 poke_file "$TEST_IMG" "$catalog_entries_offset" "\x01\x00\x00\x40"
-{ $QEMU_IO -c "read 64M 64M" $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read 64M 64M" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== Zero sectors per track =="
 _use_sample_img parallels-v1.bz2
 poke_file "$TEST_IMG" "$tracks_offset" "\x00\x00\x00\x00"
-{ $QEMU_IO -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read 0 512" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== Read from a valid v2 image =="
 _use_sample_img parallels-v2.bz2
-{ $QEMU_IO -c "read -P 0x11 0 64k" $TEST_IMG; } 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read -P 0x11 0 64k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
 
 # success, all done
 echo "*** done"
-- 
1.9.1

Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it

2014-11-06 Thread Max Reitz


On 2014-11-06 at 13:26, Markus Armbruster wrote:

Max Reitz mre...@redhat.com writes:


On 2014-11-04 at 19:45, Markus Armbruster wrote:

I'll try to explain all solutions fairly.  Isn't easy when you're as
biased towards one of them as I am.  Please bear with me.


= The trust boundary between image contents and meta-data =

A disk image consists of image contents and meta-data.

Example: all of a raw image's contents is image contents.  Leaves just
file name and attributes for meta-data.

Example: QCOW2 meta-data includes header, header extensions, L1 table,
L2 tables, ...  The meta-data defines where in the image the actual
contents is stored.

A guest can access the image contents, not the meta-data.

Image contents you've let an untrusted guest write is untrusted.

Therefore, there's a trust boundary between image contents and
meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
contents.  The exact location of the trust boundary depends on the image
format.


= How we instruct QEMU what to trust =

By configuring QEMU to use an image, the user instructs QEMU to trust
the image's meta-data.

When the user's configuration specifies the image format explicitly, the
trust boundary is clear.

Else, the trust boundary is ambiguous when more than one format is
possible.

QEMU resolves this ambiguity by picking the first format with the
highest score.  Raw format is always possible, and always has the
lowest score.


= How this lets the guest escape isolation =

Unfortunately, this lets the guest shift the trust boundary and escape
isolation, as follows:

* Expose a raw image to the guest (whether you specify the format=raw or
let QEMU guess it doesn't matter).  The complete contents becomes
untrusted.

* Reuse the image *without* specifying the raw format.  QEMU guesses the
format based on untrusted image contents.  Now QEMU guesses a format
chosen by the guest, with meta-data chosen by the guest.  By
controlling image meta-data, the malicious guest can access arbitrary
files as QEMU, enlarge its storage, and more.  A non-malicious guest
can accidentally DoS itself, by writing a pattern probing recognizes.

Thank you for bringing that to my attention. This means that I'm even
more in favor of using Kevin's patches because in fact they don't
break anything.

They break things differently.  The difference may or may not matter.

Example: innocent guest writes a recognized pattern.

   Now: next restart fails, guest DoSed itself until host operator gets
   around to adding format=raw to the configuration.  Consequence:
   downtime (probably lengthy), but no data corruption.

   With Kevin's patch: write fails, guest may or may not handle the
   failure gracefully.  Consequences can range from guest complains to
   its logs (who cares) via guest stops whatever it's doing and refuses
   to continue until its hardware gets fixed (downtime as above) to
   data corruption.


You somehow seem convinced that writing to sector 0 is a completely 
normal operation. For x86, it isn't, though.


There are only a couple of programs which do that, I can only think of 
partitioning and setting up boot loaders. There's not a myriad of 
programs which would increase the probability of one both writing a 
recognizable pattern *and* not handling EPERM correctly.


I see the probability of both happening at the same time as extremely 
low, not least because there are only a handful of programs which access 
that sector.



Example: innocent guest first writes a recognized pattern, then
overwrites it with a non-recognized pattern.

   Now: works.

   With Kevin's patch: as above.


Not really, if the guest overwrites the data with a non-recognized 
pattern after EPERM it works as well. The difference here is that you 
won't have the intended data in the meantime.



Again, I'm not claiming the differences are serious in practice, only
that they exist.


True.


This is CVE-2008-2004.


= Aside: other trust boundaries =

Of course, this is not the only trust boundary that matters.  For
instance, there's normally one between your host and somebody else's
computers.  Telling QEMU to trust meta-data of some image you got from
the internet violates it.  There's nothing QEMU can do about that.


= Insecure usage is easy, secure usage is hard =

The oldest stratum of user interfaces doesn't let you specify the image
format.  Use of raw images with these is insecure by design.  These
interfaces are still recommended for human users.

Example of insecure usage: -hda foo.img, where foo.img is raw.

With the next generation of interfaces, specifying the image format is
optional.  Use of raw images with these is insecure by default.

Example of insecure usage: -drive file=foo.img,index=0,media=cdrom,
where foo.img is raw.  The -hda above is actually sugar for this.

Equivalent secure usage: add format=raw.

Note that specifying just the top image's format is not enough, you also
have to specify

[Qemu-devel] [PATCH v3 0/7] parallels format support improvements

2014-11-06 Thread Denis V. Lunev

The patchset implements additional compatibility bits for the Parallels
format:
- initial support for parsing the Parallels DiskDescriptor.xml
  Typically a Parallels disk bundle consists of several images which are
  glued together by an XML disk descriptor. The XML also holds several
  important parameters which are not available in the image header.
- support for padded Parallels images.
  Old guest OSes do not align partitions to page size by default, and
  Parallels created an optimization for such OSes in its desktop product.
  Desktop users are not qualified enough to create properly aligned
  installations. Thus Parallels makes a blind guess on the customer's
  behalf and creates so-called padded images if the guest OS type is
  specified as WinXP, Win2k or Win2k3.

The code uses the approach from the VMDK support: either the image or the
XML descriptor can be used. There is a temporary hack in the opening code,
though: BlockDriverState->file is reopened inside parallels_open. I prefer
to keep this code in this state until proper Parallels snapshot support is
added, in order to minimize the current changes.

Changes from v2:
- (patch 1) changed libxml2 addition as suggested by Michael Tokarev
- (patch 2) changed API of xml_find/xml_get_text to avoid memcpy to variable
  on stack
- (patch 2) dropped predefined value for PARALLELS_XML/PARALLELS_IMAGE
- (patch 2) other minor changes (spelling, placement)
- (patch 3) quoted TEST_IMG as suggested by Jeff Cody
- (patches 4, 6) quoted TEST_IMG as suggested by Jeff Cody

Changes from v1:
- dropped already merged part (original patches 1-3)

CC: Jeff Cody jc...@redhat.com
CC: Kevin Wolf kw...@redhat.com
CC: Stefan Hajnoczi stefa...@redhat.com
CC: Roman Kagan rka...@parallels.com
CC: Denis V. Lunev d...@openvz.org

[Qemu-devel] [PATCH 5/7] block/parallels: support padded Parallels images

2014-11-06 Thread Denis V. Lunev

Unfortunately, old guest OSes do not align partitions to page size by
default. This is true for Windows 2003 and Windows XP.

Parallels created an optimization for such OSes in its desktop product.
Desktop users are not qualified enough to create properly aligned
installations. Thus Parallels makes a blind guess on the customer's
behalf and creates so-called padded images if the guest OS type is
specified as WinXP, Win2k or Win2k3.

Padding is a value which should be added to the guest LBA to obtain the
sector number inside the image. This results in a shifted image.
   0123offset inside image (in 512 byte sectors)
  +---
  +.012guest data (512 byte sectors)
  +---
The information about this is available in DiskDescriptor.xml ONLY. There
is no such data in the image header.

The share of such images can be estimated at 6-8% according to the
statistics I have.

This patch obtains the proper value from the XML and applies it on reading.

Signed-off-by: Denis V. Lunev d...@openvz.org
Acked-by: Roman Kagan rka...@parallels.com
Reviewed-by: Jeff Cody jc...@redhat.com
CC: Kevin Wolf kw...@redhat.com
CC: Stefan Hajnoczi stefa...@redhat.com
---
 block/parallels.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index 4a8240d..b27d1ea 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -63,6 +63,7 @@ typedef struct BDRVParallelsState {
     unsigned int tracks;
 
     unsigned int off_multiplier;
+    unsigned int padding;
 } BDRVParallelsState;
 
 
@@ -245,6 +246,7 @@ static int parallels_open_xml(BlockDriverState *bs, int flags, Error **errp)
     const char *data;
     char image_path[PATH_MAX];
     Error *local_err = NULL;
+    BDRVParallelsState *s = bs->opaque;
 
     ret = size = bdrv_getlength(bs->file);
     if (ret < 0) {
@@ -274,6 +276,19 @@ static int parallels_open_xml(BlockDriverState *bs, int flags, Error **errp)
     if (root == NULL) {
         goto fail;
     }
+
+    data = xml_get_text(root, "Disk_Parameters", "Padding", NULL);
+    if (data != NULL) {
+        char *endptr;
+        unsigned long pad;
+
+        pad = strtoul(data, &endptr, 0);
+        if ((endptr != NULL && *endptr != '\0') || pad > UINT_MAX) {
+            goto fail;
+        }
+        s->padding = (uint32_t)pad;
+    }
+
     image = xml_seek(root, "StorageData", "Storage", "Image", NULL);
     data = ""; /* make gcc happy */
     for (size = 0; image != NULL; image = image->next) {
@@ -375,6 +390,10 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t sector_num)
 static int parallels_read(BlockDriverState *bs, int64_t sector_num,
                     uint8_t *buf, int nb_sectors)
 {
+    BDRVParallelsState *s = bs->opaque;
+
+    sector_num += s->padding;
+
     while (nb_sectors > 0) {
         int64_t position = seek_to_sector(bs, sector_num);
         if (position >= 0) {
-- 
1.9.1
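
The padding handling in the patch above can be tried in isolation with a small stand-alone sketch. It is not the driver code: `parse_padding` and `guest_to_image_sector` are hypothetical helpers, and the parsing here is deliberately stricter than in the patch (it also rejects empty strings, signs and out-of-range values), which is a design choice of this example rather than something the patch claims to do.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse the text of a padding element into a sector count.
 * Returns 0 on success and -1 on malformed or out-of-range input.
 * Hypothetical helper for illustration only.
 */
static int parse_padding(const char *text, uint32_t *padding)
{
    char *end;
    unsigned long val;

    /* Require a leading digit: rejects NULL, "", signs and whitespace. */
    if (text == NULL || text[0] < '0' || text[0] > '9') {
        return -1;
    }

    errno = 0;
    val = strtoul(text, &end, 0);   /* base 0: accepts decimal, 0x.., 0.. */
    if (errno != 0 || end == text || *end != '\0' || val > UINT32_MAX) {
        return -1;
    }

    *padding = (uint32_t)val;
    return 0;
}

/* Padding is added to the guest LBA to get the sector inside the image. */
static int64_t guest_to_image_sector(int64_t guest_sector, uint32_t padding)
{
    return guest_sector + (int64_t)padding;
}
```

With a made-up padding of 63, guest sector 0 maps to image sector 63, so every guest sector is found 63 sectors further into the image.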

Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-06 Thread Igor Mammedov

On Thu, 6 Nov 2014 12:44:04 +
Peter Maydell peter.mayd...@linaro.org wrote:

 On 5 November 2014 09:58, Claudio Fontana claudio.font...@huawei.com wrote:
  Please correct me if I am wrong, my understanding at the moment is that
  for X86 there is an ACPI implementation in hw/acpi, with the table generation
  happening in hw/i386/acpi-build.c .
  Couldn't there be some unification where part of the infrastructure for
  ACPI is reused, with arch-specific code specializing for X86 and ARM?
  Why are ACPI tables created for X86, but cannot be created likewise for ARM?
 
 Because then for ARM boards we'd be creating a description of the
 hardware twice, once in device tree and once in ACPI, which seems
 like unnecessary duplication.
 
  We need ACPI guest support in QEMU for AArch64 over here, with all features
  (including the ability to run ACPI code and add specific tables), for
  ACPI-based guests.
 
 The plan for providing ACPI to guests is that we run a UEFI BIOS
 blob which is what is responsible for providing ACPI and UEFI
 runtime services to guests which need them. (The UEFI blob finds
 out about its hardware by looking at a device tree that QEMU
 passes it, but that's a detail between QEMU and its bios blob).
 This pretty much looks like what x86 QEMU used to do with ACPI
 for a very long time, so we know it's a feasible approach.
ACPI in the BIOS also led to the need to:
 1. update BIOS and QEMU in lockstep whenever a fix/feature is a must-have
 2. add compatibility hooks so that it would work with mismatched
    versions
 3. keep expanding the PV QEMU-BIOS interface without end

Those are the reasons why ACPI tables are built by QEMU now, and we
should probably learn from the x86 experience instead of going through
the same issues a second time.

 
 thanks
 -- PMM

Re: [Qemu-devel] [v2 2/2] migration: Implement multiple compression threads

2014-11-06 Thread Eric Blake

On 11/06/2014 12:08 PM, Li Liang wrote:
 Instead of sending the guest memory directly, this solution compresses
 the RAM page before sending; after receiving, the data is
 decompressed.
 This feature can reduce the data transferred by about
 60%, which is very useful when the network bandwidth is limited,
 and the migration time can also be reduced by about 70%. The
 feature is off by default; see the document
 docs/multiple-compression-threads.txt for information on how to use it.
 
 Reviewed-by: Eric Blake ebl...@redhat.com

Please DON'T add this line unless the author spelled it out (or if they
mentioned that it would be okay if you fix minor issues).  I
intentionally omitted a reviewed-by on v1:

https://lists.gnu.org/archive/html/qemu-devel/2014-11/msg00672.html

because I was not happy with the patch as it was presented and did not
think the work to fix it was trivial.  Furthermore, my review of v1 was
just over the interface, and not the entire patch; there are very likely
still bugs lurking in the .c files.  Once again, I'm going to limit my
review of v2 to the interface (at least in this email):

 Signed-off-by: Li Liang liang.z...@intel.com
 ---

 +++ b/qapi-schema.json
 @@ -491,13 +491,17 @@
  #  to enable the capability on the source VM. The feature is disabled by
  #  default. (since 1.6)
  #
 +# @compress: Using the multiple compression threads to accelerate live migration.
 +#  This feature can help to reduce the migration traffic, by sending
 +#  compressed pages. The feature is disabled by default. (since 2.3)
 +#
  # @auto-converge: If enabled, QEMU will automatically throttle down the guest
  #  to speed up convergence of RAM migration. (since 1.6)
  #
  # Since: 1.2
  ##
  { 'enum': 'MigrationCapability',
 -  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
 +  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', 'compress'] }
  

I'll repeat what I said on v1 (but this time, with some links to back it
up :)

We really need to avoid a proliferation of new commands, two per tunable
does not scale well.  I think now is the time to implement my earlier
suggestion at making MigrationCapability become THE resource for tunables:

https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02274.html

 +++ b/qmp-commands.hx
 @@ -705,7 +705,138 @@ Example:
  <- { "return": 67108864 }
  
  EQMP
 +    {
 +        .name       = "migrate-set-compress-level",
 +        .args_type  = "value:i",
 +        .mhandler.cmd_new = qmp_marshal_input_migrate_set_compress_level,
 +    },
 +
 +SQMP
 +migrate-set-compress-level
 +--

Convention in this file is to have the --- line extended out to the
length of the text it is tied to (you are missing four bytes,
corresponding to the tail "evel").

 +
 +Set compress level to be used by compress migration, the compress level is an integer

s/compress level/the compression level/ (twice)

 +between 0 and 9

s/9/9, where 9 means try harder for smaller compression at the expense
of more CPU time/

 +
 +Arguments:
 +
 +- value: compress level (json-int)
 +
 +Example:
 +
 +-> { "execute": "migrate-set-compress-level", "arguments": { "value": 536870912 } }

Umm, 536870912 is not an integer between 0 and 9.


 +SQMP
 +query-migrate-compress-level
 +

--- length

 +
 +Show compress level to be used by compress migration
 +
 +returns a json-object with the following information:
 +- size : json-int
 +
 +Example:
 +
 +-> { "execute": "query-migrate-compress-level" }
 +<- { "return": 67108864 }

Ewww. Please no new interfaces that return raw ints.  Rather, return a
dictionary with one key/value pair holding the int.  Raw ints are not as
extensible as dictionaries.  Also, make the example realistic - 67108864
is not a valid compression level.

{ "return": { "level": 9 } }
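
To make the suggestion concrete, here is a minimal sketch in plain C that deliberately does not use QEMU's QMP marshalling code. The names compress_level_valid() and format_level_reply() are made up for illustration; the only things taken from this thread are the documented 0 to 9 range and the dictionary-shaped reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Range stated in the patch documentation: an integer between 0 and 9. */
bool compress_level_valid(long level)
{
    return level >= 0 && level <= 9;
}

/*
 * Format a dictionary-style reply instead of a raw int.
 * Returns the number of characters written (excluding the NUL), or -1
 * if the level is out of range or the buffer is too small.
 */
int format_level_reply(char *buf, size_t len, long level)
{
    int n;

    assert(buf != NULL);
    if (!compress_level_valid(level)) {
        return -1;
    }
    n = snprintf(buf, len, "{ \"return\": { \"level\": %ld } }", level);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

With a dictionary, a later key (say, the compression method) could be added without breaking clients that only read the level, which is the extensibility argument made above.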


 +migrate-set-compress-threads
 +--

--- length

 +
 +Set compress thread count to be used by compress migration, the compress thread count is an integer
 +between 1 and 255
 +
 +Arguments:
 +
 +- value: compress threads (json-int)
 +
 +Example:
 +
 +-> { "execute": "migrate-set-compress-threads", "arguments": { "value": 536870912 } }

Value out of range 1-255

 +<- { "return": {} }
 +
 +EQMP
 +    {
 +        .name       = "query-migrate-compress-threads",
 +        .args_type  = "",
 +        .mhandler.cmd_new = qmp_marshal_input_query_migrate_compress_threads,
 +    },
 +
 +SQMP
 +query-migrate-compress-threads
 +

--- length

 +
 +Show compress thread count to be used by compress migration
 +
 +returns a json-object with the following information:
 +- size : json-int
 +
 +Example:
 +
 +-> { "execute": "query-migrate-compress-threads" }
 +<- { "return": 67108864 }

out of range, raw int return

and so on in the rest of the patch (I'll quit calling it out, especially
if we switch over to my enhanced set-capabilities proposal)

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt

Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it

2014-11-06 Thread Eric Blake

On 11/06/2014 01:43 PM, Markus Armbruster wrote:

 Actually, qed requires the backing format to be recorded (it is
 non-optional) and is therefore immune to probing problems of backing
 files.  That's one thing it got right.
 
 If I read the code correctly:
 
 QED has a feature bit QED_F_BACKING_FORMAT_NO_PROBE.
 
 It is changed when you set the backing file format.  Setting format to
 raw sets the flag, anything else (including nothing) clears the flag.
 The actual non-raw format is not recorded.
 
 Creating an image counts as setting the backing file format.
 
 If the flag is set, open uses raw for the backing file (no probing).
 
 If it's unset, open probes, and the probe may yield raw.

Eww.  Well, looks like a deficiency in the qed spec, and maybe all that
is needed to plug it is:

If the probe yields raw, refuse to open the backing file (or put
another way, either the probe MUST find a non-raw file, or the user has
a bug that they forgot to set the raw bit so we refuse to open the file
to point out their bug).
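
As a rough illustration of that rule, and nothing more: the sketch below is standalone C, not QED driver code. qed_backing_driver() is a hypothetical name, and the probe result is passed in as a string rather than obtained from a real probe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch only: pick the driver for a QED backing file.
 *   no_probe_flag  state of QED_F_BACKING_FORMAT_NO_PROBE in the header
 *   probed_format  what a probe of the backing file returned
 *                  (ignored when the flag is set)
 * Returns the driver name to use, or NULL to refuse the open.
 */
const char *qed_backing_driver(bool no_probe_flag, const char *probed_format)
{
    if (no_probe_flag) {
        return "raw";       /* flag set: use raw, no probing */
    }
    assert(probed_format != NULL);
    if (strcmp(probed_format, "raw") == 0) {
        return NULL;        /* proposed rule: a probe must not yield raw */
    }
    return probed_format;
}
```

The only behavioural change compared to what Markus describes is the NULL branch: today that case would silently open the backing file as raw.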

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it

2014-11-06 Thread Kevin Wolf

On 06.11.2014 at 13:26, Markus Armbruster wrote:
  * Reuse the image *without* specifying the raw format.  QEMU guesses the
 format based on untrusted image contents.  Now QEMU guesses a format
 chosen by the guest, with meta-data chosen by the guest.  By
 controlling image meta-data, the malicious guest can access arbitrary
 files as QEMU, enlarge its storage, and more.  A non-malicious guest
 can accidentally DoS itself, by writing a pattern probing recognizes.
 
  Thank you for bringing that to my attention. This means that I'm even
  more in favor of using Kevin's patches because in fact they don't
  break anything.
 
 They break things differently.  The difference may or may not matter.
 
 Example: innocent guest writes a recognized pattern.
 
   Now: next restart fails, guest DoSed itself until host operator gets
   around to adding format=raw to the configuration.  Consequence:
   downtime (probably lengthy), but no data corruption.

Depends.

Another possible outcome is that the guest doesn't DoS itself, but
writes a valid image header. You've argued before that a guest could be
keeping a qcow2 image on a whole disk for whatever odd reason. In this
case, the qemu restart will present the nested image instead of the
top-level one to the guest, which can probably be labelled corruption.

   With Kevin's patch: write fails, guest may or may not handle the
   failure gracefully.  Consequences can range from guest complains to
   its logs (who cares) via guest stops whatever it's doing and refuses
   to continue until its hardware gets fixed (downtime as above) to
   data corruption.
 
 Example: innocent guest first writes a recognized pattern, then
 overwrites it with a non-recognized pattern.
 
   Now: works.
 
   With Kevin's patch: as above.
 
 Again, I'm not claiming the differences are serious in practice, only
 that they exist.

Yes, the failure scenario is different. The point still stands that if
it were relevant in practice, we would likely have heard of it before.

  You actually want to completely abolish probing? I thought we just
  wanted to use the filename as a second source of information and warn
  if the contents and the extension don't seem to match; and in the
  future, maybe make it an error (but we don't have to discuss that
  second part now, I think).
 
 Yes, I propose to ditch it completely, after a suitable grace period.  I
 tried to make that quite clear in my PATCH RFC 2/2.
 
 Here's why.
 
 Now: 1. probe
  4. open, error out if meta-data is bad
 
 With my proposed patch:
  1. probe
  2. guess from trusted meta-data
  3. warn unless 1 and 2 match
  4. open, error out if meta-data is bad
 
 Now make the warning an error:
  1. probe
  2. guess from trusted meta-data
  3. error out unless 1 and 2 match
  4. open, error out if meta-data is bad
 
 I figure the following is equivalent, but simpler:
 
  2. guess from trusted meta-data
  4. open, error out if meta-data is bad
 
 because open should detect all the errors the previous step 3 caught.
 If not, things are broken with explicit format=.

Not entirely true, see below.

  My conclusion: Don't ditch probing. It increases entropy, why would
  you ditch probing? Just combine it with the extension and if both
  don't seem to match, that's an error.
 
 How does probing add value?
 
 Compare
 
  1. probe
  2. guess from trusted meta-data
  3. error out unless 1 and 2 match
  4. open, error out if meta-data is bad
 
 to just
 
  2. guess from trusted meta-data
  4. open, error out if meta-data is bad
 
 Let P be the driver recommended by probe, and G be the driver
 recommended by guess.

P = qcow2, G = raw

 If P == G, same result: we open with the same driver.

No, they are not the same.

 Else, if open with G fails, equivalent result: error out in step 3
 vs. error out in step 4.

No, raw accepts the image.

 Else, we have an odd case: one driver's probe accepts (P's), yet a
 different driver's open succeeds (G's).
 
 If G's probe rejects: is this a bug?  Shouldn't open always fail
 when probe rejects?

No, raw's probe doesn't reject, it just has a very low score.

 If G's probe accepts, then probing chose P over G.  Maybe it
 shouldn't have.  Or maybe the probing functions are imprecise.
 Anyway, keeping probing around makes this an error.  Should it be
 one?

Yes, raw being the fallback for everything is imprecise. It's the only
way we have for probing raw.

 Am I missing something?

This is the safety measure that was missing in your proposal, against
corruption caused by a qcow2 image stored in foo.img that is now
unintentionally opened as raw.

Same thing probably for qcow2 stored on LVs etc.
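
The case I mean can be written down as a small standalone sketch. This is not block layer code; pick_driver() is a made-up name, and P and G are simply passed in as strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch of steps 1-3: 'probed' is P (result of probing the contents),
 * 'guessed' is G (guess from trusted meta-data such as the file name).
 * Returns the driver to open with, or NULL when P and G disagree
 * (step 3: error out).
 */
const char *pick_driver(const char *probed, const char *guessed)
{
    assert(probed != NULL && guessed != NULL);
    if (strcmp(probed, guessed) != 0) {
        return NULL;
    }
    return guessed;
}
```

With P = qcow2 and G = raw this returns NULL, whereas the guess-only variant would go on to open the image with raw, and raw's open accepts anything.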

Kevin

Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it

2014-11-06 Thread Markus Armbruster

Dr. David Alan Gilbert dgilb...@redhat.com writes:

 * Markus Armbruster (arm...@redhat.com) wrote:
 I'll try to explain all solutions fairly.  Isn't easy when you're as
 biased towards one of them as I am.  Please bear with me.
 
 
 = The trust boundary between image contents and meta-data =
 
 A disk image consists of image contents and meta-data.
 
 Example: all of a raw image's contents is image contents.  Leaves just
 file name and attributes for meta-data.
 
 Example: QCOW2 meta-data includes header, header extensions, L1 table,
 L2 tables, ...  The meta-data defines where in the image the actual
 contents is stored.
 
 A guest can access the image contents, not the meta-data.
 
 Image contents you've let an untrusted guest write is untrusted.
 
 Therefore, there's a trust boundary between image contents and
 meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
 contents.  The exact location of the trust boundary depends on the image
 format.

 I'm not sure of the line:
 'QEMU has to trust image meta-data'

Quoting myself: by configuring QEMU to use an image, the user instructs
QEMU to trust the image's meta-data.

 I think there are different levels of trust and people will be more
 prepared to accept levels of pain at the commandline to avoid different
 types of problem.

 A problem that could cause qemu to access arbitrary other files on the
 host (as backing files for example) is obviously the worst; especially
 if things like qemu-img and other analysis type stuff could trip it.

 Stuff that only allows a guest to misuse its own block storage is bad;
 but it's nowhere near as bad as being able to walk around the host.

Yes, sensible, informed users will weigh risk against pain.  They may
decide that avoiding certain risks isn't worth the pain for them.

Reckless or under-informed users will go with whatever causes the least
pain.

Our job is to reduce the pain.  Secure usage should not be painful.

That way, sensible, informed users get to take less pain, and the other
users get hopefully exposed to less risk.

Re: [Qemu-devel] [PATCH v2] block/vdi: Limit maximum size even futher

2014-11-06 Thread Stefan Hajnoczi

On Tue, Oct 28, 2014 at 11:12:32AM +0100, Max Reitz wrote:
 The block layer read and write functions do not like requests which are
 bigger than INT_MAX bytes. Since the VDI bmap is read and written in a
 single operation, its size is therefore limited accordingly. This
 reduces the maximum VDI image size supported by QEMU to half of what it
 currently is (down to approximately 512 TB).
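
 The arithmetic behind the "approximately 512 TB" figure can be sketched
 as follows. This is a standalone illustration, not the constant used in
 block/vdi.c: it assumes 1 MiB VDI blocks, 4-byte bmap entries, 512-byte
 sectors and a 32-bit int, and all SKETCH_* names are made up.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_BLOCK_SIZE  (UINT64_C(1) << 20)  /* assumed: 1 MiB per block */
#define SKETCH_ENTRY_SIZE  4u                   /* assumed: 32-bit bmap entry */
#define SKETCH_SECTOR_SIZE 512u                 /* bmap is rounded up to this */

/* Most bmap entries whose sector-rounded size still fits in INT_MAX bytes. */
uint64_t max_bmap_entries(void)
{
    /* largest multiple of the sector size that does not exceed INT_MAX */
    uint64_t max_bytes =
        (uint64_t)INT_MAX / SKETCH_SECTOR_SIZE * SKETCH_SECTOR_SIZE;

    return max_bytes / SKETCH_ENTRY_SIZE;
}

/* Image size in bytes that this many blocks can describe. */
uint64_t max_image_bytes(void)
{
    uint64_t entries = max_bmap_entries();

    assert(entries <= UINT64_MAX / SKETCH_BLOCK_SIZE);  /* no overflow */
    return entries * SKETCH_BLOCK_SIZE;
}
```

 Under these assumptions the limit comes out slightly below 512 TiB; the
 rounding to sector boundaries is what makes "just under 512 TB" come out
 lower than a plain INT_MAX / 4 calculation would suggest.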
 
 The VDI test 084 has to be adapted accordingly. Actually, one could
 clearly see that it was broken from the "Could not open
 'TEST_DIR/t.IMGFMT': Invalid argument" line for an image which was
 supposed to work just fine.
 
 Signed-off-by: Max Reitz mre...@redhat.com
 ---
 v2:
 - Reducing the size to just under 512 TB wasn't enough because the bmap
   size is rounded up on sector boundaries; fixed (thanks for testing,
   Peter)
 - Finally a patch regarding this problem that I tested myself :-)
 ---
  block/vdi.c| 14 --
  tests/qemu-iotests/084 | 14 +++---
  tests/qemu-iotests/084.out | 13 -
  3 files changed, 27 insertions(+), 14 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com



Re: [Qemu-devel] [v2 1/2] docs: Add a doc about multiple compression threads

2014-11-06 Thread Dr. David Alan Gilbert

* Li Liang (liang.z...@intel.com) wrote:
 Give some details about the multiple compression threads and how
 to use it in live migration.
 
 Signed-off-by: Li Liang liang.z...@intel.com
 ---
  docs/multiple-compression-threads.txt | 128 ++
  1 file changed, 128 insertions(+)
  create mode 100644 docs/multiple-compression-threads.txt
 
 diff --git a/docs/multiple-compression-threads.txt b/docs/multiple-compression-threads.txt
 new file mode 100644
 index 000..a5e53de
 --- /dev/null
 +++ b/docs/multiple-compression-threads.txt

Should probably have migration in the title?

 +Usage
 +==
 +1. Verify the destination QEMU version is able to support the multiple
 +compression threads migration:
 +{qemu} info migrate_capabilities
 +{qemu} ... compress: off ...
 +
 +2. Activate compression on the source:
 +{qemu} migrate_set_capability compress on
 +
 +3. Set the compression thread count on source:
 +{qemu} migrate_set_compress_threads 10
 +
 +4. Set the compression level on the source:
 +{qemu} migrate_set_compress_level 1
 +
 +5. Set the decompression thread count on destination:
 +{qemu} migrate_set_decompress_threads 5
 +
 +6. Start outgoing migration:
 +{qemu} migrate -d tcp:destination.host:
 +{qemu} info migrate
 +Capabilities: ... compress: on
 +...
 +
 +TODO
 +
 +Some faster compression/decompression method such as lz4 and quicklz
 +can help to reduce the CPU consumption when doing (de)compression.
 +Fewer (de)compression threads are needed when doing the migration.

OK, some high level questions:
   1) How does the performance compare to running a separate compressor
process in the stream rather than embedding it in the qemu?

   2) Since you're looking at different compression schemes do we need
something in the settings to select it, and to say what makes sense
for the 'compress_level'?   For example I don't know if lz4 or quicklz
have 1-10 for their compression levels?  How do I know which compression
schemes are available on any host?

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v2] block/vdi: Limit maximum size even futher

2014-11-06 Thread Peter Lieven


On 06.11.2014 14:06, Stefan Hajnoczi wrote:

On Tue, Oct 28, 2014 at 11:12:32AM +0100, Max Reitz wrote:

The block layer read and write functions do not like requests which are
bigger than INT_MAX bytes. Since the VDI bmap is read and written in a
single operation, its size is therefore limited accordingly. This
reduces the maximum VDI image size supported by QEMU to half of what it
currently is (down to approximately 512 TB).

The VDI test 084 has to be adapted accordingly. Actually, one could
clearly see that it was broken from the "Could not open
'TEST_DIR/t.IMGFMT': Invalid argument" line for an image which was
supposed to work just fine.

Signed-off-by: Max Reitz mre...@redhat.com
---
v2:
- Reducing the size to just under 512 TB wasn't enough because the bmap
   size is rounded up on sector boundaries; fixed (thanks for testing,
   Peter)
- Finally a patch regarding this problem that I tested myself :-)
---
  block/vdi.c| 14 --
  tests/qemu-iotests/084 | 14 +++---
  tests/qemu-iotests/084.out | 13 -
  3 files changed, 27 insertions(+), 14 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com


Reviewed-by: Peter Lieven p...@kamp.de

Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-06 Thread Mark Rutland

On Thu, Nov 06, 2014 at 06:53:03AM +, Hanjun Guo wrote:
 On 2014-10-31 2:02, Mark Rutland wrote:
  On Thu, Oct 30, 2014 at 05:52:44PM +, Peter Maydell wrote:
  On 30 October 2014 17:43, Alexander Spyridakis
  a.spyrida...@virtualopensystems.com wrote:
  Currently, the virt machine model generates Device Tree information
  dynamically based on the existing devices in the system. This patch
  series extends the same concept but for ACPI information instead. A total
  of seven tables have been implemented in this patch series, which is the
  minimum for basic ARM support.
 
  The set of generated tables are:
  - RSDP
  - XSDT
  - MADT
  - GTDT
  - FADT
  - FACS
  - DSDT
 
  The tables are created in standalone buffers, taking into account the
  needed information passed from the virt machine model. When the generation
  is finalized, the individual buffers are compacted to a single ACPI binary
  blob, where it is injected on the guest memory space in a fixed location.
  The guest kernel can find the ACPI tables by providing to it the physical
  address of the ACPI blob (e.g. acpi_rsdp=0x4700 boot argument).
 
  (Sorry, I should have waited for the cover letter to arrive before replying.)
 
  I think this is definitely the wrong approach. We already have to
  generate device tree information for the hardware we have, and having
  an equivalent parallel infrastructure for generating ACPI as well
  seems like it would be a tremendous mess. We should support guests
  that require ACPI by having QEMU boot a UEFI bios blob and have that
  UEFI code generate ACPI tables based on the DTB we hand it.
  (Chances seem good that any guest that wants ACPI is going to want
  UEFI runtime services anyway.)
  
  Depending on why people want ACPI in a guest environment, generating
  ACPI tables from a DTB might not be possible (e.g. if they want to use
  AML for some reason).
 
 Agreed.
 
  
  So the important question is _why_ the guest needs to see an ACPI
  environment. What exactly can ACPI provide to the guest that DT does not
  already provide, and why is that necessary? What infrastructure is
  needed for that use case?
 
 There is an important feature called system device dynamic reconfiguration,
 i.e. hot-add/remove: if a guest needs more or less memory or CPU, can we
 add or remove them dynamically with DT? ACPI can do this, but I have no
 idea if DT can. (Sorry, I don't know much about DT)

There is no way of doing this with DT. There has been work into DT
fragments/overlays where portions can be added to the tree dynamically,
but that's controlled by the OS rather than the hypervisor, and there's
no standard for communicating what has been hotplugged to trigger
changes to the tree, so it's not quite the same. It really only works
for tightly coupled hw/kernel/userspace combinations (i.e. embedded).

Depending on how you implement the hot-add/remove you might be able to
get away with an initial static configuration translated from DT. If you
need to describe what might be hotplugged from the start, then I suspect
you cannot get away with translating a DT in general.

Thanks,
Mark.

Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-06 Thread Alexander Spyridakis

On 6 November 2014 14:44, Peter Maydell peter.mayd...@linaro.org wrote:


  We need ACPI guest support in QEMU for AArch64 over here, with all features
  (including the ability to run ACPI code and add specific tables), for
  ACPI-based guests.

 The plan for providing ACPI to guests is that we run a UEFI BIOS
 blob which is what is responsible for providing ACPI and UEFI
 runtime services to guests which need them. (The UEFI blob finds
 out about its hardware by looking at a device tree that QEMU
 passes it, but that's a detail between QEMU and its bios blob).
 This pretty much looks like what x86 QEMU used to do with ACPI
 for a very long time, so we know it's a feasible approach.

Hi Peter,

The rationale for the proposed approach is to cover cases where the
user does not want to rely on external firmware layers. While UEFI
could do what you are describing, the point is to avoid this
not-so-trivial overhead in the boot process, especially in the case of
thin guests, where another software dependency is undesirable.

Regards.

Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-06 Thread Arnd Bergmann

On Thursday 06 November 2014 13:30:01 Mark Rutland wrote:
 
 There is no way of doing this with DT. There has been work into DT
 fragments/overlays where portions can be added to the tree dynamically,
 but that's controlled by the OS rather than the hypervisor, and there's
 no standard for communicating what has been hotplugged to trigger
 changes to the tree, so it's not quite the same. It really only works
 for tightly coupled hw/kernel/userspace combinations (i.e. embedded).
 
 Depending on how you implement the hot-add/remove you might be able to
 get away with an initial static configuration translated from DT. If you
 need to describe what might be hotplugged from the start, then I suspect
 you cannot get away with translating a DT in general.

I believe IBM POWER 5/6/7 servers have an interface to update the DT
from the hypervisor, but it's not a nice interface, so we may want to
avoid duplicating that.

Arnd

Re: [Qemu-devel] [v2 1/2] docs: Add a doc about multiple compression threads

2014-11-06 Thread Eric Blake

On 11/06/2014 02:24 PM, Dr. David Alan Gilbert wrote:
 * Li Liang (liang.z...@intel.com) wrote:
 Give some details about the multiple compression threads and how
 to use it in live migration.

 Signed-off-by: Li Liang liang.z...@intel.com
 ---

 +TODO
 +
 +Some faster compression/decompression method such as lz4 and quicklz
 +can help to reduce the CPU consumption when doing (de)compression.
 +Fewer (de)compression threads are needed when doing the migration.
 
 OK, some high level questions:
1) How does the performance compare to running a separate compressor
 process in the stream rather than embedding it in the qemu?

Interesting question.  I wonder if libvirt should be extended to
optionally insert a compression/decompression filter in the setups it
creates.  Remember, in libvirt tunnelled mode, where libvirt is adding
TLS encryption on top of the migration data stream so that it is not
sniffable from TCP, all data is already going through the path:

source qemu -> source libvirt -> destination libvirt -> destination qemu
     Unix socket/pipe     TCP socket          Unix socket/pipe

Furthermore, libvirt is ALREADY wired up to use external compression
when doing migration to file (such as supporting multiple compression
formats for 'virsh save'), which looks like:

qemu -> compressor -> libvirt I/O helper -> file
    pipe          pipe         O_DIRECT file ops

then restoring that image with:

file -> libvirt I/O helper -> decompressor -> qemu
  O_DIRECT file ops       pipe            pipe

So adding compression in the mix seems like it would be easy for libvirt
to do:

source qemu -> compressor -> source libvirt -> destination libvirt ...
           pipe          pipe             TCP socket
   -> decompressor -> destination qemu
     pipe            pipe


Of course, with an external processor, I don't know if you can get
speedups from having multiple compression threads when all input is
coming serially from a single connection, so your approach of folding in
parallel compression threads directly into qemu may still have some
speed merits.  On the other hand, I'm not sure how your solution is
multiplexing the multiple compression threads into a single migration
stream; if you are still bottlenecked by a single migration stream, what
good do you get by adding multiple (de)compression threads, without some
way in the migration protocol to cleanly call out a fair rotation from
the independent sub-stream of each thread?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-06 Thread Peter Maydell

On 6 November 2014 13:33, Alexander Spyridakis
a.spyrida...@virtualopensystems.com wrote:
 The rationale for the proposed approach is to cover cases where the
 user does not want to rely on external firmware layers. While UEFI
 could do what you are describing, the point is to avoid this
 not-so-trivial overhead in the boot process, especially in the case of
 thin guests, where another software dependency is undesirable.

Can you boot an x86 QEMU/KVM ACPI-using guest without running
the BIOS?

-- PMM

Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it

2014-11-06 Thread Markus Armbruster

Kevin Wolf kw...@redhat.com writes:

 On 04.11.2014 at 19:45, Markus Armbruster wrote:
 I'll try to explain all solutions fairly.  Isn't easy when you're as
 biased towards one of them as I am.  Please bear with me.
 
 
 = The trust boundary between image contents and meta-data =
 
 A disk image consists of image contents and meta-data.
 
 Example: all of a raw image's contents is image contents.  Leaves just
 file name and attributes for meta-data.

 Better: Leaves only protocol-specific metadata (e.g. file name and
 attributes for raw-posix).

Can you give examples for other protocols?

 Example: QCOW2 meta-data includes header, header extensions, L1 table,
 L2 tables, ...  The meta-data defines where in the image the actual
 contents is stored.
 
 A guest can access the image contents, not the meta-data.
 
 Image contents you've let an untrusted guest write is untrusted.
 
 Therefore, there's a trust boundary between image contents and
 meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
 contents.  The exact location of the trust boundary depends on the image
 format.

 Trust metadata to a certain degree - the block layer audit was started
 because we noticed that it might not be all that trustworthy in
 practice. Different problem, though, it just shows that there are hardly
 any clear borders between always completely trusted and always
 completely untrusted.

Yes.  Similar point made by Dave, I think.

Say I receive a QCOW2 image from an untrusted source.  Things I'd like
to be able to do securely with it:

* Examine it with qemu-img info.

* Open it with BDRV_O_NO_BACKING.

* If the backing file is innocent, open it without BDRV_O_NO_BACKING.

Assuming I do all this with trusted tools, of course.

However, here I talk about something else: the trust boundary between
image contents and meta-data.

 = How we instruct QEMU what to trust =
 
 By configuring QEMU to use an image, the user instructs QEMU to trust
 the image's meta-data.
 
 When the user's configuration specifies the image format explicitly, the
 trust boundary is clear.
 
 Else, the trust boundary is ambiguous when more than one format is
 possible.
 
 QEMU resolves this ambiguity by picking the first format with the
 highest score.  Raw format is always possible, and always has the
 lowest score.
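
 The selection rule just described can be sketched in a few lines of
 standalone C. This is an illustration only, not the block layer's probing
 code: struct probe_result and pick_best() are made-up names, and the
 score values used by callers are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

struct probe_result {
    const char *format;     /* driver name */
    int score;              /* higher means a more confident match */
};

/*
 * Return the first entry with the highest score, or NULL if there are
 * no entries. Using '>' rather than '>=' keeps the earlier entry on ties.
 */
const struct probe_result *pick_best(const struct probe_result *r, size_t n)
{
    const struct probe_result *best = NULL;
    size_t i;

    assert(r != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (best == NULL || r[i].score > best->score) {
            best = &r[i];
        }
    }
    return best;
}
```

 If raw always scores lowest but never zero, it wins exactly when no other
 format's probe matches, which is why guest-written contents that match
 another probe shift the result away from raw.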

 You used the term "untrusted guest" before. Are there any trusted guests,
 or should we assume that guests are untrusted by definition?

I'm trying to allow for users saying "guest subverting this trust
boundary is not an issue for me, now get off my lawn!"

 If you use virtualisation for isolation, then the answer is probably
 that guests are always untrusted. Other users may know exactly what
 their guest is doing and are using qemu for other reasons. The former
 would probably want to disable probing completely, the latter don't care
 about it and prefer convenience.

Yes.

The third group is unsophisticated users blissfully unaware of the
problem.

 My guess is that the share of those with trusted guests is higher among
 direct qemu users than libvirt users, but it's just that, a guess. It
 also doesn't mean that they are the majority of direct qemu users (they
 might be, but I honestly don't know).

Me neither.

If secure usage was easy, it wouldn't really matter.

 If there are trusted and untrusted guests, does this section need some
 thoughts about instructing qemu whether to trust the guest or not?

I don't know.

 = How this lets the guest escape isolation =
 
 Unfortunately, this lets the guest shift the trust boundary and escape
 isolation, as follows:
 
 * Expose a raw image to the guest (whether you specify the format=raw or
   let QEMU guess it doesn't matter).  The complete contents becomes
   untrusted.
 
 * Reuse the image *without* specifying the raw format.  QEMU guesses the
   format based on untrusted image contents.  Now QEMU guesses a format
   chosen by the guest, with meta-data chosen by the guest.  By
   controlling image meta-data, the malicious guest can access arbitrary
   files as QEMU, enlarge its storage, and more.  A non-malicious guest
   can accidentally DoS itself, by writing a pattern probing recognizes.
 
 This is CVE-2008-2004.
 
 
 = Aside: other trust boundaries =
 
 Of course, this is not the only trust boundary that matters.  For
 instance, there's normally one between your host and somebody else's
 computers.  Telling QEMU to trust meta-data of some image you got from
 the internet violates it.  There's nothing QEMU can do about that.

 Okay, this addresses what I commented above.

Good :)

 = Insecure usage is easy, secure usage is hard =
 
 The oldest stratum of user interfaces doesn't let you specify the image
 format.  Use of raw images with these is insecure by design.  These
 interfaces are still recommended for human users.
 
 Example of insecure usage: -hda foo.img, where foo.img is raw.
 
 With the next generation of interfaces, specifying the image format is

Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it

2014-11-06 Thread Eric Blake

On 11/06/2014 02:57 PM, Markus Armbruster wrote:

 Yes, you can override the backing file driver (backing.driver=raw should
 do the trick). Not really user-friendly, especially with long backing
 file chains, but it happens to be there.

 And of course, libvirt should be using it for non-qcow2 or qcow2 without
 the backing format header extension (but doesn't yet).
 
 I'm glad it's there.  Too bad libvirt doesn't use it, yet.  Supports my
 point that secure usage is too hard now.

Indeed, libvirt is still lacking on enforcing the backing type that it
probed vs. letting qemu re-probe a (possibly different) backing type.
Were libvirt to actually enforce this (that is, libvirt's default
out-of-the-box policy is to avoid all probes, and treat anything without
a type as raw) means that a user that forgets to use -obacking_fmt and
creates a chain base-mid-top will have the following conflicting view:

libvirt: mid[raw]-top[qcow2]
qemu: base[qcow2]-mid[qcow2]-top[qcow2]

Right now, if the chain is only 2 deep, qemu happily boots the guest
with qcow2 format (in spite of libvirt treating mid as raw); but when
the chain is 3 deep, because libvirt failed to give SELinux permissions
to base, then qemu fails to open base, and fails to boot, but the
failure message is very cryptic.

On the other hand, if libvirt were to ENFORCE its view that mid is raw,
by passing appropriate drive options, then qemu would always boot, but
now the top image would be visibly corrupted (treating a qcow2 file as
raw leads to MUCH different data visible in the guest) and the guest
will likely fail to boot completely, but with no error message from qemu
(rather more likely that things just hang in a 100% cpu loop in the
guest).  Although the existing error message is cryptic, this new
behavior of enforcing a probed image to be raw feels like it will be
even more user-unfriendly.

At any rate, I've certainly been working on getting libvirt to output
the ENTIRE backing chain that it has determined, rather than its old
behavior of keeping that information in memory only; this at least helps
libvirt developers diagnose bug reports (show me what libvirt thought
your backing chain was, then go fix your XML to tell libvirt the truth
and your problem will go away, if you didn't corrupt the backing files
in the meantime with something like a 'commit' operation).

[I mentioned libvirt's default policy is to forbid probing and treat
untyped disks as raw; but both of those knobs can be tweaked, to either
allow probing, or treat the default type as qcow2, or both]



 .img is not as clear, I've seen people using it for other formats. It's
 still a disk image, but not a raw one.
 
 Is this usage common?

At least on my laptop - yes.  I have several qcow2 files with an image
suffix of .img (perhaps because I was lazy when I created them, or
sometimes because I was quickly hacking a script to add a -fqcow2 to a
qemu-img line without changing the file name, because changing the name
would have a ripple effect on the rest of the script).  But I don't know
if my usage is typical, and I also don't mind adjusting my naming
conventions to silence a warning if qemu starts getting picky about
confusing name-vs-contents issues.  And if I consistently use
format=qcow2, I shouldn't be penalized with the warning, right?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




[Qemu-devel] [PATCH 10/11] fixup! pc: explicitly check maxmem limit when adding DIMM

2014-11-06 Thread Igor Mammedov

Fix build failure that would be caused by missing pc_dimm_count()
that was introduced in dropped
  [03/11] pc: check if KVM has enough memory slots for DIMM devices

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 hw/i386/pc.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 1d413a7..e656658 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1550,6 +1550,18 @@ void qemu_register_pc_machine(QEMUMachine *m)
     g_free(name);
 }
 
+static int pc_dimm_count(Object *obj, void *opaque)
+{
+    int *count = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
+        (*count)++;
+    }
+
+    object_child_foreach(obj, pc_dimm_count, opaque);
+    return 0;
+}
+
 static int pc_existing_dimms_capacity(Object *obj, void *opaque)
 {
     Error *local_err = NULL;
-- 
1.8.3.1

[Qemu-devel] [PATCH v7] Support vhd type VHD_DIFFERENCING

2014-11-06 Thread Xiaodong Gong

Currently qemu only supports the vhd types VHD_FIXED and VHD_DYNAMIC, so
qemu can't read vhd snapshot volumes and can't support other storage
features of vhd files.

This patch adds reading of the parent information in vpc_open, reading of
the bitmap in vpc_read, and updating of the bitmap in vpc_write.

Signed-off-by: Xiaodong Gong gongxiaodo...@huawei.com
Reviewed-by: Ding xiao ssdx...@163.com
---
Changes since v6:
- remove include of iconv.h (Stefan Hajnoczi)
- make sure data_length < length of backing_file (Stefan Hajnoczi)
- change big-endian of platform to cpu order (Stefan Hajnoczi)

Changes since v5:
- Change malloc to g_malloc. (Gonglei)(Stefan Hajnoczi)
- Fix the bug of free(null). (Gonglei)(Stefan Hajnoczi)

Changes since v4:
- Parse the batmap only when the version of VHD  1.2. (Lucian Petrut)
- Add support to parent location of W2RU. (Lucian Petrut) (Philipp Hahn)

Changes since v3:
- Remove the PARENT_MAX_LOC.

Changes since v2:
- Change MACX to PLATFORM_MACX. (Kevin Wolf)
- Return with EINVAL when the parent location is W2RU or W2KU. (Kevin Wolf)
- Change -1 == ret to the natural order ret == -1. (Kevin Wolf)
- Get rid of get_sector_offset_diff; get_sector_offset
  supports VHD_DIFF. (Kevin Wolf)
- Return code of get_sector_offset is set to -1 for error,
  -2 for not allocated, -3 for in parent. (Kevin Wolf)
- Fix uninitialized ret in vpc_write when nb_sector == 0. (Kevin Wolf)
- Change if (diff == true) to if (diff) and so on. (Kevin Wolf)
- Add PARENT_MAX_LOC to make the code easier to understand. (Kevin Wolf)
- Restore the boundary check to write on dynamic type in
  get_sector_offset. (Kevin Wolf)

Changes since v1:
- Add Boundary check to any input. (Stefan Hajnoczi)
- Clean the code no used after in vpc_open. (Stefan Hajnoczi)
- Change bdrv_co_readv() to bdrv_preadv in vpc_read. (Stefan Hajnoczi)
- Added some code to make it easy to understand. (Stefan Hajnoczi)
---
 block/vpc.c | 430 ++--
 1 file changed, 360 insertions(+), 70 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index 38c4f02..151ce85 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -33,13 +33,22 @@
 /**/
 
 #define HEADER_SIZE 512
+#define DYNAMIC_HEADER_SIZE 1024
+#define PARENT_LOCATOR_NUM 8
+#define MACX_PREFIX_LEN 7 /* file:// */
+#define TBBATMAP_HEAD_SIZE 28
+
+#define PLATFORM_MACX 0x5863614d /* big endian */
+#define PLATFORM_W2RU 0x75723257
+
+#define VHD_VERSION(major, minor)  (((major)  16) | ((minor)  0x))
 
 //#define CACHE
 
 enum vhd_type {
 VHD_FIXED   = 2,
 VHD_DYNAMIC = 3,
-VHD_DIFFERENCING= 4,
+VHD_DIFF= 4,
 };
 
 // Seconds since Jan 1, 2000 0:00:00 (UTC)
@@ -138,6 +147,15 @@ typedef struct BDRVVPCState {
 Error *migration_blocker;
 } BDRVVPCState;
 
+typedef struct vhd_tdbatmap_header {
+char magic[8]; /* always tdbatmap */
+
+uint64_t batmap_offset;
+uint32_t batmap_size;
+uint32_t batmap_version;
+uint32_t checksum;
+} QEMU_PACKED VHDTdBatmapHeader;
+
 static uint32_t vpc_checksum(uint8_t* buf, size_t size)
 {
 uint32_t res = 0;
@@ -157,6 +175,108 @@ static int vpc_probe(const uint8_t *buf, int buf_size, 
const char *filename)
 return 0;
 }
 
+static int vpc_read_backing_loc(VHDDynDiskHeader *dyndisk_header,
+                                BlockDriverState *bs,
+                                Error **errp)
+{
+    BDRVVPCState *s = bs->opaque;
+    int64_t data_offset = 0;
+    int data_length = 0;
+    uint32_t platform;
+    bool done = false;
+    int parent_locator_offset = 0;
+    int i;
+    int ret = 0;
+
+    for (i = 0; i < PARENT_LOCATOR_NUM; i++) {
+        data_offset =
+            be64_to_cpu(dyndisk_header->parent_locator[i].data_offset);
+        data_length =
+            be32_to_cpu(dyndisk_header->parent_locator[i].data_length);
+        platform =
+            be32_to_cpu(dyndisk_header->parent_locator[i].platform);
+
+        /* Extend the location offset */
+        if (parent_locator_offset < data_offset) {
+            parent_locator_offset = data_offset;
+        }
+
+        if (done) {
+            continue;
+        }
+
+        /* Skip file:// in MacX platform */
+        if (platform == PLATFORM_MACX) {
+            if (data_length < MACX_PREFIX_LEN) {
+                return -1;
+            }
+
+            data_offset += MACX_PREFIX_LEN;
+            data_length -= MACX_PREFIX_LEN;
+        }
+
+        /* Read location of backing file */
+        if (platform == PLATFORM_MACX || platform == PLATFORM_W2RU) {
+            if (data_offset > s->max_table_entries * s->block_size) {
+                return -1;
+            }
+            if (data_length > sizeof(bs->backing_file) - 1) {
+                return ret;
+            }
+            ret = bdrv_pread(bs->file, data_offset, bs->backing_file,
+                             data_length);
+            if (ret < 0) {
+                return ret;
+            }
+
+

Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-06 Thread Hanjun Guo

On 2014-10-31 2:02, Mark Rutland wrote:
 On Thu, Oct 30, 2014 at 05:52:44PM +, Peter Maydell wrote:
 On 30 October 2014 17:43, Alexander Spyridakis
 a.spyrida...@virtualopensystems.com wrote:
 Currently, the virt machine model generates Device Tree information 
 dynamically based on the existing devices in the system. This patch series 
 extends the same concept but for ACPI information instead. A total of seven 
 tables have been
 implemented in this patch series, which is the minimum for a basic ARM 
 support.

 The set of generated tables are:
 - RSDP
 - XSDT
 - MADT
 - GTDT
 - FADT
 - FACS
 - DSDT

 The tables are created in standalone buffers, taking into account the
 needed information passed from the virt machine model. When the generation
 is finalized, the individual buffers are compacted to a single ACPI binary
 blob, where it is injected on the guest memory space in a fixed location.
 The guest kernel can find the ACPI tables by providing to it the physical
 address of the ACPI blob (e.g. acpi_rsdp=0x4700 boot argument).

 (Sorry, I should have waited for the cover letter to arrive before replying.)

 I think this is definitely the wrong approach. We already have to
 generate device tree information for the hardware we have, and having
 an equivalent parallel infrastructure for generating ACPI as well
 seems like it would be a tremendous mess. We should support guests
 that require ACPI by having QEMU boot a UEFI bios blob and have that
 UEFI code generate ACPI tables based on the DTB we hand it.
 (Chances seem good that any guest that wants ACPI is going to want
 UEFI runtime services anyway.)
 
 Depending on why people want ACPI in a guest environment, generating
 ACPI tables from a DTB might not be possible (e.g. if they want to use
 AML for some reason).

Agreed.

 
 So the important question is _why_ the guest needs to see an ACPI
 environment. What exactly can ACPI provide to the guest that DT does not
 already provide, and why is that necessary? What infrastrucutre is
 needed for that use case?

There is an important feature called system device dynamic reconfiguration,
you know, hot-add/remove. If a guest needs more/less memory or CPU, can we
add or remove them dynamically with DT? ACPI can do this, but I have no
idea if DT can. (Sorry, I don't know much about DT)

Thanks
Hanjun

Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it

2014-11-06 Thread Jeff Cody

On Thu, Nov 06, 2014 at 02:57:07PM +0100, Markus Armbruster wrote:
 Kevin Wolf kw...@redhat.com writes:
 
  Am 04.11.2014 um 19:45 hat Markus Armbruster geschrieben:

[...]

  I proposed something less radical, namely to keep guessing the image
  format, but base the guess on trusted meta-data only: file name and
  attributes.  Block and character special files are raw.  For other
  files, find the file name extension, and look up the format claiming it.
  
  PRO: Plugs the hole.
  
  CON: Breaks existing usage when the new guess differs from the old
  guess.  Common usage should be fine:
  
  * -hda test.qcow2
  
Fine as long as test.qcow2 is really QCOW2 (as it should!), and
either specifies a backing format (as it arguably should), or the
backing file name is sane.
  
  * -hda disk.img
  
Fine as long as disk.img is really a disk image (as it should).
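
A rough sketch of guessing from trusted meta-data only, as proposed above. Everything here is an assumption made for illustration: the function name format_from_name, the extension table (which format "claims" which extension was not settled in this thread, and .img in particular is contested below), and returning NULL when no format claims the extension.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical extension table; the real mapping is an open question. */
static const struct {
    const char *ext;
    const char *format;
} ext_table[] = {
    { "qcow2", "qcow2" },
    { "vmdk",  "vmdk"  },
    { "vhd",   "vpc"   },
    { "img",   "raw"   },
    { "raw",   "raw"   },
};

/* Guess the format from file name and attributes only; the image contents
 * are never read.  is_special_file is non-zero for block and character
 * special files (the caller would get that from stat()).
 * Returns NULL when no format claims the extension. */
static const char *format_from_name(const char *filename, int is_special_file)
{
    const char *base, *dot;
    size_t i;

    if (is_special_file) {
        return "raw";
    }

    /* Only look at the last path component, so "dir.qcow2/disk" has no
     * extension. */
    base = strrchr(filename, '/');
    base = base ? base + 1 : filename;

    dot = strrchr(base, '.');
    if (!dot || dot[1] == '\0') {
        return NULL;
    }

    for (i = 0; i < sizeof(ext_table) / sizeof(ext_table[0]); i++) {
        if (strcmp(dot + 1, ext_table[i].ext) == 0) {
            return ext_table[i].format;
        }
    }
    return NULL;
}
```

The point of the design is that nothing a guest can write to the image influences the result; the cost is that a qcow2 file named disk.img is now guessed as raw.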
 
  .img is not as clear, I've seen people using it for other formats. It's
  still a disk image, but not a raw one.
 
 Is this usage common?
 

More anecdotal data: Like Eric, I have non-raw images using a .img
extension.

Also, .img as a generic naming convention is useful enough that some
of our own qemu iotests use it, regardless of format (mainly in block
job python tests)

[...]

[Qemu-devel] [PATCH RESEND] Support vhd type VHD_DIFFERENCING

2014-11-06 Thread Xiaodong Gong

Currently qemu only supports the vhd types VHD_FIXED and VHD_DYNAMIC, so
qemu can't read vhd snapshot volumes and can't support other storage
features of vhd files.

This patch adds reading of the parent information in vpc_open, reading of
the bitmap in vpc_read, and updating of the bitmap in vpc_write.

Signed-off-by: Xiaodong Gong gongxiaodo...@huawei.com
Reviewed-by: Ding xiao ssdx...@163.com
---
Changes since v6:
- remove include of iconv.h (Stefan Hajnoczi)
- make sure data_length < length of backing_file (Stefan Hajnoczi)
- change big-endian of platform to cpu order (Stefan Hajnoczi)

Changes since v5:
- Change malloc to g_malloc. (Gonglei)(Stefan Hajnoczi)
- Fix the bug of free(null). (Gonglei)(Stefan Hajnoczi)

Changes since v4:
- Parse the batmap only when the version of VHD  1.2. (Lucian Petrut)
- Add support to parent location of W2RU. (Lucian Petrut) (Philipp Hahn)

Changes since v3:
- Remove the PARENT_MAX_LOC.

Changes since v2:
- Change MACX to PLATFORM_MACX. (Kevin Wolf)
- Return with EINVAL when the parent location is W2RU or W2KU. (Kevin Wolf)
- Change -1 == ret to the natural order ret == -1. (Kevin Wolf)
- Get rid of get_sector_offset_diff; get_sector_offset
  supports VHD_DIFF. (Kevin Wolf)
- Return code of get_sector_offset is set to -1 for error,
  -2 for not allocated, -3 for in parent. (Kevin Wolf)
- Fix uninitialized ret in vpc_write when nb_sector == 0. (Kevin Wolf)
- Change if (diff == true) to if (diff) and so on. (Kevin Wolf)
- Add PARENT_MAX_LOC to make the code easier to understand. (Kevin Wolf)
- Restore the boundary check to write on dynamic type in
  get_sector_offset. (Kevin Wolf)

Changes since v1:
- Add Boundary check to any input. (Stefan Hajnoczi)
- Clean the code no used after in vpc_open. (Stefan Hajnoczi)
- Change bdrv_co_readv() to bdrv_preadv in vpc_read. (Stefan Hajnoczi)
- Added some code to make it easy to understand. (Stefan Hajnoczi)
---
 block/vpc.c | 430 ++--
 1 file changed, 360 insertions(+), 70 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index 38c4f02..c002270 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -33,13 +33,22 @@
 /**/
 
 #define HEADER_SIZE 512
+#define DYNAMIC_HEADER_SIZE 1024
+#define PARENT_LOCATOR_NUM 8
+#define MACX_PREFIX_LEN 7 /* file:// */
+#define TBBATMAP_HEAD_SIZE 28
+
+#define PLATFORM_MACX 0x5863614d /* big endian */
+#define PLATFORM_W2RU 0x75723257
+
+#define VHD_VERSION(major, minor)  (((major)  16) | ((minor)  0x))
 
 //#define CACHE
 
 enum vhd_type {
 VHD_FIXED   = 2,
 VHD_DYNAMIC = 3,
-VHD_DIFFERENCING= 4,
+VHD_DIFF= 4,
 };
 
 // Seconds since Jan 1, 2000 0:00:00 (UTC)
@@ -138,6 +147,15 @@ typedef struct BDRVVPCState {
 Error *migration_blocker;
 } BDRVVPCState;
 
+typedef struct vhd_tdbatmap_header {
+char magic[8]; /* always tdbatmap */
+
+uint64_t batmap_offset;
+uint32_t batmap_size;
+uint32_t batmap_version;
+uint32_t checksum;
+} QEMU_PACKED VHDTdBatmapHeader;
+
 static uint32_t vpc_checksum(uint8_t* buf, size_t size)
 {
 uint32_t res = 0;
@@ -157,6 +175,108 @@ static int vpc_probe(const uint8_t *buf, int buf_size, 
const char *filename)
 return 0;
 }
 
+static int vpc_read_backing_loc(VHDDynDiskHeader *dyndisk_header,
+                                BlockDriverState *bs,
+                                Error **errp)
+{
+    BDRVVPCState *s = bs->opaque;
+    int64_t data_offset = 0;
+    int data_length = 0;
+    uint32_t platform;
+    bool done = false;
+    int parent_locator_offset = 0;
+    int i;
+    int ret = 0;
+
+    for (i = 0; i < PARENT_LOCATOR_NUM; i++) {
+        data_offset =
+            be64_to_cpu(dyndisk_header->parent_locator[i].data_offset);
+        data_length =
+            be32_to_cpu(dyndisk_header->parent_locator[i].data_length);
+        platform =
+            be32_to_cpu(dyndisk_header->parent_locator[i].platform);
+
+        /* Extend the location offset */
+        if (parent_locator_offset < data_offset) {
+            parent_locator_offset = data_offset;
+        }
+
+        if (done) {
+            continue;
+        }
+
+        /* Skip file:// in MacX platform */
+        if (platform == PLATFORM_MACX) {
+            if (data_length < MACX_PREFIX_LEN) {
+                return -1;
+            }
+
+            data_offset += MACX_PREFIX_LEN;
+            data_length -= MACX_PREFIX_LEN;
+        }
+
+        /* Read location of backing file */
+        if (platform == PLATFORM_MACX || platform == PLATFORM_W2RU) {
+            if (data_offset > s->max_table_entries * s->block_size) {
+                return -1;
+            }
+            if (data_length > sizeof(bs->backing_file) - 1) {
+                return ret;
+            }
+            ret = bdrv_pread(bs->file, data_offset, bs->backing_file,
+                             data_length);
+            if (ret < 0) {
+                return ret;
+            }
+
+

[Qemu-devel] [PATCH] seccomp: change configure to avoid arm 32 to break

2014-11-06 Thread Eduardo Otubo

Right now seccomp is breaking the compilation of QEMU on armv7l due
to libseccomp's current lack of support for this arch. This problem is
already fixed in libseccomp upstream, but no release date for that is
scheduled so far. This patch disables support for seccomp on armv7l
temporarily until libseccomp does a new release. Then I'll remove the
hack and update the libseccomp dependency in the configure script.

Related bug: https://bugs.launchpad.net/qemu/+bug/1363641

Signed-off-by: Eduardo Otubo eduardo.ot...@profitbricks.com
---
 configure | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index 2f17bf3..16fd7f5 100755
--- a/configure
+++ b/configure
@@ -1823,15 +1823,17 @@ fi
 # libseccomp check
 
 if test "$seccomp" != "no" ; then
-    if $pkg_config --atleast-version=2.1.0 libseccomp; then
-        libs_softmmu="$libs_softmmu `$pkg_config --libs libseccomp`"
-        QEMU_CFLAGS="$QEMU_CFLAGS `$pkg_config --cflags libseccomp`"
-        seccomp="yes"
-    else
-        if test "$seccomp" = "yes"; then
-            feature_not_found "libseccomp" "Install libseccomp devel >= 2.1.0"
-        fi
-        seccomp="no"
+    if test "$cpu" = "i386" || test "$cpu" = "x86_64"; then
+        if $pkg_config --atleast-version=2.1.0 libseccomp; then
+            libs_softmmu="$libs_softmmu `$pkg_config --libs libseccomp`"
+            QEMU_CFLAGS="$QEMU_CFLAGS `$pkg_config --cflags libseccomp`"
+            seccomp="yes"
+        else
+            if test "$seccomp" = "yes"; then
+                feature_not_found "libseccomp" "Install libseccomp devel >= 2.1.0"
+            fi
+            seccomp="no"
+        fi
     fi
 fi
 ##
-- 
1.9.1

[Qemu-devel] How to use bootloader with OSX?

2014-11-06 Thread eezacque

I spent some time with Andrei Ostanin's "Playing with Mac OS X on KVM"
(http://blog.ostanin.org/2014/02/11/playing-with-mac-os-x-on-kvm/) which
works like a charm, except for the final part where I am supposed to
install a bootloader (Chimera, Chameleon) to be able to boot my virtual
machine. In principle, I could do without, but I would like to set some
boot parameters in the bootloader configuration. However, I cannot get
this to work: I cannot boot at all. Any suggestions are welcome...

Thanks,
Izak

[Qemu-devel] Qemu-KVM: Virtual Machine Power Managment

2014-11-06 Thread Carew, Alan

Hi folks,

I am looking for feedback regarding work-in-progress or planned CPU power
management features for Qemu-KVM based Virtual Machines.

Looking back through the mailing list archives I did not find any discussion
or patches relating to the general problem of virtual machine power
management.

Currently the MSRs for power management are not exposed to a guest OS and as
far as I am aware no abstraction driver exists to facilitate acpi_cpufreq like
features on a guest.

The context for my query relates to deterministic power management at the
application level(non-VM), where in certain domains the workload is partitioned
on a per core basis(1:1 thread-core exclusive pinning) and based on current
workload an application thread can request a transition to a different P-State,
for example using the apci_cpufreq userspace power governor.

Are there any plans or previous discussion that I missed for exposing such an
ability or similar for Qemu-KVM based virtual machines.

Thanks in advance,

Alan.

Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it

2014-11-06 Thread Jeff Cody

On Thu, Nov 06, 2014 at 01:53:35PM +0100, Max Reitz wrote:
 On 2014-11-06 at 13:26, Markus Armbruster wrote:
 Max Reitz mre...@redhat.com writes:
 
 On 2014-11-04 at 19:45, Markus Armbruster wrote:
 I'll try to explain all solutions fairly.  Isn't easy when you're as
 biased towards one of them as I am.  Please bear with me.
 
 
 = The trust boundary between image contents and meta-data =
 
 A disk image consists of image contents and meta-data.
 
 Example: all of a raw image's contents is image contents.  Leaves just
 file name and attributes for meta-data.
 
 Example: QCOW2 meta-data includes header, header extensions, L1 table,
 L2 tables, ...  The meta-data defines where in the image the actual
 contents is stored.
 
 A guest can access the image contents, not the meta-data.
 
 Image contents you've let an untrusted guest write is untrusted.
 
 Therefore, there's a trust boundary between image contents and
 meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
 contents.  The exact location of the trust boundary depends on the image
 format.
 
 
 = How we instruct QEMU what to trust =
 
 By configuring QEMU to use an image, the user instructs QEMU to trust
 the image's meta-data.
 
 When the user's configuration specifies the image format explicitly, the
 trust boundary is clear.
 
 Else, the trust boundary is ambiguous when more than one format is
 possible.
 
 QEMU resolves this ambiguity by picking the first format with the
 highest score.  Raw format is always possible, and always has the
 lowest score.
 
 
 = How this lets the guest escape isolation =
 
 Unfortunately, this lets the guest shift the trust boundary and escape
 isolation, as follows:
 
 * Expose a raw image to the guest (whether you specify the format=raw or
 let QEMU guess it doesn't matter).  The complete contents becomes
 untrusted.
 
 * Reuse the image *without* specifying the raw format.  QEMU guesses the
 format based on untrusted image contents.  Now QEMU guesses a format
 chosen by the guest, with meta-data chosen by the guest.  By
 controlling image meta-data, the malicious guest can access arbitrary
 files as QEMU, enlarge its storage, and more.  A non-malicious guest
 can accidentally DoS itself, by writing a pattern probing recognizes.
 Thank you for bringing that to my attention. This means that I'm even
 more in favor of using Kevin's patches because in fact they don't
 break anything.
 They break things differently.  The difference may or may not matter.
 
 Example: innocent guest writes a recognized pattern.
 
Now: next restart fails, guest DoSed itself until host operator gets
around to adding format=raw to the configuration.  Consequence:
downtime (probably lengthy), but no data corruption.
 
With Kevin's patch: write fails, guest may or may not handle the
failure gracefully.  Consequences can range from guest complains to
its logs (who cares) via guest stops whatever it's doing and refuses
to continue until its hardware gets fixed (downtime as above) to
data corruption.
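
For readers who have not seen the patches, the "write fails" behaviour discussed here can be sketched like this. It is only my reading of the description in this thread (a guest write that would make a guessed-raw image probe as something else is refused with EPERM), not Kevin's actual code. check_raw_write and looks_like_other_format are hypothetical names, and only the QCOW2 magic is checked.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Hypothetical stand-in for format probing: non-zero when the buffer would
 * be recognized as a non-raw format.  Only the QCOW2 magic is checked. */
static int looks_like_other_format(const unsigned char *sector0, size_t len)
{
    static const unsigned char qcow2_magic[4] = { 'Q', 'F', 'I', 0xfb };

    return len >= 4 && memcmp(sector0, qcow2_magic, 4) == 0;
}

/* Check a guest write to an image whose raw format was guessed rather than
 * specified.  cur_sector0 holds the current SECTOR_SIZE bytes of sector 0.
 * Returns 0 if the write may proceed, -EPERM if it would turn sector 0
 * into something probing recognizes as another format. */
static int check_raw_write(const unsigned char *cur_sector0, size_t offset,
                           const unsigned char *data, size_t len)
{
    unsigned char after[SECTOR_SIZE];
    size_t n = len;

    if (offset >= SECTOR_SIZE || len == 0) {
        return 0;               /* write does not touch sector 0 */
    }

    /* Build what sector 0 would look like after the write. */
    memcpy(after, cur_sector0, SECTOR_SIZE);
    if (n > SECTOR_SIZE - offset) {
        n = SECTOR_SIZE - offset;
    }
    memcpy(after + offset, data, n);

    return looks_like_other_format(after, SECTOR_SIZE) ? -EPERM : 0;
}
```

The innocent-guest case above is the -EPERM branch: the guest sees an I/O error on a write it had every right to issue, and what happens next depends on the guest.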
 
 You somehow seem convinced that writing to sector 0 is a completely
 normal operation. For x86, it isn't, though.
 
 There are only a couple of programs which do that, I can only think
 of partitioning and setting up boot loaders. There's not a myriad of
 programs which would increase the probability of one both writing a
 recognizable pattern *and* not handling EPERM correctly.
 
 I see the probability of both happening at the same time as
 extremely low, not least because there are only a handful of
 programs which access that sector.


I'm not yet opposed to the restricted-raw method, but...

I think the above is a somewhat dangerous viewpoint to take with QEMU.
It is a bit of a slippery slope to start to assume what data guests
will write to the disks provided to them.  Even if the probability of
this happening is very low, with what usage we envision now, it is
still entirely legitimate usage for a guest to write data starting at
sector 0.

 Example: innocent guest first writes a recognized pattern, then
 overwrites it with a non-recognized pattern.
 
Now: works.
 
With Kevin's patch: as above.
 
 Not really, if the guest overwrites the data with a non-recognized
 pattern after EPERM it works as well. The difference here is that
 you won't have the intended data in the meantime.
 
 Again, I'm not claiming the differences are serious in practice, only
 that they exist.
 
 True.
 
 This is CVE-2008-2004.
 
 
 = Aside: other trust boundaries =
 
 Of course, this is not the only trust boundary that matters.  For
 instance, there's normally one between your host and somebody else's
 computers.  Telling QEMU to trust meta-data of some image you got from
 the internet violates it.  There's nothing QEMU can do about that.
 
 
 = Insecure usage is easy, secure usage is hard =
 
 The oldest stratum of user interfaces doesn't let you specify the image
 format.  Use

Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it

2014-11-06 Thread Max Reitz


On 2014-11-06 at 15:56, Jeff Cody wrote:

On Thu, Nov 06, 2014 at 01:53:35PM +0100, Max Reitz wrote:

On 2014-11-06 at 13:26, Markus Armbruster wrote:

Max Reitz mre...@redhat.com writes:


On 2014-11-04 at 19:45, Markus Armbruster wrote:

I'll try to explain all solutions fairly.  Isn't easy when you're as
biased towards one of them as I am.  Please bear with me.


= The trust boundary between image contents and meta-data =

A disk image consists of image contents and meta-data.

Example: all of a raw image's contents is image contents.  Leaves just
file name and attributes for meta-data.

Example: QCOW2 meta-data includes header, header extensions, L1 table,
L2 tables, ...  The meta-data defines where in the image the actual
contents is stored.

A guest can access the image contents, not the meta-data.

Image contents you've let an untrusted guest write is untrusted.

Therefore, there's a trust boundary between image contents and
meta-data.  QEMU has to trust image meta-data, but shouldn't trust image
contents.  The exact location of the trust boundary depends on the image
format.


= How we instruct QEMU what to trust =

By configuring QEMU to use an image, the user instructs QEMU to trust
the image's meta-data.

When the user's configuration specifies the image format explicitly, the
trust boundary is clear.

Else, the trust boundary is ambiguous when more than one format is
possible.

QEMU resolves this ambiguity by picking the first format with the
highest score.  Raw format is always possible, and always has the
lowest score.


= How this lets the guest escape isolation =

Unfortunately, this lets the guest shift the trust boundary and escape
isolation, as follows:

* Expose a raw image to the guest (whether you specify the format=raw or
let QEMU guess it doesn't matter).  The complete contents becomes
untrusted.

* Reuse the image *without* specifying the raw format.  QEMU guesses the
format based on untrusted image contents.  Now QEMU guesses a format
chosen by the guest, with meta-data chosen by the guest.  By
controlling image meta-data, the malicious guest can access arbitrary
files as QEMU, enlarge its storage, and more.  A non-malicious guest
can accidentally DoS itself, by writing a pattern probing recognizes.

Thank you for bringing that to my attention. This means that I'm even
more in favor of using Kevin's patches because in fact they don't
break anything.

They break things differently.  The difference may or may not matter.

Example: innocent guest writes a recognized pattern.

   Now: next restart fails, guest DoSed itself until host operator gets
   around to adding format=raw to the configuration.  Consequence:
   downtime (probably lengthy), but no data corruption.

   With Kevin's patch: write fails, guest may or may not handle the
   failure gracefully.  Consequences can range from guest complains to
   its logs (who cares) via guest stops whatever it's doing and refuses
   to continue until its hardware gets fixed (downtime as above) to
   data corruption.

You somehow seem convinced that writing to sector 0 is a completely
normal operation. For x86, it isn't, though.

There are only a couple of programs which do that, I can only think
of partitioning and setting up boot loaders. There's not a myriad of
programs which would increase the probability of one both writing a
recognizable pattern *and* not handling EPERM correctly.

I see the probability of both happening at the same time as
extremely low, not least because there are only a handful of
programs which access that sector.


I'm not yet opposed to the restricted-raw method, but...

I think the above is a somewhat dangerous viewpoint to take with QEMU.
It is a bit of a slippery slope to start to assume what data guests
will write to the disks provided to them.  Even if the probability of
this happening is very low, with what usage we envision now, it is
still entirely legitimate usage for a guest to write data starting at
sector 0.


Then let's officially deprecate format probing, if we haven't done so 
already. That way, there's no excuse.


What I'm saying is that there are obviously no compatibility issues. 
There is no guest software which did write recognizable patterns (so far 
nobody provided a counterexample), and since format probing is 
deprecated (or should be), you have no excuse for running future guests 
in qemu without having explicitly specified the format.


And if you are specifying the format, Kevin's patches will not prevent 
the guest from making its disk a qcow2 image whatsoever.


Max

Re: [Qemu-devel] [PATCH] ui/input: strictly check console in finding input handler

2014-11-06 Thread Amos Kong

On Thu, Nov 06, 2014 at 02:37:54PM +0800, Amos Kong wrote:
 On Wed, Nov 05, 2014 at 09:47:47AM +0100, Gerd Hoffmann wrote:
  On Mi, 2014-11-05 at 00:49 +0800, Amos Kong wrote:
   qemu_input_find_handler() prefers a handler associated with con.
   But if none exists, it takes any. This patch added a parameter
   to strictly check console, in case we want to input event to
   special console.
 
 If console is assigned, it will try to find right handler by first
 loop in qemu_input_find_handler(). The second loop is used to find
 mask matched handler if console isn't assigned.
 
 If we assigned console and didn't find handler in first loop, it
 skip second loop body by 'continue', and return NULL.
 It seems my concern is wrong, we don't need this repeated parameter.

I was wrong: if we don't assign the console for qemu_input_find_handler(),
it can pick an arbitrary console (any whose mask matches). So the
original issue this patch tries to fix truly exists.
 
 NACK this patch.
 
 Thanks.
  
   'input-send-event' has a parameter to assign special console,
   so we should enable strict checking in finding handler.
  
  I don't think we want to do that by default.  It only matters in case of a
  multiseat setup where you actually have multiple input devices of the
  same kind.  Which isn't a very typical use case.
  
  Options I see are:
  
(a) Turn console into an optional parameter, do strict checking in
case it is present.
(b) Add an optional 'strict' parameter.
 
 -- 
   Amos.

From Markus:
 
 Current behavior (please correct misunderstandings):
 
 The guest must be running.
 input-send-event parameter 'console' is mandatory.
 The console identified by its value must exist.
 If this console can accept the event, send it there.
 Else, a console that can accept the event must exist.  Send it to
 one of them.  Which one exactly isn't specified.
 
 Behavior with (a):
 
 The guest must be running.
 input-send-event parameter 'console' is optional.
 If it's present, the console identified by its value must exist, and
 must be able to accept the event.  Send it there.
 Else, a console that can accept the event must exist.  Send it to
 one of them.  Which one exactly isn't specified.
 
 "Must" means: or else the command fails.
 
 I think that's a clear improvement.  It's actually what I expected from
 the command documentation, until I read the code.

Thanks for the clear description. I now agree with Gerd's option (a);
it is better than (b). So we need to change QMP to support an optional
console parameter and change qemu_input_find_handler() to support
strict checking (as in my patch).
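
For illustration only, here is a minimal C sketch of the lookup under option (a). The struct and function names are made up for this example and are not QEMU's actual input API; the only thing taken from the discussion is the rule itself: a given console is matched strictly, and an omitted console means "any handler whose mask matches".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not QEMU's real types. */
struct console { int id; };

struct handler {
    uint32_t mask;              /* event kinds this handler accepts */
    const struct console *con;  /* bound console, or NULL if unbound */
};

/*
 * If `con` is given, only a handler bound to exactly that console matches
 * (strict checking).  If `con` is NULL, the first handler whose mask
 * matches is returned, whichever console it belongs to.
 */
static const struct handler *find_handler(const struct handler *list,
                                          size_t n, uint32_t mask,
                                          const struct console *con)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!(list[i].mask & mask)) {
            continue;               /* cannot accept this event kind */
        }
        if (con == NULL || list[i].con == con) {
            return &list[i];
        }
    }
    return NULL;    /* under option (a) the command would fail here */
}
```

With a console argument present, a miss returns NULL instead of silently falling back to some other console, which is the behaviour Markus describes for (a).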

Gerd, I'd like to work on both of them. If you are already working on it,
please let me know. Thanks.

-- 
Amos.



Re: [Qemu-devel] Image probing: how it can be insecure, and what we could do about it

2014-11-06 Thread Kevin Wolf

Am 06.11.2014 um 14:57 hat Markus Armbruster geschrieben:
 Kevin Wolf kw...@redhat.com writes:
 
  Am 04.11.2014 um 19:45 hat Markus Armbruster geschrieben:
  I'll try to explain all solutions fairly.  Isn't easy when you're as
  biased towards one of them as I am.  Please bear with me.
  
  
  = The trust boundary between image contents and meta-data =
  
  A disk image consists of image contents and meta-data.
  
  Example: all of a raw image's contents is image contents.  Leaves just
  file name and attributes for meta-data.
 
  Better: Leaves only protocol-specific metadata (e.g. file name and
  attributes for raw-posix).
 
 Can you give examples for other protocols?

Max already gave the example of NBD always implying raw.

I can also imagine that protocols that take URLs would use the file name
from the URL - which is not necessarily the end of the string, because
query options could follow.

Even though I don't think we have it today, it's also not entirely
unthinkable that some network protocol specifically made for VM images
(perhaps something like Sheepdog) could be storing the image format as
metadata on the server. Actually, I think that would make a whole lot of
sense for them.

  = Insecure usage is easy, secure usage is hard =
  
  The oldest stratum of user interfaces doesn't let you specify the image
  format.  Use of raw images with these is insecure by design.  These
  interfaces are still recommended for human users.
  
  Example of insecure usage: -hda foo.img, where foo.img is raw.
  
  With the next generation of interfaces, specifying the image format is
  optional.  Use of raw images with these is insecure by default.
  
  Example of insecure usage: -drive file=foo.img,index=0,media=cdrom,
  where foo.img is raw.  The -hda above is actually sugar for this.
  
  Equivalent secure usage: add format=raw.
  
  Note that specifying just the top image's format is not enough, you also
  have to specify any backing images' formats.  QCOW2 can optionally store
  the backing image format in the image.  The other COW formats can't.
  
  Example of insecure usage: -hda bar.vmdk, where bar.vmdk is a VMDK image
  with a raw backing file.
 
  Usually this is mitigated by the fact that backing files are read-only.
  Trouble starts when you use things like commit.
 
 Yes.
 
  Equivalent secure usage: Beats me.  Maybe there's a funky -drive
  backing.whatever to specify the backing image's format.
 
  Yes, you can override the backing file driver (backing.driver=raw should
  do the trick). Not really user-friendly, especially with long backing
  file chains, but it happens to be there.
 
  And of course, libvirt should be using it for non-qcow2 or qcow2 without
  the backing format header extension (but doesn't yet).
 
 I'm glad it's there.  Too bad libvirt doesn't use it, yet.  Supports my
 point that secure usage is too hard now.

I don't know whether it's related to being too hard or just too new. I
won't disagree when you say that it isn't obvious, but the libvirt
authors are experts and probably know better than the average command
line user what they should be doing ideally.

  I proposed something less radical, namely to keep guessing the image
  format, but base the guess on trusted meta-data only: file name and
  attributes.  Block and character special files are raw.  For other
  files, find the file name extension, and look up the format claiming it.
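
A rough C sketch of guessing from trusted meta-data only, as proposed above. The extension table and the function name are assumptions made up for this example (QEMU has no such table that I know of); the caller is assumed to have determined from stat() whether the file is a block or character special file. The image contents are never read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical extension table; the entries are illustrative only. */
static const struct { const char *ext; const char *format; } ext_map[] = {
    { ".qcow2", "qcow2" },
    { ".vmdk",  "vmdk"  },
    { ".img",   "raw"   },
    { ".raw",   "raw"   },
};

/*
 * Guess the format from the file type and file name alone.
 * Returns NULL when the meta-data provides no clue.
 */
static const char *guess_format(const char *filename, bool is_special_file)
{
    const char *base, *dot;

    assert(filename != NULL);
    if (is_special_file) {
        return "raw";           /* block and character devices are raw */
    }
    base = strrchr(filename, '/');
    base = base ? base + 1 : filename;
    dot = strrchr(base, '.');
    if (dot == NULL) {
        return NULL;            /* no extension, so no guess */
    }
    for (size_t i = 0; i < sizeof(ext_map) / sizeof(ext_map[0]); i++) {
        if (strcmp(dot, ext_map[i].ext) == 0) {
            return ext_map[i].format;
        }
    }
    return NULL;
}
```

Under these assumptions "test.qcow2" guesses qcow2, "disk.img" guesses raw, and "nbd://localhost" yields no guess at all.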
  
  PRO: Plugs the hole.
  
  CON: Breaks existing usage when the new guess differs from the old
  guess.  Common usage should be fine:
  
  * -hda test.qcow2
  
Fine as long as test.qcow2 is really QCOW2 (as it should!), and
either specifies a backing format (as it arguably should), or the
backing file name is sane.
  
  * -hda disk.img
  
Fine as long as disk.img is really a disk image (as it should).
 
  .img is not as clear, I've seen people using it for other formats. It's
  still a disk image, but not a raw one.
 
 Is this usage common?

More common than writing a qcow2 header to your boot sector. ;-)

But seriously, one of the problems in this discussion is that we don't
have any actual data for more exotic use cases. I can only say that I've
seen it before, even though that doesn't mean much.

If you want me to guess: Not really common, but probably one of the most
common corner cases from those that we've been discussing here.

  * -hda /dev/mapper/vg0-virtdisk
  
Fine as long as the logical volume is raw.
  
  Less common usage can break:
  
  * -hda nbd://localhost
  
Socket provides no clue, so no guess.
  
  Weird usage can conceivably break hard:
  
  * -hdd disk.img
  
Breaks hard when disk.img is actually QCOW2, the guest boots
anyway from another drive, then proceeds to overwrite this one.
  
  Mitigation: lengthy transition period where we warn this usage is
  insecure, and we'll eventually break it; here's a hint on secure

Re: [Qemu-devel] New emulator code base (qemu-android) and ranchu virtual board.

2014-11-06 Thread Alex Bennée


Christopher Covington c...@codeaurora.org writes:
 Hi,

 [snip--for full message see
 https://groups.google.com/d/msg/android-emulator-dev/dltBnUW_HzU/2tSZNLaVzmQJ]

 5) Relationship with upstream
 
 In an ideal world, we would not need a fork, and all code would live on
 the upstream QEMU git.
 
 In reality, things are different: there is little chance that the upstream 
 team would want to maintain 100K+ lines of code that are completely 
 specific to Android, and for good reason. That's why the refactoring effort
 is so important, we need to find a way to maintain the Android-specific
 QEMU patches as small as possible, and push as much stuff into the
 android-emulation library.

 I'm curious, have there been previous discussions with the QEMU maintainers
 that you can summarize or point me to?

Taking my Linaro hat off and certainly not speaking for Google or the
maintainers:

Whenever new functionality is added to QEMU we rely on having:

 * Someone to maintain and support the code
 * Interested users that can regularly test and report breakages

Without this, functionality can bitrot without being noticed and impose
an additional maintenance burden on the rest of the code-base. The
original authors may well have different priorities (e.g. shipping product!).
Of course not working directly upstream does impose additional costs
in the long term as you either take on the maintenance burden of
backporting fixes to your older tree or fixing conflicts on re-basing
with the upstream.

 Even with smaller changes, it's crucial to have a good set of tests, that 
 exercise these Android-specific features, added to the QEMU test suite, and
 clear documentation about the implementation being added. This may require
 a stub or minimal mock version of android-emulation.
 
 Finally, we may want to dedicate serious engineering resources to better 
 continuous integration of upstream QEMU that would also exercise the 
 Android bits.
 
 Until we reach such a situation, we will have to maintain a separate fork 
 and continue to rebase it on top of recent QEMU changes.

Having said all that, if you look at the current ranchu branch you'll see
the delta has come down a fair bit and is in itself relatively
self-contained. This should reduce the pain of regularly re-basing, and
I'm sure Google don't want to go through the major upheaval of moving from
the very old QEMU fork that the current emulator uses to a more modern QEMU
too often.

I'm hopeful we can get to a point where basic Android support is
upstreamable (and defended upstream) and the heavy Android-specific
stuff becomes a simple mechanical re-basing operation. Of the Android
changes (off the top of my head) we have:

* Machine descriptions (fairly self-contained)
* Simple event driver (again self-contained)
* android_pipe services (self-contained but replicates virt-io functionality)
* android console support (provides ADB specific interfaces to QEMU)
* Simple Android Frame Buffer
* OpenGL (out-of-tree, likely very android specific)

I suspect the first 2 or 3 could be up-streamed without too much trouble
but it would be interesting to know if having this basic android emulation 
in master would be of any interest to the wider community?


-- 
Alex Bennée

[Qemu-devel] [PATCH v3 3/3] linux-aio: remove 'node' from 'struct qemu_laiocb'

2014-11-06 Thread Ming Lei

No one uses the 'node' field any more, so remove it
from 'struct qemu_laiocb'; this saves 16 bytes
per struct on 64-bit architectures.

Signed-off-by: Ming Lei ming@canonical.com
---
 block/linux-aio.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index f5ca41d..b12da25 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -35,7 +35,6 @@ struct qemu_laiocb {
 size_t nbytes;
 QEMUIOVector *qiov;
 bool is_read;
-QLIST_ENTRY(qemu_laiocb) node;
 };
 
 /*
-- 
1.7.9.5

[Qemu-devel] [PATCH v3 0/3] linux-aio: fix batch submission

2014-11-06 Thread Ming Lei

The 1st patch fixes batch submission.

The 2nd one fixes -EAGAIN for non-batch case.

The 3rd one is a cleanup.

This patchset is split from the previous patchset (dataplane: optimization
and multi virtqueue support), as suggested by Stefan.

v3:
- rebase on QEMU master
v2:
- code style fix and commit log fix as suggested by Benoît Canet
v1:
- rebase on latest QEMU master

 block/linux-aio.c |  124 ++---
 1 file changed, 100 insertions(+), 24 deletions(-)

Thanks,

[Qemu-devel] [PATCH v3 1/3] linux-aio: fix submit aio as a batch

2014-11-06 Thread Ming Lei

In the enqueue path we can't complete a request, otherwise a
co-routine may be re-entered recursively. This patch fixes
the issue with the ideas below:

- for -EAGAIN or partial completion, retry the submission by
scheduling a BH in the following completion cb
- for partial completion, also update the io queue
- for other failures, return the failure if in the enqueue path,
otherwise abort all queued I/O
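
The "update the io queue" step can be sketched on its own in C. This is a simplified model, not the patch's code: plain ints stand in for the struct iocb pointers, and the names io_queue and queue_consume are invented for the example.

```c
#include <assert.h>

#define QUEUE_MAX 8   /* arbitrary size for the sketch */

struct io_queue {
    int reqs[QUEUE_MAX];  /* stand-in for struct iocb pointers */
    unsigned int len;     /* number of queued requests */
};

/*
 * Drop the first `submitted` requests, which the kernel accepted, and
 * move the remainder to the front so a later retry can resubmit them.
 */
static void queue_consume(struct io_queue *q, unsigned int submitted)
{
    unsigned int i, j = 0;

    assert(submitted <= q->len);
    for (i = submitted; i < q->len; i++) {
        q->reqs[j++] = q->reqs[i];
    }
    q->len -= submitted;
}
```

If five requests are queued and the submit call accepts only two, the remaining three end up at indices 0..2 and the queue length becomes 3, ready for the retry scheduled from the completion callback.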

Signed-off-by: Ming Lei ming@canonical.com
---
 block/linux-aio.c |  101 +
 1 file changed, 79 insertions(+), 22 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index d92513b..f66e8ad 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -38,11 +38,19 @@ struct qemu_laiocb {
 QLIST_ENTRY(qemu_laiocb) node;
 };
 
+/*
+ * TODO: support to batch I/O from multiple bs in one same
+ * AIO context, one important use case is multi-lun scsi,
+ * so in future the IO queue should be per AIO context.
+ */
 typedef struct {
 struct iocb *iocbs[MAX_QUEUED_IO];
 int plugged;
 unsigned int size;
 unsigned int idx;
+
+/* handle -EAGAIN and partial completion */
+QEMUBH *retry;
 } LaioQueue;
 
 struct qemu_laio_state {
@@ -137,6 +145,12 @@ static void qemu_laio_completion_bh(void *opaque)
 }
 }
 
+static void qemu_laio_start_retry(struct qemu_laio_state *s)
+{
+if (s->io_q.idx)
+qemu_bh_schedule(s->io_q.retry);
+}
+
 static void qemu_laio_completion_cb(EventNotifier *e)
 {
 struct qemu_laio_state *s = container_of(e, struct qemu_laio_state, e);
@@ -144,6 +158,7 @@ static void qemu_laio_completion_cb(EventNotifier *e)
 if (event_notifier_test_and_clear(&s->e)) {
 qemu_bh_schedule(s->completion_bh);
 }
+qemu_laio_start_retry(s);
 }
 
 static void laio_cancel(BlockAIOCB *blockacb)
@@ -163,6 +178,9 @@ static void laio_cancel(BlockAIOCB *blockacb)
 }
 
 laiocb->common.cb(laiocb->common.opaque, laiocb->ret);
+
+/* check if there are requests in io queue */
+qemu_laio_start_retry(laiocb->ctx);
 }
 
 static const AIOCBInfo laio_aiocb_info = {
@@ -177,45 +195,80 @@ static void ioq_init(LaioQueue *io_q)
 io_q->plugged = 0;
 }
 
-static int ioq_submit(struct qemu_laio_state *s)
+static void abort_queue(struct qemu_laio_state *s)
+{
+int i;
+for (i = 0; i < s->io_q.idx; i++) {
+struct qemu_laiocb *laiocb = container_of(s->io_q.iocbs[i],
+  struct qemu_laiocb,
+  iocb);
+laiocb->ret = -EIO;
+qemu_laio_process_completion(s, laiocb);
+}
+}
+
+static int ioq_submit(struct qemu_laio_state *s, bool enqueue)
 {
 int ret, i = 0;
 int len = s->io_q.idx;
+int j = 0;
 
-do {
-ret = io_submit(s->ctx, len, s->io_q.iocbs);
-} while (i++ < 3 && ret == -EAGAIN);
+if (!len) {
+return 0;
+}
 
-/* empty io queue */
-s->io_q.idx = 0;
+ret = io_submit(s->ctx, len, s->io_q.iocbs);
+if (ret == -EAGAIN) { /* retry in following completion cb */
+return 0;
+} else if (ret < 0) {
+if (enqueue) {
+return ret;
+}
 
-if (ret < 0) {
-i = 0;
-} else {
-i = ret;
+/* in non-queue path, all IOs have to be completed */
+abort_queue(s);
+ret = len;
+} else if (ret == 0) {
+goto out;
 }
 
-for (; i < len; i++) {
-struct qemu_laiocb *laiocb =
-container_of(s->io_q.iocbs[i], struct qemu_laiocb, iocb);
-
-laiocb->ret = (ret < 0) ? ret : -EIO;
-qemu_laio_process_completion(s, laiocb);
+for (i = ret; i < len; i++) {
+s->io_q.iocbs[j++] = s->io_q.iocbs[i];
 }
+
+ out:
+/*
+ * update io queue, for partial completion, retry will be
+ * started automatically in following completion cb.
+ */
+s->io_q.idx -= ret;
+
 return ret;
 }
 
-static void ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
+static void ioq_submit_retry(void *opaque)
+{
+struct qemu_laio_state *s = opaque;
+ioq_submit(s, false);
+}
+
+static int ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
 {
 unsigned int idx = s->io_q.idx;
 
+if (unlikely(idx == s->io_q.size)) {
+return -1;
+}
+
 s->io_q.iocbs[idx++] = iocb;
 s->io_q.idx = idx;
 
-/* submit immediately if queue is full */
-if (idx == s->io_q.size) {
-ioq_submit(s);
+/* submit immediately if queue depth is above 2/3 */
+if (idx > s->io_q.size * 2 / 3) {
+return ioq_submit(s, true);
 }
+
+return 0;
 }
 
 void laio_io_plug(BlockDriverState *bs, void *aio_ctx)
@@ -237,7 +290,7 @@ int laio_io_unplug(BlockDriverState *bs, void *aio_ctx, bool unplug)
 }
 
 if (s->io_q.idx > 0) {
-ret = ioq_submit(s);
+ret = ioq_submit(s, false);
 }
 
 return ret;
@@ -281,7 +334,9 @@ BlockAIOCB *laio_submit(BlockDriverState *bs, void

[Qemu-devel] [PATCH v3 2/3] linux-aio: handling -EAGAIN for !s->io_q.plugged case

2014-11-06 Thread Ming Lei

Previously -EAGAIN was simply ignored in the !s->io_q.plugged case,
which can easily cause -EIO in the VM, for example with an NVMe device.

This patch handles -EAGAIN via the io queue in the !s->io_q.plugged case;
the request will be retried in the following aio completion cb.
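
As a rough illustration of that idea, and not the patch itself, the C sketch below parks a request when the submit hook reports -EAGAIN, and keeps parking later requests while anything is still pending. The types, the hook signature and the stub hooks are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PENDING_MAX 4   /* arbitrary depth for the sketch */

struct pending {
    int reqs[PENDING_MAX];  /* stand-in for queued iocb pointers */
    unsigned int len;
};

/* Hypothetical submit hook: >= 0 on success, negative errno on failure. */
typedef int (*submit_fn)(int req);

/* Stub hooks used only to exercise the sketch. */
static int submit_ok(int req)   { (void)req; return 1; }
static int submit_busy(int req) { (void)req; return -EAGAIN; }
static int submit_fail(int req) { (void)req; return -EIO; }

/*
 * Returns 0 if the request was submitted or parked for a later retry,
 * and a negative value on a hard failure or when the queue is full.
 */
static int submit_or_park(struct pending *p, int req, submit_fn submit)
{
    int ret;

    assert(p != NULL && submit != NULL);

    /* Once something is parked, park new requests behind it. */
    ret = p->len ? -EAGAIN : submit(req);
    if (ret == -EAGAIN) {
        if (p->len == PENDING_MAX) {
            return -1;              /* queue full */
        }
        p->reqs[p->len++] = req;    /* retried on a later completion */
        return 0;
    }
    return ret < 0 ? ret : 0;
}
```

The point of the design is that -EAGAIN stops being an error visible to the guest: it only delays the request until the next completion drains the parked entries.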

Suggested-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Ming Lei ming@canonical.com
---
 block/linux-aio.c |   22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index f66e8ad..f5ca41d 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -263,6 +263,11 @@ static int ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
 s->io_q.iocbs[idx++] = iocb;
 s->io_q.idx = idx;
 
+/* don't submit until next completion for -EAGAIN of non plug case */
+if (unlikely(!s->io_q.plugged)) {
+return 0;
+}
+
 /* submit immediately if queue depth is above 2/3 */
 if (idx > s->io_q.size * 2 / 3) {
 return ioq_submit(s, true);
@@ -330,10 +335,25 @@ BlockAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
 io_set_eventfd(&laiocb->iocb, event_notifier_get_fd(&s->e));
 
 if (!s->io_q.plugged) {
-if (io_submit(s->ctx, 1, &iocbs) < 0) {
+int ret;
+
+if (!s->io_q.idx) {
+ret = io_submit(s->ctx, 1, &iocbs);
+} else {
+ret = -EAGAIN;
+}
+/*
+ * Switch to queue mode until -EAGAIN is handled, we suppose
+ * there is always uncompleted I/O, so try to enqueue it first,
+ * and will be submitted again in following aio completion cb.
+ */
+if (ret == -EAGAIN) {
+goto enqueue;
+} else if (ret < 0) {
 goto out_free_aiocb;
 }
 } else {
+ enqueue:
 if (ioq_enqueue(s, iocbs) < 0) {
 goto out_free_aiocb;
 }
-- 
1.7.9.5

Re: [Qemu-devel] Adding SMP support for Sparc Target

2014-11-06 Thread Artyom Tarasenko

Hello Damien,

On Thu, Nov 6, 2014 at 8:38 AM, Damien Hilloulin
damien.hillou...@epfl.ch wrote:
 Hello everyone,

 I'm a newcomer in QEMU and my goal would be to port an existing system
 simulator using another emulator to QEMU.
 Some work has already been done, and Sparc has been the main target so far
 because of its simplicity (and because we have a very good support for Sparc
 with the other emulator).
 QEMU is great, open-source (contrary to the other emulator we have been
 using in the past), and that's why we are aiming at using it.

 However, it seems that the Sparc targets doesn't really support SMP/CMT as
 of now. So I am considering two possibilities:
 - adding SMP support in QEMU for the Sparc targets (and contribute it to
 QEMU :) )

Do you mean a) emulating multiple guest cores in a single host
thread, or b) emulating multiple guest cores in multiple host threads?

The former (a) should be relatively easy for a sun4m platform: you just have
to put the CPUs at the proper place on the system bus and fill the CPU
Module Ids (MIDs) with the proper data.

It would bring no performance increase though. In fact the guest OS
would likely run slower, because the speed of an emulated CPU would
decrease like 1/N, and utilization of multiple CPUs by a guest OS
probably scales like ~ log N, where N is the number of CPUs emulated.

If you mean b), things get more complicated because TCG can currently
utilize just one host thread. There was an attempt to utilize
multiple threads for an ARM target:
http://sourceforge.net/p/coremu/home/Home

It would be interesting to hear what the TCG experts would say. Adding
Richard to CC.

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

Re: [Qemu-devel] [PATCH 2/4] Qemu-Xen-vTPM: Register Xen stubdom vTPM frontend driver

2014-11-06 Thread Xu, Quan

 -Original Message-
 From: Stefano Stabellini [mailto:stefano.stabell...@eu.citrix.com]
 Sent: Thursday, November 06, 2014 11:42 PM
 To: Xu, Quan
 Cc: Stefano Stabellini; qemu-devel@nongnu.org; xen-de...@lists.xen.org
 Subject: RE: [PATCH 2/4] Qemu-Xen-vTPM: Register Xen stubdom vTPM
 frontend driver

 On Thu, 6 Nov 2014, Xu, Quan wrote:
   -Original Message-
   From: Stefano Stabellini [mailto:stefano.stabell...@eu.citrix.com]
   Sent: Monday, November 03, 2014 7:54 PM
   To: Xu, Quan
   Cc: qemu-devel@nongnu.org; xen-de...@lists.xen.org;
   stefano.stabell...@eu.citrix.com
   Subject: Re: [PATCH 2/4] Qemu-Xen-vTPM: Register Xen stubdom vTPM
   frontend driver

   On Sun, 2 Nov 2014, Quan Xu wrote:
This driver transfers requests/responses between the TPM xenstubdoms
driver and the Xen vTPM stubdom, and facilitates communication
between the Xen vTPM stubdom domain and the vTPM xenstubdoms driver

Signed-off-by: Quan Xu quan...@intel.com

   Please describe what changes you made to xen_backend.c and why.
   The commit message should contain info on all the changes made by
   the patch below.

  The following is code process when Qemu is running with Xen.
  ##code process
  [...]
   xen_hvm_init()
  --xen_be_register()
  --xenstore_scan()
  --xen_be_check_state()

  --xen_vtpm_register()

  ideally, I can register 'vtpm' via xen_vtpm_register() as

  + xen_be_register("console", &xen_console_ops);
  + xen_be_register("vkbd", &xen_kbdmouse_ops);
  + xen_be_register("qdisk", &xen_blkdev_ops);

  but there are 2 reasons why I add xen_vtpm_register(), instead of
  xen_be_register().

  1. The backend of the TPM is running in a Xen stubDom, not Domain 0.
  Some functions do not work, for example 'setup watch' and 'look
  for backend' in xenstore_scan().

  2. There is a thread running in the Xen stubDom [event_thread()]; it
  handles backend status when the frontend is initialized. It is not
  compatible with xen_be_check_state(), which always
  tries to modify the status of the backend.

  There is always a tradeoff: if I force this case into
  xen_be_register(), there may be a lot of 'if ... else ...'. It would
  break the code architecture. Also, I should leverage the existing source
  code with minimum modification. I add a 'DEVOPS_FLAG_STUBDOM_BE' flag in
  include/hw/xen/xen_backend.h to indicate that the device backend is a Xen
  stubDom.

 Given that xen_vtpm_register is actually registering a frontend, not a
 backend, you cannot use xen_be_register for it.

 However, instead of introducing xen_vtpm_register, I think you should
 add a generic xen_fe_register function that handles any Xen PV frontend
 registration. It should also be able to handle backends not in Dom0. Then you
 can call:

 xen_fe_register(console, xen_vtpm_ops);

  xen_fe_register(vtpm, xen_vtpm_ops); ?

A good solution. I will try to add a generic xen_fe_register function that
handles any Xen PV frontend in v2.

[Qemu-devel] [PATCH v10 01/26] target-arm: extend async excp masking

2014-11-06 Thread Greg Bellows

This patch extends arm_excp_unmasked() to use lookup tables for determining
whether IRQ and FIQ exceptions are masked.  The lookup tables are based on the
ARMv8 and ARMv7 specification physical interrupt masking tables.

If EL3 is using AArch64, IRQ/FIQ masking is ignored in all exception levels
other than EL3 if SCR.{FIQ|IRQ} is set to 1 (routed to EL3).
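
As a reading aid, the final decision in the diff below can be restated as a standalone C function. This is a sketch that assumes the comparison and logical operators lost in this archive copy read as `>`, `&&` and `||`; the function name and the plain-value parameters are made up, with scr and hcr standing for the per-exception routing bits the patch computes before this point.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * scr, hcr        - routing override bits, already computed per exception
 * pstate_unmasked - true when the CPSR/PSTATE mask bit is clear
 */
static bool excp_unmasked(unsigned cur_el, unsigned target_el,
                          bool el3_is_aa64, bool secure,
                          bool scr, bool hcr, bool pstate_unmasked)
{
    bool ignore_mask;

    assert(cur_el <= 3 && target_el <= 3);

    /* Exceptions targeting a lower EL are not taken; they stay pending. */
    if (cur_el > target_el) {
        return false;
    }

    /* Routing to a higher EL (other than EL1) can override the mask bit. */
    ignore_mask = target_el > cur_el && target_el != 1 &&
                  (scr || hcr) && (el3_is_aa64 || !secure);

    return ignore_mask || pstate_unmasked;
}
```

For example, with an AArch64 EL3, an FIQ routed from non-secure EL0 to EL3 is reported as unmasked even when the PSTATE bit is set, while the same call with cur_el equal to target_el falls back to the PSTATE bit.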

Signed-off-by: Greg Bellows greg.bell...@linaro.org

---

v8 - v9
- Undo the use of tables for exception masking and instead go with simplified
  logic based on the target EL lookup.
- Remove the masking tables

v7 - v8
- Add IRQ and FIQ exception masking lookup tables.
- Rewrite patch to use lookup tables for determining whether an exception is
  masked or not.

v5 - v6
- Globally change Aarch# to AArch#
- Fixed comment termination

v4 - v5
- Merge with v4 patch 10
---
 target-arm/cpu.h | 66 
 1 file changed, 52 insertions(+), 14 deletions(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 7f80090..cf30b2a 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -1247,27 +1247,51 @@ static inline bool arm_excp_unmasked(CPUState *cs, unsigned int excp_idx)
 CPUARMState *env = cs->env_ptr;
 unsigned int cur_el = arm_current_el(env);
 unsigned int target_el = arm_excp_target_el(cs, excp_idx);
-/* FIXME: Use actual secure state.  */
-bool secure = false;
-/* If in EL1/0, Physical IRQ routing to EL2 only happens from NS state.  */
-bool irq_can_hyp = !secure && cur_el < 2 && target_el == 2;
-
-/* Don't take exceptions if they target a lower EL.  */
+bool secure = arm_is_secure(env);
+uint32_t scr;
+uint32_t hcr;
+bool pstate_unmasked;
+int8_t unmasked = 0;
+bool is_aa64 = arm_el_is_aa64(env, 3);
+
+/* Don't take exceptions if they target a lower EL.
+ * This check should catch any exceptions that would not be taken but left
+ * pending.
+ */
 if (cur_el > target_el) {
 return false;
 }
 
 switch (excp_idx) {
 case EXCP_FIQ:
-if (irq_can_hyp && (env->cp15.hcr_el2 & HCR_FMO)) {
-return true;
-}
-return !(env->daif & PSTATE_F);
+/* If FIQs are routed to EL3 or EL2 then there are cases where we
+ * override the CPSR.F in determining if the exception is masked or
+ * not.  If neither of these are set then we fall back to the CPSR.F
+ * setting otherwise we further assess the state below.
+ */
+hcr = (env->cp15.hcr_el2 & HCR_FMO);
+scr = (env->cp15.scr_el3 & SCR_FIQ);
+
+/* When EL3 is 32-bit, the SCR.FW bit controls whether the CPSR.F bit
+ * masks FIQ interrupts when taken in non-secure state.  If SCR.FW is
+ * set then FIQs can be masked by CPSR.F when non-secure but only
+ * when FIQs are only routed to EL3.
+ */
+scr = is_aa64 || !((env->cp15.scr_el3 & SCR_FW) && !hcr);
+pstate_unmasked = !(env->daif & PSTATE_F);
+break;
+
 case EXCP_IRQ:
-if (irq_can_hyp && (env->cp15.hcr_el2 & HCR_IMO)) {
-return true;
-}
-return !(env->daif & PSTATE_I);
+/* When EL3 execution state is 32-bit, if HCR.IMO is set then we may
+ * override the CPSR.I masking when in non-secure state.  The SCR.IRQ
+ * setting has already been taken into consideration when setting the
+ * target EL, so it does not have a further affect here.
+ */
+hcr = is_aa64 || (env->cp15.hcr_el2 & HCR_IMO);
+scr = false;
+pstate_unmasked = !(env->daif & PSTATE_I);
+break;
+
 case EXCP_VFIQ:
 if (secure || !(env->cp15.hcr_el2 & HCR_FMO)) {
 /* VFIQs are only taken when hypervized and non-secure.  */
@@ -1283,6 +1307,20 @@ static inline bool arm_excp_unmasked(CPUState *cs, unsigned int excp_idx)
 default:
 g_assert_not_reached();
 }
+
+/* Use the target EL, current execution state and SCR/HCR settings to
+ * determine whether the corresponding CPSR bit is used to mask the
+ * interrupt.
+ */
+if ((target_el > cur_el) && (target_el != 1) && (scr || hcr) &&
+(is_aa64 || !secure)) {
+unmasked = 1;
+}
+
+/* The PSTATE bits only mask the interrupt if we have not overriden the
+ * ability above.
+ */
+return unmasked || pstate_unmasked;
 }
 
 static inline CPUARMState *cpu_init(const char *cpu_model)
-- 
1.8.3.2

[Qemu-devel] [PATCH v10 02/26] target-arm: add async excp target_el function

2014-11-06 Thread Greg Bellows

Adds a dedicated function and a lookup table for determining the target
exception level of IRQ and FIQ exceptions.  The lookup table is taken from the
ARMv7 and ARMv8 specification exception routing tables.
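
To show how such a table is meant to be consulted, here is a self-contained C sketch. The values are copied from the table in the diff below (with the leading braces this archive copy lost restored); the lookup helper and its name are assumptions, with the index order taken from the "Dimensions" comment in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the patch; -1 means "cannot occur". */
static const int8_t target_el_table[2][2][2][2][2][4] = {
    {{{{/* 0   0   0   0 */{ 1,  1,  2, -1 },{ 3, -1, -1,  3 },},
       {/* 0   0   0   1 */{ 2,  2,  2, -1 },{ 3, -1, -1,  3 },},},
      {{/* 0   0   1   0 */{ 1,  1,  2, -1 },{ 3, -1, -1,  3 },},
       {/* 0   0   1   1 */{ 2,  2,  2, -1 },{ 3, -1, -1,  3 },},},},
     {{{/* 0   1   0   0 */{ 3,  3,  3, -1 },{ 3, -1, -1,  3 },},
       {/* 0   1   0   1 */{ 3,  3,  3, -1 },{ 3, -1, -1,  3 },},},
      {{/* 0   1   1   0 */{ 3,  3,  3, -1 },{ 3, -1, -1,  3 },},
       {/* 0   1   1   1 */{ 3,  3,  3, -1 },{ 3, -1, -1,  3 },},},},},
    {{{{/* 1   0   0   0 */{ 1,  1,  2, -1 },{ 1,  1, -1,  1 },},
       {/* 1   0   0   1 */{ 2,  2,  2, -1 },{ 1,  1, -1,  1 },},},
      {{/* 1   0   1   0 */{ 1,  1,  1, -1 },{ 1,  1, -1,  1 },},
       {/* 1   0   1   1 */{ 2,  2,  2, -1 },{ 1,  1, -1,  1 },},},},
     {{{/* 1   1   0   0 */{ 3,  3,  3, -1 },{ 3,  3, -1,  3 },},
       {/* 1   1   0   1 */{ 3,  3,  3, -1 },{ 3,  3, -1,  3 },},},
      {{/* 1   1   1   0 */{ 3,  3,  3, -1 },{ 3,  3, -1,  3 },},
       {/* 1   1   1   1 */{ 3,  3,  3, -1 },{ 3,  3, -1,  3 },},},},},
};

/* Hypothetical helper: index order follows the "Dimensions" comment. */
static int lookup_target_el(bool el3_is_aa64, bool scr, bool rw, bool hcr,
                            bool secure, unsigned cur_el)
{
    assert(cur_el < 4);
    return target_el_table[el3_is_aa64][scr][rw][hcr][secure][cur_el];
}
```

A caller would then compare the result against the current EL, as the patch comment notes, so that the EL1 entries standing in for "not taken" cases do not cause an exception to be delivered.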

Signed-off-by: Greg Bellows greg.bell...@linaro.org
Reviewed-by: Peter Maydell peter.mayd...@linaro.org

---

v8 - v9
- Fixed target_el_table in correct 32-bit secure values
- Expanded comment on target_el_table untaken exception handling
- Fixed minor issues

v7 - v8
- Added target EL lookup table
- Rework arm_phys_excp_target_el to use an EL lookup table rather than
  conditionals.

v5 - v6
- Removed unneeded arm_phys_excp_target_el() function prototype.
- Removed unneeded arm_phys_excp_target_el() USER_ONLY function.
- Fixed up arm_phys_excp_target_el() function definition to be static.
- Globally replace Aarch# with AArch#

v4 - v5
- Simplify target EL function including removal of mode which was unused
- Merged with patch that plugs in the use of the function

v3 - v4
- Fixed arm_phys_excp_target_el() 0/0/0 case to return excp_mode when EL2
  rather than ABORT.
---
 target-arm/helper.c | 116 +++-
 1 file changed, 97 insertions(+), 19 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index c47487a..a48ebae 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -3761,6 +3761,101 @@ void switch_mode(CPUARMState *env, int mode)
 env->spsr = env->banked_spsr[i];
 }
 
+/* Physical Interrupt Target EL Lookup Table
+ *
+ * [ From ARM ARM section G1.13.4 (Table G1-15) ]
+ *
+ * The below multi-dimensional table is used for looking up the target
+ * exception level given numerous condition criteria.  Specifically, the
+ * target EL is based on SCR and HCR routing controls as well as the
+ * currently executing EL and secure state.
+ *
+ *Dimensions:
+ *target_el_table[2][2][2][2][2][4]
+ *|  |  |  |  |  +--- Current EL
+ *|  |  |  |  +-- Non-secure(0)/Secure(1)
+ *|  |  |  +- HCR mask override
+ *|  |  + SCR exec state control
+ *|  +--- SCR mask override
+ *+-- 32-bit(0)/64-bit(1) EL3
+ *
+ *The table values are as such:
+ *0-3 = EL0-EL3
+ * -1 = Cannot occur
+ *
+ * The ARM ARM target EL table includes entries indicating that an exception
+ * is not taken.  The two cases where this is applicable are:
+ *1) An exception is taken from EL3 but the SCR does not have the exception
+ *routed to EL3.
+ *2) An exception is taken from EL2 but the HCR does not have the exception
+ *routed to EL2.
+ * In these two cases, the below table contain a target of EL1.  This value is
+ * returned as it is expected that the consumer of the table data will check
+ * for target EL = current EL to ensure the exception is not taken.
+ *
+ *SCR HCR
+ * 64  EA AMO From
+ *BIT IRQ IMO  Non-secure Secure
+ *EL3 FIQ  RW FMO   EL0 EL1 EL2 EL3   EL0 EL1 EL2 EL3
+ */
+const int8_t target_el_table[2][2][2][2][2][4] = {
+{{{{/* 0   0   0   0 */{ 1,  1,  2, -1 },{ 3, -1, -1,  3 },},
+   {/* 0   0   0   1 */{ 2,  2,  2, -1 },{ 3, -1, -1,  3 },},},
+  {{/* 0   0   1   0 */{ 1,  1,  2, -1 },{ 3, -1, -1,  3 },},
+   {/* 0   0   1   1 */{ 2,  2,  2, -1 },{ 3, -1, -1,  3 },},},},
+ {{{/* 0   1   0   0 */{ 3,  3,  3, -1 },{ 3, -1, -1,  3 },},
+   {/* 0   1   0   1 */{ 3,  3,  3, -1 },{ 3, -1, -1,  3 },},},
+  {{/* 0   1   1   0 */{ 3,  3,  3, -1 },{ 3, -1, -1,  3 },},
+   {/* 0   1   1   1 */{ 3,  3,  3, -1 },{ 3, -1, -1,  3 },},},},},
+{{{{/* 1   0   0   0 */{ 1,  1,  2, -1 },{ 1,  1, -1,  1 },},
+   {/* 1   0   0   1 */{ 2,  2,  2, -1 },{ 1,  1, -1,  1 },},},
+  {{/* 1   0   1   0 */{ 1,  1,  1, -1 },{ 1,  1, -1,  1 },},
+   {/* 1   0   1   1 */{ 2,  2,  2, -1 },{ 1,  1, -1,  1 },},},},
+ {{{/* 1   1   0   0 */{ 3,  3,  3, -1 },{ 3,  3, -1,  3 },},
+   {/* 1   1   0   1 */{ 3,  3,  3, -1 },{ 3,  3, -1,  3 },},},
+  {{/* 1   1   1   0 */{ 3,  3,  3, -1 },{ 3,  3, -1,  3 },},
+   {/* 1   1   1   1 */{ 3,  3,  3, -1 },{ 3,  3, -1,  3 },},},},},
+};
+
+/*
+ * Determine the target EL for physical exceptions
+ */
+static inline uint32_t arm_phys_excp_target_el(CPUState *cs, uint32_t excp_idx,
+uint32_t cur_el, bool secure)
+{
+CPUARMState *env = cs->env_ptr;
+int rw = ((env->cp15.scr_el3 & SCR_RW) == SCR_RW);
+int scr;
+int hcr;
+int target_el;
+int is64 = arm_el_is_aa64(env, 3);
+
+switch (excp_idx) {
+case EXCP_IRQ:
+scr = ((env->cp15.scr_el3 & SCR_IRQ) == SCR_IRQ);
+hcr = ((env->cp15.hcr_el2 & HCR_IMO) == HCR_IMO);
+break;
+case EXCP_FIQ:
+scr = ((env->cp15.scr_el3 & SCR_FIQ) == SCR_FIQ);
+hcr = ((env->cp15.hcr_el2 & HCR_FMO)

[Qemu-devel] [PATCH v10 00/26] target-arm: add Security Extensions for CPUs

2014-11-06 Thread Greg Bellows

Version 10 of the ARM processor security extension (TrustZone) support.  This
patchset includes changes to support the processor security extensions
on ARMv7 aarch32 with hooks for later enabling v8 aarch64/32.

This is a rebase of v9 to a more recent master as well as a fix for an
overlooked bug in patch 12 that broke AA64.

Fabian Aggeler (19):
  target-arm: add banked register accessors
  target-arm: add CPREG secure state support
  target-arm: insert AArch32 cpregs twice into hashtable
  target-arm: move AArch32 SCR into security reglist
  target-arm: implement IRQ/FIQ routing to Monitor mode
  target-arm: add NSACR register
  target-arm: add MVBAR support
  target-arm: add SCTLR_EL3 and make SCTLR banked
  target-arm: respect SCR.FW, SCR.AW and SCTLR.NMFI
  target-arm: make CSSELR banked
  target-arm: make TTBR0/1 banked
  target-arm: make TTBCR banked
  target-arm: make DACR banked
  target-arm: make IFSR banked
  target-arm: make DFSR banked
  target-arm: make IFAR/DFAR banked
  target-arm: make PAR banked
  target-arm: make c13 cp regs banked (FCSEIDR, ...)
  target-arm: add cpu feature EL3 to CPUs with Security Extensions

Greg Bellows (6):
  target-arm: extend async excp masking
  target-arm: add async excp target_el function
  target-arm: add secure state bit to CPREG hash
  target-arm: add SDER definition
  target-arm: make VBAR banked
  target-arm: make MAIR0/1 banked

Sergey Fedorov (1):
  target-arm: add non-secure Translation Block flag

 hw/arm/pxa2xx.c |   6 +-
 linux-user/aarch64/target_cpu.h |   2 +-
 linux-user/arm/target_cpu.h |   2 +-
 linux-user/main.c   |   2 +-
 target-arm/cpu.c|  14 +-
 target-arm/cpu.h| 364 ++---
 target-arm/helper.c | 682 ++--
 target-arm/internals.h  |   6 +-
 target-arm/op_helper.c  |   4 +-
 target-arm/translate.c  |  15 +-
 target-arm/translate.h  |   1 +
 11 files changed, 868 insertions(+), 230 deletions(-)

--
1.8.3.2

[Qemu-devel] [PATCH v10 17/26] target-arm: make TTBCR banked

2014-11-06 Thread Greg Bellows

From: Fabian Aggeler aggel...@ethz.ch

Adds secure and non-secure bank register support for TTBCR.
Added new struct to compartmentalize the TCR data and masks.  Removed old
tcr/ttbcr data and added a 4 element array of the new structs in cp15.  This
allows for one entry per EL.  Added a CP register definition for TCR_EL3.
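
As a rough illustration of the per-bank bookkeeping described above, the
following standalone sketch mirrors the shape of the new struct and the mask
computation that the patch performs on a raw TTBCR write and on reset.  The
names demo_tcr, demo_tcr_write and demo_tcr_reset are hypothetical and exist
only for this example; the real code operates on CPUARMState.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the TCR struct added to target-arm/cpu.h. */
typedef struct {
    uint64_t raw_tcr;   /* last value written to TTBCR/TCR */
    uint32_t mask;      /* TTBR selection mask (short descriptors only) */
    uint32_t base_mask; /* TTBR0 base mask (short descriptors only) */
} demo_tcr;

/* Update one bank only, the way vmsa_ttbcr_raw_write() updates the
 * bank that the accessed register definition points at.
 */
static void demo_tcr_write(demo_tcr *tcr, uint64_t value)
{
    unsigned maskshift = (unsigned)(value & 0x7); /* TTBCR.N, bits [2:0] */

    assert(tcr != 0);
    tcr->raw_tcr = value;
    tcr->mask = ~(UINT32_C(0xffffffff) >> maskshift);
    tcr->base_mask = ~(UINT32_C(0x3fff) >> maskshift);
}

/* Reset one bank to the values used by vmsa_ttbcr_reset(). */
static void demo_tcr_reset(demo_tcr *tcr)
{
    assert(tcr != 0);
    tcr->raw_tcr = 0;
    tcr->mask = 0;
    tcr->base_mask = UINT32_C(0xffffc000);
}
```

With an array of four such structs (one per EL, as in tcr_el[4]), a write to
one entry leaves the masks of the other entries untouched.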

Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org

---

v8 - v9
- Removed get_phys_addr_lpae() AArch64 EL3 support.
- Replaced calls to ARM_CP_SECSTATE_TEST with direct access
- Reorganized the TCR data into a common TCR struct.
- Added raw_ptr() utility function for acquiring the TCR pointer from a reginfo
- Replaced accesses to tcr to use the new struct and utility function.
- Removed uses of A32_BANKED_CURRENT_REG_GET/SET with tcr

v5 - v6
- Changed _el field variants to be array based
- Switch to use distinct CPREG secure flags

v4 - v5
- Changed c2_mask updates to use the TTBCR cpreg bank flag for selecting the
  secure bank instead of the A32_BANKED_CURRENT macro.  This more accurately
  chooses the correct bank matching that of the TTBCR being accessed.
---
 target-arm/cpu.h   | 11 +---
 target-arm/helper.c| 72 --
 target-arm/internals.h |  6 ++---
 3 files changed, 58 insertions(+), 31 deletions(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 772d4e5..69a2079 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -120,6 +120,12 @@ typedef struct ARMGenericTimer {
 #define GTIMER_VIRT 1
 #define NUM_GTIMERS 2
 
+typedef struct {
+uint64_t raw_tcr;
+uint32_t mask;
+uint32_t base_mask;
+} TCR;
+
 typedef struct CPUARMState {
 /* Regs for current mode.  */
 uint32_t regs[16];
@@ -217,9 +223,8 @@ typedef struct CPUARMState {
 };
 uint64_t ttbr1_el[4];
 };
-uint64_t c2_control; /* MMU translation table base control.  */
-uint32_t c2_mask; /* MMU translation table base selection mask.  */
-uint32_t c2_base_mask; /* MMU translation table base 0 mask. */
+/* MMU translation table base control. */
+TCR tcr_el[4];
 uint32_t c2_data; /* MPU data cachable bits.  */
 uint32_t c2_insn; /* MPU instruction cachable bits.  */
 uint32_t c3; /* MMU domain access control register
diff --git a/target-arm/helper.c b/target-arm/helper.c
index b75a394..53a1859 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -136,6 +136,11 @@ static void raw_write(CPUARMState *env, const ARMCPRegInfo *ri,
 }
 }
 
+static void *raw_ptr(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+return (char *)(env) + (ri)->fieldoffset;
+}
+
 static uint64_t read_raw_cp_reg(CPUARMState *env, const ARMCPRegInfo *ri)
 {
 /* Raw read of a coprocessor register (as needed for migration, etc). */
@@ -1560,6 +1565,7 @@ static const ARMCPRegInfo pmsav5_cp_reginfo[] = {
 static void vmsa_ttbcr_raw_write(CPUARMState *env, const ARMCPRegInfo *ri,
  uint64_t value)
 {
+TCR *tcr = raw_ptr(env, ri);
 int maskshift = extract32(value, 0, 3);
 
 if (!arm_feature(env, ARM_FEATURE_V8)) {
@@ -1578,14 +1584,15 @@ static void vmsa_ttbcr_raw_write(CPUARMState *env, const ARMCPRegInfo *ri,
 }
 }
 
-/* Note that we always calculate c2_mask and c2_base_mask, but
+/* Update the masks corresponding to the TCR bank being written
+ * Note that we always calculate mask and base_mask, but
  * they are only used for short-descriptor tables (ie if EAE is 0);
- * for long-descriptor tables the TTBCR fields are used differently
- * and the c2_mask and c2_base_mask values are meaningless.
+ * for long-descriptor tables the TCR fields are used differently
+ * and the mask and base_mask values are meaningless.
  */
-raw_write(env, ri, value);
-env->cp15.c2_mask = ~(((uint32_t)0xffffffffu) >> maskshift);
-env->cp15.c2_base_mask = ~((uint32_t)0x3fffu >> maskshift);
+tcr->raw_tcr = value;
+tcr->mask = ~(((uint32_t)0xffffffffu) >> maskshift);
+tcr->base_mask = ~((uint32_t)0x3fffu >> maskshift);
 }
 
 static void vmsa_ttbcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -1604,19 +1611,25 @@ static void vmsa_ttbcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 
 static void vmsa_ttbcr_reset(CPUARMState *env, const ARMCPRegInfo *ri)
 {
-env->cp15.c2_base_mask = 0xffffc000u;
-raw_write(env, ri, 0);
-env->cp15.c2_mask = 0;
+TCR *tcr = raw_ptr(env, ri);
+
+/* Reset both the TCR as well as the masks corresponding to the bank of
+ * the TCR being reset.
+ */
+tcr->raw_tcr = 0;
+tcr->mask = 0;
+tcr->base_mask = 0xffffc000u;
 }
 
 static void vmsa_tcr_el1_write(CPUARMState *env, const ARMCPRegInfo *ri,
uint64_t value)
 {
 ARMCPU *cpu = arm_env_get_cpu(env);
+TCR *tcr = raw_ptr(env, ri);
 
 /* For AArch64 the A1 bit could result in a change of

[Qemu-devel] [PATCH v10 03/26] target-arm: add banked register accessors

2014-11-06 Thread Greg Bellows

From: Fabian Aggeler aggel...@ethz.ch

If EL3 is in AArch32 state certain cp registers are banked (secure and
non-secure instance). When reading or writing to coprocessor registers
the following macros can be used.

- A32_BANKED macros are used for choosing the banked register based on provided
  input security argument.  This macro is used to choose the bank during
  translation of MRC/MCR instructions that are dependent on something other
  than the current secure state.
- A32_BANKED_CURRENT macros are used for choosing the banked register based on
  current secure state.  This is NOT to be used for choosing the bank used
  during translation as it breaks monitor mode.

If EL3 is operating in AArch64 state coprocessor registers are not
banked anymore. The macros use the non-secure instance (_ns) in this
case, which is architecturally mapped to the AArch64 EL register.
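
A minimal standalone sketch of how the two macro families relate, assuming a
cut-down state struct: demo_env, demo_cp15 and the DEMO_ prefixed macros are
hypothetical stand-ins, and the el3_is_aa64 and secure fields replace the
calls to arm_el_is_aa64(env, 3) and arm_is_secure(env) used by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the cp15 state in CPUARMState. */
struct demo_cp15 {
    uint32_t dacr_ns; /* non-secure instance */
    uint32_t dacr_s;  /* secure instance */
};

struct demo_env {
    struct demo_cp15 cp15;
    bool el3_is_aa64; /* stands in for arm_el_is_aa64(env, 3) */
    bool secure;      /* stands in for arm_is_secure(env) */
};

/* Same shape as A32_BANKED_REG_GET/SET: the caller names the bank. */
#define DEMO_BANKED_REG_GET(_env, _regname, _secure) \
    ((_secure) ? (_env)->cp15._regname##_s : (_env)->cp15._regname##_ns)

#define DEMO_BANKED_REG_SET(_env, _regname, _secure, _val) \
    do {                                                   \
        if (_secure) {                                     \
            (_env)->cp15._regname##_s = (_val);            \
        } else {                                           \
            (_env)->cp15._regname##_ns = (_val);           \
        }                                                  \
    } while (0)

/* Same shape as A32_BANKED_CURRENT_REG_GET: the bank follows the current
 * state, and the secure bank is only chosen when EL3 is AArch32.
 */
#define DEMO_BANKED_CURRENT_REG_GET(_env, _regname) \
    DEMO_BANKED_REG_GET((_env), _regname,           \
                        (!(_env)->el3_is_aa64 && (_env)->secure))
```

When el3_is_aa64 is true the CURRENT variant always lands on the _ns
instance, which matches the description above.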

Signed-off-by: Sergey Fedorov s.fedo...@samsung.com
Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org
Reviewed-by: Peter Maydell peter.mayd...@linaro.org

---

v7 - v8
- Move use_secure_reg() function to the TBFLAG patch.

v5 - v6
- Converted macro USE_SECURE_REG() into inline function use_secure_reg()
- Globally replace Aarch# with AArch#

v4 - v5
- Cleaned-up macros to try and alleviate misuse.  Made A32_BANKED macros take
  secure arg indicator rather than relying on USE_SECURE_REG.  Incorporated the
  A32_BANKED macros into the A32_BANKED_CURRENT.  CURRENT is now the only one
  that automatically chooses based on current secure state.
---
 target-arm/cpu.h | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index cf30b2a..7769ccf 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -817,6 +817,33 @@ static inline bool arm_el_is_aa64(CPUARMState *env, int el)
 return arm_feature(env, ARM_FEATURE_AARCH64);
 }
 
+/* Macros for accessing a specified CP register bank */
+#define A32_BANKED_REG_GET(_env, _regname, _secure) \
+((_secure) ? (_env)->cp15._regname##_s : (_env)->cp15._regname##_ns)
+
+#define A32_BANKED_REG_SET(_env, _regname, _secure, _val) \
+do { \
+if (_secure) { \
+(_env)->cp15._regname##_s = (_val); \
+} else { \
+(_env)->cp15._regname##_ns = (_val); \
+} \
+} while (0)
+
+/* Macros for automatically accessing a specific CP register bank depending on
+ * the current secure state of the system.  These macros are not intended for
+ * supporting instruction translation reads/writes as these are dependent
+ * solely on the SCR.NS bit and not the mode.
+ */
+#define A32_BANKED_CURRENT_REG_GET(_env, _regname) \
+A32_BANKED_REG_GET((_env), _regname, \
+   ((!arm_el_is_aa64((_env), 3) && arm_is_secure(_env))))
+
+#define A32_BANKED_CURRENT_REG_SET(_env, _regname, _val) \
+A32_BANKED_REG_SET((_env), _regname, \
+   ((!arm_el_is_aa64((_env), 3) && arm_is_secure(_env))), \
+   (_val))
+
 void arm_cpu_list(FILE *f, fprintf_function cpu_fprintf);
 unsigned int arm_excp_target_el(CPUState *cs, unsigned int excp_idx);
 
-- 
1.8.3.2

[Qemu-devel] [PATCH v10 08/26] target-arm: move AArch32 SCR into security reglist

2014-11-06 Thread Greg Bellows

From: Fabian Aggeler aggel...@ethz.ch

Define a new ARM CP register info list for the ARMv7 Security Extension
feature. Register that list only for ARM cores with Security Extension/EL3
support. Move the AArch32 SCR into the Security Extension register group.

Signed-off-by: Sergey Fedorov s.fedo...@samsung.com
Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org
Reviewed-by: Peter Maydell peter.mayd...@linaro.org

---

v7 - v8
- Fix SCR register fieldoffset to be offsetoflow32.
- Rename v7_el3_cp_reginfo to el3_cp_reginfo and remove v7 feature check when
  defining.  This allows all common v7/8 secure CP regs to be registered
  together leaving the v8_el3_cp_reginfo to only v8 specific EL3 registers.
- Move SCR_EL3 into el3_cp_reginfo.

v4 - v5
- Added reset value on SCR_EL3
- Squashed SCR Migration fix (previously patch 33)
  This patch adds code to mark duplicate CP register registrations as
  NO_MIGRATE to avoid duplicate migrations.

v3 - v4
- Renamed security_cp_reginfo to v7_el3_cp_reginfo
- Conditionalized define on whether v7 or v8 were enabled
---
 target-arm/helper.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 0471e6c..1be185d 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -898,9 +898,6 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
   .access = PL1_RW, .writefn = vbar_write,
   .fieldoffset = offsetof(CPUARMState, cp15.vbar_el[1]),
   .resetvalue = 0 },
-{ .name = "SCR", .cp = 15, .crn = 1, .crm = 1, .opc1 = 0, .opc2 = 0,
-  .access = PL1_RW, .fieldoffset = offsetoflow32(CPUARMState, cp15.scr_el3),
-  .resetvalue = 0, .writefn = scr_write },
 { .name = "CCSIDR", .state = ARM_CP_STATE_BOTH,
   .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = 0,
   .access = PL1_R, .readfn = ccsidr_read, .type = ARM_CP_NO_MIGRATE },
@@ -2335,11 +2332,18 @@ static const ARMCPRegInfo v8_el3_cp_reginfo[] = {
   .access = PL3_RW, .writefn = vbar_write,
   .fieldoffset = offsetof(CPUARMState, cp15.vbar_el[3]),
   .resetvalue = 0 },
+REGINFO_SENTINEL
+};
+
+static const ARMCPRegInfo el3_cp_reginfo[] = {
 { .name = "SCR_EL3", .state = ARM_CP_STATE_AA64,
-  .type = ARM_CP_NO_MIGRATE,
   .opc0 = 3, .opc1 = 6, .crn = 1, .crm = 1, .opc2 = 0,
   .access = PL3_RW, .fieldoffset = offsetof(CPUARMState, cp15.scr_el3),
-  .writefn = scr_write },
+  .resetvalue = 0, .writefn = scr_write },
+{ .name = "SCR",  .type = ARM_CP_NO_MIGRATE,
+  .cp = 15, .opc1 = 0, .crn = 1, .crm = 1, .opc2 = 0,
+  .access = PL3_RW, .fieldoffset = offsetoflow32(CPUARMState, cp15.scr_el3),
+  .resetfn = arm_cp_reset_ignore, .writefn = scr_write },
 REGINFO_SENTINEL
 };
 
@@ -2960,7 +2964,10 @@ void register_cp_regs_for_features(ARMCPU *cpu)
 }
 }
 if (arm_feature(env, ARM_FEATURE_EL3)) {
-define_arm_cp_regs(cpu, v8_el3_cp_reginfo);
+if (arm_feature(env, ARM_FEATURE_V8)) {
+define_arm_cp_regs(cpu, v8_el3_cp_reginfo);
+}
+define_arm_cp_regs(cpu, el3_cp_reginfo);
 }
 if (arm_feature(env, ARM_FEATURE_MPU)) {
 /* These are the MPU registers prior to PMSAv6. Any new
-- 
1.8.3.2

[Qemu-devel] [PATCH v10 05/26] target-arm: add CPREG secure state support

2014-11-06 Thread Greg Bellows

From: Fabian Aggeler aggel...@ethz.ch

Prepare ARMCPRegInfo to support specifying two fieldoffsets per
register definition. This will allow us to keep one register
definition for banked registers (different offsets for secure/
non-secure world).

Also added secure state tracking field and flags.  This allows for
identification of the register info secure state.
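
The idea of carrying two offsets in one definition and collapsing them to a
single fieldoffset per security state can be sketched as follows.  This is
an illustrative, self-contained approximation: demo_state, demo_reginfo and
demo_resolve_offset are hypothetical names, and the "both offsets non-zero
means banked" convention is taken from the bank_fieldoffsets comment in the
diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    DEMO_SECSTATE_S  = (1 << 0), /* secure state register */
    DEMO_SECSTATE_NS = (1 << 1), /* non-secure state register */
};

/* Hypothetical CPU state.  The leading field keeps every banked field at
 * a non-zero offset, since an offset of 0 is treated as "not set".
 */
struct demo_state {
    uint32_t regs0;
    uint32_t ttbcr_s;
    uint32_t ttbcr_ns;
    uint32_t scr;
};

/* Hypothetical, cut-down register definition. */
struct demo_reginfo {
    int secure;                     /* DEMO_SECSTATE_* bits */
    ptrdiff_t fieldoffset;          /* used by unbanked registers */
    ptrdiff_t bank_fieldoffsets[2]; /* [0] = secure, [1] = non-secure */
};

/* Pick the offset that a hashed entry for 'secstate' would end up with. */
static ptrdiff_t demo_resolve_offset(const struct demo_reginfo *r,
                                     int secstate)
{
    int ns = (secstate & DEMO_SECSTATE_NS) ? 1 : 0;

    assert(r != 0);
    if (r->bank_fieldoffsets[0] && r->bank_fieldoffsets[1]) {
        /* Banked: both entries are set, select by security state. */
        return r->bank_fieldoffsets[ns];
    }
    /* Not banked: both security states share the same field. */
    return r->fieldoffset;
}
```

Keeping the selection in one helper means the read/write paths only ever
need to look at a single resolved offset.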

Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org
Reviewed-by: Peter Maydell peter.mayd...@linaro.org

---

v8 - v9
- Removed ARM_CP_SECSTATE_TEST macro
- Replaced dropped comment

v7 - v8
- Break up the fieldoffset union to avoid need for sometimes overwriting one
  bank when updating fieldoffset.  This also removes the need for the #define
  short-cut introduced in v7.

v6 - v7
- Add naming for fieldoffset fields and macros for accessing.  This was needed
  to overcome issues with the GCC-4.4 compiler.

v5 - v6
- Separate out secure CPREG flags
- Add convenience macro for testing flags
- Removed extraneous newline
- Move add_cpreg_to_hashtable() functionality to a later commit for which it is
  dependent on.
- Added comment explaining fieldoffset padding

v4 - v5
- Added ARM CP register secure and non-secure bank flags
- Added setting of secure and non-secure flags during registration
---
 target-arm/cpu.h | 36 ++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 69aed3e..8ee9026 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -993,6 +993,21 @@ enum {
 ARM_CP_STATE_BOTH = 2,
 };
 
+/* ARM CP register secure state flags.  These flags identify security state
+ * attributes for a given CP register entry.
+ * The existence of both or neither secure and non-secure flags indicates that
+ * the register has both a secure and non-secure hash entry.  A single one of
+ * these flags causes the register to only be hashed for the specified
+ * security state.
+ * Although definitions may have any combination of the S/NS bits, each
+ * registered entry will only have one to identify whether the entry is secure
+ * or non-secure.
+ */
+enum {
+ARM_CP_SECSTATE_S =   (1 << 0), /* bit[0]: Secure state register */
+ARM_CP_SECSTATE_NS =  (1 << 1), /* bit[1]: Non-secure state register */
+};
+
 /* Return true if cptype is a valid type field. This is used to try to
  * catch errors where the sentinel has been accidentally left off the end
  * of a list of registers.
@@ -1127,6 +1142,8 @@ struct ARMCPRegInfo {
 int type;
 /* Access rights: PL*_[RW] */
 int access;
+/* Security state: ARM_CP_SECSTATE_* bits/values */
+int secure;
 /* The opaque pointer passed to define_arm_cp_regs_with_opaque() when
  * this register was defined: can be used to hand data through to the
  * register read/write functions, since they are passed the ARMCPRegInfo*.
@@ -1136,12 +1153,27 @@ struct ARMCPRegInfo {
  * fieldoffset is non-zero, the reset value of the register.
  */
 uint64_t resetvalue;
-/* Offset of the field in CPUARMState for this register. This is not
- * needed if either:
+/* Offset of the field in CPUARMState for this register.
+ *
+ * This is not needed if either:
  *  1. type is ARM_CP_CONST or one of the ARM_CP_SPECIALs
  *  2. both readfn and writefn are specified
  */
 ptrdiff_t fieldoffset; /* offsetof(CPUARMState, field) */
+
+/* Offsets of the secure and non-secure fields in CPUARMState for the
+ * register if it is banked.  These fields are only used during the static
+ * registration of a register.  During hashing the bank associated
+ * with a given security state is copied to fieldoffset which is used from
+ * there on out.
+ *
+ * It is expected that register definitions use either fieldoffset or
+ * bank_fieldoffsets in the definition but not both.  It is also expected
+ * that both bank offsets are set when defining a banked register.  This
+ * use indicates that a register is banked.
+ */
+ptrdiff_t bank_fieldoffsets[2];
+
 /* Function for making any access checks for this register in addition to
  * those specified by the 'access' permissions bits. If NULL, no extra
  * checks required. The access check is performed at runtime, not at
-- 
1.8.3.2

[Qemu-devel] [PATCH v10 06/26] target-arm: add secure state bit to CPREG hash

2014-11-06 Thread Greg Bellows

Add an NS bit to the CPREG hash encoding.  Update the hash lookup
locations to pass this bit, which is currently always set to non-secure.
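
For illustration only, the effect of the extra bit on the hash key can be
sketched with a function that uses the same bit layout as the updated
ENCODE_CP_REG macro in this patch.  The function demo_encode_cp_reg and the
DEMO_ prefixed constants are hypothetical names for this example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CP_REG_NS_SHIFT 29
#define DEMO_CP_REG_NS_MASK (UINT32_C(1) << DEMO_CP_REG_NS_SHIFT)

/* Build an AArch32 cpreg hash key: the NS bit sits above the cp number,
 * so the secure and non-secure entries of one register never collide.
 */
static uint32_t demo_encode_cp_reg(uint32_t cp, uint32_t is64, uint32_t ns,
                                   uint32_t crn, uint32_t crm,
                                   uint32_t opc1, uint32_t opc2)
{
    assert(is64 <= 1 && ns <= 1);
    return (ns << DEMO_CP_REG_NS_SHIFT) | (cp << 16) | (is64 << 15) |
           (crn << 11) | (crm << 7) | (opc1 << 3) | opc2;
}
```

Two keys built from the same cp/crn/crm/opc1/opc2 values differ only in
bit 29, which is what lets one register definition appear twice in the
table.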

Signed-off-by: Greg Bellows greg.bell...@linaro.org

---

v8 - v9
- Fixed CP_REG_NS_MASK
- Changed ENCODE_CP_REG argument order so ns follows is64
- Replaced use of CP_REG_NS_MASK with CP_REG_NS_SHIFT
- Changed add_cpreg_to_hashtable argument order so ns follows is64
- Replaced use of SCR_NS with ARM_CP_SECSTATE_NS on registration
- Undid global replace of Aarch# with AArch# in translate.c

v5 - v6
- Globally replace Aarch# with AArch#
---
 target-arm/cpu.h   | 25 -
 target-arm/helper.c|  7 ---
 target-arm/translate.c | 14 +-
 3 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 8ee9026..baa709b 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -879,6 +879,7 @@ void armv7m_nvic_complete_irq(void *opaque, int irq);
  *  Crn, Crm, opc1, opc2 fields
  *  32 or 64 bit register (ie is it accessed via MRC/MCR
  *or via MRRC/MCRR?)
+ *  non-secure/secure bank (AArch32 only)
  * We allow 4 bits for opc1 because MRRC/MCRR have a 4 bit field.
  * (In this case crn and opc2 should be zero.)
  * For AArch64, there is no 32/64 bit size distinction;
@@ -896,9 +897,16 @@ void armv7m_nvic_complete_irq(void *opaque, int irq);
 #define CP_REG_AA64_SHIFT 28
 #define CP_REG_AA64_MASK (1 << CP_REG_AA64_SHIFT)
 
-#define ENCODE_CP_REG(cp, is64, crn, crm, opc1, opc2)   \
-(((cp) << 16) | ((is64) << 15) | ((crn) << 11) |\
- ((crm) << 7) | ((opc1) << 3) | (opc2))
+/* To enable banking of coprocessor registers depending on ns-bit we
+ * add a bit to distinguish between secure and non-secure cpregs in the
+ * hashtable.
+ */
+#define CP_REG_NS_SHIFT 29
+#define CP_REG_NS_MASK (1 << CP_REG_NS_SHIFT)
+
+#define ENCODE_CP_REG(cp, is64, ns, crn, crm, opc1, opc2)   \
+((ns) << CP_REG_NS_SHIFT | ((cp) << 16) | ((is64) << 15) |   \
+ ((crn) << 11) | ((crm) << 7) | ((opc1) << 3) | (opc2))
 
 #define ENCODE_AA64_CP_REG(cp, crn, crm, op0, op1, op2) \
 (CP_REG_AA64_MASK | \
@@ -917,8 +925,15 @@ static inline uint32_t kvm_to_cpreg_id(uint64_t kvmid)
 uint32_t cpregid = kvmid;
 if ((kvmid & CP_REG_ARCH_MASK) == CP_REG_ARM64) {
 cpregid |= CP_REG_AA64_MASK;
-} else if ((kvmid & CP_REG_SIZE_MASK) == CP_REG_SIZE_U64) {
-cpregid |= (1 << 15);
+} else {
+if ((kvmid & CP_REG_SIZE_MASK) == CP_REG_SIZE_U64) {
+cpregid |= (1 << 15);
+}
+
+/* KVM is always non-secure so add the NS flag on AArch32 register
+ * entries.
+ */
+ cpregid |= 1 << CP_REG_NS_SHIFT;
 }
 return cpregid;
 }
diff --git a/target-arm/helper.c b/target-arm/helper.c
index a48ebae..1aadb79 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -3287,7 +3287,7 @@ CpuDefinitionInfoList *arch_query_cpu_definitions(Error **errp)
 }
 
 static void add_cpreg_to_hashtable(ARMCPU *cpu, const ARMCPRegInfo *r,
-   void *opaque, int state,
+   void *opaque, int state, int secstate,
int crm, int opc1, int opc2)
 {
 /* Private utility function for define_one_arm_cp_reg_with_opaque():
@@ -3296,6 +3296,7 @@ static void add_cpreg_to_hashtable(ARMCPU *cpu, const ARMCPRegInfo *r,
 uint32_t *key = g_new(uint32_t, 1);
 ARMCPRegInfo *r2 = g_memdup(r, sizeof(ARMCPRegInfo));
 int is64 = (r->type & ARM_CP_64BIT) ? 1 : 0;
+int ns = (r->secure & ARM_CP_SECSTATE_NS) ? 1 : 0;
 if (r->state == ARM_CP_STATE_BOTH && state == ARM_CP_STATE_AA32) {
 /* The AArch32 view of a shared register sees the lower 32 bits
  * of a 64 bit backing field. It is not migratable as the AArch64
@@ -3327,7 +3328,7 @@ static void add_cpreg_to_hashtable(ARMCPU *cpu, const ARMCPRegInfo *r,
 *key = ENCODE_AA64_CP_REG(r2->cp, r2->crn, crm,
   r2->opc0, opc1, opc2);
 } else {
-*key = ENCODE_CP_REG(r2->cp, is64, r2->crn, crm, opc1, opc2);
+*key = ENCODE_CP_REG(r2->cp, is64, ns, r2->crn, crm, opc1, opc2);
 }
 if (opaque) {
 r2->opaque = opaque;
@@ -3477,7 +3478,7 @@ void define_one_arm_cp_reg_with_opaque(ARMCPU *cpu,
 continue;
 }
 add_cpreg_to_hashtable(cpu, r, opaque, state,
-   crm, opc1, opc2);
+   ARM_CP_SECSTATE_NS, crm, opc1, opc2);
 }
 }
 }
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 17c459a..b52c758 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -7091,7 +7091,7 @@ static int disas_coproc_insn(DisasContext *s, uint32_t insn)
 rt = (insn >> 12) & 0xf;
 
 ri = get_arm_cp_reginfo(s->cp_regs,
-ENCODE_CP_REG(cpnum, is64, crn,

[Qemu-devel] [PATCH v10 24/26] target-arm: make c13 cp regs banked (FCSEIDR, ...)

2014-11-06 Thread Greg Bellows

From: Fabian Aggeler aggel...@ethz.ch

When EL3 is running in AArch32 (or ARMv7 with Security Extensions)
FCSEIDR, CONTEXTIDR, TPIDRURW, TPIDRURO and TPIDRPRW have a secure
and a non-secure instance.
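
The aliasing that the new CONTEXTIDR layout in this patch relies on can be
sketched in isolation.  The snippet below is an approximation that copies
the field names from the cpu.h hunk into a hypothetical struct demo_cp15; it
assumes a C11 (or GNU C) compiler for the anonymous struct and union, and
demo_contextidr is a helper invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down copy of the CONTEXTIDR storage: the named banked
 * fields overlay the per-EL array, with the non-secure instance in slot 1
 * and the secure instance in slot 3.
 */
struct demo_cp15 {
    union {
        struct {
            uint64_t _unused_contextidr_0;
            uint64_t contextidr_ns;
            uint64_t _unused_contextidr_1;
            uint64_t contextidr_s;
        };
        uint64_t contextidr_el[4];
    };
};

/* Read the instance for the requested security state. */
static uint64_t demo_contextidr(const struct demo_cp15 *c, int secure)
{
    assert(c != 0);
    return secure ? c->contextidr_s : c->contextidr_ns;
}
```

Writing through contextidr_el[1] is then visible as contextidr_ns, and
contextidr_el[3] as contextidr_s, without any copying between the views.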

Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org

---

v8 - v9
- Changed contextidr structure definition to have 4 uint64_t fields.
- Broke up secure/non-secure CONTEXTIDR defs so the secure instance can be
  properly migrated and reset.
- Broke up secure/non-secure FCSEIDR defs.
- Reversed CP register field reordering
- Reversed white-space changes

v6 - v7
- Fix linux-user/arm/target_cpu.h to use array based tpidr_el.
- Fix linux-user/main.c to use array based tpidrro_el.
- Remove tab identified by checkpatch failure.
- Fix linux-user/aarch64/target_cpu.h to use array based tpidr_el.

v5 - v6
- Changed _el field variants to be array based
- Rework data layout for correct aliasing
- Merged CONTEXTIDR and CONTEXTIDR_EL1 reginfo entries

v3 - v4
- Fix tpidrprw mapping
---
 linux-user/aarch64/target_cpu.h |  2 +-
 linux-user/arm/target_cpu.h |  2 +-
 linux-user/main.c   |  2 +-
 target-arm/cpu.h| 36 +
 target-arm/helper.c | 58 -
 target-arm/op_helper.c  |  2 +-
 6 files changed, 80 insertions(+), 22 deletions(-)

diff --git a/linux-user/aarch64/target_cpu.h b/linux-user/aarch64/target_cpu.h
index 21560ef..b5593dc 100644
--- a/linux-user/aarch64/target_cpu.h
+++ b/linux-user/aarch64/target_cpu.h
@@ -32,7 +32,7 @@ static inline void cpu_set_tls(CPUARMState *env, target_ulong newtls)
 /* Note that AArch64 Linux keeps the TLS pointer in TPIDR; this is
  * different from AArch32 Linux, which uses TPIDRRO.
  */
-env->cp15.tpidr_el0 = newtls;
+env->cp15.tpidr_el[0] = newtls;
 }
 
 #endif
diff --git a/linux-user/arm/target_cpu.h b/linux-user/arm/target_cpu.h
index 39d65b6..d8a534d 100644
--- a/linux-user/arm/target_cpu.h
+++ b/linux-user/arm/target_cpu.h
@@ -29,7 +29,7 @@ static inline void cpu_clone_regs(CPUARMState *env, target_ulong newsp)
 
 static inline void cpu_set_tls(CPUARMState *env, target_ulong newtls)
 {
-env->cp15.tpidrro_el0 = newtls;
+env->cp15.tpidrro_el[0] = newtls;
 }
 
 #endif
diff --git a/linux-user/main.c b/linux-user/main.c
index 5c14c1e..186ee4d 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -564,7 +564,7 @@ do_kernel_trap(CPUARMState *env)
 end_exclusive();
 break;
 case 0x0fe0: /* __kernel_get_tls */
-env->regs[0] = env->cp15.tpidrro_el0;
+env->regs[0] = env->cp15.tpidrro_el[0];
 break;
 case 0x0f60: /* __kernel_cmpxchg64 */
 arm_kernel_cmpxchg64_helper(env);
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 0bf88a2..dd0dee0 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -307,11 +307,37 @@ typedef struct CPUARMState {
 uint64_t vbar_el[4];
 };
 uint32_t mvbar; /* (monitor) vector base address register */
-uint32_t c13_fcse; /* FCSE PID.  */
-uint64_t contextidr_el1; /* Context ID.  */
-uint64_t tpidr_el0; /* User RW Thread register.  */
-uint64_t tpidrro_el0; /* User RO Thread register.  */
-uint64_t tpidr_el1; /* Privileged Thread register.  */
+struct { /* FCSE PID. */
+uint32_t fcseidr_ns;
+uint32_t fcseidr_s;
+};
+union { /* Context ID. */
+struct {
+uint64_t _unused_contextidr_0;
+uint64_t contextidr_ns;
+uint64_t _unused_contextidr_1;
+uint64_t contextidr_s;
+};
+uint64_t contextidr_el[4];
+};
+union { /* User RW Thread register. */
+struct {
+uint64_t tpidrurw_ns;
+uint64_t tpidrprw_ns;
+uint64_t htpidr;
+uint64_t _tpidr_el3;
+};
+uint64_t tpidr_el[4];
+};
+/* The secure banks of these registers don't map anywhere */
+uint64_t tpidrurw_s;
+uint64_t tpidrprw_s;
+uint64_t tpidruro_s;
+
+union { /* User RO Thread register. */
+uint64_t tpidruro_ns;
+uint64_t tpidrro_el[1];
+};
 uint64_t c14_cntfrq; /* Counter Frequency register */
 uint64_t c14_cntkctl; /* Timer Control register */
 ARMGenericTimer c14_timer[NUM_GTIMERS];
diff --git a/target-arm/helper.c b/target-arm/helper.c
index d4461f0..0b5330d 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -424,13 +424,36 @@ static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 }
 
 static const ARMCPRegInfo cp_reginfo[] = {
-{ .name = "FCSEIDR", .cp = 15, .crn = 13, .crm = 0, .opc1 = 0, .opc2 = 0,
-  .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.c13_fcse),
+/* Define the secure and

[Qemu-devel] [PATCH v10 07/26] target-arm: insert AArch32 cpregs twice into hashtable

2014-11-06 Thread Greg Bellows

From: Fabian Aggeler aggel...@ethz.ch

Prepare for cp register banking by inserting every cp register twice,
once for secure world and once for non-secure world.

Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org

---

v8 - v9
- Fixed setting of secure field in add_cpreg_to_hashtable so it uses secstate
  and happens in all cases.
- Fixed check for disabling of migration to only occur on duplicately defined
  reginfos.
- Fixed comment on disabling migration and reset and eliminated crn special
  case.
- Reworked define_one_arm_cp_reg_with_opaque() secure case handling.

v7 - v8
- Updated define registers asserts to allow either a non-zero fieldoffset or
  non-zero bank_fieldoffsets.
- Updated CP register hashing to always set the register fieldoffset when
  banked register offsets are specified.

v5 - v6
- Fixed NS-bit number in the CPREG hash lookup from 27 to 29.
- Switched to dedicated CPREG secure flags.
- Fixed disablement of reset and migration of common 32/64-bit registers.
- Globally replace Aarch# with AArch#

v4 - v5
- Added use of ARM CP secure/non-secure bank flags during register processing
  in define_one_arm_cp_reg_with_opaque().  We now only register the specified
  bank if only one flag is specified, otherwise we register both a secure and
  non-secure instance.
---
 target-arm/helper.c | 98 +++--
 1 file changed, 81 insertions(+), 17 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 1aadb79..0471e6c 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -3296,23 +3296,59 @@ static void add_cpreg_to_hashtable(ARMCPU *cpu, const ARMCPRegInfo *r,
 uint32_t *key = g_new(uint32_t, 1);
 ARMCPRegInfo *r2 = g_memdup(r, sizeof(ARMCPRegInfo));
 int is64 = (r->type & ARM_CP_64BIT) ? 1 : 0;
-int ns = (r->secure & ARM_CP_SECSTATE_NS) ? 1 : 0;
-if (r->state == ARM_CP_STATE_BOTH && state == ARM_CP_STATE_AA32) {
-/* The AArch32 view of a shared register sees the lower 32 bits
- * of a 64 bit backing field. It is not migratable as the AArch64
- * view handles that. AArch64 also handles reset.
- * We assume it is a cp15 register if the .cp field is left unset.
+int ns = (secstate & ARM_CP_SECSTATE_NS) ? 1 : 0;
+
+/* Reset the secure state to the specific incoming state.  This is
+ * necessary as the register may have been defined with both states.
+ */
+r2->secure = secstate;
+
+if (r->bank_fieldoffsets[0] && r->bank_fieldoffsets[1]) {
+/* Register is banked (using both entries in array).
+ * Overwriting fieldoffset as the array is only used to define
+ * banked registers but later only fieldoffset is used.
  */
-if (r2->cp == 0) {
-r2->cp = 15;
+r2->fieldoffset = r->bank_fieldoffsets[ns];
+}
+
+if (state == ARM_CP_STATE_AA32) {
+if (r->bank_fieldoffsets[0] && r->bank_fieldoffsets[1]) {
+/* If the register is banked then we don't need to migrate or
+ * reset the 32-bit instance in certain cases:
+ *
+ * 1) If the register has both 32-bit and 64-bit instances then we
+ *can count on the 64-bit instance taking care of the
+ *non-secure bank.
+ * 2) If ARMv8 is enabled then we can count on a 64-bit version
+ *taking care of the secure bank.  This requires that separate
+ *32 and 64-bit definitions are provided.
+ */
+if ((r->state == ARM_CP_STATE_BOTH && ns) ||
+(arm_feature(&cpu->env, ARM_FEATURE_V8) && !ns)) {
+r2->type |= ARM_CP_NO_MIGRATE;
+r2->resetfn = arm_cp_reset_ignore;
+}
+} else if ((secstate != r->secure) && !ns) {
+/* The register is not banked so we only want to allow migration of
+ * the non-secure instance.
+ */
+r2->type |= ARM_CP_NO_MIGRATE;
+r2->resetfn = arm_cp_reset_ignore;
 }
-r2->type |= ARM_CP_NO_MIGRATE;
-r2->resetfn = arm_cp_reset_ignore;
+
+if (r->state == ARM_CP_STATE_BOTH) {
+/* We assume it is a cp15 register if the .cp field is left unset.
+ */
+if (r2->cp == 0) {
+r2->cp = 15;
+}
+
 #ifdef HOST_WORDS_BIGENDIAN
-if (r2->fieldoffset) {
-r2->fieldoffset += sizeof(uint32_t);
-}
+if (r2->fieldoffset) {
+r2->fieldoffset += sizeof(uint32_t);
+}
 #endif
+}
 }
 if (state == ARM_CP_STATE_AA64) {
 /* To allow abbreviation of ARMCPRegInfo
@@ -3461,10 +3497,14 @@ void define_one_arm_cp_reg_with_opaque(ARMCPU *cpu,
  */
 if (!(r-type  (ARM_CP_SPECIAL|ARM_CP_CONST))) {
 if (r-access  PL3_R) {
-assert(r-fieldoffset || r-readfn);
+assert((r-fieldoffset ||
+

[Qemu-devel] [PATCH v10 09/26] target-arm: implement IRQ/FIQ routing to Monitor mode

2014-11-06 Thread Greg Bellows

From: Fabian Aggeler aggel...@ethz.ch

The SCR.{IRQ,FIQ} bits allow IRQ/FIQ exceptions to be routed to monitor CPU
mode. When an IRQ exception is taken to monitor mode, FIQ exceptions are
additionally masked.
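
As a sketch of the routing rule, the following standalone function computes
the target mode and the CPSR mask bits for an IRQ or FIQ given the SCR
value.  All DEMO_ prefixed names are hypothetical; the SCR and CPSR bit
positions and mode numbers follow the ARMv7-A architecture, and the logic
mirrors the hunk below rather than the whole of arm_cpu_do_interrupt().

```c
#include <assert.h>
#include <stdint.h>

/* SCR routing bits and CPSR mask bits (ARMv7-A bit positions). */
#define DEMO_SCR_IRQ (UINT32_C(1) << 1)
#define DEMO_SCR_FIQ (UINT32_C(1) << 2)
#define DEMO_CPSR_F  (UINT32_C(1) << 6)
#define DEMO_CPSR_I  (UINT32_C(1) << 7)
#define DEMO_CPSR_A  (UINT32_C(1) << 8)

enum demo_mode {
    DEMO_MODE_FIQ = 0x11,
    DEMO_MODE_IRQ = 0x12,
    DEMO_MODE_MON = 0x16,
};

enum demo_excp { DEMO_EXCP_IRQ, DEMO_EXCP_FIQ };

struct demo_entry {
    enum demo_mode new_mode; /* mode the exception is taken to */
    uint32_t mask;           /* CPSR bits set on exception entry */
};

static struct demo_entry demo_route(enum demo_excp excp, uint32_t scr)
{
    struct demo_entry e;

    assert(excp == DEMO_EXCP_IRQ || excp == DEMO_EXCP_FIQ);
    if (excp == DEMO_EXCP_IRQ) {
        e.new_mode = DEMO_MODE_IRQ;
        e.mask = DEMO_CPSR_A | DEMO_CPSR_I;
        if (scr & DEMO_SCR_IRQ) {
            /* IRQ routed to monitor mode: FIQ is masked as well. */
            e.new_mode = DEMO_MODE_MON;
            e.mask |= DEMO_CPSR_F;
        }
    } else {
        e.new_mode = DEMO_MODE_FIQ;
        e.mask = DEMO_CPSR_A | DEMO_CPSR_I | DEMO_CPSR_F;
        if (scr & DEMO_SCR_FIQ) {
            /* FIQ routed to monitor mode: the mask is unchanged. */
            e.new_mode = DEMO_MODE_MON;
        }
    }
    return e;
}
```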

Signed-off-by: Sergey Fedorov s.fedo...@samsung.com
Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org
Reviewed-by: Peter Maydell peter.mayd...@linaro.org
---
 target-arm/helper.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 1be185d..3086c2c 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -4233,12 +4233,21 @@ void arm_cpu_do_interrupt(CPUState *cs)
 /* Disable IRQ and imprecise data aborts.  */
 mask = CPSR_A | CPSR_I;
 offset = 4;
+if (env->cp15.scr_el3 & SCR_IRQ) {
+/* IRQ routed to monitor mode */
+new_mode = ARM_CPU_MODE_MON;
+mask |= CPSR_F;
+}
 break;
 case EXCP_FIQ:
 new_mode = ARM_CPU_MODE_FIQ;
 addr = 0x1c;
 /* Disable FIQ, IRQ and imprecise data aborts.  */
 mask = CPSR_A | CPSR_I | CPSR_F;
+if (env->cp15.scr_el3 & SCR_FIQ) {
+/* FIQ routed to monitor mode */
+new_mode = ARM_CPU_MODE_MON;
+}
 offset = 4;
 break;
 case EXCP_SMC:
-- 
1.8.3.2

[Qemu-devel] [PATCH v10 12/26] target-arm: add MVBAR support

2014-11-06 Thread Greg Bellows

From: Fabian Aggeler aggel...@ethz.ch

Use MVBAR register as exception vector base address for
exceptions taken to CPU monitor mode.
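
A minimal sketch of the resulting vector address selection, assuming the
three-way choice visible in the diff below: MVBAR for monitor mode, the
fixed high-vector base when SCTLR.V is set, and otherwise a remappable
base.  The function demo_vector_addr and its parameters are hypothetical,
and using a plain vbar argument for the final case is an assumption since
that branch is outside the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_HIGH_VECTORS UINT32_C(0xffff0000)

/* Return the address of the exception vector at 'offset'. */
static uint32_t demo_vector_addr(uint32_t offset, bool to_monitor,
                                 bool sctlr_v, uint32_t mvbar,
                                 uint32_t vbar)
{
    assert(offset < 0x20); /* eight 4-byte vector slots */
    if (to_monitor) {
        /* Exceptions taken to monitor mode always use MVBAR. */
        return mvbar + offset;
    }
    if (sctlr_v) {
        /* High vectors: the base address cannot be remapped. */
        return DEMO_HIGH_VECTORS + offset;
    }
    /* Assumed: remappable vector base for all other cases. */
    return vbar + offset;
}
```

For example, an FIQ (offset 0x1c) routed to monitor mode ignores both
SCTLR.V and the regular vector base.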

Signed-off-by: Sergey Fedorov s.fedo...@samsung.com
Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org
Reviewed-by: Peter Maydell peter.mayd...@linaro.org

---

v8 - v9
- Fixed declaration order of the MVBAR register components

v7 - v8
- Changed the mvbar cp15 storage from uint64_t to uint32_t
---
 target-arm/cpu.h|  1 +
 target-arm/helper.c | 15 +--
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 7a860e6..cdc6f6d 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -211,6 +211,7 @@ typedef struct CPUARMState {
 uint32_t c9_pminten; /* perf monitor interrupt enables */
 uint64_t mair_el1;
 uint64_t vbar_el[4]; /* vector base address register */
+uint32_t mvbar; /* (monitor) vector base address register */
 uint32_t c13_fcse; /* FCSE PID.  */
 uint64_t contextidr_el1; /* Context ID.  */
 uint64_t tpidr_el0; /* User RW Thread register.  */
diff --git a/target-arm/helper.c b/target-arm/helper.c
index cb15ad4..a12ba1f 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2356,6 +2356,9 @@ static const ARMCPRegInfo el3_cp_reginfo[] = {
 { .name = "NSACR", .cp = 15, .opc1 = 0, .crn = 1, .crm = 1, .opc2 = 2,
   .access = PL3_W | PL1_R, .resetvalue = 0,
   .fieldoffset = offsetof(CPUARMState, cp15.nsacr) },
+{ .name = "MVBAR", .cp = 15, .opc1 = 0, .crn = 12, .crm = 0, .opc2 = 1,
+  .access = PL3_RW, .writefn = vbar_write, .resetvalue = 0,
+  .fieldoffset = offsetof(CPUARMState, cp15.mvbar) },
 REGINFO_SENTINEL
 };
 
@@ -4272,16 +4275,16 @@ void arm_cpu_do_interrupt(CPUState *cs)
 cpu_abort(cs, "Unhandled exception 0x%x\n", cs->exception_index);
 return; /* Never happens.  Keep compiler happy.  */
 }
-/* High vectors.  */
-if (env->cp15.c1_sys & SCTLR_V) {
-/* when enabled, base address cannot be remapped.  */
+
+if (new_mode == ARM_CPU_MODE_MON) {
+addr += env->cp15.mvbar;
+} else if (env->cp15.c1_sys & SCTLR_V) {
+/* High vectors. When enabled, base address cannot be remapped. */
 addr += 0xffff0000;
 } else {
 /* ARM v7 architectures provide a vector base address register to remap
  * the interrupt vector table.
- * This register is only followed in non-monitor mode, and has a secure
- * and un-secure copy. Since the cpu is always in a un-secure operation
- * and is never in monitor mode this feature is always active.
+ * This register is only followed in non-monitor mode, and is banked.
  * Note: only bits 31:5 are valid.
  */
 addr += env->cp15.vbar_el[1];
-- 
1.8.3.2
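
The base-address selection introduced by this patch reduces to a three-way choice. The sketch below is a standalone illustration under stated assumptions: `vector_address` and its parameters are hypothetical names, not QEMU functions, and the booleans stand in for the `new_mode == ARM_CPU_MODE_MON` and `SCTLR.V` tests in the hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fixed base used when high vectors (SCTLR.V) are enabled. */
#define HIGH_VECTORS_BASE 0xffff0000u

/*
 * Compute the exception entry address.
 *   offset       - vector offset for the exception (e.g. 0x18 for IRQ)
 *   to_monitor   - exception is being taken to monitor mode
 *   high_vectors - SCTLR.V is set for the current bank
 *   mvbar, vbar  - monitor and regular vector base registers
 */
static uint32_t vector_address(uint32_t offset, bool to_monitor,
                               bool high_vectors,
                               uint32_t mvbar, uint32_t vbar)
{
    if (to_monitor) {
        /* Monitor mode always uses MVBAR, regardless of SCTLR.V. */
        return mvbar + offset;
    }
    if (high_vectors) {
        /* High vectors: the base cannot be remapped. */
        return HIGH_VECTORS_BASE + offset;
    }
    /* Otherwise follow the (banked) VBAR. */
    return vbar + offset;
}
```

The ordering matters: checking the monitor case first means a set SCTLR.V bit never redirects a monitor-mode exception to the high vectors.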

[Qemu-devel] [PATCH v10 10/26] target-arm: add NSACR register

2014-11-06 Thread Greg Bellows

From: Fabian Aggeler aggel...@ethz.ch

Implements NSACR register with corresponding read/write functions
for ARMv7 and ARMv8.

Signed-off-by: Sergey Fedorov s.fedo...@samsung.com
Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org
Reviewed-by: Peter Maydell peter.mayd...@linaro.org

---

v8 - v9
- Removed unused NSACR constants
- Added TODO for trapping secure EL1 accesses to NSACR
- Change NSACR access from PL3_RW to PL3_W
- Fixed declaration order of the NSACR register components

v7 - v8
- Update naming from c1_nsacr to nsacr to match other registers being changed.
- Remove NSACR read/write functions

v4 - v5
- Changed to use renamed arm_current_el()
---
 target-arm/cpu.h| 1 +
 target-arm/helper.c | 4 
 2 files changed, 5 insertions(+)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index baa709b..97f952c 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -181,6 +181,7 @@ typedef struct CPUARMState {
 uint64_t c1_sys; /* System control register.  */
 uint64_t c1_coproc; /* Coprocessor access register.  */
 uint32_t c1_xscaleauxcr; /* XScale auxiliary control register.  */
+uint32_t nsacr; /* Non-secure access control register. */
 uint64_t ttbr0_el1; /* MMU translation table base 0. */
 uint64_t ttbr1_el1; /* MMU translation table base 1. */
 uint64_t c2_control; /* MMU translation table base control.  */
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 3086c2c..016cf39 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2344,6 +2344,10 @@ static const ARMCPRegInfo el3_cp_reginfo[] = {
   .cp = 15, .opc1 = 0, .crn = 1, .crm = 1, .opc2 = 0,
   .access = PL3_RW, .fieldoffset = offsetoflow32(CPUARMState, cp15.scr_el3),
   .resetfn = arm_cp_reset_ignore, .writefn = scr_write },
+  /* TODO: Implement NSACR trapping of secure EL1 accesses to EL3 */
+{ .name = "NSACR", .cp = 15, .opc1 = 0, .crn = 1, .crm = 1, .opc2 = 2,
+  .access = PL3_W | PL1_R, .resetvalue = 0,
+  .fieldoffset = offsetof(CPUARMState, cp15.nsacr) },
 REGINFO_SENTINEL
 };
 
-- 
1.8.3.2

[Qemu-devel] [PATCH v10 26/26] target-arm: add cpu feature EL3 to CPUs with Security Extensions

2014-11-06 Thread Greg Bellows

From: Fabian Aggeler aggel...@ethz.ch

Set ARM_FEATURE_EL3 feature for CPUs that implement Security Extensions.

Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org
---
 target-arm/cpu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index d3db279..1871865 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -640,6 +640,7 @@ static void arm1176_initfn(Object *obj)
 set_feature(&cpu->env, ARM_FEATURE_DUMMY_C15_REGS);
 set_feature(&cpu->env, ARM_FEATURE_CACHE_DIRTY_REG);
 set_feature(&cpu->env, ARM_FEATURE_CACHE_BLOCK_OPS);
+set_feature(&cpu->env, ARM_FEATURE_EL3);
 cpu->midr = 0x410fb767;
 cpu->reset_fpsid = 0x410120b5;
 cpu-mvfr0 = 0x;
@@ -728,6 +729,7 @@ static void cortex_a8_initfn(Object *obj)
 set_feature(&cpu->env, ARM_FEATURE_NEON);
 set_feature(&cpu->env, ARM_FEATURE_THUMB2EE);
 set_feature(&cpu->env, ARM_FEATURE_DUMMY_C15_REGS);
+set_feature(&cpu->env, ARM_FEATURE_EL3);
 cpu->midr = 0x410fc080;
 cpu->reset_fpsid = 0x410330c0;
 cpu-mvfr0 = 0x0222;
@@ -795,6 +797,7 @@ static void cortex_a9_initfn(Object *obj)
 set_feature(&cpu->env, ARM_FEATURE_VFP_FP16);
 set_feature(&cpu->env, ARM_FEATURE_NEON);
 set_feature(&cpu->env, ARM_FEATURE_THUMB2EE);
+set_feature(&cpu->env, ARM_FEATURE_EL3);
 /* Note that A9 supports the MP extensions even for
  * A9UP and single-core A9MP (which are both different
  * and valid configurations; we don't model A9UP).
@@ -862,6 +865,7 @@ static void cortex_a15_initfn(Object *obj)
 set_feature(&cpu->env, ARM_FEATURE_DUMMY_C15_REGS);
 set_feature(&cpu->env, ARM_FEATURE_CBAR_RO);
 set_feature(&cpu->env, ARM_FEATURE_LPAE);
+set_feature(&cpu->env, ARM_FEATURE_EL3);
 cpu->kvm_target = QEMU_KVM_ARM_TARGET_CORTEX_A15;
 cpu->midr = 0x412fc0f1;
 cpu->reset_fpsid = 0x410430f0;
-- 
1.8.3.2

[Qemu-devel] [PATCH v10 11/26] target-arm: add SDER definition

2014-11-06 Thread Greg Bellows

Added CP register definitions for SDER and SDER32_EL3 as well as cp15.sder for
register storage.

Signed-off-by: Sergey Fedorov s.fedo...@samsung.com
Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org
Reviewed-by: Peter Maydell peter.mayd...@linaro.org

---

v8 - v9
- Fixed declaration order of the SDER register components

v7 - v8
- Added SDER32_EL3 register definition
- Changed sder name from c1_sder to sder
- Changed sder from uint32_t to uint64_t.
---
 target-arm/cpu.h| 1 +
 target-arm/helper.c | 8 
 2 files changed, 9 insertions(+)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 97f952c..7a860e6 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -181,6 +181,7 @@ typedef struct CPUARMState {
 uint64_t c1_sys; /* System control register.  */
 uint64_t c1_coproc; /* Coprocessor access register.  */
 uint32_t c1_xscaleauxcr; /* XScale auxiliary control register.  */
+uint64_t sder; /* Secure debug enable register. */
 uint32_t nsacr; /* Non-secure access control register. */
 uint64_t ttbr0_el1; /* MMU translation table base 0. */
 uint64_t ttbr1_el1; /* MMU translation table base 1. */
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 016cf39..cb15ad4 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2344,6 +2344,14 @@ static const ARMCPRegInfo el3_cp_reginfo[] = {
   .cp = 15, .opc1 = 0, .crn = 1, .crm = 1, .opc2 = 0,
   .access = PL3_RW, .fieldoffset = offsetoflow32(CPUARMState, cp15.scr_el3),
   .resetfn = arm_cp_reset_ignore, .writefn = scr_write },
+{ .name = "SDER32_EL3", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 6, .crn = 1, .crm = 1, .opc2 = 1,
+  .access = PL3_RW, .resetvalue = 0,
+  .fieldoffset = offsetof(CPUARMState, cp15.sder) },
+{ .name = "SDER",
+  .cp = 15, .opc1 = 0, .crn = 1, .crm = 1, .opc2 = 1,
+  .access = PL3_RW, .resetvalue = 0,
+  .fieldoffset = offsetoflow32(CPUARMState, cp15.sder) },
   /* TODO: Implement NSACR trapping of secure EL1 accesses to EL3 */
 { .name = "NSACR", .cp = 15, .opc1 = 0, .crn = 1, .crm = 1, .opc2 = 2,
   .access = PL3_W | PL1_R, .resetvalue = 0,
-- 
1.8.3.2
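
The patch above stores SDER in a 64-bit field and registers the AArch32 view with `offsetoflow32`. The definition of that macro is not shown in this thread, so the following is only a sketch of what the name suggests: locating the low 32 bits of a 64-bit field, which sit at a different byte offset on big-endian hosts. The names `struct state`, `host_is_big_endian`, `low32_offset` and `read_low32_of` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the CPU state; only the field layout matters here. */
struct state {
    uint64_t sder;
};

/* Detect host byte order at run time by inspecting the first byte. */
static int host_is_big_endian(void)
{
    uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Byte offset of the low 32 bits of a 64-bit field at field_offset. */
static size_t low32_offset(size_t field_offset)
{
    return field_offset + (host_is_big_endian() ? sizeof(uint32_t) : 0);
}

/* Store a 64-bit value, then read back only its low half by offset. */
static uint32_t read_low32_of(uint64_t value)
{
    struct state s;
    uint32_t low;

    s.sder = value;
    memcpy(&low,
           (const unsigned char *)&s
               + low32_offset(offsetof(struct state, sder)),
           sizeof(low));
    return low;
}
```

With this layout a 32-bit register view and a 64-bit register view can share one storage field without either accessor needing to know about the other.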

[Qemu-devel] [PATCH v10 04/26] target-arm: add non-secure Translation Block flag

2014-11-06 Thread Greg Bellows

From: Sergey Fedorov s.fedo...@samsung.com

This patch is based on an idea found in the patch at
git://github.com/jowinter/qemu-trustzone.git
f3d955c6c0ed8c46bc0eb10b634201032a651dd2 by
Johannes Winter johannes.win...@iaik.tugraz.at.

The TBFLAG captures the SCR NS secure state at the time when a TB is created so
the correct bank is accessed on system register accesses.

Signed-off-by: Sergey Fedorov s.fedo...@samsung.com
Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org
Reviewed-by: Peter Maydell peter.mayd...@linaro.org

---

v7 - v8
- Moved and renamed use_secure_reg() to this patch.  New name is
  access_secure_reg().
- Fixed function comment

v5 - v6
- Removed 64-bit NS TBFLAG macros as they are not needed
- Added comment on DisasContext ns field
- Replaced use of USE_SECURE_REG with use_secure_reg

v4 - v5
- Merge changes
- Fixed issue where TB secure state flag was incorrectly being set based on
  secure state rather than NS setting.  This caused an issue where monitor mode
  MRC/MCR accesses were always secure rather than being based on NS bit
  setting.
- Added separate 64/32 TB secure state flags
- Unconditionalized the setting of the DC ns bit
- Removed IS_NS macro and replaced with direct usage.
---
 target-arm/cpu.h   | 27 +++
 target-arm/translate.c |  1 +
 target-arm/translate.h |  1 +
 3 files changed, 29 insertions(+)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 7769ccf..69aed3e 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -817,6 +817,22 @@ static inline bool arm_el_is_aa64(CPUARMState *env, int el)
 return arm_feature(env, ARM_FEATURE_AARCH64);
 }
 
+/* Function for determining whether guest cp register reads and writes should
+ * access the secure or non-secure bank of a cp register.  When EL3 is
+ * operating in AArch32 state, the NS-bit determines whether the secure
+ * instance of a cp register should be used. When EL3 is AArch64 (or if
+ * it doesn't exist at all) then there is no register banking, and all
+ * accesses are to the non-secure version.
+ */
+static inline bool access_secure_reg(CPUARMState *env)
+{
+bool ret = (arm_feature(env, ARM_FEATURE_EL3) &&
+!arm_el_is_aa64(env, 3) &&
+!(env->cp15.scr_el3 & SCR_NS));
+
+return ret;
+}
+
 /* Macros for accessing a specified CP register bank */
 #define A32_BANKED_REG_GET(_env, _regname, _secure)\
 ((_secure) ? (_env)->cp15._regname##_s : (_env)->cp15._regname##_ns)
@@ -1467,6 +1483,12 @@ static inline bool arm_singlestep_active(CPUARMState *env)
  */
 #define ARM_TBFLAG_XSCALE_CPAR_SHIFT 20
 #define ARM_TBFLAG_XSCALE_CPAR_MASK (3 << ARM_TBFLAG_XSCALE_CPAR_SHIFT)
+/* Indicates whether cp register reads and writes by guest code should access
+ * the secure or nonsecure bank of banked registers; note that this is not
+ * the same thing as the current security state of the processor!
+ */
+#define ARM_TBFLAG_NS_SHIFT 22
+#define ARM_TBFLAG_NS_MASK  (1 << ARM_TBFLAG_NS_SHIFT)
 
 /* Bit usage when in AArch64 state */
 #define ARM_TBFLAG_AA64_EL_SHIFT0
@@ -1511,6 +1533,8 @@ static inline bool arm_singlestep_active(CPUARMState *env)
 (((F) & ARM_TBFLAG_AA64_SS_ACTIVE_MASK) >> ARM_TBFLAG_AA64_SS_ACTIVE_SHIFT)
 #define ARM_TBFLAG_AA64_PSTATE_SS(F) \
 (((F) & ARM_TBFLAG_AA64_PSTATE_SS_MASK) >> ARM_TBFLAG_AA64_PSTATE_SS_SHIFT)
+#define ARM_TBFLAG_NS(F) \
+(((F) & ARM_TBFLAG_NS_MASK) >> ARM_TBFLAG_NS_SHIFT)
 
 static inline void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
 target_ulong *cs_base, int *flags)
@@ -1560,6 +1584,9 @@ static inline void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
 if (privmode) {
 *flags |= ARM_TBFLAG_PRIV_MASK;
 }
+if (!(access_secure_reg(env))) {
+*flags |= ARM_TBFLAG_NS_MASK;
+}
 if (env->vfp.xregs[ARM_VFP_FPEXC] & (1 << 30)
 || arm_el_is_aa64(env, 1)) {
 *flags |= ARM_TBFLAG_VFPEN_MASK;
diff --git a/target-arm/translate.c b/target-arm/translate.c
index af51568..17c459a 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -11031,6 +11031,7 @@ static inline void gen_intermediate_code_internal(ARMCPU *cpu,
 #if !defined(CONFIG_USER_ONLY)
 dc->user = (ARM_TBFLAG_PRIV(tb->flags) == 0);
 #endif
+dc->ns = ARM_TBFLAG_NS(tb->flags);
 dc->cpacr_fpen = ARM_TBFLAG_CPACR_FPEN(tb->flags);
 dc->vfp_enabled = ARM_TBFLAG_VFPEN(tb->flags);
 dc->vec_len = ARM_TBFLAG_VECLEN(tb->flags);
diff --git a/target-arm/translate.h b/target-arm/translate.h
index 41a9071..f6ee789 100644
--- a/target-arm/translate.h
+++ b/target-arm/translate.h
@@ -20,6 +20,7 @@ typedef struct DisasContext {
 #if !defined(CONFIG_USER_ONLY)
 int user;
 #endif
+bool ns;/* Use non-secure CPREG bank on access */
 bool cpacr_fpen; /* FP enabled via CPACR.FPEN */
 bool vfp_enabled; /* FP enabled via FPSCR.EN */
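
The flow in this patch (compute the bank selection once, pack it into the TB flags, extract it again at translation time) can be sketched on its own. This is an illustration under stated assumptions, not the QEMU code: the `TBFLAG_NS_*` macros copy the shape of the patch's definitions at bit 22, while `access_secure_reg_for` and `tb_flags_for` are hypothetical helpers that take the three inputs as plain booleans instead of a CPU state pointer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the ARM_TBFLAG_NS_* macros in the patch. */
#define TBFLAG_NS_SHIFT 22
#define TBFLAG_NS_MASK  (1u << TBFLAG_NS_SHIFT)
#define TBFLAG_NS(F)    (((F) & TBFLAG_NS_MASK) >> TBFLAG_NS_SHIFT)

/*
 * The secure bank is used only when EL3 exists, EL3 runs in AArch32,
 * and SCR.NS is clear. In every other case the non-secure bank is used.
 */
static bool access_secure_reg_for(bool has_el3, bool el3_is_aa64,
                                  bool scr_ns)
{
    return has_el3 && !el3_is_aa64 && !scr_ns;
}

/* Build the flags word: the NS bit is set when the secure bank is NOT used. */
static uint32_t tb_flags_for(bool has_el3, bool el3_is_aa64, bool scr_ns)
{
    uint32_t flags = 0;

    if (!access_secure_reg_for(has_el3, el3_is_aa64, scr_ns)) {
        flags |= TBFLAG_NS_MASK;
    }
    return flags;
}
```

Note that the flag tracks which register bank to use, not the security state of the processor: in monitor mode with SCR.NS set, the CPU is secure but the flag still selects the non-secure bank.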

[Qemu-devel] [PATCH v10 13/26] target-arm: add SCTLR_EL3 and make SCTLR banked

2014-11-06 Thread Greg Bellows

From: Fabian Aggeler aggel...@ethz.ch

Implements SCTLR_EL3 and uses secure/non-secure instance when
needed.

Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org

---

v9 - v10
- Fix SCTLR to use opc0 instead of cp

v8 - v9
- Remove the v8 check in arm_cpu_reset when setting regs[15]
- Fix SCTLR definition component order

v5 - v6
- Changed _el field variants to be array based
- Consolidate SCTLR and SCTLR_EL1 reginfo entries
---
 hw/arm/pxa2xx.c|  2 +-
 target-arm/cpu.c   |  8 --
 target-arm/cpu.h   | 10 ++-
 target-arm/helper.c| 72 +-
 target-arm/op_helper.c |  2 +-
 5 files changed, 59 insertions(+), 35 deletions(-)

diff --git a/hw/arm/pxa2xx.c b/hw/arm/pxa2xx.c
index 693dfec..11d51af 100644
--- a/hw/arm/pxa2xx.c
+++ b/hw/arm/pxa2xx.c
@@ -273,7 +273,7 @@ static void pxa2xx_pwrmode_write(CPUARMState *env, const ARMCPRegInfo *ri,
 case 3:
 s->cpu->env.uncached_cpsr = ARM_CPU_MODE_SVC;
 s->cpu->env.daif = PSTATE_A | PSTATE_F | PSTATE_I;
-s->cpu->env.cp15.c1_sys = 0;
+s->cpu->env.cp15.sctlr_ns = 0;
 s->cpu->env.cp15.c1_coproc = 0;
 s->cpu->env.cp15.ttbr0_el1 = 0;
 s->cpu->env.cp15.c3 = 0;
diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 5ce7350..fdb7b35 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -109,7 +109,7 @@ static void arm_cpu_reset(CPUState *s)
 #if defined(CONFIG_USER_ONLY)
 env->pstate = PSTATE_MODE_EL0t;
 /* Userspace expects access to DC ZVA, CTL_EL0 and the cache ops */
-env->cp15.c1_sys |= SCTLR_UCT | SCTLR_UCI | SCTLR_DZE;
+env->cp15.sctlr_el[1] |= SCTLR_UCT | SCTLR_UCI | SCTLR_DZE;
 /* and to the FP/Neon instructions */
 env->cp15.c1_coproc = deposit64(env->cp15.c1_coproc, 20, 2, 3);
 #else
@@ -167,7 +167,11 @@ static void arm_cpu_reset(CPUState *s)
 env->thumb = initial_pc & 1;
 }
 
-if (env->cp15.c1_sys & SCTLR_V) {
+/* AArch32 has a hard highvec setting of 0xFFFF0000.  If we are currently
+ * executing as AArch32 then check if highvecs are enabled and
+ * adjust the PC accordingly.
+ */
+if (A32_BANKED_CURRENT_REG_GET(env, sctlr) & SCTLR_V) {
 env->regs[15] = 0xFFFF0000;
 }
 
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index cdc6f6d..1c8691f 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -178,7 +178,15 @@ typedef struct CPUARMState {
 struct {
 uint32_t c0_cpuid;
 uint64_t c0_cssel; /* Cache size selection.  */
-uint64_t c1_sys; /* System control register.  */
+union { /* System control register. */
+struct {
+uint64_t _unused_sctlr;
+uint64_t sctlr_ns;
+uint64_t hsctlr;
+uint64_t sctlr_s;
+};
+uint64_t sctlr_el[4];
+};
 uint64_t c1_coproc; /* Coprocessor access register.  */
 uint32_t c1_xscaleauxcr; /* XScale auxiliary control register.  */
 uint64_t sder; /* Secure debug enable register. */
diff --git a/target-arm/helper.c b/target-arm/helper.c
index a12ba1f..948192b 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -1908,7 +1908,7 @@ static void aa64_fpsr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 
 static CPAccessResult aa64_daif_access(CPUARMState *env, const ARMCPRegInfo *ri)
 {
-if (arm_current_el(env) == 0 && !(env->cp15.c1_sys & SCTLR_UMA)) {
+if (arm_current_el(env) == 0 && !(env->cp15.sctlr_el[1] & SCTLR_UMA)) {
 return CP_ACCESS_TRAP;
 }
 return CP_ACCESS_OK;
@@ -1926,7 +1926,7 @@ static CPAccessResult aa64_cacheop_access(CPUARMState *env,
 /* Cache invalidate/clean: NOP, but EL0 must UNDEF unless
  * SCTLR_EL1.UCI is set.
  */
-if (arm_current_el(env) == 0 && !(env->cp15.c1_sys & SCTLR_UCI)) {
+if (arm_current_el(env) == 0 && !(env->cp15.sctlr_el[1] & SCTLR_UCI)) {
 return CP_ACCESS_TRAP;
 }
 return CP_ACCESS_OK;
@@ -2003,7 +2003,7 @@ static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri)
 /* We don't implement EL2, so the only control on DC ZVA is the
  * bit in the SCTLR which can prohibit access for EL0.
  */
-if (arm_current_el(env) == 0 && !(env->cp15.c1_sys & SCTLR_DZE)) {
+if (arm_current_el(env) == 0 && !(env->cp15.sctlr_el[1] & SCTLR_DZE)) {
 return CP_ACCESS_TRAP;
 }
 return CP_ACCESS_OK;
@@ -2042,6 +2042,24 @@ static void spsel_write(CPUARMState *env, const 
ARMCPRegInfo *ri, uint64_t val)
 update_spsel(env, val);
 }
 
+static void sctlr_write(CPUARMState *env, const ARMCPRegInfo *ri,
+uint64_t value)
+{
+ARMCPU *cpu = arm_env_get_cpu(env);
+
+if (raw_read(env, ri) == value) {
+/* Skip the TLB flush if nothing actually changed; Linux likes
+ * to do a lot of pointless SCTLR writes.
+ */
+return;
+}
+
+raw_write(env, ri,

Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-06 Thread Paolo Bonzini

On 06/11/2014 07:53, Hanjun Guo wrote:
 So the important question is _why_ the guest needs to see an ACPI
 environment. What exactly can ACPI provide to the guest that DT does not
 already provide, and why is that necessary? What infrastructure is
 needed for that use case?
 
 There is an important feature called system device dynamic reconfiguration,
 you know, hot-add/remove: if a guest needs more/less memory or CPU, can we
 add or remove them dynamically with DT? ACPI can do this, but I have no
 idea if DT can. (Sorry, I don't know much about DT)

Indeed hot-add/remove is the single biggest AML user in x86 QEMU.
Whether you really need it depends on what you are adding or removing.

For PCI there is no problem.  We can use PCIe from the beginning, and
use PCIe hotplug support that is already in QEMU.

Memory and CPU are more problematic.  For memory we could perhaps use a
PCI memory device, though I'm not sure if that would require drivers in
the OS or everything just works.

CPU hotplug, however, probably requires AML.  Of course it can be
generated in the firmware, like we used to do for x86, but Igor
explained why it wasn't a great idea.  That said, one of the problems
(never ending expansion of PV QEMU-BIOS interface) could be less
important since ARM DT is a better interface than x86 fw_cfg.

Paolo

[Qemu-devel] [PATCH v10 15/26] target-arm: make CSSELR banked

2014-11-06 Thread Greg Bellows

From: Fabian Aggeler aggel...@ethz.ch

Rename CSSELR (cache size selection register) and add secure
instance (AArch32).

Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org
Reviewed-by: Peter Maydell peter.mayd...@linaro.org

---

v8 - v9
- Replaced call to ARM_CP_SECSTATE_TEST with direct access

v7 - v8
- Fix CSSELR CP register definition to use .opc0 rather than .cp.

v5 - v6
- Changed _el field variants to be array based
- Switch to use distinct CPREG secure flags.
- Merged CSSELR and CSSELR_EL1 reginfo entries

v4 - v5
- Changed to use the CCSIDR cpreg bank flag to select the csselr bank instead
  of the  A32_BANKED macro.  This more accurately uses the secure state bank
  matching the CCSIDR.
---
 target-arm/cpu.h| 10 +-
 target-arm/helper.c | 14 +++---
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 1c8691f..67f319f 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -177,7 +177,15 @@ typedef struct CPUARMState {
 /* System control coprocessor (cp15) */
 struct {
 uint32_t c0_cpuid;
-uint64_t c0_cssel; /* Cache size selection.  */
+union { /* Cache size selection */
+struct {
+uint64_t _unused_csselr0;
+uint64_t csselr_ns;
+uint64_t _unused_csselr1;
+uint64_t csselr_s;
+};
+uint64_t csselr_el[4];
+};
 union { /* System control register. */
 struct {
 uint64_t _unused_sctlr;
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 9186fc7..5377480 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -776,7 +776,14 @@ static void scr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
 static uint64_t ccsidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
 {
 ARMCPU *cpu = arm_env_get_cpu(env);
-return cpu->ccsidr[env->cp15.c0_cssel];
+
+/* Acquire the CSSELR index from the bank corresponding to the CCSIDR
+ * bank
+ */
+uint32_t index = A32_BANKED_REG_GET(env, csselr,
+ri->secure & ARM_CP_SECSTATE_S);
+
+return cpu->ccsidr[index];
 }
 
 static void csselr_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -903,8 +910,9 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
   .access = PL1_R, .readfn = ccsidr_read, .type = ARM_CP_NO_MIGRATE },
 { .name = "CSSELR", .state = ARM_CP_STATE_BOTH,
   .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 2, .opc2 = 0,
-  .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.c0_cssel),
-  .writefn = csselr_write, .resetvalue = 0 },
+  .access = PL1_RW, .writefn = csselr_write, .resetvalue = 0,
+  .bank_fieldoffsets = { offsetof(CPUARMState, cp15.csselr_s),
+ offsetof(CPUARMState, cp15.csselr_ns) } },
 /* Auxiliary ID register: this actually has an IMPDEF value but for now
  * just RAZ for all cores:
  */
-- 
1.8.3.2
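
The banking scheme used here overlays named secure and non-secure fields on an array indexed by exception level, so one storage area serves both views. The sketch below is a minimal standalone illustration that assumes C11 anonymous structs and unions. The union layout copies the patch, but `struct cp15_sketch`, `csselr_get` and `csselr_demo` are hypothetical names, and the getter only mimics what `A32_BANKED_REG_GET` is used for.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Storage layout from the patch: slot 1 is non-secure, slot 3 is secure. */
struct cp15_sketch {
    union {
        struct {
            uint64_t unused_csselr0;
            uint64_t csselr_ns;
            uint64_t unused_csselr1;
            uint64_t csselr_s;
        };
        uint64_t csselr_el[4];
    };
};

/* Select the bank by security state, as the banked-register macro does. */
static uint64_t csselr_get(const struct cp15_sketch *c, bool secure)
{
    return secure ? c->csselr_s : c->csselr_ns;
}

/* Write through the EL-indexed view, read through the banked view. */
static uint64_t csselr_demo(bool secure)
{
    struct cp15_sketch c;

    memset(&c, 0, sizeof(c));
    c.csselr_el[1] = 1; /* aliases csselr_ns */
    c.csselr_el[3] = 2; /* aliases csselr_s */
    return csselr_get(&c, secure);
}
```

The unused slots keep the array indices aligned with exception levels, which is why `csselr_el[1]` and `csselr_ns` name the same storage.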

[Qemu-devel] [PATCH v10 14/26] target-arm: respect SCR.FW, SCR.AW and SCTLR.NMFI

2014-11-06 Thread Greg Bellows

From: Fabian Aggeler aggel...@ethz.ch

Add checks of SCR AW/FW bits when performing writes of CPSR.  These SCR bits
are used to control whether the CPSR masking bits can be adjusted from
non-secure state.

Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org

---

v8 - v9
- Move cpsr_write mask filtering above mode switch.
- Replace conditional checks removed in v8.

v7 - v8
- Fixed incorrect use of env->uncached_cpsr A/I/F to use env->daif instead.
- Removed incorrect statement about SPSR to CPSR copies being affected by
  SCR.AW/FW.
- Fix typo in comment.
- Simplified cpsr_write logic

v3 - v4
- Fixed up conditions for ignoring CPSR.A/F updates by isolating to v7 and
  checking for the existence of EL3 and non-existence of EL2.
---
 target-arm/helper.c | 59 +++--
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 948192b..9186fc7 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -3644,6 +3644,8 @@ uint32_t cpsr_read(CPUARMState *env)
 
 void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
 {
+uint32_t changed_daif;
+
 if (mask & CPSR_NZCV) {
 env->ZF = (~val) & CPSR_Z;
 env->NF = val;
@@ -3666,8 +3668,57 @@ void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
 env->GE = (val >> 16) & 0xf;
 }
 
-env->daif &= ~(CPSR_AIF & mask);
-env->daif |= val & CPSR_AIF & mask;
+/* In a V7 implementation that includes the security extensions but does
+ * not include Virtualization Extensions the SCR.FW and SCR.AW bits control
+ * whether non-secure software is allowed to change the CPSR_F and CPSR_A
+ * bits respectively.
+ *
+ * In a V8 implementation, it is permitted for privileged software to
+ * change the CPSR A/F bits regardless of the SCR.AW/FW bits.
+ */
+if (!arm_feature(env, ARM_FEATURE_V8) &&
+arm_feature(env, ARM_FEATURE_EL3) &&
+!arm_feature(env, ARM_FEATURE_EL2) &&
+!arm_is_secure(env)) {
+
+changed_daif = (env->daif ^ val) & mask;
+
+if (changed_daif & CPSR_A) {
+/* Check to see if we are allowed to change the masking of async
+ * abort exceptions from a non-secure state.
+ */
+if (!(env->cp15.scr_el3 & SCR_AW)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "Ignoring attempt to switch CPSR_A flag from "
+  "non-secure world with SCR.AW bit clear\n");
+mask &= ~CPSR_A;
+}
+}
+
+if (changed_daif & CPSR_F) {
+/* Check to see if we are allowed to change the masking of FIQ
+ * exceptions from a non-secure state.
+ */
+if (!(env->cp15.scr_el3 & SCR_FW)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "Ignoring attempt to switch CPSR_F flag from "
+  "non-secure world with SCR.FW bit clear\n");
+mask &= ~CPSR_F;
+}
+
+/* Check whether non-maskable FIQ (NMFI) support is enabled.
+ * If this bit is set software is not allowed to mask
+ * FIQs, but is allowed to set CPSR_F to 0.
+ */
+if ((A32_BANKED_CURRENT_REG_GET(env, sctlr) & SCTLR_NMFI) &&
+(val & CPSR_F)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "Ignoring attempt to enable CPSR_F flag "
+  "(non-maskable FIQ [NMFI] support enabled)\n");
+mask &= ~CPSR_F;
+}
+}
+}
 
 if ((env->uncached_cpsr ^ val) & mask & CPSR_M) {
 if (bad_mode_switch(env, val & CPSR_M)) {
@@ -3680,6 +3731,10 @@ void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
 switch_mode(env, val & CPSR_M);
 }
 }
+
+env->daif &= ~(CPSR_AIF & mask);
+env->daif |= val & CPSR_AIF & mask;
+
 mask &= ~CACHED_CPSR_BITS;
 env->uncached_cpsr = (env->uncached_cpsr & ~mask) | (val & mask);
 }
-- 
1.8.3.2
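
The mask filtering added to cpsr_write() can be isolated into a pure function. The sketch below is an illustration under stated assumptions, not the QEMU code: `filter_cpsr_mask` is a hypothetical helper, the `nmfi` parameter stands in for the SCTLR.NMFI test, and the caller is assumed to have already established the outer condition from the patch (ARMv7, EL3 present, no EL2, non-secure state). Bit positions follow the ARMv7 layout (SCR.FW bit 4, SCR.AW bit 5, CPSR.F bit 6, CPSR.A bit 8).

```c
#include <stdbool.h>
#include <stdint.h>

#define CPSR_F (1u << 6)
#define CPSR_A (1u << 8)
#define SCR_FW (1u << 4)
#define SCR_AW (1u << 5)

/*
 * Return the write mask with CPSR.A and CPSR.F removed when a
 * non-secure write is not allowed to change them.
 *   daif - current mask bits, val - value being written,
 *   mask - bits the write wants to update, scr - SCR contents,
 *   nmfi - non-maskable FIQ support is enabled.
 */
static uint32_t filter_cpsr_mask(uint32_t daif, uint32_t val,
                                 uint32_t mask, uint32_t scr, bool nmfi)
{
    uint32_t changed = (daif ^ val) & mask;

    /* CPSR.A may only change from non-secure state if SCR.AW is set. */
    if ((changed & CPSR_A) && !(scr & SCR_AW)) {
        mask &= ~CPSR_A;
    }

    if (changed & CPSR_F) {
        /* CPSR.F may only change from non-secure state if SCR.FW is set. */
        if (!(scr & SCR_FW)) {
            mask &= ~CPSR_F;
        }
        /* With NMFI, FIQs can be unmasked but never masked. */
        if (nmfi && (val & CPSR_F)) {
            mask &= ~CPSR_F;
        }
    }
    return mask;
}
```

Because only bits that would actually change are examined, a write that leaves A and F at their current values passes through with its mask intact, even when SCR.AW and SCR.FW are clear.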

Re: [Qemu-devel] Adding SMP support for Sparc Target

2014-11-06 Thread Alex Bennée


Artyom Tarasenko atar4q...@gmail.com writes:

 Hello Damien,

 On Thu, Nov 6, 2014 at 8:38 AM, Damien Hilloulin
 damien.hillou...@epfl.ch wrote:
 Hello everyone,

 I'm a newcomer in QEMU and my goal would be to port an existing system
 simulator using another emulator to QEMU.
snip
 However, it seems that the Sparc targets doesn't really support SMP/CMT as
 of now. So I am considering two possibilities:
 - adding SMP support in QEMU for the Sparc targets (and contribute it to
 QEMU :) )

 Do you mean a) emulating multiple guest cores in a single host
 thread, or b) emulating multiple guest cores in multiple host threads?
snip
 If you mean b), things get more complicated because TCG can currently
 utilize just one host thread. There was an attempt to utilize
 multiple threads for an ARM target:
 http://sourceforge.net/p/coremu/home/Home

 It would be interesting to hear what the TCG experts would say. Adding
 Richard to CC.

There is a desire to fix this but a distinct lack of cycles. It's not a
small job and will require quite a bit of preparatory work to map out an
approach and then fix it.

Having said that, I'm sure someone mentioned they had done some work on
this on one of the KVM conference calls. Unfortunately I didn't catch
their names as my phone kept dumping me off the call. Does anyone
remember who that was?

-- 
Alex Bennée

[Qemu-devel] [PATCH v10 18/26] target-arm: make DACR banked

2014-11-06 Thread Greg Bellows

From: Fabian Aggeler aggel...@ethz.ch

When EL3 is running in AArch32 (or ARMv7 with Security Extensions),
DACR has a secure and a non-secure instance.  Also adds a definition for
DACR32_EL2.

Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org

---

v8 - v9
- Added definition for DACR32_EL2
- Changed dacr cp15 fields to uint64_t
---
 hw/arm/pxa2xx.c |  2 +-
 target-arm/cpu.h| 13 +++--
 target-arm/helper.c | 28 ++--
 3 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/hw/arm/pxa2xx.c b/hw/arm/pxa2xx.c
index 2b00b59..8967cc4 100644
--- a/hw/arm/pxa2xx.c
+++ b/hw/arm/pxa2xx.c
@@ -276,7 +276,7 @@ static void pxa2xx_pwrmode_write(CPUARMState *env, const ARMCPRegInfo *ri,
 s->cpu->env.cp15.sctlr_ns = 0;
 s->cpu->env.cp15.c1_coproc = 0;
 s->cpu->env.cp15.ttbr0_el[1] = 0;
-s->cpu->env.cp15.c3 = 0;
+s->cpu->env.cp15.dacr_ns = 0;
 s->pm_regs[PSSR >> 2] |= 0x8; /* Set STS */
 s->pm_regs[RCSR >> 2] |= 0x8; /* Set GPR */
 
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 69a2079..0609ccc 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -227,8 +227,17 @@ typedef struct CPUARMState {
 TCR tcr_el[4];
 uint32_t c2_data; /* MPU data cachable bits.  */
 uint32_t c2_insn; /* MPU instruction cachable bits.  */
-uint32_t c3; /* MMU domain access control register
-MPU write buffer control.  */
+union { /* MMU domain access control register
+ * MPU write buffer control.
+ */
+struct {
+uint64_t dacr_ns;
+uint64_t dacr_s;
+};
+struct {
+uint64_t dacr32_el2;
+};
+};
 uint32_t pmsav5_data_ap; /* PMSAv5 MPU data access permissions */
 uint32_t pmsav5_insn_ap; /* PMSAv5 MPU insn access permissions */
 uint64_t hcr_el2; /* Hypervisor configuration register */
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 53a1859..dbfa6bb 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -440,10 +440,12 @@ static const ARMCPRegInfo not_v8_cp_reginfo[] = {
  * definitions that don't use CP_ANY wildcards (mostly in v8_cp_reginfo[]).
  */
 /* MMU Domain access control / MPU write buffer control */
-{ .name = "DACR", .cp = 15,
-  .crn = 3, .crm = CP_ANY, .opc1 = CP_ANY, .opc2 = CP_ANY,
-  .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.c3),
-  .resetvalue = 0, .writefn = dacr_write, .raw_writefn = raw_write, },
+{ .name = "DACR",
+  .cp = 15, .opc1 = CP_ANY, .crn = 3, .crm = CP_ANY, .opc2 = CP_ANY,
+  .access = PL1_RW, .resetvalue = 0,
+  .writefn = dacr_write, .raw_writefn = raw_write,
+  .bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.dacr_s),
+ offsetoflow32(CPUARMState, cp15.dacr_ns) } },
 /* ??? This covers not just the impdef TLB lockdown registers but also
  * some v7VMSA registers relating to TEX remap, so it is overly broad.
  */
@@ -2257,10 +2259,11 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
 { .name = "DCCISW", .cp = 15, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 2,
   .type = ARM_CP_NOP, .access = PL1_W },
 /* MMU Domain access control / MPU write buffer control */
-{ .name = "DACR", .cp = 15,
-  .opc1 = 0, .crn = 3, .crm = 0, .opc2 = 0,
-  .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.c3),
-  .resetvalue = 0, .writefn = dacr_write, .raw_writefn = raw_write, },
+{ .name = "DACR", .cp = 15, .opc1 = 0, .crn = 3, .crm = 0, .opc2 = 0,
+  .access = PL1_RW, .resetvalue = 0,
+  .writefn = dacr_write, .raw_writefn = raw_write,
+  .bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.dacr_s),
+ offsetoflow32(CPUARMState, cp15.dacr_ns) } },
 { .name = "ELR_EL1", .state = ARM_CP_STATE_AA64,
   .type = ARM_CP_NO_MIGRATE,
   .opc0 = 3, .opc1 = 0, .crn = 4, .crm = 0, .opc2 = 1,
@@ -2330,6 +2333,11 @@ static const ARMCPRegInfo v8_el2_cp_reginfo[] = {
   .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 0,
   .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.hcr_el2),
   .writefn = hcr_write },
+{ .name = "DACR32_EL2", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 4, .crn = 3, .crm = 0, .opc2 = 0,
+  .access = PL2_RW, .resetvalue = 0,
+  .writefn = dacr_write, .raw_writefn = raw_write,
+  .fieldoffset = offsetof(CPUARMState, cp15.dacr32_el2) },
 { .name = "ELR_EL2", .state = ARM_CP_STATE_AA64,
   .type = ARM_CP_NO_MIGRATE,
   .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 0, .opc2 = 1,
@@ -4518,7 +4526,7 @@ static int get_phys_addr_v5(CPUARMState *env, uint32_t address, int access_type,
 desc = ldl_phys(cs->as, table);
 type = (desc & 3);
 domain = (desc >> 5) & 0x0f;
-domain_prot = (env->cp15.c3 >> (domain * 2)) & 3;
+

[Qemu-devel] [PATCH v10 16/26] target-arm: make TTBR0/1 banked

2014-11-06 Thread Greg Bellows

From: Fabian Aggeler aggel...@ethz.ch

Adds secure and non-secure bank register suport for TTBR0 and TTBR1.
Changes include adding secure and non-secure instances of ttbr0 and ttbr1 as
well as a CP register definition for TTBR0_EL3.  Added a union containing
both EL based array fields and secure and non-secure fields mapped to them.
Updated accesses to use A32_BANKED_CURRENT_REG_GET macro.
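
The union layout described above can be sketched in isolation. This is a
minimal standalone illustration, not the patch itself: the struct name
cp15_sketch and the helper ttbr0_for_state() are hypothetical, and only the
field names ttbr0_el, ttbr0_ns and ttbr0_s are taken from the patch. It
assumes a C11 compiler (anonymous structs and unions, _Static_assert).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cp15 part of CPUARMState. */
struct cp15_sketch {
    union { /* MMU translation table base 0. */
        struct {
            uint64_t _unused_ttbr0_0;
            uint64_t ttbr0_ns;        /* overlays ttbr0_el[1] */
            uint64_t _unused_ttbr0_1;
            uint64_t ttbr0_s;         /* overlays ttbr0_el[3] */
        };
        uint64_t ttbr0_el[4];
    };
};

/* The banked names must share storage with the EL-indexed slots. */
_Static_assert(offsetof(struct cp15_sketch, ttbr0_ns) ==
               offsetof(struct cp15_sketch, ttbr0_el) + 1 * sizeof(uint64_t),
               "ttbr0_ns must alias ttbr0_el[1]");
_Static_assert(offsetof(struct cp15_sketch, ttbr0_s) ==
               offsetof(struct cp15_sketch, ttbr0_el) + 3 * sizeof(uint64_t),
               "ttbr0_s must alias ttbr0_el[3]");

/* Hypothetical helper: pick the instance for a given security state. */
static inline uint64_t ttbr0_for_state(const struct cp15_sketch *c, bool secure)
{
    return secure ? c->ttbr0_s : c->ttbr0_ns;
}
```

With this layout, a write through ttbr0_el[1] (as in the pxa2xx.c hunk) is
visible through ttbr0_ns, and ttbr0_el[3] shares storage with ttbr0_s; the
padding members only keep the two views aligned.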

Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org

---

v8 -> v9
- Fixed naming of TTBR0/1 and defined with opc0 instead of cp
- Removed get_phys_addr_lpae() AArch64 EL3 support.
- Remove stray whitespace change in pxa2xx.c

v5 -> v6
- Changed _el field variants to be array based
- Merged TTBR# and TTBR#_EL1 reginfo entries
- Globally replace Aarch# with AArch#
---
 hw/arm/pxa2xx.c |  2 +-
 target-arm/cpu.h| 20 ++--
 target-arm/helper.c | 37 +
 3 files changed, 44 insertions(+), 15 deletions(-)

diff --git a/hw/arm/pxa2xx.c b/hw/arm/pxa2xx.c
index 11d51af..2b00b59 100644
--- a/hw/arm/pxa2xx.c
+++ b/hw/arm/pxa2xx.c
@@ -275,7 +275,7 @@ static void pxa2xx_pwrmode_write(CPUARMState *env, const ARMCPRegInfo *ri,
 s->cpu->env.daif = PSTATE_A | PSTATE_F | PSTATE_I;
 s->cpu->env.cp15.sctlr_ns = 0;
 s->cpu->env.cp15.c1_coproc = 0;
-s->cpu->env.cp15.ttbr0_el1 = 0;
+s->cpu->env.cp15.ttbr0_el[1] = 0;
 s->cpu->env.cp15.c3 = 0;
 s->pm_regs[PSSR >> 2] |= 0x8; /* Set STS */
 s->pm_regs[RCSR >> 2] |= 0x8; /* Set GPR */
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 67f319f..772d4e5 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -199,8 +199,24 @@ typedef struct CPUARMState {
 uint32_t c1_xscaleauxcr; /* XScale auxiliary control register.  */
 uint64_t sder; /* Secure debug enable register. */
 uint32_t nsacr; /* Non-secure access control register. */
-uint64_t ttbr0_el1; /* MMU translation table base 0. */
-uint64_t ttbr1_el1; /* MMU translation table base 1. */
+union { /* MMU translation table base 0. */
+struct {
+uint64_t _unused_ttbr0_0;
+uint64_t ttbr0_ns;
+uint64_t _unused_ttbr0_1;
+uint64_t ttbr0_s;
+};
+uint64_t ttbr0_el[4];
+};
+union { /* MMU translation table base 1. */
+struct {
+uint64_t _unused_ttbr1_0;
+uint64_t ttbr1_ns;
+uint64_t _unused_ttbr1_1;
+uint64_t ttbr1_s;
+};
+uint64_t ttbr1_el[4];
+};
 uint64_t c2_control; /* MMU translation table base control.  */
 uint32_t c2_mask; /* MMU translation table base selection mask.  */
 uint32_t c2_base_mask; /* MMU translation table base 0 mask. */
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 5377480..b75a394 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -1646,13 +1646,15 @@ static const ARMCPRegInfo vmsa_cp_reginfo[] = {
   .access = PL1_RW,
   .fieldoffset = offsetof(CPUARMState, cp15.esr_el[1]), .resetvalue = 0, },
 { .name = "TTBR0_EL1", .state = ARM_CP_STATE_BOTH,
-  .opc0 = 3, .crn = 2, .crm = 0, .opc1 = 0, .opc2 = 0,
-  .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.ttbr0_el1),
-  .writefn = vmsa_ttbr_write, .resetvalue = 0 },
+  .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 0,
+  .access = PL1_RW, .writefn = vmsa_ttbr_write, .resetvalue = 0,
+  .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr0_s),
+ offsetof(CPUARMState, cp15.ttbr0_ns) } },
 { .name = "TTBR1_EL1", .state = ARM_CP_STATE_BOTH,
-  .opc0 = 3, .crn = 2, .crm = 0, .opc1 = 0, .opc2 = 1,
-  .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.ttbr1_el1),
-  .writefn = vmsa_ttbr_write, .resetvalue = 0 },
+  .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 1,
+  .access = PL1_RW, .writefn = vmsa_ttbr_write, .resetvalue = 0,
+  .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr1_s),
+ offsetof(CPUARMState, cp15.ttbr1_ns) } },
 { .name = "TCR_EL1", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .crn = 2, .crm = 0, .opc1 = 0, .opc2 = 2,
   .access = PL1_RW, .writefn = vmsa_tcr_el1_write,
@@ -1883,11 +1885,13 @@ static const ARMCPRegInfo lpae_cp_reginfo[] = {
   .fieldoffset = offsetof(CPUARMState, cp15.par_el1), .resetvalue = 0 },
 { .name = "TTBR0", .cp = 15, .crm = 2, .opc1 = 0,
   .access = PL1_RW, .type = ARM_CP_64BIT | ARM_CP_NO_MIGRATE,
-  .fieldoffset = offsetof(CPUARMState, cp15.ttbr0_el1),
+  .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr0_s),
+ offsetof(CPUARMState, cp15.ttbr0_ns) },
   .writefn = vmsa_ttbr_write, .resetfn = arm_cp_reset_ignore },
 { .name = "TTBR1", .cp = 15, .crm = 2, .opc1 = 1,

[Qemu-devel] [PATCH v10 19/26] target-arm: make IFSR banked

2014-11-06 Thread Greg Bellows

From: Fabian Aggeler aggel...@ethz.ch

When EL3 is running in AArch32 (or ARMv7 with Security Extensions)
IFSR has a secure and a non-secure instance.  Adds IFSR32_EL2 definition and
storage.
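
How a banked write selects one of the two instances can be sketched as
follows. This is a simplified illustration under stated assumptions, not
QEMU's macro: the SKETCH_ macros, struct env_sketch and its secure flag are
hypothetical, and the real A32_BANKED_CURRENT_REG_SET is expected to derive
the security state from the CPU state rather than from a stored flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced CPU state: only the banked IFSR and a security flag. */
struct env_sketch {
    bool secure;            /* assumption: stands in for the real state check */
    struct {
        uint64_t ifsr_ns;   /* non-secure instance */
        uint64_t ifsr_s;    /* secure instance */
    } cp15;
};

/* Token pasting picks <regname>_s or <regname>_ns from the security state. */
#define SKETCH_BANKED_REG_SET(env, regname, is_secure, val)   \
    do {                                                      \
        if (is_secure) {                                      \
            (env)->cp15.regname##_s = (val);                  \
        } else {                                              \
            (env)->cp15.regname##_ns = (val);                 \
        }                                                     \
    } while (0)

#define SKETCH_BANKED_REG_GET(env, regname, is_secure)        \
    ((is_secure) ? (env)->cp15.regname##_s : (env)->cp15.regname##_ns)

/* Record a prefetch-abort status in the instance for the current state. */
static inline void sketch_set_ifsr(struct env_sketch *env, uint32_t fsr)
{
    SKETCH_BANKED_REG_SET(env, ifsr, env->secure, fsr);
}
```

A write made while the flag is clear lands only in ifsr_ns and leaves ifsr_s
untouched, which is the property the banked definition is meant to provide.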

Signed-off-by: Fabian Aggeler aggel...@ethz.ch
Signed-off-by: Greg Bellows greg.bell...@linaro.org

---

v8 -> v9
- Added definition for IFSR32_EL2
- Changed ifsr cp15 fields to uint64_t
---
 target-arm/cpu.h| 10 +-
 target-arm/helper.c | 13 +
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 0609ccc..c271ab2 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -242,7 +242,15 @@ typedef struct CPUARMState {
 uint32_t pmsav5_insn_ap; /* PMSAv5 MPU insn access permissions */
 uint64_t hcr_el2; /* Hypervisor configuration register */
 uint64_t scr_el3; /* Secure configuration register.  */
-uint32_t ifsr_el2; /* Fault status registers.  */
+union { /* Fault status registers.  */
+struct {
+uint64_t ifsr_ns;
+uint64_t ifsr_s;
+};
+struct {
+uint64_t ifsr32_el2;
+};
+};
 uint64_t esr_el[4];
 uint32_t c6_region[8]; /* MPU base/size registers.  */
 uint64_t far_el[4]; /* Fault address registers.  */
diff --git a/target-arm/helper.c b/target-arm/helper.c
index dbfa6bb..f47748b 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -1654,8 +1654,9 @@ static const ARMCPRegInfo vmsa_cp_reginfo[] = {
   .fieldoffset = offsetoflow32(CPUARMState, cp15.esr_el[1]),
   .resetfn = arm_cp_reset_ignore, },
 { .name = "IFSR", .cp = 15, .crn = 5, .crm = 0, .opc1 = 0, .opc2 = 1,
-  .access = PL1_RW,
-  .fieldoffset = offsetof(CPUARMState, cp15.ifsr_el2), .resetvalue = 0, },
+  .access = PL1_RW, .resetvalue = 0,
+  .bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.ifsr_s),
+ offsetoflow32(CPUARMState, cp15.ifsr_ns) } },
 { .name = "ESR_EL1", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .crn = 5, .crm = 2, .opc1 = 0, .opc2 = 0,
   .access = PL1_RW,
@@ -2347,6 +2348,10 @@ static const ARMCPRegInfo v8_el2_cp_reginfo[] = {
   .type = ARM_CP_NO_MIGRATE,
   .opc0 = 3, .opc1 = 4, .crn = 5, .crm = 2, .opc2 = 0,
   .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.esr_el[2]) },
+{ .name = "IFSR32_EL2", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 4, .crn = 5, .crm = 0, .opc2 = 1,
+  .access = PL2_RW, .resetvalue = 0,
+  .fieldoffset = offsetof(CPUARMState, cp15.ifsr32_el2) },
 { .name = "FAR_EL2", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 4, .crn = 6, .crm = 0, .opc2 = 0,
   .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.far_el[2]) },
@@ -4324,11 +4329,11 @@ void arm_cpu_do_interrupt(CPUState *cs)
 env->exception.fsr = 2;
 /* Fall through to prefetch abort.  */
 case EXCP_PREFETCH_ABORT:
-env->cp15.ifsr_el2 = env->exception.fsr;
+A32_BANKED_CURRENT_REG_SET(env, ifsr, env->exception.fsr);
 env->cp15.far_el[1] = deposit64(env->cp15.far_el[1], 32, 32,
 env->exception.vaddress);
 qemu_log_mask(CPU_LOG_INT, "...with IFSR 0x%x IFAR 0x%x\n",
-  env->cp15.ifsr_el2, (uint32_t)env->exception.vaddress);
+  env->exception.fsr, (uint32_t)env->exception.vaddress);
 new_mode = ARM_CPU_MODE_ABT;
 addr = 0x0c;
 mask = CPSR_A | CPSR_I;
-- 
1.8.3.2
