from:"Konrad Rzeszutek Wilk"

Re: [Xen-devel] [PATCH v0 RFC 0/2] xl/libxl support for PVUSB

2014-11-10 Thread Konrad Rzeszutek Wilk

On Mon, Nov 10, 2014 at 01:37:44AM -0700, Chun Yan Liu wrote:
 Is there any progress on this work? I didn't see new version after this.
 Anyone knows the status?

I believe Olaf and Juergen were looking at this for Xen 4.6?

CC-ing them.
 
 Thanks,
 Chunyan
 
  On 8/11/2014 at 04:23 AM, in message
 1407702234-22309-1-git-send-email-caobosi...@gmail.com, Bo Cao
 caobosi...@gmail.com wrote: 
  Finally I have a workable version xl/libxl support for PVUSB. Most of 
  its commands work property now, but there are still some probelm to be
  solved. 
  Please take a loot and give me some advices.
   
  == What have been implemented ? == 
  I have implemented libxl functions for PVUSB in libxl_usb.c. It mainly  
  consists of two part: 
  usbctrl_add/remove/list and usb_add/remove/list in which usbctrl denote usb 
   
  controller in which 
  usd device can be plugged in. I don't use ao_dev in  
  libxl_deivce_usbctrl_add since we don't need to 
  execute hotplug script for usbctrl and without ao_dev, adding default  
  usbctrl for usb device 
  would be easier. 
   
  For the cammands to manipulate usb device such as xl usb-attach and xl  
  usb-detach, this patch now only 
  support to specify usb devices by their interface in sysfs. Using this  
  interface, we can read usb device 
  information through sysfs and bind/unbind usb device. (The support for  
  mapping the lsusb bus:addr to the 
  sysfs usb interface will come later). 
   
  == What needs to do next ? == 
  There are two main problems to be solved. 
   
  1.  PVUSB Options in VM Guest's Configuration File 
  The interface in VM Guest's configuration file to add usb device is:  
  usb=[interface=1-1]. 
  But the problem is now is that after the default usbctrl is added, the 
  state  
  of usbctrl is 2, e,g, XenbusStateInitWait, 
  waiting for xen-usbfront to connect. The xen-usbfront in VM Guest isn't  
  loaded. Therefore, sysfs_intf_write 
  will report error. Does anyone have any clue how to solve this? 
   
  2. sysfs_intf_write 
  In the process of xl usb-attach domid intf=1-1, after writing 1-1 
  to  
  Xenstore entry, we need to 
  bind the controller of this usb device to usbback driver so that it can be  
  used by VM Guest. For exampele, 
  for usb device 1-1, it's controller interface maybe 1-1:1.0, and we  
  write this value to /sys/bus/usb/driver/usbback/bind. 
  But for some devices, they have two controllers, for example 1-1:1.0 and  
  1-1:1.1. I think this means it has two functions, 
  such as usbhid and usb-storage. So in this case, we bind the two controller 
   
  to usbback? 
   
   
  There maybe some errors or bugs in the codes. Feel free to tell me. 
   
  Cheers, 
   
  - Simon 
   
  --- 
  CC: George Dunlap george.dun...@eu.citrix.com 
  CC: Ian Jackson ian.jack...@citrix.com 
  CC: Ian Campbell ian.campb...@citrix.com 
  CC: Pasi Kärkkäinen pa...@iki.fi 
  CC: Lars Kurth lars.ku...@citrix.com 
   
   
   
  ___ 
  Xen-devel mailing list 
  Xen-devel@lists.xen.org 
  http://lists.xen.org/xen-devel 
   
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] PCI passthrough (pci-attach) to HVM guests bug (BAR64 addresses are bogus)

2014-11-11 Thread Konrad Rzeszutek Wilk

On Mon, Nov 10, 2014 at 04:32:48PM -0500, Konrad Rzeszutek Wilk wrote:
 On Mon, Nov 10, 2014 at 01:07:20PM -0500, Konrad Rzeszutek Wilk wrote:
  On Mon, Nov 10, 2014 at 05:42:32PM +, David Vrabel wrote:
   On 10/11/14 17:32, Konrad Rzeszutek Wilk wrote:
Hey,

With Xen 4.5 (today's staging), when I boot a guest and then do 
pci-attach
the BARs values are corrupt.
 
 I can reproduce this with Xen 4.4, Xen 4.3 and Xen 4.1.
 
 A bit digging in and I realized that:
 
 (XEN) memory_map:add: dom1 gfn=f4000 mfn=d8000 nr=4000 [64M]
 (XEN) AMD-Vi: update_paging_mode Try to access pdev_list without aquiring 
 pcidevs_lock.
 (XEN) memory_map:add: dom1 gfn=f8000 mfn=fc000 nr=2000 [32M]
 (XEN) ioport_map:add: dom1 gport=1000 mport=c000 nr=80
 (XEN) AMD-Vi: Disable: device id = 0x500, domain = 0, paging mode = 3
 (XEN) AMD-Vi: Setup I/O page table: device id = 0x500, type = 0x1, root table 
 = 0x228b02000, domain = 1, paging mode = 3
 
 The sizes are my own editing. This means QEMU is putting the
 devices in the MMIO region - and doing it succesfully. But then:
 
   
   
[  152.572965] pci :00:04.0: BAR 1: no space for [mem size 
0x0800 64bit  pref]
  [  152.518320] pci :00:04.0: reg 0x14: [mem 0x-0x07ff 64bit 
  pref]
 
 .. The guest computes the right size for them, but reads the wrong BAR value
 that was set by QEMU and also created in the hypervisor.
 
 Perhaps this is Linux kernel being on fritz. Will try another kernel.

I figured this out.


When we pass in the device at bootup, the hvmloader does:

(d4) pci dev 05:0 bar 14 size 00800: 0e00c
(d4) pci dev 05:0 bar 1c size 00400: 0e80c
(d4) pci dev 05:0 bar 10 size 00200: 0ec00
(d4) pci dev 05:0 bar 24 size 00080: 0c201

That is - it finds the size, and then it sets the BARs to fit within
the MMIO region. QEMU is not involved in this.

When we PCI insert an device, the BARs are not set at all - and hence
the Linux kernel is the one that tries to set the BARs in. The
reason it cannot fit the device in the MMIO region is due to the
_CRS only having certain ranges (even thought the MMIO region can
cover 2GB). See:

Without any devices (and me doing PCI insertion after that):
# dmesg | grep bus resource
[0.366000] pci_bus :00: root bus resource [bus 00-ff]
[0.366000] pci_bus :00: root bus resource [io  0x-0x0cf7]
[0.366000] pci_bus :00: root bus resource [io  0x0d00-0x]
[0.366000] pci_bus :00: root bus resource [mem 0x000a-0x000b]
[0.366000] pci_bus :00: root bus resource [mem 0xf000-0xfbff]

With the device (my GPU card) inserted so that hvmloader can enumerate it:
 dmesg | grep 'resource' 
[0.455006] pci_bus :00: root bus resource [bus 00-ff]
[0.459006] pci_bus :00: root bus resource [io  0x-0x0cf7]
[0.462006] pci_bus :00: root bus resource [io  0x0d00-0x]
[0.466006] pci_bus :00: root bus resource [mem 0x000a-0x000b]
[0.469006] pci_bus :00: root bus resource [mem 0xe000-0xfbff]

I chatted with Bjorn and Rafeal on IRC about how PCI insertion works
on baremetal and it sounds like Thunderbolt device insertion is an
interesting case. The SMM sets the BAR regions to fit within the MMIO
(which is advertised by the _CRS) and it then pokes the OS to enumerate
the BARs. The OS is free to use what the firmware has set or renumber
it. The end result is that since the SMM 'fits' the BAR inside the
pre-set _CRS window it all works. We do not do that.

The two ways I could think of making this work are:
 - QEMU tracks BAR enumeration. When a new device is inserted it would
   set the BAR to fit within the E820 HOLE region. If it can't
   (because the MMIO is too small) it puts it at the end of the memory.
   Naturally the 'end of the memory' part would require adding
   _CRS to cover end of GPFN to never never land. And also the _CRS
   region for the MMIO under 4GB would have to be expanded so QEMU
   can jam things in there.

 - Or add in dsdt.asl another _CRS region controlled by the hvmloader.
   This one would start at the end of GPFN + delta of maxmem - mem and
   continue to never never land. The hvmloader would just write the
   the values in the BIOS OperationRegion (0xFC00) and let the
   AML code take care of parsing it and constructing the #9 _CRS region.
   This will allow kernels who are picky about BARs not being in _CRS
   region to deal with cards that are hot-plugged past BIOS boot.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-xen-4.5] tools/hotplug: use configure --sysconfdir result

2014-11-12 Thread Konrad Rzeszutek Wilk

On Wed, Nov 12, 2014 at 11:12:06AM +, Ian Campbell wrote:
 You forgot to add the release manager... I've done that for you.
 
 In 1413279117.1497.25.ca...@citrix.com I said:
  Acked-by: Ian Campbell ian.campb...@citrix.com
  
  Is this a bug fix or a feature? What are the risks? IsLKonrad OK with
  it?

Back again to that question.

What happens if we do not take that in now but delay to Xen 4.6?

Will systemd still correctly work? It looks like it will and 
this is just an improvement that makes the code be more streamlined.

It does not fix a bug (at least that is what I see from
reading), I believe this should be deferred to Xen 4.6.


 
 On Wed, 2014-11-12 at 12:06 +0100, Olaf Hering wrote:
  Ping?
  
  
   ... instead of hardcoding values and guess where they config files may
   be. Also use the result of --with-sysconfig-leaf-dir.
   
   Signed-off-by: Olaf Hering o...@aepfle.de
   Cc: Ian Campbell ian.campb...@citrix.com
   Cc: Ian Jackson ian.jack...@eu.citrix.com
   Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com
   Cc: Wei Liu wei.l...@citrix.com
   ---
tools/hotplug/Linux/init.d/xencommons.in  | 6 
   +-
tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in| 3 +--
tools/hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service.in | 3 +--
tools/hotplug/Linux/systemd/xenconsoled.service.in| 3 +--
tools/hotplug/Linux/systemd/xenstored.service.in  | 3 +--
tools/hotplug/Linux/xendomains.in | 6 
   +-
6 files changed, 6 insertions(+), 18 deletions(-)
   
   diff --git a/tools/hotplug/Linux/init.d/xencommons.in 
   b/tools/hotplug/Linux/init.d/xencommons.in
   index d53a1f3..a1095c2 100644
   --- a/tools/hotplug/Linux/init.d/xencommons.in
   +++ b/tools/hotplug/Linux/init.d/xencommons.in
   @@ -23,11 +23,7 @@ BACKEND_MODULES=@LINUX_BACKEND_MODULES@

. @XEN_SCRIPT_DIR@/hotplugpath.sh

   -if [ -d /etc/sysconfig ]; then
   - xencommons_config=/etc/sysconfig
   -else
   - xencommons_config=/etc/default
   -fi
   +xencommons_config=@CONFIG_DIR@/@CONFIG_LEAF_DIR@

test -f $xencommons_config/xencommons  . $xencommons_config/xencommons

   diff --git a/tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in 
   b/tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in
   index 44dfce8..1e930ed 100644
   --- a/tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in
   +++ b/tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in
   @@ -5,8 +5,7 @@ RefuseManualStop=true

[Mount]
Environment=XENSTORED_MOUNT_CTX=none
   -EnvironmentFile=-/etc/sysconfig/xenstored
   -EnvironmentFile=-/etc/default/xenstored
   +EnvironmentFile=-@CONFIG_DIR@/@CONFIG_LEAF_DIR@/xenstored
What=xenstore
Where=@XEN_LIB_STORED@
Type=tmpfs
   diff --git 
   a/tools/hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service.in 
   b/tools/hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service.in
   index d3470fc..2282923 100644
   --- a/tools/hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service.in
   +++ b/tools/hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service.in
   @@ -8,8 +8,7 @@ ConditionVirtualization=xen

[Service]
Type=simple
   -EnvironmentFile=-/etc/default/xenstored
   -EnvironmentFile=-/etc/sysconfig/xenstored
   +EnvironmentFile=-@CONFIG_DIR@/@CONFIG_LEAF_DIR@/xenstored
PIDFile=@XEN_RUN_DIR@/qemu-dom0.pid
ExecStartPre=/bin/grep -q control_d /proc/xen/capabilities
ExecStartPre=/bin/mkdir -p @XEN_RUN_DIR@
   diff --git a/tools/hotplug/Linux/systemd/xenconsoled.service.in 
   b/tools/hotplug/Linux/systemd/xenconsoled.service.in
   index 7ca0264..377f131 100644
   --- a/tools/hotplug/Linux/systemd/xenconsoled.service.in
   +++ b/tools/hotplug/Linux/systemd/xenconsoled.service.in
   @@ -9,8 +9,7 @@ Type=simple
Environment=XENCONSOLED_ARGS=
Environment=XENCONSOLED_LOG=none
Environment=XENCONSOLED_LOG_DIR=@XEN_LOG_DIR@/console
   -EnvironmentFile=-/etc/default/xenconsoled
   -EnvironmentFile=-/etc/sysconfig/xenconsoled
   +EnvironmentFile=-@CONFIG_DIR@/@CONFIG_LEAF_DIR@/xenconsoled
PIDFile=@XEN_RUN_DIR@/xenconsoled.pid
ExecStartPre=/bin/grep -q control_d /proc/xen/capabilities
ExecStartPre=/bin/mkdir -p ${XENCONSOLED_LOG_DIR}
   diff --git a/tools/hotplug/Linux/systemd/xenstored.service.in 
   b/tools/hotplug/Linux/systemd/xenstored.service.in
   index 013e69e..f85b37d 100644
   --- a/tools/hotplug/Linux/systemd/xenstored.service.in
   +++ b/tools/hotplug/Linux/systemd/xenstored.service.in
   @@ -11,8 +11,7 @@ Type=notify
Environment=XENSTORED_ARGS=
Environment=XENSTORED_ROOTDIR=@XEN_LIB_STORED@
Environment=XENSTORED=@XENSTORED@
   -EnvironmentFile=-/etc/default/xencommons
   -EnvironmentFile=-/etc/sysconfig/xencommons
   +EnvironmentFile=-@CONFIG_DIR@/@CONFIG_LEAF_DIR@/xencommons
ExecStartPre=/bin/grep -q control_d /proc/xen/capabilities
ExecStartPre=-/bin/rm -f

Re: [Xen-devel] [PATCH for-4.5] libxl: add missing action in DEFINE_DEVICE_ADD

2014-11-12 Thread Konrad Rzeszutek Wilk

On Wed, Nov 12, 2014 at 10:56:24AM +, Ian Campbell wrote:
 On Wed, 2014-11-12 at 10:39 +, Wei Liu wrote:
  ... otherwise when device add operation fails, the error message looks
  like libxl: error: libxl.c:1897:device_addrm_aocomplete: unable to (null)
  device, which is not very helpful.
 
 Thanks.
  
  Signed-off-by: Wei Liu wei.l...@citrix.com
 
 Acked-by: Ian Campbell ian.campb...@citrix.com

Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 
  Cc: Ian Jackson ian.jack...@eu.citrix.com
  ---
   tools/libxl/libxl.c |1 +
   1 file changed, 1 insertion(+)
  
  diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
  index f7961f6..de23fec 100644
  --- a/tools/libxl/libxl.c
  +++ b/tools/libxl/libxl.c
  @@ -4164,6 +4164,7 @@ DEFINE_DEVICE_REMOVE(vtpm, destroy, 1)
   \
   GCNEW(aodev);   \
   libxl__prepare_ao_device(ao, aodev);\
  +aodev-action = LIBXL__DEVICE_ACTION_ADD;   \
   aodev-callback = device_addrm_aocomplete;  \
   aodev-update_json = true;  \
   libxl__device_##type##_add(egc, domid, type, aodev);\
 
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] Adjust number of domains in cpupools when destroying domain

2014-11-12 Thread Konrad Rzeszutek Wilk

On Wed, Nov 12, 2014 at 11:11:15AM +, George Dunlap wrote:
 On Wed, Nov 12, 2014 at 11:10 AM, Juergen Gross jgr...@suse.com wrote:
  Commit bac6334b51d9bcfe57ecf4a4cb5288348fcf044a (move domain to
  cpupool0 before destroying it) introduced an error in the accounting
  of cpupools regarding the number of domains. The number of domains
  is nor adjusted when a domain is moved to cpupool0 in kill_domain().
 
  Correct this by introducing a cpupool function doing the move
  instead of open coding it by calling sched_move_domain().
 
  Signed-off-by: Juergen Gross jgr...@suse.com
  Tested-by: Dietmar Hahn dietmar.h...@ts.fujitsu.com
  Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
 
 Acked-by: George Dunlap george.dun...@eu.citrix.com

Excellent!

Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com

 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] Adjust number of domains in cpupools when destroying domain

2014-11-12 Thread Konrad Rzeszutek Wilk

On Wed, Nov 12, 2014 at 12:10:02PM +0100, Juergen Gross wrote:
 Commit bac6334b51d9bcfe57ecf4a4cb5288348fcf044a (move domain to
 cpupool0 before destroying it) introduced an error in the accounting
 of cpupools regarding the number of domains. The number of domains
 is nor adjusted when a domain is moved to cpupool0 in kill_domain().

s/nor/not/
 
 Correct this by introducing a cpupool function doing the move
 instead of open coding it by calling sched_move_domain().
 
 Signed-off-by: Juergen Gross jgr...@suse.com
 Tested-by: Dietmar Hahn dietmar.h...@ts.fujitsu.com
 Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
 ---
  xen/common/cpupool.c| 47 +--
  xen/common/domain.c |  2 +-
  xen/include/xen/sched.h |  1 +
  3 files changed, 35 insertions(+), 15 deletions(-)
 
 diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
 index 73249d3..a758a8b 100644
 --- a/xen/common/cpupool.c
 +++ b/xen/common/cpupool.c
 @@ -225,6 +225,35 @@ static int cpupool_destroy(struct cpupool *c)
  }
  
  /*
 + * Move domain to another cpupool
 + */
 +static int cpupool_move_domain_locked(struct domain *d, struct cpupool *c)
 +{
 +int ret;
 +
 +d-cpupool-n_dom--;
 +ret = sched_move_domain(d, c);
 +if ( ret )
 +d-cpupool-n_dom++;
 +else
 +c-n_dom++;
 +
 +return ret;
 +}

\n ?

 +int cpupool_move_domain(struct domain *d, struct cpupool *c)
 +{
 +int ret;
 +
 +spin_lock(cpupool_lock);
 +
 +ret = cpupool_move_domain_locked(d, c);
 +
 +spin_unlock(cpupool_lock);
 +
 +return ret;
 +}
 +
 +/*
   * assign a specific cpu to a cpupool
   * cpupool_lock must be held
   */
 @@ -338,14 +367,9 @@ static int cpupool_unassign_cpu(struct cpupool *c, 
 unsigned int cpu)
  ret = -EBUSY;
  break;
  }
 -c-n_dom--;
 -ret = sched_move_domain(d, cpupool0);
 +ret = cpupool_move_domain_locked(d, cpupool0);
  if ( ret )
 -{
 -c-n_dom++;
  break;
 -}
 -cpupool0-n_dom++;
  }
  rcu_read_unlock(domlist_read_lock);
  if ( ret )
 @@ -613,16 +637,11 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op)
  d-domain_id, op-cpupool_id);
  ret = -ENOENT;
  spin_lock(cpupool_lock);
 +
  c = cpupool_find_by_id(op-cpupool_id);
  if ( (c != NULL)  cpumask_weight(c-cpu_valid) )
 -{
 -d-cpupool-n_dom--;
 -ret = sched_move_domain(d, c);
 -if ( ret )
 -d-cpupool-n_dom++;
 -else
 -c-n_dom++;
 -}
 +ret = cpupool_move_domain_locked(d, c);
 +
  spin_unlock(cpupool_lock);
  cpupool_dprintk(cpupool move_domain(dom=%d)-pool=%d ret %d\n,
  d-domain_id, op-cpupool_id, ret);
 diff --git a/xen/common/domain.c b/xen/common/domain.c
 index a3f51ec..4a62c1d 100644
 --- a/xen/common/domain.c
 +++ b/xen/common/domain.c
 @@ -621,7 +621,7 @@ int domain_kill(struct domain *d)
  rc = -EAGAIN;
  break;
  }
 -if ( sched_move_domain(d, cpupool0) )
 +if ( cpupool_move_domain(d, cpupool0) )
  return -EAGAIN;
  for_each_vcpu ( d, v )
  unmap_vcpu_info(v);
 diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
 index c5157e6..46fc6e3 100644
 --- a/xen/include/xen/sched.h
 +++ b/xen/include/xen/sched.h
 @@ -871,6 +871,7 @@ struct cpupool *cpupool_get_by_id(int poolid);
  void cpupool_put(struct cpupool *pool);
  int cpupool_add_domain(struct domain *d, int poolid);
  void cpupool_rm_domain(struct domain *d);
 +int cpupool_move_domain(struct domain *d, struct cpupool *c);
  int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op);
  void schedule_dump(struct cpupool *c);
  extern void dump_runq(unsigned char key);
 -- 
 2.1.2
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [TestDay] Xen-4.5.0 RC1 bug: with xen_platform_pci=0 option, the guest with VT-d device fails to boot up and qemu-xen

2014-11-12 Thread Konrad Rzeszutek Wilk

 Basic root-causing log:
 --
 [root@vt-hsw1 carl]# xl cr xlexample.hvm 
 Parsing config from xlexample.hvm
 libxl: error: libxl_qmp.c:287:qmp_handle_error_response: received an error
 message from QMP server: Unsupported bus. Bus doesn't have property
 'acpi-pcihp-bsel' set
 libxl: error: libxl_create.c:1385:domcreate_attach_pci: libxl_device_pci_add
 failed: -3

I can reproduce this.

If you use device_model_version = 'qemu-xen-traditional' the problem
go aways and you can boot the guest without Xen PCI platform driver.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-xen-4.5] tools/hotplug: use configure --sysconfdir result

2014-11-12 Thread Konrad Rzeszutek Wilk

On Wed, Nov 12, 2014 at 06:06:57PM +0100, Olaf Hering wrote:
 On Wed, Nov 12, Konrad Rzeszutek Wilk wrote:
 
  
  What happens if we do not take that in now but delay to Xen 4.6?
 
 I will be very unhappy...
 I mean, what exactly is the concern here?!

I need to know what the risk is if this does not go in. We
are at RC2 and I really want to make the amount of patches
that go in be a trickle.
 
  Will systemd still correctly work? It looks like it will and 
  this is just an improvement that makes the code be more streamlined.
  
  It does not fix a bug (at least that is what I see from
  reading), I believe this should be deferred to Xen 4.6.
 
 It does fix a bug. If the whole thing was configured with --prefix=X
 --sysconfdir=Y parts of it will still refer to hardcoded paths entirely
 unrelated to what was just installed with 'make install'.

How often does that happen? Do the two paths that are specified
in the code cover 99% of the use-cases?

 
 Olaf

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH V3 2/8] xen: Delay remapping memory of pv-domain

2014-11-12 Thread Konrad Rzeszutek Wilk

On Tue, Nov 11, 2014 at 06:43:40AM +0100, Juergen Gross wrote:
 Early in the boot process the memory layout of a pv-domain is changed
 to match the E820 map (either the host one for Dom0 or the Xen one)
 regarding placement of RAM and PCI holes. This requires removing memory
 pages initially located at positions not suitable for RAM and adding
 them later at higher addresses where no restrictions apply.
 
 To be able to operate on the hypervisor supported p2m list until a
 virtual mapped linear p2m list can be constructed, remapping must
 be delayed until virtual memory management is initialized, as the
 initial p2m list can't be extended unlimited at physical memory
 initialization time due to it's fixed structure.
 
 A further advantage is the reduction in complexity and code volume as
 we don't have to be careful regarding memory restrictions during p2m
 updates.
 
 Signed-off-by: Juergen Gross jgr...@suse.com
 Reviewed-by: David Vrabel david.vra...@citrix.com
 ---
  arch/x86/include/asm/xen/page.h |   1 -
  arch/x86/xen/mmu.c  |   4 +
  arch/x86/xen/p2m.c  | 149 
  arch/x86/xen/setup.c| 385 
 +++-
  arch/x86/xen/xen-ops.h  |   1 +
  5 files changed, 223 insertions(+), 317 deletions(-)
 
 diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
 index 6c16451..b475297 100644
 --- a/arch/x86/include/asm/xen/page.h
 +++ b/arch/x86/include/asm/xen/page.h
 @@ -44,7 +44,6 @@ extern unsigned long  machine_to_phys_nr;
  
  extern unsigned long get_phys_to_machine(unsigned long pfn);
  extern bool set_phys_to_machine(unsigned long pfn, unsigned long mfn);
 -extern bool __init early_set_phys_to_machine(unsigned long pfn, unsigned 
 long mfn);
  extern bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn);
  extern unsigned long set_phys_range_identity(unsigned long pfn_s,
unsigned long pfn_e);
 diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
 index a8a1a3d..d3e492b 100644
 --- a/arch/x86/xen/mmu.c
 +++ b/arch/x86/xen/mmu.c
 @@ -1223,6 +1223,10 @@ static void __init xen_pagetable_init(void)
   /* Allocate and initialize top and mid mfn levels for p2m structure */
   xen_build_mfn_list_list();
  
 + /* Remap memory freed because of conflicts with E820 map */

s/becasue of/due to
 + if (!xen_feature(XENFEAT_auto_translated_physmap))
 + xen_remap_memory();
 +
   xen_setup_shared_info();
   xen_post_allocator_init();
  }
 diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
 index fa75842..f67f8cf 100644
 --- a/arch/x86/xen/p2m.c
 +++ b/arch/x86/xen/p2m.c
 @@ -164,6 +164,7 @@
  #include linux/sched.h
  #include linux/seq_file.h
  #include linux/bootmem.h
 +#include linux/slab.h
  
  #include asm/cache.h
  #include asm/setup.h
 @@ -204,6 +205,8 @@ RESERVE_BRK(p2m_mid, PAGE_SIZE * (MAX_DOMAIN_PAGES / 
 (P2M_PER_PAGE * P2M_MID_PER
   */
  RESERVE_BRK(p2m_identity_remap, PAGE_SIZE * 2 * 3 * MAX_REMAP_RANGES);
  
 +static int use_brk = 1;
 +
  static inline unsigned p2m_top_index(unsigned long pfn)
  {
   BUG_ON(pfn = MAX_P2M_PFN);
 @@ -268,6 +271,22 @@ static void p2m_init(unsigned long *p2m)
   p2m[i] = INVALID_P2M_ENTRY;
  }
  
 +static void * __ref alloc_p2m_page(void)
 +{
 + if (unlikely(use_brk))
 + return extend_brk(PAGE_SIZE, PAGE_SIZE);
 +
 + if (unlikely(!slab_is_available()))
 + return alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE);
 +
 + return (void *)__get_free_page(GFP_KERNEL | __GFP_REPEAT);
 +}
 +
 +static void free_p2m_page(void *p)
 +{
 + free_page((unsigned long)p);
 +}
 +
  /*
   * Build the parallel p2m_top_mfn and p2m_mid_mfn structures
   *
 @@ -287,13 +306,13 @@ void __ref xen_build_mfn_list_list(void)
  
   /* Pre-initialize p2m_top_mfn to be completely missing */
   if (p2m_top_mfn == NULL) {
 - p2m_mid_missing_mfn = alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE);
 + p2m_mid_missing_mfn = alloc_p2m_page();
   p2m_mid_mfn_init(p2m_mid_missing_mfn, p2m_missing);
  
 - p2m_top_mfn_p = alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE);
 + p2m_top_mfn_p = alloc_p2m_page();
   p2m_top_mfn_p_init(p2m_top_mfn_p);
  
 - p2m_top_mfn = alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE);
 + p2m_top_mfn = alloc_p2m_page();
   p2m_top_mfn_init(p2m_top_mfn);
   } else {
   /* Reinitialise, mfn's all change after migration */
 @@ -327,7 +346,7 @@ void __ref xen_build_mfn_list_list(void)
* missing parts of the mfn tree after
* runtime.
*/
 - mid_mfn_p = alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE);
 + mid_mfn_p = alloc_p2m_page();
   p2m_mid_mfn_init(mid_mfn_p, p2m_missing);

Re: [Xen-devel] [PATCH V3 4/8] xen: Delay invalidating extra memory

2014-11-12 Thread Konrad Rzeszutek Wilk

 @@ -376,12 +374,14 @@ void __init xen_build_dynamic_phys_to_machine(void)
   unsigned long max_pfn;
   unsigned long pfn;
  
 -  if (xen_feature(XENFEAT_auto_translated_physmap))
 + if (xen_feature(XENFEAT_auto_translated_physmap))

Spurious change.

.. snip..
 diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
 index 0e5f9b6..8d5985b 100644
 --- a/arch/x86/xen/setup.c
 +++ b/arch/x86/xen/setup.c
 @@ -75,7 +75,6 @@ static unsigned long xen_remap_mfn __initdata = 
 INVALID_P2M_ENTRY;
  
  static void __init xen_add_extra_mem(u64 start, u64 size)
  {
 - unsigned long pfn;
   int i;
  
   for (i = 0; i  XEN_EXTRA_MEM_MAX_REGIONS; i++) {
 @@ -95,17 +94,74 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
   printk(KERN_WARNING Warning: not enough extra memory 
 regions\n);
  
   memblock_reserve(start, size);
 +}
  
 - xen_max_p2m_pfn = PFN_DOWN(start + size);
 - for (pfn = PFN_DOWN(start); pfn  xen_max_p2m_pfn; pfn++) {
 - unsigned long mfn = pfn_to_mfn(pfn);
 +static void __init xen_del_extra_mem(u64 start, u64 size)
 +{
 + int i;
 + u64 start_r, size_r;
  
 - if (WARN_ONCE(mfn == pfn, Trying to over-write 1-1 mapping 
 (pfn: %lx)\n, pfn))
 - continue;
 - WARN_ONCE(mfn != INVALID_P2M_ENTRY, Trying to remove %lx which 
 has %lx mfn!\n,
 -   pfn, mfn);
 + for (i = 0; i  XEN_EXTRA_MEM_MAX_REGIONS; i++) {
 + start_r = xen_extra_mem[i].start;
 + size_r = xen_extra_mem[i].size;
 +
 + /* Start of region. */
 + if (start_r == start) {
 + BUG_ON(size  size_r);
 + xen_extra_mem[i].start += size;
 + xen_extra_mem[i].size -= size;
 + break;
 + }
 + /* End of region. */
 + if (start_r + size_r == start + size) {
 + BUG_ON(size  size_r);
 + xen_extra_mem[i].size -= size;
 + break;
 + }
 + /* Mid of region. */
 + if (start  start_r  start  start_r + size_r) {
 + BUG_ON(start + size  start_r + size_r);
 + xen_extra_mem[i].size = start - start_r;
 + xen_add_extra_mem(start + size, start_r + size_r -
 +   (start + size));

Which ends up calling 'memblock_reserve' for an region it already has
reserved. Should we call memblock_free(start_r, size_r - size) before calling 
this?

Or is that not neccessary as memblock_* is pretty smart about this sort of 
thing?

 + break;
 + }
 + }
 + memblock_free(start, size);
 +}
 +

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH V3 6/8] xen: Hide get_phys_to_machine() to be able to tune common path

2014-11-12 Thread Konrad Rzeszutek Wilk

On Tue, Nov 11, 2014 at 06:43:44AM +0100, Juergen Gross wrote:
 Today get_phys_to_machine() is always called when the mfn for a pfn
 is to be obtained. Add a wrapper __pfn_to_mfn() as inline function
 to be able to avoid calling get_phys_to_machine() when possible as

s/when/where/
 soon as the switch to a linear mapped p2m list has been done.

But your inline function still calls get_phys_to_machine?


 
 Signed-off-by: Juergen Gross jgr...@suse.com
 ---
  arch/x86/include/asm/xen/page.h | 27 +--
  arch/x86/xen/mmu.c  |  2 +-
  arch/x86/xen/p2m.c  |  6 +++---
  3 files changed, 25 insertions(+), 10 deletions(-)
 
 diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
 index 28fa795..07d8a7b 100644
 --- a/arch/x86/include/asm/xen/page.h
 +++ b/arch/x86/include/asm/xen/page.h
 @@ -59,6 +59,22 @@ extern int clear_foreign_p2m_mapping(struct 
 gnttab_unmap_grant_ref *unmap_ops,
struct page **pages, unsigned int count);
  extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long 
 pfn);
  
 +/*
 + * When to use pfn_to_mfn(), __pfn_to_mfn() or get_phys_to_machine():
 + * - pfn_to_mfn() returns either INVALID_P2M_ENTRY or the mfn. In case of an
 + *   identity entry the identity indicator will be cleared.

Why don't you say : In case of identity PFN the same PFN is returned.

But you did miss that also the FOREIGN_FRAME_BIT is cleared.

 + * - __pfn_to_mfn() returns the found entry of the p2m table. A possibly set

s/of the/in the/
 + *   identity indicator will be still set. __pfn_to_mfn() is encapsulating
.. be still set if the PFN is an identity one.
 + *   get_phys_to_machine() and might skip that function if possible to speed
 + *   up the common path.

How is is skipping that function? The patch below does no such thing?

 + * - get_phys_to_machine() is basically the same as __pfn_to_mfn(), but
 + *   without any short cuts for the common fast path.

Right. Perhpas we should call it 'slow_p2m' instead of the 
'get_phys_to_machine'.

 + */
 +static inline unsigned long __pfn_to_mfn(unsigned long pfn)
 +{
 + return get_phys_to_machine(pfn);
 +}
 +

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [TestDay] VMX test report for Xen 4.5.0-rc1

2014-11-12 Thread Konrad Rzeszutek Wilk

On Wed, Nov 12, 2014 at 12:06:20PM -0400, Konrad Rzeszutek Wilk wrote:
 On Wed, Nov 12, 2014 at 06:58:49AM +, Hu, Robert wrote:
  Hi All,
  
  This is a bug summary for Xen 4.5-rc1 on Intel Server platforms.
 
 Yeey! Thank you for doing those tests.
  
  Test environment:
  Xen: Xen 4.5-rc1 
  Dom0: Linux kernel 3.17.0
  Hardware: Intel IVT-EX, Haswell-EP, BDW Client, HSW-EX, IVT-EX, HSW-UP
  
  New bugs(9):
 
  4. Not all PFs are available if assign multi VT-d devices to Wndows guest VM
http://bugzilla-archived.xenproject.org/bugzilla/show_bug.cgi?id=1896
 
 Please post the bug as an email on xen-devel so we can start the
 discussion here.
 
 
  5. Dom0 failed to resume from S3 power state
 
 Interstingly enough it works for me on my AMD box. I will shortly try
 it out with the Intel laptop (X230) (been using Xen 4.4 + 3.17.0).

I put it on laptop (Lenovo X230) and used 'pm-suspend' and as well just
closing the lid. In both cases it worked.

Does that work for you as well?

Thanks.
 
 Please post the bug as an email on xen-devel so we can start the
 discussion here.
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5] xl: correct test condition on libxl_domain_info

2014-11-12 Thread Konrad Rzeszutek Wilk

On November 12, 2014 8:01:21 PM EST, Chao Peng chao.p.p...@linux.intel.com 
wrote:
On Wed, Nov 12, 2014 at 10:22:07AM -0500, Konrad Rzeszutek Wilk wrote:
 On Wed, Nov 12, 2014 at 11:10:52AM +, Ian Campbell wrote:
  CCing the folks who signed-of-by is on the original patch
  
  On Wed, 2014-11-12 at 11:05 +, Wei Liu wrote:
   The `if' statement considered return value 0 from
libxl_domain_info an
   error, while 0 actually means success.
   
   Signed-off-by: Wei Liu wei.l...@citrix.com
  
  Acked-by: Ian Campbell ian.campb...@citrix.com
  
   Cc: Ian Jackson ian.jack...@eu.citrix.com
   ---
   This is a bug fix for PSR feature. This feature was added
recently and it's
   not an regression. However, it would be good to have it working
correctly
   since the beginning, and the fix is straightforward, which should
be of
   very low risk.
  
  Ack.
 
 I concur and I think it should go in Xen 4.5, but I would like their
 input first.

Yeah, I'd like it to go in.
Thanks Wei for your quick fix.

Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com

Chao
 
  
   ---
tools/libxl/xl_cmdimpl.c |2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
   
   diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
   index 3c9f146..9afef3f 100644
   --- a/tools/libxl/xl_cmdimpl.c
   +++ b/tools/libxl/xl_cmdimpl.c
   @@ -7908,7 +7908,7 @@ static int
psr_cmt_show_cache_occupancy(uint32_t domid)
/* Each domain */
if (domid != INVALID_DOMID) {
libxl_dominfo dominfo;
   -if (!libxl_domain_info(ctx, dominfo, domid)) {
   +if (libxl_domain_info(ctx, dominfo, domid)) {
fprintf(stderr, Failed to get domain info for
%d\n, domid);
return -1;
}
  
  
  
  ___
  Xen-devel mailing list
  Xen-devel@lists.xen.org
  http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] XenServer results from the Test Day for RC2

2014-11-13 Thread Konrad Rzeszutek Wilk

On Thu, Nov 13, 2014 at 04:02:47PM +, Andrew Cooper wrote:
 Hi,
 
 While attempting to test Konrads new interrupt injection logic, I got
 blocked behind yet another Pygrub bug caused by c/s d1b93ea
 
 This is the first time I have looked into the issue, but a cursory
 inspection of the first hunk shows that it cannot possibly be correct as
 self._default is either an integer or string.
 
 I have a possible workaround which I am testing, but cursory review of
 the patch would have shown that it cannot work as intended.
 
 Notice that self._default is now either a string or an integer,
 defaulting to an integer, and the top level code is updated to require a
 string.  As a result, any bootloader configuration which doesn't
 explicitly set a default, or still drives this logic with integers
 (ExtLinux or LiLO) will die with an AttributeError

Thank you for testing it. In case you don't get to it - and are 
preempted by other stuff - please do give me a braindump (and
preliminary patch if you have one) so I can dig into it.

Thank you!
 
 ~Andrew
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2] xen: remove Xencomm

2014-11-13 Thread Konrad Rzeszutek Wilk

On Thu, Nov 13, 2014 at 04:18:58PM +, Jan Beulich wrote:
  On 13.11.14 at 17:03, t...@xen.org wrote:
  Being a feature that has only been used by ia64 and/or ppc it
  doesn't seem like we need to keep it any longer in the tree.
  
  So remove it.
  
  Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
  Signed-off-by: Tim Deegan t...@xen.org
 
 Acked-by: Jan Beulich jbeul...@suse.com

Are folks OK if this is deferred to Xen 4.6?

Thank you!

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] kexec-ed kernel possibly needing low memory

2014-11-14 Thread Konrad Rzeszutek Wilk

On Fri, Nov 14, 2014 at 11:02:54AM +, David Vrabel wrote:
 On 14/11/14 10:00, Jan Beulich wrote:
  David,
  
  we're being approached with the situation where a disk driver in the
  kexec-ed kernel needs memory below 4G in order to perform DMA
  (e.g. for the swiotlb to be set up). Linux not so long ago invented a
  two area approach, which doesn't fit with the current single
  KEXEC_RANGE_MA_CRASH area obtainable via
  KEXEC_CMD_kexec_get_range.
  
  I see multiple options
  - do no change at all; the user can deal with this by explicitly
specifying an area below 4G via crashkernel=
 
 This is what we do.
 
  - add KEXEC_RANGE_MA_CRASH_LOW
 
 If you choose this option, it would be preferable to support N (which
 might be 2) arbitrary crash regions rather than specifying that one
 region is always low memory.  I would suggest combining this with a way
 to specify the bounds of each region.
 
 e.g., crashkernel=32M@4G,128M

We could also implement the 'autoplacement' feature that Red Hat has put
in their kexec implementation (not sure if it was upstreamed).

That would nicely deal with this by always putting it under 4GB.
 
 I wouldn't go this route unless you actually need a large crash region
 that would use up too much low memory otherwise.
 
  - when not asked for a specific address, always allocate the (single)
area below 4G if there is enough space
 
 I don't think the default location should change.  A user might have
 specified  a large crash region that might use up most of low memory.
 
  - provide a means to request allocating the (single) area below 4G
(or perhaps more generically below a certain boundary) without
requiring an exact address to be specified
 
 This sounds ok.
 
  Do you have any preference here, or do you see other viable
  alternatives?
 
 My preference would be option 1 since it already works, then option 4.
 
 David
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC] EFI: allow retry of ExitBootServices() call

2014-11-14 Thread Konrad Rzeszutek Wilk

On Fri, Nov 14, 2014 at 12:37:30PM +, Jan Beulich wrote:
 The specification is kind of vague under what conditions
 ExitBootServices() may legitimately fail, requiring the OS loader to
 retry:
 
 If MapKey value is incorrect, ExitBootServices() returns
  EFI_INVALID_PARAMETER and GetMemoryMap() with ExitBootServices() must
  be called again. Firmware implementation may choose to do a partial
  shutdown of the boot services during the first call to
  ExitBootServices(). EFI OS loader should not make calls to any boot
  service function other then GetMemoryMap() after the first call to
  ExitBootServices().
 
 While our code guarantees the map key to be valid, there are systems
 where a firmware internal notification sent while processing
 ExitBootServices() reportedly results in changes to the memory map.

s/reportedly/in fact/
 In that case, make a best effort second try: Avoid any boot service
 calls other than the two named above, with the possible exception of
 error paths. Those aren't a problem, since if we end up needing to
 retry, we're hosed when something goes wrong as much as if we didn't
 make the retry attempt.
 
 For x86, a minimal adjustment to efi_arch_process_memory_map() is
 needed for it to cope with potentially being called a second time.

Wow. Talk about timing. We saw this and were going to see
doing something similar.


 
 For arm64, while efi_process_memory_map_bootinfo() is easy to verify
 that it can safely be called more than once without violating spec
 constraints, it's not so obvious for fdt_add_uefi_nodes(), hence a
 step by step approach:
 - deletion of memory nodes and memory reserve map entries: the 2nd pass
   shouldn't find any as the 1st one deleted them all,
 - a chosen node should be found as it got added in the 1st pass,
 - the various linux,uefi-* nodes all got added during the 1st pass
   and hence only their contents may get updated.
 
 Signed-off-by: Jan Beulich jbeul...@suse.com
 
 --- a/xen/arch/x86/efi/efi-boot.h
 +++ b/xen/arch/x86/efi/efi-boot.h
 @@ -140,7 +140,7 @@ static void __init efi_arch_process_memo
  
  /* Populate E820 table and check trampoline area availability. */
  e = e820map - 1;
 -for ( i = 0; i  map_size; i += desc_size )
 +for ( e820nr = i = 0; i  map_size; i += desc_size )
  {
  EFI_MEMORY_DESCRIPTOR *desc = map + i;
  u64 len = desc-NumberOfPages  EFI_PAGE_SHIFT;
 --- a/xen/common/efi/boot.c
 +++ b/xen/common/efi/boot.c
 @@ -703,7 +703,7 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SY
  EFI_GRAPHICS_OUTPUT_PROTOCOL *gop = NULL;
  EFI_GRAPHICS_OUTPUT_MODE_INFORMATION *mode_info;
  union string section = { NULL }, name;
 -bool_t base_video = 0;
 +bool_t base_video = 0, retry;
  char *option_str;
  bool_t use_cfg_file;
  
 @@ -1051,17 +1051,23 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SY
  if ( !efi_memmap )
  blexit(LUnable to allocate memory for EFI memory map);
  
 -status = efi_bs-GetMemoryMap(efi_memmap_size, efi_memmap, map_key,
 -  efi_mdesc_size, mdesc_ver);
 -if ( EFI_ERROR(status) )
 -PrintErrMesg(LCannot obtain memory map, status);
 +for ( retry = 0; ; retry = 1 )
 +{
 +status = efi_bs-GetMemoryMap(efi_memmap_size, efi_memmap, map_key,
 +  efi_mdesc_size, mdesc_ver);
 +if ( EFI_ERROR(status) )
 +PrintErrMesg(LCannot obtain memory map, status);
  
 -efi_arch_process_memory_map(SystemTable, efi_memmap, efi_memmap_size,
 -efi_mdesc_size, mdesc_ver);
 +efi_arch_process_memory_map(SystemTable, efi_memmap, efi_memmap_size,
 +efi_mdesc_size, mdesc_ver);
  
 -efi_arch_pre_exit_boot();
 +efi_arch_pre_exit_boot();
 +
 +status = efi_bs-ExitBootServices(ImageHandle, map_key);
 +if ( status != EFI_INVALID_PARAMETER || retry )
 +break;
 +}

Any reason for just doing the loop at max twice? Could we iterate
more than those (say forever?) with an printk at suitable intervals
to notify the user?

  
 -status = efi_bs-ExitBootServices(ImageHandle, map_key);
  if ( EFI_ERROR(status) )
  PrintErrMesg(LCannot exit boot services, status);
  
 
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v10 for-xen-4.5 2/2] dpci: Replace tasklet with an softirq

2014-11-14 Thread Konrad Rzeszutek Wilk

On Fri, Nov 14, 2014 at 03:13:42PM +, Jan Beulich wrote:
  On 12.11.14 at 03:23, konrad.w...@oracle.com wrote:
  +static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
  +{
  +struct domain *d = pirq_dpci-dom;
  +
  +ASSERT(spin_is_locked(d-event_lock));
  +
  +switch ( cmpxchg(pirq_dpci-state, 1  STATE_SCHED, 0) )
  +{
  +case (1  STATE_SCHED):
  +/*
  + * We are going to try to de-schedule the softirq before it goes in
  + * STATE_RUN. Whoever clears STATE_SCHED MUST refcount the 'dom'.
  + */
  +put_domain(d);
  +/* fallthrough. */
 
 Considering Sander's report, the only suspicious place I find is this
 one: When the STATE_SCHED flag is set, pirq_dpci is on some
 CPU's list. What guarantees it to get removed from that list before
 getting inserted on another one?

None. The moment that STATE_SCHED is cleared, 'raise_softirq_for'
is free to manipulate the list.

We could:
 - Add another bit, say STATE_ZOMBIE - which pt_pirq_softirq_reset could
   set, and dpci_softirq - if it sees it - would clear. Said bit
   would stop 'raise_softirq_for' from trying to do anything.

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..8e9141e 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -50,20 +50,25 @@ static DEFINE_PER_CPU(struct list_head, dpci_list);
 
 enum {
 STATE_SCHED,
-STATE_RUN
+STATE_RUN,
+STATE_ZOMBIE
 };
 
 /*
  * This can be called multiple times, but the softirq is only raised once.
- * That is until the STATE_SCHED state has been cleared. The state can be
- * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
- * or by 'pt_pirq_softirq_reset' (which will try to clear the state before
- * the softirq had a chance to run).
+ * That is until the STATE_SCHED and STATE_ZOMBIE state has been cleared. The
+ * STATE_SCHED and STATE_ZOMBIE state can be cleared by the 'dpci_softirq'
+ * (when it has executed 'hvm_dirq_assist'). The STATE_SCHED can be cleared
+ * by 'pt_pirq_softirq_reset' (which will try to clear the state before the
+ * softirq had a chance to run).
  */
 static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
 {
 unsigned long flags;
 
+if ( test_bit(STATE_ZOMBIE, pirq_dpci-state) )
+return;
+
 if ( test_and_set_bit(STATE_SCHED, pirq_dpci-state) )
 return;
 
@@ -85,7 +90,7 @@ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
  */
 bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
 {
-if ( pirq_dpci-state  ((1  STATE_RUN) | (1  STATE_SCHED)) )
+if ( pirq_dpci-state  ((1  STATE_RUN) | (1  STATE_SCHED) | (1  
STATE_ZOMBIE) ) )
 return 1;
 
 /*
@@ -109,7 +114,7 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci 
*pirq_dpci)
 
 ASSERT(spin_is_locked(d-event_lock));
 
-switch ( cmpxchg(pirq_dpci-state, 1  STATE_SCHED, 0) )
+switch ( cmpxchg(pirq_dpci-state, 1  STATE_SCHED, 1  STATE_ZOMBIE ) )
 {
 case (1  STATE_SCHED):
 /*
@@ -120,6 +125,7 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci 
*pirq_dpci)
 /* fallthrough. */
 case (1  STATE_RUN):
 case (1  STATE_RUN) | (1  STATE_SCHED):
+case (1  STATE_RUN) | (1  STATE_SCHED) | (1  STATE_ZOMBIE):
 /*
  * The reason it is OK to reset 'dom' when STATE_RUN bit is set is due
  * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom'
@@ -779,6 +785,7 @@ unlock:
 static void dpci_softirq(void)
 {
 unsigned int cpu = smp_processor_id();
+unsigned int reset = 0;
 LIST_HEAD(our_list);
 
 local_irq_disable();
@@ -805,7 +812,15 @@ static void dpci_softirq(void)
 hvm_dirq_assist(d, pirq_dpci);
 put_domain(d);
 }
+else
+reset = 1;
+
 clear_bit(STATE_RUN, pirq_dpci-state);
+if ( reset )
+{
+clear_bit(STATE_ZOMBIE, pirq_dpci-state);
+reset = 0;
+}
 }
 }
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH V3 2/8] xen: Delay remapping memory of pv-domain

2014-11-14 Thread Konrad Rzeszutek Wilk

On Fri, Nov 14, 2014 at 05:53:19AM +0100, Juergen Gross wrote:
 On 11/13/2014 08:56 PM, Konrad Rzeszutek Wilk wrote:
 + mfn_save = virt_to_mfn(buf);
 +
 + while (xen_remap_mfn != INVALID_P2M_ENTRY) {
 
 So the 'list' is constructed by going forward - that is from low-numbered
 PFNs to higher numbered ones. But the 'xen_remap_mfn' is going the
 other way - from the highest PFN to the lowest PFN.
 
 Won't that mean we will restore the chunks of memory in the wrong
 order? That is we will still restore them in chunks size, but the
 chunks will be in descending order instead of ascending?
 
 No, the information where to put each chunk is contained in the chunk
 data. I can add a comment explaining this.
 
 Right, the MFNs in a chunks are going to be restored in the right order.
 
 I was thinking that the chunks (so a set of MFNs) will be restored in
 the opposite order that they are written to.
 
 And oddly enough the chunks are done in 512-3 = 509 MFNs at once?
 
 More don't fit on a single page due to the other info needed. So: yes.

But you could use two pages - one for the structure and the other
for the list of MFNs. That would fix the problem of having only
509 MFNs being contingous per chunk when restoring.

Anyhow the point I had that I am worried is that we do not restore the
MFNs in the same order. We do it in chunk size which is OK (so the 509 MFNs
at once)- but the order we traverse the restoration process is the opposite of
the save process. Say we have 4MB of contingous MFNs, so two (err, three)
chunks. The first one we iterate is from 0-509, the second is 510-1018, the
last is 1019-1023. When we restore (remap) we start with the last 'chunk'
so we end up restoring them: 1019-1023, 510-1018, 0-509 order.

If we go with using two pages - one for the structure and one for the
list of PFNs, we could expand the structure to have an 'next' and 'prev'
MFN. When you then traverse in 'xen_remap_memory' you could do:

mfn = xen_remap_mfn;
while (mfn != INVALID_P2M_ENTRY) {
xen_remap_mfn = mfn;
set_pte_mfn(buf, mfn, PAGE_KERNEL);
mfn = xen_remap_buf.next_area_mfn;
}

And then you can start from this updated xen_remap_mfn which will
start with the first chunk that has been set. Thought at this point
it does not matter whether we have a seperate page for the MFNs as
the restoration/remap process will put them in the save order
that they were saved.

 
 
 
 
 + /* Map the remap information */
 + set_pte_mfn(buf, xen_remap_mfn, PAGE_KERNEL);
 +
 + BUG_ON(xen_remap_mfn != xen_remap_buf.mfns[0]);
 +
 + free = 0;
 + pfn = xen_remap_buf.target_pfn;
 + for (i = 0; i  xen_remap_buf.size; i++) {
 + mfn = xen_remap_buf.mfns[i];
 + if (!released  xen_update_mem_tables(pfn, mfn)) {
 + remapped++;
 
 If we fail 'xen_update_mem_tables' we will on the next chunk (so i+1) keep 
 on
 freeing pages instead of trying to remap. Is that intentional? Could we
 try to remap?
 
 Hmm, I'm not sure this is worth the effort. What could lead to failure
 here? I suspect we could even just BUG() on failure. What do you think?
 
 I was hoping that this question would lead to making this loop a bit
 simpler as you would have to spread some of the code in the loop
 into functions.
 
 And keep 'remmaped' and 'released' reset every loop.
 
 However, if it makes the code more complex - then please
 forget my question.
 
 Using BUG() instead would make the code less complex. Do you really
 think xen_update_mem_tables() would ever fail in a sane system?
 
 - set_phys_to_machine() would fail only on a memory shortage. Just
   going on without adding more memory wouldn't lead to a healthy system,
   I think.
 - The hypervisor calls would fail only in case of parameter errors.
   This should never happen, so dying seems to be the correct reaction.
 
 David, what do you think?
 
 
 Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq

2014-11-14 Thread Konrad Rzeszutek Wilk

On Fri, Nov 14, 2014 at 05:59:23PM +0100, Sander Eikelenboom wrote:
 
 Friday, November 14, 2014, 4:43:58 PM, you wrote:
 
  On 14.11.14 at 16:20, li...@eikelenboom.it wrote:
  If it still helps i could try Andrews suggestion and try out with only 
  commit aeeea485 ..
 
  Yes, even if it's pretty certain it's the second of the commits, verifying
  this would be helpful (or if the assumption is wrong, the pattern it's
  dying with would change and hence perhaps provide further clues).
 
  Jan
 
 
 Ok with a revert of f6dd295 .. it survived cooking and eating a nice bowl of 
 pasta without a panic. So it would probably be indeed that specific commit.

Thank you for confirmation.

Could you give some specifics on the guests? As in what kind of devices you
are giving it - and more interestingly - what type of interrupt mechanism
do they use (/proc/interrupts along with 'dmesg' | grep name of driver should
give some ideas).

I wouldn't worry about the PV case as that bypasses the dpci codebase - just
on the HVM side.

Thanks again!

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5 v2 0/4] libxl: CODING_STYLE

2014-11-14 Thread Konrad Rzeszutek Wilk

On Fri, Nov 14, 2014 at 06:52:37PM +, Ian Jackson wrote:
 The structural considerations, error handling patterns, and so on, in
 libxl have remained undocumented.  This has been a problem during
 recent code submissions and reviews.
 
 In this series I have attempted to document current best practice.
 The first patch contains much new material and the others are minor
 and consequential fixes.
 
 *1/4 libxl: CODING_STYLE: Much new material
  2/4 libxl: CODING_STYLE: Deprecate `error' for out blocks
  3/4 libxl: CODING_STYLE: Mention function out parameters
 *4/4 libxl: CODING_STYLE: Discuss existing style problems
 
  * = Modified patches, to take into account review comments from
  Ian Campbell.
 
 I would like to suggest that this ought to go into 4.5 because:
  - it would be useful documentation to assist with any nontrivial
bugfixes that need to be made to libxl
  - there isn't any risk of it actually breaking anything, because
it's just doc changes.

Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 
 Thanks,
 Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] PCI and VGA Passthrough regressions on Xen 4.4.1 vs 4.3.2

2014-11-15 Thread Konrad Rzeszutek Wilk

On November 15, 2014 6:59:51 AM EST, Zir Blazer zir_bla...@hotmail.com wrote:
I'm happy that you managed to reproduce the sound issue. It means that
it will be possible to track it down and fix it.
So, if I understanded properly: It does NOT happen on your custom Xen
4.4 build, but the one that Fedora got is affected by it.

I did an 'custom' build (the only custom thing is that it is stock and has
debug=y) and with Xen 4.4 from that build I can hear the sound issue.

From my testing - Xen 4.5 - good. Xen 4.4 - not good. Hadn't yet done 4.3.

Which is excellent as it removes variables.

Could it be
that your unstable build got any recent patch that fixes it that Fedora
doesn't have should it be using the stable source? Should be the same
with Arch Linux.
Have your tried with the workaround of not using xen-pciback for the
Sound Card and instead using xl pci-assignable-add? It should work,
too.

The kernel I am using as dom0 has no sound drivers so the pre-init workaround
is not in effect.

Depending on the list of patches - it might be easier to bisect based on 4.5
instead of Xen 4.3 or 4.4.

Anyhow have to figure out how to make this automated to do the week long
bisection.

For my Radeon 5770, I use Catalyst 12.1 Drivers. The reason is that
they're the latest which includes OpenCL support out of the box in
Windows XP. It got removed from later versions, and not so much later,
AMD branched out old GPU into another set of Drivers. If I recall
correctly, the 12.1 are still unified, so they should work for your
4xxx.
Cause AMD website has become a pain in the ass to browse if you're
looking for older Driver versions archives, you can use the TechPowerUp
mirror:
www.techpowerup.com/downloads/2096/amd-catalyst-12-1-software-suite-winxp-32-bit/mirrors
Download with confidence, they're the same ones that I'm using and
TechPowerUp is a reputable Hardware reviewer website. Otherwise, you
can download the latest legacy 14.4 from AMD website:
support.amd.com/en-us/download/desktop/legacy?product=legacy2os=Windows%20XP%20-%20Professional/HomeRenderOnServer=true

Ah! Thank you for the link!

I will stay tuned on this.

Date: Fri, 14 Nov 2014 17:12:19 -0500
From: konrad.w...@oracle.com
To: zir_bla...@hotmail.com
CC: xen-devel@lists.xen.org
Subject: Re: [Xen-devel] PCI and VGA Passthrough regressions on Xen
4.4.1 vs 4.3.2

On Fri, Nov 14, 2014 at 04:08:14PM -0500, Konrad Rzeszutek Wilk
wrote:
On Mon, Oct 06, 2014 at 03:55:36AM -0300, Zir Blazer wrote:
There is a regression in both PCI and VGA Passthrough in Xen 4.4.1
when compared to 4.3.2. Due to the added complexity of VGA Passthrough,
I was suggested to focus instead in my PCI Passthrough issue, which
involves the integrated Sound Card of my Motherboard. While it passes
to the DomU and works, it produces robotic, distorted and lagged noise
everytime it reproduces sound. The Video Card is an absolute no-go. On
the previous Xen version, everything else being equal, both works
properly.I have been trying to gather as much information as possible
of my issue according to Xen Wiki articles
http://wiki.xen.org/wiki/Debugging_Xen and
http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen so this E-Mail
comes attached with tons of logs.

HARDWARE
Processor: Intel Xeon E3-1245V3 (Haswell) -
http://ark.intel.com/products/75462/Intel-Xeon-Processor-E3-1245-v3-8M-Cache-3_40-GHz
Motherboard: Supermicro X10SAT (Chipset C226, BIOS R2.0) -
http://www.supermicro.com/products/motherboard/Xeon/C220/X10SAT.cfm
Sound Card: Integrated Realtek ALC1150

So I tried this on my box which has similar specs:

Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

[ 0.00] DMI: Supermicro X10SAE/X10SAE, BIOS 2.00 04/21/2014

with todays' xen-unstable and with a simple guest config:

builder='hvm'
memory = 2048
name = WinXP
vcpus=1
vif = [ 'type=ioemu, mac=00:0F:4B:00:00:61,bridge=switch' ]
disk=[ 'phy:/dev/G/WinXP,hda,w']
vnc=1
videoram=8
vnclisten=0.0.0.0
vncpasswd=''
stdvga=0
usb=1
usbdevice='tablet'
qemu_model_version=traditional
device_model_version = 'qemu-xen-traditional'
pci=[01:00.0,01:00.1,00:1b.0]

# lspci -s 01:00.*
01:00.0 VGA compatible controller: ATI Technologies Inc RV770
[Radeon HD 4870]
01:00.1 Audio device: ATI Technologies Inc HD48x0 audio

# lspci -s 00:1b.0
00:1b.0 Audio device: Intel Corporation Device 8c20 (rev 04)

This is using Windows XP 32 (don't have 64bit laying around) and
when
using the Realtek HD Sound Effects and the '3D Audio Demo' it plays
nicely on my speakers.

Aha! But if I use Xen 4.4 from Fedora 20 I get an horrible sound.

The plot thickens! The other interesting thing is that my unstable
build
is with 'debug=y' while the Xen 4.4 from Fedora is not.

The VGA is a different story as it seems there are no Video drivers
for XP for it available at all!

___
Xen-devel mailing list

Re: [Xen-devel] [PATCH RFC] EFI: allow retry of ExitBootServices() call

2014-11-17 Thread Konrad Rzeszutek Wilk

On Mon, Nov 17, 2014 at 01:35:17PM +, Jan Beulich wrote:
  On 14.11.14 at 16:32, konrad.w...@oracle.com wrote:
  On Fri, Nov 14, 2014 at 12:37:30PM +, Jan Beulich wrote:
  The specification is kind of vague under what conditions
  ExitBootServices() may legitimately fail, requiring the OS loader to
  retry:
  
  If MapKey value is incorrect, ExitBootServices() returns
   EFI_INVALID_PARAMETER and GetMemoryMap() with ExitBootServices() must
   be called again. Firmware implementation may choose to do a partial
   shutdown of the boot services during the first call to
   ExitBootServices(). EFI OS loader should not make calls to any boot
   service function other then GetMemoryMap() after the first call to
   ExitBootServices().
  
  While our code guarantees the map key to be valid, there are systems
  where a firmware internal notification sent while processing
  ExitBootServices() reportedly results in changes to the memory map.
  
  s/reportedly/in fact/
  In that case, make a best effort second try: Avoid any boot service
  calls other than the two named above, with the possible exception of
  error paths. Those aren't a problem, since if we end up needing to
  retry, we're hosed when something goes wrong as much as if we didn't
  make the retry attempt.
  
  For x86, a minimal adjustment to efi_arch_process_memory_map() is
  needed for it to cope with potentially being called a second time.
  
  Wow. Talk about timing. We saw this and were going to see
  doing something similar.
 
 So what are your thoughts then regarding this patch going into 4.5?

Definitly should go in - and I would even say backport to earlier
versions of Xen.

 And for the LIST_POISON* override one?

Yes as well please.
 
 Jan
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq

2014-11-18 Thread Konrad Rzeszutek Wilk

 Without #define DIFF_LIST 1:
 1) The guest still crashes (xl-dmesg-not-defined.txt)

AHA!

 --MARK--
  0: 8305085ffd28 [p:83054ef27e88, n:83054ef27e88]
  1: 8305085ffd28 [p:0200200200200200, n:0100100100100100]

The same pirq_dpci structure is put twice on the list. That
will surely make it unhappy. The reason the #define DIFF_LIST 1
would work is that it has code to deal with that odd scenario.

But .. more importantly - the code should not allow you to
put _two_ of the same 'pirq_dpci' structures on the list.

 CPU00: 
 d16 OK-softirq 186msec ago, state:1, 3257 count, [prev:82d0802e7e88, 
next:82d0802e7e88] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT  
PIRQ:87
 d16 OK-raise   236msec ago, state:1, 3257 count, [prev:0200200200200200, 
next:0100100100100100] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT  
PIRQ:87
 CPU01: 
 d16 OK-softirq 232msec ago, state:1, 2978 count, [prev:83054ef57e70, 
next:83054ef57e70] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT  
PIRQ:87
 d16 OK-raise   281msec ago, state:1, 2978 count, [prev:0200200200200200, 
next:0100100100100100] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT  
PIRQ:87
 CPU02: 
 d16 OK-softirq 373msec ago, state:1, 2378 count, [prev:83054ef47e70, 
next:83054ef47e70] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT  
PIRQ:87
 d16 OK-raise   423msec ago, state:1, 2378 count, [prev:0200200200200200, 
next:0100100100100100] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT  
PIRQ:87
 CPU03: 
 d16 OK-softirq 867msec ago, state:1, 2744 count, [prev:83054ef37e70, 
next:83054ef37e70] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT  
PIRQ:87
 d16 OK-raise   916msec ago, state:1, 2744 count, [prev:0200200200200200, 
next:0100100100100100] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT  
PIRQ:87
 CPU04: 
 d16 OK-softirq 497msec ago, state:1, 76910 count, [prev:83054ef2e3e0, 
next:83054ef2e3e0] 8305085ffd28MACH_PCI_SHIFT MAPPED_SHIFT 
GUEST_PCI_SHIFT  PIRQ:0
 d16 OK-raise   489msec ago, state:1, 76916 count, [prev:83054ef2e3e0, 
next:83054ef2e3e0] 8305085ffd28MACH_PCI_SHIFT MAPPED_SHIFT 
GUEST_PCI_SHIFT  PIRQ:0
 d16 ERR-poison 600msec ago, state:0, 1 count, [prev:0200200200200200, 
next:0100100100100100] 8305085ffd28MACH_PCI_SHIFT MAPPED_SHIFT 
GUEST_PCI_SHIFT  PIRQ:0
 CPU05: 
 d16 OK-softirq 852msec ago, state:1, 2207 count, [prev:83054ef1fe70, 
next:83054ef1fe70] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT  
PIRQ:87
 d16 OK-raise   902msec ago, state:1, 2207 count, [prev:0200200200200200, 
next:0100100100100100] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT  
PIRQ:87
 domain_crash called from io.c:964
 Domain 16 reported crashed by domain 7 on cpu#4:


 
 With #define DIFF_LIST 1:
 2) Nor the guest or the host crashed (let it run for about an hour and 15 
 minutes), 
but the USB XHCI driver bailed out quickly after guest boot, so there were 
 no MSI-X interrupts anymore.
(xl-dmesg-defined-nousb.txt, dmesg-guest-defined-nousb.txt)

Hm, that is odd. Can you tell me how you get your SeaBIOS to add those
extra statements? I don't seem to see them with my SeaBIOS (I've an XHCI
controlled hooked up to the guest too). Maybe I am missing an CONFIG in the
SeaBIOS build.

 3) On another boot the USB XHCI didn't bail out, after a while the host 
 crashes. (serial.log)
(XEN) [2014-11-18 14:53:37.364] RIP:e008:[82d08014a4de] 
 hvm_do_IRQ_dpci+0xf4/0x131
which resolves to:
# addr2line -e xen-syms 82d08014a4de
/usr/src/new/xen-unstable-vanilla/xen/include/xen/list.h:67
which is:
static inline void __list_add(struct list_head *new,
   struct list_head *prev,
   struct list_head *next)
 {
 next-prev = new;
 new-next = next;
 new-prev = prev;
 Here -prev-next = new;
 }

Looking at the stack:


(XEN) [2014-11-18 14:53:37.306]  0: 8303faaf25a8 [p:83054ef2e3e0, 
n:83054ef2e3e0]

We detected that the list was poison and were printing it, while an:

 [ Xen-4.5.0-rc  x86_64  debug=y  Not tainted ]
 CPU:4
 RIP:e008:[82d08014a4de] hvm_do_IRQ_dpci+0xf4/0x131
 RFLAGS: 00010082   CONTEXT: hypervisor
 rax: 83054ef2e3e0   rbx: 8303faaf25a8   rcx: 8303faaf2610
 rdx: 0200200200200200   rsi: 0006   rdi: 6ef3f123
 rbp: 83054ef27be8   rsp: 83054ef27bd8   r8:  8303faaf2010
 r9:  002f   r10: 0132   r11: 0003
 r12: 8305125d6000   r13:    r14: 8303faaf2580
 r15: 002f   cr0: 8005003b   cr4: 06f0
 cr3: 0005448c   cr2: ff600400
 ds: 002b   es: 002b   fs:    gs:    ss: e010   cs: e008
 Xen stack trace from rsp=83054ef27bd8:
83054ef27be8 8303faaf26c0 83054ef27c78 82d080172060
0020 83054ef27cf6 0046 83054ef27c70

Re: [Xen-devel] [for-xen-4.5 PATCH] Ignore non-zero data in unused xsave area.

2014-11-18 Thread Konrad Rzeszutek Wilk

On Tue, Nov 18, 2014 at 10:26:31AM -0500, Don Koch wrote:
 If we restore an xsave area from an older xen that has a larger
 size than the xcr0 bit call for, it is possible to have non-zero
 data in the unused area if an xsave has ever been done that used
 that area (e.g. during a context switch). Since the vcpu's xsave
 area is never zeroed after the initial allocation, that data is
 still there. Since we are told that said area was not written to
 during the save or migration, there is no need to restore it.
 
 Signed-off-by: Don Koch dk...@verizon.com
 ---
 Turns out the assertion that the unused xsave area is zero
 is wrong. Unfortunately, that leaves the following as the
 only way I can think of to work around it (and is no worse
 than xsave/xrestore during context switches). Alternate
 suggestions welcome.

This is Xen 4.5 material I presume.

 
  xen/arch/x86/hvm/hvm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
 index 8f49b44..b2c0bc4 100644
 --- a/xen/arch/x86/hvm/hvm.c
 +++ b/xen/arch/x86/hvm/hvm.c
 @@ -2044,7 +2044,7 @@ static int hvm_load_cpu_xsave_states(struct domain *d, 
 hvm_domain_context_t *h)
  printk(XENLOG_G_WARNING
 HVM%d.%u restore mismatch: xsave length %#x  %#x 
 (non-zero data at %#x)\n,
 d-domain_id, vcpuid, desc-length, size, i);
 -return -EOPNOTSUPP;
 +break;
  }
  }
  printk(XENLOG_G_WARNING
 -- 
 1.8.3.1
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] libxl: Document device parameter of libxl_device_type_add functions

2014-11-18 Thread Konrad Rzeszutek Wilk

On Tue, Nov 18, 2014 at 05:44:20PM +, Ian Jackson wrote:
 Euan Harris writes ([PATCH] libxl: Document device parameter of 
 libxl_device_type_add functions):
  The device parameter of libxl_device_type_add is an in/out parameter.
  Unspecified fields are filled in with appropriate values for the created
  device when the function returns.  Document this behaviour.
 
 Thanks.
 
 Acked-by: Ian Jackson ian.jack...@eu.citrix.com
 
 Konrad, this should go into 4.5 IMO because it's a doc comment
 describing existing behaviour which we want everyone to rely on.

Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 
 Ian.
 
  Signed-off-by: Euan Harris euan.har...@citrix.com
  ---
   tools/libxl/libxl.h | 4 
   1 file changed, 4 insertions(+)
  
  diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
  index c3451bd..41d6e8d 100644
  --- a/tools/libxl/libxl.h
  +++ b/tools/libxl/libxl.h
  @@ -1109,6 +1109,10 @@ void libxl_vtpminfo_list_free(libxl_vtpminfo *, int 
  nr_vtpms);
*   domain to connect to the device and therefore cannot block on the
*   guest.
*
  + *   device is an in/out parameter:  fields left unspecified when the
  + *   structure is passed in are filled in with appropriate values for
  + *   the device created.
  + *
* libxl_device_type_remove(ctx, domid, device):
*
*   Removes the given device from the specified domain by performing
  -- 
  1.9.3
  

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq

2014-11-18 Thread Konrad Rzeszutek Wilk

On Tue, Nov 18, 2014 at 05:20:44PM +, Jan Beulich wrote:
  On 18.11.14 at 18:03, li...@eikelenboom.it wrote:
  Tuesday, November 18, 2014, 5:16:50 PM, you wrote:
   1) test_and_[set|clear]_bit sometimes return unexpected values.
  [But this might be invalid as the addition of the 8303faaf25a8
   might be correct - as the second dpci the softirq is processing
   could be the MSI one]
  
  Would there be an easy way to stress test this function separately in some 
  debugging function to see if it indeed is returning unexpected values ?
 
 I don't think there's a reasonable chance of these functions to return
 unexpected values - they're being used elsewhere, and you'd have
 had other problems in the past if they didn't behave as expected.

Interestingly most of the 'test_and_[set|clear]_bit operate on 'unsigned long'
while we do 'unsigned int' here. But the 'test_and'' are all btXl so double-word
safe.

The fact that Sander is able to get 'test_and_clear_bit(STATE_SCHED)' to return
zero - when in fact it should return a positive value - implies that some actor
is messing with the 'state' outside our raise/softirq dialogue.

pt_irq_guest_eoi does pirq_dpci-state = 0 unconditionally!

Lets see if this debug + potential patch helps. This should be applied
on top of the other patch you had. Just in case I am attaching all four
to this email.

From 8093140e374fceb9121ccd07726fb3898b212bfb Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Date: Tue, 18 Nov 2014 15:08:15 -0500
Subject: [PATCH 4/5] DEBUG4: Add an 'stamp' and potential fix.

Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 xen/drivers/passthrough/io.c | 57 +++-
 xen/include/xen/hvm/irq.h|  1 +
 2 files changed, 41 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index b786bd2..8a8fc62 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -26,6 +26,8 @@
 #include asm/hvm/iommu.h
 #include asm/hvm/support.h
 #include xen/hvm/irq.h
+#include xen/console.h
+
 
 static DEFINE_PER_CPU(struct list_head, dpci_list);
 
@@ -61,21 +63,29 @@ struct _debug_f {
 struct list_head list;
 unsigned int state;
 struct hvm_pirq_dpci *dpci;
+unsigned long stamp;
 };
 
 struct __debug {
 struct _debug_f ok;
 struct _debug_f poison;
 struct _debug_f raise;
+struct _debug_f busy_raise;
 struct _debug_f reset;
 struct _debug_f zombie_softirq;
 struct _debug_f zombie_raise;
+struct _debug_f timeout;
+struct _debug_f clear;
 };
 
 static DEFINE_PER_CPU(struct __debug, _d);
 
 void _record(struct _debug_f *d, struct hvm_pirq_dpci *pirq_dpci)
 {
+
+if (pirq_dpci-pirq)
+return;
+
 if (pirq_dpci-dom)
 d-domid = pirq_dpci-dom-domain_id;
 else
@@ -86,6 +96,7 @@ void _record(struct _debug_f *d, struct hvm_pirq_dpci 
*pirq_dpci)
 d-list.prev = pirq_dpci-softirq_list.prev;
 d-state = pirq_dpci-state;
 d-dpci = pirq_dpci;
+d-stamp = pirq_dpci-stamp++;
 }
 
 enum {
@@ -95,6 +106,9 @@ enum {
 OK_SOFTIRQ,
 OK_RAISE,
 OK_RESET,
+OK_TIMEOUT,
+OK_BUSY,
+OK_CLEAR,
 };
 
 static void dump_record(struct _debug_f *d, unsigned int type)
@@ -106,7 +120,11 @@ static void dump_record(struct _debug_f *d, unsigned int 
type)
 [OK_SOFTIRQ] = OK-softirq,
 [OK_RAISE]   = OK-raise  ,
 [OK_RESET]   = OK-reset  ,
+[OK_TIMEOUT] = OK-timeout,
+[OK_BUSY]= OK-busy   ,
+[OK_CLEAR]   = OK-clear  ,
 };
+#if 0
 #define LONG(x) [_HVM_IRQ_DPCI_##x] = #x
 static const char *const names_flag[] = {
 LONG(MACH_PCI_SHIFT),
@@ -117,6 +135,7 @@ static void dump_record(struct _debug_f *d, unsigned int 
type)
 };
 #undef LONG
 unsigned int i;
+#endif
 s_time_t now;
 
 if ( d-domid == 0 )
@@ -126,20 +145,21 @@ static void dump_record(struct _debug_f *d, unsigned int 
type)
 BUG();
 
 now = NOW();
-printk(d%d %s %lumsec ago, state:%x, %ld count, [prev:%p, next:%p] %p,
+printk(d%d %s %lumsec ago, state:%x, %ld count, [prev:%p, next:%p] %p 
%lx,
d-domid, names[type],
(unsigned long)((now - d-last) / MILLISECS(1)),
-d-state, d-count, d-list.prev, d-list.next, d-dpci);
+d-state, d-count, d-list.prev, d-list.next, d-dpci, d-stamp);
 
 if ( d-dpci )
 {
 struct hvm_pirq_dpci *pirq_dpci = d-dpci;
-
+#if 0
 for ( i = 0; i = _HVM_IRQ_DPCI_GUEST_MSI_SHIFT; i++ )
 if ( pirq_dpci-flags  (1  i) )
 printk(%s , names_flag[i]);
 
 printk( PIRQ:%d, pirq_dpci-pirq);
+#endif
 if (pirq_dpci-line)
 printk( LINE: %d, pirq_dpci-line);
 }
@@ -159,7 +179,10 @@ static void dump_debug(unsigned char key)
 printk(CPU%02d: \n ,cpu);
 dump_record(d-ok, OK_SOFTIRQ);
 dump_record(d-raise, OK_RAISE

Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq

2014-11-18 Thread Konrad Rzeszutek Wilk

 
 Uhmm i thought i had these switched off (due to problems earlier and then 
 forgot 
 about them .. however looking at the earlier reports these lines were also in 
 those reports).
 
 The xen-syms and these last runs are all with a prestine xen tree cloned 
 today (staging 
 branch), so the qemu-xen and seabios defined with that were also freshly 
 cloned 
 and had a new default seabios config. (just to rule out anything stale in my 
 tree)
 
 If you don't see those messages .. perhaps your seabios and qemu trees (and 
 at least the 
 seabios config) are not the most recent (they don't get updated automatically 
 when you just do a git pull on the main tree) ?
 
 In /tools/firmware/seabios-dir/.config i have:
 CONFIG_USB=y
 CONFIG_USB_UHCI=y
 CONFIG_USB_OHCI=y
 CONFIG_USB_EHCI=y
 CONFIG_USB_XHCI=y
 CONFIG_USB_MSC=y
 CONFIG_USB_UAS=y
 CONFIG_USB_HUB=y
 CONFIG_USB_KEYBOARD=y
 CONFIG_USB_MOUSE=y
 

I seem to have the same thing. Perhaps it is my XHCI controller being wonky.

 And this is all just from a:
 - git clone git://xenbits.xen.org/xen.git -b staging
 - make clean  ./configure  make -j6  make -j6 install

Aye. 
.. snip..
   1) test_and_[set|clear]_bit sometimes return unexpected values.
  [But this might be invalid as the addition of the 8303faaf25a8
   might be correct - as the second dpci the softirq is processing
   could be the MSI one]
 
 Would there be an easy way to stress test this function separately in some 
 debugging function to see if it indeed is returning unexpected values ?

Sadly no. But you got me looking in the right direction when you mentioned
'timeout'.
 
   2) INIT_LIST_HEAD operations on the same CPU are not honored.
 
 Just curious, have you also tested the patches on AMD hardware ?

Yes. To reproduce this the first thing I did was to get an AMD box.

 
  
  When i look at the combination of (2) and (3), It seems it could be an 
  interaction between the two passed through devices and/or different IRQ 
  types.
 
  Could be - as in it is causing this issue to show up faster than
  expected. Or it is the one that triggers more than one dpci happening
  at the same time.
 
 Well that didn't seem to be it (see separate amendment i mailed previously)

Right, the current theory I've is that the interrupts are not being
Acked within 8 milisecond and we reset the 'state' - and at the same
time we get an interrupt and schedule it - while we are still processing
the same interrupt. This would explain why the 'test_and_clear_bit'
got the wrong value.

In regards to the list poison - following this thread of logic - with
the 'state = 0' set we open the floodgates for any CPU to put the same
'struct hvm_pirq_dpci' on its list.

We do reset the 'state' on _every_ GSI that is mapped to a guest - so
we also reset the 'state' for the MSI one (XHCI). Anyhow in your case:

CPUX:   CPUY:
pt_irq_time_out:
state = 0;  
[out of timer coder, theraise_softirq
 pirq_dpci is on the dpci_list] [adds the pirq_dpci as state == 0]

softirq_dpcisoftirq_dpci:
list_del
[entries poison]
list_del = BOOM

Is what I believe is happening.

The INTX device - once I put a load on it - does not trigger
any pt_irq_time_out, so that would explain why I cannot hit this.

But I believe your card hits these hiccups.   

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq

2014-11-18 Thread Konrad Rzeszutek Wilk

On Tue, Nov 18, 2014 at 11:12:54PM +0100, Sander Eikelenboom wrote:
 
 Tuesday, November 18, 2014, 9:56:33 PM, you wrote:
 
  
  Uhmm i thought i had these switched off (due to problems earlier and then 
  forgot 
  about them .. however looking at the earlier reports these lines were also 
  in 
  those reports).
  
  The xen-syms and these last runs are all with a prestine xen tree cloned 
  today (staging 
  branch), so the qemu-xen and seabios defined with that were also freshly 
  cloned 
  and had a new default seabios config. (just to rule out anything stale in 
  my tree)
  
  If you don't see those messages .. perhaps your seabios and qemu trees 
  (and at least the 
  seabios config) are not the most recent (they don't get updated 
  automatically 
  when you just do a git pull on the main tree) ?
  
  In /tools/firmware/seabios-dir/.config i have:
  CONFIG_USB=y
  CONFIG_USB_UHCI=y
  CONFIG_USB_OHCI=y
  CONFIG_USB_EHCI=y
  CONFIG_USB_XHCI=y
  CONFIG_USB_MSC=y
  CONFIG_USB_UAS=y
  CONFIG_USB_HUB=y
  CONFIG_USB_KEYBOARD=y
  CONFIG_USB_MOUSE=y
  
 
  I seem to have the same thing. Perhaps it is my XHCI controller being wonky.
 
  And this is all just from a:
  - git clone git://xenbits.xen.org/xen.git -b staging
  - make clean  ./configure  make -j6  make -j6 install
 
  Aye. 
  .. snip..
1) test_and_[set|clear]_bit sometimes return unexpected values.
   [But this might be invalid as the addition of the 8303faaf25a8
might be correct - as the second dpci the softirq is processing
could be the MSI one]
  
  Would there be an easy way to stress test this function separately in some 
  debugging function to see if it indeed is returning unexpected values ?
 
  Sadly no. But you got me looking in the right direction when you mentioned
  'timeout'.
  
2) INIT_LIST_HEAD operations on the same CPU are not honored.
  
  Just curious, have you also tested the patches on AMD hardware ?
 
  Yes. To reproduce this the first thing I did was to get an AMD box.
 
  
   
   When i look at the combination of (2) and (3), It seems it could be an 
   interaction between the two passed through devices and/or different IRQ 
   types.
  
   Could be - as in it is causing this issue to show up faster than
   expected. Or it is the one that triggers more than one dpci happening
   at the same time.
  
  Well that didn't seem to be it (see separate amendment i mailed previously)
 
  Right, the current theory I've is that the interrupts are not being
  Acked within 8 milisecond and we reset the 'state' - and at the same
  time we get an interrupt and schedule it - while we are still processing
  the same interrupt. This would explain why the 'test_and_clear_bit'
  got the wrong value.
 
  In regards to the list poison - following this thread of logic - with
  the 'state = 0' set we open the floodgates for any CPU to put the same
  'struct hvm_pirq_dpci' on its list.
 
  We do reset the 'state' on _every_ GSI that is mapped to a guest - so
  we also reset the 'state' for the MSI one (XHCI). Anyhow in your case:
 
  CPUX:   CPUY:
  pt_irq_time_out:
  state = 0;  
  [out of timer coder, theraise_softirq
   pirq_dpci is on the dpci_list] [adds the pirq_dpci as state == 0]
 
  softirq_dpcisoftirq_dpci:
  list_del
  [entries poison]
  list_del = BOOM
  
  Is what I believe is happening.
 
  The INTX device - once I put a load on it - does not trigger
  any pt_irq_time_out, so that would explain why I cannot hit this.
 
  But I believe your card hits these hiccups.   
 
 
 Hi Konrad,
 
 I just tested you 5 patches and as a result i still got an(other) host crash:
 (complete serial log attached)
 
 (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc  x86_64  debug=y  Not 
 tainted ]
 (XEN) [2014-11-18 21:55:41.591] CPU:0
 (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc  x86_64  debug=y  Not 
 tainted ]
 (XEN) [2014-11-18 21:55:41.591] RIP:e008:[82d08012c7e7]CPU:2
 (XEN) [2014-11-18 21:55:41.591] RIP:e008:[82d08014a461] 
 hvm_do_IRQ_dpci+0xbd/0x13c
 (XEN) [2014-11-18 21:55:41.591] RFLAGS: 00010006
 _spin_unlock+0x1f/0x30CONTEXT: hypervisor

Duh!

Here is another patch on top of the five you have (attached and inline).

From 18008650f10199e2b83f74394f634a97086e34b8 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Date: Tue, 18 Nov 2014 20:48:43 -0500
Subject: [PATCH] debug: Whether we want to clear the 'dom' field or not when
 changing the 'state' bit.

Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 xen/drivers/passthrough/io.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index aad5607..24e2bd6 100644
--- a/xen/drivers/passthrough

Re: [Xen-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration

2014-11-19 Thread Konrad Rzeszutek Wilk

On November 19, 2014 5:52:58 AM EST, Stefano Stabellini 
stefano.stabell...@eu.citrix.com wrote:
ping?

On Tue, 18 Nov 2014, Stefano Stabellini wrote:
 Konrad,
 I think we should have this fix in Xen 4.5. Should I go ahead and
 backport it?

Go for it. Release-Acked-by: Konrad Rzeszutek Wilk (konrad.w...@oracle.com)

 
 On Mon, 17 Nov 2014, Don Slutz wrote:
  The other callers to blk_set_enable_write_cache() in this file
  already check for s-blk == NULL.
  
  Signed-off-by: Don Slutz dsl...@verizon.com
  ---
  
  I think this is a bugfix that should be back ported to stable
  releases.
  
  I also think this should be done in xen's copy of QEMU for 4.5 with
  back port(s) to active stable releases.
  
  Note: In 2.1 and earlier the routine is
  bdrv_set_enable_write_cache(); variable is s-bs.
  
   hw/ide/core.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
  
  diff --git a/hw/ide/core.c b/hw/ide/core.c
  index 00e21cf..d4af5e2 100644
  --- a/hw/ide/core.c
  +++ b/hw/ide/core.c
  @@ -2401,7 +2401,7 @@ static int ide_drive_post_load(void *opaque,
int version_id)
   {
   IDEState *s = opaque;
   
  -if (s-identify_set) {
  +if (s-blk  s-identify_set) {
   blk_set_enable_write_cache(s-blk, !!(s-identify_data[85]
 (1  5)));
   }
   return 0;
  -- 
  1.8.4
  
 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Strangeness in generated xen-command-line.html

2014-11-19 Thread Konrad Rzeszutek Wilk

On November 19, 2014 6:05:33 AM EST, Andrew Cooper andrew.coop...@citrix.com 
wrote:
On 19/11/14 11:04, Ian Campbell wrote:
 On Wed, 2014-11-19 at 10:52 +, Andrew Cooper wrote:
 On 19/11/14 10:46, Ian Campbell wrote:
 On Wed, 2014-11-19 at 10:38 +, Ian Campbell wrote:
 I've not been able to find a workaround...
 This works for me...

 8---

 From 3483179d333c47deacfc8c2eb195bf7dc4a555ff Mon Sep 17 00:00:00
2001
 From: Ian Campbell ian.campb...@citrix.com
 Date: Wed, 19 Nov 2014 10:42:18 +
 Subject: [PATCH] docs: workaround markdown parser error in
  xen-command-line.markdown

 Some versions of markdown (specifically the one in Debian Wheezy,
currently
 used to generate
 http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html)
seem to be
 confused by nested lists in the middle of multi-paragraph parent
list entries
 as seen in the com1,com2 entry.

 The effect is that the Default section of all following entries
are replace
 by some sort of hash or checksum (at least, a string of 32 random
seeming hex
 digits).

 Workaround this issue by making the decriptions of the DPS options
a nested
 list, moving the existing nested list describing the options for S
into a third
 level list. This seems to avoid the issue, and is arguably better
formatting in
 its own right (at least its not a regression IMHO)

 Signed-off-by: Ian Campbell ian.campb...@citrix.com
 I had just identified a different way, but this way is slightly
better.

 If you take out all the blank lines visible in the context below,
the
 resulting HTML will be correctly formatted and rather neater (i.e.
 without sporadic blank lines).
 Agreed.

 8--

 From 53398a9729d391f1fb7b6f753a0032b1f3604d4d Mon Sep 17 00:00:00
2001
 From: Ian Campbell ian.campb...@citrix.com
 Date: Wed, 19 Nov 2014 10:42:18 +
 Subject: [PATCH] docs: workaround markdown parser error in
  xen-command-line.markdown

 Some versions of markdown (specifically the one in Debian Wheezy,
currently
 used to generate
 http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem
to be
 confused by nested lists in the middle of multi-paragraph parent list
entries
 as seen in the com1,com2 entry.

 The effect is that the Default section of all following entries are
replace
 by some sort of hash or checksum (at least, a string of 32 random
seeming hex
 digits).

 Workaround this issue by making the decriptions of the DPS options a
nested
 list, moving the existing nested list describing the options for S
into a third
 level list. This seems to avoid the issue, and is arguably better
formatting in
 its own right (at least its not a regression IMHO)

 Signed-off-by: Ian Campbell ian.campb...@citrix.com

Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

Release-Acked-by: Konrad Rzeszutek Wilk (konrad.w...@oracle.com)

In case you were thinking of putting in 4.5

 ---
 v2: Less blank lines == nicer output.
 ---
  docs/misc/xen-command-line.markdown |   21 -
  1 file changed, 8 insertions(+), 13 deletions(-)

 diff --git a/docs/misc/xen-command-line.markdown
b/docs/misc/xen-command-line.markdown
 index 0830e5f..b7eaeea 100644
 --- a/docs/misc/xen-command-line.markdown
 +++ b/docs/misc/xen-command-line.markdown
 @@ -247,19 +247,14 @@ Both option `com1` and `com2` follow the same
format.
  * Optionally, a clock speed measured in hz can be specified.
  * `DPS` represents the number of data bits, the parity, and the
number
of stop bits.
 -
 -  `D` is an integer between 5 and 8 for the number of data bits.
 -
 -  `P` is a single character representing the type of parity:
 -
 -   * `n` No
 -   * `o` Odd
 -   * `e` Even
 -   * `m` Mark
 -   * `s` Space
 -
 -  `S` is an integer 1 or 2 for the number of stop bits.
 -
 +  * `D` is an integer between 5 and 8 for the number of data bits.
 +  * `P` is a single character representing the type of parity:
 +  * `n` No
 +  * `o` Odd
 +  * `e` Even
 +  * `m` Mark
 +  * `s` Space
 +  * `S` is an integer 1 or 2 for the number of stop bits.
  * `io-base` is an integer which specifies the IO base port for
UART
registers.
  * `irq` is the IRQ number to use, or `0` to use the UART in poll


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq

2014-11-19 Thread Konrad Rzeszutek Wilk

On Wed, Nov 19, 2014 at 12:16:44PM +0100, Sander Eikelenboom wrote:
 
 Wednesday, November 19, 2014, 2:55:41 AM, you wrote:
 
  On Tue, Nov 18, 2014 at 11:12:54PM +0100, Sander Eikelenboom wrote:
  
  Tuesday, November 18, 2014, 9:56:33 PM, you wrote:
  
   
   Uhmm i thought i had these switched off (due to problems earlier and 
   then forgot 
   about them .. however looking at the earlier reports these lines were 
   also in 
   those reports).
   
   The xen-syms and these last runs are all with a prestine xen tree 
   cloned today (staging 
   branch), so the qemu-xen and seabios defined with that were also 
   freshly cloned 
   and had a new default seabios config. (just to rule out anything stale 
   in my tree)
   
   If you don't see those messages .. perhaps your seabios and qemu trees 
   (and at least the 
   seabios config) are not the most recent (they don't get updated 
   automatically 
   when you just do a git pull on the main tree) ?
   
   In /tools/firmware/seabios-dir/.config i have:
   CONFIG_USB=y
   CONFIG_USB_UHCI=y
   CONFIG_USB_OHCI=y
   CONFIG_USB_EHCI=y
   CONFIG_USB_XHCI=y
   CONFIG_USB_MSC=y
   CONFIG_USB_UAS=y
   CONFIG_USB_HUB=y
   CONFIG_USB_KEYBOARD=y
   CONFIG_USB_MOUSE=y
   
  
   I seem to have the same thing. Perhaps it is my XHCI controller being 
   wonky.
  
   And this is all just from a:
   - git clone git://xenbits.xen.org/xen.git -b staging
   - make clean  ./configure  make -j6  make -j6 install
  
   Aye. 
   .. snip..
 1) test_and_[set|clear]_bit sometimes return unexpected values.
[But this might be invalid as the addition of the 8303faaf25a8
 might be correct - as the second dpci the softirq is processing
 could be the MSI one]
   
   Would there be an easy way to stress test this function separately in 
   some 
   debugging function to see if it indeed is returning unexpected values ?
  
   Sadly no. But you got me looking in the right direction when you 
   mentioned
   'timeout'.
   
 2) INIT_LIST_HEAD operations on the same CPU are not honored.
   
   Just curious, have you also tested the patches on AMD hardware ?
  
   Yes. To reproduce this the first thing I did was to get an AMD box.
  
   

When i look at the combination of (2) and (3), It seems it could be 
an 
interaction between the two passed through devices and/or different 
IRQ types.
   
Could be - as in it is causing this issue to show up faster than
expected. Or it is the one that triggers more than one dpci happening
at the same time.
   
   Well that didn't seem to be it (see separate amendment i mailed 
   previously)
  
   Right, the current theory I've is that the interrupts are not being
   Acked within 8 milisecond and we reset the 'state' - and at the same
   time we get an interrupt and schedule it - while we are still processing
   the same interrupt. This would explain why the 'test_and_clear_bit'
   got the wrong value.
  
   In regards to the list poison - following this thread of logic - with
   the 'state = 0' set we open the floodgates for any CPU to put the same
   'struct hvm_pirq_dpci' on its list.
  
   We do reset the 'state' on _every_ GSI that is mapped to a guest - so
   we also reset the 'state' for the MSI one (XHCI). Anyhow in your case:
  
   CPUX:   CPUY:
   pt_irq_time_out:
   state = 0;  
   [out of timer coder, theraise_softirq
pirq_dpci is on the dpci_list] [adds the pirq_dpci as state == 
   0]
  
   softirq_dpcisoftirq_dpci:
   list_del
   [entries poison]
   list_del = BOOM
   
   Is what I believe is happening.
  
   The INTX device - once I put a load on it - does not trigger
   any pt_irq_time_out, so that would explain why I cannot hit this.
  
   But I believe your card hits these hiccups.   
  
  
  Hi Konrad,
  
  I just tested you 5 patches and as a result i still got an(other) host 
  crash:
  (complete serial log attached)
  
  (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc  x86_64  debug=y  Not 
  tainted ]
  (XEN) [2014-11-18 21:55:41.591] CPU:0
  (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc  x86_64  debug=y  Not 
  tainted ]
  (XEN) [2014-11-18 21:55:41.591] RIP:e008:[82d08012c7e7]CPU:2
  (XEN) [2014-11-18 21:55:41.591] RIP:e008:[82d08014a461] 
  hvm_do_IRQ_dpci+0xbd/0x13c
  (XEN) [2014-11-18 21:55:41.591] RFLAGS: 00010006
  _spin_unlock+0x1f/0x30CONTEXT: hypervisor
 
  Duh!
 
  Here is another patch on top of the five you have (attached and inline).
 
 Hi Konrad,
 
 Happy to report it has been running with this additional patch for 2 hours 
 now 
 without any problems. I think you nailed it :-)

Could you also do an 'xl debug-keys k' and send that please?

 More than happy to test the definitive patch as well.

Re: [Xen-devel] [PATCH v10 for-xen-4.5 2/2] dpci: Replace tasklet with an softirq

2014-11-19 Thread Konrad Rzeszutek Wilk

On Fri, Nov 14, 2014 at 11:11:46AM -0500, Konrad Rzeszutek Wilk wrote:
 On Fri, Nov 14, 2014 at 03:13:42PM +, Jan Beulich wrote:
   On 12.11.14 at 03:23, konrad.w...@oracle.com wrote:
   +static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
   +{
   +struct domain *d = pirq_dpci-dom;
   +
   +ASSERT(spin_is_locked(d-event_lock));
   +
   +switch ( cmpxchg(pirq_dpci-state, 1  STATE_SCHED, 0) )
   +{
   +case (1  STATE_SCHED):
   +/*
   + * We are going to try to de-schedule the softirq before it goes 
   in
   + * STATE_RUN. Whoever clears STATE_SCHED MUST refcount the 'dom'.
   + */
   +put_domain(d);
   +/* fallthrough. */
  
  Considering Sander's report, the only suspicious place I find is this
  one: When the STATE_SCHED flag is set, pirq_dpci is on some
  CPU's list. What guarantees it to get removed from that list before
  getting inserted on another one?
 
 None. The moment that STATE_SCHED is cleared, 'raise_softirq_for'
 is free to manipulate the list.

I was too quick to say this. A bit more inspection shows that while
'raise_softirq_for' is free to manipulate the list - it won't be called.

The reason is that the pt_pirq_softirq_reset is called _after_ the IRQ
action handler are removed for this IRQ. That means we will not receive
any interrupts for it and call 'raise_softirq_for'. At least until
'pt_irq_create_bind' is called. And said function has a check for
this too:

42  * A crude 'while' loop with us dropping the spinlock and giving 
   
243  * the softirq_dpci a chance to run.

244  * We MUST check for this condition as the softirq could be scheduled   

245  * and hasn't run yet. Note that this code replaced tasklet_kill which  

246  * would have spun forever and would do the same thing (wait to flush 
out   
247  * outstanding hvm_dirq_assist calls.   

248  */ 

249 if ( pt_pirq_softirq_active(pirq_dpci) )  

Hence the patch below is not needed.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [for-xen-4.5 PATCH] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.

2014-11-19 Thread Konrad Rzeszutek Wilk

If we pass in INTx type devices to a guest on an over-subscribed
machine - and in an over-worked guest - we can cause the
pirq_dpci-softirq_list to become corrupted.

The reason for this is that the 'pt_irq_guest_eoi' ends up
setting the 'state' to zero value. However the 'state' value
(STATE_SCHED, STATE_RUN) is used to communicate between
 'raise_softirq_for' and 'dpci_softirq' to determine whether the
'struct hvm_pirq_dpci' can be re-scheduled. We are ignoring the
teardown path for simplicity for right now. The 'pt_irq_guest_eoi' was
not adhering to the proper dialogue and was not using locked cmpxchg or
test_bit operations and ended setting 'state' set to zero. That
meant 'raise_softirq_for' was free to schedule it while the
'struct hvm_pirq_dpci'' was still on an per-cpu list.
The end result was list_del being called twice and the second call
corrupting the per-cpu list.

For this to occur one of the CPUs must be in the idle loop executing
softirqs and the interrupt handler in the guest must not
respond to the pending interrupt within 8ms, and we must receive
another interrupt for this device on another CPU.

CPU0:  CPU1:

timer_softirq_action
 \- pt_irq_time_out
 state = 0;do_IRQ
 [out of timer code, theraise_softirq
 pirq_dpci is on the CPU0 dpci_list]  [adds the pirq_dpci to CPU1
   dpci_list as state == 0]

softirq_dpci:softirq_dpci:
list_del
[list entries are poisoned]
list_del = BOOM

The fix is simple - enroll 'pt_irq_guest_eoi' to use the locked
semantics for 'state'. We piggyback on pt_pirq_softirq_cancel (was
pt_pirq_softirq_reset) to use cmpxchg. We also expand said function
to reset the '-dom' only on the teardown paths - but not on the
timeouts.

Reported-by: Sander Eikelenboom li...@eikelenboom.it
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 xen/drivers/passthrough/io.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..2039d31 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -57,7 +57,7 @@ enum {
  * This can be called multiple times, but the softirq is only raised once.
  * That is until the STATE_SCHED state has been cleared. The state can be
  * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
- * or by 'pt_pirq_softirq_reset' (which will try to clear the state before
+ * or by 'pt_pirq_softirq_cancel' (which will try to clear the state before
  * the softirq had a chance to run).
  */
 static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
@@ -97,13 +97,15 @@ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci 
*pirq_dpci)
 }
 
 /*
- * Reset the pirq_dpci-dom parameter to NULL.
+ * Cancels an outstanding pirq_dpci (if scheduled). Also if clear is set,
+ * reset pirq_dpci-dom parameter to NULL (used for teardown).
  *
  * This function checks the different states to make sure it can do it
  * at the right time. If it unschedules the 'hvm_dirq_assist' from running
  * it also refcounts (which is what the softirq would have done) properly.
  */
-static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
+static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
+   unsigned int clear)
 {
 struct domain *d = pirq_dpci-dom;
 
@@ -125,8 +127,13 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci 
*pirq_dpci)
  * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom'
  * in local variable before it sets STATE_RUN - and therefore will not
  * dereference '-dom' which would crash.
+ *
+ * However, if this is called from 'pt_irq_time_out' we do not want to
+ * clear the '-dom' as we can re-use the 'pirq_dpci' after that and
+ * need '-dom'.
  */
-pirq_dpci-dom = NULL;
+if ( clear )
+pirq_dpci-dom = NULL;
 break;
 }
 }
@@ -142,7 +149,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct 
hvm_pirq_dpci *pirq_dpci,
 if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
   pirq_dpci-flags) )
 {
-pirq_dpci-state = 0;
+pt_pirq_softirq_cancel(pirq_dpci, 0 /* keep dom */);
 pirq_dpci-pending = 0;
 pirq_guest_eoi(dpci_pirq(pirq_dpci));
 }
@@ -285,7 +292,7 @@ int pt_irq_create_bind(
  * to be scheduled but we must deal with the one that may 
be
  * in the queue.
  */
-pt_pirq_softirq_reset(pirq_dpci);
+pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */);
 }
 }
 if ( unlikely(rc) )
@@ -536,9 +543,9 @@ int

Re: [Xen-devel] [PATCH for-4.5] xen/arm: clear UIE on hypervisor entry

2014-11-19 Thread Konrad Rzeszutek Wilk

On Wed, Nov 19, 2014 at 05:44:49PM +, Stefano Stabellini wrote:
 UIE being set can cause maintenance interrupts to occur when Xen writes
 to one or more LR registers. The effect is a busy loop around the
 interrupt handler in Xen
 (http://marc.info/?l=xen-develm=141597517132682): everything gets stuck.
 
 Konrad, this fixes an actual bug, at least on OMAP5. It should have no
 bad side effects on any other platforms as far as I can tell. It should
 go in 4.5.

Have you checked (aka ran the tests) on the other platforms?
 
 Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
 Tested-by: Andrii Tseglytskyi andrii.tseglyts...@globallogic.com
  ^^^
 'Reported-and-Tested-by'
 
 diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
 index 70d10d6..df140b9 100644
 --- a/xen/arch/arm/gic.c
 +++ b/xen/arch/arm/gic.c
 @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
  if ( is_idle_vcpu(v) )
  return;
  
 +gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0);
 +
  spin_lock_irqsave(v-arch.vgic.lock, flags);
  
  while ((i = find_next_bit((const unsigned long *) this_cpu(lr_mask),
 @@ -527,8 +529,6 @@ void gic_inject(void)
  
  if ( !list_empty(current-arch.vgic.lr_pending)  lr_all_full() )
  gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 1);
 -else
 -gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0);
  }
  
  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH V3 2/8] xen: Delay remapping memory of pv-domain

2014-11-19 Thread Konrad Rzeszutek Wilk

On Fri, Nov 14, 2014 at 06:14:06PM +0100, Juergen Gross wrote:
 On 11/14/2014 05:47 PM, Konrad Rzeszutek Wilk wrote:
 On Fri, Nov 14, 2014 at 05:53:19AM +0100, Juergen Gross wrote:
 On 11/13/2014 08:56 PM, Konrad Rzeszutek Wilk wrote:
 +   mfn_save = virt_to_mfn(buf);
 +
 +   while (xen_remap_mfn != INVALID_P2M_ENTRY) {
 
 So the 'list' is constructed by going forward - that is from low-numbered
 PFNs to higher numbered ones. But the 'xen_remap_mfn' is going the
 other way - from the highest PFN to the lowest PFN.
 
 Won't that mean we will restore the chunks of memory in the wrong
 order? That is we will still restore them in chunks size, but the
 chunks will be in descending order instead of ascending?
 
 No, the information where to put each chunk is contained in the chunk
 data. I can add a comment explaining this.
 
 Right, the MFNs in a chunks are going to be restored in the right order.
 
 I was thinking that the chunks (so a set of MFNs) will be restored in
 the opposite order that they are written to.
 
 And oddly enough the chunks are done in 512-3 = 509 MFNs at once?
 
 More don't fit on a single page due to the other info needed. So: yes.
 
 But you could use two pages - one for the structure and the other
 for the list of MFNs. That would fix the problem of having only
 509 MFNs being contingous per chunk when restoring.
 
 That's no problem (see below).
 
 Anyhow the point I had that I am worried is that we do not restore the
 MFNs in the same order. We do it in chunk size which is OK (so the 509 MFNs
 at once)- but the order we traverse the restoration process is the opposite 
 of
 the save process. Say we have 4MB of contingous MFNs, so two (err, three)
 chunks. The first one we iterate is from 0-509, the second is 510-1018, the
 last is 1019-1023. When we restore (remap) we start with the last 'chunk'
 so we end up restoring them: 1019-1023, 510-1018, 0-509 order.
 
 No. When building up the chunks we save in each chunk where to put it
 on remap. So in your example 0-509 should be mapped at dest+0,
 510-1018 at dest+510, and 1019-1023 at dest+1019.
 
 When remapping we map 1019-1023 to dest+1019, 510-1018 at dest+510
 and last 0-509 at dest+0. So we do the mapping in reverse order, but
 to the correct pfns.

Excellent! Could a condensed version of that explanation be put in the code ?

 
 Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH V3 7/8] xen: switch to linear virtual mapped sparse p2m list

2014-11-19 Thread Konrad Rzeszutek Wilk

On Tue, Nov 11, 2014 at 06:43:45AM +0100, Juergen Gross wrote:
 At start of the day the Xen hypervisor presents a contiguous mfn list
 to a pv-domain. In order to support sparse memory this mfn list is
 accessed via a three level p2m tree built early in the boot process.
 Whenever the system needs the mfn associated with a pfn this tree is
 used to find the mfn.
 
 Instead of using a software walked tree for accessing a specific mfn
 list entry this patch is creating a virtual address area for the
 entire possible mfn list including memory holes. The holes are
 covered by mapping a pre-defined  page consisting only of invalid
 mfn entries. Access to a mfn entry is possible by just using the
 virtual base address of the mfn list and the pfn as index into that
 list. This speeds up the (hot) path of determining the mfn of a
 pfn.
 
 Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0
 showed following improvements:
 
 Elapsed time: 32:50 -  32:35
 System:   18:07 -  17:47
 User:104:00 - 103:30
 
 Tested on 64 bit dom0 and 32 bit domU.
 
 Signed-off-by: Juergen Gross jgr...@suse.com
 ---
  arch/x86/include/asm/xen/page.h |  14 +-
  arch/x86/xen/mmu.c  |  32 +-
  arch/x86/xen/p2m.c  | 732 
 +---
  arch/x86/xen/xen-ops.h  |   2 +-
  4 files changed, 342 insertions(+), 438 deletions(-)
 
 diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
 index 07d8a7b..4a227ec 100644
 --- a/arch/x86/include/asm/xen/page.h
 +++ b/arch/x86/include/asm/xen/page.h
 @@ -72,7 +72,19 @@ extern unsigned long m2p_find_override_pfn(unsigned long 
 mfn, unsigned long pfn)
   */
  static inline unsigned long __pfn_to_mfn(unsigned long pfn)
  {
 - return get_phys_to_machine(pfn);
 + unsigned long mfn;
 +
 + if (pfn  xen_p2m_size)
 + mfn = xen_p2m_addr[pfn];
 + else if (unlikely(pfn  xen_max_p2m_pfn))
 + return get_phys_to_machine(pfn);
 + else
 + return IDENTITY_FRAME(pfn);
 +
 + if (unlikely(mfn == INVALID_P2M_ENTRY))
 + return get_phys_to_machine(pfn);
 +
 + return mfn;
  }
  
  static inline unsigned long pfn_to_mfn(unsigned long pfn)
 diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
 index 31ca515..0b43c45 100644
 --- a/arch/x86/xen/mmu.c
 +++ b/arch/x86/xen/mmu.c
 @@ -1158,20 +1158,16 @@ static void __init xen_cleanhighmap(unsigned long 
 vaddr,
* instead of somewhere later and be confusing. */
   xen_mc_flush();
  }
 -static void __init xen_pagetable_p2m_copy(void)
 +
 +static void __init xen_pagetable_p2m_free(void)
  {
   unsigned long size;
   unsigned long addr;
 - unsigned long new_mfn_list;
 -
 - if (xen_feature(XENFEAT_auto_translated_physmap))
 - return;
  
   size = PAGE_ALIGN(xen_start_info-nr_pages * sizeof(unsigned long));
  
 - new_mfn_list = xen_revector_p2m_tree();
   /* No memory or already called. */
 - if (!new_mfn_list || new_mfn_list == xen_start_info-mfn_list)
 + if ((unsigned long)xen_p2m_addr == xen_start_info-mfn_list)
   return;
  
   /* using __ka address and sticking INVALID_P2M_ENTRY! */
 @@ -1189,8 +1185,6 @@ static void __init xen_pagetable_p2m_copy(void)
  
   size = PAGE_ALIGN(xen_start_info-nr_pages * sizeof(unsigned long));
   memblock_free(__pa(xen_start_info-mfn_list), size);
 - /* And revector! Bye bye old array */
 - xen_start_info-mfn_list = new_mfn_list;
  
   /* At this stage, cleanup_highmap has already cleaned __ka space
* from _brk_limit way up to the max_pfn_mapped (which is the end of
 @@ -1214,12 +1208,26 @@ static void __init xen_pagetable_p2m_copy(void)
  }
  #endif
  
 -static void __init xen_pagetable_init(void)
 +static void __init xen_pagetable_p2m_setup(void)
  {
 - paging_init();
 + if (xen_feature(XENFEAT_auto_translated_physmap))
 + return;
 +
 + xen_vmalloc_p2m_tree();
 +
  #ifdef CONFIG_X86_64
 - xen_pagetable_p2m_copy();
 + xen_pagetable_p2m_free();
  #endif
 + /* And revector! Bye bye old array */
 + xen_start_info-mfn_list = (unsigned long)xen_p2m_addr;
 +}
 +
 +static void __init xen_pagetable_init(void)
 +{
 + paging_init();
 +
 + xen_pagetable_p2m_setup();
 +
   /* Allocate and initialize top and mid mfn levels for p2m structure */
   xen_build_mfn_list_list();
  
 diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
 index 328875a..7df446d 100644
 --- a/arch/x86/xen/p2m.c
 +++ b/arch/x86/xen/p2m.c
 @@ -3,21 +3,22 @@
   * guests themselves, but it must also access and update the p2m array
   * during suspend/resume when all the pages are reallocated.
   *
 - * The p2m table is logically a flat array, but we implement it as a
 - * three-level tree to allow the address space to be sparse.
 + * The logical flat p2m table is mapped to a linear kernel memory area.
 + * For accesses by Xen a three-level tree linked via

Re: [Xen-devel] [PATCH V3 0/8] xen: Switch to virtual mapped linear p2m list

2014-11-19 Thread Konrad Rzeszutek Wilk

On Tue, Nov 11, 2014 at 06:43:38AM +0100, Juergen Gross wrote:
 Paravirtualized kernels running on Xen use a three level tree for
 translation of guest specific physical addresses to machine global
 addresses. This p2m tree is used for construction of page table
 entries, so the p2m tree walk is performance critical.
 
 By using a linear virtual mapped p2m list accesses to p2m elements
 can be sped up while even simplifying code. To achieve this goal
 some p2m related initializations have to be performed later in the
 boot process, as the final p2m list can be set up only after basic
 memory management functions are available.
 

Hey Juergen,

I finially finished looking at the patchset. Had some comments,
some questions that I hope can make it in the patch so that in
six months or so when somebody looks at the code they can
understand the subtle pieces.

Looking forward to the v4! (Thought keep in mind that next week
is Thanksgiving week so won't be able to look much after Wednesday)

  arch/x86/include/asm/pgtable_types.h |1 +
  arch/x86/include/asm/xen/page.h  |   49 +-
  arch/x86/mm/pageattr.c   |   20 +
  arch/x86/xen/mmu.c   |   38 +-
  arch/x86/xen/p2m.c   | 1315 
 ++
  arch/x86/xen/setup.c |  460 ++--
  arch/x86/xen/xen-ops.h   |6 +-
  7 files changed, 854 insertions(+), 1035 deletions(-)

And best of - we are deleting more code!

 
 -- 
 2.1.2
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5] libxl: remove existence check for PCI device hotplug

2014-11-19 Thread Konrad Rzeszutek Wilk

On Mon, Nov 17, 2014 at 12:10:34PM +, Wei Liu wrote:
 The existence check is to make sure a device is not added to a guest
 multiple times.
 
 PCI device backend path has different rules from vif, disk etc. For
 example:
 /local/domain/0/backend/pci/9/0/dev-1/:03:10.1
 /local/domain/0/backend/pci/9/0/key-1/:03:10.1
 /local/domain/0/backend/pci/9/0/dev-2/:03:10.2
 /local/domain/0/backend/pci/9/0/key-2/:03:10.2
 
 The devid for PCI devices is hardcoded 0. libxl__device_exists only
 checks up to /local/.../9/0 so it always returns true even the device is
 assignable.
 
 Remove invocation of libxl__device_exists. We're sure at this point that
 the PCI device is assignable (hence no xenstore entry or JSON entry).
 The check is done before hand. For HVM guest it's done by calling
 xc_test_assign_device and for PV guest it's done by calling
 pciback_dev_is_assigned.
 
 Reported-by: Li, Liang Z liang.z...@intel.com
 Signed-off-by: Wei Liu wei.l...@citrix.com
 Cc: Ian Campbell ian.campb...@citrix.com
 Cc: Ian Jackson ian.jack...@eu.citrix.com
 Cc: Konrad Wilk konrad.w...@oracle.com
 ---
 This patch fixes a regression in 4.5.

Ouch! That needs then to be fixed.

Is the version you would want to commit? I did test it - and it
looked to do the right thing - thought the xen-pciback is stuck in the
7 state. However that is a seperate issue that I believe is due to
Xen pciback not your patches.

 
 The risk is that I misunderstood semantics of xc_test_assign_device and
 pciback_dev_is_assigned and end up adding several entries to JSON config
 template. But if the assignable tests are incorrect I think we have a
 bigger problem to worry about than duplicated entries in JSON template.
 
 It would be good for someone to have PCI hotplug setup to run a quick test.  I
 think Liang confirmed (indrectly) that xc_test_assign_device worked well for
 him so I think there's won't be multiple JSON template entries for HVM guests.
 However PV side still remains to be tested.
 ---
  tools/libxl/libxl_pci.c |8 
  1 file changed, 8 deletions(-)
 
 diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
 index 9f40100..316643c 100644
 --- a/tools/libxl/libxl_pci.c
 +++ b/tools/libxl/libxl_pci.c
 @@ -175,14 +175,6 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, 
 uint32_t domid, libxl_d
  rc = libxl__xs_transaction_start(gc, t);
  if (rc) goto out;
  
 -rc = libxl__device_exists(gc, t, device);
 -if (rc  0) goto out;
 -if (rc == 1) {
 -LOG(ERROR, device already exists in xenstore);
 -rc = ERROR_DEVICE_EXISTS;
 -goto out;
 -}
 -
  rc = libxl__set_domain_configuration(gc, domid, d_config);
  if (rc) goto out;
  
 -- 
 1.7.10.4
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v9 12/13] swiotlb-xen: pass dev_addr to xen_dma_unmap_page and xen_dma_sync_single_for_cpu

2014-11-19 Thread Konrad Rzeszutek Wilk

On Wed, Nov 12, 2014 at 11:40:53AM +, Stefano Stabellini wrote:
 xen_dma_unmap_page and xen_dma_sync_single_for_cpu take a dma_addr_t
 handle as argument, not a physical address.

Ouch. Should this also go on stable tree?

 
 Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
 Reviewed-by: Catalin Marinas catalin.mari...@arm.com
 ---
  drivers/xen/swiotlb-xen.c |6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
 index 3725ee4..498b654 100644
 --- a/drivers/xen/swiotlb-xen.c
 +++ b/drivers/xen/swiotlb-xen.c
 @@ -449,7 +449,7 @@ static void xen_unmap_single(struct device *hwdev, 
 dma_addr_t dev_addr,
  
   BUG_ON(dir == DMA_NONE);
  
 - xen_dma_unmap_page(hwdev, paddr, size, dir, attrs);
 + xen_dma_unmap_page(hwdev, dev_addr, size, dir, attrs);
  
   /* NOTE: We use dev_addr here, not paddr! */
   if (is_xen_swiotlb_buffer(dev_addr)) {
 @@ -497,14 +497,14 @@ xen_swiotlb_sync_single(struct device *hwdev, 
 dma_addr_t dev_addr,
   BUG_ON(dir == DMA_NONE);
  
   if (target == SYNC_FOR_CPU)
 - xen_dma_sync_single_for_cpu(hwdev, paddr, size, dir);
 + xen_dma_sync_single_for_cpu(hwdev, dev_addr, size, dir);
  
   /* NOTE: We use dev_addr here, not paddr! */
   if (is_xen_swiotlb_buffer(dev_addr))
   swiotlb_tbl_sync_single(hwdev, paddr, size, dir, target);
  
   if (target == SYNC_FOR_DEVICE)
 - xen_dma_sync_single_for_cpu(hwdev, paddr, size, dir);
 + xen_dma_sync_single_for_cpu(hwdev, dev_addr, size, dir);
  
   if (dir != DMA_FROM_DEVICE)
   return;
 -- 
 1.7.10.4
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] set pv guest default video_memkb to 0

2014-11-19 Thread Konrad Rzeszutek Wilk

On Tue, Nov 18, 2014 at 03:57:08PM -0500, Zhigang Wang wrote:
 Before this patch, pv guest video_memkb is -1, which is an invalid value.
 And it will cause the xenstore 'memory/targe' calculation wrong:
 
 memory/target = info-target_memkb - info-video_memkb

CC-ing the maintainers.

Is this an regression as compared to Xen 4.4 or is this also in Xen 4.4?

Thanks.

 
 Signed-off-by: Zhigang Wang zhigang.x.w...@oracle.com
 ---
  tools/libxl/libxl_create.c | 2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
 index b1ff5ae..1198225 100644
 --- a/tools/libxl/libxl_create.c
 +++ b/tools/libxl/libxl_create.c
 @@ -357,6 +357,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
  break;
  case LIBXL_DOMAIN_TYPE_PV:
  libxl_defbool_setdefault(b_info-u.pv.e820_host, false);
 +if (b_info-video_memkb == LIBXL_MEMKB_DEFAULT)
 +b_info-video_memkb = 0;
  if (b_info-shadow_memkb == LIBXL_MEMKB_DEFAULT)
  b_info-shadow_memkb = 0;
  if (b_info-u.pv.slack_memkb == LIBXL_MEMKB_DEFAULT)
 -- 
 1.8.3.1
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5] docs/commandline: Fix formatting issues

2014-11-19 Thread Konrad Rzeszutek Wilk

On Wed, Nov 19, 2014 at 11:22:18AM +, Ian Campbell wrote:
 On Wed, 2014-11-19 at 11:17 +, Andrew Cooper wrote:
  In both of these cases, markdown was interpreting the text as regular text,
  and reflowing it as a regular paragraph, leading to a single line as output.
  Reformat them as code blocks inside blockquote blocks, which causes them to
  take their precise whitespace layout.
  
  Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
 Acked-by: Ian Campbell ian.campb...@citrix.com
 
  CC: Ian Jackson ian.jack...@eu.citrix.com
  CC: Wei Liu wei.l...@citrix.com
  CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com
  
  ---
  
  Konrad: this is a documentation fix, so requesting a 4.5 ack please.
 
 FWIW IMHO documentation fixes in general should have a very low bar to
 cross until very late in the release cycle...

I concur, I updated the release criteria doc so that it will be expediated
in the future.

 
  ---
   docs/misc/xen-command-line.markdown |   38 
  +--
   1 file changed, 19 insertions(+), 19 deletions(-)
  
  diff --git a/docs/misc/xen-command-line.markdown 
  b/docs/misc/xen-command-line.markdown
  index f054d4b..e3a5a15 100644
  --- a/docs/misc/xen-command-line.markdown
  +++ b/docs/misc/xen-command-line.markdown
  @@ -475,13 +475,13 @@ defaults of 1 and unlimited respectively are used 
  instead.
   
   For example, with `dom0_max_vcpus=4-8`:
   
  - Number of
  -  PCPUs | Dom0 VCPUs
  -   2|  4
  -   4|  4
  -   6|  6
  -   8|  8
  -  10|  8
  +Number of
  + PCPUs | Dom0 VCPUs
  +  2|  4
  +  4|  4
  +  6|  6
  +  8|  8
  + 10|  8
   
   ### dom0\_mem
`= List of ( min:size | max:size | size )`
  @@ -684,18 +684,18 @@ supported only when compiled with XSM\_ENABLE=y on 
  x86.
   The specified value is a bit mask with the individual bits having the
   following meaning:
   
  -Bit  0 - debug level 0 (unused at present)
  -Bit  1 - debug level 1 (Control Register logging)
  -Bit  2 - debug level 2 (VMX logging of MSR restores when context switching)
  -Bit  3 - debug level 3 (unused at present)
  -Bit  4 - I/O operation logging
  -Bit  5 - vMMU logging
  -Bit  6 - vLAPIC general logging
  -Bit  7 - vLAPIC timer logging
  -Bit  8 - vLAPIC interrupt logging
  -Bit  9 - vIOAPIC logging
  -Bit 10 - hypercall logging
  -Bit 11 - MSR operation logging
  + Bit  0 - debug level 0 (unused at present)
  + Bit  1 - debug level 1 (Control Register logging)
  + Bit  2 - debug level 2 (VMX logging of MSR restores when context 
  switching)
  + Bit  3 - debug level 3 (unused at present)
  + Bit  4 - I/O operation logging
  + Bit  5 - vMMU logging
  + Bit  6 - vLAPIC general logging
  + Bit  7 - vLAPIC timer logging
  + Bit  8 - vLAPIC interrupt logging
  + Bit  9 - vIOAPIC logging
  + Bit 10 - hypercall logging
  + Bit 11 - MSR operation logging
   
   Recognized in debug builds of the hypervisor only.
   
 
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5] libxl: remove existence check for PCI device hotplug

2014-11-19 Thread Konrad Rzeszutek Wilk

On Wed, Nov 19, 2014 at 09:21:23PM +, Wei Liu wrote:
 On Wed, Nov 19, 2014 at 04:01:54PM -0500, Konrad Rzeszutek Wilk wrote:
  On Mon, Nov 17, 2014 at 12:10:34PM +, Wei Liu wrote:
   The existence check is to make sure a device is not added to a guest
   multiple times.
   
   PCI device backend path has different rules from vif, disk etc. For
   example:
   /local/domain/0/backend/pci/9/0/dev-1/:03:10.1
   /local/domain/0/backend/pci/9/0/key-1/:03:10.1
   /local/domain/0/backend/pci/9/0/dev-2/:03:10.2
   /local/domain/0/backend/pci/9/0/key-2/:03:10.2
   
   The devid for PCI devices is hardcoded 0. libxl__device_exists only
   checks up to /local/.../9/0 so it always returns true even the device is
   assignable.
   
   Remove invocation of libxl__device_exists. We're sure at this point that
   the PCI device is assignable (hence no xenstore entry or JSON entry).
   The check is done before hand. For HVM guest it's done by calling
   xc_test_assign_device and for PV guest it's done by calling
   pciback_dev_is_assigned.
   
   Reported-by: Li, Liang Z liang.z...@intel.com
   Signed-off-by: Wei Liu wei.l...@citrix.com
   Cc: Ian Campbell ian.campb...@citrix.com
   Cc: Ian Jackson ian.jack...@eu.citrix.com
   Cc: Konrad Wilk konrad.w...@oracle.com
   ---
   This patch fixes a regression in 4.5.
  
  Ouch! That needs then to be fixed.
  
  Is the version you would want to commit? I did test it - and it
 
 Yes.

Then Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 
  looked to do the right thing - thought the xen-pciback is stuck in the
  7 state. However that is a seperate issue that I believe is due to
  Xen pciback not your patches.
  
 
 Thanks for testing.
 
 Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v0 RFC 0/2] xl/libxl support for PVUSB

2014-11-19 Thread Konrad Rzeszutek Wilk

On Sun, Nov 16, 2014 at 10:36:28AM +0800, Simon Cao wrote:
 Hi,
 
 I was working on the work. But I was busing preparing some job interviews
 in the last three months, sorry for this long delay. I will update my
 progress in a few days.

OK, I put your name for this to be in Xen 4.6.

Thanks!
 
 Thanks!
 
 Bo Cao
 
 On Mon, Nov 10, 2014 at 4:37 PM, Chun Yan Liu cy...@suse.com wrote:
 
  Is there any progress on this work? I didn't see new version after this.
  Anyone knows the status?
 
  Thanks,
  Chunyan
 
   On 8/11/2014 at 04:23 AM, in message
  1407702234-22309-1-git-send-email-caobosi...@gmail.com, Bo Cao
  caobosi...@gmail.com wrote:
   Finally I have a workable version xl/libxl support for PVUSB. Most of
   its commands work property now, but there are still some probelm to be
   solved.
   Please take a loot and give me some advices.
  
   == What have been implemented ? ==
   I have implemented libxl functions for PVUSB in libxl_usb.c. It mainly
   consists of two part:
   usbctrl_add/remove/list and usb_add/remove/list in which usbctrl denote
  usb
   controller in which
   usd device can be plugged in. I don't use ao_dev in
   libxl_deivce_usbctrl_add since we don't need to
   execute hotplug script for usbctrl and without ao_dev, adding default
   usbctrl for usb device
   would be easier.
  
   For the cammands to manipulate usb device such as xl usb-attach and xl
   usb-detach, this patch now only
   support to specify usb devices by their interface in sysfs. Using this
   interface, we can read usb device
   information through sysfs and bind/unbind usb device. (The support for
   mapping the lsusb bus:addr to the
   sysfs usb interface will come later).
  
   == What needs to do next ? ==
   There are two main problems to be solved.
  
   1.  PVUSB Options in VM Guest's Configuration File
   The interface in VM Guest's configuration file to add usb device is:
   usb=[interface=1-1].
   But the problem is now is that after the default usbctrl is added, the
  state
   of usbctrl is 2, e,g, XenbusStateInitWait,
   waiting for xen-usbfront to connect. The xen-usbfront in VM Guest isn't
   loaded. Therefore, sysfs_intf_write
   will report error. Does anyone have any clue how to solve this?
  
   2. sysfs_intf_write
   In the process of xl usb-attach domid intf=1-1, after writing
  1-1 to
   Xenstore entry, we need to
   bind the controller of this usb device to usbback driver so that it can
  be
   used by VM Guest. For exampele,
   for usb device 1-1, it's controller interface maybe 1-1:1.0, and we
   write this value to /sys/bus/usb/driver/usbback/bind.
   But for some devices, they have two controllers, for example 1-1:1.0
  and
   1-1:1.1. I think this means it has two functions,
   such as usbhid and usb-storage. So in this case, we bind the two
  controller
   to usbback?
  
   
   There maybe some errors or bugs in the codes. Feel free to tell me.
  
   Cheers,
  
   - Simon
  
   ---
   CC: George Dunlap george.dun...@eu.citrix.com
   CC: Ian Jackson ian.jack...@citrix.com
   CC: Ian Campbell ian.campb...@citrix.com
   CC: Pasi Kärkkäinen pa...@iki.fi
   CC: Lars Kurth lars.ku...@citrix.com
  
  
  
   ___
   Xen-devel mailing list
   Xen-devel@lists.xen.org
   http://lists.xen.org/xen-devel
  
 
 

 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq.

2014-11-19 Thread Konrad Rzeszutek Wilk

On Wed, Nov 19, 2014 at 08:17:35PM +0100, Sander Eikelenboom wrote:
 
 Wednesday, November 19, 2014, 8:01:31 PM, you wrote:
 
  On Wed, Nov 19, 2014 at 07:54:39PM +0100, Sander Eikelenboom wrote:
  
  Wednesday, November 19, 2014, 6:31:39 PM, you wrote:
  
   Hey,
  
   This patch should fix the issue that Sander had seen. The full details
   are in the patch itself. Sander, if you could - please test 
   origin/staging
   with this patch to make sure it does fix the issue.
  
  
xen/drivers/passthrough/io.c | 27 +--
  
   Konrad Rzeszutek Wilk (1):
 dpci: Fix list corruption if INTx device is used and an IRQ 
   timeout is invoked.
  
1 file changed, 17 insertions(+), 10 deletions(-)
  
  
  Hi Konrad,
  
  Hmm just tested with a freshly cloned tree .. unfortunately it blew up 
  again.
  (i must admit i also re-enabled stuff i had disabled in debugging like, 
  cpuidle, cpufreq). 
 
  Argh.
 
  Could you also try the first patch the STATE_ZOMBIE one?
 
 Building now ..

(Attached and inline)

Sander mentioned to me over IRC that with the STATE_ZOMBIE patch things work 
peachy for him.

The patch in combination with the previous adds two extra paths:

1) in raise_softirq, we do delay scheduling of dcpi_pirq until STATE_ZOMBIE is 
cleared.
2) dpci_softirq will pick up the cancelled dpci_pirq and then clear the 
STATE_ZOMBIE.

Lets follow the case without the zombie patch and with the zombie patch:

w/o zombie:

timer_softirq_action
pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to 
0.
pirq_dpci is still on dpci_list.
dpci_sofitrq
while (!list_emptry(our_list))
list_del, but has not yet done 'entry-next = LIST_POISON1;'
[interrupt happens]
raise_softirq checks state which is zero. Adds pirq_dpci to the 
dpci_list.
[interrupt is done, back to dpci_softirq]
finishes the entry-next = LIST_POISON1;
.. test STATE_SCHED returns true, so executes the 
hvm_dirq_assist.
ends the loop, exits.
dpci_softirq
while (!list_emtpry)
list_del, but -next already has LIST_POISON1 and we blow up.


w/ zombie:
timer_softirq_action
pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to 
STATE_ZOMBIE.
pirq_dpci is still on dpci_list.
dpci_sofitrq
while (!list_emptry(our_list))
list_del, but has not yet done 'entry-next = LIST_POISON1;'
[interrupt happens]
raise_softirq checks state, it is STATE_ZOMBIE so returns.
[interrupt is done, back to dpci_softirq]
finishes the entry-next = LIST_POISON1;
.. test STATE_SCHED returns true, so executes the 
hvm_dirq_assist.
ends the loop, exits.

So it seems that the STATE_ZOMBIE is needed, but for a different reason that
Jan initially thought of:


From c89a97f695fda245f5fcb16ddb36d3df7f6f28b9 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Date: Fri, 14 Nov 2014 12:15:26 -0500
Subject: [PATCH] dpci: Add ZOMBIE state to allow the softirq to finish with
 the dpci_pirq.

When we want to cancel an outstanding 'struct hvm_pirq_dpci' we perform
and cmpxch on the state to set it to zero. That is OK on the teardown
paths as it is guarnateed that the do_IRQ action handler has been removed.
Hence no more interrupts can be scheduled. But with the introduction
of dpci: Fix list corruption if INTx device is used and an IRQ timeout is 
invoked.
we now utilize the pt_pirq_softirq_cancel when we want to cancel
outstanding operations. However once we cancel them the do_IRQ is
free to schedule them back in - even if said 'struct hvm_pirq_dpci'
is still on the dpci_list.

The code base before this patch could follow this race:

\-timer_softirq_action
pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to 
0.
pirq_dpci is still on dpci_list.
\- dpci_sofitrq
while (!list_emptry(our_list))
list_del, but has not yet done 'entry-next = LIST_POISON1;'
[interrupt happens]
raise_softirq checks state which is zero. Adds pirq_dpci to the 
dpci_list.
[interrupt is done, back to dpci_softirq]
finishes the entry-next = LIST_POISON1;
.. test STATE_SCHED returns true, so executes the 
hvm_dirq_assist.
ends the loop, exits.

\- dpci_softirq
while (!list_emtpry)
list_del, but -next already has LIST_POISON1 and we blow up.

This patch in combination adds two extra paths:

1) in raise_softirq, we do delay scheduling of dcpi_pirq until STATE_ZOMBIE is 
cleared.
2) dpci_softirq will pick up the cancelled dpci_pirq and then clear the 
STATE_ZOMBIE.

Using the example above the code-paths would be now:
\- timer_softirq_action
pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to 
STATE_ZOMBIE.
pirq_dpci is still on dpci_list.
\- dpci_sofitrq
while (!list_emptry(our_list

[Xen-devel] [for xen-4.5 PATCH v2] Fix list corruption in dpci_softirq.

2014-11-19 Thread Konrad Rzeszutek Wilk

Hey,

Attached are two patches that fix the dpci_softirq list corruption
that Sander was observing.


 xen/drivers/passthrough/io.c | 55 +++-
 1 file changed, 39 insertions(+), 16 deletions(-)

Konrad Rzeszutek Wilk (2):
  dpci: Fix list corruption if INTx device is used and an IRQ timeout is 
invoked.
  dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [for-xen-4.5 PATCH v2 1/2] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.

2014-11-19 Thread Konrad Rzeszutek Wilk

If we pass in INTx type devices to a guest on an over-subscribed
machine - and in an over-worked guest - we can cause the
pirq_dpci-softirq_list to become corrupted.

The reason for this is that the 'pt_irq_guest_eoi' ends up
setting the 'state' to zero value. However the 'state' value
(STATE_SCHED, STATE_RUN) is used to communicate between
 'raise_softirq_for' and 'dpci_softirq' to determine whether the
'struct hvm_pirq_dpci' can be re-scheduled. We are ignoring the
teardown path for simplicity for right now. The 'pt_irq_guest_eoi' was
not adhering to the proper dialogue and was not using locked cmpxchg or
test_bit operations and ended setting 'state' set to zero. That
meant 'raise_softirq_for' was free to schedule it while the
'struct hvm_pirq_dpci'' was still on an per-cpu list.
The end result was list_del being called twice and the second call
corrupting the per-cpu list.

For this to occur one of the CPUs must be in the idle loop executing
softirqs and the interrupt handler in the guest must not
respond to the pending interrupt within 8ms, and we must receive
another interrupt for this device on another CPU.

CPU0:  CPU1:

timer_softirq_action
 \- pt_irq_time_out
 state = 0;do_IRQ
 [out of timer code, theraise_softirq
 pirq_dpci is on the CPU0 dpci_list]  [adds the pirq_dpci to CPU1
   dpci_list as state == 0]

softirq_dpci:softirq_dpci:
list_del
[list entries are poisoned]
list_del = BOOM

The fix is simple - enroll 'pt_irq_guest_eoi' to use the locked
semantics for 'state'. We piggyback on pt_pirq_softirq_cancel (was
pt_pirq_softirq_reset) to use cmpxchg. We also expand said function
to reset the '-dom' only on the teardown paths - but not on the
timeouts.

Reported-and-Tested-by: Sander Eikelenboom li...@eikelenboom.it
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 xen/drivers/passthrough/io.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..2039d31 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -57,7 +57,7 @@ enum {
  * This can be called multiple times, but the softirq is only raised once.
  * That is until the STATE_SCHED state has been cleared. The state can be
  * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
- * or by 'pt_pirq_softirq_reset' (which will try to clear the state before
+ * or by 'pt_pirq_softirq_cancel' (which will try to clear the state before
  * the softirq had a chance to run).
  */
 static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
@@ -97,13 +97,15 @@ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci 
*pirq_dpci)
 }
 
 /*
- * Reset the pirq_dpci-dom parameter to NULL.
+ * Cancels an outstanding pirq_dpci (if scheduled). Also if clear is set,
+ * reset pirq_dpci-dom parameter to NULL (used for teardown).
  *
  * This function checks the different states to make sure it can do it
  * at the right time. If it unschedules the 'hvm_dirq_assist' from running
  * it also refcounts (which is what the softirq would have done) properly.
  */
-static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
+static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
+   unsigned int clear)
 {
 struct domain *d = pirq_dpci-dom;
 
@@ -125,8 +127,13 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci 
*pirq_dpci)
  * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom'
  * in local variable before it sets STATE_RUN - and therefore will not
  * dereference '-dom' which would crash.
+ *
+ * However, if this is called from 'pt_irq_time_out' we do not want to
+ * clear the '-dom' as we can re-use the 'pirq_dpci' after that and
+ * need '-dom'.
  */
-pirq_dpci-dom = NULL;
+if ( clear )
+pirq_dpci-dom = NULL;
 break;
 }
 }
@@ -142,7 +149,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct 
hvm_pirq_dpci *pirq_dpci,
 if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
   pirq_dpci-flags) )
 {
-pirq_dpci-state = 0;
+pt_pirq_softirq_cancel(pirq_dpci, 0 /* keep dom */);
 pirq_dpci-pending = 0;
 pirq_guest_eoi(dpci_pirq(pirq_dpci));
 }
@@ -285,7 +292,7 @@ int pt_irq_create_bind(
  * to be scheduled but we must deal with the one that may 
be
  * in the queue.
  */
-pt_pirq_softirq_reset(pirq_dpci);
+pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */);
 }
 }
 if ( unlikely(rc) )
@@ -536,9

Re: [Xen-devel] [PATCH v2 for 4.5] xl: Return proper error codes for block-attach and block-detach

2014-11-20 Thread Konrad Rzeszutek Wilk

On Thu, Nov 20, 2014 at 03:47:04PM +, Ian Campbell wrote:
 On Mon, 2014-11-17 at 12:36 +, George Dunlap wrote:
  On 11/14/2014 11:12 AM, Ian Campbell wrote:
   On Thu, 2014-11-13 at 19:04 +, George Dunlap wrote:
   Return proper error codes on failure so that scripts can tell whether
   the command completed properly or not.
  
   Also while we're at it, normalize init and dispose of
   libxl_device_disk structures.  This means calling init and dispose in
   the actual functions where they are declared.
  
   This in turn means changing parse_disk_config_multistring() to not
   call libxl_device_disk_init.  And that in turn means changes to
   callers of parse_disk_config().
  
   It also means removing libxl_device_disk_init() from
   libxl.c:libxl_vdev_to_device_disk().  This is only called from
   xl_cmdimpl.c.
  
   Signed-off-by: George Dunlap george.dun...@eu.citrix.com
   ---
   CC: Ian Campbell ian.campb...@citrix.com
   CC: Ian Jackson ian.jack...@citrix.com
   CC: Wei Liu wei.l...@citrix.com
   CC: Konrad Wilk konrad.w...@oracle.com
  
   Release justification: This is a bug fix.  It's a fairly minor one,
   but it's also a very simple one.
  
   v2:
 - Restructure functions to make sure init and dispose are properly
 called.
   Sadly this bit has somewhat reduced the truth of the second half of your
   release justification, since the patch is a fair bit more subtle though.
   Although IMHO it hasn't invalidated the case for taking the patch for
   4.5 (modulo comments below).
  
  Well, I think we can take the hacky short-term fix I posted before, 
  which is simple, and do a proper fix after the 4.6 dev window opens up; 
  or we can do a more complete fix now.
 
 Specifically the hacky short-term fix is
 1415813493-25330-1-git-send-email-george.dun...@eu.citrix.com ?
 
 I could live with that, perhaps with the commit log explaining that a
 little.
 
 Konrad?

hands George the band-aid tape

Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 
  
  Or, if the valgrind thing is really important, I could use the change 
  you suggested, and do return rc ? 1 : 0; but I really think that's 
  going in the wrong direction...
  
-George
  
  
   ---
 tools/libxl/libxl.c  |  2 --
 tools/libxl/xl_cmdimpl.c | 28 +---
 2 files changed, 21 insertions(+), 9 deletions(-)
  
   diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
   index f7961f6..d9c4ce7 100644
   --- a/tools/libxl/libxl.c
   +++ b/tools/libxl/libxl.c
   @@ -2611,8 +2611,6 @@ int libxl_vdev_to_device_disk(libxl_ctx *ctx, 
   uint32_t domid,
 if (devid  0)
 return ERROR_INVAL;
 
   -libxl_device_disk_init(disk);
   _init functions are idempotent, so this is harmless, I think. Existing
   callers (including things which aren't xl) might be relying on it.
   Outside of a freeze period that might be acceptable (those callers are
   buggy) but since you are asking for a freeze exception I think this may
   be going a step too far in cleaning things up.
  
   In terms of other callers you've missed
   tools/ocaml/libs/xl/xenlight_stubs.c (which appears buggy to me) in
   tree, plus whatever use e.g. libvirt makes of it.
  
   diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
   index 3c9f146..25af715 100644
   --- a/tools/libxl/xl_cmdimpl.c
   +++ b/tools/libxl/xl_cmdimpl.c
   @@ -485,8 +485,6 @@ static void parse_disk_config_multistring(XLU_Config 
   **config,
 {
 int e;
 
   -libxl_device_disk_init(disk);
   Likewise here, although to a lesser extent since this is xl not libxl.
 
   @@ -6378,13 +6382,17 @@ int main_blockattach(int argc, char **argv)
 printf(disk: %s\n, json);
 free(json);
 if (ferror(stdout) || fflush(stdout)) { perror(stdout); 
   exit(-1); }
   -return 0;
   You should set rc = 0 here rather than initing it at declaration to
   catch error paths which don't set the result. (we aren't very consistent
   about this in the code today so I'm only mentioning it because other
   changes seem to be needed, if that turns out not to be the case and
   there's no need for v3 then this shouldn't block acceptance)
  
   +goto out;
   I'm not 100% convinced by the use of the goto out error handling style
   for a success path, but it's probably better than duplicating the exit
   path or adding !dryrun checks to all the following operations. Ian,
   since you recently posted updated coding style for things around this,
   what do you think?
  
   Ian.
  
  
 
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.

2014-11-20 Thread Konrad Rzeszutek Wilk

On Thu, Nov 20, 2014 at 11:55:43AM +0100, Jan Beulich wrote:
  On 19.11.14 at 23:21, konrad.w...@oracle.com wrote:
 
 Leaving aside the question of whether this is the right approach, in
 case it is a couple of comments:
 
  @@ -85,7 +91,7 @@ static void raise_softirq_for(struct hvm_pirq_dpci 
  *pirq_dpci)
*/
   bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
   {
  -if ( pirq_dpci-state  ((1  STATE_RUN) | (1  STATE_SCHED)) )
  +if ( pirq_dpci-state  ((1  STATE_RUN) | (1  STATE_SCHED) | (1  
  STATE_ZOMBIE) ) )
 
 This is becoming unwieldy - perhaps better just if ( pirq_dpci-state )?
 
  @@ -122,6 +128,7 @@ static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci 
  *pirq_dpci,
   /* fallthrough. */
   case (1  STATE_RUN):
   case (1  STATE_RUN) | (1  STATE_SCHED):
  +case (1  STATE_RUN) | (1  STATE_SCHED) | (1  STATE_ZOMBIE):
 
 How would we get into this state? Afaict STATE_ZOMBIE can't be set
 at the same time as STATE_SCHED.
 
  @@ -786,6 +793,7 @@ unlock:
   static void dpci_softirq(void)
   {
   unsigned int cpu = smp_processor_id();
  +unsigned int reset = 0;
 
 bool_t (and would better be inside the loop, avoiding the pointless
 re-init on the last iteration).
 
 But taking it as a whole - aren't we now at risk of losing an interrupt
 instance the guest expects, due to raise_softirq_for() bailing on
 zombie entries, and dpci_softirq() being the only place where the
 zombie state gets cleared?

Ah crud.

So a simple fix could be to seperate the 'state' to only deal with the
raise_softirq and softirq_dpci. And then add a new (old) 'masked' to
deal between hvm_dirq_assist, pt_irq_guest_eoi and hvm_do_IRQ_dpci.


From 94a98e20a8ab721a58788919f92e3524a6c6e25c Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Date: Thu, 20 Nov 2014 14:28:11 -0500
Subject: [PATCH] dpci: Add 'masked' as an gate for hvm_dirq_assist to process.

commit f6dd295381f4b6a66acddacf46bca8940586c8d8
dpci: replace tasklet with softirq used the 'masked' as an
two-bit state mechanism (STATE_SCHED, STATE_RUN) to communicate
between 'raise_softirq_for' and 'dpci_softirq' to determine whether
the 'struct hvm_pirq_dpci' can be re-scheduled.

However it ignored the 'pt_irq_guest_eoi' was not adhering to the proper
dialogue and was not using locked cmpxchg or test_bit operations and
ended setting 'state' set to zero. That meant 'raise_softirq_for' was
free to schedule it while the 'struct hvm_pirq_dpci'' was still on an
per-cpu list causing an list corruption.

The code would trigger the following path causing list corruption:

\-timer_softirq_action
pt_irq_time_out calls pt_pirq_softirq_cancel sets state to 0.
pirq_dpci is still on dpci_list.
\- dpci_sofitrq
while (!list_emptry(our_list))
list_del, but has not yet done 'entry-next = LIST_POISON1;'
[interrupt happens]
raise_softirq checks state which is zero. Adds pirq_dpci to the 
dpci_list.
[interrupt is done, back to dpci_softirq]
finishes the entry-next = LIST_POISON1;
.. test STATE_SCHED returns true, so executes the 
hvm_dirq_assist.
ends the loop, exits.

\- dpci_softirq
while (!list_emtpry)
list_del, but -next already has LIST_POISON1 and we blow up.

An alternative solution was proposed (adding STATE_ZOMBIE and making
pt_irq_time_out use the cmpxchg protocol on 'state'), which fixed the above
issue but had an fatal bug. It would miss interrupts that are to be scheduled!

This patch brings back the 'masked' boolean which is used as an
communication channel between 'hvm_do_IRQ_dpci', 'hvm_dirq_assist' and
'pt_irq_guest_eoi'. When we have an interrupt we set 'masked'. Anytime
'hvm_dirq_assist' or 'pt_irq_guest_eoi' executes - it clears it.

The 'state' is left as a seperate mechanism to provide an mechanism between
'raise_sofitrq' and 'softirq_dpci' to communicate the state of the
'struct hvm_dirq_pirq'.

Reported-by: Sander Eikelenboom li...@eikelenboom.it
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 xen/drivers/passthrough/io.c | 5 +++--
 xen/include/xen/hvm/irq.h| 1 +
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..4d8c02f 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -142,7 +142,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct 
hvm_pirq_dpci *pirq_dpci,
 if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
   pirq_dpci-flags) )
 {
-pirq_dpci-state = 0;
+pirq_dpci-masked = 0;
 pirq_dpci-pending = 0;
 pirq_guest_eoi(dpci_pirq(pirq_dpci));
 }
@@ -610,6 +610,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq)
  !(pirq_dpci-flags  HVM_IRQ_DPCI_MAPPED) )
 return 0;
 
+pirq_dpci-masked = 1;
 raise_softirq_for

Re: [Xen-devel] [PATCH v2 for-4.5] xen/arm: clear UIE on hypervisor entry

2014-11-20 Thread Konrad Rzeszutek Wilk

On Thu, Nov 20, 2014 at 04:47:42PM +, Ian Campbell wrote:
 On Thu, 2014-11-20 at 10:53 +, Stefano Stabellini wrote:
  UIE being set can cause maintenance interrupts to occur when Xen writes
  to one or more LR registers. The effect is a busy loop around the
  interrupt handler in Xen
  (http://marc.info/?l=xen-develm=141597517132682): everything gets stuck.
 
 I think it would be useful to explain somewhere why/how a spurious
 interrupt can occur, if not here then in the comment you've already
 added or in another one in gic_clear_lrs where you clear UIE.
 
 The bit about clearing the LRs on entry causing UIE to become deasserted
 before we get as far as reading the IAR is a bit subtle.
 
  Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
  Reported-and-Tested-by: Andrii Tseglytskyi 
  andrii.tseglyts...@globallogic.com
  CC: konrad.w...@oracle.com
 
 With the expanded commentary:
 Acked-by: Ian Campbell ian.campb...@citrix.com
 
  ---
  
  Konrad, this fixes an actual bug, at least on OMAP5. It should have no
  bad side effects on any other platforms as far as I can tell. It should
  go in 4.5.

As long as the testing that Julian has asked for does not demonstrate
any issues with this patch I am OK with it going in Xen 4.5.


  
  Changes in v2:
  
  - add an in-code comment about maintenance_interrupt not being called.
  
  diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
  index 70d10d6..c6c11d3 100644
  --- a/xen/arch/arm/gic.c
  +++ b/xen/arch/arm/gic.c
  @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
   if ( is_idle_vcpu(v) )
   return;
   
  +gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0);
  +
   spin_lock_irqsave(v-arch.vgic.lock, flags);
   
   while ((i = find_next_bit((const unsigned long *) this_cpu(lr_mask),
  @@ -527,8 +529,6 @@ void gic_inject(void)
   
   if ( !list_empty(current-arch.vgic.lr_pending)  lr_all_full() )
   gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 1);
  -else
  -gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0);
   }
   
   static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
  @@ -598,6 +598,10 @@ static void maintenance_interrupt(int irq, void 
  *dev_id, struct cpu_user_regs *r
* Receiving the interrupt is going to cause gic_inject to be called
* on return to guest that is going to clear the old LRs and inject
* new interrupts.
  + *
  + * Do not add code here: maintenance interrupts caused by setting
  + * GICH_HCR_UIE, might read as spurious interrupts (1023). As a
  + * consequence sometimes this handler might not be called.
*/
   }
   
 
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5 v2] libxc: don't leak buffer containing the uncompressed PV kernel

2014-11-20 Thread Konrad Rzeszutek Wilk

On Thu, Nov 20, 2014 at 03:48:47PM +, Ian Campbell wrote:
 The libxc xc_dom_* infrastructure uses a very simple malloc memory pool which
 is freed by xc_dom_release. However the various xc_try_*_decode routines 
 (other
 than the gzip one) just use plain malloc/realloc and therefore the buffer ends
 up leaked.
 
 The memory pool currently supports mmap'd buffers as well as a directly
 allocated buffers, however the try decode routines make use of realloc and do
 not fit well into this model. Introduce a concept of an external memory block
 to the memory pool and provide an interface to register such memory.
 
 The mmap_ptr and mmap_len fields of the memblock tracking struct lose their
 mmap_ prefix since they are now also used for external memory blocks.
 
 We are only seeing this now because the gzip decoder doesn't leak and it's 
 only
 relatively recently that kernels in the wild have switched to better
 compression.
 
 This is https://bugs.debian.org/767295
 
 Reported by: Gedalya geda...@gedalya.net

Gedelya,

Could you also test this patch to make sure it does fix the
reported issue please?

 Signed-off-by: Ian Campbell ian.campb...@citrix.com
 ---
 v2: Correct handling in xc_try_lzo1x_decode.
 
 This is a bug fix and should go into 4.5.
 
 I have a sneaking suspicion this is going to need to backport a very long 
 way...
 ---
  tools/libxc/include/xc_dom.h|   10 --
  tools/libxc/xc_dom_bzimageloader.c  |   20 
  tools/libxc/xc_dom_core.c   |   61 
 +++
  tools/libxc/xc_dom_decompress_lz4.c |5 +++
  4 files changed, 80 insertions(+), 16 deletions(-)
 
 diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
 index 6ae6a9f..07d7224 100644
 --- a/tools/libxc/include/xc_dom.h
 +++ b/tools/libxc/include/xc_dom.h
 @@ -33,8 +33,13 @@ struct xc_dom_seg {
  
  struct xc_dom_mem {
  struct xc_dom_mem *next;
 -void *mmap_ptr;
 -size_t mmap_len;
 +void *ptr;
 +enum {
 +XC_DOM_MEM_TYPE_MALLOC_INTERNAL,
 +XC_DOM_MEM_TYPE_MALLOC_EXTERNAL,
 +XC_DOM_MEM_TYPE_MMAP,
 +} type;
 +size_t len;
  unsigned char memory[0];
  };
  
 @@ -298,6 +303,7 @@ void xc_dom_log_memory_footprint(struct xc_dom_image 
 *dom);
  /* --- simple memory pool -- */
  
  void *xc_dom_malloc(struct xc_dom_image *dom, size_t size);
 +int xc_dom_register_external(struct xc_dom_image *dom, void *ptr, size_t 
 size);
  void *xc_dom_malloc_page_aligned(struct xc_dom_image *dom, size_t size);
  void *xc_dom_malloc_filemap(struct xc_dom_image *dom,
  const char *filename, size_t * size,
 diff --git a/tools/libxc/xc_dom_bzimageloader.c 
 b/tools/libxc/xc_dom_bzimageloader.c
 index 2225699..964ebdc 100644
 --- a/tools/libxc/xc_dom_bzimageloader.c
 +++ b/tools/libxc/xc_dom_bzimageloader.c
 @@ -161,6 +161,13 @@ static int xc_try_bzip2_decode(
  
  total = (((uint64_t)stream.total_out_hi32)  32) | 
 stream.total_out_lo32;
  
 +if ( xc_dom_register_external(dom, out_buf, total) )
 +{
 +DOMPRINTF(BZIP2: Error registering stream output);
 +free(out_buf);
 +goto bzip2_cleanup;
 +}
 +
  DOMPRINTF(%s: BZIP2 decompress OK, 0x%zx - 0x%lx,
__FUNCTION__, *size, (long unsigned int) total);
  
 @@ -305,6 +312,13 @@ static int _xc_try_lzma_decode(
  }
  }
  
 +if ( xc_dom_register_external(dom, out_buf, stream-total_out) )
 +{
 +DOMPRINTF(%s: Error registering stream output, what);
 +free(out_buf);
 +goto lzma_cleanup;
 +}
 +
  DOMPRINTF(%s: %s decompress OK, 0x%zx - 0x%zx,
__FUNCTION__, what, *size, (size_t)stream-total_out);
  
 @@ -464,7 +478,13 @@ static int xc_try_lzo1x_decode(
  
  dst_len = lzo_read_32(cur);
  if ( !dst_len )
 +{
 +msg = Error registering stream output;
 +if ( xc_dom_register_external(dom, out_buf, out_len) )
 +break;
 +
  return 0;
 +}
  
  if ( dst_len  LZOP_MAX_BLOCK_SIZE )
  {
 diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
 index baa62a1..ecbf981 100644
 --- a/tools/libxc/xc_dom_core.c
 +++ b/tools/libxc/xc_dom_core.c
 @@ -132,6 +132,7 @@ void *xc_dom_malloc(struct xc_dom_image *dom, size_t size)
  return NULL;
  }
  memset(block, 0, sizeof(*block) + size);
 +block-type = XC_DOM_MEM_TYPE_MALLOC_INTERNAL;
  block-next = dom-memblocks;
  dom-memblocks = block;
  dom-alloc_malloc += sizeof(*block) + size;
 @@ -151,23 +152,45 @@ void *xc_dom_malloc_page_aligned(struct xc_dom_image 
 *dom, size_t size)
  return NULL;
  }
  memset(block, 0, sizeof(*block));
 -block-mmap_len = size;
 -block-mmap_ptr = mmap(NULL, block-mmap_len,
 -   PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON,
 -   -1, 0);

Re: [Xen-devel] [PATCH v2 for 4.5] scripts/get_maintainer.pl: Correctly CC the maintainers

2014-11-20 Thread Konrad Rzeszutek Wilk

On Thu, Nov 20, 2014 at 05:36:03PM +, Julien Grall wrote:
 The current script is setting $email_remove_duplicates to 1 by default, on
 complex patch (see [1]), this will result to ommitting randomly some
 maintainers.

One could see that as feature - the emails about bugs or patches
to review don't land in your inbox! 

 
 This is because, the script will:
 1) Get the list of maintainers of the file (incidentally all the
maintainers in THE REST role are added). If the email address already
exists in the global list, skip it. = The role will be lost
 2) Filter the list to remove the entry with THE REST role
 
 So if a maintainers is marked with THE REST role on the first file and
 actually be an x86 maintainers on the script, the script will only retain
 the THE REST role. During the filtering step, this maintainers will
 therefore be dropped.
 
 This patch fixes this by setting $email_remove_duplicates to 0 by default.
 The new behavior of the script will be:
 1) Append the list of maintainers for every file
 2) Filter the list to remove the entry with THE REST role
 3) Remove duplicated email address
 
 Example:
 
 Patch: https://patches.linaro.org/41083/
 
 Before the patch:
 
 Daniel De Graaf dgde...@tycho.nsa.gov
 Ian Jackson ian.jack...@eu.citrix.com
 Stefano Stabellini stefano.stabell...@eu.citrix.com
 Ian Campbell ian.campb...@citrix.com
 Wei Liu wei.l...@citrix.com
 George Dunlap george.dun...@eu.citrix.com
 xen-devel@lists.xen.org
 
 After the patch:
 
 Daniel De Graaf dgde...@tycho.nsa.gov
 Ian Jackson ian.jack...@eu.citrix.com
 Stefano Stabellini stefano.stabell...@eu.citrix.com
 Ian Campbell ian.campb...@citrix.com
 Wei Liu wei.l...@citrix.com
 Stefano Stabellini stefano.stabell...@citrix.com
 Tim Deegan t...@xen.org
 Keir Fraser k...@xen.org
 Jan Beulich jbeul...@suse.com
 George Dunlap george.dun...@eu.citrix.com
 xen-devel@lists.xen.org
 
 [1] http://lists.xenproject.org/archives/html/xen-devel/2014-11/msg00060.html
 
 Signed-off-by: Julien Grall julien.gr...@linaro.org
 CC: Don Slutz dsl...@verizon.com
 
 ---
 Changes in v2:
 - Rework the commit message to explain the problem and the
 solution more clearly
 
 I would like to see this patch in Xen 4.5 and backported to Xen 4.4 (first
 time the script has been introduced).
 
 Developpers using this script won't ommitted to cc some maintainers, and 
 it
 will avoid maintainers complaining about miss CC.
 
 The only drawbacks I can see is if the maintainers is referenced twice in
 the file MAINTAINERS with different email, the script won't notice it's
 duplicated and list 2 times. Though, for this one it could be fixed by
 modifying  the MAINTAINERS file. Is it worth for Xen 4.5? For know,
 it seems to only happen with Stefano.

I am OK with this going in Xen 4.5.

Thank you.
 ---
  scripts/get_maintainer.pl |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
 index df920e2..cc445cd 100755
 --- a/scripts/get_maintainer.pl
 +++ b/scripts/get_maintainer.pl
 @@ -35,7 +35,7 @@ my $email_git_min_percent = 5;
  my $email_git_since = 1-year-ago;
  my $email_hg_since = -365;
  my $interactive = 0;
 -my $email_remove_duplicates = 1;
 +my $email_remove_duplicates = 0;
  my $email_use_mailmap = 1;
  my $email_drop_the_rest_supporter_if_supporter_found = 1;
  my $output_multiline = 1;
 -- 
 1.7.10.4
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v9 12/13] swiotlb-xen: pass dev_addr to xen_dma_unmap_page and xen_dma_sync_single_for_cpu

2014-11-20 Thread Konrad Rzeszutek Wilk

On Thu, Nov 20, 2014 at 10:47:51AM +, Stefano Stabellini wrote:
 On Thu, 20 Nov 2014, Stefano Stabellini wrote:
  On Wed, 19 Nov 2014, Konrad Rzeszutek Wilk wrote:
   On Wed, Nov 12, 2014 at 11:40:53AM +, Stefano Stabellini wrote:
xen_dma_unmap_page and xen_dma_sync_single_for_cpu take a dma_addr_t
handle as argument, not a physical address.
   
   Ouch. Should this also go on stable tree?
  
  Good idea
 
 Also can I take that as an Acked-by for this patch?

Yes.
 
 
 There is one last bit of common and x86 changes in this series:
 patch #7 adds a paramter to xen_dma_map_page, even though the x86
 implementation is empty:
 
 http://marc.info/?l=xen-develm=141579253829808w=2
 
 is that OK for you?

Yes.
  
  
Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
Reviewed-by: Catalin Marinas catalin.mari...@arm.com
---
 drivers/xen/swiotlb-xen.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 3725ee4..498b654 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -449,7 +449,7 @@ static void xen_unmap_single(struct device *hwdev, 
dma_addr_t dev_addr,
 
BUG_ON(dir == DMA_NONE);
 
-   xen_dma_unmap_page(hwdev, paddr, size, dir, attrs);
+   xen_dma_unmap_page(hwdev, dev_addr, size, dir, attrs);
 
/* NOTE: We use dev_addr here, not paddr! */
if (is_xen_swiotlb_buffer(dev_addr)) {
@@ -497,14 +497,14 @@ xen_swiotlb_sync_single(struct device *hwdev, 
dma_addr_t dev_addr,
BUG_ON(dir == DMA_NONE);
 
if (target == SYNC_FOR_CPU)
-   xen_dma_sync_single_for_cpu(hwdev, paddr, size, dir);
+   xen_dma_sync_single_for_cpu(hwdev, dev_addr, size, dir);
 
/* NOTE: We use dev_addr here, not paddr! */
if (is_xen_swiotlb_buffer(dev_addr))
swiotlb_tbl_sync_single(hwdev, paddr, size, dir, 
target);
 
if (target == SYNC_FOR_DEVICE)
-   xen_dma_sync_single_for_cpu(hwdev, paddr, size, dir);
+   xen_dma_sync_single_for_cpu(hwdev, dev_addr, size, dir);
 
if (dir != DMA_FROM_DEVICE)
return;
-- 
1.7.10.4

   
  

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.

2014-11-21 Thread Konrad Rzeszutek Wilk

On Fri, Nov 21, 2014 at 01:51:24PM +0100, Sander Eikelenboom wrote:
 
 Friday, November 21, 2014, 12:50:16 PM, you wrote:
 
  On November 21, 2014 2:51:33 AM EST, Jan Beulich jbeul...@suse.com wrote:
  On 20.11.14 at 20:51, konrad.w...@oracle.com wrote:
  @@ -669,7 +670,7 @@ static void hvm_dirq_assist(struct domain *d,
 struct hvm_pirq_dpci *pirq_dpci)
   ASSERT(d-arch.hvm_domain.irq.dpci);
   
   spin_lock(d-event_lock);
  -if ( pirq_dpci-state )
  +if ( test_and_clear_bool(pirq_dpci-masked) )
   {
   struct pirq *pirq = dpci_pirq(pirq_dpci);
   const struct dev_intx_gsi_link *digl;
 
 So this now guards solely against the timeout enforced EOI? Why do
 you no longer need to guard against cancellation (i.e. why isn't this
 looking at both -state and -masked)?
 
 
  The previous state check was superfluous as the dpci_softirq would check 
  for the valid STATE_ before calling hvm_dirq_assist (and deal with 
  cancellation).
 
  I actually had an cleanup patch that would have removed the 'if 
  (pirq_dpci-state) ' and move the code for Xen 4.6.
 
  Anyhow waiting to see if Sander was able to test with this patch.
 
 Jan
 
 Hi Konrad / Jan,
 
 I have tested it for 3 hours now, no host crash so far (even after applying 
 some 
 extra stress to the guest).

Yeey! Thank you for being so flexible and willing to test these patches out!
 
 --
 Sander
 
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] WARNings in guest during xl save/restore

2014-11-21 Thread Konrad Rzeszutek Wilk

On Fri, Nov 21, 2014 at 11:08:37AM +0100, Juergen Gross wrote:
 Hi,
 
 during tests of my linear p2m list patches I stumbled over some
 WARNs issued during xl save and xl restore of a pv-domU with
 unpatched linux 3.18-rc5:

Boris had an patch for this I htink..
 
 during save I saw multiple entries like:
 [  176.900393] WARNING: CPU: 0 PID: 9 at arch/x86/xen/enlighten.c:968
 clear_local_APIC+0xa5/0x2b0()
 [  176.900393] Modules linked in: cfg80211 rfkill nfsd auth_rpcgss
 oid_registry nfs_acl nfs lockd grace fscache sunrpc evdev
 x86_pkg_temp_thermal thermal_sys snd_pcm coretemp snd_timer crc32_pclmul
 aesni_intel snd xts soundcore aes_i586 lrw gf128mul ablk_helper pcspkr
 cryptd fuse autofs4 ext4 crc16 mbcache jbd2 crc32c_intel
 [  176.900393] CPU: 0 PID: 9 Comm: migration/0 Tainted: GW
 3.18.0-rc5 #30
 [  176.900393]  0009 c14c40b2  c1054b10 c1599538 
 0009 c158bdc2
 [  176.900393]  03c8 c103c925 c103c925 03c8 0002 
 c15d25eb e8867e64
 [  176.900393]  c1054bd9 0009  c103c925  c103cb54
 0002 
 [  176.900393] Call Trace:
 [  176.900393]  [c14c40b2] ? dump_stack+0x3e/0x4e
 [  176.900393]  [c1054b10] ? warn_slowpath_common+0x90/0xc0
 [  176.900393]  [c103c925] ? clear_local_APIC+0xa5/0x2b0
 [  176.900393]  [c103c925] ? clear_local_APIC+0xa5/0x2b0
 [  176.900393]  [c1054bd9] ? warn_slowpath_null+0x19/0x20
 [  176.900393]  [c103c925] ? clear_local_APIC+0xa5/0x2b0
 [  176.900393]  [c103cb54] ? disable_local_APIC+0x24/0x90
 [  176.900393]  [c103ccde] ? lapic_suspend+0x11e/0x170
 [  176.900393]  [c1360ff9] ? syscore_suspend+0x79/0x220
 [  176.900393]  [c107e782] ? set_next_entity+0x62/0x80
 [  176.900393]  [c13165cd] ? xen_suspend+0x2d/0x110
 [  176.900393]  [c10045df] ? xen_mc_flush+0x13f/0x170
 [  176.900393]  [c10d5619] ? multi_cpu_stop+0xa9/0xd0
 [  176.900393]  [c10d5570] ? cpu_stop_should_run+0x50/0x50
 [  176.900393]  [c10d5771] ? cpu_stopper_thread+0x71/0x100
 [  176.900393]  [c1074214] ? finish_task_switch+0x34/0xd0
 [  176.900393]  [c14c513d] ? __schedule+0x23d/0x7f0
 [  176.900393]  [c10897f4] ? __wake_up_common+0x44/0x70
 [  176.900393]  [c14c8962] ? _raw_spin_lock_irqsave+0x12/0x60
 [  176.900393]  [c1071f22] ? smpboot_thread_fn+0xd2/0x170
 [  176.900393]  [c1071e50] ? SyS_setgroups+0x110/0x110
 [  176.900393]  [c106e801] ? kthread+0xa1/0xc0
 [  176.900393]  [c14c8ea1] ? ret_from_kernel_thread+0x21/0x30
 [  176.900393]  [c106e760] ? kthread_create_on_node+0x120/0x120
 [  176.900393] ---[ end trace b38596d5cfdcde8d ]---
 
 and during restore:
 [  176.900393] WARNING: CPU: 0 PID: 9 at arch/x86/xen/enlighten.c:968
 lapic_resume+0xc6/0x270()
 [  176.900393] Modules linked in: cfg80211 rfkill nfsd auth_rpcgss
 oid_registry nfs_acl nfs lockd grace fscache sunrpc evdev
 x86_pkg_temp_thermal thermal_sys snd_pcm coretemp snd_timer crc32_pclmul
 aesni_intel snd xts soundcore aes_i586 lrw gf128mul ablk_helper pcspkr
 cryptd fuse autofs4 ext4 crc16 mbcache jbd2 crc32c_intel
 [  176.900393] CPU: 0 PID: 9 Comm: migration/0 Tainted: GW
 3.18.0-rc5 #30
 [  176.900393]  0009 c14c40b2  c1054b10 c1599538 
 0009 c158bdc2
 [  176.900393]  03c8 c103c1e6 c103c1e6 03c8 c1030020 0002
 001b 
 [  176.900393]  c1054bd9 0009  c103c1e6  c16432c0
 0108cdfe c15d25dc
 [  176.900393] Call Trace:
 [  176.900393]  [c14c40b2] ? dump_stack+0x3e/0x4e
 [  176.900393]  [c1054b10] ? warn_slowpath_common+0x90/0xc0
 [  176.900393]  [c103c1e6] ? lapic_resume+0xc6/0x270
 [  176.900393]  [c103c1e6] ? lapic_resume+0xc6/0x270
 [  176.900393]  [c1030020] ? mcheck_cpu_init+0x170/0x4f0
 [  176.900393]  [c1054bd9] ? warn_slowpath_null+0x19/0x20
 [  176.900393]  [c103c1e6] ? lapic_resume+0xc6/0x270
 [  176.900393]  [c1360e66] ? syscore_resume+0x46/0x160
 [  176.900393]  [c1009012] ? xen_timer_resume+0x42/0x60
 [  176.900393]  [c131661c] ? xen_suspend+0x7c/0x110
 [  176.900393]  [c10d5619] ? multi_cpu_stop+0xa9/0xd0
 [  176.900393]  [c10d5570] ? cpu_stop_should_run+0x50/0x50
 [  176.900393]  [c10d5771] ? cpu_stopper_thread+0x71/0x100
 [  176.900393]  [c1074214] ? finish_task_switch+0x34/0xd0
 [  176.900393]  [c14c513d] ? __schedule+0x23d/0x7f0
 [  176.900393]  [c10897f4] ? __wake_up_common+0x44/0x70
 [  176.900393]  [c14c8962] ? _raw_spin_lock_irqsave+0x12/0x60
 [  176.900393]  [c1071f22] ? smpboot_thread_fn+0xd2/0x170
 [  176.900393]  [c1071e50] ? SyS_setgroups+0x110/0x110
 [  176.900393]  [c106e801] ? kthread+0xa1/0xc0
 [  176.900393]  [c14c8ea1] ? ret_from_kernel_thread+0x21/0x30
 [  176.900393]  [c106e760] ? kthread_create_on_node+0x120/0x120
 [  176.900393] ---[ end trace b38596d5cfdcde93 ]---
 
 While this seems not to be critical (the system is running after the
 restore) I assume disabling/enabling a local APIC on a pv-domain isn't
 something we want to happen...
 
 
 Juergen
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org

Re: [Xen-devel] [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.

2014-11-21 Thread Konrad Rzeszutek Wilk

' 
continues
on with setting '-flag=0' and unlocks the lock.

'hvm_dirq_assist' grabs the lock and continues one (and state is STATE_RUN).
Since 'flag = 0' and 'digl_list' is empty, it thunders through the 
'hvm_dirq_assist'
not doing anything. Well, it ends up calling the dreaded 'set_timer' - which 
will hit
another assert as it has never been initialized. Adding in 'mapping = 0' on the
error paths for MSI takes care of that.  That will take care of that - and I 
just
rolled 'masked =0' in the pt_pirq_softirq_reset function.

The legacy interrupt does not have this window, so it cannot do it.

 is being acquired right before the line quoted above, _that_ is

No. The event lock is acquired in 'hvm_dirq_assist' which would be
checking the 'mapping', not 'state'.

   spin_lock(d-event_lock);  
   if ( test_and_clear_bool(pirq_dpci-masked) )  

 what needs closely looking at (and why I was asking about the
 deletion of the [at the first glance unnecessary] checking of -state
 in hvm_dirq_assist()).


From 90d00db0949a8e796d7f812134753a54b2fe3d51 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Date: Thu, 20 Nov 2014 14:28:11 -0500
Subject: [PATCH] dpci: Add 'masked' as an gate for hvm_dirq_assist to process.

commit f6dd295381f4b6a66acddacf46bca8940586c8d8
dpci: replace tasklet with softirq used the 'masked' as an
two-bit state mechanism (STATE_SCHED, STATE_RUN) to communicate
between 'raise_softirq_for' and 'dpci_softirq' to determine whether
the 'struct hvm_pirq_dpci' can be re-scheduled.

However it ignored the 'pt_irq_guest_eoi' was not adhering to the proper
dialogue and was not using locked cmpxchg or test_bit operations and
ended setting 'state' set to zero. That meant 'raise_softirq_for' was
free to schedule it while the 'struct hvm_pirq_dpci'' was still on an
per-cpu list causing an list corruption.

The code would trigger the following path causing list corruption:

\-timer_softirq_action
pt_irq_time_out calls pt_pirq_softirq_cancel sets state to 0.
pirq_dpci is still on dpci_list.
\- dpci_sofitrq
while (!list_emptry(our_list))
list_del, but has not yet done 'entry-next = LIST_POISON1;'
[interrupt happens]
raise_softirq checks state which is zero. Adds pirq_dpci to the 
dpci_list.
[interrupt is done, back to dpci_softirq]
finishes the entry-next = LIST_POISON1;
.. test STATE_SCHED returns true, so executes the 
hvm_dirq_assist.
ends the loop, exits.

\- dpci_softirq
while (!list_emtpry)
list_del, but -next already has LIST_POISON1 and we blow up.

An alternative solution was proposed (adding STATE_ZOMBIE and making
pt_irq_time_out use the cmpxchg protocol on 'state'), which fixed the above
issue but had an fatal bug. It would miss interrupts that are to be scheduled!

This patch brings back the 'masked' boolean which is used as an
communication channel between 'hvm_do_IRQ_dpci', 'hvm_dirq_assist' and
'pt_irq_guest_eoi'. When we have an interrupt we set 'masked'. Anytime
'hvm_dirq_assist' or 'pt_irq_guest_eoi' executes - it clears it.

The 'state' is left as a seperate mechanism to provide an mechanism between
'raise_sofitrq' and 'softirq_dpci' to communicate the state of the
'struct hvm_dirq_pirq'.

However since we have now two seperate machines we have to deal with an
cancellations and outstanding interrupt being serviced: 'pt_irq_destroy_bind'
is called while an 'hvm_dirq_assist' is just about to service.
The 'pt_irq_destroy_bind' takes the lock first and kills the timer - and
the moment it releases the spinlock, 'hvm_dirq_assist' thunders in and calls
'set_timer' hitting an ASSERT.

By clearing the 'masked' in the 'pt_irq_destroy_bind' we take care of that
scenario by inhibiting 'hvm_dirq_assist' to call the 'set_timer'.

In the 'pt_irq_create_bind' - in the error cases we could be seeing
an softirq scheduled right away and being serviced (though stuck at
the spinlock).  The 'pt_irq_create_bind' fails in 'pt_pirq_softirq_reset'
to change the 'state' (as the state is in 'STATE_RUN', not 'STATE_SCHED').
'pt_irq_create_bind' continues on with setting '-flag=0' and unlocks the lock.

'hvm_dirq_assist' grabs the lock and continues one. Since 'flag = 0' and
'digl_list' is empty, it thunders through the 'hvm_dirq_assist' not doing
anything until it hits 'set_timer' which is undefined for MSI. Adding
in 'masked=0' for the MSI case fixes that.

The legacy interrupt one does not need it as there is no chance of
do_IRQ being called at that point.

Reported-by: Sander Eikelenboom li...@eikelenboom.it
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
v2: Deal with 'masked' in pt_irq_destroy_bind.
v3: Deal with 'masked' in pt_irq_create_bind.
---
 xen/drivers/passthrough/io.c | 12 ++--
 xen/include/xen/hvm/irq.h|  1 +
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/xen

Re: [Xen-devel] [PATCH v9 10/13] xen/arm/arm64: introduce xen_arch_need_swiotlb

2014-11-21 Thread Konrad Rzeszutek Wilk

On Fri, Nov 21, 2014 at 04:31:31PM +, Stefano Stabellini wrote:
 On Wed, 12 Nov 2014, Stefano Stabellini wrote:
  Introduce an arch specific function to find out whether a particular dma
  mapping operation needs to bounce on the swiotlb buffer.
  
  On ARM and ARM64, if the page involved is a foreign page and the device
  is not coherent, we need to bounce because at unmap time we cannot
  execute any required cache maintenance operations (we don't know how to
  find the pfn from the mfn).
  
  No change of behaviour for x86.
  
  Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
  Reviewed-by: David Vrabel david.vra...@citrix.com
  Reviewed-by: Catalin Marinas catalin.mari...@arm.com
  Acked-by: Ian Campbell ian.campb...@citrix.com
  Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 
 I am thinking of asking a backport of this patch to 3.16+
 
 The catch is that is_device_dma_coherent is not available on older
 kernels, so I'll have to change the arm implementation of
 xen_arch_need_swiotlb to:
 
 +bool xen_arch_need_swiotlb(struct device *dev,
 +unsigned long pfn,
 +unsigned long mfn)
 +{
 + return pfn != mfn;
 +}
 +
 
 It is going to make things slower but it is going to fix the issue with
 cache flushing buffers for non-coherent devices.
 
 Konrad, are you OK with that?

Looks fine.
 
 
  Changes in v6:
  - fix ts.
  
  Changes in v5:
  - fix indentation.
  ---
   arch/arm/include/asm/xen/page.h |4 
   arch/arm/xen/mm.c   |7 +++
   arch/x86/include/asm/xen/page.h |7 +++
   drivers/xen/swiotlb-xen.c   |5 -
   4 files changed, 22 insertions(+), 1 deletion(-)
  
  diff --git a/arch/arm/include/asm/xen/page.h 
  b/arch/arm/include/asm/xen/page.h
  index 135c24a..68c739b 100644
  --- a/arch/arm/include/asm/xen/page.h
  +++ b/arch/arm/include/asm/xen/page.h
  @@ -107,4 +107,8 @@ static inline bool set_phys_to_machine(unsigned long 
  pfn, unsigned long mfn)
   #define xen_remap(cookie, size) ioremap_cache((cookie), (size))
   #define xen_unmap(cookie) iounmap((cookie))
   
  +bool xen_arch_need_swiotlb(struct device *dev,
  +  unsigned long pfn,
  +  unsigned long mfn);
  +
   #endif /* _ASM_ARM_XEN_PAGE_H */
  diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
  index ab700e1..28ebf3e 100644
  --- a/arch/arm/xen/mm.c
  +++ b/arch/arm/xen/mm.c
  @@ -100,6 +100,13 @@ void __xen_dma_sync_single_for_device(struct device 
  *hwdev,
  __xen_dma_page_cpu_to_dev(hwdev, handle, size, dir);
   }
   
  +bool xen_arch_need_swiotlb(struct device *dev,
  +  unsigned long pfn,
  +  unsigned long mfn)
  +{
  +   return ((pfn != mfn)  !is_device_dma_coherent(dev));
  +}
  +
   int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
   unsigned int address_bits,
   dma_addr_t *dma_handle)
  diff --git a/arch/x86/include/asm/xen/page.h 
  b/arch/x86/include/asm/xen/page.h
  index c949923..f58ef6c 100644
  --- a/arch/x86/include/asm/xen/page.h
  +++ b/arch/x86/include/asm/xen/page.h
  @@ -236,4 +236,11 @@ void make_lowmem_page_readwrite(void *vaddr);
   #define xen_remap(cookie, size) ioremap((cookie), (size));
   #define xen_unmap(cookie) iounmap((cookie))
   
  +static inline bool xen_arch_need_swiotlb(struct device *dev,
  +unsigned long pfn,
  +unsigned long mfn)
  +{
  +   return false;
  +}
  +
   #endif /* _ASM_X86_XEN_PAGE_H */
  diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
  index ad2c5eb..3725ee4 100644
  --- a/drivers/xen/swiotlb-xen.c
  +++ b/drivers/xen/swiotlb-xen.c
  @@ -399,7 +399,9 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, 
  struct page *page,
   * buffering it.
   */
  if (dma_capable(dev, dev_addr, size) 
  -   !range_straddles_page_boundary(phys, size)  !swiotlb_force) {
  +   !range_straddles_page_boundary(phys, size) 
  +   !xen_arch_need_swiotlb(dev, PFN_DOWN(phys), PFN_DOWN(dev_addr)) 
  
  +   !swiotlb_force) {
  /* we are not interested in the dma_addr returned by
   * xen_dma_map_page, only in the potential cache flushes 
  executed
   * by the function. */
  @@ -557,6 +559,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct 
  scatterlist *sgl,
  dma_addr_t dev_addr = xen_phys_to_bus(paddr);
   
  if (swiotlb_force ||
  +   xen_arch_need_swiotlb(hwdev, PFN_DOWN(paddr), 
  PFN_DOWN(dev_addr)) ||
  !dma_capable(hwdev, dev_addr, sg-length) ||
  range_straddles_page_boundary(paddr, sg-length)) {
  phys_addr_t map = swiotlb_tbl_map_single(hwdev,
  -- 
  1.7.10.4
  

___
Xen-devel mailing list
Xen-devel

Re: [Xen-devel] [PATCH for-4.5 RFC] pygrub: Fix regression from c/s d1b93ea, attempt 2

2014-11-21 Thread Konrad Rzeszutek Wilk

On Fri, Nov 21, 2014 at 01:32:13PM +, Andrew Cooper wrote:
 On 20/11/14 16:45, Boris Ostrovsky wrote:
  On 11/20/2014 11:15 AM, Ian Campbell wrote:
  On Thu, 2014-11-20 at 16:08 +, Andrew Cooper wrote:
  On 20/11/14 16:00, Ian Campbell wrote:
  On Mon, 2014-11-17 at 15:19 +, Andrew Cooper wrote:
  c/s d1b93ea causes substantial functional regressions in pygrub's
  ability to
  parse bootloader configuration files.
 
  c/s d1b93ea itself changed an an interface which previously used
  exclusively
  integers, to using strings in the case of a grub configuration
  with explicit
  default set, along with changing the code calling the interface to
  require a
  string.  The default value for default remained as an integer.
 
  As a result, any Extlinux or Lilo configuration (which drives this
  interface
  exclusively with integers), or Grub configuration which doesn't
  explicitly
  declare a default will die with an AttributeError when attempting
  to call
  self.cf.default.isdigit() where default is an integer.
 
  Sadly, this AttributeError gets swallowed by the blanket ignore in
  the loop
  which searches partitions for valid bootloader configurations,
  causing the
  issue to be reported as Unable to find partition containing kernel
 
  This patch attempts to fix the issue by altering all parts of this
  interface
  to use strings, as opposed to integers.
  Would it be less invasive at this point in the release to have the
  place
  where this stuff is used do isinstance(s, str) and isinstance(s, int)?
  It would be BaseString not str, but I am fairly sure the classes have
  altered through Py2's history.  I would not be any more confident with
  that as a solution as trying to correctly to start with.
  Actually isinstance(s, basestring) is what the webpage I was looking at
  said, but I cut-n-pasted the wrong thing.
 
  But regardless could it not do something like:
  if !isinstance(foo.cf.default, int):
 
  I think it would be the other way around, e.g. (not tested):
 
  diff --git a/tools/pygrub/src/pygrub b/tools/pygrub/src/pygrub
  index aa7e562..7250f45 100644
  --- a/tools/pygrub/src/pygrub
  +++ b/tools/pygrub/src/pygrub
  @@ -457,7 +457,9 @@ class Grub:
   self.cf.parse(buf)
 
   def image_index(self):
  -if self.cf.default.isdigit():
  +if isinstance(self.cf.default, int)
  +sel = self.cf.default
  +elif if self.cf.default.isdigit():
   sel = int(self.cf.default)
   else:
   # We don't fully support submenus. Look for the leaf
  value in
 
  but yes, this looks less intrusive (assuming this the only place where
  we'd hit this error).
 
 
 That does look plausibly like it would fix the issue.
 
 However, I can't help but feeing that this is hacking around a broken
 patch in the first place.

I cannot think of a reason you would ever feel that way!
surreptitiously checks the roll of Xen 4.5 band-aid

We know that the existing patches work fine for a host of Fedora
families (15-21) (thought I need to double check that the testing framework
that we have is using pygrub and not pvgrub), SLESs and RHEL5s, and OL6s.

The ones that I am worried about are the ExtLinux and such which
I didn't realize would use 'pygrub'.

Thought I just remembered a bug with OL7 grub entries - that is if
you go in the interactive menu things broke down. Boris, do you
remember if that was fixed or just 'deferred'?

 
 ~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 0/2] small swiotlb-xen fixes

2014-11-21 Thread Konrad Rzeszutek Wilk

On Fri, Nov 21, 2014 at 05:00:05PM +, Stefano Stabellini wrote:
 Hi all,
 I have a couple of small straightforward fixes for swiotlb-xen for 3.19.
 

They look fine to me.

Thanks.
 Cheers,
 
 Stefano
 
 
 Stefano Stabellini (2):
   swiotlb-xen: call xen_dma_sync_single_for_device when appropriate
   swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single
 
  drivers/xen/swiotlb-xen.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5] libxl: do not load roms for any NICs except the first to avoid wasting memory

2014-11-21 Thread Konrad Rzeszutek Wilk

On Fri, Nov 21, 2014 at 05:11:09PM +, Stefano Stabellini wrote:
 The rom is used for pxebooting. We don't need to allow pxebooting from
 more than one network card.  Loading a romfile for every NIC wastes

Why not? Why can't we PXE boot from each network card?

 memory and as a matter of fact breaks configurations with more than 4
 NICs as QEMU fails to allocate memory on behalf of the guest.

What if you have four different type of NICs? Say 1 rlt8193, 1 e1000, one eepro,
and ne2k ?

Don't you want to load the ROM for each one?
 
 With this fix, it is possible to assign more than 4 rtl8139 NICs to the
 guest.
 
 Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
 
 diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
 index 3e191c3..f907ca9 100644
 --- a/tools/libxl/libxl_dm.c
 +++ b/tools/libxl/libxl_dm.c
 @@ -674,9 +674,10 @@ static char ** 
 libxl__build_device_model_args_new(libxl__gc *gc,
  LIBXL_NIC_TYPE_VIF_IOEMU);
  flexarray_append(dm_args, -device);
  flexarray_append(dm_args,
 -   libxl__sprintf(gc, %s,id=nic%d,netdev=net%d,mac=%s,
 +   libxl__sprintf(gc, %s,id=nic%d,netdev=net%d,mac=%s%s,
  nics[i].model, nics[i].devid,
 -nics[i].devid, smac));
 +nics[i].devid, smac,
 +i ? ,romfile=\\ : ));
  flexarray_append(dm_args, -netdev);
  flexarray_append(dm_args, GCSPRINTF(
type=tap,id=net%d,ifname=%s,
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Xen 4.5-rc2 post update (RC2 was out 2014-Nov-11th)

2014-11-21 Thread Konrad Rzeszutek Wilk

On Fri, Nov 21, 2014 at 12:54:08PM -0500, Don Slutz wrote:
 On 11/21/14 12:42, konrad.w...@oracle.com wrote:
 Feature patchsets that did not make it in by today have been put
 on the deferred list. If you think your feature should make it by Xen 4.5-RC3
 please make your case.
 
 Xen 4.5-rc2 was out on Monday (11th). There are three known issues, if I have
 missed one (or more) please respond.
 
 (see 'Known Issues' below) which are to be fixed by RC3 - if:
   - The maintainers are fine with it,
   - The risk is minimal to common code paths.
 
 Details for the test-day are at
 
 http://wiki.xen.org/wiki/Xen_4.5_RC2_test_instructions
 
 In terms of bugs, we have:
 
 #11 qxl hypervisor support
 #13 Re: [Xen-devel] man page example: xm block-attach
 #18 xl improve support for migration over non-sshlike tunnels
 #19 xl migrate transport improvements
 #22 xl does not support specifying virtual function for passthrough device
 #23 Remove arbitrary LIBXL_MAXMEM_CONSTANT from libxl, see what breaks
 #24 xl missing support for encrypted VNC
 #27 Re: [Xen-devel] xend vs xl with pci=['bdf'] wherein the 'bdf' are not 
 owned by pciback or pcistub will still launch.
 #28 support PCI hole resize in qemu-xen
 
 [Isn't this fixed by Don's 'mmio_hole' patches?]
 
 I consider mmio_hole a work around for this.  The is more that can be done
 here.

Yeah, I had a discussion with George about this and it seems that the proper 
fix to this
is quite involved (perhaps make it a XEn 4.6 feature/fix?). I need somehow to 
update the
bug system to mention how to use 'mmio_hole' for this.

Thank you!

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5 2/2] xl: print out partial configuration in long mode of list command

2014-11-21 Thread Konrad Rzeszutek Wilk

On Fri, Nov 21, 2014 at 01:04:42PM -0500, Zhigang Wang wrote:
 On 11/14/2014 05:22 AM, Ian Campbell wrote:
  On Fri, 2014-11-14 at 10:09 +, Wei Liu wrote:
  On Fri, Nov 14, 2014 at 09:59:00AM +, Ian Campbell wrote:
  On Wed, 2014-11-12 at 17:04 +, Wei Liu wrote:
  Unconditionally print out the partial configuration.
 
  Can you provide an example of what such a configuration looks like?
 
 
  {
  domid: 2,
  config: {
  c_info: {
  name: s0-raw-vnuma,
  uuid: a8bed4ac-a0fe-4166-8eac-feeb007a2110
  },
  b_info: {
  sched_params: {
 
  },
  type.invalid: {
 
  }
  }
  }
  }
  
  Great, thanks. I propose to insert this into the commit message as I
  check it in (once I check who's acked it etc)
  
  Libxl still complains because it tries to read some nodes that don't
  exist, so xl will just print it out on stderr. This is the same
  behaviour as before though.
  
  I think that's tolerable for 4.5 at least.
 
 I also think this is OK for 4.5.
 
 Konrad: can pickup this for next rc?

Please see http://lists.xen.org/archives/html/xen-devel/2014-11/msg01238.html
 
 Thanks,
 
 Zhigang
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5] libxl: do not load roms for any NICs except the first to avoid wasting memory

2014-11-21 Thread Konrad Rzeszutek Wilk

On Fri, Nov 21, 2014 at 06:48:53PM +, Stefano Stabellini wrote:
 On Fri, 21 Nov 2014, Konrad Rzeszutek Wilk wrote:
  On Fri, Nov 21, 2014 at 05:11:09PM +, Stefano Stabellini wrote:
   The rom is used for pxebooting. We don't need to allow pxebooting from
   more than one network card.  Loading a romfile for every NIC wastes
  
  Why not? Why can't we PXE boot from each network card?
 
 I take it back: you are right, at the moment it is actually possible to
 pxe boot from the fourth nic for example. Maybe we could just load the
 first four romfiles and skip the others (four is the limit before QEMU
 fails to allocate any more memory).

The limit is in the count of ROMs or the memory consumption? What if you
also do PCI passthrough? Does that figure in this calculation?
 
 TBH not all the emulated nics need a romfile but I wouldn't want to go
 down at that level of details beause they become QEMU implementation
 details. I wouldn't want to tie libxl to a certain version of QEMU in
 any way.
 
 A better way would be handling memory allocation errors in QEMU but QEMU
 doesn't do that at the moment and the change there would be certainly
 more invasive that a couple of lines in libxl.
 
 
   memory and as a matter of fact breaks configurations with more than 4
   NICs as QEMU fails to allocate memory on behalf of the guest.
  
  What if you have four different type of NICs? Say 1 rlt8193, 1 e1000, one 
  eepro,
  and ne2k ?
  
  Don't you want to load the ROM for each one?
 
 The rom cannot be reused in QEMU among multiple instances of the same
 nic, so having four different types of nics or just one type doesn't
 change anything.
 
 
   With this fix, it is possible to assign more than 4 rtl8139 NICs to the
   guest.
   
   Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
   
   diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
   index 3e191c3..f907ca9 100644
   --- a/tools/libxl/libxl_dm.c
   +++ b/tools/libxl/libxl_dm.c
   @@ -674,9 +674,10 @@ static char ** 
   libxl__build_device_model_args_new(libxl__gc *gc,

   LIBXL_NIC_TYPE_VIF_IOEMU);
flexarray_append(dm_args, -device);
flexarray_append(dm_args,
   -   libxl__sprintf(gc, %s,id=nic%d,netdev=net%d,mac=%s,
   +   libxl__sprintf(gc, 
   %s,id=nic%d,netdev=net%d,mac=%s%s,
nics[i].model, 
   nics[i].devid,
   -nics[i].devid, smac));
   +nics[i].devid, smac,
   +i ? ,romfile=\\ : 
   ));
flexarray_append(dm_args, -netdev);
flexarray_append(dm_args, GCSPRINTF(
  type=tap,id=net%d,ifname=%s,
   
   ___
   Xen-devel mailing list
   Xen-devel@lists.xen.org
   http://lists.xen.org/xen-devel
  

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 00/19] Virtual NUMA for PV and HVM

2014-11-21 Thread Konrad Rzeszutek Wilk

On Fri, Nov 21, 2014 at 03:06:42PM +, Wei Liu wrote:
 Hi all
 
 This patch series implements virtual NUMA support for both PV and HVM guest.
 That is, admin can configure via libxl what virtual NUMA topology the guest
 sees.
 
 This is the stage 1 (basic vNUMA support) and part of stage 2 (vNUMA-ware
 ballooning, hypervisor side) described in my previous email to xen-devel [0].
 
 This series is broken into several parts:
 
 1. xen patches: vNUMA debug output and vNUMA-aware memory hypercall support.
 2. libxc/libxl support for PV vNUMA.
 3. libxc/libxl support for HVM vNUMA.
 4. xl vNUMA configuration documentation and parser.
 
 I think one significant difference from Elena's work is that this patch series
 makes use of multiple vmemranges should there be a memory hole, instead of
 shrinking ram. This matches the behaviour of real hardware.

Are some of the patches then borrowed from Elena? If so, she should be credited
in the patches?
 
 The vNUMA auto placement algorithm is missing at the moment and Dario is
 working on it.
 
 This series can be found at:
  git://xenbits.xen.org/people/liuw/xen.git wip.vnuma-v1 
 
 With this series, the following configuration can be used to enabled virtual
 NUMA support, and it works for both PV and HVM guests.
 
 memory = 6000
 vnuma_memory = [3000, 3000]
 vnuma_vcpu_map = [0, 1]
 vnuma_pnode_map = [0, 0]
 vnuma_vdistances = [10, 30] # optional
 
 dmesg output for HVM guest:
 
 [0.00] ACPI: SRAT fc009ff0 000C8 (v01Xen  HVM 
  HVML )
 [0.00] ACPI: SLIT fc00a0c0 00030 (v01Xen  HVM 
  HVML )
 ...snip...
 [0.00] SRAT: PXM 0 - APIC 0x00 - Node 0
 [0.00] SRAT: PXM 1 - APIC 0x02 - Node 1
 [0.00] SRAT: Node 0 PXM 0 [mem 0x-0xbb7f]
 [0.00] SRAT: Node 1 PXM 1 [mem 0xbb80-0xefff]
 [0.00] SRAT: Node 1 PXM 1 [mem 0x1-0x186ff]
 [0.00] NUMA: Initialized distance table, cnt=2
 [0.00] NUMA: Node 1 [mem 0xbb80-0xefff] + [mem 
 0x1-0x1867f] - [mem 0xbb80-0x1867f]
 [0.00] Initmem setup node 0 [mem 0x-0xbb7f]
 [0.00]   NODE_DATA [mem 0xbb7fc000-0xbb7f]
 [0.00] Initmem setup node 1 [mem 0xbb80-0x1867f]
 [0.00]   NODE_DATA [mem 0x1867f7000-0x1867fafff]
 [0.00]  [ea00-ea00029f] PMD - 
 [8800b860-8800baff] on node 0
 [0.00]  [ea0002a0-ea00055f] PMD - 
 [88018300-8801859f] on node 1
 ...snip...
 [0.00] Early memory node ranges
 [0.00]   node   0: [mem 0x1000-0x0009efff]
 [0.00]   node   0: [mem 0x0010-0xbb7f]
 [0.00]   node   1: [mem 0xbb80-0xefffefff]
 [0.00]   node   1: [mem 0x1-0x1867f]
 
 numactl output for HVM guest:
 
 available: 2 nodes (0-1)
 node 0 cpus: 0
 node 0 size: 2999 MB
 node 0 free: 2546 MB
 node 1 cpus: 1
 node 1 size: 2991 MB
 node 1 free: 2144 MB
 node distances:
 node   0   1 
   0:  10  30 
   1:  30  10 
 
 dmesg output for PV guest:
 
 [0.00] NUMA: Initialized distance table, cnt=2
 [0.00] NUMA: Node 1 [mem 0xbb80-0xce68efff] + [mem 
 0x1-0x1a8970fff] - [mem 0xbb80-0x1a8970fff]
 [0.00] NODE_DATA(0) allocated [mem 0xbb7fc000-0xbb7f]
 [0.00] NODE_DATA(1) allocated [mem 0x1a8969000-0x1a896cfff]
 
 numactl output for PV guest:
 
 available: 2 nodes (0-1)
 node 0 cpus: 0
 node 0 size: 2944 MB
 node 0 free: 2917 MB
 node 1 cpus: 1
 node 1 size: 2934 MB
 node 1 free: 2904 MB
 node distances:
 node   0   1 
   0:  10  30
   1:  30  10
 
 And for a HVM guest on a real NUMA-capable machine:
 
 (XEN) Memory location of each domain:
 (XEN) Domain 0 (total: 262144):
 (XEN) Node 0: 245758
 (XEN) Node 1: 16386
 (XEN) Domain 2 (total: 2097226):
 (XEN) Node 0: 1046335
 (XEN) Node 1: 1050891
 (XEN)  2 vnodes, 4 vcpus
 (XEN)vnode   0 - pnode 0
 (XEN) 3840 MB:  0x0 - 0xf000
 (XEN) 256 MB:  0x1 - 0x11000
 (XEN) vcpus: 0 1 
 (XEN)vnode   1 - pnode 1
 (XEN) 4096 MB:  0x11000 - 0x21000
 (XEN) vcpus: 2 3 
 
 Wei.
 
 [0] 2014173606.gc21...@zion.uk.xensource.com
 
 Wei Liu (19):
   xen: dump vNUMA information with debug key u
   xen: make two memory hypercalls vNUMA-aware
   libxc: allocate memory with vNUMA information for PV guest
   libxl: add emacs local variables in libxl_{x86,arm}.c
   libxl: introduce vNUMA types
   libxl: add vmemrange to libxl__domain_build_state
   libxl: introduce libxl__vnuma_config_check
   libxl: x86: factor out e820_host_sanitize
   libxl: functions to build vmemranges for PV guest
   libxl: build, check and pass vNUMA info to Xen for PV guest
   hvmloader: add new fields for vNUMA information
   hvmloader: construct SRAT
   hvmloader: construct SLIT
   hvmloader: disallow memory relocation when vNUMA is

[Xen-devel] [PATCH v4] Fixes for PCI backend for 3.19

2014-11-21 Thread Konrad Rzeszutek Wilk


Hey,

The last time I posted these patches: https://lkml.org/lkml/2014/7/8/533
was so long ago that I don't even remember most comments. The only
one that stuck in mind was David's recommendation to add a new PCI API
call and use that. See patch #6 and #7.

The original posting (v3) had an extra patch that would do slot and
bus reset using do_flr SysFS attribute. I will revisit that once I am
done with this patchset.

I also seem to have had in my a bunch of 'SoB' from David - which
makes no sense - unless I pulled from his tree. Anyhow wherewere
I saw them I removed them.

Please take a look and comment. If I missed some comment from months
ago hopefully the new version has clarified them.


 drivers/pci/pci.c |  5 +--
 drivers/xen/xen-pciback/passthrough.c | 14 ++--
 drivers/xen/xen-pciback/pci_stub.c| 60 ++-
 drivers/xen/xen-pciback/pciback.h |  7 ++--
 drivers/xen/xen-pciback/vpci.c| 14 ++--
 drivers/xen/xen-pciback/xenbus.c  |  4 +--
 include/linux/device.h|  5 +++
 include/linux/pci.h   |  2 ++
 8 files changed, 76 insertions(+), 35 deletions(-)

Konrad Rzeszutek Wilk (7):
  xen/pciback: Don't deadlock when unbinding.
  driver core: Provide an wrapper around the mutex to do lockdep warnings
  xen/pciback: Include the domain id if removing the device whilst still in 
use
  xen/pciback: Print out the domain owning the device.
  xen/pciback: Remove tons of dereferences
  PCI: Expose pci_load_saved_state for public consumption.
  xen/pciback: Restore configuration space when detaching from a guest.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH v4 6/7] PCI: Expose pci_load_saved_state for public consumption.

2014-11-21 Thread Konrad Rzeszutek Wilk

We have the pci_load_and_free_saved_state, and pci_store_saved_state
but are missing the functionality to just load the state
multiple times in the PCI device without having to free/save
the state.

This patch makes it possible to use this function.

CC: Bjorn Helgaas bhelg...@google.com
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 drivers/pci/pci.c   | 5 +++--
 include/linux/pci.h | 2 ++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 625a4ac..f00a9d6 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1142,8 +1142,8 @@ EXPORT_SYMBOL_GPL(pci_store_saved_state);
  * @dev: PCI device that we're dealing with
  * @state: Saved state returned from pci_store_saved_state()
  */
-static int pci_load_saved_state(struct pci_dev *dev,
-   struct pci_saved_state *state)
+int pci_load_saved_state(struct pci_dev *dev,
+struct pci_saved_state *state)
 {
struct pci_cap_saved_data *cap;
 
@@ -1171,6 +1171,7 @@ static int pci_load_saved_state(struct pci_dev *dev,
dev-state_saved = true;
return 0;
 }
+EXPORT_SYMBOL_GPL(pci_load_saved_state);
 
 /**
  * pci_load_and_free_saved_state - Reload the save state pointed to by state,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5be8db4..08088cb1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1003,6 +1003,8 @@ void __iomem __must_check *pci_platform_rom(struct 
pci_dev *pdev, size_t *size);
 int pci_save_state(struct pci_dev *dev);
 void pci_restore_state(struct pci_dev *dev);
 struct pci_saved_state *pci_store_saved_state(struct pci_dev *dev);
+int pci_load_saved_state(struct pci_dev *dev,
+struct pci_saved_state *state);
 int pci_load_and_free_saved_state(struct pci_dev *dev,
  struct pci_saved_state **state);
 struct pci_cap_saved_state *pci_find_saved_cap(struct pci_dev *dev, char cap);
-- 
1.9.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH v4 2/7] driver core: Provide an wrapper around the mutex to do lockdep warnings

2014-11-21 Thread Konrad Rzeszutek Wilk

Instead of open-coding it in drivers that want to double check
that their functions are indeed holding the device lock.

Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Suggested-by: David Vrabel david.vra...@citrix.com
Acked-by: Greg Kroah-Hartman gre...@linuxfoundation.org
---
 drivers/xen/xen-pciback/pci_stub.c | 2 +-
 include/linux/device.h | 5 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/xen-pciback/pci_stub.c 
b/drivers/xen/xen-pciback/pci_stub.c
index 9cbe1a3..8b77089 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -278,7 +278,7 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
/* Cleanup our device
 * (so it's ready for the next domain)
 */
-   lockdep_assert_held(dev-dev.mutex);
+   device_lock_assert(dev-dev);
__pci_reset_function_locked(dev);
pci_restore_state(dev);
 
diff --git a/include/linux/device.h b/include/linux/device.h
index ce1f2160..41d6a75 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -911,6 +911,11 @@ static inline void device_unlock(struct device *dev)
mutex_unlock(dev-mutex);
 }
 
+static inline void device_lock_assert(struct device *dev)
+{
+   lockdep_assert_held(dev-mutex);
+}
+
 void driver_init(void);
 
 /*
-- 
1.9.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH v4 5/7] xen/pciback: Remove tons of dereferences

2014-11-21 Thread Konrad Rzeszutek Wilk

A little cleanup. No functional difference.

Reviewed-by: Boris Ostrovsky boris.ostrov...@oracle.com
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 drivers/xen/xen-pciback/pci_stub.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/xen/xen-pciback/pci_stub.c 
b/drivers/xen/xen-pciback/pci_stub.c
index e5ff09d..843a2ba 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -631,10 +631,12 @@ static pci_ers_result_t common_process(struct 
pcistub_device *psdev,
 {
pci_ers_result_t res = result;
struct xen_pcie_aer_op *aer_op;
+   struct xen_pcibk_device *pdev = psdev-pdev;
+   struct xen_pci_sharedinfo *sh_info = pdev-sh_info;
int ret;
 
/*with PV AER drivers*/
-   aer_op = (psdev-pdev-sh_info-aer_op);
+   aer_op = (sh_info-aer_op);
aer_op-cmd = aer_cmd ;
/*useful for error_detected callback*/
aer_op-err = state;
@@ -655,36 +657,36 @@ static pci_ers_result_t common_process(struct 
pcistub_device *psdev,
* this flag to judge whether we need to check pci-front give aer
* service ack signal
*/
-   set_bit(_PCIB_op_pending, (unsigned long *)psdev-pdev-flags);
+   set_bit(_PCIB_op_pending, (unsigned long *)pdev-flags);
 
/*It is possible that a pcifront conf_read_write ops request invokes
* the callback which cause the spurious execution of wake_up.
* Yet it is harmless and better than a spinlock here
*/
set_bit(_XEN_PCIB_active,
-   (unsigned long *)psdev-pdev-sh_info-flags);
+   (unsigned long *)sh_info-flags);
wmb();
-   notify_remote_via_irq(psdev-pdev-evtchn_irq);
+   notify_remote_via_irq(pdev-evtchn_irq);
 
ret = wait_event_timeout(xen_pcibk_aer_wait_queue,
 !(test_bit(_XEN_PCIB_active, (unsigned long *)
-psdev-pdev-sh_info-flags)), 300*HZ);
+sh_info-flags)), 300*HZ);
 
if (!ret) {
if (test_bit(_XEN_PCIB_active,
-   (unsigned long *)psdev-pdev-sh_info-flags)) {
+   (unsigned long *)sh_info-flags)) {
dev_err(psdev-dev-dev,
pcifront aer process not responding!\n);
clear_bit(_XEN_PCIB_active,
- (unsigned long *)psdev-pdev-sh_info-flags);
+ (unsigned long *)sh_info-flags);
aer_op-err = PCI_ERS_RESULT_NONE;
return res;
}
}
-   clear_bit(_PCIB_op_pending, (unsigned long *)psdev-pdev-flags);
+   clear_bit(_PCIB_op_pending, (unsigned long *)pdev-flags);
 
if (test_bit(_XEN_PCIF_active,
-   (unsigned long *)psdev-pdev-sh_info-flags)) {
+   (unsigned long *)sh_info-flags)) {
dev_dbg(psdev-dev-dev,
schedule pci_conf service in  DRV_NAME \n);
xen_pcibk_test_and_schedule_op(psdev-pdev);
-- 
1.9.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH v4 1/7] xen/pciback: Don't deadlock when unbinding.

2014-11-21 Thread Konrad Rzeszutek Wilk

As commit 0a9fd0152929db372ff61b0d6c280fdd34ae8bdb
'xen/pciback: Document the entry points for 'pcistub_put_pci_dev''
explained there are four entry points in this function.
Two of them are when the user fiddles in the SysFS to
unbind a device which might be in use by a guest or not.

Both 'unbind' states will cause a deadlock as the the PCI lock has
already been taken, which then pci_device_reset tries to take.

We can simplify this by requiring that all callers of
pcistub_put_pci_dev MUST hold the device lock. And then
we can just call the lockless version of pci_device_reset.

To make it even simpler we will modify xen_pcibk_release_pci_dev
to quality whether it should take a lock or not - as it ends
up calling xen_pcibk_release_pci_dev and needs to hold the lock.

Reviewed-by: Boris Ostrovsky boris.ostrov...@oracle.com
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 drivers/xen/xen-pciback/passthrough.c | 14 +++---
 drivers/xen/xen-pciback/pci_stub.c| 12 ++--
 drivers/xen/xen-pciback/pciback.h |  7 ---
 drivers/xen/xen-pciback/vpci.c| 14 +++---
 drivers/xen/xen-pciback/xenbus.c  |  2 +-
 5 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/drivers/xen/xen-pciback/passthrough.c 
b/drivers/xen/xen-pciback/passthrough.c
index 828dddc..f16a30e 100644
--- a/drivers/xen/xen-pciback/passthrough.c
+++ b/drivers/xen/xen-pciback/passthrough.c
@@ -69,7 +69,7 @@ static int __xen_pcibk_add_pci_dev(struct xen_pcibk_device 
*pdev,
 }
 
 static void __xen_pcibk_release_pci_dev(struct xen_pcibk_device *pdev,
-   struct pci_dev *dev)
+   struct pci_dev *dev, bool lock)
 {
struct passthrough_dev_data *dev_data = pdev-pci_dev_data;
struct pci_dev_entry *dev_entry, *t;
@@ -87,8 +87,13 @@ static void __xen_pcibk_release_pci_dev(struct 
xen_pcibk_device *pdev,
 
mutex_unlock(dev_data-lock);
 
-   if (found_dev)
+   if (found_dev) {
+   if (lock)
+   device_lock(found_dev-dev);
pcistub_put_pci_dev(found_dev);
+   if (lock)
+   device_unlock(found_dev-dev);
+   }
 }
 
 static int __xen_pcibk_init_devices(struct xen_pcibk_device *pdev)
@@ -156,8 +161,11 @@ static void __xen_pcibk_release_devices(struct 
xen_pcibk_device *pdev)
struct pci_dev_entry *dev_entry, *t;
 
list_for_each_entry_safe(dev_entry, t, dev_data-dev_list, list) {
+   struct pci_dev *dev = dev_entry-dev;
list_del(dev_entry-list);
-   pcistub_put_pci_dev(dev_entry-dev);
+   device_lock(dev-dev);
+   pcistub_put_pci_dev(dev);
+   device_unlock(dev-dev);
kfree(dev_entry);
}
 
diff --git a/drivers/xen/xen-pciback/pci_stub.c 
b/drivers/xen/xen-pciback/pci_stub.c
index 017069a..9cbe1a3 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -250,6 +250,8 @@ struct pci_dev *pcistub_get_pci_dev(struct xen_pcibk_device 
*pdev,
  *  - 'echo BDF  unbind' with a guest still using it. See pcistub_remove
  *
  *  As such we have to be careful.
+ *
+ *  To make this easier, the caller has to hold the device lock.
  */
 void pcistub_put_pci_dev(struct pci_dev *dev)
 {
@@ -276,11 +278,8 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
/* Cleanup our device
 * (so it's ready for the next domain)
 */
-
-   /* This is OK - we are running from workqueue context
-* and want to inhibit the user from fiddling with 'reset'
-*/
-   pci_reset_function(dev);
+   lockdep_assert_held(dev-dev.mutex);
+   __pci_reset_function_locked(dev);
pci_restore_state(dev);
 
/* This disables the device. */
@@ -567,7 +566,8 @@ static void pcistub_remove(struct pci_dev *dev)
/* N.B. This ends up calling pcistub_put_pci_dev which 
ends up
 * doing the FLR. */
xen_pcibk_release_pci_dev(found_psdev-pdev,
-   found_psdev-dev);
+   found_psdev-dev,
+   false /* caller holds the lock. 
*/);
}
 
spin_lock_irqsave(pcistub_devices_lock, flags);
diff --git a/drivers/xen/xen-pciback/pciback.h 
b/drivers/xen/xen-pciback/pciback.h
index f72af87..58e38d5 100644
--- a/drivers/xen/xen-pciback/pciback.h
+++ b/drivers/xen/xen-pciback/pciback.h
@@ -99,7 +99,8 @@ struct xen_pcibk_backend {
unsigned int *domain, unsigned int *bus,
unsigned int *devfn);
int (*publish)(struct xen_pcibk_device *pdev, publish_pci_root_cb cb);
-   void (*release)(struct xen_pcibk_device *pdev, struct pci_dev *dev);
+   void (*release)(struct xen_pcibk_device *pdev, struct pci_dev

[Xen-devel] [PATCH v4 7/7] xen/pciback: Restore configuration space when detaching from a guest.

2014-11-21 Thread Konrad Rzeszutek Wilk

The commit xen/pciback: Don't deadlock when unbinding. was using
the version of pci_reset_function which would lock the device lock.
That is no good as we can dead-lock. As such we swapped to using
the lock-less version and requiring that the callers
of 'pcistub_put_pci_dev' take the device lock. And as such
this bug got exposed.

Using the lock-less version is  OK, except that we tried to
use 'pci_restore_state' after the lock-less version of
__pci_reset_function_locked - which won't work as 'state_saved'
is set to false. Said 'state_saved' is a toggle boolean that
is to be used by the sequence of a) pci_save_state/pci_restore_state
or b) pci_load_and_free_saved_state/pci_restore_state. We don't
want to use a) as the guest might have messed up the PCI
configuration space and we want it to revert to the state
when the PCI device was binded to us. Therefore we pick
b) to restore the configuration space.

We restore from our 'golden' version of PCI configuration space, when an:
 - Device is unbinded from pciback
 - Device is detached from a guest.

Reported-by:  Sander Eikelenboom li...@eikelenboom.it
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 drivers/xen/xen-pciback/pci_stub.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/xen-pciback/pci_stub.c 
b/drivers/xen/xen-pciback/pci_stub.c
index 843a2ba..eb8b58e 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -105,7 +105,7 @@ static void pcistub_device_release(struct kref *kref)
 */
__pci_reset_function_locked(dev);
if (pci_load_and_free_saved_state(dev, dev_data-pci_saved_state))
-   dev_dbg(dev-dev, Could not reload PCI state\n);
+   dev_info(dev-dev, Could not reload PCI state\n);
else
pci_restore_state(dev);
 
@@ -257,6 +257,8 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
 {
struct pcistub_device *psdev, *found_psdev = NULL;
unsigned long flags;
+   struct xen_pcibk_dev_data *dev_data;
+   int ret;
 
spin_lock_irqsave(pcistub_devices_lock, flags);
 
@@ -279,9 +281,19 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
 * (so it's ready for the next domain)
 */
device_lock_assert(dev-dev);
-   __pci_reset_function_locked(dev);
-   pci_restore_state(dev);
-
+   dev_data = pci_get_drvdata(dev);
+   ret = pci_load_saved_state(dev, dev_data-pci_saved_state);
+   if (ret  0)
+   dev_warn(dev-dev, Could not reload PCI state\n);
+   else {
+   __pci_reset_function_locked(dev);
+   /*
+* The usual sequence is pci_save_state  pci_restore_state
+* but the guest might have messed the configuration space up.
+* Use the initial version (when device was bound to us).
+*/
+   pci_restore_state(dev);
+   }
/* This disables the device. */
xen_pcibk_reset_device(dev);
 
-- 
1.9.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] frequent lockups in 3.18rc4

2014-11-21 Thread Konrad Rzeszutek Wilk

On Fri, Nov 21, 2014 at 08:51:43PM +0100, Thomas Gleixner wrote:
 On Fri, 21 Nov 2014, Linus Torvalds wrote:
  Here's the simplified end result. Again, this is TOTALLY UNTESTED. I
  compiled it and verified that the code generation looks like what I'd
  have expected, but that's literally it.
  
static noinline int vmalloc_fault(unsigned long address)
{
  pgd_t *pgd_dst;
  pgdval_t pgd_entry;
  unsigned index = pgd_index(address);
  
  if (index  KERNEL_PGD_BOUNDARY)
  return -1;
  
  pgd_entry = init_mm.pgd[index].pgd;
  if (!pgd_entry)
  return -1;
  
  pgd_dst = __va(PAGE_MASK  read_cr3());
  pgd_dst += index;
  
  if (pgd_dst-pgd)
  return -1;
  
  ACCESS_ONCE(pgd_dst-pgd) = pgd_entry;
 
 This will break paravirt. set_pgd/set_pmd are paravirt functions.
 
 But I'm fine with breaking it, then you just need to change
 CONFIG_PARAVIRT to 'def_bool n'

That is not very nice.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] Is: Fixes for xl pci-attach for Xen 4.5 confirmed to fix by Intel.Was:Re: [TestDay] VMX test report for Xen 4.5.0-rc1

2014-11-24 Thread Konrad Rzeszutek Wilk

On Mon, Nov 24, 2014 at 07:43:49AM +, Hu, Robert wrote:
  -Original Message-
  From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
  Sent: Wednesday, November 12, 2014 8:39 PM
  To: Hu, Robert; xen-devel@lists.xen.org
  Cc: jbeul...@suse.com
  Subject: Re: [Xen-devel] [TestDay] VMX test report for Xen 4.5.0-rc1

  On 11/12/2014 01:58 AM, Hu, Robert wrote:
   2. Failed to hotplug a VT-d device with XEN4.5-RC1
  http://bugzilla-archived.xenproject.org/bugzilla/show_bug.cgi?id=1894

  This should be addressed by these two:
  http://lists.xenproject.org/archives/html/xen-devel/2014-11/msg00875.html
  http://lists.xenproject.org/archives/html/xen-devel/2014-11/msg00874.html
 Tried, these patches works. When will they get in?

CC-ed Ian's so they are aware of the independent testing done by Intel.

Thank you!

  -boris

 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Configure block device IO rate

2014-11-24 Thread Konrad Rzeszutek Wilk

On Mon, Nov 24, 2014 at 12:07:39PM +, George Dunlap wrote:
 On Mon, Nov 24, 2014 at 3:54 AM, aragrawal aragra...@cs.wisc.edu wrote:
  Hi
 
  I am interested in getting the block I/O rate fixed for a Virtual Machine.
  However, it seems that there are no such options to control the block I/O
  rate via the VM configuration file.
  Going through the net, I see suggestions to configure the xen_blkback driver
  using ionice to get that done.
 
  I would like to know if there are any plans to include any such
  configuration option.
  Also, are there any particular reason (apart from the fact that there are
  other ways to get this done) for the absence of such configuration option?
 
 I'm pretty sure it's just because nobody had asked for it yet.
 
 We can put it on our list of things to think about doing / projects
 for people new to the community.
 
 We also accept patches. :-)

Patches for this were in the past posted - it just that nobody explained
to the maintainer (me) why the dm-* or io-nice would not do the same job.

I sadly don't remember from whom they were or such.
 
  -George
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Configure block device IO rate

2014-11-24 Thread Konrad Rzeszutek Wilk

On Mon, Nov 24, 2014 at 03:11:20PM +, Ian Campbell wrote:
 On Mon, 2014-11-24 at 10:06 -0500, Konrad Rzeszutek Wilk wrote:
  On Mon, Nov 24, 2014 at 12:07:39PM +, George Dunlap wrote:
   On Mon, Nov 24, 2014 at 3:54 AM, aragrawal aragra...@cs.wisc.edu wrote:
Hi
   
I am interested in getting the block I/O rate fixed for a Virtual 
Machine.
However, it seems that there are no such options to control the block 
I/O
rate via the VM configuration file.
Going through the net, I see suggestions to configure the xen_blkback 
driver
using ionice to get that done.
   
I would like to know if there are any plans to include any such
configuration option.
Also, are there any particular reason (apart from the fact that there 
are
other ways to get this done) for the absence of such configuration 
option?
   
   I'm pretty sure it's just because nobody had asked for it yet.
   
   We can put it on our list of things to think about doing / projects
   for people new to the community.
   
   We also accept patches. :-)
  
  Patches for this were in the past posted - it just that nobody explained
  to the maintainer (me) why the dm-* or io-nice would not do the same job.
 
 AIUI the question here is why the toolstack doesn't integrate a guest
 cfg file setting to configure dm-* and/or io-nice, rather than why
 blkback doesn't duplicate their functionality.

Ah, that is a different question. That certainly could be done by somebody
who has the time.
 
 Ian.
 
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] x86/cpuidle: don't count C1 multiple times

2014-11-24 Thread Konrad Rzeszutek Wilk

On Mon, Nov 24, 2014 at 11:47:28AM +, Jan Beulich wrote:
 Commit 4ca6f9f0 (x86/cpuidle: publish new states only after fully
 initializing them) resulted in the state counter to be incremented
 for C1 despite that using a fixed table entry (and the statically
 initialized counter value already accounting for it and C0).
 

Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Reported-by: Steve Freitas sfl...@ihonk.com
Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
(thought it would be good to get from Steve an confirmation that this
fixes it - which I believe is 99% the case).

 Signed-off-by: Jan Beulich jbeul...@suse.com
 
 --- a/xen/arch/x86/acpi/cpu_idle.c
 +++ b/xen/arch/x86/acpi/cpu_idle.c
 @@ -1015,7 +1015,7 @@ static void set_cx(
  cx-target_residency = cx-latency * latency_factor;
  
  smp_wmb();
 -acpi_power-count++;
 +acpi_power-count += (cx-type != ACPI_STATE_C1);
  if ( cx-type == ACPI_STATE_C1 || cx-type == ACPI_STATE_C2 )
  acpi_power-safe_state = cx;
  }
 
 
 

 x86/cpuidle: don't count C1 multiple times
 
 Commit 4ca6f9f0 (x86/cpuidle: publish new states only after fully
 initializing them) resulted in the state counter to be incremented
 for C1 despite that using a fixed table entry (and the statically
 initialized counter value already accounting for it and C0).
 
 Signed-off-by: Jan Beulich jbeul...@suse.com
 
 --- a/xen/arch/x86/acpi/cpu_idle.c
 +++ b/xen/arch/x86/acpi/cpu_idle.c
 @@ -1015,7 +1015,7 @@ static void set_cx(
  cx-target_residency = cx-latency * latency_factor;
  
  smp_wmb();
 -acpi_power-count++;
 +acpi_power-count += (cx-type != ACPI_STATE_C1);
  if ( cx-type == ACPI_STATE_C1 || cx-type == ACPI_STATE_C2 )
  acpi_power-safe_state = cx;
  }


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] virtio leaks cpu mappings, was: qemu crash with virtio on Xen domUs (backtrace included)

2014-11-24 Thread Konrad Rzeszutek Wilk

On Mon, Nov 24, 2014 at 06:44:45PM +, Stefano Stabellini wrote:
 On Mon, 24 Nov 2014, Stefano Stabellini wrote:
  On Mon, 24 Nov 2014, Stefano Stabellini wrote:
   CC'ing Paolo.
   
   
   Wen,
   thanks for the logs.
   
   I investigated a little bit and it seems to me that the bug occurs when
   QEMU tries to unmap only a portion of a memory region previously mapped.
   That doesn't work with xen-mapcache.
   
   See these logs for example:
   
   DEBUG address_space_map phys_addr=78ed8b44 vaddr=7fab50afbb68 len=0xa
   DEBUG address_space_unmap vaddr=7fab50afbb68 len=0x6
  
  Sorry the logs don't quite match, it was supposed to be:
  
  DEBUG address_space_map phys_addr=78ed8b44 vaddr=7fab50afbb64 len=0xa
  DEBUG address_space_unmap vaddr=7fab50afbb68 len=0x6
 
 It looks like the problem is caused by iov_discard_front, called by
 virtio_net_handle_ctrl. By changing iov_base after the sg has already
 been mapped (cpu_physical_memory_map), it causes a leak in the mapping
 because the corresponding cpu_physical_memory_unmap will only unmap a
 portion of the original sg.  On Xen the problem is worse because
 xen-mapcache aborts.

Didn't um Andy post patches for ths:

http://lists.xen.org/archives/html/xen-devel/2014-09/msg02864.html

 
 diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
 index 2ac6ce5..b2b5c2d 100644
 --- a/hw/net/virtio-net.c
 +++ b/hw/net/virtio-net.c
 @@ -775,7 +775,7 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, 
 VirtQueue *vq)
  struct iovec *iov;
  unsigned int iov_cnt;
  
 -while (virtqueue_pop(vq, elem)) {
 +while (virtqueue_pop_nomap(vq, elem)) {
  if (iov_size(elem.in_sg, elem.in_num)  sizeof(status) ||
  iov_size(elem.out_sg, elem.out_num)  sizeof(ctrl)) {
  error_report(virtio-net ctrl missing headers);
 @@ -784,8 +784,12 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, 
 VirtQueue *vq)
  
  iov = elem.out_sg;
  iov_cnt = elem.out_num;
 -s = iov_to_buf(iov, iov_cnt, 0, ctrl, sizeof(ctrl));
  iov_discard_front(iov, iov_cnt, sizeof(ctrl));
 +
 +virtqueue_map_sg(elem.in_sg, elem.in_addr, elem.in_num, 1);
 +virtqueue_map_sg(elem.out_sg, elem.out_addr, elem.out_num, 0);
 +
 +s = iov_to_buf(iov, iov_cnt, 0, ctrl, sizeof(ctrl));
  if (s != sizeof(ctrl)) {
  status = VIRTIO_NET_ERR;
  } else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
 diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
 index 3e4b70c..6a4bd3a 100644
 --- a/hw/virtio/virtio.c
 +++ b/hw/virtio/virtio.c
 @@ -446,7 +446,7 @@ void virtqueue_map_sg(struct iovec *sg, hwaddr *addr,
  }
  }
  
 -int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
 +int virtqueue_pop_nomap(VirtQueue *vq, VirtQueueElement *elem)
  {
  unsigned int i, head, max;
  hwaddr desc_pa = vq-vring.desc;
 @@ -505,9 +505,6 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
  }
  } while ((i = virtqueue_next_desc(desc_pa, i, max)) != max);
  
 -/* Now map what we have collected */
 -virtqueue_map_sg(elem-in_sg, elem-in_addr, elem-in_num, 1);
 -virtqueue_map_sg(elem-out_sg, elem-out_addr, elem-out_num, 0);
  
  elem-index = head;
  
 @@ -517,6 +514,16 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
  return elem-in_num + elem-out_num;
  }
  
 +int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
 +{
 +int rc = virtqueue_pop_nomap(vq, elem);
 +if (rc  0) {
 +virtqueue_map_sg(elem-in_sg, elem-in_addr, elem-in_num, 1);
 +virtqueue_map_sg(elem-out_sg, elem-out_addr, elem-out_num, 0);
 +}
 +return rc;
 +}
 +
  /* virtio device */
  static void virtio_notify_vector(VirtIODevice *vdev, uint16_t vector)
  {
 diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
 index 3e54e90..40a3977 100644
 --- a/include/hw/virtio/virtio.h
 +++ b/include/hw/virtio/virtio.h
 @@ -174,6 +174,7 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement 
 *elem,
  void virtqueue_map_sg(struct iovec *sg, hwaddr *addr,
  size_t num_sg, int is_write);
  int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem);
 +int virtqueue_pop_nomap(VirtQueue *vq, VirtQueueElement *elem);
  int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
unsigned int out_bytes);
  void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] vNUMA: rename interface structures

2014-11-25 Thread Konrad Rzeszutek Wilk

On Tue, Nov 25, 2014 at 02:11:39PM +, Jan Beulich wrote:
  On 25.11.14 at 13:36, jbeul...@suse.com wrote:
  No-one (including me) paid attention during review that these
  structures don't adhere to the naming requirements of the public
  interface: Consistently use xen_ prefixes at least for all new
  additions.
  
  Signed-off-by: Jan Beulich jbeul...@suse.com
 
 Sorry, again forgot to Cc you (for 4.5): No functional change, but
 avoiding a later (incompatible) interface change.

Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com


 
 Jan
 
  --- a/tools/libxc/include/xenctrl.h
  +++ b/tools/libxc/include/xenctrl.h
  @@ -1264,7 +1264,7 @@ int xc_domain_setvnuma(xc_interface *xch
   uint32_t nr_vnodes,
   uint32_t nr_regions,
   uint32_t nr_vcpus,
  -vmemrange_t *vmemrange,
  +xen_vmemrange_t *vmemrange,
   unsigned int *vdistance,
   unsigned int *vcpu_to_vnode,
   unsigned int *vnode_to_pnode);
  --- a/tools/libxc/xc_domain.c
  +++ b/tools/libxc/xc_domain.c
  @@ -2171,7 +2171,7 @@ int xc_domain_setvnuma(xc_interface *xch
  uint32_t nr_vnodes,
  uint32_t nr_vmemranges,
  uint32_t nr_vcpus,
  -   vmemrange_t *vmemrange,
  +   xen_vmemrange_t *vmemrange,
  unsigned int *vdistance,
  unsigned int *vcpu_to_vnode,
  unsigned int *vnode_to_pnode)
  --- a/xen/common/domctl.c
  +++ b/xen/common/domctl.c
  @@ -345,7 +345,7 @@ static struct vnuma_info *vnuma_alloc(un
   vnuma-vdistance = xmalloc_array(unsigned int, nr_vnodes * nr_vnodes);
   vnuma-vcpu_to_vnode = xmalloc_array(unsigned int, nr_vcpus);
   vnuma-vnode_to_pnode = xmalloc_array(unsigned int, nr_vnodes);
  -vnuma-vmemrange = xmalloc_array(vmemrange_t, nr_ranges);
  +vnuma-vmemrange = xmalloc_array(xen_vmemrange_t, nr_ranges);
   
   if ( vnuma-vdistance == NULL || vnuma-vmemrange == NULL ||
vnuma-vcpu_to_vnode == NULL || vnuma-vnode_to_pnode == NULL )
  --- a/xen/common/memory.c
  +++ b/xen/common/memory.c
  @@ -972,7 +972,7 @@ long do_memory_op(unsigned long cmd, XEN
   
   case XENMEM_get_vnumainfo:
   {
  -struct vnuma_topology_info topology;
  +struct xen_vnuma_topology_info topology;
   struct domain *d;
   unsigned int dom_vnodes, dom_vranges, dom_vcpus;
   struct vnuma_info tmp;
  @@ -1033,7 +1033,7 @@ long do_memory_op(unsigned long cmd, XEN
   read_unlock(d-vnuma_rwlock);
   
   tmp.vdistance = xmalloc_array(unsigned int, dom_vnodes * 
  dom_vnodes);
  -tmp.vmemrange = xmalloc_array(vmemrange_t, dom_vranges);
  +tmp.vmemrange = xmalloc_array(xen_vmemrange_t, dom_vranges);
   tmp.vcpu_to_vnode = xmalloc_array(unsigned int, dom_vcpus);
   
   if ( tmp.vdistance == NULL ||
  --- a/xen/include/public/domctl.h
  +++ b/xen/include/public/domctl.h
  @@ -980,7 +980,7 @@ struct xen_domctl_vnuma {
   /*
* memory rages for each vNUMA node
*/
  -XEN_GUEST_HANDLE_64(vmemrange_t) vmemrange;
  +XEN_GUEST_HANDLE_64(xen_vmemrange_t) vmemrange;
   };
   typedef struct xen_domctl_vnuma xen_domctl_vnuma_t;
   DEFINE_XEN_GUEST_HANDLE(xen_domctl_vnuma_t);
  --- a/xen/include/public/memory.h
  +++ b/xen/include/public/memory.h
  @@ -530,14 +530,13 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_
   #define XENMEM_get_vnumainfo26
   
   /* vNUMA node memory ranges */
  -struct vmemrange {
  +struct xen_vmemrange {
   uint64_t start, end;
   unsigned int flags;
   unsigned int nid;
   };
  -
  -typedef struct vmemrange vmemrange_t;
  -DEFINE_XEN_GUEST_HANDLE(vmemrange_t);
  +typedef struct xen_vmemrange xen_vmemrange_t;
  +DEFINE_XEN_GUEST_HANDLE(xen_vmemrange_t);
   
   /*
* vNUMA topology specifies vNUMA node number, distance table,
  @@ -548,7 +547,7 @@ DEFINE_XEN_GUEST_HANDLE(vmemrange_t);
* copied back to guest. Domain returns expected values of nr_vnodes,
* nr_vmemranges and nr_vcpus to guest if the values where incorrect.
*/
  -struct vnuma_topology_info {
  +struct xen_vnuma_topology_info {
   /* IN */
   domid_t domid;
   uint16_t pad;
  @@ -566,12 +565,12 @@ struct vnuma_topology_info {
   uint64_t pad;
   } vcpu_to_vnode;
   union {
  -XEN_GUEST_HANDLE(vmemrange_t) h;
  +XEN_GUEST_HANDLE(xen_vmemrange_t) h;
   uint64_t pad;
   } vmemrange;
   };
  -typedef struct vnuma_topology_info vnuma_topology_info_t;
  -DEFINE_XEN_GUEST_HANDLE(vnuma_topology_info_t);
  +typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
  +DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
   
   /* Next available subop number is 27

Re: [Xen-devel] [PATCH 0/2 V3] fix rename: xenstore not fully updated

2014-11-25 Thread Konrad Rzeszutek Wilk

On Tue, Nov 25, 2014 at 02:13:01PM +, Ian Campbell wrote:
 On Fri, 2014-11-21 at 12:01 -0500, Konrad Rzeszutek Wilk wrote:
  On Thu, Nov 20, 2014 at 03:28:30PM +, Ian Campbell wrote:
   On Wed, 2014-11-19 at 16:25 -0500, Konrad Rzeszutek Wilk wrote:
On Wed, Nov 19, 2014 at 11:26:32AM +, Ian Jackson wrote:
 Hi Konrad, I have another release ack request:
 
 Chunyan Liu writes ([PATCH 0/2 V3] fix rename: xenstore not fully 
 updated):
  Currently libxl__domain_rename only update 
  /local/domain/domid/name,
  still some places in xenstore are not updated, including:
  /vm/uuid/name and 
  /local/domain/0/backend/device/domid/.../domain.
  This patch series updates /vm/uuid/name in xenstore,
 
 This ([PATCH 2/2 V3] fix rename: xenstore not fully updated) is a
 bugfix which I think should go into Xen 4.5.
 
 The risk WITHOUT this patch is that there are out-of-tree tools which
 look here for the domain name and will get confused after it is
 renamed.

When was this introduced? Did it exist with Xend?
   
   Based on:
git grep domain\ RELEASE-4.4.0  tools/python/
git grep domain\' RELEASE-4.4.0 tools/python/
   it doesn't appear so, but someone with a xend install would be needed to
   confirm for sure.
   
   Given that this has always been wrong for a libxl domain after migration
   it seems likely to me that noone is looking at this field.
  
  And this is not a regression though.
  
  What I am understanding is that we advertise to out-side toolstack
  users for a couple of releases something which is misleading and wrong.
 
 ...and also basically useless, there is really no reason to be looking
 for the domain's name under a subset of the backend nodes (only vkb, vfb
 and console have this key at all). None of the helper function which we
 provide do this.
 
 Also these nodes are not advertised as supported either via
 docs/misc/xenstore-paths.markdown or xen/include/public/io/*.h.
 
  My fear is that there such toolstack users out there who will
  get their pitchforks out when this shows up.
  
  But perhaps there is a way to mitigate this. Is there another way
  (and can it be in the commit description) to get the proper
  domain name? I presume it is just looking at /local/domain/%d/name ?
  
  As such could this be added:
  
  The proper way to get the domain name is to get it from
  /local/domain/%d/name instead from this field.
 
 The proper way is to use libxl_domid_to_name, not to go scrobbling
 around in xenstore. We could say this (or both) in the commit message
 though if it would help to reassure you.

That (both) would most certainly reassure me. Thank you!

 
 Either way I think it really would be best to take this fix rather than
 worrying overly about the infinitesimal possibility that someone might
 be using this key.

Right, so with the reassurance text added on:
Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 
 Ian.
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-xen-4.5] libxl: Allow copying smaller bitmap into a larger one

2014-11-25 Thread Konrad Rzeszutek Wilk

On Tue, Nov 25, 2014 at 09:57:54AM -0500, Boris Ostrovsky wrote:
 On 11/25/2014 06:41 AM, Dario Faggioli wrote:
 On Tue, 2014-11-25 at 11:15 +, Wei Liu wrote:
 And here it is.
 
 Boris, can you give it a shot?
 
 ---8---
  From 77531e31d239887b9f36c03e434300bc30683092 Mon Sep 17 00:00:00 2001
 From: Wei Liu wei.l...@citrix.com
 Date: Tue, 25 Nov 2014 10:59:47 +
 Subject: [PATCH] libxl: allow copying between bitmaps of different sizes
 
 When parsing bitmap objects JSON parser will create libxl_bitmap map of the
 smallest size needed.
 
 This can cause problems when saved image file specifies CPU affinity.  For
 example, if 'vcpu_hard_affinity' in the saved image has only the first CPU
 specified, just a single byte will be allocated and libxl_bitmap-size will 
 be
 set to 1.
 
 This will result in assertion in 
 libxl_set_vcpuaffinity()-libxl_bitmap_copy()
 since the destination bitmap is created for maximum number of CPUs.
 
 We could allocate that bitmap of the same size as the source, however, it is
 later passed to xc_vcpu_setaffinity() which expects it to be sized to the 
 max
 number of CPUs
 
 To fix this issue, introduce an internal function to allowing copying 
 between
 bitmaps of different sizes. Note that this function is only used in
 libxl_set_vcpuaffinity at the moment. Though NUMA placement logic invoke
 libxl_bitmap_copy as well there's no need to replace those invocations.  
 NUMA
 placement logic comes into effect when no vcpu / node pinning is provided, 
 so
 it always operates on bitmap of the same sizes (that is, size of maximum
 number of cpus /nodes).
 
 Reported-by: Boris Ostrovsky boris.ostrov...@oracle.com
 Signed-off-by: Wei Liu wei.l...@citrix.com
 Cc: Ian Campbell ian.campb...@citrix.com
 Cc: Ian Jackson ian.jack...@eu.citrix.com
 Cc: Dario Faggioli dario.faggi...@citrix.com
 
 If this end up being the approach, it can have the following:
 
 Reviewed-by: Dario Faggioli dario.faggi...@citrix.com
 
 Tested-by: Boris Ostrovsky boris.ostrov...@oracle.com

Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com

Thank you!
 
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 0/2 V3] fix rename: xenstore not fully updated

2014-11-25 Thread Konrad Rzeszutek Wilk

On Tue, Nov 25, 2014 at 03:58:33PM +, Ian Campbell wrote:
 On Tue, 2014-11-25 at 10:51 -0500, Konrad Rzeszutek Wilk wrote:
  Right, so with the reassurance text added on:
  Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 
 Thanks.
 
 Chunyan, I propose to commit adding the following to the commit text of
 [PATCH 1/2 V3] remove domain field in xenstore backend dir:
 
 The correct way to obtain a domain's name is via
 libxl_domid_to_name(), or by reading
 from /local/domain/$DOMID/name for toolstacks not using libxl.
 
 Does that sound OK to you?

Yes.
 
 Ian.
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2] INSTALL: correct EXTRA_CFLAGS handling

2014-11-25 Thread Konrad Rzeszutek Wilk

On Tue, Nov 25, 2014 at 05:27:53PM +0100, Olaf Hering wrote:
 On Tue, Nov 25, Konrad Rzeszutek Wilk wrote:
 
  On Tue, Nov 25, 2014 at 05:04:09PM +0100, Olaf Hering wrote:
   +Using additional CFLAGS to build tools running in dom0 is required when
  Why the mention of 'buld tools running in dom0'? It sounds like it is
  required to use dom0 to build tools?
 
 Does it really read like dom0 is required?! But I can change it as you

It did to me but I am not an native English speaker.

Perhaps what you meant was 'build tools which will run in dom0' ?

 suggest.
 
 Olaf

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-xen-4.5] x86/pvh/vpmu: Disable VPMU for PVH guests

2014-11-25 Thread Konrad Rzeszutek Wilk

On Tue, Nov 25, 2014 at 04:39:54PM +, Jan Beulich wrote:
  On 25.11.14 at 17:19, boris.ostrov...@oracle.com wrote:
  On 11/25/2014 09:55 AM, Jan Beulich wrote:
 
  Regardless, do you think that disabling VPMU for PVH is worth anyway?
  That depends on what (bad) consequences not doing so has.
  
  I haven't seen anything (besides VAPIC accesses) but I think it would be 
  prudent to prevent any VPMU activity from happening. I can see, for 
  example MSRs and APIC vector being written. All of which look benign on 
  the first sight but who knows...
 
 Yeah, it's not really a problem to put it in (if Konrad agrees; remember
 that PVH is still experimental, and hence fixing bugs caused only by it
 may be out of scope at this point - in any event I think that if your
 patch is to go in, mine should too).

The beaty of experimental is that we can add it later in the cycle as
at worst they will regress something that is unbaked already.

From that perspective the bar to put fixes for 'experimental' is lower
than normal code. The part that I am worried about is the common paths
and this potentially causing regressions on the those.

But the potential for that is low that I am OK with these patches
going in. 
 
 Jan
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] (4.5-rc1) Problems using xl migrate

2014-11-25 Thread Konrad Rzeszutek Wilk

On Tue, Nov 25, 2014 at 01:03:34PM -0500, Daniel De Graaf wrote:
 On 11/25/2014 05:07 AM, George Dunlap wrote:
 On Mon, Nov 24, 2014 at 10:05 PM, Daniel De Graaf dgde...@tycho.nsa.gov 
 wrote:
 I do. The error is
 (XEN) flask_domctl: Unknown op 72
 
 Incidentally, Flask is running in permissive mode.
 
   Michael Young
 
 
 This means that the new domctl needs to be added to the switch statement
 in flask/hooks.c.  This error is triggered in permissive mode because it
 is a code error rather than a policy error (which is what permissive mode
 is intended to debug).
 
 If that's the case, should we make that a BUG_ON()?  Or at least an
 ASSERT() (which will only bug when compiled with debug=y), followed by
 allow if in permissive mode, and deny if in enforcing mode?
 
 Having it default deny, even in permissive mode, breaks the principle
 of least surprise, I think. :-)
 
   -George
 Either one of these will allow a guest to crash the hypervisor by requesting
 an undefined domctl, which is not really a good idea.  Linux uses a flag in
 the security policy which defines if unknown permissions are allowed or
 denied; I will send a patch adding this to Xen's security server and using
 it instead of -EPERM in the default case of the switch statements.

Thought I think that for the DEBUG case we want to still be boldly
told about it so we can fix it.
 
 The patch adding this feature probably shouldn't be applied to 4.5, but I'll
 send it anyway.  I will also send a separate patch adding the 2 domctls.
 
 -- 
 Daniel De Graaf
 National Security Agency
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5] xsm/flask: add two missing domctls

2014-11-25 Thread Konrad Rzeszutek Wilk

On Tue, Nov 25, 2014 at 06:19:05PM +, Andrew Cooper wrote:
 On 25/11/14 16:57, Daniel De Graaf wrote:
  Reported-by: Michael Young m.a.yo...@durham.ac.uk
  Signed-off-by: Daniel De Graaf dgde...@tycho.nsa.gov
 
 Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
 
 CC'd Konrad, as this should be accepted into Xen-4.5.  Without it,
 migration/suspend fails with -EPERM in the default case when XSM is
 compiled into Xen.

Yup. Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 
 Daniel: there are 4 hypercalls for getting/setting bits of PV VCPU state:
 
 XEN_DOMCTL_{get,set}vcpucontext
 XEN_DOMCTL_{get,set}_ext_vcpucontext
 XEN_DOMCTL_{get,set}vcpuextstate
 XEN_DOMCTL_{get,set}_vcpu_msrs
 
 I see no reason for these to have separate access vectors; you typically
 either need to use all of them, or none, but I presume it is too late to
 coalesce the vectors in a backwards compatible way?
 
 ~Andrew
 
  ---
   xen/xsm/flask/hooks.c   | 2 ++
   xen/xsm/flask/policy/access_vectors | 2 ++
   2 files changed, 4 insertions(+)
 
  diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
  index 0ba2ce9..d48463f 100644
  --- a/xen/xsm/flask/hooks.c
  +++ b/xen/xsm/flask/hooks.c
  @@ -672,9 +672,11 @@ static int flask_domctl(struct domain *d, int cmd)
   return current_has_perm(d, SECCLASS_HVM, HVM__CACHEATTR);
   
   case XEN_DOMCTL_set_ext_vcpucontext:
  +case XEN_DOMCTL_set_vcpu_msrs:
   return current_has_perm(d, SECCLASS_DOMAIN, 
  DOMAIN__SETEXTVCPUCONTEXT);
   
   case XEN_DOMCTL_get_ext_vcpucontext:
  +case XEN_DOMCTL_get_vcpu_msrs:
   return current_has_perm(d, SECCLASS_DOMAIN, 
  DOMAIN__GETEXTVCPUCONTEXT);
   
   case XEN_DOMCTL_setvcpuextstate:
  diff --git a/xen/xsm/flask/policy/access_vectors 
  b/xen/xsm/flask/policy/access_vectors
  index 1cd451e..1da9f63 100644
  --- a/xen/xsm/flask/policy/access_vectors
  +++ b/xen/xsm/flask/policy/access_vectors
  @@ -151,8 +151,10 @@ class domain
   # XEN_DOMCTL_sendtrigger
   trigger
   # XEN_DOMCTL_get_ext_vcpucontext
  +# XEN_DOMCTL_set_vcpu_msrs
   getextvcpucontext
   # XEN_DOMCTL_set_ext_vcpucontext
  +# XEN_DOMCTL_get_vcpu_msrs
   setextvcpucontext
   # XEN_DOMCTL_getvcpuextstate
   getvcpuextstate
 
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] kdump with xen-unstable on efi machine

2014-11-26 Thread Konrad Rzeszutek Wilk

On Wed, Nov 26, 2014 at 03:01:51PM +0100, Juergen Gross wrote:
 On 11/26/2014 01:41 PM, Andrew Cooper wrote:
 On 26/11/14 12:15, Juergen Gross wrote:
 Hi,
 
 I tried to enable kdump on my test-machine with actual xen-unstable
 booting via EFI.
 
 The kdump kernel is not being loaded.
 
 I'm seeing the memory being reserved:
 
 (XEN) EFI RAM map:
 (XEN)   - 000a (usable)
 (XEN)  0010 - 4bc0 (usable)
 (XEN)  4bc0 - 5bc0 (reserved)
 (XEN)  5bc0 - 5bfec000 (usable)
 (XEN)  5bfec000 - 5c00 (ACPI NVS)
 (XEN)  5c00 - 6a429000 (usable)
 (XEN)  6a429000 - 6a42c000 (reserved)
 (XEN)  6a42c000 - 6a7a2000 (usable)
 (XEN)  6a7a2000 - 6a7a8000 (reserved)
 (XEN)  6a7a8000 - 6a987000 (usable)
 (XEN)  6a987000 - 6a98d000 (reserved)
 (XEN)  6a98d000 - 6aa63000 (usable)
 (XEN)  6aa63000 - 6aa73000 (reserved)
 (XEN)  6aa73000 - 6ac6 (usable)
 (XEN)  6ac6 - 6ac61000 (reserved)
 (XEN)  6ac61000 - 6ac9b000 (ACPI data)
 (XEN)  6ac9b000 - 6acac000 (reserved)
 (XEN)  6acac000 - 6acad000 (usable)
 (XEN)  6acad000 - 6acae000 (reserved)
 (XEN)  6acae000 - 7189c000 (usable)
 (XEN)  7189c000 - 71946000 (reserved)
 (XEN)  71946000 - 72d76000 (ACPI NVS)
 (XEN)  72d76000 - 72db2000 (ACPI data)
 (XEN)  72db2000 - 72edc000 (usable)
 (XEN)  8000 - 9000 (reserved)
 (XEN)  0001 - 00208000 (usable)
 (XEN) Kdump: 256MB (262144kB) at 0x206dff4000
 
 I'd expect this area being visible in the efi or e820 map presented to
 dom0, but I can't see anything:
 
 This is expected.  The dom0 kernel now has nothing at all do with
 loading crash kernel.  Loading happens via hypercalls straight from the
 kexec utility.
 
 You need kexec-tools 2.0.4 (I think) or later, compiled with Xen
 support, but it should JustWork.
 
 Should. I have kexec 2.0.5 with Xen support. Doesn't work:
 
 Excerpt form strace:
 
 sysctl operation failed -- need to rebuild the user-space tool set?\n
 
 My personal translation: kexec is tightly coupled to the Xen version
 (this one was built against Xen 4.4.1 AFAIK).

Odd, the hypercall interface did not change in Xen 4.5 for kexec?

Perhaps it is making some other hypercalls that are tied in
to the version of Xen (like sysctl ones?).

I presume with recompiling it works?
 
 Perhaps we should add kexec to the tools directory?

Gosh no.
 
 
 Juergen
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] xen-pciback: Add MODULE_ALIAS for pciback.

2014-11-26 Thread Konrad Rzeszutek Wilk


On Nov 26, 2014 11:39 AM, Luis R. Rodriguez mcg...@do-not-panic.com wrote:

 On Wed, Aug 20, 2014 at 1:26 PM, Ian Campbell ian.campb...@citrix.com 
 wrote: 
  On Wed, 2014-08-20 at 13:20 -0400, Konrad Rzeszutek Wilk wrote: 
  On Wed, Aug 20, 2014 at 06:18:52PM +0100, Ian Campbell wrote: 
   On Wed, 2014-08-20 at 12:40 -0400, Konrad Rzeszutek Wilk wrote: 
The rest of the Xen device drivers use an module alias 
to load devices when they shop up in XenBus. 
   
   show. 

     MODULE_LICENSE(Dual BSD/GPL); 
     MODULE_ALIAS(xen-backend:pci); 
+MODULE_ALIAS(xen:pci); 
   
   Isn't that xen-backend:pci already the right thing for a backend device? 
   xen: is for frontends, I thought. 
  
  Oh, you are right. Cool! Thanks! 
  
  The patch turned out to be even more trivial than expected ;-) 

 Is this what we expected to be pending work for the item device 
 hotplug (MODULE_ALIAS) on the upstream TODO list? 

 http://wiki.xenproject.org/wiki/XenParavirtOps#Upstream_delta_details 

 This was simply to just auto load that driver when needed right? 


Right which it does now.

 Also as for actual PCI device hotplug support: 

 http://wiki.xen.org/wiki/Xen_PCI_Passthrough#Hotplug 

 I don't think we need a delta for that do we? 

Nope. This one is all done.

 Luis 
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5] tools/oxenstored: Fix | vs error in fd event handling

2014-11-26 Thread Konrad Rzeszutek Wilk

On Wed, Nov 26, 2014 at 06:24:11PM +, Dave Scott wrote:
 
  On 26 Nov 2014, at 15:38, Zheng Li d...@zheng.li wrote:
  
  On 26/11/2014 15:09, Andrew Cooper wrote:
  This makes fields 0 and 1 true more often than they should be, resulting
  problems when handling events.
  
  Indeed, looks like a mistake I made when rewriting the logic terms lately. 
  The result is POLLUP or POLLERR events being returned in more categories 
  than we'd interest. Thanks for fixing this!
  
  Acked-by: Zheng Li d...@zheng.li
 
 This also looks fine to me
 
 Acked-by: David Scott dave.sc...@citrix.com

Would it be possible to get an Reviewed-by please?

Thank you.
 
 Cheers,
 Dave
 
  
  
  Cheers,
  Zheng
  
  
  ---
  
  This was discovered with XenServers internal Coverity instance.  I have 
  yet to
  work out why the issue is not identified by the upstream coverity scanning.
  
  Konrad: This is a bug in the default event handling used by oxenstored in 
  4.5,
  as the default switched from select() to poll() in the 4.5 timeframe.  It
  would appear that the negative side effects are limited to just logspam 
  about
  certain clients attempting invalid actions, but I can't rule out anything 
  more
  problematic.
  ---
   tools/ocaml/xenstored/select_stubs.c |4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)
  
  diff --git a/tools/ocaml/xenstored/select_stubs.c 
  b/tools/ocaml/xenstored/select_stubs.c
  index 4a8edb5..af72b84 100644
  --- a/tools/ocaml/xenstored/select_stubs.c
  +++ b/tools/ocaml/xenstored/select_stubs.c
  @@ -56,8 +56,8 @@ CAMLprim value stub_select_on_poll(value fd_events, 
  value timeo) {
 events = Field(Field(fd_events, i), 1);
  
 if (c_fds[i].revents  POLLNVAL) unix_error(EBADF, 
  select, Nothing);
  -  Field(events, 0) = Val_bool(c_fds[i].events | POLLIN  
   c_fds[i].revents  (POLLIN |POLLHUP|POLLERR));
  -  Field(events, 1) = Val_bool(c_fds[i].events | POLLOUT 
   c_fds[i].revents  (POLLOUT|POLLHUP|POLLERR));
  +  Field(events, 0) = Val_bool(c_fds[i].events  POLLIN  
   c_fds[i].revents  (POLLIN |POLLHUP|POLLERR));
  +  Field(events, 1) = Val_bool(c_fds[i].events  POLLOUT 
   c_fds[i].revents  (POLLOUT|POLLHUP|POLLERR));
 Field(events, 2) = Val_bool(c_fds[i].revents  POLLPRI);
  
 }
  
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5] docs/commandline: Refresh document for 4.5

2014-11-27 Thread Konrad Rzeszutek Wilk


On Nov 27, 2014 8:06 AM, Jan Beulich jbeul...@suse.com wrote:

  On 27.11.14 at 12:06, andrew.coop...@citrix.com wrote: 
  Konrad: A release ack please.  It would be nice if the command line 
  documentation was correct for 4.5. 

 Honestly, unless really controversial, I don't think release acks are 
 really needed for purely documentation changes. 

nods
 

 Jan 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [COVERITY ACCESS] Request for access to Coverity

2014-11-27 Thread Konrad Rzeszutek Wilk


On Nov 27, 2014 6:59 AM, Tim Deegan t...@xen.org wrote:

 At 11:39 + on 27 Nov (1417084797), George Dunlap wrote: 
  -BEGIN PGP SIGNED MESSAGE- 
  Hash: SHA512 
  
  I agree to the conditions in the XenProject Coverity contribution 
  guidelines [1]. 
  
  I'm a developer working for Citrix Systems UK, Ltd.  I've been active 
  in the Xen community since 2006; I was release coordinator for the 4.3 
  and 4.4 releases, and I am currently maintainer of the Xen scheduling 
  subsystem. 
  
  I would like access primarily to be able to write and speak 
  intelligently about Xen and Coverity in blog postings and conference 
  talks.  I figure it would be easier to go through the stats and 
  history myself, rather than trying to get some other developer to do 
  it for me.  (Although of course once I have access I'll probably end 
  up doing some work looking at scans anyway.) 

 +1 

+1 too.

 Tim. 

 ___ 
 Xen-devel mailing list 
 Xen-devel@lists.xen.org 
 http://lists.xen.org/xen-devel 
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3] INSTALL: correct EXTRA_CFLAGS handling

2014-11-27 Thread Konrad Rzeszutek Wilk

On November 27, 2014 4:26:26 AM EST, Olaf Hering o...@aepfle.de wrote:
The already documented configure patch was not applied.
Adjust documentation to describe existing behaviour.

Signed-off-by: Olaf Hering o...@aepfle.de
Cc: Ian Campbell ian.campb...@citrix.com
Cc: Ian Jackson ian.jack...@eu.citrix.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com

Reviewed-by: me.

Don't need an release ack for it.

---
v3: reword as suggested by Konrad, trim Cc list
v2: resend due to lack of Cc: tags

 INSTALL | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/INSTALL b/INSTALL
index 6bb9d23..0bc67ea 100644
--- a/INSTALL
+++ b/INSTALL
@@ -128,13 +128,6 @@ original xenstored will be used. Valid names are
xenstored and
 oxenstored.
   --with-xenstored=name
 
-Using additional CFLAGS to build tools running in dom0 is required
when
-building distro packages. This is the option to pass things like
-RPM_OPT_FLAGS.
-  --with-extra-cflags-tools=EXTRA_CFLAGS
-  --with-extra-cflags-qemu-traditional=EXTRA_CFLAGS
-  --with-extra-cflags-qemu-upstream=EXTRA_CFLAGS
-
 Instead of starting the tools in dom0 with sysv runlevel scripts they
can also be started by systemd. If this option is enabled xenstored
will
 receive the communication socked directly from systemd. So starting it
@@ -241,6 +234,13 @@ QEMU_UPSTREAM_URL=
 QEMU_TRADITIONAL_URL=
 SEABIOS_UPSTREAM_URL=
 
+Using additional CFLAGS to build tools which will run in dom0 is
+required when building distro packages. These variables can be used to
+pass RPM_OPT_FLAGS.
+EXTRA_CFLAGS_XEN_TOOLS=
+EXTRA_CFLAGS_QEMU_TRADITIONAL=
+EXTRA_CFLAGS_QEMU_XEN=
+
 This variable can be used to use DIR/include and DIR/lib during build.
 This is the same as PREPEND_LIB and PREPEND_INCLUDES. APPEND_LIB and
 APPEND_INCLUDES= will be appended to the CFLAGS/LDFLAGS variable.
@@ -310,10 +310,10 @@ sudo make install BOOT_DIR=/ood/path/boot
EFI_DIR=/odd/path/efi
 %build
 export WGET=$(type -P false)
 export GIT=$(type -P false)
+export EXTRA_CFLAGS_XEN_TOOLS=$RPM_OPT_FLAGS
+export EXTRA_CFLAGS_QEMU_TRADITIONAL=$RPM_OPT_FLAGS
+export EXTRA_CFLAGS_QEMU_XEN=$RPM_OPT_FLAGS
 %configure \
---with-extra-cflags-tools=$RPM_OPT_FLAGS \
---with-extra-cflags-qemu-traditional=$RPM_OPT_FLAGS \
---with-extra-cflags-qemu-upstream=$RPM_OPT_FLAGS \
 --with-initddir=%{_initddir}
 unset CFLAGS CXXFLAGS FFLAGS LDFLAGS
 make



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5 3/3] python/xs: Correct the indirection of the NULL xshandle() check

2014-12-01 Thread Konrad Rzeszutek Wilk

On Thu, Nov 27, 2014 at 12:34:34PM +, Andrew Cooper wrote:
 The code now now matches its comment, and will actually catch the case of a
 bad xs handle.
 
 Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
 Coverity-ID: 1055948
 CC: Ian Campbell ian.campb...@citrix.com
 CC: Ian Jackson ian.jack...@eu.citrix.com
 CC: Wei Liu wei.l...@citrix.com
 CC: Xen Coverity Team cover...@xen.org

Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 ---
  tools/python/xen/lowlevel/xs/xs.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/tools/python/xen/lowlevel/xs/xs.c 
 b/tools/python/xen/lowlevel/xs/xs.c
 index 84e1711..ec364bb 100644
 --- a/tools/python/xen/lowlevel/xs/xs.c
 +++ b/tools/python/xen/lowlevel/xs/xs.c
 @@ -816,7 +816,7 @@ static int parse_transaction_path(XsHandle *self, 
 PyObject *args,
  
  *xh = xshandle(self);
  
 -if (!xh)
 +if (!*xh)
  return 0;
  
  if (!PyArg_ParseTuple(args, ss, thstr, path))
 -- 
 1.7.10.4
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5 2/3] python/xc: Fix multiple issues in pyxc_readconsolering()

2014-12-01 Thread Konrad Rzeszutek Wilk

On Fri, Nov 28, 2014 at 11:38:52AM +, Ian Campbell wrote:
 On Thu, 2014-11-27 at 12:34 +, Andrew Cooper wrote:
  Don't leak a 16k allocation if PyArg_ParseTupleAndKeywords() or the first
  xc_readconsolering() fail.  It is trivial to run throught the processes 
  memory
  by repeatedly passing junk parameters to this function.
  
  In the case that the call to xc_readconsolering() in the while loop fails,
  reinstate str before breaking out, and passing a spurious pointer to free().
  
  Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
  Coverity-IDs: 1054984 1055906
  CC: Ian Campbell ian.campb...@citrix.com
  CC: Ian Jackson ian.jack...@eu.citrix.com
  CC: Wei Liu wei.l...@citrix.com
  CC: Xen Coverity Team cover...@xen.org
 
 Acked-by: Ian Campbell ian.campb...@citrix.com

Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 
  ---
   tools/python/xen/lowlevel/xc/xc.c |   13 ++---
   1 file changed, 6 insertions(+), 7 deletions(-)
  
  diff --git a/tools/python/xen/lowlevel/xc/xc.c 
  b/tools/python/xen/lowlevel/xc/xc.c
  index c70b388..2aa0dc7 100644
  --- a/tools/python/xen/lowlevel/xc/xc.c
  +++ b/tools/python/xen/lowlevel/xc/xc.c
  @@ -1089,7 +1089,7 @@ static PyObject *pyxc_readconsolering(XcObject *self,
   {
   unsigned int clear = 0, index = 0, incremental = 0;
   unsigned int count = 16384 + 1, size = count;
  -char*str = malloc(size), *ptr;
  +char*str, *ptr;
   PyObject*obj;
   int  ret;
   
  @@ -1097,15 +1097,17 @@ static PyObject *pyxc_readconsolering(XcObject 
  *self,
   
   if ( !PyArg_ParseTupleAndKeywords(args, kwds, |iii, kwd_list,
 clear, index, incremental) ||
  - !str )
  + !(str = malloc(size)) )
   return NULL;
   
   ret = xc_readconsolering(self-xc_handle, str, count, clear,
incremental, index);
  -if ( ret  0 )
  +if ( ret  0 ) {
  +free(str);
   return pyxc_error_to_exception(self-xc_handle);
  +}
   
  -while ( !incremental  count == size )
  +while ( !incremental  count == size  ret = 0 )
   {
   size += count - 1;
   if ( size  count )
  @@ -1119,9 +1121,6 @@ static PyObject *pyxc_readconsolering(XcObject *self,
   count = size - count;
   ret = xc_readconsolering(self-xc_handle, str, count, clear,
1, index);
  -if ( ret  0 )
  -break;
  -
   count += str - ptr;
   str = ptr;
   }
 
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] libxl: Don't derefence null new_name pointer in libxl_domain_rename()

2014-12-01 Thread Konrad Rzeszutek Wilk

On Mon, Dec 01, 2014 at 02:50:56PM +, Ian Campbell wrote:
 On Mon, 2014-12-01 at 14:27 +, Euan Harris wrote:
  libxl__domain_rename() unconditionally dereferences its new_name
  parameter, to check whether it is an empty string.   Add a check to
  avoid a segfault if new_name is null.
  
  Signed-off-by: Euan Harris euan.har...@citrix.com
 
 Acked-by: Ian Campbell ian.campb...@citrix.com
 
 I think this is a good fix to have for 4.5, Konrad CCd.

Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 
  ---
   tools/libxl/libxl.c |7 +++
   1 files changed, 7 insertions(+), 0 deletions(-)
  
  diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
  index f84f7c2..6e84b5d 100644
  --- a/tools/libxl/libxl.c
  +++ b/tools/libxl/libxl.c
  @@ -385,6 +385,13 @@ int libxl__domain_rename(libxl__gc *gc, uint32_t domid,
   }
   }
   
  +if (!new_name) {
  +LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
  +new domain name not specified);
  +rc = ERROR_INVAL;
  +goto x_rc;
  +}
  +
   if (new_name[0]) {
   /* nonempty names must be unique */
   uint32_t domid_e;
 
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 7/7] xen/pciback: Restore configuration space when detaching from a guest.

2014-12-01 Thread Konrad Rzeszutek Wilk

On Mon, Dec 01, 2014 at 02:14:56PM +, David Vrabel wrote:
 On 21/11/14 22:17, Konrad Rzeszutek Wilk wrote:
  The commit xen/pciback: Don't deadlock when unbinding. was using
  the version of pci_reset_function which would lock the device lock.
  That is no good as we can dead-lock. As such we swapped to using
  the lock-less version and requiring that the callers
  of 'pcistub_put_pci_dev' take the device lock. And as such
  this bug got exposed.
  
  Using the lock-less version is  OK, except that we tried to
  use 'pci_restore_state' after the lock-less version of
  __pci_reset_function_locked - which won't work as 'state_saved'
  is set to false. Said 'state_saved' is a toggle boolean that
  is to be used by the sequence of a) pci_save_state/pci_restore_state
  or b) pci_load_and_free_saved_state/pci_restore_state. We don't
  want to use a) as the guest might have messed up the PCI
  configuration space and we want it to revert to the state
  when the PCI device was binded to us. Therefore we pick
  b) to restore the configuration space.
  
  We restore from our 'golden' version of PCI configuration space, when an:
   - Device is unbinded from pciback
   - Device is detached from a guest.
  
  Reported-by:  Sander Eikelenboom li...@eikelenboom.it
  Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
  ---
   drivers/xen/xen-pciback/pci_stub.c | 20 
   1 file changed, 16 insertions(+), 4 deletions(-)
  
  diff --git a/drivers/xen/xen-pciback/pci_stub.c 
  b/drivers/xen/xen-pciback/pci_stub.c
  index 843a2ba..eb8b58e 100644
  --- a/drivers/xen/xen-pciback/pci_stub.c
  +++ b/drivers/xen/xen-pciback/pci_stub.c
  @@ -105,7 +105,7 @@ static void pcistub_device_release(struct kref *kref)
   */
  __pci_reset_function_locked(dev);
  if (pci_load_and_free_saved_state(dev, dev_data-pci_saved_state))
  -   dev_dbg(dev-dev, Could not reload PCI state\n);
  +   dev_info(dev-dev, Could not reload PCI state\n);
 
 Why dev_info when...
 
  @@ -279,9 +281,19 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
   * (so it's ready for the next domain)
   */
  device_lock_assert(dev-dev);
  -   __pci_reset_function_locked(dev);
  -   pci_restore_state(dev);
  -
  +   dev_data = pci_get_drvdata(dev);
  +   ret = pci_load_saved_state(dev, dev_data-pci_saved_state);
  +   if (ret  0)
  +   dev_warn(dev-dev, Could not reload PCI state\n);
 
 ... this one is dev_warn?

Should be the same, dev_info I think?
 
  +   else {
  +   __pci_reset_function_locked(dev);
 
 I think the reset should always be attempted regardless of whether the
 correct state was loaded or not.

/me nods. Will redo it as such.
 
 David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 04/10] x86: paravirt: Wrap initialization of set_iopl_mask in a macro

2014-12-01 Thread Konrad Rzeszutek Wilk

On Sun, Nov 02, 2014 at 09:32:20AM -0800, Josh Triplett wrote:
 This will allow making set_iopl_mask optional later.
 
 Signed-off-by: Josh Triplett j...@joshtriplett.org

Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 ---
  arch/x86/include/asm/paravirt_types.h | 1 +
  arch/x86/kernel/paravirt.c| 2 +-
  arch/x86/xen/enlighten.c  | 4 ++--
  3 files changed, 4 insertions(+), 3 deletions(-)
 
 diff --git a/arch/x86/include/asm/paravirt_types.h 
 b/arch/x86/include/asm/paravirt_types.h
 index 7549b8b..3caf2a8 100644
 --- a/arch/x86/include/asm/paravirt_types.h
 +++ b/arch/x86/include/asm/paravirt_types.h
 @@ -143,6 +143,7 @@ struct pv_cpu_ops {
   void (*load_sp0)(struct tss_struct *tss, struct thread_struct *t);
  
   void (*set_iopl_mask)(unsigned mask);
 +#define INIT_SET_IOPL_MASK(f) .set_iopl_mask = f,
  
   void (*wbinvd)(void);
   void (*io_delay)(void);
 diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
 index 548d25f..e7969d4 100644
 --- a/arch/x86/kernel/paravirt.c
 +++ b/arch/x86/kernel/paravirt.c
 @@ -383,7 +383,7 @@ __visible struct pv_cpu_ops pv_cpu_ops = {
   .iret = native_iret,
   .swapgs = native_swapgs,
  
 - .set_iopl_mask = native_set_iopl_mask,
 + INIT_SET_IOPL_MASK(native_set_iopl_mask)
   .io_delay = native_io_delay,
  
   .start_context_switch = paravirt_nop,
 diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
 index 1a3f044..8ad0778 100644
 --- a/arch/x86/xen/enlighten.c
 +++ b/arch/x86/xen/enlighten.c
 @@ -912,7 +912,7 @@ static void xen_load_sp0(struct tss_struct *tss,
   xen_mc_issue(PARAVIRT_LAZY_CPU);
  }
  
 -static void xen_set_iopl_mask(unsigned mask)
 +static void __maybe_unused xen_set_iopl_mask(unsigned mask)
  {
   struct physdev_set_iopl set_iopl;
  
 @@ -1279,7 +1279,7 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst 
 = {
   .write_idt_entry = xen_write_idt_entry,
   .load_sp0 = xen_load_sp0,
  
 - .set_iopl_mask = xen_set_iopl_mask,
 + INIT_SET_IOPL_MASK(xen_set_iopl_mask)
   .io_delay = xen_io_delay,
  
   /* Xen takes care of %gs when switching to usermode for us */
 -- 
 2.1.1
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Q77 IGD instantly crashes on xen-pciback bind.

2014-12-01 Thread Konrad Rzeszutek Wilk

On Fri, Nov 28, 2014 at 08:00:40PM -0600, Dr. Greg Wettstein wrote:
 On Nov 27, 12:11pm, Sander Eikelenboom wrote:
 } Subject: Re: [Xen-devel] Q77 IGD instantly crashes on xen-pciback bind.
 
 Hi, hope the week has gone well for everyone.
 
   So we are obviously working with qemu-dm-traditional and with the
   IGD/LVDS BIOS configuration issue fixed the adapater passthrough is
   working and Windows7 is coming up and detecting the IGD as a standard
   VGA display adapter.  Additional invocations of the VM after the first
   one result in failed passthrough with a garbled display.
 
  This is probably due to the current lack of slot/bus reset in
  xen-pciback, Konrad has a preliminary kernel patch for xen-pciback
  that does this.  I have attached the patch, though it has some rough
  edges in the design :-)
 
  I'm currently running with his 3.19 xen-pciback patches series + the
  preliminary patch for slot/bus reset and rebooting a guest with
  vga/pci passthrough now works. (i'm running with a radeon card,
  passed through as a secondary card to the emulated qemu one, in a
  linux guest using qemu-xen, so i can't help you with your other
  questions and problems).
 
 Thanks for taking the time for respond and forward along the patch.
 
 I back ported the do_flr patch into the 3.14.x kernel and spent some
 time working with it.  I thought it might be useful to others to
 document what we ran into.
 
 First of all the issue with the unsuccessful boot of Windows after the
 first invocation doesn't appear to have anything to do with resetting
 the card.  This was fixed by installing the most recent version of the
 Intel HD drivers in the Windows guest.
 
 If IGD passthrough is done without the HD drivers Windows 7 appears to
 use its standard VGA driver which seems to be able to initialize and
 run the IGD device but does not appear to shutdown the device in a
 manner in which it can be re-started.  After the first invocation of
 the guest is shutdown the screen goes to a solid color.  Subsequent
 invocations result in the flashing multi-color screens which others
 have documented.
 
 With the HD drivers installed IGD passthrough works fine through
 multiple invocations of a guest with the stock xen-pciback in 3.14.x.
 We ran 40-50 repetitive Windows guest invocations and every one was
 completely deterministic.
 
 That being said we ran into an issue which we wanted to bounce off the
 list in the context of this thread.
 
 One of the things we were not able to do was to successfuly re-start
 the IGD display for dom0.  After spending a lot of time going through
 Konrad's reset code it appears the IGD devices in the Q77 and Q87
 chipset implementations we are looking at are not advertising reset
 functions which can be used.
 
 It appears that the IGD devices are advertising that they need a
 Device Specific Initialization (DSI) sequence in order to be enabled
 or powered up, if we interpret the output of lscpi properly, ie:
 
 ---
 00:02.0 VGA compatible controller: Intel Corporation Device 0152 (rev 09) 
 (prog-if 00 [VGA controller])
 Subsystem: Intel Corporation Device 2036
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
 Stepping- SERR- FastB2B- DisINTx-
 Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast TAbort- 
 TAbort-
 MAbort- SERR- PERR- INTx-
 Latency: 0
 Interrupt: pin A routed to IRQ 11
 Region 0: Memory at f780 (64-bit, non-prefetchable) [size=4M]
 Region 2: Memory at e000 (64-bit, prefetchable) [size=256M]
 Region 4: I/O ports at f000 [size=64]
 Expansion ROM at unassigned [disabled]
 Capabilities: [90] MSI: Mask- 64bit- Count=1/1 Enable-
 Address:   Data: 
 Capabilities: [d0] Power Management version 2
 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA 
 PME(D0-,D1-,D2-,D3hot-,D3cold-)
 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
 Capabilities: [a4] PCIe advanced features ?
 ---
 
 The ATI adapters which we have a lot of passthrough experience with
 are FLR capable and can be re-enabled after passthrough by writing a
 '1' to the enable pseudo-file in the /sys/bus/pci/devices/BDF
 directory.  The IGD adapters seem to fail to respond to a request to
 be re-enabled.
 
 We would certainly be interested in any suggestions the collective may
 have with respect to how to get the adapter turned back on and/or
 implement a reset.

CC-ing Chien who has been looking at adding IGD passthrough support
in the qemu-upstream. Perhaps he can provide some ideas.

Thanks.
 
  Sander
 
 Have a good weekend.
 
 Greg
 
 }-- End of excerpt from Sander Eikelenboom
 
 As always,
 Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
 4206 N. 19th Ave.   Specializing in

Re: [Xen-devel] [PATCH] xen/arm: Handle platforms with edge-triggered virtual timer

2014-12-01 Thread Konrad Rzeszutek Wilk

On Fri, Nov 28, 2014 at 03:17:06PM +, Julien Grall wrote:
 Some platforms (such as Xgene and ARMv8 models) use an edge-triggered 
 interrupt
 for the virtual timer. Even if the timer output signal is masked in the
 context switch, the GIC will keep track that of any interrupts raised
 while IRQs are disabled. As soon as IRQs are re-enabled, the virtual
 interrupt timer will be injected to Xen.
 
 If an idle vVCPU was scheduled next then the interrupt handler doesn't
 expect to the receive the IRQ and will crash:
 
 (XEN)[00228388] _spin_lock_irqsave+0x28/0x94 (PC)
 (XEN)[00228380] _spin_lock_irqsave+0x20/0x94 (LR)
 (XEN)[00250510] vgic_vcpu_inject_irq+0x40/0x1b0
 (XEN)[0024bcd0] vtimer_interrupt+0x4c/0x54
 (XEN)[00247010] do_IRQ+0x1a4/0x220
 (XEN)[00244864] gic_interrupt+0x50/0xec
 (XEN)[0024fbac] do_trap_irq+0x20/0x2c
 (XEN)[00255240] hyp_irq+0x5c/0x60
 (XEN)[00241084] context_switch+0xb8/0xc4
 (XEN)[0022482c] schedule+0x684/0x6d0
 (XEN)[0022785c] __do_softirq+0xcc/0xe8
 (XEN)[002278d4] do_softirq+0x14/0x1c
 (XEN)[00240fac] idle_loop+0x134/0x154
 (XEN)[0024c160] start_secondary+0x14c/0x15c
 (XEN)[0001] 0001
 
 The proper solution is to context switch the virtual interrupt state at
 the GIC level. This would also avoid masking the output signal which
 requires specific handling in the guest OS and more complex code in Xen
 to deal with EOIs, and so is desirable for that reason too.
 
 Sadly, this solution requires some refactoring which would not be
 suitable for a freeze exception for the Xen 4.5 release.
 
 For now implement a temporary solution which ignores the virtual timer
 interrupt when the idle VCPU is running.
 
 Signed-off-by: Julien Grall julien.gr...@linaro.org
 
 ---
 
 Changes in v2:
 - Reword the commit message and comment in the code to explain the
 real bug. Based on Ian's reword.
 - Use unlikely
 
 This patch is a bug fix candidate for Xen 4.5 and backport for Xen 4.4.

Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com

 It affects at least Xgene platform and ARMv8 models where Xen may
 randomly crash.
 
 This patch don't inject the virtual timer interrupt if the current VCPU
 is the idle one. For now, I think this patch is the safest way to resolve
 the problem.
 
 I will work on a proper solution for Xen 4.6.
 ---
  xen/arch/arm/time.c | 13 +
  1 file changed, 13 insertions(+)
 
 diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
 index a6436f1..471d7a9 100644
 --- a/xen/arch/arm/time.c
 +++ b/xen/arch/arm/time.c
 @@ -169,6 +169,19 @@ static void timer_interrupt(int irq, void *dev_id, 
 struct cpu_user_regs *regs)
  
  static void vtimer_interrupt(int irq, void *dev_id, struct cpu_user_regs 
 *regs)
  {
 +/*
 + * Edge-triggered interrupt can be used for the virtual timer. Even
 + * if the timer output signal is masked in the context switch, the
 + * GIC will keep track that of any interrupts raised while IRQS as
 + * disabled. As soon as IRQs are re-enabled, the virtual interrupt
 + * will be injected to Xen.
 + *
 + * If an IDLE vCPU was scheduled next then we should ignore the
 + * interrupt.
 + */
 +if ( unlikely(is_idle_vcpu(current)) )
 +return;
 +
  current-arch.virt_timer.ctl = READ_SYSREG32(CNTV_CTL_EL0);
  WRITE_SYSREG32(current-arch.virt_timer.ctl | CNTx_CTL_MASK, 
 CNTV_CTL_EL0);
  vgic_vcpu_inject_irq(current, current-arch.virt_timer.irq);
 -- 
 2.1.3
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v13 09/11] pvqspinlock, x86: Add para-virtualization support

2014-12-01 Thread Konrad Rzeszutek Wilk

On Wed, Oct 29, 2014 at 04:19:09PM -0400, Waiman Long wrote:
 This patch adds para-virtualization support to the queue spinlock
 code base with minimal impact to the native case. There are some
 minor code changes in the generic qspinlock.c file which should be
 usable in other architectures. The other code changes are specific
 to x86 processors and so are all put under the arch/x86 directory.
 
 On the lock side, the slowpath code is split into 2 separate functions
 generated from the same code - one for bare metal and one for PV guest.
 The switching is done in the _raw_spin_lock* functions. This makes
 sure that the performance impact to the bare metal case is minimal,
 just a few NOPs in the _raw_spin_lock* functions. In the PV slowpath
 code, there are 2 paravirt callee saved calls that minimize register
 pressure.
 
 On the unlock side, however, the disabling of unlock function inlining
 does have some slight impact on bare metal performance.
 
 The actual paravirt code comes in 5 parts;
 
  - init_node; this initializes the extra data members required for PV
state. PV state data is kept 1 cacheline ahead of the regular data.
 
  - link_and_wait_node; this replaces the regular MCS queuing code. CPU
halting can happen if the wait is too long.
 
  - wait_head; this waits until the lock is avialable and the CPU will
be halted if the wait is too long.
 
  - wait_check; this is called after acquiring the lock to see if the
next queue head CPU is halted. If this is the case, the lock bit is
changed to indicate the queue head will have to be kicked on unlock.
 
  - queue_unlock;  this routine has a jump label to check if paravirt
is enabled. If yes, it has to do an atomic cmpxchg to clear the lock
bit or call the slowpath function to kick the queue head cpu.
 
 Tracking the head is done in two parts, firstly the pv_wait_head will
 store its cpu number in whichever node is pointed to by the tail part
 of the lock word. Secondly, pv_link_and_wait_node() will propagate the
 existing head from the old to the new tail node.
 
 Signed-off-by: Waiman Long waiman.l...@hp.com
 ---
  arch/x86/include/asm/paravirt.h   |   19 ++
  arch/x86/include/asm/paravirt_types.h |   20 ++
  arch/x86/include/asm/pvqspinlock.h|  411 
 +
  arch/x86/include/asm/qspinlock.h  |   71 ++-
  arch/x86/kernel/paravirt-spinlocks.c  |6 +
  include/asm-generic/qspinlock.h   |2 +
  kernel/locking/qspinlock.c|   69 +-
  7 files changed, 591 insertions(+), 7 deletions(-)
  create mode 100644 arch/x86/include/asm/pvqspinlock.h
 
 diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
 index cd6e161..7e296e6 100644
 --- a/arch/x86/include/asm/paravirt.h
 +++ b/arch/x86/include/asm/paravirt.h
 @@ -712,6 +712,24 @@ static inline void __set_fixmap(unsigned /* enum 
 fixed_addresses */ idx,
  
  #if defined(CONFIG_SMP)  defined(CONFIG_PARAVIRT_SPINLOCKS)
  
 +#ifdef CONFIG_QUEUE_SPINLOCK
 +
 +static __always_inline void pv_kick_cpu(int cpu)
 +{
 + PVOP_VCALLEE1(pv_lock_ops.kick_cpu, cpu);
 +}
 +
 +static __always_inline void pv_lockwait(u8 *lockbyte)
 +{
 + PVOP_VCALLEE1(pv_lock_ops.lockwait, lockbyte);
 +}
 +
 +static __always_inline void pv_lockstat(enum pv_lock_stats type)
 +{
 + PVOP_VCALLEE1(pv_lock_ops.lockstat, type);
 +}
 +
 +#else
  static __always_inline void __ticket_lock_spinning(struct arch_spinlock 
 *lock,
   __ticket_t ticket)
  {
 @@ -723,6 +741,7 @@ static __always_inline void __ticket_unlock_kick(struct 
 arch_spinlock *lock,
  {
   PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
  }
 +#endif
  
  #endif
  
 diff --git a/arch/x86/include/asm/paravirt_types.h 
 b/arch/x86/include/asm/paravirt_types.h
 index 7549b8b..49e4b76 100644
 --- a/arch/x86/include/asm/paravirt_types.h
 +++ b/arch/x86/include/asm/paravirt_types.h
 @@ -326,6 +326,9 @@ struct pv_mmu_ops {
  phys_addr_t phys, pgprot_t flags);
  };
  
 +struct mcs_spinlock;
 +struct qspinlock;
 +
  struct arch_spinlock;
  #ifdef CONFIG_SMP
  #include asm/spinlock_types.h
 @@ -333,9 +336,26 @@ struct arch_spinlock;
  typedef u16 __ticket_t;
  #endif
  
 +#ifdef CONFIG_QUEUE_SPINLOCK
 +enum pv_lock_stats {
 + PV_HALT_QHEAD,  /* Queue head halting   */
 + PV_HALT_QNODE,  /* Other queue node halting */
 + PV_HALT_ABORT,  /* Halting aborted  */
 + PV_WAKE_KICKED, /* Wakeup by kicking*/
 + PV_WAKE_SPURIOUS,   /* Spurious wakeup  */
 + PV_KICK_NOHALT  /* Kick but CPU not halted  */
 +};
 +#endif
 +
  struct pv_lock_ops {
 +#ifdef CONFIG_QUEUE_SPINLOCK
 + struct paravirt_callee_save kick_cpu;
 + struct paravirt_callee_save lockstat;
 + struct paravirt_callee_save lockwait;
 +#else
   struct paravirt_callee_save lock_spinning;
   void (*unlock_kick)(struct

Re: [Xen-devel] [PATCH for-4.5] pygrub: Fix regression from c/s d1b93ea, attempt 2

2014-12-02 Thread Konrad Rzeszutek Wilk

On Tue, Dec 02, 2014 at 02:02:58PM +, Ian Campbell wrote:
 On Tue, 2014-12-02 at 02:43 +, Andrew Cooper wrote:
  On 01/12/2014 20:30, Konrad Rzeszutek Wilk wrote:
   On Fri, Nov 28, 2014 at 11:31:24AM +, Ian Campbell wrote:
   On Tue, 2014-11-25 at 11:11 -0500, Boris Ostrovsky wrote:
   c/s d1b93ea causes substantial functional regressions in pygrub's 
   ability to
   parse bootloader configuration files.
  
   c/s d1b93ea itself changed an an interface which previously used 
   exclusively
   integers, to using strings in the case of a grub configuration with 
   explicit
   default set, along with changing the code calling the interface to 
   require a
   string.  The default value for default remained as an integer.
  
   As a result, any Extlinux or Lilo configuration (which drives this 
   interface
   exclusively with integers), or Grub configuration which doesn't 
   explicitly
   declare a default will die with an AttributeError when attempting to 
   call
   self.cf.default.isdigit() where default is an integer.
  
   Sadly, this AttributeError gets swallowed by the blanket ignore in the 
   loop
   which searches partitions for valid bootloader configurations, causing 
   the
   issue to be reported as Unable to find partition containing kernel
  
   We should explicitly check type of default in image_index() and 
   process it
   appropriately.
  
   Reported-by: Andrew Cooper andrew.coop...@citrix.com
   Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
   Acked-by: Ian Campbell ian.campb...@citrix.com
  
   In general I would prefer the first line of the commit message to be a
   short description of the actual functional change and not a reference to
   fixing some other commit, which is basically meaningless to anyone
   reading the log even now, nevermind in six months. I think I'm going to
   start picking up on this more frequently from now on.
  
   CCing Konrad for RM input. I'm much less worried about corner cases etc
   wrt the freeze etc than I was with the larger fix proposed earlier.
   I am bit lost. I thought Andrew NACKed this?
  
  I didn't, as I am not in a position to.  I have not tested the result,
  but believe it is sufficient to fix the exact error at hand.  If the
  maintainers think it is the best solution then so be it, but I am still
  of the opinion that it is is still a hack upon a hack.
 
 At this point in a freeze I'm much happier with:
 
  tools/pygrub/src/pygrub |4 +++-
  1 files changed, 3 insertions(+), 1 deletions(-)

The same here. This is now the season of handing out band-aids.

As such Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 
 than
  tools/pygrub/src/ExtLinuxConf.py |6 +++---
  tools/pygrub/src/GrubConf.py |7 ++-
  tools/pygrub/src/LiloConf.py |6 +++---
  3 files changed, 8 insertions(+), 11 deletions(-)
 
 Plus Boris' patch is far easier to reason about in isolation in a
 dynamically/duck typed language.
 
 Ian.
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] PVHVM drivers in upstream linux kernel

2014-12-02 Thread Konrad Rzeszutek Wilk

On Tue, Dec 02, 2014 at 11:05:14AM +, Ian Campbell wrote:
 On Tue, 2014-12-02 at 10:54 +, David Vrabel wrote:
  On 02/12/14 09:39, Juergen Gross wrote:
   Hi,
   
   looking into the upstream linux sources I realized that the PVHVM
   drivers of XEN are only available with the pvops kernel. Is this on
   purpose? Shouldn't the frontend drivers, xen/platform-pci.c etc. be
   configurable without having to enable CONFIG_PARAVIRT?
  
  I suppose that would be possible but I don't think it's a useful
  configuration because you would lose PV spinlocks for example.
 
 IIRC the reason this hasn't been implemented until now is that
 refactoring would be required to the various bits of driver code which
 assumes PAE + PARAVIRT when they aren't strictly needed, e.g. grant
 table code. Whether its worth the churn at this stage is debatable, but
 I think the (in)ability to use PV spinlocks is a red-herring.
 
 Adding PV drivers to an HVM guest is a useful thing to do, even without
 PV spinlocks. PV IO gets you far more incremental benefit than the locks
 do, adding PV IO paths is the number 1 thing which should be done to any
 guest.
 
 One actual usecase is installing from a distro installer which isn't
 PAE, let alone PARAVIRT enabled[0], to get far enough that you can
 install a more capable PVHVM kernel with more bells and whistles.
 
 If there were distros around who refused wholesale to enable PARAVIRT
 even in a non-default kernel then it would be more likely that they
 could be convinced to enable a set of PV IO drivers, since they have 0
 impact on a non-PARAVIRT system, and still give significant benefit to
 Xen users. I don't know of any of the major distros are refusing
 PARAVIRT in this way though.
 
 Ian.
 
 [0] The default i386 Debian installer falls into this camp, but you can
 use the special PV Xen variant to install as PVHVM too so it's not so
 critical.

And the Fedora 21 LiveISO (32-bit) does too.
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 3573 matches

Mail list logo