Re: [Xen-devel] IOMMU support on AMD Ryzen, simple patch needed

2017-05-15 Thread Bjoern
Ok, it's NOT working after all. Trying to install an HVM guest causes a
reboot... so I guess there is more work left there after all. Would have
been too easy I guess ;)



Am 15.05.2017 um 21:20 schrieb Bjoern:

Hi,

I just finished getting Qubes-OS working with Ryzen and IOMMU - at
least it looks that way to me, and it ran out of the box BIOS-wise.


All that was required was a small patch in
xen/arch/x86/oprofile/nmi_int.c - Ryzen family 17h is handled the same as
15h. Without that, "xl dmesg" under Ubuntu 17.04 (self-compiled 4.8.3)
would show that family 17h isn't supported; with the above fix
everything shows up fine.
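
For anyone curious, the change presumably boils down to adding a family 17h
case that falls through to the existing family 15h handling in the CPU
family switch of xen/arch/x86/oprofile/nmi_int.c. A rough, illustrative
sketch only - the exact surrounding code in your tree may differ:

    case 0x15:
    case 0x17:      /* Ryzen: treat family 17h the same as family 15h */
            /* existing family 15h setup (model/cpu_type) unchanged */
            break;

With that in place, "xl dmesg" no longer reports family 17h as unsupported.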


Xen 4.8.0 already has the IOMMU patch
(https://patchwork.kernel.org/patch/9145119/) which was required for
Qubes (Xen 4.6.5), so it just needed the above change and it's
apparently working... at least Qubes reports a working Xen - so it looks good.


This is an FYI mail - I don't want to push this fix into Xen myself, as
I have no idea whether I'm missing something else, but if someone else
wants to pick this up, by all means please do :)


Cheers,
Bjoern

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel





[Xen-devel] [linux-linus test] 109449: tolerable FAIL - PUSHED

2017-05-15 Thread osstest service owner
flight 109449 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/109449/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemut-winxpsp3 16 guest-stop  fail in 109428 pass in 109449
 test-amd64-amd64-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail in 109428 
pass in 109449
 test-amd64-amd64-xl-qemuu-winxpsp3 16 guest-stop fail in 109428 pass in 109449
 test-amd64-amd64-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail pass in 
109428
 test-amd64-i386-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail pass in 
109428
 test-armhf-armhf-xl-rtds 11 guest-startfail pass in 109428
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 18 guest-start.2  fail pass in 109428
 test-amd64-amd64-xl-qemut-winxpsp3 17 guest-start/win.repeat fail pass in 
109428

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds  9 debian-installfail REGR. vs. 59254

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumprun-amd64 16 rumprun-demo-xenstorels/xenstorels.repeat 
fail baseline untested
 test-armhf-armhf-libvirt-raw 12 saverestore-support-check fail baseline 
untested
 test-armhf-armhf-xl-vhd  10 guest-start fail baseline untested
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-check fail blocked in 59254
 test-armhf-armhf-libvirt13 saverestore-support-check fail blocked in 59254
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-start/win.repeat fail blocked in 
59254
 test-armhf-armhf-xl-vhd   5 xen-install   fail in 109428 baseline untested
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop   fail in 109428 like 59254
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stopfail in 109428 like 59254
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeat fail in 109428 like 59254
 test-armhf-armhf-xl-rtds12 migrate-support-check fail in 109428 never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-check fail in 109428 never pass
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail like 59254
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-arm64-arm64-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-rtds 12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-rtds 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 11 

Re: [Xen-devel] Is there any limitation on the firmware size in Xen?

2017-05-15 Thread Gary Lin
On Fri, May 12, 2017 at 10:31:28PM +1000, Jan Beulich wrote:
> >>> On 12.05.17 at 12:06,  wrote:
> > On Fri, May 12, 2017 at 07:32:49PM +1000, Jan Beulich wrote:
> >> >>> On 11.05.17 at 11:02,  wrote:
> >> > On Thu, May 11, 2017 at 06:14:42PM +1000, Jan Beulich wrote:
> >> >> Note that hvmloader's main() has
> >> >> 
> >> >> BUG_ON(hvm_start_info->magic != XEN_HVM_START_MAGIC_VALUE);
> >> >> 
> >> >> very early, so you having got past this means the corruption
> >> >> occurred inside hvmloader (or at least while it was already
> >> >> running). Could you comment out the call to perform_tests()
> >> >> and try again?
> >> >> 
> >> > You got it. After commenting out perform_tests(), the grub2 menu showed
> >> > and the system booted.
> >> > 
> >> > It seems that perform_tests() cleared 0x400000~0x800000, and that's why
> >> > the members of hvm_start_info became 0 in my test.
> >> 
> >> So could you give the below/attached patch a try?
> >> 
> > It won't compile.
> > 
> > tests.c: In function 'perform_tests':
> > tests.c:248:50: error: 'i' may be used uninitialized in this function 
> > [-Werror=maybe-uninitialized]
> >  if ( TEST_MEM_BASE < (uintptr_t)(modlist + i) &&
> >   ^
> > cc1: all warnings being treated as errors
> 
> Oops - quite obviously. No idea why neither of the two gcc versions
> I've built this with caught the issue. Below/attached a better one (also
> with a few other changes).
> 
This patch works for me :)

Thanks,

Gary Lin

> Jan
> 
> --- a/tools/firmware/hvmloader/tests.c
> +++ b/tools/firmware/hvmloader/tests.c
> @@ -19,7 +19,9 @@
>   * this program; If not, see .
>   */
>  
> +#include "config.h"
>  #include "util.h"
> +#include 
>  
>  #define TEST_FAIL 0
>  #define TEST_PASS 1
> @@ -28,11 +30,13 @@
>  /*
>   * Memory layout during tests:
>   *  4MB to 8MB is cleared.
> - *  Page directory resides at 8MB.
> - *  4 page table pages reside at 8MB+4kB to 8MB+20kB.
> - *  Pagetables identity-map 0-16MB, except 4kB at va 6MB maps to pa 5MB.
> + *  Page directory resides at 4MB.
> + *  2 page table pages reside at 4MB+4kB to 4MB+12kB.
> + *  Pagetables identity-map 0-8MB, except 4kB at va 6MB maps to pa 5MB.
>   */
> -#define PD_START (8ul << 20)
> +#define TEST_MEM_BASE (4ul << 20)
> +#define TEST_MEM_SIZE (4ul << 20)
> +#define PD_START TEST_MEM_BASE
>  #define PT_START (PD_START + 4096)
>  
>  static void setup_paging(void)
> @@ -41,10 +45,10 @@ static void setup_paging(void)
>  uint32_t *pt = (uint32_t *)PT_START;
>  uint32_t i;
>  
> -/* Identity map 0-16MB. */
> -for ( i = 0; i < 4; i++ )
> +/* Identity map 0-8MB. */
> +for ( i = 0; i < 2; i++ )
>  pd[i] = (unsigned long)pt + (i<<12) + 3;
> -for ( i = 0; i < (4*1024); i++ )
> +for ( i = 0; i < 2 * 1024; i++ )
>  pt[i] = (i << 12) + 3;
>  
>  /* Page at virtual 6MB maps to physical 5MB. */
> @@ -112,7 +116,7 @@ static int rep_io_test(void)
>  stop_paging();
>  
>  i = 0;
> -for ( p = (uint32_t *)0x400000ul; p < (uint32_t *)0x700000ul; p++ )
> +for ( p = (uint32_t *)0x4ff000ul; p < (uint32_t *)0x602000ul; p++ )
>  {
>  uint32_t expected = 0;
>  if ( check[i].addr == (unsigned long)p )
> @@ -144,12 +148,12 @@ static int shadow_gs_test(void)
>  if ( !(edx & (1u<<29)) )
>  return TEST_SKIP;
>  
> -/* Long mode pagetable setup: Identity map 0-16MB with 2MB mappings. */
> +/* Long mode pagetable setup: Identity map 0-8MB with 2MB mappings. */
>  *pd = (unsigned long)pd + 0x1007; /* Level 4 */
>  pd += 512;
>  *pd = (unsigned long)pd + 0x1007; /* Level 3 */
>  pd += 512;
> -for ( i = 0; i < 8; i++ ) /* Level 2 */
> +for ( i = 0; i < 4; i++ ) /* Level 2 */
>  *pd++ = (i << 21) + 0x1e3;
>  
>  asm volatile (
> @@ -191,8 +195,7 @@ static int shadow_gs_test(void)
>  
>  void perform_tests(void)
>  {
> -int i, passed, skipped;
> -
> +unsigned int i, passed, skipped;
>  static struct {
>  int (* const test)(void);
>  const char *description;
> @@ -204,12 +207,80 @@ void perform_tests(void)
>  
>  printf("Testing HVM environment:\n");
>  
> -if ( hvm_info->low_mem_pgend < 0x1000 )
> +BUILD_BUG_ON(SCRATCH_PHYSICAL_ADDRESS > HVMLOADER_PHYSICAL_ADDRESS);
> +if ( hvm_info->low_mem_pgend <
> + ((TEST_MEM_BASE + TEST_MEM_SIZE) >> PAGE_SHIFT) )
> +{
> +printf("Skipping tests due to insufficient memory (<%luMB)\n",
> +   (TEST_MEM_BASE + TEST_MEM_SIZE) >> 20);
> +return;
> +}
> +
> +if ( (unsigned long)_end > TEST_MEM_BASE )
> +{
> +printf("Skipping tests due to overlap with base image\n");
> +return;
> +}
> +
> +if ( hvm_start_info->cmdline_paddr &&
> + hvm_start_info->cmdline_paddr < TEST_MEM_BASE + TEST_MEM_SIZE &&
> + 

Re: [Xen-devel] [BUG] EDAC infomation partially missing

2017-05-15 Thread Elliott Mitchell
On Mon, May 15, 2017 at 02:02:53AM -0600, Jan Beulich wrote:
> >>> On 14.05.17 at 00:36,  wrote:
> > I haven't yet done as much experimentation as Andreas Pflug has, but I
> > can confirm I'm also running into this bug with Xen 4.4.1.
> > 
> > I've only tried Linux kernel 3.16.43, but as Dom0:
> > 
> > EDAC MC: Ver: 3.0.0
> > AMD64 EDAC driver v3.4.0
> > EDAC amd64: DRAM ECC enabled.
> > EDAC amd64: NB MCE bank disabled, set MSR 0x017b[4] on node 0 to enable.
> > EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not 
> > load.
> > AMD64 EDAC driver v3.4.0
> > EDAC amd64: DRAM ECC enabled.
> > EDAC amd64: NB MCE bank disabled, set MSR 0x017b[4] on node 0 to enable.
> > EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not 
> > load.
> 
> Afaict the driver as is simply can't work in a Xen Dom0; it needs
> enabling (read: para-virtualizing). I'm actually glad to see it doesn't
> load (the worse alternative would be for it to load and then do the
> wrong thing or give you a false sense of safety of your data).

I'm unsure of how to evaluate the situation.  Since ECC is enabled in the
BIOS, data should be safe whether or not the EDAC driver loads.  I
/suspect/ the EDAC driver failing to load merely means reporting of ECC
errors won't happen.  I suspect the only paravirtualization needed is to
map the physical address of the soft/hard errors to whichever VM's memory
range was affected.  What this affects is which VM should panic in case
of hard errors.

Depending upon the environment there may or may not be cause to report
soft errors anywhere besides Dom0.  In most cases a soft error will at
worst trigger a desire to replace the memory module, but not trigger a
panic for the affected VM.  It is only once a hard error occurs that it
is urgent to warn the affected VM and cause a panic; in this case it
may also be desirable to first alert Dom0 anyway.

As such I'm inclined to think force-enabling ECC EDAC monitoring in Dom0
is the best approach for now.  As long as a hard error doesn't occur in
Dom0's address range, Dom0 is in the best position to deal with the
situation.  The worst case is a hard error occurring in Xen's address
range, since that will mean all VMs on the machine are likely to be
toast.

I think this should be a fairly high priority for Xen since ECC memory is
a feature very common on systems running with a hypervisor.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Questions about PVHv2/HVMlite

2017-05-15 Thread Boris Ostrovsky



On 05/15/2017 03:51 PM, Gary R Hook wrote:

So I've been slogging through online docs and the code, trying to
understand where things stand with PVH.

I think my primary questions are:
  1) How do I identify a PVHv2/HVMlite guest?


[root@dhcp-burlington7-2nd-B-east-10-152-55-52 ~]# dmesg | grep PVH
[0.00] Booting paravirtualized kernel on Xen PVH
[root@dhcp-burlington7-2nd-B-east-10-152-55-52 ~]#



  2) Or, perhaps more importantly, what distinguishes said guest?


Simplifying things a bit, it's an HVM guest that doesn't have a device
model (i.e. qemu) and which is booted directly (i.e. without hvmloader).



I've got Xen 4.9 unstable built/installed/booted, and am running 4.10
kernels on my
dom0 and guests.


domU PVH support was only added in the 4.11 kernel, so you don't have it.




I've gotten a guest booted, and a basic Ubuntu 14.04 installed from a
distro ISO onto a
raw disk (a logical volume). All good.

If I use the example file /etc/xen/example.hvm to define a simple guest
(but no VGA:
nographic=1), I see that I have a qemu instance running, which I expect,
along with some
threads:


This is exactly the thing that PVH guests won't have. You are likely 
booting a regular HVM guest.


A PVH guest's config looks something like

kernel="/root/64/vmlinux"
builder="hvm"
device_model_version="none"
extra="root=/dev/xvda1 console=hvc0"
memory=8192
vcpus=2
name = "pvh"
disk=['/root/virt/f22.img,raw,xvda,rw']

(note device_model_version)

-boris



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 08/18] xen/pvcalls: implement connect command

2017-05-15 Thread Boris Ostrovsky



On 05/15/2017 04:36 PM, Stefano Stabellini wrote:

Allocate a socket. Keep track of socket <-> ring mappings with a new data
structure, called sock_mapping. Implement the connect command by calling
inet_stream_connect, and mapping the new indexes page and data ring.
Associate the socket to an ioworker randomly.

When an active socket is closed (sk_state_change), set in_error to
-ENOTCONN and notify the other end, as specified by the protocol.

sk_data_ready will be implemented later.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 145 +
 1 file changed, 145 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 2eae096..9ac1cf2 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -63,6 +63,29 @@ struct pvcalls_back_priv {
struct work_struct register_work;
 };

+struct sock_mapping {
+   struct list_head list;
+   struct list_head queue;


Since you have two lists it would be helpful if the names were a bit more
descriptive.


(and comments for at least some fields would be welcome too)
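
Purely as an illustration of the kind of field comments being asked for -
the interpretation below is an assumption based on how later patches in the
series use these fields, not something stated in this patch:

    struct sock_mapping {
            struct list_head list;   /* entry in the frontend's active socket list */
            struct list_head queue;  /* entry in the chosen ioworker's wqs list */
            ...
    };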


+   struct pvcalls_back_priv *priv;
+   struct socket *sock;
+   int data_worker;
+   uint64_t id;
+   grant_ref_t ref;
+   struct pvcalls_data_intf *ring;
+   void *bytes;
+   struct pvcalls_data data;
+   uint32_t ring_order;
+   int irq;
+   atomic_t read;
+   atomic_t write;
+   atomic_t release;
+   void (*saved_data_ready)(struct sock *sk);
+};
+
+static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map);
+static int pvcalls_back_release_active(struct xenbus_device *dev,
+  struct pvcalls_back_priv *priv,
+  struct sock_mapping *map);
 static void pvcalls_back_ioworker(struct work_struct *work)
 {
 }
@@ -97,9 +120,126 @@ static int pvcalls_back_socket(struct xenbus_device *dev,
return 1;
 }

+static void pvcalls_sk_state_change(struct sock *sock)
+{
+   struct sock_mapping *map = sock->sk_user_data;
+   struct pvcalls_data_intf *intf;
+
+   if (map == NULL)
+   return;
+
+   intf = map->ring;
+   intf->in_error = -ENOTCONN;
+   notify_remote_via_irq(map->irq);
+}
+
+static void pvcalls_sk_data_ready(struct sock *sock)
+{
+}
+
 static int pvcalls_back_connect(struct xenbus_device *dev,
struct xen_pvcalls_request *req)
 {
+   struct pvcalls_back_priv *priv;
+   int ret;
+   struct socket *sock;
+   struct sock_mapping *map = NULL;
+   void *page;
+   struct xen_pvcalls_response *rsp;
+
+   if (dev == NULL)
+   return 0;
+   priv = dev_get_drvdata(&dev->dev);
+
+   map = kzalloc(sizeof(*map), GFP_KERNEL);
+   if (map == NULL) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   ret = sock_create(AF_INET, SOCK_STREAM, 0, &sock);
+   if (ret < 0) {
+   kfree(map);
+   goto out;
+   }
+   INIT_LIST_HEAD(&map->queue);
+   map->data_worker = get_random_int() % pvcalls_back_global.nr_ioworkers;
+
+   map->priv = priv;
+   map->sock = sock;
+   map->id = req->u.connect.id;
+   map->ref = req->u.connect.ref;
+
+   ret = xenbus_map_ring_valloc(dev, &req->u.connect.ref, 1, &page);
+   if (ret < 0) {
+   sock_release(map->sock);
+   kfree(map);
+   goto out;
+   }
+   map->ring = page;
+   map->ring_order = map->ring->ring_order;
+   /* first read the order, then map the data ring */
+   virt_rmb();



Not sure I understand what the barrier is for here. I don't think the
compiler will reorder the ring_order access with the call.




+   if (map->ring_order > MAX_RING_ORDER) {
+   ret = -EFAULT;
+   goto out;
+   }


If the barrier is indeed needed this check belongs before it.

-boris
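
For what it's worth, a sketch of the ordering being suggested - validate
ring_order first, and keep the barrier (if it is needed at all) ahead of the
accesses to the rest of the shared page. Illustrative only, not a drop-in
replacement:

    map->ring = page;
    map->ring_order = map->ring->ring_order;
    if (map->ring_order > MAX_RING_ORDER) {
            ret = -EFAULT;
            goto out;
    }
    /* read the order before dereferencing anything else on the page */
    virt_rmb();
    ret = xenbus_map_ring_valloc(dev, map->ring->ref,
                                 (1 << map->ring_order), &page);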



+   ret = xenbus_map_ring_valloc(dev, map->ring->ref,
+(1 << map->ring_order), &page);
+   if (ret < 0) {
+   sock_release(map->sock);
+   xenbus_unmap_ring_vfree(dev, map->ring);
+   kfree(map);
+   goto out;
+   }
+   map->bytes = page;
+
+   ret = bind_interdomain_evtchn_to_irqhandler(priv->dev->otherend_id,
+   req->u.connect.evtchn,
+   pvcalls_back_conn_event,
+   0,
+   "pvcalls-backend",
+   map);
+   if (ret < 0) {
+   sock_release(map->sock);
+   kfree(map);
+   goto out;
+   }
+   map->irq = ret;
+
+   map->data.in = map->bytes;
+   

Re: [Xen-devel] [PATCH 07/18] xen/pvcalls: implement socket command

2017-05-15 Thread Boris Ostrovsky



On 05/15/2017 04:35 PM, Stefano Stabellini wrote:

Just reply with success to the other end for now. Delay the allocation
of the actual socket to bind and/or connect.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 2b2a49a..2eae096 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -12,12 +12,17 @@
  * GNU General Public License for more details.
  */

+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 

 #include 
 #include 
@@ -65,7 +70,31 @@ static void pvcalls_back_ioworker(struct work_struct *work)
 static int pvcalls_back_socket(struct xenbus_device *dev,
struct xen_pvcalls_request *req)
 {
-   return 0;
+   struct pvcalls_back_priv *priv;
+   int ret;
+   struct xen_pvcalls_response *rsp;
+
+   if (dev == NULL)
+   return 0;
+   priv = dev_get_drvdata(&dev->dev);


This is inconsistent with pvcalls_back_event() tests, where you check 
both for NULL. OTOH, I am not sure a check is needed at all since you've 
just tested these in pvcalls_back_event().



-boris


+
+   if (req->u.socket.domain != AF_INET ||
+   req->u.socket.type != SOCK_STREAM ||
+   (req->u.socket.protocol != 0 &&
+req->u.socket.protocol != AF_INET))
+   ret = -EAFNOSUPPORT;
+   else
+   ret = 0;
+
+   /* leave the actual socket allocation for later */
+
+   rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
+   rsp->req_id = req->req_id;
+   rsp->cmd = req->cmd;
+   rsp->u.socket.id = req->u.socket.id;
+   rsp->ret = ret;
+
+   return 1;
 }

 static int pvcalls_back_connect(struct xenbus_device *dev,



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 06/18] xen/pvcalls: handle commands from the frontend

2017-05-15 Thread Boris Ostrovsky



On 05/15/2017 04:35 PM, Stefano Stabellini wrote:

When the other end notifies us that there are commands to be read
(pvcalls_back_event), wake up the backend thread to parse the command.

The command ring works like most other Xen rings, so use the usual
ring macros to read and write to it. The functions implementing the
commands are empty stubs for now.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 115 +
 1 file changed, 115 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 876e577..2b2a49a 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -62,12 +62,127 @@ static void pvcalls_back_ioworker(struct work_struct *work)
 {
 }

+static int pvcalls_back_socket(struct xenbus_device *dev,
+   struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_connect(struct xenbus_device *dev,
+   struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_release(struct xenbus_device *dev,
+   struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_bind(struct xenbus_device *dev,
+struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_listen(struct xenbus_device *dev,
+  struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_accept(struct xenbus_device *dev,
+  struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_poll(struct xenbus_device *dev,
+struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_handle_cmd(struct xenbus_device *dev,
+  struct xen_pvcalls_request *req)
+{
+   int ret = 0;
+
+   switch (req->cmd) {
+   case PVCALLS_SOCKET:
+   ret = pvcalls_back_socket(dev, req);
+   break;
+   case PVCALLS_CONNECT:
+   ret = pvcalls_back_connect(dev, req);
+   break;
+   case PVCALLS_RELEASE:
+   ret = pvcalls_back_release(dev, req);
+   break;
+   case PVCALLS_BIND:
+   ret = pvcalls_back_bind(dev, req);
+   break;
+   case PVCALLS_LISTEN:
+   ret = pvcalls_back_listen(dev, req);
+   break;
+   case PVCALLS_ACCEPT:
+   ret = pvcalls_back_accept(dev, req);
+   break;
+   case PVCALLS_POLL:
+   ret = pvcalls_back_poll(dev, req);
+   break;
+   default:
+   ret = -ENOTSUPP;
+   break;
+   }
+   return ret;
+}
+
 static void pvcalls_back_work(struct work_struct *work)
 {
+   struct pvcalls_back_priv *priv = container_of(work,
+   struct pvcalls_back_priv, register_work);
+   int notify, notify_all = 0, more = 1;
+   struct xen_pvcalls_request req;
+   struct xenbus_device *dev = priv->dev;
+
+   atomic_set(&priv->work, 1);
+
+   while (more || !atomic_dec_and_test(&priv->work)) {
+   while (RING_HAS_UNCONSUMED_REQUESTS(&priv->ring)) {
+   RING_COPY_REQUEST(&priv->ring,
+ priv->ring.req_cons++,
+ &req);
+
+   if (pvcalls_back_handle_cmd(dev, &req) > 0) {


Can you make the handlers use "traditional" returns, i.e. <0 on error and 0
on success? Or do you really need to distinguish 0 from >0?



+   RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(
+   &priv->ring, notify);
+   notify_all += notify;
+   }
+   }
+
+   if (notify_all)
+   notify_remote_via_irq(priv->irq);
+
+   RING_FINAL_CHECK_FOR_REQUESTS(&priv->ring, more);
+   }
 }

 static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
 {
+   struct xenbus_device *dev = dev_id;
+   struct pvcalls_back_priv *priv = NULL;
+
+   if (dev == NULL)
+   return IRQ_HANDLED;
+
+   priv = dev_get_drvdata(&dev->dev);
+   if (priv == NULL)
+   return IRQ_HANDLED;


These two aren't errors?


+
+   atomic_inc(&priv->work);


Is this really needed? We have a new entry on the ring, so the outer 
loop in pvcalls_back_work() will pick this up (by setting 'more').



-boris


+   queue_work(priv->wq, &priv->register_work);
+
return IRQ_HANDLED;
 }




___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 05/18] xen/pvcalls: connect to a frontend

2017-05-15 Thread Boris Ostrovsky



On 05/15/2017 04:35 PM, Stefano Stabellini wrote:

Introduce a per-frontend data structure named pvcalls_back_priv. It
contains pointers to the command ring, its event channel, a list of
active sockets and a tree of passive sockets (passing sockets need to be
looked up from the id on listen, accept and poll commands, while active
sockets only on release).


It would be useful to put this into a comment in the pvcalls_back_priv
definition.
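
For example, the description above could sit next to the definition,
roughly like this (sketch only):

    /*
     * One instance per connected frontend:
     *  - socket_mappings: list of active sockets, looked up only on release
     *  - socketpass_mappings: tree of passive sockets, looked up by id on
     *    listen, accept and poll
     *  - ring/irq: command ring shared with the frontend plus its event channel
     */
    struct pvcalls_back_priv {
            ...
    };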




It also has an unbound workqueue to schedule the work of parsing and
executing commands on the command ring. pvcallss_lock protects the two
lists. In pvcalls_back_global, keep a list of connected frontends.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 87 ++
 1 file changed, 87 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 86eca19..876e577 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -44,13 +44,100 @@ struct pvcalls_back_global {
struct rw_semaphore privs_lock;
 } pvcalls_back_global;

+struct pvcalls_back_priv {
+   struct list_head list;
+   struct xenbus_device *dev;
+   struct xen_pvcalls_sring *sring;
+   struct xen_pvcalls_back_ring ring;
+   int irq;
+   struct list_head socket_mappings;
+   struct radix_tree_root socketpass_mappings;
+   struct rw_semaphore pvcallss_lock;


Same question as before regarding using rw semaphore --- I only see 
down/up_writes.


And what does the name (pvcallss) stand for?
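
If the lock really is only ever taken for writing, a plain mutex would say
that more directly - a minimal sketch, with a hypothetical name for the lock:

    struct mutex socket_lock;   /* protects socket_mappings/socketpass_mappings */
    ...
    mutex_init(&priv->socket_lock);
    ...
    mutex_lock(&priv->socket_lock);
    list_add_tail(&map->list, &priv->socket_mappings);
    mutex_unlock(&priv->socket_lock);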



+   atomic_t work;
+   struct workqueue_struct *wq;
+   struct work_struct register_work;
+};
+
 static void pvcalls_back_ioworker(struct work_struct *work)
 {
 }

+static void pvcalls_back_work(struct work_struct *work)
+{
+}
+
+static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
+{
+   return IRQ_HANDLED;
+}
+
 static int backend_connect(struct xenbus_device *dev)
 {
+   int err, evtchn;
+   grant_ref_t ring_ref;
+   void *addr = NULL;
+   struct pvcalls_back_priv *priv = NULL;
+
+   priv = kzalloc(sizeof(struct pvcalls_back_priv), GFP_KERNEL);
+   if (!priv)
+   return -ENOMEM;
+
+   err = xenbus_scanf(XBT_NIL, dev->otherend, "port", "%u",
+  &evtchn);
+   if (err != 1) {
+   err = -EINVAL;
+   xenbus_dev_fatal(dev, err, "reading %s/event-channel",
+dev->otherend);
+   goto error;
+   }
+
+   err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-ref", "%u", &ring_ref);
+   if (err != 1) {
+   err = -EINVAL;
+   xenbus_dev_fatal(dev, err, "reading %s/ring-ref",
+dev->otherend);
+   goto error;
+   }
+
+   err = xenbus_map_ring_valloc(dev, &ring_ref, 1, &addr);
+   if (err < 0)
+   goto error;



I'd move this closer to first use, below.

-boris


+
+   err = bind_interdomain_evtchn_to_irqhandler(dev->otherend_id, evtchn,
+   pvcalls_back_event, 0,
+   "pvcalls-backend", dev);
+   if (err < 0)
+   goto error;
+
+   priv->wq = alloc_workqueue("pvcalls_back_wq", WQ_UNBOUND, 1);
+   if (!priv->wq) {
+   err = -ENOMEM;
+   goto error;
+   }
+   INIT_WORK(&priv->register_work, pvcalls_back_work);
+   priv->dev = dev;
+   priv->sring = addr;
+   BACK_RING_INIT(&priv->ring, priv->sring, XEN_PAGE_SIZE * 1);
+   priv->irq = err;
+   INIT_LIST_HEAD(&priv->socket_mappings);
+   INIT_RADIX_TREE(&priv->socketpass_mappings, GFP_KERNEL);
+   init_rwsem(&priv->pvcallss_lock);
+   dev_set_drvdata(&dev->dev, priv);
+   down_write(&pvcalls_back_global.privs_lock);
+   list_add_tail(&priv->list, &pvcalls_back_global.privs);
+   up_write(&pvcalls_back_global.privs_lock);
+   queue_work(priv->wq, &priv->register_work);
+
return 0;
+
+ error:
+   if (addr != NULL)
+   xenbus_unmap_ring_vfree(dev, addr);
+   if (priv->wq)
+   destroy_workqueue(priv->wq);
+   unbind_from_irqhandler(priv->irq, dev);
+   kfree(priv);
+   return err;
 }

 static int backend_disconnect(struct xenbus_device *dev)



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 04/18] xen/pvcalls: xenbus state handling

2017-05-15 Thread Boris Ostrovsky



On 05/15/2017 04:35 PM, Stefano Stabellini wrote:

Introduce the code to handle xenbus state changes.

Implement the probe function for the pvcalls backend. Write the
supported versions, max-page-order and function-calls nodes to xenstore,
as required by the protocol.

Introduce stub functions for disconnecting/connecting to a frontend.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 133 +
 1 file changed, 133 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 46a889a..86eca19 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -25,6 +25,9 @@
 #include 
 #include 

+#define PVCALLS_VERSIONS "1"
+#define MAX_RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
+
 struct pvcalls_ioworker {
struct work_struct register_work;
atomic_t io;
@@ -45,15 +48,145 @@ static void pvcalls_back_ioworker(struct work_struct *work)
 {
 }

+static int backend_connect(struct xenbus_device *dev)
+{
+   return 0;
+}
+
+static int backend_disconnect(struct xenbus_device *dev)
+{
+   return 0;
+}
+
 static int pvcalls_back_probe(struct xenbus_device *dev,
  const struct xenbus_device_id *id)
 {
+   int err;
+
+   err = xenbus_printf(XBT_NIL, dev->nodename, "versions", "%s",
+   PVCALLS_VERSIONS);
+   if (err) {
+   pr_warn("%s write out 'version' failed\n", __func__);
+   return -EINVAL;


Why not return err? (below too)



+   }
+
+   err = xenbus_printf(XBT_NIL, dev->nodename, "max-page-order", "%u",
+   MAX_RING_ORDER);
+   if (err) {
+   pr_warn("%s write out 'max-page-order' failed\n", __func__);
+   return -EINVAL;
+   }
+
+   /* "1" means socket, connect, release, bind, listen, accept and poll*/
+   err = xenbus_printf(XBT_NIL, dev->nodename, "function-calls", "1");



Should "1" be defined in the (public) header file?
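
i.e. something along these lines, with the constant living in the public
protocol header (the macro name here is made up for illustration):

    /* in the pvcalls protocol header */
    #define PVCALLS_FUNCTIONS_SUPPORTED "1"  /* socket, connect, release,
                                                bind, listen, accept, poll */

    err = xenbus_printf(XBT_NIL, dev->nodename, "function-calls",
                        PVCALLS_FUNCTIONS_SUPPORTED);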



+   if (err) {
+   pr_warn("%s write out 'function-calls' failed\n", __func__);
+   return -EINVAL;
+   }
+
+   err = xenbus_switch_state(dev, XenbusStateInitWait);
+   if (err)
+   return err;
+
return 0;
 }

+static void set_backend_state(struct xenbus_device *dev,
+ enum xenbus_state state)
+{
+   while (dev->state != state) {
+   switch (dev->state) {
+   case XenbusStateClosed:
+   switch (state) {
+   case XenbusStateInitWait:
+   case XenbusStateConnected:
+   xenbus_switch_state(dev, XenbusStateInitWait);
+   break;
+   case XenbusStateClosing:
+   xenbus_switch_state(dev, XenbusStateClosing);
+   break;
+   default:
+   __WARN();
+   }
+   break;
+   case XenbusStateInitWait:
+   case XenbusStateInitialised:
+   switch (state) {
+   case XenbusStateConnected:
+   backend_connect(dev);
+   xenbus_switch_state(dev, XenbusStateConnected);
+   break;
+   case XenbusStateClosing:
+   case XenbusStateClosed:
+   xenbus_switch_state(dev, XenbusStateClosing);
+   break;
+   default:
+   __WARN();
+   }
+   break;
+   case XenbusStateConnected:
+   switch (state) {
+   case XenbusStateInitWait:
+   case XenbusStateClosing:
+   case XenbusStateClosed:
+   down_write(&pvcalls_back_global.privs_lock);
+   backend_disconnect(dev);
+   up_write(&pvcalls_back_global.privs_lock);



Unless you plan to have more stuff under the semaphore, I'd consider 
putting them in backend_disconnect().
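
i.e. roughly (sketch only):

    static int backend_disconnect(struct xenbus_device *dev)
    {
            down_write(&pvcalls_back_global.privs_lock);
            /* ... tear down this frontend's state ... */
            up_write(&pvcalls_back_global.privs_lock);
            return 0;
    }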




+   xenbus_switch_state(dev, XenbusStateClosing);
+   break;
+   default:
+   __WARN();
+   }
+   break;
+   case XenbusStateClosing:
+   switch (state) {
+   case XenbusStateInitWait:
+   case XenbusStateConnected:
+   case XenbusStateClosed:
+   xenbus_switch_state(dev, XenbusStateClosed);
+   

Re: [Xen-devel] [PATCH 03/18] xen/pvcalls: initialize the module and register the xenbus backend

2017-05-15 Thread Boris Ostrovsky



On 05/15/2017 04:35 PM, Stefano Stabellini wrote:

The pvcalls backend has one ioworker per cpu: the ioworkers are
implemented as a cpu bound workqueue, and will deal with the actual
socket and data ring reads/writes.

ioworkers are global: we only have one set for all the frontends. They
process requests on their wqs list in order, once they are done with a
request, they'll remove it from the list. A spinlock is used for
protecting the list. Each ioworker is bound to a different cpu to
maximize throughput.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 64 ++
 1 file changed, 64 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 2dbf7d8..46a889a 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -25,6 +25,26 @@
 #include 
 #include 

+struct pvcalls_ioworker {
+   struct work_struct register_work;
+   atomic_t io;
+   struct list_head wqs;
+   spinlock_t lock;
+   int num;
+};
+
+struct pvcalls_back_global {
+   struct pvcalls_ioworker *ioworkers;
+   int nr_ioworkers;
+   struct workqueue_struct *wq;
+   struct list_head privs;
+   struct rw_semaphore privs_lock;


Is there a reason why these are called "privs"?

And why are you using a rw semaphore --- I only noticed two instances of 
use and both are writes.




+} pvcalls_back_global;
+
+static void pvcalls_back_ioworker(struct work_struct *work)
+{
+}
+
 static int pvcalls_back_probe(struct xenbus_device *dev,
  const struct xenbus_device_id *id)
 {
@@ -59,3 +79,47 @@ static int pvcalls_back_uevent(struct xenbus_device *xdev,
.uevent = pvcalls_back_uevent,
.otherend_changed = pvcalls_back_changed,
 };
+
+static int __init pvcalls_back_init(void)
+{
+   int ret, i, cpu;
+
+   if (!xen_domain())
+   return -ENODEV;
+
+   ret = xenbus_register_backend(&pvcalls_back_driver);
+   if (ret < 0)
+   return ret;
+
+   init_rwsem(&pvcalls_back_global.privs_lock);
+   INIT_LIST_HEAD(&pvcalls_back_global.privs);
+   pvcalls_back_global.wq = alloc_workqueue("pvcalls_io", 0, 0);
+   if (!pvcalls_back_global.wq)
+   goto error;
+   pvcalls_back_global.nr_ioworkers = num_online_cpus();



Should nr_ioworkers be updated on CPU hot(un)plug?
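
One way to handle that would be the CPU hotplug state machine - a very
rough sketch with hypothetical callback names, ignoring the question of what
to do with work already queued on a departing ioworker:

    /* needs linux/cpuhotplug.h */
    static int pvcalls_back_cpu_online(unsigned int cpu)
    {
            /* allocate and initialise an ioworker for 'cpu' */
            return 0;
    }

    static int pvcalls_back_cpu_offline(unsigned int cpu)
    {
            /* drain and free the ioworker bound to 'cpu' */
            return 0;
    }

    /* in pvcalls_back_init(): */
    ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "xen/pvcalls-back:online",
                            pvcalls_back_cpu_online, pvcalls_back_cpu_offline);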



+   pvcalls_back_global.ioworkers = kzalloc(
+   sizeof(*pvcalls_back_global.ioworkers) *
+   pvcalls_back_global.nr_ioworkers, GFP_KERNEL);
+   if (!pvcalls_back_global.ioworkers)
+   goto error;
+   i = 0;
+   for_each_online_cpu(cpu) {
+   pvcalls_back_global.ioworkers[i].num = i;
+   atomic_set(&pvcalls_back_global.ioworkers[i].io, 1);
+   spin_lock_init(&pvcalls_back_global.ioworkers[i].lock);
+   INIT_LIST_HEAD(&pvcalls_back_global.ioworkers[i].wqs);
+   INIT_WORK(&pvcalls_back_global.ioworkers[i].register_work,
+   pvcalls_back_ioworker);
+   i++;
+   }
+   return 0;
+
+error:
+   if (pvcalls_back_global.wq)
+   destroy_workqueue(pvcalls_back_global.wq);
+   xenbus_unregister_driver(&pvcalls_back_driver);
+   kfree(pvcalls_back_global.ioworkers);
+   memset(&pvcalls_back_global, 0, sizeof(pvcalls_back_global));
+   return -ENOMEM;


This routine could use more newlines. (and in other patches too)

-boris


+}
+module_init(pvcalls_back_init);



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [linux-next test] 109448: regressions - FAIL

2017-05-15 Thread osstest service owner
flight 109448 linux-next real [real]
http://logs.test-lab.xenproject.org/osstest/logs/109448/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-debianhvm-amd64 9 debian-hvm-install fail REGR. vs. 
109428
 test-amd64-i386-xl-qemuu-winxpsp3 16 guest-stop  fail REGR. vs. 109428
 test-amd64-i386-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail REGR. vs. 
109428
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 12 guest-saverestore fail REGR. vs. 
109428

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-rumprun-i386 16 rumprun-demo-xenstorels/xenstorels.repeat fail 
REGR. vs. 109428
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stopfail REGR. vs. 109428

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stopfail like 109404
 test-armhf-armhf-libvirt 13 saverestore-support-checkfail  like 109428
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail  like 109428
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail like 109428
 test-amd64-i386-xl-qemut-winxpsp3 16 guest-stop   fail like 109428
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail  like 109428
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-credit2  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-arm64-arm64-xl  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-arm64-arm64-xl-xsm  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 13 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-rtds 12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-rtds 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 12 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass

version targeted for testing:
 linux  ecf5e3d45a01969de14e7feb1126f948fc2a2635
baseline version:
 linux  2ea659a9ef488125eb46da6eb571de5eae5c43f6

Last test of basis  

Re: [Xen-devel] null domains after xl destroy

2017-05-15 Thread Steven Haigh

On 2017-05-16 10:49, Glenn Enright wrote:

On 15/05/17 21:57, Juergen Gross wrote:

On 13/05/17 06:02, Glenn Enright wrote:

On 09/05/17 21:24, Roger Pau Monné wrote:

On Mon, May 08, 2017 at 11:10:24AM +0200, Juergen Gross wrote:

On 04/05/17 00:17, Glenn Enright wrote:

On 04/05/17 04:58, Steven Haigh wrote:

On 04/05/17 01:53, Juergen Gross wrote:

On 03/05/17 12:45, Steven Haigh wrote:

Just wanted to give this a little nudge now people seem to be
back on
deck...


Glenn, could you please give the attached patch a try?

It should be applied on top of the other correction, the old 
debug

patch should not be applied.

I have added some debug output to make sure we see what is 
happening.


This patch is included in kernel-xen-4.9.26-1

It should be in the repos now.



Still seeing the same issue. Without the extra debug patch all I 
see in

the logs after destroy is this...

xen-blkback: xen_blkif_disconnect: busy
xen-blkback: xen_blkif_free: delayed = 0


Hmm, to me it seems as if some grant isn't being unmapped.

Looking at gnttab_unmap_refs_async() I wonder how this is supposed 
to

work:

I don't see how a grant would ever be unmapped in case of
page_count(item->pages[pc]) > 1 in __gnttab_unmap_refs_async(). All 
it
does is deferring the call to the unmap operation again and again. 
Or

am I missing something here?


No, I don't think you are missing anything, but I cannot see how 
this

can be
solved in a better way, unmapping a page that's still referenced is
certainly
not the best option, or else we risk triggering a page-fault 
elsewhere.


IMHO, gnttab_unmap_refs_async should have a timeout, and return an
error at
some point. Also, I'm wondering whether there's a way to keep track 
of

who has
references on a specific page, but so far I haven't been able to
figure out how
to get this information from Linux.

Also, I've noticed that __gnttab_unmap_refs_async uses page_count,
shouldn't it
use page_ref_count instead?

Roger.



In case it helps, I have continued to work on this. I noticed processes
left behind (under 4.9.27). The same issue is ongoing.

# ps auxf | grep [x]vda
root  2983  0.0  0.0  0 0 ?S01:44   0:00  \_
[1.xvda1-1]
root  5457  0.0  0.0  0 0 ?S02:06   0:00  \_
[3.xvda1-1]
root  7382  0.0  0.0  0 0 ?S02:36   0:00  \_
[4.xvda1-1]
root  9668  0.0  0.0  0 0 ?S02:51   0:00  \_
[6.xvda1-1]
root 11080  0.0  0.0  0 0 ?S02:57   0:00  \_
[7.xvda1-1]

# xl list
Name  ID   Mem VCPUs  State   Time(s)
Domain-0  0  1512 2 r- 118.5
(null)1 8 4 --p--d  43.8
(null)3 8 4 --p--d   6.3
(null)4 8 4 --p--d  73.4
(null)6 8 4 --p--d  14.7
(null)7 8 4 --p--d  30

Those all have...

[root 11080]# cat wchan
xen_blkif_schedule

[root 11080]# cat stack
[] xen_blkif_schedule+0x418/0xb40
[] kthread+0xe5/0x100
[] ret_from_fork+0x25/0x30
[] 0x


And found another reference count bug. Would you like to give the
attached patch (to be applied additionally to the previous ones) a 
try?



Juergen



This seems to have solved the issue in 4.9.28, with all three patches
applied. Awesome!

On my main test machine I can no longer replicate what I was
originally seeing, and in dmesg I now see this flow...

xen-blkback: xen_blkif_disconnect: busy
xen-blkback: xen_blkif_free: delayed = 1
xen-blkback: xen_blkif_free: delayed = 0

xl list is clean, xenstore looks right. No extraneous processes left 
over.


Thank you, Juergen, so much. I really appreciate your persistence with
this. If there is anything I can do to help push this upstream, please let
me know. Feel free to add a Reported-by line with my name if you think it
appropriate.


This is good news.

Juergen, can I request a full patch set posted to the list (please CC me)?
I'll ensure we can build the kernel with all 3 (?) patches applied
and test it properly.


I'll build up a complete kernel with those patches and give a tested-by 
if all goes well.


--
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] null domains after xl destroy

2017-05-15 Thread Glenn Enright

On 15/05/17 21:57, Juergen Gross wrote:

On 13/05/17 06:02, Glenn Enright wrote:

On 09/05/17 21:24, Roger Pau Monné wrote:

On Mon, May 08, 2017 at 11:10:24AM +0200, Juergen Gross wrote:

On 04/05/17 00:17, Glenn Enright wrote:

On 04/05/17 04:58, Steven Haigh wrote:

On 04/05/17 01:53, Juergen Gross wrote:

On 03/05/17 12:45, Steven Haigh wrote:

Just wanted to give this a little nudge now people seem to be
back on
deck...


Glenn, could you please give the attached patch a try?

It should be applied on top of the other correction, the old debug
patch should not be applied.

I have added some debug output to make sure we see what is happening.


This patch is included in kernel-xen-4.9.26-1

It should be in the repos now.



Still seeing the same issue. Without the extra debug patch all I see in
the logs after destroy is this...

xen-blkback: xen_blkif_disconnect: busy
xen-blkback: xen_blkif_free: delayed = 0


Hmm, to me it seems as if some grant isn't being unmapped.

Looking at gnttab_unmap_refs_async() I wonder how this is supposed to
work:

I don't see how a grant would ever be unmapped in case of
page_count(item->pages[pc]) > 1 in __gnttab_unmap_refs_async(). All it
does is deferring the call to the unmap operation again and again. Or
am I missing something here?


No, I don't think you are missing anything, but I cannot see how this
can be
solved in a better way, unmapping a page that's still referenced is
certainly
not the best option, or else we risk triggering a page-fault elsewhere.

IMHO, gnttab_unmap_refs_async should have a timeout, and return an
error at
some point. Also, I'm wondering whether there's a way to keep track of
who has
references on a specific page, but so far I haven't been able to
figure out how
to get this information from Linux.

Also, I've noticed that __gnttab_unmap_refs_async uses page_count,
shouldn't it
use page_ref_count instead?

Roger.



In case it helps, I have continued to work on this. I noticed processes
left behind (under 4.9.27). The same issue is ongoing.

# ps auxf | grep [x]vda
root  2983  0.0  0.0  0 0 ?S01:44   0:00  \_
[1.xvda1-1]
root  5457  0.0  0.0  0 0 ?S02:06   0:00  \_
[3.xvda1-1]
root  7382  0.0  0.0  0 0 ?S02:36   0:00  \_
[4.xvda1-1]
root  9668  0.0  0.0  0 0 ?S02:51   0:00  \_
[6.xvda1-1]
root 11080  0.0  0.0  0 0 ?S02:57   0:00  \_
[7.xvda1-1]

# xl list
Name  ID   Mem VCPUs  State   Time(s)
Domain-0  0  1512 2 r- 118.5
(null)1 8 4 --p--d  43.8
(null)3 8 4 --p--d   6.3
(null)4 8 4 --p--d  73.4
(null)6 8 4 --p--d  14.7
(null)7 8 4 --p--d  30

Those all have...

[root 11080]# cat wchan
xen_blkif_schedule

[root 11080]# cat stack
[] xen_blkif_schedule+0x418/0xb40
[] kthread+0xe5/0x100
[] ret_from_fork+0x25/0x30
[] 0x


And found another reference count bug. Would you like to give the
attached patch (to be applied additionally to the previous ones) a try?


Juergen



This seems to have solved the issue in 4.9.28, with all three patches 
applied. Awesome!


On my main test machine I can no longer replicate what I was originally 
seeing, and in dmesg I now see this flow...


xen-blkback: xen_blkif_disconnect: busy
xen-blkback: xen_blkif_free: delayed = 1
xen-blkback: xen_blkif_free: delayed = 0

xl list is clean, xenstore looks right. No extraneous processes left over.

Thank you, Juergen, so much. I really appreciate your persistence with this.
If there is anything I can do to help push this upstream, please let me know.
Feel free to add a Reported-by line with my name if you think it appropriate.


Regards, Glenn
http://rimuhosting.com

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Xen 4.9 rc4

2017-05-15 Thread Goel, Sameer

On 5/8/2017 12:41 PM, Julien Grall wrote:
> Hi all,
> 
> Xen 4.9 rc4 is tagged. You can check that out from xen.git:
> 
>  git://xenbits.xen.org/xen.git 4.9.0-rc4
> 

Tested the release on Qualcomm Datacenter Technologies QDF2400 platform (Linux 
kernel 4.11). Basic DOM0 boot works fine after
disabling pl011 serial and SMMUv3. This is as expected.

- Sameer

> For your convenience there is also a tarball at:
> https://downloads.xenproject.org/release/xen/4.9.0-rc4/xen-4.9.0-rc4.tar.gz
> 
> And the signature is at:
> https://downloads.xenproject.org/release/xen/4.9.0-rc4/xen-4.9.0-rc4.tar.gz.sig
> 
> Please send bug reports and test reports to
> xen-de...@lists.xenproject.org. When sending bug reports, please CC
> relevant maintainers and me (julien.gr...@arm.com).
> 
> As a reminder, there will be another Xen Test Day tomorrow (Tuesday 9th May),
> for the instructions see:
> 
> https://blog.xenproject.org/2017/04/13/announcing-xen-project-4-9-rc-and-test-day-schedule/
> 
> Cheers,
> 

-- 
 Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, 
Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH RFC 5/6] qemu-xen-trad: sasl: introduce SASL authentication and encryption layer

2017-05-15 Thread Simon Waterman
This change adds calls to the SASL API to negotiate SASL auth and
includes SASL encode/decode into read and write flows if the SASL
mechanism is providing SSF.

The code is taken from upstream with minor adjustments for
compatibility with qemu-xen-traditional.

Signed-off-by: Simon Waterman 
---
 vnc.c | 329 ++
 1 file changed, 292 insertions(+), 37 deletions(-)

diff --git a/vnc.c b/vnc.c
index 728efec..ff460b8 100644
--- a/vnc.c
+++ b/vnc.c
@@ -80,6 +80,58 @@ static DisplayChangeListener *dcl;
   (((x) + (1ULL << (vs)->dirty_pixel_shift) - 1) >> (vs)->dirty_pixel_shift)
 #define DP2X(vs, x) ((x) << (vs)->dirty_pixel_shift)
 
+#ifndef CONFIG_STUBDOM
+static char *addr_to_string(const char *format,
+struct sockaddr_storage *sa,
+socklen_t salen) {
+char *addr;
+char host[NI_MAXHOST];
+char serv[NI_MAXSERV];
+int err;
+size_t addrlen;
+
+if ((err = getnameinfo((struct sockaddr *)sa, salen,
+   host, sizeof(host),
+   serv, sizeof(serv),
+   NI_NUMERICHOST | NI_NUMERICSERV)) != 0) {
+VNC_DEBUG("Cannot resolve address %d: %s\n",
+  err, gai_strerror(err));
+return NULL;
+}
+
+/* Enough for the existing format + the 2 vars we're
+ * substituting in. */
+addrlen = strlen(format) + strlen(host) + strlen(serv);
+addr = malloc(addrlen + 1);
+snprintf(addr, addrlen, format, host, serv);
+addr[addrlen] = '\0';
+
+return addr;
+}
+
+char *vnc_socket_local_addr(const char *format, int fd) {
+struct sockaddr_storage sa;
+socklen_t salen;
+
+salen = sizeof(sa);
+if (getsockname(fd, (struct sockaddr*)&sa, &salen) < 0)
+return NULL;
+
+return addr_to_string(format, &sa, salen);
+}
+
+char *vnc_socket_remote_addr(const char *format, int fd) {
+struct sockaddr_storage sa;
+socklen_t salen;
+
+salen = sizeof(sa);
+if (getpeername(fd, (struct sockaddr*)&sa, &salen) < 0)
+return NULL;
+
+return addr_to_string(format, &sa, salen);
+}
+#endif /* !CONFIG_STUBDOM */
+
 void do_info_vnc(void)
 {
 if (vnc_state == NULL)
@@ -770,6 +822,9 @@ int vnc_client_io_error(VncState *vs, int ret, int 
last_errno)
}
vs->wiremode = VNC_WIREMODE_CLEAR;
 #endif /* CONFIG_VNC_TLS */
+#ifdef CONFIG_VNC_SASL
+vnc_sasl_client_cleanup(vs);
+#endif /* CONFIG_VNC_SASL */
return 0;
 }
 return ret;
@@ -780,65 +835,203 @@ void vnc_client_error(VncState *vs)
 vnc_client_io_error(vs, -1, EINVAL);
 }
 
-static void vnc_client_write(void *opaque)
+#ifdef CONFIG_VNC_TLS
+static long vnc_client_write_tls(gnutls_session_t *session,
+ const uint8_t *data,
+ size_t datalen)
+{
+long ret = gnutls_write(*session, data, datalen);
+if (ret < 0) { 
+if (ret == GNUTLS_E_AGAIN) {
+errno = EAGAIN;
+} else {
+errno = EIO;
+}
+ret = -1;
+}
+return ret;
+}
+#endif /* CONFIG_VNC_TLS */
+
+/*
+ * Called to write a chunk of data to the client socket. The data may
+ * be the raw data, or may have already been encoded by SASL.
+ * The data will be written either straight onto the socket, or
+ * written via the GNUTLS wrappers, if TLS/SSL encryption is enabled
+ *
+ * NB, it is theoretically possible to have 2 layers of encryption,
+ * both SASL, and this TLS layer. It is highly unlikely in practice
+ * though, since SASL encryption will typically be a no-op if TLS
+ * is active
+ *
+ * Returns the number of bytes written, which may be less than
+ * the requested 'datalen' if the socket would block. Returns
+ * -1 on error, and disconnects the client socket.
+ */
+long vnc_client_write_buf(VncState *vs, const uint8_t *data, size_t datalen)
 {
 long ret;
-VncState *vs = opaque;
-
 #ifdef CONFIG_VNC_TLS
 if (vs->tls_session) {
-   ret = gnutls_write(vs->tls_session, vs->output.buffer, 
vs->output.offset);
-   if (ret < 0) {
-   if (ret == GNUTLS_E_AGAIN)
-   errno = EAGAIN;
-   else
-   errno = EIO;
-   ret = -1;
-   }
-} else
+ret = vnc_client_write_tls(&vs->tls_session, data, datalen);
+} else {
+#endif /* CONFIG_VNC_TLS */
+ret = send(vs->csock, data, datalen, 0);
+#ifdef CONFIG_VNC_TLS
+}
 #endif /* CONFIG_VNC_TLS */
-   ret = send(vs->csock, vs->output.buffer, vs->output.offset, 0);
-ret = vnc_client_io_error(vs, ret, socket_error());
+VNC_DEBUG("Wrote wire %p %zd -> %ld\n", data, datalen, ret);
+return vnc_client_io_error(vs, ret, socket_error());
+}
+
+/*
+ * Called to write buffered data to the client socket, when not
+ * using any SASL SSF encryption layers. Will write as much data
+ * as possible without blocking. If all buffered data is written,
+ * 

[Xen-devel] [PATCH RFC 4/6] qemu-xen-trad: sasl: compatibility with vnc.h

2017-05-15 Thread Simon Waterman
This change adjusts vnc.c for compatibility with the API defined
in vnc.h.

Signed-off-by: Simon Waterman 
---
 vnc.c | 212 +-
 1 file changed, 27 insertions(+), 185 deletions(-)

diff --git a/vnc.c b/vnc.c
index 0e61197..728efec 100644
--- a/vnc.c
+++ b/vnc.c
@@ -24,6 +24,7 @@
  * THE SOFTWARE.
  */
 
+#include "vnc.h"
 #include "qemu-common.h"
 #include "console.h"
 #include "sysemu.h"
@@ -50,8 +51,6 @@
minimised vncviewer reasonably quickly. */
 #define VNC_MAX_UPDATE_INTERVAL   5000
 
-#include "vnc_keysym.h"
-#include "keymaps.c"
 #include "d3des.h"
 
 #ifdef CONFIG_VNC_TLS
@@ -59,21 +58,6 @@
 #include 
 #endif /* CONFIG_VNC_TLS */
 
-// #define _VNC_DEBUG 1
-
-#ifdef _VNC_DEBUG
-#define VNC_DEBUG(fmt, ...) do { fprintf(stderr, fmt, ## __VA_ARGS__); } while 
(0)
-
-#if defined(CONFIG_VNC_TLS) && _VNC_DEBUG >= 2
-/* Very verbose, so only enabled for _VNC_DEBUG >= 2 */
-static void vnc_debug_gnutls_log(int level, const char* str) {
-VNC_DEBUG("%d %s", level, str);
-}
-#endif /* CONFIG_VNC_TLS && _VNC_DEBUG */
-#else
-#define VNC_DEBUG(fmt, ...) do { } while (0)
-#endif
-
 #define count_bits(c, v) { \
 for (c = 0; v; v >>= 1) \
 { \
@@ -81,157 +65,13 @@ static void vnc_debug_gnutls_log(int level, const char* 
str) {
 } \
 }
 
-typedef struct Buffer
-{
-size_t capacity;
-size_t offset;
-uint8_t *buffer;
-} Buffer;
-
-typedef struct VncState VncState;
-
-typedef int VncReadEvent(VncState *vs, uint8_t *data, size_t len);
-
-typedef void VncWritePixels(VncState *vs, void *data, int size);
-
-typedef void VncSendHextileTile(VncState *vs,
-int x, int y, int w, int h,
-void *last_bg, 
-void *last_fg,
-int *has_bg, int *has_fg);
-
-#if 0
-#define VNC_MAX_WIDTH 2048
-#define VNC_MAX_HEIGHT 2048
-#define VNC_DIRTY_WORDS (VNC_MAX_WIDTH / (16 * 32))
-#endif
-
-#define VNC_AUTH_CHALLENGE_SIZE 16
-
-enum {
-VNC_AUTH_INVALID = 0,
-VNC_AUTH_NONE = 1,
-VNC_AUTH_VNC = 2,
-VNC_AUTH_RA2 = 5,
-VNC_AUTH_RA2NE = 6,
-VNC_AUTH_TIGHT = 16,
-VNC_AUTH_ULTRA = 17,
-VNC_AUTH_TLS = 18,
-VNC_AUTH_VENCRYPT = 19
-};
-
 #ifdef CONFIG_VNC_TLS
 enum {
 VNC_WIREMODE_CLEAR,
 VNC_WIREMODE_TLS,
 };
-
-enum {
-VNC_AUTH_VENCRYPT_PLAIN = 256,
-VNC_AUTH_VENCRYPT_TLSNONE = 257,
-VNC_AUTH_VENCRYPT_TLSVNC = 258,
-VNC_AUTH_VENCRYPT_TLSPLAIN = 259,
-VNC_AUTH_VENCRYPT_X509NONE = 260,
-VNC_AUTH_VENCRYPT_X509VNC = 261,
-VNC_AUTH_VENCRYPT_X509PLAIN = 262,
-};
-
-#define X509_CA_CERT_FILE "ca-cert.pem"
-#define X509_CA_CRL_FILE "ca-crl.pem"
-#define X509_SERVER_KEY_FILE "server-key.pem"
-#define X509_SERVER_CERT_FILE "server-cert.pem"
-
 #endif /* CONFIG_VNC_TLS */
 
-#define QUEUE_ALLOC_UNIT 10
-
-typedef struct _QueueItem
-{
-int x, y, w, h;
-int32_t enc;
-struct _QueueItem *next;
-} QueueItem;
-
-typedef struct _Queue
-{
-QueueItem *queue_start;
-int start_count;
-QueueItem *queue_end;
-int end_count;
-} Queue;
-
-struct VncState
-{
-QEMUTimer *timer;
-int timer_interval;
-int64_t last_update_time;
-int lsock;
-int csock;
-DisplayState *ds;
-uint64_t *dirty_row;   /* screen regions which are possibly dirty */
-int dirty_pixel_shift;
-uint64_t *update_row;  /* outstanding updates */
-int has_update;/* there's outstanding updates in the
-* visible area */
-
-int update_requested;   /* the client requested an update */
-
-uint8_t *old_data;
-int has_resize;
-int has_hextile;
-int has_pointer_type_change;
-int has_WMVi;
-int absolute;
-int last_x;
-int last_y;
-
-int major;
-int minor;
-
-char *display;
-char *password;
-int auth;
-#ifdef CONFIG_VNC_TLS
-int subauth;
-int x509verify;
-
-char *x509cacert;
-char *x509cacrl;
-char *x509cert;
-char *x509key;
-#endif
-char challenge[VNC_AUTH_CHALLENGE_SIZE];
-int switchbpp;
-
-#ifdef CONFIG_VNC_TLS
-int wiremode;
-gnutls_session_t tls_session;
-#endif
-
-Buffer output;
-Buffer input;
-
-Queue upqueue;
-
-kbd_layout_t *kbd_layout;
-/* current output mode information */
-VncWritePixels *write_pixels;
-VncSendHextileTile *send_hextile_tile;
-DisplaySurface clientds, serverds;
-
-VncReadEvent *read_handler;
-size_t read_handler_expect;
-
-int visible_x;
-int visible_y;
-int visible_w;
-int visible_h;
-
-/* input */
-uint8_t modifiers_state[256];
-};
-
-static VncState *vnc_state; /* needed for info vnc */
 static DisplayChangeListener *dcl;
 
 #define DIRTY_PIXEL_BITS 64
@@ -263,15 +103,10 @@ void do_info_vnc(void)
3) resolutions > 1024
 */
 
-static void vnc_write(VncState *vs, const void *data, size_t len);

[Xen-devel] [PATCH RFC 2/6] qemu-xen-trad: sasl: define SASL auth API

2017-05-15 Thread Simon Waterman
Add the SASL auth API to hook into vnc.c.  Taken from upstream
with minor changes to remove ACL support, which isn't in
qemu-xen-traditional yet.
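
For orientation, a minimal sketch (not part of this patch) of how the
write path in vnc.c ends up selecting the SASL variant once the series
is applied, following the upstream layout; vnc_client_write_plain() here
stands in for the existing buffered-write routine:

    static void vnc_client_write(void *opaque)
    {
        VncState *vs = opaque;

    #ifdef CONFIG_VNC_SASL
        /* Divert through SASL only once an SSF layer is running and the
         * plain-text auth result has been flushed out. */
        if (vs->sasl.conn && vs->sasl.runSSF && !vs->sasl.waitWriteSSF)
            vnc_client_write_sasl(vs);
        else
    #endif
            vnc_client_write_plain(vs);
    }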

Signed-off-by: Simon Waterman 
---
 vnc-auth-sasl.h | 67 +
 1 file changed, 67 insertions(+)
 create mode 100644 vnc-auth-sasl.h

diff --git a/vnc-auth-sasl.h b/vnc-auth-sasl.h
new file mode 100644
index 000..42a049f
--- /dev/null
+++ b/vnc-auth-sasl.h
@@ -0,0 +1,67 @@
+/*
+ * QEMU VNC display driver: SASL auth protocol
+ *
+ * Copyright (C) 2009 Red Hat, Inc
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+
+#ifndef __QEMU_VNC_AUTH_SASL_H__
+#define __QEMU_VNC_AUTH_SASL_H__
+
+
+#include 
+
+typedef struct VncStateSASL VncStateSASL;
+
+struct VncStateSASL {
+sasl_conn_t *conn;
+/* If we want to negotiate an SSF layer with client */
+bool wantSSF;
+/* If we are now running the SSF layer */
+bool runSSF;
+/*
+ * If this is non-zero, then wait for that many bytes
+ * to be written plain, before switching to SSF encoding
+ * This allows the VNC auth result to finish being
+ * written in plain.
+ */
+unsigned int waitWriteSSF;
+
+/*
+ * Buffering encoded data to allow more clear data
+ * to be stuffed onto the output buffer
+ */
+const uint8_t *encoded;
+unsigned int encodedLength;
+unsigned int encodedOffset;
+char *username;
+char *mechlist;
+};
+
+void vnc_sasl_client_cleanup(VncState *vs);
+
+long vnc_client_read_sasl(VncState *vs);
+long vnc_client_write_sasl(VncState *vs);
+
+void start_auth_sasl(VncState *vs);
+
+#endif /* __QEMU_VNC_AUTH_SASL_H__ */
+
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH RFC 6/6] qemu-xen-trad: sasl: add SASL option at build time

2017-05-15 Thread Simon Waterman
This change adds build support for the SASL integration,
disabled by default.

Signed-off-by: Simon Waterman 
---
 Makefile.target |  6 ++
 configure   | 34 ++
 2 files changed, 40 insertions(+)

diff --git a/Makefile.target b/Makefile.target
index 3c3db2b..a225a30 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -557,6 +557,12 @@ CPPFLAGS += $(CONFIG_VNC_TLS_CFLAGS)
 LIBS += $(CONFIG_VNC_TLS_LIBS)
 endif
 
+ifdef CONFIG_VNC_SASL
+CPPFLAGS += $(CONFIG_VNC_SASL_CFLAGS)
+LIBS += $(CONFIG_VNC_SASL_LIBS)
+OBJS+= vnc-auth-sasl.o
+endif
+
 ifdef CONFIG_BLUEZ
 LIBS += $(CONFIG_BLUEZ_LIBS)
 endif
diff --git a/configure b/configure
index 4547359..b2e9b79 100755
--- a/configure
+++ b/configure
@@ -164,6 +164,7 @@ fmod_lib=""
 fmod_inc=""
 oss_lib=""
 vnc_tls="yes"
+vnc_sasl="no"
 bsd="no"
 linux="no"
 solaris="no"
@@ -390,6 +391,8 @@ for opt do
   ;;
   --disable-vnc-tls) vnc_tls="no"
   ;;
+  --enable-vnc-sasl) vnc_sasl="yes"
+  ;;
   --disable-slirp) slirp="no"
   ;;
   --disable-vde) vde="no"
@@ -548,6 +551,7 @@ echo "   Available cards: 
$audio_possible_cards"
 echo "  --enable-mixemu  enable mixer emulation"
 echo "  --disable-brlapi disable BrlAPI"
 echo "  --disable-vnc-tlsdisable TLS encryption for VNC server"
+echo "  --enable-vnc-saslenable SASL encryption for VNC server"
 echo "  --disable-curses disable curses output"
 echo "  --disable-bluez  disable bluez stack connectivity"
 echo "  --disable-kvmdisable KVM acceleration support"
@@ -842,6 +846,25 @@ EOF
 fi
 
 ##
+# VNC SASL detection
+if test "$vnc_sasl" = "yes" ; then
+cat > $TMPC <<EOF
+#include <sasl/sasl.h>
+int main(void) { sasl_server_init(NULL, "qemu"); return 0; }
+EOF
+# Assuming Cyrus-SASL installed in /usr prefix
+vnc_sasl_cflags=""
+vnc_sasl_libs="-lsasl2"
+if $cc $ARCH_CFLAGS -o $TMPE ${OS_CFLAGS} $vnc_sasl_cflags $TMPC \
+   $vnc_sasl_libs 2> /dev/null ; then
+   :
+else
+   vnc_sasl="no"
+fi
+fi
+
+##
 # vde libraries probe
 if test "$vde" = "yes" ; then
   cat > $TMPC << EOF
@@ -1169,6 +1192,11 @@ if test "$vnc_tls" = "yes" ; then
 echo "TLS CFLAGS$vnc_tls_cflags"
 echo "TLS LIBS  $vnc_tls_libs"
 fi
+echo "VNC SASL support  $vnc_sasl"
+if test "$vnc_sasl" = "yes" ; then
+echo "SASL CFLAGS$vnc_sasl_cflags"
+echo "SASL LIBS  $vnc_sasl_libs"
+fi
 if test -n "$sparc_cpu"; then
 echo "Target Sparc Arch $sparc_cpu"
 fi
@@ -1414,6 +1442,12 @@ if test "$vnc_tls" = "yes" ; then
   echo "CONFIG_VNC_TLS_LIBS=$vnc_tls_libs" >> $config_mak
   echo "#define CONFIG_VNC_TLS 1" >> $config_h
 fi
+if test "$vnc_sasl" = "yes" ; then
+  echo "CONFIG_VNC_SASL=yes" >> $config_mak
+  echo "CONFIG_VNC_SASL_CFLAGS=$vnc_sasl_cflags" >> $config_mak
+  echo "CONFIG_VNC_SASL_LIBS=$vnc_sasl_libs" >> $config_mak
+  echo "#define CONFIG_VNC_SASL 1" >> $config_h
+fi
 qemu_version=`head $source_path/VERSION`
 echo "VERSION=$qemu_version" >>$config_mak
 echo "#define QEMU_VERSION \"$qemu_version\"" >> $config_h
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH RFC 3/6] qemu-xen-trad: sasl: implement SASL auth

2017-05-15 Thread Simon Waterman
Taken almost directly from upstream QEMU with minor changes:
1. Replace g_free etc. with standard equivalents
2. Remove ACL support, which is not in qemu-xen-traditional yet.

Signed-off-by: Simon Waterman 
---
 vnc-auth-sasl.c | 613 
 1 file changed, 613 insertions(+)
 create mode 100644 vnc-auth-sasl.c

diff --git a/vnc-auth-sasl.c b/vnc-auth-sasl.c
new file mode 100644
index 000..e3d2efb
--- /dev/null
+++ b/vnc-auth-sasl.c
@@ -0,0 +1,613 @@
+/*
+ * QEMU VNC display driver: SASL auth protocol
+ *
+ * Copyright (C) 2009 Red Hat, Inc
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "vnc.h"
+
+/* Max amount of data we send/recv for SASL steps to prevent DOS */
+#define SASL_DATA_MAX_LEN (1024 * 1024)
+
+
+void vnc_sasl_client_cleanup(VncState *vs)
+{
+if (vs->sasl.conn) {
+vs->sasl.runSSF = false;
+vs->sasl.wantSSF = false;
+vs->sasl.waitWriteSSF = 0;
+vs->sasl.encodedLength = vs->sasl.encodedOffset = 0;
+vs->sasl.encoded = NULL;
+free(vs->sasl.username);
+free(vs->sasl.mechlist);
+vs->sasl.username = vs->sasl.mechlist = NULL;
+sasl_dispose(&vs->sasl.conn);
+vs->sasl.conn = NULL;
+}
+}
+
+
+long vnc_client_write_sasl(VncState *vs)
+{
+long ret;
+
+VNC_DEBUG("Write SASL: Pending output %p size %zd offset %zd "
+  "Encoded: %p size %d offset %d\n",
+  vs->output.buffer, vs->output.capacity, vs->output.offset,
+  vs->sasl.encoded, vs->sasl.encodedLength, 
vs->sasl.encodedOffset);
+
+if (!vs->sasl.encoded) {
+int err;
+err = sasl_encode(vs->sasl.conn,
+  (char *)vs->output.buffer,
+  vs->output.offset,
+  (const char **)&vs->sasl.encoded,
+  &vs->sasl.encodedLength);
+if (err != SASL_OK)
+return vnc_client_io_error(vs, -1, EIO);
+
+vs->sasl.encodedOffset = 0;
+}
+
+ret = vnc_client_write_buf(vs,
+   vs->sasl.encoded + vs->sasl.encodedOffset,
+   vs->sasl.encodedLength - 
vs->sasl.encodedOffset);
+if (!ret)
+return 0;
+
+vs->sasl.encodedOffset += ret;
+if (vs->sasl.encodedOffset == vs->sasl.encodedLength) {
+vs->output.offset = 0;
+vs->sasl.encoded = NULL;
+vs->sasl.encodedOffset = vs->sasl.encodedLength = 0;
+}
+
+/* Can't merge this block with one above, because
+ * someone might have written more unencrypted
+ * data in vs->output while we were processing
+ * SASL encoded output
+ */
+if (vs->output.offset == 0) {
+qemu_set_fd_handler(vs->csock, vnc_client_read, NULL, vs);
+}
+
+return ret;
+}
+
+
+long vnc_client_read_sasl(VncState *vs)
+{
+long ret;
+uint8_t encoded[4096];
+const char *decoded;
+unsigned int decodedLen;
+int err;
+
+ret = vnc_client_read_buf(vs, encoded, sizeof(encoded));
+if (!ret)
+return 0;
+
+err = sasl_decode(vs->sasl.conn,
+  (char *)encoded, ret,
+  &decoded, &decodedLen);
+
+if (err != SASL_OK)
+return vnc_client_io_error(vs, -1, -EIO);
+VNC_DEBUG("Read SASL Encoded %p size %ld Decoded %p size %d\n",
+  encoded, ret, decoded, decodedLen);
+buffer_reserve(&vs->input, decodedLen);
+buffer_append(&vs->input, decoded, decodedLen);
+return decodedLen;
+}
+
+
+static int vnc_auth_sasl_check_access(VncState *vs)
+{
+const void *val;
+int err;
+int allow;
+
+err = sasl_getprop(vs->sasl.conn, SASL_USERNAME, &val);
+if (err != SASL_OK) {
+VNC_DEBUG("cannot query SASL username on connection %d (%s), denying 
access\n",
+  err, sasl_errstring(err, NULL, NULL));
+return -1;
+}
+if (val == NULL) {
+

[Xen-devel] [PATCH RFC 0/6] qemu-xen-trad: sasl: add SASL support to VNC

2017-05-15 Thread Simon Waterman
This patch series back-ports SASL authentication from
upstream QEMU to the VNC server in qemu-xen-traditional.
It enables authentication to the VNC console of a domain
to be controlled using any SASL mechanism when using an
IOEMU stubdom.

SASL can be used with or without X509 certificates.

The option is currently enabled during build by adding
--enable-vnc-sasl to the configure line in xen-setup in the
root of the QEMU tree.

SASL auth can be enabled for a domain using the 'vnclisten'
option in the Xen config file:
vnclisten="0.0.0.0:5,tls,x509verify=/etc/ssl,sasl"

Details of how to configure SASL in QEMU can be found here:
https://qemu.weilnetz.de/doc/qemu-doc.html#vnc_005fsec_005fsasl

 Makefile.target |6
 configure   |   34 +++
 vnc-auth-sasl.c |  613 
 vnc-auth-sasl.h |   67 ++
 vnc.c   |  533 
 vnc.h   |  231 -
 6 files changed, 1257 insertions(+), 227 deletions(-)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH RFC 1/6] qemu-xen-trad: sasl: expose vnc API to SASL auth

2017-05-15 Thread Simon Waterman
Expose the minimum VNC API needed to support SASL auth.  This is mainly the
VncState structure and a subset of the API funcs.

The layout of the file is modelled on the upstream QEMU vnc.h.

Signed-off-by: Simon Waterman 
---
 vnc.h | 231 +++---
 1 file changed, 222 insertions(+), 9 deletions(-)

diff --git a/vnc.h b/vnc.h
index 6981606..66bed0c 100644
--- a/vnc.h
+++ b/vnc.h
@@ -1,5 +1,183 @@
-#ifndef __VNCTIGHT_H
-#define __VNCTIGHT_H
+/*
+ * QEMU VNC display driver
+ *
+ * Copyright (C) 2006 Anthony Liguori 
+ * Copyright (C) 2006 Fabrice Bellard
+ * Copyright (C) 2009 Red Hat, Inc
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef __QEMU_VNC_H
+#define __QEMU_VNC_H
+
+#include "qemu-common.h"
+#include "console.h"
+#include "sysemu.h"
+
+// #define _VNC_DEBUG 1
+
+#ifdef _VNC_DEBUG
+#define VNC_DEBUG(fmt, ...) do { fprintf(stderr, fmt, ## __VA_ARGS__); } while 
(0)
+
+#if defined(CONFIG_VNC_TLS) && _VNC_DEBUG >= 2
+/* Very verbose, so only enabled for _VNC_DEBUG >= 2 */
+static void vnc_debug_gnutls_log(int level, const char* str) {
+VNC_DEBUG("%d %s", level, str);
+}
+#endif /* CONFIG_VNC_TLS && _VNC_DEBUG */
+#else
+#define VNC_DEBUG(fmt, ...) do { } while (0)
+#endif
+
+/*
+ *
+ * Core data structures
+ *
+ */
+
+typedef struct Buffer
+{
+size_t capacity;
+size_t offset;
+uint8_t *buffer;
+} Buffer;
+
+typedef struct VncState VncState;
+
+typedef int VncReadEvent(VncState *vs, uint8_t *data, size_t len);
+
+typedef void VncWritePixels(VncState *vs, void *data, int size);
+
+typedef void VncSendHextileTile(VncState *vs,
+int x, int y, int w, int h,
+void *last_bg,
+void *last_fg,
+int *has_bg, int *has_fg);
+
+#include "vnc_keysym.h"
+#include "keymaps.c"
+
+#ifdef CONFIG_VNC_TLS
+#include <gnutls/gnutls.h>
+#include <gnutls/x509.h>
+#endif /* CONFIG_VNC_TLS */
+
+#ifdef CONFIG_VNC_SASL
+#include "vnc-auth-sasl.h"
+#endif
+
+#define VNC_AUTH_CHALLENGE_SIZE 16
+
+#define QUEUE_ALLOC_UNIT 10
+
+typedef struct _QueueItem
+{
+int x, y, w, h;
+int32_t enc;
+struct _QueueItem *next;
+} QueueItem;
+
+typedef struct _Queue
+{
+QueueItem *queue_start;
+int start_count;
+QueueItem *queue_end;
+int end_count;
+} Queue;
+
+struct VncState
+{
+QEMUTimer *timer;
+int timer_interval;
+int64_t last_update_time;
+int lsock;
+int csock;
+DisplayState *ds;
+uint64_t *dirty_row;/* screen regions which are possibly dirty */
+int dirty_pixel_shift;
+uint64_t *update_row;   /* outstanding updates */
+int has_update; /* there's outstanding updates in the
+ * visible area */
+
+int update_requested;   /* the client requested an update */
+
+uint8_t *old_data;
+int has_resize;
+int has_hextile;
+int has_pointer_type_change;
+int has_WMVi;
+int absolute;
+int last_x;
+int last_y;
+
+int major;
+int minor;
+
+char *display;
+char *password;
+int auth;
+#ifdef CONFIG_VNC_TLS
+int subauth;
+int x509verify;
+
+char *x509cacert;
+char *x509cacrl;
+char *x509cert;
+char *x509key;
+#endif
+char challenge[VNC_AUTH_CHALLENGE_SIZE];
+int switchbpp;
+
+#ifdef CONFIG_VNC_TLS
+int wiremode;
+gnutls_session_t tls_session;
+#endif
+
+#ifdef CONFIG_VNC_SASL
+VncStateSASL sasl;
+#endif
+
+Buffer output;
+Buffer input;
+
+Queue upqueue;
+
+kbd_layout_t *kbd_layout;
+/* current output mode information */
+VncWritePixels *write_pixels;
+VncSendHextileTile *send_hextile_tile;
+DisplaySurface 

Re: [Xen-devel] [PATCH v8 0/3] arm64, xen: add xen_boot support into grub-mkconfig

2017-05-15 Thread Fu Wei
Hi Daniel,

On 15 May 2017 at 21:46, Daniel Kiper  wrote:
> Hi Julien,
>
> On Mon, May 15, 2017 at 02:43:28PM +0100, Julien Grall wrote:
>> Hi Daniel,
>>
>> On 15/05/17 14:38, Daniel Kiper wrote:
>> >On Sun, May 14, 2017 at 03:43:44PM +0800, fu@linaro.org wrote:
>> >>From: Fu Wei 
>> >>
>> >>This patchset add xen_boot support into grub-mkconfig for
>> >>generating xen boot entrances automatically
>> >>
>> >>Also update the docs/grub.texi for new xen_boot commands.
>> >
>> >LGTM, if there are no objections I will commit it at the end
>> >of this week or the beginning of next one.
>>
>> Thank you!
>>
>> Can you also please commit patch [1] which has been sitting on the grub
>> ML for more than a year? This is preventing Xen ARM from booting with GRUB.
>>
>> Cheers,
>>
>> [1] https://lists.gnu.org/archive/html/grub-devel/2016-02/msg00205.html
>
> Will do with this patch series.
>

Great thanks ! :-)


> Daniel



-- 
Best regards,

Fu Wei
Software Engineer
Red Hat

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v9 4/5] shutdown: Add source information to SHUTDOWN and RESET

2017-05-15 Thread Eric Blake
Time to wire up all the call sites that request a shutdown or
reset to use the enum added in the previous patch.

It would have been less churn to keep the common case with no
arguments as meaning guest-triggered, and only modify the
host-triggered code paths, via a wrapper function, but then we'd
still have to audit that I didn't miss any host-triggered spots;
changing the signature forces us to double-check that I correctly
categorized all callers.

Since command line options can change whether a guest reset request
causes an actual reset vs. a shutdown, it's easy to also add the
information to reset requests.
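
As an illustration (not a literal hunk from this patch), the change at
each call site is mechanical -- the caller now states why it is asking
for the shutdown or reset, e.g. a guest ACPI power-off versus a
guest-initiated reset:

    -    qemu_system_shutdown_request();
    +    qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);

    -    qemu_system_reset_request();
    +    qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);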

Signed-off-by: Eric Blake 
Acked-by: David Gibson  [ppc parts]
Reviewed-by: Mark Cave-Ayland  [SPARC part]
Reviewed-by: Cornelia Huck  [s390x parts]

---
v8: rebase later in series
v7: no change
v6: defer event additions to later, add reviews of unchanged portions
v5: drop accidental addition of unrelated files
v4: s/ShutdownType/ShutdownCause/, no thanks to mingw header pollution
v3: retitle again, fix qemu-iotests, use enum rather than raw bool
in all callers
v2: retitle (was "event: Add signal information to SHUTDOWN"),
completely rework to post bool based on whether it is guest-initiated
v1: initial submission, exposing just Unix signals from host
---
 include/sysemu/sysemu.h |  4 ++--
 vl.c| 18 --
 hw/acpi/core.c  |  4 ++--
 hw/arm/highbank.c   |  4 ++--
 hw/arm/integratorcp.c   |  2 +-
 hw/arm/musicpal.c   |  2 +-
 hw/arm/omap1.c  | 10 ++
 hw/arm/omap2.c  |  2 +-
 hw/arm/spitz.c  |  2 +-
 hw/arm/stellaris.c  |  2 +-
 hw/arm/tosa.c   |  2 +-
 hw/i386/pc.c|  2 +-
 hw/i386/xen/xen-hvm.c   |  2 +-
 hw/input/pckbd.c|  4 ++--
 hw/ipmi/ipmi.c  |  4 ++--
 hw/isa/lpc_ich9.c   |  2 +-
 hw/mips/boston.c|  2 +-
 hw/mips/mips_malta.c|  2 +-
 hw/mips/mips_r4k.c  |  4 ++--
 hw/misc/arm_sysctl.c|  8 
 hw/misc/cbus.c  |  2 +-
 hw/misc/macio/cuda.c|  4 ++--
 hw/misc/slavio_misc.c   |  4 ++--
 hw/misc/zynq_slcr.c |  2 +-
 hw/pci-host/apb.c   |  4 ++--
 hw/pci-host/bonito.c|  2 +-
 hw/pci-host/piix.c  |  2 +-
 hw/ppc/e500.c   |  2 +-
 hw/ppc/mpc8544_guts.c   |  2 +-
 hw/ppc/ppc.c|  2 +-
 hw/ppc/ppc405_uc.c  |  2 +-
 hw/ppc/spapr_hcall.c|  2 +-
 hw/ppc/spapr_rtas.c |  4 ++--
 hw/s390x/ipl.c  |  2 +-
 hw/sh4/r2d.c|  2 +-
 hw/timer/etraxfs_timer.c|  2 +-
 hw/timer/m48t59.c   |  4 ++--
 hw/timer/milkymist-sysctl.c |  4 ++--
 hw/timer/pxa2xx_timer.c |  2 +-
 hw/watchdog/watchdog.c  |  2 +-
 hw/xenpv/xen_domainbuild.c  |  2 +-
 hw/xtensa/xtfpga.c  |  2 +-
 kvm-all.c   |  6 +++---
 os-win32.c  |  2 +-
 qmp.c   |  4 ++--
 replay/replay.c |  4 ++--
 target/alpha/sys_helper.c   |  4 ++--
 target/arm/psci.c   |  4 ++--
 target/i386/excp_helper.c   |  2 +-
 target/i386/hax-all.c   |  6 +++---
 target/i386/helper.c|  2 +-
 target/i386/kvm.c   |  2 +-
 target/s390x/helper.c   |  2 +-
 target/s390x/kvm.c  |  4 ++--
 target/s390x/misc_helper.c  |  4 ++--
 target/sparc/int32_helper.c |  2 +-
 ui/sdl.c|  2 +-
 ui/sdl2.c   |  4 ++--
 trace-events|  2 +-
 ui/cocoa.m  |  2 +-
 60 files changed, 98 insertions(+), 98 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 52102fd..e540e6f 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -62,13 +62,13 @@ typedef enum WakeupReason {
 QEMU_WAKEUP_REASON_OTHER,
 } WakeupReason;

-void qemu_system_reset_request(void);
+void qemu_system_reset_request(ShutdownCause reason);
 void qemu_system_suspend_request(void);
 void qemu_register_suspend_notifier(Notifier *notifier);
 void qemu_system_wakeup_request(WakeupReason reason);
 void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
 void qemu_register_wakeup_notifier(Notifier *notifier);
-void qemu_system_shutdown_request(void);
+void qemu_system_shutdown_request(ShutdownCause reason);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
 void qemu_system_debug_request(void);
diff --git a/vl.c b/vl.c
index 51ed60f..bc5c1be 100644
--- a/vl.c
+++ b/vl.c
@@ -1724,7 +1724,7 @@ void qemu_system_guest_panicked(GuestPanicInformation 
*info)
 if (!no_shutdown) {
 qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_POWEROFF,
!!info, info, &error_abort);
-qemu_system_shutdown_request();
+

[Xen-devel] [PATCH v9 2/5] shutdown: Prepare for use of an enum in reset/shutdown_request

2017-05-15 Thread Eric Blake
We want to track why a guest was shut down; in particular, being able
to tell the difference between a guest request (such as ACPI request)
and host request (such as SIGINT) will prove useful to libvirt.
Since all requests eventually end up changing shutdown_requested in
vl.c, the logical change is to make that value track the reason,
rather than its current 0/1 contents.

Since command-line options control whether a reset request is turned
into a shutdown request instead, the same treatment is given to
reset_requested.

This patch adds an internal enum ShutdownCause that describes reasons
that a shutdown can be requested, and changes qemu_system_reset() to
pass the reason through, although for now nothing is actually changed
with regards to what gets reported.  The enum could be exported via
QAPI at a later date, if deemed necessary, but for now, there has not
been a request to expose that much detail to end clients.

For the most part, we turn 0 into SHUTDOWN_CAUSE_NONE, and 1 into
SHUTDOWN_CAUSE_HOST_ERROR; the only specific case where we have enough
information right now to use a different value is when we are reacting
to a host signal.  It will take a further patch to edit all call-sites
that can trigger a reset or shutdown request to properly pass in any
other reasons; this patch includes TODOs to point such places out.

qemu_system_reset() trades its 'bool report' parameter for a
'ShutdownCause reason', with all non-zero values having the same
effect; this lets us get rid of the weird #defines for VMRESET_*
as synonyms for bools.
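
A simplified sketch of the resulting bookkeeping in vl.c: the request
helpers now record *why* a shutdown or reset was asked for, and because
SHUTDOWN_CAUSE_NONE is 0 the existing "is a request pending?" tests keep
working unchanged (details such as tracing and replay hooks omitted):

    void qemu_system_shutdown_request(ShutdownCause reason)
    {
        shutdown_requested = reason;   /* remember the cause, not just a flag */
        qemu_notify_event();
    }

    ShutdownCause qemu_shutdown_requested_get(void)
    {
        return shutdown_requested;     /* SHUTDOWN_CAUSE_NONE means none pending */
    }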

Signed-off-by: Eric Blake 

---
v9: one more stray FIXME
v8: s/FIXME/TODO/, include SHUTDOWN_CAUSE__MAX now rather than later,
tweak comment on GUEST_SHUTDOWN to mention suspend
v7: drop 'bool report' from qemu_system_reset(), reorder enum to put
HOST_ERROR == 1, improve commit message
v6: make ShutdownCause internal-only, add SHUTDOWN_CAUSE_NONE so that
comparison to 0 still works, tweak initial FIXME values
v5: no change
v4: s/ShutdownType/ShutdownCause/, no thanks to mingw header pollution
v3: new patch
---
 include/sysemu/sysemu.h | 23 -
 vl.c| 53 ++---
 hw/i386/xen/xen-hvm.c   |  7 +--
 migration/colo.c|  2 +-
 migration/savevm.c  |  2 +-
 5 files changed, 58 insertions(+), 29 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 15656b7..52102fd 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -33,8 +33,21 @@ VMChangeStateEntry 
*qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
 void qemu_del_vm_change_state_handler(VMChangeStateEntry *e);
 void vm_state_notify(int running, RunState state);

-#define VMRESET_SILENT   false
-#define VMRESET_REPORT   true
+/* Enumeration of various causes for shutdown. */
+typedef enum ShutdownCause {
+SHUTDOWN_CAUSE_NONE,  /* No shutdown request pending */
+SHUTDOWN_CAUSE_HOST_ERROR,/* An error prevents further use of guest */
+SHUTDOWN_CAUSE_HOST_QMP,  /* Reaction to a QMP command, like 'quit' */
+SHUTDOWN_CAUSE_HOST_SIGNAL,   /* Reaction to a signal, such as SIGINT */
+SHUTDOWN_CAUSE_HOST_UI,   /* Reaction to UI event, like window close */
+SHUTDOWN_CAUSE_GUEST_SHUTDOWN,/* Guest shutdown/suspend request, via
+ ACPI or other hardware-specific means */
+SHUTDOWN_CAUSE_GUEST_RESET,   /* Guest reset request, and command line
+ turns that into a shutdown */
+SHUTDOWN_CAUSE_GUEST_PANIC,   /* Guest panicked, and command line turns
+ that into a shutdown */
+SHUTDOWN_CAUSE__MAX,
+} ShutdownCause;

 void vm_start(void);
 int vm_prepare_start(void);
@@ -62,10 +75,10 @@ void qemu_system_debug_request(void);
 void qemu_system_vmstop_request(RunState reason);
 void qemu_system_vmstop_request_prepare(void);
 bool qemu_vmstop_requested(RunState *r);
-int qemu_shutdown_requested_get(void);
-int qemu_reset_requested_get(void);
+ShutdownCause qemu_shutdown_requested_get(void);
+ShutdownCause qemu_reset_requested_get(void);
 void qemu_system_killed(int signal, pid_t pid);
-void qemu_system_reset(bool report);
+void qemu_system_reset(ShutdownCause reason);
 void qemu_system_guest_panicked(GuestPanicInformation *info);
 size_t qemu_target_page_size(void);

diff --git a/vl.c b/vl.c
index 7396748..5c61d8c 100644
--- a/vl.c
+++ b/vl.c
@@ -1597,8 +1597,9 @@ void vm_state_notify(int running, RunState state)
 }
 }

-static int reset_requested;
-static int shutdown_requested, shutdown_signal;
+static ShutdownCause reset_requested;
+static ShutdownCause shutdown_requested;
+static int shutdown_signal;
 static pid_t shutdown_pid;
 static int powerdown_requested;
 static int debug_requested;
@@ -1612,19 +1613,19 @@ static NotifierList wakeup_notifiers =
 NOTIFIER_LIST_INITIALIZER(wakeup_notifiers);
 static uint32_t 

Re: [Xen-devel] [GIT PULL] (xen) stable/for-jens-4.12

2017-05-15 Thread Jens Axboe
On 05/15/2017 03:28 PM, Konrad Rzeszutek Wilk wrote:
> Hey Jens,
> 
> Could you kindly pull:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git 
> stable/for-jens-4.12
> 
> which has one tiny fix:
> 
> Thanks!
> 
> Gustavo A. R. Silva (1):
>   block: xen-blkback: add null check to avoid null pointer dereference
> 
>  drivers/block/xen-blkback/xenbus.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
Pulled, thanks.


-- 
Jens Axboe


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [GIT PULL] (xen) stable/for-jens-4.12

2017-05-15 Thread Konrad Rzeszutek Wilk
Hey Jens,

Could you kindly pull:

 git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git 
stable/for-jens-4.12

which has one tiny fix:

Thanks!

Gustavo A. R. Silva (1):
  block: xen-blkback: add null check to avoid null pointer dereference

 drivers/block/xen-blkback/xenbus.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] block: xen-blkback: add null check to avoid null pointer dereference

2017-05-15 Thread Konrad Rzeszutek Wilk
On Thu, May 11, 2017 at 10:27:35AM -0500, Gustavo A. R. Silva wrote:
> Add null check before calling xen_blkif_put() to avoid potential
> null pointer dereference.
> 

Applied to 'stable/for-jens-4.12' and will push soon to Jens.

> Addresses-Coverity-ID: 1350942
> Cc: Juergen Gross 
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  drivers/block/xen-blkback/xenbus.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/xenbus.c 
> b/drivers/block/xen-blkback/xenbus.c
> index 8fe61b5..1f3dfaa 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -504,11 +504,13 @@ static int xen_blkbk_remove(struct xenbus_device *dev)
>  
>   dev_set_drvdata(>dev, NULL);
>  
> - if (be->blkif)
> + if (be->blkif) {
>   xen_blkif_disconnect(be->blkif);
>  
> - /* Put the reference we set in xen_blkif_alloc(). */
> - xen_blkif_put(be->blkif);
> + /* Put the reference we set in xen_blkif_alloc(). */
> + xen_blkif_put(be->blkif);
> + }
> +
>   kfree(be->mode);
>   kfree(be);
>   return 0;
> -- 
> 2.5.0
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 13/18] xen/pvcalls: implement release command

2017-05-15 Thread Stefano Stabellini
Release both active and passive sockets. For active sockets, make sure
to avoid possible conflicts with the ioworker reading/writing to those
sockets concurrently. Set map->release to let the ioworker know
atomically that the socket will be released soon, then wait until the
ioworker has removed the socket from its list.

Unmap indexes pages and data rings.
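
For reference, the matching half of this handshake lives in the ioworker
(added by a separate patch in this series, not shown here); a simplified
sketch of what the worker is expected to do when it notices the flag, so
that the wait loop in pvcalls_back_release_active() can terminate:

    /* in the ioworker, for each sock_mapping "map" queued on iow->wqs */
    if (atomic_read(&map->release)) {
        list_del_init(&map->queue);    /* under iow->lock in the real code */
        atomic_set(&map->release, 0);  /* lets the releasing side proceed */
        continue;                      /* no further io on this socket */
    }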

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 94 +-
 1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index d5b7412..22c6426 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -253,13 +253,105 @@ static int pvcalls_back_release_active(struct 
xenbus_device *dev,
   struct pvcalls_back_priv *priv,
   struct sock_mapping *map)
 {
+   struct pvcalls_ioworker *iow =
+   &pvcalls_back_global.ioworkers[map->data_worker];
+   unsigned long flags;
+   bool in_loop = false;
+
+
+   disable_irq(map->irq);
+   if (map->sock->sk != NULL) {
+   lock_sock(map->sock->sk);
+   map->sock->sk->sk_user_data = NULL;
+   map->sock->sk->sk_data_ready = map->saved_data_ready;
+   release_sock(map->sock->sk);
+   }
+
+   atomic_set(&map->release, 1);
+
+   /*
+* To avoid concurrency problems with ioworker, check if the socket
+* has any outstanding io requests. If so, wait until the ioworker
+* removes it from the list before proceeding.
+*/
+   spin_lock_irqsave(&iow->lock, flags);
+   in_loop = !list_empty(&map->queue);
+   spin_unlock_irqrestore(&iow->lock, flags);
+
+   if (in_loop) {
+   atomic_inc(&iow->io);
+   queue_work_on(map->data_worker, pvcalls_back_global.wq,
+ &iow->register_work);
+   while (atomic_read(&map->release) > 0)
+   cond_resched();
+   }
+
+   down_write(&priv->pvcallss_lock);
+   list_del(&map->list);
+   up_write(&priv->pvcallss_lock);
+
+   xenbus_unmap_ring_vfree(dev, (void *)map->bytes);
+   xenbus_unmap_ring_vfree(dev, (void *)map->ring);
+   unbind_from_irqhandler(map->irq, map);
+
+   sock_release(map->sock);
+   kfree(map);
+
+   return 0;
+}
+
+static int pvcalls_back_release_passive(struct xenbus_device *dev,
+   struct pvcalls_back_priv *priv,
+   struct sockpass_mapping *mappass)
+{
+   if (mappass->sock->sk != NULL) {
+   lock_sock(mappass->sock->sk);
+   mappass->sock->sk->sk_user_data = NULL;
+   mappass->sock->sk->sk_data_ready = mappass->saved_data_ready;
+   release_sock(mappass->sock->sk);
+   }
+   down_write(&priv->pvcallss_lock);
+   radix_tree_delete(&priv->socketpass_mappings, mappass->id);
+   sock_release(mappass->sock);
+   flush_workqueue(mappass->wq);
+   destroy_workqueue(mappass->wq);
+   kfree(mappass);
+   up_write(&priv->pvcallss_lock);
+
return 0;
 }
 
 static int pvcalls_back_release(struct xenbus_device *dev,
struct xen_pvcalls_request *req)
 {
-   return 0;
+   struct pvcalls_back_priv *priv;
+   struct sock_mapping *map, *n;
+   struct sockpass_mapping *mappass;
+   int ret = 0;
+   struct xen_pvcalls_response *rsp;
+
+   priv = dev_get_drvdata(&dev->dev);
+
+   list_for_each_entry_safe(map, n, &priv->socket_mappings, list) {
+   if (map->id == req->u.release.id) {
+   ret = pvcalls_back_release_active(dev, priv, map);
+   goto out;
+   }
+   }
+   mappass = radix_tree_lookup(&priv->socketpass_mappings,
+   req->u.release.id);
+   if (mappass != NULL) {
+   ret = pvcalls_back_release_passive(dev, priv, mappass);
+   goto out;
+   }
+
+out:
+   rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
+   rsp->req_id = req->req_id;
+   rsp->u.release.id = req->u.release.id;
+   rsp->cmd = req->cmd;
+   rsp->ret = ret;
+   return 1;
 }
 
 static void __pvcalls_back_accept(struct work_struct *work)
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 03/18] xen/pvcalls: initialize the module and register the xenbus backend

2017-05-15 Thread Stefano Stabellini
The pvcalls backend has one ioworker per cpu: the ioworkers are
implemented as a cpu bound workqueue, and will deal with the actual
socket and data ring reads/writes.

ioworkers are global: we only have one set for all the frontends. They
process requests on their wqs list in order; once they are done with a
request, they'll remove it from the list. A spinlock is used for
protecting the list. Each ioworker is bound to a different cpu to
maximize throughput.
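
pvcalls_back_ioworker() is only a stub in this patch; as a rough sketch
(locking and the release handshake omitted), what it ends up doing once
the later patches in the series land is:

    static void pvcalls_back_ioworker(struct work_struct *work)
    {
        struct pvcalls_ioworker *iow = container_of(work,
                struct pvcalls_ioworker, register_work);
        struct sock_mapping *map, *n;

        /* drain this cpu's list of sockets with outstanding io */
        list_for_each_entry_safe(map, n, &iow->wqs, queue) {
            if (atomic_read(&map->read))
                pvcalls_conn_back_read((unsigned long)map);
            if (atomic_read(&map->write))
                pvcalls_conn_back_write(map);
        }
    }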

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 64 ++
 1 file changed, 64 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 2dbf7d8..46a889a 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -25,6 +25,26 @@
 #include 
 #include 
 
+struct pvcalls_ioworker {
+   struct work_struct register_work;
+   atomic_t io;
+   struct list_head wqs;
+   spinlock_t lock;
+   int num;
+};
+
+struct pvcalls_back_global {
+   struct pvcalls_ioworker *ioworkers;
+   int nr_ioworkers;
+   struct workqueue_struct *wq;
+   struct list_head privs;
+   struct rw_semaphore privs_lock;
+} pvcalls_back_global;
+
+static void pvcalls_back_ioworker(struct work_struct *work)
+{
+}
+
 static int pvcalls_back_probe(struct xenbus_device *dev,
  const struct xenbus_device_id *id)
 {
@@ -59,3 +79,47 @@ static int pvcalls_back_uevent(struct xenbus_device *xdev,
.uevent = pvcalls_back_uevent,
.otherend_changed = pvcalls_back_changed,
 };
+
+static int __init pvcalls_back_init(void)
+{
+   int ret, i, cpu;
+
+   if (!xen_domain())
+   return -ENODEV;
+
+   ret = xenbus_register_backend(&pvcalls_back_driver);
+   if (ret < 0)
+   return ret;
+
+   init_rwsem(&pvcalls_back_global.privs_lock);
+   INIT_LIST_HEAD(&pvcalls_back_global.privs);
+   pvcalls_back_global.wq = alloc_workqueue("pvcalls_io", 0, 0);
+   if (!pvcalls_back_global.wq)
+   goto error;
+   pvcalls_back_global.nr_ioworkers = num_online_cpus();
+   pvcalls_back_global.ioworkers = kzalloc(
+   sizeof(*pvcalls_back_global.ioworkers) *
+   pvcalls_back_global.nr_ioworkers, GFP_KERNEL);
+   if (!pvcalls_back_global.ioworkers)
+   goto error;
+   i = 0;
+   for_each_online_cpu(cpu) {
+   pvcalls_back_global.ioworkers[i].num = i;
+   atomic_set(&pvcalls_back_global.ioworkers[i].io, 1);
+   spin_lock_init(&pvcalls_back_global.ioworkers[i].lock);
+   INIT_LIST_HEAD(&pvcalls_back_global.ioworkers[i].wqs);
+   INIT_WORK(&pvcalls_back_global.ioworkers[i].register_work,
+   pvcalls_back_ioworker);
+   i++;
+   }
+   return 0;
+
+error:
+   if (pvcalls_back_global.wq)
+   destroy_workqueue(pvcalls_back_global.wq);
+   xenbus_unregister_driver(&pvcalls_back_driver);
+   kfree(pvcalls_back_global.ioworkers);
+   memset(&pvcalls_back_global, 0, sizeof(pvcalls_back_global));
+   return -ENOMEM;
+}
+module_init(pvcalls_back_init);
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 09/18] xen/pvcalls: implement bind command

2017-05-15 Thread Stefano Stabellini
Allocate a socket. Track the allocated passive sockets with a new data
structure named sockpass_mapping. It contains an unbound workqueue to
schedule delayed work for the accept and poll commands. It also has a
reqcopy field to be used to store a copy of a request for delayed work.
Reads/writes to it are protected by a lock (the "copy_lock" spinlock).
Initialize the workqueue in pvcalls_back_bind.

Implement the bind command with inet_bind.

The pass_sk_data_ready event handler will be added later.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 89 +-
 1 file changed, 88 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 9ac1cf2..ff4634d 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -82,6 +82,18 @@ struct sock_mapping {
void (*saved_data_ready)(struct sock *sk);
 };
 
+struct sockpass_mapping {
+   struct list_head list;
+   struct pvcalls_back_priv *priv;
+   struct socket *sock;
+   uint64_t id;
+   struct xen_pvcalls_request reqcopy;
+   spinlock_t copy_lock;
+   struct workqueue_struct *wq;
+   struct work_struct register_work;
+   void (*saved_data_ready)(struct sock *sk);
+};
+
 static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map);
 static int pvcalls_back_release_active(struct xenbus_device *dev,
   struct pvcalls_back_priv *priv,
@@ -249,10 +261,85 @@ static int pvcalls_back_release(struct xenbus_device *dev,
return 0;
 }
 
+static void __pvcalls_back_accept(struct work_struct *work)
+{
+}
+
+static void pvcalls_pass_sk_data_ready(struct sock *sock)
+{
+}
+
 static int pvcalls_back_bind(struct xenbus_device *dev,
 struct xen_pvcalls_request *req)
 {
-   return 0;
+   struct pvcalls_back_priv *priv;
+   int ret, err;
+   struct socket *sock;
+   struct sockpass_mapping *map = NULL;
+   struct xen_pvcalls_response *rsp;
+
+   if (dev == NULL)
+   return 0;
+   priv = dev_get_drvdata(&dev->dev);
+
+   map = kzalloc(sizeof(*map), GFP_KERNEL);
+   if (map == NULL) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   INIT_WORK(&map->register_work, __pvcalls_back_accept);
+   spin_lock_init(&map->copy_lock);
+   map->wq = alloc_workqueue("pvcalls_wq", WQ_UNBOUND, 1);
+   if (!map->wq) {
+   ret = -ENOMEM;
+   kfree(map);
+   goto out;
+   }
+
+   ret = sock_create(AF_INET, SOCK_STREAM, 0, &sock);
+   if (ret < 0) {
+   destroy_workqueue(map->wq);
+   kfree(map);
+   goto out;
+   }
+
+   ret = inet_bind(sock, (struct sockaddr *)&req->u.bind.addr,
+   req->u.bind.len);
+   if (ret < 0) {
+   destroy_workqueue(map->wq);
+   kfree(map);
+   goto out;
+   }
+
+   map->priv = priv;
+   map->sock = sock;
+   map->id = req->u.bind.id;
+
+   down_write(&priv->pvcallss_lock);
+   err = radix_tree_insert(&priv->socketpass_mappings, map->id,
+   map);
+   up_write(&priv->pvcallss_lock);
+   if (err) {
+   ret = err;
+   destroy_workqueue(map->wq);
+   kfree(map);
+   goto out;
+   }
+
+   lock_sock(sock->sk);
+   map->saved_data_ready = sock->sk->sk_data_ready;
+   sock->sk->sk_user_data = map;
+   sock->sk->sk_data_ready = pvcalls_pass_sk_data_ready;
+   release_sock(sock->sk);
+
+out:
+   rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
+   rsp->req_id = req->req_id;
+   rsp->cmd = req->cmd;
+   rsp->u.bind.id = req->u.bind.id;
+   rsp->ret = ret;
+   return 1;
 }
 
 static int pvcalls_back_listen(struct xenbus_device *dev,
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 02/18] xen/pvcalls: introduce the pvcalls xenbus backend

2017-05-15 Thread Stefano Stabellini
Introduce a xenbus backend for the pvcalls protocol, as defined by
https://xenbits.xen.org/docs/unstable/misc/pvcalls.html.

This patch only adds the stubs, the code will be added by the following
patches.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 61 ++
 1 file changed, 61 insertions(+)
 create mode 100644 drivers/xen/pvcalls-back.c

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
new file mode 100644
index 000..2dbf7d8
--- /dev/null
+++ b/drivers/xen/pvcalls-back.c
@@ -0,0 +1,61 @@
+/*
+ * (c) 2017 Stefano Stabellini 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int pvcalls_back_probe(struct xenbus_device *dev,
+ const struct xenbus_device_id *id)
+{
+   return 0;
+}
+
+static void pvcalls_back_changed(struct xenbus_device *dev,
+enum xenbus_state frontend_state)
+{
+}
+
+static int pvcalls_back_remove(struct xenbus_device *dev)
+{
+   return 0;
+}
+
+static int pvcalls_back_uevent(struct xenbus_device *xdev,
+  struct kobj_uevent_env *env)
+{
+   return 0;
+}
+
+static const struct xenbus_device_id pvcalls_back_ids[] = {
+   { "pvcalls" },
+   { "" }
+};
+
+static struct xenbus_driver pvcalls_back_driver = {
+   .ids = pvcalls_back_ids,
+   .probe = pvcalls_back_probe,
+   .remove = pvcalls_back_remove,
+   .uevent = pvcalls_back_uevent,
+   .otherend_changed = pvcalls_back_changed,
+};
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 11/18] xen/pvcalls: implement accept command

2017-05-15 Thread Stefano Stabellini
Implement the accept command by calling inet_accept. To avoid blocking
in the kernel, call inet_accept(O_NONBLOCK) from a workqueue, which gets
scheduled on sk_data_ready (for a passive socket, it means that there
are connections to accept).

Use the reqcopy field to store the request. Accept the new socket from
the delayed work function, create a new sock_mapping for it, map
the indexes page and data ring, and reply to the other end. Choose an
ioworker for the socket randomly.

Only support one outstanding blocking accept request for every socket at
any time.

Add a field to sock_mapping to remember the passive socket from which an
active socket was created.
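
The request-side half (pvcalls_back_accept() itself is cut off in this
posting) is expected to stash the request in reqcopy under copy_lock --
which is also what enforces the one-outstanding-accept rule -- and then
kick the delayed work; roughly:

    spin_lock_irqsave(&mappass->copy_lock, flags);
    if (mappass->reqcopy.cmd != 0) {
        /* a blocking accept is already outstanding on this socket */
        spin_unlock_irqrestore(&mappass->copy_lock, flags);
        goto out_error;                /* error value omitted in this sketch */
    }
    mappass->reqcopy = *req;
    spin_unlock_irqrestore(&mappass->copy_lock, flags);
    queue_work(mappass->wq, &mappass->register_work);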

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 156 +
 1 file changed, 156 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index a762877..d8e0a60 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -67,6 +67,7 @@ struct sock_mapping {
struct list_head list;
struct list_head queue;
struct pvcalls_back_priv *priv;
+   struct sockpass_mapping *sockpass;
struct socket *sock;
int data_worker;
uint64_t id;
@@ -263,10 +264,128 @@ static int pvcalls_back_release(struct xenbus_device 
*dev,
 
 static void __pvcalls_back_accept(struct work_struct *work)
 {
+   struct sockpass_mapping *mappass = container_of(
+   work, struct sockpass_mapping, register_work);
+   struct sock_mapping *map;
+   struct pvcalls_ioworker *iow;
+   struct pvcalls_back_priv *priv;
+   struct xen_pvcalls_response *rsp;
+   struct xen_pvcalls_request *req;
+   void *page = NULL;
+   int notify;
+   int ret = -EINVAL;
+   unsigned long flags;
+
+   priv = mappass->priv;
+   /* We only need to check the value of "cmd" atomically on read. */
+   spin_lock_irqsave(&mappass->copy_lock, flags);
+   req = &mappass->reqcopy;
+   if (req->cmd != PVCALLS_ACCEPT) {
+   spin_unlock_irqrestore(&mappass->copy_lock, flags);
+   return;
+   }
+   spin_unlock_irqrestore(&mappass->copy_lock, flags);
+
+   map = kzalloc(sizeof(*map), GFP_KERNEL);
+   if (map == NULL) {
+   ret = -ENOMEM;
+   goto out_error;
+   }
+
+   map->sock = sock_alloc();
+   if (!map->sock)
+   goto out_error;
+
+   INIT_LIST_HEAD(&map->queue);
+   map->data_worker = get_random_int() % pvcalls_back_global.nr_ioworkers;
+   map->ref = req->u.accept.ref;
+
+   map->priv = priv;
+   map->sockpass = mappass;
+   map->sock->type = mappass->sock->type;
+   map->sock->ops = mappass->sock->ops;
+   map->id = req->u.accept.id_new;
+
+   ret = xenbus_map_ring_valloc(priv->dev, &req->u.accept.ref, 1, &page);
+   if (ret < 0)
+   goto out_error;
+   map->ring = page;
+   map->ring_order = map->ring->ring_order;
+   /* first read the order, then map the data ring */
+   virt_rmb();
+   if (map->ring_order > MAX_RING_ORDER) {
+   ret = -EFAULT;
+   goto out_error;
+   }
+   ret = xenbus_map_ring_valloc(priv->dev, map->ring->ref,
+(1 << map->ring_order), &page);
+   if (ret < 0)
+   goto out_error;
+   map->bytes = page;
+
+   ret = bind_interdomain_evtchn_to_irqhandler(priv->dev->otherend_id,
+   req->u.accept.evtchn,
+   pvcalls_back_conn_event,
+   0,
+   "pvcalls-backend",
+   map);
+   if (ret < 0)
+   goto out_error;
+   map->irq = ret;
+
+   map->data.in = map->bytes;
+   map->data.out = map->bytes + XEN_FLEX_RING_SIZE(map->ring_order);
+
+   down_write(&priv->pvcallss_lock);
+   list_add_tail(&map->list, &priv->socket_mappings);
+   up_write(&priv->pvcallss_lock);
+
+   ret = inet_accept(mappass->sock, map->sock, O_NONBLOCK, true);
+   if (ret == -EAGAIN)
+   goto out_error;
+
+   lock_sock(map->sock->sk);
+   map->saved_data_ready = map->sock->sk->sk_data_ready;
+   map->sock->sk->sk_user_data = map;
+   map->sock->sk->sk_data_ready = pvcalls_sk_data_ready;
+   map->sock->sk->sk_state_change = pvcalls_sk_state_change;
+   release_sock(map->sock->sk);
+
+   iow = &pvcalls_back_global.ioworkers[map->data_worker];
+   spin_lock_irqsave(&iow->lock, flags);
+   atomic_inc(&map->read);
+   if (list_empty(&map->queue))
+   list_add_tail(&map->queue, &iow->wqs);
+   spin_unlock_irqrestore(&iow->lock, flags);
+   atomic_inc(&iow->io);
+   queue_work_on(map->data_worker, pvcalls_back_global.wq,
&iow->register_work);
+
+out_error:
+   if (ret < 0)

[Xen-devel] [PATCH 16/18] xen/pvcalls: implement read

2017-05-15 Thread Stefano Stabellini
When an active socket has data available, add the corresponding sock_mapping
to the ioworker list, increment the io and read counters, and schedule
the ioworker.

Implement the read function by reading from the socket, writing the data
to the data ring.

Set in_error on error.
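
The ring-index helpers used below (pvcalls_mask/pvcalls_queued) are
introduced elsewhere in the series; with ring sizes always a power of
two (XEN_FLEX_RING_SIZE), they presumably reduce to something like:

    static RING_IDX pvcalls_mask(RING_IDX idx, RING_IDX ring_size)
    {
        return idx & (ring_size - 1);  /* wrap a free-running index */
    }

    static RING_IDX pvcalls_queued(RING_IDX prod, RING_IDX cons,
                                   RING_IDX ring_size)
    {
        return prod - cons;            /* bytes currently in the ring */
    }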

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 89 ++
 1 file changed, 89 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index db3e02c..0f715a8 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -102,6 +102,79 @@ static int pvcalls_back_release_active(struct 
xenbus_device *dev,
 
 static void pvcalls_conn_back_read(unsigned long opaque)
 {
+   struct sock_mapping *map = (struct sock_mapping *)opaque;
+   struct msghdr msg;
+   struct kvec vec[2];
+   RING_IDX cons, prod, size, wanted, array_size, masked_prod, masked_cons;
+   int32_t error;
+   struct pvcalls_data_intf *intf = map->ring;
+   struct pvcalls_data *data = &map->data;
+   int ret;
+
+   array_size = XEN_FLEX_RING_SIZE(map->ring_order);
+   cons = intf->in_cons;
+   prod = intf->in_prod;
+   error = intf->in_error;
+   /* read the indexes first, then deal with the data */
+   virt_mb();
+
+   if (error)
+   return;
+
+   size = pvcalls_queued(prod, cons, array_size);
+   if (size >= array_size)
+   return;
+   lock_sock(map->sock->sk);
+   if (skb_queue_empty(&map->sock->sk->sk_receive_queue)) {
+   atomic_set(&map->read, 0);
+   release_sock(map->sock->sk);
+   return;
+   }
+   release_sock(map->sock->sk);
+   wanted = array_size - size;
+   masked_prod = pvcalls_mask(prod, array_size);
+   masked_cons = pvcalls_mask(cons, array_size);
+
+   memset(&msg, 0, sizeof(msg));
+   msg.msg_iter.type = ITER_KVEC|WRITE;
+   msg.msg_iter.count = wanted;
+   if (masked_prod < masked_cons) {
+   vec[0].iov_base = data->in + masked_prod;
+   vec[0].iov_len = wanted;
+   msg.msg_iter.kvec = vec;
+   msg.msg_iter.nr_segs = 1;
+   } else {
+   vec[0].iov_base = data->in + masked_prod;
+   vec[0].iov_len = array_size - masked_prod;
+   vec[1].iov_base = data->in;
+   vec[1].iov_len = wanted - vec[0].iov_len;
+   msg.msg_iter.kvec = vec;
+   msg.msg_iter.nr_segs = 2;
+   }
+
+   atomic_set(&map->read, 0);
+   ret = inet_recvmsg(map->sock, &msg, wanted, MSG_DONTWAIT);
+   WARN_ON(ret > 0 && ret > wanted);
+   if (ret == -EAGAIN) /* shouldn't happen */
+   return;
+   if (!ret)
+   ret = -ENOTCONN;
+   lock_sock(map->sock->sk);
+   if (ret > 0 && !skb_queue_empty(&map->sock->sk->sk_receive_queue))
+   atomic_inc(&map->read);
+   release_sock(map->sock->sk);
+
+   /* write the data, then modify the indexes */
+   virt_wmb();
+   if (ret < 0)
+   intf->in_error = ret;
+   else
+   intf->in_prod = prod + ret;
+   /* update the indexes, then notify the other end */
+   virt_wmb();
+   notify_remote_via_irq(map->irq);
+
+   return;
 }
 
 static int pvcalls_conn_back_write(struct sock_mapping *map)
@@ -192,6 +265,22 @@ static void pvcalls_sk_state_change(struct sock *sock)
 
 static void pvcalls_sk_data_ready(struct sock *sock)
 {
+   struct sock_mapping *map = sock->sk_user_data;
+   struct pvcalls_ioworker *iow;
+   unsigned long flags;
+
+   if (map == NULL)
+   return;
+
+   iow = &pvcalls_back_global.ioworkers[map->data_worker];
+   spin_lock_irqsave(&iow->lock, flags);
+   atomic_inc(&map->read);
+   if (list_empty(&map->queue))
+   list_add_tail(&map->queue, &iow->wqs);
+   spin_unlock_irqrestore(&iow->lock, flags);
+   atomic_inc(&iow->io);
+   queue_work_on(map->data_worker, pvcalls_back_global.wq,
+   &iow->register_work);
 }
 
 static int pvcalls_back_connect(struct xenbus_device *dev,
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 14/18] xen/pvcalls: disconnect and module_exit

2017-05-15 Thread Stefano Stabellini
Implement backend_disconnect. Call pvcalls_back_release_active on active
sockets and pvcalls_back_release_passive on passive sockets.

Implement module_exit by calling backend_disconnect on frontend
connections.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 48 ++
 1 file changed, 48 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 22c6426..0daa90a 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -855,6 +855,35 @@ static int backend_connect(struct xenbus_device *dev)
 
 static int backend_disconnect(struct xenbus_device *dev)
 {
+   struct pvcalls_back_priv *priv;
+   struct sock_mapping *map, *n;
+   struct sockpass_mapping *mappass;
+   struct radix_tree_iter iter;
+   void **slot;
+
+
+   priv = dev_get_drvdata(&dev->dev);
+
+   list_for_each_entry_safe(map, n, &priv->socket_mappings, list) {
+   pvcalls_back_release_active(dev, priv, map);
+   }
+   radix_tree_for_each_slot(slot, &priv->socketpass_mappings, &iter, 0) {
+   mappass = radix_tree_deref_slot(slot);
+   if (!mappass || radix_tree_exception(mappass)) {
+   if (radix_tree_deref_retry(mappass)) {
+   slot = radix_tree_iter_retry(&iter);
+   continue;
+   }
+   } else
+   pvcalls_back_release_passive(dev, priv, mappass);
+   }
+   xenbus_unmap_ring_vfree(dev, (void *)priv->sring);
+   unbind_from_irqhandler(priv->irq, dev);
+   list_del(&priv->list);
+   destroy_workqueue(priv->wq);
+   kfree(priv);
+   dev_set_drvdata(&dev->dev, NULL);
+
return 0;
 }
 
@@ -1056,3 +1085,22 @@ static int __init pvcalls_back_init(void)
return -ENOMEM;
 }
 module_init(pvcalls_back_init);
+
+static void __exit pvcalls_back_fin(void)
+{
+   struct pvcalls_back_priv *priv, *npriv;
+
+   down_write(&pvcalls_back_global.privs_lock);
+   list_for_each_entry_safe(priv, npriv, &pvcalls_back_global.privs,
+list) {
+   backend_disconnect(priv->dev);
+   }
+   up_write(&pvcalls_back_global.privs_lock);
+
+   xenbus_unregister_driver(&pvcalls_back_driver);
+   destroy_workqueue(pvcalls_back_global.wq);
+   kfree(pvcalls_back_global.ioworkers);
+   memset(&pvcalls_back_global, 0, sizeof(pvcalls_back_global));
+}
+
+module_exit(pvcalls_back_fin);
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 17/18] xen/pvcalls: implement write

2017-05-15 Thread Stefano Stabellini
When the other end notifies us that there is data to be written
(pvcalls_back_conn_event), add the corresponding sock_mapping to the ioworker
list, increment the io and write counters, and schedule the ioworker.

Implement the write function called by ioworker by reading the data from
the data ring, writing it to the socket by calling inet_sendmsg.

Set out_error on error.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 80 +-
 1 file changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 0f715a8..2de43c3 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -179,7 +179,67 @@ static void pvcalls_conn_back_read(unsigned long opaque)
 
 static int pvcalls_conn_back_write(struct sock_mapping *map)
 {
-   return 0;
+   struct pvcalls_data_intf *intf = map->ring;
+   struct pvcalls_data *data = &map->data;
+   struct msghdr msg;
+   struct kvec vec[2];
+   RING_IDX cons, prod, size, ring_size;
+   int ret;
+
+   cons = intf->out_cons;
+   prod = intf->out_prod;
+   /* read the indexes before dealing with the data */
+   virt_mb();
+
+   ring_size = XEN_FLEX_RING_SIZE(map->ring_order);
+   size = pvcalls_queued(prod, cons, ring_size);
+   if (size == 0)
+   return 0;
+
+   memset(&msg, 0, sizeof(msg));
+   msg.msg_flags |= MSG_DONTWAIT;
+   msg.msg_iter.type = ITER_KVEC|READ;
+   msg.msg_iter.count = size;
+   if (pvcalls_mask(prod, ring_size) > pvcalls_mask(cons, ring_size)) {
+   vec[0].iov_base = data->out + pvcalls_mask(cons, ring_size);
+   vec[0].iov_len = size;
+   msg.msg_iter.kvec = vec;
+   msg.msg_iter.nr_segs = 1;
+   } else {
+   vec[0].iov_base = data->out + pvcalls_mask(cons, ring_size);
+   vec[0].iov_len = XEN_FLEX_RING_SIZE(ring_size) -
+   pvcalls_mask(cons, ring_size);
+   vec[1].iov_base = data->out;
+   vec[1].iov_len = size - vec[0].iov_len;
+   msg.msg_iter.kvec = vec;
+   msg.msg_iter.nr_segs = 2;
+   }
+
+   atomic_set(&map->write, 0);
+   ret = inet_sendmsg(map->sock, &msg, size);
+   if (ret == -EAGAIN || ret < size) {
+   atomic_inc(&map->write);
+   atomic_inc(&pvcalls_back_global.ioworkers[map->data_worker].io);
+   }
+   if (ret == -EAGAIN)
+   return ret;
+
+   /* write the data, then update the indexes */
+   virt_wmb();
+   if (ret < 0) {
+   intf->out_error = ret;
+   } else {
+   intf->out_error = 0;
+   intf->out_cons = cons + ret;
+   prod = intf->out_prod;
+   }
+   /* update the indexes, then notify the other end */
+   virt_wmb();
+   if (prod != cons + ret)
+   atomic_inc(&map->write);
+   notify_remote_via_irq(map->irq);
+
+   return ret;
 }
 
 static void pvcalls_back_ioworker(struct work_struct *work)
@@ -914,6 +974,24 @@ static irqreturn_t pvcalls_back_event(int irq, void 
*dev_id)
 
 static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map)
 {
+   struct sock_mapping *map = sock_map;
+   struct pvcalls_ioworker *iow;
+   unsigned long flags;
+
+   if (map == NULL || map->sock == NULL || map->sock->sk == NULL ||
+   map->sock->sk->sk_user_data != map)
+   return IRQ_HANDLED;
+
+   iow = &pvcalls_back_global.ioworkers[map->data_worker];
+   spin_lock_irqsave(&iow->lock, flags);
+   atomic_inc(&map->write);
+   if (list_empty(&map->queue))
+   list_add_tail(&map->queue, &iow->wqs);
+   spin_unlock_irqrestore(&iow->lock, flags);
+   atomic_inc(&iow->io);
+   queue_work_on(map->data_worker, pvcalls_back_global.wq,
+   &iow->register_work);
+
return IRQ_HANDLED;
 }
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 10/18] xen/pvcalls: implement listen command

2017-05-15 Thread Stefano Stabellini
Call inet_listen to implement the listen command.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index ff4634d..a762877 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -345,7 +345,28 @@ static int pvcalls_back_bind(struct xenbus_device *dev,
 static int pvcalls_back_listen(struct xenbus_device *dev,
   struct xen_pvcalls_request *req)
 {
-   return 0;
+   struct pvcalls_back_priv *priv;
+   int ret = -EINVAL;
+   struct sockpass_mapping *map;
+   struct xen_pvcalls_response *rsp;
+
+   if (dev == NULL)
+   return 0;
+   priv = dev_get_drvdata(&dev->dev);
+
+   map = radix_tree_lookup(&priv->socketpass_mappings, req->u.listen.id);
+   if (map == NULL)
+   goto out;
+
+   ret = inet_listen(map->sock, req->u.listen.backlog);
+
+out:
+   rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
+   rsp->req_id = req->req_id;
+   rsp->cmd = req->cmd;
+   rsp->u.listen.id = req->u.listen.id;
+   rsp->ret = ret;
+   return 1;
 }
 
 static int pvcalls_back_accept(struct xenbus_device *dev,
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 08/18] xen/pvcalls: implement connect command

2017-05-15 Thread Stefano Stabellini
Allocate a socket. Keep track of socket <-> ring mappings with a new data
structure, called sock_mapping. Implement the connect command by calling
inet_stream_connect, and mapping the new indexes page and data ring.
Assign the socket to a randomly chosen ioworker.

When an active socket is closed (sk_state_change), set in_error to
-ENOTCONN and notify the other end, as specified by the protocol.

sk_data_ready will be implemented later.
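
As a rough illustration of what connect ends up mapping (names are from
this patch and the interface header; the byte-exact layout is defined by
the PV Calls specification, so treat this as a sketch only):

/*
 * Illustrative layout sketch, not the normative definition:
 *
 *   req->u.connect.ref
 *         |
 *         v
 *   one "indexes" page: struct pvcalls_data_intf
 *     in_cons/in_prod/in_error, out_cons/out_prod/out_error,
 *     ring_order, ref[1 << ring_order]  (grants for the data pages)
 *         |
 *         v
 *   data area of (1 << ring_order) pages, mapped at map->bytes:
 *     map->data.in  = map->bytes
 *                     (payload travelling backend -> frontend)
 *     map->data.out = map->bytes + XEN_FLEX_RING_SIZE(map->ring_order)
 *                     (payload travelling frontend -> backend, consumed
 *                      by the write path later in this series)
 */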

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 145 +
 1 file changed, 145 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 2eae096..9ac1cf2 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -63,6 +63,29 @@ struct pvcalls_back_priv {
struct work_struct register_work;
 };
 
+struct sock_mapping {
+   struct list_head list;
+   struct list_head queue;
+   struct pvcalls_back_priv *priv;
+   struct socket *sock;
+   int data_worker;
+   uint64_t id;
+   grant_ref_t ref;
+   struct pvcalls_data_intf *ring;
+   void *bytes;
+   struct pvcalls_data data;
+   uint32_t ring_order;
+   int irq;
+   atomic_t read;
+   atomic_t write;
+   atomic_t release;
+   void (*saved_data_ready)(struct sock *sk);
+};
+
+static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map);
+static int pvcalls_back_release_active(struct xenbus_device *dev,
+  struct pvcalls_back_priv *priv,
+  struct sock_mapping *map);
 static void pvcalls_back_ioworker(struct work_struct *work)
 {
 }
@@ -97,9 +120,126 @@ static int pvcalls_back_socket(struct xenbus_device *dev,
return 1;
 }
 
+static void pvcalls_sk_state_change(struct sock *sock)
+{
+   struct sock_mapping *map = sock->sk_user_data;
+   struct pvcalls_data_intf *intf;
+
+   if (map == NULL)
+   return;
+
+   intf = map->ring;
+   intf->in_error = -ENOTCONN;
+   notify_remote_via_irq(map->irq);
+}
+
+static void pvcalls_sk_data_ready(struct sock *sock)
+{
+}
+
 static int pvcalls_back_connect(struct xenbus_device *dev,
struct xen_pvcalls_request *req)
 {
+   struct pvcalls_back_priv *priv;
+   int ret;
+   struct socket *sock;
+   struct sock_mapping *map = NULL;
+   void *page;
+   struct xen_pvcalls_response *rsp;
+
+   if (dev == NULL)
+   return 0;
+   priv = dev_get_drvdata(&dev->dev);
+
+   map = kzalloc(sizeof(*map), GFP_KERNEL);
+   if (map == NULL) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   ret = sock_create(AF_INET, SOCK_STREAM, 0, &sock);
+   if (ret < 0) {
+   kfree(map);
+   goto out;
+   }
+   INIT_LIST_HEAD(&map->queue);
+   map->data_worker = get_random_int() % pvcalls_back_global.nr_ioworkers;
+
+   map->priv = priv;
+   map->sock = sock;
+   map->id = req->u.connect.id;
+   map->ref = req->u.connect.ref;
+
+   ret = xenbus_map_ring_valloc(dev, &req->u.connect.ref, 1, &page);
+   if (ret < 0) {
+   sock_release(map->sock);
+   kfree(map);
+   goto out;
+   }
+   map->ring = page;
+   map->ring_order = map->ring->ring_order;
+   /* first read the order, then map the data ring */
+   virt_rmb();
+   if (map->ring_order > MAX_RING_ORDER) {
+   ret = -EFAULT;
+   goto out;
+   }
+   ret = xenbus_map_ring_valloc(dev, map->ring->ref,
+(1 << map->ring_order), &page);
+   if (ret < 0) {
+   sock_release(map->sock);
+   xenbus_unmap_ring_vfree(dev, map->ring);
+   kfree(map);
+   goto out;
+   }
+   map->bytes = page;
+
+   ret = bind_interdomain_evtchn_to_irqhandler(priv->dev->otherend_id,
+   req->u.connect.evtchn,
+   pvcalls_back_conn_event,
+   0,
+   "pvcalls-backend",
+   map);
+   if (ret < 0) {
+   sock_release(map->sock);
+   kfree(map);
+   goto out;
+   }
+   map->irq = ret;
+
+   map->data.in = map->bytes;
+   map->data.out = map->bytes + XEN_FLEX_RING_SIZE(map->ring_order);
+
+   down_write(&priv->pvcallss_lock);
+   list_add_tail(&map->list, &priv->socket_mappings);
+   up_write(&priv->pvcallss_lock);
+
+   ret = inet_stream_connect(sock, (struct sockaddr *)&req->u.connect.addr,
+ req->u.connect.len, req->u.connect.flags);
+   if (ret < 0) {
+   pvcalls_back_release_active(dev, 

[Xen-devel] [PATCH 15/18] xen/pvcalls: introduce the ioworker

2017-05-15 Thread Stefano Stabellini
We have one ioworker per cpu core. Each ioworker gets assigned active
sockets randomly. Once a socket is assigned to an ioworker, it remains
tied to it until it is released.

Each ioworker goes through the list of outstanding read/write requests
by walking a list of struct sock_mapping. Once a request has been dealt
with, the struct sock_mapping is removed from the list.

We use one atomic counter per socket for "read" operations and one
for "write" operations to keep track of the reads/writes to do.

We also use one atomic counter ("io") per ioworker to keep track of how
many outstanding requests we have in total assigned to the ioworker. The
ioworker finishes when there are none.
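
Pieced together with the event handlers added later in the series, the
intended counter handshake looks roughly like this (an illustrative
summary, not additional code to apply):

/*
 * producer side (event handlers, later patches):
 *   atomic_inc(&map->read) or atomic_inc(&map->write);
 *   list_add_tail(&map->queue, &ioworker->wqs);
 *   atomic_inc(&ioworker->io);
 *   queue_work_on(map->data_worker, wq, &ioworker->register_work);
 *
 * consumer side (pvcalls_back_ioworker below):
 *   while (atomic_read(&ioworker->io) > 0)
 *     service every queued map whose read/write counter is non-zero,
 *     drop it from the list once both counters are back to zero,
 *     then atomic_dec(&ioworker->io);
 */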

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 44 
 1 file changed, 44 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 0daa90a..db3e02c 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -99,8 +99,52 @@ struct sockpass_mapping {
 static int pvcalls_back_release_active(struct xenbus_device *dev,
   struct pvcalls_back_priv *priv,
   struct sock_mapping *map);
+
+static void pvcalls_conn_back_read(unsigned long opaque)
+{
+}
+
+static int pvcalls_conn_back_write(struct sock_mapping *map)
+{
+   return 0;
+}
+
 static void pvcalls_back_ioworker(struct work_struct *work)
 {
+   struct pvcalls_ioworker *ioworker = container_of(work,
+   struct pvcalls_ioworker, register_work);
+   int num = ioworker->num;
+   struct sock_mapping *map, *n;
+   unsigned long flags;
+
+   while (atomic_read(&ioworker->io) > 0) {
+   spin_lock_irqsave(&ioworker->lock, flags);
+   list_for_each_entry_safe(map, n, &ioworker->wqs, queue) {
+   if (map->data_worker != num)
+   continue;
+
+   if (atomic_read(&map->release) > 0) {
+   list_del_init(&map->queue);
+   atomic_set(&map->release, 0);
+   continue;
+   }
+
+   spin_unlock_irqrestore(&ioworker->lock, flags);
+   if (atomic_read(&map->read) > 0)
+   pvcalls_conn_back_read((unsigned long)map);
+   if (atomic_read(&map->write) > 0)
+   pvcalls_conn_back_write(map);
+   spin_lock_irqsave(&ioworker->lock, flags);
+
+   if (atomic_read(&map->read) == 0 &&
+   atomic_read(&map->write) == 0) {
+   list_del_init(&map->queue);
+   atomic_set(&map->release, 0);
+   }
+   }
+   atomic_dec(&ioworker->io);
+   spin_unlock_irqrestore(&ioworker->lock, flags);
+   }
 }
 
 static int pvcalls_back_socket(struct xenbus_device *dev,
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 04/18] xen/pvcalls: xenbus state handling

2017-05-15 Thread Stefano Stabellini
Introduce the code to handle xenbus state changes.

Implement the probe function for the pvcalls backend. Write the
supported versions, max-page-order and function-calls nodes to xenstore,
as required by the protocol.
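
As a concrete illustration, for a hypothetical backend in domain 0
serving frontend domain 1, device 0, the probe function below would leave
xenstore nodes along these lines (the path and the domain/device ids are
assumptions made up for the example, not taken from this patch; the node
names follow the PV Calls spec):

/*
 * Hypothetical xenstore state after probe:
 *
 *   /local/domain/0/backend/pvcalls/1/0/versions       = "1"
 *   /local/domain/0/backend/pvcalls/1/0/max-page-order = "<MAX_RING_ORDER>"
 *   /local/domain/0/backend/pvcalls/1/0/function-calls = "1"
 */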

Introduce stub functions for disconnecting/connecting to a frontend.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 133 +
 1 file changed, 133 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 46a889a..86eca19 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -25,6 +25,9 @@
 #include 
 #include 
 
+#define PVCALLS_VERSIONS "1"
+#define MAX_RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
+
 struct pvcalls_ioworker {
struct work_struct register_work;
atomic_t io;
@@ -45,15 +48,145 @@ static void pvcalls_back_ioworker(struct work_struct *work)
 {
 }
 
+static int backend_connect(struct xenbus_device *dev)
+{
+   return 0;
+}
+
+static int backend_disconnect(struct xenbus_device *dev)
+{
+   return 0;
+}
+
 static int pvcalls_back_probe(struct xenbus_device *dev,
  const struct xenbus_device_id *id)
 {
+   int err;
+
+   err = xenbus_printf(XBT_NIL, dev->nodename, "versions", "%s",
+   PVCALLS_VERSIONS);
+   if (err) {
+   pr_warn("%s write out 'version' failed\n", __func__);
+   return -EINVAL;
+   }
+
+   err = xenbus_printf(XBT_NIL, dev->nodename, "max-page-order", "%u",
+   MAX_RING_ORDER);
+   if (err) {
+   pr_warn("%s write out 'max-page-order' failed\n", __func__);
+   return -EINVAL;
+   }
+
+   /* "1" means socket, connect, release, bind, listen, accept and poll*/
+   err = xenbus_printf(XBT_NIL, dev->nodename, "function-calls", "1");
+   if (err) {
+   pr_warn("%s write out 'function-calls' failed\n", __func__);
+   return -EINVAL;
+   }
+
+   err = xenbus_switch_state(dev, XenbusStateInitWait);
+   if (err)
+   return err;
+
return 0;
 }
 
+static void set_backend_state(struct xenbus_device *dev,
+ enum xenbus_state state)
+{
+   while (dev->state != state) {
+   switch (dev->state) {
+   case XenbusStateClosed:
+   switch (state) {
+   case XenbusStateInitWait:
+   case XenbusStateConnected:
+   xenbus_switch_state(dev, XenbusStateInitWait);
+   break;
+   case XenbusStateClosing:
+   xenbus_switch_state(dev, XenbusStateClosing);
+   break;
+   default:
+   __WARN();
+   }
+   break;
+   case XenbusStateInitWait:
+   case XenbusStateInitialised:
+   switch (state) {
+   case XenbusStateConnected:
+   backend_connect(dev);
+   xenbus_switch_state(dev, XenbusStateConnected);
+   break;
+   case XenbusStateClosing:
+   case XenbusStateClosed:
+   xenbus_switch_state(dev, XenbusStateClosing);
+   break;
+   default:
+   __WARN();
+   }
+   break;
+   case XenbusStateConnected:
+   switch (state) {
+   case XenbusStateInitWait:
+   case XenbusStateClosing:
+   case XenbusStateClosed:
+   down_write(&pvcalls_back_global.privs_lock);
+   backend_disconnect(dev);
+   up_write(&pvcalls_back_global.privs_lock);
+   xenbus_switch_state(dev, XenbusStateClosing);
+   break;
+   default:
+   __WARN();
+   }
+   break;
+   case XenbusStateClosing:
+   switch (state) {
+   case XenbusStateInitWait:
+   case XenbusStateConnected:
+   case XenbusStateClosed:
+   xenbus_switch_state(dev, XenbusStateClosed);
+   break;
+   default:
+   __WARN();
+   }
+   break;
+   default:
+   __WARN();
+   }
+   }
+}
+
 static void 

[Xen-devel] [PATCH 18/18] xen: introduce a Kconfig option to enable the pvcalls backend

2017-05-15 Thread Stefano Stabellini
Also add pvcalls-back to the Makefile.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/Kconfig  | 12 
 drivers/xen/Makefile |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index f15bb3b7..bbdf059 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -196,6 +196,18 @@ config XEN_PCIDEV_BACKEND
 
  If in doubt, say m.
 
+config XEN_PVCALLS_BACKEND
+   bool "XEN PV Calls backend driver"
+   depends on INET && XEN
+   default n
+   help
+ Experimental backend for the Xen PV Calls protocol
+ (https://xenbits.xen.org/docs/unstable/misc/pvcalls.html). It
+ allows PV Calls frontends to send POSIX calls to the backend,
+ which implements them.
+
+ If in doubt, say n.
+
 config XEN_SCSI_BACKEND
tristate "XEN SCSI backend driver"
depends on XEN && XEN_BACKEND && TARGET_CORE
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 8feab810..480b928 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_XEN_ACPI_PROCESSOR)  += xen-acpi-processor.o
 obj-$(CONFIG_XEN_EFI)  += efi.o
 obj-$(CONFIG_XEN_SCSI_BACKEND) += xen-scsiback.o
 obj-$(CONFIG_XEN_AUTO_XLATE)   += xlate_mmu.o
+obj-$(CONFIG_XEN_PVCALLS_BACKEND)  += pvcalls-back.o
 xen-evtchn-y   := evtchn.o
 xen-gntdev-y   := gntdev.o
 xen-gntalloc-y := gntalloc.o
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 12/18] xen/pvcalls: implement poll command

2017-05-15 Thread Stefano Stabellini
Implement poll on passive sockets by requesting a delayed response with
mappass->reqcopy, and reply back when there is data on the passive
socket.

Poll on active sockets is not implemented, as specified by the protocol:
the frontend should just wait for events and check the indexes on the
indexes page.

Only support one outstanding poll (or accept) request for every passive
socket at any given time.
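
For contrast with passive sockets, a hypothetical frontend-side check on
an active socket would just look at the in-ring indexes, along these
lines (a sketch only, using the interface header types; this helper is
not part of this series):

/* Hypothetical frontend-side helper: an active socket has pending data
 * when the in-ring producer index has moved past the consumer index. */
static bool sketch_active_sock_has_data(struct pvcalls_data_intf *intf)
{
	RING_IDX cons = intf->in_cons;
	RING_IDX prod = intf->in_prod;

	/* read the indexes before acting on them */
	virt_rmb();

	return prod != cons || intf->in_error;
}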

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 70 +-
 1 file changed, 69 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index d8e0a60..d5b7412 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -381,11 +381,30 @@ static void __pvcalls_back_accept(struct work_struct 
*work)
 static void pvcalls_pass_sk_data_ready(struct sock *sock)
 {
struct sockpass_mapping *mappass = sock->sk_user_data;
+   struct pvcalls_back_priv *priv;
+   struct xen_pvcalls_response *rsp;
+   unsigned long flags;
+   int notify;
 
if (mappass == NULL)
return;
 
-   queue_work(mappass->wq, &mappass->register_work);
+   priv = mappass->priv;
+   spin_lock_irqsave(&mappass->copy_lock, flags);
+   if (mappass->reqcopy.cmd == PVCALLS_POLL) {
+   rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
+   rsp->req_id = mappass->reqcopy.req_id;
+   rsp->u.poll.id = mappass->reqcopy.u.poll.id;
+   rsp->cmd = mappass->reqcopy.cmd;
+   rsp->ret = 0;
+
+   mappass->reqcopy.cmd = 0;
+   RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&priv->ring, notify);
+   if (notify)
+   notify_remote_via_irq(mappass->priv->irq);
+   } else
+   queue_work(mappass->wq, &mappass->register_work);
+   spin_unlock_irqrestore(&mappass->copy_lock, flags);
 }
 
 static int pvcalls_back_bind(struct xenbus_device *dev,
@@ -534,6 +553,55 @@ static int pvcalls_back_accept(struct xenbus_device *dev,
 static int pvcalls_back_poll(struct xenbus_device *dev,
 struct xen_pvcalls_request *req)
 {
+   struct pvcalls_back_priv *priv;
+   struct sockpass_mapping *mappass;
+   struct xen_pvcalls_response *rsp;
+   struct inet_connection_sock *icsk;
+   struct request_sock_queue *queue;
+   unsigned long flags;
+   int ret;
+   bool data;
+
+   if (dev == NULL)
+   return 0;
+   priv = dev_get_drvdata(&dev->dev);
+
+   mappass = radix_tree_lookup(&priv->socketpass_mappings, req->u.poll.id);
+   if (mappass == NULL)
+   return 0;
+
+   /*
+* Limitation of the current implementation: only support one
+* concurrent accept or poll call on one socket.
+*/
+   spin_lock_irqsave(&mappass->copy_lock, flags);
+   if (mappass->reqcopy.cmd != 0) {
+   ret = -EINTR;
+   goto out;
+   }
+
+   mappass->reqcopy = *req;
+   lock_sock(mappass->sock->sk);
+   icsk = inet_csk(mappass->sock->sk);
+   queue = &icsk->icsk_accept_queue;
+   data = queue->rskq_accept_head != NULL;
+   release_sock(mappass->sock->sk);
+   if (data) {
+   mappass->reqcopy.cmd = 0;
+   ret = 0;
+   goto out;
+   }
+
+   return 0;
+
+out:
+   spin_unlock_irqrestore(&mappass->copy_lock, flags);
+
+   rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
+   rsp->req_id = req->req_id;
+   rsp->cmd = req->cmd;
+   rsp->u.poll.id = req->u.poll.id;
+   rsp->ret = ret;
return 0;
 }
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 06/18] xen/pvcalls: handle commands from the frontend

2017-05-15 Thread Stefano Stabellini
When the other end notifies us that there are commands to be read
(pvcalls_back_event), wake up the backend thread to parse the command.

The command ring works like most other Xen rings, so use the usual
ring macros to read and write to it. The functions implementing the
commands are empty stubs for now.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 115 +
 1 file changed, 115 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 876e577..2b2a49a 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -62,12 +62,127 @@ static void pvcalls_back_ioworker(struct work_struct *work)
 {
 }
 
+static int pvcalls_back_socket(struct xenbus_device *dev,
+   struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_connect(struct xenbus_device *dev,
+   struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_release(struct xenbus_device *dev,
+   struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_bind(struct xenbus_device *dev,
+struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_listen(struct xenbus_device *dev,
+  struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_accept(struct xenbus_device *dev,
+  struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_poll(struct xenbus_device *dev,
+struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_handle_cmd(struct xenbus_device *dev,
+  struct xen_pvcalls_request *req)
+{
+   int ret = 0;
+
+   switch (req->cmd) {
+   case PVCALLS_SOCKET:
+   ret = pvcalls_back_socket(dev, req);
+   break;
+   case PVCALLS_CONNECT:
+   ret = pvcalls_back_connect(dev, req);
+   break;
+   case PVCALLS_RELEASE:
+   ret = pvcalls_back_release(dev, req);
+   break;
+   case PVCALLS_BIND:
+   ret = pvcalls_back_bind(dev, req);
+   break;
+   case PVCALLS_LISTEN:
+   ret = pvcalls_back_listen(dev, req);
+   break;
+   case PVCALLS_ACCEPT:
+   ret = pvcalls_back_accept(dev, req);
+   break;
+   case PVCALLS_POLL:
+   ret = pvcalls_back_poll(dev, req);
+   break;
+   default:
+   ret = -ENOTSUPP;
+   break;
+   }
+   return ret;
+}
+
 static void pvcalls_back_work(struct work_struct *work)
 {
+   struct pvcalls_back_priv *priv = container_of(work,
+   struct pvcalls_back_priv, register_work);
+   int notify, notify_all = 0, more = 1;
+   struct xen_pvcalls_request req;
+   struct xenbus_device *dev = priv->dev;
+
+   atomic_set(&priv->work, 1);
+
+   while (more || !atomic_dec_and_test(&priv->work)) {
+   while (RING_HAS_UNCONSUMED_REQUESTS(&priv->ring)) {
+   RING_COPY_REQUEST(&priv->ring,
+ priv->ring.req_cons++,
+ &req);
+
+   if (pvcalls_back_handle_cmd(dev, &req) > 0) {
+   RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(
+   &priv->ring, notify);
+   notify_all += notify;
+   }
+   }
+
+   if (notify_all)
+   notify_remote_via_irq(priv->irq);
+
+   RING_FINAL_CHECK_FOR_REQUESTS(&priv->ring, more);
+   }
 }
 
 static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
 {
+   struct xenbus_device *dev = dev_id;
+   struct pvcalls_back_priv *priv = NULL;
+
+   if (dev == NULL)
+   return IRQ_HANDLED;
+
+   priv = dev_get_drvdata(&dev->dev);
+   if (priv == NULL)
+   return IRQ_HANDLED;
+
+   atomic_inc(&priv->work);
+   queue_work(priv->wq, &priv->register_work);
+
return IRQ_HANDLED;
 }
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 05/18] xen/pvcalls: connect to a frontend

2017-05-15 Thread Stefano Stabellini
Introduce a per-frontend data structure named pvcalls_back_priv. It
contains pointers to the command ring, its event channel, a list of
active sockets and a tree of passive sockets (passive sockets need to be
looked up from the id on listen, accept and poll commands, while active
sockets only on release).

It also has an unbound workqueue to schedule the work of parsing and
executing commands on the command ring. pvcallss_lock protects the two
lists. In pvcalls_back_global, keep a list of connected frontends.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 87 ++
 1 file changed, 87 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 86eca19..876e577 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -44,13 +44,100 @@ struct pvcalls_back_global {
struct rw_semaphore privs_lock;
 } pvcalls_back_global;
 
+struct pvcalls_back_priv {
+   struct list_head list;
+   struct xenbus_device *dev;
+   struct xen_pvcalls_sring *sring;
+   struct xen_pvcalls_back_ring ring;
+   int irq;
+   struct list_head socket_mappings;
+   struct radix_tree_root socketpass_mappings;
+   struct rw_semaphore pvcallss_lock;
+   atomic_t work;
+   struct workqueue_struct *wq;
+   struct work_struct register_work;
+};
+
 static void pvcalls_back_ioworker(struct work_struct *work)
 {
 }
 
+static void pvcalls_back_work(struct work_struct *work)
+{
+}
+
+static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
+{
+   return IRQ_HANDLED;
+}
+
 static int backend_connect(struct xenbus_device *dev)
 {
+   int err, evtchn;
+   grant_ref_t ring_ref;
+   void *addr = NULL;
+   struct pvcalls_back_priv *priv = NULL;
+
+   priv = kzalloc(sizeof(struct pvcalls_back_priv), GFP_KERNEL);
+   if (!priv)
+   return -ENOMEM;
+
+   err = xenbus_scanf(XBT_NIL, dev->otherend, "port", "%u",
+  &evtchn);
+   if (err != 1) {
+   err = -EINVAL;
+   xenbus_dev_fatal(dev, err, "reading %s/event-channel",
+dev->otherend);
+   goto error;
+   }
+
+   err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-ref", "%u", _ref);
+   if (err != 1) {
+   err = -EINVAL;
+   xenbus_dev_fatal(dev, err, "reading %s/ring-ref",
+dev->otherend);
+   goto error;
+   }
+
+   err = xenbus_map_ring_valloc(dev, &ring_ref, 1, &addr);
+   if (err < 0)
+   goto error;
+
+   err = bind_interdomain_evtchn_to_irqhandler(dev->otherend_id, evtchn,
+   pvcalls_back_event, 0,
+   "pvcalls-backend", dev);
+   if (err < 0)
+   goto error;
+
+   priv->wq = alloc_workqueue("pvcalls_back_wq", WQ_UNBOUND, 1);
+   if (!priv->wq) {
+   err = -ENOMEM;
+   goto error;
+   }
+   INIT_WORK(&priv->register_work, pvcalls_back_work);
+   priv->dev = dev;
+   priv->sring = addr;
+   BACK_RING_INIT(&priv->ring, priv->sring, XEN_PAGE_SIZE * 1);
+   priv->irq = err;
+   INIT_LIST_HEAD(&priv->socket_mappings);
+   INIT_RADIX_TREE(&priv->socketpass_mappings, GFP_KERNEL);
+   init_rwsem(&priv->pvcallss_lock);
+   dev_set_drvdata(&dev->dev, priv);
+   down_write(&pvcalls_back_global.privs_lock);
+   list_add_tail(&priv->list, &pvcalls_back_global.privs);
+   up_write(&pvcalls_back_global.privs_lock);
+   queue_work(priv->wq, &priv->register_work);
+
return 0;
+
+ error:
+   if (addr != NULL)
+   xenbus_unmap_ring_vfree(dev, addr);
+   if (priv->wq)
+   destroy_workqueue(priv->wq);
+   unbind_from_irqhandler(priv->irq, dev);
+   kfree(priv);
+   return err;
 }
 
 static int backend_disconnect(struct xenbus_device *dev)
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 07/18] xen/pvcalls: implement socket command

2017-05-15 Thread Stefano Stabellini
Just reply with success to the other end for now. Delay the allocation
of the actual socket to bind and/or connect.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 2b2a49a..2eae096 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -12,12 +12,17 @@
  * GNU General Public License for more details.
  */
 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #include 
 #include 
@@ -65,7 +70,31 @@ static void pvcalls_back_ioworker(struct work_struct *work)
 static int pvcalls_back_socket(struct xenbus_device *dev,
struct xen_pvcalls_request *req)
 {
-   return 0;
+   struct pvcalls_back_priv *priv;
+   int ret;
+   struct xen_pvcalls_response *rsp;
+
+   if (dev == NULL)
+   return 0;
+   priv = dev_get_drvdata(&dev->dev);
+
+   if (req->u.socket.domain != AF_INET ||
+   req->u.socket.type != SOCK_STREAM ||
+   (req->u.socket.protocol != 0 &&
+req->u.socket.protocol != AF_INET))
+   ret = -EAFNOSUPPORT;
+   else
+   ret = 0;
+
+   /* leave the actual socket allocation for later */
+
+   rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++);
+   rsp->req_id = req->req_id;
+   rsp->cmd = req->cmd;
+   rsp->u.socket.id = req->u.socket.id;
+   rsp->ret = ret;
+
+   return 1;
 }
 
 static int pvcalls_back_connect(struct xenbus_device *dev,
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 00/18] introduce the Xen PV Calls backend

2017-05-15 Thread Stefano Stabellini
Hi all,

this series introduces the backend for the newly introduced PV Calls
protocol.

PV Calls is a paravirtualized protocol that allows the implementation of
a set of POSIX functions in a different domain. The PV Calls frontend
sends POSIX function calls to the backend, which implements them and
returns a value to the frontend and acts on the function call.

For more information about PV Calls, please read:

https://xenbits.xen.org/docs/unstable/misc/pvcalls.html
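
As a purely illustrative example of what travels over the command ring,
a frontend could fill in a request slot for the socket call like this,
using the structures from the interface header introduced in patch 01
(the id values are arbitrary and the helper name is made up for the
example):

#include <linux/net.h>
#include <linux/socket.h>
#include <xen/interface/io/pvcalls.h>

/* Illustrative only: a PVCALLS_SOCKET request as a frontend would queue
 * it on the command ring; the backend echoes req_id back in its response. */
static void sketch_fill_socket_request(struct xen_pvcalls_request *req)
{
	req->req_id = 1;                /* private to the guest, echoed back */
	req->cmd = PVCALLS_SOCKET;
	req->u.socket.id = 1;           /* opaque id chosen by the frontend */
	req->u.socket.domain = AF_INET;
	req->u.socket.type = SOCK_STREAM;
	req->u.socket.protocol = 0;
}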

I tried to split the source code into small pieces to make it easier to
read and understand. Please review!


Stefano Stabellini (18):
  xen: introduce the pvcalls interface header
  xen/pvcalls: introduce the pvcalls xenbus backend
  xen/pvcalls: initialize the module and register the xenbus backend
  xen/pvcalls: xenbus state handling
  xen/pvcalls: connect to a frontend
  xen/pvcalls: handle commands from the frontend
  xen/pvcalls: implement socket command
  xen/pvcalls: implement connect command
  xen/pvcalls: implement bind command
  xen/pvcalls: implement listen command
  xen/pvcalls: implement accept command
  xen/pvcalls: implement poll command
  xen/pvcalls: implement release command
  xen/pvcalls: disconnect and module_exit
  xen/pvcalls: introduce the ioworker
  xen/pvcalls: implement read
  xen/pvcalls: implement write
  xen: introduce a Kconfig option to enable the pvcalls backend

 drivers/xen/Kconfig|   12 +
 drivers/xen/Makefile   |1 +
 drivers/xen/pvcalls-back.c | 1317 
 include/xen/interface/io/pvcalls.h |  117 
 4 files changed, 1447 insertions(+)
 create mode 100644 drivers/xen/pvcalls-back.c
 create mode 100644 include/xen/interface/io/pvcalls.h

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 01/18] xen: introduce the pvcalls interface header

2017-05-15 Thread Stefano Stabellini
Introduce the C header file which defines the PV Calls interface. It is
imported from xen/include/public/io/pvcalls.h.

Signed-off-by: Stefano Stabellini 
CC: konrad.w...@oracle.com
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 include/xen/interface/io/pvcalls.h | 117 +
 1 file changed, 117 insertions(+)
 create mode 100644 include/xen/interface/io/pvcalls.h

diff --git a/include/xen/interface/io/pvcalls.h 
b/include/xen/interface/io/pvcalls.h
new file mode 100644
index 000..c438c1b
--- /dev/null
+++ b/include/xen/interface/io/pvcalls.h
@@ -0,0 +1,117 @@
+#ifndef __XEN_PUBLIC_IO_XEN_PVCALLS_H__
+#define __XEN_PUBLIC_IO_XEN_PVCALLS_H__
+
+#include 
+#include "xen/interface/io/ring.h"
+
+/*
+ * See docs/misc/pvcalls.markdown in xen.git for the full specification:
+ * https://xenbits.xen.org/docs/unstable/misc/pvcalls.html
+ */
+struct pvcalls_data_intf {
+RING_IDX in_cons, in_prod, in_error;
+
+uint8_t pad1[52];
+
+RING_IDX out_cons, out_prod, out_error;
+
+uint8_t pad2[52];
+
+RING_IDX ring_order;
+grant_ref_t ref[];
+};
+DEFINE_XEN_FLEX_RING(pvcalls);
+
+#define PVCALLS_SOCKET 0
+#define PVCALLS_CONNECT1
+#define PVCALLS_RELEASE2
+#define PVCALLS_BIND   3
+#define PVCALLS_LISTEN 4
+#define PVCALLS_ACCEPT 5
+#define PVCALLS_POLL   6
+
+struct xen_pvcalls_request {
+uint32_t req_id; /* private to guest, echoed in response */
+uint32_t cmd;/* command to execute */
+union {
+struct xen_pvcalls_socket {
+uint64_t id;
+uint32_t domain;
+uint32_t type;
+uint32_t protocol;
+} socket;
+struct xen_pvcalls_connect {
+uint64_t id;
+uint8_t addr[28];
+uint32_t len;
+uint32_t flags;
+grant_ref_t ref;
+uint32_t evtchn;
+} connect;
+struct xen_pvcalls_release {
+uint64_t id;
+uint8_t reuse;
+} release;
+struct xen_pvcalls_bind {
+uint64_t id;
+uint8_t addr[28];
+uint32_t len;
+} bind;
+struct xen_pvcalls_listen {
+uint64_t id;
+uint32_t backlog;
+} listen;
+struct xen_pvcalls_accept {
+uint64_t id;
+uint64_t id_new;
+grant_ref_t ref;
+uint32_t evtchn;
+} accept;
+struct xen_pvcalls_poll {
+uint64_t id;
+} poll;
+/* dummy member to force sizeof(struct xen_pvcalls_request)
+ * to match across archs */
+struct xen_pvcalls_dummy {
+uint8_t dummy[56];
+} dummy;
+} u;
+};
+
+struct xen_pvcalls_response {
+uint32_t req_id;
+uint32_t cmd;
+int32_t ret;
+uint32_t pad;
+union {
+struct _xen_pvcalls_socket {
+uint64_t id;
+} socket;
+struct _xen_pvcalls_connect {
+uint64_t id;
+} connect;
+struct _xen_pvcalls_release {
+uint64_t id;
+} release;
+struct _xen_pvcalls_bind {
+uint64_t id;
+} bind;
+struct _xen_pvcalls_listen {
+uint64_t id;
+} listen;
+struct _xen_pvcalls_accept {
+uint64_t id;
+} accept;
+struct _xen_pvcalls_poll {
+uint64_t id;
+} poll;
+struct _xen_pvcalls_dummy {
+uint8_t dummy[8];
+} dummy;
+} u;
+};
+
+DEFINE_RING_TYPES(xen_pvcalls, struct xen_pvcalls_request,
+  struct xen_pvcalls_response);
+
+#endif
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] Questions about PVHv2/HVMlite

2017-05-15 Thread Gary R Hook
So I've been slogging through online docs and the code, trying to 
understand where things stand with PVH.


I think my primary questions are:
  1) How do I identify a PVHv2/HVMlite guest?
  2) Or, perhaps more importantly, what distinguishes said guest?
I've got Xen 4.9 unstable built/installed/booted, and am running 4.10
kernels on my dom0 and guests.

I've gotten a guest booted, and a basic Ubuntu 14.04 installed from a
distro ISO onto a raw disk (a logical volume). All good.

If I use the example file /etc/xen/example.hvm to define a simple guest
(but no VGA: nographic=1), I see that I have a qemu instance running,
which I expect, along with some threads:

root  8523 1  0 14:31 ?00:00:03 
/usr/local/lib/xen/bin/qemu-system-i386 -xen-domid 17 -chardev socket

root  8779 2  0 14:31 ?00:00:00 [17.xvda-0]
root  8780 2  0 14:31 ?00:00:00 [vif17.0-q0-gues]
root  8781 2  0 14:31 ?00:00:00 [vif17.0-q0-deal]
root  8782 2  0 14:31 ?00:00:00 [vif17.0-q1-gues]
root  8783 2  0 14:31 ?00:00:00 [vif17.0-q1-deal]

All seems good. Now, I've read through the doc at
https://wiki.xen.org/wiki/Xen_Linux_PV_on_HVM_drivers
and when I log into the above guest, and run dmesg | egrep -i 
'xen|front' I get this output:


[0.00] DMI: Xen HVM domU, BIOS 4.9-rc 04/25/2017
[0.00] Hypervisor detected: Xen
[0.00] Xen version 4.9.
[0.00] Xen Platform PCI: I/O protocol version 1
[0.00] Netfront and the Xen platform PCI driver have been 
compiled for this kernel: unplug emulated NICs.
[0.00] Blkfront and the Xen platform PCI driver have been 
compiled for this kernel: unplug emulated disks.

[0.00] ACPI: RSDP 0x000F6800 24 (v02 Xen   )
[0.00] ACPI: XSDT 0xFC00A5B0 54 (v01 XenHVM 
 HVML )
[0.00] ACPI: FACP 0xFC00A2D0 F4 (v04 XenHVM 
 HVML )
[0.00] ACPI: DSDT 0xFC0012A0 008FAC (v02 XenHVM 
 INTL 20140214)
[0.00] ACPI: APIC 0xFC00A3D0 70 (v02 XenHVM 
 HVML )
[0.00] ACPI: HPET 0xFC00A4C0 38 (v01 XenHVM 
 HVML )
[0.00] ACPI: WAET 0xFC00A500 28 (v01 XenHVM 
 HVML )
[0.00] ACPI: SSDT 0xFC00A530 31 (v02 XenHVM 
 INTL 20140214)
[0.00] ACPI: SSDT 0xFC00A570 31 (v02 XenHVM 
 INTL 20140214)

[0.00] Booting paravirtualized kernel on Xen HVM
[0.00] xen: PV spinlocks enabled
[0.00] xen:events: Using FIFO-based ABI
[0.00] xen:events: Xen HVM callback vector for event delivery is 
enabled
[0.156221] clocksource: xen: mask: 0x max_cycles: 
0x1cd42e4dffb, max_idle_ns: 881590591483 ns

[0.156244] Xen: using vcpuop timer interface
[0.156253] installing Xen timer for CPU 0
[0.157188] installing Xen timer for CPU 1
[0.248050] xenbus: xs_reset_watches failed: -38
[0.292506] xen: --> pirq=16 -> irq=9 (gsi=9)
[0.464822] xen:balloon: Initialising balloon driver
[0.468089] xen_balloon: Initialising balloon driver
[0.476131] clocksource: Switched to clocksource xen
[0.491289] xen: --> pirq=17 -> irq=8 (gsi=8)
[0.491405] xen: --> pirq=18 -> irq=12 (gsi=12)
[0.491511] xen: --> pirq=19 -> irq=1 (gsi=1)
[0.491622] xen: --> pirq=20 -> irq=6 (gsi=6)
[1.058087] xen: --> pirq=21 -> irq=24 (gsi=24)
[1.058369] xen:grant_table: Grant tables using version 1 layout
[1.091277] blkfront: xvda: flush diskcache: enabled; persistent 
grants: enabled; indirect descriptors: enabled;

[1.100218] xen_netfront: Initialising Xen virtual ethernet driver
[1.173298] xenbus_probe_frontend: Device with no driver: device/vkbd/0
[2.692397] systemd[1]: Detected virtualization xen.
[3.453534] input: Xen Virtual Keyboard as /devices/virtual/input/input5
[3.454923] input: Xen Virtual Pointer as /devices/virtual/input/input6

Current Linux kernels contain PV drivers, as I understand it. And based
on the referenced document, the above messages would seem to imply that
this is a PVHv2 guest, at least according to how that document explains
identifying a PVH guest. But shouldn't this be an HVM guest, per the
example config file?

I get that the wiki is stale, so I gotta ask questions:

How do I identify/characterize a PVHv2/HVMlite guest on Xen 4.9?

What, precisely, -defines- one of these (PVHv2) guests?

Re: my prior question on documentation, how does the current tech
preview define one of these hybrid guests? What are the salient aspects
of said guests, and what is it that we want to do to create one?

My apologies if this is a simplistic question, but some clarification
would be greatly appreciated.

Gary

___
Xen-devel mailing 

[Xen-devel] [PATCH v8 4/5] shutdown: Add source information to SHUTDOWN and RESET

2017-05-15 Thread Eric Blake
Time to wire up all the call sites that request a shutdown or
reset to use the enum added in the previous patch.

It would have been less churn to keep the common case with no
arguments as meaning guest-triggered, and only modified the
host-triggered code paths, via a wrapper function, but then we'd
still have to audit that I didn't miss any host-triggered spots;
changing the signature forces us to double-check that I correctly
categorized all callers.

Since command line options can change whether a guest reset request
causes an actual reset vs. a shutdown, it's easy to also add the
information to reset requests.

Signed-off-by: Eric Blake 
Acked-by: David Gibson  [ppc parts]
Reviewed-by: Mark Cave-Ayland  [SPARC part]
Reviewed-by: Cornelia Huck  [s390x parts]

---
v8: rebase later in series
v7: no change
v6: defer event additions to later, add reviews of unchanged portions
v5: drop accidental addition of unrelated files
v4: s/ShutdownType/ShutdownCause/, no thanks to mingw header pollution
v3: retitle again, fix qemu-iotests, use enum rather than raw bool
in all callers
v2: retitle (was "event: Add signal information to SHUTDOWN"),
completely rework to post bool based on whether it is guest-initiated
v1: initial submission, exposing just Unix signals from host
---
 include/sysemu/sysemu.h |  4 ++--
 vl.c| 18 --
 hw/acpi/core.c  |  4 ++--
 hw/arm/highbank.c   |  4 ++--
 hw/arm/integratorcp.c   |  2 +-
 hw/arm/musicpal.c   |  2 +-
 hw/arm/omap1.c  | 10 ++
 hw/arm/omap2.c  |  2 +-
 hw/arm/spitz.c  |  2 +-
 hw/arm/stellaris.c  |  2 +-
 hw/arm/tosa.c   |  2 +-
 hw/i386/pc.c|  2 +-
 hw/i386/xen/xen-hvm.c   |  2 +-
 hw/input/pckbd.c|  4 ++--
 hw/ipmi/ipmi.c  |  4 ++--
 hw/isa/lpc_ich9.c   |  2 +-
 hw/mips/boston.c|  2 +-
 hw/mips/mips_malta.c|  2 +-
 hw/mips/mips_r4k.c  |  4 ++--
 hw/misc/arm_sysctl.c|  8 
 hw/misc/cbus.c  |  2 +-
 hw/misc/macio/cuda.c|  4 ++--
 hw/misc/slavio_misc.c   |  4 ++--
 hw/misc/zynq_slcr.c |  2 +-
 hw/pci-host/apb.c   |  4 ++--
 hw/pci-host/bonito.c|  2 +-
 hw/pci-host/piix.c  |  2 +-
 hw/ppc/e500.c   |  2 +-
 hw/ppc/mpc8544_guts.c   |  2 +-
 hw/ppc/ppc.c|  2 +-
 hw/ppc/ppc405_uc.c  |  2 +-
 hw/ppc/spapr_hcall.c|  2 +-
 hw/ppc/spapr_rtas.c |  4 ++--
 hw/s390x/ipl.c  |  2 +-
 hw/sh4/r2d.c|  2 +-
 hw/timer/etraxfs_timer.c|  2 +-
 hw/timer/m48t59.c   |  4 ++--
 hw/timer/milkymist-sysctl.c |  4 ++--
 hw/timer/pxa2xx_timer.c |  2 +-
 hw/watchdog/watchdog.c  |  2 +-
 hw/xenpv/xen_domainbuild.c  |  2 +-
 hw/xtensa/xtfpga.c  |  2 +-
 kvm-all.c   |  6 +++---
 os-win32.c  |  2 +-
 qmp.c   |  4 ++--
 replay/replay.c |  4 ++--
 target/alpha/sys_helper.c   |  4 ++--
 target/arm/psci.c   |  4 ++--
 target/i386/excp_helper.c   |  2 +-
 target/i386/hax-all.c   |  6 +++---
 target/i386/helper.c|  2 +-
 target/i386/kvm.c   |  2 +-
 target/s390x/helper.c   |  2 +-
 target/s390x/kvm.c  |  4 ++--
 target/s390x/misc_helper.c  |  4 ++--
 target/sparc/int32_helper.c |  2 +-
 ui/sdl.c|  2 +-
 ui/sdl2.c   |  4 ++--
 trace-events|  2 +-
 ui/cocoa.m  |  2 +-
 60 files changed, 98 insertions(+), 98 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 52102fd..e540e6f 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -62,13 +62,13 @@ typedef enum WakeupReason {
 QEMU_WAKEUP_REASON_OTHER,
 } WakeupReason;

-void qemu_system_reset_request(void);
+void qemu_system_reset_request(ShutdownCause reason);
 void qemu_system_suspend_request(void);
 void qemu_register_suspend_notifier(Notifier *notifier);
 void qemu_system_wakeup_request(WakeupReason reason);
 void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
 void qemu_register_wakeup_notifier(Notifier *notifier);
-void qemu_system_shutdown_request(void);
+void qemu_system_shutdown_request(ShutdownCause reason);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
 void qemu_system_debug_request(void);
diff --git a/vl.c b/vl.c
index 4641fdf..808c67b 100644
--- a/vl.c
+++ b/vl.c
@@ -1724,7 +1724,7 @@ void qemu_system_guest_panicked(GuestPanicInformation 
*info)
 if (!no_shutdown) {
 qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_POWEROFF,
!!info, info, _abort);
-qemu_system_shutdown_request();
+

[Xen-devel] [PATCH v8 2/5] shutdown: Prepare for use of an enum in reset/shutdown_request

2017-05-15 Thread Eric Blake
We want to track why a guest was shutdown; in particular, being able
to tell the difference between a guest request (such as ACPI request)
and host request (such as SIGINT) will prove useful to libvirt.
Since all requests eventually end up changing shutdown_requested in
vl.c, the logical change is to make that value track the reason,
rather than its current 0/1 contents.

Since command-line options control whether a reset request is turned
into a shutdown request instead, the same treatment is given to
reset_requested.

This patch adds an internal enum ShutdownCause that describes reasons
that a shutdown can be requested, and changes qemu_system_reset() to
pass the reason through, although for now nothing is actually changed
with regards to what gets reported.  The enum could be exported via
QAPI at a later date, if deemed necessary, but for now, there has not
been a request to expose that much detail to end clients.

For the most part, we turn 0 into SHUTDOWN_CAUSE_NONE, and 1 into
SHUTDOWN_CAUSE_HOST_ERROR; the only specific case where we have enough
information right now to use a different value is when we are reacting
to a host signal.  It will take a further patch to edit all call-sites
that can trigger a reset or shutdown request to properly pass in any
other reasons; this patch includes TODOs to point such places out.

qemu_system_reset() trades its 'bool report' parameter for a
'ShutdownCause reason', with all non-zero values having the same
effect; this lets us get rid of the weird #defines for VMRESET_*
as synonyms for bools.
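
To make the intended usage concrete, here is an illustrative pair of
call sites as they would look once the follow-up patch threads the enum
through (the wrapper function names are invented for the example; the
real conversions are in the later patch):

#include "sysemu/sysemu.h"

/* Illustrative only, not taken from the patch. */
static void example_sigint_handler(void)
{
    /* host-initiated shutdown, e.g. the admin pressed Ctrl-C */
    qemu_system_shutdown_request(SHUTDOWN_CAUSE_HOST_SIGNAL);
}

static void example_guest_requested_reset(void)
{
    /* guest-initiated; -no-reboot can turn this reset into a shutdown */
    qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
}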

Signed-off-by: Eric Blake 

---
v8: s/FIXME/TODO/, include SHUTDOWN_CAUSE__MAX now rather than later,
tweak comment on GUEST_SHUTDOWN to mention suspend
v7: drop 'bool report' from qemu_system_reset(), reorder enum to put
HOST_ERROR == 1, improve commit message
v6: make ShutdownCause internal-only, add SHUTDOWN_CAUSE_NONE so that
comparison to 0 still works, tweak initial FIXME values
v5: no change
v4: s/ShutdownType/ShutdownCause/, no thanks to mingw header pollution
v3: new patch
---
 include/sysemu/sysemu.h | 23 -
 vl.c| 53 ++---
 hw/i386/xen/xen-hvm.c   |  7 +--
 migration/colo.c|  2 +-
 migration/savevm.c  |  2 +-
 5 files changed, 58 insertions(+), 29 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 15656b7..52102fd 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -33,8 +33,21 @@ VMChangeStateEntry 
*qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
 void qemu_del_vm_change_state_handler(VMChangeStateEntry *e);
 void vm_state_notify(int running, RunState state);

-#define VMRESET_SILENT   false
-#define VMRESET_REPORT   true
+/* Enumeration of various causes for shutdown. */
+typedef enum ShutdownCause {
+SHUTDOWN_CAUSE_NONE,  /* No shutdown request pending */
+SHUTDOWN_CAUSE_HOST_ERROR,/* An error prevents further use of guest */
+SHUTDOWN_CAUSE_HOST_QMP,  /* Reaction to a QMP command, like 'quit' */
+SHUTDOWN_CAUSE_HOST_SIGNAL,   /* Reaction to a signal, such as SIGINT */
+SHUTDOWN_CAUSE_HOST_UI,   /* Reaction to UI event, like window close */
+SHUTDOWN_CAUSE_GUEST_SHUTDOWN,/* Guest shutdown/suspend request, via
+ ACPI or other hardware-specific means */
+SHUTDOWN_CAUSE_GUEST_RESET,   /* Guest reset request, and command line
+ turns that into a shutdown */
+SHUTDOWN_CAUSE_GUEST_PANIC,   /* Guest panicked, and command line turns
+ that into a shutdown */
+SHUTDOWN_CAUSE__MAX,
+} ShutdownCause;

 void vm_start(void);
 int vm_prepare_start(void);
@@ -62,10 +75,10 @@ void qemu_system_debug_request(void);
 void qemu_system_vmstop_request(RunState reason);
 void qemu_system_vmstop_request_prepare(void);
 bool qemu_vmstop_requested(RunState *r);
-int qemu_shutdown_requested_get(void);
-int qemu_reset_requested_get(void);
+ShutdownCause qemu_shutdown_requested_get(void);
+ShutdownCause qemu_reset_requested_get(void);
 void qemu_system_killed(int signal, pid_t pid);
-void qemu_system_reset(bool report);
+void qemu_system_reset(ShutdownCause reason);
 void qemu_system_guest_panicked(GuestPanicInformation *info);
 size_t qemu_target_page_size(void);

diff --git a/vl.c b/vl.c
index 7396748..2060038 100644
--- a/vl.c
+++ b/vl.c
@@ -1597,8 +1597,9 @@ void vm_state_notify(int running, RunState state)
 }
 }

-static int reset_requested;
-static int shutdown_requested, shutdown_signal;
+static ShutdownCause reset_requested;
+static ShutdownCause shutdown_requested;
+static int shutdown_signal;
 static pid_t shutdown_pid;
 static int powerdown_requested;
 static int debug_requested;
@@ -1612,19 +1613,19 @@ static NotifierList wakeup_notifiers =
 NOTIFIER_LIST_INITIALIZER(wakeup_notifiers);
 static uint32_t wakeup_reason_mask = ~(1 << 

[Xen-devel] IOMMU support on AMD Ryzen, simple patch needed

2017-05-15 Thread Bjoern

Hi,

I just completed getting Qubes-OS working with Ryzen and IOMMU - at 
least it looks like it to me and ran out of the box BIOS wise.


All that was required is a small patch in 
xen/arch/x86/oprofile/nmi_int.c - Ryzen family 17h is the same as 15h. 
Without that, "xl dmesg" under Ubuntu 17.04 (self compiled 4.8.3) would 
show that family 17h isn't supported, with the above fix everything 
shows up fine.


Xen 4.8.0 has the IOMMU patch 
(https://patchwork.kernel.org/patch/9145119/) which was required for 
Qubes (Xen 4.6.5), and then it just required the above change and it's 
working apparently... at least Qubes reports working Xen - so looks good.


This is a fyi mail - I do not want to push this fix or something into 
Xen as I also have no idea if I'm missing something else, but if someone 
else wants to pick this up, by all means please do :)


Cheers,
Bjoern

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [xen-unstable test] 109441: regressions - FAIL

2017-05-15 Thread osstest service owner
flight 109441 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/109441/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemut-winxpsp3 17 guest-start/win.repeat fail REGR. vs. 
109165

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemuu-winxpsp3 16 guest-stop  fail in 109418 pass in 109441
 test-amd64-amd64-xl-qemuu-winxpsp3 16 guest-stop   fail pass in 109395
 test-arm64-arm64-xl-multivcpu  6 xen-boot  fail pass in 109418
 test-amd64-amd64-xl-qemut-winxpsp3 17 guest-start/win.repeat fail pass in 
109418

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-start/win.repeat fail blocked in 
109165
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop   fail in 109395 like 109091
 test-amd64-amd64-xl-qemut-winxpsp3 16 guest-stopfail in 109395 like 109136
 test-amd64-amd64-xl-qemuu-winxpsp3 17 guest-start/win.repeat fail in 109395 
like 109165
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop   fail in 109418 like 109112
 test-amd64-i386-xl-qemut-winxpsp3 16 guest-stop fail in 109418 like 109112
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop  fail in 109418 like 109136
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop  fail in 109418 like 109165
 test-arm64-arm64-xl-multivcpu 12 migrate-support-check fail in 109418 never 
pass
 test-arm64-arm64-xl-multivcpu 13 saverestore-support-check fail in 109418 
never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 17 guest-start/win.repeat fail like 
109091
 test-amd64-amd64-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail like 109136
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail  like 109165
 test-amd64-i386-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail like 109165
 test-amd64-i386-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail like 109165
 test-armhf-armhf-libvirt 13 saverestore-support-checkfail  like 109165
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 16 guest-stopfail like 109165
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeatfail  like 109165
 test-amd64-amd64-xl-rtds  9 debian-install   fail  like 109165
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail  like 109165
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-rtds 12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-rtds 13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-check

Re: [Xen-devel] [Xen-users] UEFI Secure Boot Xen 4.9

2017-05-15 Thread Bill Jacobs (billjac)


> -Original Message-
> From: Daniel Kiper [mailto:daniel.ki...@oracle.com]
> Sent: Monday, May 15, 2017 6:13 AM
> To: Bill Jacobs (billjac) ; george.dun...@citrix.com
> Cc: xen-devel@lists.xen.org; xen-us...@lists.xen.org
> Subject: Re: [Xen-users] UEFI Secure Boot Xen 4.9
> 
> Hey,
> 
> CC-ing Xen-devel to spread some knowledge about the issue.
> 
> On Mon, May 15, 2017 at 10:42:23AM +0100, George Dunlap wrote:
> > On Wed, May 10, 2017 at 11:36 PM, Bill Jacobs (billjac)
> >  wrote:
> > > Hi all
> > >
> > > I gather that with 4.9, UEFI secure boot of Xen should be possible.
> > >
> > > Is this true?
> > >
> > > If so, what are the options for utilizing UEFI secure boot? Do I
> > > need a MSFT-signed shim or grub? Any special changes required for
> > > Xen kernel
> > > (signing?) or has that been done?
> >
> > Bill,
> >
> > I guess in part it depends on what you mean by "utilizing UEFI secure
> > boot".  If you simply want to boot an unsigned Xen on a UEFI system
> > with SecureBoot enabled, then grub would probably work.  If you want
> > to actually do the full SecureBoot thing -- where you have grub check
> > Xen's signature and that of the kernel and initrd, you probably need a
> > bit more.
> >
> > Daniel,
> >
> > Is there any good documentation on this?  The Xen EFI guide
> > (https://wiki.xenproject.org/wiki/Xen_EFI) mentions the shim, but
> > doesn't go into detail about how to sign a binary 
> 
> Unfortunately I do not know anything like that. As you said in general shim is
> supported. Sadly, it works only if you load xen.efi directly from EFI.
> __Upstream__ GRUB2 does not have support for shim yet. I am working on it
> (shim support via GRUB2 also requires some changes in Xen). I hope that I will
> have something which works before Xen conf in Budapest.
> 
> If you wish to use shim with xen.efi then you have to sign xen.efi and vmlinux
> with your key using sbsign or pesign. The process works in the same way as in
> the case of vmlinux alone. Of course you have to install your public key into MOK
> before enabling secure boot.
> 
> Daniel

Yes, there are options in how this is achievable, and the solutions may be 
different. 

We are targeting a secure boot chain from UEFI fw to .ko, using the same signing.
In our case we would skip shim and reduce the attack surface, but it appears that
the mechanisms 'out there' for passing the pub key (cert) from the UEFI db to the
Linux keyring require shim to do the work. Is that accurate? Does it have to be
the case? I don't see why.
For us, the ideal case is:
UEFI fw -> (signed)GRUB2.efi->Multiboot2->Xen(signed .ko)

I would be happy to work to help achieve this. 
-Bill


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/2] libxl/devd: fix a race with concurrent device addition/removal

2017-05-15 Thread Roger Pau Monne
On Mon, May 15, 2017 at 02:37:33PM +0100, Julien Grall wrote:
> Hi Roger,
> 
> On 11/05/17 12:43, Roger Pau Monne wrote:
> > On Thu, May 11, 2017 at 12:06:08PM +0100, Ian Jackson wrote:
> > > Roger Pau Monne writes ("[PATCH v2 1/2] libxl/devd: fix a race with 
> > > concurrent device addition/removal"):
> > > > Current code can free the libxl__device inside of the 
> > > > libxl__ddomain_device
> > > > before the addition has finished if a removal happens while an addition 
> > > > is
> > > > still in process:
> > > ...
> > > > Fix this by creating a temporary copy of the libxl__device, that's
> > > > tracked by the GC of the nested async operation. This ensures that
> > > > the libxl__device used by the async operations cannot be freed while
> > > > being used.
> > > ...
> > > >  GCNEW(aodev);
> > > >  libxl__prepare_ao_device(ao, aodev);
> > > > -aodev->dev = dev;
> > > > +/*
> > > > + * Clone the libxl__device to avoid races if remove_device is 
> > > > called
> > > > + * before the device addition has finished.
> > > > + */
> > > > +GCNEW(aodev->dev);
> > > > +*aodev->dev = *dev;
> > > 
> > > This does conveniently disentangle the memory management, so I think
> > > it's a good approach.
> > > 
> > > But it reads kind of oddly to me.  I think it is not buggy, but can
> > > you add a comment to the definition of libxl__device, saying that it
> > > is a transparent structure containing no external memory references ?
> > 
> > Sure, before implementing this I already took a look at the contents of the
> > libxl__device struct, but I agree that a comment is in place in case someone
> > expands the fields of the struct later on.
> > 
> > > Otherwise this copy is not really justifiable, because in C, in
> > > general, structs might contain private fields, or memory references or
> > > linked list entries or something.
> > 
> > Thanks, Roger.
> > 
> > NB: FWIW, I'm planning to keep Wei's RB since this is a cosmetic change.
> 
> Is this patch series targeting Xen 4.9?

Yes, I think so. I will post a new version soon and Cc you.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [qemu-mainline test] 109445: regressions - FAIL

2017-05-15 Thread osstest service owner
flight 109445 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/109445/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64-xsm   5 xen-buildfail REGR. vs. 107636
 build-arm64   5 xen-buildfail REGR. vs. 107636
 build-i3865 xen-buildfail REGR. vs. 107636
 build-amd64-xsm   5 xen-buildfail REGR. vs. 107636
 build-amd64   5 xen-buildfail REGR. vs. 107636
 build-armhf-xsm   5 xen-buildfail REGR. vs. 107636
 build-i386-xsm5 xen-buildfail REGR. vs. 107636
 build-armhf   5 xen-buildfail REGR. vs. 107636

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-nested-intel  1 build-check(1)  blocked n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm  1 build-check(1) blocked n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1  1 build-check(1) blocked n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-winxpsp3  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvh-amd   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-amd64-pair 1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-pygrub   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-winxpsp3  1 build-check(1)   blocked n/a
 test-amd64-amd64-xl-qcow2 1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-rtds  1 build-check(1)   blocked  n/a
 test-amd64-amd64-amd64-pvgrub  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64  1 build-check(1) blocked n/a
 test-armhf-armhf-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit2   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-xsm1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm  1 build-check(1)blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-amd64-xl-pvh-intel  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-raw1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-amd64-i386-pvgrub  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-intel  1 

[Xen-devel] [linux-3.18 test] 109446: regressions - FAIL

2017-05-15 Thread osstest service owner
flight 109446 linux-3.18 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/109446/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-credit2   7 host-ping-check-xen  fail REGR. vs. 109161
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 17 guest-start/win.repeat fail REGR. 
vs. 109161

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stopfail REGR. vs. 109161

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-rtds  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop   fail blocked in 109161
 test-armhf-armhf-libvirt 13 saverestore-support-checkfail  like 109161
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail  like 109161
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stopfail like 109161
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail like 109161
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail  like 109161
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 build-arm64-pvops 5 kernel-build fail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass

version targeted for testing:
 linux  b3eba07a079ee4b628e40d6fecb44e2bc8f139e8
baseline version:
 linux  68e50dad01f491a0645b720d6bf5a2f00411fbec

Last test of basis   109161  2017-05-08 06:20:31 Z7 days
Testing same since   109446  2017-05-15 07:47:20 Z0 days1 attempts


People who touched revisions under test:
  Alan Stern 
  Alex Deucher 
  Alexander Potapenko 
  Amitkumar Karwar 
  Andrew Morton 
  Andrey Konovalov 
  Andy Shevchenko 
  Ard Biesheuvel 
  Arend van Spriel 
  Arnd Bergmann 
  Ben Hutchings 
  Brian Norris 
  Brian Norris 
  Cong Wang 
  David Ahern 
  David S. Miller 
  Dmitry Torokhov 
  Eric Dumazet 
  Ganapathi Bhat 
  Greg Kroah-Hartman 

Re: [Xen-devel] HPET enabled in BIOS, not presented as available_clocksource -- config, kernel code, &/or BIOS?

2017-05-15 Thread Austin S. Hemmelgarn

On 2017-05-13 19:17, PGNet Dev wrote:

On 5/13/17 3:15 PM, Valentin Vidic wrote:

Try booting without 'hpet=force,verbose clocksource=hpet' and it should
select xen by default:


Nope. Well, not quite ...

With both

'hpet=force,verbose clocksource=hpet'

removed, I end up with

cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc xen
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

But with

clocksource=xen

*explicitly* added

cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc xen
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
xen

and in *console*, NOT dmesg, output,

grep -i hpet tmp.txt
(XEN) ACPI: HPET 9E8298F8, 0038 (r1 SUPERM SMCI--MB  1072009 
AMI.5)
(XEN) ACPI: HPET id: 0x8086a701 base: 0xfed0
(XEN) [VT-D] MSI HPET: :f0:0f.0
(XEN) Platform timer is 14.318MHz HPET
[0.00] ACPI: HPET 0x9E8298F8 38 (v01 SUPERM 
SMCI--MB 01072009 AMI. 00
[0.00] ACPI: HPET id: 0x8086a701 base: 0xfed0
[0.00] ACPI: HPET 0x9E8298F8 38 (v01 SUPERM 
SMCI--MB 01072009 AMI. 00
[0.00] ACPI: HPET id: 0x8086a701 base: 0xfed0
[8.515245] hpet_acpi_add: no address or irqs in _CRS
[8.515245] hpet_acpi_add: no address or irqs in _CRS
(XEN) [2017-05-13 23:04:27] HVM1 save: HPET



and

dmesg | grep -i clocksource | grep -v line:
[0.00] clocksource: refined-jiffies: mask: 0x 
max_cycles: 0x, max_idle_ns: 7645519600211568 ns
[0.004000] clocksource: xen: mask: 0x 
max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[0.375709] clocksource: jiffies: mask: 0x 
max_cycles: 0x, max_idle_ns: 764504178510 ns
[4.656634] clocksource: Switched to clocksource xen
[8.912897] clocksource: tsc: mask: 0x 
max_cycles: 0x2c94dffea94, max_idle_ns: 440795361700 ns

jiffies, now? hm. no idea where that came from. and why the 'tsc' ?

So I'm still unclear -- is this^ now, correctly "all" using MSI/HPET?
That depends on what you mean by everything correctly using the HPET. 
Using clocksource=xen (or autoselecting it) will cause the kernel to get 
timing info from Xen.  If you're running as a guest, this is absolutely 
what you want (unless you're using HVM), and with a few possible, limited and 
extremely specific exceptions, this is almost certainly what you want in 
Domain-0 as well.  Given that Xen is using the HPET for timing itself, 
using clocksource=xen will result in Linux _indirectly_ using the HPET 
through Xen without involving the HPET driver (in essence, Xen is your 
HPET driver in this situation), which will get you essentially the same 
precision that you would get by using the HPET directly.


So, if you just want the precision offered by the HPET, then yes, you 
are getting the same thing through the Xen clocksource.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Proposal to allow setting up shared memory areas between VMs from xl config file

2017-05-15 Thread Stefano Stabellini
On Mon, 15 May 2017, Jan Beulich wrote:
> >>> On 15.05.17 at 12:21,  wrote:
> > On 05/15/2017 09:52 AM, Jan Beulich wrote:
> > On 15.05.17 at 10:20,  wrote:
> >>> On 15/05/2017 09:08, Jan Beulich wrote:
> >>> On 12.05.17 at 19:01,  wrote:
> > 
> > 1. Motivation and Description
> > 
> > Virtual machines use grant table hypercalls to set up a shared page for
> > inter-VM communications. These hypercalls are used by all PV
> > protocols today. However, very simple guests, such as baremetal
> > applications, might not have the infrastructure to handle the grant table.
> > This project is about setting up several shared memory areas for inter-VM
> > communications directly from the VM config file, so that the guest kernel
> > doesn't have to have grant table support to be able to communicate with
> > other guests.
> 
>  I think it would help to compare your proposal with the alternative of
>  adding grant table infrastructure to such environments (which I
>  wouldn't expect to be all that difficult). After all introduction of a
>  (seemingly) redundant mechanism comes at the price of extra /
>  duplicate code in the tool stack and maybe even in the hypervisor.
>  Hence there needs to be a meaningfully higher gain than price here.
> >>>
> >>> This is a key feature for embedded because they want to be able to share
> >>> buffers very easily at domain creation time between two guests.
> >>>
> >>> Adding the grant table driver in the guest OS has a high cost when the
> >>> goal is to run an unmodified OS in a VM. This is achievable on ARM if you
> >>> use passthrough.
> >>
> >> "high cost" is pretty abstract and vague. And I admit I have difficulty
> >> seeing how an entirely unmodified OS could leverage this newly
> >> proposed sharing model.
> > 
> > Let's step back for a moment, I will come back on Zhongze proposal 
> > afterwards.
> > 
> > Using grant table in the guest will obviously require the grant-table 
> > driver. It is not that bad. However, how do you pass the grant ref 
> > number to the other guest? The only way I can see is xenstore, so yet 
> > another driver to port.
> 
> Just look at the amount of code that was needed to get PV drivers
> to work in x86 HVM guests. It's not all that much. Plus making such
> available in a new environment doesn't normally mean everything
> needs to be written from scratch.

The requirement is to allow shared communication between unmodified
bare-metal applications. These applications are extremely simple and
lack the basic infrastructure that an operating system has, nor would they
want to introduce it. I have been hearing this request from
embedded people for months now.


> > In Zhongze's proposal, the shared page will be mapped at a specific 
> > address in the guest memory. I agree this will require some work in the 
> > toolstack; on the hypervisor side we could re-use the foreign mapping 
> > API. But on the guest side there is nothing Xen-specific to do.
> 
> So what is the equivalent of the shared page on bare hardware?

Bare-metal apps already have the concept of a shared page to communicate
with hardware devices, co-processors and other hardware/firmware
intercommunication frameworks.


> > What's the benefit? Baremetal guests are usually tiny; you could use the 
> > device-tree (and hence a generic way) to present the shared page for 
> > communicating. This means no Xen PV drivers, and therefore it is easier to 
> > move an OS into a Xen VM.
> 
> Is this intended to be an ARM-specific extension, or a generic one?
> There's no DT on x86 to pass such information, and I can't easily
> see alternatives there. Also the consumer of the shared page info
> is still a PV component of the guest. You simply can't have an
> entirely unmodified guest which at the same time is Xen (or
> whatever other component sits at the other end of the shared
> page) aware.

I was going to propose that this work be arch-neutral. However, it is
true that with the existing x86 software and hardware ecosystem, it
wouldn't be much use there. Given that the work is technically common
though, I don't see any downsides to enabling it on x86 on the off
chance that somebody will find it useful. However, if you prefer to
keep it ARM only, that's fine by me too.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [ARM] Native application design and discussion (I hope)

2017-05-15 Thread Stefano Stabellini
On Mon, 15 May 2017, George Dunlap wrote:
> On Fri, May 12, 2017 at 7:47 PM, Volodymyr Babchuk
>  wrote:
> >> Regarding modules (#3): The problem that loadable modules were
> >> primarily introduced to solve in Linux wasn't "How to deal with
> >> proprietary drivers", or even "how to deal with out-of-tree drivers".
> >> The problem was, "How to we allow software providers to 1) have a
> >> single kernel binary, which 2) has drivers for all the different
> >> systems on which it needs to run, but 3) not take a massive amount of
> >> memory or space on systems, given that any given system will not need
> >> the vast majority of drivers?"
> >>
> >> Suppose hypothetically that we decided that the mediators you describe
> >> need to run in the hypervisor.  As long as Kconfig is sufficient for
> >> people to enable or disable what they need to make a functional and
> >> efficient system, then there's no need to introduce modules.  If we
> >> reached a point where people wanted a single binary that could do
> >> either the OP-TEE mediator or the Google mediator, or both, or neither,
> >> but didn't want to include all of them in the core binary (perhaps because
> >> of memory constraints), then loadable modules would be a good solution
> >> to consider.  But either way, if we decided they should run in the
> >> hypervisor, then all things being equal it would still be better to
> >> have both implementations in-tree.
> >>
> >> There are a couple of reasons for the push-back on loadable modules.
> >> The first is the extra complication and infrastructure it adds.  But
> >> the second is that people have a strong temptation to use them for
> >> out-of-tree and proprietary code, both of which we'd like to avoid if
> >> possible.  If there comes a point in time where loadable modules are
> >> the only reasonable solution to the problem, I will support having
> >> them; but until that time I will look for other solutions if I can.
> >>
> >> Does that make sense?
> > Yes, thank you. Legal questions are not my strong side. Looks like I was
> > too quick when I proposed modules as a solution to our needs. Sorry, I
> > should have investigated this topic further before talking about it.
> >
> > So, let's get back to native apps. We had internal discussion about
> > possible use cases and want to share our conclusions.
> >
> > 1. Emulators. As Stefano pointed out, this is an ideal use case for small,
> > fast native apps that are accounted to the calling vcpu's time slice.
> >
> > 2. Virtual coprocessor backend/driver. The part that does the actual job:
> > it makes the coprocessor save or restore context. It is also a small,
> > straightforward app, but it should have access to real HW.
> >
> > 3. TEE mediators. They need so many privileges that there is actually
> > no sense in putting them into native apps. For example, to work
> > properly the OP-TEE mediator needs to: pin guest pages, map guest pages to
> > perform IPA->MPA translation, send vIRQs to guests, and issue real SMCs.
> 
> As I think I've said elsewhere, apart from "issue real SMCs", all of
> that functionality is already available to device models running in
> domain 0, in the sense that there are interfaces which cause Xen to
> make those things happen: when the devicemodel maps a page, that
> increases the refcount and effectively pins it; the devicemodel
> accesses *all* guest pages in terms of guest memory addresses, but (I
> believe) can ask Xen for a p->m translation of a particular page in
> memory; and it can set vIRQs pending to the guest.  It seems likely
> that a suitable hypervisor interface could be made to expose SMC
> functionality to device models as well.

I'll repeat here for convenience. The discussion started from the need
for something that has lower latency, and more importantly, more
deterministic latency, than a dom0 device model. A dom0 device model
cannot guarantee even the weakest of latency requirements.

On ARM there are no dom0 device models today, and given their critical
limitations, I prefer to introduce something different from the start.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope))

2017-05-15 Thread Stefano Stabellini
On Mon, 15 May 2017, George Dunlap wrote:
> [Reducing CC list now that we're off the topic of modules]
> 
> On Fri, May 12, 2017 at 8:04 PM, Volodymyr Babchuk
>  wrote:
> > Stefano,
> >
> > On 12 May 2017 at 21:43, Stefano Stabellini  wrote:
> >
> >> On the topic of the technical reasons for being out of the hypervisor
> >> (EL0 app or stubdom), I'll spend a couple of words on security.
> >>
> >> How large are these components? If they increase the hypervisor code
> >> size too much, it's best if they are run elsewhere.
> > I'm talking about OP-TEE now.
> > "Large" as "large code base"? I have shared my PoC driver. Here it is
> > [1]. My expectation: 1,000-2,000 lines of code for mediator + some
> > OP-TEE headers.
> >
> >> What is their guest-exposed attack surface? If it's large it's best to
> >> run them out of the hypervisor.
> > The OP-TEE mediator will trap SMC calls and parse parameter buffers
> > according to the OP-TEE ABI specification. The ABI is very simple, so I
> > can't say that there will be much of an attack surface.
> >
> >> My gut feeling is that both these points might be a problem.
> > The real problem is that it needs the same privileges as the hypervisor
> > itself. I wrote this in a parallel thread:
> > it needs to pin guest pages (to ensure that a page will not be
> > transferred to another domain while OP-TEE uses it), it needs to map
> > guest pages so it can do IPA->PA translation in a command buffer, it
> > needs to execute SMCs (but we can limit it there, thanks to SMCCC), and
> > probably it will need to inject vIRQs into the guest to wake it up.
> 
> Xen is different than Linux in that it attempts to take a "practical
> microkernel" approach.  "Microkernel" meaning that we prefer to do as
> much *outside* of the hypervisor as possible.  "Practical" meaning, if
> running it outside the hypervisor causes too much complexity or too
> much performance overhead, then we don't stand on ideology but allow
> things to run inside of Xen.
> 
> With the exception of SMCs (which I don't know anything about), device
> models (e.g., QEMU) already have all of this functionality on x86,
> running from dom0 or from a stubdomain.
> 
> Do OP-TEE mediators require a lot of performance?  I.e., do the
> operations happen very frequently and/or are they particularly
> latency-sensitive?  If not then it might be worth implementing it as a
> dom0 device model first, and then exploring higher-performing options
> if that turns out to be too slow.

The whole discussion started from the need for something that has lower
latency, and more importantly, more deterministic latency, than a dom0
device model.

Any use-cases with even the weakest of real-time requirements won't be
satisfied by a dom0 device model, where the max latency is basically
infinite.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] Commit moratorium to staging

2017-05-15 Thread Julien Grall

Committers,

It looks like osstest is a bit behind because of the ARM64 boxes (they are 
fully loaded) and XP testing (which has now been removed, see [1]).


I'd like to cut the next rc when staging == master, so please stop 
committing today.


Ian force-pushed osstest today, so hopefully we can get a push tomorrow.

Cheers,

[1] 
https://lists.xenproject.org/archives/html/xen-devel/2017-05/msg00425.html


--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [XEN-devel] arm64: fix incorrect pa_range_info table to support 42 bit PA systems.

2017-05-15 Thread Julien Grall

Hi,

On 15/05/17 18:11, Feng Kan wrote:

On Mon, May 15, 2017 at 7:53 AM, Julien Grall  wrote:

Hello Feng,

On 13/05/17 01:26, Feng Kan wrote:


The pa_range_info table contains incorrect root_order and t0sz values which
prevent 42 bit PA systems from booting dom0.



As I mentioned in the previous thread [1], this is not a bug. What you
configure below is the stage-2 page table and not the hypervisor page-table.

It is perfectly fine to expose fewer IPA (Intermediate Physical Address) bits
than the number of PA (Physical Address) bits as long as all the addresses
wired are below 40 bits (an assumption made by the patch that added this code).
Does your hardware have devices/RAM above 40 bits?

Yes, the APM X-Gene series have all been 42 bit PA systems.
Particularly X-Gene 3, which
has its PCIe0 starting all the way up at 41 bits.


Thank you for the information.


If so, then you need to
mention it in the commit message.

I will be more clear in the commit message.


This brings up another question: now you will allocate 8 pages by default for
both DOM0 and guests. Exposing a 42 bit IPA to a guest does not sound
necessary, so we would waste memory here. How are you going to address that?

To be honest, I hadn't thought of that. I had assumed systems such as
these would have
plenty of memory. I will take a look regarding this. If you have any
suggestions, that would
be greatly appreciated.


I am not totally against using 8 pages, although a TODO would be useful 
in the code and the commit message.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [XEN-devel] arm64: fix incorrect pa_range_info table to support 42 bit PA systems.

2017-05-15 Thread Feng Kan
On Mon, May 15, 2017 at 7:53 AM, Julien Grall  wrote:
> Hello Feng,
>
> On 13/05/17 01:26, Feng Kan wrote:
>>
>> The pa_range_info table contains incorrect root_order and t0sz values which
>> prevent 42 bit PA systems from booting dom0.
>
>
> As I mentioned in the previous thread [1], this is not a bug. What you
> configure below is the stage-2 page table and not the hypervisor page-table.
>
> It is perfectly fine to expose fewer IPA (Intermediate Physical Address) bits
> than the number of PA (Physical Address) bits as long as all the addresses
> wired are below 40 bits (an assumption made by the patch that added this code).
> Does your hardware have devices/RAM above 40 bits?
Yes, the APM X-Gene series have all been 42 bit PA systems.
Particularly X-Gene 3, which
has its PCIe0 starting all the way up at 41 bits.

> If so, then you need to mention it in the commit message.
I will be more clear in the commit message.
>
> This brings up another question: now you will allocate 8 pages by default for
> both DOM0 and guests. Exposing a 42 bit IPA to a guest does not sound
> necessary, so we would waste memory here. How are you going to address that?
To be honest, I hadn't thought of that. I had assumed systems such as
these would have
plenty of memory. I will take a look regarding this. If you have any
suggestions, that would
be greatly appreciated.
>
> Lastly, please quote the ARM ARM when you modify the generic ARM code to
> help the reviewer check your code.
Thanks, will do.
>
> Cheers,
>
> [1]
> https://lists.xenproject.org/archives/html/xen-devel/2017-05/msg01254.html
>
>
>>
>> Signed-off-by: Feng Kan 
>> ---
>>  xen/arch/arm/p2m.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index 34d5776..cbb8675 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -1479,7 +1479,7 @@ void __init setup_virt_paging(void)
>>  [0] = { 32,  32/*32*/,  0,  1 },
>>  [1] = { 36,  28/*28*/,  0,  1 },
>>  [2] = { 40,  24/*24*/,  1,  1 },
>> -[3] = { 42,  24/*22*/,  1,  1 },
>> +[3] = { 42,  22/*22*/,  3,  1 },
>>  [4] = { 44,  20/*20*/,  0,  2 },
>>  [5] = { 48,  16/*16*/,  0,  2 },
>>  [6] = { 0 }, /* Invalid */
>>
>
> --
> Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
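
As a quick aside on the arithmetic behind the { 42, 22, 3, 1 } entry discussed
above: the following is a minimal standalone sketch, not the Xen implementation,
and it assumes a 4KB stage-2 granule with a walk starting at level 1 (so it only
covers the 32-42 bit rows of the table). It shows how t0sz and root_order follow
from the IPA width, and why 42 bits implies the 8 root pages Julien mentions.

    /*
     * Illustrative only: assumes a 4KB stage-2 granule and a page-table
     * walk starting at level 1 (the 32-42 bit rows of pa_range_info).
     * Not the actual Xen code.
     */
    #include <stdio.h>

    static void stage2_root(unsigned int ipa_bits)
    {
        unsigned int t0sz = 64 - ipa_bits;           /* VTCR_EL2.T0SZ */
        unsigned int one_table_bits = 12 + 3 * 9;    /* 4KB page, 3 levels: 39 bits */
        unsigned int root_order = ipa_bits > one_table_bits ?
                                  ipa_bits - one_table_bits : 0;

        printf("IPA %u bits: t0sz=%u root_order=%u (%u concatenated root pages)\n",
               ipa_bits, t0sz, root_order, 1u << root_order);
    }

    int main(void)
    {
        stage2_root(40);    /* t0sz=24, root_order=1, 2 pages: table entry [2] */
        stage2_root(42);    /* t0sz=22, root_order=3, 8 pages: the fix above   */
        return 0;
    }

The extra bits beyond what a single level-1 table resolves are absorbed by
concatenating 2^root_order root pages, which is where the memory cost for a
42 bit IPA comes from.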


Re: [Xen-devel] [PATCH] include: fix build without C++ compiler installed

2017-05-15 Thread Wei Liu
On Mon, May 15, 2017 at 01:02:48AM -0600, Jan Beulich wrote:
> >>> On 12.05.17 at 18:20,  wrote:
> > On Fri, May 12, 2017 at 12:52:54AM -0600, Jan Beulich wrote:
> >> The rule for headers++.chk wants to move headers++.chk.new to the
> >> designated target, which means we have to create that file in the first
> >> place.
> >> 
> >> Signed-off-by: Jan Beulich 
> > 
> > Reviewed-by: Wei Liu 
> 
> Thanks.
> 
> > If I were to fix it I would just skip the check altogether if CXX isn't
> > available. But this approach is fine, too.
> 
> I may not be understanding what you mean: The test is being skipped;
> the destination file is being touched so that on an incremental re-build
> the rule wouldn't be re-run. What else are you imagining? Suppressing
> the headers++.chk target altogether would likely be more code churn,

Yes that's what I was thinking. But as you said it's going to be more
code churn.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Proposal to allow setting up shared memory areas between VMs from xl config file

2017-05-15 Thread Wei Liu
On Sat, May 13, 2017 at 10:28:27AM +0800, Zhongze Liu wrote:
> 2017-05-13 1:51 GMT+08:00 Wei Liu :
> > Hi Zhongze
> >
> > This is a nice write-up. Some comments below. Feel free to disagree with
> > what I say below, this is more a discussion than picking on your design
> > or plan.
> >
> 
> Hi, Wei Liu
> 
> Thanks for your time reading through my proposal.
> 
> >
> > On Sat, May 13, 2017 at 01:01:39AM +0800, Zhongze Liu wrote:
> >> Hi, Xen developers,
> >>
> >> I'm Zhongze Liu, a GSoC student this year. Glad to meet you in the
> >> Xen Project.  As an initial step to implementing my GSoC proposal, which
> >> is still a draft, I'm posting it here, and I hope to hear your
> >> suggestions.
> >>
> >> 
> >> 1. Motivation and Description
> >> 
> >> Virtual machines use grant table hypercalls to set up a shared page for
> >> inter-VM communications. These hypercalls are used by all PV
> >> protocols today. However, very simple guests, such as baremetal
> >> applications, might not have the infrastructure to handle the grant table.
> >> This project is about setting up several shared memory areas for inter-VM
> >> communications directly from the VM config file,
> >> so that the guest kernel doesn't have to have grant table support to be
> >> able to communicate with other guests.
> >>
> >> 
> >> 2. Implementation Plan:
> >> 
> >>
> >> ==
> >> 2.1 Introduce a new VM config option in xl:
> >> ==
> >> The shared areas should be shareable among several VMs;
> >> every shared physical memory area is assigned to a set of VMs.
> >> Therefore, a “token” or “identifier” should be used here to uniquely
> >> identify a backing memory area.
> >>
> >>
> >> I would suggest using an unsigned integer to serve as the identifier.
> >> For example:
> >>
> >> In xl config file of vm1:
> >>
> >> static_shared_mem = [“addr_range1= ID1”, “addr_range2 = ID2”]
> >>
> >> In xl config file of vm2:
> >>
> >> static_shared_mem = [“addr_range3 = ID1”]
> >>
> >> In xl config file of vm3:
> >>
> >> static_shared_mem = [“addr_range4 = ID2”]
> >
> > I can envisage you need some more attributes: what about the attributes
> > like RW / RO / WO (or even X)?
> >
> > Also, I assume the granularity of the mapping is a page, but as far as I
> > can tell there are two page granularity on ARM, you do need to consider
> > both and what should happen if you mix and match them. What about
> > mapping several pages and different VM use overlapping ranges?
> >
> > Can you give some concrete examples? What does addr_rangeX look like in
> > practice?
> >
> >
> 
> Yes, those attributes are necessary and should be explicitly specified in the
> config file. I'll add them in the next version of this proposal. And taking 
> the
> granularity into consideration, what do you say if we change the entries into
> something like:
> 'start=0xcafebabe, end=0xdeedbeef, granularity=4K, prot=RWX'.

I realised I may have gone too far after reading your reply.

> What is the end purpose of this project? If you only want to insert an
> mfn into the guest address space and don't care how the guest is going to
map it, you can omit the prot= part. If you want stricter control, you
will need them -- and that would also have implications on the
hypervisor code you need.

I suggest you write the manual for the new mechanism you propose first.
That way you describe the feature in a sysadmin-friendly way.  Describe
the syntax, the effect of the new mechanism and how people are supposed
to use it under what circumstances.

> 
> >
> >>
> >>
> >> In the example above. A memory area A1 will be shared between
> >> vm1 and vm2 -- vm1 can access this area using addr_range1
> >> and vm2 using addr_range3. Likewise, a memory area A2 will be
> >> shared between vm1 and vm3 -- vm1 can access A2 using addr_range2
> >> and vm3 using addr_range4.
> >>
> >> The shared memory area denoted by an identifier IDx will be
> >> allocated when it first appears, and the memory pages will be taken from
> >> the first VM whose static_shared_mem list contains IDx. Take the above
> >> config files for example, if we instantiate vm1, vm2 and vm3, one after
> >> another, the memory areas denoted by ID1 and ID2 will both be allocated
> >> in and taken from vm1.
> >
> > Hmm... I can see some potential hazards. Currently, multiple xl processes
> > are serialized by a lock, and your assumption is the creation is done in
> > order, but suppose sometime later they can run in parallel. When you
> > have several "xl create" and they race with each other, what will
> > happen?
> >
> > This can be solved by serializing in libxl or hypervisor, I think.
> > It is up to you to choose where to do it.
> >
> > Also, please 
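
To make the discussion of the proposed 'start=..., end=..., granularity=...,
prot=...' entries a bit more concrete, here is a small standalone sketch of the
kind of per-entry record and sanity checks a parser might want to apply. All the
names, types and values below are invented for illustration only; this is not
libxl code and not part of the actual proposal.

    /* Hypothetical representation of one static_shared_mem entry; field and
     * type names are made up for this example and are not part of libxl. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct sshm_entry {
        uint64_t start;        /* guest physical start address */
        uint64_t end;          /* guest physical end address (exclusive) */
        uint64_t granularity;  /* e.g. 4096 for a 4K mapping granule */
        bool r, w, x;          /* prot=RWX attributes from the config line */
        unsigned int id;       /* the shared-area identifier (ID1, ID2, ...) */
    };

    /* Basic sanity checks a parser might apply before handing the entry on. */
    static bool sshm_entry_valid(const struct sshm_entry *e)
    {
        if ( e->end <= e->start )
            return false;
        if ( e->granularity == 0 || (e->granularity & (e->granularity - 1)) )
            return false;                       /* must be a power of two */
        if ( (e->start | e->end) & (e->granularity - 1) )
            return false;                       /* range must be aligned */
        return true;
    }

    int main(void)
    {
        struct sshm_entry e = {
            .start = 0xcafeb000, .end = 0xcafec000,
            .granularity = 4096, .r = true, .w = true, .x = false, .id = 1,
        };

        printf("entry valid: %d\n", sshm_entry_valid(&e));
        return 0;
    }

Whatever syntax is finally chosen, some record of this shape is what the
serialization question above (who allocates the backing pages, and in what
order) would ultimately operate on.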

Re: [Xen-devel] 4.9rc4: Cannot build with higher than -j4 - was: linux.c:27:28: fatal error: xen/sys/evtchn.h: No such file or directory

2017-05-15 Thread Wei Liu
On Mon, May 15, 2017 at 04:29:31PM +0100, Julien Grall wrote:
> Hi George,
> 
> CC Ian and Wei for feedback on the error.
> 

No you didn't. ;p

I've tracked down the issue. Trying to work out a fix at the moment.

The culprit is: stubdom build runs before tools build. The link farm
rune in tools/include depends on setting XEN_OS.

I'm rather baffled why this issue didn't surface until now.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 3/5] VT-d PI: restrict the vcpu number on a given pcpu

2017-05-15 Thread Chao Gao
On Mon, May 15, 2017 at 04:48:47PM +0100, George Dunlap wrote:
>On Thu, May 11, 2017 at 7:04 AM, Chao Gao  wrote:
>> Currently, a blocked vCPU is put in its pCPU's PI blocking list. If
>> too many vCPUs are blocked on a given pCPU, the list can grow too
>> long. A simple analysis: with 32k domains and 128 vcpus per domain,
>> about 4M vCPUs may be blocked in one pCPU's PI blocking list. When a
>> wakeup interrupt arrives, the list is traversed to find the specific
>> vCPUs to wake up, and in that case the traversal would consume much time.
>>
>> To mitigate this issue, this patch limits the vcpu number on a given
>> pCPU, taking factors such as performance of the common case, current hvm vcpu
>> count and current pcpu count into consideration. With this method, for
>> the common case, it works fast and for some extreme cases, the list
>> length is under control.
>>
>> The change in vmx_pi_unblock_vcpu() is for the following case:
>> vcpu is running -> try to block (this patch may change NSDT to
>> another pCPU) but notification comes in time, thus the vcpu
>> goes back to the running state -> VM-entry (we should set NSDT again,
>> reverting the change we make to NSDT in vmx_vcpu_block())
>>
>> Signed-off-by: Chao Gao 
>> ---
>>  xen/arch/x86/hvm/vmx/vmx.c | 78 
>> +-
>>  1 file changed, 71 insertions(+), 7 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>> index efff6cd..c0d0b58 100644
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -100,16 +100,70 @@ void vmx_pi_per_cpu_init(unsigned int cpu)
>>  spin_lock_init(&per_cpu(vmx_pi_blocking, cpu).lock);
>>  }
>>
>> +/*
>> + * Choose an appropriate pcpu to receive wakeup interrupt.
>> + * By default, the local pcpu is chosen as the destination. But if the
>> + * vcpu number of the local pcpu exceeds a limit, another pcpu is chosen.
>> + *
>> + * Currently, choose (v_tot/p_tot) + K as the limit of vcpu, where
>> + * v_tot is the total number of vcpus on the system, p_tot is the total
>> + * number of pcpus in the system, and K is a fixed number. Experiments show
>> + * the maximal time to wakeup a vcpu from a 128-entry blocking list is
>> + * considered acceptable. So choose 128 as the fixed number K.
>> + *
>> + * This policy makes sure:
>> + * 1) for common cases, the limit won't be reached and the local pcpu is 
>> used
>> + * which is beneficial to performance (at least, avoid an IPI when 
>> unblocking
>> + * vcpu).
>> + * 2) for the worst case, the blocking list length scales with the vcpu 
>> count
>> + * divided by the pcpu count.
>> + */
>> +#define PI_LIST_FIXED_NUM 128
>> +#define PI_LIST_LIMIT (atomic_read(&num_hvm_vcpus) / num_online_cpus() + \
>> +   PI_LIST_FIXED_NUM)
>> +
>> +static unsigned int vmx_pi_choose_dest_cpu(struct vcpu *v)
>> +{
>> +int count, limit = PI_LIST_LIMIT;
>> +unsigned int dest = v->processor;
>> +
>> +count = atomic_read(&per_cpu(vmx_pi_blocking, dest).counter);
>> +while ( unlikely(count >= limit) )
>> +{
>> +dest = cpumask_cycle(dest, &cpu_online_map);
>> +count = atomic_read(&per_cpu(vmx_pi_blocking, dest).counter);
>> +}
>> +return dest;
>> +}
>> +
>>  static void vmx_vcpu_block(struct vcpu *v)
>>  {
>>  unsigned long flags;
>> -unsigned int dest;
>> +unsigned int dest, dest_cpu;
>>  spinlock_t *old_lock;
>> -spinlock_t *pi_blocking_list_lock =
>> -   &per_cpu(vmx_pi_blocking, v->processor).lock;
>>  struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
>> +spinlock_t *pi_blocking_list_lock;
>> +
>> +/*
>> + * After pCPU goes down, the per-cpu PI blocking list is cleared.
>> + * To make sure the parameter vCPU is added to the chosen pCPU's
>> + * PI blocking list before the list is cleared, just retry when
>> + * finding the pCPU has gone down. Also retry to choose another
>> + * pCPU when finding the list length reachs the limit.
>> + */
>> + retry:
>> +dest_cpu = vmx_pi_choose_dest_cpu(v);
>> +pi_blocking_list_lock = &per_cpu(vmx_pi_blocking, dest_cpu).lock;
>>
>>  spin_lock_irqsave(pi_blocking_list_lock, flags);
>> +if ( unlikely((!cpu_online(dest_cpu)) ||
>> +  (atomic_read(&per_cpu(vmx_pi_blocking, dest_cpu).counter) >=
>> +   PI_LIST_LIMIT)) )
>> +{
>> +spin_unlock_irqrestore(pi_blocking_list_lock, flags);
>> +goto retry;
>> +}
>
>Algorithmically I think this is on the right track. But all these
>atomic reads and writes are a mess.  Atomic accesses aren't free; and
>the vast majority of the time you're doing things with the
>pi_blocking_list_lock anyway.
>
>Why not do something like this at the top of vmx_vcpu_block()
>(replacing dest_cpu with pi_cpu for clarity)?
>
>pi_cpu = v->processor;
>retry:
>pi_blocking_list_lock = 

Re: [Xen-devel] [PATCH v2 3/5] VT-d PI: restrict the vcpu number on a given pcpu

2017-05-15 Thread George Dunlap
On Thu, May 11, 2017 at 7:04 AM, Chao Gao  wrote:
> Currently, a blocked vCPU is put in its pCPU's PI blocking list. If
> too many vCPUs are blocked on a given pCPU, the list can grow too
> long. A simple analysis: with 32k domains and 128 vcpus per domain,
> about 4M vCPUs may be blocked in one pCPU's PI blocking list. When a
> wakeup interrupt arrives, the list is traversed to find the specific
> vCPUs to wake up, and in that case the traversal would consume much time.
>
> To mitigate this issue, this patch limits the vcpu number on a given
> pCPU, taking factors such as performance of the common case, current hvm vcpu
> count and current pcpu count into consideration. With this method, for
> the common case, it works fast and for some extreme cases, the list
> length is under control.
>
> The change in vmx_pi_unblock_vcpu() is for the following case:
> vcpu is running -> try to block (this patch may change NSDT to
> another pCPU) but notification comes in time, thus the vcpu
> goes back to the running state -> VM-entry (we should set NSDT again,
> reverting the change we make to NSDT in vmx_vcpu_block())
>
> Signed-off-by: Chao Gao 
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 78 
> +-
>  1 file changed, 71 insertions(+), 7 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index efff6cd..c0d0b58 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -100,16 +100,70 @@ void vmx_pi_per_cpu_init(unsigned int cpu)
>  spin_lock_init(&per_cpu(vmx_pi_blocking, cpu).lock);
>  }
>
> +/*
> + * Choose an appropriate pcpu to receive wakeup interrupt.
> + * By default, the local pcpu is chosen as the destination. But if the
> + * vcpu number of the local pcpu exceeds a limit, another pcpu is chosen.
> + *
> + * Currently, choose (v_tot/p_tot) + K as the limit of vcpu, where
> + * v_tot is the total number of vcpus on the system, p_tot is the total
> + * number of pcpus in the system, and K is a fixed number. Experiments show
> + * the maximal time to wakeup a vcpu from a 128-entry blocking list is
> + * considered acceptable. So choose 128 as the fixed number K.
> + *
> + * This policy makes sure:
> + * 1) for common cases, the limit won't be reached and the local pcpu is used
> + * which is beneficial to performance (at least, avoid an IPI when unblocking
> + * vcpu).
> + * 2) for the worst case, the blocking list length scales with the vcpu count
> + * divided by the pcpu count.
> + */
> +#define PI_LIST_FIXED_NUM 128
> +#define PI_LIST_LIMIT (atomic_read(&num_hvm_vcpus) / num_online_cpus() + \
> +   PI_LIST_FIXED_NUM)
> +
> +static unsigned int vmx_pi_choose_dest_cpu(struct vcpu *v)
> +{
> +int count, limit = PI_LIST_LIMIT;
> +unsigned int dest = v->processor;
> +
> +count = atomic_read(&per_cpu(vmx_pi_blocking, dest).counter);
> +while ( unlikely(count >= limit) )
> +{
> +dest = cpumask_cycle(dest, &cpu_online_map);
> +count = atomic_read(&per_cpu(vmx_pi_blocking, dest).counter);
> +}
> +return dest;
> +}
> +
>  static void vmx_vcpu_block(struct vcpu *v)
>  {
>  unsigned long flags;
> -unsigned int dest;
> +unsigned int dest, dest_cpu;
>  spinlock_t *old_lock;
> -spinlock_t *pi_blocking_list_lock =
> -   &per_cpu(vmx_pi_blocking, v->processor).lock;
>  struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +spinlock_t *pi_blocking_list_lock;
> +
> +/*
> + * After pCPU goes down, the per-cpu PI blocking list is cleared.
> + * To make sure the parameter vCPU is added to the chosen pCPU's
> + * PI blocking list before the list is cleared, just retry when
> + * finding the pCPU has gone down. Also retry to choose another
> + * pCPU when finding the list length reachs the limit.
> + */
> + retry:
> +dest_cpu = vmx_pi_choose_dest_cpu(v);
> +pi_blocking_list_lock = &per_cpu(vmx_pi_blocking, dest_cpu).lock;
>
>  spin_lock_irqsave(pi_blocking_list_lock, flags);
> +if ( unlikely((!cpu_online(dest_cpu)) ||
> +  (atomic_read(&per_cpu(vmx_pi_blocking, dest_cpu).counter) >=
> +   PI_LIST_LIMIT)) )
> +{
> +spin_unlock_irqrestore(pi_blocking_list_lock, flags);
> +goto retry;
> +}

Algorithmically I think this is on the right track. But all these
atomic reads and writes are a mess.  Atomic accesses aren't free; and
the vast majority of the time you're doing things with the
pi_blocking_list_lock anyway.

Why not do something like this at the top of vmx_vcpu_block()
(replacing dest_cpu with pi_cpu for clarity)?

pi_cpu = v->processor;
retry:
pi_blocking_list_lock = &per_cpu(vmx_pi_blocking, pi_cpu).lock;
spin_lock_irqsave(pi_blocking_list_lock, flags);
/*
 * Since dest_cpu may now be one other than the one v is currently
 * running on, check to make sure that 
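
George's suggested rewrite is cut off in the archive. Purely as an illustration
of the pattern he appears to be describing -- a plain counter that is only read
and updated while the per-pCPU list lock is held, retrying on the next pCPU when
the list is full -- here is a small standalone sketch. pthread mutexes stand in
for Xen's spinlocks, and the structures, limits and names are all made up; this
is not the code that was eventually proposed or committed.

    /*
     * Standalone sketch: keep a plain, lock-protected counter per "pCPU"
     * blocking list, and check/bump it only while holding that list's lock,
     * moving on to another pCPU when the list is full.
     */
    #include <pthread.h>
    #include <stdio.h>

    #define NR_PCPUS   4
    #define LIST_LIMIT 2   /* stand-in for PI_LIST_LIMIT */

    struct pi_blocking {
        pthread_mutex_t lock;
        unsigned int counter;   /* protected by lock; no atomics needed */
    } blocking[NR_PCPUS];

    /* Returns the pCPU whose blocking list the vcpu was added to. */
    static unsigned int block_on(unsigned int pcpu)
    {
        for ( ;; )
        {
            struct pi_blocking *b = &blocking[pcpu];

            pthread_mutex_lock(&b->lock);
            if ( b->counter < LIST_LIMIT )
            {
                b->counter++;           /* the list_add() would go here */
                pthread_mutex_unlock(&b->lock);
                return pcpu;
            }
            /* List full: drop the lock and try the next online pCPU. */
            pthread_mutex_unlock(&b->lock);
            pcpu = (pcpu + 1) % NR_PCPUS;
        }
    }

    int main(void)
    {
        for ( unsigned int i = 0; i < NR_PCPUS; i++ )
            pthread_mutex_init(&blocking[i].lock, NULL);

        for ( int i = 0; i < 7; i++ )
            printf("vcpu %d -> pcpu %u\n", i, block_on(0));

        return 0;
    }

The point is that, with reads and updates done only under the per-list lock,
the counter can be a plain integer rather than an atomic_t.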

Re: [Xen-devel] [PATCH for-4.9 1/2] x86/pv: Fix the handling of `int $x` for vectors which alias exceptions

2017-05-15 Thread Andrew Cooper
On 15/05/17 16:31, Jan Beulich wrote:
 On 15.05.17 at 14:50,  wrote:
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -633,9 +633,12 @@ void pv_inject_event(const struct x86_event *event)
>>  const struct trap_info *ti;
>>  const uint8_t vector = event->vector;
>>  const bool use_error_code =
>> +(event->type == X86_EVENTTYPE_HW_EXCEPTION) &&
>>  ((vector < 32) && (TRAP_HAVE_EC & (1u << vector)));
>>  unsigned int error_code = event->error_code;
>>  
>> +ASSERT(event->type == X86_EVENTTYPE_HW_EXCEPTION ||
>> +   event->type == X86_EVENTTYPE_SW_INTERRUPT);
> Wouldn't it be better to tighten this even further:
>
> if ( event->type == X86_EVENTTYPE_HW_EXCEPTION )
> {
> ASSERT(vector < 32);
> use_error_code = TRAP_HAVE_EC & (1u << vector);
> }
> else
> {
> ASSERT(event->type == X86_EVENTTYPE_SW_INTERRUPT);
> use_error_code = false;
> }
>
> ? If you agree
> Reviewed-by: Jan Beulich 
> with this or a substantially identical change.

Yeah.  I'm happy with this, and it will have a small knock-on to the
following patch.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.9 2/2] x86/pv: Replace do_guest_trap() with pv_inject_hw_exception()

2017-05-15 Thread Jan Beulich
>>> On 15.05.17 at 14:50,  wrote:
> do_guest_trap() is now functionally equivelent to pv_inject_hw_exception(),
> but with a less useful API as it requires the error code parameter to be
> passed implicitly via cpu_user_regs.
> 
> Extend pv_inject_event() with a further assertion which checks that hardware
> exception vectors are below 32, which is an x86 architectural expectation.

Interesting. As said for patch 1, I think this would better go there,
especially if ...

> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Julien Grall 
> 
> While not strictly a bugfix for 4.9, it would be nice to have it included (in
> light of the previous patch) to avoid the function duplication.

... that patch makes 4.9 but this one doesn't (and I think allowing
this one in would be bending the rules at least slightly).

> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -640,6 +640,8 @@ void pv_inject_event(const struct x86_event *event)
>  ASSERT(event->type == X86_EVENTTYPE_HW_EXCEPTION ||
> event->type == X86_EVENTTYPE_SW_INTERRUPT);
>  ASSERT(vector == event->vector); /* Confirm no truncation. */
> +if ( event->type == X86_EVENTTYPE_HW_EXCEPTION )
> +ASSERT(vector < 32);

With this hunk possibly removed (if a functionally similar addition
to patch 1 is being done)
Reviewed-by: Jan Beulich 

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.9 1/2] x86/pv: Fix the handling of `int $x` for vectors which alias exceptions

2017-05-15 Thread Jan Beulich
>>> On 15.05.17 at 14:50,  wrote:
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -633,9 +633,12 @@ void pv_inject_event(const struct x86_event *event)
>  const struct trap_info *ti;
>  const uint8_t vector = event->vector;
>  const bool use_error_code =
> +(event->type == X86_EVENTTYPE_HW_EXCEPTION) &&
>  ((vector < 32) && (TRAP_HAVE_EC & (1u << vector)));
>  unsigned int error_code = event->error_code;
>  
> +ASSERT(event->type == X86_EVENTTYPE_HW_EXCEPTION ||
> +   event->type == X86_EVENTTYPE_SW_INTERRUPT);

Wouldn't it be better to tighten this even further:

if ( event->type == X86_EVENTTYPE_HW_EXCEPTION )
{
ASSERT(vector < 32);
use_error_code = TRAP_HAVE_EC & (1u << vector);
}
else
{
ASSERT(event->type == X86_EVENTTYPE_SW_INTERRUPT);
use_error_code = false;
}

? If you agree
Reviewed-by: Jan Beulich 
with this or a substantially identical change.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] 4.9rc4: Cannot build with higher than -j4 - was: linux.c:27:28: fatal error: xen/sys/evtchn.h: No such file or directory

2017-05-15 Thread Julien Grall

Hi George,

CC Ian and Wei for feedback on the error.

On 15/05/17 14:04, George Dunlap wrote:

On Sun, May 14, 2017 at 10:50 AM, Steven Haigh  wrote:

On 10/05/17 23:02, Steven Haigh wrote:

On 10/05/17 01:20, M A Young wrote:

On Tue, 9 May 2017, Steven Haigh wrote:


I'm trying to use the same build procedure I had for working correctly
for Xen 4.7 & 4.8.1 - but am coming across this error:

gcc  -DPIC -m64 -DBUILD_ID -fno-strict-aliasing -std=gnu99 -Wall
-Wstrict-prototypes -Wdeclaration-after-statement
-Wno-unused-but-set-variable -Wno-unused-local-typedefs   -g3 -O0
-fno-omit-frame-pointer -D__XEN_INTERFACE_VERSION__=__XE
N_LATEST_INTERFACE_VERSION__ -MMD -MF .linux.opic.d -D_LARGEFILE_SOURCE
-D_LARGEFILE64_SOURCE   -Werror -Wmissing-prototypes -I./include
-I/builddir/build/BUILD/xen-4.9.0-rc4/tools/libs/evtchn/../../../tools/include
-I/builddir/build/BUI
LD/xen-4.9.0-rc4/tools/libs/evtchn/../../../tools/libs/toollog/include
-I/builddir/build/BUILD/xen-4.9.0-rc4/tools/libs/evtchn/../../../tools/include
 -fPIC -c -o linux.opic linux.c
mv headers.chk.new headers.chk
linux.c:27:28: fatal error: xen/sys/evtchn.h: No such file or directory
 #include 
^
compilation terminated.
linux.c:27:28: fatal error: xen/sys/evtchn.h: No such file or directory
 #include 
^
compilation terminated.

Any clues as to what to start pulling apart that changed between 4.8.1
and 4.9.0-rc4 that could cause this?


It worked for me in a test build, eg. see one of the builds at
https://copr.fedorainfracloud.org/coprs/myoung/xentest/build/549124/


Ok, after lots of debugging, when I run 'make dist', I usually use the
macro for smp building, so I end up with:
  make %{?_smp_mflags} dist

It seems this is hit and miss as to it actually working.

I have had a 100% success rate (but slow builds) with:
  make dist

Trying with 'make -j4 dist' seems to work the couple of times I've tried it.

This seems to be a new problem that I haven't come across before in 4.4,
4.5, 4.6, 4.7 or my initial 4.8.1 builds - so its new to 4.9.0 rc's.

The consensus on #xen seems to be that there is a race between libs &
include - and that these are supposed to be built in sequence and not
parallel.

I'm a little over my depth now - as I assume this heads into Makefile land.

If it helps, there is a full build log available at:
  https://cloud.crc.id.au/index.php/s/iTWJE3A1TQBhgDq

I've committed my current progress in my git tree:
  https://xen.crc.id.au/git/?p=xen49;a=tree

Right now, we're looking at lines 304 / 305 of SPECS/xen49.spec


Just wanted to give this a nudge. It seems if you build with above -j4
(on a machine with suitable number of cores), the build will fail. This
is a degradation from any version previous to 4.9.


Julien,

Probably something we should put on your list as a release blocker.


I think you are right. Added in the list.

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/5] xentrace: add TRC_HVM_PI_LIST_ADD

2017-05-15 Thread George Dunlap
On 11/05/17 07:04, Chao Gao wrote:
> This patch adds TRC_HVM_PI_LIST_ADD to track adding one entry to
> the per-pcpu blocking list. Also introduce a 'counter' to track
> the number of entries in the list.

So first of all, you have the importance of the order here backwards.
The most important thing this patch is doing is adding a counter to see
how many vcpus are on the list; tracing how that counter is moving is
secondary.

Secondly...

> 
> Signed-off-by: Chao Gao 
> ---
>  tools/xentrace/formats  |  1 +
>  xen/arch/x86/hvm/vmx/vmx.c  | 12 +++-
>  xen/include/asm-x86/hvm/trace.h |  1 +
>  xen/include/public/trace.h  |  1 +
>  4 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/xentrace/formats b/tools/xentrace/formats
> index 8b31780..999ca8c 100644
> --- a/tools/xentrace/formats
> +++ b/tools/xentrace/formats
> @@ -125,6 +125,7 @@
>  0x00082020  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  INTR_WINDOW [ value = 
> 0x%(1)08x ]
>  0x00082021  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  NPF [ gpa = 
> 0x%(2)08x%(1)08x mfn = 0x%(4)08x%(3)08x qual = 0x%(5)04x p2mt = 0x%(6)04x ]
>  0x00082023  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  TRAP[ vector = 
> 0x%(1)02x ]
> +0x00082026  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  PI_LIST_ADD [ domid = 
> 0x%(1)04x vcpu = 0x%(2)04x, pcpu = 0x%(3)04x, #entry = 0x%(4)04x ]
>  
>  0x0010f001  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  page_grant_map  [ domid 
> = %(1)d ]
>  0x0010f002  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  page_grant_unmap[ domid 
> = %(1)d ]
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index c8ef18a..efff6cd 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -82,6 +82,7 @@ static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
>  struct vmx_pi_blocking_vcpu {
>  struct list_head list;
>  spinlock_t   lock;
> +atomic_t counter;

Why is this atomic?  There's already a lock for this structure, and as
far as I can tell every access throughout the series is (or could be)
protected by a lock.

Finally, please add an entry to tools/xentrace/xenalyze.c to interpret
this value as well.

Thanks,
 -George


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
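
For anyone looking at the new formats line quoted above, here is a tiny
standalone sketch of how a consumer might decode the four extra words of a
PI_LIST_ADD record (domid, vcpu, pcpu, list-entry count). The struct layout and
the sample values are invented for illustration; this is not the actual
xentrace or xenalyze code.

    /*
     * Illustrative only: decoding the four extra words of the PI_LIST_ADD
     * trace record described by the formats line above.
     */
    #include <stdio.h>

    struct pi_list_add_rec {
        unsigned int domid;
        unsigned int vcpu;
        unsigned int pcpu;
        unsigned int nr_entries;
    };

    static void dump_pi_list_add(const unsigned int *extra)
    {
        const struct pi_list_add_rec *r = (const struct pi_list_add_rec *)extra;

        printf("PI_LIST_ADD [ domid = 0x%04x vcpu = 0x%04x, pcpu = 0x%04x, "
               "#entry = 0x%04x ]\n", r->domid, r->vcpu, r->pcpu, r->nr_entries);
    }

    int main(void)
    {
        unsigned int sample[4] = { 1, 0, 3, 0x10 };    /* made-up sample record */

        dump_pi_list_add(sample);
        return 0;
    }

A xenalyze handler for the new record would print essentially the same fields,
which is why George asks for the corresponding entry there as well.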


Re: [Xen-devel] [PATCH] x86/vvmx: Improvements to INVEPT instruction handling

2017-05-15 Thread Andrew Cooper
On 08/02/17 07:46, Tian, Kevin wrote:
>> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
>> Sent: Tuesday, February 07, 2017 12:55 AM
>>
>>  * Latch current once at the start.
>>  * Avoid the memory operand read for INVEPT_ALL_CONTEXT.  Experimentally, 
>> this
>>is how hardware behaves, and avoids an unnecessary pagewalk.
>>  * Reject Reg/Reg encodings of the instruction.
>>  * Audit eptp against maxphysaddr.
>>  * Introduce and use VMX_INSN_INVALID_INV_OPERAND to correct the vmfail
>>semantics.
>>  * Add extra newlines for clarity
>>
>> Also, introduce some TODOs for further checks which should be performed.
>> These checks are hard to perform at the moment, as there is no easy way to 
>> see
>> which MSR values where given to the guest.
>>
>> Signed-off-by: Andrew Cooper 
> Acked-by: Kevin Tian 

Actually, it turns out that a combination of 2b2793d3 and f438b1c5 is
entirely broken for 32bit hypervisors, and this patch was an accidental
bugfix.

decode_vmx_inst() reads using the default memory operand size, meaning
that a 32bit code segment executing INVEPT only fills in the bottom half
of 

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [XEN-devel] arm64: fix incorrect pa_range_info table to support 42 bit PA systems.

2017-05-15 Thread Julien Grall

Hello Feng,

On 13/05/17 01:26, Feng Kan wrote:

The pa_range_info table contains incorrect root_order and t0sz values which
prevent 42 bit PA systems from booting dom0.


As I mentioned in the previous thread [1], this is not a bug. What you 
configure below is the stage-2 page table and not the hypervisor page-table.


It is perfectly fine to expose fewer IPA (Intermediate Physical Address) 
bits than the number of PA (Physical Address) bits as long as all the 
addresses wired are below 40 bits (an assumption made by the patch that added 
this code). Does your hardware have devices/RAM above 40 bits? If so, 
then you need to mention it in the commit message.


This brings up another question: now you will allocate 8 pages by default 
for both DOM0 and guests. Exposing a 42 bit IPA to a guest does not sound 
necessary, so we would waste memory here. How are you going to address that?


Lastly, please quote the ARM ARM when you modify the generic ARM code to 
help the reviewer check your code.


Cheers,

[1] 
https://lists.xenproject.org/archives/html/xen-devel/2017-05/msg01254.html




Signed-off-by: Feng Kan 
---
 xen/arch/arm/p2m.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 34d5776..cbb8675 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1479,7 +1479,7 @@ void __init setup_virt_paging(void)
 [0] = { 32,  32/*32*/,  0,  1 },
 [1] = { 36,  28/*28*/,  0,  1 },
 [2] = { 40,  24/*24*/,  1,  1 },
-[3] = { 42,  24/*22*/,  1,  1 },
+[3] = { 42,  22/*22*/,  3,  1 },
 [4] = { 44,  20/*20*/,  0,  2 },
 [5] = { 48,  16/*16*/,  0,  2 },
 [6] = { 0 }, /* Invalid */



--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 4/5] VT-d PI: Adding reference count to pi_desc

2017-05-15 Thread George Dunlap
On Thu, May 11, 2017 at 7:04 AM, Chao Gao  wrote:
> This patch introduces a 'refcnt' field in vmx_pi_blocking to track
> the reference count of 'pi_desc' of the vCPU. And change this field
> every time we re-program one IRTE.
>
> Signed-off-by: Chao Gao 
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 29 
>  xen/drivers/passthrough/io.c   |  2 +-
>  xen/drivers/passthrough/vtd/intremap.c | 41 
> --
>  xen/include/asm-x86/hvm/domain.h   |  6 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h |  3 +++
>  xen/include/asm-x86/iommu.h|  2 +-
>  xen/include/asm-x86/msi.h  |  2 +-

This doesn't apply to staging anymore:

error: while searching for:
int iommu_enable_x2apic_IR(void);
void iommu_disable_x2apic_IR(void);

int pi_update_irte(const struct pi_desc *pi_desc, const struct pirq *pirq,
   const uint8_t gvec);

#endif /* !__ARCH_X86_IOMMU_H__ */

error: patch failed: xen/include/asm-x86/iommu.h:92


 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Enabling VT-d PI by default

2017-05-15 Thread George Dunlap
On Mon, May 15, 2017 at 2:35 PM, Andrew Cooper
 wrote:
> On 15/05/17 11:27, George Dunlap wrote:
>> On Fri, May 12, 2017 at 12:05 PM, Andrew Cooper
>>  wrote:
>>> Citrix NetScaler SDX boxes have more MSI-X interrupts than fit in the
>>> cumulative IDTs of a top-end dual-socket Xeon server system.  Some of
>>> the device drivers are purposefully modelled to use fewer interrupts
>>> than they otherwise would want to.
>>>
>>> Using PI is the proper solution longterm, because doing so would remove
>>> any need to allocate IDT vectors for the interrupts; the IOMMU could be
>>> programmed to dump device vectors straight into the PI block without
>>> them ever going through Xen's IDT.
>> I wouldn't necessarily call that a "proper" solution. With PI, instead
>> of an interrupt telling you exactly which VM to wake up and/or which
>> routine you need to run, instead you have to search through
>> (potentially) thousands of entries to see which vcpu the interrupt you
>> received wanted to wake up; and you need to do that on every single
>> interrupt.  (Obviously it does have the advantage that if the vcpu
>> happens to be running, Xen doesn't get an interrupt at all.)
>
> Having spoken to the PI architects, this is not how the technology was
> designed to be used.
>
> On systems with this number of in-flight interrupts, trying to track
> "who got what interrupt" for priority boosting purposes is a waste of
> time, as we spend ages taking vmexits to process interrupt notifications
> for out-of-context vcpus.
>
> The way the PI architects envisaged the technology being used is that
> Suppress Notification is set at all points other than executing in
> non-root mode for the vcpu in question (there is a small race window
> around clearing SN on vmentry), and that the scheduler uses Outstanding
> Notification on each of the PI blocks when it rebalances credit to see
> which vcpus have had interrupts in the last 30ms.

It sounds like they may have made the mistake that the Credit1
designers made, in analyzing only a system that was overloaded; and
one where all workloads were identical, as opposed to analyzing a
system that was at least sometimes partially loaded, and where
workloads were very different.

You're right that if you weren't going to preempt the currently
running vcpu anyway, there's no need for Xen to get the interrupt.

But it should be obvious that on a system that's idle (even for a
relatively short amount of time) that we want to get the interrupt and
wake up the appropriate vcpu immediately.  It should also be obvious
that in a mixed workload, where one vcpu is doing tons of computation
and another is mainly handling interrupts quickly and going to sleep
again, that we would want Xen at regular intervals to check to see if
it should run the vcpu that's mostly handling interrupts.  We
generally wouldn't want to delay waking up the lower-priority vcpu
longer than 1ms.

In both cases, waiting 30ms to see if we should wake somebody up is
far too long.
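
A hypothetical sketch of the "poll ON at rebalance time" model being
criticised here (names and layout are made up, not actual Xen code):

#include <stdbool.h>

struct pi_desc_view {
    bool on;    /* Outstanding Notification: an interrupt was posted   */
    bool sn;    /* Suppress Notification: don't send notification IRQs */
};

/* Imagined scheduler hook, run once per rebalance period (e.g. 30ms). */
static void boost_vcpus_with_posted_interrupts(struct pi_desc_view *desc,
                                               unsigned int nr_vcpus,
                                               void (*boost)(unsigned int))
{
    for ( unsigned int i = 0; i < nr_vcpus; i++ )
    {
        if ( desc[i].on )   /* an interrupt arrived since the last check... */
            boost(i);       /* ...but possibly up to 30ms ago               */
    }
}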

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Proposal to allow setting up shared memory areas between VMs from xl config file

2017-05-15 Thread Jan Beulich
>>> On 15.05.17 at 16:13,  wrote:
> On 15/05/17 13:28, Jan Beulich wrote:
> On 15.05.17 at 12:21,  wrote:
>>> On Zhongze's proposal, the shared page will be mapped at a specific
>>> address in the guest memory. I agree this will require some work in the
>>> toolstack; on the hypervisor side we could re-use the foreign mapping
>>> API. But on the guest side there is nothing Xen-specific to do.
>>
>> So what is the equivalent of the shared page on bare hardware?

No answer here?

>>> What's the benefit? Baremetal guests are usually tiny; you could use the
>>> device-tree (and hence a generic way) to present the shared page for
>>> communication. This means no Xen PV drivers, and therefore it is easier to
>>> move an OS into a Xen VM.
>>
>> Is this intended to be an ARM-specific extension, or a generic one?
>> There's no DT on x86 to pass such information, and I can't easily
>> see alternatives there. Also the consumer of the shared page info
>> is still a PV component of the guest. You simply can't have an
>> entirely unmodified guest which at the same time is Xen (or
>> whatever other component sits at the other end of the shared
>> page) aware.
> 
> The toolstack will set up the shared page in both the producer and 
> consumer guests. This will be done during domain creation. My 
> understanding is that it will not be possible to share a page after the two 
> domains have been created.

Whether this is going to become too limiting remains to be seen, but
in any event this doesn't answer my question regarding the x86 side.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Support of lguest?

2017-05-15 Thread Juergen Gross
On 15/05/17 16:22, Vitaly Kuznetsov wrote:
> Juergen Gross  writes:
> 
>> Lguest and Xen pv-guests are the only users of pv_mmu_ops (with the
>> one exception of the .exit_mmap member, which is being used by Xen
>> HVM-guests, too).
>>
>> As it is possible now to build a kernel without Xen pv-guest support
>> while keeping PVH and PVHVM support, I thought about putting most
>> pv_mmu_ops functions in #ifdef CONFIG_XEN_HAS_PVMMU sections.
> 
> There is an ongoing work to enable PV TLB flushing for Hyper-V guests:
> http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2017-April/104411.html
> 
> it utilizes .flush_tlb_others member in pv_mmu_ops.
> 
> hopefully, this work will be merged in 4.13.

Thanks for the information. I'll keep the .flush_tlb_others member outside
the #ifdef section.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] x86/string: Clean up x86/string.h

2017-05-15 Thread Jan Beulich
>>> On 15.05.17 at 15:08,  wrote:
> On 15/05/17 11:19, Jan Beulich wrote:
>>  >>> On 15.05.17 at 12:08,  wrote:
>> On 12.05.17 at 19:35,  wrote:
 --- a/xen/include/asm-x86/string.h
 +++ b/xen/include/asm-x86/string.h
 @@ -2,13 +2,23 @@
  #define __X86_STRING_H__
  
  #define __HAVE_ARCH_MEMCPY
 -#define memcpy(t,f,n) (__builtin_memcpy((t),(f),(n)))
 +void *memcpy(void *dest, const void *src, size_t n);
 +#define memcpy(d, s, n) __builtin_memcpy(d, s, n)
  
 -/* Some versions of gcc don't have this builtin. It's non-critical anyway. */
  #define __HAVE_ARCH_MEMMOVE
 -extern void *memmove(void *dest, const void *src, size_t n);
 +void *memmove(void *dest, const void *src, size_t n);
 +#define memmove(d, s, n) __builtin_memmove(d, s, n)
  
  #define __HAVE_ARCH_MEMSET
 -#define memset(s,c,n) (__builtin_memset((s),(c),(n)))
 +void *memset(void *dest, int c, size_t n);
 +#define memset(s, c, n) __builtin_memset(s, c, n)
>>> Now that xen/string.h has the exact same declarations and
>>> definitions already, why don't you simply delete the overrides
>>> from here?
>> Hmm, wait - I guess you need to keep them because of the custom
>> implementation. That's awkward, there shouldn't be a need to have
>> redundant declarations just because there are custom
>> implementations. How about making __HAVE_ARCH_* serve both
>> purposes, by allowing it to have different values (besides being
>> defined or undefined)?
> 
> I don't understand how you would intend this new __HAVE_ARCH_* to work.

E.g. __HAVE_ARCH_* = 2 meaning arch provides declaration and
definition (i.e. generic header and source skip theirs), while
__HAVE_ARCH_* = 1 meaning arch provides just a definition, but
the generic declaration and macro (where applicable) are fine (i.e.
only the generic source skips its piece of code).
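
A rough sketch of how the two values could be wired up (my reading of the
suggestion, illustrative only, not an actual patch):

/* In the arch header, for example: */
#define __HAVE_ARCH_MEMSET 1   /* custom definition only; keep generic decl */
#define __HAVE_ARCH_MEMCPY 2   /* arch provides declaration and definition  */

/* In the generic header: declare/macro-ify unless the arch fully owns it. */
#if !defined(__HAVE_ARCH_MEMSET) || __HAVE_ARCH_MEMSET < 2
void *memset(void *dest, int c, size_t n);
#define memset(s, c, n) __builtin_memset(s, c, n)
#endif

/* In the generic source file: compile the fallback only if no arch one. */
#ifndef __HAVE_ARCH_MEMSET
/* ... generic memset() implementation ... */
#endif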

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Support of lguest?

2017-05-15 Thread Vitaly Kuznetsov
Juergen Gross  writes:

> Lguest and Xen pv-guests are the only users of pv_mmu_ops (with the
> one exception of the .exit_mmap member, which is being used by Xen
> HVM-guests, too).
>
> As it is possible now to build a kernel without Xen pv-guest support
> while keeping PVH and PVHVM support, I thought about putting most
> pv_mmu_ops functions in #ifdef CONFIG_XEN_HAS_PVMMU sections.

There is an ongoing work to enable PV TLB flushing for Hyper-V guests:
http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2017-April/104411.html

it utilizes .flush_tlb_others member in pv_mmu_ops.

hopefully, this work will be merged in 4.13.

-- 
  Vitaly

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Proposal to allow setting up shared memory areas between VMs from xl config file

2017-05-15 Thread Julien Grall

Hi Jan,

On 15/05/17 13:28, Jan Beulich wrote:

On 15.05.17 at 12:21,  wrote:

On 05/15/2017 09:52 AM, Jan Beulich wrote:

On 15.05.17 at 10:20,  wrote:

On 15/05/2017 09:08, Jan Beulich wrote:

On 12.05.17 at 19:01,  wrote:


1. Motivation and Description

Virtual machines use grant table hypercalls to set up a shared page for
inter-VM communication. These hypercalls are used by all PV
protocols today. However, very simple guests, such as baremetal
applications, might not have the infrastructure to handle the grant table.
This project is about setting up several shared memory areas for inter-VM
communication directly from the VM config file,
so that the guest kernel doesn't have to have grant table support to be
able to communicate with other guests.


I think it would help to compare your proposal with the alternative of
adding grant table infrastructure to such environments (which I
wouldn't expect to be all that difficult). After all introduction of a
(seemingly) redundant mechanism comes at the price of extra /
duplicate code in the tool stack and maybe even in the hypervisor.
Hence there needs to be a meaningfully higher gain than price here.


This is a key feature for embedded because they want to be able to share
a buffer very easily at domain creation time between two guests.

Adding the grant table driver in the guest OS has a high cost when the
goal is to run an unmodified OS in a VM. This is achievable on ARM if you
use passthrough.


"high cost" is pretty abstract and vague. And I admit I have difficulty
seeing how an entirely unmodified OS could leverage this newly
proposed sharing model.


Let's step back for a moment, I will come back on Zhongze proposal
afterwards.

Using the grant table in the guest will obviously require the grant-table
driver. It is not that bad. However, how do you pass the grant ref
number to the other guest? The only way I can see is xenstore, so yet
another driver to port.


Just look at the amount of code that was needed to get PV drivers
to work in x86 HVM guests. It's not all that much. Plus making such
available in a new environment doesn't normally mean everything
needs to be written from scratch.


Even if PV drivers don't need to be written from scratch, porting them to 
a new OS has a certain cost. By trying to make most of the VM 
interface agnostic to Xen, we potentially allow vendors to switch to Xen 
easily.





On Zhongze's proposal, the shared page will be mapped at a specific
address in the guest memory. I agree this will require some work in the
toolstack; on the hypervisor side we could re-use the foreign mapping
API. But on the guest side there is nothing Xen-specific to do.


So what is the equivalent of the shared page on bare hardware?


What's the benefit? Baremetal guests are usually tiny; you could use the
device-tree (and hence a generic way) to present the shared page for
communication. This means no Xen PV drivers, and therefore it is easier to
move an OS into a Xen VM.


Is this intended to be an ARM-specific extension, or a generic one?
There's no DT on x86 to pass such information, and I can't easily
see alternatives there. Also the consumer of the shared page info
is still a PV component of the guest. You simply can't have an
entirely unmodified guest which at the same time is Xen (or
whatever other component sits at the other end of the shared
page) aware.


The toolstack will set up the shared page in both the producer and 
consumer guests. This will be done during domain creation. My 
understanding is that it will not be possible to share a page after the two 
domains have been created.


This feature is not meant to replace grant tables, but to ease 
sharing a page between two guests without introducing any Xen knowledge in 
either guest.
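
To make the "no Xen knowledge in the guest" point concrete, a baremetal
consumer could treat the page like any other fixed memory-mapped buffer.
A minimal, hypothetical sketch (the address and layout are made up here,
they are not part of the proposal):

#include <stdint.h>

/* Guest-physical address the toolstack is assumed to have picked. */
#define SHARED_PAGE_ADDR  0x7fe00000UL

struct shared_ring {
    volatile uint32_t producer_idx;
    volatile uint32_t consumer_idx;
    uint8_t           data[4088];          /* rest of the 4KB page */
};

static struct shared_ring *const ring =
    (struct shared_ring *)SHARED_PAGE_ADDR;

/* Poll for a byte written by the producer domain; no hypercalls needed. */
static int shared_ring_pop(uint8_t *out)
{
    if ( ring->consumer_idx == ring->producer_idx )
        return 0;                           /* nothing pending */
    *out = ring->data[ring->consumer_idx % sizeof(ring->data)];
    ring->consumer_idx++;
    return 1;
}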


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [For Xen-4.10 Resend PATCH 3/3] Avoid excess icache flushes in populate_physmap() before domain has been created

2017-05-15 Thread Punit Agrawal
populate_physmap() calls alloc_heap_pages() per requested
extent. alloc_heap_pages() invalidates the entire icache per
extent. During domain creation, the icache invalidations can be deferred
until all the extents have been allocated as there is no risk of
executing stale instructions from the icache.

Introduce a new flag "MEMF_no_icache_flush" to be used to prevent
alloc_heap_pages() from performing icache maintenance operations. Use
the flag in populate_physmap() before the domain has been unpaused and
perform required icache maintenance function at the end of the
allocation.

One concern is the lack of synchronisation around testing for
"creation_finished". But it seems, in practice the window where it is
out of sync should be small enough to not matter.

Signed-off-by: Punit Agrawal 
---
 xen/common/memory.c| 31 ++-
 xen/common/page_alloc.c|  2 +-
 xen/include/asm-x86/page.h |  4 
 xen/include/xen/mm.h   |  2 ++
 4 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 52879e7438..34d2dda8b4 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -152,16 +152,26 @@ static void populate_physmap(struct memop_args *a)
 max_order(curr_d)) )
 return;
 
-/*
- * With MEMF_no_tlbflush set, alloc_heap_pages() will ignore
- * TLB-flushes. After VM creation, this is a security issue (it can
- * make pages accessible to guest B, when guest A may still have a
- * cached mapping to them). So we do this only during domain creation,
- * when the domain itself has not yet been unpaused for the first
- * time.
- */
 if ( unlikely(!d->creation_finished) )
+{
+/*
+ * With MEMF_no_tlbflush set, alloc_heap_pages() will ignore
+ * TLB-flushes. After VM creation, this is a security issue (it can
+ * make pages accessible to guest B, when guest A may still have a
+ * cached mapping to them). So we do this only during domain creation,
+ * when the domain itself has not yet been unpaused for the first
+ * time.
+ */
 a->memflags |= MEMF_no_tlbflush;
+/*
+ * With MEMF_no_icache_flush, alloc_heap_pages() will skip
+ * performing icache flushes. We do it only before domain
+ * creation as once the domain is running there is a danger of
+ * executing instructions from stale caches if icache flush is
+ * delayed.
+ */
+a->memflags |= MEMF_no_icache_flush;
+}
 
 for ( i = a->nr_done; i < a->nr_extents; i++ )
 {
@@ -211,7 +221,6 @@ static void populate_physmap(struct memop_args *a)
 }
 
 mfn = gpfn;
-page = mfn_to_page(mfn);
 }
 else
 {
@@ -255,6 +264,10 @@ static void populate_physmap(struct memop_args *a)
 out:
 if ( need_tlbflush )
 filtered_flush_tlb_mask(tlbflush_timestamp);
+
+if ( a->memflags & MEMF_no_icache_flush )
+invalidate_icache();
+
 a->nr_done = i;
 }
 
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index eba78f1a3d..8bcef6a547 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -833,7 +833,7 @@ static struct page_info *alloc_heap_pages(
 /* Ensure cache and RAM are consistent for platforms where the
  * guest can control its own visibility of/through the cache.
  */
-flush_page_to_ram(page_to_mfn(&pg[i]), true);
+flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & MEMF_no_icache_flush));
 }
 
 spin_unlock(&heap_lock);
diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
index 4cadb12646..3a375282f6 100644
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -375,6 +375,10 @@ perms_strictly_increased(uint32_t old_flags, uint32_t 
new_flags)
 
 #define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & PAGE_MASK)
 
+static inline void invalidate_icache(void)
+{
+}
+
 #endif /* __X86_PAGE_H__ */
 
 /*
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 88de3c1fa6..ee50d4cd7b 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -224,6 +224,8 @@ struct npfec {
 #define  MEMF_no_owner    (1U<<_MEMF_no_owner)
 #define _MEMF_no_tlbflush 6
 #define  MEMF_no_tlbflush (1U<<_MEMF_no_tlbflush)
+#define _MEMF_no_icache_flush 7
+#define  MEMF_no_icache_flush (1U<<_MEMF_no_icache_flush)
 #define _MEMF_node        8
 #define  MEMF_node_mask   ((1U << (8 * sizeof(nodeid_t))) - 1)
 #define  MEMF_node(n)     ((((n) + 1) & MEMF_node_mask) << _MEMF_node)
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [For Xen-4.10 Resend PATCH 2/3] arm: p2m: Prevent redundant icache flushes

2017-05-15 Thread Punit Agrawal
When toolstack requests flushing the caches, flush_page_to_ram() is
called for each page of the requested domain. This leads to unnecessary
icache invalidation operations.

Let's take the responsibility of performing icache operations and use
the recently introduced flag to prevent redundant icache operations by
flush_page_to_ram().

Signed-off-by: Punit Agrawal 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/arm/p2m.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 29f2e2fad3..07357bce7d 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1392,13 +1392,15 @@ int p2m_cache_flush(struct domain *d, gfn_t start, 
unsigned long nr)
 /* XXX: Implement preemption */
 while ( gfn_x(start) < gfn_x(next_gfn) )
 {
-flush_page_to_ram(mfn_x(mfn), true);
+flush_page_to_ram(mfn_x(mfn), false);
 
 start = gfn_add(start, 1);
 mfn = mfn_add(mfn, 1);
 }
 }
 
+invalidate_icache();
+
 p2m_read_unlock(p2m);
 
 return 0;
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [For Xen-4.10 Resend PATCH 1/3] Allow control of icache invalidations when calling flush_page_to_ram()

2017-05-15 Thread Punit Agrawal
flush_page_to_ram() unconditionally drops the icache. In certain
situations this leads to excessive icache flushes when
flush_page_to_ram() ends up being repeatedly called in a loop.

Introduce a parameter to allow callers of flush_page_to_ram() to take
responsibility for synchronising the icache. This is in preparation for
adding logic to make the callers perform the necessary icache
maintenance operations.

Signed-off-by: Punit Agrawal 
---
 xen/arch/arm/mm.c  | 5 +++--
 xen/arch/arm/p2m.c | 2 +-
 xen/common/page_alloc.c| 2 +-
 xen/include/asm-arm/page.h | 2 +-
 xen/include/asm-x86/flushtlb.h | 2 +-
 5 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 48f74f6e65..082c872c72 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -420,7 +420,7 @@ unsigned long domain_page_map_to_mfn(const void *ptr)
 }
 #endif
 
-void flush_page_to_ram(unsigned long mfn)
+void flush_page_to_ram(unsigned long mfn, bool sync_icache)
 {
 void *v = map_domain_page(_mfn(mfn));
 
@@ -435,7 +435,8 @@ void flush_page_to_ram(unsigned long mfn)
  * I-Cache (See D4.9.2 in ARM DDI 0487A.k_iss10775). Instead of using flush
  * by VA on select platforms, we just flush the entire cache here.
  */
-invalidate_icache();
+if ( sync_icache )
+invalidate_icache();
 }
 
 void __init arch_init_memory(void)
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 34d57760d7..29f2e2fad3 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1392,7 +1392,7 @@ int p2m_cache_flush(struct domain *d, gfn_t start, 
unsigned long nr)
 /* XXX: Implement preemption */
 while ( gfn_x(start) < gfn_x(next_gfn) )
 {
-flush_page_to_ram(mfn_x(mfn));
+flush_page_to_ram(mfn_x(mfn), true);
 
 start = gfn_add(start, 1);
 mfn = mfn_add(mfn, 1);
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9e41fb4cd3..eba78f1a3d 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -833,7 +833,7 @@ static struct page_info *alloc_heap_pages(
 /* Ensure cache and RAM are consistent for platforms where the
  * guest can control its own visibility of/through the cache.
  */
-flush_page_to_ram(page_to_mfn(&pg[i]));
+flush_page_to_ram(page_to_mfn(&pg[i]), true);
 }
 
 spin_unlock(&heap_lock);
diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index 4b46e8831c..497b4c86ad 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -407,7 +407,7 @@ static inline void flush_xen_data_tlb_range_va(unsigned 
long va,
 }
 
 /* Flush the dcache for an entire page. */
-void flush_page_to_ram(unsigned long mfn);
+void flush_page_to_ram(unsigned long mfn, bool sync_icache);
 
 /*
  * Print a walk of a page table or p2m
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 8b7adef7c5..bd2be7e482 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -118,7 +118,7 @@ void flush_area_mask(const cpumask_t *, const void *va, 
unsigned int flags);
 #define flush_tlb_one_all(v)\
 flush_tlb_one_mask(&cpu_online_map, v)
 
-static inline void flush_page_to_ram(unsigned long mfn) {}
+static inline void flush_page_to_ram(unsigned long mfn, bool sync_icache) {}
 static inline int invalidate_dcache_va_range(const void *p,
  unsigned long size)
 { return -EOPNOTSUPP; }
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [For Xen-4.10 Resend PATCH 0/3] Reduce unnecessary icache maintenance operations

2017-05-15 Thread Punit Agrawal
Hi,

This series was previously posted as an RFC[0]. An issue was discovered
in the RFC related to delaying icache invalidations when the domain is
active. Accordingly, Patch 3 has been modified to avoid per-page
icache invalidations only during domain creation.

Changes from RFC:

* Fixed coding style issue in Patch 1
* Added reviewed-by tags
* Re-worked Patch 3 to defer icache optimisation only during domain creation

Patch 1 adds a parameter to flush_page_to_ram() to prevent performing
icache maintenance per page. Current calls to flush_page_to_ram() loop
over pages and performing a full icache flush for each page is
excessive.

Patch 2 hoists icache maintenance from flush_page_to_ram() to
p2m_cache_flush().

Patch 3 introduces a new MEMF_ flag to indicate to alloc_heap_pages()
that icache maintenance will be performed by the caller. The icache
maintenance operations are performed in populate_physmap() during
domain creation. As I couldn't find icache maintenance operations for
x86, an empty helper is introduced.

Thanks,
Punit

[0] https://www.mail-archive.com/xen-devel@lists.xen.org/msg102934.html


Punit Agrawal (3):
  Allow control of icache invalidations when calling flush_page_to_ram()
  arm: p2m: Prevent redundant icache flushes
  Avoid excess icache flushes in populate_physmap() before domain has
been created

 xen/arch/arm/mm.c  |  5 +++--
 xen/arch/arm/p2m.c |  4 +++-
 xen/common/memory.c| 31 ++-
 xen/common/page_alloc.c|  2 +-
 xen/include/asm-arm/page.h |  2 +-
 xen/include/asm-x86/flushtlb.h |  2 +-
 xen/include/asm-x86/page.h |  4 
 xen/include/xen/mm.h   |  2 ++
 8 files changed, 37 insertions(+), 15 deletions(-)

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] Support of lguest?

2017-05-15 Thread Juergen Gross
Lguest and Xen pv-guests are the only users of pv_mmu_ops (with the
one exception of the .exit_mmap member, which is being used by Xen
HVM-guests, too).

As it is possible now to build a kernel without Xen pv-guest support
while keeping PVH and PVHVM support, I thought about putting most
pv_mmu_ops functions in #ifdef CONFIG_XEN_HAS_PVMMU sections. If there
wouldn't be lguest...

So my question: is anybody still using lguest or would like to keep it?

If yes, I'd add CONFIG_PARAVIRT_MMU selected by CONFIG_XEN_PV and
CONFIG_LGUEST_GUEST.

If no, I'd remove lguest support and just use CONFIG_XEN_HAS_PVMMU,
in case nobody would like me to use CONFIG_PARAVIRT_MMU even if
lguest isn't there any more.
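
For illustration, the kind of arrangement being discussed might look roughly
like this (member names taken from this thread; the signatures and layout are
placeholders, not the real Linux definitions):

/* Sketch only -- not the actual struct pv_mmu_ops. */
struct pv_mmu_ops_sketch {
    /* Wanted by the Hyper-V PV TLB flush work too, so kept unconditional. */
    void (*flush_tlb_others)(const void *mask, const void *info);

#ifdef CONFIG_PARAVIRT_MMU  /* selected by CONFIG_XEN_PV and CONFIG_LGUEST_GUEST */
    /* ... the remaining MMU hooks, needed only by PV-MMU guests ... */
#endif

    /* .exit_mmap is also used by Xen HVM guests, so it stays outside too. */
    void (*exit_mmap)(void *mm);
};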


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [For Xen-4.10 PATCH 0/3] Reduce unnecessary icache maintenance operations

2017-05-15 Thread Punit Agrawal
Looks like I've got Konrad's email wrong. Please ignore this
thread. I'll repost with the right address. :(

Apologies for the spam.

Punit


Punit Agrawal  writes:

> Hi,
>
> This series was previously posted as an RFC[0]. An issue was discovered
> in the RFC related to delaying icache invalidations when the domain is
> active. Accordingly, Patch 3 has been modified to avoid per-page
> icache invalidations only during domain creation.
>
> Changes from RFC:
>
> * Fixed coding style issue in Patch 1
> * Added reviewed-by tags
> * Re-worked Patch 3 to defer icache optimisation only during domain creation
>
> Patch 1 adds a parameter to flush_page_to_ram() to prevent performing
> icache maintenance per page. Current calls to flush_page_to_ram() loop
> over pages and performing a full icache flush for each page is
> excessive.
>
> Patch 2 hoists icache maintenance from flush_page_to_ram() to
> p2m_cache_flush().
>
> Patch 3 introduces a new MEMF_ flag to indicate to alloc_heap_pages()
> that icache maintenance will be performed by the caller. The icache
> maintenance operations are performed in populate_physmap() during
> domain creation. As I couldn't find icache maintenance operations for
> x86, an empty helper is introduced.
>
> Thanks,
> Punit
>
> [0] https://www.mail-archive.com/xen-devel@lists.xen.org/msg102934.html
>
>
> Punit Agrawal (3):
>   Allow control of icache invalidations when calling flush_page_to_ram()
>   arm: p2m: Prevent redundant icache flushes
>   Avoid excess icache flushes in populate_physmap() before domain has
> been created
>
>  xen/arch/arm/mm.c  |  5 +++--
>  xen/arch/arm/p2m.c |  4 +++-
>  xen/common/memory.c| 31 ++-
>  xen/common/page_alloc.c|  2 +-
>  xen/include/asm-arm/page.h |  2 +-
>  xen/include/asm-x86/flushtlb.h |  2 +-
>  xen/include/asm-x86/page.h |  4 
>  xen/include/xen/mm.h   |  2 ++
>  8 files changed, 37 insertions(+), 15 deletions(-)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [For Xen-4.10 PATCH 3/3] Avoid excess icache flushes in populate_physmap() before domain has been created

2017-05-15 Thread Punit Agrawal
populate_physmap() calls alloc_heap_pages() per requested
extent. alloc_heap_pages() invalidates the entire icache per
extent. During domain creation, the icache invalidations can be deferred
until all the extents have been allocated as there is no risk of
executing stale instructions from the icache.

Introduce a new flag "MEMF_no_icache_flush" to be used to prevent
alloc_heap_pages() from performing icache maintenance operations. Use
the flag in populate_physmap() before the domain has been unpaused and
perform required icache maintenance function at the end of the
allocation.

One concern is the lack of synchronisation around testing for
"creation_finished". But it seems, in practice the window where it is
out of sync should be small enough to not matter.

Signed-off-by: Punit Agrawal 
---
 xen/common/memory.c| 31 ++-
 xen/common/page_alloc.c|  2 +-
 xen/include/asm-x86/page.h |  4 
 xen/include/xen/mm.h   |  2 ++
 4 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 52879e7438..34d2dda8b4 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -152,16 +152,26 @@ static void populate_physmap(struct memop_args *a)
 max_order(curr_d)) )
 return;
 
-/*
- * With MEMF_no_tlbflush set, alloc_heap_pages() will ignore
- * TLB-flushes. After VM creation, this is a security issue (it can
- * make pages accessible to guest B, when guest A may still have a
- * cached mapping to them). So we do this only during domain creation,
- * when the domain itself has not yet been unpaused for the first
- * time.
- */
 if ( unlikely(!d->creation_finished) )
+{
+/*
+ * With MEMF_no_tlbflush set, alloc_heap_pages() will ignore
+ * TLB-flushes. After VM creation, this is a security issue (it can
+ * make pages accessible to guest B, when guest A may still have a
+ * cached mapping to them). So we do this only during domain creation,
+ * when the domain itself has not yet been unpaused for the first
+ * time.
+ */
 a->memflags |= MEMF_no_tlbflush;
+/*
+ * With MEMF_no_icache_flush, alloc_heap_pages() will skip
+ * performing icache flushes. We do it only before domain
+ * creation as once the domain is running there is a danger of
+ * executing instructions from stale caches if icache flush is
+ * delayed.
+ */
+a->memflags |= MEMF_no_icache_flush;
+}
 
 for ( i = a->nr_done; i < a->nr_extents; i++ )
 {
@@ -211,7 +221,6 @@ static void populate_physmap(struct memop_args *a)
 }
 
 mfn = gpfn;
-page = mfn_to_page(mfn);
 }
 else
 {
@@ -255,6 +264,10 @@ static void populate_physmap(struct memop_args *a)
 out:
 if ( need_tlbflush )
 filtered_flush_tlb_mask(tlbflush_timestamp);
+
+if ( a->memflags & MEMF_no_icache_flush )
+invalidate_icache();
+
 a->nr_done = i;
 }
 
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index eba78f1a3d..8bcef6a547 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -833,7 +833,7 @@ static struct page_info *alloc_heap_pages(
 /* Ensure cache and RAM are consistent for platforms where the
  * guest can control its own visibility of/through the cache.
  */
-flush_page_to_ram(page_to_mfn(&pg[i]), true);
+flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & MEMF_no_icache_flush));
 }
 
 spin_unlock(&heap_lock);
diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
index 4cadb12646..3a375282f6 100644
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -375,6 +375,10 @@ perms_strictly_increased(uint32_t old_flags, uint32_t 
new_flags)
 
 #define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & PAGE_MASK)
 
+static inline void invalidate_icache(void)
+{
+}
+
 #endif /* __X86_PAGE_H__ */
 
 /*
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 88de3c1fa6..ee50d4cd7b 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -224,6 +224,8 @@ struct npfec {
 #define  MEMF_no_owner    (1U<<_MEMF_no_owner)
 #define _MEMF_no_tlbflush 6
 #define  MEMF_no_tlbflush (1U<<_MEMF_no_tlbflush)
+#define _MEMF_no_icache_flush 7
+#define  MEMF_no_icache_flush (1U<<_MEMF_no_icache_flush)
 #define _MEMF_node        8
 #define  MEMF_node_mask   ((1U << (8 * sizeof(nodeid_t))) - 1)
 #define  MEMF_node(n)     ((((n) + 1) & MEMF_node_mask) << _MEMF_node)
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [For Xen-4.10 PATCH 0/3] Reduce unnecessary icache maintenance operations

2017-05-15 Thread Punit Agrawal
Hi,

This series was previously posted as an RFC[0]. An issue was discovered
in the RFC related to delaying icache invalidations when the domain is
active. Accordingly, Patch 3 has been modified to avoid per-page
icache invalidations only during domain creation.

Changes from RFC:

* Fixed coding style issue in Patch 1
* Added reviewed-by tags
* Re-worked Patch 3 to defer icache optimisation only during domain creation

Patch 1 adds a parameter to flush_page_to_ram() to prevent performing
icache maintenance per page. Current calls to flush_page_to_ram() loop
over pages and performing a full icache flush for each page is
excessive.

Patch 2 hoists icache maintenance from flush_page_to_ram() to
p2m_cache_flush().

Patch 3 introduces a new MEMF_ flag to indicate to alloc_heap_pages()
that icache maintenance will be performed by the caller. The icache
maintenance operations are performed in populate_physmap() during
domain creation. As I couldn't find icache maintenance operations for
x86, an empty helper is introduced.

Thanks,
Punit

[0] https://www.mail-archive.com/xen-devel@lists.xen.org/msg102934.html


Punit Agrawal (3):
  Allow control of icache invalidations when calling flush_page_to_ram()
  arm: p2m: Prevent redundant icache flushes
  Avoid excess icache flushes in populate_physmap() before domain has
been created

 xen/arch/arm/mm.c  |  5 +++--
 xen/arch/arm/p2m.c |  4 +++-
 xen/common/memory.c| 31 ++-
 xen/common/page_alloc.c|  2 +-
 xen/include/asm-arm/page.h |  2 +-
 xen/include/asm-x86/flushtlb.h |  2 +-
 xen/include/asm-x86/page.h |  4 
 xen/include/xen/mm.h   |  2 ++
 8 files changed, 37 insertions(+), 15 deletions(-)

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [For Xen-4.10 PATCH 2/3] arm: p2m: Prevent redundant icache flushes

2017-05-15 Thread Punit Agrawal
When toolstack requests flushing the caches, flush_page_to_ram() is
called for each page of the requested domain. This leads to unnecessary
icache invalidation operations.

Let's take the responsibility of performing icache operations and use
the recently introduced flag to prevent redundant icache operations by
flush_page_to_ram().

Signed-off-by: Punit Agrawal 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/arm/p2m.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 29f2e2fad3..07357bce7d 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1392,13 +1392,15 @@ int p2m_cache_flush(struct domain *d, gfn_t start, 
unsigned long nr)
 /* XXX: Implement preemption */
 while ( gfn_x(start) < gfn_x(next_gfn) )
 {
-flush_page_to_ram(mfn_x(mfn), true);
+flush_page_to_ram(mfn_x(mfn), false);
 
 start = gfn_add(start, 1);
 mfn = mfn_add(mfn, 1);
 }
 }
 
+invalidate_icache();
+
 p2m_read_unlock(p2m);
 
 return 0;
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [For Xen-4.10 PATCH 1/3] Allow control of icache invalidations when calling flush_page_to_ram()

2017-05-15 Thread Punit Agrawal
flush_page_to_ram() unconditionally drops the icache. In certain
situations this leads to excessive icache flushes when
flush_page_to_ram() ends up being repeatedly called in a loop.

Introduce a parameter to allow callers of flush_page_to_ram() to take
responsibility for synchronising the icache. This is in preparation for
adding logic to make the callers perform the necessary icache
maintenance operations.

Signed-off-by: Punit Agrawal 
---
 xen/arch/arm/mm.c  | 5 +++--
 xen/arch/arm/p2m.c | 2 +-
 xen/common/page_alloc.c| 2 +-
 xen/include/asm-arm/page.h | 2 +-
 xen/include/asm-x86/flushtlb.h | 2 +-
 5 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 48f74f6e65..082c872c72 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -420,7 +420,7 @@ unsigned long domain_page_map_to_mfn(const void *ptr)
 }
 #endif
 
-void flush_page_to_ram(unsigned long mfn)
+void flush_page_to_ram(unsigned long mfn, bool sync_icache)
 {
 void *v = map_domain_page(_mfn(mfn));
 
@@ -435,7 +435,8 @@ void flush_page_to_ram(unsigned long mfn)
  * I-Cache (See D4.9.2 in ARM DDI 0487A.k_iss10775). Instead of using flush
  * by VA on select platforms, we just flush the entire cache here.
  */
-invalidate_icache();
+if ( sync_icache )
+invalidate_icache();
 }
 
 void __init arch_init_memory(void)
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 34d57760d7..29f2e2fad3 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1392,7 +1392,7 @@ int p2m_cache_flush(struct domain *d, gfn_t start, 
unsigned long nr)
 /* XXX: Implement preemption */
 while ( gfn_x(start) < gfn_x(next_gfn) )
 {
-flush_page_to_ram(mfn_x(mfn));
+flush_page_to_ram(mfn_x(mfn), true);
 
 start = gfn_add(start, 1);
 mfn = mfn_add(mfn, 1);
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9e41fb4cd3..eba78f1a3d 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -833,7 +833,7 @@ static struct page_info *alloc_heap_pages(
 /* Ensure cache and RAM are consistent for platforms where the
  * guest can control its own visibility of/through the cache.
  */
-flush_page_to_ram(page_to_mfn(&pg[i]));
+flush_page_to_ram(page_to_mfn(&pg[i]), true);
 }
 
 spin_unlock(&heap_lock);
diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index 4b46e8831c..497b4c86ad 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -407,7 +407,7 @@ static inline void flush_xen_data_tlb_range_va(unsigned 
long va,
 }
 
 /* Flush the dcache for an entire page. */
-void flush_page_to_ram(unsigned long mfn);
+void flush_page_to_ram(unsigned long mfn, bool sync_icache);
 
 /*
  * Print a walk of a page table or p2m
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 8b7adef7c5..bd2be7e482 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -118,7 +118,7 @@ void flush_area_mask(const cpumask_t *, const void *va, 
unsigned int flags);
 #define flush_tlb_one_all(v)\
 flush_tlb_one_mask(&cpu_online_map, v)
 
-static inline void flush_page_to_ram(unsigned long mfn) {}
+static inline void flush_page_to_ram(unsigned long mfn, bool sync_icache) {}
 static inline int invalidate_dcache_va_range(const void *p,
  unsigned long size)
 { return -EOPNOTSUPP; }
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] include: fix build without C++ compiler installed

2017-05-15 Thread Julien Grall

Hi Jan,

On 12/05/17 07:52, Jan Beulich wrote:

The rule for headers++.chk wants to move headers++.chk.new to the
designated target, which means we have to create that file in the first
place.

Signed-off-by: Jan Beulich 


Release-acked-by: Julien Grall 

Cheers,



--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -120,7 +120,10 @@ headers99.chk: $(PUBLIC_C99_HEADERS) Mak

 headers++.chk: $(PUBLIC_HEADERS) Makefile
rm -f $@.new
-   $(CXX) -v >/dev/null 2>&1 || exit 0;  \
+   if ! $(CXX) -v >/dev/null 2>&1; then  \
+   touch $@.new; \
+   exit 0;   \
+   fi;   \
$(foreach i, $(filter %.h,$^),\
echo "#include "\"$(i)\"  \
| $(CXX) -x c++ -std=gnu++98 -Wall -Werror -D__XEN_TOOLS__\





--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] tools: don't require unavailable optional libraries in pkg-config files

2017-05-15 Thread Julien Grall

Hi Juergen,

On 12/05/17 14:10, Juergen Gross wrote:

blktap2 is optional, so there should be no pkg-config file requiring
xenblktapctl if it isn't enabled for the build.

Add a filter mechanism to tools/Rules.mk to filter out optional
libraries.

Signed-off-by: Juergen Gross 


Release-acked-by: Julien Grall 

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.9] ioemu-stubdom: don't link *-softmmu* and *-linux-user*

2017-05-15 Thread Julien Grall

Hi,

On 12/05/17 17:23, Wei Liu wrote:

On Fri, May 12, 2017 at 04:21:06PM +0100, Wei Liu wrote:

They are generated by ./configure. Having them linked can cause a race
between the tools build and the stubdom build.

Signed-off-by: Wei Liu 


FTR Juergen told me on IRC:

Reviewed-by: Juergen Gross 


Release-acked-by: Julien Grall 

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [ARM] Native application design and discussion (I hope)

2017-05-15 Thread Andrii Anisov


On 12.05.17 21:47, Volodymyr Babchuk wrote:

vcoproc driver should be able to work with real HW; it will probably
handle real IRQs from the device, so we need one instance of the driver per
device, not per domain. Andrii can correct me, but the vcoproc framework
is not tied to vcpus, so it can work in the context of any vcpu. Thus, it
will be accounted to whichever vcpu happened to be executing at that
moment. Probably, this is not fair.
I guess the MMIO access emulation should be accounted to the domain's vcpu, 
and context switching to the idle vcpu (Xen itself).

Should it be two different "native apps"?


Can we run the vcoproc driver in a stubdomain? Probably yes, if we can
guarantee latency (as in a real-time system). Just one example: 60 FPS
displays are standard these days. 1/60 s gives us 16ms to deliver a
frame to the display. 16ms should be enough to render the next frame,
compose it, and deliver it to the display controller. Actually it is plenty of
time (most of the time). Now imagine that we want to share one GPU
between two domains. The actual render tasks can be very fast, let's say 1
ms for each domain. But to render both of them, we need to switch the GPU
context at least two times (once to render the Guest A task, once
to render the Guest B task). This gives us 8ms between switches. If we
put the vcoproc driver in a stubdomain, we will be at the mercy of the vCPU
scheduler. It is a good scheduler, but I don't know if it suits this
use case. 8ms is an upper bound. If there are three domains
sharing the GPU, the limit will be 6 ms. And, actually, one slice per domain
is not enough, because a domain may want to render its own portion
later. So, 1 ms would be a more realistic requirement. I mean that a
stubdom with the coproc driver should be scheduled every 1ms no matter
what.
With native apps (or some light stubdomain), which will be scheduled
right when they are needed, this is a much easier task.

At least, this is my vision of vcoproc driver problem. Andrii can
correct me, if I'm terribly wrong.

All above is correct enough.
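
For reference, the timing argument above, spelled out (illustrative
arithmetic only, no framework code implied):

#define FRAME_PERIOD_MS  16   /* 60 FPS -> ~16ms per displayed frame */
#define NUM_DOMAINS       2   /* domains sharing the single GPU      */

/*
 * One render slot per domain per frame means at least NUM_DOMAINS GPU
 * context switches per frame, i.e. switches no more than
 * FRAME_PERIOD_MS / NUM_DOMAINS = 8ms apart -- and closer to the ~1ms
 * render cost if a domain submits its work late in its slot.
 */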

--

*Andrii Anisov*



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/efi: Reserve EFI properties table

2017-05-15 Thread Julien Grall

Hi Andrew,

On 08/05/17 17:29, Andrew Cooper wrote:

On 08/05/17 17:17, Ross Lagerwall wrote:

Some EFI firmware implementations may place the EFI properties table in
RAM marked as BootServicesData, which Xen does not consider as reserved.
When dom0 tries to access the EFI properties table (which Linux >= 4.4
does), it crashes with a page fault.


The pagefault is just a side effect of Linux blindly assuming that the
ioremap() request succeeded.

From Xen's point of view, Dom0 tries to map a page which doesn't belong
to dom_xen, resulting in a permission failure.


  Fix this by unconditionally
marking the EFI properties table as reserved in the E820, much like is
done with the dmi regions.

Signed-off-by: Ross Lagerwall 


Reviewed-by: Andrew Cooper 

This is probably also 4.9 material.


It looks like Jan had some comments on this patch. I haven't seen any 
reply from Ross, so I will wait before looking at it from a release perspective.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V5] x86/ioreq_server: Make p2m_finish_type_change actually work

2017-05-15 Thread George Dunlap
On Sat, May 13, 2017 at 1:34 AM, Xiong Zhang  wrote:
> Commit 6d774a951696 ("x86/ioreq server: synchronously reset outstanding
> p2m_ioreq_server entries when an ioreq server unmaps") introduced
> p2m_finish_type_change(), which was meant to synchronously finish a
> previously initiated type change over a gpfn range.  It did this by
> calling get_entry(), checking if it was the appropriate type, and then
> calling set_entry().
>
> Unfortunately, a previous commit (1679e0df3df6 "x86/ioreq server:
> asynchronously reset outstanding p2m_ioreq_server entries") modified
> get_entry() to always return the new type after the type change, meaning
> that p2m_finish_type_change() never changed any entries.  Which means
> when an ioreq server was detached and then re-attached (as happens in
> XenGT on reboot) the re-attach failed.
>
> Fix this by using the existing p2m-specific recalculation logic instead
> of doing a read-check-write loop.
>
> Fix: 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
>   outstanding p2m_ioreq_server entries when an ioreq server unmaps")'
>
> Signed-off-by: Xiong Zhang 
> Signed-off-by: Yu Zhang 
> Reviewed-by: George Dunlap 
> Reviewed-by: Jan Beulich 
> ---
> v1: Add ioreq_pre_recalc query flag to get the old p2m_type.(Jan)
> v2: Add p2m->recalc() hook to change gfn p2m_type. (George)
> v3: Make commit message clearer. (George)
> Keep the name of p2m-specific recal function unchanged. (Jan)
> v4: Move version info below S-o-B and handle return value of
> p2m->recalc. (Jan)
> v5: Fix coding style. (Julien)
>
> The target of this patch is Xen 4.9.
> ---
>  xen/arch/x86/hvm/dm.c |  5 +++--
>  xen/arch/x86/mm/p2m-ept.c |  1 +
>  xen/arch/x86/mm/p2m-pt.c  |  1 +
>  xen/arch/x86/mm/p2m.c | 35 +++
>  xen/include/asm-x86/p2m.h |  9 +
>  5 files changed, 33 insertions(+), 18 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
> index d72b7bd..99bf66a 100644
> --- a/xen/arch/x86/hvm/dm.c
> +++ b/xen/arch/x86/hvm/dm.c
> @@ -412,8 +412,9 @@ static int dm_op(domid_t domid,
>  first_gfn <= p2m->max_mapped_pfn )
>  {
>  /* Iterate p2m table for 256 gfns each time. */
> -p2m_finish_type_change(d, _gfn(first_gfn), 256,
> -   p2m_ioreq_server, p2m_ram_rw);
> +rc = p2m_finish_type_change(d, _gfn(first_gfn), 256);
> +if ( rc < 0 )
> +break;
>
>  first_gfn += 256;
>
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index f37a1f2..09efba7 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -1238,6 +1238,7 @@ int ept_p2m_init(struct p2m_domain *p2m)
>
>  p2m->set_entry = ept_set_entry;
>  p2m->get_entry = ept_get_entry;
> +p2m->recalc = resolve_misconfig;
>  p2m->change_entry_type_global = ept_change_entry_type_global;
>  p2m->change_entry_type_range = ept_change_entry_type_range;
>  p2m->memory_type_changed = ept_memory_type_changed;
> diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
> index 5079b59..2eddeee 100644
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -1153,6 +1153,7 @@ void p2m_pt_init(struct p2m_domain *p2m)
>  {
>  p2m->set_entry = p2m_pt_set_entry;
>  p2m->get_entry = p2m_pt_get_entry;
> +p2m->recalc = do_recalc;
>  p2m->change_entry_type_global = p2m_pt_change_entry_type_global;
>  p2m->change_entry_type_range = p2m_pt_change_entry_type_range;
>  p2m->write_p2m_entry = paging_write_p2m_entry;
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 1d57e5c..1600422 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1011,33 +1011,44 @@ void p2m_change_type_range(struct domain *d,
>  p2m_unlock(p2m);
>  }
>
> -/* Synchronously modify the p2m type for a range of gfns from ot to nt. */
> -void p2m_finish_type_change(struct domain *d,
> -gfn_t first_gfn, unsigned long max_nr,
> -p2m_type_t ot, p2m_type_t nt)
> +/*
> + * Finish p2m type change for gfns which are marked as need_recalc in a 
> range.
> + * Returns: 0/1 for success, negative for failure
> + */
> +int p2m_finish_type_change(struct domain *d,
> +   gfn_t first_gfn, unsigned long max_nr)
>  {
>  struct p2m_domain *p2m = p2m_get_hostp2m(d);
> -p2m_type_t t;
>  unsigned long gfn = gfn_x(first_gfn);
>  unsigned long last_gfn = gfn + max_nr - 1;
> -
> -ASSERT(ot != nt);
> -ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt));
> +int rc = 0;
>
>  p2m_lock(p2m);
>
>  last_gfn = min(last_gfn, p2m->max_mapped_pfn);
>  while ( gfn <= last_gfn )
>  {
> -

Re: [Xen-devel] [PATCH v8 0/3] arm64, xen: add xen_boot support into grub-mkconfig

2017-05-15 Thread Daniel Kiper
Hi Julien,

On Mon, May 15, 2017 at 02:43:28PM +0100, Julien Grall wrote:
> Hi Daniel,
>
> On 15/05/17 14:38, Daniel Kiper wrote:
> >On Sun, May 14, 2017 at 03:43:44PM +0800, fu@linaro.org wrote:
> >>From: Fu Wei 
> >>
> >>This patchset add xen_boot support into grub-mkconfig for
> >>generating xen boot entrances automatically
> >>
> >>Also update the docs/grub.texi for new xen_boot commands.
> >
> >LGTM, if there are no objections I will commit it at the end
> >of this week or the beginning of next one.
>
> Thank you!
>
> Can you also please commit patch [1] which has been sitting on the grub
> ML for more than a year? This is preventing Xen ARM from booting with GRUB.
>
> Cheers,
>
> [1] https://lists.gnu.org/archive/html/grub-devel/2016-02/msg00205.html

Will do with this patch series.

Daniel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V4] x86/ioreq_server: Make p2m_finish_type_change actually work

2017-05-15 Thread George Dunlap
On Fri, May 12, 2017 at 3:42 AM, Xiong Zhang  wrote:
> Commit 6d774a951696 ("x86/ioreq server: synchronously reset outstanding
> p2m_ioreq_server entries when an ioreq server unmaps") introduced
> p2m_finish_type_change(), which was meant to synchronously finish a
> previously initiated type change over a gpfn range.  It did this by
> calling get_entry(), checking if it was the appropriate type, and then
> calling set_entry().
>
> Unfortunately, a previous commit (1679e0df3df6 "x86/ioreq server:
> asynchronously reset outstanding p2m_ioreq_server entries") modified
> get_entry() to always return the new type after the type change, meaning
> that p2m_finish_type_change() never changed any entries.  Which means
> when an ioreq server was detached and then re-attached (as happens in
> XenGT on reboot) the re-attach failed.
>
> Fix this by using the existing p2m-specific recalculation logic instead
> of doing a read-check-write loop.
>
> Fix: 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
>   outstanding p2m_ioreq_server entries when an ioreq server unmaps")'
>
> Signed-off-by: Xiong Zhang 
> Signed-off-by: Yu Zhang 
> Reviewed-by: George Dunlap 
> ---
> v1: Add ioreq_pre_recalc query flag to get the old p2m_type.(Jan)
> v2: Add p2m->recalc() hook to change gfn p2m_type. (George)
> v3: Make commit message clearer. (George)
> Keep the name of p2m-specific recal function unchanged. (Jan)
> v4: Move version info below S-o-B and handle return value of
> p2m->recalc. (Jan)

Sorry to be picky, but the handling of the return value introduces
another way the patch could be incorrect, so you should have dropped
my R-b.

I'll respond to v5.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


  1   2   >