Re: [Xen-devel] [PATCH] x86/MCE, xen/mcelog: Make /dev/mcelog registration messages more precise

2017-06-20 Thread Ethan Zhao
Borislav,

How about skipping the misc_register() step and keeping silent if
you know it is running as Dom0? That is easy to detect.


Reviewed-by: Ethan Zhao 

On Wed, Jun 21, 2017 at 5:16 AM, Borislav Petkov  wrote:
> From: Juergen Gross 
>
> When running under Xen as dom0, /dev/mcelog is being provided by Xen
> instead of the normal mcelog character device of the MCE core. Convert
> an error message being issued by the MCE core in this case to an
> informative message that Xen has registered the device.
>
> Signed-off-by: Juergen Gross 
> Cc: Tony Luck 
> Cc: linux-edac 
> Cc: x86-ml 
> Cc: xen-de...@lists.xenproject.org
> Link: http://lkml.kernel.org/r/20170614084059.19294-1-jgr...@suse.com
> [ Massage a bit. ]
> Signed-off-by: Borislav Petkov 
> ---
>  arch/x86/kernel/cpu/mcheck/dev-mcelog.c | 8 +++-
>  drivers/xen/mcelog.c| 2 ++
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/dev-mcelog.c b/arch/x86/kernel/cpu/mcheck/dev-mcelog.c
> index a80427c30c93..10cec43aac38 100644
> --- a/arch/x86/kernel/cpu/mcheck/dev-mcelog.c
> +++ b/arch/x86/kernel/cpu/mcheck/dev-mcelog.c
> @@ -415,9 +415,15 @@ static __init int dev_mcelog_init_device(void)
> /* register character device /dev/mcelog */
> err = misc_register(_chrdev_device);
> if (err) {
> -   pr_err("Unable to init device /dev/mcelog (rc: %d)\n", err);
> +   if (err == -EBUSY)
> +   /* Xen dom0 might have registered the device already. */
> +   pr_info("Unable to init device /dev/mcelog, already registered");
> +   else
> +   pr_err("Unable to init device /dev/mcelog (rc: %d)\n", err);
> +
> return err;
> }
> +
> mce_register_decode_chain(_mcelog_nb);
> return 0;
>  }
> diff --git a/drivers/xen/mcelog.c b/drivers/xen/mcelog.c
> index a493c7315e94..6cc1c15bcd84 100644
> --- a/drivers/xen/mcelog.c
> +++ b/drivers/xen/mcelog.c
> @@ -408,6 +408,8 @@ static int __init xen_late_init_mcelog(void)
> if (ret)
> goto deregister;
>
> +   pr_info("/dev/mcelog registered by Xen\n");
> +
> return 0;
>
>  deregister:
> --
> 2.13.0
>

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [xen-unstable-smoke test] 110907: tolerable trouble: broken/pass - PUSHED

2017-06-20 Thread osstest service owner
flight 110907 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/110907/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  d8f1b48fd665d7aad1711de2f073540d07d2d041
baseline version:
 xen  461b2165346de236fff2d00d1c318062f1daab08

Last test of basis   110573  2017-06-19 15:01:32 Z1 days
Testing same since   110907  2017-06-21 00:20:54 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Andrew Morton 
  Artem Bityutskiy 
  David Woodhouse 
  George Dunlap 
  Ian Jackson 
  Jan Beulich 
  Julien Grall 
  Konrad Rzeszutek Wilk 
  Linus Torvalds 
  Peter Zijlstra 
  Petre Pircalabu 
  Praveen Kumar 
  Roger Pau Monné 
  Ross Lagerwall 
  Tamas K Lengyel 
  Tim Deegan 
  Wei Liu 
  Wolfram Strepp 

jobs:
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  broken  
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=xen-unstable-smoke
+ revision=d8f1b48fd665d7aad1711de2f073540d07d2d041
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push xen-unstable-smoke d8f1b48fd665d7aad1711de2f073540d07d2d041
+ branch=xen-unstable-smoke
+ revision=d8f1b48fd665d7aad1711de2f073540d07d2d041
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=xen
+ xenbranch=xen-unstable-smoke
+ qemuubranch=qemu-upstream-unstable
+ '[' xxen = xlinux ']'
+ linuxbranch=
+ '[' xqemu-upstream-unstable = x ']'
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable-smoke
+ prevxenbranch=xen-4.9-testing
+ '[' xd8f1b48fd665d7aad1711de2f073540d07d2d041 = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"OsstestUpstream"} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : 

[Xen-devel] [PATCH] ARM: SMMUv2: Add compatible match entry for cavium smmuv2

2017-06-20 Thread Manish Jaggi
This patch adds a "cavium,smmu-v2" compatible match entry to the SMMU driver.

Signed-off-by: Manish Jaggi 
---
 xen/drivers/passthrough/arm/smmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 1082fcf..887f874 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2272,6 +2272,7 @@ static const struct of_device_id arm_smmu_of_match[] = {
{ .compatible = "arm,mmu-400", .data = (void *)ARM_SMMU_V1 },
{ .compatible = "arm,mmu-401", .data = (void *)ARM_SMMU_V1 },
{ .compatible = "arm,mmu-500", .data = (void *)ARM_SMMU_V2 },
+   { .compatible = "cavium,smmu-v2", .data = (void *)ARM_SMMU_V2 },
{ },
 };
 MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
-- 
2.7.4




[Xen-devel] [PATCH 3/4] ARM: ITS: Deny hardware domain access to its

2017-06-20 Thread Manish Jaggi
This patch extends the gicv3_iomem_deny_access functionality by adding support
for the ITS regions as well, via a new function gicv3_its_deny_access.

Signed-off-by: Manish Jaggi 
---
 xen/arch/arm/gic-v3-its.c| 19 +++
 xen/arch/arm/gic-v3.c|  7 +++
 xen/include/asm-arm/gic_v3_its.h |  8 
 3 files changed, 34 insertions(+)

diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
index e11f29a..98c8f46 100644
--- a/xen/arch/arm/gic-v3-its.c
+++ b/xen/arch/arm/gic-v3-its.c
@@ -20,6 +20,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -905,6 +906,24 @@ struct pending_irq *gicv3_assign_guest_event(struct domain *d,
 return pirq;
 }
 
+int gicv3_its_deny_access(const struct domain *d)
+{
+int rc = 0;
+unsigned long mfn, nr;
+const struct host_its *its_data;
+
+list_for_each_entry(its_data, _its_list, entry)
+{
+mfn = paddr_to_pfn(its_data->addr);
+nr = PFN_UP(ACPI_GICV3_ITS_MEM_SIZE);
+rc = iomem_deny_access(d, mfn, mfn + nr);
+if ( rc )
+break;
+}
+
+return rc;
+}
+
 /*
  * Create the respective guest DT nodes from a list of host ITSes.
  * This copies the reg property, so the guest sees the ITS at the same address
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index 558b32c..f6fbf2f 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -1308,6 +1308,13 @@ static int gicv3_iomem_deny_access(const struct domain *d)
 if ( rc )
 return rc;
 
+if ( gicv3_its_host_has_its() )
+{
+rc = gicv3_its_deny_access(d);
+if ( rc )
+return rc;
+}
+
 for ( i = 0; i < gicv3.rdist_count; i++ )
 {
 mfn = gicv3.rdist_regions[i].base >> PAGE_SHIFT;
diff --git a/xen/include/asm-arm/gic_v3_its.h b/xen/include/asm-arm/gic_v3_its.h
index bcfa181..84dbb9c 100644
--- a/xen/include/asm-arm/gic_v3_its.h
+++ b/xen/include/asm-arm/gic_v3_its.h
@@ -143,6 +143,9 @@ int gicv3_its_acpi_init(struct acpi_subtable_header *header,
 const unsigned long end);
 #endif
 
+/* Deny iomem access for its */
+int gicv3_its_deny_access(const struct domain *d);
+
 bool gicv3_its_host_has_its(void);
 
 unsigned int vgic_v3_its_count(const struct domain *d);
@@ -212,6 +215,11 @@ static inline int gicv3_its_acpi_init(struct acpi_subtable_header *header,
 }
 #endif
 
+static inline int gicv3_its_deny_access(const struct domain *d)
+{
+return 0;
+}
+
 static inline bool gicv3_its_host_has_its(void)
 {
 return false;
-- 
2.7.4




[Xen-devel] [PATCH 4/4] ARM: ACPI: Add ITS to hardware domain MADT

2017-06-20 Thread Manish Jaggi
This patch adds ITS information in the hardware domain's MADT table.
It also introduces .get_hwdom_madt_size in gic_hw_operations,
to return the complete size of the MADT table for the hardware domain.

Signed-off-by: Manish Jaggi 
---
 xen/arch/arm/domain_build.c  |  7 +--
 xen/arch/arm/gic-v2.c|  6 ++
 xen/arch/arm/gic-v3-its.c| 34 ++
 xen/arch/arm/gic-v3.c| 18 ++
 xen/arch/arm/gic.c   | 11 +++
 xen/include/asm-arm/gic.h|  3 +++
 xen/include/asm-arm/gic_v3_its.h | 12 
 7 files changed, 85 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 3abacc0..15c7f9b 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1802,12 +1802,7 @@ static int estimate_acpi_efi_size(struct domain *d, struct kernel_info *kinfo)
 acpi_size = ROUNDUP(sizeof(struct acpi_table_fadt), 8);
 acpi_size += ROUNDUP(sizeof(struct acpi_table_stao), 8);
 
-madt_size = sizeof(struct acpi_table_madt)
-+ sizeof(struct acpi_madt_generic_interrupt) * d->max_vcpus
-+ sizeof(struct acpi_madt_generic_distributor);
-if ( d->arch.vgic.version == GIC_V3 )
-madt_size += sizeof(struct acpi_madt_generic_redistributor)
- * d->arch.vgic.nr_regions;
+madt_size = gic_get_hwdom_madt_size(d);
 acpi_size += ROUNDUP(madt_size, 8);
 
 addr = acpi_os_get_root_pointer();
diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index ffbe47c..e92dc3d 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -1012,6 +1012,11 @@ static int gicv2_iomem_deny_access(const struct domain *d)
 return iomem_deny_access(d, mfn, mfn + nr);
 }
 
+static u32 gicv2_get_hwdom_madt_size(const struct domain *d)
+{
+return 0;
+}
+
 #ifdef CONFIG_ACPI
 static int gicv2_make_hwdom_madt(const struct domain *d, u32 offset)
 {
@@ -1248,6 +1253,7 @@ const static struct gic_hw_operations gicv2_ops = {
 .read_apr= gicv2_read_apr,
 .make_hwdom_dt_node  = gicv2_make_hwdom_dt_node,
 .make_hwdom_madt = gicv2_make_hwdom_madt,
+.get_hwdom_madt_size = gicv2_get_hwdom_madt_size,
 .map_hwdom_extra_mappings = gicv2_map_hwdown_extra_mappings,
 .iomem_deny_access   = gicv2_iomem_deny_access,
 .do_LPI  = gicv2_do_LPI,
diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
index 98c8f46..7f8ff34 100644
--- a/xen/arch/arm/gic-v3-its.c
+++ b/xen/arch/arm/gic-v3-its.c
@@ -924,6 +924,40 @@ int gicv3_its_deny_access(const struct domain *d)
 return rc;
 }
 
+#ifdef CONFIG_ACPI
+u32 gicv3_its_madt_generic_translator_size(void)
+{
+const struct host_its *its_data;
+u32 size = 0;
+
+list_for_each_entry(its_data, _its_list, entry)
+size += sizeof(struct acpi_madt_generic_translator);
+
+return size;
+}
+
+u32 gicv3_its_make_hwdom_madt(u8 *base_ptr, u32 offset)
+{
+struct acpi_madt_generic_translator *gic_its;
+const struct host_its *its_data;
+u32 table_len = offset, size;
+
+/* Update GIC ITS information in hardware domain's MADT */
+list_for_each_entry(its_data, _its_list, entry)
+{
+size = sizeof(struct acpi_madt_generic_translator);
+gic_its = (struct acpi_madt_generic_translator *)(base_ptr
+   + table_len);
+gic_its->header.type = ACPI_MADT_TYPE_GENERIC_TRANSLATOR;
+gic_its->header.length = size;
+gic_its->base_address = its_data->addr;
+gic_its->translation_id = its_data->translation_id;
+table_len +=  size;
+}
+
+return table_len;
+}
+#endif
 /*
  * Create the respective guest DT nodes from a list of host ITSes.
  * This copies the reg property, so the guest sees the ITS at the same address
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index f6fbf2f..c7a8c1c 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -1407,9 +1407,21 @@ static int gicv3_make_hwdom_madt(const struct domain *d, u32 offset)
 table_len += size;
 }
 
+table_len = gicv3_its_make_hwdom_madt(base_ptr, table_len);
 return table_len;
 }
 
+static u32 gicv3_get_hwdom_madt_size(const struct domain *d)
+{
+u32 size;
+size  = sizeof(struct acpi_madt_generic_redistributor)
+ * d->arch.vgic.nr_regions;
+if ( gicv3_its_host_has_its() )
+size  += gicv3_its_madt_generic_translator_size();
+
+return size;
+}
+
 static int __init
 gic_acpi_parse_madt_cpu(struct acpi_subtable_header *header,
 const unsigned long end)
@@ -1605,6 +1617,11 @@ static int gicv3_make_hwdom_madt(const struct domain *d, u32 offset)
 {
 return 0;
 }
+
+static u32 gicv3_get_hwdom_madt_size(const struct domain *d)
+{
+return 0;
+}
 #endif
 
 /* Set up the GIC */
@@ -1706,6 +1723,7 @@ static const struct gic_hw_operations 

[Xen-devel] [PATCH 0/4] ARM: ACPI: ITS: Add ITS Support for ACPI hardware domain

2017-06-20 Thread Manish Jaggi
This patch series adds support of ITS for an ACPI hardware domain.
It is tested on the staging branch, which has the ITS v12 patchset by Andre.

I have tried to incorporate the review comments on the RFC v1/v2 patch.
The single patch in the RFC is now split into 4 patches.

Patch1: ARM: ITS: Add translation_id to host_its
 Adds translation_id to the host_its data structure, populated from the
 translation_id read from the firmware MADT. This value is then programmed
 into the local MADT created for the hardware domain in patch 4.

Patch2: ARM: ITS: ACPI: Introduce gicv3_its_acpi_init
 Introduces gicv3_its_acpi_init, which calls add_to_host_its_list,
 a common function also called from the _dt variant.

Patch3: ARM: ITS: Deny hardware domain access to its
 Extends gicv3_iomem_deny_access to include the ITS regions as well.

Patch4: ARM: ACPI: Add ITS to hardware domain MADT
 This patch adds ITS information in the hardware domain's MADT table.
 It also introduces .get_hwdom_madt_size in gic_hw_operations,
 to return the complete size of the MADT table for the hardware domain.


Manish Jaggi (4):
  ARM: ITS: Add translation_id to host_its
  ARM: ITS: ACPI: Introduce gicv3_its_acpi_init
  ARM: ITS: Deny hardware domain access to its
  ARM: ACPI: Add ITS to hardware domain MADT

 xen/arch/arm/domain_build.c  |   7 +--
 xen/arch/arm/gic-v2.c|   6 +++
 xen/arch/arm/gic-v3-its.c| 102 +++
 xen/arch/arm/gic-v3.c|  31 
 xen/arch/arm/gic.c   |  11 +
 xen/include/asm-arm/gic.h|   3 ++
 xen/include/asm-arm/gic_v3_its.h |  36 ++
 7 files changed, 180 insertions(+), 16 deletions(-)

-- 
2.7.4




[Xen-devel] [PATCH 2/4] ARM: ITS: ACPI: Introduce gicv3_its_acpi_init

2017-06-20 Thread Manish Jaggi
This patch adds gicv3_its_acpi_init. To avoid duplicating code for
initializing and adding to host_its_list, a common function
add_to_host_its_list is added, which is called by both _dt_init and _acpi_init.

Signed-off-by: Manish Jaggi 
---
 xen/arch/arm/gic-v3-its.c| 49 
 xen/arch/arm/gic-v3.c|  6 +
 xen/include/asm-arm/gic_v3_its.h | 14 
 3 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
index 2d36030..e11f29a 100644
--- a/xen/arch/arm/gic-v3-its.c
+++ b/xen/arch/arm/gic-v3-its.c
@@ -33,6 +33,7 @@
 
 #define ITS_CMD_QUEUE_SZSZ_1M
 
+#define ACPI_GICV3_ITS_MEM_SIZE (SZ_64K)
 /*
  * No lock here, as this list gets only populated upon boot while scanning
  * firmware tables for all host ITSes, and only gets iterated afterwards.
@@ -976,11 +977,35 @@ int gicv3_its_make_hwdom_dt_nodes(const struct domain *d,
 return res;
 }
 
+/* Common function for adding to host_its_list. */
+static int add_to_host_its_list(u64 addr, u64 size,
+  u32 translation_id, const void *node)
+{
+struct host_its *its_data;
+its_data = xzalloc(struct host_its);
+
+if ( !its_data )
+return -1;
+
+if ( node )
+its_data->dt_node = node;
+
+its_data->addr = addr;
+its_data->size = size;
+its_data->translation_id = translation_id;
+printk("GICv3: Found ITS @0x%lx\n", addr);
+
+list_add_tail(_data->entry, _its_list);
+
+return 0;
+}
+
 /* Scan the DT for any ITS nodes and create a list of host ITSes out of it. */
 void gicv3_its_dt_init(const struct dt_device_node *node)
 {
 const struct dt_device_node *its = NULL;
-struct host_its *its_data;
+static int its_id = 1;
 
 /*
  * Check for ITS MSI subnodes. If any, add the ITS register
@@ -996,19 +1021,23 @@ void gicv3_its_dt_init(const struct dt_device_node *node)
 if ( dt_device_get_address(its, 0, , ) )
 panic("GICv3: Cannot find a valid ITS frame address");
 
-its_data = xzalloc(struct host_its);
-if ( !its_data )
-panic("GICv3: Cannot allocate memory for ITS frame");
+if ( add_to_host_its_list(addr, size, its_id++, its) )
+panic("GICV3: Adding Host ITS failed ");
+}
+}
 
-its_data->addr = addr;
-its_data->size = size;
-its_data->dt_node = its;
+#ifdef CONFIG_ACPI
+int gicv3_its_acpi_init(struct acpi_subtable_header *header, const unsigned long end)
+{
+struct acpi_madt_generic_translator *its_entry;
 
-printk("GICv3: Found ITS @0x%lx\n", addr);
+its_entry = (struct acpi_madt_generic_translator *)header;
 
-list_add_tail(_data->entry, _its_list);
-}
+return add_to_host_its_list(its_entry->base_address,
+ACPI_GICV3_ITS_MEM_SIZE,
+its_entry->translation_id, NULL);
 }
+#endif
 
 /*
  * Local variables:
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index c927306..558b32c 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -1567,6 +1567,12 @@ static void __init gicv3_acpi_init(void)
 
 gicv3.rdist_stride = 0;
 
+count = acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_TRANSLATOR,
+  gicv3_its_acpi_init, 0);
+
+if ( count <= 0 )
+panic("GICv3: Can't get ITS entry");
+
 /*
  * In ACPI, 0 is considered as the invalid address. However the rest
  * of the initialization rely on the invalid address to be
diff --git a/xen/include/asm-arm/gic_v3_its.h b/xen/include/asm-arm/gic_v3_its.h
index 96b910b..bcfa181 100644
--- a/xen/include/asm-arm/gic_v3_its.h
+++ b/xen/include/asm-arm/gic_v3_its.h
@@ -105,6 +105,7 @@
 
 #include 
 #include 
+#include 
 
 #define HOST_ITS_FLUSH_CMD_QUEUE(1U << 0)
 #define HOST_ITS_USES_PTA   (1U << 1)
@@ -137,6 +138,11 @@ extern struct list_head host_its_list;
 /* Parse the host DT and pick up all host ITSes. */
 void gicv3_its_dt_init(const struct dt_device_node *node);
 
+#ifdef CONFIG_ACPI
+int gicv3_its_acpi_init(struct acpi_subtable_header *header,
+const unsigned long end);
+#endif
+
 bool gicv3_its_host_has_its(void);
 
 unsigned int vgic_v3_its_count(const struct domain *d);
@@ -198,6 +204,14 @@ static inline void gicv3_its_dt_init(const struct dt_device_node *node)
 {
 }
 
+#ifdef CONFIG_ACPI
+static inline int gicv3_its_acpi_init(struct acpi_subtable_header *header,
+const unsigned long end)
+{
+return false;
+}
+#endif
+
 static inline bool gicv3_its_host_has_its(void)
 {
 return false;
-- 
2.7.4




[Xen-devel] [PATCH 1/4] ARM: ITS: Add translation_id to host_its

2017-06-20 Thread Manish Jaggi
This patch adds a translation_id to the host_its data structure.
The value stored in this id should be copied over to the hardware
domain's MADT table.

Signed-off-by: Manish Jaggi 
---
 xen/include/asm-arm/gic_v3_its.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/asm-arm/gic_v3_its.h b/xen/include/asm-arm/gic_v3_its.h
index 1fac1c7..96b910b 100644
--- a/xen/include/asm-arm/gic_v3_its.h
+++ b/xen/include/asm-arm/gic_v3_its.h
@@ -118,6 +118,8 @@ struct host_its {
 const struct dt_device_node *dt_node;
 paddr_t addr;
 paddr_t size;
+/* A unique value to identify each ITS */
+u32 translation_id;
 void __iomem *its_base;
 unsigned int devid_bits;
 unsigned int evid_bits;
-- 
2.7.4




[Xen-devel] [PATCH v2] docs: improve ARM passthrough doc

2017-06-20 Thread Stefano Stabellini
Add a warning: use passthrough with care.

Add a pointer to the gic device tree bindings. Add an explanation on how
to calculate irq numbers from device tree.

Add a brief explanation of the reg property and a pointer to the xl docs
for a description of the iomem property. Add a note that in the example
we are using different memory addresses for guests and host.

Signed-off-by: Stefano Stabellini 

---
Changes in v2:
- fix typo
- add more info on the interrupts property and different interrupt
  controllers

diff --git a/docs/misc/arm/passthrough.txt b/docs/misc/arm/passthrough.txt
index 082e9ab..323257d 100644
--- a/docs/misc/arm/passthrough.txt
+++ b/docs/misc/arm/passthrough.txt
@@ -12,7 +12,11 @@ property "xen,passthrough". The command to do it in U-Boot 
is:
 2) Create a partial device tree describing the device. The IRQ are mapped
 1:1 to the guest (i.e VIRQ == IRQ). For MMIO, you will have to find a hole
 in the guest memory layout (see xen/include/public/arch-arm.h, note that
-the layout is not stable and can change between versions of Xen).
+the layout is not stable and can change between versions of Xen). Please
+be aware that passing a partial device tree to a VM is a powerful tool,
+use it with care. In production, only allow assignment of devices which
+have been previously tested and known to work correctly when given to
+guests. 
 
 /dts-v1/;
 
@@ -48,6 +52,8 @@ Note:
 - #size-cells
 * See http://www.devicetree.org/Device_Tree_Usage for more
 information about device tree.
+* In this example, the device MMIO region is placed at a different
+address (0x1000) compared to the host address (0xfff51000)
 
 3) Compile the partial guest device with dtc (Device Tree Compiler).
 For our purpose, the compiled file will be called guest-midway.dtb and
@@ -60,3 +66,20 @@ dtdev = [ "/soc/ethernet@fff51000" ]
 irqs = [ 112, 113, 114 ]
 iomem = [ "0xfff51,1@0x1" ]
 
+Please refer to your platform docs for the MMIO ranges and interrupts.
+
+They can also be calculated from the original device tree (not
+recommended). You can read about the "interrupts" property format in the
+device tree bindings of the interrupt controller of your platform. For
+example, in the case of GICv2 see [arm,gic.txt]; in the case of GICv3
+see [arm,gic-v3.txt] in the Linux repository. For both GICv2 and GICv3
+the "interrupts" property format is the same: the first cell is the
+interrupt type, and the second cell is the interrupt number.  Given that
+SPI numbers start from 32, in this example 80 + 32 = 112.
+
See man [xl.cfg] for the iomem format. The reg property is just a pair
of address and size numbers; each of them can occupy 1 or 2 cells.
+
[arm,gic.txt]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/interrupt-controller/arm,gic.txt
[arm,gic-v3.txt]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt
+[xl.cfg]: https://xenbits.xen.org/docs/unstable/man/xl.cfg.5.html



Re: [Xen-devel] [PATCH] docs: improve ARM passthrough doc

2017-06-20 Thread Stefano Stabellini
On Tue, 20 Jun 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 06/16/2017 09:29 PM, Stefano Stabellini wrote:
> > Add a warning: use passthrough with care.
> > 
> > Add a pointer to the gic device tree bindings. Add an explanation on how
> > to calculate irq numbers from device tree.
> > 
> > Add a brief explanation of the reg property and a pointer to the xl docs
> > for a description of the iomem property. Add a note that in the example
> > we are using different memory addresses for guests and host.
> > 
> > Signed-off-by: Stefano Stabellini 
> > 
> > diff --git a/docs/misc/arm/passthrough.txt b/docs/misc/arm/passthrough.txt
> > index 082e9ab..7140a61 100644
> > --- a/docs/misc/arm/passthrough.txt
> > +++ b/docs/misc/arm/passthrough.txt
> > @@ -12,7 +12,11 @@ property "xen,passthrough". The command to do it in
> > U-Boot is:
> >   2) Create a partial device tree describing the device. The IRQ are mapped
> >   1:1 to the guest (i.e VIRQ == IRQ). For MMIO, you will have to find a hole
> >   in the guest memory layout (see xen/include/public/arch-arm.h, note that
> > -the layout is not stable and can change between versions of Xen).
> > +the layout is not stable and can change between versions of Xen). Please
> > +be aware that passing a partial device tree to a VM is a powerful tool,
> > +use it with care. In production, only allow assignment of devices which
> > +have been previously tested and known to work correctly when given to
> > +guests.
> > /dts-v1/;
> >   @@ -48,6 +52,8 @@ Note:
> >   - #size-cells
> >   * See http://www.devicetree.org/Device_Tree_Usage for more
> >   information about device tree.
> > +* In this example, the device MMIO region is placed at a different
> > +address (0x1000) compared to the host address (0xfff51000)
> > 3) Compile the partial guest device with dtc (Device Tree Compiler).
> >   For our purpose, the compiled file will be called guest-midway.dtb and
> > @@ -60,3 +66,16 @@ dtdev = [ "/soc/ethernet@fff51000" ]
> >   irqs = [ 112, 113, 114 ]
> >   iomem = [ "0xfff51,1@0x1" ]
> >   +Please refer to your platform docs for the MMIO ranges and interrupts.
> > +
> > +They can also be calculated from the original device tree (not
> > +recommended). See [arm,gic.txt] in the Linux repository for a
> 
> [arm,gic.txt] documentation is only valid for GICv2. GICv3 has a different
> documentation, though the interrupt format is the same at the moment.
> 
> I think this should be clarified and explain that the interrupt format will
> depend on the virtual interrupt controller exposed to the guest.

I'll do, thanks


> > +description of the "interrupts" property format. For the GIC, the first
> > +cell is interrupt type, and the second cell is the interrupt number.
> > +Given that SPI numbers start from 32, in this example 80 + 32 = 112.
> > +
> > +See man [xl.cfg] for the iomem format. The reg property is just a pair
> > +of address, then size nunbers, each of them can occupy 1 or 2 cells.
> 
> s/nunbers/numbers/
> 
> > +
> > +[arm,gic.txt]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/interrupt-controller/arm,gic.txt
> > +[xl.cfg]: https://xenbits.xen.org/docs/unstable/man/xl.cfg.5.html
> > 
> 
> Cheers,
> 
> -- 
> Julien Grall
> 



Re: [Xen-devel] [PATCH 2/3] xen-disk: add support for multi-page shared rings

2017-06-20 Thread Stefano Stabellini
On Tue, 20 Jun 2017, Paul Durrant wrote:
> The blkif protocol has had provision for negotiation of multi-page shared
> rings for some time now and many guest OS have support in their frontend
> drivers.
> 
> This patch makes the necessary modifications for xen-disk to support a shared
> ring up to order 4 (i.e. 16 pages).
> 
> Signed-off-by: Paul Durrant 

Thanks for the patch!

> ---
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> ---
>  hw/block/xen_disk.c | 141 
> 
>  1 file changed, 110 insertions(+), 31 deletions(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 9b06e3aa81..a9942d32db 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -36,8 +36,6 @@
>  
>  static int batch_maps   = 0;
>  
> -static int max_requests = 32;
> -
>  /* - */
>  
>  #define BLOCK_SIZE  512
> @@ -84,6 +82,8 @@ struct ioreq {
>  BlockAcctCookie acct;
>  };
>  
> +#define MAX_RING_PAGE_ORDER 4
> +
>  struct XenBlkDev {
>  struct XenDevicexendev;  /* must be first */
>  char*params;
> @@ -94,7 +94,8 @@ struct XenBlkDev {
>  booldirectiosafe;
>  const char  *fileproto;
>  const char  *filename;
> -int ring_ref;
> +unsigned intring_ref[1 << MAX_RING_PAGE_ORDER];
> +unsigned intnr_ring_ref;
>  void*sring;
>  int64_t file_blk;
>  int64_t file_size;
> @@ -110,6 +111,7 @@ struct XenBlkDev {
>  int requests_total;
>  int requests_inflight;
>  int requests_finished;
> +unsigned intmax_requests;
>  
>  /* Persistent grants extension */
>  gbooleanfeature_discard;
> @@ -199,7 +201,7 @@ static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
>  struct ioreq *ioreq = NULL;
>  
>  if (QLIST_EMPTY(>freelist)) {
> -if (blkdev->requests_total >= max_requests) {
> +if (blkdev->requests_total >= blkdev->max_requests) {
>  goto out;
>  }
>  /* allocate new struct */
> @@ -905,7 +907,7 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
>  ioreq_runio_qemu_aio(ioreq);
>  }
>  
> -if (blkdev->more_work && blkdev->requests_inflight < max_requests) {
> +if (blkdev->more_work && blkdev->requests_inflight < blkdev->max_requests) {
>  qemu_bh_schedule(blkdev->bh);
>  }
>  }
> @@ -918,15 +920,6 @@ static void blk_bh(void *opaque)
>  blk_handle_requests(blkdev);
>  }
>  
> -/*
> - * We need to account for the grant allocations requiring contiguous
> - * chunks; the worst case number would be
> - * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
> - * but in order to keep things simple just use
> - * 2 * max_req * max_seg.
> - */
> -#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
> -
>  static void blk_alloc(struct XenDevice *xendev)
>  {
>  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
> @@ -938,11 +931,6 @@ static void blk_alloc(struct XenDevice *xendev)
>  if (xen_mode != XEN_EMULATE) {
>  batch_maps = 1;
>  }
> -if (xengnttab_set_max_grants(xendev->gnttabdev,
> -MAX_GRANTS(max_requests, BLKIF_MAX_SEGMENTS_PER_REQUEST)) < 0) {
> -xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
> -  strerror(errno));
> -}
>  }
>  
>  static void blk_parse_discard(struct XenBlkDev *blkdev)
> @@ -1037,6 +1025,9 @@ static int blk_init(struct XenDevice *xendev)
>!blkdev->feature_grant_copy);
>  xenstore_write_be_int(>xendev, "info", info);
>  
> +xenstore_write_be_int(>xendev, "max-ring-page-order",
> +  MAX_RING_PAGE_ORDER);
> +
>  blk_parse_discard(blkdev);
>  
>  g_free(directiosafe);
> @@ -1058,12 +1049,25 @@ out_error:
>  return -1;
>  }
>  
> +/*
> + * We need to account for the grant allocations requiring contiguous
> + * chunks; the worst case number would be
> + * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
> + * but in order to keep things simple just use
> + * 2 * max_req * max_seg.
> + */
> +#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
> +
>  static int blk_connect(struct XenDevice *xendev)
>  {
>  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
>  int pers, index, qflags;
>  bool readonly = true;
>  bool writethrough = true;
> +int order, ring_ref;
> +unsigned int ring_size, max_grants;
> +unsigned int i;
> +uint32_t *domids;
>  
>  /* read-only ? */
>  if (blkdev->directiosafe) {
> 

Re: [Xen-devel] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available

2017-06-20 Thread Stefano Stabellini
On Tue, 20 Jun 2017, Paul Durrant wrote:
> If grant copy is available then it will always be used in preference to
> persistent maps. In this case feature-persistent should not be advertized
> to the frontend, otherwise it may needlessly copy data into persistently
> granted buffers.
> 
> Signed-off-by: Paul Durrant 

CC'ing Roger.

It is true that using feature-persistent together with grant copies is a
very bad idea.

But this change establishes an explicit preference of
feature_grant_copy over feature-persistent in the xen_disk backend. It
is not obvious to me that this should be the case.

Why is feature_grant_copy (without feature-persistent) better than
feature-persistent (without feature_grant_copy)? Shouldn't we simply
avoid grant copies to copy data to persistent grants?


> ---
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> ---
>  hw/block/xen_disk.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 3a22805fbc..9b06e3aa81 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -1023,11 +1023,18 @@ static int blk_init(struct XenDevice *xendev)
>  
>  blkdev->file_blk  = BLOCK_SIZE;
>  
> +blkdev->feature_grant_copy =
> +(xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);
> +
> +xen_pv_printf(&blkdev->xendev, 3, "grant copy operation %s\n",
> +  blkdev->feature_grant_copy ? "enabled" : "disabled");
> +
>  /* fill info
>   * blk_connect supplies sector-size and sectors
>   */
>  xenstore_write_be_int(&blkdev->xendev, "feature-flush-cache", 1);
> -xenstore_write_be_int(&blkdev->xendev, "feature-persistent", 1);
> +xenstore_write_be_int(&blkdev->xendev, "feature-persistent",
> +  !blkdev->feature_grant_copy);
>  xenstore_write_be_int(&blkdev->xendev, "info", info);
>  
>  blk_parse_discard(blkdev);
> @@ -1202,12 +1209,6 @@ static int blk_connect(struct XenDevice *xendev)
>  
>  xen_be_bind_evtchn(&blkdev->xendev);
>  
> -blkdev->feature_grant_copy =
> -(xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);
> -
> -xen_pv_printf(&blkdev->xendev, 3, "grant copy operation %s\n",
> -  blkdev->feature_grant_copy ? "enabled" : "disabled");
> -
>  xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, ring-ref %d, "
>"remote port %d, local port %d\n",
>blkdev->xendev.protocol, blkdev->ring_ref,
> -- 
> 2.11.0
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] xen: idle_loop: either deal with tasklets or go idle

2017-06-20 Thread Stefano Stabellini
On Tue, 20 Jun 2017, Dario Faggioli wrote:
> In fact, there are two kinds of tasklets: vCPU and
> softirq context. When we want to do vCPU context tasklet
> work, we force the idle vCPU (of a particular pCPU) into
> execution, and run it from there.
> 
> This means there are two possible reasons for choosing
> to run the idle vCPU:
> 1) we want a pCPU to go idle,
> 2) we want to run some vCPU context tasklet work.
> 
> If we're in case 2), it does not make sense to even
> try to go idle (as the check will _always_ fail).
> 
> This patch rearranges the code of the body of idle
> vCPUs, so that we actually check whether we are in
> case 1) or 2), and act accordingly.
> 
> As a matter of fact, this also means that we do not
> check if there's any tasklet work to do after waking
> up from idle. This is not a problem, because:
> a) for softirq context tasklets, if any is queued
>"during" wakeup from idle, TASKLET_SOFTIRQ is
>raised, and the call to do_softirq() (which is still
>happening *after* the wakeup) will take care of it;
> b) for vCPU context tasklets, if any is queued "during"
>wakeup from idle, SCHEDULE_SOFTIRQ is raised and
>do_softirq() (happening after the wakeup) calls
>the scheduler. The scheduler sees that there is
>tasklet work pending and confirms the idle vCPU
>in execution, which then will get to execute
>do_tasklet().
> 
> Signed-off-by: Dario Faggioli 

Reviewed-by: Stefano Stabellini 


> ---
> Cc: Andrew Cooper 
> Cc: Julien Grall 
> Cc: Stefano Stabellini 
> Cc: Boris Ostrovsky 
> Cc: Jan Beulich 
> ---
> Changes from v1:
> * drop the pointless parentheses and indirection of pm_idle (in x86's idle
>   loop);
> * remove the 'cpu' input parameter to do_tasklet(), as suggested during
>   review;
> * in do_tasklet(), convert the check that there is tasklet work to do into an
>   ASSERT() to the same effect, as suggested during review;
> * add a comment in cpu_is_haltable() on why we check the per-cpu
>   tasklet_work_to_do variable directly, rather than calling the new
>   tasklet_work_to_do() helper.
> ---
>  xen/arch/arm/domain.c |   21 ++---
>  xen/arch/x86/domain.c |   12 +---
>  xen/common/tasklet.c  |   12 
>  xen/include/xen/sched.h   |5 +
>  xen/include/xen/tasklet.h |   10 ++
>  5 files changed, 46 insertions(+), 14 deletions(-)
> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 76310ed..2dc8b0a 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -41,20 +41,27 @@ DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
>  
>  void idle_loop(void)
>  {
> +unsigned int cpu = smp_processor_id();
> +
>  for ( ; ; )
>  {
> -if ( cpu_is_offline(smp_processor_id()) )
> +if ( cpu_is_offline(cpu) )
>  stop_cpu();
>  
> -local_irq_disable();
> -if ( cpu_is_haltable(smp_processor_id()) )
> +/* Are we here for running vcpu context tasklets, or for idling? */
> +if ( unlikely(tasklet_work_to_do(cpu)) )
> +do_tasklet();
> +else
>  {
> -dsb(sy);
> -wfi();
> +local_irq_disable();
> +if ( cpu_is_haltable(cpu) )
> +{
> +dsb(sy);
> +wfi();
> +}
> +local_irq_enable();
>  }
> -local_irq_enable();
>  
> -do_tasklet();
>  do_softirq();
>  /*
>   * We MUST be last (or before dsb, wfi). Otherwise after we get the
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 49388f4..3a061a9 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -112,12 +112,18 @@ static void play_dead(void)
>  
>  static void idle_loop(void)
>  {
> +unsigned int cpu = smp_processor_id();
> +
>  for ( ; ; )
>  {
> -if ( cpu_is_offline(smp_processor_id()) )
> +if ( cpu_is_offline(cpu) )
>  play_dead();
> -(*pm_idle)();
> -do_tasklet();
> +
> +/* Are we here for running vcpu context tasklets, or for idling? */
> +if ( unlikely(tasklet_work_to_do(cpu)) )
> +do_tasklet();
> +else
> +pm_idle();
>  do_softirq();
>  /*
>   * We MUST be last (or before pm_idle). Otherwise after we get the
> diff --git a/xen/common/tasklet.c b/xen/common/tasklet.c
> index 365a777..0f0a6f8 100644
> --- a/xen/common/tasklet.c
> +++ b/xen/common/tasklet.c
> @@ -111,11 +111,15 @@ void do_tasklet(void)
>  struct list_head *list = &per_cpu(tasklet_list, cpu);
>  
>  /*
> - * Work must be enqueued *and* scheduled. Otherwise there is no work to
> - * do, and/or scheduler needs to run to update idle vcpu priority.
> + * We want to be sure any 

Re: [Xen-devel] [PATCH] xen/disk: don't leak stack data via response ring

2017-06-20 Thread Stefano Stabellini
On Tue, 20 Jun 2017, Jan Beulich wrote:
> Rather than constructing a local structure instance on the stack, fill
> the fields directly on the shared ring, just like other (Linux)
> backends do. Build on the fact that all response structure flavors are
> actually identical (the old code did make this assumption too).
> 
> This is XSA-216.
> 
> Reported by: Anthony Perard 
> Signed-off-by: Jan Beulich 
> Reviewed-by: Konrad Rzeszutek Wilk 
> Acked-by: Anthony PERARD 
> ---
> v2: Add QEMU_PACKED to fix handling 32-bit guests by 64-bit qemu.
> 
> --- a/hw/block/xen_blkif.h
> +++ b/hw/block/xen_blkif.h
> @@ -14,9 +14,6 @@
>  struct blkif_common_request {
>  char dummy;
>  };
> -struct blkif_common_response {
> -char dummy;
> -};
>  
>  /* i386 protocol version */
>  #pragma pack(push, 4)
> @@ -36,13 +33,7 @@ struct blkif_x86_32_request_discard {
>  blkif_sector_t sector_number;/* start sector idx on disk (r/w only) */
>  uint64_t   nr_sectors;   /* # of contiguous sectors to discard   */
>  };
> -struct blkif_x86_32_response {
> -uint64_tid;  /* copied from request */
> -uint8_t operation;   /* copied from request */
> -int16_t status;  /* BLKIF_RSP_???   */
> -};
>  typedef struct blkif_x86_32_request blkif_x86_32_request_t;
> -typedef struct blkif_x86_32_response blkif_x86_32_response_t;
>  #pragma pack(pop)
>  
>  /* x86_64 protocol version */
> @@ -62,20 +53,14 @@ struct blkif_x86_64_request_discard {
>  blkif_sector_t sector_number;/* start sector idx on disk (r/w only) */
>  uint64_t   nr_sectors;   /* # of contiguous sectors to discard   */
>  };
> -struct blkif_x86_64_response {
> -uint64_t   __attribute__((__aligned__(8))) id;
> -uint8_t operation;   /* copied from request */
> -int16_t status;  /* BLKIF_RSP_???   */
> -};
>
>  typedef struct blkif_x86_64_request blkif_x86_64_request_t;
> -typedef struct blkif_x86_64_response blkif_x86_64_response_t;
>  
>  DEFINE_RING_TYPES(blkif_common, struct blkif_common_request,
> -  struct blkif_common_response);
> +  struct blkif_response);
>  DEFINE_RING_TYPES(blkif_x86_32, struct blkif_x86_32_request,
> -  struct blkif_x86_32_response);
> +  struct blkif_response QEMU_PACKED);

In my test, the previous sizes and alignments of the response structs
were (on both x86_32 and x86_64):

sizeof(blkif_x86_32_response)=12   sizeof(blkif_x86_64_response)=16
align(blkif_x86_32_response)=4 align(blkif_x86_64_response)=8

While with these changes are now, when compiled on x86_64:
sizeof(blkif_x86_32_response)=11   sizeof(blkif_x86_64_response)=16
align(blkif_x86_32_response)=1 align(blkif_x86_64_response)=8

when compiled on x86_32:
sizeof(blkif_x86_32_response)=11   sizeof(blkif_x86_64_response)=12
align(blkif_x86_32_response)=1 align(blkif_x86_64_response)=4

Did I do my tests wrong?

QEMU_PACKED is not the same as #pragma pack(push, 4). In fact, it is the
same as #pragma pack(push, 1), causing the struct to be densely packed,
leaving no padding whatsoever.

In addition, without __attribute__((__aligned__(8))),
blkif_x86_64_response won't be 8 bytes aligned when built on x86_32.

Am I missing something?


>  DEFINE_RING_TYPES(blkif_x86_64, struct blkif_x86_64_request,
> -  struct blkif_x86_64_response);
> +  struct blkif_response);
>  
>  union blkif_back_rings {
>  blkif_back_ring_t native;
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -769,31 +769,30 @@ static int blk_send_response_one(struct
>  struct XenBlkDev  *blkdev = ioreq->blkdev;
>  int   send_notify   = 0;
>  int   have_requests = 0;
> -blkif_response_t  resp;
> -void  *dst;
> -
> -resp.id = ioreq->req.id;
> -resp.operation = ioreq->req.operation;
> -resp.status = ioreq->status;
> +blkif_response_t  *resp;
>  
>  /* Place on the response ring for the relevant domain. */
>  switch (blkdev->protocol) {
>  case BLKIF_PROTOCOL_NATIVE:
> -dst = RING_GET_RESPONSE(&blkdev->rings.native, blkdev->rings.native.rsp_prod_pvt);
> +resp = RING_GET_RESPONSE(&blkdev->rings.native,
> + blkdev->rings.native.rsp_prod_pvt);
>  break;
>  case BLKIF_PROTOCOL_X86_32:
> -dst = RING_GET_RESPONSE(&blkdev->rings.x86_32_part,
> -blkdev->rings.x86_32_part.rsp_prod_pvt);
> +resp = RING_GET_RESPONSE(&blkdev->rings.x86_32_part,
> + blkdev->rings.x86_32_part.rsp_prod_pvt);
>  break;
>  case BLKIF_PROTOCOL_X86_64:
> -dst = RING_GET_RESPONSE(&blkdev->rings.x86_64_part,
> -

[Xen-devel] [PATCH] x86/MCE, xen/mcelog: Make /dev/mcelog registration messages more precise

2017-06-20 Thread Borislav Petkov
From: Juergen Gross 

When running under Xen as dom0, /dev/mcelog is being provided by Xen
instead of the normal mcelog character device of the MCE core. Convert
an error message being issued by the MCE core in this case to an
informative message that Xen has registered the device.

Signed-off-by: Juergen Gross 
Cc: Tony Luck 
Cc: linux-edac 
Cc: x86-ml 
Cc: xen-de...@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170614084059.19294-1-jgr...@suse.com
[ Massage a bit. ]
Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/cpu/mcheck/dev-mcelog.c | 8 +++-
 drivers/xen/mcelog.c| 2 ++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/dev-mcelog.c b/arch/x86/kernel/cpu/mcheck/dev-mcelog.c
index a80427c30c93..10cec43aac38 100644
--- a/arch/x86/kernel/cpu/mcheck/dev-mcelog.c
+++ b/arch/x86/kernel/cpu/mcheck/dev-mcelog.c
@@ -415,9 +415,15 @@ static __init int dev_mcelog_init_device(void)
/* register character device /dev/mcelog */
err = misc_register(&mce_chrdev_device);
if (err) {
-   pr_err("Unable to init device /dev/mcelog (rc: %d)\n", err);
+   if (err == -EBUSY)
+   /* Xen dom0 might have registered the device already. */
+   pr_info("Unable to init device /dev/mcelog, already registered");
+   else
+   pr_err("Unable to init device /dev/mcelog (rc: %d)\n", err);
+
return err;
}
+
mce_register_decode_chain(&dev_mcelog_nb);
return 0;
 }
diff --git a/drivers/xen/mcelog.c b/drivers/xen/mcelog.c
index a493c7315e94..6cc1c15bcd84 100644
--- a/drivers/xen/mcelog.c
+++ b/drivers/xen/mcelog.c
@@ -408,6 +408,8 @@ static int __init xen_late_init_mcelog(void)
if (ret)
goto deregister;
 
+   pr_info("/dev/mcelog registered by Xen\n");
+
return 0;
 
 deregister:
-- 
2.13.0




Re: [Xen-devel] [PATCH v7 07/36] x86/mm: Don't use phys_to_virt in ioremap() if SME is active

2017-06-20 Thread Thomas Gleixner
On Fri, 16 Jun 2017, Tom Lendacky wrote:

> Currently there is a check if the address being mapped is in the ISA
> range (is_ISA_range()), and if it is then phys_to_virt() is used to
> perform the mapping.  When SME is active, however, this will result
> in the mapping having the encryption bit set when it is expected that
> an ioremap() should not have the encryption bit set. So only use the
> phys_to_virt() function if SME is not active

This does not make sense to me. What the heck has phys_to_virt() to do with
the encryption bit. Especially why would the encryption bit be set on that
mapping in the first place?

I'm probably missing something, but this want's some coherent explanation
understandable by mere mortals both in the changelog and the code comment.

Thanks,

tglx



Re: [Xen-devel] [PATCH v4 9/9] arm/mem_access: Walk the guest's pt in software

2017-06-20 Thread Sergej Proskurin
Hi Tamas,

[...]

>> +if ( guest_walk_tables(v, gva, &ipa, &perms) < 0 )
>> +/*
>> + * The software gva to ipa translation can still fail, e.g., if the
>> + * gva is not mapped.
>> + */
> 
> If you end up sending another round of the series, I would prefer to
> see this comment before the if statement (but I wouldn't hold up the
> series over that).
> 
>> +goto err;
>> +

Alright, will do. Thanks!

Cheers,
~Sergej



Re: [Xen-devel] [PATCH v7 06/36] x86/mm: Add Secure Memory Encryption (SME) support

2017-06-20 Thread Thomas Gleixner
On Fri, 16 Jun 2017, Tom Lendacky wrote:
>  
> +config ARCH_HAS_MEM_ENCRYPT
> + def_bool y
> + depends on X86

That one is silly. The config switch is in the x86 KConfig file, so X86 is
on. If you intended to move this to some generic place outside of
x86/Kconfig then this should be

config ARCH_HAS_MEM_ENCRYPT
bool

and x86/Kconfig should have

select ARCH_HAS_MEM_ENCRYPT

and that should be selected by AMD_MEM_ENCRYPT

> +config AMD_MEM_ENCRYPT
> + bool "AMD Secure Memory Encryption (SME) support"
> + depends on X86_64 && CPU_SUP_AMD
> + ---help---
> +   Say yes to enable support for the encryption of system memory.
> +   This requires an AMD processor that supports Secure Memory
> +   Encryption (SME).

Thanks,

tglx



Re: [Xen-devel] [PATCH v4 9/9] arm/mem_access: Walk the guest's pt in software

2017-06-20 Thread Tamas K Lengyel
On Tue, Jun 20, 2017 at 2:33 PM, Sergej Proskurin
 wrote:
> In this commit, we make use of the gpt walk functionality introduced in
> the previous commits. If mem_access is active, hardware-based gva to ipa
> translation might fail, as gva_to_ipa uses the guest's translation
> tables, access to which might be restricted by the active VTTBR. To
> side-step potential translation errors in the function
> p2m_mem_access_check_and_get_page due to restricted memory (e.g. to the
> guest's page tables themselves), we walk the guest's page tables in
> software.
>
> Signed-off-by: Sergej Proskurin 

Acked-by: Tamas K Lengyel 

> ---
> Cc: Razvan Cojocaru 
> Cc: Tamas K Lengyel 
> Cc: Stefano Stabellini 
> Cc: Julien Grall 
> ---
> v2: Check the returned access rights after walking the guest's page tables in
> the function p2m_mem_access_check_and_get_page.
>
> v3: Adapt Function names and parameter.
>
> v4: Comment why we need to fail if the permission flags that are
> requested by the caller do not satisfy the mapped page.
>
> Cosmetic fix that simplifies the if-statement checking for the
> GV2M_WRITE permission.
> ---
>  xen/arch/arm/mem_access.c | 31 ++-
>  1 file changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/mem_access.c b/xen/arch/arm/mem_access.c
> index bcf49f5c15..9133ac8f03 100644
> --- a/xen/arch/arm/mem_access.c
> +++ b/xen/arch/arm/mem_access.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  static int __p2m_get_mem_access(struct domain *d, gfn_t gfn,
>  xenmem_access_t *access)
> @@ -101,6 +102,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag,
>const struct vcpu *v)
>  {
>  long rc;
> +unsigned int perms;
>  paddr_t ipa;
>  gfn_t gfn;
>  mfn_t mfn;
> @@ -110,8 +112,35 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag,
>  struct p2m_domain *p2m = &v->domain->arch.p2m;
>
>  rc = gva_to_ipa(gva, &ipa, flag);
> +
> +/*
> + * In case mem_access is active, hardware-based gva_to_ipa translation
> + * might fail. Since gva_to_ipa uses the guest's translation tables, access
> + * to which might be restricted by the active VTTBR, we perform a gva to
> + * ipa translation in software.
> + */
>  if ( rc < 0 )
> -goto err;
> +{
> +if ( guest_walk_tables(v, gva, &ipa, &perms) < 0 )
> +/*
> + * The software gva to ipa translation can still fail, e.g., if the
> + * gva is not mapped.
> + */

If you end up sending another round of the series, I would prefer to
see this comment before the if statement (but I wouldn't hold up the
series over that).

> +goto err;
> +

Thanks,
Tamas



Re: [Xen-devel] [PATCH v7 19/36] x86/mm: Add support to access boot related data in the clear

2017-06-20 Thread Borislav Petkov
On Fri, Jun 16, 2017 at 01:53:26PM -0500, Tom Lendacky wrote:
> Boot data (such as EFI related data) is not encrypted when the system is
> booted because UEFI/BIOS does not run with SME active. In order to access
> this data properly it needs to be mapped decrypted.
> 
> Update early_memremap() to provide an arch specific routine to modify the
> pagetable protection attributes before they are applied to the new
> mapping. This is used to remove the encryption mask for boot related data.
> 
> Update memremap() to provide an arch specific routine to determine if RAM
> remapping is allowed.  RAM remapping will cause an encrypted mapping to be
> generated. By preventing RAM remapping, ioremap_cache() will be used
> instead, which will provide a decrypted mapping of the boot related data.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/io.h |5 +
>  arch/x86/mm/ioremap.c |  179 
> +
>  include/linux/io.h|2 +
>  kernel/memremap.c |   20 -
>  mm/early_ioremap.c|   18 -
>  5 files changed, 217 insertions(+), 7 deletions(-)

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.



[Xen-devel] [PATCH v4 5/9] arm/mem_access: Extend BIT-operations to unsigned long long

2017-06-20 Thread Sergej Proskurin
We extend the BIT macro to use values of unsigned long long so as to
enable setting bits of 64-bit registers on AArch32. In addition, this
commit adds a define holding the register width of 64-bit double-word
registers. This define simplifies using the associated constants in the
following commits.

Signed-off-by: Sergej Proskurin 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v4: We reused the previous commit with the msg "arm/mem_access: Add
defines holding the width of 32/64bit regs" from v3, as we can reuse
the already existing define BITS_PER_WORD.
---
 xen/include/asm-arm/bitops.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/include/asm-arm/bitops.h b/xen/include/asm-arm/bitops.h
index bda889841b..c3486d497c 100644
--- a/xen/include/asm-arm/bitops.h
+++ b/xen/include/asm-arm/bitops.h
@@ -21,7 +21,8 @@
 #define __clear_bit(n,p)  clear_bit(n,p)
 
 #define BITS_PER_WORD   32
-#define BIT(nr) (1UL << (nr))
+#define BITS_PER_DOUBLE_WORD 64
+#define BIT(nr) (1ULL << (nr))
 #define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_WORD))
 #define BIT_WORD(nr) ((nr) / BITS_PER_WORD)
 #define BITS_PER_BYTE   8
-- 
2.12.2




[Xen-devel] [PATCH v4 9/9] arm/mem_access: Walk the guest's pt in software

2017-06-20 Thread Sergej Proskurin
In this commit, we make use of the gpt walk functionality introduced in
the previous commits. If mem_access is active, hardware-based gva to ipa
translation might fail, as gva_to_ipa uses the guest's translation
tables, access to which might be restricted by the active VTTBR. To
side-step potential translation errors in the function
p2m_mem_access_check_and_get_page due to restricted memory (e.g. to the
guest's page tables themselves), we walk the guest's page tables in
software.

Signed-off-by: Sergej Proskurin 
---
Cc: Razvan Cojocaru 
Cc: Tamas K Lengyel 
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v2: Check the returned access rights after walking the guest's page tables in
the function p2m_mem_access_check_and_get_page.

v3: Adapt Function names and parameter.

v4: Comment why we need to fail if the permission flags that are
requested by the caller do not satisfy the mapped page.

Cosmetic fix that simplifies the if-statement checking for the
GV2M_WRITE permission.
---
 xen/arch/arm/mem_access.c | 31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/mem_access.c b/xen/arch/arm/mem_access.c
index bcf49f5c15..9133ac8f03 100644
--- a/xen/arch/arm/mem_access.c
+++ b/xen/arch/arm/mem_access.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int __p2m_get_mem_access(struct domain *d, gfn_t gfn,
 xenmem_access_t *access)
@@ -101,6 +102,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag,
   const struct vcpu *v)
 {
 long rc;
+unsigned int perms;
 paddr_t ipa;
 gfn_t gfn;
 mfn_t mfn;
@@ -110,8 +112,35 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag,
 struct p2m_domain *p2m = &v->domain->arch.p2m;
 
rc = gva_to_ipa(gva, &ipa, flag);
+
+/*
+ * In case mem_access is active, hardware-based gva_to_ipa translation
+ * might fail. Since gva_to_ipa uses the guest's translation tables, access
+ * to which might be restricted by the active VTTBR, we perform a gva to
+ * ipa translation in software.
+ */
 if ( rc < 0 )
-goto err;
+{
+if ( guest_walk_tables(v, gva, &ipa, &perms) < 0 )
+/*
+ * The software gva to ipa translation can still fail, e.g., if the
+ * gva is not mapped.
+ */
+goto err;
+
+/*
+ * Check permissions that are assumed by the caller. For instance in
+ * case of guestcopy, the caller assumes that the translated page can
+ * be accessed with requested permissions. If this is not the case, we
+ * should fail.
+ *
+ * Please note that we do not check for the GV2M_EXEC permission. Yet,
+ * since the hardware-based translation through gva_to_ipa does not
+ * test for execute permissions this check can be left out.
+ */
+if ( (flag & GV2M_WRITE) && !(perms & GV2M_WRITE) )
+goto err;
+}
 
 gfn = gaddr_to_gfn(ipa);
 
-- 
2.12.2




[Xen-devel] [PATCH v4 7/9] arm/mem_access: Add long-descriptor based gpt

2017-06-20 Thread Sergej Proskurin
This commit adds functionality to walk the guest's page tables using the
long-descriptor translation table format for both ARMv7 and ARMv8.
Similar to the hardware architecture, the implementation supports
different page granularities (4K, 16K, and 64K). The implementation is
based on ARM DDI 0487B.a J1-5922, J1-5999, and ARM DDI 0406C.b B3-1510.

Note that the current implementation lacks support for Large VA/PA on
ARMv8.2 architectures (LVA/LPA, 52-bit virtual and physical address
sizes). The associated location in the code is marked appropriately.

Signed-off-by: Sergej Proskurin 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v2: Use TCR_SZ_MASK instead of TTBCR_SZ_MASK for ARM 32-bit guests using
the long-descriptor translation table format.

Cosmetic fixes.

v3: Move the implementation to ./xen/arch/arm/guest_copy.c.

Remove the array strides and declare the array grainsizes as static
const instead of just const to reduce the function stack overhead.

Move parts of the function guest_walk_ld into the static functions
get_ttbr_and_gran_64bit and get_top_bit to reduce complexity.

Use the macro BIT(x) instead of (1UL << x).

Add more comments && Cosmetic fixes.

v4: Move functionality responsible for determining the configured IPA
output-size into a separate function get_ipa_output_size. In this
function, we remove the previously used switch statement, which was
responsible for distinguishing between different IPA output-sizes.
Instead, we retrieve the information from the introduced ipa_sizes
array.

Remove the defines GRANULE_SIZE_INDEX_* and TTBR0_VALID from
guest_walk.h. Instead, introduce the enums granule_size_index
active_ttbr directly inside of guest_walk.c so that the associated
fields don't get exported.

Adapt the function to the new parameter of type "struct vcpu *".

Remove support for 52bit IPA output-sizes entirely from this commit.

Use lpae_* helpers instead of p2m_* helpers.

Cosmetic fixes & Additional comments.
---
 xen/arch/arm/guest_walk.c | 385 +-
 1 file changed, 383 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/guest_walk.c b/xen/arch/arm/guest_walk.c
index b8bb553a6e..c37c595157 100644
--- a/xen/arch/arm/guest_walk.c
+++ b/xen/arch/arm/guest_walk.c
@@ -16,6 +16,8 @@
  */
 
 #include 
+#include 
+#include 
 
 /*
  * The function guest_walk_sd translates a given GVA into an IPA using the
@@ -33,6 +35,142 @@ static int guest_walk_sd(const struct vcpu *v,
 }
 
 /*
+ * Get the IPA output_size (configured in TCR_EL1) that shall be used for the
+ * long-descriptor based translation table walk.
+ */
+static unsigned int get_ipa_output_size(struct domain *d, register_t tcr)
+{
+unsigned int output_size;
+uint64_t ips;
+
+static const uint64_t ipa_sizes[7] = {
+TCR_EL1_IPS_32_BIT_VAL,
+TCR_EL1_IPS_36_BIT_VAL,
+TCR_EL1_IPS_40_BIT_VAL,
+TCR_EL1_IPS_42_BIT_VAL,
+TCR_EL1_IPS_44_BIT_VAL,
+TCR_EL1_IPS_48_BIT_VAL,
+TCR_EL1_IPS_52_BIT_VAL
+};
+
+if ( is_64bit_domain(d) )
+{
+/* Get the intermediate physical address size. */
+ips = (tcr & TCR_EL1_IPS_MASK) >> TCR_EL1_IPS_SHIFT;
+
+/*
+ * Return an error on reserved IPA output-sizes and if the IPA
+ * output-size is 52bit.
+ *
+ * XXX: 52 bit output_size is not supported yet.
+ */
+if ( ips > TCR_EL1_IPS_48_BIT )
+return -EFAULT;
+
+output_size = ipa_sizes[ips];
+}
+else
+output_size = TCR_EL1_IPS_40_BIT_VAL;
+
+return output_size;
+}
+
+/* Normalized page granule size indices. */
+enum granule_size_index {
+GRANULE_SIZE_INDEX_4K,
+GRANULE_SIZE_INDEX_16K,
+GRANULE_SIZE_INDEX_64K
+};
+
+/* Represent whether TTBR0 or TTBR1 is active. */
+enum active_ttbr {
+TTBR0_ACTIVE,
+TTBR1_ACTIVE
+};
+
+/*
+ * Select the TTBR(0|1)_EL1 that will be used for address translation using the
+ * long-descriptor translation table format and return the page granularity
+ * that is used by the selected TTBR.
+ */
+static bool get_ttbr_and_gran_64bit(uint64_t *ttbr, unsigned int *gran,
+register_t tcr, enum active_ttbr ttbrx)
+{
+bool disabled;
+
+if ( ttbrx == TTBR0_ACTIVE )
+{
+/* Normalize granule size. */
+switch ( tcr & TCR_TG0_MASK )
+{
+case TCR_TG0_16K:
+*gran = GRANULE_SIZE_INDEX_16K;
+break;
+case TCR_TG0_64K:
+*gran = GRANULE_SIZE_INDEX_64K;
+break;
+default:
+*gran = GRANULE_SIZE_INDEX_4K;
+}
+
+/* Use TTBR0 for GVA to IPA translation. */
+*ttbr = READ_SYSREG64(TTBR0_EL1);
+
+/* If TCR.EPD0 is set, translations using TTBR0 are disabled. */
+disabled 

[Xen-devel] [PATCH v4 2/9] arm/mem_access: Add defines supporting PTs with varying page sizes

2017-06-20 Thread Sergej Proskurin
The ARMv8 architecture supports pages with different (4K, 16K, and 64K) sizes.
To enable guest page table walks for various configurations, this commit
extends the defines and helpers of the current implementation.

Signed-off-by: Sergej Proskurin 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v3: Eliminate redundant macro definitions by introducing generic macros.

v4: Replace existing macros with ones that generate static inline
helpers as to ease the readability of the code.

Move the introduced code into lpae.h
---
 xen/include/asm-arm/lpae.h | 67 ++
 1 file changed, 67 insertions(+)

diff --git a/xen/include/asm-arm/lpae.h b/xen/include/asm-arm/lpae.h
index 6fbf7c606c..2913428e96 100644
--- a/xen/include/asm-arm/lpae.h
+++ b/xen/include/asm-arm/lpae.h
@@ -151,6 +151,73 @@ static inline bool lpae_is_superpage(lpae_t pte, unsigned int level)
 return (level < 3) && lpae_mapping(pte);
 }
 
+/*
+ * The ARMv8 architecture supports pages with different sizes (4K, 16K, and
+ * 64K). To enable guest page table walks for various configurations, the
+ * following helpers enable walking the guest's translation table with varying
+ * page size granularities.
+ */
+
+#define LPAE_SHIFT_4K   (9)
+#define LPAE_SHIFT_16K  (11)
+#define LPAE_SHIFT_64K  (13)
+
+#define lpae_entries(gran)  (_AC(1,U) << LPAE_SHIFT_##gran)
+#define lpae_entry_mask(gran)   (lpae_entries(gran) - 1)
+
+#define PAGE_SHIFT_4K   (12)
+#define PAGE_SHIFT_16K  (14)
+#define PAGE_SHIFT_64K  (16)
+
+#define third_shift(gran)   (PAGE_SHIFT_##gran)
+#define third_size(gran)((paddr_t)1 << third_shift(gran))
+
+#define second_shift(gran)  (third_shift(gran) + LPAE_SHIFT_##gran)
+#define second_size(gran)   ((paddr_t)1 << second_shift(gran))
+
+#define first_shift(gran)   (second_shift(gran) + LPAE_SHIFT_##gran)
+#define first_size(gran)((paddr_t)1 << first_shift(gran))
+
+/* Note that there is no zeroeth lookup level with a 64K granule size. */
+#define zeroeth_shift(gran) (first_shift(gran) + LPAE_SHIFT_##gran)
+#define zeroeth_size(gran)  ((paddr_t)1 << zeroeth_shift(gran))
+
+#define GUEST_TABLE_OFFSET(offs, gran)  ((paddr_t)(offs) & lpae_entry_mask(gran))
+#define third_guest_table_offset(gva, gran) GUEST_TABLE_OFFSET((gva >> third_shift(gran)), gran)
+#define second_guest_table_offset(gva, gran)    GUEST_TABLE_OFFSET((gva >> second_shift(gran)), gran)
+#define first_guest_table_offset(gva, gran) GUEST_TABLE_OFFSET((gva >> first_shift(gran)), gran)
+#define zeroeth_guest_table_offset(gva, gran)   GUEST_TABLE_OFFSET((gva >> zeroeth_shift(gran)), gran)
+
+#define GUEST_TABLE_OFFSET_HELPERS(gran)\
+static inline vaddr_t third_guest_table_offset_##gran##K(vaddr_t gva)   \
+{   \
+return third_guest_table_offset(gva, gran##K);  \
+}   \
+\
+static inline vaddr_t second_guest_table_offset_##gran##K(vaddr_t gva)  \
+{   \
+return second_guest_table_offset(gva, gran##K); \
+}   \
+\
+static inline vaddr_t first_guest_table_offset_##gran##K(vaddr_t gva)   \
+{   \
+return first_guest_table_offset(gva, gran##K);  \
+}   \
+\
+static inline vaddr_t zeroeth_guest_table_offset_##gran##K(vaddr_t gva) \
+{   \
+if ( gran == 64 )   \
+return 0;   \
+else\
+return zeroeth_guest_table_offset((paddr_t)gva, gran##K);   \
+}   \
+
+GUEST_TABLE_OFFSET_HELPERS(4);
+#ifdef CONFIG_ARM_64
+GUEST_TABLE_OFFSET_HELPERS(16);
+GUEST_TABLE_OFFSET_HELPERS(64);
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 /*
-- 
2.12.2


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 0/9] arm/mem_access: Walk guest page tables in SW if mem_access is active

2017-06-20 Thread Sergej Proskurin
Hi all,

The function p2m_mem_access_check_and_get_page is called from the function
get_page_from_gva if mem_access is active and the hardware-aided translation of
the given guest virtual address (gva) into machine address fails. That is, if
the stage-2 translation tables constrain access to the guest's page tables,
hardware-assisted translation will fail. The idea of the function
p2m_mem_access_check_and_get_page is thus to translate the given gva and check
the requested access rights in software. However, as the current implementation
of p2m_mem_access_check_and_get_page makes use of the hardware-aided gva to ipa
translation, the translation might also fail for the reasons stated above. This
issue will become equally relevant for the altp2m implementation on ARM.  As
such, we provide a software guest translation table walk to address the above
mentioned issue. We submit this patch series as an RFC to discuss the
appropriate location for the code and further functionality required to fix the
above concerns.

The current version of the implementation supports translation of both the
short-descriptor as well as the long-descriptor translation table format on
ARMv7 and ARMv8 (AArch32/AArch64).

This revised version incorporates the comments of the previous patch series and
mainly changes the introduced data-structures and defines to simplify code.
Please note that this patch series is based on Julien Grall's patch series
"xen/arm: Move LPAE definition in a separate header"[1].

The following patch series can be found on Github[0].

Cheers,
~Sergej

[0] https://github.com/sergej-proskurin/xen (branch arm-gpt-walk-v4)
[1] https://lists.xen.org/archives/html/xen-devel/2017-06/msg02095.html

Sergej Proskurin (9):
  arm/mem_access: Add (TCR_|TTBCR_)* defines
  arm/mem_access: Add defines supporting PTs with varying page sizes
  arm/mem_access: Add short-descriptor pte typedefs
  arm/mem_access: Introduce GV2M_EXEC permission
  arm/mem_access: Extend BIT-operations to unsigned long long
  arm/mem_access: Add software guest-page-table walk
  arm/mem_access: Add long-descriptor based gpt
  arm/mem_access: Add short-descriptor based gpt
  arm/mem_access: Walk the guest's pt in software

 xen/arch/arm/Makefile|   1 +
 xen/arch/arm/guest_walk.c| 635 +++
 xen/arch/arm/mem_access.c|  31 +-
 xen/include/asm-arm/bitops.h |   3 +-
 xen/include/asm-arm/guest_walk.h |  35 +++
 xen/include/asm-arm/lpae.h   |  67 +
 xen/include/asm-arm/page.h   |   1 +
 xen/include/asm-arm/processor.h  |  69 -
 xen/include/asm-arm/short-desc.h | 108 +++
 9 files changed, 944 insertions(+), 6 deletions(-)
 create mode 100644 xen/arch/arm/guest_walk.c
 create mode 100644 xen/include/asm-arm/guest_walk.h
 create mode 100644 xen/include/asm-arm/short-desc.h

-- 
2.12.2




[Xen-devel] [PATCH v4 4/9] arm/mem_access: Introduce GV2M_EXEC permission

2017-06-20 Thread Sergej Proskurin
We extend the current implementation by an additional permission,
GV2M_EXEC, which will be used to describe execute permissions of PTEs
as part of our guest translation table walk implementation.

Signed-off-by: Sergej Proskurin 
Acked-by: Julien Grall 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
 xen/include/asm-arm/page.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index cef2f28914..b8d641bfaf 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -90,6 +90,7 @@
 /* Flags for get_page_from_gva, gvirt_to_maddr etc */
 #define GV2M_READ  (0u<<0)
 #define GV2M_WRITE (1u<<0)
+#define GV2M_EXEC  (1u<<1)
 
 #ifndef __ASSEMBLY__
 
-- 
2.12.2




[Xen-devel] [PATCH v4 6/9] arm/mem_access: Add software guest-page-table walk

2017-06-20 Thread Sergej Proskurin
The function p2m_mem_access_check_and_get_page in mem_access.c
translates a gva to an ipa by means of the hardware functionality of the
ARM architecture. This is implemented in the function gva_to_ipa. If
mem_access is active, hardware-based gva to ipa translation might fail,
as gva_to_ipa uses the guest's translation tables, access to which might
be restricted by the active VTTBR. To address this issue, in this commit
we add a software-based guest-page-table walk, which will be used by the
function p2m_mem_access_check_and_get_page to perform the gva to ipa
translation in software in one of the following commits.

Note: The introduced function guest_walk_tables assumes that the domain,
the gva of which is to be translated, is running on the currently active
vCPU. To walk the guest's page tables on a different vCPU, the following
registers would need to be loaded: TCR_EL1, TTBR0_EL1, TTBR1_EL1, and
SCTLR_EL1.

Signed-off-by: Sergej Proskurin 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v2: Rename p2m_gva_to_ipa to p2m_walk_gpt and move it to p2m.c.

Move the functionality responsible for walking long-descriptor based
translation tables out of the function p2m_walk_gpt. Also move out
the long-descriptor based translation out of this commit.

Change function parameters in order to return the access rights
to a requested gva.

Cosmetic fixes.

v3: Rename the introduced functions to guest_walk_(tables|sd|ld) and
move the implementation to guest_copy.(c|h).

Set permissions in guest_walk_tables also if the MMU is disabled.

Change the function parameter of type "struct p2m_domain *" to
"struct vcpu *" in the function guest_walk_tables.

v4: Change the function parameter of type "struct p2m_domain *" to
"struct vcpu *" in the functions guest_walk_(sd|ld) as well.
---
 xen/arch/arm/Makefile|  1 +
 xen/arch/arm/guest_walk.c| 91 
 xen/include/asm-arm/guest_walk.h | 19 +
 3 files changed, 111 insertions(+)
 create mode 100644 xen/arch/arm/guest_walk.c
 create mode 100644 xen/include/asm-arm/guest_walk.h

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 49e1fb2f84..282d2c2949 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_HAS_GICV3) += gic-v3.o
 obj-$(CONFIG_HAS_ITS) += gic-v3-its.o
 obj-$(CONFIG_HAS_ITS) += gic-v3-lpi.o
 obj-y += guestcopy.o
+obj-y += guest_walk.o
 obj-y += hvm.o
 obj-y += io.o
 obj-y += irq.o
diff --git a/xen/arch/arm/guest_walk.c b/xen/arch/arm/guest_walk.c
new file mode 100644
index 00..b8bb553a6e
--- /dev/null
+++ b/xen/arch/arm/guest_walk.c
@@ -0,0 +1,91 @@
+/*
+ * Guest page table walk
+ * Copyright (c) 2017 Sergej Proskurin 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see .
+ */
+
+#include 
+
+/*
+ * The function guest_walk_sd translates a given GVA into an IPA using the
+ * short-descriptor translation table format in software. This function assumes
+ * that the domain is running on the currently active vCPU. To walk the guest's
+ * page table on a different vCPU, the following registers would need to be
+ * loaded: TCR_EL1, TTBR0_EL1, TTBR1_EL1, and SCTLR_EL1.
+ */
+static int guest_walk_sd(const struct vcpu *v,
+ vaddr_t gva, paddr_t *ipa,
+ unsigned int *perms)
+{
+/* Not implemented yet. */
+return -EFAULT;
+}
+
+/*
+ * The function guest_walk_ld translates a given GVA into an IPA using the
+ * long-descriptor translation table format in software. This function assumes
+ * that the domain is running on the currently active vCPU. To walk the guest's
+ * page table on a different vCPU, the following registers would need to be
+ * loaded: TCR_EL1, TTBR0_EL1, TTBR1_EL1, and SCTLR_EL1.
+ */
+static int guest_walk_ld(const struct vcpu *v,
+ vaddr_t gva, paddr_t *ipa,
+ unsigned int *perms)
+{
+/* Not implemented yet. */
+return -EFAULT;
+}
+
+int guest_walk_tables(const struct vcpu *v, vaddr_t gva,
+  paddr_t *ipa, unsigned int *perms)
+{
+uint32_t sctlr = READ_SYSREG(SCTLR_EL1);
+register_t tcr = READ_SYSREG(TCR_EL1);
+unsigned int _perms = GV2M_READ;
+
+/* We assume that the domain is running on the currently active 

[Xen-devel] [PATCH v4 1/9] arm/mem_access: Add (TCR_|TTBCR_)* defines

2017-06-20 Thread Sergej Proskurin
This commit adds (TCR_|TTBCR_)* defines to simplify access to the
respective register contents. At the same time, we adjust the macro
TCR_T0SZ by using the newly introduced TCR_T0SZ_SHIFT instead of the
hardcoded value.

Signed-off-by: Sergej Proskurin 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v2: Define TCR_SZ_MASK in a way so that it can be also applied to 32-bit guests
using the long-descriptor translation table format.

Extend the previous commit by further defines allowing a simplified access
to the registers TCR_EL1 and TTBCR.

v3: Replace the hardcoded value 0 in the TCR_T0SZ macro with the newly
introduced TCR_T0SZ_SHIFT. Also, replace the hardcoded value 14 in
the TCR_TG0_* macros with the introduced TCR_TG0_SHIFT.

Comment when to apply the defines TTBCR_PD(0|1), according to ARM
DDI 0487B.a and ARM DDI 0406C.b.

Remove TCR_TB_* defines.

Comment when certain TCR_EL2 register fields can be applied.

v4: Cosmetic changes.
---
 xen/include/asm-arm/processor.h | 69 ++---
 1 file changed, 65 insertions(+), 4 deletions(-)

diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h
index 855ded1b07..863a569432 100644
--- a/xen/include/asm-arm/processor.h
+++ b/xen/include/asm-arm/processor.h
@@ -94,6 +94,13 @@
 #define TTBCR_N_2KB  _AC(0x03,U)
 #define TTBCR_N_1KB  _AC(0x04,U)
 
+/*
+ * TTBCR_PD(0|1) can be applied only if LPAE is disabled, i.e., TTBCR.EAE==0
+ * (ARM DDI 0487B.a G6-5203 and ARM DDI 0406C.b B4-1722).
+ */
+#define TTBCR_PD0   (_AC(1,U)<<4)
+#define TTBCR_PD1   (_AC(1,U)<<5)
+
 /* SCTLR System Control Register. */
 /* HSCTLR is a subset of this. */
 #define SCTLR_TE(_AC(1,U)<<30)
@@ -154,7 +161,20 @@
 
 /* TCR: Stage 1 Translation Control */
 
-#define TCR_T0SZ(x) ((x)<<0)
+#define TCR_T0SZ_SHIFT  (0)
+#define TCR_T1SZ_SHIFT  (16)
+#define TCR_T0SZ(x) ((x)<<TCR_T0SZ_SHIFT)

[Xen-devel] [PATCH v4 3/9] arm/mem_access: Add short-descriptor pte typedefs

2017-06-20 Thread Sergej Proskurin
The current implementation does not provide appropriate types for
short-descriptor translation table entries. As such, this commit adds new
types, which simplify managing the respective translation table entries.

Signed-off-by: Sergej Proskurin 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v3: Add more short-descriptor related pte typedefs that will be used by
the following commits.

v4: Move short-descriptor pte typedefs out of page.h into short-desc.h.

Change the type unsigned int to bool of every bitfield in
short-descriptor related data-structures that holds only one bit.

Change the typedef names from pte_sd_* to short_desc_*.
---
 xen/include/asm-arm/short-desc.h | 108 +++
 1 file changed, 108 insertions(+)
 create mode 100644 xen/include/asm-arm/short-desc.h

diff --git a/xen/include/asm-arm/short-desc.h b/xen/include/asm-arm/short-desc.h
new file mode 100644
index 00..6e2ab09d24
--- /dev/null
+++ b/xen/include/asm-arm/short-desc.h
@@ -0,0 +1,108 @@
+#ifndef __ARM_SHORT_DESC_H__
+#define __ARM_SHORT_DESC_H__
+
+/*
+ * Comprises bits of the level 1 short-descriptor format representing
+ * a section.
+ */
+typedef struct __packed {
+bool pxn:1; /* Privileged Execute Never */
+bool sec:1; /* == 1 if section or supersection */
+bool b:1;   /* Bufferable */
+bool c:1;   /* Cacheable */
+bool xn:1;  /* Execute Never */
+unsigned int dom:4; /* Domain field */
+bool impl:1;/* Implementation defined */
+unsigned int ap:2;  /* AP[1:0] */
+unsigned int tex:3; /* TEX[2:0] */
+bool ro:1;  /* AP[2] */
+bool s:1;   /* Shareable */
+bool ng:1;  /* Non-global */
+bool supersec:1;/* Must be 0 for sections */
+bool ns:1;  /* Non-secure */
+unsigned int base:12;   /* Section base address */
+} short_desc_l1_sec_t;
+
+/*
+ * Comprises bits of the level 1 short-descriptor format representing
+ * a supersection.
+ */
+typedef struct __packed {
+bool pxn:1; /* Privileged Execute Never */
+bool sec:1; /* == 1 if section or supersection */
+bool b:1;   /* Bufferable */
+bool c:1;   /* Cacheable */
+bool xn:1;  /* Execute Never */
+unsigned int extbase2:4;/* Extended base address, PA[39:36] */
+bool impl:1;/* Implementation defined */
+unsigned int ap:2;  /* AP[1:0] */
+unsigned int tex:3; /* TEX[2:0] */
+bool ro:1;  /* AP[2] */
+bool s:1;   /* Shareable */
+bool ng:1;  /* Non-global */
+bool supersec:1;/* Must be 0 for sections */
+bool ns:1;  /* Non-secure */
+unsigned int extbase1:4;/* Extended base address, PA[35:32] */
+unsigned int base:8;/* Supersection base address */
+} short_desc_l1_supersec_t;
+
+/*
+ * Comprises bits of the level 2 short-descriptor format representing
+ * a small page.
+ */
+typedef struct __packed {
+bool xn:1;  /* Execute Never */
+bool page:1;/* ==1 if small page */
+bool b:1;   /* Bufferable */
+bool c:1;   /* Cacheable */
+unsigned int ap:2;  /* AP[1:0] */
+unsigned int tex:3; /* TEX[2:0] */
+bool ro:1;  /* AP[2] */
+bool s:1;   /* Shareable */
+bool ng:1;  /* Non-global */
+unsigned int base:20;   /* Small page base address */
+} short_desc_l2_page_t;
+
+/*
+ * Comprises bits of the level 2 short-descriptor format representing
+ * a large page.
+ */
+typedef struct __packed {
+bool lpage:1;   /* ==1 if large page */
+bool page:1;/* ==0 if large page */
+bool b:1;   /* Bufferable */
+bool c:1;   /* Cacheable */
+unsigned int ap:2;  /* AP[1:0] */
+unsigned int sbz:3; /* Should be zero */
+bool ro:1;  /* AP[2] */
+bool s:1;   /* Shareable */
+bool ng:1;  /* Non-global */
+unsigned int tex:3; /* TEX[2:0] */
+bool xn:1;  /* Execute Never */
+unsigned int base:16;   /* Large page base address */
+} short_desc_l2_lpage_t;
+
+/*
+ * Comprises the bits required to walk page tables adhering to the
+ * short-descriptor translation table format.
+ */
+typedef struct __packed {
+unsigned int dt:2;  /* Descriptor type */
+unsigned int pad1:8;
+unsigned int base:22;   /* Base address of block or next table */
+} short_desc_walk_t;
+
+/*
+ * Represents page table 

[Xen-devel] [PATCH v4 8/9] arm/mem_access: Add short-descriptor based gpt

2017-06-20 Thread Sergej Proskurin
This commit adds functionality to walk the guest's page tables using the
short-descriptor translation table format for both ARMv7 and ARMv8. The
implementation is based on ARM DDI 0487B-a J1-6002 and ARM DDI 0406C-b
B3-1506.

Signed-off-by: Sergej Proskurin 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v3: Move the implementation to ./xen/arch/arm/guest_copy.c.

Use defines instead of hardcoded values.

Cosmetic fixes and added more comments.

v4: Adjusted the names of short-descriptor data-types.

Adapt the function to the new parameter of type "struct vcpu *".

Cosmetic fixes.
---
 xen/arch/arm/guest_walk.c| 167 ++-
 xen/include/asm-arm/guest_walk.h |  16 
 2 files changed, 181 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/guest_walk.c b/xen/arch/arm/guest_walk.c
index c37c595157..9cc167af49 100644
--- a/xen/arch/arm/guest_walk.c
+++ b/xen/arch/arm/guest_walk.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * The function guest_walk_sd translates a given GVA into an IPA using the
@@ -30,8 +31,170 @@ static int guest_walk_sd(const struct vcpu *v,
  vaddr_t gva, paddr_t *ipa,
  unsigned int *perms)
 {
-/* Not implemented yet. */
-return -EFAULT;
+bool disabled = true;
+int64_t ttbr;
+paddr_t mask, paddr;
+short_desc_t pte, *table;
+struct page_info *page;
+register_t ttbcr = READ_SYSREG(TCR_EL1);
+unsigned int level = 0, n = ttbcr & TTBCR_N_MASK;
+struct domain *d = v->domain;
+
+const paddr_t offsets[2] = {
+((paddr_t)(gva >> 20) & ((1ULL << (12 - n)) - 1)),
+((paddr_t)(gva >> 12) & ((1ULL << 8) - 1))
+};
+
+mask = ((1ULL << BITS_PER_WORD) - 1) &
+   ~((1ULL << (BITS_PER_WORD - n)) - 1);
+
+if ( n == 0 || !(gva & mask) )
+{
+/* Use TTBR0 for GVA to IPA translation. */
+ttbr = READ_SYSREG64(TTBR0_EL1);
+
+/* If TTBCR.PD0 is set, translations using TTBR0 are disabled. */
+disabled = ttbcr & TTBCR_PD0;
+}
+else
+{
+/* Use TTBR1 for GVA to IPA translation. */
+ttbr = READ_SYSREG64(TTBR1_EL1);
+
+/* If TTBCR.PD1 is set, translations using TTBR1 are disabled. */
+disabled = ttbcr & TTBCR_PD1;
+
+/*
+ * TTBR1 translation always works like n==0 TTBR0 translation (ARM DDI
+ * 0487B.a J1-6003).
+ */
+n = 0;
+}
+
+if ( disabled )
+return -EFAULT;
+
+/*
+ * The address of the descriptor for the initial lookup has the following
+ * format: [ttbr<31:14-n>:gva<31-n:20>:00] (ARM DDI 0487B.a J1-6003). In
+ * this way, the first lookup level might comprise up to four consecutive
+ * pages. To avoid mapping all of the pages, we simply map the page that is
+ * needed by the first level translation by incorporating up to 2 MSBs of
+ * the GVA.
+ */
+mask = (1ULL << (14 - n)) - 1;
+paddr = (ttbr & ~mask) | (offsets[level] << 2);
+
+page = get_page_from_gfn(d, paddr_to_pfn(paddr), NULL, P2M_ALLOC);
+if ( !page )
+return -EFAULT;
+
+table = __map_domain_page(page);
+
+/*
+ * Consider that the first level address translation does not need to be
+ * page-aligned if n > 2.
+ */
+if ( n > 2 )
+{
+/* Make sure that we consider the bits ttbr<12:14-n> if n > 2. */
+mask = ((1ULL << 12) - 1) & ~((1ULL << (14 - n)) - 1);
+table = (short_desc_t *)((unsigned long)table | (unsigned long)(ttbr & mask));
+}
+
+/*
+ * As we have considered up to 2 MSBs of the GVA for mapping the first
+ * level translation table, we need to make sure that we limit the table
+ * offset that is indexed by GVA<31-n:20> to max 10 bits to avoid
+ * exceeding the page size limit.
+ */
+mask = ((1ULL << 10) - 1);
+pte = table[offsets[level] & mask];
+
+unmap_domain_page(table);
+put_page(page);
+
+switch ( pte.walk.dt )
+{
+case L1DESC_INVALID:
+return -EFAULT;
+
+case L1DESC_PAGE_TABLE:
+level++;
+
+page = get_page_from_gfn(d, (pte.walk.base >> 2), NULL, P2M_ALLOC);
+if ( !page )
+return -EFAULT;
+
+table = __map_domain_page(page);
+/*
+ * The second level translation table is addressed by PTE<31:10>. Hence
+ * it does not need to be page aligned. Make sure that we also consider
+ * the bits PTE<11:10>.
+ */
+table = (short_desc_t *)((unsigned long)table | ((pte.walk.base & 0x3) << 10));
+
+pte = table[offsets[level]];
+
+unmap_domain_page(table);
+put_page(page);
+
+if ( pte.walk.dt == L2DESC_INVALID )
+return -EFAULT;
+
+if ( pte.pg.page ) /* Small page. */
+{
+mask = (1ULL << PAGE_SHIFT_4K) - 1;
+

[Xen-devel] [PATCH 2/2] x86/xen/efi: Init only efi struct members used by Xen

2017-06-20 Thread Daniel Kiper
The current approach, wholesale initialization of the efi struct from efi_xen,
is not good. Usually, when a new member is defined, it is properly initialized
in drivers/firmware/efi/efi.c but not in arch/x86/xen/efi.c; I have seen this
happen a few times already. So, let's initialize only the efi struct members
used by Xen to avoid such issues in the future.

Signed-off-by: Daniel Kiper 
---
 arch/x86/xen/efi.c |   45 -
 1 file changed, 12 insertions(+), 33 deletions(-)

diff --git a/arch/x86/xen/efi.c b/arch/x86/xen/efi.c
index 30bb2e8..01b9faf 100644
--- a/arch/x86/xen/efi.c
+++ b/arch/x86/xen/efi.c
@@ -54,38 +54,6 @@
.tables = EFI_INVALID_TABLE_ADDR  /* Initialized later. */
 };
 
-static const struct efi efi_xen __initconst = {
-   .systab   = NULL, /* Initialized later. */
-   .runtime_version  = 0,/* Initialized later. */
-   .mps  = EFI_INVALID_TABLE_ADDR,
-   .acpi = EFI_INVALID_TABLE_ADDR,
-   .acpi20   = EFI_INVALID_TABLE_ADDR,
-   .smbios   = EFI_INVALID_TABLE_ADDR,
-   .smbios3  = EFI_INVALID_TABLE_ADDR,
-   .sal_systab   = EFI_INVALID_TABLE_ADDR,
-   .boot_info= EFI_INVALID_TABLE_ADDR,
-   .hcdp = EFI_INVALID_TABLE_ADDR,
-   .uga  = EFI_INVALID_TABLE_ADDR,
-   .uv_systab= EFI_INVALID_TABLE_ADDR,
-   .fw_vendor= EFI_INVALID_TABLE_ADDR,
-   .runtime  = EFI_INVALID_TABLE_ADDR,
-   .config_table = EFI_INVALID_TABLE_ADDR,
-   .get_time = xen_efi_get_time,
-   .set_time = xen_efi_set_time,
-   .get_wakeup_time  = xen_efi_get_wakeup_time,
-   .set_wakeup_time  = xen_efi_set_wakeup_time,
-   .get_variable = xen_efi_get_variable,
-   .get_next_variable= xen_efi_get_next_variable,
-   .set_variable = xen_efi_set_variable,
-   .query_variable_info  = xen_efi_query_variable_info,
-   .update_capsule   = xen_efi_update_capsule,
-   .query_capsule_caps   = xen_efi_query_capsule_caps,
-   .get_next_high_mono_count = xen_efi_get_next_high_mono_count,
-   .reset_system = xen_efi_reset_system,
-   .set_virtual_address_map  = NULL, /* Not used under Xen. */
-   .flags= 0 /* Initialized later. */
-};
-
 static efi_system_table_t __init *xen_efi_probe(void)
 {
struct xen_platform_op op = {
@@ -102,7 +70,18 @@ static efi_system_table_t __init *xen_efi_probe(void)
 
/* Here we know that Xen runs on EFI platform. */
 
-   efi = efi_xen;
+   efi.get_time = xen_efi_get_time;
+   efi.set_time = xen_efi_set_time;
+   efi.get_wakeup_time = xen_efi_get_wakeup_time;
+   efi.set_wakeup_time = xen_efi_set_wakeup_time;
+   efi.get_variable = xen_efi_get_variable;
+   efi.get_next_variable = xen_efi_get_next_variable;
+   efi.set_variable = xen_efi_set_variable;
+   efi.query_variable_info = xen_efi_query_variable_info;
+   efi.update_capsule = xen_efi_update_capsule;
+   efi.query_capsule_caps = xen_efi_query_capsule_caps;
+   efi.get_next_high_mono_count = xen_efi_get_next_high_mono_count;
+   efi.reset_system = xen_efi_reset_system;
 
efi_systab_xen.tables = info->cfg.addr;
efi_systab_xen.nr_tables = info->cfg.nent;
-- 
1.7.10.4




[Xen-devel] [PATCH 1/2] efi: Process MEMATTR table only if EFI_MEMMAP

2017-06-20 Thread Daniel Kiper
Otherwise e.g. Xen dom0 on x86_64 EFI platforms crashes.

In theory we could check EFI_PARAVIRT too; however,
EFI_MEMMAP looks more generic and covers more cases.

Signed-off-by: Daniel Kiper 
---
 drivers/firmware/efi/efi.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index b372aad..045d6d3 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -528,7 +528,8 @@ int __init efi_config_parse_tables(void *config_tables, int count, int sz,
}
}
 
-   efi_memattr_init();
+   if (efi_enabled(EFI_MEMMAP))
+   efi_memattr_init();
 
/* Parse the EFI Properties table if it exists */
if (efi.properties_table != EFI_INVALID_TABLE_ADDR) {
-- 
1.7.10.4




[Xen-devel] [PATCH 0/2] xen/efi: Fixes

2017-06-20 Thread Daniel Kiper
Hey,

Two small fixes for Xen dom0 running on x86_64 EFI platforms.

I am CC-ing stable maintainers because similar stuff is needed for various
stable kernels too. Unfortunately, almost every version needs a slightly different
set of fixes. So, please treat this email more as a heads-up than a real set of
patches for your kernel. If you wish to get the Xen EFI stuff fixed, just drop me
a line. Then I will prepare set of patches for your kernel (if needed).

Daniel

 arch/x86/xen/efi.c |   45 -
 drivers/firmware/efi/efi.c |3 ++-
 2 files changed, 14 insertions(+), 34 deletions(-)

Daniel Kiper (2):
  efi: Process MEMATTR table only if EFI_MEMMAP
  x86/xen/efi: Init only efi struct members used by Xen




Re: [Xen-devel] Fwd: VM Live Migration with Local Storage

2017-06-20 Thread Igor Druzhinin
On 12/06/17 04:16, Bruno Alvisio wrote:
> Hello,
> 
> I think it would be beneficial to add local disk migration feature for
> ‘blkback' backend since it is one of the mostly used backends. I would
> like to start a discussion about the design of the machinery needed to
> achieve this feature.
> 
> ===
> Objective
> Add a feature to migrate VMs that have local storage and use the blkback
> iface.
> ===
> 
> ===
> User Interface
> Add a cmd line option in “xl migrate” command to specify if local disks
> need to be copied to the destination node.
> ===
> 
> ===
> Design
> 
>  1. As part of the libxl_domain_suspend, the “disk mirroring machinery”
> starts an asynchronous job that copies the disks blocks from source
> to the destination.
>  2. The protocol to copy the disks should resemble the one used for
> memory copy:
> 
>   * Do first initial copy of the disk.
>   * Check of sectors that have been written since copy started. For
> this, the blkback driver should be aware that migration of disk is
> happening and in this case forward the write request to
> the “migration machinery” so that a record of dirty blocks are logged.
>   * Migration machinery copies “dirty” blocks until convergence.

Be careful with that. You don't really want to merge block and memory
live migrations. They should be linked but proceed independently since
we don't want to have the last iteration of memory transfer stalled
waiting for disk convergence. Some mix of pre-copy and post-copy
approach might be suitable.

Igor

>   * Duplicate all the disk writes/reads to both disks in source and
> destinations node while VM is being suspended.
> 
> 
> Block Diagram
> 
>+—--+
>|  VM   |
>+---+
>   |
>   | I/O Write
>   |
>   V
> +--+   +---+   +-+
> |  blkback | > |  Source   |  sectors Stream   | Destination |
> +--+   |  mirror   |-->|   mirror|
>   || machinery |   I/O Writes  |  machinery  |
>   |+---+   +-+
>   |  |
>   |  |
>   | To I/O block layer   |
>   |  |
>   V  V
> +--+   +-+
> |   disk   |   |   Mirrored  |
> +--+   | Disk|
>+-+
> 
> 
> ==
> Initial Questions
> 
>  1. Is it possible to leverage the current design of QEMU for drive
> mirroring for Xen?
>  2. What is the best place to implement this protocol? As part of Xen or
> the kernel?
>  3. Is it possible to use the same stream currently used for migrating
> the memory to also migrate the disk blocks?
> 
> 
> Any guidance/feedback for a more specific design is greatly appreciated.
> 
> Thanks,
> 
> Bruno
> 
> On Wed, Feb 22, 2017 at 5:00 AM, Wei Liu  > wrote:
> 
> Hi Bruno
> 
> Thanks for your interest.
> 
> On Tue, Feb 21, 2017 at 10:34:45AM -0800, Bruno Alvisio wrote:
> > Hello,
> >
> > I have been doing some research and as far as I know XEN supports
> > Live Migration
> > of VMs that only have shared storage. (i.e. iSCSI) If the VM has been
> > booted with local storage it cannot be live migrated.
> > QEMU seems to support live migration with local storage (I have tested 
> using
> > 'virsh migrate with the '--storage-copy-all' option)
> >
> > I am wondering if this still true in the latest XEN release. Are there 
> plans
> > to add this functionality in future releases? I would be interested in
> > contributing to the Xen Project by adding this functionality.
> >
> 
> No plan at the moment.
> 
> Xen supports a wide variety of disk backends. QEMU is one of them. The
> others are blktap (not upstreamed yet) and in-kernel blkback. The latter
> two don't have the capability to copy local storage to the remote end.
> 
> That said, I think it would be valuable to have such capability for QEMU
> backed disks. We also need to design the machinery so that other
> backends can be made to do the same thing in the future.
> 
> If you want to undertake this project, I suggest you setup a Xen system,
> read xl / libxl source code under tools directory and understand how
> everything is put together. Reading source code could be daunting at
> times, so don't hesitate to ask for pointers. After you have the big
> 

Re: [Xen-devel] Fwd: VM Live Migration with Local Storage

2017-06-20 Thread Konrad Rzeszutek Wilk
On Sun, Jun 11, 2017 at 08:16:04PM -0700, Bruno Alvisio wrote:
> Hello,
> 
> I think it would be beneficial to add local disk migration feature for
> ‘blkback' backend since it is one of the mostly used backends. I would like
> to start a discussion about the design of the machinery needed to achieve
> this feature.
> 
> ===
> Objective
> Add a feature to migrate VMs that have local storage and use the blkback
> iface.
> ===
> 
> ===
> User Interface
> Add a cmd line option in “xl migrate” command to specify if local disks
> need to be copied to the destination node.
> ===
> 
> ===
> Design
> 
>1. As part of the libxl_domain_suspend, the “disk mirroring machinery”
>starts an asynchronous job that copies the disks blocks from source to the
>destination.
>2. The protocol to copy the disks should resemble the one used for
>memory copy:
> 
> 
>- Do first initial copy of the disk.
>- Check of sectors that have been written since copy started. For this,
>the blkback driver should be aware that migration of disk is happening and
>in this case forward the write request to the “migration machinery” so that
>a record of dirty blocks are logged.
>- Migration machinery copies “dirty” blocks until convergence.
>- Duplicate all the disk writes/reads to both disks in source and
>destinations node while VM is being suspended.
> 
> 
> Block Diagram
> 
>+—--+
>|  VM   |
>+---+
>   |
>   | I/O Write
>   |
>   V
> +--+   +---+   +-+
> |  blkback | > |  Source   |  sectors Stream   | Destination |
> +--+   |  mirror   |-->|   mirror|
>   || machinery |   I/O Writes  |  machinery  |
>   |+---+   +-+
>   |  |
>   |  |
>   | To I/O block layer   |
>   |  |
>   V  V
> +--+   +-+
> |   disk   |   |   Mirrored  |
> +--+   | Disk|
>+-+
> 
> 
> ==
> Initial Questions
> 
>1. Is it possible to leverage the current design of QEMU for drive
>mirroring for Xen?

Yes. It has qdisk, which implements the blkback interface.

>2. What is the best place to implement this protocol? As part of Xen or
>the kernel?

QEMU
>3. Is it possible to use the same stream currently used for migrating
>the memory to also migrate the disk blocks?

Probably.
> 
> 
> Any guidance/feedback for a more specific design is greatly appreciated.
> 
> Thanks,
> 
> Bruno
> 
> On Wed, Feb 22, 2017 at 5:00 AM, Wei Liu  wrote:
> 
> > Hi Bruno
> >
> > Thanks for your interest.
> >
> > On Tue, Feb 21, 2017 at 10:34:45AM -0800, Bruno Alvisio wrote:
> > > Hello,
> > >
> > > I have been doing some research and as far as I know XEN supports
> > > Live Migration
> > > of VMs that only have shared storage. (i.e. iSCSI) If the VM has been
> > > booted with local storage it cannot be live migrated.
> > > QEMU seems to support live migration with local storage (I have tested
> > using
> > > 'virsh migrate with the '--storage-copy-all' option)
> > >
> > > I am wondering if this still true in the latest XEN release. Are there
> > plans
> > > to add this functionality in future releases? I would be interested in
> > > contributing to the Xen Project by adding this functionality.
> > >
> >
> > No plan at the moment.
> >
> > Xen supports a wide variety of disk backends. QEMU is one of them. The
> > others are blktap (not upstreamed yet) and in-kernel blkback. The latter
> > two don't have the capability to copy local storage to the remote end.
> >
> > That said, I think it would be valuable to have such capability for QEMU
> > backed disks. We also need to design the machinery so that other
> > backends can be made to do the same thing in the future.
> >
> > If you want to undertake this project, I suggest you setup a Xen system,
> > read xl / libxl source code under tools directory and understand how
> > everything is put together. Reading source code could be daunting at
> > times, so don't hesitate to ask for pointers. After you have the big
> > picture in mind, we can then discuss how to implement the functionality
> > on xen-devel.
> >
> > Does this sound good to you?
> >
> > Wei.
> >
> > > Thanks,
> > >
> > > Bruno
> >
> > > ___
> > > Xen-devel mailing list
> > > 

Re: [Xen-devel] stub domain crash related to bind_interdomain

2017-06-20 Thread Sarah Newman
On 06/20/2017 01:24 AM, Jan Beulich wrote:
 On 20.06.17 at 01:39,  wrote:
>> I have gotten messages like this sporadically in the qemu-dm log for stub 
>> domains, both at domain start and domain reboot:
>>
>> evtchn_open() -> 7
>> ERROR: bind_interdomain failed with rc=-22xenevtchn_bind_interdomain(121, 0) 
>> = -22
>> bind interdomain ioctl error 22
>> Unable to find x86 CPU definition
>> close(0)
>>
>> It is not always remote port 0 that fails, but it typically is.
> 
> But I'm afraid this is a relevant distinction, and hence you may be
> seeing two different issues. Have you been able to find out where
> that remote port is coming from? I ask because port 0 is never a
> valid one (see evtchn_init() setting it to ECS_RESERVED).

By inspection I think it is
shared_page->vcpu_ioreq[i].vp_eport, used in helper2.c:cpu_x86_init, because
otherwise I should see another message like

xc_evtchn_bind_interdomain(21, 3) = 0
first, and I only see one message from xc_evtchn_bind_interdomain.

I think it should be reproducible within a few hundred reboots.

--Sarah

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] libxc: add xc_domain_add_to_physmap_batch to wrap XENMEM_add_to_physmap_batch

2017-06-20 Thread Zhongze Liu
This is a preparation for the proposal "allow setting up shared memory areas
between VMs from xl config file". See:
V2: https://lists.xen.org/archives/html/xen-devel/2017-06/msg02256.html
V1: https://lists.xen.org/archives/html/xen-devel/2017-05/msg01288.html

The plan is to use XENMEM_add_to_physmap_batch in xl to map foreign pages from
one DomU to another so that the pages can be shared. But currently there is no
wrapper for XENMEM_add_to_physmap_batch in libxc, so this patch adds one.
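As a rough illustration of what such a wrapper has to do, the sketch below splits a large gmfn range into fixed-size batches, the way a libxc wrapper would issue repeated XENMEM_add_to_physmap_batch hypercalls. The batch size, the `calls` list, and the function name are assumptions for the sketch, not the real libxc API:

```python
MAX_BATCH = 4      # assumed per-hypercall limit, for the sketch only

calls = []         # records (idxs, gpfns) for each simulated hypercall

def add_to_physmap_batch(fgmfn_start, lgmfn_start, count):
    """Map `count` foreign frames starting at fgmfn_start onto local
    gpfns starting at lgmfn_start, in MAX_BATCH-sized chunks."""
    done = 0
    while done < count:
        n = min(MAX_BATCH, count - done)
        idxs  = [fgmfn_start + done + i for i in range(n)]
        gpfns = [lgmfn_start + done + i for i in range(n)]
        calls.append((idxs, gpfns))    # stands in for the real hypercall
        done += n
    return done

mapped = add_to_physmap_batch(0x1000, 0x2000, 10)
```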

Signed-off-by: Zhongze Liu 
---
Changed Since v1:
  * explain why such a wrapper is suddenly needed
  * change the parameters' types

Cc: Ian Jackson ,
Cc: Wei Liu ,
Cc: Stefano Stabellini 
Cc: Julien Grall 
Cc: Jan Beulich 

Re: [Xen-devel] [RFC v2]Proposal to allow setting up shared memory areas between VMs from xl config file

2017-06-20 Thread Julien Grall

Hi,

Thank you for the new proposal.

On 06/20/2017 06:18 PM, Zhongze Liu wrote:

In the example above, memory area ID1 will be shared between vm1 and vm2.
This area will be taken from vm1 and mapped into vm2's stage-2 page table.
The parameter "prot=RO" means that this memory area is offered with read-only
permission. vm1 can access this area using gmfn1~gmfn2, and vm2 using
gmfn5~gmfn6.


[...]



==
2.3 mapping the memory areas
==
Handle the newly added config option in tools/{xl, libxl} and utilize
tools/libxc to do the actual memory mapping. Specifically, we will use
a wrapper around XENMEM_add_to_physmap_batch with XENMAPSPACE_gmfn_foreign to
do the actual mapping. But since there isn't such a wrapper in libxc, we'll
have to add a new one, xc_domain_add_to_physmap_batch, in libxc/xc_domain.c.


In the paragraph above, you suggest the user can select the permission on
the shared page. However, the hypercall XENMEM_add_to_physmap does not
currently take permissions. So how do you plan to handle that?


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 15/18] xen/pvcalls: implement the ioworker functions

2017-06-20 Thread Boris Ostrovsky
On 06/15/2017 03:09 PM, Stefano Stabellini wrote:
> We have one ioworker per socket. Each ioworker goes through the list of
> outstanding read/write requests. Once all requests have been dealt with,
> it returns.
>
> We use one atomic counter per socket for "read" operations and one
> for "write" operations to keep track of the reads/writes to do.
>
> We also use one atomic counter ("io") per ioworker to keep track of how
> many outstanding requests we have in total assigned to the ioworker. The
> ioworker finishes when there are none.
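The counter scheme described in the commit message can be modeled in plain Python — ordinary integers stand in for the kernel's atomic counters, and `Socket`, `queue`, and `ioworker` are illustrative names, not the pvcalls code:

```python
class Socket:
    def __init__(self):
        self.read = 0      # outstanding read requests
        self.write = 0     # outstanding write requests
        self.io = 0        # total outstanding, owned by the ioworker
        self.done = []

    def queue(self, op):
        setattr(self, op, getattr(self, op) + 1)
        self.io += 1

def ioworker(sock):
    # Keep servicing requests until the "io" counter drops to zero.
    while sock.io > 0:
        if sock.read:
            sock.read -= 1
            sock.done.append("read")
        elif sock.write:
            sock.write -= 1
            sock.done.append("write")
        sock.io -= 1       # one request fully dealt with

s = Socket()
s.queue("read"); s.queue("write"); s.queue("read")
ioworker(s)
```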
>
> Signed-off-by: Stefano Stabellini 
> CC: boris.ostrov...@oracle.com
> CC: jgr...@suse.com

Reviewed-by: Boris Ostrovsky 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [RFC v2]Proposal to allow setting up shared memory areas between VMs from xl config file

2017-06-20 Thread Zhongze Liu

1. Motivation and Description

Virtual machines use grant table hypercalls to set up shared pages for
inter-VM communication. These hypercalls are used by all PV
protocols today. However, very simple guests, such as baremetal
applications, might not have the infrastructure to handle the grant table.
This project is about setting up several shared memory areas for inter-VM
communication directly from the VM config file, so that the guest kernel
doesn't need grant table support (in the embedded space, this is not
unusual) to be able to communicate with other guests.


2. Implementation Plan:


==
2.1 Introduce a new VM config option in xl:
==
The shared areas should be shareable among several (>=2) VMs, so
every shared physical memory area is assigned to a set of VMs.
Therefore, a “token” or “identifier” should be used here to uniquely
identify a backing memory area.

The backing area would be taken from one domain, which we will regard
as the "master domain", and this domain should be created prior to any
of the "slave domains". Again, we have to use some kind of tag to tell
which one is the "master domain".

The user should also be able to specify the attributes of the pages (say,
WO/RO/X) to be shared. For the master domain, these attributes describe
the maximum permission allowed for the shared pages; for the slave
domains, they describe the permissions with which this area will be
mapped. This information should also be specified in the xl config entry.

To handle all these, I would suggest using an unsigned integer to serve as the
identifier, and using a "master" tag in the master domain's xl config entry
to announce that she will provide the backing memory pages. A separate
entry would be used to describe the attributes of the shared memory area, of
the form "prot=RW".
For example:

In xl config file of vm1:

static_shared_mem = ["id = ID1, begin = gmfn1, end = gmfn2,
  granularity = 4k, prot = RO, master”,
 "id = ID2, begin = gmfn3, end = gmfn4,
 granularity = 4k, prot = RW, master”]

In xl config file of vm2:

static_shared_mem = ["id = ID1, begin = gmfn5, end = gmfn6,
  granularity = 4k, prot = RO”]

In xl config file of vm3:

static_shared_mem = ["id = ID2, begin = gmfn7, end = gmfn8,
  granularity = 4k, prot = RW”]

gmfn's above are all hex of the form "0x2".

In the example above, memory area ID1 will be shared between vm1 and vm2.
This area will be taken from vm1 and mapped into vm2's stage-2 page table.
The parameter "prot=RO" means that this memory area is offered with read-only
permission. vm1 can access this area using gmfn1~gmfn2, and vm2 using
gmfn5~gmfn6.
Likewise, memory area ID2 will be shared between vm1 and vm3 with read and
write permissions. vm1 is the master and vm3 the slave. vm1 can access the
area using gmfn3~gmfn4 and vm3 using gmfn7~gmfn8.

The "granularity" is optional in the slaves' config entries, but if it's
present in a slave's config entry, it has to be the same as its master's.
Besides, the size of the gmfn ranges must also match. And overlapping backing
memory areas are well defined.

Note that the "master" tag in vm1 for both ID1 and ID2 indicates that vm1
should be created prior to both vm2 and vm3, for they both rely on the pages
backed by vm1. If one tries to create vm2 or vm3 prior to vm1, she will get
an error. And in vm1's config file, the "prot=RO" parameter of ID1 indicates
that if one tries to share this area with vm1 with, say, "RW" permission,
she will get an error, too.
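A minimal sketch of the ordering and permission checks described above, assuming a simplified entry format (the dict-based parsing and the error strings are hypothetical, not xl syntax):

```python
PROT_ORDER = {"RO": 0, "RW": 1}   # simplified: only two protection levels

shared = {}   # area id -> {"master": domid, "prot": max permission}

def create_domain(domid, entries):
    for e in entries:
        sid, prot = e["id"], e["prot"]
        if e.get("master", False):
            if sid in shared:
                raise ValueError("duplicate master for " + sid)
            shared[sid] = {"master": domid, "prot": prot}
        else:
            if sid not in shared:
                raise ValueError("master for " + sid + " not created yet")
            if PROT_ORDER[prot] > PROT_ORDER[shared[sid]["prot"]]:
                raise ValueError("prot exceeds master's for " + sid)

create_domain("vm1", [{"id": "ID1", "prot": "RO", "master": True}])
ok = True
try:
    create_domain("vm2", [{"id": "ID1", "prot": "RW"}])   # RW > master's RO
except ValueError:
    ok = False
```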

==
2.2 Store the mem-sharing information in xenstore
==
Since we don't have persistent storage for xl to keep the information
about the shared memory areas, we have to find some way to preserve it
between xl invocations, and xenstore is a good place to do this. The
information for one shared area should include the ID, the master domid,
and the gmfn ranges and memory attributes in the master and slave domains
of this area.
The current plan is to place the information under /local/shared_mem/ID.
Still take the above config files as an example:

If we instantiate vm1, vm2 and vm3 one after another, then after vm1 is
instantiated the output of “xenstore ls -f” will be something like this:

/local/shared_mem/ID1/master = domid_of_vm1
/local/shared_mem/ID1/gmfn_begin = gmfn1
/local/shared_mem/ID1/gmfn_end = gmfn2
/local/shared_mem/ID1/granularity = "4k"
/local/shared_mem/ID1/permissions = "RO"
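The layout above can be generated by a small helper; the paths follow the proposal, while the helper itself and its argument types are assumptions for illustration:

```python
def shared_mem_keys(area_id, master_domid, gmfn_begin, gmfn_end,
                    granularity="4k", permissions="RO"):
    """Build the xenstore key/value pairs for one shared memory area."""
    base = "/local/shared_mem/" + area_id
    return {
        base + "/master": master_domid,
        base + "/gmfn_begin": gmfn_begin,
        base + "/gmfn_end": gmfn_end,
        base + "/granularity": granularity,
        base + "/permissions": permissions,
    }

keys = shared_mem_keys("ID1", "1", "0x1000", "0x2000")
```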

Re: [Xen-devel] [PATCH v4 13/18] xen/pvcalls: implement release command

2017-06-20 Thread Boris Ostrovsky

> +
> +static int pvcalls_back_release_passive(struct xenbus_device *dev,
> + struct pvcalls_fedata *fedata,
> + struct sockpass_mapping *mappass)
> +{
> + if (mappass->sock->sk != NULL) {
> + write_lock_bh(>sock->sk->sk_callback_lock);
> + mappass->sock->sk->sk_user_data = NULL;
> + mappass->sock->sk->sk_data_ready = mappass->saved_data_ready;
> + write_unlock_bh(>sock->sk->sk_callback_lock);
> + }
> + down(>socket_lock);
> + radix_tree_delete(>socketpass_mappings, mappass->id);
> + sock_release(mappass->sock);
> + flush_workqueue(mappass->wq);
> + destroy_workqueue(mappass->wq);
> + kfree(mappass);
> + up(>socket_lock);

Can you raise the semaphore earlier, once the mapping is deleted from
the tree?

Also, why are you not locking the tree in pvcalls_back_accept()?

-boris


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 12/18] xen/pvcalls: implement poll command

2017-06-20 Thread Boris Ostrovsky

> @@ -499,6 +521,55 @@ static int pvcalls_back_accept(struct xenbus_device *dev,
>  static int pvcalls_back_poll(struct xenbus_device *dev,
>struct xen_pvcalls_request *req)
>  {
> + struct pvcalls_fedata *fedata;
> + struct sockpass_mapping *mappass;
> + struct xen_pvcalls_response *rsp;
> + struct inet_connection_sock *icsk;
> + struct request_sock_queue *queue;
> + unsigned long flags;
> + int ret;
> + bool data;
> +
> + fedata = dev_get_drvdata(>dev);
> +
> + mappass = radix_tree_lookup(>socketpass_mappings, 
> req->u.poll.id);
> + if (mappass == NULL)
> + return -EINVAL;
> +
> + /*
> +  * Limitation of the current implementation: only support one
> +  * concurrent accept or poll call on one socket.
> +  */
> + spin_lock_irqsave(>copy_lock, flags);
> + if (mappass->reqcopy.cmd != 0) {
> + ret = -EINTR;
> + goto out;
> + }
> +
> + mappass->reqcopy = *req;
> + icsk = inet_csk(mappass->sock->sk);
> + queue = >icsk_accept_queue;
> + spin_lock(>rskq_lock);
> + data = queue->rskq_accept_head != NULL;
> + spin_unlock(>rskq_lock);

What is the purpose of the queue lock here?

-boris

> + if (data) {
> + mappass->reqcopy.cmd = 0;
> + ret = 0;
> + goto out;
> + }
> + spin_unlock_irqrestore(>copy_lock, flags);
> +
> + /* Tell the caller we don't need to send back a notification yet */
> + return -1;
>


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 04/27] x86: move PV invalid op emulation code

2017-06-20 Thread Wei Liu
On Tue, Jun 20, 2017 at 10:21:27AM -0600, Jan Beulich wrote:
> >>> On 08.06.17 at 19:11,  wrote:
> > @@ -1053,8 +982,8 @@ void do_invalid_op(struct cpu_user_regs *regs)
> >  
> >  if ( likely(guest_mode(regs)) )
> >  {
> > -if ( !emulate_invalid_rdtscp(regs) &&
> > - !emulate_forced_invalid_op(regs) )
> > +if ( !pv_emulate_invalid_rdtscp(regs) &&
> > + !pv_emulate_forced_invalid_op(regs) )
> 
> I wonder if the first couldn't be called by the second, making it
> unnecessary to export both. Or maybe have a wrapper
> pv_emulate_invalid_op() around both.
> 

Do you want me to refactor and move code in the same patch? Wouldn't
that make it hard for you to review?

I can submit a follow-up patch to do what you ask for.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Notes on stubdoms and latency on ARM

2017-06-20 Thread Volodymyr Babchuk
Hi Julien,

On 20 June 2017 at 03:45, Julien Grall  wrote:
>> On 19 June 2017 at 10:54, Stefano Stabellini 
>> wrote:
>>
 But given the conversation so far, it seems likely that that is mainly
 due to the fact that context switching on ARM has not been optimized.
>>>
>>>
>>> True. However, Volodymyr took the time to demonstrate the performance of
>>> EL0 apps vs. stubdoms with a PoC, which is much more than most Xen
>>> contributors do. Nobody provided numbers for a faster ARM context switch
>>> yet. I don't know on whom should fall the burden of proving that a
>>> lighter context switch can match the EL0 app numbers. I am not sure it
>>> would be fair to ask Volodymyr to do it.
>>
>> Thanks. Actually, we discussed this topic internally today. The main
>> concern today is not SMCs and OP-TEE (I will be happy to do this
>> right in Xen), but vcoprocs and GPU virtualization. Because of legal
>> issues, we can't put this in Xen. And because of the nature of the
>> vcpu framework, we will need multiple calls to the vgpu driver per
>> vcpu context switch.
>> I'm going to create a worst-case scenario, where multiple vcpus are
>> active and there are no free pcpus, to see how the credit or credit2
>> scheduler will call my stubdom.
>> Also, I'm very interested in Julien's idea about stubdom without GIC.
>> Probably, I'll try to hack something like that to see how it will
>> affect overall switching latency
>
> This can only work if your stubdomain does not require interrupt. However,
> if you are dealing with devices you likely need interrupts, am I correct?
Ah yes, you are correct. I was thinking of the OP-TEE use case, where there
are no interrupts. In the case of co-processor virtualization we probably
will need interrupts.

> The problem would be the same with an EL0 app.
In the case of EL0 there will be no problem, because EL0 can't handle
interrupts :) Xen should receive the interrupt and invoke the app. Yes, this
is another problem with apps, if we want to use them as device
drivers.

-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babc...@gmail.com

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 11/36] x86/mm: Add SME support for read_cr3_pa()

2017-06-20 Thread Tom Lendacky

On 6/20/2017 11:17 AM, Andy Lutomirski wrote:

On Fri, Jun 16, 2017 at 11:51 AM, Tom Lendacky  wrote:

The cr3 register entry can contain the SME encryption mask that indicates
the PGD is encrypted.  The encryption mask should not be used when
creating a virtual address from the cr3 register, so remove the SME
encryption mask in the read_cr3_pa() function.

During early boot SME will need to use a native version of read_cr3_pa(),
so create native_read_cr3_pa().

Signed-off-by: Tom Lendacky 
---
  arch/x86/include/asm/processor-flags.h |3 ++-
  arch/x86/include/asm/processor.h   |5 +
  2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/processor-flags.h 
b/arch/x86/include/asm/processor-flags.h
index 79aa2f9..cb6999c 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -2,6 +2,7 @@
  #define _ASM_X86_PROCESSOR_FLAGS_H

  #include 
+#include 

  #ifdef CONFIG_VM86
  #define X86_VM_MASKX86_EFLAGS_VM
@@ -33,7 +34,7 @@
   */
  #ifdef CONFIG_X86_64
  /* Mask off the address space ID bits. */
-#define CR3_ADDR_MASK 0x7FFFFFFFFFFFF000ull
+#define CR3_ADDR_MASK __sme_clr(0x7FFFFFFFFFFFF000ull)


Can you update the comment one line above, too?


Yup, will do.

Thanks,
Tom





___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 05/27] x86/traps: remove now unused inclusion of emulate.h

2017-06-20 Thread Jan Beulich
>>> On 08.06.17 at 19:11,  wrote:
> Signed-off-by: Wei Liu 

Acked-by: Jan Beulich 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 04/27] x86: move PV invalid op emulation code

2017-06-20 Thread Jan Beulich
>>> On 08.06.17 at 19:11,  wrote:
> @@ -1053,8 +982,8 @@ void do_invalid_op(struct cpu_user_regs *regs)
>  
>  if ( likely(guest_mode(regs)) )
>  {
> -if ( !emulate_invalid_rdtscp(regs) &&
> - !emulate_forced_invalid_op(regs) )
> +if ( !pv_emulate_invalid_rdtscp(regs) &&
> + !pv_emulate_forced_invalid_op(regs) )

I wonder if the first couldn't be called by the second, making it
unnecessary to export both. Or maybe have a wrapper
pv_emulate_invalid_op() around both.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 1/3] livepatch: Add local and global symbol resolution.

2017-06-20 Thread Jan Beulich
>>> On 20.06.17 at 17:59,  wrote:
> Our mechanism when deploying livepatches is to replace the loaded
> livepatch with another one. Which means we only have one livepatch
> applied and during the upgrade process have to load another one.

I think this is the main problematic part here: You're trying to fix one
use case (single patch being replaced every time) while - afaict - you
break the other one (multiple stacked patches). For your case,
wouldn't it be sufficient to load the patch with some flag indicating
that all symbol handling (resolution as well as insertion) should only
consider the main image? Of course with that flag, a plain "apply"
would need to fail as long as there's any other patch loaded.

The downside of this is that it would special case things a little too
much for my taste. For example I'd expect to also be able to have
a couple of stacked patches and replace just the topmost one(s).
That would require more than a boolean flag to tell which symbols
to consider and which to ignore.
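The lookup policy under discussion can be sketched abstractly: resolve a symbol against the topmost stacked patch first, falling back to the main image, with an optional flag restricting resolution to the main image only. Names and addresses below are made up for illustration; this is not the Xen livepatch implementation:

```python
vmlinux = {"printk": 0x1000}                 # symbols in the main image
stack = [{"foo": 0x2000},                    # oldest applied patch
         {"foo": 0x3000, "bar": 0x4000}]     # topmost applied patch

def resolve(sym, only_main_image=False):
    """Topmost patch wins; optionally consider only the main image."""
    if not only_main_image:
        for patch in reversed(stack):
            if sym in patch:
                return patch[sym]
    return vmlinux.get(sym)

top = resolve("foo")                              # stacked lookup
main_only = resolve("foo", only_main_image=True)  # proposed "main image" flag
```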

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 11/36] x86/mm: Add SME support for read_cr3_pa()

2017-06-20 Thread Andy Lutomirski
On Fri, Jun 16, 2017 at 11:51 AM, Tom Lendacky  wrote:
> The cr3 register entry can contain the SME encryption mask that indicates
> the PGD is encrypted.  The encryption mask should not be used when
> creating a virtual address from the cr3 register, so remove the SME
> encryption mask in the read_cr3_pa() function.
>
> During early boot SME will need to use a native version of read_cr3_pa(),
> so create native_read_cr3_pa().
>
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/processor-flags.h |3 ++-
>  arch/x86/include/asm/processor.h   |5 +
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/processor-flags.h 
> b/arch/x86/include/asm/processor-flags.h
> index 79aa2f9..cb6999c 100644
> --- a/arch/x86/include/asm/processor-flags.h
> +++ b/arch/x86/include/asm/processor-flags.h
> @@ -2,6 +2,7 @@
>  #define _ASM_X86_PROCESSOR_FLAGS_H
>
>  #include 
> +#include 
>
>  #ifdef CONFIG_VM86
>  #define X86_VM_MASKX86_EFLAGS_VM
> @@ -33,7 +34,7 @@
>   */
>  #ifdef CONFIG_X86_64
>  /* Mask off the address space ID bits. */
> -#define CR3_ADDR_MASK 0x7FFFFFFFFFFFF000ull
> +#define CR3_ADDR_MASK __sme_clr(0x7FFFFFFFFFFFF000ull)

Can you update the comment one line above, too?
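Numerically, the point of the patch is that with SME active an encryption bit can be set in CR3, so it must be masked out along with the low flag bits before CR3 is treated as a physical address. The sketch below assumes the encryption bit is bit 47 purely for illustration:

```python
SME_MASK      = 1 << 47                  # hypothetical C-bit position
CR3_ADDR_MASK = 0x7FFFFFFFFFFFF000       # drop low flag bits and top bit

def read_cr3_pa(cr3):
    # Equivalent of folding __sme_clr() into the address mask.
    return cr3 & CR3_ADDR_MASK & ~SME_MASK

cr3 = 0x1234000 | SME_MASK | 0xFFF       # PGD at 0x1234000, C-bit, flag bits
pa = read_cr3_pa(cr3)
```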

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 07/18] xen/pvcalls: implement socket command

2017-06-20 Thread Roger Pau Monné
On Thu, Jun 15, 2017 at 12:09:36PM -0700, Stefano Stabellini wrote:
> Just reply with success to the other end for now. Delay the allocation
> of the actual socket to bind and/or connect.
> 
> Signed-off-by: Stefano Stabellini 
> CC: boris.ostrov...@oracle.com
> CC: jgr...@suse.com
> ---
>  drivers/xen/pvcalls-back.c | 27 +++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> index 437c2ad..953458b 100644
> --- a/drivers/xen/pvcalls-back.c
> +++ b/drivers/xen/pvcalls-back.c
> @@ -12,12 +12,17 @@
>   * GNU General Public License for more details.
>   */
>  
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
> +#include 
>  
>  #include 
>  #include 
> @@ -54,6 +59,28 @@ struct pvcalls_fedata {
>  static int pvcalls_back_socket(struct xenbus_device *dev,
>   struct xen_pvcalls_request *req)
>  {
> + struct pvcalls_fedata *fedata;
> + int ret;
> + struct xen_pvcalls_response *rsp;
> +
> + fedata = dev_get_drvdata(>dev);
> +
> + if (req->u.socket.domain != AF_INET ||
> + req->u.socket.type != SOCK_STREAM ||
> + (req->u.socket.protocol != IPPROTO_IP &&
> +  req->u.socket.protocol != AF_INET))
> + ret = -EAFNOSUPPORT;

Sorry for jumping into this out of the blue, but shouldn't all the
constants used above be part of the protocol? AF_INET/SOCK_STREAM/...
are all part of POSIX, but their specific value is not defined in the
standard, hence we should have XEN_AF_INET/XEN_SOCK_STREAM/... Or am I
just missing something?
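If the wire protocol defined its own constants, the backend would translate them to host values rather than comparing raw numbers. The XEN_* names and their values below are invented for the sketch; only the translation pattern is the point:

```python
import socket

# Hypothetical wire-protocol constants, fixed by an imagined spec:
XEN_AF_INET, XEN_SOCK_STREAM = 2, 1

WIRE_TO_HOST_DOMAIN = {XEN_AF_INET: socket.AF_INET}
WIRE_TO_HOST_TYPE   = {XEN_SOCK_STREAM: socket.SOCK_STREAM}

def translate(domain, stype):
    """Map wire constants to host constants; unknown values are rejected."""
    try:
        return WIRE_TO_HOST_DOMAIN[domain], WIRE_TO_HOST_TYPE[stype]
    except KeyError:
        raise ValueError("EAFNOSUPPORT")

host = translate(XEN_AF_INET, XEN_SOCK_STREAM)
```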

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/3] xen-disk: use an IOThread per instance

2017-06-20 Thread Paul Durrant
> -Original Message-
> From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: 20 June 2017 17:08
> To: Paul Durrant ; xen-de...@lists.xenproject.org;
> qemu-de...@nongnu.org; qemu-bl...@nongnu.org
> Cc: Anthony Perard ; Kevin Wolf
> ; Stefano Stabellini ; Max Reitz
> 
> Subject: Re: [PATCH 3/3] xen-disk: use an IOThread per instance
> 
> On 20/06/2017 15:47, Paul Durrant wrote:
> > This patch allocates an IOThread object for each xen_disk instance and
> > sets the AIO context appropriately on connect. This allows processing
> > of I/O to proceed in parallel.
> >
> > The patch also adds tracepoints into xen_disk to make it possible to
> > follow the state transitions of an instance in the log.
> >
> > Signed-off-by: Paul Durrant 
> 
> The QEMU block layer is not yet thread safe, but code running in
> IOThreads still has to take the AioContext lock.  You need to call
> aio_context_acquire/release in blk_bh and qemu_aio_complete.
> 

Ok, thanks. I'll update the patch and re-test.

Cheers,

  Paul

> Paolo
> 
> > ---
> > Cc: Stefano Stabellini 
> > Cc: Anthony Perard 
> > Cc: Kevin Wolf 
> > Cc: Max Reitz 
> > ---
> >  hw/block/trace-events |  7 +++
> >  hw/block/xen_disk.c   | 44
> +++-
> >  2 files changed, 50 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/block/trace-events b/hw/block/trace-events
> > index 65e83dc258..608b24ba66 100644
> > --- a/hw/block/trace-events
> > +++ b/hw/block/trace-events
> > @@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int
> num_reqs, uint64_t offset,
> >  # hw/block/hd-geometry.c
> >  hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p
> LCHS %d %d %d"
> >  hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t
> secs, int trans) "blk %p CHS %u %u %u trans %d"
> > +
> > +# hw/block/xen_disk.c
> > +xen_disk_alloc(char *name) "%s"
> > +xen_disk_init(char *name) "%s"
> > +xen_disk_connect(char *name) "%s"
> > +xen_disk_disconnect(char *name) "%s"
> > +xen_disk_free(char *name) "%s"
> > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> > index a9942d32db..ec1085c802 100644
> > --- a/hw/block/xen_disk.c
> > +++ b/hw/block/xen_disk.c
> > @@ -27,10 +27,13 @@
> >  #include "hw/xen/xen_backend.h"
> >  #include "xen_blkif.h"
> >  #include "sysemu/blockdev.h"
> > +#include "sysemu/iothread.h"
> >  #include "sysemu/block-backend.h"
> >  #include "qapi/error.h"
> >  #include "qapi/qmp/qdict.h"
> >  #include "qapi/qmp/qstring.h"
> > +#include "qom/object_interfaces.h"
> > +#include "trace.h"
> >
> >  /* - */
> >
> > @@ -128,6 +131,9 @@ struct XenBlkDev {
> >  DriveInfo   *dinfo;
> >  BlockBackend*blk;
> >  QEMUBH  *bh;
> > +
> > +IOThread*iothread;
> > +AioContext  *ctx;
> >  };
> >
> >  /* - */
> > @@ -923,11 +929,31 @@ static void blk_bh(void *opaque)
> >  static void blk_alloc(struct XenDevice *xendev)
> >  {
> >  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev,
> xendev);
> > +Object *obj;
> > +char *name;
> > +Error *err = NULL;
> > +
> > +trace_xen_disk_alloc(xendev->name);
> >
> >  QLIST_INIT(>inflight);
> >  QLIST_INIT(>finished);
> >  QLIST_INIT(>freelist);
> > -blkdev->bh = qemu_bh_new(blk_bh, blkdev);
> > +
> > +obj = object_new(TYPE_IOTHREAD);
> > +name = g_strdup_printf("iothread-%s", xendev->name);
> > +
> > +object_property_add_child(object_get_objects_root(), name, obj,
> );
> > +assert(!err);
> > +
> > +g_free(name);
> > +
> > +user_creatable_complete(obj, );
> > +assert(!err);
> > +
> > +blkdev->iothread = (IOThread *)object_dynamic_cast(obj,
> TYPE_IOTHREAD);
> > +blkdev->ctx = iothread_get_aio_context(blkdev->iothread);
> > +blkdev->bh = aio_bh_new(blkdev->ctx, blk_bh, blkdev);
> > +
> >  if (xen_mode != XEN_EMULATE) {
> >  batch_maps = 1;
> >  }
> > @@ -954,6 +980,8 @@ static int blk_init(struct XenDevice *xendev)
> >  int info = 0;
> >  char *directiosafe = NULL;
> >
> > +trace_xen_disk_init(xendev->name);
> > +
> >  /* read xenstore entries */
> >  if (blkdev->params == NULL) {
> >  char *h = NULL;
> > @@ -1069,6 +1097,8 @@ static int blk_connect(struct XenDevice *xendev)
> >  unsigned int i;
> >  uint32_t *domids;
> >
> > +trace_xen_disk_connect(xendev->name);
> > +
> >  /* read-only ? */
> >  if (blkdev->directiosafe) {
> >  qflags = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO;
> > @@ -1285,6 +1315,8 @@ static int blk_connect(struct XenDevice *xendev)
> 

Re: [Xen-devel] [PATCH 3/3] xen-disk: use an IOThread per instance

2017-06-20 Thread Paolo Bonzini
On 20/06/2017 15:47, Paul Durrant wrote:
> This patch allocates an IOThread object for each xen_disk instance and
> sets the AIO context appropriately on connect. This allows processing
> of I/O to proceed in parallel.
> 
> The patch also adds tracepoints into xen_disk to make it possible to
> follow the state transitions of an instance in the log.
> 
> Signed-off-by: Paul Durrant 

The QEMU block layer is not yet thread safe, but code running in
IOThreads still has to take the AioContext lock.  You need to call
aio_context_acquire/release in blk_bh and qemu_aio_complete.
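The locking rule Paolo describes has a simple threading analog: any work running off the main loop takes the per-instance context lock before touching shared block state. This is a pure-Python model, not QEMU's actual AioContext API:

```python
import threading

class BlkDev:
    def __init__(self):
        self.ctx_lock = threading.Lock()   # stands in for the AioContext lock
        self.completed = 0

    def blk_bh(self):
        # Bottom half running in the per-instance IOThread: take the
        # context lock (aio_context_acquire) before touching shared state.
        with self.ctx_lock:
            self.completed += 1
        # lock released on exit (aio_context_release)

dev = BlkDev()
threads = [threading.Thread(target=dev.blk_bh) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```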

Paolo

> ---
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> ---
>  hw/block/trace-events |  7 +++
>  hw/block/xen_disk.c   | 44 +++-
>  2 files changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 65e83dc258..608b24ba66 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int 
> num_reqs, uint64_t offset,
>  # hw/block/hd-geometry.c
>  hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p 
> LCHS %d %d %d"
>  hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t secs, 
> int trans) "blk %p CHS %u %u %u trans %d"
> +
> +# hw/block/xen_disk.c
> +xen_disk_alloc(char *name) "%s"
> +xen_disk_init(char *name) "%s"
> +xen_disk_connect(char *name) "%s"
> +xen_disk_disconnect(char *name) "%s"
> +xen_disk_free(char *name) "%s"
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index a9942d32db..ec1085c802 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -27,10 +27,13 @@
>  #include "hw/xen/xen_backend.h"
>  #include "xen_blkif.h"
>  #include "sysemu/blockdev.h"
> +#include "sysemu/iothread.h"
>  #include "sysemu/block-backend.h"
>  #include "qapi/error.h"
>  #include "qapi/qmp/qdict.h"
>  #include "qapi/qmp/qstring.h"
> +#include "qom/object_interfaces.h"
> +#include "trace.h"
>  
>  /* - */
>  
> @@ -128,6 +131,9 @@ struct XenBlkDev {
>  DriveInfo   *dinfo;
>  BlockBackend*blk;
>  QEMUBH  *bh;
> +
> +IOThread*iothread;
> +AioContext  *ctx;
>  };
>  
>  /* - */
> @@ -923,11 +929,31 @@ static void blk_bh(void *opaque)
>  static void blk_alloc(struct XenDevice *xendev)
>  {
>  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, 
> xendev);
> +Object *obj;
> +char *name;
> +Error *err = NULL;
> +
> +trace_xen_disk_alloc(xendev->name);
>  
>  QLIST_INIT(>inflight);
>  QLIST_INIT(>finished);
>  QLIST_INIT(>freelist);
> -blkdev->bh = qemu_bh_new(blk_bh, blkdev);
> +
> +obj = object_new(TYPE_IOTHREAD);
> +name = g_strdup_printf("iothread-%s", xendev->name);
> +
> +object_property_add_child(object_get_objects_root(), name, obj, );
> +assert(!err);
> +
> +g_free(name);
> +
> +user_creatable_complete(obj, );
> +assert(!err);
> +
> +blkdev->iothread = (IOThread *)object_dynamic_cast(obj, TYPE_IOTHREAD);
> +blkdev->ctx = iothread_get_aio_context(blkdev->iothread);
> +blkdev->bh = aio_bh_new(blkdev->ctx, blk_bh, blkdev);
> +
>  if (xen_mode != XEN_EMULATE) {
>  batch_maps = 1;
>  }
> @@ -954,6 +980,8 @@ static int blk_init(struct XenDevice *xendev)
>  int info = 0;
>  char *directiosafe = NULL;
>  
> +trace_xen_disk_init(xendev->name);
> +
>  /* read xenstore entries */
>  if (blkdev->params == NULL) {
>  char *h = NULL;
> @@ -1069,6 +1097,8 @@ static int blk_connect(struct XenDevice *xendev)
>  unsigned int i;
>  uint32_t *domids;
>  
> +trace_xen_disk_connect(xendev->name);
> +
>  /* read-only ? */
>  if (blkdev->directiosafe) {
>  qflags = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO;
> @@ -1285,6 +1315,8 @@ static int blk_connect(struct XenDevice *xendev)
>  blkdev->persistent_gnt_count = 0;
>  }
>  
> +blk_set_aio_context(blkdev->blk, blkdev->ctx);
> +
>  xen_be_bind_evtchn(>xendev);
>  
>  xen_pv_printf(>xendev, 1, "ok: proto %s, nr-ring-ref %u, "
> @@ -1298,13 +1330,20 @@ static void blk_disconnect(struct XenDevice *xendev)
>  {
>  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, 
> xendev);
>  
> +trace_xen_disk_disconnect(xendev->name);
> +
> +aio_context_acquire(blkdev->ctx);
> +
>  if (blkdev->blk) {
> +blk_set_aio_context(blkdev->blk, qemu_get_aio_context());
>  blk_detach_dev(blkdev->blk, blkdev);
>  blk_unref(blkdev->blk);
>  blkdev->blk = NULL;
>  }
>  xen_pv_unbind_evtchn(>xendev);
>  
> +

Re: [Xen-devel] [PATCH v4 03/27] x86: move PV gate op emulation code

2017-06-20 Thread Jan Beulich
>>> On 08.06.17 at 19:11,  wrote:
> --- a/xen/include/asm-x86/pv/traps.h
> +++ b/xen/include/asm-x86/pv/traps.h
> @@ -26,10 +26,12 @@
>  #include 
>  
>  int pv_emulate_privileged_op(struct cpu_user_regs *regs);
> +void pv_emulate_gate_op(struct cpu_user_regs *regs);
>  
>  #else  /* !CONFIG_PV */
>  
>  int pv_emulate_privileged_op(struct cpu_user_regs *regs) { return 0; }
> +void pv_emulate_gate_op(struct cpu_user_regs *regs) {}

Missing "inline" (also applies to patch 2 as I've just noticed)? With
that corrected,
Acked-by: Jan Beulich 

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 02/27] x86: move PV privileged instruction emulation code

2017-06-20 Thread Jan Beulich
>>> On 08.06.17 at 19:11,  wrote:
> Move the code to pv/emul-priv-op.c. Prefix emulate_privileged_op with
> pv_ and export it via pv/traps.h.
> 
> Also move gpr_switch.S since it is used by the privileged instruction
> emulation code only.
> 
> Code motion only except for the rename. Cleanup etc will come later.
> 
> Signed-off-by: Wei Liu 

Acked-by: Jan Beulich 





Re: [Xen-devel] [PATCH v4 01/27] x86: factor out common PV emulation code

2017-06-20 Thread Jan Beulich
>>> On 08.06.17 at 19:11,  wrote:
> We're going to split PV emulation code into several files. This patch
> extracts the functions needed by them into a dedicated file.
> 
> The functions are now prefixed with "pv_emul_" and exported via a
> local header file.
> 
> While at it, change bool_t to bool.

On the basis that beyond this the two functions are simply being
moved, ...

> Signed-off-by: Wei Liu 

Acked-by: Jan Beulich 

Jan




Re: [Xen-devel] [PATCH v4 11/18] xen/pvcalls: implement accept command

2017-06-20 Thread Boris Ostrovsky

>  static void __pvcalls_back_accept(struct work_struct *work)
>  {
> + struct sockpass_mapping *mappass = container_of(
> + work, struct sockpass_mapping, register_work);
> + struct sock_mapping *map;
> + struct pvcalls_ioworker *iow;
> + struct pvcalls_fedata *fedata;
> + struct socket *sock;
> + struct xen_pvcalls_response *rsp;
> + struct xen_pvcalls_request *req;
> + int notify;
> + int ret = -EINVAL;
> + unsigned long flags;
> +
> + fedata = mappass->fedata;
> + /*
> +  * __pvcalls_back_accept can race against pvcalls_back_accept.
> +  * We only need to check the value of "cmd" on read. It could be
> +  * done atomically, but to simplify the code on the write side, we
> +  * use a spinlock.
> +  */
> + spin_lock_irqsave(&mappass->copy_lock, flags);
> + req = &mappass->reqcopy;
> + if (req->cmd != PVCALLS_ACCEPT) {
> + spin_unlock_irqrestore(&mappass->copy_lock, flags);
> + return;
> + }
> + spin_unlock_irqrestore(&mappass->copy_lock, flags);
> +
> + sock = sock_alloc();
> + if (sock == NULL)
> + goto out_error;
> + sock->type = mappass->sock->type;
> + sock->ops = mappass->sock->ops;
> +
> + ret = inet_accept(mappass->sock, sock, O_NONBLOCK, true);
> + if (ret == -EAGAIN) {
> + sock_release(sock);
> + goto out_error;
> + }
> +
> + map = pvcalls_new_active_socket(fedata,
> + req->u.accept.id_new,
> + req->u.accept.ref,
> + req->u.accept.evtchn,
> + sock);
> + if (!map) {
> + sock_release(sock);
> + goto out_error;

Again, lost ret value.

> + }
> +
> + map->sockpass = mappass;
> + iow = &map->ioworker;
> + atomic_inc(&map->read);
> + atomic_inc(&map->io);
> + queue_work(iow->wq, &iow->register_work);
> +
> +out_error:
> + rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++);
> + rsp->req_id = req->req_id;
> + rsp->cmd = req->cmd;
> + rsp->u.accept.id = req->u.accept.id;
> + rsp->ret = ret;
> + RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&fedata->ring, notify);
> + if (notify)
> + notify_remote_via_irq(fedata->irq);
> +
> + mappass->reqcopy.cmd = 0;
>  }
>  
>  static void pvcalls_pass_sk_data_ready(struct sock *sock)
>  {
> + struct sockpass_mapping *mappass = sock->sk_user_data;
> +
> + if (mappass == NULL)
> + return;
> +
> + queue_work(mappass->wq, &mappass->register_work);
>  }
>  
>  static int pvcalls_back_bind(struct xenbus_device *dev,
> @@ -383,6 +456,43 @@ static int pvcalls_back_listen(struct xenbus_device *dev,
>  static int pvcalls_back_accept(struct xenbus_device *dev,
>  struct xen_pvcalls_request *req)
>  {
> + struct pvcalls_fedata *fedata;
> + struct sockpass_mapping *mappass;
> + int ret = -EINVAL;

Unnecessary initialization.

-boris

> + struct xen_pvcalls_response *rsp;
> + unsigned long flags;
> +
> + fedata = dev_get_drvdata(&dev->dev);
> +
> + mappass = radix_tree_lookup(&fedata->socketpass_mappings,
> + req->u.accept.id);
> + if (mappass == NULL)
> + goto out_error;
> +
> + /* 
> +  * Limitation of the current implementation: only support one
> +  * concurrent accept or poll call on one socket.
> +  */
> + spin_lock_irqsave(&mappass->copy_lock, flags);
> + if (mappass->reqcopy.cmd != 0) {
> + spin_unlock_irqrestore(&mappass->copy_lock, flags);
> + ret = -EINTR;
> + goto out_error;
> + }
> +
> + mappass->reqcopy = *req;
> + spin_unlock_irqrestore(&mappass->copy_lock, flags);
> + queue_work(mappass->wq, &mappass->register_work);
> +
> + /* Tell the caller we don't need to send back a notification yet */
> + return -1;
> +
> +out_error:
> + rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++);
> + rsp->req_id = req->req_id;
> + rsp->cmd = req->cmd;
> + rsp->u.accept.id = req->u.accept.id;
> + rsp->ret = ret;
>   return 0;
>  }
>  




Re: [Xen-devel] [PATCH v1 1/3] livepatch: Add local and global symbol resolution.

2017-06-20 Thread Konrad Rzeszutek Wilk
On Tue, Jun 20, 2017 at 01:51:41AM -0600, Jan Beulich wrote:
> >>> On 20.06.17 at 04:47,  wrote:
> > This way we can load livepatches with symbol names that
> > are the same as long as they are local ('static').
> > 
> > The use case here is to replace an existing livepatch
> > with a newer one - and one which has the same local symbols.
> > 
> > Without this patch we get:
> > livepatch.c:819: livepatch: xen_local_symbols: duplicate new symbol: 
> > revert_hook
> > 
> > when loading the new livepatch (before doing the replace).
> > 
> > This is due to livepatch assuming that all symbols are
> > global. With this patch:
> > 
> > readelf --symbols xen_hello_world.livepatch
> > File: xen_hello_world.livepatch
> > 
> > Symbol table '.symtab' contains 55 entries:
> >    Num:    Value  Size Type    Bind   Vis  Ndx Name
> > ..snip..
> > 34:  4 OBJECT  LOCAL  DEFAULT   25 cnt
> > 35: 000a 8 OBJECT  LOCAL  DEFAULT7 __func__.4654
> > 36: 0065    23 FUNC    LOCAL  DEFAULT    2 revert_hook
> > 37: 007c    23 FUNC    LOCAL  DEFAULT    2 apply_hook
> > 38: 0093    54 FUNC    LOCAL  DEFAULT    2 check_fnc
> > ..snip..
> > 
> > 47: 54 FUNC    GLOBAL HIDDEN     2 xen_hello_world
> > 48:  0 NOTYPE  GLOBAL HIDDEN   UND xen_extra_version
> > ..snip..
> > 52:  0 NOTYPE  GLOBAL HIDDEN   UND printk
> > 53: 64 OBJECT  GLOBAL HIDDEN    23 livepatch_xen_hello_world
> > 
> > All the 'GLOBAL' have to be unique per livepatch. But the
> > 'LOCAL' can all be the same which means the semantic of 'static'
> > on functions and data variables is the right one.
> 
> I think this is wrong: Afaict your change results in main image and
> patch local symbols to now be treated differently. While this may
> indeed help patches which are meant to replace others, it is going
> to get in the way if a patch wants to reference a local symbol
> already altered (or newly introduced) by a prior one.
> 
> Question then is at what point in time name collisions should be
> detected: In order for patch replacement to work, this obviously
> shouldn't happen at the time the patch is being loaded, but in the
> course of being applied. Otoh iirc "replace" doesn't use separate
> old and new patch names, so the system must already be aware
> of the correlation between them, and hence collision detection at
> patch load time may still be possible (special casing the patch
> being replaced).

This gets complicated, let me break it down to make sure I got it right.

I think what you are saying is that after a livepatch has been
applied the distinction of global/local should be lifted for
that livepatch.

But before we even get to that stage we have to deal with
the situation in which two (or more) livepatches have identical
local symbols ('revert_hook' for example). And there the
local symbols collision should not happen.

Now you mention an interesting case "if a patch wants to reference
a local symbol already altered (or newly introduced) by a prior one"

Let me break that apart: the "(or newly introduced)" part means
we would have, say, two livepatches already applied.
The first one adds this new symbol and the second one
depends on the first livepatch (and uses the symbol).
We are trying to replace the second one with a newer version. 

[I remember talking to Ross about this: 
https://lists.xen.org/archives/html/xen-devel/2016-08/msg01756.html]

If the newly introduced symbol is local, the second livepatch
should _not_ be able to resolve it [But it does today, this patch
fixes this].

If the newly introduced symbol is global, the second livepatch
should be able to resolve it [and it does today]

If this symbol we want to reference is not within the live
patches, but in the main code - then:

If the symbol is local, we will still find it (symbols_lookup_by_name)
If the symbol is global, we will still find it (symbols_lookup_by_name)

Ah, so your point is that since the main code does not provide
_any_ semantics about local/global symbols, then by extension
the livepatches should neither.

While my patch does introduce this distinction?

Hmm.

Well, what I can say is that this issue (local symbol collision
in the livepatches) has been biting us since November.

Our mechanism when deploying livepatches is to replace the loaded
livepatch with another one. Which means we only have one livepatch
applied and during the upgrade process have to load another one.

In November (XSA-204) we started shipping in the livepatches
a new local symbol (x86_emulate.c#_get_fpu).

livepatch: 617712b: new symbol x86_emulate.c#_get_fpu

Then when the next XSA came about and we made a new livepatch that
included everything prior and the new one, and we got:

livepatch.c:751: livepatch: fbab204: duplicate new symbol: 
x86_emulate.c#_get_fpu

which aborted the 

Re: [Xen-devel] [PATCH v2] xen: idle_loop: either deal with tasklets or go idle

2017-06-20 Thread Jan Beulich
>>> On 20.06.17 at 15:00,  wrote:
> In fact, there are two kinds of tasklets: vCPU and
> softirq context. When we want to do vCPU context tasklet
> work, we force the idle vCPU (of a particular pCPU) into
> execution, and run it from there.
> 
> This means there are two possible reasons for choosing
> to run the idle vCPU:
> 1) we want a pCPU to go idle,
> 2) we want to run some vCPU context tasklet work.
> 
> If we're in case 2), it does not make sense to even
> try to go idle (as the check will _always_ fail).
> 
> This patch rearranges the code of the body of idle
> vCPUs, so that we actually check whether we are in
> case 1) or 2), and act accordingly.
> 
> As a matter of fact, this also means that we do not
> check if there's any tasklet work to do after waking
> up from idle. This is not a problem, because:
> a) for softirq context tasklets, if any is queued
>"during" wakeup from idle, TASKLET_SOFTIRQ is
>raised, and the call to do_softirq() (which is still
>happening *after* the wakeup) will take care of it;
> b) for vCPU context tasklets, if any is queued "during"
>wakeup from idle, SCHEDULE_SOFTIRQ is raised and
>do_softirq() (happening after the wakeup) calls
>the scheduler. The scheduler sees that there is
>tasklet work pending and confirms the idle vCPU
>in execution, which then will get to execute
>do_tasklet().
> 
> Signed-off-by: Dario Faggioli 

Except for ARM bits
Reviewed-by: Jan Beulich 

Jan




Re: [Xen-devel] [PATCH v7 08/36] x86/mm: Add support to enable SME in early boot processing

2017-06-20 Thread Tom Lendacky

On 6/20/2017 2:38 AM, Borislav Petkov wrote:

On Fri, Jun 16, 2017 at 01:51:15PM -0500, Tom Lendacky wrote:

Add support to the early boot code to use Secure Memory Encryption (SME).
Since the kernel has been loaded into memory in a decrypted state, encrypt
the kernel in place and update the early pagetables with the memory
encryption mask so that new pagetable entries will use memory encryption.

The routines to set the encryption mask and perform the encryption are
stub routines for now with functionality to be added in a later patch.

Because of the need to have the routines available to head_64.S, the
mem_encrypt.c is always built and #ifdefs in mem_encrypt.c will provide
functionality or stub routines depending on CONFIG_AMD_MEM_ENCRYPT.

Signed-off-by: Tom Lendacky 
---
  arch/x86/include/asm/mem_encrypt.h |8 +++
  arch/x86/kernel/head64.c   |   33 +-
  arch/x86/kernel/head_64.S  |   39 ++--
  arch/x86/mm/Makefile   |4 +---
  arch/x86/mm/mem_encrypt.c  |   24 ++
  5 files changed, 93 insertions(+), 15 deletions(-)


...


diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index b99d469..9a78277 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -11,6 +11,9 @@
   */
  
  #include 

+#include 
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
  
  /*

   * Since SME related variables are set early in the boot process they must
@@ -19,3 +22,24 @@
   */
  unsigned long sme_me_mask __section(.data) = 0;
  EXPORT_SYMBOL_GPL(sme_me_mask);
+
+void __init sme_encrypt_kernel(void)
+{
+}


Just the minor:

void __init sme_encrypt_kernel(void) { }

in case you have to respin.


I have to re-spin for the kbuild test error.  But given that this
function will be filled in later it's probably not worth doing the
space savings here.

Thanks,
Tom



Reviewed-by: Borislav Petkov 





Re: [Xen-devel] [PATCH v4 10/18] xen/pvcalls: implement listen command

2017-06-20 Thread Boris Ostrovsky
On 06/15/2017 03:09 PM, Stefano Stabellini wrote:
> Call inet_listen to implement the listen command.
>
> Signed-off-by: Stefano Stabellini 
> CC: boris.ostrov...@oracle.com
> CC: jgr...@suse.com

Reviewed-by: Boris Ostrovsky 

> ---
>  drivers/xen/pvcalls-back.c | 19 +++
>  1 file changed, 19 insertions(+)
>
> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> index c17d970..e5c535d 100644
> --- a/drivers/xen/pvcalls-back.c
> +++ b/drivers/xen/pvcalls-back.c
> @@ -358,6 +358,25 @@ static int pvcalls_back_bind(struct xenbus_device *dev,
>  static int pvcalls_back_listen(struct xenbus_device *dev,
>  struct xen_pvcalls_request *req)
>  {
> + struct pvcalls_fedata *fedata;
> + int ret = -EINVAL;
> + struct sockpass_mapping *map;
> + struct xen_pvcalls_response *rsp;
> +
> + fedata = dev_get_drvdata(&dev->dev);
> +
> + map = radix_tree_lookup(&fedata->socketpass_mappings, req->u.listen.id);
> + if (map == NULL)
> + goto out;
> +
> + ret = inet_listen(map->sock, req->u.listen.backlog);
> +
> +out:
> + rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++);
> + rsp->req_id = req->req_id;
> + rsp->cmd = req->cmd;
> + rsp->u.listen.id = req->u.listen.id;
> + rsp->ret = ret;
>   return 0;

newline before return would be useful (if you respin this)

>  }
>  




Re: [Xen-devel] [PATCH v4 07/18] xen/pvcalls: implement socket command

2017-06-20 Thread Boris Ostrovsky
On 06/15/2017 03:09 PM, Stefano Stabellini wrote:
> Just reply with success to the other end for now. Delay the allocation
> of the actual socket to bind and/or connect.
>
> Signed-off-by: Stefano Stabellini 
> CC: boris.ostrov...@oracle.com
> CC: jgr...@suse.com

Reviewed-by: Boris Ostrovsky 





Re: [Xen-devel] [PATCH v4 08/18] xen/pvcalls: implement connect command

2017-06-20 Thread Boris Ostrovsky
On 06/15/2017 03:09 PM, Stefano Stabellini wrote:
> Allocate a socket. Keep track of socket <-> ring mappings with a new data
> structure, called sock_mapping. Implement the connect command by calling
> inet_stream_connect, and mapping the new indexes page and data ring.
> Allocate a workqueue and a work_struct, called ioworker, to perform
> reads and writes to the socket.
>
> When an active socket is closed (sk_state_change), set in_error to
> -ENOTCONN and notify the other end, as specified by the protocol.
>
> sk_data_ready and pvcalls_back_ioworker will be implemented later.
>
> Signed-off-by: Stefano Stabellini 
> CC: boris.ostrov...@oracle.com
> CC: jgr...@suse.com
> ---
>  drivers/xen/pvcalls-back.c | 171 
> +
>  1 file changed, 171 insertions(+)
>
> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> index 953458b..4eecd83 100644
> --- a/drivers/xen/pvcalls-back.c
> +++ b/drivers/xen/pvcalls-back.c
> @@ -56,6 +56,39 @@ struct pvcalls_fedata {
>   struct work_struct register_work;
>  };
>  
> +struct pvcalls_ioworker {
> + struct work_struct register_work;
> + struct workqueue_struct *wq;
> +};
> +
> +struct sock_mapping {
> + struct list_head list;
> + struct pvcalls_fedata *fedata;
> + struct socket *sock;
> + uint64_t id;
> + grant_ref_t ref;
> + struct pvcalls_data_intf *ring;
> + void *bytes;
> + struct pvcalls_data data;
> + uint32_t ring_order;
> + int irq;
> + atomic_t read;
> + atomic_t write;
> + atomic_t io;
> + atomic_t release;
> + void (*saved_data_ready)(struct sock *sk);
> + struct pvcalls_ioworker ioworker;
> +};
> +
> +static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map);
> +static int pvcalls_back_release_active(struct xenbus_device *dev,
> +struct pvcalls_fedata *fedata,
> +struct sock_mapping *map);
> +
> +static void pvcalls_back_ioworker(struct work_struct *work)
> +{
> +}
> +
>  static int pvcalls_back_socket(struct xenbus_device *dev,
>   struct xen_pvcalls_request *req)
>  {
> @@ -84,9 +117,142 @@ static int pvcalls_back_socket(struct xenbus_device *dev,
>   return 0;
>  }
>  
> +static void pvcalls_sk_state_change(struct sock *sock)
> +{
> + struct sock_mapping *map = sock->sk_user_data;
> + struct pvcalls_data_intf *intf;
> +
> + if (map == NULL)
> + return;
> +
> + intf = map->ring;
> + intf->in_error = -ENOTCONN;
> + notify_remote_via_irq(map->irq);
> +}
> +
> +static void pvcalls_sk_data_ready(struct sock *sock)
> +{
> +}
> +
> +static struct sock_mapping *pvcalls_new_active_socket(
> + struct pvcalls_fedata *fedata,
> + uint64_t id,
> + grant_ref_t ref,
> + uint32_t evtchn,
> + struct socket *sock)
> +{
> + int ret;
> + struct sock_mapping *map;
> + void *page;
> +
> + map = kzalloc(sizeof(*map), GFP_KERNEL);
> + if (map == NULL)
> + return NULL;
> +
> + map->fedata = fedata;
> + map->sock = sock;
> + map->id = id;
> + map->ref = ref;
> +
> + ret = xenbus_map_ring_valloc(fedata->dev, &ref, 1, &page);
> + if (ret < 0)
> + goto out;
> + map->ring = page;
> + map->ring_order = map->ring->ring_order;
> + /* first read the order, then map the data ring */
> + virt_rmb();
> + if (map->ring_order > MAX_RING_ORDER) {
> + pr_warn("%s frontend requested ring_order %u, which is > MAX 
> (%u)\n",
> + __func__, map->ring_order, MAX_RING_ORDER);
> + goto out;
> + }
> + ret = xenbus_map_ring_valloc(fedata->dev, map->ring->ref,
> +  (1 << map->ring_order), &page);
> + if (ret < 0)
> + goto out;
> + map->bytes = page;
> +
> + ret = bind_interdomain_evtchn_to_irqhandler(fedata->dev->otherend_id,
> + evtchn,
> + pvcalls_back_conn_event,
> + 0,
> + "pvcalls-backend",
> + map);
> + if (ret < 0)
> + goto out;
> + map->irq = ret;
> +
> + map->data.in = map->bytes;
> + map->data.out = map->bytes + XEN_FLEX_RING_SIZE(map->ring_order);
> + 
> + map->ioworker.wq = alloc_workqueue("pvcalls_io", WQ_UNBOUND, 1);
> + if (!map->ioworker.wq)
> + goto out;
> + atomic_set(&map->io, 1);
> + INIT_WORK(&map->ioworker.register_work, pvcalls_back_ioworker);
> +
> + down(&fedata->socket_lock);
> + list_add_tail(&map->list, &fedata->socket_mappings);
> + up(&fedata->socket_lock);
> +
> + write_lock_bh(&map->sock->sk->sk_callback_lock);
> + map->saved_data_ready = map->sock->sk->sk_data_ready;
> +

Re: [Xen-devel] [PATCH v4 08/18] xen/pvcalls: implement connect command

2017-06-20 Thread Boris Ostrovsky

>> +
>>  static int pvcalls_back_connect(struct xenbus_device *dev,
>>  struct xen_pvcalls_request *req)
>>  {
>> +struct pvcalls_fedata *fedata;
>> +int ret = -EINVAL;
>> +struct socket *sock;
>> +struct sock_mapping *map;
>> +struct xen_pvcalls_response *rsp;
>> +
>> +fedata = dev_get_drvdata(&dev->dev);
>> +
>> +ret = sock_create(AF_INET, SOCK_STREAM, 0, &sock);
>> +if (ret < 0)
>> +goto out;
>> +ret = inet_stream_connect(sock, (struct sockaddr *)&req->u.connect.addr,
>> +  req->u.connect.len, req->u.connect.flags);
>> +if (ret < 0) {
>> +sock_release(sock);
>> +goto out;
>> +}
>> +
>> +map = pvcalls_new_active_socket(fedata,
>> +req->u.connect.id,
>> +req->u.connect.ref,
>> +req->u.connect.evtchn,
>> +sock);
>> +if (!map) {
>> +sock_release(map->sock);
>> +goto out;
> Unnecessary goto.

Oh, and also ret needs to be set since it will be cleared by
inet_stream_connect().

-boris


>
>> +}
>> +
>> +out:
>> +rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++);
>> +rsp->req_id = req->req_id;
>> +rsp->cmd = req->cmd;
>> +rsp->u.connect.id = req->u.connect.id;
>> +rsp->ret = ret;
>> +
>> +return ret;
>




Re: [Xen-devel] [PATCH v7 14/36] x86/mm: Insure that boot memory areas are mapped properly

2017-06-20 Thread Borislav Petkov
On Fri, Jun 16, 2017 at 01:52:32PM -0500, Tom Lendacky wrote:
> The boot data and command line data are present in memory in a decrypted
> state and are copied early in the boot process.  The early page fault
> support will map these areas as encrypted, so before attempting to copy
> them, add decrypted mappings so the data is accessed properly when copied.
> 
> For the initrd, encrypt this data in place. Since the future mapping of
> the initrd area will be mapped as encrypted the data will be accessed
> properly.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/mem_encrypt.h |6 +++
>  arch/x86/include/asm/pgtable.h |3 ++
>  arch/x86/kernel/head64.c   |   30 +--
>  arch/x86/kernel/setup.c|9 +
>  arch/x86/mm/kasan_init_64.c|2 +
>  arch/x86/mm/mem_encrypt.c  |   70 
> 
>  6 files changed, 115 insertions(+), 5 deletions(-)

...

> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index b7671b9..ea5e3a6 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -19,6 +19,8 @@
>  
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  /*
>   * Since SME related variables are set early in the boot process they must
> @@ -101,6 +103,74 @@ void __init sme_early_decrypt(resource_size_t paddr, 
> unsigned long size)
>   __sme_early_enc_dec(paddr, size, false);
>  }
>  
> +static void __init __sme_early_map_unmap_mem(void *vaddr, unsigned long size,
> +  bool map)
> +{
> + unsigned long paddr = (unsigned long)vaddr - __PAGE_OFFSET;
> + pmdval_t pmd_flags, pmd;
> +
> + /* Use early_pmd_flags but remove the encryption mask */
> + pmd_flags = __sme_clr(early_pmd_flags);
> +
> + do {
> + pmd = map ? (paddr & PMD_MASK) + pmd_flags : 0;
> + __early_make_pgtable((unsigned long)vaddr, pmd);
> +
> + vaddr += PMD_SIZE;
> + paddr += PMD_SIZE;
> + size = (size <= PMD_SIZE) ? 0 : size - PMD_SIZE;
> + } while (size);
> +
> + write_cr3(__read_cr3());

local_flush_tlb() or __native_flush_tlb(). Probably the native variant
since this is early fun.

> +}
> +
> +static void __init __sme_map_unmap_bootdata(char *real_mode_data, bool map)
> +{
> + struct boot_params *boot_data;
> + unsigned long cmdline_paddr;
> +
> + /* If SME is not active, the bootdata is in the correct state */
> + if (!sme_active())
> + return;
> +
> + if (!map) {
> + /*
> +  * If unmapping, get the command line address before
> +  * unmapping the real_mode_data.
> +  */
> + boot_data = (struct boot_params *)real_mode_data;
> + cmdline_paddr = boot_data->hdr.cmd_line_ptr |
> + ((u64)boot_data->ext_cmd_line_ptr << 32);

Let it stick out:

cmdline_paddr = bd->hdr.cmd_line_ptr | ((u64)bd->ext_cmd_line_ptr << 32);

> + }
> +
> + __sme_early_map_unmap_mem(real_mode_data, sizeof(boot_params), map);
> +
> + if (map) {
> + /*
> +  * If mapping, get the command line address after mapping
> +  * the real_mode_data.
> +  */
> + boot_data = (struct boot_params *)real_mode_data;
> + cmdline_paddr = boot_data->hdr.cmd_line_ptr |
> + ((u64)boot_data->ext_cmd_line_ptr << 32);
> + }
> +
> + if (!cmdline_paddr)
> + return;
> +
> + __sme_early_map_unmap_mem(__va(cmdline_paddr), COMMAND_LINE_SIZE, map);

Ok, so from looking at this function now - it does different things
depending on whether we map or not. So it doesn't look like a worker
function anymore and you can move the stuff back to the original callers
below. Should make the whole flow more readable.

> +}
> +
> +void __init sme_unmap_bootdata(char *real_mode_data)
> +{
> + __sme_map_unmap_bootdata(real_mode_data, false);
> +}
> +
> +void __init sme_map_bootdata(char *real_mode_data)
> +{
> + __sme_map_unmap_bootdata(real_mode_data, true);
> +}
> +
>  void __init sme_early_init(void)
>  {
>   unsigned int i;
> 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.



Re: [Xen-devel] [PATCH v1 OSSTEST 3/3] ts-livepatch: Expand testcase to include global/local symbols

2017-06-20 Thread Ian Jackson
Konrad Rzeszutek Wilk writes ("[PATCH v1 OSSTEST 3/3] ts-livepatch: Expand 
testcase to include global/local symbols"):
> testing. The test is to verify that the local symbols
> of payloads are ignored during loading.

Can we do this with substeps rather than a conditional test
execution ?

For example, if xen.git should suddenly stop producing this file, it
ought to be a test failure.

Maybe we should make each test into a separate testid ?

Ian.



Re: [Xen-devel] [PATCH v3 2/9] x86/ecam: add handlers for the PVH Dom0 MMCFG areas

2017-06-20 Thread Roger Pau Monne
On Tue, Jun 20, 2017 at 07:14:07AM -0600, Jan Beulich wrote:
> >>> On 20.06.17 at 13:56,  wrote:
> > On Fri, May 19, 2017 at 07:25:22AM -0600, Jan Beulich wrote:
> >> >>> On 27.04.17 at 16:35,  wrote:
> >> > +{
> >> > +unsigned int i;
> >> > +int rc;
> >> > +
> >> > +for ( i = 0; i < pci_mmcfg_config_num; i++ )
> >> > +{
> >> > +rc = register_vpci_ecam_handler(d, pci_mmcfg_config[i].address,
> >> > +
> >> > pci_mmcfg_config[i].start_bus_number,
> >> > +
> >> > pci_mmcfg_config[i].end_bus_number,
> >> > +
> >> > pci_mmcfg_config[i].pci_segment);
> >> > +if ( rc )
> >> > +return rc;
> >> > +}
> >> > +
> >> > +return 0;
> >> > +}
> >> 
> >> What about regions becoming available only post-boot?
> > 
> > This is not yet supported. It needs to be implemented using the
> > PHYSDEVOP_pci_mmcfg_reserved hypercall.
> 
> But then the patch here is incomplete.

OK, I don't think it's going to be a lot of code, it's just
registering extra MMCFG regions.

> >> > +  unsigned int len, unsigned long *data)
> >> > +{
> >> > +struct domain *d = v->domain;
> >> > +struct hvm_ecam *ecam;
> >> > +unsigned int bus, devfn, reg;
> >> > +uint32_t data32;
> >> > +int rc;
> >> > +
> >> > +vpci_lock(d);
> >> > +ecam = vpci_ecam_find(d, addr);
> >> > +if ( !ecam )
> >> > +{
> >> > +vpci_unlock(d);
> >> > +return X86EMUL_UNHANDLEABLE;
> >> > +}
> >> > +
> >> > +vpci_ecam_decode_addr(ecam, addr, &bus, &devfn, &reg);
> >> > +
> >> > +if ( vpci_access_check(reg, len) || reg >= 0xfff )
> >> 
> >> So this function iirc allows only 1-, 2-, and 4-byte accesses. Other
> >> than with port I/O, MMCFG allows wider ones, and once again I
> >> don't think hardware would raise any kind of fault in such a case.
> >> The general expectation is for the fabric to split such accesses.
> > 
> > Hm, the PCIe spec is not authoritative in this regard, it states that
> > supporting 8B accesses is not mandatory. Xen/Linux/FreeBSD will never
> > attempt any access > 4B, hence I haven't coded this case.
> > 
> > Would you be fine with leaving this for later, or would you rather
> > have it implemented as part of this series?
> 
> Since it shouldn't be meaningfully much more code, I'd prefer if it was
> done right away. Otherwise I'd have to ask for a "fixme" comment,
> and I'd rather avoid such considering the PVHv1 history.

NP, I've just added it. I have however implemented it by splitting the
access into two 4 byte accesses, and performing two calls to
vpci_{read/write}.

Thanks, Roger.



Re: [Xen-devel] [GIT PULL] (xen) stable/for-jens-3.14

2017-06-20 Thread Konrad Rzeszutek Wilk
On Tue, Jun 20, 2017 at 07:11:01AM -0600, Jens Axboe wrote:
> On 06/20/2017 06:13 AM, Konrad Rzeszutek Wilk wrote:
> > Hey Jens,
> > 
> > Please git pull the following branch:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git 
> > stable/for-jens-4.12
> > 
> > which has security and memory leak fixes in xen block driver.
> 
> for-jens-3.14? Anyway, looks fine, pulled for 4.12.

/me facepalm

Thank you!

I did mean 4.12 and then messed up the title.



Re: [Xen-devel] [PATCH] memory: don't suppress P2M update in populate_physmap()

2017-06-20 Thread Julien Grall



On 06/20/2017 02:03 PM, Jan Beulich wrote:

On 20.06.17 at 14:51,  wrote:

On 06/20/2017 01:40 PM, Andrew Cooper wrote:

On 20/06/17 13:39, Julien Grall wrote:

On 06/20/2017 09:37 AM, Andrew Cooper wrote:

On 20/06/17 07:19, Jan Beulich wrote:

Commit d18627583d ("memory: don't hand MFN info to translated guests")
wrongly added a null-handle check there - just like stated in its
description for memory_exchange(), the array is also an input for
populate_physmap() (and hence can't reasonably be null). I have no idea
how I've managed to overlook this.

Signed-off-by: Jan Beulich 


Acked-by: Andrew Cooper 


Am I correct that this is not a bug and only a pointless check?


This is a partial reversion of d18627583d and needs to be included in
4.9, to avoid a regression.


Would you mind to explain why this would introduce regression? AFAICT
the check is just redundant, so keeping it is not that bad.


Afaict there would be a regression only if someone invoked the
hypercall with a null handle (but having valid data at address zero).
Still I agree with Andrew that we'd better include this in 4.9.


Hmmm. It didn't occur to me that someone would put valid data at address 
zero.


Release-acked-by: Julien Grall 

Cheers,

--
Julien Grall



Re: [Xen-devel] [PATCH v2 09/20] rbtree: adjust root color in rb_insert_color() only when necessary

2017-06-20 Thread Dario Faggioli
On Tue, 2017-06-20 at 01:26 -0600, Jan Beulich wrote:
> > > > On 19.06.17 at 19:13,  wrote:
> > And here we are again. (I.e., in the cited Linux's commit, this is
> > being turned into 'while (true) {`.
> > 
> > So, I think we should gather others' opinion about how to deal with
> > these aspects of this series. So, I'll stop my review for now, and
> > chase feedback.
> 
> I fully second your opinion here. I even wonder whether we
> shouldn't convert the file back to be fully Linux style first thing,
> so that Linux changes can be applied (mostly) as is, specifically
> without having to convert tabs to spaces.
> 
That indeed would be good!

Praveen, this would mean having a patch, at the beginning of the
series, which converts the coding style of the files to Linux one.

Basically, that would mean using tabs for indentation, and undoing any
style change that may have been done in our tree, to make the file
adopt the Xen style.

In practice, the idea is to end up with something that is basically
identical to what was in Linux, before all the patches you are porting
were committed (and without the additional parts and features that we
don't need, of course).

At this point, even generating and applying the patches that you are
porting, in this very series, would be really easy, and less error
prone (as it can be almost entirely automated).

Are you up for this?

Thanks and Regards,
Dario
-- 
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



Re: [Xen-devel] [PATCH v2 1/3] libxl: add PV display device driver interface

2017-06-20 Thread Wei Liu
On Thu, May 25, 2017 at 03:17:29PM +0300, Oleksandr Grytsov wrote:
> From: Oleksandr Grytsov 
> 

I'm sorry, a patch like this is impossible to review because: 1. there is
no commit message; 2. it is huge.

I can see it is adding a lot of hooks to the device handling framework.
Please explain why they are needed. This sort of change (refactoring
and extending existing code) should also be split into separate patches.

> Signed-off-by: Oleksandr Grytsov 
> ---
>  tools/libxl/Makefile |   2 +-
>  tools/libxl/libxl.h  |  21 ++
>  tools/libxl/libxl_create.c   |   3 +
>  tools/libxl/libxl_device.c   | 178 -
>  tools/libxl/libxl_internal.h |  24 +++
>  tools/libxl/libxl_types.idl  |  40 +++-
>  tools/libxl/libxl_types_internal.idl |   1 +
>  tools/libxl/libxl_usb.c  |   2 +
>  tools/libxl/libxl_utils.h|   4 +
>  tools/libxl/libxl_vdispl.c   | 372 
> +++
>  10 files changed, 643 insertions(+), 4 deletions(-)
>  create mode 100644 tools/libxl/libxl_vdispl.c
>  };
>  
> diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
> index 5e96676..2954800 100644
> --- a/tools/libxl/libxl_device.c
> +++ b/tools/libxl/libxl_device.c
> @@ -18,7 +18,7 @@
>  
>  #include "libxl_internal.h"
>  
> -static char *libxl__device_frontend_path(libxl__gc *gc, libxl__device 
> *device)
> +char *libxl__device_frontend_path(libxl__gc *gc, libxl__device *device)
>  {
>  char *dom_path = libxl__xs_get_dompath(gc, device->domid);
>  
> @@ -1776,6 +1776,182 @@ out:
>  return AO_CREATE_FAIL(rc);
>  }
>  
> +static int device_add_domain_config(libxl__gc *gc, uint32_t domid,
> +const struct libxl_device_type *dt,
> +void *type)
[...]
> +
> +void libxl__device_add(libxl__egc *egc, uint32_t domid,
> +   const struct libxl_device_type *dt, void *type,
> +   libxl__ao_device *aodev)
[...]
> +
> +void* libxl__device_list(const struct libxl_device_type *dt,
> + libxl_ctx *ctx, uint32_t domid, int *num)
[...]
> +
> +void libxl__device_list_free(const struct libxl_device_type *dt,
> + void *list, int num)
> 

I think existing code already provides these functionalities, right?



[Xen-devel] [PATCH 2/3] xen-disk: add support for multi-page shared rings

2017-06-20 Thread Paul Durrant
The blkif protocol has had provision for negotiation of multi-page shared
rings for some time now and many guest OSes have support in their frontend
drivers.

This patch makes the necessary modifications to allow xen-disk to support a
shared ring up to order 4 (i.e. 16 pages).

Signed-off-by: Paul Durrant 
---
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Kevin Wolf 
Cc: Max Reitz 
---
 hw/block/xen_disk.c | 141 
 1 file changed, 110 insertions(+), 31 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 9b06e3aa81..a9942d32db 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -36,8 +36,6 @@
 
 static int batch_maps   = 0;
 
-static int max_requests = 32;
-
 /* - */
 
 #define BLOCK_SIZE  512
@@ -84,6 +82,8 @@ struct ioreq {
 BlockAcctCookie acct;
 };
 
+#define MAX_RING_PAGE_ORDER 4
+
 struct XenBlkDev {
 struct XenDevice xendev;  /* must be first */
 char *params;
@@ -94,7 +94,8 @@ struct XenBlkDev {
 bool directiosafe;
 const char *fileproto;
 const char *filename;
-int ring_ref;
+unsigned int ring_ref[1 << MAX_RING_PAGE_ORDER];
+unsigned int nr_ring_ref;
 void *sring;
 int64_t file_blk;
 int64_t file_size;
@@ -110,6 +111,7 @@ struct XenBlkDev {
 int requests_total;
 int requests_inflight;
 int requests_finished;
+unsigned int max_requests;
 
 /* Persistent grants extension */
 gboolean feature_discard;
@@ -199,7 +201,7 @@ static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
 struct ioreq *ioreq = NULL;
 
 if (QLIST_EMPTY(>freelist)) {
-if (blkdev->requests_total >= max_requests) {
+if (blkdev->requests_total >= blkdev->max_requests) {
 goto out;
 }
 /* allocate new struct */
@@ -905,7 +907,7 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
 ioreq_runio_qemu_aio(ioreq);
 }
 
-if (blkdev->more_work && blkdev->requests_inflight < max_requests) {
+if (blkdev->more_work && blkdev->requests_inflight < blkdev->max_requests) 
{
 qemu_bh_schedule(blkdev->bh);
 }
 }
@@ -918,15 +920,6 @@ static void blk_bh(void *opaque)
 blk_handle_requests(blkdev);
 }
 
-/*
- * We need to account for the grant allocations requiring contiguous
- * chunks; the worst case number would be
- * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
- * but in order to keep things simple just use
- * 2 * max_req * max_seg.
- */
-#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
-
 static void blk_alloc(struct XenDevice *xendev)
 {
 struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
@@ -938,11 +931,6 @@ static void blk_alloc(struct XenDevice *xendev)
 if (xen_mode != XEN_EMULATE) {
 batch_maps = 1;
 }
-if (xengnttab_set_max_grants(xendev->gnttabdev,
-MAX_GRANTS(max_requests, BLKIF_MAX_SEGMENTS_PER_REQUEST)) < 0) {
-xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
-  strerror(errno));
-}
 }
 
 static void blk_parse_discard(struct XenBlkDev *blkdev)
@@ -1037,6 +1025,9 @@ static int blk_init(struct XenDevice *xendev)
   !blkdev->feature_grant_copy);
 xenstore_write_be_int(>xendev, "info", info);
 
+xenstore_write_be_int(>xendev, "max-ring-page-order",
+  MAX_RING_PAGE_ORDER);
+
 blk_parse_discard(blkdev);
 
 g_free(directiosafe);
@@ -1058,12 +1049,25 @@ out_error:
 return -1;
 }
 
+/*
+ * We need to account for the grant allocations requiring contiguous
+ * chunks; the worst case number would be
+ * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
+ * but in order to keep things simple just use
+ * 2 * max_req * max_seg.
+ */
+#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
+
 static int blk_connect(struct XenDevice *xendev)
 {
 struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
 int pers, index, qflags;
 bool readonly = true;
 bool writethrough = true;
+int order, ring_ref;
+unsigned int ring_size, max_grants;
+unsigned int i;
+uint32_t *domids;
 
 /* read-only ? */
 if (blkdev->directiosafe) {
@@ -1138,9 +1142,39 @@ static int blk_connect(struct XenDevice *xendev)
 xenstore_write_be_int64(>xendev, "sectors",
 blkdev->file_size / blkdev->file_blk);
 
-if (xenstore_read_fe_int(>xendev, "ring-ref", >ring_ref) 
== -1) {
+if (xenstore_read_fe_int(>xendev, "ring-page-order",
+ ) == 

[Xen-devel] Synoptek extended outage - Xen Project Massachusetts test lab

2017-06-20 Thread Ian Jackson
Our test and CI facility has been off the internet since approximately
0100 UTC last night.

This is due to a complete outage of the Synoptek datacentre on Boston
Post Road West, in Marlborough, MA.

One of our support staff went to the datacentre and reports:

   I came on site to speak to him personally.  It seems that five
   telephone poles were taken out yesterday evening.  All fiber lines
   were also impacted (data, phones, etc) so there is about a four to
   five mile stretch of road that has no internet.  Currently the ETA
   to bring things back up is around 1 p.m. Eastern Time.

FYI.

I think we should take up the apparent lack of redundancy with
Synoptek, but after service is restored.

Ian.



[Xen-devel] [PATCH 3/3] xen-disk: use an IOThread per instance

2017-06-20 Thread Paul Durrant
This patch allocates an IOThread object for each xen_disk instance and
sets the AIO context appropriately on connect. This allows processing
of I/O to proceed in parallel.

The patch also adds tracepoints into xen_disk to make it possible to
follow the state transitions of an instance in the log.

Signed-off-by: Paul Durrant 
---
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Kevin Wolf 
Cc: Max Reitz 
---
 hw/block/trace-events |  7 +++
 hw/block/xen_disk.c   | 44 +++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/hw/block/trace-events b/hw/block/trace-events
index 65e83dc258..608b24ba66 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int 
num_reqs, uint64_t offset,
 # hw/block/hd-geometry.c
 hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p LCHS 
%d %d %d"
 hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t secs, int 
trans) "blk %p CHS %u %u %u trans %d"
+
+# hw/block/xen_disk.c
+xen_disk_alloc(char *name) "%s"
+xen_disk_init(char *name) "%s"
+xen_disk_connect(char *name) "%s"
+xen_disk_disconnect(char *name) "%s"
+xen_disk_free(char *name) "%s"
diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index a9942d32db..ec1085c802 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -27,10 +27,13 @@
 #include "hw/xen/xen_backend.h"
 #include "xen_blkif.h"
 #include "sysemu/blockdev.h"
+#include "sysemu/iothread.h"
 #include "sysemu/block-backend.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
+#include "qom/object_interfaces.h"
+#include "trace.h"
 
 /* - */
 
@@ -128,6 +131,9 @@ struct XenBlkDev {
 DriveInfo   *dinfo;
 BlockBackend *blk;
 QEMUBH  *bh;
+
+IOThread *iothread;
+AioContext  *ctx;
 };
 
 /* - */
@@ -923,11 +929,31 @@ static void blk_bh(void *opaque)
 static void blk_alloc(struct XenDevice *xendev)
 {
 struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
+Object *obj;
+char *name;
+Error *err = NULL;
+
+trace_xen_disk_alloc(xendev->name);
 
 QLIST_INIT(>inflight);
 QLIST_INIT(>finished);
 QLIST_INIT(>freelist);
-blkdev->bh = qemu_bh_new(blk_bh, blkdev);
+
+obj = object_new(TYPE_IOTHREAD);
+name = g_strdup_printf("iothread-%s", xendev->name);
+
+object_property_add_child(object_get_objects_root(), name, obj, );
+assert(!err);
+
+g_free(name);
+
+user_creatable_complete(obj, );
+assert(!err);
+
+blkdev->iothread = (IOThread *)object_dynamic_cast(obj, TYPE_IOTHREAD);
+blkdev->ctx = iothread_get_aio_context(blkdev->iothread);
+blkdev->bh = aio_bh_new(blkdev->ctx, blk_bh, blkdev);
+
 if (xen_mode != XEN_EMULATE) {
 batch_maps = 1;
 }
@@ -954,6 +980,8 @@ static int blk_init(struct XenDevice *xendev)
 int info = 0;
 char *directiosafe = NULL;
 
+trace_xen_disk_init(xendev->name);
+
 /* read xenstore entries */
 if (blkdev->params == NULL) {
 char *h = NULL;
@@ -1069,6 +1097,8 @@ static int blk_connect(struct XenDevice *xendev)
 unsigned int i;
 uint32_t *domids;
 
+trace_xen_disk_connect(xendev->name);
+
 /* read-only ? */
 if (blkdev->directiosafe) {
 qflags = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO;
@@ -1285,6 +1315,8 @@ static int blk_connect(struct XenDevice *xendev)
 blkdev->persistent_gnt_count = 0;
 }
 
+blk_set_aio_context(blkdev->blk, blkdev->ctx);
+
 xen_be_bind_evtchn(>xendev);
 
 xen_pv_printf(>xendev, 1, "ok: proto %s, nr-ring-ref %u, "
@@ -1298,13 +1330,20 @@ static void blk_disconnect(struct XenDevice *xendev)
 {
 struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
 
+trace_xen_disk_disconnect(xendev->name);
+
+aio_context_acquire(blkdev->ctx);
+
 if (blkdev->blk) {
+blk_set_aio_context(blkdev->blk, qemu_get_aio_context());
 blk_detach_dev(blkdev->blk, blkdev);
 blk_unref(blkdev->blk);
 blkdev->blk = NULL;
 }
 xen_pv_unbind_evtchn(>xendev);
 
+aio_context_release(blkdev->ctx);
+
 if (blkdev->sring) {
 xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring,
 blkdev->nr_ring_ref);
@@ -1338,6 +1377,8 @@ static int blk_free(struct XenDevice *xendev)
 struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
 struct ioreq *ioreq;
 
+trace_xen_disk_free(xendev->name);
+
 if (blkdev->blk || blkdev->sring) {
 blk_disconnect(xendev);
 }
@@ -1355,6 +1396,7 @@ static int blk_free(struct XenDevice *xendev)
 

[Xen-devel] [PATCH 0/3] xen-disk: performance improvements

2017-06-20 Thread Paul Durrant
Paul Durrant (3):
  xen-disk: only advertize feature-persistent if grant copy is not
available
  xen-disk: add support for multi-page shared rings
  xen-disk: use an IOThread per instance

 hw/block/trace-events |   7 ++
 hw/block/xen_disk.c   | 200 --
 2 files changed, 168 insertions(+), 39 deletions(-)

-- 
2.11.0




[Xen-devel] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available

2017-06-20 Thread Paul Durrant
If grant copy is available then it will always be used in preference to
persistent maps. In this case feature-persistent should not be advertized
to the frontend, otherwise it may needlessly copy data into persistently
granted buffers.

Signed-off-by: Paul Durrant 
---
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Kevin Wolf 
Cc: Max Reitz 
---
 hw/block/xen_disk.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 3a22805fbc..9b06e3aa81 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -1023,11 +1023,18 @@ static int blk_init(struct XenDevice *xendev)
 
 blkdev->file_blk  = BLOCK_SIZE;
 
+blkdev->feature_grant_copy =
+(xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);
+
+xen_pv_printf(>xendev, 3, "grant copy operation %s\n",
+  blkdev->feature_grant_copy ? "enabled" : "disabled");
+
 /* fill info
  * blk_connect supplies sector-size and sectors
  */
 xenstore_write_be_int(>xendev, "feature-flush-cache", 1);
-xenstore_write_be_int(>xendev, "feature-persistent", 1);
+xenstore_write_be_int(>xendev, "feature-persistent",
+  !blkdev->feature_grant_copy);
 xenstore_write_be_int(>xendev, "info", info);
 
 blk_parse_discard(blkdev);
@@ -1202,12 +1209,6 @@ static int blk_connect(struct XenDevice *xendev)
 
 xen_be_bind_evtchn(>xendev);
 
-blkdev->feature_grant_copy =
-(xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);
-
-xen_pv_printf(>xendev, 3, "grant copy operation %s\n",
-  blkdev->feature_grant_copy ? "enabled" : "disabled");
-
 xen_pv_printf(>xendev, 1, "ok: proto %s, ring-ref %d, "
   "remote port %d, local port %d\n",
   blkdev->xendev.protocol, blkdev->ring_ref,
-- 
2.11.0




Re: [Xen-devel] Implementation of single-stepping for Xen on ARMv7

2017-06-20 Thread Julien Grall



On 06/19/2017 01:59 PM, Florian Jakobsmeier wrote:

Hello Julien,


Hi Florian,

Please try to configure your e-mail client to quote using '>' rather 
than tabulation. This makes the discussion easier to follow.



thank you for your answer and sorry for the delay.



2017-06-14 14:26 GMT+02:00 Julien Grall:




On 06/12/2017 10:34 AM, Florian Jakobsmeier wrote:

Dear all,


Hello Florian,


I don't have much experience with the debug registers, I have CCed
some folks who may have looked at it.

as part of my Bachelor's thesis I'm trying to implement
single-stepping functionality for Xen on ARMv7. My problem is
that I'm not able to trigger a hardware breakpoint which is
configured to use instruction address mismatch and to route the
exception to Xen.


You are looking at single-stepping for guest, right?


  Yes I'm trying to implement guest single stepping.


I took the x86 single_step implementation as a reference. To
test my implementation I extended the "xen-access" test tool in
order to forward the SS request from xen-access.c to
./xen/arch/arm/monitor.c, to the "arch_monitor_domctl_event"
function (just like the x86 implementation).

There, I set the necessary registers according to the ARM
Architecture Reference Manual (ARM DDI 0406C-b). My basic idea is to
perform the following steps (in this order):
1) Configure the system to route debug exceptions to Hyp Mode
2) Initialize one breakpoint for Address Mismatching in
Non-Secure PL1/PL0
3) Define the "to be compared" address as "~0x3" (which is all
1s except Bit[1:0])
4) Set the MDBGen to 1 in order to enable Monitor Debug Mode

To check whether or not my values are set in the registers I
print every value before and after manipulation to ensure that
my values are adopted.
To access the registers I used the already defined macros
(e.g. DBGBCR0), but for testing reasons I work with the general
definition (e.g. WRITE_SYSREG(#VALUE,p14,0,c0,c0,5) for DBGBCR0).

Preparation:

I ensured that the DBGen Signal is High, I checked the Debug
Version which is v7.1 (read from the DBGAUTHSTATUS). I also made
sure that the underlying system supports sufficient breakpoints.

These are the values I set in the different registers (in this
order again). Every bit that I don't mention is set to 0

- HDCR.{TDRA,TDOSA,TDA,TDE = 1}
which enables routing to Hyp. According to the ARM ARM setting
TDRA,TDOSA,TDA is required when setting TDE

- DBGBCR0.{BT=0b0100, SSC=0b01, PMC=0b11, E=0b1} = 0x404007
this should enable unlinked Address Mismatch for Non-Secure PL0
- DBGBVR0.{IA = ~0x3}
which sets every bit to 1 (this address should never be reached
as it is mismatched)

- DBGDSCREXT.{MDBGen=1}
which enables Monitor Debug Mode


With the value set in HVBAR (hyp_traps_vector in
/xen/arch/arm/arm32/entry.S) the generated HypTrap (HypTrap
instead of PrefetchAbort because of the routing) should be
handled in do_trap_guest_sync. In this method the hsr.ec bits
should indicate a Prefetch Abort exception (hsr.ec = 0x20)
whenever the breakpoint triggers.

I added a simple if statement to print a string when such a
exception was thrown.

Unfortunately these prints are never generated, which indicates
that either I'm looking for the exception handling in the
wrong location or my breakpoints are not correctly configured.

To check if my configuration is wrong, I also tried the KDB
configuration for the DBGBCR (which is DBGBCR=0x4001E7 as far as
I understood). But this changed nothing in the behaviour.

As Hardware I tested my code with an Arndale as well as a Odroid
XU board (Exynos 5250).

It would be great if anyone, who has experience with the ARM
architecture, could help me in finding the missing information
that is required to successfully set up an address mismatch
breakpoint and succesfully route the associated exceptions to Xen.


I've looked at the spec and your description seem to match it. Where
do you configure the debug registers? Is it the vm_event handler or
when returning to the guest vCPU?

OK, that's good to hear. As mentioned, my approach is to extend the 
xen-access test file, which sets the registers in monitor.c in 
/xen/xen/arch/arm (so from within the hypervisor). This routine is 
started from Dom0, so the execution starts in /tools/tests/xen-access 
and gets forwarded to this function. I "trigger" this event by 

Re: [Xen-devel] [PATCH 1/2] xen/livepatch: Clean up arch relocation handling

2017-06-20 Thread Konrad Rzeszutek Wilk
On Tue, Jun 20, 2017 at 01:56:28AM -0600, Jan Beulich wrote:
> >>> On 20.06.17 at 09:39,  wrote:
> > On 20/06/2017 08:36, Jan Beulich wrote:
> > On 19.06.17 at 20:18,  wrote:
> >>> On Wed, Jun 14, 2017 at 07:28:39PM +0100, Andrew Cooper wrote:
>  Having said that, there is no sanity check that r->r_offset is within
>  base->load_addr + sec->sh_size in arm32, whereas both arm64 and x86
>  appear to do this check.
> >>> True.
> >>>
> >>> And the tricky part (it was to me at least) was that ARM32 is all
> >>> REL and not RELA so the opcode gets modified after the operation.
> >>>
> >>> Which means it gets a bit complex to add a boundary check in
> >>> 'get_addend' .
> >>>
> >>> Hm, it would seem the best way is to add a
> >>>
> >>> if ( r->r_offset >= base->sec->sh_size ||   
> >>> (r->r_offset + sizeof(uint32_t)) > base->sec->sh_size ) 
> >> Where's the uint32_t coming from here?
> > 
> > ARM32.  It's a range check that (void *) is within r_offset, as it
> > (void *) + sizeof(disp) -1
> 
> But not all ARM32 relocations fiddle with a 32-bit word. Granted all

Correct. However 'dest' (which is what get_addend operates on) at this point
is the combination of base->load_addr + r->r_offset.

It is more of making sure that what 'get_addend' operates on is
within the livepatch and not somewhere outside of it.

Then get_addend can take a more fine-grained approach to figuring out
the next part - that is, based on the instruction, whether the offset
there is OK or not (and boy, the semantics of ARM32 ELF REL are
a complex beast).

> that livepatch code currently supports are, but baking something like
> this in makes future modifications more error prone.
> 
> Jan
> 



Re: [Xen-devel] [PATCH 2/2] xen/livepatch: Don't crash on encountering STN_UNDEF relocations

2017-06-20 Thread Konrad Rzeszutek Wilk
On Tue, Jun 20, 2017 at 01:15:18AM -0600, Jan Beulich wrote:
> >>> On 20.06.17 at 01:05,  wrote:
> > On 19/06/2017 19:30, Konrad Rzeszutek Wilk wrote:
> >> On Wed, Jun 14, 2017 at 12:49:21PM -0600, Jan Beulich wrote:
> >> Andrew Cooper  06/14/17 8:34 PM >>>
>  Well - I've got a livepatch with such a relocation.  It is probably a
>  livepatch build tools issue, but the question is whether Xen should ever
>  accept such a livepatch or not (irrespective of whether this exact
>  relocation is permitted within the ELF spec).
> >>> Since the spec explicitly mentions that case, I think we'd better support 
> > it.
> >>> But it wouldn't be the end of the world if we didn't, as presumably there
> >>> aren't that many use cases for it.
> >> OK. In that case:
> >>
> >> Acked-by: Konrad Rzeszutek Wilk 
> 
> I have to admit that I'm surprised by that, not only because of
> what Andrew says below, but also because imo the patch would
> need to be done somewhat differently, as outlined earlier
> (making STN_UNDEF less of a special case).

My line of thinking was - if the ELF spec is OK, then lets support it.

But then your point about just giving -EOPNOTSUPP is an excellent
way of "supporting" it - and better yet - we can give the system
admin a nice warning: "Fix your livepatch build-tool as your livepatch
is trying to dereference a NULL pointer, which is unhealthy."

(or such).

Andrew, are you OK with posting a patch like that?

> 
> >> Do you think it would be possible to generate an test-case for this
> >> in arch/test/livepatch?
> > 
> > I can trivially cause this situation to occur with the current build
> > tools, but we are currently presuming a build tools bug to be the
> > underlying issue behind getting a STN_UNDEF relocation in the livepatch.
> > 
> > Given that a STN_UNDEF relocation (appears to) mean a NULL dereference
> > once the relocations are evaluated, I am not happy with supporting such
> > a case.
> 
> Well, quite clearly this can be of use only to produce constants,
> not to produce pointers (unless chained ["expression"] relocations
> are being used, where the result of one element in the chain is the
> addend of the next one, albeit even then this would effectively be
> a NOP relocation, so may be useful only when post-editing binaries
> where the tool doesn't want to change [relocation] section sizes).
> 
> > Therefore, I'm going to insist that we take a concrete decision as to
> > what to do in the hypervisor code, before adding a test case, and
> > advocate for excluding it outright rather than tolerating it in the
> > (certain?) knowledge that Xen will subsequently crash.
> 
> As per the explanation above, we can't tell whether Xen will
> subsequently crash, as we don't know what it is that is being
> relocated by such a relocation. While, as indicated before, I'd like
> to see us support everything the standard mandates, I wouldn't
> view it as a big problem to simply return -EOPNOTSUPP for this case
> for the time being.
> 
> Jan
> 



Re: [Xen-devel] preparations 4.7.3 and 4.6.6

2017-06-20 Thread Lars Kurth
Hi all,
I am not going to be able to do the website work until Monday, as I am
travelling until late Friday.
Lars

On 20/06/2017, 20:51, "Wei Liu"  wrote:

>On Fri, Jun 09, 2017 at 06:07:56AM -0600, Jan Beulich wrote:
>> All,
>> 
>> with the goal of releasing in about 3 weeks time, please point out
>> backport candidates you find missing from the respective staging
>> branches, but which you consider relevant. Please note that 4.6.6
>> is expected to be the last xenproject.org managed release from
>> its branch.
>
>I don't think I have any backport to do. Please let me know when I
>should tag the tree.



Re: [Xen-devel] clean up and modularize arch dma_mapping interface

2017-06-20 Thread Christoph Hellwig
On Tue, Jun 20, 2017 at 11:19:02AM +0200, Daniel Vetter wrote:
> Ack for the 2 drm patches, but I can also pick them up through drm-misc if
> you prefer that (but then it'll be 4.14).

Nah, I plan to set up a dma-mapping tree so that we'll have a common
place for dma-mapping work.



Re: [Xen-devel] new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2

2017-06-20 Thread Christoph Hellwig
On Tue, Jun 20, 2017 at 11:04:00PM +1000, Stephen Rothwell wrote:
> git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git#dma-mapping-next
> 
> Contacts: Marek Szyprowski and Kyungmin Park (cc'd)
> 
> I have called your tree dma-mapping-hch for now.  The other tree has
> not been updated since 4.9-rc1 and I am not sure how general it is.
> Marek, Kyungmin, any comments?

I'd be happy to join efforts - co-maintainers and reviewers are always
welcome.



Re: [Xen-devel] new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2

2017-06-20 Thread Christoph Hellwig
On Tue, Jun 20, 2017 at 02:14:36PM +0100, Robin Murphy wrote:
> Hi Christoph,
> 
> On 20/06/17 13:41, Christoph Hellwig wrote:
> > On Fri, Jun 16, 2017 at 08:10:15PM +0200, Christoph Hellwig wrote:
> >> I plan to create a new dma-mapping tree to collect all this work.
> >> Any volunteers for co-maintainers, especially from the iommu gang?
> > 
> > Ok, I've created the new tree:
> > 
> >git://git.infradead.org/users/hch/dma-mapping.git for-next
> > 
> > Gitweb:
> > 
> >
> > http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/for-next
> > 
> > And below is the patch to add the MAINTAINERS entry, additions welcome.
> 
> I'm happy to be a reviewer, since I've been working in this area for
> some time, particularly with the dma-iommu code and arm64 DMA ops.

Great, I'll add you!



Re: [Xen-devel] new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2

2017-06-20 Thread Robin Murphy
Hi Christoph,

On 20/06/17 13:41, Christoph Hellwig wrote:
> On Fri, Jun 16, 2017 at 08:10:15PM +0200, Christoph Hellwig wrote:
>> I plan to create a new dma-mapping tree to collect all this work.
>> Any volunteers for co-maintainers, especially from the iommu gang?
> 
> Ok, I've created the new tree:
> 
>git://git.infradead.org/users/hch/dma-mapping.git for-next
> 
> Gitweb:
> 
>
> http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/for-next
> 
> And below is the patch to add the MAINTAINERS entry, additions welcome.

I'm happy to be a reviewer, since I've been working in this area for
some time, particularly with the dma-iommu code and arm64 DMA ops.

Robin.

> Stephen, can you add this to linux-next?
> 
> ---
> From 335979c41912e6c101a20b719862b2d837370df1 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Tue, 20 Jun 2017 11:17:30 +0200
> Subject: MAINTAINERS: add entry for dma mapping helpers
> 
> This code has been spread between getting in through arch trees, the iommu
> tree, -mm and the drivers tree.  There will be a lot of work in this area,
> including consolidating various arch implementations into more common
> code, so ensure we have a proper git tree that facilitates cooperation with
> the architecture maintainers.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  MAINTAINERS | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 09b5ab6a8a5c..56859d53a424 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2595,6 +2595,19 @@ S: Maintained
>  F:   net/bluetooth/
>  F:   include/net/bluetooth/
>  
> +DMA MAPPING HELPERS
> +M:   Christoph Hellwig 
> +L:   linux-ker...@vger.kernel.org
> +T:   git git://git.infradead.org/users/hch/dma-mapping.git
> +W:   http://git.infradead.org/users/hch/dma-mapping.git
> +S:   Supported
> +F:   lib/dma-debug.c
> +F:   lib/dma-noop.c
> +F:   lib/dma-virt.c
> +F:   drivers/base/dma-mapping.c
> +F:   drivers/base/dma-coherent.c
> +F:   include/linux/dma-mapping.h
> +
>  BONDING DRIVER
>  M:   Jay Vosburgh 
>  M:   Veaceslav Falico 
> 




Re: [Xen-devel] [PATCH v3 2/9] x86/ecam: add handlers for the PVH Dom0 MMCFG areas

2017-06-20 Thread Jan Beulich
>>> On 20.06.17 at 13:56,  wrote:
> On Fri, May 19, 2017 at 07:25:22AM -0600, Jan Beulich wrote:
>> >>> On 27.04.17 at 16:35,  wrote:
>> > +{
>> > +unsigned int i;
>> > +int rc;
>> > +
>> > +for ( i = 0; i < pci_mmcfg_config_num; i++ )
>> > +{
>> > +rc = register_vpci_ecam_handler(d, pci_mmcfg_config[i].address,
>> > +
>> > pci_mmcfg_config[i].start_bus_number,
>> > +
>> > pci_mmcfg_config[i].end_bus_number,
>> > +pci_mmcfg_config[i].pci_segment);
>> > +if ( rc )
>> > +return rc;
>> > +}
>> > +
>> > +return 0;
>> > +}
>> 
>> What about regions becoming available only post-boot?
> 
> This is not yet supported. It needs to be implemented using the
> PHYSDEVOP_pci_mmcfg_reserved hypercall.

But then the patch here is incomplete.

>> > +  unsigned int len, unsigned long *data)
>> > +{
>> > +struct domain *d = v->domain;
>> > +struct hvm_ecam *ecam;
>> > +unsigned int bus, devfn, reg;
>> > +uint32_t data32;
>> > +int rc;
>> > +
>> > +vpci_lock(d);
>> > +ecam = vpci_ecam_find(d, addr);
>> > +if ( !ecam )
>> > +{
>> > +vpci_unlock(d);
>> > +return X86EMUL_UNHANDLEABLE;
>> > +}
>> > +
>> > +vpci_ecam_decode_addr(ecam, addr, , , );
>> > +
>> > +if ( vpci_access_check(reg, len) || reg >= 0xfff )
>> 
>> So this function iirc allows only 1-, 2-, and 4-byte accesses. Other
>> than with port I/O, MMCFG allows wider ones, and once again I
>> don't think hardware would raise any kind of fault in such a case.
>> The general expectation is for the fabric to split such accesses.
> 
> Hm, the PCIe spec is not authoritative in this regard; it states that
> supporting 8B accesses is not mandatory. Xen/Linux/FreeBSD will never
> attempt any access > 4B, hence I haven't coded this case.
> 
> Would you be fine with leaving this for later, or would you rather
> have it implemented as part of this series?

Since it shouldn't be meaningfully more code, I'd prefer if it was
done right away. Otherwise I'd have to ask for a "fixme" comment,
and I'd rather avoid such considering the PVHv1 history.

Jan




Re: [Xen-devel] [GIT PULL] (xen) stable/for-jens-3.14

2017-06-20 Thread Jens Axboe
On 06/20/2017 06:13 AM, Konrad Rzeszutek Wilk wrote:
> Hey Jens,
> 
> Please git pull the following branch:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git 
> stable/for-jens-4.12
> 
> which has security and memory leak fixes in xen block driver.

for-jens-3.14? Anyway, looks fine, pulled for 4.12.

-- 
Jens Axboe




Re: [Xen-devel] new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2

2017-06-20 Thread Stephen Rothwell
Hi Christoph,

On Tue, 20 Jun 2017 14:41:40 +0200 Christoph Hellwig  wrote:
>
> On Fri, Jun 16, 2017 at 08:10:15PM +0200, Christoph Hellwig wrote:
> > I plan to create a new dma-mapping tree to collect all this work.
> > Any volunteers for co-maintainers, especially from the iommu gang?  
> 
> Ok, I've created the new tree:
> 
>git://git.infradead.org/users/hch/dma-mapping.git for-next
> 
> Gitweb:
> 
>
> http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/for-next
> 
> And below is the patch to add the MAINTAINERS entry, additions welcome.
> 
> Stephen, can you add this to linux-next?

Added from tomorrow.

I have another tree called dma-mapping:

git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git#dma-mapping-next

Contacts: Marek Szyprowski and Kyungmin Park (cc'd)

I have called your tree dma-mapping-hch for now.  The other tree has
not been updated since 4.9-rc1 and I am not sure how general it is.
Marek, Kyungmin, any comments?

Thanks for adding your subsystem tree as a participant of linux-next.  As
you may know, this is not a judgement of your code.  The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window. 

You will need to ensure that the patches/commits in your tree/series have
been:
 * submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
 * posted to the relevant mailing list,
 * reviewed by you (or another maintainer of your subsystem tree),
 * successfully unit tested, and 
 * destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch).  It is allowed to be rebased if you deem it necessary.

-- 
Cheers,
Stephen Rothwell 
s...@canb.auug.org.au



Re: [Xen-devel] [PATCH] memory: don't suppress P2M update in populate_physmap()

2017-06-20 Thread Jan Beulich
>>> On 20.06.17 at 14:51,  wrote:
> On 06/20/2017 01:40 PM, Andrew Cooper wrote:
>> On 20/06/17 13:39, Julien Grall wrote:
>>> On 06/20/2017 09:37 AM, Andrew Cooper wrote:
 On 20/06/17 07:19, Jan Beulich wrote:
> Commit d18627583d ("memory: don't hand MFN info to translated guests")
> wrongly added a null-handle check there - just like stated in its
> description for memory_exchange(), the array is also an input for
> populate_physmap() (and hence can't reasonably be null). I have no idea
> how I've managed to overlook this.
>
> Signed-off-by: Jan Beulich 

 Acked-by: Andrew Cooper 
>>>
>>> Am I correct that this is not a bug and only a pointless check?
>> 
>> This is a partial reversion of d18627583d and needs to be included in
>> 4.9, to avoid a regression.
> 
> Would you mind explaining why this would introduce a regression? AFAICT
> the check is just redundant, so keeping it is not that bad.

Afaict there would be a regression only if someone invoked the
hypercall with a null handle (but with valid data at address zero).
Still, I agree with Andrew that we'd better include this in 4.9.

Jan




[Xen-devel] [PATCH v2] xen: idle_loop: either deal with tasklets or go idle

2017-06-20 Thread Dario Faggioli
In fact, there are two kinds of tasklets: vCPU and
softirq context. When we want to do vCPU context tasklet
work, we force the idle vCPU (of a particular pCPU) into
execution, and run it from there.

This means there are two possible reasons for choosing
to run the idle vCPU:
1) we want a pCPU to go idle,
2) we want to run some vCPU context tasklet work.

If we're in case 2), it does not make sense to even
try to go idle (as the check will _always_ fail).

This patch rearranges the code of the body of idle
vCPUs, so that we actually check whether we are in
case 1) or 2), and act accordingly.

As a matter of fact, this also means that we do not
check if there's any tasklet work to do after waking
up from idle. This is not a problem, because:
a) for softirq context tasklets, if any is queued
   "during" wakeup from idle, TASKLET_SOFTIRQ is
   raised, and the call to do_softirq() (which is still
   happening *after* the wakeup) will take care of it;
b) for vCPU context tasklets, if any is queued "during"
   wakeup from idle, SCHEDULE_SOFTIRQ is raised and
   do_softirq() (happening after the wakeup) calls
   the scheduler. The scheduler sees that there is
   tasklet work pending and confirms the idle vCPU
   in execution, which then will get to execute
   do_tasklet().

Signed-off-by: Dario Faggioli 
---
Cc: Andrew Cooper 
Cc: Julien Grall 
Cc: Stefano Stabellini 
Cc: Boris Ostrovsky 
Cc: Jan Beulich 
---
Changes from v1:
* drop the pointless parentheses and indirection of pm_idle (in x86's idle
  loop);
* remove the 'cpu' input parameter to do_tasklet(), as suggested during review;
* in do_tasklet(), convert the check that there is tasklet work to do into an
  ASSERT() to the same effect, as suggested during review;
* add a comment in cpu_is_haltable() on why we check the per-cpu
  tasklet_work_to_do variable directly, rather than calling the new
  tasklet_work_to_do() helper.
---
 xen/arch/arm/domain.c |   21 ++---
 xen/arch/x86/domain.c |   12 +---
 xen/common/tasklet.c  |   12 
 xen/include/xen/sched.h   |5 +
 xen/include/xen/tasklet.h |   10 ++
 5 files changed, 46 insertions(+), 14 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 76310ed..2dc8b0a 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -41,20 +41,27 @@ DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
 
 void idle_loop(void)
 {
+unsigned int cpu = smp_processor_id();
+
 for ( ; ; )
 {
-if ( cpu_is_offline(smp_processor_id()) )
+if ( cpu_is_offline(cpu) )
 stop_cpu();
 
-local_irq_disable();
-if ( cpu_is_haltable(smp_processor_id()) )
+/* Are we here for running vcpu context tasklets, or for idling? */
+if ( unlikely(tasklet_work_to_do(cpu)) )
+do_tasklet();
+else
 {
-dsb(sy);
-wfi();
+local_irq_disable();
+if ( cpu_is_haltable(cpu) )
+{
+dsb(sy);
+wfi();
+}
+local_irq_enable();
 }
-local_irq_enable();
 
-do_tasklet();
 do_softirq();
 /*
  * We MUST be last (or before dsb, wfi). Otherwise after we get the
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 49388f4..3a061a9 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -112,12 +112,18 @@ static void play_dead(void)
 
 static void idle_loop(void)
 {
+unsigned int cpu = smp_processor_id();
+
 for ( ; ; )
 {
-if ( cpu_is_offline(smp_processor_id()) )
+if ( cpu_is_offline(cpu) )
 play_dead();
-(*pm_idle)();
-do_tasklet();
+
+/* Are we here for running vcpu context tasklets, or for idling? */
+if ( unlikely(tasklet_work_to_do(cpu)) )
+do_tasklet();
+else
+pm_idle();
 do_softirq();
 /*
  * We MUST be last (or before pm_idle). Otherwise after we get the
diff --git a/xen/common/tasklet.c b/xen/common/tasklet.c
index 365a777..0f0a6f8 100644
--- a/xen/common/tasklet.c
+++ b/xen/common/tasklet.c
@@ -111,11 +111,15 @@ void do_tasklet(void)
 struct list_head *list = _cpu(tasklet_list, cpu);
 
 /*
- * Work must be enqueued *and* scheduled. Otherwise there is no work to
- * do, and/or scheduler needs to run to update idle vcpu priority.
+ * We want to be sure any caller has checked that a tasklet is both
+ * enqueued and scheduled, before calling this. And, if the caller has
+ * actually checked, it's not an issue that we are outside of the
+ * critical region, in fact:
+ * - TASKLET_enqueued is cleared only here,
 * - TASKLET_scheduled is only cleared when schedule() finds it set,
+ *   without 

Re: [Xen-devel] preparations 4.7.3 and 4.6.6

2017-06-20 Thread Wei Liu
On Fri, Jun 09, 2017 at 06:07:56AM -0600, Jan Beulich wrote:
> All,
> 
> with the goal of releasing in about 3 weeks time, please point out
> backport candidates you find missing from the respective staging
> branches, but which you consider relevant. Please note that 4.6.6
> is expected to be the last xenproject.org managed release from
> its branch.

I don't think I have any backport to do. Please let me know when I
should tag the tree.



Re: [Xen-devel] [PATCH] memory: don't suppress P2M update in populate_physmap()

2017-06-20 Thread Julien Grall

Hi,

On 06/20/2017 01:40 PM, Andrew Cooper wrote:

On 20/06/17 13:39, Julien Grall wrote:

Hi,

On 06/20/2017 09:37 AM, Andrew Cooper wrote:

On 20/06/17 07:19, Jan Beulich wrote:

Commit d18627583d ("memory: don't hand MFN info to translated guests")
wrongly added a null-handle check there - just like stated in its
description for memory_exchange(), the array is also an input for
populate_physmap() (and hence can't reasonably be null). I have no idea
how I've managed to overlook this.

Signed-off-by: Jan Beulich 


Acked-by: Andrew Cooper 


Am I correct that this is not a bug and only a pointless check?


This is a partial reversion of d18627583d and needs to be included in
4.9, to avoid a regression.


Would you mind explaining why this would introduce a regression? AFAICT
the check is just redundant, so keeping it is not that bad.


Cheers,

--
Julien Grall



Re: [Xen-devel] [PATCH v4 1/2] x86/monitor: add masking support for write_ctrlreg events

2017-06-20 Thread Wei Liu
On Mon, Jun 19, 2017 at 03:24:38PM +0300, Petre Pircalabu wrote:
> Add support for filtering out the write_ctrlreg monitor events if they
> are generated only by changing certains bits.
> A new parameter (bitmask) was added to the xc_monitor_write_ctrlreg
> function in order to mask the event generation if the changed bits are
> set.
> 
> Signed-off-by: Petre Pircalabu 
> Acked-by: Tamas K Lengyel 
> ---
>  tools/libxc/include/xenctrl.h | 2 +-
>  tools/libxc/xc_monitor.c  | 5 -

Acked-by: Wei Liu 



[Xen-devel] new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2

2017-06-20 Thread Christoph Hellwig
On Fri, Jun 16, 2017 at 08:10:15PM +0200, Christoph Hellwig wrote:
> I plan to create a new dma-mapping tree to collect all this work.
> Any volunteers for co-maintainers, especially from the iommu gang?

Ok, I've created the new tree:

   git://git.infradead.org/users/hch/dma-mapping.git for-next

Gitweb:

   
http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/for-next

And below is the patch to add the MAINTAINERS entry, additions welcome.

Stephen, can you add this to linux-next?

---
From 335979c41912e6c101a20b719862b2d837370df1 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Tue, 20 Jun 2017 11:17:30 +0200
Subject: MAINTAINERS: add entry for dma mapping helpers

This code has been spread between getting in through arch trees, the iommu
tree, -mm and the drivers tree.  There will be a lot of work in this area,
including consolidating various arch implementations into more common
code, so ensure we have a proper git tree that facilitates cooperation with
the architecture maintainers.

Signed-off-by: Christoph Hellwig 
---
 MAINTAINERS | 13 +
 1 file changed, 13 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 09b5ab6a8a5c..56859d53a424 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2595,6 +2595,19 @@ S:   Maintained
 F: net/bluetooth/
 F: include/net/bluetooth/
 
+DMA MAPPING HELPERS
+M: Christoph Hellwig 
+L: linux-ker...@vger.kernel.org
+T: git git://git.infradead.org/users/hch/dma-mapping.git
+W: http://git.infradead.org/users/hch/dma-mapping.git
+S: Supported
+F: lib/dma-debug.c
+F: lib/dma-noop.c
+F: lib/dma-virt.c
+F: drivers/base/dma-mapping.c
+F: drivers/base/dma-coherent.c
+F: include/linux/dma-mapping.h
+
 BONDING DRIVER
 M: Jay Vosburgh 
 M: Veaceslav Falico 
-- 
2.11.0




Re: [Xen-devel] [PATCH] memory: don't suppress P2M update in populate_physmap()

2017-06-20 Thread Andrew Cooper
On 20/06/17 13:39, Julien Grall wrote:
> Hi,
>
> On 06/20/2017 09:37 AM, Andrew Cooper wrote:
>> On 20/06/17 07:19, Jan Beulich wrote:
>>> Commit d18627583d ("memory: don't hand MFN info to translated guests")
>>> wrongly added a null-handle check there - just like stated in its
>>> description for memory_exchange(), the array is also an input for
>>> populate_physmap() (and hence can't reasonably be null). I have no idea
>>> how I've managed to overlook this.
>>>
>>> Signed-off-by: Jan Beulich 
>>
>> Acked-by: Andrew Cooper 
>
> Am I correct that this is not a bug and only a pointless check?

This is a partial reversion of d18627583d and needs to be included in
4.9, to avoid a regression.

~Andrew



Re: [Xen-devel] [PATCH] memory: don't suppress P2M update in populate_physmap()

2017-06-20 Thread Julien Grall

Hi,

On 06/20/2017 09:37 AM, Andrew Cooper wrote:

On 20/06/17 07:19, Jan Beulich wrote:

Commit d18627583d ("memory: don't hand MFN info to translated guests")
wrongly added a null-handle check there - just like stated in its
description for memory_exchange(), the array is also an input for
populate_physmap() (and hence can't reasonably be null). I have no idea
how I've managed to overlook this.

Signed-off-by: Jan Beulich 


Acked-by: Andrew Cooper 


Am I correct that this is not a bug and only a pointless check?

Cheers,

--
Julien Grall



[Xen-devel] Xen Security Advisory 216 - blkif responses leak backend stack data

2017-06-20 Thread Xen . org security team
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Xen Security Advisory XSA-216
  version 4

blkif responses leak backend stack data

UPDATES IN VERSION 4


Move "For patch:" Reported-by to patches as intended.

ISSUE DESCRIPTION
=

The block interface response structure has some discontiguous fields.
Certain backends populate the structure fields of an otherwise
uninitialized instance of this structure on their stacks, leaking
data through the (internal or trailing) padding field.
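The leak follows from the structure's memory layout on a typical LP64/x86-64 ABI. The demo struct below mirrors the 64-bit response flavor (the name blkif_resp_demo is ours; the field names follow the block interface):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the 64-bit blkif response layout on a typical x86-64 ABI. */
struct blkif_resp_demo {
    uint64_t id;        /* bytes 0..7 */
    uint8_t  operation; /* byte 8; byte 9 is internal padding */
    int16_t  status;    /* bytes 10..11; bytes 12..15 are trailing padding */
};

/* Bytes of the struct not covered by any field.  Assigning only the
 * three fields of a stack-allocated instance leaves these bytes
 * holding whatever stack data was there before - that is the leak. */
static size_t padding_bytes(void)
{
    return sizeof(struct blkif_resp_demo)
           - (sizeof(uint64_t) + sizeof(uint8_t) + sizeof(int16_t));
}
```

memset()ing the instance before filling it, or writing the fields directly into the guest-visible ring as the fixes do, prevents those padding bytes from being copied out.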

IMPACT
==

A malicious unprivileged guest may be able to obtain sensitive
information from the host or other guests.

VULNERABLE SYSTEMS
==

All Linux versions supporting the xen-blkback, blkback, or blktap
drivers are vulnerable.

FreeBSD, NetBSD and Windows (with or without PV drivers) are not
vulnerable (either because they do not have backends at all, or
because they use a different implementation technique which does not
suffer from this problem).

All qemu versions supporting the Xen block backend are vulnerable.  The
qemu-xen-traditional code base does not include such code, so is not
vulnerable.  Note that an instance of qemu will be spawned to provide
the backend for most non-raw-format disks; so you may need to apply the
patch to qemu even if you use only PV guests.

MITIGATION
==

There's no mitigation available for x86 PV and ARM guests.

For x86 HVM guests it may be possible to change the guest
configuration such that a fully virtualized disk is being made
available instead.  However, this would normally entail changes inside
the guest itself.

CREDITS
===

This issue was discovered by Anthony Perard of Citrix.

RESOLUTION
==

Applying the appropriate attached patch resolves this issue.

xsa216-linux-4.11.patch   Linux 4.5 ... 4.11
xsa216-linux-4.4.patchLinux 3.3 ... 4.4
xsa216-qemuu.patchqemu-upstream master, 4.8
xsa216-qemuu-4.7.patchqemu-upstream 4.7, 4.6
xsa216-qemuu-4.5.patchqemu-upstream 4.5
xsa216-linux-2.6.18-xen.patch linux-2.6.18-xen.hg

$ sha256sum xsa216*
d316e16f8da2078966e9d7d516dd0a9ed5a29c3bc479974374c8fa778859913d  
xsa216-linux-2.6.18-xen.patch
4440fe324b61baf0f3f5a73352c4d9ac6f94917e216d8421263a5e67445852db  
xsa216-linux-4.4.patch
eb24bfc0303e13e08fd3710463aea139a92a3f83db7f35119c4d3831154a6453  
xsa216-linux-4.11.patch
b4b8f68fa05d718c5be7023c84d942e43725bcc563ea15556ee9646f6f9bf7e7  
xsa216-qemuu.patch
4fc3665ff07ec79fb31ac66a3fd360a45b7ec546c549c04284f0128ad0c5beba  
xsa216-qemuu-4.5.patch
a0e0dfd5ea2643ae14c220124194388017a3656db3e6ce430913cda800c43aad  
xsa216-qemuu-4.7.patch
$

DEPLOYMENT DURING EMBARGO
=

Deployment of the patches described above (or others which are
substantially similar) is permitted during the embargo, even on
public-facing systems with untrusted guest users and administrators.

However, deployment of the mitigation is NOT permitted (except where
all the affected systems and VMs are administered and used only by
organisations which are members of the Xen Project Security Issues
Predisclosure List).  Specifically, deployment on public cloud systems
is NOT permitted.  This is because this produces a guest-visible
change which will indicate which component contains the vulnerability.

Additionally, distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-BEGIN PGP SIGNATURE-
Version: GnuPG v1

iQEcBAEBCAAGBQJZSRYiAAoJEIP+FMlX6CvZILQIALI17G6L+BIr6rXHglleL6Lz
E9Rvlng8K3e5088Hzs5gwq0c9jeK7i9B8PIjdgTH8OS1YjwpWF4wdPveSNACules
590SQVdwN2+Q1oTqdEECnaaCdl7eiEiv+2vRr+LYXNSLJuIRclnc/Fv3nTz/iuTM
5vwIVS/rpdETDBcMJVbCRvUbMeCx/ZM8+lNmEe20QP6h++pmc8wT7B54yGVwk6LT
eknzRMFYhUqcF8eLTJ/QyHf94x1mujVCHNKbOXkMQ27lWAJ5Jt2ut0IZeA6CFAlw
j/u8azGv9VIpXGLp2R1OWPYbEYeAzvjNC7+qoixiscSvfPkiSTfAv7pmr52jvGg=
=+gya
-END PGP SIGNATURE-


xsa216-linux-2.6.18-xen.patch
Description: Binary data


xsa216-linux-4.4.patch
Description: Binary data


xsa216-linux-4.11.patch
Description: Binary data


xsa216-qemuu.patch
Description: Binary data


xsa216-qemuu-4.5.patch
Description: Binary data


xsa216-qemuu-4.7.patch
Description: Binary data

[Xen-devel] [GIT PULL] (xen) stable/for-jens-3.14

2017-06-20 Thread Konrad Rzeszutek Wilk
Hey Jens,

Please git pull the following branch:

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git 
stable/for-jens-4.12

which has security and memory leak fixes in xen block driver.

Thank you.

Jan Beulich (1):
  xen-blkback: don't leak stack data via response ring

Juergen Gross (3):
  xen/blkback: fix disconnect while I/Os in flight
  xen/blkback: don't free be structure too early
  xen/blkback: don't use xen_blkif_get() in xen-blkback kthread


 drivers/block/xen-blkback/blkback.c | 26 --
 drivers/block/xen-blkback/common.h  | 26 ++
 drivers/block/xen-blkback/xenbus.c  | 15 ---
 3 files changed, 26 insertions(+), 41 deletions(-)



[Xen-devel] [PATCH] xen/disk: don't leak stack data via response ring

2017-06-20 Thread Jan Beulich
Rather than constructing a local structure instance on the stack, fill
the fields directly on the shared ring, just like other (Linux)
backends do. Build on the fact that all response structure flavors are
actually identical (the old code did make this assumption too).

This is XSA-216.

Reported by: Anthony Perard 
Signed-off-by: Jan Beulich 
Reviewed-by: Konrad Rzeszutek Wilk 
Acked-by: Anthony PERARD 
---
v2: Add QEMU_PACKED to fix handling 32-bit guests by 64-bit qemu.

--- a/hw/block/xen_blkif.h
+++ b/hw/block/xen_blkif.h
@@ -14,9 +14,6 @@
 struct blkif_common_request {
 char dummy;
 };
-struct blkif_common_response {
-char dummy;
-};
 
 /* i386 protocol version */
 #pragma pack(push, 4)
@@ -36,13 +33,7 @@ struct blkif_x86_32_request_discard {
 blkif_sector_t sector_number;/* start sector idx on disk (r/w only)  */
 uint64_t   nr_sectors;   /* # of contiguous sectors to discard   */
 };
-struct blkif_x86_32_response {
-uint64_tid;  /* copied from request */
-uint8_t operation;   /* copied from request */
-int16_t status;  /* BLKIF_RSP_???   */
-};
 typedef struct blkif_x86_32_request blkif_x86_32_request_t;
-typedef struct blkif_x86_32_response blkif_x86_32_response_t;
 #pragma pack(pop)
 
 /* x86_64 protocol version */
@@ -62,20 +53,14 @@ struct blkif_x86_64_request_discard {
 blkif_sector_t sector_number;/* start sector idx on disk (r/w only)  */
 uint64_t   nr_sectors;   /* # of contiguous sectors to discard   */
 };
-struct blkif_x86_64_response {
-uint64_t   __attribute__((__aligned__(8))) id;
-uint8_t operation;   /* copied from request */
-int16_t status;  /* BLKIF_RSP_???   */
-};
 typedef struct blkif_x86_64_request blkif_x86_64_request_t;
-typedef struct blkif_x86_64_response blkif_x86_64_response_t;
 
 DEFINE_RING_TYPES(blkif_common, struct blkif_common_request,
-  struct blkif_common_response);
+  struct blkif_response);
 DEFINE_RING_TYPES(blkif_x86_32, struct blkif_x86_32_request,
-  struct blkif_x86_32_response);
+  struct blkif_response QEMU_PACKED);
 DEFINE_RING_TYPES(blkif_x86_64, struct blkif_x86_64_request,
-  struct blkif_x86_64_response);
+  struct blkif_response);
 
 union blkif_back_rings {
 blkif_back_ring_tnative;
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -769,31 +769,30 @@ static int blk_send_response_one(struct
 struct XenBlkDev  *blkdev = ioreq->blkdev;
 int   send_notify   = 0;
 int   have_requests = 0;
-blkif_response_t  resp;
-void  *dst;
-
-resp.id= ioreq->req.id;
-resp.operation = ioreq->req.operation;
-resp.status= ioreq->status;
+blkif_response_t  *resp;
 
 /* Place on the response ring for the relevant domain. */
 switch (blkdev->protocol) {
 case BLKIF_PROTOCOL_NATIVE:
-dst = RING_GET_RESPONSE(>rings.native, 
blkdev->rings.native.rsp_prod_pvt);
+resp = RING_GET_RESPONSE(>rings.native,
+ blkdev->rings.native.rsp_prod_pvt);
 break;
 case BLKIF_PROTOCOL_X86_32:
-dst = RING_GET_RESPONSE(>rings.x86_32_part,
-blkdev->rings.x86_32_part.rsp_prod_pvt);
+resp = RING_GET_RESPONSE(>rings.x86_32_part,
+ blkdev->rings.x86_32_part.rsp_prod_pvt);
 break;
 case BLKIF_PROTOCOL_X86_64:
-dst = RING_GET_RESPONSE(>rings.x86_64_part,
-blkdev->rings.x86_64_part.rsp_prod_pvt);
+resp = RING_GET_RESPONSE(>rings.x86_64_part,
+ blkdev->rings.x86_64_part.rsp_prod_pvt);
 break;
 default:
-dst = NULL;
 return 0;
 }
-memcpy(dst, , sizeof(resp));
+
+resp->id= ioreq->req.id;
+resp->operation = ioreq->req.operation;
+resp->status= ioreq->status;
+
 blkdev->rings.common.rsp_prod_pvt++;
 
 RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(>rings.common, send_notify);



[Xen-devel] Xen Security Advisory 218 - Races in the grant table unmap code

2017-06-20 Thread Xen . org security team
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Xen Security Advisory XSA-218
  version 4

 Races in the grant table unmap code

UPDATES IN VERSION 4


Adjust last patch description and add review tag.

Public release.

ISSUE DESCRIPTION
=

We have discovered two bugs in the code unmapping grant references.

* When a grant had been mapped twice by a backend domain, and then
unmapped by two concurrent unmap calls, the frontend may be informed
that the page had no further mappings when the first call completed rather
than when the second call completed.

* A race triggerable by an unprivileged guest could cause a grant
maptrack entry for grants to be "freed" twice.  The ultimate effect of
this would be for maptrack entries for a single domain to be re-used.
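The effect of freeing an entry twice on a simple free list can be illustrated with a toy handle allocator in the style of a maptrack free list. This is a conceptual sketch, not the hypervisor's maptrack code:

```c
#include <assert.h>

#define NENTS 8

/* Toy free list: entries are chained through next_free[], head holds
 * the index of the next free entry, -1 terminates the list. */
static int next_free[NENTS];
static int head = -1;

static void put_entry(int idx)      /* free an entry */
{
    next_free[idx] = head;
    head = idx;
}

static int get_entry(void)          /* allocate; -1 when exhausted */
{
    int idx = head;
    if (idx != -1)
        head = next_free[idx];
    return idx;
}
```

Freeing the same index twice makes the entry chain to itself, so two subsequent allocations hand out the same entry to two users, which is the re-use the advisory describes.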

IMPACT
==

For the first issue, for a short window of time, a malicious backend
could still read and write memory that the frontend thought was its
own again.  Depending on the usage, this could be either an
information leak, or a backend-to-frontend privilege escalation.

The second issue is more difficult to analyze. It can probably cause
reference counts to leak, preventing memory from being freed on domain
destruction (denial-of-service), but information leakage or host
privilege escalation cannot be ruled out.

VULNERABLE SYSTEMS
==

All versions of Xen are vulnerable.

Both ARM and x86 are vulnerable.

On x86, systems with either PV or HVM guests are vulnerable.

MITIGATION
==

None.

CREDITS
===

This issue was discovered by Jann Horn of Google Project Zero.

RESOLUTION
==

Applying the appropriate set of attached patches resolves this issue.

xsa218-unstable/*.patchxen-unstable
xsa218-4.8/*.patch Xen 4.8.x
xsa218-4.7/*.patch Xen 4.7.x
xsa218-4.6/*.patch Xen 4.6.x
xsa218-4.5/*.patch Xen 4.5.x

$ sha256sum xsa218*/*
6f5e588edb6d3f0a37b89235e95cdcc7ca73cdff236d86b65e6f608bd15b03ec  
xsa218-unstable/0001-gnttab-fix-unmap-pin-accounting-race.patch
5cb85f0aaa19ff343fc51b08addbf37d62352774115acd28eb18a73f67507e21  
xsa218-unstable/0002-gnttab-Avoid-potential-double-put-of-maptrack-entry.patch
f5f3d27ce2829b3aa5e09b216bf9afcb1dc6b1f9f3b3a0f3ebfe5a68b4948aef  
xsa218-unstable/0003-gnttab-correct-maptrack-table-accesses.patch
fafb8773957bbffb21ab43c7a3559efe15f52d234afba5f2ad2739411946c021  
xsa218-4.5/0001-IOMMU-handle-IOMMU-mapping-and-unmapping-failures.patch
4398ad7111421dbf954ede651cb7f9acd83c654c7fa93d54a4e5f9b7b25fe918  
xsa218-4.5/0002-gnttab-fix-unmap-pin-accounting-race.patch
9d23946afb96a70c574b8c7ff42ed8b30b72e9a1f751ff617a7578c79645c094  
xsa218-4.5/0003-gnttab-Avoid-potential-double-put-of-maptrack-entry.patch
27d92c6f4d89de3fd9e9311337823370303c1ef985cce2bd9bea28f00cd6c184  
xsa218-4.5/0004-gnttab-correct-maptrack-table-accesses.patch
99ac090d7955a46c6c9c73ca62b64cef6b8f05439961e52278c662f030a36ee2  
xsa218-4.6/0001-IOMMU-handle-IOMMU-mapping-and-unmapping-failures.patch
e0f0839336e055c1422cf0f76c37f6d9cc8474b0140ffef2451dca6697a9f20f  
xsa218-4.6/0002-gnttab-fix-unmap-pin-accounting-race.patch
5f6f63211b18bb6ec157353b9e8b844abe3fd767ef1780e6d28731e935559fbc  
xsa218-4.6/0003-gnttab-Avoid-potential-double-put-of-maptrack-entry.patch
6a786a8c4b916b6f99092598bd4d60381907cd7e728c98a79e999afeec4f45a6  
xsa218-4.6/0004-gnttab-correct-maptrack-table-accesses.patch
58354eec5f4f0b87640c702c6e1ce0eeb57dffbd09394a96e88bd6ff42c53e7e  
xsa218-4.7/0001-IOMMU-handle-IOMMU-mapping-and-unmapping-failures.patch
0683d7ffdbe60dc8e1d161adeb0c5465df1840e86353b5cbb96dd204f2dbb526  
xsa218-4.7/0002-gnttab-fix-unmap-pin-accounting-race.patch
6bfef9e1653a305e49653c5b81acb57ca41ee8410ea085d49c9bc7e4ccd31e54  
xsa218-4.7/0003-gnttab-Avoid-potential-double-put-of-maptrack-entry.patch
b4ede29e3a94d9e7992c90b8b7c8d489e071764218b28962b5755a444040e1ae  
xsa218-4.7/0004-gnttab-correct-maptrack-table-accesses.patch
c2a1b40e76764333f3ee34dd9bc7d3e34bab91f8b44eaae7aa6f187bbddb358f  
xsa218-4.8/0001-gnttab-fix-unmap-pin-accounting-race.patch
a210ff17a0ca1a81f2c98cce84a104ac7dd2f1a72fa3855ca5f3b3d13e95468c  
xsa218-4.8/0002-gnttab-Avoid-potential-double-put-of-maptrack-entry.patch
0b8fa3d6a0f3ccb43c8134db2240867d5a850ee0821d4124a1642596b4d6cb5a  
xsa218-4.8/0003-gnttab-correct-maptrack-table-accesses.patch
$

DEPLOYMENT DURING EMBARGO
=

Deployment of the patches and/or mitigations described above (or
others which are substantially similar) is permitted during the
embargo, even on public-facing systems with untrusted guest users and
administrators.

But: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly 

[Xen-devel] Xen Security Advisory 223 - ARM guest disabling interrupt may crash Xen

2017-06-20 Thread Xen . org security team
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Xen Security Advisory XSA-223
  version 2

  ARM guest disabling interrupt may crash Xen

UPDATES IN VERSION 2


Public release.

ISSUE DESCRIPTION
=

Virtual interrupt injection could be triggered by a guest when sending
an SGI (e.g IPI) to any vCPU or by configuring timers. When the virtual
interrupt is masked, a missing check in the injection path may result in
reading invalid hardware register or crashing the host.

IMPACT
==

A guest may cause a hypervisor crash, resulting in a Denial of Service
(DoS).

VULNERABLE SYSTEMS
==

All Xen versions which support ARM are affected.

x86 systems are not affected.

MITIGATION
==

On systems where the guest kernel is controlled by the host rather than
guest administrator, running only kernels which do not disable SGI and
PPI (i.e IRQ < 32) will prevent untrusted guest users from exploiting
this issue. However untrusted guest administrators can still trigger it
unless further steps are taken to prevent them from loading code into
the kernel (e.g by disabling loadable modules etc) or from using other
mechanisms which allow them to run code at kernel privilege.

CREDITS
===

This issue was discovered by Julien Grall of ARM.

RESOLUTION
==

Applying the attached patch resolves this issue.

xsa223.patch   xen-unstable, Xen 4.8.x, Xen 4.7.x, Xen 4.6.x, Xen 4.5.x

$ sha256sum xsa223*
b5c8d8e8dac027069bec7dd812cff3f6f99e5949dd4a8ee729255c38274958b1  xsa223.patch
$

DEPLOYMENT DURING EMBARGO
=

Deployment of the patches and/or mitigations described above (or
others which are substantially similar) is permitted during the
embargo, even on public-facing systems with untrusted guest users and
administrators.

But: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-BEGIN PGP SIGNATURE-
Version: GnuPG v1

iQEcBAEBCAAGBQJZSQ3WAAoJEIP+FMlX6CvZpxQH/0nQaJEWEuVZlQliIaB3TUK2
nnXBf3cFMsNCBIsrQtYXetZ8amA7cjULd2SX8/WIR60CPZ5Uj/YQtld5cq4LfxMj
Ngma8mPDUMlu6t+n07vee/fte5fZYOpci0teC9NCLDbG5eTWoJ0K7CzN2JUQWLAb
IgeJAVgyNr3ZibqdB8xBb5KlA+hOBE77MQ6yt8qZeltoYezIxprwgcqE/BgMVrcj
+7pz0WtJtdTe7i8i6jGcqvzbl3WhmcppiDLNhv310V+dV+T2e9cJia5EuapbqYD7
mkMLnOTRngJq97q1RwTQlsLMOp+/deZpqueLKttVCSR6VP4GtKpgr/0O1NW91YU=
=hATV
-END PGP SIGNATURE-


xsa223.patch
Description: Binary data


[Xen-devel] Xen Security Advisory 216 - blkif responses leak backend stack data

2017-06-20 Thread Xen . org security team
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Xen Security Advisory XSA-216
  version 3

blkif responses leak backend stack data

UPDATES IN VERSION 3


Public release.

Fix a typo ("our" for "or" in Vulnerable Systems).

ISSUE DESCRIPTION
=================

The block interface response structure has some discontiguous fields.
Certain backends populate the structure fields of an otherwise
uninitialized instance of this structure on their stacks, leaking
data through the (internal or trailing) padding field.

IMPACT
======

A malicious unprivileged guest may be able to obtain sensitive
information from the host or other guests.

VULNERABLE SYSTEMS
==================

All Linux versions supporting the xen-blkback, blkback, or blktap
drivers are vulnerable.

FreeBSD, NetBSD and Windows (with or without PV drivers) are not
vulnerable (either because they do not have backends at all, or
because they use a different implementation technique which does not
suffer from this problem).

All qemu versions supporting the Xen block backend are vulnerable.  The
qemu-xen-traditional code base does not include such code, so is not
vulnerable.  Note that an instance of qemu will be spawned to provide
the backend for most non-raw-format disks; so you may need to apply the
patch to qemu even if you use only PV guests.

MITIGATION
==========

There's no mitigation available for x86 PV and ARM guests.

For x86 HVM guests it may be possible to change the guest
configuration such that a fully virtualized (emulated) disk is made
available instead.  However, this would normally entail changes inside
the guest itself.

CREDITS
=======

This issue was discovered by Anthony Perard of Citrix.

For patch:
Reported by: Anthony Perard 

RESOLUTION
==========

Applying the appropriate attached patch resolves this issue.

xsa216-linux-4.11.patch   Linux 4.5 ... 4.11
xsa216-linux-4.4.patchLinux 3.3 ... 4.4
xsa216-qemuu.patchqemu-upstream master, 4.8
xsa216-qemuu-4.7.patchqemu-upstream 4.7, 4.6
xsa216-qemuu-4.5.patchqemu-upstream 4.5
xsa216-linux-2.6.18-xen.patch linux-2.6.18-xen.hg

$ sha256sum xsa216*
28beb3d876fa0eee77f4377ef2708d764a5d9a2003dd4f1a4ecb9b8bf60658a4  xsa216-linux-2.6.18-xen.patch
6f6138c0a00df4ed7307ae4e5ee30dbe8594ff05bc1e8fdc7cfd785077d72ddc  xsa216-linux-4.4.patch
e04da27961cd867f7bbba31677f61e3e425c0e7cc7352a7a2d22b5a35eaf8585  xsa216-linux-4.11.patch
850b0143cfe3c69c62abdad71be9813014d46c380109fc650689a10c90ff39f4  xsa216-qemuu.patch
072270274d2554b71579a529c908d16479f8eba6646d8aed2e3d129495b27716  xsa216-qemuu-4.5.patch
5a64e2c5bb78f1c8fae97354be10fcc63ea39d333d6490e3a422ff30460cdef1  xsa216-qemuu-4.7.patch
$

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches described above (or others which are
substantially similar) is permitted during the embargo, even on
public-facing systems with untrusted guest users and administrators.

However, deployment of the mitigation is NOT permitted (except where
all the affected systems and VMs are administered and used only by
organisations which are members of the Xen Project Security Issues
Predisclosure List).  Specifically, deployment on public cloud systems
is NOT permitted.  This is because this produces a guest-visible
change which will indicate which component contains the vulnerability.

Additionally, distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBCAAGBQJZSQ3JAAoJEIP+FMlX6CvZWkQIAMXD8Lc1PunNw5x9WsLb2y9U
KA0QrsNve4Ugc/xvCiuqUoV+ljZIRiy57A//ZnNtTR8JiRqpjEC47he3oYNleytN
RfOw2ZzsXdD4F8sqT3YvR0vcPL1Pf7fHzg8Ax19RxdcXRWTrN/b/poxuCu4F5PWn
cFi4tQDYLuEb2e9Sj8ue8RbtcVOEyuSG/dP1E29K7sKdc6GB13nWsa93KJsSRLY6
cwKnOmBy+2H66FcfmWomU+OueKI7y5DsYxYV+VVUBGnBTSn0b3dwpHNKUBCuF1nQ
RqOjo2rHOMBeiGaAlGg8toef7IkRH20p/LjiQxAneMndmta3t9enx8rYYxgFd5k=
=3n1c
-----END PGP SIGNATURE-----


xsa216-linux-2.6.18-xen.patch
Description: Binary data


xsa216-linux-4.4.patch
Description: Binary data


xsa216-linux-4.11.patch
Description: Binary data


xsa216-qemuu.patch
Description: Binary data


xsa216-qemuu-4.5.patch
Description: Binary data


xsa216-qemuu-4.7.patch
Description: Binary data
