Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-09 Thread Jan Beulich
>>> On 10.07.15 at 08:21,  wrote:

> 
>> -Original Message-
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Thursday, July 09, 2015 3:26 PM
>> To: Wu, Feng; Tian, Kevin
>> Cc: Andrew Cooper; george.dun...@eu.citrix.com; Zhang, Yang Z;
>> xen-devel@lists.xen.org; k...@xen.org 
>> Subject: RE: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
>> is blocked
>> 
>> >>> On 09.07.15 at 00:49,  wrote:
>> >>  From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
>> >> Sent: Wednesday, July 08, 2015 9:09 PM
>> >> On 08/07/2015 13:46, Jan Beulich wrote:
>> >>  On 08.07.15 at 13:00,  wrote:
>> >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
>> >> >>> vmx_function_table = {
>> >> >>>  .enable_msr_exit_interception =
>> vmx_enable_msr_exit_interception,
>> >> >>>  };
>> >> >>>
>> >> >>> +/*
>> >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked.
>> >> >>> + */
>> >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
>> >> >>> +{
>> >> >>> +struct arch_vmx_struct *vmx;
>> >> >>> +unsigned int cpu = smp_processor_id();
>> >> >>> +
>> >> >>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
>> >> >>> +
>> >> >>> +/*
>> >> >>> + * FIXME: The length of the list depends on how many
>> >> >>> + * vCPUs are currently blocked on this specific pCPU.
>> >> >>> + * This may hurt the interrupt latency if the list
>> >> >>> + * grows to too many entries.
>> >> >>> + */
>> >> >> let's go with this linked list first until a real issue is identified.
>> >> > This is exactly the way of thinking I dislike when it comes to code
>> >> > that isn't intended to be experimental only: We shouldn't wait
>> >> > for problems to surface when we already can see them. I.e. if
>> >> > there are no plans to deal with this, I'd ask for the feature to be
>> >> > off by default and be properly marked experimental in the
>> >> > command line option documentation (so people know to stay
>> >> > away from it).
>> >>
>> >> And in this specific case, there is no balancing of vcpus across the
>> >> pcpus lists.
>> >>
>> >> One can construct a pathological case using pinning and pausing to get
>> >> almost every vcpu on a single pcpu list, and vcpus receiving fewer
>> >> interrupts will exacerbate the problem by staying on the list for longer
>> >> periods of time.
>> >
>> > In that extreme case I believe contention in other code paths will
>> > be much larger than the overhead caused by this structural limitation.
>> 
>> Examples?
>> 
>> >> IMO, the PI feature cannot be declared as done/supported with this bug
>> >> remaining.  OTOH, it is fine to be experimental, and disabled by default
>> >> for people who wish to experiment.
>> >>
>> >
>> > Again, I don't expect to see it disabled as experimental. For good
>> > production
>> > environment where vcpus are well balanced and interrupt latency is
>> > sensitive,
>> > linked list should be efficient here. For bad environment like extreme case
>> > you raised, I don't know whether it really matters to just tune interrupt
>> > path.
>> 
>> Can you _guarantee_ that everything potentially leading to such a
>> pathological situation is covered by XSA-77? And even if it is now,
>> removing elements from the waiver list would become significantly
>> more difficult if disconnected behavior like this one would need to
>> be taken into account.
>> 
>> Please understand that history has told us to be rather more careful
>> than might seem necessary with this: ATS originally having been
>> enabled by default is one bold example, and the recent flood of MSI
>> related XSAs is another; I suppose I could find more. All affecting
>> code originating from Intel, apparently written with only functionality
>> in mind, while having left out (other than basic) security considerations.
>> 
>> IOW, with my committer role hat on, the feature is going to be
>> experimental (and hence default off) unless the issue here gets
>> addressed. And no, I cannot immediately suggest a good approach,
>> and with all of the rush before the feature freeze I also can't justify
>> taking a lot of time to think of options.
> 
> Is it acceptable to you if I only add the blocked vCPUs that have
> assigned devices to the list? I think that should shorten the
> length of the list.

I actually implied this to be the case already, i.e.
- if it's not, this needs to be fixed anyway,
- it's not going to eliminate the concern (just think of a couple of
  many-vCPU guests all having devices assigned).
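[Editorial illustration] The cost being debated — pi_wakeup_interrupt() taking a per-pCPU lock and walking every blocked vCPU queued on that pCPU — can be modeled with a small user-space sketch. The struct below is a hypothetical simplified stand-in, not Xen's actual arch_vmx_struct:

```c
#include <stddef.h>

/* Hypothetical stand-in for an entry on the per-pCPU blocked-vCPU list;
 * the real code links arch_vmx_struct entries and walks them under
 * pi_blocked_vcpu_lock in interrupt context. */
struct blocked_vcpu {
    int pir_pending;            /* 1 if a posted interrupt is outstanding */
    struct blocked_vcpu *next;  /* singly linked per-pCPU list */
};

/* Walk the whole list, counting vCPUs that need waking.  The number of
 * entries visited -- and hence the time spent with the lock held -- is
 * O(n) in the vCPUs blocked on this pCPU, which is exactly the latency
 * concern raised in this thread. */
static int walk_blocked_list(struct blocked_vcpu *head, int *woken)
{
    int visited = 0;

    *woken = 0;
    for (struct blocked_vcpu *v = head; v; v = v->next) {
        visited++;
        if (v->pir_pending)
            (*woken)++;
    }
    return visited;
}
```

With three vCPUs queued, all three entries must be inspected even if only two have work pending; a pathological pinning scenario makes n large with no upper bound.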

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling

2015-07-09 Thread Jan Beulich
>>> On 10.07.15 at 07:59,  wrote:
> If you agree with doing all this in a central place, maybe we can create
> an arch hook for 'struct scheduler' to do this and call it in all the places
> vcpu_runstate_change() gets called. What is your opinion about this?

Doing this in a central place is certainly the right approach, but
adding an arch hook that needs to be called everywhere
vcpu_runstate_change() is called wouldn't serve that purpose. Instead
we'd need to replace all current vcpu_runstate_change() calls
with calls to a new function that invokes both it and the to-be-added
arch hook.
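[Editorial illustration] A minimal sketch of that shape — one wrapper that existing call sites would switch to. The types and hook names below are purely illustrative stand-ins, not actual Xen interfaces:

```c
/* Illustrative stand-ins only; real Xen types and hooks differ. */
enum runstate { RUNSTATE_running, RUNSTATE_runnable, RUNSTATE_blocked };

static int accounting_calls, arch_hook_calls;

/* Existing accounting function, left untouched. */
static void vcpu_runstate_change(int vcpu_id, enum runstate new_state)
{
    (void)vcpu_id; (void)new_state;
    accounting_calls++;
}

/* New arch hook, e.g. to update a posted-interrupt descriptor. */
static void arch_vcpu_runstate_hook(int vcpu_id, enum runstate new_state)
{
    (void)vcpu_id; (void)new_state;
    arch_hook_calls++;
}

/* The single new entry point: every current vcpu_runstate_change()
 * call site would call this instead, so the arch hook cannot be
 * forgotten at any individual site. */
static void vcpu_change_runstate(int vcpu_id, enum runstate new_state)
{
    vcpu_runstate_change(vcpu_id, new_state);
    arch_vcpu_runstate_hook(vcpu_id, new_state);
}
```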

But please wait for George's / Dario's feedback, because they
seem to be even less convinced than me about your model of
tying the updates to runstate changes.

Jan




Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-09 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, July 09, 2015 3:26 PM
> To: Wu, Feng; Tian, Kevin
> Cc: Andrew Cooper; george.dun...@eu.citrix.com; Zhang, Yang Z;
> xen-devel@lists.xen.org; k...@xen.org
> Subject: RE: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
> is blocked
> 
> >>> On 09.07.15 at 00:49,  wrote:
> >>  From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> >> Sent: Wednesday, July 08, 2015 9:09 PM
> >> On 08/07/2015 13:46, Jan Beulich wrote:
> >>  On 08.07.15 at 13:00,  wrote:
> >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
> >> >>> vmx_function_table = {
> >> >>>  .enable_msr_exit_interception =
> vmx_enable_msr_exit_interception,
> >> >>>  };
> >> >>>
> >> >>> +/*
> >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked.
> >> >>> + */
> >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> >> >>> +{
> >> >>> +struct arch_vmx_struct *vmx;
> >> >>> +unsigned int cpu = smp_processor_id();
> >> >>> +
> >> >>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> >> >>> +
> >> >>> +/*
> >> >>> + * FIXME: The length of the list depends on how many
> >> >>> + * vCPUs are currently blocked on this specific pCPU.
> >> >>> + * This may hurt the interrupt latency if the list
> >> >>> + * grows to too many entries.
> >> >>> + */
> >> >> let's go with this linked list first until a real issue is identified.
> >> > This is exactly the way of thinking I dislike when it comes to code
> >> > that isn't intended to be experimental only: We shouldn't wait
> >> > for problems to surface when we already can see them. I.e. if
> >> > there are no plans to deal with this, I'd ask for the feature to be
> >> > off by default and be properly marked experimental in the
> >> > command line option documentation (so people know to stay
> >> > away from it).
> >>
> >> And in this specific case, there is no balancing of vcpus across the
> >> pcpus lists.
> >>
> >> One can construct a pathological case using pinning and pausing to get
> >> almost every vcpu on a single pcpu list, and vcpus receiving fewer
> >> interrupts will exacerbate the problem by staying on the list for longer
> >> periods of time.
> >
> > In that extreme case I believe contention in other code paths will
> > be much larger than the overhead caused by this structural limitation.
> 
> Examples?
> 
> >> IMO, the PI feature cannot be declared as done/supported with this bug
> >> remaining.  OTOH, it is fine to be experimental, and disabled by default
> >> for people who wish to experiment.
> >>
> >
> > Again, I don't expect to see it disabled as experimental. For good
> > production
> > environment where vcpus are well balanced and interrupt latency is
> > sensitive,
> > linked list should be efficient here. For bad environment like extreme case
> > you raised, I don't know whether it really matters to just tune interrupt
> > path.
> 
> Can you _guarantee_ that everything potentially leading to such a
> pathological situation is covered by XSA-77? And even if it is now,
> removing elements from the waiver list would become significantly
> more difficult if disconnected behavior like this one would need to
> be taken into account.
> 
> Please understand that history has told us to be rather more careful
> than might seem necessary with this: ATS originally having been
> enabled by default is one bold example, and the recent flood of MSI
> related XSAs is another; I suppose I could find more. All affecting
> code originating from Intel, apparently written with only functionality
> in mind, while having left out (other than basic) security considerations.
> 
> IOW, with my committer role hat on, the feature is going to be
> experimental (and hence default off) unless the issue here gets
> addressed. And no, I cannot immediately suggest a good approach,
> and with all of the rush before the feature freeze I also can't justify
> taking a lot of time to think of options.
> 

Is it acceptable to you if I only add the blocked vCPUs that have
assigned devices to the list? I think that should shorten the
length of the list.

Thanks,
Feng

> Jan




Re: [Xen-devel] PCI Pass-through in Xen ARM - Draft 2.

2015-07-09 Thread Pranavkumar Sawargaonkar
On Thu, Jul 9, 2015 at 7:27 PM, Julien Grall  wrote:
>
>
> On 09/07/2015 12:30, Manish Jaggi wrote:
>>
>> On Thursday 09 July 2015 01:38 PM, Julien Grall wrote:
>>>
>>> On 09/07/2015 08:13, Manish Jaggi wrote:
>
>
> If this was a domctl there might be scope for accepting an
> implementation which made assumptions such as sbdf == deviceid. However
> I'd still like to see this topic given proper treatment in the design
> and not just glossed over with "this is how ThunderX does things".

 I got your point.
>
> Or maybe the solution is simple and we should just do it now -- i.e.
> can
> we add a new field to the PHYSDEVOP_pci_host_bridge_add argument struct
> which contains the base deviceid for that bridge

 deviceID would be the same as sbdf, as we don't have a way to translate sbdf
 to deviceID.
>>>
>>>
>>> I think we have to be clear in this design document about the
>>> different meaning.
>>>
>>> When the Device Tree is used, it's assumed that the deviceID will be
>>> equal to the requester ID and not the sbdf.
>>
>> Does SMMUv2 have a concept of requesterID?
>> I see the requesterID term in SMMUv3.
>
>
> The requester ID is part of the PCI spec and not the SMMU.
>
> The version of the SMMUv2 spec doesn't mention anything about PCI. I suspect
> this is because the spec was written before PCI was introduced, and
> therefore this is implementation defined.
>
>>>
>>> Linux provides a function (pci_for_each_dma_alias) which will return a
>>> requester ID for a given PCI device. It appears that the BDF (the 's'
>>> of sBDF is only internal to Linux and not part of the hardware) is
>>> equal to the requester ID on your platform but we can't assume it for
>>> anyone else.
>>
>> so you mean requesterID = pci_for_each_dma_alias(sbdf)
>
>
> Yes.
>
>>>
>>> When we have a PCI in hand, we have to find the requester ID for this
>>> device.
>>
>> That is the question. How to map requesterID to sbdf
>
>
> See above.
>
>>> On
>>
>> Once ?
>
>
> Yes.
>
>>> we have it we can deduce the streamID and the deviceID. The way to do
>>> it will depend on whether we use device tree or ACPI:
>>> - For device tree, the streamID, and deviceID will be equal to the
>>> requester ID
>>
>> What do you think the streamID should be when a device is a PCI EP and is
>> enumerated? Also, per the ARM SMMU 2.0 spec, StreamID is implementation
>> specific.
>> As per SMMUv3 specs
>> "For PCI, it is intended that StreamID is generated from the PCI
>> RequesterID. The generation function may be 1:1
>> where one Root Complex is hosted by one SMMU"
>
>
> I think my sentence "For device tree, the streamID, and deviceID will be
> equal to the requester ID" is pretty clear. FWIW, this is the solution
> chosen for Linux:
>
> "Assume Stream ID == Requester ID for now. We need a way to describe the ID
> mappings in FDT" (see arm_smmu_add_pci_device in drivers/iommu/arm-smmu.c).
>
> You can refer to my point below about ACPI tables. The solution would be
> exactly the same. If we have a requester ID in hand we can do pretty much
> everything.

There is already one proposal by Mark Rutland on this topic about
describing StreamID to Requester ID mapping in DT.
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/333199.html
Probably, until it gets finalized, assuming RequesterID == StreamID is the
only way, since deriving the StreamID from the PCIe RequesterID will vary from
one vendor to another.
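[Editorial illustration] Under the interim assumption discussed here (requester ID == StreamID == deviceID on DT platforms), the mapping collapses to computing the 16-bit requester ID from the BDF. A toy sketch — the helper names are hypothetical, and real code would use Linux's pci_for_each_dma_alias() to account for bridge aliasing rather than assuming the device's own BDF:

```c
#include <stdint.h>

/* A PCIe requester ID is the 16-bit bus/device/function tuple:
 * bits [15:8] bus, [7:3] device, [2:0] function. */
static uint16_t bdf_to_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* Interim DT assumption from the thread: both IDs equal the RID.
 * On other platforms this translation is vendor specific. */
static uint32_t rid_to_streamid(uint16_t rid) { return rid; }
static uint32_t rid_to_deviceid(uint16_t rid) { return rid; }
```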

Thanks,
Pranav
>
> The whole point of my previous email is to give insight about what we need
> and what we can deduce based on firmware tables. I didn't cover anything
> implementation details.
>
> Regards,
>
> --
> Julien Grall
>
>



Re: [Xen-devel] [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters

2015-07-09 Thread Chen, Tiejun

The first issue (which would really be relevant to the documentation
patch) is that the documentation is in a separate commit.  There are
sometimes valid reasons for doing this.  I'm not sure if they apply,


Wei suggested we should organize/split all patches according to libxl,
libxc, xc and xl.



but if they do this should be explained in one of the commit
messages.  If this was done I'm afraid I have missed it.


In this patch's head description, maybe I can change it to something like this:

This patch adds parsing to enable user-configurable parameters that specify
RDM resources and the associated policies defined previously.




+}else if ( !strcmp(optkey, "rdm_policy") ) {
+if ( !strcmp(tok, "strict") ) {
+pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+} else if ( !strcmp(tok, "relaxed") ) {
+pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+} else {
+XLU__PCI_ERR(cfg, "%s is not a valid PCI RDM property"
+  " policy: 'strict' or 'relaxed'.",
+ tok);
+goto parse_error;
+}


This section has coding style (whitespace) problems and long lines.
If you need to respin, please fix them.


Are you saying this?

} else if (  -> }else if (
} else { -> }else {

Additionally, I couldn't find which line is over 80 characters.




+for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
+switch(state) {
+case STATE_TYPE:
+if (*ptr == '=') {
+state = STATE_RDM_STRATEGY;
+*ptr = '\0';
+if (strcmp(tok, "strategy")) {
+XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}


This code is extremely repetitive.



I just referred to xlu_pci_parse_bdf():

switch(state) {
case STATE_DOMAIN:
if ( *ptr == ':' ) {
state = STATE_BUS;
*ptr = '\0';
if ( hex_convert(tok, &dom, 0x) )
goto parse_error;
tok = ptr + 1;
}
break;


Really I would prefer that this parsing was done with a miniature flex
parser, rather than ad-hoc pointer arithmetic and use of strtok.


Sorry, could you show this explicitly?
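[Editorial illustration] Not speaking for Ian, but one way to cut the repetition without going all the way to a flex parser is a table-driven key=value splitter: each new option then only adds a table entry rather than another near-identical switch arm. A rough self-contained sketch — the names are made up, and this is not the libxlu code:

```c
#include <stdio.h>
#include <string.h>

/* One handler per recognised key; adding an option = adding a row. */
struct kv_handler {
    const char *key;
    int (*set)(const char *value);
};

static char strategy_val[32], policy_val[32];

static int set_strategy(const char *v)
{
    snprintf(strategy_val, sizeof(strategy_val), "%s", v);
    return 0;
}

static int set_policy(const char *v)
{
    snprintf(policy_val, sizeof(policy_val), "%s", v);
    return 0;
}

static const struct kv_handler handlers[] = {
    { "strategy",   set_strategy },
    { "rdm_policy", set_policy   },
};

/* Parse "key=value,key=value"; modifies spec in place, as the existing
 * code does.  Returns 0 on success, -1 on a parse error. */
static int parse_kv_list(char *spec)
{
    while (spec && *spec) {
        char *comma = strchr(spec, ',');
        char *eq;
        size_t i;

        if (comma)
            *comma++ = '\0';
        eq = strchr(spec, '=');
        if (!eq)
            return -1;                    /* pair without '=' */
        *eq = '\0';
        for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++)
            if (!strcmp(spec, handlers[i].key))
                break;
        if (i == sizeof(handlers) / sizeof(handlers[0]))
            return -1;                    /* unknown key */
        if (handlers[i].set(eq + 1))
            return -1;
        spec = comma;
    }
    return 0;
}
```

For example, parse_kv_list() applied to a writable copy of "strategy=host,rdm_policy=relaxed" fills both values, while a malformed pair or unknown key fails cleanly in one place.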

Thanks
Tiejun



Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling

2015-07-09 Thread Wu, Feng


> -Original Message-
> From: George Dunlap [mailto:george.dun...@eu.citrix.com]
> Sent: Thursday, July 09, 2015 10:28 PM
> To: Dario Faggioli; Jan Beulich
> Cc: Tian, Kevin; Wu, Feng; andrew.coop...@citrix.com; xen-devel; Zhang, Yang
> Z; k...@xen.org
> Subject: Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor
> during vCPU scheduling
> 
> On 07/09/2015 03:18 PM, Dario Faggioli wrote:
> > On Thu, 2015-07-09 at 14:44 +0100, Jan Beulich wrote:
> > On 09.07.15 at 14:53,  wrote:
> >
> >>> Consider the following scenario:
> >>> - v1 blocks on pcpu 0.
> >>>  - vcpu_runstate_change() will do everything necessary for v1 on p0.
> >>> - The scheduler does load balancing and moves v1 to p1, calling
> >>> vcpu_migrate().  Because the vcpu is still blocked,
> >>> vcpu_runstate_change() is not called.
> >>> - A device interrupt is generated.
> >>>
> >>> What happens to the interrupt?  Does everything still work properly, or
> >>> will the device wake-up interrupt go to the wrong pcpu (p0 rather than 
> >>> p1)?
> >>
> >> I think much of this was discussed before, since I also disliked the
> >> hooking into vcpu_runstate_change(). What I remember having
> >> been told is that it really only matters which pCPU's list a vCPU is
> >> on, not what v->processor says.
> >>
> > Right.
> >
> > But, as far as I could understand from the patches I've seen, a vcpu
> > ends up in a list when it blocks, and when it blocks there will be a
> > context switch, and hence we can deal with the queueing during the the
> > context switch itself (which is, in part, an arch specific operation
> > already).
> >
> > What am I missing?
> 
> I think what you're missing is that Jan is answering my question about
> migrating a blocked vcpu, not arguing that vcpu_runstate_change() is the
> right way to go.  At least that's how I understood him. :-)
> 
> But regarding context_switch: I think the reason we need more hooks than
> that is that context_switch only changes into and out of running state.
>  There are also changes that need to happen when you change from blocked
> to offline, offline to blocked, blocked to runnable, &c; these don't go
> through context_switch.  That's why I was suggesting some architectural
> equivalents to the SCHED_OP() callbacks to be added to vcpu_wake &c.
> 
> vcpu_runstate_change() is at the moment a nice quiet cul-de-sac that
> just does a little bit of accounting; I'd rather not have it suddenly
> become a major thoroughfare for runstate change hooks, if we can avoid
> it. :-)

So in my understanding, vcpu_runstate_change() is a central place to
do this, which is good. However, this function was originally designed to
serve only accounting purposes; it is a little intrusive to make it so
important by adding the hooks to it.

If you agree with doing all this in a central place, maybe we can create
an arch hook for 'struct scheduler' to do this and call it in all the places
vcpu_runstate_change() gets called. What is your opinion about this?

Thanks,
Feng

> 
>  -George


Re: [Xen-devel] [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest

2015-07-09 Thread Chen, Tiejun

  tools/libxl/libxl_dom.c  |  5 +++
  tools/libxl/libxl_internal.h | 24 +
  tools/libxl/libxl_x86.c  | 83 

...

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 62ef120..41da479 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
  goto out;
  }

+if (libxl__domain_construct_e820(gc, d_config, domid, &args)) {
+LOG(ERROR, "setting domain memory map failed");
+goto out;
+}


This is platform-independent code, isn't it ?  In which case this will
break the build on ARM, I think.

Would an ARM maintainer please confirm.



I think you're right. I should make this arch-specific, since here
we're talking about e820.


So I tried to refactor this patch,

diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index d04871c..939178a 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -49,4 +49,11 @@ int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
 _hidden
 int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq);

+/* arch specific to construct memory mapping function */
+_hidden
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+libxl_domain_config *d_config,
+uint32_t domid,
+struct xc_hvm_build_args *args);
+
 #endif
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index f09c860..1526467 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -926,6 +926,14 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, 
uint32_t domid, int irq)

 return xc_domain_bind_pt_spi_irq(CTX->xch, domid, irq, irq);
 }

+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+libxl_domain_config *d_config,
+uint32_t domid,
+struct xc_hvm_build_args *args)
+{
+return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 62ef120..691c1f6 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 goto out;
 }

+if (libxl__arch_domain_construct_memmap(gc, d_config, domid, &args)) {
+LOG(ERROR, "setting domain memory map failed");
+goto out;
+}
+
 ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
&state->store_mfn, state->console_port,
&state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ed2bd38..66b3d7f 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, 
uint32_t domid, int irq)

 }

 /*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: the regions below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x10
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+libxl_domain_config *d_config,
+uint32_t domid,
+struct xc_hvm_build_args *args)
+{
...



Aside from that I have no issues with this patch.



But if you think I should resend this let me know.

Thanks
Tiejun



[Xen-devel] [linux-3.4 test] 59267: regressions - FAIL

2015-07-09 Thread osstest service owner
flight 59267 linux-3.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59267/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-win7-amd64  6 xen-boot  fail REGR. vs. 30511

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-sedf-pin  6 xen-boot   fail in 58831 pass in 58798
 test-amd64-i386-xl-qemuu-win7-amd64 9 windows-install fail in 58831 pass in 
59267
 test-amd64-amd64-pair10 xen-boot/dst_host   fail pass in 58798
 test-amd64-amd64-pair 9 xen-boot/src_host   fail pass in 58798
 test-amd64-i386-pair 10 xen-boot/dst_host   fail pass in 58831
 test-amd64-i386-pair  9 xen-boot/src_host   fail pass in 58831

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-amd64-xl-multivcpu  6 xen-boot   fail baseline untested
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-amd64-libvirt-xsm  6 xen-bootfail baseline untested
 test-amd64-i386-libvirt-xsm   6 xen-bootfail baseline untested
 test-amd64-amd64-xl-credit2   6 xen-bootfail baseline untested
 test-amd64-amd64-xl-xsm   6 xen-bootfail baseline untested
 test-amd64-i386-xl-xsm6 xen-bootfail baseline untested
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 guest-localmigrate 
fail baseline untested
 test-amd64-amd64-xl-rtds  6 xen-bootfail baseline untested
 test-amd64-amd64-xl-sedf  6 xen-boot  fail in 58831 like 30406
 test-amd64-i386-libvirt  11 guest-start  fail   like 30511
 test-amd64-amd64-libvirt 11 guest-start  fail   like 30511
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 30511
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail like 30511
 test-amd64-amd64-xl-qemuu-ovmf-amd64  6 xen-bootfail like 53709-bisect
 test-amd64-i386-xl6 xen-bootfail like 53725-bisect
 test-amd64-i386-freebsd10-amd64  6 xen-boot fail like 58780-bisect
 test-amd64-i386-xl-qemuu-winxpsp3  6 xen-boot   fail like 58786-bisect
 test-amd64-i386-qemut-rhel6hvm-intel  6 xen-bootfail like 58788-bisect
 test-amd64-i386-rumpuserxen-i386  6 xen-bootfail like 58799-bisect
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1  6 xen-bootfail like 58801-bisect
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  6 xen-boot   fail like 58803-bisect
 test-amd64-amd64-xl-qemut-winxpsp3  6 xen-boot  fail like 58804-bisect
 test-amd64-i386-freebsd10-i386  6 xen-boot  fail like 58805-bisect
 test-amd64-i386-xl-qemuu-ovmf-amd64  6 xen-boot fail like 58806-bisect
 test-amd64-amd64-xl-qemuu-winxpsp3  6 xen-boot  fail like 58807-bisect
 test-amd64-i386-xl-qemut-winxpsp3  6 xen-boot   fail like 58808-bisect
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1  6 xen-bootfail like 58809-bisect
 test-amd64-amd64-rumpuserxen-amd64  6 xen-boot  fail like 58810-bisect
 test-amd64-i386-xl-qemuu-debianhvm-amd64  6 xen-bootfail like 58811-bisect
 test-amd64-amd64-xl-qemut-debianhvm-amd64  6 xen-boot   fail like 58813-bisect
 test-amd64-i386-qemuu-rhel6hvm-intel  6 xen-bootfail like 58814-bisect
 test-amd64-i386-xl-qemut-debianhvm-amd64  6 xen-bootfail like 58815-bisect

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail in 58831 never pass
 test-amd64-i386-libvirt  12 migrate-support-check fail in 58831 never pass
 test-amd64-amd64-libvirt 12 migrate-support-check fail in 58831 never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail never pass

version targeted for testing:
 linuxcf1b3dad6c5699b977273276bada8597636ef3e2
baseline version:
 linuxbb4a05a0400ed6d2f1e13d1f82f289ff74300a70

Last test of basis30511  2014-09-29 16:37:46 Z  283 days
Failing since 32004  2014-12-02 04:10:03 Z  219 days  169 attempts
Testing same since58781  2015-06-20 14:15:50 Z   19 days   23 attempts


500 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64 

Re: [Xen-devel] [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-09 Thread Chen, Tiejun

I have found a few things in this patch which I would like to see
improved.  See below.

Given how late I am with this review, I do not feel that I should be
nacking it at this time.  You have a tools ack from Wei, so my
comments are not a blocker for this series.

But if you need to respin, please take these comments into account,
and consider which are feasible to fix in the time available.  If you
are respinning this series targeting Xen 4.7 or later, please address
all of the points I make below.


Thanks for your comments and looks I should address them now.



Thanks.



+int libxl__domain_device_construct_rdm(libxl__gc *gc,
+   libxl_domain_config *d_config,
+   uint64_t rdm_mem_boundary,
+   struct xc_hvm_build_args *args)

...

+uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull\

<<32);
...

+if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE && !d_config->num_\

pcidevs)

There are quite a few of these long lines, which should be wrapped.
See tools/libxl/CODING_STYLE.


Sorry, I can't find the case you're talking about.

So are you saying this?

if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE &&
!d_config->num_pcidevs)
Or

@@ -143,6 +143,15 @@ static bool overlaps_rdm(uint64_t start, uint64_t 
memsize,

 }

 /*
+ * Check whether any RDM should be exposed.
+ * Returns true if so, else returns false.
+ */
+static bool exposes_rdm(uint32_t strategy, int num_pcidevs)
+{
+return strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE && !num_pcidevs;
+}
+
+/*
  * Check reported RDM regions and handle potential gfn conflicts according
  * to user preferred policy.
  *
@@ -182,7 +191,7 @@ int libxl__domain_device_construct_rdm(libxl__gc *gc,
 uint64_t highmem_end = args->highmem_end ? args->highmem_end : 
(1ull<<32);


 /* Might not expose rdm. */
-if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE && 
!d_config->num_pcidevs)

+if (exposes_rdm(strategy, d_config->num_pcidevs))
 return 0;

 /* Query all RDM entries in this platform */





+d_config->num_rdms = nr_entries;
+d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
+d_config->num_rdms * sizeof(libxl_device_rdm));


This code is remarkably similar to a function later on which adds an
rdm.  Please can you factor it out.


Do you mean I should merge them into one if possible?

But that doesn't seem possible, because we have several combinations of
these two conditions: strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST, and
whether one or more PCI devices are also passed through.


#1. strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST but without any devices

So it needs the first libxl__realloc() here; the second
libxl__realloc() has no effect.


#2. strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST with one or more devices

Again we don't need the second libxl__realloc(). This is the same as #1.

#3. strategy != LIBXL_RDM_RESERVE_STRATEGY_HOST, also with one or more
devices


So we just need the second libxl__realloc() later; the first
libxl__realloc() isn't called at all. In particular, it's very possible
we're going to handle fewer RDMs compared to
LIBXL_RDM_RESERVE_STRATEGY_HOST.





+} else
+d_config->num_rdms = 0;


Please can you put { } around the else block too.  I don't think this
mixed style is good.


Fixed.




+for (j = 0; j < d_config->num_rdms; j++) {
+if (d_config->rdms[j].start ==
+ (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT)


This construct
(uint64_t)some_pfn << XC_PAGE_SHIFT
appears an awful lot.

I would prefer it if it were done in an inline function (or maybe a
macro).


Like this?

#define (x) ((uint64_t)x << XC_PAGE_SHIFT)

Sorry I can't figure out a good name here :) Any suggestions?
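[Editorial illustration] For what it's worth, one possible shape — the name below is a hypothetical suggestion, not something agreed in the thread — is a static inline function rather than a macro, which also type-checks its argument:

```c
#include <stdint.h>

#define XC_PAGE_SHIFT 12   /* assumed x86 value, as in the xenctrl headers */

/* Hypothetical name; converts a page frame number to a byte address. */
static inline uint64_t pfn_to_paddr(uint64_t pfn)
{
    return pfn << XC_PAGE_SHIFT;
}
```

The comparison sites would then read, e.g., d_config->rdms[j].start == pfn_to_paddr(xrdm[0].start_pfn).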





+libxl_domain_build_info *const info = &d_config->b_info;
+/*
+ * Currently we fix this as 2G to guarantte how to handle

  ^

Should read "guarantee".


Fixed.




+ret = libxl__domain_device_construct_rdm(gc, d_config,
+ rdm_mem_boundary,
+ &args);
+if (ret) {
+LOG(ERROR, "checking reserved device memory failed");
+goto out;
+}


`rc' should be used here rather than `ret'.  (It is unfortunate that
this function has poor style already, but it would be best not to make
it worse.)



I can do this, but according to tools/libxl/CODING_STYLE, it looks like I
should first post a separate patch to fix this coding-style issue, and then
rebase my patch, right?


Thanks
Tiejun



Re: [Xen-devel] [PATCH v2 00/27] Libxl migration v2

2015-07-09 Thread Yang Hongyang

On 07/10/2015 02:26 AM, Andrew Cooper wrote:

This series adds support for the libxl migration v2 stream, and untangles the
existing layering violations of the toolstack and qemu records.

It can be found on the branch "libxl-migv2-v2"
   git://xenbits.xen.org/people/andrewcoop/xen.git
   http://xenbits.xen.org/git-http/people/andrewcoop/xen.git

Major changes in v2 are being rebased over the libxl AO-abort series, and a
redesign of the internal logic to support Remus/COLO buffering and failover.


Great! I've read the series; the redesigned logic seems correct to me. I will
rebase the COLO series on top of it and test to see if it works, thanks!



At the end of the series, legacy migration is no longer used.

The Remus code is untested by me.  All other combinations of
suspend/migrate/resume have been tested with PV and HVM guests (qemu-trad and
qemu-upstream), including 32 -> 64 bit migration (which was the underlying bug
causing us to write migration v2 in the first place).

Anyway, thoughts/comments welcome.  Please test!

~Andrew

Summary of Acks/Modified/New from v1

   N bsd-sys-queue-h-seddery: Massage `offsetof'
A   tools/libxc: Always compile the compat qemu variables into xc_sr_context
A   tools/libxl: Introduce ROUNDUP()
   N tools/libxl: Introduce libxl__kill()
AM  tools/libxl: Stash all restore parameters in domain_create_state
   N tools/libxl: Split libxl__domain_create_state.restore_fd in two
  M  tools/libxl: Extra management APIs for the save helper
AM  tools/xl: Mandatory flag indicating the format of the migration stream
 docs: Libxl migration v2 stream specification
AM  tools/python: Libxc migration v2 infrastructure
AM  tools/python: Libxl migration v2 infrastructure
   N tools/python: Other migration infrastructure
AM  tools/python: Verification utility for v2 stream spec compliance
AM  tools/python: Conversion utility for legacy migration streams
  M  tools/libxl: Migration v2 stream format
  M  tools/libxl: Infrastructure for reading a libxl migration v2 stream
  M  tools/libxl: Support converting a legacy stream to a v2 stream
  M  tools/libxl: Convert a legacy stream if needed
  M  tools/libxc+libxl+xl: Restore v2 streams
  M  tools/libxl: Infrastructure for writing a v2 stream
  M  tools/libxc+libxl+xl: Save v2 streams
AM  docs/libxl: Introduce CHECKPOINT_END to support migration v2 remus streams
  M  tools/libxl: Write checkpoint records into the stream
  M  tools/libx{c,l}: Introduce restore_callbacks.checkpoint()
  M  tools/libxl: Handle checkpoint records in a libxl migration v2 stream
A   tools/libxc: Drop all XG_LIBXL_HVM_COMPAT code from libxc
A   tools/libxl: Drop all knowledge of toolstack callbacks



Andrew Cooper (23):
   tools/libxc: Always compile the compat qemu variables into xc_sr_context
   tools/libxl: Introduce ROUNDUP()
   tools/libxl: Introduce libxl__kill()
   tools/libxl: Stash all restore parameters in domain_create_state
   tools/libxl: Split libxl__domain_create_state.restore_fd in two
   tools/libxl: Extra management APIs for the save helper
   tools/xl: Mandatory flag indicating the format of the migration stream
   docs: Libxl migration v2 stream specification
   tools/python: Libxc migration v2 infrastructure
   tools/python: Libxl migration v2 infrastructure
   tools/python: Other migration infrastructure
   tools/python: Verification utility for v2 stream spec compliance
   tools/python: Conversion utility for legacy migration streams
   tools/libxl: Support converting a legacy stream to a v2 stream
   tools/libxl: Convert a legacy stream if needed
   tools/libxc+libxl+xl: Restore v2 streams
   tools/libxc+libxl+xl: Save v2 streams
   docs/libxl: Introduce CHECKPOINT_END to support migration v2 remus streams
   tools/libxl: Write checkpoint records into the stream
   tools/libx{c,l}: Introduce restore_callbacks.checkpoint()
   tools/libxl: Handle checkpoint records in a libxl migration v2 stream
   tools/libxc: Drop all XG_LIBXL_HVM_COMPAT code from libxc
   tools/libxl: Drop all knowledge of toolstack callbacks

Ian Jackson (1):
   bsd-sys-queue-h-seddery: Massage `offsetof'

Ross Lagerwall (3):
   tools/libxl: Migration v2 stream format
   tools/libxl: Infrastructure for reading a libxl migration v2 stream
   tools/libxl: Infrastructure for writing a v2 stream

  docs/specs/libxl-migration-stream.pandoc   |  217 ++
  tools/include/xen-external/bsd-sys-queue-h-seddery |2 +
  tools/libxc/Makefile   |2 -
  tools/libxc/include/xenguest.h |9 +
  tools/libxc/xc_sr_common.h |   12 +-
  tools/libxc/xc_sr_restore.c|   71 +-
  tools/libxc/xc_sr_restore_x86_hvm.c|  124 
  tools/libxc/xc_sr_save_x86_hvm.c   |   36 -
  tools/libxl/Makefile   |2 +
  tools/libxl/libxl.h|   19 +
  tools/libxl/libxl_aoutils.c|   15 +
  tools/li

[Xen-devel] Question about separating request and response rings in PV network

2015-07-09 Thread openlui
Hi, all:
I am trying to improve the performance of netfront/netback, and I found
that there was some discussion about PV network performance improvements on
the devel mailing list ([1]). The proposals mentioned in [1] are helpful, such as
multi-page rings, multiqueue, etc., and some of them have already been merged
upstream.
However, I am wondering: if we already use multi-page rings, why is
"separate request and response rings", listed in [1], also helpful for
performance improvement? I have read the implementation of the vmxnet3 driver; it
does have separate request and response rings, but I cannot understand what
the advantage of this method is.
Thanks.

[1] http://lists.xen.org/archives/html/xen-devel/2013-05/msg01904.html

Best Regards
Lui




[Xen-devel] [linux-4.1 test] 59275: tolerable FAIL - PUSHED

2015-07-09 Thread osstest service owner
flight 59275 linux-4.1 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59275/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop  fail pass in 59143

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-libvirt  11 guest-start   fail REGR. vs. 59031
 test-armhf-armhf-xl-credit2 15 guest-start/debian.repeat fail blocked in 59031
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail in 59143 like 59031
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 15 guest-localmigrate/x10 
fail in 59143 like 59260-bisect
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 guest-localmigrate 
fail like 59257-bisect

Tests which did not succeed, but are not blocking:
 test-amd64-i386-freebsd10-amd64  9 freebsd-install fail never pass
 test-amd64-i386-freebsd10-i386  9 freebsd-install  fail never pass
 test-amd64-amd64-xl-pvh-intel 13 guest-saverestorefail  never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  11 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 11 guest-start  fail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass

version targeted for testing:
 linux                6a010c0abd49388a49af3d5a5bfc00e0d5767607
baseline version:
 linux                b953c0d234bc72e8489d3bf51a276c5c4ec85345

Last test of basis    59031  2015-07-02 23:39:59 Z    7 days
Testing same since    59054  2015-07-05 10:20:43 Z    4 days    5 attempts


People who touched revisions under test:
  Alexander Shishkin 
  Alexey Sokolov 
  Andi Kleen 
  Arnaldo Carvalho de Melo 
  Borislav Petkov 
  Borislav Petkov 
  Dmitry Tunin 
  Greg Kroah-Hartman 
  Imre Palik 
  Ingo Molnar 
  Jiri Olsa 
  Kalle Valo 
  Lukas Wunner 
  Marcel Holtmann 
  Oleg Nesterov 
  Palik, Imre 
  Peter Zijlstra (Intel) 
  Rafał Miłecki 

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  pass
 build-i386-rumpuserxen   pass
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsmpass
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmpass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsmpass
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm fail
 test-amd64-amd64-libvirt-xsm pass
 test-armhf-armhf-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  fail
 test-amd64-amd64-xl-xsm  pass
 test-armhf-armhf-xl-xsm  pass
 test-amd64-i386-xl-xsm   pass
 test-amd64-amd64-xl-pvh-amd 

Re: [Xen-devel] [PATCH v4 15/15] tools/xen-access: altp2m testcases

2015-07-09 Thread Lengyel, Tamas
> @@ -546,6 +652,23 @@ int main(int argc, char *argv[])
>  }
>
>  break;
> +case VM_EVENT_REASON_SINGLESTEP:
> +printf("Singlestep: rip=%016"PRIx64", vcpu %d\n",
> +   req.regs.x86.rip,
> +   req.vcpu_id);
> +
> +if ( altp2m )
> +{
> +printf("\tSwitching altp2m to view %u!\n",
> altp2m_view_id);
> +
> +rsp.reason = VM_EVENT_REASON_MEM_ACCESS;
>

So this was a workaround for v3 of the series that is no longer necessary;
it's probably cleaner to set the same reason on the response as on the
request. It's not against any rule, so the code is still correct and
works, it's just not best practice. If there is another round of
the series, it could be fixed then.

Tamas


Re: [Xen-devel] [PATCH v2 04/27] tools/libxl: Introduce libxl__kill()

2015-07-09 Thread Yang Hongyang

On 07/10/2015 02:26 AM, Andrew Cooper wrote:

as a wrapper to kill(2), and use it in preference to sendig in


s/sendig/sendsig/


libxl_save_callout.c.

Signed-off-by: Andrew Cooper 
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 

---
Logically new in v2 - split out from a v1 change which was itself a
cherrypick-and-modify from the AO Abort series
---
  tools/libxl/libxl_aoutils.c  |   15 +++
  tools/libxl/libxl_internal.h |2 ++
  tools/libxl/libxl_save_callout.c |   10 ++
  3 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index 0931eee..274ef39 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -621,3 +621,18 @@ bool libxl__async_exec_inuse(const libxl__async_exec_state 
*aes)
  assert(time_inuse == child_inuse);
  return child_inuse;
  }
+
+void libxl__kill(libxl__gc *gc, pid_t pid, int sig, const char *what)
+{
+int r = kill(pid, sig);
+if (r) LOGE(WARN, "failed to kill() %s [%lu] (signal %d)",
+what, (unsigned long)pid, sig);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 19fc425..9147de1 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2244,6 +2244,8 @@ struct libxl__async_exec_state {
  int libxl__async_exec_start(libxl__async_exec_state *aes);
  bool libxl__async_exec_inuse(const libxl__async_exec_state *aes);

+void libxl__kill(libxl__gc *gc, pid_t pid, int sig, const char *what);
+
  /*- device addition/removal -*/

  typedef struct libxl__ao_device libxl__ao_device;
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 087c2d5..b82a5c1 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -244,12 +244,6 @@ static void run_helper(libxl__egc *egc, 
libxl__save_helper_state *shs,
  libxl__carefd_close(childs_pipes[1]);
  helper_failed(egc, shs, rc);;
  }
-static void sendsig(libxl__gc *gc, libxl__save_helper_state *shs, int sig)
-{
-int r = kill(shs->child.pid, sig);
-if (r) LOGE(WARN, "failed to kill save/restore helper [%lu] (signal %d)",
-(unsigned long)shs->child.pid, sig);
-}

  static void helper_failed(libxl__egc *egc, libxl__save_helper_state *shs,
int rc)
@@ -266,7 +260,7 @@ static void helper_failed(libxl__egc *egc, 
libxl__save_helper_state *shs,
  return;
  }

-sendsig(gc, shs, SIGKILL);
+libxl__kill(gc, shs->child.pid, SIGKILL, "save/restore helper");
  }

  static void helper_stop(libxl__egc *egc, libxl__ao_abortable *abrt, int rc)
@@ -282,7 +276,7 @@ static void helper_stop(libxl__egc *egc, 
libxl__ao_abortable *abrt, int rc)
  if (!shs->rc)
  shs->rc = rc;

-sendsig(gc, shs, SIGTERM);
+libxl__kill(gc, shs->child.pid, SIGTERM, "save/restore helper");
  }

  static void helper_stdout_readable(libxl__egc *egc, libxl__ev_fd *ev,



--
Thanks,
Yang.



Re: [Xen-devel] [PATCH v4 09/15] x86/altp2m: alternate p2m memory events.

2015-07-09 Thread Lengyel, Tamas
> diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
> index 577e971..6dfa9db 100644
> --- a/xen/include/public/vm_event.h
> +++ b/xen/include/public/vm_event.h
> @@ -47,6 +47,16 @@
>  #define VM_EVENT_FLAG_VCPU_PAUSED (1 << 0)
>  /* Flags to aid debugging mem_event */
>  #define VM_EVENT_FLAG_FOREIGN (1 << 1)
> +/*
> + * This flag can be set in a request or a response
> + *
> + * On a request, indicates that the event occurred in the alternate p2m
> specified by
> + * the altp2m_idx request field.
> + *
> + * On a response, indicates that the VCPU should resume in the alternate
> p2m specified
> + * by the altp2m_idx response field if possible.
> + */
> +#define VM_EVENT_FLAG_ALTERNATE_P2M   (1 << 2)
>

This will now collide with staging following Razvan's patch that recently
got merged. Otherwise:

Acked-by: Tamas K Lengyel 


[Xen-devel] [PATCH v4 15/15] tools/xen-access: altp2m testcases

2015-07-09 Thread Ed White
From: Tamas K Lengyel 

Working altp2m test-case. Extended the test tool to support singlestepping
to better highlight the core feature of altp2m view switching.

Signed-off-by: Tamas K Lengyel 
Signed-off-by: Ed White 
---
 tools/tests/xen-access/xen-access.c | 173 ++--
 1 file changed, 148 insertions(+), 25 deletions(-)

diff --git a/tools/tests/xen-access/xen-access.c 
b/tools/tests/xen-access/xen-access.c
index 12ab921..6daa408 100644
--- a/tools/tests/xen-access/xen-access.c
+++ b/tools/tests/xen-access/xen-access.c
@@ -275,6 +275,19 @@ xenaccess_t *xenaccess_init(xc_interface **xch_r, domid_t 
domain_id)
 return NULL;
 }
 
+static inline
+int control_singlestep(
+xc_interface *xch,
+domid_t domain_id,
+unsigned long vcpu,
+bool enable)
+{
+uint32_t op = enable ?
+XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON : 
XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF;
+
+return xc_domain_debug_control(xch, domain_id, op, vcpu);
+}
+
 /*
  * Note that this function is not thread safe.
  */
@@ -317,13 +330,15 @@ static void put_response(vm_event_t *vm_event, 
vm_event_response_t *rsp)
 
 void usage(char* progname)
 {
-fprintf(stderr,
-"Usage: %s [-m]  write|exec|breakpoint\n"
+fprintf(stderr, "Usage: %s [-m]  write|exec", progname);
+#if defined(__i386__) || defined(__x86_64__)
+fprintf(stderr, "|breakpoint|altp2m_write|altp2m_exec");
+#endif
+fprintf(stderr,
 "\n"
 "Logs first page writes, execs, or breakpoint traps that occur on 
the domain.\n"
 "\n"
-"-m requires this program to run, or else the domain may pause\n",
-progname);
+"-m requires this program to run, or else the domain may pause\n");
 }
 
 int main(int argc, char *argv[])
@@ -341,6 +356,8 @@ int main(int argc, char *argv[])
 int required = 0;
 int breakpoint = 0;
 int shutting_down = 0;
+int altp2m = 0;
+uint16_t altp2m_view_id = 0;
 
 char* progname = argv[0];
 argv++;
@@ -379,10 +396,22 @@ int main(int argc, char *argv[])
 default_access = XENMEM_access_rw;
 after_first_access = XENMEM_access_rwx;
 }
+#if defined(__i386__) || defined(__x86_64__)
 else if ( !strcmp(argv[0], "breakpoint") )
 {
 breakpoint = 1;
 }
+else if ( !strcmp(argv[0], "altp2m_write") )
+{
+default_access = XENMEM_access_rx;
+altp2m = 1;
+}
+else if ( !strcmp(argv[0], "altp2m_exec") )
+{
+default_access = XENMEM_access_rw;
+altp2m = 1;
+}
+#endif
 else
 {
 usage(argv[0]);
@@ -415,22 +444,73 @@ int main(int argc, char *argv[])
 goto exit;
 }
 
-/* Set the default access type and convert all pages to it */
-rc = xc_set_mem_access(xch, domain_id, default_access, ~0ull, 0);
-if ( rc < 0 )
+/* With altp2m we just create a new, restricted view of the memory */
+if ( altp2m )
 {
-ERROR("Error %d setting default mem access type\n", rc);
-goto exit;
-}
+xen_pfn_t gfn = 0;
+unsigned long perm_set = 0;
+
+rc = xc_altp2m_set_domain_state( xch, domain_id, 1 );
+if ( rc < 0 )
+{
+ERROR("Error %d enabling altp2m on domain!\n", rc);
+goto exit;
+}
+
+rc = xc_altp2m_create_view( xch, domain_id, default_access, 
&altp2m_view_id );
+if ( rc < 0 )
+{
+ERROR("Error %d creating altp2m view!\n", rc);
+goto exit;
+}
 
-rc = xc_set_mem_access(xch, domain_id, default_access, START_PFN,
-   (xenaccess->max_gpfn - START_PFN) );
+DPRINTF("altp2m view created with id %u\n", altp2m_view_id);
+DPRINTF("Setting altp2m mem_access permissions.. ");
 
-if ( rc < 0 )
+for(; gfn < xenaccess->max_gpfn; ++gfn)
+{
+rc = xc_altp2m_set_mem_access( xch, domain_id, altp2m_view_id, gfn,
+   default_access);
+if ( !rc )
+perm_set++;
+}
+
+DPRINTF("done! Permissions set on %lu pages.\n", perm_set);
+
+rc = xc_altp2m_switch_to_view( xch, domain_id, altp2m_view_id );
+if ( rc < 0 )
+{
+ERROR("Error %d switching to altp2m view!\n", rc);
+goto exit;
+}
+
+rc = xc_monitor_singlestep( xch, domain_id, 1 );
+if ( rc < 0 )
+{
+ERROR("Error %d failed to enable singlestep monitoring!\n", rc);
+goto exit;
+}
+}
+
+if ( !altp2m )
 {
-ERROR("Error %d setting all memory to access type %d\n", rc,
-  default_access);
-goto exit;
+/* Set the default access type and convert all pages to it */
+rc = xc_set_mem_access(xch, domain_id, default_access, ~0ull, 0);
+if ( rc < 0 )
+{
+ERROR("Error %d setting default mem acc

[Xen-devel] [PATCH v4 14/15] tools/libxc: add support to altp2m hvmops

2015-07-09 Thread Ed White
From: Tamas K Lengyel 

Wrappers to issue altp2m hvmops.

Signed-off-by: Tamas K Lengyel 
Signed-off-by: Ravi Sahita 
---
 tools/libxc/Makefile  |   1 +
 tools/libxc/include/xenctrl.h |  21 
 tools/libxc/xc_altp2m.c   | 237 ++
 3 files changed, 259 insertions(+)
 create mode 100644 tools/libxc/xc_altp2m.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 153b79e..c2c2b1c 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -10,6 +10,7 @@ override CONFIG_MIGRATE := n
 endif
 
 CTRL_SRCS-y   :=
+CTRL_SRCS-y   += xc_altp2m.c
 CTRL_SRCS-y   += xc_core.c
 CTRL_SRCS-$(CONFIG_X86) += xc_core_x86.c
 CTRL_SRCS-$(CONFIG_ARM) += xc_core_arm.c
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d1d2ab3..ecddf28 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2316,6 +2316,27 @@ void xc_tmem_save_done(xc_interface *xch, int dom);
 int xc_tmem_restore(xc_interface *xch, int dom, int fd);
 int xc_tmem_restore_extra(xc_interface *xch, int dom, int fd);
 
+/**
+ * altp2m operations
+ */
+
+int xc_altp2m_get_domain_state(xc_interface *handle, domid_t dom, bool *state);
+int xc_altp2m_set_domain_state(xc_interface *handle, domid_t dom, bool state);
+int xc_altp2m_set_vcpu_enable_notify(xc_interface *handle, xen_pfn_t gfn);
+int xc_altp2m_create_view(xc_interface *handle, domid_t domid,
+  xenmem_access_t default_access, uint16_t *view_id);
+int xc_altp2m_destroy_view(xc_interface *handle, domid_t domid,
+   uint16_t view_id);
+/* Switch all vCPUs of the domain to the specified altp2m view */
+int xc_altp2m_switch_to_view(xc_interface *handle, domid_t domid,
+ uint16_t view_id);
+int xc_altp2m_set_mem_access(xc_interface *handle, domid_t domid,
+ uint16_t view_id, xen_pfn_t gfn,
+ xenmem_access_t access);
+int xc_altp2m_change_gfn(xc_interface *handle, domid_t domid,
+ uint16_t view_id, xen_pfn_t old_gfn,
+ xen_pfn_t new_gfn);
+
 /** 
  * Mem paging operations.
  * Paging is supported only on the x86 architecture in 64 bit mode, with
diff --git a/tools/libxc/xc_altp2m.c b/tools/libxc/xc_altp2m.c
new file mode 100644
index 000..a4be36b
--- /dev/null
+++ b/tools/libxc/xc_altp2m.c
@@ -0,0 +1,237 @@
+/**
+ *
+ * xc_altp2m.c
+ *
+ * Interface to altp2m related HVMOPs
+ *
+ * Copyright (c) 2015 Tamas K Lengyel (ta...@tklengyel.com)
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  
USA
+ */
+
+#include "xc_private.h"
+#include 
+#include 
+
+int xc_altp2m_get_domain_state(xc_interface *handle, domid_t dom, bool *state)
+{
+int rc;
+DECLARE_HYPERCALL;
+DECLARE_HYPERCALL_BUFFER(xen_hvm_altp2m_op_t, arg);
+
+arg = xc_hypercall_buffer_alloc(handle, arg, sizeof(*arg));
+if ( arg == NULL )
+return -1;
+
+hypercall.op = __HYPERVISOR_hvm_op;
+hypercall.arg[0] = HVMOP_altp2m;
+hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+arg->cmd = HVMOP_altp2m_get_domain_state;
+arg->domain = dom;
+
+rc = do_xen_hypercall(handle, &hypercall);
+
+if ( !rc )
+*state = arg->u.domain_state.state;
+
+xc_hypercall_buffer_free(handle, arg);
+return rc;
+}
+
+int xc_altp2m_set_domain_state(xc_interface *handle, domid_t dom, bool state)
+{
+int rc;
+DECLARE_HYPERCALL;
+DECLARE_HYPERCALL_BUFFER(xen_hvm_altp2m_op_t, arg);
+
+arg = xc_hypercall_buffer_alloc(handle, arg, sizeof(*arg));
+if ( arg == NULL )
+return -1;
+
+hypercall.op = __HYPERVISOR_hvm_op;
+hypercall.arg[0] = HVMOP_altp2m;
+hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+arg->cmd = HVMOP_altp2m_set_domain_state;
+arg->domain = dom;
+arg->u.domain_state.state = state;
+
+rc = do_xen_hypercall(handle, &hypercall);
+
+xc_hypercall_buffer_free(handle, arg);
+return rc;
+}
+
+/* This is a bit odd to me that it acts on current.. */
+int xc_altp2m_set_vcpu_enable_notify(xc_interface *handle, xen_pfn_t gfn)
+{
+int rc;
+DECLARE_HYPERCALL;
+DECLARE_H

[Xen-devel] [PATCH v4 13/15] x86/altp2m: XSM hooks for altp2m HVM ops

2015-07-09 Thread Ed White
From: Ravi Sahita 

Signed-off-by: Ravi Sahita 

Acked-by: Daniel De Graaf 
---
 tools/flask/policy/policy/modules/xen/xen.if |  4 ++--
 xen/arch/x86/hvm/hvm.c   |  6 ++
 xen/include/xsm/dummy.h  | 12 
 xen/include/xsm/xsm.h| 12 
 xen/xsm/dummy.c  |  2 ++
 xen/xsm/flask/hooks.c| 12 
 xen/xsm/flask/policy/access_vectors  |  7 +++
 7 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.if 
b/tools/flask/policy/policy/modules/xen/xen.if
index f4cde11..6177fe9 100644
--- a/tools/flask/policy/policy/modules/xen/xen.if
+++ b/tools/flask/policy/policy/modules/xen/xen.if
@@ -8,7 +8,7 @@
 define(`declare_domain_common', `
allow $1 $2:grant { query setup };
allow $1 $2:mmu { adjust physmap map_read map_write stat pinpage 
updatemp mmuext_op };
-   allow $1 $2:hvm { getparam setparam };
+   allow $1 $2:hvm { getparam setparam altp2mhvm_op };
allow $1 $2:domain2 get_vnumainfo;
 ')
 
@@ -58,7 +58,7 @@ define(`create_domain_common', `
allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage 
mmuext_op updatemp };
allow $1 $2:grant setup;
allow $1 $2:hvm { cacheattr getparam hvmctl irqlevel pciroute sethvmc
-   setparam pcilevel trackdirtyvram nested };
+   setparam pcilevel trackdirtyvram nested altp2mhvm 
altp2mhvm_op };
 ')
 
 # create_domain(priv, target)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 6e59e68..7c82e89 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -5887,6 +5887,9 @@ static int hvmop_set_param(
 nestedhvm_vcpu_destroy(v);
 break;
 case HVM_PARAM_ALTP2MHVM:
+rc = xsm_hvm_param_altp2mhvm(XSM_PRIV, d);
+if ( rc )
+break;
 if ( a.value > 1 )
 rc = -EINVAL;
 if ( a.value &&
@@ -6490,6 +6493,9 @@ long do_hvm_op(unsigned long op, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 }
 
 if ( !rc )
+rc = xsm_hvm_altp2mhvm_op(XSM_TARGET, d ? d : current->domain);
+
+if ( !rc )
 {
 switch ( a.cmd )
 {
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index f044c0f..e0b561d 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -548,6 +548,18 @@ static XSM_INLINE int xsm_hvm_param_nested(XSM_DEFAULT_ARG 
struct domain *d)
 return xsm_default_action(action, current->domain, d);
 }
 
+static XSM_INLINE int xsm_hvm_param_altp2mhvm(XSM_DEFAULT_ARG struct domain *d)
+{
+XSM_ASSERT_ACTION(XSM_PRIV);
+return xsm_default_action(action, current->domain, d);
+}
+
+static XSM_INLINE int xsm_hvm_altp2mhvm_op(XSM_DEFAULT_ARG struct domain *d)
+{
+XSM_ASSERT_ACTION(XSM_TARGET);
+return xsm_default_action(action, current->domain, d);
+}
+
 static XSM_INLINE int xsm_vm_event_control(XSM_DEFAULT_ARG struct domain *d, 
int mode, int op)
 {
 XSM_ASSERT_ACTION(XSM_PRIV);
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index c872d44..dc48d23 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -147,6 +147,8 @@ struct xsm_operations {
 int (*hvm_param) (struct domain *d, unsigned long op);
 int (*hvm_control) (struct domain *d, unsigned long op);
 int (*hvm_param_nested) (struct domain *d);
+int (*hvm_param_altp2mhvm) (struct domain *d);
+int (*hvm_altp2mhvm_op) (struct domain *d);
 int (*get_vnumainfo) (struct domain *d);
 
 int (*vm_event_control) (struct domain *d, int mode, int op);
@@ -586,6 +588,16 @@ static inline int xsm_hvm_param_nested (xsm_default_t def, 
struct domain *d)
 return xsm_ops->hvm_param_nested(d);
 }
 
+static inline int xsm_hvm_param_altp2mhvm (xsm_default_t def, struct domain *d)
+{
+return xsm_ops->hvm_param_altp2mhvm(d);
+}
+
+static inline int xsm_hvm_altp2mhvm_op (xsm_default_t def, struct domain *d)
+{
+return xsm_ops->hvm_altp2mhvm_op(d);
+}
+
 static inline int xsm_get_vnumainfo (xsm_default_t def, struct domain *d)
 {
 return xsm_ops->get_vnumainfo(d);
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index e84b0e4..3461d4f 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -116,6 +116,8 @@ void xsm_fixup_ops (struct xsm_operations *ops)
 set_to_dummy_if_null(ops, hvm_param);
 set_to_dummy_if_null(ops, hvm_control);
 set_to_dummy_if_null(ops, hvm_param_nested);
+set_to_dummy_if_null(ops, hvm_param_altp2mhvm);
+set_to_dummy_if_null(ops, hvm_altp2mhvm_op);
 
 set_to_dummy_if_null(ops, do_xsm_op);
 #ifdef CONFIG_COMPAT
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 6e37d29..2b998c9 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1170,6 +1170,16 @@ static int flask_hvm_param_nested(struct domain *d)
 return current_has_perm(d, S

[Xen-devel] [PATCH v4 12/15] x86/altp2m: Add altp2mhvm HVM domain parameter.

2015-07-09 Thread Ed White
The altp2mhvm and nestedhvm parameters are mutually
exclusive and cannot be set together.

Signed-off-by: Ed White 

Reviewed-by: Andrew Cooper  for the hypervisor bits.
---
 docs/man/xl.cfg.pod.5   | 12 
 tools/libxl/libxl.h |  6 ++
 tools/libxl/libxl_create.c  |  1 +
 tools/libxl/libxl_dom.c |  2 ++
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c| 10 ++
 xen/arch/x86/hvm/hvm.c  | 23 +--
 xen/include/public/hvm/params.h |  5 -
 8 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..18afd46 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -1035,6 +1035,18 @@ enabled by default and you should usually omit it. It 
may be necessary
 to disable the HPET in order to improve compatibility with guest
 Operating Systems (X86 only)
 
+=item B
+
+Enables or disables hvm guest access to alternate-p2m capability.
+Alternate-p2m allows a guest to manage multiple p2m guest physical
+"memory views" (as opposed to a single p2m). This option is
+disabled by default and is available only to hvm domains.
+You may want this option if you want to access-control/isolate
+access to specific guest physical memory pages accessed by
+the guest, e.g. for HVM domain memory introspection or
+for isolation/access-control of memory between components within
+a single guest hvm domain.
+
 =item B
 
 Enable or disables guest access to hardware virtualisation features,
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index a1c5d15..17222e7 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -745,6 +745,12 @@ typedef struct libxl__ctx libxl_ctx;
 #define LIBXL_HAVE_BUILDINFO_SERIAL_LIST 1
 
 /*
+ * LIBXL_HAVE_ALTP2M
+ * If this is defined, then libxl supports alternate p2m functionality.
+ */
+#define LIBXL_HAVE_ALTP2M 1
+
+/*
  * LIBXL_HAVE_REMUS
  * If this is defined, then libxl supports remus.
  */
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f366a09..418deee 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -329,6 +329,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 libxl_defbool_setdefault(&b_info->u.hvm.hpet,   true);
 libxl_defbool_setdefault(&b_info->u.hvm.vpt_align,  true);
 libxl_defbool_setdefault(&b_info->u.hvm.nested_hvm, false);
+libxl_defbool_setdefault(&b_info->u.hvm.altp2m, false);
 libxl_defbool_setdefault(&b_info->u.hvm.usb,false);
 libxl_defbool_setdefault(&b_info->u.hvm.xen_platform_pci,   true);
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index bdc0465..2f1200e 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -300,6 +300,8 @@ static void hvm_set_conf_params(xc_interface *handle, 
uint32_t domid,
 libxl_defbool_val(info->u.hvm.vpt_align));
 xc_hvm_param_set(handle, domid, HVM_PARAM_NESTEDHVM,
 libxl_defbool_val(info->u.hvm.nested_hvm));
+xc_hvm_param_set(handle, domid, HVM_PARAM_ALTP2MHVM,
+libxl_defbool_val(info->u.hvm.altp2m));
 }
 
 int libxl__build_pre(libxl__gc *gc, uint32_t domid,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index e1632fa..fb641fe 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -440,6 +440,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
("mmio_hole_memkb",  MemKB),
("timer_mode",   libxl_timer_mode),
("nested_hvm",   libxl_defbool),
+   ("altp2m",   libxl_defbool),
("smbios_firmware",  string),
("acpi_firmware",string),
("nographic",libxl_defbool),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c858068..43cf6bf 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1500,6 +1500,16 @@ static void parse_config_data(const char *config_source,
 
 xlu_cfg_get_defbool(config, "nestedhvm", &b_info->u.hvm.nested_hvm, 0);
 
+xlu_cfg_get_defbool(config, "altp2mhvm", &b_info->u.hvm.altp2m, 0);
+
+if (!libxl_defbool_is_default(b_info->u.hvm.nested_hvm) &&
+libxl_defbool_val(b_info->u.hvm.nested_hvm) &&
+!libxl_defbool_is_default(b_info->u.hvm.altp2m) &&
+libxl_defbool_val(b_info->u.hvm.altp2m)) {
+fprintf(stderr, "ERROR: nestedhvm and altp2mhvm cannot be used together\n");
+exit(1);
+}
+
 xlu_cfg_replace_string(config, "smbios_firmware",
&b_info->u.hvm.smbios_firmware, 0);

[Xen-devel] [PATCH v4 11/15] x86/altp2m: define and implement alternate p2m HVMOP types.

2015-07-09 Thread Ed White
Signed-off-by: Ed White 
---
 xen/arch/x86/hvm/hvm.c  | 138 
 xen/include/public/hvm/hvm_op.h |  82 
 2 files changed, 220 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index bda6c1e..23cd507 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -6443,6 +6443,144 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 break;
 }
 
+case HVMOP_altp2m:
+{
+struct xen_hvm_altp2m_op a;
+struct domain *d = NULL;
+
+if ( copy_from_guest(&a, arg, 1) )
+return -EFAULT;
+
+switch ( a.cmd )
+{
+case HVMOP_altp2m_get_domain_state:
+case HVMOP_altp2m_set_domain_state:
+case HVMOP_altp2m_create_p2m:
+case HVMOP_altp2m_destroy_p2m:
+case HVMOP_altp2m_switch_p2m:
+case HVMOP_altp2m_set_mem_access:
+case HVMOP_altp2m_change_gfn:
+d = rcu_lock_domain_by_any_id(a.domain);
+if ( d == NULL )
+return -ESRCH;
+
+if ( !is_hvm_domain(d) || !hvm_altp2m_supported() )
+rc = -EINVAL;
+
+break;
+case HVMOP_altp2m_vcpu_enable_notify:
+
+break;
+default:
+return -ENOSYS;
+
+break;
+}
+
+if ( !rc )
+{
+switch ( a.cmd )
+{
+case HVMOP_altp2m_get_domain_state:
+a.u.domain_state.state = altp2m_active(d);
+rc = __copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
+
+break;
+case HVMOP_altp2m_set_domain_state:
+{
+struct vcpu *v;
+bool_t ostate;
+
+if ( nestedhvm_enabled(d) )
+{
+rc = -EINVAL;
+break;
+}
+
+ostate = d->arch.altp2m_active;
+d->arch.altp2m_active = !!a.u.domain_state.state;
+
+/* If the alternate p2m state has changed, handle appropriately */
+if ( d->arch.altp2m_active != ostate &&
+ (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) )
+{
+for_each_vcpu( d, v )
+{
+if ( !ostate )
+altp2m_vcpu_initialise(v);
+else
+altp2m_vcpu_destroy(v);
+}
+
+if ( ostate )
+p2m_flush_altp2m(d);
+}
+
+break;
+}
+default:
+{
+if ( !(d ? d : current->domain)->arch.altp2m_active )
+{
+rc = -EINVAL;
+break;
+}
+
+switch ( a.cmd )
+{
+case HVMOP_altp2m_vcpu_enable_notify:
+{
+struct vcpu *curr = current;
+p2m_type_t p2mt;
+
+if ( (gfn_x(vcpu_altp2m(curr).veinfo_gfn) != INVALID_GFN) ||
+ (mfn_x(get_gfn_query_unlocked(curr->domain,
+a.u.enable_notify.gfn, &p2mt)) == INVALID_MFN) )
+return -EINVAL;
+
+vcpu_altp2m(curr).veinfo_gfn = _gfn(a.u.enable_notify.gfn);
+ap2m_vcpu_update_vmfunc_ve(curr);
+
+break;
+}
+case HVMOP_altp2m_create_p2m:
+if ( !(rc = p2m_init_next_altp2m(d, &a.u.view.view)) )
+rc = __copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
+
+break;
+case HVMOP_altp2m_destroy_p2m:
+rc = p2m_destroy_altp2m_by_id(d, a.u.view.view);
+
+break;
+case HVMOP_altp2m_switch_p2m:
+rc = p2m_switch_domain_altp2m_by_id(d, a.u.view.view);
+
+break;
+case HVMOP_altp2m_set_mem_access:
+rc = p2m_set_altp2m_mem_access(d, a.u.set_mem_access.view,
+_gfn(a.u.set_mem_access.gfn),
+a.u.set_mem_access.hvmmem_access);
+
+break;
+case HVMOP_altp2m_change_gfn:
+rc = p2m_change_altp2m_gfn(d, a.u.change_gfn.view,
+_gfn(a.u.change_gfn.old_gfn),
+_gfn(a.u.change_gfn.new_gfn));
+
+break;
+}
+
+break;
+}
+}
+}
+
+if ( d )
+rcu_unlock_domain(d);
+
+break;
+}
+
 default:
 {
 gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op);
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h

[Xen-devel] [PATCH v4 10/15] x86/altp2m: add remaining support routines.

2015-07-09 Thread Ed White
Add the remaining routines required to support enabling the alternate
p2m functionality.

Signed-off-by: Ed White 

Reviewed-by: Andrew Cooper 
---
 xen/arch/x86/hvm/hvm.c   |  58 +-
 xen/arch/x86/mm/hap/Makefile |   1 +
 xen/arch/x86/mm/hap/altp2m_hap.c |  98 ++
 xen/arch/x86/mm/p2m-ept.c|   3 +
 xen/arch/x86/mm/p2m.c| 385 +++
 xen/include/asm-x86/hvm/altp2m.h |   4 +
 xen/include/asm-x86/p2m.h|  33 
 7 files changed, 576 insertions(+), 6 deletions(-)
 create mode 100644 xen/arch/x86/mm/hap/altp2m_hap.c

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index dbb4696..bda6c1e 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2802,10 +2802,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
 mfn_t mfn;
 struct vcpu *curr = current;
 struct domain *currd = curr->domain;
-struct p2m_domain *p2m;
+struct p2m_domain *p2m, *hostp2m;
 int rc, fall_through = 0, paged = 0;
 int sharing_enomem = 0;
 vm_event_request_t *req_ptr = NULL;
+bool_t ap2m_active = 0;
 
 /* On Nested Virtualization, walk the guest page table.
  * If this succeeds, all is fine.
@@ -2865,11 +2866,31 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
 goto out;
 }
 
-p2m = p2m_get_hostp2m(currd);
-mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 
+ap2m_active = altp2m_active(currd);
+
+/* Take a lock on the host p2m speculatively, to avoid potential
+ * locking order problems later and to handle unshare etc.
+ */
+hostp2m = p2m_get_hostp2m(currd);
+mfn = get_gfn_type_access(hostp2m, gfn, &p2mt, &p2ma,
   P2M_ALLOC | (npfec.write_access ? P2M_UNSHARE : 0),
   NULL);
 
+if ( ap2m_active )
+{
+if ( altp2m_hap_nested_page_fault(curr, gpa, gla, npfec, &p2m) == 1 )
+{
+/* entry was lazily copied from host -- retry */
+__put_gfn(hostp2m, gfn);
+rc = 1;
+goto out;
+}
+
+mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 0, NULL);
+}
+else
+p2m = hostp2m;
+
 /* Check access permissions first, then handle faults */
 if ( mfn_x(mfn) != INVALID_MFN )
 {
@@ -2909,6 +2930,20 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
 
 if ( violation )
 {
+/* Should #VE be emulated for this fault? */
+if ( p2m_is_altp2m(p2m) && !cpu_has_vmx_virt_exceptions )
+{
+bool_t sve;
+
+p2m->get_entry(p2m, gfn, &p2mt, &p2ma, 0, NULL, &sve);
+
+if ( !sve && ap2m_vcpu_emulate_ve(curr) )
+{
+rc = 1;
+goto out_put_gfn;
+}
+}
+
 if ( p2m_mem_access_check(gpa, gla, npfec, &req_ptr) )
 {
 fall_through = 1;
@@ -2928,7 +2963,9 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
  (npfec.write_access &&
   (p2m_is_discard_write(p2mt) || (p2mt == p2m_mmio_write_dm))) )
 {
-put_gfn(currd, gfn);
+__put_gfn(p2m, gfn);
+if ( ap2m_active )
+__put_gfn(hostp2m, gfn);
 
 rc = 0;
 if ( unlikely(is_pvh_domain(currd)) )
@@ -2957,6 +2994,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
 /* Spurious fault? PoD and log-dirty also take this path. */
 if ( p2m_is_ram(p2mt) )
 {
+rc = 1;
 /*
  * Page log dirty is always done with order 0. If this mfn resides in
  * a large page, we do not change other pages type within that large
@@ -2965,9 +3003,15 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
 if ( npfec.write_access )
 {
 paging_mark_dirty(currd, mfn_x(mfn));
+/* If p2m is really an altp2m, unlock here to avoid lock ordering
+ * violation when the change below is propagated from host p2m */
+if ( ap2m_active )
+__put_gfn(p2m, gfn);
 p2m_change_type_one(currd, gfn, p2m_ram_logdirty, p2m_ram_rw);
+__put_gfn(ap2m_active ? hostp2m : p2m, gfn);
+
+goto out;
 }
-rc = 1;
 goto out_put_gfn;
 }
 
@@ -2977,7 +3021,9 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
 rc = fall_through;
 
 out_put_gfn:
-put_gfn(currd, gfn);
+__put_gfn(p2m, gfn);
+if ( ap2m_active )
+__put_gfn(hostp2m, gfn);
 out:
 /* All of these are delayed until we exit, since we might 
  * sleep on event ring wait queues, and we must not hold
diff --git a/xen/arch/x86/mm/hap/Makefile b/xen/arch/x86/mm/hap/Makefile
index 68f2bb5..216cd90 100644
--- a/xen/arch/x86/mm/hap/Makefile
+++ b/xen/arch/x86/mm/hap/Makefile
@@ -4,6 +4,7 @@ obj-y += guest_wal

[Xen-devel] [PATCH v4 09/15] x86/altp2m: alternate p2m memory events.

2015-07-09 Thread Ed White
Add a flag to indicate that a memory event occurred in an alternate p2m
and a field containing the p2m index. Allow any event response to switch
to a different alternate p2m using the same flag and field.

Modify p2m_mem_access_check() to handle alternate p2m's.

Signed-off-by: Ed White 

Acked-by: Andrew Cooper  for the x86 bits.
Acked-by: George Dunlap 
---
 xen/arch/x86/mm/p2m.c | 19 ++-
 xen/common/vm_event.c |  4 
 xen/include/asm-arm/p2m.h |  6 ++
 xen/include/asm-x86/p2m.h |  3 +++
 xen/include/public/vm_event.h | 11 +++
 5 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 561a83c..d4d1ba1 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1514,6 +1514,12 @@ void p2m_mem_access_emulate_check(struct vcpu *v,
 }
 }
 
+void p2m_altp2m_check(struct vcpu *v, uint16_t idx)
+{
+if ( altp2m_active(v->domain) )
+p2m_switch_vcpu_altp2m_by_id(v, idx);
+}
+
 bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
 struct npfec npfec,
 vm_event_request_t **req_ptr)
@@ -1521,7 +1527,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
 struct vcpu *v = current;
 unsigned long gfn = gpa >> PAGE_SHIFT;
 struct domain *d = v->domain;
-struct p2m_domain* p2m = p2m_get_hostp2m(d);
+struct p2m_domain *p2m = NULL;
 mfn_t mfn;
 p2m_type_t p2mt;
 p2m_access_t p2ma;
@@ -1530,6 +1536,11 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
 unsigned long eip = guest_cpu_user_regs()->eip;
 bool_t sve;
 
+if ( altp2m_active(d) )
+p2m = p2m_get_altp2m(v);
+if ( !p2m )
+p2m = p2m_get_hostp2m(d);
+
 /* First, handle rx2rw conversion automatically.
  * These calls to p2m->set_entry() must succeed: we have the gfn
  * locked and just did a successful get_entry(). */
@@ -1636,6 +1647,12 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
 req->vcpu_id = v->vcpu_id;
 
 p2m_vm_event_fill_regs(req);
+
+if ( altp2m_active(v->domain) )
+{
+req->flags |= VM_EVENT_FLAG_ALTERNATE_P2M;
+req->altp2m_idx = vcpu_altp2m(v).p2midx;
+}
 }
 
 /* Pause the current VCPU */
diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index 120a78a..13224e2 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -399,6 +399,10 @@ void vm_event_resume(struct domain *d, struct vm_event_domain *ved)
 
 };
 
+/* Check for altp2m switch */
+if ( rsp.flags & VM_EVENT_FLAG_ALTERNATE_P2M )
+p2m_altp2m_check(v, rsp.altp2m_idx);
+
 /* Unpause domain. */
 if ( rsp.flags & VM_EVENT_FLAG_VCPU_PAUSED )
 vm_event_vcpu_unpause(v);
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 63748ef..08bdce3 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -109,6 +109,12 @@ void p2m_mem_access_emulate_check(struct vcpu *v,
 /* Not supported on ARM. */
 }
 
+static inline
+void p2m_altp2m_check(struct vcpu *v, uint16_t idx)
+{
+/* Not supported on ARM. */
+}
+
 #define p2m_is_foreign(_t)  ((_t) == p2m_map_foreign)
 #define p2m_is_ram(_t)  ((_t) == p2m_ram_rw || (_t) == p2m_ram_ro)
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 0a172e0..722e54c 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -751,6 +751,9 @@ uint16_t p2m_find_altp2m_by_eptp(struct domain *d, uint64_t eptp);
 /* Switch alternate p2m for a single vcpu */
 bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx);
 
+/* Check to see if vcpu should be switched to a different p2m. */
+void p2m_altp2m_check(struct vcpu *v, uint16_t idx);
+
 /*
  * p2m type to IOMMU flags
  */
diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
index 577e971..6dfa9db 100644
--- a/xen/include/public/vm_event.h
+++ b/xen/include/public/vm_event.h
@@ -47,6 +47,16 @@
 #define VM_EVENT_FLAG_VCPU_PAUSED (1 << 0)
 /* Flags to aid debugging mem_event */
 #define VM_EVENT_FLAG_FOREIGN (1 << 1)
+/*
+ * This flag can be set in a request or a response
+ *
+ * On a request, indicates that the event occurred in the alternate p2m specified
+ * by the altp2m_idx request field.
+ *
+ * On a response, indicates that the VCPU should resume in the alternate p2m specified
+ * by the altp2m_idx response field if possible.
+ */
+#define VM_EVENT_FLAG_ALTERNATE_P2M   (1 << 2)
 
 /*
  * Reasons for the vm event request
@@ -194,6 +204,7 @@ typedef struct vm_event_st {
 uint32_t flags; /* VM_EVENT_FLAG_* */
 uint32_t reason;/* VM_EVENT_REASON_* */
 uint32_t vcpu_id;
+uint16_t altp2m_idx; /* may be used during request and response */
 
 union {
 struct vm_event_pagingmem_paging;
-- 
1.9.1



[Xen-devel] [PATCH v4 08/15] x86/altp2m: add control of suppress_ve.

2015-07-09 Thread Ed White
From: George Dunlap 

The existing ept_set_entry() and ept_get_entry() routines are extended
to optionally set/get suppress_ve.  Passing -1 sets suppress_ve on new
p2m entries, and retains the existing suppress_ve flag on entries that
are already valid.

Signed-off-by: George Dunlap 
---
 xen/arch/x86/mm/mem_sharing.c |  5 +++--
 xen/arch/x86/mm/p2m-ept.c | 18 
 xen/arch/x86/mm/p2m-pod.c | 12 +--
 xen/arch/x86/mm/p2m-pt.c  | 10 +++--
 xen/arch/x86/mm/p2m.c | 50 ++-
 xen/include/asm-x86/p2m.h | 24 +++--
 6 files changed, 70 insertions(+), 49 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 16e329e..5780a26 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1257,10 +1257,11 @@ int relinquish_shared_pages(struct domain *d)
 p2m_type_t t;
 mfn_t mfn;
 int set_rc;
+bool_t sve;
 
 if ( atomic_read(&d->shr_pages) == 0 )
 break;
-mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL);
+mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL, &sve);
 if ( mfn_valid(mfn) && (t == p2m_ram_shared) )
 {
 /* Does not fail with ENOMEM given the DESTROY flag */
@@ -1270,7 +1271,7 @@ int relinquish_shared_pages(struct domain *d)
  * unshare.  Must succeed: we just read the old entry and
  * we hold the p2m lock. */
 set_rc = p2m->set_entry(p2m, gfn, _mfn(0), PAGE_ORDER_4K,
-p2m_invalid, p2m_access_rwx);
+p2m_invalid, p2m_access_rwx, sve);
 ASSERT(set_rc == 0);
 count += 0x10;
 }
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 4111795..1106235 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -657,7 +657,8 @@ bool_t ept_handle_misconfig(uint64_t gpa)
  */
 static int
 ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, 
-  unsigned int order, p2m_type_t p2mt, p2m_access_t p2ma)
+  unsigned int order, p2m_type_t p2mt, p2m_access_t p2ma,
+  int sve)
 {
 ept_entry_t *table, *ept_entry = NULL;
 unsigned long gfn_remainder = gfn;
@@ -803,7 +804,11 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
 ept_p2m_type_to_flags(p2m, &new_entry, p2mt, p2ma);
 }
 
-new_entry.suppress_ve = 1;
+if ( sve != -1 )
+new_entry.suppress_ve = !!sve;
+else
+new_entry.suppress_ve = is_epte_valid(&old_entry) ?
+old_entry.suppress_ve : 1;
 
 rc = atomic_write_ept_entry(ept_entry, new_entry, target);
 if ( unlikely(rc) )
@@ -850,8 +855,9 @@ out:
 
 /* Read ept p2m entries */
 static mfn_t ept_get_entry(struct p2m_domain *p2m,
-   unsigned long gfn, p2m_type_t *t, p2m_access_t* a,
-   p2m_query_t q, unsigned int *page_order)
+unsigned long gfn, p2m_type_t *t, p2m_access_t* a,
+p2m_query_t q, unsigned int *page_order,
+bool_t *sve)
 {
 ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
 unsigned long gfn_remainder = gfn;
@@ -865,6 +871,8 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
 
 *t = p2m_mmio_dm;
 *a = p2m_access_n;
+if ( sve )
+*sve = 1;
 
 /* This pfn is higher than the highest the p2m map currently holds */
 if ( gfn > p2m->max_mapped_pfn )
@@ -930,6 +938,8 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
 else
 *t = ept_entry->sa_p2mt;
 *a = ept_entry->access;
+if ( sve )
+*sve = ept_entry->suppress_ve;
 
 mfn = _mfn(ept_entry->mfn);
 if ( i )
diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index 0679f00..a2f6d02 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -536,7 +536,7 @@ recount:
 p2m_access_t a;
 p2m_type_t t;
 
-(void)p2m->get_entry(p2m, gpfn + i, &t, &a, 0, NULL);
+(void)p2m->get_entry(p2m, gpfn + i, &t, &a, 0, NULL, NULL);
 
 if ( t == p2m_populate_on_demand )
 pod++;
@@ -587,7 +587,7 @@ recount:
 p2m_type_t t;
 p2m_access_t a;
 
-mfn = p2m->get_entry(p2m, gpfn + i, &t, &a, 0, NULL);
+mfn = p2m->get_entry(p2m, gpfn + i, &t, &a, 0, NULL, NULL);
 if ( t == p2m_populate_on_demand )
 {
 p2m_set_entry(p2m, gpfn + i, _mfn(INVALID_MFN), 0, p2m_invalid,
@@ -676,7 +676,7 @@ p2m_pod_zero_check_superpage(struct p2m_domain *p2m, unsigned long gfn)
 for ( i=0; i<SUPERPAGE_PAGES; i++ )
 {
-mfn = p2m->get_entry(p2m, gfn + i, &type, &a, 0, NULL);
+mfn = p2m->get_entry(p2m, gfn + i, &type, &a, 0, NULL, NULL);
 
 if ( i == 0 )
 {
@@ -808,7 +808,7 @@ p2m

[Xen-devel] [PATCH v4 07/15] VMX: add VMFUNC leaf 0 (EPTP switching) to emulator.

2015-07-09 Thread Ed White
From: Ravi Sahita 

Signed-off-by: Ravi Sahita 
---
 xen/arch/x86/hvm/emulate.c | 19 +--
 xen/arch/x86/hvm/vmx/vmx.c | 29 +
 xen/arch/x86/x86_emulate/x86_emulate.c | 20 +++-
 xen/arch/x86/x86_emulate/x86_emulate.h |  4 
 xen/include/asm-x86/hvm/hvm.h  |  2 ++
 5 files changed, 67 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index fe5661d..1c90832 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1436,6 +1436,19 @@ static int hvmemul_invlpg(
 return rc;
 }
 
+static int hvmemul_vmfunc(
+struct x86_emulate_ctxt *ctxt)
+{
+int rc;
+
+rc = hvm_funcs.ap2m_vcpu_emulate_vmfunc(ctxt->regs);
+if ( rc != X86EMUL_OKAY )
+{
+hvmemul_inject_hw_exception(TRAP_invalid_op, 0, ctxt);
+}
+return rc;
+}
+
 static const struct x86_emulate_ops hvm_emulate_ops = {
 .read  = hvmemul_read,
 .insn_fetch= hvmemul_insn_fetch,
@@ -1459,7 +1472,8 @@ static const struct x86_emulate_ops hvm_emulate_ops = {
 .inject_sw_interrupt = hvmemul_inject_sw_interrupt,
 .get_fpu   = hvmemul_get_fpu,
 .put_fpu   = hvmemul_put_fpu,
-.invlpg= hvmemul_invlpg
+.invlpg= hvmemul_invlpg,
+.vmfunc= hvmemul_vmfunc,
 };
 
 static const struct x86_emulate_ops hvm_emulate_ops_no_write = {
@@ -1485,7 +1499,8 @@ static const struct x86_emulate_ops hvm_emulate_ops_no_write = {
 .inject_sw_interrupt = hvmemul_inject_sw_interrupt,
 .get_fpu   = hvmemul_get_fpu,
 .put_fpu   = hvmemul_put_fpu,
-.invlpg= hvmemul_invlpg
+.invlpg= hvmemul_invlpg,
+.vmfunc= hvmemul_vmfunc,
 };
 
 static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 28afdaa..2664673 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -82,6 +82,7 @@ static void vmx_fpu_dirty_intercept(void);
 static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content);
 static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
 static void vmx_invlpg_intercept(unsigned long vaddr);
+static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
 
 uint8_t __read_mostly posted_intr_vector;
 
@@ -1830,6 +1831,19 @@ static void vmx_vcpu_update_vmfunc_ve(struct vcpu *v)
 vmx_vmcs_exit(v);
 }
 
+static int vmx_vcpu_emulate_vmfunc(struct cpu_user_regs *regs)
+{
+int rc = X86EMUL_EXCEPTION;
+struct vcpu *curr = current;
+
+if ( !cpu_has_vmx_vmfunc && altp2m_active(curr->domain) &&
+ regs->eax == 0 &&
+ p2m_switch_vcpu_altp2m_by_id(curr, (uint16_t)regs->ecx) )
+rc = X86EMUL_OKAY;
+
+return rc;
+}
+
 static bool_t vmx_vcpu_emulate_ve(struct vcpu *v)
 {
 bool_t rc = 0;
@@ -1898,6 +1912,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
 .msr_read_intercept   = vmx_msr_read_intercept,
 .msr_write_intercept  = vmx_msr_write_intercept,
 .invlpg_intercept = vmx_invlpg_intercept,
+.vmfunc_intercept = vmx_vmfunc_intercept,
 .handle_cd= vmx_handle_cd,
 .set_info_guest   = vmx_set_info_guest,
 .set_rdtsc_exiting= vmx_set_rdtsc_exiting,
@@ -1924,6 +1939,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
 .ap2m_vcpu_update_eptp = vmx_vcpu_update_eptp,
 .ap2m_vcpu_update_vmfunc_ve = vmx_vcpu_update_vmfunc_ve,
 .ap2m_vcpu_emulate_ve = vmx_vcpu_emulate_ve,
+.ap2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
@@ -2095,6 +2111,12 @@ static void vmx_invlpg_intercept(unsigned long vaddr)
 vpid_sync_vcpu_gva(curr, vaddr);
 }
 
+static int vmx_vmfunc_intercept(struct cpu_user_regs *regs)
+{
+gdprintk(XENLOG_ERR, "Failed guest VMFUNC execution\n");
+return X86EMUL_EXCEPTION;
+}
+
 static int vmx_cr_access(unsigned long exit_qualification)
 {
 struct vcpu *curr = current;
@@ -3234,6 +3256,13 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 update_guest_eip();
 break;
 
+case EXIT_REASON_VMFUNC:
+if ( vmx_vmfunc_intercept(regs) == X86EMUL_EXCEPTION )
+hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+else
+update_guest_eip();
+break;
+
 case EXIT_REASON_MWAIT_INSTRUCTION:
 case EXIT_REASON_MONITOR_INSTRUCTION:
 case EXIT_REASON_GETSEC:
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
index c017c69..d941771 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3816,8 +3816,9 @@ x86_emulate(
 struct segment_register reg;
 unsigned long base, limit, cr0, cr0w;
 
-if ( modrm == 0xdf ) /* invlpga */
+ 

[Xen-devel] [PATCH v4 05/15] x86/altp2m: basic data structures and support routines.

2015-07-09 Thread Ed White
Add the basic data structures needed to support alternate p2m's and
the functions to initialise them and tear them down.

Although Intel hardware can handle 512 EPTP's per hardware thread
concurrently, only 10 per domain are supported in this patch for
performance reasons.

The iterator in hap_enable() does need to handle 512, so that is now
uint16_t.

This change also splits the p2m lock into one lock type for altp2m's
and another type for all other p2m's. The purpose of this is to place
the altp2m list lock between the types, so the list lock can be
acquired whilst holding the host p2m lock.

Signed-off-by: Ed White 

Reviewed-by: Andrew Cooper 
---
 xen/arch/x86/hvm/Makefile|  1 +
 xen/arch/x86/hvm/altp2m.c| 92 +
 xen/arch/x86/hvm/hvm.c   | 21 +
 xen/arch/x86/mm/hap/hap.c| 32 -
 xen/arch/x86/mm/mm-locks.h   | 38 +++-
 xen/arch/x86/mm/p2m.c| 98 
 xen/include/asm-x86/domain.h | 10 
 xen/include/asm-x86/hvm/altp2m.h | 38 
 xen/include/asm-x86/hvm/hvm.h| 17 +++
 xen/include/asm-x86/hvm/vcpu.h   |  9 
 xen/include/asm-x86/p2m.h| 30 +++-
 11 files changed, 382 insertions(+), 4 deletions(-)
 create mode 100644 xen/arch/x86/hvm/altp2m.c
 create mode 100644 xen/include/asm-x86/hvm/altp2m.h

diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index 69af47f..eb1a37b 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -1,6 +1,7 @@
 subdir-y += svm
 subdir-y += vmx
 
+obj-y += altp2m.o
 obj-y += asid.o
 obj-y += emulate.o
 obj-y += event.o
diff --git a/xen/arch/x86/hvm/altp2m.c b/xen/arch/x86/hvm/altp2m.c
new file mode 100644
index 000..f98a38d
--- /dev/null
+++ b/xen/arch/x86/hvm/altp2m.c
@@ -0,0 +1,92 @@
+/*
+ * Alternate p2m HVM
+ * Copyright (c) 2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+void
+altp2m_vcpu_reset(struct vcpu *v)
+{
+struct altp2mvcpu *av = &vcpu_altp2m(v);
+
+av->p2midx = INVALID_ALTP2M;
+av->veinfo_gfn = _gfn(INVALID_GFN);
+
+if ( hvm_funcs.ap2m_vcpu_reset )
+hvm_funcs.ap2m_vcpu_reset(v);
+}
+
+int
+altp2m_vcpu_initialise(struct vcpu *v)
+{
+int rc = -EOPNOTSUPP;
+
+if ( v != current )
+vcpu_pause(v);
+
+if ( !hvm_funcs.ap2m_vcpu_initialise ||
+ (hvm_funcs.ap2m_vcpu_initialise(v) == 0) )
+{
+rc = 0;
+altp2m_vcpu_reset(v);
+vcpu_altp2m(v).p2midx = 0;
+atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
+
+ap2m_vcpu_update_eptp(v);
+}
+
+if ( v != current )
+vcpu_unpause(v);
+
+return rc;
+}
+
+void
+altp2m_vcpu_destroy(struct vcpu *v)
+{
+struct p2m_domain *p2m;
+
+if ( v != current )
+vcpu_pause(v);
+
+if ( hvm_funcs.ap2m_vcpu_destroy )
+hvm_funcs.ap2m_vcpu_destroy(v);
+
+if ( (p2m = p2m_get_altp2m(v)) )
+atomic_dec(&p2m->active_vcpus);
+
+altp2m_vcpu_reset(v);
+
+ap2m_vcpu_update_eptp(v);
+ap2m_vcpu_update_vmfunc_ve(v);
+
+if ( v != current )
+vcpu_unpause(v);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 4019658..dbb4696 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -58,6 +58,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2380,6 +2381,7 @@ void hvm_vcpu_destroy(struct vcpu *v)
 {
 hvm_all_ioreq_servers_remove_vcpu(v->domain, v);
 
+altp2m_vcpu_destroy(v);
 nestedhvm_vcpu_destroy(v);
 
 free_compat_arg_xlat(v);
@@ -6498,6 +6500,25 @@ enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v)
 return hvm_funcs.nhvm_intr_blocked(v);
 }
 
+void ap2m_vcpu_update_eptp(struct vcpu *v)
+{
+if ( hvm_funcs.ap2m_vcpu_update_eptp )
+hvm_funcs.ap2m_vcpu_update_eptp(v);
+}
+
+void ap2m_vcpu_update_vmfunc_ve(struct vcpu *v)
+{
+if ( hvm_funcs.ap2m_vcpu_update_vmfunc_ve )
+hvm_funcs.ap2m_vcpu_update_vmfunc_ve(v);
+}
+
+bool_t ap2m_vcpu_emulate_ve(struct vcpu *v)
+{
+if ( hvm_funcs.ap2m_vcpu_emulate_ve )
+return hvm_funcs.ap

[Xen-devel] [PATCH v4 06/15] VMX/altp2m: add code to support EPTP switching and #VE.

2015-07-09 Thread Ed White
Implement and hook up the code to enable VMX support of VMFUNC and #VE.

VMFUNC leaf 0 (EPTP switching) emulation is added in a later patch.

Signed-off-by: Ed White 

Reviewed-by: Andrew Cooper 
Acked-by: Jun Nakajima 
---
 xen/arch/x86/hvm/vmx/vmx.c | 138 +
 1 file changed, 138 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 07527dd..28afdaa 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -56,6 +56,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1763,6 +1764,104 @@ static void vmx_enable_msr_exit_interception(struct domain *d)
  MSR_TYPE_W);
 }
 
+static void vmx_vcpu_update_eptp(struct vcpu *v)
+{
+struct domain *d = v->domain;
+struct p2m_domain *p2m = NULL;
+struct ept_data *ept;
+
+if ( altp2m_active(d) )
+p2m = p2m_get_altp2m(v);
+if ( !p2m )
+p2m = p2m_get_hostp2m(d);
+
+ept = &p2m->ept;
+ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
+
+vmx_vmcs_enter(v);
+
+__vmwrite(EPT_POINTER, ept_get_eptp(ept));
+
+if ( v->arch.hvm_vmx.secondary_exec_control &
+SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS )
+__vmwrite(EPTP_INDEX, vcpu_altp2m(v).p2midx);
+
+vmx_vmcs_exit(v);
+}
+
+static void vmx_vcpu_update_vmfunc_ve(struct vcpu *v)
+{
+struct domain *d = v->domain;
+u32 mask = SECONDARY_EXEC_ENABLE_VM_FUNCTIONS;
+
+if ( !cpu_has_vmx_vmfunc )
+return;
+
+if ( cpu_has_vmx_virt_exceptions )
+mask |= SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS;
+
+vmx_vmcs_enter(v);
+
+if ( !d->is_dying && altp2m_active(d) )
+{
+v->arch.hvm_vmx.secondary_exec_control |= mask;
+__vmwrite(VM_FUNCTION_CONTROL, VMX_VMFUNC_EPTP_SWITCHING);
+__vmwrite(EPTP_LIST_ADDR, virt_to_maddr(d->arch.altp2m_eptp));
+
+if ( cpu_has_vmx_virt_exceptions )
+{
+p2m_type_t t;
+mfn_t mfn;
+
+mfn = get_gfn_query_unlocked(d, gfn_x(vcpu_altp2m(v).veinfo_gfn), &t);
+
+if ( mfn_x(mfn) != INVALID_MFN )
+__vmwrite(VIRT_EXCEPTION_INFO, mfn_x(mfn) << PAGE_SHIFT);
+else
+mask &= ~SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS;
+}
+}
+else
+v->arch.hvm_vmx.secondary_exec_control &= ~mask;
+
+__vmwrite(SECONDARY_VM_EXEC_CONTROL,
+v->arch.hvm_vmx.secondary_exec_control);
+
+vmx_vmcs_exit(v);
+}
+
+static bool_t vmx_vcpu_emulate_ve(struct vcpu *v)
+{
+bool_t rc = 0;
+ve_info_t *veinfo = gfn_x(vcpu_altp2m(v).veinfo_gfn) != INVALID_GFN ?
+hvm_map_guest_frame_rw(gfn_x(vcpu_altp2m(v).veinfo_gfn), 0) : NULL;
+
+if ( !veinfo )
+return 0;
+
+if ( veinfo->semaphore != 0 )
+goto out;
+
+rc = 1;
+
+veinfo->exit_reason = EXIT_REASON_EPT_VIOLATION;
+veinfo->semaphore = ~0l;
+veinfo->eptp_index = vcpu_altp2m(v).p2midx;
+
+vmx_vmcs_enter(v);
+__vmread(EXIT_QUALIFICATION, &veinfo->exit_qualification);
+__vmread(GUEST_LINEAR_ADDRESS, &veinfo->gla);
+__vmread(GUEST_PHYSICAL_ADDRESS, &veinfo->gpa);
+vmx_vmcs_exit(v);
+
+hvm_inject_hw_exception(TRAP_virtualisation,
+HVM_DELIVER_NO_ERROR_CODE);
+
+out:
+hvm_unmap_guest_frame(veinfo, 0);
+return rc;
+}
+
 static struct hvm_function_table __initdata vmx_function_table = {
 .name = "VMX",
 .cpu_up_prepare   = vmx_cpu_up_prepare,
@@ -1822,6 +1921,9 @@ static struct hvm_function_table __initdata vmx_function_table = {
 .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
 .hypervisor_cpuid_leaf = vmx_hypervisor_cpuid_leaf,
 .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
+.ap2m_vcpu_update_eptp = vmx_vcpu_update_eptp,
+.ap2m_vcpu_update_vmfunc_ve = vmx_vcpu_update_vmfunc_ve,
+.ap2m_vcpu_emulate_ve = vmx_vcpu_emulate_ve,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
@@ -2743,6 +2845,42 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 
 /* Now enable interrupts so it's safe to take locks. */
 local_irq_enable();
+ 
+/*
+ * If the guest has the ability to switch EPTP without an exit,
+ * figure out whether it has done so and update the altp2m data.
+ */
+if ( altp2m_active(v->domain) &&
+(v->arch.hvm_vmx.secondary_exec_control &
+SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) )
+{
+unsigned long idx;
+
+if ( v->arch.hvm_vmx.secondary_exec_control &
+SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS )
+__vmread(EPTP_INDEX, &idx);
+else
+{
+unsigned long eptp;
+
+__vmread(EPT_POINTER, &eptp);
+
+if ( (idx = p2m_find_altp2m_by_eptp(v->domain, eptp)) ==
+ INVALID_ALTP2M )
+{
+gdprintk(XENLOG_ERR, "EPTP not found in alternate p2

[Xen-devel] [PATCH v4 04/15] x86/HVM: Hardware alternate p2m support detection.

2015-07-09 Thread Ed White
As implemented here, alternate p2m support is only available on
platforms with VMX HAP.

By default this functionality is force-disabled, it can be enabled
by specifying altp2m=1 on the Xen command line.

Signed-off-by: Ed White 

Reviewed-by: Andrew Cooper 
---
 docs/misc/xen-command-line.markdown | 7 +++
 xen/arch/x86/hvm/hvm.c  | 7 +++
 xen/arch/x86/hvm/vmx/vmx.c  | 1 +
 xen/include/asm-x86/hvm/hvm.h   | 9 +
 4 files changed, 24 insertions(+)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index aa684c0..3391c66 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -139,6 +139,13 @@ mode during S3 resume.
 > Default: `true`
 
 Permit Xen to use superpages when performing memory management.
+ 
+### altp2m (Intel)
+> `= <boolean>`
+
+> Default: `false`
+
+Permit multiple copies of host p2m.
 
 ### apic
 > `= bigsmp | default`
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..4019658 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -94,6 +94,10 @@ bool_t opt_hvm_fep;
 boolean_param("hvm_fep", opt_hvm_fep);
 #endif
 
+/* Xen command-line option to enable altp2m */
+static bool_t __initdata opt_altp2m_enabled = 0;
+boolean_param("altp2m", opt_altp2m_enabled);
+
 static int cpu_callback(
 struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
@@ -160,6 +164,9 @@ static int __init hvm_enable(void)
 if ( !fns->pvh_supported )
 printk(XENLOG_INFO "HVM: PVH mode not supported on this platform\n");
 
+if ( !opt_altp2m_enabled )
+hvm_funcs.altp2m_supported = 0;
+
 /*
  * Allow direct access to the PC debug ports 0x80 and 0xed (they are
  * often used for I/O delays, but the vmexits simply slow things down).
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index fc29b89..07527dd 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1841,6 +1841,7 @@ const struct hvm_function_table * __init start_vmx(void)
 if ( cpu_has_vmx_ept && (cpu_has_vmx_pat || opt_force_ept) )
 {
 vmx_function_table.hap_supported = 1;
+vmx_function_table.altp2m_supported = 1;
 
 vmx_function_table.hap_capabilities = 0;
 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 57f9605..c61cfe7 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -94,6 +94,9 @@ struct hvm_function_table {
 /* Necessary hardware support for PVH mode? */
 int pvh_supported;
 
+/* Necessary hardware support for alternate p2m's? */
+bool_t altp2m_supported;
+
 /* Indicate HAP capabilities. */
 int hap_capabilities;
 
@@ -509,6 +512,12 @@ bool_t nhvm_vmcx_hap_enabled(struct vcpu *v);
 /* interrupt */
 enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v);
 
+/* returns true if hardware supports alternate p2m's */
+static inline bool_t hvm_altp2m_supported(void)
+{
+return hvm_funcs.altp2m_supported;
+}
+
 #ifndef NDEBUG
 /* Permit use of the Forced Emulation Prefix in HVM guests */
 extern bool_t opt_hvm_fep;
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 02/15] VMX: VMFUNC and #VE definitions and detection.

2015-07-09 Thread Ed White
Currently, neither is enabled globally but may be enabled on a per-VCPU
basis by the altp2m code.

Remove the check for EPTE bit 63 == zero in ept_split_super_page(), as
that bit is now hardware-defined.

Signed-off-by: Ed White 

Reviewed-by: Andrew Cooper 
Acked-by: George Dunlap 
Acked-by: Jun Nakajima 
---
 xen/arch/x86/hvm/vmx/vmcs.c| 42 +++---
 xen/arch/x86/mm/p2m-ept.c  |  1 -
 xen/include/asm-x86/hvm/vmx/vmcs.h | 14 +++--
 xen/include/asm-x86/hvm/vmx/vmx.h  | 13 +++-
 xen/include/asm-x86/msr-index.h|  1 +
 5 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 4c5ceb5..bc1cabd 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -101,6 +101,8 @@ u32 vmx_secondary_exec_control __read_mostly;
 u32 vmx_vmexit_control __read_mostly;
 u32 vmx_vmentry_control __read_mostly;
 u64 vmx_ept_vpid_cap __read_mostly;
+u64 vmx_vmfunc __read_mostly;
+bool_t vmx_virt_exception __read_mostly;
 
 const u32 vmx_introspection_force_enabled_msrs[] = {
 MSR_IA32_SYSENTER_EIP,
@@ -140,6 +142,8 @@ static void __init vmx_display_features(void)
 P(cpu_has_vmx_virtual_intr_delivery, "Virtual Interrupt Delivery");
 P(cpu_has_vmx_posted_intr_processing, "Posted Interrupt Processing");
 P(cpu_has_vmx_vmcs_shadowing, "VMCS shadowing");
+P(cpu_has_vmx_vmfunc, "VM Functions");
+P(cpu_has_vmx_virt_exceptions, "Virtualisation Exceptions");
 P(cpu_has_vmx_pml, "Page Modification Logging");
 #undef P
 
@@ -185,6 +189,7 @@ static int vmx_init_vmcs_config(void)
 u64 _vmx_misc_cap = 0;
 u32 _vmx_vmexit_control;
 u32 _vmx_vmentry_control;
+u64 _vmx_vmfunc = 0;
 bool_t mismatch = 0;
 
 rdmsr(MSR_IA32_VMX_BASIC, vmx_basic_msr_low, vmx_basic_msr_high);
@@ -230,7 +235,9 @@ static int vmx_init_vmcs_config(void)
SECONDARY_EXEC_ENABLE_EPT |
SECONDARY_EXEC_ENABLE_RDTSCP |
SECONDARY_EXEC_PAUSE_LOOP_EXITING |
-   SECONDARY_EXEC_ENABLE_INVPCID);
+   SECONDARY_EXEC_ENABLE_INVPCID |
+   SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
+   SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);
 rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
 if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
 opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
@@ -341,6 +348,24 @@ static int vmx_init_vmcs_config(void)
   || !(_vmx_vmexit_control & VM_EXIT_ACK_INTR_ON_EXIT) )
 _vmx_pin_based_exec_control  &= ~ PIN_BASED_POSTED_INTERRUPT;
 
+/* The IA32_VMX_VMFUNC MSR exists only when VMFUNC is available */
+if ( _vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VM_FUNCTIONS )
+{
+rdmsrl(MSR_IA32_VMX_VMFUNC, _vmx_vmfunc);
+
+/*
+ * VMFUNC leaf 0 (EPTP switching) must be supported.
+ *
+ * Or we just don't use VMFUNC.
+ */
+if ( !(_vmx_vmfunc & VMX_VMFUNC_EPTP_SWITCHING) )
+_vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VM_FUNCTIONS;
+}
+
+/* Virtualization exceptions are only enabled if VMFUNC is enabled */
+if ( !(_vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) )
+_vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS;
+
 min = 0;
 opt = VM_ENTRY_LOAD_GUEST_PAT | VM_ENTRY_LOAD_BNDCFGS;
 _vmx_vmentry_control = adjust_vmx_controls(
@@ -361,6 +386,9 @@ static int vmx_init_vmcs_config(void)
 vmx_vmentry_control= _vmx_vmentry_control;
 vmx_basic_msr  = ((u64)vmx_basic_msr_high << 32) |
  vmx_basic_msr_low;
+vmx_vmfunc = _vmx_vmfunc;
+vmx_virt_exception = !!(_vmx_secondary_exec_control &
+   SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);
 vmx_display_features();
 
 /* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
@@ -397,6 +425,9 @@ static int vmx_init_vmcs_config(void)
 mismatch |= cap_check(
 "EPT and VPID Capability",
 vmx_ept_vpid_cap, _vmx_ept_vpid_cap);
+mismatch |= cap_check(
+"VMFUNC Capability",
+vmx_vmfunc, _vmx_vmfunc);
 if ( cpu_has_vmx_ins_outs_instr_info !=
  !!(vmx_basic_msr_high & (VMX_BASIC_INS_OUT_INFO >> 32)) )
 {
@@ -967,6 +998,11 @@ static int construct_vmcs(struct vcpu *v)
 /* Do not enable Monitor Trap Flag unless start single step debug */
 v->arch.hvm_vmx.exec_control &= ~CPU_BASED_MONITOR_TRAP_FLAG;
 
+/* Disable VMFUNC and #VE for now: they may be enabled later by altp2m. */
+v->arch.hvm_vmx.secondary_exec_control &=
+~(SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
+  SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);
+
 if ( is_pvh_domain(d) )
 {
 /* Disable virtual apics, TPR */
@@ -1790,9 +1826,9 @@ void v

[Xen-devel] [PATCH v4 00/15] Alternate p2m: support multiple copies of host p2m

2015-07-09 Thread Ed White
This set of patches adds support to hvm domains for EPTP switching by creating
multiple copies of the host p2m (currently limited to 10 copies).

The primary use of this capability is expected to be in scenarios where access
to memory needs to be monitored and/or restricted below the level at which the
guest OS page tables operate. Two examples that were discussed at the 2014 Xen
developer summit are:

VM introspection: 
http://www.slideshare.net/xen_com_mgr/
zero-footprint-guest-memory-introspection-from-xen

Secure inter-VM communication:
http://www.slideshare.net/xen_com_mgr/nakajima-nvf

A more detailed design specification can be found at:
http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg01319.html

Each p2m copy is populated lazily on EPT violations.
Permissions for pages in alternate p2m's can be changed in a similar
way to the existing memory access interface, and gfn->mfn mappings can be 
changed.

All this is done through extra HVMOP types.

The cross-domain HVMOP code has been compile-tested only. Also, the cross-domain
code is hypervisor-only; the toolstack has not been modified.

The intra-domain code has been tested. Violation notifications can only be
received for pages that have been modified (access permissions and/or
gfn->mfn mapping) intra-domain, and only on VCPUs that have enabled
notification.

VMFUNC and #VE will both be emulated on hardware without native support.

This code is not compatible with nested hvm functionality and will refuse
to work with nested hvm active. It is also not compatible with migration.
It should be considered experimental.


Changes since v3:

Major changes are:

Replaced patch 8.

Refactored patch 11 to use a single HVMOP with subcodes.

Addressed feedback in patch 7, and some other patches.

Added two tools/test patches from Tamas. Both are optional.

Added various ack's and reviewed-by's.

Rebased.

Ravi Sahita will now be the point of contact for this series.


Changes since v2:

Addressed all v2 feedback *except*:

In patch 5, the per-domain EPTP list page is still allocated from the
Xen heap. If allocated from the domain heap Xen panics - IIRC on Haswell
hardware when walking the EPTP list during exit processing in patch 6.

HVM_ops are not merged. Tamas suggested merging the memory access ops,
but in practice they are not as similar as they appear on the surface.
Razvan suggested merging the implementation code in p2m.c, but that is
also not as common as it appears on the surface.
Andrew suggested merging all altp2m ops into one with a subop code in
the input structure. His point that only 255 ops can be defined is well
taken, but altp2m uses only 2 more ops than the recently introduced
ioreq ops, and <15% of the available ops have been defined. Since we
don't know how to implement XSM hooks and policy with the subop model,
we have not adopted this suggestion.

The p2m set/get interface is not modified. The altp2m code needs to
write suppress_ve in 2 places and read it in 1 place. The original
patch series managed this by coupling the state of suppress_ve to the
p2m memory type, which Tim disliked. In v2 of the series, special
set/get interfaces were added to access suppress_ve only when required.
Jan has suggested changing the existing interfaces, but we feel this
is inappropriate for this experimental patch series. Changing the
existing interfaces would require a design agreement to be reached
and would impact a large amount of existing code.

Andrew kindly added some reviewed-by's to v2. I have not carried
his reviewed-by of the memory event patch forward because Tamas
requested significant changes to the patch.


Changes since v1:

Many changes since v1 in response to maintainer feedback, including:

Suppress_ve state is now decoupled from memory type
VMFUNC emulation handled in x86 emulator
Lazy-copy algorithm copies any page where mfn != INVALID_MFN
All nested page fault handling except lazy-copy is now in
top-level (hvm.c) nested page fault handler
Split p2m lock type (as suggested by Tim) to avoid lock order violations
XSM hooks
Xen parameter to globally enable altp2m (default disabled) and HVM parameter
Altp2m reference counting no longer uses dirty_cpu bitmap
Remapped page tracking to invalidate altp2m's where needed to protect Xen
Many other minor changes

The altp2m invalidation is implemented to a level that I believe satisfies
the requirements of protecting Xen. Invalidation notification is not yet
implemented, and there may be other cases where invalidation is warranted to
protect the integrity of the restrictions placed through altp2m. We may add
further patches in this area.

Testability is still a potential issue. We have offered to make our internal
Windows test binaries available for intra-domain testing. Tamas

[Xen-devel] [PATCH v4 03/15] VMX: implement suppress #VE.

2015-07-09 Thread Ed White
In preparation for selectively enabling #VE in a later patch, set
suppress #VE on all EPTE's.

Suppress #VE should always be the default condition for two reasons:
it is generally not safe to deliver #VE into a guest unless that guest
has been modified to receive it; and even then for most EPT violations only
the hypervisor is able to handle the violation.

Signed-off-by: Ed White 

Reviewed-by: Andrew Cooper 
Reviewed-by: George Dunlap 
Acked-by: Jun Nakajima 
---
 xen/arch/x86/mm/p2m-ept.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index a6c9adf..4111795 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -41,7 +41,8 @@
 #define is_epte_superpage(ept_entry)((ept_entry)->sp)
 static inline bool_t is_epte_valid(ept_entry_t *e)
 {
-return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
+/* suppress_ve alone is not considered valid, so mask it off */
+return ((e->epte & ~(1ul << 63)) != 0 && e->sa_p2mt != p2m_invalid);
 }
 
 /* returns : 0 for success, -errno otherwise */
@@ -219,6 +220,8 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
 static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
 {
 struct page_info *pg;
+ept_entry_t *table;
+unsigned int i;
 
 pg = p2m_alloc_ptp(p2m, 0);
 if ( pg == NULL )
@@ -232,6 +235,15 @@ static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
 /* Manually set A bit to avoid overhead of MMU having to write it later. */
 ept_entry->a = 1;
 
+ept_entry->suppress_ve = 1;
+
+table = __map_domain_page(pg);
+
+for ( i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
+table[i].suppress_ve = 1;
+
+unmap_domain_page(table);
+
 return 1;
 }
 
@@ -281,6 +293,7 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
 epte->sp = (level > 1);
 epte->mfn += i * trunk;
 epte->snp = (iommu_enabled && iommu_snoop);
+epte->suppress_ve = 1;
 
 ept_p2m_type_to_flags(p2m, epte, epte->sa_p2mt, epte->access);
 
@@ -790,6 +803,8 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
 ept_p2m_type_to_flags(p2m, &new_entry, p2mt, p2ma);
 }
 
+new_entry.suppress_ve = 1;
+
 rc = atomic_write_ept_entry(ept_entry, new_entry, target);
 if ( unlikely(rc) )
 old_entry.epte = 0;
@@ -1111,6 +1126,8 @@ static void ept_flush_pml_buffers(struct p2m_domain *p2m)
 int ept_p2m_init(struct p2m_domain *p2m)
 {
 struct ept_data *ept = &p2m->ept;
+ept_entry_t *table;
+unsigned int i;
 
 p2m->set_entry = ept_set_entry;
 p2m->get_entry = ept_get_entry;
@@ -1134,6 +1151,13 @@ int ept_p2m_init(struct p2m_domain *p2m)
 p2m->flush_hardware_cached_dirty = ept_flush_pml_buffers;
 }
 
+table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
+
+for ( i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
+table[i].suppress_ve = 1;
+
+unmap_domain_page(table);
+
 if ( !zalloc_cpumask_var(&ept->synced_mask) )
 return -ENOMEM;
 
-- 
1.9.1




[Xen-devel] [PATCH v4 01/15] common/domain: Helpers to pause a domain while in context

2015-07-09 Thread Ed White
From: Andrew Cooper 

For use on codepaths which would need to use domain_pause() but might be in
the target domain's context.  In the case that the target domain is in
context, all other vcpus are paused.

Signed-off-by: Andrew Cooper 
---
 xen/common/domain.c | 28 
 xen/include/xen/sched.h |  5 +
 2 files changed, 33 insertions(+)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 3bc52e6..1bb24ae 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1010,6 +1010,34 @@ int domain_unpause_by_systemcontroller(struct domain *d)
 return 0;
 }
 
+void domain_pause_except_self(struct domain *d)
+{
+struct vcpu *v, *curr = current;
+
+if ( curr->domain == d )
+{
+for_each_vcpu( d, v )
+if ( likely(v != curr) )
+vcpu_pause(v);
+}
+else
+domain_pause(d);
+}
+
+void domain_unpause_except_self(struct domain *d)
+{
+struct vcpu *v, *curr = current;
+
+if ( curr->domain == d )
+{
+for_each_vcpu( d, v )
+if ( likely(v != curr) )
+vcpu_unpause(v);
+}
+else
+domain_unpause(d);
+}
+
 int vcpu_reset(struct vcpu *v)
 {
 struct domain *d = v->domain;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index b29d9e7..73d3bc8 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -804,6 +804,11 @@ static inline int domain_pause_by_systemcontroller_nosync(struct domain *d)
 {
 return __domain_pause_by_systemcontroller(d, domain_pause_nosync);
 }
+
+/* domain_pause() but safe against trying to pause current. */
+void domain_pause_except_self(struct domain *d);
+void domain_unpause_except_self(struct domain *d);
+
 void cpu_init(void);
 
 struct scheduler;
-- 
1.9.1




Re: [Xen-devel] [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-09 Thread Chen, Tiejun

  int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci)
  {
+/* We'd like to force reserve rdm specific to a device by default.*/
+if ( pci->rdm_policy == LIBXL_RDM_RESERVE_POLICY_INVALID)

^

I have just spotted that spurious whitespace.  However I won't block
this for that.


Sorry, this is my typo.



Acked-by: Ian Jackson 

(actually).

I would appreciate it if you could ensure that this is fixed in any
repost.  You may retain my ack if you do that.  Committers should feel
free to fix it on commit.


I fixed this in my tree.

Thanks
Tiejun



Thanks,
Ian.





Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling

2015-07-09 Thread Wu, Feng


> -Original Message-
> From: George Dunlap [mailto:george.dun...@eu.citrix.com]
> Sent: Thursday, July 09, 2015 8:53 PM
> To: Wu, Feng
> Cc: Dario Faggioli; Tian, Kevin; k...@xen.org; andrew.coop...@citrix.com;
> xen-devel; jbeul...@suse.com; Zhang, Yang Z
> Subject: Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor
> during vCPU scheduling
> 
> On 07/09/2015 12:38 PM, Wu, Feng wrote:
> >
> >
> >> -Original Message-
> >> From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of
> George
> >> Dunlap
> >> Sent: Thursday, July 09, 2015 7:20 PM
> >> To: Wu, Feng
> >> Cc: Dario Faggioli; Tian, Kevin; k...@xen.org; andrew.coop...@citrix.com;
> >> xen-devel; jbeul...@suse.com; Zhang, Yang Z
> >> Subject: Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts
> Descriptor
> >> during vCPU scheduling
> >>
> >> On Thu, Jul 9, 2015 at 4:09 AM, Wu, Feng  wrote:
>  That does not necessarily means "we need to do something" in
>  vcpu_runstate_change(). Actually, that's exactly what I'm asking: can
>  you check whether this thing that you need doing can be done somewhere
>  else than in vcpu_runstaete_change() ?
> >>>
> >>> Why do you think vcpu_runstaete_change() is not the right place to do
> this?
> >>
> >> Because what the vcpu_runstate_change() function does at the moment is
> >> *update the vcpu runstate variable*.  It doesn't actually change the
> >> runstate -- the runstate is changed in the various bits of code that
> >> call it; and it's not designed to be a generic place to put hooks on
> >> the runstate changing.
> >>
> >> I haven't done a thorough review of this yet, but at least looking
> >> through this patch, and skimming the titles, I don't see anywhere you
> >> handle migration -- what happens if a vcpu that's blocked / offline /
> >> runnable migrates from one cpu to another?  Is the information
> >> updated?
> >
> > Thanks for your review!
> 
> And I'd like to say -- sorry that I didn't notice this issue sooner; I
> know you've had your series posted for quite a while, but I didn't
> realize until last week that it actually involved the scheduler.  It's
> really my fault for not paying closer attention -- you did CC me in v2
> back in June.
> 
> > The migration is handled in arch_pi_desc_update() which is called
> > by vcpu_runstate_change().
> 
> Well as far as I can tell from looking at the code,
> vcpu_runstate_change() will not be called when migrating a vcpu which is
> already blocked.
> 
> Consider the following scenario:
> - v1 blocks on pcpu 0.
>  - vcpu_runstate_change() will do everything necessary for v1 on p0.
> - The scheduler does load balancing and moves v1 to p1, calling
> vcpu_migrate().  Because the vcpu is still blocked,
> vcpu_runstate_change() is not called.
> - A device interrupt is generated.
> 
> What happens to the interrupt?  Does everything still work properly, or
> will the device wake-up interrupt go to the wrong pcpu (p0 rather than p1)?

I think it works correctly. Before blocking, we save v->processor and put
the vCPU on that pCPU's per-cpu list. Even when the vCPU is migrated to
another pCPU, the wakeup notification event still goes to the original one
(p0 in this case), which is what I want: in p0's list we can find and
unblock the blocked vCPU. That is the point.

Thanks,
Feng

> 
> >  or to add a set of architectural hooks (similar to
> >> the SCHED_OP() hooks) in the various places you need them.
> >
> > I don't have a picture of this method, but from your comments, seems
> > we need to put the logic to many different places, and must be very
> > careful so as to not miss some places. I think the above method
> > is more clear and straightforward, since we have a central place to
> > handle all the cases. Anyway, if you prefer to this one, it would be
> > highly appreciated if you can give a more detailed solution! Thank you!
> 
> Well you can check to make sure you've caught at least all the places
> you had before by searching for vcpu_runstate_change(). :-)
> 
> Using the callback method also can help prompt you to think about other
> times you may need to do something.  For instance, you might still
> consider searching for SCHED_OP() everywhere in schedule.c and seeing if
> that's a place you need to do something (similar to the migration thing
> above).
> 
> Anyway, the most detailed thing I can say at this time is to look at
> SCHED_OP() and see if doing  something like that, but for architectural
> callbacks, makes sense.
> 
> I'll come back and take a closer look a bit later.
> 
>  -George
> 



Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling

2015-07-09 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Thursday, July 09, 2015 8:42 PM
> To: Wu, Feng
> Cc: George Dunlap; Tian, Kevin; k...@xen.org; andrew.coop...@citrix.com;
> xen-devel; jbeul...@suse.com; Zhang, Yang Z
> Subject: Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor
> during vCPU scheduling
> 
> On Thu, 2015-07-09 at 11:38 +, Wu, Feng wrote:
> >
> > > -Original Message-
> > > From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of
> George
> > > Dunlap
> 
> > > > Why do you think vcpu_runstaete_change() is not the right place to do
> this?
> > >
> > > Because what the vcpu_runstate_change() function does at the moment is
> > > *update the vcpu runstate variable*.  It doesn't actually change the
> > > runstate -- the runstate is changed in the various bits of code that
> > > call it; and it's not designed to be a generic place to put hooks on
> > > the runstate changing.
> > >
> > > I haven't done a thorough review of this yet, but at least looking
> > > through this patch, and skimming the titles, I don't see anywhere you
> > > handle migration -- what happens if a vcpu that's blocked / offline /
> > > runnable migrates from one cpu to another?  Is the information
> > > updated?
> >
> > Thanks for your review!
> >
> > The migration is handled in arch_pi_desc_update() which is called
> > by vcpu_runstate_change().
> >
> > >
> > > The right thing to do in this situation is either to change
> > > vcpu_runstate_change() so that it is the central place to make all (or
> > > most) hooks happen;
> >
> > Yes, this is my implementation. I think vcpu_runstate_change()
> > is the _central_ place to do things when vCPU state is changed. This
> > makes things clear and simple. I call an arch hooks to update
> > posted-interrupt descriptor in this function.
> >
> Perhaps, one way to double check this line of reasoning (the fact that
> you think this needs to lay on top of runstates, and more specifically
> in that function), would be to come up with some kind of "list of
> requirements", not taking runstates into account.
> 
> I know there is a design document for this series (and I also know I
> could have commented on it earlier, sorry for that), but that itself
> mentions runstates, which does not help.
> 
> What I mean is, can you describe when you need each specific operation
> needs to happen? Something like "descriptor needs to be updated like
> this upon migration", "notification should be disabled when vcpu starts
> running", "notification method should be changed that other way when
> vcpu is preempted", etc.

I cannot see the difference; I think the requirements are clearly listed in
the design doc and the comments of this patch.

> 
> This would help a lot, IMO, figuring out the actual functional
> requirements that needs to be satisfied for things to work well. Once
> that is done, we can go check in the code where is the best place to put
> each call, hook, or whatever.
> 
> 
> Note that I've already tried to infer the above, by looking at the
> patches, and that is making me think that it would be possible to
> implement things in another way. But maybe I'm missing something. So it
> would be really valuable if you, with all your knowledge of how PI
> should work, could do it.

I have been describing how PI works, what the purpose of the two vectors is,
and how special they are, from the beginning.

Thanks,
Feng


> 
> Thanks and Regards,
> Dario
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


Re: [Xen-devel] [PATCH v6] run QEMU as non-root

2015-07-09 Thread Jim Fehlig

On 07/09/2015 04:34 AM, Ian Campbell wrote:

On Wed, 2015-07-01 at 15:03 -0600, Jim Fehlig wrote:

Perhaps.  But thanks for providing a way (b_info->device_model_user) for apps to
override the libxl policy.

You mentioned in v5 that libvirt supports setting both the user and the
group and that the qemu driver supports that. How does that work?

AFAICT qemu's -runas option only takes a user and it takes that user's
primary group and uses that with no configurability. I think that's a
fine way to do things, but you implied greater configurability in
libvirt and I'm now curious...


The libvirt qemu driver doesn't use qemu's -runas option. It calls
setregid()/setreuid() in the child after fork()'ing but before exec()'ing qemu.


Regards,
Jim




[Xen-devel] [linux-linus test] 59254: tolerable FAIL - PUSHED

2015-07-09 Thread osstest service owner
flight 59254 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59254/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-libvirt  11 guest-start  fail   like 59086
 test-amd64-amd64-libvirt 11 guest-start  fail   like 59086
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 59086
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail like 59086

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel 13 guest-saverestorefail  never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeatfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass

version targeted for testing:
 linux                45820c294fe1b1a9df495d57f40585ef2d069a39
baseline version:
 linux                1c4c7159ed2468f3ac4ce5a7f08d79663d381a93

Last test of basis    59086  2015-07-06 04:24:57 Z    3 days
Failing since         59130  2015-07-07 04:25:30 Z    2 days    3 attempts
Testing same since    59254  2015-07-09 04:20:48 Z    0 days    1 attempts


People who touched revisions under test:
  Arnaldo Carvalho de Melo 
  Greg Kroah-Hartman 
  Ingo Molnar 
  John Stultz 
  Laura Abbott 
  Laura Abbott 
  Linus Torvalds 
  Mark Rutland 
  Nathan Lynch 
  Nicolas Pitre 
  Peter Zijlstra (Intel) 
  Peter Zijlstra 
  Russell King 
  Santosh Shilimkar 
  Stephen Boyd 
  Steven Rostedt 
  Szabolcs Nagy 
  Tomas Winkler 
  Vitaly Andrianov 
  Will Deacon 
  Wolfram Sang 
  Yann Droneaud 

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  pass
 build-i386-rumpuserxen   pass
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsmpass
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmpass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsmpass
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm pass
 test-amd64-amd64-libvirt-xsm pass
 test-armhf-armhf-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-xl-xsm  pass
 test-armhf-armhf-xl-xsm  pass
 test-amd64-i386-xl-xsm   pass
 test-amd64-amd64-xl-pvh-amd  fail
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass

[Xen-devel] [PATCH v2 05/20] block/xen-blkfront: Split blkif_queue_request in 2

2015-07-09 Thread Julien Grall
Currently, blkif_queue_request has 2 distinct execution paths:
- Send a discard request
- Send a read/write request

The function also allocates the grants used when generating the
request, although these are only needed for read/write requests.

Rather than having a single function with 2 distinct execution paths,
split it in two. This will also remove one level of indentation.

Signed-off-by: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Roger Pau Monné 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
---
Changes in v2:
- Patch added
---
 drivers/block/xen-blkfront.c | 280 +++
 1 file changed, 153 insertions(+), 127 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 6d89ed3..7107d58 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -392,13 +392,35 @@ static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
return 0;
 }
 
-/*
- * Generate a Xen blkfront IO request from a blk layer request.  Reads
- * and writes are handled as expected.
- *
- * @req: a request struct
- */
-static int blkif_queue_request(struct request *req)
+static int blkif_queue_discard_req(struct request *req)
+{
+   struct blkfront_info *info = req->rq_disk->private_data;
+   struct blkif_request *ring_req;
+   unsigned long id;
+
+   /* Fill out a communications ring structure. */
+   ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
+   id = get_id_from_freelist(info);
+   info->shadow[id].request = req;
+
+   ring_req->operation = BLKIF_OP_DISCARD;
+   ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
+   ring_req->u.discard.id = id;
+   ring_req->u.discard.sector_number = (blkif_sector_t)blk_rq_pos(req);
+   if ((req->cmd_flags & REQ_SECURE) && info->feature_secdiscard)
+   ring_req->u.discard.flag = BLKIF_DISCARD_SECURE;
+   else
+   ring_req->u.discard.flag = 0;
+
+   info->ring.req_prod_pvt++;
+
+   /* Keep a private copy so we can reissue requests when recovering. */
+   info->shadow[id].req = *ring_req;
+
+   return 0;
+}
+
+static int blkif_queue_rw_req(struct request *req)
 {
struct blkfront_info *info = req->rq_disk->private_data;
struct blkif_request *ring_req;
@@ -418,9 +440,6 @@ static int blkif_queue_request(struct request *req)
struct scatterlist *sg;
int nseg, max_grefs;
 
-   if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
-   return 1;
-
max_grefs = req->nr_phys_segments;
if (max_grefs > BLKIF_MAX_SEGMENTS_PER_REQUEST)
/*
@@ -450,139 +469,128 @@ static int blkif_queue_request(struct request *req)
id = get_id_from_freelist(info);
info->shadow[id].request = req;
 
-   if (unlikely(req->cmd_flags & (REQ_DISCARD | REQ_SECURE))) {
-   ring_req->operation = BLKIF_OP_DISCARD;
-   ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
-   ring_req->u.discard.id = id;
-   ring_req->u.discard.sector_number = (blkif_sector_t)blk_rq_pos(req);
-   if ((req->cmd_flags & REQ_SECURE) && info->feature_secdiscard)
-   ring_req->u.discard.flag = BLKIF_DISCARD_SECURE;
-   else
-   ring_req->u.discard.flag = 0;
+   BUG_ON(info->max_indirect_segments == 0 &&
+  req->nr_phys_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
+   BUG_ON(info->max_indirect_segments &&
+  req->nr_phys_segments > info->max_indirect_segments);
+   nseg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
+   ring_req->u.rw.id = id;
+   if (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
+   /*
+* The indirect operation can only be a BLKIF_OP_READ or
+* BLKIF_OP_WRITE
+*/
+   BUG_ON(req->cmd_flags & (REQ_FLUSH | REQ_FUA));
+   ring_req->operation = BLKIF_OP_INDIRECT;
+   ring_req->u.indirect.indirect_op = rq_data_dir(req) ?
+   BLKIF_OP_WRITE : BLKIF_OP_READ;
+   ring_req->u.indirect.sector_number = (blkif_sector_t)blk_rq_pos(req);
+   ring_req->u.indirect.handle = info->handle;
+   ring_req->u.indirect.nr_segments = nseg;
} else {
-   BUG_ON(info->max_indirect_segments == 0 &&
-  req->nr_phys_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
-   BUG_ON(info->max_indirect_segments &&
-  req->nr_phys_segments > info->max_indirect_segments);
-   nseg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
-   ring_req->u.rw.id = id;
-   if (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
+   ring_req->u.rw.sector_number = (blkif_sector_t)blk_rq_pos(req);
+   ring_req->u.rw.handle = info->handle;
+   ri

[Xen-devel] [PATCH v2 14/20] xen/grant-table: Make it running on 64KB granularity

2015-07-09 Thread Julien Grall
The Xen interface uses 4KB page granularity. This means that each
grant is 4KB.

The current implementation allocates one Linux page per grant. On Linux
using 64KB page granularity, only the first 4KB of the page will be
used.

We could decrease the memory wasted by sharing the page among multiple
grants. This would require some care with the {Set,Clear}ForeignPage
macros.

Note that no changes have been made to the x86 code because both Linux
and Xen only use 4KB page granularity there.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 
Cc: Stefano Stabellini 
Cc: Russell King 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
---
Changes in v2
- Add David's reviewed-by
---
 arch/arm/xen/p2m.c| 6 +++---
 drivers/xen/grant-table.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm/xen/p2m.c b/arch/arm/xen/p2m.c
index 887596c..0ed01f2 100644
--- a/arch/arm/xen/p2m.c
+++ b/arch/arm/xen/p2m.c
@@ -93,8 +93,8 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
for (i = 0; i < count; i++) {
if (map_ops[i].status)
continue;
-   set_phys_to_machine(map_ops[i].host_addr >> PAGE_SHIFT,
-   map_ops[i].dev_bus_addr >> PAGE_SHIFT);
+   set_phys_to_machine(map_ops[i].host_addr >> XEN_PAGE_SHIFT,
+   map_ops[i].dev_bus_addr >> XEN_PAGE_SHIFT);
}
 
return 0;
@@ -108,7 +108,7 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
int i;
 
for (i = 0; i < count; i++) {
-   set_phys_to_machine(unmap_ops[i].host_addr >> PAGE_SHIFT,
+   set_phys_to_machine(unmap_ops[i].host_addr >> XEN_PAGE_SHIFT,
INVALID_P2M_ENTRY);
}
 
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 3679293..0a1f903 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -668,7 +668,7 @@ int gnttab_setup_auto_xlat_frames(phys_addr_t addr)
if (xen_auto_xlat_grant_frames.count)
return -EINVAL;
 
-   vaddr = xen_remap(addr, PAGE_SIZE * max_nr_gframes);
+   vaddr = xen_remap(addr, XEN_PAGE_SIZE * max_nr_gframes);
if (vaddr == NULL) {
pr_warn("Failed to ioremap gnttab share frames (addr=%pa)!\n",
&addr);
@@ -680,7 +680,7 @@ int gnttab_setup_auto_xlat_frames(phys_addr_t addr)
return -ENOMEM;
}
for (i = 0; i < max_nr_gframes; i++)
-   pfn[i] = PFN_DOWN(addr) + i;
+   pfn[i] = XEN_PFN_DOWN(addr) + i;
 
xen_auto_xlat_grant_frames.vaddr = vaddr;
xen_auto_xlat_grant_frames.pfn = pfn;
@@ -1004,7 +1004,7 @@ static void gnttab_request_version(void)
 {
/* Only version 1 is used, which will always be available. */
grant_table_version = 1;
-   grefs_per_grant_frame = PAGE_SIZE / sizeof(struct grant_entry_v1);
+   grefs_per_grant_frame = XEN_PAGE_SIZE / sizeof(struct grant_entry_v1);
gnttab_interface = &gnttab_v1_ops;
 
pr_info("Grant tables using version %d layout\n", grant_table_version);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 10/20] xen/xenbus: Use Xen page definition

2015-07-09 Thread Julien Grall
All the rings (xenstore and PV rings) are always based on Xen's page
granularity.

Signed-off-by: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
---
Changes in v2:
- Also update the ring mapping function
---
 drivers/xen/xenbus/xenbus_client.c | 6 +++---
 drivers/xen/xenbus/xenbus_probe.c  | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 9ad3272..80272f6 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -388,7 +388,7 @@ int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr,
}
grefs[i] = err;
 
-   vaddr = vaddr + PAGE_SIZE;
+   vaddr = vaddr + XEN_PAGE_SIZE;
}
 
return 0;
@@ -555,7 +555,7 @@ static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
if (!node)
return -ENOMEM;
 
-   area = alloc_vm_area(PAGE_SIZE * nr_grefs, ptes);
+   area = alloc_vm_area(XEN_PAGE_SIZE * nr_grefs, ptes);
if (!area) {
kfree(node);
return -ENOMEM;
@@ -750,7 +750,7 @@ static int xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev, void *vaddr)
unsigned long addr;
 
memset(&unmap[i], 0, sizeof(unmap[i]));
-   addr = (unsigned long)vaddr + (PAGE_SIZE * i);
+   addr = (unsigned long)vaddr + (XEN_PAGE_SIZE * i);
unmap[i].host_addr = arbitrary_virt_to_machine(
lookup_address(addr, &level)).maddr;
unmap[i].dev_bus_addr = 0;
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 4308fb3..c67e5ba 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -713,7 +713,7 @@ static int __init xenstored_local_init(void)
 
xen_store_mfn = xen_start_info->store_mfn =
pfn_to_mfn(virt_to_phys((void *)page) >>
-  PAGE_SHIFT);
+  XEN_PAGE_SHIFT);
 
/* Next allocate a local port which xenstored can bind to */
alloc_unbound.dom= DOMID_SELF;
@@ -804,7 +804,7 @@ static int __init xenbus_init(void)
goto out_error;
xen_store_mfn = (unsigned long)v;
xen_store_interface =
-   xen_remap(xen_store_mfn << PAGE_SHIFT, PAGE_SIZE);
+   xen_remap(xen_store_mfn << XEN_PAGE_SHIFT, XEN_PAGE_SIZE);
break;
default:
pr_warn("Xenstore state unknown\n");
-- 
2.1.4




[Xen-devel] [PATCH v2 15/20] block/xen-blkfront: Make it running on 64KB page granularity

2015-07-09 Thread Julien Grall
From: Julien Grall 

The PV block protocol uses 4KB page granularity. The goal of this
patch is to allow a Linux guest using 64KB page granularity to use a
block device on a non-modified Xen.

The block API uses segments which should be at least the size of a
Linux page. Therefore, the driver has to break each page into 4KB
chunks before handing it to the backend.

Breaking a 64KB segment into 4KB chunks can produce chunks with no
data. As the PV protocol always requires data in each chunk, we have
to count the number of Xen pages that will actually be in use and
avoid sending empty chunks.

Note that a pre-defined number of grants is reserved before preparing
the request. This pre-defined number is based on the number and the
maximum size of the segments. If each segment contains only a very
small amount of data, the driver may reserve too many grants (16
grants are reserved per segment with 64KB page granularity).

Furthermore, in the case of persistent grants we allocate one Linux
page per grant although only 4KB of the page will effectively be used.
This could be improved by sharing the page among multiple grants.

Signed-off-by: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Roger Pau Monné 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
---

Improvements such as support of 64KB grants are not taken into
consideration in this patch because we have the requirement to run
Linux using 64KB pages on a non-modified Xen.

Changes in v2:
- Use gnttab_foreach_grant to split a Linux page into grant
---
 drivers/block/xen-blkfront.c | 304 ---
 1 file changed, 198 insertions(+), 106 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 95fd067..644ba76 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -77,6 +77,7 @@ struct blk_shadow {
struct grant **grants_used;
struct grant **indirect_grants;
struct scatterlist *sg;
+   unsigned int num_sg;
 };
 
 struct split_bio {
@@ -106,8 +107,8 @@ static unsigned int xen_blkif_max_ring_order;
 module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
 MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
 
-#define BLK_RING_SIZE(info) __CONST_RING_SIZE(blkif, PAGE_SIZE * (info)->nr_ring_pages)
-#define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
+#define BLK_RING_SIZE(info) __CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * (info)->nr_ring_pages)
+#define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * XENBUS_MAX_RING_PAGES)
 /*
  * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
  * characters are enough. Define to 20 to keep consist with backend.
@@ -146,6 +147,7 @@ struct blkfront_info
unsigned int discard_granularity;
unsigned int discard_alignment;
unsigned int feature_persistent:1;
+   /* Number of 4K segments handled */
unsigned int max_indirect_segments;
int is_ready;
 };
@@ -173,10 +175,19 @@ static DEFINE_SPINLOCK(minor_lock);
 
 #define DEV_NAME   "xvd"   /* name in /dev */
 
-#define SEGS_PER_INDIRECT_FRAME \
-   (PAGE_SIZE/sizeof(struct blkif_request_segment))
-#define INDIRECT_GREFS(_segs) \
-   ((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
+/*
+ * Xen uses 4K pages. The guest may use a different page size (4K or 64K).
+ * Number of Xen pages per segment
+ */
+#define XEN_PAGES_PER_SEGMENT   (PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define SEGS_PER_INDIRECT_FRAME\
+   (XEN_PAGE_SIZE/sizeof(struct blkif_request_segment) / XEN_PAGES_PER_SEGMENT)
+#define XEN_PAGES_PER_INDIRECT_FRAME \
+   (XEN_PAGE_SIZE/sizeof(struct blkif_request_segment))
+
+#define INDIRECT_GREFS(_pages) \
+   ((_pages + XEN_PAGES_PER_INDIRECT_FRAME - 1)/XEN_PAGES_PER_INDIRECT_FRAME)
 
 static int blkfront_setup_indirect(struct blkfront_info *info);
 
@@ -463,14 +474,100 @@ static int blkif_queue_discard_req(struct request *req)
return 0;
 }
 
+struct setup_rw_req {
+   unsigned int grant_idx;
+   struct blkif_request_segment *segments;
+   struct blkfront_info *info;
+   struct blkif_request *ring_req;
+   grant_ref_t gref_head;
+   unsigned int id;
+   /* Only used when persistent grant is used and it's a read request */
+   bool need_copy;
+   unsigned int bvec_off;
+   char *bvec_data;
+};
+
+static void blkif_setup_rw_req_grant(unsigned long mfn, unsigned int offset,
+unsigned int *len, void *data)
+{
+   struct setup_rw_req *setup = data;
+   int n, ref;
+   struct grant *gnt_list_entry;
+   unsigned int fsect, lsect;
+   /* Convenient aliases */
+   unsigned int grant_idx = setup->grant_idx;
+   struct blkif_request *ring_req = setup->ring_req;
+   struct blkfront_info *info = setup->info;
+   struct blk_shadow *shadow = &info->

[Xen-devel] [PATCH v2 13/20] xen/events: fifo: Make it running on 64KB granularity

2015-07-09 Thread Julien Grall
Only use the first 4KB of the page to store the event channel info. This
means that we waste 60KB every time we allocate a page for:
 * control block: a page is allocated per CPU
 * event array: a page is allocated every time we need to expand it

I think we can reduce the memory waste for the 2 areas by:

* control block: sharing between multiple vCPUs. Although this will
require some bookkeeping in order to not free the page when a CPU
goes offline while other CPUs sharing the page are still online

* event array: always extend the event array by 64K (i.e. 16 4K
chunks). That would require more care when we fail to expand the
event channel.

Signed-off-by: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
---
 drivers/xen/events/events_base.c | 2 +-
 drivers/xen/events/events_fifo.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 96093ae..858d2f6 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -40,11 +40,11 @@
 #include 
 #include 
 #include 
-#include 
 #endif
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
diff --git a/drivers/xen/events/events_fifo.c b/drivers/xen/events/events_fifo.c
index ed673e1..d53c297 100644
--- a/drivers/xen/events/events_fifo.c
+++ b/drivers/xen/events/events_fifo.c
@@ -54,7 +54,7 @@
 
 #include "events_internal.h"
 
-#define EVENT_WORDS_PER_PAGE (PAGE_SIZE / sizeof(event_word_t))
+#define EVENT_WORDS_PER_PAGE (XEN_PAGE_SIZE / sizeof(event_word_t))
 #define MAX_EVENT_ARRAY_PAGES (EVTCHN_FIFO_NR_CHANNELS / EVENT_WORDS_PER_PAGE)
 
 struct evtchn_fifo_queue {
-- 
2.1.4




[Xen-devel] [PATCH v2 18/20] net/xen-netback: Make it running on 64KB page granularity

2015-07-09 Thread Julien Grall
The PV network protocol uses 4KB page granularity. The goal of this
patch is to allow Linux using 64KB page granularity to work as a
network backend on a non-modified Xen.

It's only necessary to adapt the ring size and break skb data into
small chunks of 4KB. The rest of the code relies on the grant table
code.

Signed-off-by: Julien Grall 
Cc: Ian Campbell 
Cc: Wei Liu 
Cc: net...@vger.kernel.org
---

Improvements such as support of 64KB grants are not taken into
consideration in this patch because we have the requirement to run
Linux using 64KB pages on a non-modified Xen.

Changes in v2:
- Correctly set MAX_GRANT_COPY_OPS and XEN_NETBK_RX_SLOTS_MAX
- Don't use XEN_PAGE_SIZE in handle_frag_list as we coalesce
fragment into a new skb
- Use gnntab_foreach_grant to split a Linux page into grant
---
 drivers/net/xen-netback/common.h  |  15 +++--
 drivers/net/xen-netback/netback.c | 138 +++---
 2 files changed, 93 insertions(+), 60 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8a495b3..bb68211 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 typedef unsigned int pending_ring_idx_t;
@@ -64,8 +65,8 @@ struct pending_tx_info {
struct ubuf_info callback_struct;
 };
 
-#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
 
 struct xenvif_rx_meta {
int id;
@@ -80,16 +81,18 @@ struct xenvif_rx_meta {
 /* Discriminate from any valid pending_idx value. */
 #define INVALID_PENDING_IDX 0x
 
-#define MAX_BUFFER_OFFSET PAGE_SIZE
+#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE
 
 #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
 
+#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)
+
 /* It's possible for an skb to have a maximal number of frags
  * but still be less than MAX_BUFFER_OFFSET in size. Thus the
- * worst-case number of copy operations is MAX_SKB_FRAGS per
+ * worst-case number of copy operations is MAX_XEN_SKB_FRAGS per
  * ring slot.
  */
-#define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
+#define MAX_GRANT_COPY_OPS (MAX_XEN_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
 
 #define NETBACK_INVALID_HANDLE -1
 
@@ -203,7 +206,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
 /* Maximum number of Rx slots a to-guest packet may use, including the
  * slot needed for GSO meta-data.
  */
-#define XEN_NETBK_RX_SLOTS_MAX (MAX_SKB_FRAGS + 1)
+#define XEN_NETBK_RX_SLOTS_MAX ((MAX_XEN_SKB_FRAGS + 1))
 
 enum state_bit_shift {
/* This bit marks that the vif is connected */
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 3f77030..828085b 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -263,6 +263,65 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif_queue *queue,
return meta;
 }
 
+struct gop_frag_copy
+{
+   struct xenvif_queue *queue;
+   struct netrx_pending_operations *npo;
+   struct xenvif_rx_meta *meta;
+   int head;
+   int gso_type;
+
+   struct page *page;
+};
+
+static void xenvif_gop_frag_copy_grant(unsigned long mfn, unsigned int offset,
+  unsigned int *len, void *data)
+{
+   struct gop_frag_copy *info = data;
+   struct gnttab_copy *copy_gop;
+   struct xen_page_foreign *foreign;
+   /* Convenient aliases */
+   struct xenvif_queue *queue = info->queue;
+   struct netrx_pending_operations *npo = info->npo;
+   struct page *page = info->page;
+
+   BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
+
+   if (npo->copy_off == MAX_BUFFER_OFFSET)
+   info->meta = get_next_rx_buffer(queue, npo);
+
+   if (npo->copy_off + *len > MAX_BUFFER_OFFSET)
+   *len = MAX_BUFFER_OFFSET - npo->copy_off;
+
+   copy_gop = npo->copy + npo->copy_prod++;
+   copy_gop->flags = GNTCOPY_dest_gref;
+   copy_gop->len = *len;
+
+   foreign = xen_page_foreign(page);
+   if (foreign) {
+   copy_gop->source.domid = foreign->domid;
+   copy_gop->source.u.ref = foreign->gref;
+   copy_gop->flags |= GNTCOPY_source_gref;
+   } else {
+   copy_gop->source.domid = DOMID_SELF;
+   copy_gop->source.u.gmfn = mfn;
+   }
+   copy_gop->source.offset = offset;
+
+   copy_gop->dest.domid = queue->vif->domid;
+   copy_gop->dest.offset = npo->copy_off;
+   copy_gop->dest.u.ref = npo->copy_gref;
+
+   npo->copy_off += *len;
+   info->meta->size += *len;
+
+   /* Leave a gap for the GSO descripto

[Xen-devel] [PATCH v2 12/20] xen/balloon: Don't rely on the page granularity being the same for Xen and Linux

2015-07-09 Thread Julien Grall
For ARM64 guests, Linux is able to support either 64K or 4K page
granularity. However, the hypercall interface is always based on 4K
page granularity.

With 64K page granularity, a single page will be spread over multiple
Xen frames.

When a driver requests/frees a balloon page, the balloon driver will
have to split the Linux page into 4K chunks before asking Xen to
add/remove the frames from the guest.

Note that this can work with any page granularity assuming it's a
multiple of 4K.

Signed-off-by: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
Cc: Wei Liu 
---
Changes in v2:
- Use xen_apply_to_page to split a page in 4K chunk
- It's not necessary to have a smaller frame list. Re-use
PAGE_SIZE
- Convert reserve_additional_memory to use XEN_... macro
---
 drivers/xen/balloon.c | 147 +++---
 1 file changed, 105 insertions(+), 42 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index fd93369..19a72b1 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -230,6 +230,7 @@ static enum bp_state reserve_additional_memory(long credit)
nid = memory_add_physaddr_to_nid(hotplug_start_paddr);
 
 #ifdef CONFIG_XEN_HAVE_PVMMU
+   /* TODO */
 /*
  * add_memory() will build page tables for the new memory so
  * the p2m must contain invalid entries so the correct
@@ -242,8 +243,8 @@ static enum bp_state reserve_additional_memory(long credit)
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
unsigned long pfn, i;
 
-   pfn = PFN_DOWN(hotplug_start_paddr);
-   for (i = 0; i < balloon_hotplug; i++) {
+   pfn = XEN_PFN_DOWN(hotplug_start_paddr);
+   for (i = 0; i < (balloon_hotplug * XEN_PFN_PER_PAGE); i++) {
if (!set_phys_to_machine(pfn + i, INVALID_P2M_ENTRY)) {
pr_warn("set_phys_to_machine() failed, no memory added\n");
return BP_ECANCELED;
@@ -323,10 +324,72 @@ static enum bp_state reserve_additional_memory(long credit)
 }
 #endif /* CONFIG_XEN_BALLOON_MEMORY_HOTPLUG */
 
+static int set_frame(struct page *page, unsigned long pfn, void *data)
+{
+   unsigned long *index = data;
+
+   frame_list[(*index)++] = pfn;
+
+   return 0;
+}
+
+#ifdef CONFIG_XEN_HAVE_PVMMU
+static int pvmmu_update_mapping(struct page *page, unsigned long pfn,
+   void *data)
+{
+   unsigned long *index = data;
+   xen_pfn_t frame = frame_list[*index];
+
+   set_phys_to_machine(pfn, frame);
+   /* Link back into the page tables if not highmem. */
+   if (!PageHighMem(page)) {
+   int ret;
+   ret = HYPERVISOR_update_va_mapping(
+   (unsigned long)__va(pfn << XEN_PAGE_SHIFT),
+   mfn_pte(frame, PAGE_KERNEL),
+   0);
+   BUG_ON(ret);
+   }
+
+   (*index)++;
+
+   return 0;
+}
+#endif
+
+static int balloon_remove_mapping(struct page *page, unsigned long pfn,
+ void *data)
+{
+   unsigned long *index = data;
+
+   /* We expect the frame_list to contain the same pfn */
+   BUG_ON(pfn != frame_list[*index]);
+
+   frame_list[*index] = pfn_to_mfn(pfn);
+
+#ifdef CONFIG_XEN_HAVE_PVMMU
+   if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+   if (!PageHighMem(page)) {
+   int ret;
+
+   ret = HYPERVISOR_update_va_mapping(
+   (unsigned long)__va(pfn << XEN_PAGE_SHIFT),
+   __pte_ma(0), 0);
+   BUG_ON(ret);
+   }
+   __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
+   }
+#endif
+
+   (*index)++;
+
+   return 0;
+}
+
 static enum bp_state increase_reservation(unsigned long nr_pages)
 {
int rc;
-   unsigned long  pfn, i;
+   unsigned long i, frame_idx;
struct page   *page;
struct xen_memory_reservation reservation = {
.address_bits = 0,
@@ -343,44 +406,43 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
}
 #endif
 
-   if (nr_pages > ARRAY_SIZE(frame_list))
-   nr_pages = ARRAY_SIZE(frame_list);
+   if (nr_pages > (ARRAY_SIZE(frame_list) / XEN_PFN_PER_PAGE))
+   nr_pages = ARRAY_SIZE(frame_list) / XEN_PFN_PER_PAGE;
 
+   frame_idx = 0;
page = list_first_entry_or_null(&ballooned_pages, struct page, lru);
for (i = 0; i < nr_pages; i++) {
if (!page) {
nr_pages = i;
break;
}
-   frame_list[i] = page_to_pfn(page);
+
+   rc = xen_apply_to_page(page, set_frame, &frame_idx);
+
 

[Xen-devel] [PATCH v2 16/20] block/xen-blkback: Make it running on 64KB page granularity

2015-07-09 Thread Julien Grall
The PV block protocol uses 4KB page granularity. The goal of this
patch is to allow Linux using 64KB page granularity to behave as a
block backend on a non-modified Xen.

It's only necessary to adapt the ring size and the number of requests
per indirect frame. The rest of the code relies on the grant table
code.

Note that the grant table code allocates one Linux page per grant,
which results in wasting 60KB for every grant when Linux is using 64KB
page granularity. This could be improved by sharing the page among
multiple grants.

Signed-off-by: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: "Roger Pau Monné" 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
---

Improvements such as support of 64KB grants are not taken into
consideration in this patch because we have the requirement to run
Linux using 64KB pages on a non-modified Xen.

This has been tested only with a loop device. I plan to test passing a
hard drive partition, but I haven't yet converted the swiotlb code.
---
 drivers/block/xen-blkback/blkback.c |  5 +++--
 drivers/block/xen-blkback/common.h  | 16 +---
 drivers/block/xen-blkback/xenbus.c  |  9 ++---
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index ced9677..d5cce8c 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -961,7 +961,7 @@ static int xen_blkbk_parse_indirect(struct blkif_request *req,
seg[n].nsec = segments[i].last_sect -
segments[i].first_sect + 1;
seg[n].offset = (segments[i].first_sect << 9);
-   if ((segments[i].last_sect >= (PAGE_SIZE >> 9)) ||
+   if ((segments[i].last_sect >= (XEN_PAGE_SIZE >> 9)) ||
(segments[i].last_sect < segments[i].first_sect)) {
rc = -EINVAL;
goto unmap;
@@ -1210,6 +1210,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 
req_operation = req->operation == BLKIF_OP_INDIRECT ?
req->u.indirect.indirect_op : req->operation;
+
if ((req->operation == BLKIF_OP_INDIRECT) &&
(req_operation != BLKIF_OP_READ) &&
(req_operation != BLKIF_OP_WRITE)) {
@@ -1268,7 +1269,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
seg[i].nsec = req->u.rw.seg[i].last_sect -
req->u.rw.seg[i].first_sect + 1;
seg[i].offset = (req->u.rw.seg[i].first_sect << 9);
-   if ((req->u.rw.seg[i].last_sect >= (PAGE_SIZE >> 9)) ||
-   if ((req->u.rw.seg[i].last_sect >= (XEN_PAGE_SIZE >> 9)) ||
(req->u.rw.seg[i].last_sect <
 req->u.rw.seg[i].first_sect))
goto fail_response;
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 45a044a..33836bb 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -51,12 +52,21 @@ extern unsigned int xen_blkif_max_ring_order;
  */
 #define MAX_INDIRECT_SEGMENTS 256
 
-#define SEGS_PER_INDIRECT_FRAME \
-   (PAGE_SIZE/sizeof(struct blkif_request_segment))
+/*
+ * Xen use 4K pages. The guest may use different page size (4K or 64K)
+ * Number of Xen pages per segment
+ */
+#define XEN_PAGES_PER_SEGMENT   (PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define SEGS_PER_INDIRECT_FRAME\
+   (XEN_PAGE_SIZE/sizeof(struct blkif_request_segment) / XEN_PAGES_PER_SEGMENT)
+#define XEN_PAGES_PER_INDIRECT_FRAME \
+   (XEN_PAGE_SIZE/sizeof(struct blkif_request_segment))
+
 #define MAX_INDIRECT_PAGES \
((MAX_INDIRECT_SEGMENTS + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
 #define INDIRECT_PAGES(_segs) \
-   ((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
+   ((_segs + XEN_PAGES_PER_INDIRECT_FRAME - 1)/XEN_PAGES_PER_INDIRECT_FRAME)
 
 /* Not a real protocol.  Used to generate ring structs which contain
  * the elements common to all protocols only.  This way we get a
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index deb3f00..edd27e4 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -176,21 +176,24 @@ static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
{
struct blkif_sring *sring;
sring = (struct blkif_sring *)blkif->blk_ring;
-   BACK_RING_INIT(&blkif->blk_rings.native, sring, PAGE_SIZE * nr_grefs);
+   BACK_RING_INIT(&blkif->blk_rings.native, sring,
+  XEN_PAGE_SIZE * nr_grefs);
break;
}
case BLKIF_PROTOCOL_X86_32:
{
struct blkif_x8

[Xen-devel] [PATCH v2 19/20] xen/privcmd: Add support for Linux 64KB page granularity

2015-07-09 Thread Julien Grall
The hypercall interface (as well as the toolstack) always uses 4KB
page granularity. When the toolstack asks for a series of guest PFNs
to be mapped in a batch, it expects the pages to be mapped
contiguously in its virtual memory.

When Linux is using 64KB page granularity, the privcmd driver will
have to map multiple Xen PFNs into a single Linux page.

Note that this solution works for any page granularity which is a
multiple of 4KB.

Signed-off-by: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
---
Changes in v2:
- Use xen_apply_to_page
---
 drivers/xen/privcmd.c   |   8 +--
 drivers/xen/xlate_mmu.c | 127 +---
 2 files changed, 92 insertions(+), 43 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 5a29616..e8714b4 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -446,7 +446,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
return -EINVAL;
}
 
-   nr_pages = m.num;
+   nr_pages = DIV_ROUND_UP_ULL(m.num, PAGE_SIZE / XEN_PAGE_SIZE);
if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
return -EINVAL;
 
@@ -494,7 +494,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
goto out_unlock;
}
if (xen_feature(XENFEAT_auto_translated_physmap)) {
-   ret = alloc_empty_pages(vma, m.num);
+   ret = alloc_empty_pages(vma, nr_pages);
if (ret < 0)
goto out_unlock;
} else
@@ -518,6 +518,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
state.global_error  = 0;
state.version   = version;
 
+   BUILD_BUG_ON(((PAGE_SIZE / sizeof(xen_pfn_t)) % XEN_PFN_PER_PAGE) != 0);
/* mmap_batch_fn guarantees ret == 0 */
BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t),
&pagelist, mmap_batch_fn, &state));
@@ -582,12 +583,13 @@ static void privcmd_close(struct vm_area_struct *vma)
 {
struct page **pages = vma->vm_private_data;
int numpgs = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+   int nr_pfn = (vma->vm_end - vma->vm_start) >> XEN_PAGE_SHIFT;
int rc;
 
if (!xen_feature(XENFEAT_auto_translated_physmap) || !numpgs || !pages)
return;
 
-   rc = xen_unmap_domain_mfn_range(vma, numpgs, pages);
+   rc = xen_unmap_domain_mfn_range(vma, nr_pfn, pages);
if (rc == 0)
free_xenballooned_pages(numpgs, pages);
else
diff --git a/drivers/xen/xlate_mmu.c b/drivers/xen/xlate_mmu.c
index 58a5389..1fac17c 100644
--- a/drivers/xen/xlate_mmu.c
+++ b/drivers/xen/xlate_mmu.c
@@ -38,31 +38,9 @@
 #include 
 #include 
 
-/* map fgmfn of domid to lpfn in the current domain */
-static int map_foreign_page(unsigned long lpfn, unsigned long fgmfn,
-   unsigned int domid)
-{
-   int rc;
-   struct xen_add_to_physmap_range xatp = {
-   .domid = DOMID_SELF,
-   .foreign_domid = domid,
-   .size = 1,
-   .space = XENMAPSPACE_gmfn_foreign,
-   };
-   xen_ulong_t idx = fgmfn;
-   xen_pfn_t gpfn = lpfn;
-   int err = 0;
-
-   set_xen_guest_handle(xatp.idxs, &idx);
-   set_xen_guest_handle(xatp.gpfns, &gpfn);
-   set_xen_guest_handle(xatp.errs, &err);
-
-   rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
-   return rc < 0 ? rc : err;
-}
-
 struct remap_data {
xen_pfn_t *fgmfn; /* foreign domain's gmfn */
+   xen_pfn_t *efgmfn; /* pointer to the end of the fgmfn array */
pgprot_t prot;
domid_t  domid;
struct vm_area_struct *vma;
@@ -71,24 +49,75 @@ struct remap_data {
struct xen_remap_mfn_info *info;
int *err_ptr;
int mapped;
+
+   /* Hypercall parameters */
+   int h_errs[XEN_PFN_PER_PAGE];
+   xen_ulong_t h_idxs[XEN_PFN_PER_PAGE];
+   xen_pfn_t h_gpfns[XEN_PFN_PER_PAGE];
+
+   int h_iter; /* Iterator */
 };
 
+static int setup_hparams(struct page *page, unsigned long pfn, void *data)
+{
+   struct remap_data *info = data;
+
+   /* We may not have enough domain's gmfn to fill a Linux Page */
+   if (info->fgmfn == info->efgmfn)
+   return 0;
+
+   info->h_idxs[info->h_iter] = *info->fgmfn;
+   info->h_gpfns[info->h_iter] = pfn;
+   info->h_errs[info->h_iter] = 0;
+   info->h_iter++;
+
+   info->fgmfn++;
+
+   return 0;
+}
+
 static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
void *data)
 {
struct remap_data *info = data;
struct page *page = info->pages[info->index++];
-   unsigned long pfn = page_to_pfn(page);
-   pte_t pte = pte_mkspecial(pfn_pte(pfn, info->

[Xen-devel] [PATCH v2 11/20] tty/hvc: xen: Use xen page definition

2015-07-09 Thread Julien Grall
The console ring is always based on the page granularity of Xen.

Signed-off-by: Julien Grall 
Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
Cc: David Vrabel 
Cc: Stefano Stabellini 
Cc: Boris Ostrovsky 
Cc: linuxppc-...@lists.ozlabs.org
---
 drivers/tty/hvc/hvc_xen.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index a9d837f..2135944 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -230,7 +230,7 @@ static int xen_hvm_console_init(void)
if (r < 0 || v == 0)
goto err;
mfn = v;
-   info->intf = xen_remap(mfn << PAGE_SHIFT, PAGE_SIZE);
+   info->intf = xen_remap(mfn << XEN_PAGE_SHIFT, XEN_PAGE_SIZE);
if (info->intf == NULL)
goto err;
info->vtermno = HVC_COOKIE;
@@ -392,7 +392,7 @@ static int xencons_connect_backend(struct xenbus_device 
*dev,
if (xen_pv_domain())
mfn = virt_to_mfn(info->intf);
else
-   mfn = __pa(info->intf) >> PAGE_SHIFT;
+   mfn = __pa(info->intf) >> XEN_PAGE_SHIFT;
ret = gnttab_alloc_grant_references(1, &gref_head);
if (ret < 0)
return ret;
@@ -476,7 +476,7 @@ static int xencons_resume(struct xenbus_device *dev)
struct xencons_info *info = dev_get_drvdata(&dev->dev);
 
xencons_disconnect_backend(info);
-   memset(info->intf, 0, PAGE_SIZE);
+   memset(info->intf, 0, XEN_PAGE_SIZE);
return xencons_connect_backend(dev, info);
 }
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 20/20] arm/xen: Add support for 64KB page granularity

2015-07-09 Thread Julien Grall
The hypercall interface always uses 4KB page granularity. This requires
using the Xen page definition macros when we deal with hypercalls.

Note that pfn_to_mfn works with a Xen pfn (i.e. 4KB). We may want to
rename pfn_to_mfn to make this explicit.

We also allocate a 64KB page for the shared page even though only the
first 4KB is used. I don't think this is really important for now, as it
helps to have the pointer 4KB-aligned (XENMEM_add_to_physmap takes a
Xen PFN).

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 
Cc: Russell King 

---
Changes in v2
- Add Stefano's reviewed-by
---
 arch/arm/include/asm/xen/page.h | 12 ++--
 arch/arm/xen/enlighten.c|  6 +++---
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
index 1bee8ca..ab6eb9a 100644
--- a/arch/arm/include/asm/xen/page.h
+++ b/arch/arm/include/asm/xen/page.h
@@ -56,19 +56,19 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
 
 static inline xmaddr_t phys_to_machine(xpaddr_t phys)
 {
-   unsigned offset = phys.paddr & ~PAGE_MASK;
-   return XMADDR(PFN_PHYS(pfn_to_mfn(PFN_DOWN(phys.paddr))) | offset);
+   unsigned offset = phys.paddr & ~XEN_PAGE_MASK;
+   return XMADDR(XEN_PFN_PHYS(pfn_to_mfn(XEN_PFN_DOWN(phys.paddr))) | 
offset);
 }
 
 static inline xpaddr_t machine_to_phys(xmaddr_t machine)
 {
-   unsigned offset = machine.maddr & ~PAGE_MASK;
-   return XPADDR(PFN_PHYS(mfn_to_pfn(PFN_DOWN(machine.maddr))) | offset);
+   unsigned offset = machine.maddr & ~XEN_PAGE_MASK;
+   return XPADDR(XEN_PFN_PHYS(mfn_to_pfn(XEN_PFN_DOWN(machine.maddr))) | 
offset);
 }
 /* VIRT <-> MACHINE conversion */
 #define virt_to_machine(v) (phys_to_machine(XPADDR(__pa(v
-#define virt_to_mfn(v) (pfn_to_mfn(virt_to_pfn(v)))
-#define mfn_to_virt(m) (__va(mfn_to_pfn(m) << PAGE_SHIFT))
+#define virt_to_mfn(v) (pfn_to_mfn(virt_to_phys(v) >> XEN_PAGE_SHIFT))
+#define mfn_to_virt(m) (__va(mfn_to_pfn(m) << XEN_PAGE_SHIFT))
 
 static inline xmaddr_t arbitrary_virt_to_machine(void *vaddr)
 {
diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 6c09cc4..c7d32af 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -96,8 +96,8 @@ static void xen_percpu_init(void)
pr_info("Xen: initializing cpu%d\n", cpu);
vcpup = per_cpu_ptr(xen_vcpu_info, cpu);
 
-   info.mfn = __pa(vcpup) >> PAGE_SHIFT;
-   info.offset = offset_in_page(vcpup);
+   info.mfn = __pa(vcpup) >> XEN_PAGE_SHIFT;
+   info.offset = xen_offset_in_page(vcpup);
 
err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info);
BUG_ON(err);
@@ -220,7 +220,7 @@ static int __init xen_guest_init(void)
xatp.domid = DOMID_SELF;
xatp.idx = 0;
xatp.space = XENMAPSPACE_shared_info;
-   xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+   xatp.gpfn = __pa(shared_info_page) >> XEN_PAGE_SHIFT;
if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
BUG();
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 17/20] net/xen-netfront: Make it running on 64KB page granularity

2015-07-09 Thread Julien Grall
The PV network protocol uses 4KB page granularity. The goal of this
patch is to allow a Linux kernel using 64KB page granularity to use a
network device on an unmodified Xen.

It's only necessary to adapt the ring size and break skb data into small
chunks of 4KB. The rest of the code relies on the grant table code.

Note that we allocate a Linux page for each rx skb but only the first
4KB is used. We may improve the memory usage by extending the size of
the rx skb.

Signed-off-by: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
Cc: net...@vger.kernel.org
---

Improvements such as support of 64KB grants are not taken into
consideration in this patch because we have the requirement to run a Linux
using 64KB pages on an unmodified Xen.

Tested with workloads such as ping, ssh, wget, git... I would be happy if
someone could give details on how to test all the paths.

Changes in v2:
- Use gnttab_foreach_grant to split a Linux page in grant
- Fix count slots
---
 drivers/net/xen-netfront.c | 121 -
 1 file changed, 87 insertions(+), 34 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index f948c46..7233b09 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -74,8 +74,8 @@ struct netfront_cb {
 
 #define GRANT_INVALID_REF  0
 
-#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
 
 /* Minimum number of Rx slots (includes slot for GSO metadata). */
 #define NET_RX_SLOTS_MIN (XEN_NETIF_NR_SLOTS_MIN + 1)
@@ -291,7 +291,7 @@ static void xennet_alloc_rx_buffers(struct netfront_queue 
*queue)
struct sk_buff *skb;
unsigned short id;
grant_ref_t ref;
-   unsigned long pfn;
+   struct page *page;
struct xen_netif_rx_request *req;
 
skb = xennet_alloc_one_rx_buffer(queue);
@@ -307,14 +307,13 @@ static void xennet_alloc_rx_buffers(struct netfront_queue 
*queue)
BUG_ON((signed short)ref < 0);
queue->grant_rx_ref[id] = ref;
 
-   pfn = page_to_pfn(skb_frag_page(&skb_shinfo(skb)->frags[0]));
+   page = skb_frag_page(&skb_shinfo(skb)->frags[0]);
 
req = RING_GET_REQUEST(&queue->rx, req_prod);
-   gnttab_grant_foreign_access_ref(ref,
-   queue->info->xbdev->otherend_id,
-   pfn_to_mfn(pfn),
-   0);
-
+   gnttab_page_grant_foreign_access_ref(ref,
+
queue->info->xbdev->otherend_id,
+page,
+0);
req->id = id;
req->gref = ref;
}
@@ -415,15 +414,26 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
xennet_maybe_wake_tx(queue);
 }
 
-static struct xen_netif_tx_request *xennet_make_one_txreq(
-   struct netfront_queue *queue, struct sk_buff *skb,
-   struct page *page, unsigned int offset, unsigned int len)
+struct xennet_gnttab_make_txreq
+{
+   struct netfront_queue *queue;
+   struct sk_buff *skb;
+   struct page *page;
+   struct xen_netif_tx_request *tx; /* Last request */
+   unsigned int size;
+};
+
+static void xennet_tx_setup_grant(unsigned long mfn, unsigned int offset,
+ unsigned int *len, void *data)
 {
+   struct xennet_gnttab_make_txreq *info = data;
unsigned int id;
struct xen_netif_tx_request *tx;
grant_ref_t ref;
-
-   len = min_t(unsigned int, PAGE_SIZE - offset, len);
+   /* convenient aliases */
+   struct page *page = info->page;
+   struct netfront_queue *queue = info->queue;
+   struct sk_buff *skb = info->skb;
 
id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
@@ -431,7 +441,7 @@ static struct xen_netif_tx_request *xennet_make_one_txreq(
BUG_ON((signed short)ref < 0);
 
gnttab_grant_foreign_access_ref(ref, queue->info->xbdev->otherend_id,
-   page_to_mfn(page), GNTMAP_readonly);
+   mfn, GNTMAP_readonly);
 
queue->tx_skbs[id].skb = skb;
queue->grant_tx_page[id] = page;
@@ -440,10 +450,37 @@ static struct xen_netif_tx_request *xennet_make_one_txreq(
tx->id = id;
tx->gref = ref;
tx->offset = offset;
-   tx->size = len;
+   tx->size = *len;
tx->flags = 0;
 
-   return tx;
+

[Xen-devel] [PATCH v2 00/20] xen/arm64: Add support for 64KB page

2015-07-09 Thread Julien Grall
Hi all,

ARM64 Linux supports both 4KB and 64KB page granularity, whereas the Xen
hypercall interface and PV protocols are always based on 4KB page granularity.

Any attempt to boot a Linux guest with 64KB pages enabled will result in a
guest crash.

This series is a first attempt to allow such Linux kernels to run with the
current hypercall interface and PV protocols.

This solution has been chosen because we want to run Linux with 64KB pages on
released Xen ARM versions and/or platforms using an old version of Linux DOM0.

There is room for improvement, such as support of 64KB grants or modification
of the PV protocols to support different page sizes... They will be explored
in a separate patch series later.

For this new version, new helpers have been added in order to split a Linux
page into multiple grants. This requires providing some callbacks.

So far I've only done a quick network performance test using iperf. The
server lives in DOM0 and the client in the guest.

Average between 10 iperf runs:

DOM0     Guest     Result

4KB-mod  64KB      3.176 Gbits/sec
4KB-mod  4KB-mod   3.245 Gbits/sec
4KB-mod  4KB       3.258 Gbits/sec
4KB      4KB       3.292 Gbits/sec
4KB      4KB-mod   3.265 Gbits/sec
4KB      64KB      3.189 Gbits/sec

4KB-mod: Linux with the 64KB patch series
4KB: linux/master

The network performance is slightly worse with this series (-0.15%). I
suspect this is because of the indirection used to set up the grant. This is
necessary in order to ensure that the grant will be correctly sized no matter
the Linux page granularity. This could be used later in order to support
bigger grants.

TODO list:
- swiotlb not yet converted to 64KB pages
- it may be possible to move some common defines between netback/netfront
and blkfront/blkback into a header.

Note that the patches have only been build-tested on x86. They may also not
compile one by one. I will take a look for the next version.

A branch based on the latest linux/master can be found here:

git://xenbits.xen.org/people/julieng/linux-arm.git branch xen-64k-v2

Comments and suggestions are welcome.

Sincerely yours,

Cc: david.vra...@citrix.com
Cc: konrad.w...@oracle.com
Cc: boris.ostrov...@oracle.com
Cc: wei.l...@citrix.com
Cc: roger@citrix.com

Julien Grall (20):
  xen: Add Xen specific page definition
  xen: Introduce a function to split a Linux page into Xen page
  xen/grant: Introduce helpers to split a page into grant
  xen/grant: Add helper gnttab_page_grant_foreign_access_ref
  block/xen-blkfront: Split blkif_queue_request in 2
  block/xen-blkfront: Store a page rather a pfn in the grant structure
  block/xen-blkfront: split get_grant in 2
  net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop
  xen/biomerge: Don't allow biovec to be merge when Linux is not using
4KB page
  xen/xenbus: Use Xen page definition
  tty/hvc: xen: Use xen page definition
  xen/balloon: Don't rely on the page granularity is the same for Xen
and Linux
  xen/events: fifo: Make it running on 64KB granularity
  xen/grant-table: Make it running on 64KB granularity
  block/xen-blkfront: Make it running on 64KB page granularity
  block/xen-blkback: Make it running on 64KB page granularity
  net/xen-netfront: Make it running on 64KB page granularity
  net/xen-netback: Make it running on 64KB page granularity
  xen/privcmd: Add support for Linux 64KB page granularity
  arm/xen: Add support for 64KB page granularity

 arch/arm/include/asm/xen/page.h |  12 +-
 arch/arm/xen/enlighten.c|   6 +-
 arch/arm/xen/p2m.c  |   6 +-
 arch/x86/include/asm/xen/page.h |   2 +-
 drivers/block/xen-blkback/blkback.c |   5 +-
 drivers/block/xen-blkback/common.h  |  16 +-
 drivers/block/xen-blkback/xenbus.c  |   9 +-
 drivers/block/xen-blkfront.c| 536 +++-
 drivers/net/xen-netback/common.h|  15 +-
 drivers/net/xen-netback/netback.c   | 148 ++
 drivers/net/xen-netfront.c  | 121 +---
 drivers/tty/hvc/hvc_xen.c   |   6 +-
 drivers/xen/balloon.c   | 147 +++---
 drivers/xen/biomerge.c  |   7 +
 drivers/xen/events/events_base.c|   2 +-
 drivers/xen/events/events_fifo.c|   2 +-
 drivers/xen/grant-table.c   |  32 ++-
 drivers/xen/privcmd.c   |   8 +-
 drivers/xen/xenbus/xenbus_client.c  |   6 +-
 drivers/xen/xenbus/xenbus_probe.c   |   4 +-
 drivers/xen/xlate_mmu.c | 127 ++---
 include/xen/grant_table.h   |  50 
 include/xen/page.h  |  41 ++-
 23 files changed, 896 insertions(+), 412 deletions(-)

-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 02/20] xen: Introduce a function to split a Linux page into Xen page

2015-07-09 Thread Julien Grall
The Xen interface always uses 4KB pages. This means that a Linux page
may be split across multiple Xen pages when the page granularity is not
the same.

This helper will break down a Linux page into 4KB chunks and call the
given function on each of them.

Signed-off-by: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
---
Changes in v2:
- Patch added
---
 include/xen/page.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/include/xen/page.h b/include/xen/page.h
index 8ebd37b..b1f7722 100644
--- a/include/xen/page.h
+++ b/include/xen/page.h
@@ -39,4 +39,24 @@ struct xen_memory_region 
xen_extra_mem[XEN_EXTRA_MEM_MAX_REGIONS];
 
 extern unsigned long xen_released_pages;
 
+typedef int (*xen_pfn_fn_t)(struct page *page, unsigned long pfn, void *data);
+
+/* Break down the page in 4KB granularity and call fn foreach xen pfn */
+static inline int xen_apply_to_page(struct page *page, xen_pfn_fn_t fn,
+   void *data)
+{
+   unsigned long pfn = xen_page_to_pfn(page);
+   int i;
+   int ret;
+
+   for (i = 0; i < XEN_PFN_PER_PAGE; i++, pfn++) {
+   ret = fn(page, pfn, data);
+   if (ret)
+   return ret;
+   }
+
+   return ret;
+}
+
+
 #endif /* _XEN_PAGE_H */
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 04/20] xen/grant: Add helper gnttab_page_grant_foreign_access_ref

2015-07-09 Thread Julien Grall
Many PV drivers contain the idiom:

pfn = page_to_mfn(...) /* Or similar */
gnttab_grant_foreign_access_ref

Replace it with a new helper. Note that when Linux is using a different
page granularity than Xen, the helper only gives access to the first 4KB
grant.

This is useful where drivers are allocating a full Linux page for each
grant.

Also include xen/interface/grant_table.h rather than xen/grant_table.h in
asm/page.h for x86 to fix a compilation issue [1]. Only the former is
useful in order to get the structure definition.

[1] Interdependency between asm/page.h and xen/grant_table.h, which results
in page_to_mfn not being defined when necessary.

Signed-off-by: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
---
Changes in v2:
- Patch added
---
 arch/x86/include/asm/xen/page.h | 2 +-
 include/xen/grant_table.h   | 9 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index c44a5d5..fb2e037 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -12,7 +12,7 @@
 #include 
 
 #include 
-#include 
+#include 
 #include 
 
 /* Xen machine address */
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 6f77378..6a1ef86 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -131,6 +131,15 @@ void gnttab_cancel_free_callback(struct 
gnttab_free_callback *callback);
 void gnttab_grant_foreign_access_ref(grant_ref_t ref, domid_t domid,
 unsigned long frame, int readonly);
 
+/* Give access to the first 4K of the page */
+static inline void gnttab_page_grant_foreign_access_ref(
+   grant_ref_t ref, domid_t domid,
+   struct page *page, int readonly)
+{
+   gnttab_grant_foreign_access_ref(ref, domid, page_to_mfn(page),
+   readonly);
+}
+
 void gnttab_grant_foreign_transfer_ref(grant_ref_t, domid_t domid,
   unsigned long pfn);
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 07/20] block/xen-blkfront: split get_grant in 2

2015-07-09 Thread Julien Grall
Prepare the code to support 64KB page granularity. The first
implementation will use a full Linux page per indirect and persistent
grant. When non-persistent grants are used, each page of a bio request
may be split into multiple grants.

Furthermore, the field page of the grant structure is only used to copy
data from persistent grants or indirect grants. Avoid setting it for other
use cases, as it will have no meaning given the page will be split into
multiple grants.

Provide 2 functions: one to set up an indirect grant, the other for a bio
page.

Signed-off-by: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Roger Pau Monné 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
---
Changes in v2:
- Patch added
---
 drivers/block/xen-blkfront.c | 85 ++--
 1 file changed, 59 insertions(+), 26 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 7b81d23..95fd067 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -242,34 +242,77 @@ out_of_memory:
return -ENOMEM;
 }
 
-static struct grant *get_grant(grant_ref_t *gref_head,
-  struct page *page,
-   struct blkfront_info *info)
+static struct grant *get_free_grant(struct blkfront_info *info)
 {
struct grant *gnt_list_entry;
-   unsigned long buffer_mfn;
 
BUG_ON(list_empty(&info->grants));
gnt_list_entry = list_first_entry(&info->grants, struct grant,
- node);
+ node);
list_del(&gnt_list_entry->node);
 
-   if (gnt_list_entry->gref != GRANT_INVALID_REF) {
+   if (gnt_list_entry->gref != GRANT_INVALID_REF)
info->persistent_gnts_c--;
+
+   return gnt_list_entry;
+}
+
+static void grant_foreign_access(const struct grant *gnt_list_entry,
+const struct blkfront_info *info)
+{
+   gnttab_page_grant_foreign_access_ref(gnt_list_entry->gref,
+info->xbdev->otherend_id,
+gnt_list_entry->page,
+0);
+}
+
+static struct grant *get_grant(grant_ref_t *gref_head,
+  unsigned long mfn,
+  struct blkfront_info *info)
+{
+   struct grant *gnt_list_entry = get_free_grant(info);
+
+   if (gnt_list_entry->gref != GRANT_INVALID_REF)
return gnt_list_entry;
+
+   /* Assign a gref to this page */
+   gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
+   BUG_ON(gnt_list_entry->gref == -ENOSPC);
+   if (info->feature_persistent)
+   grant_foreign_access(gnt_list_entry, info);
+   else {
+   /* Grant access to the MFN passed by the caller */
+   gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
+   info->xbdev->otherend_id,
+   mfn, 0);
}
 
+   return gnt_list_entry;
+}
+
+static struct grant *get_indirect_grant(grant_ref_t *gref_head,
+   struct blkfront_info *info)
+{
+   struct grant *gnt_list_entry = get_free_grant(info);
+
+   if (gnt_list_entry->gref != GRANT_INVALID_REF)
+   return gnt_list_entry;
+
/* Assign a gref to this page */
gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
BUG_ON(gnt_list_entry->gref == -ENOSPC);
if (!info->feature_persistent) {
-   BUG_ON(!page);
-   gnt_list_entry->page = page;
+   struct page *indirect_page;
+
+   /* Fetch a pre-allocated page to use for indirect grefs */
+   BUG_ON(list_empty(&info->indirect_pages));
+   indirect_page = list_first_entry(&info->indirect_pages,
+struct page, lru);
+   list_del(&indirect_page->lru);
+   gnt_list_entry->page = indirect_page;
}
-   buffer_mfn = page_to_mfn(gnt_list_entry->page);
-   gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
-   info->xbdev->otherend_id,
-   buffer_mfn, 0);
+   grant_foreign_access(gnt_list_entry, info);
+
return gnt_list_entry;
 }
 
@@ -522,29 +565,19 @@ static int blkif_queue_rw_req(struct request *req)
 
if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
(i % SEGS_PER_INDIRECT_FRAME == 0)) {
-   struct page *uninitialized_var(page);
-
if (segments)
kunmap_atomic(segments);
 
n = i / SEGS_PER_INDIRECT_FRAME;
-   if (!info->feature_persistent) {
-   struct page *ind

[Xen-devel] [PATCH v2 09/20] xen/biomerge: Don't allow biovec to be merge when Linux is not using 4KB page

2015-07-09 Thread Julien Grall
When Linux is using 64K page granularity, every page will be split into
multiple non-contiguous 4K MFNs (the page granularity of Xen).

I'm not sure how to efficiently handle the check to know whether we can
merge 2 biovecs in such a case. So for now, always say that biovecs are
not mergeable.

Signed-off-by: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
---
Changes in v2:
- Remove the workaround and check if the Linux page granularity
is the same as Xen or not
---
 drivers/xen/biomerge.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/xen/biomerge.c b/drivers/xen/biomerge.c
index 0edb91c..571567c 100644
--- a/drivers/xen/biomerge.c
+++ b/drivers/xen/biomerge.c
@@ -6,10 +6,17 @@
 bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
   const struct bio_vec *vec2)
 {
+#if XEN_PAGE_SIZE == PAGE_SIZE
unsigned long mfn1 = pfn_to_mfn(page_to_pfn(vec1->bv_page));
unsigned long mfn2 = pfn_to_mfn(page_to_pfn(vec2->bv_page));
 
return __BIOVEC_PHYS_MERGEABLE(vec1, vec2) &&
((mfn1 == mfn2) || ((mfn1+1) == mfn2));
+#else
+   /* XXX: bio_vec are not mergeable when using different page size in
+* Xen and Linux
+*/
+   return 0;
+#endif
 }
 EXPORT_SYMBOL(xen_biovec_phys_mergeable);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 06/20] block/xen-blkfront: Store a page rather a pfn in the grant structure

2015-07-09 Thread Julien Grall
All the usages of the field pfn follow the same idiom:

pfn_to_page(grant->pfn)

This will always return the same page. Store the page directly in the
grant to clean up the code.

Signed-off-by: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Roger Pau Monné 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
---
Changes in v2:
- Patch added
---
 drivers/block/xen-blkfront.c | 37 ++---
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 7107d58..7b81d23 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -67,7 +67,7 @@ enum blkif_state {
 
 struct grant {
grant_ref_t gref;
-   unsigned long pfn;
+   struct page *page;
struct list_head node;
 };
 
@@ -219,7 +219,7 @@ static int fill_grant_buffer(struct blkfront_info *info, 
int num)
kfree(gnt_list_entry);
goto out_of_memory;
}
-   gnt_list_entry->pfn = page_to_pfn(granted_page);
+   gnt_list_entry->page = granted_page;
}
 
gnt_list_entry->gref = GRANT_INVALID_REF;
@@ -234,7 +234,7 @@ out_of_memory:
 &info->grants, node) {
list_del(&gnt_list_entry->node);
if (info->feature_persistent)
-   __free_page(pfn_to_page(gnt_list_entry->pfn));
+   __free_page(gnt_list_entry->page);
kfree(gnt_list_entry);
i--;
}
@@ -243,7 +243,7 @@ out_of_memory:
 }
 
 static struct grant *get_grant(grant_ref_t *gref_head,
-   unsigned long pfn,
+  struct page *page,
struct blkfront_info *info)
 {
struct grant *gnt_list_entry;
@@ -263,10 +263,10 @@ static struct grant *get_grant(grant_ref_t *gref_head,
gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
BUG_ON(gnt_list_entry->gref == -ENOSPC);
if (!info->feature_persistent) {
-   BUG_ON(!pfn);
-   gnt_list_entry->pfn = pfn;
+   BUG_ON(!page);
+   gnt_list_entry->page = page;
}
-   buffer_mfn = pfn_to_mfn(gnt_list_entry->pfn);
+   buffer_mfn = page_to_mfn(gnt_list_entry->page);
gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
info->xbdev->otherend_id,
buffer_mfn, 0);
@@ -522,7 +522,7 @@ static int blkif_queue_rw_req(struct request *req)
 
if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
(i % SEGS_PER_INDIRECT_FRAME == 0)) {
-   unsigned long uninitialized_var(pfn);
+   struct page *uninitialized_var(page);
 
if (segments)
kunmap_atomic(segments);
@@ -536,15 +536,15 @@ static int blkif_queue_rw_req(struct request *req)
indirect_page = 
list_first_entry(&info->indirect_pages,
 struct page, 
lru);
list_del(&indirect_page->lru);
-   pfn = page_to_pfn(indirect_page);
+   page = indirect_page;
}
-   gnt_list_entry = get_grant(&gref_head, pfn, info);
+   gnt_list_entry = get_grant(&gref_head, page, info);
info->shadow[id].indirect_grants[n] = gnt_list_entry;
-   segments = 
kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
+   segments = kmap_atomic(gnt_list_entry->page);
ring_req->u.indirect.indirect_grefs[n] = 
gnt_list_entry->gref;
}
 
-   gnt_list_entry = get_grant(&gref_head, 
page_to_pfn(sg_page(sg)), info);
+   gnt_list_entry = get_grant(&gref_head, sg_page(sg), info);
ref = gnt_list_entry->gref;
 
info->shadow[id].grants_used[i] = gnt_list_entry;
@@ -555,7 +555,7 @@ static int blkif_queue_rw_req(struct request *req)
 
BUG_ON(sg->offset + sg->length > PAGE_SIZE);
 
-   shared_data = 
kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
+   shared_data = kmap_atomic(gnt_list_entry->page);
bvec_data = kmap_atomic(sg_page(sg));
 
/*
@@ -1002,7 +1002,7 @@ static void blkif_free(struct blkfront_info *info, int 
suspend)
info->persistent_gnts_c--;
}
if (info->feature_persistent)
-   __free_page(pfn_to_page(persistent_gnt->pfn));
+

[Xen-devel] [PATCH v2 01/20] xen: Add Xen specific page definition

2015-07-09 Thread Julien Grall
The Xen hypercall interface always uses 4K page granularity on the ARM
and x86 architectures.

With the incoming support of 64K page granularity for ARM64 guests, it
won't be possible to re-use the Linux page definitions in Xen drivers.

Introduce Xen page definition helpers based on the Linux page
definitions. They have exactly the same names but with a XEN_/xen_
prefix.

Also modify page_to_mfn to use the new Xen page definitions.

Signed-off-by: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
---
I'm wondering if we should drop page_to_mfn, as the macro will likely be
misused when Linux is using 64KB page granularity.

Changes in v2:
- Add XEN_PFN_UP
- Add a comment describing the behavior of page_to_pfn
---
 include/xen/page.h | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/xen/page.h b/include/xen/page.h
index c5ed20b..8ebd37b 100644
--- a/include/xen/page.h
+++ b/include/xen/page.h
@@ -1,11 +1,30 @@
 #ifndef _XEN_PAGE_H
 #define _XEN_PAGE_H
 
+#include 
+
+/* The hypercall interface supports only 4KB page */
+#define XEN_PAGE_SHIFT 12
+#define XEN_PAGE_SIZE  (_AC(1,UL) << XEN_PAGE_SHIFT)
+#define XEN_PAGE_MASK  (~(XEN_PAGE_SIZE-1))
+#define xen_offset_in_page(p)  ((unsigned long)(p) & ~XEN_PAGE_MASK)
+#define xen_pfn_to_page(pfn)   \
+   ((pfn_to_page(((unsigned long)(pfn) << XEN_PAGE_SHIFT) >> PAGE_SHIFT)))
+#define xen_page_to_pfn(page)  \
+   (((page_to_pfn(page)) << PAGE_SHIFT) >> XEN_PAGE_SHIFT)
+
+#define XEN_PFN_PER_PAGE   (PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define XEN_PFN_DOWN(x)((x) >> XEN_PAGE_SHIFT)
+#define XEN_PFN_UP(x)  (((x) + XEN_PAGE_SIZE-1) >> XEN_PAGE_SHIFT)
+#define XEN_PFN_PHYS(x)((phys_addr_t)(x) << XEN_PAGE_SHIFT)
+
 #include 
 
+/* Return the MFN associated to the first 4KB of the page */
 static inline unsigned long page_to_mfn(struct page *page)
 {
-   return pfn_to_mfn(page_to_pfn(page));
+   return pfn_to_mfn(xen_page_to_pfn(page));
 }
 
 struct xen_memory_region {
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 03/20] xen/grant: Introduce helpers to split a page into grant

2015-07-09 Thread Julien Grall
Currently, a grant is always based on the Xen page granularity (i.e.
4KB). When Linux is using a different page granularity, a single page
will be split between multiple grants.

The new helpers will be in charge of splitting the Linux page into grants
and calling a function given by the caller on each grant.

In order to help some PV drivers, the callback is allowed to use less
data and must update the resulting length. This is useful for netback.

Also provide a helper to count the number of grants within a given
contiguous region.

Signed-off-by: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
---
Changes in v2:
- Patch added
---
 drivers/xen/grant-table.c | 26 ++
 include/xen/grant_table.h | 41 +
 2 files changed, 67 insertions(+)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 62f591f..3679293 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -296,6 +296,32 @@ int gnttab_end_foreign_access_ref(grant_ref_t ref, int 
readonly)
 }
 EXPORT_SYMBOL_GPL(gnttab_end_foreign_access_ref);
 
+void gnttab_foreach_grant(struct page *page, unsigned int offset,
+ unsigned int len, xen_grant_fn_t fn,
+ void *data)
+{
+   unsigned int goffset;
+   unsigned int glen;
+   unsigned long pfn;
+
+   len = min_t(unsigned int, PAGE_SIZE - offset, len);
+   goffset = offset & ~XEN_PAGE_MASK;
+
+   pfn = xen_page_to_pfn(page) + (offset >> XEN_PAGE_SHIFT);
+
+   while (len) {
+   glen = min_t(unsigned int, XEN_PAGE_SIZE - goffset, len);
+   fn(pfn_to_mfn(pfn), goffset, &glen, data);
+
+   goffset += glen;
+   if (goffset == XEN_PAGE_SIZE) {
+   goffset = 0;
+   pfn++;
+   }
+   len -= glen;
+   }
+}
+
 struct deferred_entry {
struct list_head list;
grant_ref_t ref;
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 4478f4b..6f77378 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -45,8 +45,10 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
+#include 
 
 #define GNTTAB_RESERVED_XENSTORE 1
 
@@ -224,4 +226,43 @@ static inline struct xen_page_foreign 
*xen_page_foreign(struct page *page)
 #endif
 }
 
+/* Split Linux page in chunk of the size of the grant and call fn
+ *
+ * Parameters of fn:
+ * mfn: machine frame number based on grant granularity
+ * offset: offset in the grant
+ * len: length of the data in the grant. If fn decides to use less data,
+ * it must update len.
+ * data: internal information
+ */
+typedef void (*xen_grant_fn_t)(unsigned long mfn, unsigned int offset,
+  unsigned int *len, void *data);
+
+void gnttab_foreach_grant(struct page *page, unsigned int offset,
+ unsigned int len, xen_grant_fn_t fn,
+ void *data);
+
+/* Helper to get to call fn only on the first "grant chunk" */
+static inline void gnttab_one_grant(struct page *page, unsigned int offset,
+   unsigned len, xen_grant_fn_t fn,
+   void *data)
+{
+   /* The first request is limited to the size of one grant */
+   len = min_t(unsigned int, XEN_PAGE_SIZE - (offset & ~XEN_PAGE_MASK),
+   len);
+
+   gnttab_foreach_grant(page, offset, len, fn, data);
+}
+
+/* Get the number of grant in a specified region
+ *
+ * offset: Offset in the first page
+ * len: total length of data (can cross multiple pages)
+ */
+static inline unsigned int gnttab_count_grant(unsigned int offset,
+ unsigned int len)
+{
+   return (XEN_PFN_UP((offset & ~XEN_PAGE_MASK) + len));
+}
+
 #endif /* __ASM_GNTTAB_H__ */
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 08/20] net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop

2015-07-09 Thread Julien Grall
The skb doesn't change within the function. Therefore it's only
necessary to check if we need GSO once at the beginning.

Signed-off-by: Julien Grall 
Cc: Ian Campbell 
Cc: Wei Liu 
Cc: net...@vger.kernel.org
---
Changes in v2:
- Patch added
---
 drivers/net/xen-netback/netback.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 880d0d6..3f77030 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -277,6 +277,13 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
unsigned long bytes;
int gso_type = XEN_NETIF_GSO_TYPE_NONE;
 
+   if (skb_is_gso(skb)) {
+   if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
+   gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+   else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
+   gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+   }
+
/* Data must not cross a page boundary. */
	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));

-	if (skb_is_gso(skb)) {
-		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
-			gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
-		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
-   gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
-   }
-
if (*head && ((1 << gso_type) & queue->vif->gso_mask))
queue->rx.req_cons++;
 
-- 
2.1.4




[Xen-devel] [libvirt test] 59256: regressions - FAIL

2015-07-09 Thread osstest service owner
flight 59256 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59256/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-libvirt-xsm  11 guest-start   fail REGR. vs. 58842

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-libvirt  11 guest-start  fail   like 58842
 test-amd64-amd64-libvirt 11 guest-start  fail   like 58842

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  e9c2734441af0065c69fc1317965a6dd6c7f14e3
baseline version:
 libvirt  d10a5f58c75e7eb5943b44cc36a1e768adb2cdb0

Last test of basis    58842  2015-06-23 04:23:54 Z   16 days
Failing since         58870  2015-06-24 04:20:11 Z   15 days   13 attempts
Testing same since    59256  2015-07-09 04:20:20 Z    0 days    1 attempts


People who touched revisions under test:
  Andrea Bolognani 
  Boris Fiuczynski 
  Daniel Veillard 
  Dmitry Guryanov 
  Eric Blake 
  Erik Skultety 
  Jim Fehlig 
  Jiri Denemark 
  John Ferlan 
  Ján Tomko 
  Laine Stump 
  Luyao Huang 
  Martin Kletzander 
  Maxim Nestratov 
  Michal Dubiel 
  Michal Privoznik 
  Mikhail Feoktistov 
  Nikolay Shirokovskiy 
  Nikolay Shirokovskiy 
  Pavel Fedin 
  Pavel Hrdina 
  Peter Krempa 
  Prerna Saxena 
  Serge Hallyn 

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-xsm pass
 test-armhf-armhf-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  fail
 test-amd64-amd64-libvirt fail
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  fail



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 2139 lines long.)



[Xen-devel] [linux-3.18 test] 59222: regressions - FAIL

2015-07-09 Thread osstest service owner
flight 59222 linux-3.18 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59222/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail REGR. vs. 58581

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-libvirt  6 xen-bootfail pass in 59001

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 guest-localmigrate fail baseline untested
 test-armhf-armhf-xl-rtds 11 guest-startfail in 59001 baseline untested
 test-amd64-i386-libvirt-xsm  11 guest-start  fail   like 58558
 test-amd64-amd64-libvirt 11 guest-start  fail   like 58558
 test-amd64-i386-libvirt  11 guest-start  fail   like 58581
 test-armhf-armhf-xl-multivcpu  6 xen-boot fail  like 58581
 test-armhf-armhf-xl   6 xen-boot fail   like 58581
 test-armhf-armhf-xl-credit2   6 xen-boot fail   like 58581
 test-armhf-armhf-xl-xsm   6 xen-boot fail   like 58581
 test-armhf-armhf-libvirt-xsm  6 xen-boot fail   like 58581
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 58581
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail like 58581
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 58581

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt-xsm  12 migrate-support-check fail in 59001 never pass
 test-amd64-amd64-libvirt 12 migrate-support-check fail in 59001 never pass
 test-armhf-armhf-libvirt 12 migrate-support-check fail in 59001 never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-i386-freebsd10-i386  9 freebsd-install  fail never pass
 test-amd64-i386-freebsd10-amd64  9 freebsd-install fail never pass
 test-armhf-armhf-xl-cubietruck  6 xen-boot fail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 guest-start.2fail   never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass

version targeted for testing:
 linux    ea5dd38e93b3bec3427e5d3eef000bbf5d637e76
baseline version:
 linux    d048c068d00da7d4cfa5ea7651933b99026958cf

Last test of basis    58581  2015-06-15 09:42:22 Z   24 days
Testing same since    58976  2015-06-29 19:43:23 Z    9 days   11 attempts


People who touched revisions under test:
  "Eric W. Biederman" 
  Aaron Lu 
  Alexander Duyck 
  Alexei Starovoitov 
  Andrew de los Reyes 
  Andrew Morton 
  Andy Lutomirski 
  Anna Schumaker 
  Aravind Gopalakrishnan 
  Ard Biesheuvel 
  Arnd Bergmann 
  Baruch Siach 
  Ben Serebrin 
  Benjamin Tissoires 
  Bjørn Mork 
  Borislav Petkov 
  Chuck Lever 
  Cong Wang 
  Daniel Borkmann 
  Darren Hart 
  Darren Salt 
  David Daney 
  David S. Miller 
  David Woodhouse 
  Devesh Sharma 
  Doug Ledford 
  Eric Dumazet 
  Eric W. Biederman 
  Felipe Balbi 
  Feng Kan 
  Florent Fourcot 
  Florian Fainelli 
  Geert Uytterhoeven 
  Grant Likely 
  Greg Kroah-Hartman 
  Hannes Frederic Sowa 
  Henning Rogge 
  Honggang Li 
  Ian Campbell 
  Ilya Dryomov 
  Ingo Molnar 
  Jakub Sitnicki 
  Jason Gunthorpe 
  Jiada Wang 
  Jiri Kosina 
  Jiri Pirko 
  Joerg Roedel 
  Jovi Zhangwei 
  Ken Xue 
  Kevin Hilman 
  Konstantin Khlebnikov 
  Laura Abbott 
  Laurent Pinchart 
  Linus Lüssing 
  Linus Torvalds 
  Linus Walleij 
  Mark Brown 
  Mark Salyzyn 
  Matt Fleming 
  Matthew Garrett 
  Mauro Carvalho Chehab 
  Max Filippov 
  Meghana Cheripady 
  Mel Gorman 
  Mika Westerberg 
  Milan Plzik 
  Neal Cardwell 
  Neil Horman 
  Nicolas Dichtel 
  Nicolas Iooss 
  Nicolas Pitre 
  Nikolay Aleksandrov 
  Oliver Neukum 
  Oliver Neukum 
  oli...@neukum.org 
  Olof Johansson 
  Pablo Neira Ayuso 
  Paolo Bonzini 
  Pelle Nilsson 
  Peter Zijlstra (Intel) 
  Rafael J. Wysocki 
  Raphael Assenat 
  Richard Cochran 
  Rik Theys 
  Ross Lagerwall 
  Roy Franz 
  Russell King 
  Sasha Levin 
  Sean Young 
  Shawn Bohrer 
  Simon Horman 
  Sowmini Varadhan 
  Sriharsha Basavapatna 
  Steffen Klassert 
  Thadeu Lima de Souza Cascardo 
  Theodore Ts'o 
  Uwe Kleine-König 
  Veaceslav Falico 
  Veeresh U. Kokatnur 
  Vijay Subramanian 
  Vinod Koul 
  Vittorio Gambaletta 
  Vlad Yasevich 
  Vladislav Yasevich 
  WANG Cong 
  Wei Liu 
  Yao Xiwei 
  Yoshihiro Shimoda 
  Yuchung Cheng 

jobs:
 build-amd64-xsm  pass
 build-armh

Re: [Xen-devel] [RFC Patch V1 06/12] PCI: Use for_pci_msi_entry() to access MSI device list

2015-07-09 Thread Konrad Rzeszutek Wilk
On Thu, Jul 09, 2015 at 04:00:41PM +0800, Jiang Liu wrote:
> Use accessor for_pci_msi_entry() to access MSI device list, so we could easily
> move msi_list from struct pci_dev into struct device later.
> 
> Signed-off-by: Jiang Liu 
> ---
>  drivers/pci/msi.c  |   39 ---
>  drivers/pci/xen-pcifront.c |2 +-

Acked-by on the Xen bits.

>  2 files changed, 21 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 7b4c20c9f9ca..d09afa78d7a1 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -131,7 +131,7 @@ int __weak arch_setup_msi_irqs(struct pci_dev *dev, int 
> nvec, int type)
>   if (type == PCI_CAP_ID_MSI && nvec > 1)
>   return 1;
>  
> - list_for_each_entry(entry, &dev->msi_list, list) {
> + for_each_pci_msi_entry(entry, dev) {
>   ret = arch_setup_msi_irq(dev, entry);
>   if (ret < 0)
>   return ret;
> @@ -151,7 +151,7 @@ void default_teardown_msi_irqs(struct pci_dev *dev)
>   int i;
>   struct msi_desc *entry;
>  
> - list_for_each_entry(entry, &dev->msi_list, list)
> + for_each_pci_msi_entry(entry, dev)
>   if (entry->irq)
>   for (i = 0; i < entry->nvec_used; i++)
>   arch_teardown_msi_irq(entry->irq + i);
> @@ -168,7 +168,7 @@ static void default_restore_msi_irq(struct pci_dev *dev, 
> int irq)
>  
>   entry = NULL;
>   if (dev->msix_enabled) {
> - list_for_each_entry(entry, &dev->msi_list, list) {
> + for_each_pci_msi_entry(entry, dev) {
>   if (irq == entry->irq)
>   break;
>   }
> @@ -282,7 +282,7 @@ void default_restore_msi_irqs(struct pci_dev *dev)
>  {
>   struct msi_desc *entry;
>  
> - list_for_each_entry(entry, &dev->msi_list, list)
> + for_each_pci_msi_entry(entry, dev)
>   default_restore_msi_irq(dev, entry->irq);
>  }
>  
> @@ -363,21 +363,22 @@ EXPORT_SYMBOL_GPL(pci_write_msi_msg);
>  
>  static void free_msi_irqs(struct pci_dev *dev)
>  {
> + struct list_head *msi_list = dev_to_msi_list(&dev->dev);
>   struct msi_desc *entry, *tmp;
>   struct attribute **msi_attrs;
>   struct device_attribute *dev_attr;
>   int i, count = 0;
>  
> - list_for_each_entry(entry, &dev->msi_list, list)
> + for_each_pci_msi_entry(entry, dev)
>   if (entry->irq)
>   for (i = 0; i < entry->nvec_used; i++)
>   BUG_ON(irq_has_action(entry->irq + i));
>  
>   pci_msi_teardown_msi_irqs(dev);
>  
> - list_for_each_entry_safe(entry, tmp, &dev->msi_list, list) {
> + list_for_each_entry_safe(entry, tmp, msi_list, list) {
>   if (entry->msi_attrib.is_msix) {
> - if (list_is_last(&entry->list, &dev->msi_list))
> + if (list_is_last(&entry->list, msi_list))
>   iounmap(entry->mask_base);
>   }
>  
> @@ -448,7 +449,7 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
>  
>   if (!dev->msix_enabled)
>   return;
> - BUG_ON(list_empty(&dev->msi_list));
> + BUG_ON(list_empty(dev_to_msi_list(&dev->dev)));
>  
>   /* route the table */
>   pci_intx_for_msi(dev, 0);
> @@ -456,7 +457,7 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
>   PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
>  
>   arch_restore_msi_irqs(dev);
> - list_for_each_entry(entry, &dev->msi_list, list)
> + for_each_pci_msi_entry(entry, dev)
>   msix_mask_irq(entry, entry->masked);
>  
>   pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
> @@ -501,7 +502,7 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>   int count = 0;
>  
>   /* Determine how many msi entries we have */
> - list_for_each_entry(entry, &pdev->msi_list, list)
> + for_each_pci_msi_entry(entry, pdev)
>   ++num_msi;
>   if (!num_msi)
>   return 0;
> @@ -510,7 +511,7 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>   msi_attrs = kzalloc(sizeof(void *) * (num_msi + 1), GFP_KERNEL);
>   if (!msi_attrs)
>   return -ENOMEM;
> - list_for_each_entry(entry, &pdev->msi_list, list) {
> + for_each_pci_msi_entry(entry, pdev) {
>   msi_dev_attr = kzalloc(sizeof(*msi_dev_attr), GFP_KERNEL);
>   if (!msi_dev_attr)
>   goto error_attrs;
> @@ -599,7 +600,7 @@ static int msi_verify_entries(struct pci_dev *dev)
>  {
>   struct msi_desc *entry;
>  
> - list_for_each_entry(entry, &dev->msi_list, list) {
> + for_each_pci_msi_entry(entry, dev) {
>   if (!dev->no_64bit_msi || !entry->msg.address_hi)
>   continue;
>   dev_err(&dev->dev, "Device has broken 64-bit MSI but arc

Re: [Xen-devel] [RFC Patch V1 05/12] x86, PCI: Use for_pci_msi_entry() to access MSI device list

2015-07-09 Thread Konrad Rzeszutek Wilk
On Thu, Jul 09, 2015 at 04:00:40PM +0800, Jiang Liu wrote:
> Use accessor for_pci_msi_entry() to access MSI device list, so we could
> easily move msi_list from struct pci_dev into struct device later.
> 
> Signed-off-by: Jiang Liu 

Looks pretty simple. Acked-by: Konrad Rzeszutek Wilk 
> ---
>  arch/x86/pci/xen.c |8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> index d22f4b5bbc04..ff31ab464213 100644
> --- a/arch/x86/pci/xen.c
> +++ b/arch/x86/pci/xen.c
> @@ -179,7 +179,7 @@ static int xen_setup_msi_irqs(struct pci_dev *dev, int 
> nvec, int type)
>   if (ret)
>   goto error;
>   i = 0;
> - list_for_each_entry(msidesc, &dev->msi_list, list) {
> + for_each_pci_msi_entry(msidesc, dev) {
>   irq = xen_bind_pirq_msi_to_irq(dev, msidesc, v[i],
>  (type == PCI_CAP_ID_MSI) ? nvec 
> : 1,
>  (type == PCI_CAP_ID_MSIX) ?
> @@ -230,7 +230,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, 
> int nvec, int type)
>   if (type == PCI_CAP_ID_MSI && nvec > 1)
>   return 1;
>  
> - list_for_each_entry(msidesc, &dev->msi_list, list) {
> + for_each_pci_msi_entry(msidesc, dev) {
>   __pci_read_msi_msg(msidesc, &msg);
>   pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
>   ((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
> @@ -274,7 +274,7 @@ static int xen_initdom_setup_msi_irqs(struct pci_dev 
> *dev, int nvec, int type)
>   int ret = 0;
>   struct msi_desc *msidesc;
>  
> - list_for_each_entry(msidesc, &dev->msi_list, list) {
> + for_each_pci_msi_entry(msidesc, dev) {
>   struct physdev_map_pirq map_irq;
>   domid_t domid;
>  
> @@ -386,7 +386,7 @@ static void xen_teardown_msi_irqs(struct pci_dev *dev)
>  {
>   struct msi_desc *msidesc;
>  
> - msidesc = list_entry(dev->msi_list.next, struct msi_desc, list);
> + msidesc = first_pci_msi_entry(dev);
>   if (msidesc->msi_attrib.is_msix)
>   xen_pci_frontend_disable_msix(dev);
>   else
> -- 
> 1.7.10.4
> 



Re: [Xen-devel] [PATCH v9 4/4] iommu: add rmrr Xen command line option for extra rmrrs

2015-07-09 Thread Konrad Rzeszutek Wilk
On Wed, Jul 08, 2015 at 09:50:26PM -0400, elena.ufimts...@oracle.com wrote:
> From: Elena Ufimtseva 
> 
> On some platforms RMRR regions may not be specified
> in ACPI and thus will not be mapped 1:1 in dom0. This
> causes IO Page Faults and prevents dom0 from booting
> in PVH mode.
> The new Xen command line option rmrr allows specifying
> such devices and memory regions. These regions are added
> to the list of RMRRs defined in ACPI if the device
> is present in the system. As a result, additional RMRRs will
> be mapped 1:1 in dom0 with correct permissions.
> 
> Mentioned above problems were discovered during PVH work with
> ThinkCentre M and Dell 5600T. No official documentation
> was found so far in regards to what devices and why cause this.
> Experiments show that ThinkCentre M USB devices with enabled
> debug port generate DMA read transactions to the regions of
> memory marked reserved in host e820 map.
> For Dell 5600T the device and faulting addresses are not found yet.
> 
> For detailed history of the discussion please check following threads:
> http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html
> http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html
> 
> Format for rmrr Xen command line option:
> rmrr=start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
> If grub2 is used and multiple ranges are specified, ';' should be
> quoted/escaped; refer to the grub2 manual for more information.
> 

Reviewed-by: Konrad Rzeszutek Wilk 

Thanks!
> Signed-off-by: Elena Ufimtseva 
> ---
>  docs/misc/xen-command-line.markdown |  13 +++
>  xen/drivers/passthrough/vtd/dmar.c  | 209 
> +++-
>  2 files changed, 221 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/misc/xen-command-line.markdown 
> b/docs/misc/xen-command-line.markdown
> index aa684c0..f307f3d 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1197,6 +1197,19 @@ Specify the host reboot method.
>  'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by
>   default it will use that method first).
>  
> +### rmrr
> +> '= 
> start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
> +
> +Define RMRR units that are missing from ACPI table along with device they
> +belong to and use them for 1:1 mapping. End addresses can be omitted and one
> +page will be mapped. The ranges are inclusive when start and end are 
> specified.
> +If segment of the first device is not specified, segment zero will be used.
> +If other segments are not specified, first device segment will be used.
> +If a segment is specified for other than the first device and it does not 
> match
> +the one specified for the first one, an error will be reported.
> +Note: grub2 requires to escape or use quotations if special characters are 
> used,
> +namely ';', refer to the grub2 documentation if multiple ranges are 
> specified.
> +
>  ### ro-hpet
>  > `= `
>  
> diff --git a/xen/drivers/passthrough/vtd/dmar.c 
> b/xen/drivers/passthrough/vtd/dmar.c
> index a8e1e5d..f62fb02 100644
> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -869,6 +869,145 @@ out:
>  return ret;
>  }
>  
> +#define MAX_EXTRA_RMRR_PAGES 16
> +#define MAX_EXTRA_RMRR 10
> +
> +/* RMRR units derived from command line rmrr option. */
> +#define MAX_EXTRA_RMRR_DEV 20
> +struct extra_rmrr_unit {
> +struct list_head list;
> +unsigned long base_pfn, end_pfn;
> +unsigned int dev_count;
> +u32sbdf[MAX_EXTRA_RMRR_DEV];
> +};
> +static __initdata unsigned int nr_rmrr;
> +static struct __initdata extra_rmrr_unit extra_rmrr_units[MAX_EXTRA_RMRR];
> +
> +/* Macro for RMRR inclusive range formatting. */
> +#define PRI_RMRR(s,e) "[%lx-%lx]"
> +
> +static void __init add_extra_rmrr(void)
> +{
> +struct acpi_rmrr_unit *acpi_rmrr;
> +struct acpi_rmrr_unit *rmrru;
> +unsigned int dev, seg, i, j;
> +unsigned long pfn;
> +bool_t overlap;
> +
> +for ( i = 0; i < nr_rmrr; i++ )
> +{
> +if ( extra_rmrr_units[i].base_pfn > extra_rmrr_units[i].end_pfn )
> +{
> +printk(XENLOG_ERR VTDPREFIX
> +   "Invalid RMRR Range "PRI_RMRR(s,e)"\n",
> +   extra_rmrr_units[i].base_pfn, 
> extra_rmrr_units[i].end_pfn);
> +continue;
> +}
> +
> +if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn >=
> + MAX_EXTRA_RMRR_PAGES )
> +{
> +printk(XENLOG_ERR VTDPREFIX
> +   "RMRR range "PRI_RMRR(s,e)" exceeds 
> "__stringify(MAX_EXTRA_RMRR_PAGES)" pages\n",
> +   extra_rmrr_units[i].base_pfn, 
> extra_rmrr_units[i].end_pfn);
> +continue;
> +}
> +
> +for ( j = 0; j < nr_rmrr; j++ )
> +{
> +if ( i != j &&
> + extra_rmrr_units[i].base_pfn <= extra_rmrr_units[j].end_pfn 
> &&
> + extra_rmrr_u

Re: [Xen-devel] [PATCH v9 2/4] iommu VT-d: separate rmrr addition function

2015-07-09 Thread Konrad Rzeszutek Wilk
On Wed, Jul 08, 2015 at 09:50:24PM -0400, elena.ufimts...@oracle.com wrote:
> From: Elena Ufimtseva 
> 
> In preparation for auxiliary RMRR data provided on Xen
> command line, make RMRR adding a separate function.
> Also free memery for rmrr device scope in error path. 

s/memery/memory/
> 
> Signed-off-by: Elena Ufimtseva 

Reviewed-by: Konrad Rzeszutek Wilk 
> ---
>  xen/drivers/passthrough/vtd/dmar.c | 126 
> +++--
>  1 file changed, 65 insertions(+), 61 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/dmar.c 
> b/xen/drivers/passthrough/vtd/dmar.c
> index 77ef708..a8e1e5d 100644
> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -585,6 +585,68 @@ out:
>  return ret;
>  }
>  
> +static int register_one_rmrr(struct acpi_rmrr_unit *rmrru)
> +{
> +bool_t ignore = 0;
> +unsigned int i = 0;
> +int ret = 0;
> +
> +/* Skip checking if segment is not accessible yet. */
> +if ( !pci_known_segment(rmrru->segment) )
> +i = UINT_MAX;
> +
> +for ( ; i < rmrru->scope.devices_cnt; i++ )
> +{
> +u8 b = PCI_BUS(rmrru->scope.devices[i]);
> +u8 d = PCI_SLOT(rmrru->scope.devices[i]);
> +u8 f = PCI_FUNC(rmrru->scope.devices[i]);
> +
> +if ( pci_device_detect(rmrru->segment, b, d, f) == 0 )
> +{
> +dprintk(XENLOG_WARNING VTDPREFIX,
> +" Non-existent device (%04x:%02x:%02x.%u) is reported"
> +" in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
> +rmrru->segment, b, d, f,
> +rmrru->base_address, rmrru->end_address);
> +ignore = 1;
> +}
> +else
> +{
> +ignore = 0;
> +break;
> +}
> +}
> +
> +if ( ignore )
> +{
> +dprintk(XENLOG_WARNING VTDPREFIX,
> +"  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
> +"devices under its scope are not PCI discoverable!\n",
> +rmrru->base_address, rmrru->end_address);
> +scope_devices_free(&rmrru->scope);
> +xfree(rmrru);
> +}
> +else if ( rmrru->base_address > rmrru->end_address )
> +{
> +dprintk(XENLOG_WARNING VTDPREFIX,
> +"  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
> +rmrru->base_address, rmrru->end_address);
> +scope_devices_free(&rmrru->scope);
> +xfree(rmrru);
> +ret = -EFAULT;
> +}
> +else
> +{
> +if ( iommu_verbose )
> +dprintk(VTDPREFIX,
> +"  RMRR region: base_addr %"PRIx64" end_address 
> %"PRIx64"\n",
> +rmrru->base_address, rmrru->end_address);
> +acpi_register_rmrr_unit(rmrru);
> +}
> +
> +return ret;
> +}
> +
>  static int __init
>  acpi_parse_one_rmrr(struct acpi_dmar_header *header)
>  {
> @@ -635,68 +697,10 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
>  ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
> &rmrru->scope, RMRR_TYPE, rmrr->segment);
>  
> -if ( ret || (rmrru->scope.devices_cnt == 0) )
> -xfree(rmrru);
> +if ( !ret && (rmrru->scope.devices_cnt != 0) )
> +register_one_rmrr(rmrru);
>  else
> -{
> -u8 b, d, f;
> -bool_t ignore = 0;
> -unsigned int i = 0;
> -
> -/* Skip checking if segment is not accessible yet. */
> -if ( !pci_known_segment(rmrr->segment) )
> -i = UINT_MAX;
> -
> -for ( ; i < rmrru->scope.devices_cnt; i++ )
> -{
> -b = PCI_BUS(rmrru->scope.devices[i]);
> -d = PCI_SLOT(rmrru->scope.devices[i]);
> -f = PCI_FUNC(rmrru->scope.devices[i]);
> -
> -if ( !pci_device_detect(rmrr->segment, b, d, f) )
> -{
> -dprintk(XENLOG_WARNING VTDPREFIX,
> -" Non-existent device (%04x:%02x:%02x.%u) is 
> reported"
> -" in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
> -rmrr->segment, b, d, f,
> -rmrru->base_address, rmrru->end_address);
> -ignore = 1;
> -}
> -else
> -{
> -ignore = 0;
> -break;
> -}
> -}
> -
> -if ( ignore )
> -{
> -dprintk(XENLOG_WARNING VTDPREFIX,
> -"  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
> -"devices under its scope are not PCI discoverable!\n",
> -rmrru->base_address, rmrru->end_address);
> -scope_devices_free(&rmrru->scope);
> -xfree(rmrru);
> -}
> -else if ( base_addr > end_addr )
> -{
> -dprintk(XENLOG_WARNING VTDPREFIX,
> -"  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
> -rmrru->base_addres

[Xen-devel] [PATCH v2 17/27] tools/libxl: Support converting a legacy stream to a v2 stream

2015-07-09 Thread Andrew Cooper
When a legacy stream is found, it needs to be converted to a v2 stream for the
reading logic.  This is done by exec()ing the python conversion utility.

One complication is that the caller of this interface needs to assume
ownership of the output fd, to prevent it being closed while still in use in a
datacopier.

Signed-off-by: Andrew Cooper 
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 

---
No major differences since v1.  It has gained a new ao_abrt as a result
of rebasing over the AO ABORT series, and has been made x86-specific.
---
 tools/libxl/Makefile|1 +
 tools/libxl/libxl_convert_callout.c |  172 +++
 tools/libxl/libxl_internal.h|   48 ++
 3 files changed, 221 insertions(+)
 create mode 100644 tools/libxl/libxl_convert_callout.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index c71c5fe..0ebc35a 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -59,6 +59,7 @@ endif
 LIBXL_OBJS-y += libxl_remus_device.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
+LIBXL_OBJS-$(CONFIG_X86) += libxl_convert_callout.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
 
 ifeq ($(CONFIG_NetBSD),y)
diff --git a/tools/libxl/libxl_convert_callout.c b/tools/libxl/libxl_convert_callout.c
new file mode 100644
index 000..02f3b82
--- /dev/null
+++ b/tools/libxl/libxl_convert_callout.c
@@ -0,0 +1,172 @@
+/*
+ * Copyright (C) 2014  Citrix Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h"
+
+#include "libxl_internal.h"
+
+/*
+ * Infrastructure for converting a legacy migration stream into a libxl v2
+ * stream.
+ *
+ * This is done by fork()ing the python conversion script, which takes in a
+ * legacy stream, and puts out a suitably-formatted v2 stream.
+ */
+
+static void helper_failed(libxl__egc *egc,
+  libxl__conversion_helper_state *chs, int rc);
+static void helper_stop(libxl__egc *egc, libxl__ao_abortable*, int rc);
+static void helper_exited(libxl__egc *egc, libxl__ev_child *ch,
+  pid_t pid, int status);
+static void helper_done(libxl__egc *egc,
+libxl__conversion_helper_state *chs);
+
+void libxl__conversion_helper_init(libxl__conversion_helper_state *chs)
+{
+libxl__ao_abortable_init(&chs->abrt);
+libxl__ev_child_init(&chs->child);
+}
+
+void libxl__convert_legacy_stream(libxl__egc *egc,
+  libxl__conversion_helper_state *chs)
+{
+STATE_AO_GC(chs->ao);
+libxl__carefd *child_in = NULL, *child_out = NULL;
+int ret = 0;
+
+chs->rc = 0;
+libxl__conversion_helper_init(chs);
+
+chs->abrt.ao = chs->ao;
+chs->abrt.callback = helper_stop;
+ret = libxl__ao_abortable_register(&chs->abrt);
+if (ret) goto err;
+
+libxl__carefd_begin();
+int fds[2];
+if (libxl_pipe(CTX, fds)) {
+ret = ERROR_FAIL;
+libxl__carefd_unlock();
+goto err;
+}
+child_out = libxl__carefd_record(CTX, fds[0]);
+child_in  = libxl__carefd_record(CTX, fds[1]);
+libxl__carefd_unlock();
+
+pid_t pid = libxl__ev_child_fork(gc, &chs->child, helper_exited);
+if (!pid) {
+char * const args[] =
+{
+getenv("LIBXL_CONVERT_HELPER") ?:
+LIBEXEC_BIN "/convert-legacy-stream",
+"--in", GCSPRINTF("%d", chs->legacy_fd),
+"--out",GCSPRINTF("%d", fds[1]),
+"--width",
+#ifdef __i386__
+"32",
+#else
+"64",
+#endif
+"--guest",  chs->hvm ? "hvm" : "pv",
+"--format", "libxl",
+/* "--verbose", */
+NULL,
+};
+
+libxl_fd_set_cloexec(CTX, chs->legacy_fd, 0);
+libxl_fd_set_cloexec(CTX, libxl__carefd_fd(child_in), 0);
+
+libxl__exec(gc,
+-1, -1, -1,
+args[0], args, NULL);
+}
+
+libxl__carefd_close(child_in);
+chs->v2_carefd = child_out;
+
+assert(!ret);
+return;
+
+ err:
+assert(ret);
+helper_failed(egc, chs, ret);
+}
+
+void libxl__convert_legacy_stream_abort(libxl__egc *egc,
+libxl__conversion_helper_state *chs,
+int rc)
+{
+helper_stop(egc, &chs->abrt, rc);
+}
+
+static void helper_failed(libxl__egc *egc,
+  libxl__conversion_he

[Xen-devel] [PATCH v2 12/27] tools/python: Other migration infrastructure

2015-07-09 Thread Andrew Cooper
Contains:
 * Reverse-engineered notes of the legacy format from xg_save_restore.h
 * Python implementation of the legacy format
 * Public HVM Params used in the legacy stream
 * XL header format

Signed-off-by: Andrew Cooper 
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 

---
New in v2 - removes various many magic numbers from subsequent scripts
---
 tools/python/xen/migration/legacy.py |  279 ++
 tools/python/xen/migration/public.py |   21 +++
 tools/python/xen/migration/xl.py |   12 ++
 3 files changed, 312 insertions(+)
 create mode 100644 tools/python/xen/migration/legacy.py
 create mode 100644 tools/python/xen/migration/public.py
 create mode 100644 tools/python/xen/migration/xl.py

diff --git a/tools/python/xen/migration/legacy.py b/tools/python/xen/migration/legacy.py
new file mode 100644
index 000..2f2240a
--- /dev/null
+++ b/tools/python/xen/migration/legacy.py
@@ -0,0 +1,279 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+"""
+Libxc legacy migration streams
+
+Documentation and record structures for legacy migration
+"""
+
+"""
+SAVE/RESTORE/MIGRATE PROTOCOL
+=
+
+The general form of a stream of chunks is a header followed by a
+body consisting of a variable number of chunks (terminated by a
+chunk with type 0) followed by a trailer.
+
+For a rolling/checkpoint (e.g. remus) migration then the body and
+trailer phases can be repeated until an external event
+(e.g. failure) causes the process to terminate and commit to the
+most recent complete checkpoint.
+
+HEADER
+--
+
+unsigned long: p2m_size
+
+extended-info (PV-only, optional):
+
+  If first unsigned long == ~0UL then extended info is present,
+  otherwise unsigned long is part of p2m. Note that p2m_size above
+  does not include the length of the extended info.
+
+  extended-info:
+
+unsigned long: signature == ~0UL
+uint32_t   : number of bytes remaining in extended-info
+
+1 or more extended-info blocks of form:
+char[4]  : block identifier
+uint32_t : block data size
+bytes: block data
+
+defined extended-info blocks:
+"vcpu" : VCPU context info containing vcpu_guest_context_t.
+   The precise variant of the context structure
+   (e.g. 32 vs 64 bit) is distinguished by
+   the block size.
+"extv"   : Presence indicates use of extended VCPU context in
+   tail, data size is 0.
+
+p2m (PV-only):
+
+  consists of p2m_size bytes comprising an array of xen_pfn_t sized entries.
+
+BODY PHASE - Format A (for live migration or Remus without compression)
+--
+
+A series of chunks with a common header:
+  int  : chunk type
+
+If the chunk type is +ve then chunk contains guest memory data, and the
+type contains the number of pages in the batch:
+
+unsigned long[]  : PFN array, length == number of pages in batch
+   Each entry consists of XEN_DOMCTL_PFINFO_*
+   in bits 31-28 and the PFN number in bits 27-0.
+page data: PAGE_SIZE bytes for each page marked present in PFN
+   array
+
+If the chunk type is -ve then chunk consists of one of a number of
+metadata types.  See definitions of XC_SAVE_ID_* below.
+
+If chunk type is 0 then body phase is complete.
+
+
+BODY PHASE - Format B (for Remus with compression)
+--
+
+A series of chunks with a common header:
+  int  : chunk type
+
+If the chunk type is +ve then chunk contains array of PFNs corresponding
+to guest memory and type contains the number of PFNs in the batch:
+
+unsigned long[]  : PFN array, length == number of pages in batch
+   Each entry consists of XEN_DOMCTL_PFINFO_*
+   in bits 31-28 and the PFN number in bits 27-0.
+
+If the chunk type is -ve then chunk consists of one of a number of
+metadata types.  See definitions of XC_SAVE_ID_* below.
+
+If the chunk type is -ve and equals XC_SAVE_ID_COMPRESSED_DATA, then the
+chunk consists of compressed page data, in the following format:
+
+unsigned long: Size of the compressed chunk to follow
+compressed data :  variable length data of size indicated above.
+   This chunk consists of compressed page data.
+   The number of pages in one chunk depends on
+   the amount of space available in the sender's
+   output buffer.
+
+Format of compressed data:
+  compressed_data = <deltas>*
+  delta   = <marker, run*>
+  marker  = (RUNFLAG|SKIPFLAG) bitwise-or RUNLEN [1 byte marker]
+  RUNFLAG = 0
+  SKIPFLAG= 1 << 7
+  RUNLEN  = 7-bit unsigned value indicating number of WORDS in the run
+  run = string of bytes of length sizeof(WORD) * RUNLEN
+
+   If marker contains RUNFLAG, then RUNLEN * 
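[Editorial aside: a minimal decoder for the 1-byte marker described above. The SKIPFLAG semantics (advance the output offset without consuming run data) are assumed from xc_compression; WORD size is assumed to be 8 bytes on a 64-bit saver.]

```python
RUNFLAG = 0
SKIPFLAG = 1 << 7
WORD_SIZE = 8            # sizeof(WORD) on a 64-bit saver -- an assumption

def decode_marker(marker):
    """Split the 1-byte marker into (is_skip, run_length_in_bytes).

    RUNLEN occupies the low 7 bits and counts WORDs.  A RUNFLAG marker is
    followed by that many bytes of literal run data; a SKIPFLAG marker is
    assumed to simply advance the output offset by that many bytes.
    """
    runlen_words = marker & 0x7f
    return bool(marker & SKIPFLAG), runlen_words * WORD_SIZE
```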

[Xen-devel] [PATCH v2 27/27] tools/libxl: Drop all knowledge of toolstack callbacks

2015-07-09 Thread Andrew Cooper
Libxl has now been fully adjusted not to need them.

Signed-off-by: Andrew Cooper 
Acked-by: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 
---
 tools/libxl/libxl_dom.c|1 -
 tools/libxl/libxl_internal.h   |2 --
 tools/libxl/libxl_save_callout.c   |   39 +---
 tools/libxl/libxl_save_helper.c|   29 ---
 tools/libxl/libxl_save_msgs_gen.pl |7 ++-
 5 files changed, 3 insertions(+), 75 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 9f5ddc9..3c765f4 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -2115,7 +2115,6 @@ void libxl__domain_suspend(libxl__egc *egc, 
libxl__domain_suspend_state *dss)
 callbacks->suspend = libxl__domain_suspend_callback;
 
 callbacks->switch_qemu_logdirty = 
libxl__domain_suspend_common_switch_qemu_logdirty;
-dss->shs.callbacks.save.toolstack_save = libxl__toolstack_save;
 
 dss->sws.fd = dss->fd;
 dss->sws.ao = dss->ao;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 1b62f25..13e2493 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2699,8 +2699,6 @@ _hidden void libxl__datacopier_prefixdata(libxl__egc*, 
libxl__datacopier_state*,
 
 typedef struct libxl__srm_save_callbacks {
 libxl__srm_save_autogen_callbacks a;
-int (*toolstack_save)(uint32_t domid, uint8_t **buf,
-  uint32_t *len, void *data);
 } libxl__srm_save_callbacks;
 
 typedef struct libxl__srm_restore_callbacks {
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index cd18cd2..2a6662f 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -78,41 +78,12 @@ void libxl__xc_domain_restore(libxl__egc *egc, 
libxl__domain_create_state *dcs,
 void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
 {
 STATE_AO_GC(dss->ao);
-int r, rc, toolstack_data_fd = -1;
-uint32_t toolstack_data_len = 0;
-
-/* Resources we need to free */
-uint8_t *toolstack_data_buf = 0;
 
 unsigned cbflags = libxl__srm_callout_enumcallbacks_save
 (&dss->shs.callbacks.save.a);
 
-if (dss->shs.callbacks.save.toolstack_save) {
-r = dss->shs.callbacks.save.toolstack_save
-(dss->domid, &toolstack_data_buf, &toolstack_data_len, dss);
-if (r) { rc = ERROR_FAIL; goto out; }
-
-dss->shs.toolstack_data_file = tmpfile();
-if (!dss->shs.toolstack_data_file) {
-LOGE(ERROR, "cannot create toolstack data tmpfile");
-rc = ERROR_FAIL;
-goto out;
-}
-toolstack_data_fd = fileno(dss->shs.toolstack_data_file);
-
-r = libxl_write_exactly(CTX, toolstack_data_fd,
-toolstack_data_buf, toolstack_data_len,
-"toolstack data tmpfile", 0);
-if (r) { rc = ERROR_FAIL; goto out; }
-
-/* file position must be reset before passing to libxl-save-helper. */
-r = lseek(toolstack_data_fd, 0, SEEK_SET);
-if (r) { rc = ERROR_FAIL; goto out; }
-}
-
 const unsigned long argnums[] = {
 dss->domid, 0, 0, dss->xcflags, dss->hvm,
-toolstack_data_fd, toolstack_data_len,
 cbflags,
 };
 
@@ -123,18 +94,10 @@ void libxl__xc_domain_save(libxl__egc *egc, 
libxl__domain_suspend_state *dss)
 dss->shs.caller_state = dss;
 dss->shs.need_results = 0;
 
-free(toolstack_data_buf);
-
 run_helper(egc, &dss->shs, "--save-domain", dss->fd,
-   &toolstack_data_fd, 1,
+   NULL, 0,
argnums, ARRAY_SIZE(argnums));
 return;
-
- out:
-free(toolstack_data_buf);
-if (dss->shs.toolstack_data_file) fclose(dss->shs.toolstack_data_file);
-
-libxl__xc_domain_save_done(egc, dss, rc, 0, 0);
 }
 
 
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index f196786..1622bb7 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -213,32 +213,8 @@ int helper_getreply(void *user)
 
 /*- other callbacks -*/
 
-static int toolstack_save_fd;
-static uint32_t toolstack_save_len;
 static struct save_callbacks helper_save_callbacks;
 
-static int toolstack_save_cb(uint32_t domid, uint8_t **buf,
- uint32_t *len, void *data)
-{
-int r;
-
-assert(toolstack_save_fd > 0);
-
-/* This is a hack for remus */
-if (helper_save_callbacks.checkpoint) {
-r = lseek(toolstack_save_fd, 0, SEEK_SET);
-if (r) fail(errno,"rewind toolstack data tmpfile");
-}
-
-*buf = xmalloc(toolstack_save_len);
-r = read_exactly(toolstack_save_fd, *buf, toolstack_save_len);
-if (r<0) fail(errno,"read toolstack data");
-if (r==0) fail(0,"read toolstack data eof");
-
-*len = toolstack_save_len;
-return 0;
-}
-
 static void startup(const char *op) {
 x

[Xen-devel] [PATCH v2 21/27] tools/libxc+libxl+xl: Save v2 streams

2015-07-09 Thread Andrew Cooper
This is a complicated set of changes which must be done together for
bisectability.

 * libxl-save-helper is updated to unconditionally use libxc migration v2.
 * libxl compatibility workarounds in libxc are disabled for save operations.
 * libxl__stream_write_start() is logically spliced into the event location
   where libxl__xc_domain_save() used to reside.
 * xl is updated to indicate that the stream is now v2.

Signed-off-by: Andrew Cooper 
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 
---
 tools/libxc/Makefile |2 --
 tools/libxl/libxl.h  |2 ++
 tools/libxl/libxl_dom.c  |   46 +-
 tools/libxl/libxl_save_helper.c  |2 +-
 tools/libxl/libxl_stream_write.c |   35 +++--
 tools/libxl/xl_cmdimpl.c |1 +
 6 files changed, 47 insertions(+), 41 deletions(-)

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 2cd0b1a..1aec848 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -64,8 +64,6 @@ GUEST_SRCS-$(CONFIG_X86) += xc_sr_save_x86_hvm.c
 GUEST_SRCS-y += xc_sr_restore.c
 GUEST_SRCS-y += xc_sr_save.c
 GUEST_SRCS-y += xc_offline_page.c xc_compression.c
-xc_sr_save_x86_hvm.o: CFLAGS += -DXG_LIBXL_HVM_COMPAT
-xc_sr_save_x86_hvm.opic: CFLAGS += -DXG_LIBXL_HVM_COMPAT
 else
 GUEST_SRCS-y += xc_nomigrate.c
 endif
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index e64a606..4f24e5f 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -812,6 +812,8 @@
  *
  * If this is defined, then the libxl_domain_create_restore() interface takes
  * a "stream_version" parameter and supports a value of 2.
+ *
+ * libxl_domain_suspend() will produce a v2 stream.
  */
 #define LIBXL_HAVE_STREAM_V2 1
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 8642192..de05124 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1153,6 +1153,8 @@ int libxl__toolstack_restore(uint32_t domid, const 
uint8_t *ptr,
 
 /* Domain suspend (save) */
 
+static void stream_done(libxl__egc *egc,
+libxl__stream_write_state *sws, int rc);
 static void domain_suspend_done(libxl__egc *egc,
 libxl__domain_suspend_state *dss, int rc);
 static void domain_suspend_callback_common_done(libxl__egc *egc,
@@ -2117,50 +2119,22 @@ void libxl__domain_suspend(libxl__egc *egc, 
libxl__domain_suspend_state *dss)
 callbacks->switch_qemu_logdirty = 
libxl__domain_suspend_common_switch_qemu_logdirty;
 dss->shs.callbacks.save.toolstack_save = libxl__toolstack_save;
 
-libxl__xc_domain_save(egc, dss);
+dss->sws.fd = dss->fd;
+dss->sws.ao = dss->ao;
+dss->sws.completion_callback = stream_done;
+
+libxl__stream_write_start(egc, &dss->sws);
 return;
 
  out:
 domain_suspend_done(egc, dss, rc);
 }
 
-void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
-int rc, int retval, int errnoval)
+static void stream_done(libxl__egc *egc,
+libxl__stream_write_state *sws, int rc)
 {
-libxl__domain_suspend_state *dss = dss_void;
-STATE_AO_GC(dss->ao);
-
-/* Convenience aliases */
-const libxl_domain_type type = dss->type;
-
-if (rc)
-goto out;
-
-if (retval) {
-LOGEV(ERROR, errnoval, "saving domain: %s",
- dss->guest_responded ?
- "domain responded to suspend request" :
- "domain did not respond to suspend request");
-if ( !dss->guest_responded )
-rc = ERROR_GUEST_TIMEDOUT;
-else if (dss->rc)
-rc = dss->rc;
-else
-rc = ERROR_FAIL;
-goto out;
-}
-
-if (type == LIBXL_DOMAIN_TYPE_HVM) {
-rc = libxl__domain_suspend_device_model(gc, dss);
-if (rc) goto out;
+libxl__domain_suspend_state *dss = CONTAINER_OF(sws, *dss, sws);
 
-libxl__domain_save_device_model(egc, dss, domain_suspend_done);
-return;
-}
-
-rc = 0;
-
-out:
 domain_suspend_done(egc, dss, rc);
 }
 
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index efbe2eb..f196786 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -286,7 +286,7 @@ int main(int argc, char **argv)
 startup("save");
 setup_signals(save_signal_handler);
 
-r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
+r = xc_domain_save2(xch, io_fd, dom, max_iters, max_factor, flags,
&helper_save_callbacks, hvm);
 complete(r);
 
diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
index bf568ad..331173f 100644
--- a/tools/libxl/libxl_stream_write.c
+++ b/tools/libxl/libxl_stream_write.c
@@ -276,8 +276,39 @@ static void libxc_header_done(libxl__egc *egc,
 libxl__xc_domain_save(egc, dss);
 }
 
-sta

[Xen-devel] [PATCH v2 24/27] tools/libx{c, l}: Introduce restore_callbacks.checkpoint()

2015-07-09 Thread Andrew Cooper
And call it when a checkpoint record is found in the libxc stream.

Some parts of this patch have been based on patches from the COLO series.

Signed-off-by: Wen Congyang 
Signed-off-by: Yang Hongyang 
Signed-off-by: Andrew Cooper 
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 

---
v2: Borrow sufficient fragments from several COLO patches to get
BROKEN_CHANNEL and checkpoint failover to function.
---
 tools/libxc/include/xenguest.h |9 ++
 tools/libxc/xc_sr_common.h |7 +++--
 tools/libxc/xc_sr_restore.c|   53 ++--
 tools/libxl/libxl_save_msgs_gen.pl |2 +-
 4 files changed, 53 insertions(+), 18 deletions(-)
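[Editorial aside: the 0/1/2 return contract of restore_callbacks.checkpoint() introduced by this patch can be mirrored as a sketch; the constant and function names below are illustrative, not from the source.]

```python
# Return values of restore_callbacks.checkpoint(), per the header comment.
CHECKPOINT_ERROR    = 0   # terminate processing
CHECKPOINT_SUCCESS  = 1   # continue normally
CHECKPOINT_FAILOVER = 2   # fail over and resume the VM

def dispatch_checkpoint(ret):
    """Mirror the switch in handle_checkpoint(): map the callback's
    return value onto the restore loop's next action."""
    if ret == CHECKPOINT_SUCCESS:
        return "continue"
    if ret == CHECKPOINT_FAILOVER:
        return "broken-channel"   # caller fails over and resumes the VM
    return "fatal-error"          # any other value terminates processing
```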

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 7581263..8799c9f 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -102,6 +102,15 @@ struct restore_callbacks {
 int (*toolstack_restore)(uint32_t domid, const uint8_t *buf,
 uint32_t size, void* data);
 
+/* A checkpoint record has been found in the stream.
+ *
+ * returns:
+ * 0: (error)terminate processing
+ * 1: (success)  continue normally
+ * 2: (failover) failover and resume VM
+ */
+int (*checkpoint)(void* data);
+
 /* to be provided as the last argument to each callback function */
 void* data;
 };
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 08c66db..1f4d4e4 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -130,10 +130,13 @@ struct xc_sr_restore_ops
  * Process an individual record from the stream.  The caller shall take
  * care of processing common records (e.g. END, PAGE_DATA).
  *
- * @return 0 for success, -1 for failure, or the sentinel value
- * RECORD_NOT_PROCESSED.
+ * @return 0 for success, -1 for failure, or the following sentinels:
+ *  - RECORD_NOT_PROCESSED
+ *  - BROKEN_CHANNEL: under Remus/COLO, this means master may be dead, and
+ *a failover is needed.
  */
 #define RECORD_NOT_PROCESSED 1
+#define BROKEN_CHANNEL 2
 int (*process_record)(struct xc_sr_context *ctx, struct xc_sr_record *rec);
 
 /**
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 9e27dba..c9b5213 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -1,5 +1,7 @@
 #include 
 
+#include 
+
 #include "xc_sr_common.h"
 
 /*
@@ -472,7 +474,7 @@ static int handle_page_data(struct xc_sr_context *ctx, 
struct xc_sr_record *rec)
 static int handle_checkpoint(struct xc_sr_context *ctx)
 {
 xc_interface *xch = ctx->xch;
-int rc = 0;
+int rc = 0, ret;
 unsigned i;
 
 if ( !ctx->restore.checkpointed )
@@ -482,6 +484,21 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
 goto err;
 }
 
+ret = ctx->restore.callbacks->checkpoint(ctx->restore.callbacks->data);
+switch ( ret )
+{
+case 1: /* Success */
+break;
+
+case 2: /* Failover */
+rc = BROKEN_CHANNEL;
+goto err;
+
+default: /* Other fatal error */
+rc = -1;
+goto err;
+}
+
 if ( ctx->restore.buffer_all_records )
 {
 IPRINTF("All records buffered");
@@ -560,19 +577,6 @@ static int process_record(struct xc_sr_context *ctx, 
struct xc_sr_record *rec)
 free(rec->data);
 rec->data = NULL;
 
-if ( rc == RECORD_NOT_PROCESSED )
-{
-if ( rec->type & REC_TYPE_OPTIONAL )
-DPRINTF("Ignoring optional record %#x (%s)",
-rec->type, rec_type_to_str(rec->type));
-else
-{
-ERROR("Mandatory record %#x (%s) not handled",
-  rec->type, rec_type_to_str(rec->type));
-rc = -1;
-}
-}
-
 return rc;
 }
 
@@ -678,7 +682,22 @@ static int restore(struct xc_sr_context *ctx)
 else
 {
 rc = process_record(ctx, &rec);
-if ( rc )
+if ( rc == RECORD_NOT_PROCESSED )
+{
+if ( rec.type & REC_TYPE_OPTIONAL )
+DPRINTF("Ignoring optional record %#x (%s)",
+rec.type, rec_type_to_str(rec.type));
+else
+{
+ERROR("Mandatory record %#x (%s) not handled",
+  rec.type, rec_type_to_str(rec.type));
+rc = -1;
+goto err;
+}
+}
+else if ( rc == BROKEN_CHANNEL )
+goto remus_failover;
+else if ( rc )
 goto err;
 }
 
@@ -735,6 +754,10 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, 
uint32_t dom,
 ctx.restore.checkpointed = checkpointed_stream;
 ctx.restore.callbacks = callbacks;
 
+/* Sanity checks for callbacks. */
+if ( checkpointed_stream )
+assert(callbacks->checkpoint);
+
 IPRINTF("In experimental %s", __fun

[Xen-devel] [PATCH v2 26/27] tools/libxc: Drop all XG_LIBXL_HVM_COMPAT code from libxc

2015-07-09 Thread Andrew Cooper
Libxl has now been fully adjusted not to need it.

Signed-off-by: Andrew Cooper 
Acked-by: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 
---
 tools/libxc/xc_sr_common.h  |5 --
 tools/libxc/xc_sr_restore.c |   18 -
 tools/libxc/xc_sr_restore_x86_hvm.c |  124 ---
 tools/libxc/xc_sr_save_x86_hvm.c|   36 --
 4 files changed, 183 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 1f4d4e4..64f6082 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -309,11 +309,6 @@ struct xc_sr_context
 /* HVM context blob. */
 void *context;
 size_t contextsz;
-
-/* #ifdef XG_LIBXL_HVM_COMPAT */
-uint32_t qlen;
-void *qbuf;
-/* #endif */
 } restore;
 };
 } x86_hvm;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index c9b5213..2f6a763 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -627,9 +627,6 @@ static void cleanup(struct xc_sr_context *ctx)
 PERROR("Failed to clean up");
 }
 
-#ifdef XG_LIBXL_HVM_COMPAT
-extern int read_qemu(struct xc_sr_context *ctx);
-#endif
 /*
  * Restore a domain.
  */
@@ -656,21 +653,6 @@ static int restore(struct xc_sr_context *ctx)
 goto err;
 }
 
-#ifdef XG_LIBXL_HVM_COMPAT
-if ( ctx->dominfo.hvm &&
- (rec.type == REC_TYPE_END || rec.type == REC_TYPE_CHECKPOINT) )
-{
-rc = read_qemu(ctx);
-if ( rc )
-{
-if ( ctx->restore.buffer_all_records )
-goto remus_failover;
-else
-goto err;
-}
-}
-#endif
-
 if ( ctx->restore.buffer_all_records &&
  rec.type != REC_TYPE_END &&
  rec.type != REC_TYPE_CHECKPOINT )
diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c 
b/tools/libxc/xc_sr_restore_x86_hvm.c
index 6f5af0e..49d22c7 100644
--- a/tools/libxc/xc_sr_restore_x86_hvm.c
+++ b/tools/libxc/xc_sr_restore_x86_hvm.c
@@ -3,24 +3,6 @@
 
 #include "xc_sr_common_x86.h"
 
-#ifdef XG_LIBXL_HVM_COMPAT
-static int handle_toolstack(struct xc_sr_context *ctx, struct xc_sr_record 
*rec)
-{
-xc_interface *xch = ctx->xch;
-int rc;
-
-if ( !ctx->restore.callbacks || !ctx->restore.callbacks->toolstack_restore 
)
-return 0;
-
-rc = ctx->restore.callbacks->toolstack_restore(
-ctx->domid, rec->data, rec->length, ctx->restore.callbacks->data);
-
-if ( rc < 0 )
-PERROR("restoring toolstack");
-return rc;
-}
-#endif
-
 /*
  * Process an HVM_CONTEXT record from the stream.
  */
@@ -93,98 +75,6 @@ static int handle_hvm_params(struct xc_sr_context *ctx,
 return 0;
 }
 
-#ifdef XG_LIBXL_HVM_COMPAT
-int read_qemu(struct xc_sr_context *ctx);
-int read_qemu(struct xc_sr_context *ctx)
-{
-xc_interface *xch = ctx->xch;
-char qemusig[21];
-uint32_t qlen;
-void *qbuf = NULL;
-int rc = -1;
-
-if ( read_exact(ctx->fd, qemusig, sizeof(qemusig)) )
-{
-PERROR("Error reading QEMU signature");
-goto out;
-}
-
-if ( !memcmp(qemusig, "DeviceModelRecord0002", sizeof(qemusig)) )
-{
-if ( read_exact(ctx->fd, &qlen, sizeof(qlen)) )
-{
-PERROR("Error reading QEMU record length");
-goto out;
-}
-
-qbuf = malloc(qlen);
-if ( !qbuf )
-{
-PERROR("no memory for device model state");
-goto out;
-}
-
-if ( read_exact(ctx->fd, qbuf, qlen) )
-{
-PERROR("Error reading device model state");
-goto out;
-}
-}
-else
-{
-ERROR("Invalid device model state signature '%*.*s'",
-  (int)sizeof(qemusig), (int)sizeof(qemusig), qemusig);
-goto out;
-}
-
-/* With Remus, this could be read many times */
-if ( ctx->x86_hvm.restore.qbuf )
-free(ctx->x86_hvm.restore.qbuf);
-ctx->x86_hvm.restore.qbuf = qbuf;
-ctx->x86_hvm.restore.qlen = qlen;
-rc = 0;
-
-out:
-if (rc)
-free(qbuf);
-return rc;
-}
-
-static int handle_qemu(struct xc_sr_context *ctx)
-{
-xc_interface *xch = ctx->xch;
-char path[256];
-uint32_t qlen = ctx->x86_hvm.restore.qlen;
-void *qbuf = ctx->x86_hvm.restore.qbuf;
-int rc = -1;
-FILE *fp = NULL;
-
-sprintf(path, XC_DEVICE_MODEL_RESTORE_FILE".%u", ctx->domid);
-fp = fopen(path, "wb");
-if ( !fp )
-{
-PERROR("Failed to open '%s' for writing", path);
-goto out;
-}
-
-DPRINTF("Writing %u bytes of QEMU data", qlen);
-if ( fwrite(qbuf, 1, qlen, fp) != qlen )
-{
-PERROR("Failed to write %u bytes of QEMU data", qlen);
-goto out;
-}
-
-rc = 0;
-
- out:
-if ( fp )
-fclose(fp);
-free(qbuf);
-
-return 

[Xen-devel] [PATCH v2 23/27] tools/libxl: Write checkpoint records into the stream

2015-07-09 Thread Andrew Cooper
when signalled to do so by libxl__remus_domain_checkpoint_callback()

Signed-off-by: Andrew Cooper 
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 

---
This patch has changed substantially in v2 as a result of changes earlier in
the series.  No behavioural difference from v1.
---
 tools/libxl/libxl_dom.c  |   18 -
 tools/libxl/libxl_internal.h |7 
 tools/libxl/libxl_stream_write.c |   80 --
 3 files changed, 91 insertions(+), 14 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index de05124..9f5ddc9 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1937,8 +1937,8 @@ static void remus_devices_preresume_cb(libxl__egc *egc,
 
 /*- remus asynchronous checkpoint callback -*/
 
-static void remus_checkpoint_dm_saved(libxl__egc *egc,
-  libxl__domain_suspend_state *dss, int 
rc);
+static void remus_checkpoint_stream_written(
+libxl__egc *egc, libxl__stream_write_state *sws, int rc);
 static void remus_devices_commit_cb(libxl__egc *egc,
 libxl__remus_devices_state *rds,
 int rc);
@@ -1953,17 +1953,14 @@ static void 
libxl__remus_domain_checkpoint_callback(void *data)
 libxl__egc *egc = dss->shs.egc;
 STATE_AO_GC(dss->ao);
 
-/* This would go into tailbuf. */
-if (dss->hvm) {
-libxl__domain_save_device_model(egc, dss, remus_checkpoint_dm_saved);
-} else {
-remus_checkpoint_dm_saved(egc, dss, 0);
-}
+libxl__stream_write_start_checkpoint(egc, &dss->sws);
 }
 
-static void remus_checkpoint_dm_saved(libxl__egc *egc,
-  libxl__domain_suspend_state *dss, int rc)
+static void remus_checkpoint_stream_written(
+libxl__egc *egc, libxl__stream_write_state *sws, int rc)
 {
+libxl__domain_suspend_state *dss = CONTAINER_OF(sws, *dss, sws);
+
 /* Convenience aliases */
 libxl__remus_devices_state *const rds = &dss->rds;
 
@@ -2113,6 +2110,7 @@ void libxl__domain_suspend(libxl__egc *egc, 
libxl__domain_suspend_state *dss)
 callbacks->suspend = libxl__remus_domain_suspend_callback;
 callbacks->postcopy = libxl__remus_domain_resume_callback;
 callbacks->checkpoint = libxl__remus_domain_checkpoint_callback;
+dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
 } else
 callbacks->suspend = libxl__domain_suspend_callback;
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 2beb534..84e22c2 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2987,9 +2987,13 @@ struct libxl__stream_write_state {
 void (*completion_callback)(libxl__egc *egc,
 libxl__stream_write_state *sws,
 int rc);
+void (*checkpoint_callback)(libxl__egc *egc,
+libxl__stream_write_state *sws,
+int rc);
 /* Private */
 int rc;
 bool running;
+bool in_checkpoint;
 
 /* Active-stuff handling */
 int joined_rc;
@@ -3009,6 +3013,9 @@ struct libxl__stream_write_state {
 _hidden void libxl__stream_write_start(libxl__egc *egc,
libxl__stream_write_state *stream);
 
+_hidden void libxl__stream_write_start_checkpoint(
+libxl__egc *egc, libxl__stream_write_state *stream);
+
 _hidden void libxl__stream_write_abort(libxl__egc *egc,
libxl__stream_write_state *stream,
int rc);
diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
index 331173f..0f5216b 100644
--- a/tools/libxl/libxl_stream_write.c
+++ b/tools/libxl/libxl_stream_write.c
@@ -23,6 +23,9 @@
  *  - libxl__stream_write_start()
  * - Start writing a stream from the start.
  *
+ *  - libxl__stream_write_start_checkpoint()
+ * - Write the records which form a checkpoint into a stream.
+ *
  * In normal operation, there are two tasks running at once; this stream
  * processing, and the libxl-save-helper.  check_stream_finished() is used to
  * join all the tasks in both success and error cases.
@@ -39,6 +42,12 @@
  *  - Toolstack record
  *  - if (hvm), Qemu record
  *  - End record
+ *
+ * For a checkpointed stream, there is a second loop which is triggered by a
+ * save-helper checkpoint callback.  It writes:
+ *  - Toolstack record
+ *  - if (hvm), Qemu record
+ *  - Checkpoint end record
  */
 
 static void stream_success(libxl__egc *egc,
@@ -73,6 +82,15 @@ static void emulator_record_done(libxl__egc *egc,
 static void write_end_record(libxl__egc *egc,
  libxl__stream_write_state *stream);
 
+/* Event callbacks unique to checkpointed streams. */
+static void checkpoint_done(libxl__egc *egc,
+libxl__stream_write_state *stream,
+   

[Xen-devel] [PATCH v2 14/27] tools/python: Conversion utility for legacy migration streams

2015-07-09 Thread Andrew Cooper
This utility takes a legacy stream as input and produces a v2 stream as
output.  It is exec()'d by libxl to provide backwards compatibility.

Signed-off-by: Andrew Cooper 
Acked-by: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 
---
 tools/python/Makefile  |4 +
 tools/python/scripts/convert-legacy-stream |  678 
 2 files changed, 682 insertions(+)
 create mode 100755 tools/python/scripts/convert-legacy-stream

diff --git a/tools/python/Makefile b/tools/python/Makefile
index e933be8..df942a7 100644
--- a/tools/python/Makefile
+++ b/tools/python/Makefile
@@ -17,9 +17,13 @@ build: genwrap.py $(XEN_ROOT)/tools/libxl/libxl_types.idl \
 
 .PHONY: install
 install:
+   $(INSTALL_DIR) $(DESTDIR)$(PRIVATE_BINDIR)
+
CC="$(CC)" CFLAGS="$(PY_CFLAGS)" $(PYTHON) setup.py install \
$(PYTHON_PREFIX_ARG) --root="$(DESTDIR)" --force
 
+   $(INSTALL_PROG) scripts/convert-legacy-stream 
$(DESTDIR)$(PRIVATE_BINDIR)
+
 .PHONY: test
 test:
export LD_LIBRARY_PATH=$$(readlink -f ../libxc):$$(readlink -f 
../xenstore); $(PYTHON) test.py -b -u
diff --git a/tools/python/scripts/convert-legacy-stream 
b/tools/python/scripts/convert-legacy-stream
new file mode 100755
index 000..d54fa22
--- /dev/null
+++ b/tools/python/scripts/convert-legacy-stream
@@ -0,0 +1,678 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+"""
+Convert a legacy migration stream to a v2 stream.
+"""
+
+import sys
+import os, os.path
+import syslog
+import traceback
+
+from struct import calcsize, unpack, pack
+
+from xen.migration import legacy, public, libxc, libxl, xl
+
+__version__ = 1
+
+fin = None # Input file/fd
+fout = None# Output file/fd
+twidth = 0 # Legacy toolstack bitness (32 or 64)
+pv = None  # Boolean (pv or hvm)
+qemu = True# Boolean - process qemu record?
+log_to_syslog = False  # Boolean - Log to syslog instead of stdout/err?
+verbose = False# Boolean - Summarise stream contents
+
+def stream_read(_ = None):
+"""Read from the input"""
+return fin.read(_)
+
+def stream_write(_):
+"""Write to the output"""
+return fout.write(_)
+
+def info(msg):
+"""Info message, routed to appropriate destination"""
+if verbose:
+if log_to_syslog:
+for line in msg.split("\n"):
+syslog.syslog(syslog.LOG_INFO, line)
+else:
+print msg
+
+def err(msg):
+"""Error message, routed to appropriate destination"""
+if log_to_syslog:
+for line in msg.split("\n"):
+syslog.syslog(syslog.LOG_ERR, line)
+print >> sys.stderr, msg
+
+class StreamError(StandardError):
+"""Error with the incoming migration stream"""
+pass
+
+class VM(object):
+"""Container of VM parameters"""
+
+def __init__(self, fmt):
+# Common
+self.p2m_size = 0
+
+# PV
+self.max_vcpu_id = 0
+self.online_vcpu_map = []
+self.width = 0
+self.levels = 0
+self.basic_len = 0
+self.extd = False
+self.xsave_len = 0
+
+# libxl
+self.libxl = fmt == "libxl"
+self.xenstore = [] # Deferred "toolstack" records
+
+def write_libxc_ihdr():
+stream_write(pack(libxc.IHDR_FORMAT,
+  libxc.IHDR_MARKER,  # Marker
+  libxc.IHDR_IDENT,   # Ident
+  libxc.IHDR_VERSION, # Version
+  libxc.IHDR_OPT_LE,  # Options
+  0, 0))  # Reserved
+
+def write_libxc_dhdr():
+if pv:
+dtype = libxc.DHDR_TYPE_x86_pv
+else:
+dtype = libxc.DHDR_TYPE_x86_hvm
+
+stream_write(pack(libxc.DHDR_FORMAT,
+  dtype,# Type
+  12,   # Page size
+  0,# Reserved
+  0,# Xen major (converted)
+  __version__)) # Xen minor (converted)
+
+def write_libxl_hdr():
+stream_write(pack(libxl.HDR_FORMAT,
+  libxl.HDR_IDENT, # Ident
+  libxl.HDR_VERSION,   # Version 2
+  libxl.HDR_OPT_LE |   # Options
+  libxl.HDR_OPT_LEGACY # Little Endian and Legacy
+  ))
+
+def write_record(rt, *argl):
+alldata = ''.join(argl)
+length = len(alldata)
+
+record = pack(libxc.RH_FORMAT, rt, length) + alldata
+plen = (8 - (length & 7)) & 7
+record += '\x00' * plen
+
+stream_write(record)
+
+def write_libxc_pv_info(vm):
+write_record(libxc.REC_TYPE_x86_pv_info,
+ pack(libxc.X86_PV_INFO_FORMAT,
+  vm.width, vm.levels, 0, 0))
+
+def write_libxc_pv_p2m_frames(vm, pfns):
+write_record(libxc.REC_TYPE_x86_pv_p2m_frames,
+ pack(libxc.X86_PV_P2M_FRAMES_FORMAT,
+  0, vm.p2m_size - 1),
+ pack("Q" * len(pfns), *pf

[Xen-devel] [PATCH v2 20/27] tools/libxl: Infrastructure for writing a v2 stream

2015-07-09 Thread Andrew Cooper
From: Ross Lagerwall 

This contains the event machinery and state machines to write a non-checkpointed
migration v2 stream (with the exception of the xc_domain_save() handling, which
is spliced in later in a bisectable way).

Signed-off-by: Ross Lagerwall 
Signed-off-by: Andrew Cooper 
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 

---
As with the read side of things, this has undergone substantial changes in v2.
---
 tools/libxl/Makefile |2 +-
 tools/libxl/libxl_internal.h |   47 
 tools/libxl/libxl_stream_write.c |  451 ++
 3 files changed, 499 insertions(+), 1 deletion(-)
 create mode 100644 tools/libxl/libxl_stream_write.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 0ebc35a..7d44483 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -95,7 +95,7 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o 
libxl_pci.o \
libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \
libxl_internal.o libxl_utils.o libxl_uuid.o \
libxl_json.o libxl_aoutils.o libxl_numa.o libxl_vnuma.o 
\
-   libxl_stream_read.o \
+   libxl_stream_read.o libxl_stream_write.o \
libxl_save_callout.o _libxl_save_msgs_callout.o \
libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y)
 LIBXL_OBJS += libxl_genid.o
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 1cf1884..2beb534 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2973,6 +2973,52 @@ typedef void libxl__domain_suspend_cb(libxl__egc*,
 typedef void libxl__save_device_model_cb(libxl__egc*,
  libxl__domain_suspend_state*, int rc);
 
+/* State for writing a libxl migration v2 stream */
+typedef struct libxl__stream_write_state libxl__stream_write_state;
+
+typedef void (*sws_record_done_cb)(libxl__egc *egc,
+   libxl__stream_write_state *sws);
+
+struct libxl__stream_write_state {
+/* filled by the user */
+libxl__ao *ao;
+int fd;
+uint32_t domid;
+void (*completion_callback)(libxl__egc *egc,
+libxl__stream_write_state *sws,
+int rc);
+/* Private */
+int rc;
+bool running;
+
+/* Active-stuff handling */
+int joined_rc;
+
+/* Main stream-writing data */
+size_t padding;
+libxl__datacopier_state dc;
+sws_record_done_cb record_done_callback;
+
+/* Emulator blob handling */
+libxl__datacopier_state emu_dc;
+libxl__carefd *emu_carefd;
+libxl__sr_rec_hdr emu_rec_hdr;
+void *emu_body;
+};
+
+_hidden void libxl__stream_write_start(libxl__egc *egc,
+   libxl__stream_write_state *stream);
+
+_hidden void libxl__stream_write_abort(libxl__egc *egc,
+   libxl__stream_write_state *stream,
+   int rc);
+
+static inline bool libxl__stream_write_inuse(
+const libxl__stream_write_state *stream)
+{
+return stream->running;
+}
+
 typedef struct libxl__logdirty_switch {
 const char *cmd;
 const char *cmd_path;
@@ -3013,6 +3059,7 @@ struct libxl__domain_suspend_state {
 /* private for libxl__domain_save_device_model */
 libxl__save_device_model_cb *save_dm_callback;
 libxl__datacopier_state save_dm_datacopier;
+libxl__stream_write_state sws;
 };
 
 
diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
new file mode 100644
index 000..bf568ad
--- /dev/null
+++ b/tools/libxl/libxl_stream_write.c
@@ -0,0 +1,451 @@
+/*
+ * Copyright (C) 2015  Citrix Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+/*
+ * Infrastructure for writing a domain to a libxl migration v2 stream.
+ *
+ * Entry points from outside:
+ *  - libxl__stream_write_start()
+ * - Start writing a stream from the start.
+ *
+ * In normal operation, there are two tasks running at once; this stream
+ * processing, and the libxl-save-helper.  check_stream_finished() is used to
+ * join all the tasks in both success and error cases.
+ *
+ * Nomenclature for event callbacks:
+ *  - $FOO_done(): Completion callback for $FOO
+ *  - write_$FOO(): Set up the datacop

[Xen-devel] [PATCH v2 10/27] tools/python: Libxc migration v2 infrastructure

2015-07-09 Thread Andrew Cooper
Contains:
 * Python implementation of the libxc migration v2 records
 * Verification code for spec compliance
 * Unit tests

Signed-off-by: Andrew Cooper 
Acked-by: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 
---
 tools/python/setup.py|1 +
 tools/python/xen/migration/libxc.py  |  446 ++
 tools/python/xen/migration/tests.py  |   41 
 tools/python/xen/migration/verify.py |   37 +++
 4 files changed, 525 insertions(+)
 create mode 100644 tools/python/xen/migration/__init__.py
 create mode 100644 tools/python/xen/migration/libxc.py
 create mode 100644 tools/python/xen/migration/tests.py
 create mode 100644 tools/python/xen/migration/verify.py

diff --git a/tools/python/setup.py b/tools/python/setup.py
index 439c429..5bf81be 100644
--- a/tools/python/setup.py
+++ b/tools/python/setup.py
@@ -43,6 +43,7 @@ setup(name= 'xen',
   version = '3.0',
   description = 'Xen',
   packages= ['xen',
+ 'xen.migration',
  'xen.lowlevel',
 ],
   ext_package = "xen.lowlevel",
diff --git a/tools/python/xen/migration/__init__.py 
b/tools/python/xen/migration/__init__.py
new file mode 100644
index 000..e69de29
diff --git a/tools/python/xen/migration/libxc.py 
b/tools/python/xen/migration/libxc.py
new file mode 100644
index 000..b0255ac
--- /dev/null
+++ b/tools/python/xen/migration/libxc.py
@@ -0,0 +1,446 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+"""
+Libxc Migration v2 streams
+
+Record structures as per docs/specs/libxc-migration-stream.pandoc, and
+verification routines.
+"""
+
+import sys
+
+from struct import calcsize, unpack
+
+from xen.migration.verify import StreamError, RecordError, VerifyBase
+
+# Image Header
+IHDR_FORMAT = "!QIIHHI"
+
+IHDR_MARKER  = 0xFFFFFFFFFFFFFFFF
+IHDR_IDENT   = 0x58454E46 # "XENF" in ASCII
+IHDR_VERSION = 2
+
+IHDR_OPT_BIT_ENDIAN = 0
+IHDR_OPT_LE = (0 << IHDR_OPT_BIT_ENDIAN)
+IHDR_OPT_BE = (1 << IHDR_OPT_BIT_ENDIAN)
+
+IHDR_OPT_RESZ_MASK = 0xfffe
+
+# Domain Header
+DHDR_FORMAT = "IHHII"
+
+DHDR_TYPE_x86_pv  = 0x0001
+DHDR_TYPE_x86_hvm = 0x0002
+DHDR_TYPE_x86_pvh = 0x0003
+DHDR_TYPE_arm = 0x0004
+
+dhdr_type_to_str = {
+DHDR_TYPE_x86_pv  : "x86 PV",
+DHDR_TYPE_x86_hvm : "x86 HVM",
+DHDR_TYPE_x86_pvh : "x86 PVH",
+DHDR_TYPE_arm : "ARM",
+}
+
+# Records
+RH_FORMAT = "II"
+
+REC_TYPE_end  = 0x00000000
+REC_TYPE_page_data= 0x0001
+REC_TYPE_x86_pv_info  = 0x0002
+REC_TYPE_x86_pv_p2m_frames= 0x0003
+REC_TYPE_x86_pv_vcpu_basic= 0x0004
+REC_TYPE_x86_pv_vcpu_extended = 0x0005
+REC_TYPE_x86_pv_vcpu_xsave= 0x0006
+REC_TYPE_shared_info  = 0x0007
+REC_TYPE_tsc_info = 0x0008
+REC_TYPE_hvm_context  = 0x0009
+REC_TYPE_hvm_params   = 0x000a
+REC_TYPE_toolstack= 0x000b
+REC_TYPE_x86_pv_vcpu_msrs = 0x000c
+REC_TYPE_verify   = 0x000d
+REC_TYPE_checkpoint   = 0x000e
+
+rec_type_to_str = {
+REC_TYPE_end  : "End",
+REC_TYPE_page_data: "Page data",
+REC_TYPE_x86_pv_info  : "x86 PV info",
+REC_TYPE_x86_pv_p2m_frames: "x86 PV P2M frames",
+REC_TYPE_x86_pv_vcpu_basic: "x86 PV vcpu basic",
+REC_TYPE_x86_pv_vcpu_extended : "x86 PV vcpu extended",
+REC_TYPE_x86_pv_vcpu_xsave: "x86 PV vcpu xsave",
+REC_TYPE_shared_info  : "Shared info",
+REC_TYPE_tsc_info : "TSC info",
+REC_TYPE_hvm_context  : "HVM context",
+REC_TYPE_hvm_params   : "HVM params",
+REC_TYPE_toolstack: "Toolstack",
+REC_TYPE_x86_pv_vcpu_msrs : "x86 PV vcpu msrs",
+REC_TYPE_verify   : "Verify",
+REC_TYPE_checkpoint   : "Checkpoint",
+}
+
+# page_data
+PAGE_DATA_FORMAT = "II"
+PAGE_DATA_PFN_MASK   = (1L << 52) - 1
+PAGE_DATA_PFN_RESZ_MASK  = ((1L << 60) - 1) & ~((1L << 52) - 1)
+
+# flags from xen/public/domctl.h: XEN_DOMCTL_PFINFO_* shifted by 32 bits
+PAGE_DATA_TYPE_SHIFT = 60
+PAGE_DATA_TYPE_LTABTYPE_MASK = (0x7L << PAGE_DATA_TYPE_SHIFT)
+PAGE_DATA_TYPE_LTAB_MASK = (0xfL << PAGE_DATA_TYPE_SHIFT)
+PAGE_DATA_TYPE_LPINTAB   = (0x8L << PAGE_DATA_TYPE_SHIFT) # Pinned pagetable
+
+PAGE_DATA_TYPE_NOTAB = (0x0L << PAGE_DATA_TYPE_SHIFT) # Regular page
+PAGE_DATA_TYPE_L1TAB = (0x1L << PAGE_DATA_TYPE_SHIFT) # L1 pagetable
+PAGE_DATA_TYPE_L2TAB = (0x2L << PAGE_DATA_TYPE_SHIFT) # L2 pagetable
+PAGE_DATA_TYPE_L3TAB = (0x3L << PAGE_DATA_TYPE_SHIFT) # L3 pagetable
+PAGE_DATA_TYPE_L4TAB = (0x4L << PAGE_DATA_TYPE_SHIFT) # L4 pagetable
+PAGE_DATA_TYPE_BROKEN= (0xdL << PAGE_DATA_TYPE_SHIFT) # Broken
+PAGE_DATA_TYPE_XALLOC= (0xeL << PAGE_DATA_TYPE_SHIFT) # Allocate-only
+PAGE_DATA_TYPE_XTAB  = (0xfL << PAGE_DATA_TYPE_SHIFT) # Invalid page

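As an aside for readers following the record definitions above: the framing is a fixed header (RH_FORMAT, i.e. a uint32 type and uint32 body length) followed by the body padded to an 8-octet boundary. A minimal illustrative sketch, assuming native-endian records and not part of the patch itself:

```python
import io
import struct

RH_FORMAT = "II"            # record header: type (uint32), body length (uint32)
REC_TYPE_end = 0x00000000   # the END record has a zero-length body

def read_record(stream):
    """Read one record: header, body, then discard the alignment padding.

    Record bodies are padded up to an 8-octet boundary, per the spec.
    """
    rtype, length = struct.unpack(RH_FORMAT,
                                  stream.read(struct.calcsize(RH_FORMAT)))
    body = stream.read(length)
    stream.read((8 - length % 8) % 8)  # skip padding bytes
    return rtype, body
```

For example, a 3-byte body is followed by 5 bytes of padding, so the next record starts on an 8-octet boundary.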
[Xen-devel] [PATCH v2 18/27] tools/libxl: Convert a legacy stream if needed

2015-07-09 Thread Andrew Cooper
For backwards compatibility, a legacy stream needs converting before it can be
read by the v2 stream logic.

This causes the v2 stream logic to need to juggle two parallel tasks.
check_stream_finished() is introduced for the purpose of joining the tasks in
both success and error cases.

Signed-off-by: Andrew Cooper 
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 
---
 tools/libxl/libxl_internal.h|7 +++
 tools/libxl/libxl_stream_read.c |   98 ++-
 2 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 68e7f02..1cf1884 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3274,6 +3274,7 @@ struct libxl__stream_read_state {
 /* filled by the user */
 libxl__ao *ao;
 int fd;
+bool legacy;
 void (*completion_callback)(libxl__egc *egc,
 libxl__stream_read_state *srs,
 int rc);
@@ -3281,6 +3282,12 @@ struct libxl__stream_read_state {
 int rc;
 bool running;
 
+/* Active-stuff handling */
+int joined_rc;
+
+/* Conversion helper */
+libxl__carefd *v2_carefd;
+
 /* Main stream-reading data */
 libxl__datacopier_state dc;
 libxl__sr_hdr hdr;
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 6f5d572..3011820 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -57,6 +57,10 @@
  *- stream_write_emulator()
  *- stream_write_emulator_done()
  *- stream_continue()
+ *
+ * Depending on the contents of the stream, there are likely to be several
+ * parallel tasks being managed.  check_stream_finished() is used to join all
+ * tasks in both success and error cases.
  */
 
 /* Stream error/success handling. */
@@ -67,6 +71,16 @@ static void stream_failed(libxl__egc *egc,
 static void stream_done(libxl__egc *egc,
 libxl__stream_read_state *stream);
 
+/* Stream other-active-stuff handling. */
+/* libxl__xc_domain_restore_done() is logically grouped here. */
+#if defined(__x86_64__) || defined(__i386__)
+static void conversion_done(libxl__egc *egc,
+libxl__conversion_helper_state *chs, int rc);
+#endif
+static void check_stream_finished(libxl__egc *egc,
+  libxl__stream_read_state *stream,
+  int rc, const char *what);
+
 /* Event callbacks for main reading loop. */
 static void stream_header_done(libxl__egc *egc,
libxl__datacopier_state *dc,
@@ -112,12 +126,40 @@ static int setup_read(libxl__stream_read_state *stream,
 void libxl__stream_read_start(libxl__egc *egc,
   libxl__stream_read_state *stream)
 {
+libxl__domain_create_state *dcs = CONTAINER_OF(stream, *dcs, srs);
 libxl__datacopier_state *dc = &stream->dc;
+STATE_AO_GC(stream->ao);
 int ret = 0;
 
 /* State initialisation. */
 assert(!stream->running);
 
+/*
+ * Initialise other moving parts so stream_check_finished() can correctly
+ * work out whether to tear them down.
+ */
+libxl__conversion_helper_init(&dcs->chs);
+
+#if defined(__x86_64__) || defined(__i386__)
+if (stream->legacy) {
+/* Convert a legacy stream, if needed. */
+dcs->chs.ao = stream->ao;
+dcs->chs.legacy_fd = stream->fd;
+dcs->chs.hvm =
+(dcs->guest_config->b_info.type == LIBXL_DOMAIN_TYPE_HVM);
+dcs->chs.v2_carefd = NULL;
+dcs->chs.completion_callback = conversion_done;
+
+libxl__convert_legacy_stream(egc, &dcs->chs);
+
+assert(dcs->chs.v2_carefd);
+stream->v2_carefd = dcs->chs.v2_carefd;
+dcs->libxc_fd = stream->fd = libxl__carefd_fd(dcs->chs.v2_carefd);
+}
+#endif
+
+/* stream->fd is now guaranteed to be a v2 stream. */
+
 memset(dc, 0, sizeof(*dc));
 dc->ao = stream->ao;
 dc->readfd = stream->fd;
@@ -183,7 +225,50 @@ static void stream_done(libxl__egc *egc,
 free(rec);
 }
 
-stream->completion_callback(egc, stream, stream->rc);
+if (stream->v2_carefd)
+libxl__carefd_close(stream->v2_carefd);
+
+check_stream_finished(egc, stream, stream->rc, "stream");
+}
+
+static void check_stream_finished(libxl__egc *egc,
+  libxl__stream_read_state *stream,
+  int rc, const char *what)
+{
+libxl__domain_create_state *dcs = CONTAINER_OF(stream, *dcs, srs);
+STATE_AO_GC(stream->ao);
+
+LOG(DEBUG, "Task '%s' joining (rc %d)", what, rc);
+
+if (rc && !stream->joined_rc) {
+bool skip = false;
+/* First reported failure from joining tasks.  Tear everything down */
+stream->joined_rc = rc;
+
+if (libxl__stream_read_inuse(&dcs->srs)) {
+skip = true;
+libxl__stream_read_abort(egc, &dcs

[Xen-devel] [PATCH v2 15/27] tools/libxl: Migration v2 stream format

2015-07-09 Thread Andrew Cooper
From: Ross Lagerwall 

C structures describing the Libxl migration v2 stream format

Signed-off-by: Ross Lagerwall 
Signed-off-by: Andrew Cooper 
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 

---
v2: Move into libxl__ namespace
---
 tools/libxl/libxl_sr_stream_format.h |   57 ++
 1 file changed, 57 insertions(+)
 create mode 100644 tools/libxl/libxl_sr_stream_format.h

diff --git a/tools/libxl/libxl_sr_stream_format.h 
b/tools/libxl/libxl_sr_stream_format.h
new file mode 100644
index 000..f4f790b
--- /dev/null
+++ b/tools/libxl/libxl_sr_stream_format.h
@@ -0,0 +1,57 @@
+#ifndef LIBXL__SR_STREAM_FORMAT_H
+#define LIBXL__SR_STREAM_FORMAT_H
+
+/*
+ * C structures for the Migration v2 stream format.
+ * See docs/specs/libxl-migration-stream.pandoc
+ */
+
+#include 
+
+typedef struct libxl__sr_hdr
+{
+uint64_t ident;
+uint32_t version;
+uint32_t options;
+} libxl__sr_hdr;
+
+#define RESTORE_STREAM_IDENT 0x4c6962786c466d74UL
+#define RESTORE_STREAM_VERSION   0x00000002U
+
+#define RESTORE_OPT_BIG_ENDIAN   (1u << 0)
+#define RESTORE_OPT_LEGACY   (1u << 1)
+
+
+typedef struct libxl__sr_rec_hdr
+{
+uint32_t type;
+uint32_t length;
+} libxl__sr_rec_hdr;
+
+/* All records must be aligned up to an 8 octet boundary */
+#define REC_ALIGN_ORDER  3U
+
+#define REC_TYPE_END 0x00000000U
+#define REC_TYPE_LIBXC_CONTEXT   0x0001U
+#define REC_TYPE_XENSTORE_DATA   0x0002U
+#define REC_TYPE_EMULATOR_CONTEXT0x0003U
+
+typedef struct libxl__sr_emulator_hdr
+{
+uint32_t id;
+uint32_t index;
+} libxl__sr_emulator_hdr;
+
+#define EMULATOR_UNKNOWN 0x00000000U
+#define EMULATOR_QEMU_TRADITIONAL0x0001U
+#define EMULATOR_QEMU_UPSTREAM   0x0002U
+
+#endif /* LIBXL__SR_STREAM_FORMAT_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

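A quick sanity check on the constants in the header above: the stream ident is the ASCII string "LibxlFmt" packed big-endian, and REC_ALIGN_ORDER of 3 means records align to 8-octet boundaries. A speculative snippet (not part of the patch):

```python
import struct

RESTORE_STREAM_IDENT = 0x4c6962786c466d74
REC_ALIGN_ORDER = 3

# Big-endian packing of the ident yields the human-readable magic string.
magic = struct.pack("!Q", RESTORE_STREAM_IDENT)

def align_up(length, order=REC_ALIGN_ORDER):
    """Round a record length up to the next 2^order octet boundary."""
    mask = (1 << order) - 1
    return (length + mask) & ~mask
```

This is why a hexdump of a v2 libxl stream starts with the readable bytes "LibxlFmt".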

[Xen-devel] [PATCH v2 13/27] tools/python: Verification utility for v2 stream spec compliance

2015-07-09 Thread Andrew Cooper
Signed-off-by: Andrew Cooper 
Acked-by: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 

---
This is exceedingly useful for development, but not of practical use when
installed in a production dom0.
---
 tools/python/scripts/verify-stream-v2 |  174 +
 1 file changed, 174 insertions(+)
 create mode 100755 tools/python/scripts/verify-stream-v2

diff --git a/tools/python/scripts/verify-stream-v2 
b/tools/python/scripts/verify-stream-v2
new file mode 100755
index 000..3daf257
--- /dev/null
+++ b/tools/python/scripts/verify-stream-v2
@@ -0,0 +1,174 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+""" Verify a v2 format migration stream """
+
+import sys
+import struct
+import os, os.path
+import syslog
+import traceback
+
+from xen.migration.verify import StreamError, RecordError
+from xen.migration.libxc import VerifyLibxc
+from xen.migration.libxl import VerifyLibxl
+
+fin = None # Input file/fd
+log_to_syslog = False  # Boolean - Log to syslog instead of stdout/err?
+verbose = False# Boolean - Summarise stream contents
+quiet = False  # Boolean - Suppress error printing
+
+def info(msg):
+"""Info message, routed to appropriate destination"""
+if not quiet and verbose:
+if log_to_syslog:
+for line in msg.split("\n"):
+syslog.syslog(syslog.LOG_INFO, line)
+else:
+print msg
+
+def err(msg):
+"""Error message, routed to appropriate destination"""
+if not quiet:
+if log_to_syslog:
+for line in msg.split("\n"):
+syslog.syslog(syslog.LOG_ERR, line)
+print >> sys.stderr, msg
+
+def stream_read(_ = None):
+"""Read from input"""
+return fin.read(_)
+
+def rdexact(nr_bytes):
+"""Read exactly nr_bytes from fin"""
+_ = stream_read(nr_bytes)
+if len(_) != nr_bytes:
+raise IOError("Stream truncated")
+return _
+
+def unpack_exact(fmt):
+"""Unpack a format from fin"""
+sz = struct.calcsize(fmt)
+return struct.unpack(fmt, rdexact(sz))
+
+
+def skip_xl_header():
+"""Skip over an xl header in the stream"""
+
+hdr = rdexact(32)
+if hdr != "Xen saved domain, xl format\n \0 \r":
+raise StreamError("No xl header")
+
+_, mflags, _, optlen = unpack_exact("=IIII")
+_ = rdexact(optlen)
+
+info("Processed xl header")
+
+if mflags & 2: # XL_MANDATORY_FLAG_STREAMv2
+return "libxl"
+else:
+return "libxc"
+
+def read_stream(fmt):
+""" Read an entire stream """
+
+try:
+if fmt == "xl":
+fmt = skip_xl_header()
+
+if fmt == "libxc":
+VerifyLibxc(info, stream_read).verify()
+else:
+VerifyLibxl(info, stream_read).verify()
+
+except (IOError, StreamError, RecordError):
+err("Stream Error:")
+err(traceback.format_exc())
+return 1
+
+except StandardError:
+err("Script Error:")
+err(traceback.format_exc())
+err("Please fix me")
+return 2
+
+return 0
+
+def open_file_or_fd(val, mode, buffering):
+"""
+If 'val' looks like a decimal integer, open it as an fd.  If not, try to
+open it as a regular file.
+"""
+
+fd = -1
+try:
+# Does it look like an integer?
+try:
+fd = int(val, 10)
+except ValueError:
+pass
+
+# Try to open it...
+if fd != -1:
+return os.fdopen(fd, mode, buffering)
+else:
+return open(val, mode, buffering)
+
+except StandardError, e:
+if fd != -1:
+err("Unable to open fd %d: %s: %s" %
+(fd, e.__class__.__name__, e))
+else:
+err("Unable to open file '%s': %s: %s" %
+(val, e.__class__.__name__, e))
+
+raise SystemExit(2)
+
+def main():
+""" main """
+from optparse import OptionParser
+global fin, quiet, verbose
+
+# Change stdout to be line-buffered.
+sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 1)
+
+parser = OptionParser(usage = "%prog [options]",
+  description =
+  "Verify a stream according to the v2 spec")
+
+# Optional options
+parser.add_option("-i", "--in", dest = "fin", metavar = "",
+  default = "0",
+  help = "Stream to verify (defaults to stdin)")
+parser.add_option("-v", "--verbose", action = "store_true", default = False,
+  help = "Summarise stream contents")
+parser.add_option("-q", "--quiet", action = "store_true", default = False,
+  help = "Suppress all logging/errors")
+parser.add_option("-f", "--format", dest = "format",
+  metavar = "", default = "libxc",
+  choices = ["libxc", "libxl", "xl"],
+  help = "Format of the incoming stream (defaults to libxc)")
+parser.add_op

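The xl-header handling in the script above can be hard to follow out of context. A self-contained sketch of the same logic, working on bytes, with the XL_MANDATORY_FLAG_STREAMv2 value of 2 assumed from xl's internals:

```python
import io
import struct

XL_MAGIC = b"Xen saved domain, xl format\n \0 \r"  # 32-byte legacy xl header
XL_MANDATORY_FLAG_STREAMv2 = 2  # assumed flag value

def skip_xl_header(fin):
    """Consume the xl header and report which stream format follows."""
    if fin.read(32) != XL_MAGIC:
        raise ValueError("No xl header")
    # byte order, mandatory flags, optional flags, optional data length
    _, mflags, _, optlen = struct.unpack("=IIII", fin.read(16))
    fin.read(optlen)  # discard optional data
    return "libxl" if mflags & XL_MANDATORY_FLAG_STREAMv2 else "libxc"
```

With the v2 flag set, the remainder of the stream is a libxl v2 stream; otherwise it is a bare libxc stream.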
[Xen-devel] [PATCH v2 25/27] tools/libxl: Handle checkpoint records in a libxl migration v2 stream

2015-07-09 Thread Andrew Cooper
This is the final bit of untangling for Remus.

Signed-off-by: Andrew Cooper 
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 

---
As before, Remus functionality is untested, but the new logic here should
handle failovers correctly.  The patch has changed greatly from v1, both in a
functional sense, and because of the knock-on effects from earlier changes.
---
 tools/libxl/libxl_create.c  |   27 +++
 tools/libxl/libxl_internal.h|8 
 tools/libxl/libxl_stream_read.c |   97 +++
 3 files changed, 132 insertions(+)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 2a0063a..0325bf1 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -672,6 +672,29 @@ static int store_libxl_entry(libxl__gc *gc, uint32_t domid,
 libxl_device_model_version_to_string(b_info->device_model_version));
 }
 
+/*- remus asynchronous checkpoint callback -*/
+
+static void remus_checkpoint_stream_done(
+libxl__egc *egc, libxl__stream_read_state *srs, int rc);
+
+static void libxl__remus_domain_checkpoint_callback(void *data)
+{
+libxl__save_helper_state *shs = data;
+libxl__domain_create_state *dcs = CONTAINER_OF(shs, *dcs, shs);
+libxl__egc *egc = dcs->shs.egc;
+STATE_AO_GC(dcs->ao);
+
+libxl__stream_read_start_checkpoint(egc, &dcs->srs);
+}
+
+static void remus_checkpoint_stream_done(
+libxl__egc *egc, libxl__stream_read_state *srs, int rc)
+{
+libxl__domain_create_state *dcs = CONTAINER_OF(srs, *dcs, srs);
+
+libxl__xc_domain_saverestore_async_callback_done(egc, &dcs->shs, rc);
+}
+
 /*- main domain creation -*/
 
 /* We have a linear control flow; only one event callback is
@@ -939,6 +962,8 @@ static void domcreate_bootloader_done(libxl__egc *egc,
 libxl_domain_config *const d_config = dcs->guest_config;
 const int restore_fd = dcs->restore_fd;
 libxl__domain_build_state *const state = &dcs->build_state;
+libxl__srm_restore_autogen_callbacks *const callbacks =
+&dcs->shs.callbacks.restore.a;
 
 if (rc) {
 domcreate_rebuild_done(egc, dcs, rc);
@@ -966,6 +991,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
 }
 
 /* Restore */
+callbacks->checkpoint = libxl__remus_domain_checkpoint_callback;
 
 rc = libxl__build_pre(gc, domid, d_config, state);
 if (rc)
@@ -975,6 +1001,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
 dcs->srs.fd = restore_fd;
 dcs->srs.legacy = (dcs->restore_params.stream_version == 1);
 dcs->srs.completion_callback = domcreate_stream_done;
+dcs->srs.checkpoint_callback = remus_checkpoint_stream_done;
 
 libxl__stream_read_start(egc, &dcs->srs);
 return;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 84e22c2..1b62f25 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3332,9 +3332,13 @@ struct libxl__stream_read_state {
 void (*completion_callback)(libxl__egc *egc,
 libxl__stream_read_state *srs,
 int rc);
+void (*checkpoint_callback)(libxl__egc *egc,
+libxl__stream_read_state *srs,
+int rc);
 /* Private */
 int rc;
 bool running;
+bool in_checkpoint;
 
 /* Active-stuff handling */
 int joined_rc;
@@ -3349,6 +3353,8 @@ struct libxl__stream_read_state {
 LIBXL_STAILQ_HEAD(, libxl__sr_record_buf) record_queue;
 enum {
 SRS_PHASE_NORMAL,
+SRS_PHASE_BUFFERING,
+SRS_PHASE_UNBUFFERING,
 } phase;
 bool recursion_guard;
 
@@ -3362,6 +3368,8 @@ _hidden void libxl__stream_read_start(libxl__egc *egc,
 
 _hidden void libxl__stream_read_continue(libxl__egc *egc,
  libxl__stream_read_state *stream);
+_hidden void libxl__stream_read_start_checkpoint(
+libxl__egc *egc, libxl__stream_read_state *stream);
 
 _hidden void libxl__stream_read_abort(libxl__egc *egc,
   libxl__stream_read_state *stream, int 
rc);
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 81095cd..6cfa05c 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -123,6 +123,10 @@ static int setup_read(libxl__stream_read_state *stream,
 return libxl__datacopier_start(dc);
 }
 
+/* Error handling for checkpoint mini-loop. */
+static void checkpoint_done(libxl__egc *egc,
+libxl__stream_read_state *stream, int rc);
+
 void libxl__stream_read_start(libxl__egc *egc,
   libxl__stream_read_state *stream)
 {
@@ -186,6 +190,18 @@ void libxl__stream_read_start(libxl__egc *egc,
 stream_failed(egc, stream, ret);
 }
 
+void libxl__stream_read_start_checkpoint(libxl__egc *egc,
+ libxl__stream_read_state *stream)
+{
+

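The three SRS phases added by this patch are easier to see as a toy state machine. This is only a sketch of the transitions described in the commit text (buffer records during a checkpoint, replay them after CHECKPOINT_END), not libxl's actual control flow:

```python
from enum import Enum

class Phase(Enum):
    NORMAL = "normal"            # process records as they arrive
    BUFFERING = "buffering"      # checkpoint: queue records until CHECKPOINT_END
    UNBUFFERING = "unbuffering"  # replay the queued records

def next_phase(phase, saw_checkpoint_end=False, queue_empty=False):
    """Advance the toy phase machine for one event."""
    if phase is Phase.BUFFERING and saw_checkpoint_end:
        return Phase.UNBUFFERING
    if phase is Phase.UNBUFFERING and queue_empty:
        return Phase.NORMAL
    return phase
```

In a normal (non-checkpointed) stream the machine never leaves NORMAL, which matches the "exactly 0 or 1 buffered records" behaviour noted in patch 16.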
[Xen-devel] [PATCH v2 16/27] tools/libxl: Infrastructure for reading a libxl migration v2 stream

2015-07-09 Thread Andrew Cooper
From: Ross Lagerwall 

This contains the event machinery and state machines to read and act on a
non-checkpointed migration v2 stream (with the exception of the
xc_domain_restore() handling which is spliced later in a bisectable way).

It also contains some boilerplate to help support checkpointed streams, which
shall be introduced in a later patch.

Signed-off-by: Ross Lagerwall 
Signed-off-by: Andrew Cooper 
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 

---

Large quantities of the logic here are completely overhauled since v1, mostly
as part of fixing the checkpoint buffering bug which was the cause of the
broken Remus failover.  The result is actually simpler overall; all
records are buffered in memory (there is no splicing of the emulator records
any more), with normal streams having exactly 0 or 1 records currently in the
buffer, before processing.  Remus support later will allow multiple buffered
records.
---
 tools/libxl/Makefile|1 +
 tools/libxl/libxl_internal.h|   55 +
 tools/libxl/libxl_stream_read.c |  504 +++
 3 files changed, 560 insertions(+)
 create mode 100644 tools/libxl/libxl_stream_read.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index cc9c152..c71c5fe 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -94,6 +94,7 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o 
libxl_pci.o \
libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \
libxl_internal.o libxl_utils.o libxl_uuid.o \
libxl_json.o libxl_aoutils.o libxl_numa.o libxl_vnuma.o 
\
+   libxl_stream_read.o \
libxl_save_callout.o _libxl_save_msgs_callout.o \
libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y)
 LIBXL_OBJS += libxl_genid.o
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 4bd6ea1..0f17e7c 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -19,6 +19,8 @@
 
 #include "libxl_osdeps.h" /* must come before any other headers */
 
+#include "libxl_sr_stream_format.h"
+
 #include 
 #include 
 #include 
@@ -3211,6 +3213,58 @@ typedef void libxl__domain_create_cb(libxl__egc *egc,
  libxl__domain_create_state*,
  int rc, uint32_t domid);
 
+/* State for manipulating a libxl migration v2 stream */
+typedef struct libxl__stream_read_state libxl__stream_read_state;
+
+typedef struct libxl__sr_record_buf {
+/* private to stream read helper */
+LIBXL_STAILQ_ENTRY(struct libxl__sr_record_buf) entry;
+libxl__sr_rec_hdr hdr;
+void *body;
+} libxl__sr_record_buf;
+
+struct libxl__stream_read_state {
+/* filled by the user */
+libxl__ao *ao;
+int fd;
+void (*completion_callback)(libxl__egc *egc,
+libxl__stream_read_state *srs,
+int rc);
+/* Private */
+int rc;
+bool running;
+
+/* Main stream-reading data */
+libxl__datacopier_state dc;
+libxl__sr_hdr hdr;
+libxl__sr_record_buf *incoming_record;
+LIBXL_STAILQ_HEAD(, libxl__sr_record_buf) record_queue;
+enum {
+SRS_PHASE_NORMAL,
+} phase;
+bool recursion_guard;
+
+/* Emulator blob handling */
+libxl__datacopier_state emu_dc;
+libxl__carefd *emu_carefd;
+};
+
+_hidden void libxl__stream_read_start(libxl__egc *egc,
+  libxl__stream_read_state *stream);
+
+_hidden void libxl__stream_read_continue(libxl__egc *egc,
+ libxl__stream_read_state *stream);
+
+_hidden void libxl__stream_read_abort(libxl__egc *egc,
+  libxl__stream_read_state *stream, int 
rc);
+
+static inline bool libxl__stream_read_inuse(
+const libxl__stream_read_state *stream)
+{
+return stream->running;
+}
+
+
 struct libxl__domain_create_state {
 /* filled in by user */
 libxl__ao *ao;
@@ -3227,6 +3281,7 @@ struct libxl__domain_create_state {
 libxl__stub_dm_spawn_state dmss;
 /* If we're not doing stubdom, we use only dmss.dm,
  * for the non-stubdom device model. */
+libxl__stream_read_state srs;
 libxl__save_helper_state shs;
 /* necessary if the domain creation failed and we have to destroy it */
 libxl__domain_destroy_state dds;
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
new file mode 100644
index 000..6f5d572
--- /dev/null
+++ b/tools/libxl/libxl_stream_read.c
@@ -0,0 +1,504 @@
+/*
+ * Copyright (C) 2015  Citrix Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.

[Xen-devel] [PATCH v2 22/27] docs/libxl: Introduce CHECKPOINT_END to support migration v2 remus streams

2015-07-09 Thread Andrew Cooper
In a Remus scenario, libxc will write a CHECKPOINT record, then hand ownership
of the fd to libxl.  Libxl then writes any records required and finishes with
a CHECKPOINT_END record, then hands ownership of the fd back to libxc.

Signed-off-by: Andrew Cooper 
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 
---
 docs/specs/libxl-migration-stream.pandoc |   14 +-
 tools/libxl/libxl_sr_stream_format.h |1 +
 tools/python/xen/migration/libxl.py  |   11 +++
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/docs/specs/libxl-migration-stream.pandoc 
b/docs/specs/libxl-migration-stream.pandoc
index 7235317..674bfdd 100644
--- a/docs/specs/libxl-migration-stream.pandoc
+++ b/docs/specs/libxl-migration-stream.pandoc
@@ -119,7 +119,9 @@ type 0x00000000: END
 
  0x0003: EMULATOR_CONTEXT
 
- 0x0004 - 0x7FFF: Reserved for future _mandatory_
+ 0x0004: CHECKPOINT_END
+
+ 0x0005 - 0x7FFF: Reserved for future _mandatory_
  records.
 
  0x8000 - 0x: Reserved for future _optional_
@@ -203,3 +205,13 @@ indexIndex of this emulator for the domain, if 
multiple
 
 emulator_ctx Emulator context blob.
 
+
+CHECKPOINT\_END
+---
+
+A checkpoint end record marks the end of a checkpoint in the image.
+
+ 0 1 2 3 4 5 6 7 octet
++------------------------------------------------+
+
+The end record contains no fields; its body_length is 0.
diff --git a/tools/libxl/libxl_sr_stream_format.h 
b/tools/libxl/libxl_sr_stream_format.h
index f4f790b..3f3c497 100644
--- a/tools/libxl/libxl_sr_stream_format.h
+++ b/tools/libxl/libxl_sr_stream_format.h
@@ -35,6 +35,7 @@
 #define REC_TYPE_LIBXC_CONTEXT   0x0001U
 #define REC_TYPE_XENSTORE_DATA   0x0002U
 #define REC_TYPE_EMULATOR_CONTEXT0x0003U
+#define REC_TYPE_CHECKPOINT_END  0x0004U
 
 typedef struct libxl__sr_emulator_hdr
 {
diff --git a/tools/python/xen/migration/libxl.py 
b/tools/python/xen/migration/libxl.py
index 4e1f4f8..415502e 100644
--- a/tools/python/xen/migration/libxl.py
+++ b/tools/python/xen/migration/libxl.py
@@ -36,12 +36,14 @@ REC_TYPE_end  = 0x00000000
 REC_TYPE_libxc_context= 0x0001
 REC_TYPE_xenstore_data= 0x0002
 REC_TYPE_emulator_context = 0x0003
+REC_TYPE_checkpoint_end   = 0x0004
 
 rec_type_to_str = {
 REC_TYPE_end  : "End",
 REC_TYPE_libxc_context: "Libxc context",
 REC_TYPE_xenstore_data: "Xenstore data",
 REC_TYPE_emulator_context : "Emulator context",
+REC_TYPE_checkpoint_end   : "Checkpoint end",
 }
 
 # emulator_context
@@ -176,6 +178,13 @@ class VerifyLibxl(VerifyBase):
 self.info("  Index %d, type %s" % (emu_idx, 
emulator_id_to_str[emu_id]))
 
 
+def verify_record_checkpoint_end(self, content):
+""" Checkpoint end record """
+
+if len(content) != 0:
+raise RecordError("Checkpoint end record with non-zero length")
+
+
 record_verifiers = {
 REC_TYPE_end:
 VerifyLibxl.verify_record_end,
@@ -185,4 +194,6 @@ record_verifiers = {
 VerifyLibxl.verify_record_xenstore_data,
 REC_TYPE_emulator_context:
 VerifyLibxl.verify_record_emulator_context,
+REC_TYPE_checkpoint_end:
+VerifyLibxl.verify_record_checkpoint_end,
 }
-- 
1.7.10.4



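The reserved ranges in the record-type table above split at bit 31: types below 0x80000000 are mandatory (an unrecognised one must abort the restore), while types at or above it are optional and may be skipped. A small illustrative helper:

```python
REC_TYPE_OPTIONAL = 1 << 31  # top bit marks optional records

def must_understand(rec_type):
    """True if an unrecognised record of this type must abort the stream."""
    return not (rec_type & REC_TYPE_OPTIONAL)
```

So the new CHECKPOINT_END type (0x00000004) sits in the mandatory range: an old reader that does not know it must fail rather than silently skip the checkpoint boundary.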

[Xen-devel] [PATCH v2 19/27] tools/libxc+libxl+xl: Restore v2 streams

2015-07-09 Thread Andrew Cooper
This is a complicated set of changes which must be done together for
bisectability.

 * libxl-save-helper is updated to unconditionally use libxc migration v2.
 * libxl compatibility workarounds in libxc are disabled for restore operations.
 * libxl__stream_read_start() is logically spliced into the event location
   where libxl__xc_domain_restore() used to reside.

The parameters 'hvm', 'pae', and 'superpages' were previously superfluous, and
are completely unused in migration v2. callbacks->toolstack_restore is handled
via a migration v2 record now, rather than via a callback from libxc.

NB: this change breaks Remus.  Further untangling needs to happen before Remus
will function.

Signed-off-by: Andrew Cooper 
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 

---
Since v1:
 * Drop "legacy_width" from the IDL
 * Gain a LIBXL_HAVE_ to signify support of migration v2 streams
---
 tools/libxc/Makefile|4 ++--
 tools/libxl/libxl.h |   17 ++
 tools/libxl/libxl_create.c  |   48 ---
 tools/libxl/libxl_save_helper.c |2 +-
 tools/libxl/libxl_stream_read.c |   34 +++
 tools/libxl/libxl_types.idl |1 +
 tools/libxl/xl_cmdimpl.c|7 +-
 7 files changed, 76 insertions(+), 37 deletions(-)

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index b659df4..2cd0b1a 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -64,8 +64,8 @@ GUEST_SRCS-$(CONFIG_X86) += xc_sr_save_x86_hvm.c
 GUEST_SRCS-y += xc_sr_restore.c
 GUEST_SRCS-y += xc_sr_save.c
 GUEST_SRCS-y += xc_offline_page.c xc_compression.c
-$(patsubst %.c,%.o,$(GUEST_SRCS-y)): CFLAGS += -DXG_LIBXL_HVM_COMPAT
-$(patsubst %.c,%.opic,$(GUEST_SRCS-y)): CFLAGS += -DXG_LIBXL_HVM_COMPAT
+xc_sr_save_x86_hvm.o: CFLAGS += -DXG_LIBXL_HVM_COMPAT
+xc_sr_save_x86_hvm.opic: CFLAGS += -DXG_LIBXL_HVM_COMPAT
 else
 GUEST_SRCS-y += xc_nomigrate.c
 endif
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index e9d63c9..e64a606 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -807,6 +807,23 @@
  */
 #define LIBXL_HAVE_SOCKET_BITMAP_ALLOC 1
 
+/*
+ * LIBXL_HAVE_STREAM_V2
+ *
+ * If this is defined, then the libxl_domain_create_restore() interface takes
+ * a "stream_version" parameter and supports a value of 2.
+ */
+#define LIBXL_HAVE_STREAM_V2 1
+
+/*
+ * LIBXL_HAVE_STREAM_V1
+ *
+ * In the case that LIBXL_HAVE_STREAM_V2 is set, LIBXL_HAVE_STREAM_V1
+ * indicates that libxl_domain_create_restore() can handle a "stream_version"
+ * parameter of 1, and convert the stream format automatically.
+ */
+#define LIBXL_HAVE_STREAM_V1 1
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index be13204..2a0063a 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -704,6 +704,10 @@ static void domcreate_attach_dtdev(libxl__egc *egc,
 static void domcreate_console_available(libxl__egc *egc,
 libxl__domain_create_state *dcs);
 
+static void domcreate_stream_done(libxl__egc *egc,
+  libxl__stream_read_state *srs,
+  int ret);
+
 static void domcreate_rebuild_done(libxl__egc *egc,
libxl__domain_create_state *dcs,
int ret);
@@ -933,11 +937,8 @@ static void domcreate_bootloader_done(libxl__egc *egc,
 /* convenience aliases */
 const uint32_t domid = dcs->guest_domid;
 libxl_domain_config *const d_config = dcs->guest_config;
-libxl_domain_build_info *const info = &d_config->b_info;
 const int restore_fd = dcs->restore_fd;
 libxl__domain_build_state *const state = &dcs->build_state;
-libxl__srm_restore_autogen_callbacks *const callbacks =
-&dcs->shs.callbacks.restore.a;
 
 if (rc) {
 domcreate_rebuild_done(egc, dcs, rc);
@@ -970,30 +971,16 @@ static void domcreate_bootloader_done(libxl__egc *egc,
 if (rc)
 goto out;
 
-/* read signature */
-int hvm, pae, superpages;
-switch (info->type) {
-case LIBXL_DOMAIN_TYPE_HVM:
-hvm = 1;
-superpages = 1;
-pae = libxl_defbool_val(info->u.hvm.pae);
-callbacks->toolstack_restore = libxl__toolstack_restore;
-break;
-case LIBXL_DOMAIN_TYPE_PV:
-hvm = 0;
-superpages = 0;
-pae = 1;
-break;
-default:
-rc = ERROR_INVAL;
-goto out;
-}
-libxl__xc_domain_restore(egc, dcs,
- hvm, pae, superpages);
+dcs->srs.ao = ao;
+dcs->srs.fd = restore_fd;
+dcs->srs.legacy = (dcs->restore_params.stream_version == 1);
+dcs->srs.completion_callback = domcreate_stream_done;
+
+libxl__stream_read_start(egc, &dcs->srs);
 return;
 
  out:
-libxl__xc

[Xen-devel] [PATCH v2 11/27] tools/python: Libxl migration v2 infrastructure

2015-07-09 Thread Andrew Cooper
Contains:
 * Python implementation of the libxl migration v2 records
 * Verification code for spec compliance
 * Unit tests

Signed-off-by: Andrew Cooper 
Acked-by: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 
---
 tools/python/xen/migration/libxl.py |  188 +++
 tools/python/xen/migration/tests.py |   13 +++
 2 files changed, 201 insertions(+)
 create mode 100644 tools/python/xen/migration/libxl.py

diff --git a/tools/python/xen/migration/libxl.py 
b/tools/python/xen/migration/libxl.py
new file mode 100644
index 000..4e1f4f8
--- /dev/null
+++ b/tools/python/xen/migration/libxl.py
@@ -0,0 +1,188 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+"""
+Libxl Migration v2 streams
+
+Record structures as per docs/specs/libxl-migration-stream.pandoc, and
+verification routines.
+"""
+
+import sys
+
+from struct import calcsize, unpack
+from xen.migration.verify import StreamError, RecordError, VerifyBase
+from xen.migration.libxc import VerifyLibxc
+
+# Header
+HDR_FORMAT = "!QII"
+
+HDR_IDENT = 0x4c6962786c466d74 # "LibxlFmt" in ASCII
+HDR_VERSION = 2
+
+HDR_OPT_BIT_ENDIAN = 0
+HDR_OPT_BIT_LEGACY = 1
+
+HDR_OPT_LE = (0 << HDR_OPT_BIT_ENDIAN)
+HDR_OPT_BE = (1 << HDR_OPT_BIT_ENDIAN)
+HDR_OPT_LEGACY = (1 << HDR_OPT_BIT_LEGACY)
+
+HDR_OPT_RESZ_MASK = 0xfffffffc
+
+# Records
+RH_FORMAT = "II"
+
+REC_TYPE_end              = 0x00000000
+REC_TYPE_libxc_context    = 0x00000001
+REC_TYPE_xenstore_data    = 0x00000002
+REC_TYPE_emulator_context = 0x00000003
+
+rec_type_to_str = {
+REC_TYPE_end  : "End",
+REC_TYPE_libxc_context: "Libxc context",
+REC_TYPE_xenstore_data: "Xenstore data",
+REC_TYPE_emulator_context : "Emulator context",
+}
+
+# emulator_context
+EMULATOR_CONTEXT_FORMAT = "II"
+
+EMULATOR_ID_unknown       = 0x00000000
+EMULATOR_ID_qemu_trad     = 0x00000001
+EMULATOR_ID_qemu_upstream = 0x00000002
+
+emulator_id_to_str = {
+EMULATOR_ID_unknown   : "Unknown",
+EMULATOR_ID_qemu_trad : "Qemu Traditional",
+EMULATOR_ID_qemu_upstream : "Qemu Upstream",
+}
+
+
+#
+# libxl format
+#
+
+LIBXL_QEMU_SIGNATURE = "DeviceModelRecord0002"
+LIBXL_QEMU_RECORD_HDR = "=%dsI" % (len(LIBXL_QEMU_SIGNATURE), )
+
+class VerifyLibxl(VerifyBase):
+""" Verify a Libxl v2 stream """
+
+def __init__(self, info, read):
+VerifyBase.__init__(self, info, read)
+
+
+def verify(self):
+""" Verity a libxl stream """
+
+self.verify_hdr()
+
+while self.verify_record() != REC_TYPE_end:
+pass
+
+
+def verify_hdr(self):
+""" Verify a Header """
+ident, version, options = self.unpack_exact(HDR_FORMAT)
+
+if ident != HDR_IDENT:
+raise StreamError("Bad image id: Expected 0x%x, got 0x%x"
+  % (HDR_IDENT, ident))
+
+if version != HDR_VERSION:
+raise StreamError("Unknown image version: Expected %d, got %d"
+  % (HDR_VERSION, version))
+
+if options & HDR_OPT_RESZ_MASK:
+raise StreamError("Reserved bits set in image options field: 0x%x"
+  % (options & HDR_OPT_RESZ_MASK))
+
+if ( (sys.byteorder == "little") and
+ ((options & HDR_OPT_BE) == HDR_OPT_BE) ):
+raise StreamError(
+"Stream is not native endianness - unable to validate")
+
+endian = ["little", "big"][options & HDR_OPT_BE]
+
+if options & HDR_OPT_LEGACY:
+self.info("Libxl Header: %s endian, legacy converted" % (endian, ))
+else:
+self.info("Libxl Header: %s endian" % (endian, ))
+
+
+def verify_record(self):
+""" Verify an individual record """
+rtype, length = self.unpack_exact(RH_FORMAT)
+
+if rtype not in rec_type_to_str:
+raise StreamError("Unrecognised record type %x" % (rtype, ))
+
+self.info("Libxl Record: %s, length %d"
+  % (rec_type_to_str[rtype], length))
+
+contentsz = (length + 7) & ~7
+content = self.rdexact(contentsz)
+
+padding = content[length:]
+if padding != "\x00" * len(padding):
+raise StreamError("Padding containing non0 bytes found")
+
+if rtype not in record_verifiers:
+raise RuntimeError("No verification function for libxl record '%s'"
+   % rec_type_to_str[rtype])
+else:
+record_verifiers[rtype](self, content[:length])
+
+return rtype
+
+
+def verify_record_end(self, content):
+""" End record """
+
+if len(content) != 0:
+raise RecordError("End record with non-zero length")
+
+
+def verify_record_libxc_context(self, content):
+""" Libxc context record """
+
+if len(content) != 0:
+raise RecordError("Libxc context record with non-zero length")
+
+# Verify the libxc stream, as we can't seek forwards through it
+

[Xen-devel] [PATCH v2 02/27] tools/libxc: Always compile the compat qemu variables into xc_sr_context

2015-07-09 Thread Andrew Cooper
This is safe (as the variables will simply be unused), and is required for
correct compilation when midway through untangling the libxc/libxl
interaction.

The #define is left in place to highlight that the variables can be removed
once the untangling is complete.

Signed-off-by: Andrew Cooper 
Acked-by: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 
---
 tools/libxc/xc_sr_common.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 565c5da..08c66db 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -307,10 +307,10 @@ struct xc_sr_context
 void *context;
 size_t contextsz;
 
-#ifdef XG_LIBXL_HVM_COMPAT
+/* #ifdef XG_LIBXL_HVM_COMPAT */
 uint32_t qlen;
 void *qbuf;
-#endif
+/* #endif */
 } restore;
 };
 } x86_hvm;
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 00/27] Libxl migration v2

2015-07-09 Thread Andrew Cooper
This series adds support for the libxl migration v2 stream, and untangles the
existing layering violations of the toolstack and qemu records.

It can be found on the branch "libxl-migv2-v2"
  git://xenbits.xen.org/people/andrewcoop/xen.git
  http://xenbits.xen.org/git-http/people/andrewcoop/xen.git

Major changes in v2 are being rebased over the libxl AO-abort series, and a
redesign of the internal logic to support Remus/COLO buffering and failover.

At the end of the series, legacy migration is no longer used.

The Remus code is untested by me.  All other combinations of
suspend/migrate/resume have been tested with PV and HVM guests (qemu-trad and
qemu-upstream), including 32 -> 64 bit migration (which was the underlying bug
causing us to write migration v2 in the first place).

Anyway, thoughts/comments welcome.  Please test!

~Andrew

Summary of Acks/Modified/New from v1

  N bsd-sys-queue-h-seddery: Massage `offsetof'
A   tools/libxc: Always compile the compat qemu variables into xc_sr_context
A   tools/libxl: Introduce ROUNDUP()
  N tools/libxl: Introduce libxl__kill()
AM  tools/libxl: Stash all restore parameters in domain_create_state
  N tools/libxl: Split libxl__domain_create_state.restore_fd in two
 M  tools/libxl: Extra management APIs for the save helper
AM  tools/xl: Mandatory flag indicating the format of the migration stream
docs: Libxl migration v2 stream specification
AM  tools/python: Libxc migration v2 infrastructure
AM  tools/python: Libxl migration v2 infrastructure
  N tools/python: Other migration infrastructure
AM  tools/python: Verification utility for v2 stream spec compliance
AM  tools/python: Conversion utility for legacy migration streams
 M  tools/libxl: Migration v2 stream format
 M  tools/libxl: Infrastructure for reading a libxl migration v2 stream
 M  tools/libxl: Support converting a legacy stream to a v2 stream
 M  tools/libxl: Convert a legacy stream if needed
 M  tools/libxc+libxl+xl: Restore v2 streams
 M  tools/libxl: Infrastructure for writing a v2 stream
 M  tools/libxc+libxl+xl: Save v2 streams
AM  docs/libxl: Introduce CHECKPOINT_END to support migration v2 remus streams
 M  tools/libxl: Write checkpoint records into the stream
 M  tools/libx{c,l}: Introduce restore_callbacks.checkpoint()
 M  tools/libxl: Handle checkpoint records in a libxl migration v2 stream
A   tools/libxc: Drop all XG_LIBXL_HVM_COMPAT code from libxc
A   tools/libxl: Drop all knowledge of toolstack callbacks



Andrew Cooper (23):
  tools/libxc: Always compile the compat qemu variables into xc_sr_context
  tools/libxl: Introduce ROUNDUP()
  tools/libxl: Introduce libxl__kill()
  tools/libxl: Stash all restore parameters in domain_create_state
  tools/libxl: Split libxl__domain_create_state.restore_fd in two
  tools/libxl: Extra management APIs for the save helper
  tools/xl: Mandatory flag indicating the format of the migration stream
  docs: Libxl migration v2 stream specification
  tools/python: Libxc migration v2 infrastructure
  tools/python: Libxl migration v2 infrastructure
  tools/python: Other migration infrastructure
  tools/python: Verification utility for v2 stream spec compliance
  tools/python: Conversion utility for legacy migration streams
  tools/libxl: Support converting a legacy stream to a v2 stream
  tools/libxl: Convert a legacy stream if needed
  tools/libxc+libxl+xl: Restore v2 streams
  tools/libxc+libxl+xl: Save v2 streams
  docs/libxl: Introduce CHECKPOINT_END to support migration v2 remus streams
  tools/libxl: Write checkpoint records into the stream
  tools/libx{c,l}: Introduce restore_callbacks.checkpoint()
  tools/libxl: Handle checkpoint records in a libxl migration v2 stream
  tools/libxc: Drop all XG_LIBXL_HVM_COMPAT code from libxc
  tools/libxl: Drop all knowledge of toolstack callbacks

Ian Jackson (1):
  bsd-sys-queue-h-seddery: Massage `offsetof'

Ross Lagerwall (3):
  tools/libxl: Migration v2 stream format
  tools/libxl: Infrastructure for reading a libxl migration v2 stream
  tools/libxl: Infrastructure for writing a v2 stream

 docs/specs/libxl-migration-stream.pandoc   |  217 ++
 tools/include/xen-external/bsd-sys-queue-h-seddery |2 +
 tools/libxc/Makefile   |2 -
 tools/libxc/include/xenguest.h |9 +
 tools/libxc/xc_sr_common.h |   12 +-
 tools/libxc/xc_sr_restore.c|   71 +-
 tools/libxc/xc_sr_restore_x86_hvm.c|  124 
 tools/libxc/xc_sr_save_x86_hvm.c   |   36 -
 tools/libxl/Makefile   |2 +
 tools/libxl/libxl.h|   19 +
 tools/libxl/libxl_aoutils.c|   15 +
 tools/libxl/libxl_convert_callout.c|  172 +
 tools/libxl/libxl_create.c |   86 ++-
 tools/libxl/libxl_dom.c|   65 +-
 tools/libxl/libxl_internal.h   |  192 -

[Xen-devel] [PATCH v2 01/27] bsd-sys-queue-h-seddery: Massage `offsetof'

2015-07-09 Thread Andrew Cooper
From: Ian Jackson 

For some reason BSD's queue.h uses `__offsetof'.  It expects it to
work just like offsetof.  So use offsetof.

Reported-by: Andrew Cooper 
Signed-off-by: Ian Jackson 
---
 tools/include/xen-external/bsd-sys-queue-h-seddery |2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/include/xen-external/bsd-sys-queue-h-seddery 
b/tools/include/xen-external/bsd-sys-queue-h-seddery
index 7a957e3..3f8716d 100755
--- a/tools/include/xen-external/bsd-sys-queue-h-seddery
+++ b/tools/include/xen-external/bsd-sys-queue-h-seddery
@@ -69,4 +69,6 @@ s/\b struct \s+ type \b/type/xg;
 
 s,^\#include.*sys/cdefs.*,/* $& */,xg;
 
+s,\b __offsetof \b ,offsetof,xg;
+
 s/\b( NULL )/0/xg;
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 04/27] tools/libxl: Introduce libxl__kill()

2015-07-09 Thread Andrew Cooper
as a wrapper to kill(2), and use it in preference to sendsig in
libxl_save_callout.c.

Signed-off-by: Andrew Cooper 
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 

---
Logically new in v2 - split out from a v1 change which was itself a
cherrypick-and-modify from the AO Abort series
---
 tools/libxl/libxl_aoutils.c  |   15 +++
 tools/libxl/libxl_internal.h |2 ++
 tools/libxl/libxl_save_callout.c |   10 ++
 3 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index 0931eee..274ef39 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -621,3 +621,18 @@ bool libxl__async_exec_inuse(const libxl__async_exec_state 
*aes)
 assert(time_inuse == child_inuse);
 return child_inuse;
 }
+
+void libxl__kill(libxl__gc *gc, pid_t pid, int sig, const char *what)
+{
+int r = kill(pid, sig);
+if (r) LOGE(WARN, "failed to kill() %s [%lu] (signal %d)",
+what, (unsigned long)pid, sig);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 19fc425..9147de1 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2244,6 +2244,8 @@ struct libxl__async_exec_state {
 int libxl__async_exec_start(libxl__async_exec_state *aes);
 bool libxl__async_exec_inuse(const libxl__async_exec_state *aes);
 
+void libxl__kill(libxl__gc *gc, pid_t pid, int sig, const char *what);
+
 /*- device addition/removal -*/
 
 typedef struct libxl__ao_device libxl__ao_device;
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 087c2d5..b82a5c1 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -244,12 +244,6 @@ static void run_helper(libxl__egc *egc, 
libxl__save_helper_state *shs,
 libxl__carefd_close(childs_pipes[1]);
 helper_failed(egc, shs, rc);;
 }
-static void sendsig(libxl__gc *gc, libxl__save_helper_state *shs, int sig)
-{
-int r = kill(shs->child.pid, sig);
-if (r) LOGE(WARN, "failed to kill save/restore helper [%lu] (signal %d)",
-(unsigned long)shs->child.pid, sig);
-}
 
 static void helper_failed(libxl__egc *egc, libxl__save_helper_state *shs,
   int rc)
@@ -266,7 +260,7 @@ static void helper_failed(libxl__egc *egc, 
libxl__save_helper_state *shs,
 return;
 }
 
-sendsig(gc, shs, SIGKILL);
+libxl__kill(gc, shs->child.pid, SIGKILL, "save/restore helper");
 }
 
 static void helper_stop(libxl__egc *egc, libxl__ao_abortable *abrt, int rc)
@@ -282,7 +276,7 @@ static void helper_stop(libxl__egc *egc, 
libxl__ao_abortable *abrt, int rc)
 if (!shs->rc)
 shs->rc = rc;
 
-sendsig(gc, shs, SIGTERM);
+libxl__kill(gc, shs->child.pid, SIGTERM, "save/restore helper");
 }
 
 static void helper_stdout_readable(libxl__egc *egc, libxl__ev_fd *ev,
-- 
1.7.10.4
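The warn-but-continue semantics of `libxl__kill()` (attempt the kill, log a warning rather than fail if it does not succeed) can be sketched in Python for illustration; `kill_warn` is a hypothetical name, not part of the patch:

```python
import os

def kill_warn(pid, sig, what):
    """Sketch of libxl__kill(): send sig to pid, warn (not fail) on error."""
    try:
        os.kill(pid, sig)
        return 0
    except OSError as e:
        # Mirrors the LOGE(WARN, ...) path in the C wrapper.
        print("failed to kill() %s [%d] (signal %d): %s"
              % (what, pid, sig, e.strerror))
        return -1

# Signal 0 performs an existence check without delivering a signal.
rc = kill_warn(os.getpid(), 0, "self")
```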




[Xen-devel] [PATCH v2 05/27] tools/libxl: Stash all restore parameters in domain_create_state

2015-07-09 Thread Andrew Cooper
Shortly more parameters will appear, and this saves unboxing each one.
libxl_domain_restore_params is mandatory for restore streams, and ignored for
plain creation.  The old 'checkpointed_stream' was incorrectly identified as a
private parameter when it was in fact public.

No functional change.

Signed-off-by: Andrew Cooper 
Acked-by: Ian Campbell 
Reviewed-by: Yang Hongyang 
CC: Ian Jackson 
CC: Wei Liu 

---
Since v1:
 * Gate validity on restore_fd being valid.
---
 tools/libxl/libxl_create.c   |   13 +++--
 tools/libxl/libxl_internal.h |2 +-
 tools/libxl/libxl_save_callout.c |2 +-
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index b785ddd..61515da 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1512,8 +1512,8 @@ static void domain_create_cb(libxl__egc *egc,
  int rc, uint32_t domid);
 
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
-uint32_t *domid,
-int restore_fd, int checkpointed_stream,
+uint32_t *domid, int restore_fd,
+const libxl_domain_restore_params *params,
 const libxl_asyncop_how *ao_how,
 const libxl_asyncprogress_how *aop_console_how)
 {
@@ -1526,8 +1526,9 @@ static int do_domain_create(libxl_ctx *ctx, 
libxl_domain_config *d_config,
 libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
 libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
 cdcs->dcs.restore_fd = restore_fd;
+if (restore_fd > -1)
+cdcs->dcs.restore_params = *params;
 cdcs->dcs.callback = domain_create_cb;
-cdcs->dcs.checkpointed_stream = checkpointed_stream;
 libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how);
 cdcs->domid_out = domid;
 
@@ -1553,7 +1554,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, 
libxl_domain_config *d_config,
 const libxl_asyncop_how *ao_how,
 const libxl_asyncprogress_how *aop_console_how)
 {
-return do_domain_create(ctx, d_config, domid, -1, 0,
+return do_domain_create(ctx, d_config, domid, -1, NULL,
 ao_how, aop_console_how);
 }
 
@@ -1563,8 +1564,8 @@ int libxl_domain_create_restore(libxl_ctx *ctx, 
libxl_domain_config *d_config,
 const libxl_asyncop_how *ao_how,
 const libxl_asyncprogress_how *aop_console_how)
 {
-return do_domain_create(ctx, d_config, domid, restore_fd,
-params->checkpointed_stream, ao_how, 
aop_console_how);
+return do_domain_create(ctx, d_config, domid, restore_fd, params,
+ao_how, aop_console_how);
 }
 
 /*
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 9147de1..5ab945a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3217,11 +3217,11 @@ struct libxl__domain_create_state {
 libxl_domain_config *guest_config;
 libxl_domain_config guest_config_saved; /* vanilla config */
 int restore_fd;
+libxl_domain_restore_params restore_params;
 libxl__domain_create_cb *callback;
 libxl_asyncprogress_how aop_console_how;
 /* private to domain_create */
 int guest_domid;
-int checkpointed_stream;
 libxl__domain_build_state build_state;
 libxl__bootloader_state bl;
 libxl__stub_dm_spawn_state dmss;
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index b82a5c1..80aae1b 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -60,7 +60,7 @@ void libxl__xc_domain_restore(libxl__egc *egc, 
libxl__domain_create_state *dcs,
 state->store_domid, state->console_port,
 state->console_domid,
 hvm, pae, superpages,
-cbflags, dcs->checkpointed_stream,
+cbflags, dcs->restore_params.checkpointed_stream,
 };
 
 dcs->shs.ao = ao;
-- 
1.7.10.4




[Xen-devel] [PATCH v2 03/27] tools/libxl: Introduce ROUNDUP()

2015-07-09 Thread Andrew Cooper
This is the same as is used by libxc.

Signed-off-by: Andrew Cooper 
Acked-by: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 
---
 tools/libxl/libxl_internal.h |3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 5235d25..19fc425 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -110,6 +110,9 @@
 
 #define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
 
+#define ROUNDUP(_val, _order)   \
+(((unsigned long)(_val)+(1UL<<(_order))-1) & ~((1UL<<(_order))-1))
+
 #define min(X, Y) ({ \
 const typeof (X) _x = (X);   \
 const typeof (Y) _y = (Y);   \
-- 
1.7.10.4
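As an illustrative aside (not part of the patch), the macro's behaviour, rounding a value up to the next multiple of 2^order, can be expressed and checked in Python:

```python
def roundup(val, order):
    """Python analogue of libxl's ROUNDUP(): round val up to 2**order."""
    mask = (1 << order) - 1
    return (val + mask) & ~mask

# Order 3 (multiples of 8) is the alignment used for migration v2 records.
assert roundup(5, 3) == 8
```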




[Xen-devel] [PATCH v2 08/27] tools/xl: Mandatory flag indicating the format of the migration stream

2015-07-09 Thread Andrew Cooper
Introduced at this point so the python stream conversion code has a concrete
ABI to use.  Later when libxl itself starts supporting a v2 stream, it will be
added to XL_MANDATORY_FLAG_ALL.

Signed-off-by: Andrew Cooper 
Acked-by: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 

---
v2: Expand commit message
---
 tools/libxl/xl_cmdimpl.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 971209c..26b1e7d 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -109,6 +109,7 @@
*/
 
 #define XL_MANDATORY_FLAG_JSON (1U << 0) /* config data is in JSON format */
+#define XL_MANDATORY_FLAG_STREAMv2 (1U << 1) /* stream is v2 */
 #define XL_MANDATORY_FLAG_ALL  (XL_MANDATORY_FLAG_JSON)
 struct save_file_header {
 char magic[32]; /* savefileheader_magic */
-- 
1.7.10.4
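The intended use of the mandatory-flag mask can be sketched as follows (illustrative Python, not the xl C code): a reader must refuse any save file carrying a mandatory flag it does not understand, which is why `STREAMv2` is defined now but deliberately left out of `XL_MANDATORY_FLAG_ALL` until libxl supports v2 streams.

```python
# Mirrors the #defines in xl_cmdimpl.c above.
XL_MANDATORY_FLAG_JSON     = 1 << 0  # config data is in JSON format
XL_MANDATORY_FLAG_STREAMv2 = 1 << 1  # stream is v2
XL_MANDATORY_FLAG_ALL      = XL_MANDATORY_FLAG_JSON  # v2 not yet understood

def check_mandatory(flags):
    """Reject a save file header whose mandatory flags we don't understand."""
    unknown = flags & ~XL_MANDATORY_FLAG_ALL
    if unknown:
        raise ValueError("unknown mandatory flags 0x%x" % unknown)
```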




[Xen-devel] [PATCH v2 06/27] tools/libxl: Split libxl__domain_create_state.restore_fd in two

2015-07-09 Thread Andrew Cooper
In a future patch, we shall support automatically converting a legacy stream
to a v2 stream, in which case libxc needs to read from a different fd.

Simply overwriting restore_fd does not work; the two fd's have different
circumstances.  The restore_fd needs to be returned to its original state
before libxl_domain_create_restore() returns, while in the converted case, the
fd needs allocating and deallocating appropriately.

No functional change.

Signed-off-by: Andrew Cooper 
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 

---
New in v2
---
 tools/libxl/libxl_create.c   |2 +-
 tools/libxl/libxl_internal.h |2 +-
 tools/libxl/libxl_save_callout.c |2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 61515da..be13204 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1525,7 +1525,7 @@ static int do_domain_create(libxl_ctx *ctx, 
libxl_domain_config *d_config,
 cdcs->dcs.guest_config = d_config;
 libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
 libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
-cdcs->dcs.restore_fd = restore_fd;
+cdcs->dcs.restore_fd = cdcs->dcs.libxc_fd = restore_fd;
 if (restore_fd > -1)
 cdcs->dcs.restore_params = *params;
 cdcs->dcs.callback = domain_create_cb;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 5ab945a..588cfb8 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3216,7 +3216,7 @@ struct libxl__domain_create_state {
 libxl__ao *ao;
 libxl_domain_config *guest_config;
 libxl_domain_config guest_config_saved; /* vanilla config */
-int restore_fd;
+int restore_fd, libxc_fd;
 libxl_domain_restore_params restore_params;
 libxl__domain_create_cb *callback;
 libxl_asyncprogress_how aop_console_how;
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 80aae1b..1136b79 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -48,7 +48,7 @@ void libxl__xc_domain_restore(libxl__egc *egc, 
libxl__domain_create_state *dcs,
 
 /* Convenience aliases */
 const uint32_t domid = dcs->guest_domid;
-const int restore_fd = dcs->restore_fd;
+const int restore_fd = dcs->libxc_fd;
 libxl__domain_build_state *const state = &dcs->build_state;
 
 unsigned cbflags = libxl__srm_callout_enumcallbacks_restore
-- 
1.7.10.4




[Xen-devel] [PATCH v2 07/27] tools/libxl: Extra management APIs for the save helper

2015-07-09 Thread Andrew Cooper
With migration v2, there are several moving parts needing to be juggled at
once.  This requires the error handling logic to be able to query the state of
each moving part, possibly before they have been started, and be able to
cancel them.

Signed-off-by: Andrew Cooper 
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 

---
Since v1:
 * Add an _init() function which allows _inuse() to be safe to call even
   before the save helper has started.
---
 tools/libxl/libxl_internal.h |9 +
 tools/libxl/libxl_save_callout.c |   17 ++---
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 588cfb8..4bd6ea1 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3272,6 +3272,15 @@ _hidden void libxl__xc_domain_restore(libxl__egc *egc,
 _hidden void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
int rc, int retval, int errnoval);
 
+_hidden void libxl__save_helper_init(libxl__save_helper_state *shs);
+_hidden void libxl__save_helper_abort(libxl__egc *egc,
+  libxl__save_helper_state *shs);
+
+static inline bool libxl__save_helper_inuse(const libxl__save_helper_state 
*shs)
+{
+return libxl__ev_child_inuse(&shs->child);
+}
+
 /* Each time the dm needs to be saved, we must call suspend and then save */
 _hidden int libxl__domain_suspend_device_model(libxl__gc *gc,
libxl__domain_suspend_state *dss);
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 1136b79..cd18cd2 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -146,6 +146,13 @@ void 
libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc,
 shs->egc = 0;
 }
 
+void libxl__save_helper_init(libxl__save_helper_state *shs)
+{
+libxl__ao_abortable_init(&shs->abrt);
+libxl__ev_fd_init(&shs->readable);
+libxl__ev_child_init(&shs->child);
+}
+
 /*- helper execution -*/
 
 static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
@@ -167,9 +174,7 @@ static void run_helper(libxl__egc *egc, 
libxl__save_helper_state *shs,
 shs->rc = 0;
 shs->completed = 0;
 shs->pipes[0] = shs->pipes[1] = 0;
-libxl__ao_abortable_init(&shs->abrt);
-libxl__ev_fd_init(&shs->readable);
-libxl__ev_child_init(&shs->child);
+libxl__save_helper_init(shs);
 
 shs->abrt.ao = shs->ao;
 shs->abrt.callback = helper_stop;
@@ -279,6 +284,12 @@ static void helper_stop(libxl__egc *egc, 
libxl__ao_abortable *abrt, int rc)
 libxl__kill(gc, shs->child.pid, SIGTERM, "save/restore helper");
 }
 
+void libxl__save_helper_abort(libxl__egc *egc,
+  libxl__save_helper_state *shs)
+{
+helper_stop(egc, &shs->abrt, ERROR_FAIL);
+}
+
 static void helper_stdout_readable(libxl__egc *egc, libxl__ev_fd *ev,
int fd, short events, short revents)
 {
-- 
1.7.10.4




[Xen-devel] [PATCH v2 09/27] docs: Libxl migration v2 stream specification

2015-07-09 Thread Andrew Cooper
Signed-off-by: Andrew Cooper 
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 
---
 docs/specs/libxl-migration-stream.pandoc |  205 ++
 1 file changed, 205 insertions(+)
 create mode 100644 docs/specs/libxl-migration-stream.pandoc

diff --git a/docs/specs/libxl-migration-stream.pandoc 
b/docs/specs/libxl-migration-stream.pandoc
new file mode 100644
index 000..7235317
--- /dev/null
+++ b/docs/specs/libxl-migration-stream.pandoc
@@ -0,0 +1,205 @@
+% LibXenLight Domain Image Format
+% Andrew Cooper <>
+% Draft B
+
+Introduction
+
+
+For the purposes of this document, `xl` is used as a representation of any
+implementer of the `libxl` API.  `xl` should be considered completely
+interchangeable with alternates, such as `libvirt` or `xenopsd-xl`.
+
+Purpose
+---
+
+The _domain image format_ is the context of a running domain used for
+snapshots of a domain or for transferring domains between hosts during
+migration.
+
+There are a number of problems with the domain image format used in Xen 4.5
+and earlier (the _legacy format_)
+
+* There is no `libxl` context information.  `xl` is required to send certain
+  pieces of `libxl` context itself.
+
+* The contents of the stream are passed directly through `libxl` to `libxc`.
+  The legacy `libxc` format contained some information which belonged at the
+  `libxl` level, resulting in awkward layer violation to return the
+  information back to `libxl`.
+
+* The legacy `libxc` format was inextensible, causing inextensibility in the
+  legacy `libxl` handling.
+
+This design addresses the above points, allowing for a completely
+self-contained, extensible stream with each layer responsible for its own
+appropriate information.
+
+
+Not Yet Included
+
+
+The following features are not yet fully specified and will be
+included in a future draft.
+
+* Remus
+
+* ARM
+
+
+Overview
+
+
+The image format consists of a _Header_, followed by 1 or more _Records_.
+Each record consists of a type and length field, followed by any type-specific
+data.
+
+\clearpage
+
+Header
+==
+
+The header identifies the stream as a `libxl` stream, including the version of
+this specification that it complies with.
+
+All fields in this header shall be in _big-endian_ byte order, regardless of
+the setting of the endianness bit.
+
+ 0 1 2 3 4 5 6 7 octet
++-+
+| ident   |
++---+-+
+| version   | options |
++---+-+
+
+
+Field   Description
+--- 
+ident   0x4c6962786c466d74 ("LibxlFmt" in ASCII).
+
+version 0x00000002.  The version of this specification.
+
+options bit 0: Endianness.  0 = little-endian, 1 = big-endian.
+
+bit 1: Legacy Format. If set, this stream was created by
+  the legacy conversion tool.
+
+bits 2-31: Reserved.
+
+
+The endianness shall be 0 (little-endian) for images generated on an
+i386, x86_64, or arm host.
+
+\clearpage
+
+
+Records
+===
+
+A record has a record header, type specific data and a trailing footer.  If
+`length` is not a multiple of 8, the body is padded with zeroes to align the
+end of the record on an 8 octet boundary.
+
+ 0 1 2 3 4 5 6 7 octet
++---+-+
+| type  | body_length |
++---+---+-+
+| body... |
+...
+|   | padding (0 to 7 octets) |
++---+-+
+
+
+Field        Description
+---  ---
+type 0x00000000: END
+
+ 0x00000001: LIBXC_CONTEXT
+
+ 0x00000002: XENSTORE_DATA
+
+ 0x00000003: EMULATOR_CONTEXT
+
+ 0x00000004 - 0x7FFFFFFF: Reserved for future _mandatory_
+ records.
+
+ 0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
+ records.
+
+body_length  Length in octets of the record body.
+
+body Content of the record.
+
+padding  0 to 7 octets of zeros to pad the whole record to a multiple
+ of 8 octets.
+
+
+\clearpage
+
+END
+
+
+An end record marks the end of the image, and shall be the final record
+in the stream.
+
+ 0 1 2 3 4 5 6 7 octet

Re: [Xen-devel] [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters

2015-07-09 Thread Ian Jackson
Tiejun Chen writes ("[v7][PATCH 16/16] tools: parse to enable new rdm policy 
parameters"):
> This patch parses to enable user configurable parameters to specify
> RDM resource and according policies,
> 
> Global RDM parameter:
> rdm = "strategy=host,policy=strict/relaxed"
> Per-device RDM parameter:
> pci = [ 'sbdf, rdm_policy=strict/relaxed' ]
> 
> Default per-device RDM policy is same as default global RDM policy as being
> 'relaxed'. And the per-device policy would override the global policy like
> others.

Thanks for this.  I have found a couple of things in this patch which
I would like to see improved.  See below.

Again, given how late I am, I do not feel that I should be nacking it
at this time.  You have a tools ack from Wei, so my comments are not a
blocker for this series.

But if you need to respin, please take these comments into account,
and consider which are feasible to fix in the time available.  If you
are respinning this series targeting Xen 4.7 or later, please address
all of the points I make below.

Thanks.


The first issue (which would really be relevant to the documentation
patch) is that the documentation is in a separate commit.  There are
sometimes valid reasons for doing this.  I'm not sure if they apply,
but if they do this should be explained in one of the commit
messages.  If this was done I'm afraid I have missed it.

> +}else if ( !strcmp(optkey, "rdm_policy") ) {
> +if ( !strcmp(tok, "strict") ) {
> +pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
> +} else if ( !strcmp(tok, "relaxed") ) {
> +pcidev->rdm_policy = 
> LIBXL_RDM_RESERVE_POLICY_RELAXED;
> +} else {
> +XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM 
> property"
> +  " policy: 'strict' or 'relaxed'.",
> + tok);
> +goto parse_error;
> +}

This section has coding style (whitespace) problems and long lines.
If you need to respin, please fix them.

> +for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
> +switch(state) {
> +case STATE_TYPE:
> +if (*ptr == '=') {
> +state = STATE_RDM_STRATEGY;
> +*ptr = '\0';
> +if (strcmp(tok, "strategy")) {
> +XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
> +goto parse_error;
> +}
> +tok = ptr + 1;
> +}

This code is extremely repetitive.

Really I would prefer that this parsing was done with a miniature flex
parser, rather than ad-hoc pointer arithmetic and use of strtok.

Thanks,
Ian.
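For illustration only, the kind of `key=value` parsing under discussion is straightforward to express in a higher-level sketch (hypothetical Python, not a proposal for the libxl C code, which would use a proper tokenizer as suggested above):

```python
def parse_rdm(spec):
    """Parse an RDM spec such as 'strategy=host,policy=strict' into a dict,
    rejecting any component that lacks a '=' separator."""
    out = {}
    for part in spec.split(','):
        key, sep, val = part.partition('=')
        if not sep:
            raise ValueError("missing '=' in %r" % part)
        out[key.strip()] = val.strip()
    return out
```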



Re: [Xen-devel] [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest

2015-07-09 Thread Ian Jackson
Tiejun Chen writes ("[v7][PATCH 13/16] libxl: construct e820 map with RDM 
information for HVM guest"):
> Here we'll construct a basic guest e820 table via
> XENMEM_set_memory_map. This table includes lowmem, highmem
> and RDMs if they exist, and hvmloader would need this info
> later.
> 
> Note this guest e820 table would be same as before if the
> platform has no any RDM or we disable RDM (by default).
...
>  tools/libxl/libxl_dom.c  |  5 +++
>  tools/libxl/libxl_internal.h | 24 +
>  tools/libxl/libxl_x86.c  | 83 
> 
...
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 62ef120..41da479 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>  goto out;
>  }
>  
> +if (libxl__domain_construct_e820(gc, d_config, domid, &args)) {
> +LOG(ERROR, "setting domain memory map failed");
> +goto out;
> +}

This is platform-independent code, isn't it ?  In which case this will
break the build on ARM, I think.

Would an ARM maintainer please confirm.

Aside from that I have no issues with this patch.

Ian.



Re: [Xen-devel] [v7][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary

2015-07-09 Thread Ian Jackson
Tiejun Chen writes ("[v7][PATCH 12/16] tools: introduce a new parameter to set 
a predefined rdm boundary"):
> Previously we always fixed that predefined boundary as 2G to handle
> conflicts between memory and rdm, but now this predefined boundary
> can be changed with the parameter "rdm_mem_boundary" in the .cfg file.
> 
> CC: Ian Jackson 
> CC: Stefano Stabellini 
> CC: Ian Campbell 
> CC: Wei Liu 
> Acked-by: Wei Liu 
> Signed-off-by: Tiejun Chen 

Acked-by: Ian Jackson 



Re: [Xen-devel] [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-09 Thread Ian Jackson
Tiejun Chen writes ("[v7][PATCH 11/16] tools/libxl: detect and avoid conflicts 
with RDM"):
> While building a VM, HVM domain builder provides struct hvm_info_table{}
> to help hvmloader. Currently it includes two fields to construct guest
> e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
> check them to fix any conflict with RDM.
...
> CC: Ian Jackson 
> CC: Stefano Stabellini 
> CC: Ian Campbell 
> CC: Wei Liu 
> Acked-by: Wei Liu 
> Signed-off-by: Tiejun Chen 
> Reviewed-by: Kevin Tian 

I have found a few things in this patch which I would like to see
improved.  See below.

Given how late I am with this review, I do not feel that I should be
nacking it at this time.  You have a tools ack from Wei, so my
comments are not a blocker for this series.

But if you need to respin, please take these comments into account,
and consider which are feasible to fix in the time available.  If you
are respinning this series targeting Xen 4.7 or later, please address
all of the points I make below.

Thanks.


> +int libxl__domain_device_construct_rdm(libxl__gc *gc,
> +   libxl_domain_config *d_config,
> +   uint64_t rdm_mem_boundary,
> +   struct xc_hvm_build_args *args)
...
> +uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull\
<<32);
...
> +if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE && !d_config->num_\
pcidevs)

There are quite a few of these long lines, which should be wrapped.
See tools/libxl/CODING_STYLE.

> +d_config->num_rdms = nr_entries;
> +d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
> +d_config->num_rdms * sizeof(libxl_device_rdm));

This code is remarkably similar to a function later on which adds an
rdm.  Please can you factor it out.

> +} else
> +d_config->num_rdms = 0;

Please can you put { } around the else block too.  I don't think this
mixed style is good.

> +for (j = 0; j < d_config->num_rdms; j++) {
> +if (d_config->rdms[j].start ==
> + (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT)

This construct
   (uint64_t)some_pfn << XC_PAGE_SHIFT
appears an awful lot.

I would prefer it if it were done in an inline function (or maybe a
macro).


> +libxl_domain_build_info *const info = &d_config->b_info;
> +/*
> + * Currently we fix this as 2G to guarantte how to handle
 ^

Should read "guarantee".

> +ret = libxl__domain_device_construct_rdm(gc, d_config,
> + rdm_mem_boundary,
> + &args);
> +if (ret) {
> +LOG(ERROR, "checking reserved device memory failed");
> +goto out;
> +}

`rc' should be used here rather than `ret'.  (It is unfortunate that
this function has poor style already, but it would be best not to make
it worse.)


Thanks,
Ian.



Re: [Xen-devel] [PATCH v8 1/4] pci: add PCI_SBDF and PCI_SEG macros

2015-07-09 Thread Elena Ufimtseva

- wei.l...@citrix.com wrote:

> On Thu, Jul 09, 2015 at 05:00:45PM +0100, Jan Beulich wrote:
> > >>> On 09.07.15 at 17:53,  wrote:
> > > - jbeul...@suse.com wrote:
> > >> >>> On 09.07.15 at 14:07,  wrote:
> > >> > You are right, it needs to be rebased. I can post it later rebased
> > >> > on the memory leak fix version, if you think it's a way to go.
> > >> 
> > >> I didn't look at v9 yet, and can't predict when I will be able to.
> > > 
> > > Would you like me to post v10 with memory leak patch included in the
> > > patchset before you start looking at v9?
> > 
> > If there is a dependency on the changes in the leak fix v6, then
> > this would be a good idea. If not, you can keep things as they are
> > now. I view the entire set more as a bug fix than a feature anyway,
> > and hence see no reason not to get this in after the freeze. But I'm
> > adding Wei just in case...
> > 
> 

Thanks Jan.
There is a dependency on the memory leak patch, so I will add it to this
series and squash the first patch from v9.
 
> I just looked at v9. The first three patches are quite mechanical. The
> fourth patch is relatively bigger but it's also quite straightforward
> (mostly parsing input). All in all, this series itself is
> self-contained.
> 
> I don't think OSSTest is able to test that, so it would not cause a
> visible regression on our side.
> 
> I also agree it's a bug fix. Preferably this series should be applied
> before first RC.
> 
> Wei.

Thank you Wei.
> 
> > Jan



Re: [Xen-devel] [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-09 Thread Ian Jackson
Tiejun Chen writes ("[v7][PATCH 10/16] tools: introduce some new parameters to 
set rdm policy"):
> This patch introduces user configurable parameters to specify RDM
> resource and according policies,
...

>  int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci)
>  {
> +/* We'd like to force reserve rdm specific to a device by default.*/
> +if ( pci->rdm_policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
   ^

I have just spotted that spurious whitespace.  However I won't block
this for that.

Acked-by: Ian Jackson 

(actually).

I would appreciate it if you could ensure that this is fixed in any
repost.  You may retain my ack if you do that.  Committers should feel
free to fix it on commit.

Thanks,
Ian.



[Xen-devel] [PATCH 4/9] libxl: event tests: Improve Makefile doc comment

2015-07-09 Thread Ian Jackson
Including the explanation of how to run these tests.

Signed-off-by: Ian Jackson 
---
 tools/libxl/Makefile |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index cc9c152..44a4da7 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -109,7 +109,13 @@ LIBXL_TESTS += timedereg
 # "outside libxl" file is compiled exactly like a piece of application
 # code.  They must share information via explicit libxl entrypoints.
 # Unlike proper parts of libxl, it is permissible for libxl_test_FOO.c
-# to use private global variables for its state.
+# to use private global variables for its state.  Note that all the
+# "inside" parts are compiled into a single test library, so their
+# symbol names must be unique.
+#
+# To run these tests, either use LD_PRELOAD to get libxenlight_test.so
+# loaded, or rename it to libxenlight.so so it is the target of the
+# appropriate symlinks.
 
 LIBXL_TEST_OBJS += $(foreach t, $(LIBXL_TESTS),libxl_test_$t.o)
 TEST_PROG_OBJS += $(foreach t, $(LIBXL_TESTS),test_$t.o) test_common.o
-- 
1.7.10.4




[Xen-devel] [PATCH 6/9] libxl: event tests: Provide libxl_test_fdevent

2015-07-09 Thread Ian Jackson
We are going to use this shortly.  But, it is nicely self-contained.

Signed-off-by: Ian Jackson 
---
 tools/libxl/Makefile |2 +-
 tools/libxl/libxl_test_fdevent.c |   79 ++
 tools/libxl/libxl_test_fdevent.h |   12 ++
 3 files changed, 92 insertions(+), 1 deletion(-)
 create mode 100644 tools/libxl/libxl_test_fdevent.c
 create mode 100644 tools/libxl/libxl_test_fdevent.h

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 512b0e1..b92809c 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -101,7 +101,7 @@ LIBXL_OBJS += _libxl_types.o libxl_flask.o 
_libxl_types_internal.o
 
 LIBXL_TESTS += timedereg
 LIBXL_TESTS_PROGS = $(LIBXL_TESTS)
-LIBXL_TESTS_INSIDE = $(LIBXL_TESTS)
+LIBXL_TESTS_INSIDE = $(LIBXL_TESTS) fdevent
 
 # Each entry FOO in LIBXL_TESTS has two main .c files:
 #   libxl_test_FOO.c  "inside libxl" code to support the test case
diff --git a/tools/libxl/libxl_test_fdevent.c b/tools/libxl/libxl_test_fdevent.c
new file mode 100644
index 000..2d875d9
--- /dev/null
+++ b/tools/libxl/libxl_test_fdevent.c
@@ -0,0 +1,79 @@
+/*
+ * fdevent test helper for the libxl event system
+ */
+
+#include "libxl_internal.h"
+
+#include "libxl_test_fdevent.h"
+
+typedef struct {
+libxl__ao *ao;
+libxl__ev_fd fd;
+libxl__ao_abortable abrt;
+} libxl__test_fdevent;
+
+static void fdevent_complete(libxl__egc *egc, libxl__test_fdevent *tfe,
+ int rc);
+
+static void tfe_init(libxl__test_fdevent *tfe, libxl__ao *ao)
+{
+tfe->ao = ao;
+libxl__ev_fd_init(&tfe->fd);
+libxl__ao_abortable_init(&tfe->abrt);
+}
+
+static void tfe_cleanup(libxl__gc *gc, libxl__test_fdevent *tfe)
+{
+libxl__ev_fd_deregister(gc, &tfe->fd);
+libxl__ao_abortable_deregister(&tfe->abrt);
+}
+
+static void tfe_fd_cb(libxl__egc *egc, libxl__ev_fd *ev,
+  int fd, short events, short revents)
+{
+libxl__test_fdevent *tfe = CONTAINER_OF(ev,*tfe,fd);
+STATE_AO_GC(tfe->ao);
+fdevent_complete(egc, tfe, 0);
+}
+
+static void tfe_abrt_cb(libxl__egc *egc, libxl__ao_abortable *abrt,
+int rc)
+{
+libxl__test_fdevent *tfe = CONTAINER_OF(abrt,*tfe,abrt);
+STATE_AO_GC(tfe->ao);
+fdevent_complete(egc, tfe, rc);
+}
+
+static void fdevent_complete(libxl__egc *egc, libxl__test_fdevent *tfe,
+ int rc)
+{
+STATE_AO_GC(tfe->ao);
+tfe_cleanup(gc, tfe);
+libxl__ao_complete(egc, ao, rc);
+}
+
+int libxl_test_fdevent(libxl_ctx *ctx, int fd, short events,
+   libxl_asyncop_how *ao_how)
+{
+int rc;
+libxl__test_fdevent *tfe;
+
+AO_CREATE(ctx, 0, ao_how);
+GCNEW(tfe);
+
+tfe_init(tfe, ao);
+
+rc = libxl__ev_fd_register(gc, &tfe->fd, tfe_fd_cb, fd, events);
+if (rc) goto out;
+
+tfe->abrt.ao = ao;
+tfe->abrt.callback = tfe_abrt_cb;
+rc = libxl__ao_abortable_register(&tfe->abrt);
+if (rc) goto out;
+
+return AO_INPROGRESS;
+
+ out:
+tfe_cleanup(gc, tfe);
+return AO_CREATE_FAIL(rc);
+}
diff --git a/tools/libxl/libxl_test_fdevent.h b/tools/libxl/libxl_test_fdevent.h
new file mode 100644
index 000..82a307e
--- /dev/null
+++ b/tools/libxl/libxl_test_fdevent.h
@@ -0,0 +1,12 @@
+#ifndef TEST_FDEVENT_H
+#define TEST_FDEVENT_H
+
+#include 
+
+int libxl_test_fdevent(libxl_ctx *ctx, int fd, short events,
+   libxl_asyncop_how *ao_how)
+   LIBXL_EXTERNAL_CALLERS_ONLY;
+/* This operation waits for one of the poll events to occur on fd, and
+ * then completes successfully.  (Or, it can be aborted.) */
+
+#endif /*TEST_FDEVENT_H*/
-- 
1.7.10.4




[Xen-devel] [PATCH 9/9] libxl: event tests: test_timedereg: Fix rc handling

2015-07-09 Thread Ian Jackson
In 31c836f4 "libxl: events: Permit timeouts to signal ao abort",
timeout callbacks take an extra rc argument.

In that patch the wrong assertion is made about the rc in
test_timedereg's `occurs' callback.  Fix this to make the test pass
again.

Signed-off-by: Ian Jackson 
---
 tools/libxl/libxl_test_timedereg.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_test_timedereg.c 
b/tools/libxl/libxl_test_timedereg.c
index c464663..a567db6 100644
--- a/tools/libxl/libxl_test_timedereg.c
+++ b/tools/libxl/libxl_test_timedereg.c
@@ -67,7 +67,7 @@ static void occurs(libxl__egc *egc, libxl__ev_time *ev,
 int off = ev - &et[0][0];
 LOG(DEBUG,"occurs[%d][%d] seq=%d rc=%d", off/NTIMES, off%NTIMES, seq, rc);
 
-assert(!rc);
+assert(rc == ERROR_TIMEDOUT);
 
 switch (seq) {
 case 0:
-- 
1.7.10.4


  1   2   3   4   >