from:"Jan Beulich"

Re: [Xen-devel] [PATCH for-4.5] xen: vnuma: expose vnode_to_pnode to guest

2014-11-10 Thread Jan Beulich

 On 10.11.14 at 11:51, dario.faggi...@citrix.com wrote:
 I'm 100% ok to re-start that discussion here and now... however, how
 stable should this interface be? Can't we deal with this when actually
 implementing NUMA aware ballooning and add stuff at than point, if
 necessary?

Wei's desire to have a stable interface for 4.5 and later is quite
reasonable, as we can't change the interface once a release got
shipped with it. If we were to discover additional needs later, only
backward compatible changes to the existing interface would be
allowed (and e.g. adding another field to the interface structure
would be out of scope).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [TestDay] VMX test report for Xen 4.5.0-rc1

2014-11-12 Thread Jan Beulich

 On 12.11.14 at 07:58, robert...@intel.com wrote:
 2. Failed to hotplug a VT-d device with XEN4.5-RC1
   http://bugzilla-archived.xenproject.org/bugzilla/show_bug.cgi?id=1894 

First of all I'm not sure it is really useful to use the old, discontinued
bugzilla to report bugs. I think it would be much easier if you reported
them directly (and individually) to the mailing list, with or without
triggering the creation of bugs in the new tracking system.

As to the specific bug above - nothing is being said on whether a
driver was still attached to the device at the time it was being
hot-unplugged; if there was, I'm not sure what the expected
behavior would be (i.e. on real hardware).

 3. Fail to hot add multi devices to guest
   http://bugzilla-archived.xenproject.org/bugzilla/show_bug.cgi?id=1895 

The description in contradicting itself - title and parts of it say
the operation failed, but Bug detailed description says Checking
inside guest VM, the VT-D device is attached to guest VM actually,
and works fine.

 4. Not all PFs are available if assign multi VT-d devices to Wndows guest VM
   http://bugzilla-archived.xenproject.org/bugzilla/show_bug.cgi?id=1896 

I think we were told the other day that pass-through of PFs is
not being supported by Intel. What's the purpose of reporting
bugs there?

 7. Error msg occurred after attaching multi SR-IOV VF device to running guest
   http://bugzilla-archived.xenproject.org/bugzilla/show_bug.cgi?id=1899 

This reads (except for the title) the same as 1895. Am I overlooking
something here?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [v7][RFC][PATCH 06/13] hvmloader/ram: check if guest memory is out of reserved device memory maps

2014-11-12 Thread Jan Beulich

 On 12.11.14 at 10:13, tiejun.c...@intel.com wrote:
 On 2014/11/12 17:02, Jan Beulich wrote:
 On 12.11.14 at 09:45, tiejun.c...@intel.com wrote:
 #2 flags field in each specific device of new domctl would control
 whether this device need to check/reserve its own RMRR range. But its
 not dependent on current device assignment domctl, so the user can use
 them to control which devices need to work as hotplug later, separately.

 And this could be left as a second step, in order for what needs to
 be done now to not get more complicated that necessary.


 Do you mean currently we still rely on the device assignment domctl to
 provide SBDF? So looks nothing should be changed in our policy.

 I can't connect your question to what I said. What I tried to tell you
 
 Something is misunderstanding to me.
 
 was that I don't currently see a need to make this overly complicated:
 Having the option to punch holes for all devices and (by default)
 dealing with just the devices assigned at boot may be sufficient as a
 first step. Yet (repeating just to avoid any misunderstanding) that
 makes things easier only if we decide to require device assignment to
 happen before memory getting populated (since in that case there's
 
 Here what do you mean, 'if we decide to require device assignment to 
 happen before memory getting populated'?
 
 Because -quote-
 
 In the present the device assignment is always after memory population. 
 And I also mentioned previously I double checked this sequence with printk.
 
 
 Or you already plan or deciede to change this sequence?

So it is now the 3rd time that I'm telling you that part of your
decision making as to which route to follow should be to
re-consider whether the current sequence of operations shouldn't
be changed. Please also consult with the VT-d maintainers (hint to
them: participating in this discussion publicly would be really nice)
on _all_ decisions to be made here.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] Adjust number of domains in cpupools when destroying domain

2014-11-12 Thread Jan Beulich

 On 12.11.14 at 11:46, andrew.coop...@citrix.com wrote:
 On 12/11/14 10:40, Juergen Gross wrote:
 --- a/xen/common/cpupool.c
 +++ b/xen/common/cpupool.c
 @@ -225,6 +225,35 @@ static int cpupool_destroy(struct cpupool *c)
  }
  
  /*
 + * Move domain to another cpupool
 + */
 +static int cpupool_move_domain_unlocked(struct domain *d, struct cpupool *c)
 
 This isn't an unlocked function.  It is strictly called with the
 cpupool_lock held.  Per prevailing style, it should be named
 __cpupool_move_domain().

I generally disagree to this, even if this is the prevailing style.
Double-underscore prefixed names shouldn't be used at all in our
code, as they're being reserved by the C library standard (and
the compiler is free to introduce library calls named such). But
the question of course is valid why the function name says
unlocked when it's always being called with the lock held -
locked would seem more natural in this case. But in the end
Jürgen is the maintainer of that code, so he decides.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] RFC: vNUMA project

2014-11-12 Thread Jan Beulich

 On 12.11.14 at 14:45, wei.l...@citrix.com wrote:
 On Wed, Nov 12, 2014 at 09:35:01AM +, Jan Beulich wrote:
  On 11.11.14 at 19:03, david.vra...@citrix.com wrote:
  On 11/11/14 17:36, Wei Liu wrote:
  Option #1 requires less modification to guest, because guest won't
  need to switch to new hypercall. It's unclear at this point if a guest
  asks to populate a gpfn that doesn't belong to any vnode, what Xen
  should do about it. Should it be permissive or strict? 
  
  There are XENMEMF flags to request exact node or not  -- leave it up to
  the balloon driver.  The Linux balloon driver could try exact on all
  nodes before falling back to permissive or just always try inexact.
  
  Perhaps a XENMEMF_vnode bit to indicate the node is virtual?
 
 Yes. The only bad thing here is that we don't currently check in the
 hypervisor that unknown bits are zero, i.e. code using the new flag
 will need to have a separate means to find out whether the bit is
 supported. Not a big deal of course.
 
 
 If this new bit is set and domain has vnuma, then it's valid
 (supported); otherwise it's not.
 
 To not break existing guests, we can fall back to non-vnuma hinted
 allocation when the new bit is set and vnuma is not available.

While this is valid, none of this was my point - I was talking about a
new guest running on an older hypervisor.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] RFC: vNUMA project

2014-11-12 Thread Jan Beulich

 On 12.11.14 at 15:40, wei.l...@citrix.com wrote:
 So what's the usual technique in Linux to make sure if a specific
 Xen feature is present?
 
 Jan, is it suitable to use a XENFEAT_* bit for this?

Yes, that would be the canonical way.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 3/3] x86/xen: use the maximum MFN to calculate the required DMA mask

2014-11-12 Thread Jan Beulich

 On 12.11.14 at 16:25, david.vra...@citrix.com wrote:
 +u64
 +xen_swiotlb_get_required_mask(struct device *dev)
 +{
 + u64 max_mfn;
 +
 + max_mfn = HYPERVISOR_memory_op(XENMEM_maximum_ram_page, NULL);
 +
 + return DMA_BIT_MASK(fls64(max_mfn  PAGE_SHIFT) + 1);
 +}

The value the hypercall returns is exclusive and an unsigned long.
Hence I'd suggest

u64
xen_swiotlb_get_required_mask(struct device *dev)
{
unsigned long max_mfn;

max_mfn = HYPERVISOR_memory_op(XENMEM_maximum_ram_page, NULL);

return DMA_BIT_MASK(__fls(max_mfn - 1) + 1 + PAGE_SHIFT);
}

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] libxc: Turn on the pdpe1gb cpuid flag by default

2014-11-13 Thread Jan Beulich

 On 13.11.14 at 06:31, liang.z...@intel.com wrote:

Apart from the subject being kind of wrong (you're not turning it on
by default, but only expose it if the hardware supports it), ...

 --- a/tools/libxc/xc_cpuid_x86.c
 +++ b/tools/libxc/xc_cpuid_x86.c
 @@ -192,6 +192,7 @@ static void intel_xc_cpuid_policy(
  bitmaskof(X86_FEATURE_ABM));
  regs[3] = (bitmaskof(X86_FEATURE_NX) |
  bitmaskof(X86_FEATURE_LM) |
 +bitmaskof(X86_FEATURE_PAGE1GB) |
  bitmaskof(X86_FEATURE_SYSCALL) |
  bitmaskof(X86_FEATURE_RDTSCP));
  break;

... this is incomplete: The AMD function would seem to need the
same change, and the !PAE special casing in xc_cpuid_hvm_policy()
should imo disable the flag.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [TestDay]Xen-4.5-RC1: Dom0 failed to resume from S3 power state with operations on '/sys/power/state'

2014-11-13 Thread Jan Beulich

 On 13.11.14 at 03:32, robert...@intel.com wrote:
 Hi,
 
 This is a separated report for bug 
 http://bugzilla-archived.xenproject.org/bugzilla/show_bug.cgi?id=1897 
 
 Environment:
 
 Service Arch (ia32/ia32e/IA64): ia32e
 Guest Arch (ia32/ia32e/IA64): 
 Guest OS Type (Linux/Windows):
 Change Set: Xen4.5-RC1 (git:1b068a5b485062c862c20d8ba7674faa97ee3714)
 Hardware: Ivybridge-EP, Grantley-EP
 Other:
 
 
 Bug detailed description:
 --
 Boot into Dom0 (kernel 3.17.0), execute the command echo mem  
 /sys/power/state, to make Dom0
 system get into S3 sleep state, then press any key or power button to resume 
 
 from S3
 state, but failed and system hangs up with black screen.
 
 
 Reproduce steps:
 
 1. Boot into Dom0, execute the command echo mem  /sys/power/state to make
 Dom0 get into S3 state.
 2. Press any key  or press power button to resume from S3,no resume to dom0
 
 Current result:
 
 Dom0 system cannot resume from S3 power state.

Considering the Konrad confirmed this to work on other hardware
it's likely going to be difficult to analyze this for anyone not in
possession of a machine where things don't work as expected. I.e.
again presumably a case where at least initial investigation would
ideally be done by your engineers.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] About sharing pages between Xen and guest kernel

2014-11-13 Thread Jan Beulich

 On 13.11.14 at 00:52, am...@gatech.edu wrote:
 Hello,
 
 I am trying to set up a shared page between the hypervisor and a Linux guest 
 kernel. In Xen I am doing:
 
 void *ptr = alloc_xenheap_page();
 share_xen_page_with_guest(virt_to_page(ptr), current-domain, 
 XENSHARE_writable);
 unsigned int mfn = virt_to_mfn(ptr);
 
 And my plan was to pass the mfn to the guest kernel and retrieve the pfn 
 with mfn_to_pfn(). However, this returned ~0. 
 
 So I took a look at share_xen_page_with_guest() and I noticed it calls 
 set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY), 
 where INVALID_M2P_ENTRY = ~0.
 
 My questions are:
   1. Do I have to call set_gpfn_from_mfn() after calling 
 share_xen_page_with_guest() in the hypervisor?
   2. If yes, what gpfn should I use? I am thinking I could allocate a 
 page in 
 the guest, and retrieve the pfn with page_to_pfn(), 
   and then pass that to the hypervisor. But I don’t know if I am 
 overcomplicating things.

No, that wouldn't be correct (as is implied by 2). Instead you want
to map the page based on its MFN (i.e. potentially without even
establishing a PFN for it). But I suppose you looked at other
examples in the source code, and you should have realized that
this isn't really meant to be used for arbitrary memory sharing.
Perhaps xentrace's map_tbufs() would be the closest reference to
use.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] delaying 4.4.2 and 4.3.4

2014-11-13 Thread Jan Beulich

All,

while the 3 month period since the previous stable releases would
expire at the beginning of December, looking at the number of
changes in the stable trees I don't think starting preparations right
now would make much sense. Unless I hear severe objections to
this plan, and unless 4.5 gets significantly delayed, I think we
should look at RC1s for them once 4.5 is (almost) out the door.
That'll also minimize collisions between testing needs 4.5 and the
stable releases have.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 2/3] x86: allow dma_get_required_mask() to be overridden

2014-11-13 Thread Jan Beulich

 On 13.11.14 at 11:25, jbeul...@suse.com wrote:
 On 12.11.14 at 16:25, david.vra...@citrix.com wrote:
 --- a/arch/x86/include/asm/device.h
 +++ b/arch/x86/include/asm/device.h
 @@ -13,4 +13,6 @@ struct dev_archdata {
  struct pdev_archdata {
  };
  
 +#define ARCH_HAS_DMA_GET_REQUIRED_MASK
 
 Is there a particular reason you put this here rather than in
 dma-mapping.h?

Never mind - I see you need it that way so that struct dma_map_ops
gets the get_required_mask field defined, other than e.g. ia64 which
handles through another abstraction layer.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2] xen: remove Xencomm

2014-11-13 Thread Jan Beulich

 On 13.11.14 at 17:03, t...@xen.org wrote:
 Being a feature that has only been used by ia64 and/or ppc it
 doesn't seem like we need to keep it any longer in the tree.
 
 So remove it.
 
 Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
 Signed-off-by: Tim Deegan t...@xen.org

Acked-by: Jan Beulich jbeul...@suse.com


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH RFC] EFI: allow retry of ExitBootServices() call

2014-11-14 Thread Jan Beulich

The specification is kind of vague under what conditions
ExitBootServices() may legitimately fail, requiring the OS loader to
retry:

If MapKey value is incorrect, ExitBootServices() returns
 EFI_INVALID_PARAMETER and GetMemoryMap() with ExitBootServices() must
 be called again. Firmware implementation may choose to do a partial
 shutdown of the boot services during the first call to
 ExitBootServices(). EFI OS loader should not make calls to any boot
 service function other then GetMemoryMap() after the first call to
 ExitBootServices().

While our code guarantees the map key to be valid, there are systems
where a firmware internal notification sent while processing
ExitBootServices() reportedly results in changes to the memory map.
In that case, make a best effort second try: Avoid any boot service
calls other than the two named above, with the possible exception of
error paths. Those aren't a problem, since if we end up needing to
retry, we're hosed when something goes wrong as much as if we didn't
make the retry attempt.

For x86, a minimal adjustment to efi_arch_process_memory_map() is
needed for it to cope with potentially being called a second time.

For arm64, while efi_process_memory_map_bootinfo() is easy to verify
that it can safely be called more than once without violating spec
constraints, it's not so obvious for fdt_add_uefi_nodes(), hence a
step by step approach:
- deletion of memory nodes and memory reserve map entries: the 2nd pass
  shouldn't find any as the 1st one deleted them all,
- a chosen node should be found as it got added in the 1st pass,
- the various linux,uefi-* nodes all got added during the 1st pass
  and hence only their contents may get updated.

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -140,7 +140,7 @@ static void __init efi_arch_process_memo
 
 /* Populate E820 table and check trampoline area availability. */
 e = e820map - 1;
-for ( i = 0; i  map_size; i += desc_size )
+for ( e820nr = i = 0; i  map_size; i += desc_size )
 {
 EFI_MEMORY_DESCRIPTOR *desc = map + i;
 u64 len = desc-NumberOfPages  EFI_PAGE_SHIFT;
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -703,7 +703,7 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SY
 EFI_GRAPHICS_OUTPUT_PROTOCOL *gop = NULL;
 EFI_GRAPHICS_OUTPUT_MODE_INFORMATION *mode_info;
 union string section = { NULL }, name;
-bool_t base_video = 0;
+bool_t base_video = 0, retry;
 char *option_str;
 bool_t use_cfg_file;
 
@@ -1051,17 +1051,23 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SY
 if ( !efi_memmap )
 blexit(LUnable to allocate memory for EFI memory map);
 
-status = efi_bs-GetMemoryMap(efi_memmap_size, efi_memmap, map_key,
-  efi_mdesc_size, mdesc_ver);
-if ( EFI_ERROR(status) )
-PrintErrMesg(LCannot obtain memory map, status);
+for ( retry = 0; ; retry = 1 )
+{
+status = efi_bs-GetMemoryMap(efi_memmap_size, efi_memmap, map_key,
+  efi_mdesc_size, mdesc_ver);
+if ( EFI_ERROR(status) )
+PrintErrMesg(LCannot obtain memory map, status);
 
-efi_arch_process_memory_map(SystemTable, efi_memmap, efi_memmap_size,
-efi_mdesc_size, mdesc_ver);
+efi_arch_process_memory_map(SystemTable, efi_memmap, efi_memmap_size,
+efi_mdesc_size, mdesc_ver);
 
-efi_arch_pre_exit_boot();
+efi_arch_pre_exit_boot();
+
+status = efi_bs-ExitBootServices(ImageHandle, map_key);
+if ( status != EFI_INVALID_PARAMETER || retry )
+break;
+}
 
-status = efi_bs-ExitBootServices(ImageHandle, map_key);
 if ( EFI_ERROR(status) )
 PrintErrMesg(LCannot exit boot services, status);
 



EFI: allow retry of ExitBootServices() call

The specification is kind of vague under what conditions
ExitBootServices() may legitimately fail, requiring the OS loader to
retry:

If MapKey value is incorrect, ExitBootServices() returns
 EFI_INVALID_PARAMETER and GetMemoryMap() with ExitBootServices() must
 be called again. Firmware implementation may choose to do a partial
 shutdown of the boot services during the first call to
 ExitBootServices(). EFI OS loader should not make calls to any boot
 service function other then GetMemoryMap() after the first call to
 ExitBootServices().

While our code guarantees the map key to be valid, there are systems
where a firmware internal notification sent while processing
ExitBootServices() reportedly results in changes to the memory map.
In that case, make a best effort second try: Avoid any boot service
calls other than the two named above, with the possible exception of
error paths. Those aren't a problem, since if we end up needing to
retry, we're hosed when something goes wrong as much as if we didn't
make the retry attempt.

For x86, a minimal

Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq

2014-11-14 Thread Jan Beulich

 On 14.11.14 at 14:11, li...@eikelenboom.it wrote:
 (XEN) [2014-11-14 13:00:06.810] [ Xen-4.5.0-rc  x86_64  debug=y  Not 
 tainted ]
 (XEN) [2014-11-14 13:00:06.810] CPU:3
 (XEN) [2014-11-14 13:00:06.810] RIP:e008:[82d080148f14] 
 dpci_softirq+0x9c/0x231
 (XEN) [2014-11-14 13:00:06.810] RFLAGS: 00010283   CONTEXT: hypervisor
 (XEN) [2014-11-14 13:00:06.811] rax: 00100100   rbx: 830515c86d90 
  rcx: 0001

This and ...

 (XEN) [2014-11-14 13:00:06.811] Pagetable walk from 00100108:
 (XEN) [2014-11-14 13:00:06.811]  L4[0x000] = 00055d09a063 
 (XEN) [2014-11-14 13:00:06.811]  L3[0x000] = 00055d099063 
 (XEN) [2014-11-14 13:00:06.811]  L2[0x000] = 00055d098063 
  

... this suggest it hit a poisoned list entry, due to

#define LIST_POISON1  ((void *) 0x00100100)

But without you helping out with associating the crash location with
a source line (or allowing us to do so by making xen-syms available
somewhere) this is going to remain guesswork.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq

2014-11-14 Thread Jan Beulich

 On 14.11.14 at 16:20, li...@eikelenboom.it wrote:
 If it still helps i could try Andrews suggestion and try out with only 
 commit aeeea485 ..

Yes, even if it's pretty certain it's the second of the commits, verifying
this would be helpful (or if the assumption is wrong, the pattern it's
dying with would change and hence perhaps provide further clues).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [v7][RFC][PATCH 06/13] hvmloader/ram: check if guest memory is out of reserved device memory maps

2014-11-17 Thread Jan Beulich

 On 17.11.14 at 08:57, tiejun.c...@intel.com wrote:
 --- a/xen/common/memory.c
 +++ b/xen/common/memory.c
 @@ -698,10 +698,13 @@ struct get_reserved_device_memory {
   unsigned int used_entries;
   };
 
 -static int get_reserved_device_memory(xen_pfn_t start,
 -  xen_ulong_t nr, void *ctxt)
 +static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr, u16 
 seg,
 +  u16 *ids, int cnt, void *ctxt)

While the approach is a lot better than what you did previously, I still
don't like you adding 3 new parameters when one would do (calling
the callback for each SBDF individually): That way you avoid
introducing a hidden dependency on how the VT-d code manages its
internal data.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC] EFI: allow retry of ExitBootServices() call

2014-11-17 Thread Jan Beulich

 On 14.11.14 at 16:32, konrad.w...@oracle.com wrote:
 On Fri, Nov 14, 2014 at 12:37:30PM +, Jan Beulich wrote:
 The specification is kind of vague under what conditions
 ExitBootServices() may legitimately fail, requiring the OS loader to
 retry:
 
 If MapKey value is incorrect, ExitBootServices() returns
  EFI_INVALID_PARAMETER and GetMemoryMap() with ExitBootServices() must
  be called again. Firmware implementation may choose to do a partial
  shutdown of the boot services during the first call to
  ExitBootServices(). EFI OS loader should not make calls to any boot
  service function other then GetMemoryMap() after the first call to
  ExitBootServices().
 
 While our code guarantees the map key to be valid, there are systems
 where a firmware internal notification sent while processing
 ExitBootServices() reportedly results in changes to the memory map.
 
 s/reportedly/in fact/
 In that case, make a best effort second try: Avoid any boot service
 calls other than the two named above, with the possible exception of
 error paths. Those aren't a problem, since if we end up needing to
 retry, we're hosed when something goes wrong as much as if we didn't
 make the retry attempt.
 
 For x86, a minimal adjustment to efi_arch_process_memory_map() is
 needed for it to cope with potentially being called a second time.
 
 Wow. Talk about timing. We saw this and were going to see
 doing something similar.

So what are your thoughts then regarding this patch going into 4.5?
And for the LIST_POISON* override one?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH V2] Decouple SnadyBridge quirk form VTd timeout

2014-11-17 Thread Jan Beulich

 On 17.11.14 at 14:32, donald.d.dug...@intel.com wrote:
 Hmm, good ideas.  How about I change the `snb_igd_quirk' parameter to be:
 
 snb_igd_quirk = current behavior, enable 
 quirk with 1 sec timeout
 snb_igd_quirk=default  = enable quirk with theoretical max 
 timeout of 670 msec
 snb_igd_quirk=n = enable quirk with timeout of 
 `n' msec

Please retain the current boolean behavior (i.e. =yes, =no, etc should
all continue to be permitted). And the =yes case then would become
what you propose above as =default. Plus I see no reason for the lack
of a specified value to mean 1s - if 670ms is the theoretical may, there's
no need to support the 1s one by means other than the use specifying
that value on the command line.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH V2] Decouple SnadyBridge quirk form VTd timeout

2014-11-17 Thread Jan Beulich

 On 17.11.14 at 15:51, donald.d.dug...@intel.com wrote:
 My primary goal is to decouple the IGD quirk code from the VTd timeout value, 
 the two are unrelated and shouldn't be using the same define.  In the process 
 we can clean up the IGD code so that, if a user wants, the user can specify a 
 more appropriate timeout value for the quirk code.  There's no need to change 
 the current behavior of the flag, just give the user more options.

It the VT-d maintainers agree with that, I certainly won't stand in the
way.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] libxc: Expose the pdpe1gb cpuid flag to guest

2014-11-17 Thread Jan Beulich

 On 17.11.14 at 16:39, ian.jack...@eu.citrix.com wrote:
 Liang Li writes ([PATCH] libxc: Expose the pdpe1gb cpuid flag to guest):
 If hardware support the pdpe1gb flag, expose it to guest by default.
 Users don't have to use a 'cpuid= ' option in config file to turn
 it on.
 
 I don't understand what this flag does.  I guess from the name it
 turns on 1G pages.  I guess these are supported ?
 
 I would like to see comment from an x86 hypervisor maintainer.

Yes, we support 1Gb pages. The purpose of the patch is to not
unconditionally hide the flag from guests. (I had commented on
v1, but sadly this one wasn't tagged as v2, nor was I included on
the Cc list, hence I didn't spot the new version.)

The one thing I'm not certain about is shadow mode: Only 2Mb
pages may be supported there. Tim?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC 0/7] xen: Clean-up of mem_event subsystem

2014-11-17 Thread Jan Beulich

 On 17.11.14 at 17:43, rcojoc...@bitdefender.com wrote:
 On 11/17/2014 06:37 PM, Jan Beulich wrote:
 On 12.11.14 at 16:48, andrew.coop...@citrix.com wrote:
 On 12/11/14 15:31, Tamas K Lengyel wrote:
  xen/include/public/domctl.h |  44 +--
  xen/include/public/hvm/params.h |   2 +-
  xen/include/public/mem_event.h  | 134 ---
  xen/include/public/memory.h |   6 +-
  xen/include/public/vm_event.h   | 179 +

 While in principle I think this series is a very good thing, there is a
 problem with editing the pubic header files.

 The contents of mem_event.h is not currently hidden behind #ifdef
 __XEN_TOOLS__

 As a result, it is strictly speaking part of the VM-visible public
 API/ABI and not permitted to change in a backwards incompatible manor.

 Having said that, it is currently only usable by privileged domains, so
 there is an argument to be made for declaring that it should have been
 hidden behind __XEN_TOOLS__ in the first place, making it permittable to
 change.
 
 I'm not sure I agree - the meaning of tools here would seem quite a
 bit different than e.g. in domctl.h. Looking at patch 1, I can't see how
 an old consumer (remember that for many of these we have at best
 fake consumers in the tree) would deal with the now differently
 arranged data. I don't see any versioning of the interface, and hence
 I can't see how tools would know which of the formats to expect.
 
 In the initial patch I've sent Tamas I had arranged things as follows,
 (so that the layout would stay compatible) but I think we ended up
 discussing it and deciding it would look cleaner to just re-do the whole
 thing:
 
 +struct mem_event_ept_data {
 +uint64_t gfn;
 +uint64_t offset;
 +uint64_t gla; /* if gla_valid */
 +};
 +
 +struct mem_event_cr_data {
 +uint64_t new_value;
 +uint64_t _pad;
 +uint64_t old_value;
 +};
 +
 +struct mem_event_int3_data {
 +uint64_t gfn;
 +uint64_t _pad;
 +uint64_t eip;
 +};
 +
 +struct mem_event_singlestep_data {
 +uint64_t gfn;
 +uint64_t _pad;
 +uint64_t eip;
 +};
 +
 +struct mem_event_msr_data {
 +uint64_t msr;
 +uint64_t old_value;
 +uint64_t new_value;
 +};
 +
  typedef struct mem_event_st {
  uint32_t flags;
  uint32_t vcpu_id;
 
 -uint64_t gfn;
 -uint64_t offset;
 -uint64_t gla; /* if gla_valid */
 +union {
 +struct mem_event_ept_dataept_event;
 +struct mem_event_cr_data cr_event;
 +struct mem_event_int3_data   int3_event;
 +struct mem_event_singlestep_data singlestep_event;
 +struct mem_event_msr_datamsr_event;
 +};
 
  uint32_t p2mt;
 
 Would something like this be more along the right lines?

Yes, afaic. But tool stack and mm maintainers need to be on the same
page before you should take this as the final route to follow.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC 0/7] xen: Clean-up of mem_event subsystem

2014-11-17 Thread Jan Beulich

 On 17.11.14 at 18:06, tamas.leng...@zentific.com wrote:
 On Mon, Nov 17, 2014 at 5:37 PM, Jan Beulich jbeul...@suse.com wrote:
 
  On 12.11.14 at 16:48, andrew.coop...@citrix.com wrote:
  On 12/11/14 15:31, Tamas K Lengyel wrote:
   xen/include/public/domctl.h |  44 +--
   xen/include/public/hvm/params.h |   2 +-
   xen/include/public/mem_event.h  | 134 ---
   xen/include/public/memory.h |   6 +-
   xen/include/public/vm_event.h   | 179 +
 
  While in principle I think this series is a very good thing, there is a
  problem with editing the pubic header files.
 
  The contents of mem_event.h is not currently hidden behind #ifdef
  __XEN_TOOLS__
 
  As a result, it is strictly speaking part of the VM-visible public
  API/ABI and not permitted to change in a backwards incompatible manor.
 
  Having said that, it is currently only usable by privileged domains, so
  there is an argument to be made for declaring that it should have been
  hidden behind __XEN_TOOLS__ in the first place, making it permittable to
  change.

 I'm not sure I agree - the meaning of tools here would seem quite a
 bit different than e.g. in domctl.h. Looking at patch 1, I can't see how
 an old consumer (remember that for many of these we have at best
 fake consumers in the tree) would deal with the now differently
 arranged data. I don't see any versioning of the interface, and hence
 I can't see how tools would know which of the formats to expect.
 
 The lack of versioning is a real concern which I have aired during the 4.5
 development process. For example, when we switched from HVMEM_access_* to
 XENMEM_access_* a customer had to do a bunch of manual configure checks to
 determine what is supported and what isn't. Furthermore, many of the
 related APIs have changed quite radically between Xen 4.1-4.5, some being
 abandoned midway just to resurface later in a different context. Going
 forward having a clear VM_EVENT_VERSION definition would be very helpful
 and would be the cleanest solution IMHO.

That's a concern different from mine - source compatibility may be
acceptable to get broken. Undetectable binary incompatibilities are
what worry me more.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq

2014-11-18 Thread Jan Beulich

 On 17.11.14 at 23:40, li...@eikelenboom.it wrote:
 OK, i still don't get why the output of debug-key 'i' reports +47 as pirq 
 here instead of the guest value 
 (g_gsi of for this legacy interrupt which is 40 ?), like it does when it's a 
 MSI with the PIRQ ?

No - as you said yourself, it's a matter of who uses which numbers.
Xen can't know the IRQ number the guest OS choses internally. It
can only tell you the number the hypervisor uses internally and the
one it told the guest to use to refer to it (the former is what we
commonly refer to as IRQ, while the latter is the pIRQ). Pin-based
(i.e. IO-APIC) interrupts should always use the same number for
both; MSI ones could only happen to have them match up, but
would use distinct numbers normally.

 I am puzzled by the driver binding twice to the same interrupt, but perhaps 
 that
 is just a buggy driver.
 
 Doesn't that happen more often like with integrated USB controllers ?
   17:  4  0  0  0  0  0  
 xen-pirq-ioapic-level  ehci_hcd:usb1, ehci_hcd:usb2, ehci_hcd:usb3
   18:   4385  0  0  0  0  0  
 xen-pirq-ioapic-level  ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, 
 ohci_hcd:usb7

No, these are all distinct devices - if you look at you lspci output,
you should find multiple EHCI and OHCI controller instances. It's
slightly odd that each kind gets grouped onto a single IO-APIC pin,
but that's a (legal) BIOS decision.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq

2014-11-18 Thread Jan Beulich

 On 17.11.14 at 19:01, li...@eikelenboom.it wrote:
 (XEN) [2014-11-17 17:54:18.695] CPU00:
 (XEN) [2014-11-17 17:54:18.705] CPU01:
 (XEN) [2014-11-17 17:54:18.716] d16 OK-softirq 62msec ago, state:1, 2628 
 count, 
 [prev:83054ef57e70, next:83054ef57e70] 83051b904428NULL 
 MAPPED_SHIFT GUEST_MSI_SHIFT  PIRQ:87
 (XEN) [2014-11-17 17:54:18.765] d16 OK-raise   112msec ago, state:1, 2628 
 count, [prev:00200200, next:00100100] 83051b904428NULL 
 MAPPED_SHIFT GUEST_MSI_SHIFT  PIRQ:87
 (XEN) [2014-11-17 17:54:18.815] CPU02:
 (XEN) [2014-11-17 17:54:18.825] d17 OK-softirq 500msec ago, state:1, 3439 
 count, [prev:83054ef47e70, next:83054ef47e70] 83051a1c8c28NULL 
 MAPPED_SHIFT GUEST_MSI_SHIFT  PIRQ:87
 (XEN) [2014-11-17 17:54:18.875] d17 OK-raise   549msec ago, state:1, 3439 
 count, [prev:00200200, next:00100100] 83051a1c8c28NULL 
 MAPPED_SHIFT GUEST_MSI_SHIFT  PIRQ:87
 (XEN) [2014-11-17 17:54:18.924] CPU03:
 (XEN) [2014-11-17 17:54:18.935] d16 OK-softirq 313msec ago, state:1, 3533 
 count, [prev:83054ef37e70, next:83054ef37e70] 83051b904428NULL 
 MAPPED_SHIFT GUEST_MSI_SHIFT  PIRQ:87
 (XEN) [2014-11-17 17:54:18.984] d16 OK-raise   363msec ago, state:1, 3533 
 count, [prev:00200200, next:00100100] 83051b904428NULL 
 MAPPED_SHIFT GUEST_MSI_SHIFT  PIRQ:87
 (XEN) [2014-11-17 17:54:19.034] CPU04:
 (XEN) [2014-11-17 17:54:19.044] d16 OK-softirq 359msec ago, state:1, 3691 
 count, [prev:83054ef27e88, next:83054ef27e88] 83051b904428NULL 
 MAPPED_SHIFT GUEST_MSI_SHIFT  PIRQ:87
 (XEN) [2014-11-17 17:54:19.094] d16 OK-raise   408msec ago, state:1, 3691 
 count, [prev:00200200, next:00100100] 83051b904428NULL 
 MAPPED_SHIFT GUEST_MSI_SHIFT  PIRQ:87
 (XEN) [2014-11-17 17:54:19.143] CPU05:
 (XEN) [2014-11-17 17:54:19.154] d16 OK-softirq 458msec ago, state:1, 52039 
 count, [prev:83054ef283e0, next:83054ef283e0] 
 83051b95fd28MACH_PCI_SHIFT MAPPED_SHIFT GUEST_PCI_SHIFT  PIRQ:0
 (XEN) [2014-11-17 17:54:19.205] d16 OK-raise   489msec ago, state:1, 52049 
 count, [prev:00200200, next:00100100] 
 83051b95fd28MACH_PCI_SHIFT MAPPED_SHIFT GUEST_PCI_SHIFT  PIRQ:0
 (XEN) [2014-11-17 17:54:19.257] d16 ERR-poison 561msec ago, state:0, 1 count, 
 [prev:00200200, next:00100100] 83051b95fd28MACH_PCI_SHIFT 
 MAPPED_SHIFT GUEST_PCI_SHIFT  PIRQ:0
 (XEN) [2014-11-17 17:54:19.307] d16 Z-softirq  731msec ago, state:3, 3 count, 
 [prev:83054ef283e0, next:83054ef283e0] 83051b95fd28MACH_PCI_SHIFT 
 MAPPED_SHIFT GUEST_PCI_SHIFT  PIRQ:0
 (XEN) [2014-11-17 17:54:19.356] domain_crash called from io.c:938
 (XEN) [2014-11-17 17:54:19.356] Domain 16 reported crashed by domain 32767 on 
 cpu#5:

I think what would help establishing the sequence of events would
be to at the very least calculate the times printed above relative to
a single NOW() invocation done in dump_debug() rather than
dump_record(). That may, however, yield meaningless values when
done at millisecond granularity. Hence, either using nanosecond
granularity or not using time values but a simple sequence counter
might be a desirable approach.

What puzzles me additionally is that for list_del() to encounter an
already removed entry, I'd expect the entry to (mistakenly) have
got added twice. Yet there's no sign of that (the most recent
OK-raise entry shows the list entry still having poisoned pointers).
Or it would need to have got inserted a second time on another
CPU, but the track record thereof having got overwritten. Perhaps
now that we suspect the legacy IRQ to be the problematic one the
patch could be adjusted to track only operations on non-MSI IRQs
(or separately all three PT_IRQ_TYPE_*)?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [v7][RFC][PATCH 06/13] hvmloader/ram: check if guest memory is out of reserved device memory maps

2014-11-18 Thread Jan Beulich

 On 18.11.14 at 09:16, tiejun.c...@intel.com wrote:
 On 2014/11/18 16:01, Jan Beulich wrote:
 On 18.11.14 at 04:08, tiejun.c...@intel.com wrote:
 Here I tried to implement what you want. Note just pick two key
 fragments since others have no big deal.

 #1:

 @@ -898,14 +898,25 @@ int
 intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
{
struct acpi_rmrr_unit *rmrr;
int rc = 0;
 +unsigned int i;
 +u32 id;
 +u16 bdf;

list_for_each_entry(rmrr, acpi_rmrr_units, list)
{
 -rc = func(PFN_DOWN(rmrr-base_address),
 -  PFN_UP(rmrr-end_address) - PFN_DOWN(rmrr-base_address),
 -  ctxt);
 -if ( rc )
 -break;
 +for (i = 0; (bdf = rmrr-scope.devices[i]) 
 +i  rmrr-scope.devices_cnt  !rc; i++)
 +{
 +id = PCI_SBDF(rmrr-segment, bdf);
 +rc = func(PFN_DOWN(rmrr-base_address),
 +   PFN_UP(rmrr-end_address) -
 +PFN_DOWN(rmrr-base_address),
 +   id,
 +   ctxt);
 +if ( rc  0 )
 +return rc;
 +}
 +rc = 0;

 Getting close - the main issue is that (as previously mentioned) you
 should avoid open-coding for_each_rmrr_device(). It also doesn't
 
 Sorry, are you saying these lines?
 
   +for (i = 0; (bdf = rmrr-scope.devices[i]) 
   +i  rmrr-scope.devices_cnt  !rc; i++)
 
 So without lookuping devices[i], how can we call func() for each sbdf as 
 you mentioned?

You've got both rmrr and bdf in the body of for_each_rmrr_device().
After all - as I said - you just open-coded it.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for Xen 4.5] xen/arm: Add support for GICv3 for domU

2014-11-18 Thread Jan Beulich

 On 18.11.14 at 16:00, julien.gr...@linaro.org wrote:
 On 10/31/2014 09:02 AM, Jan Beulich wrote:
 On 30.10.14 at 19:51, julien.gr...@linaro.org wrote:
 The naming suggests that the #if really should be around just the
 gic_version field (with a dummy field in the #else case to be C89
 compatible, e.g. a zero width unnamed bitfield) and the
 corresponding #define-s above, ...
 
 Not really related to this patch... but the way to improve it (via
 extending createdomain).
 
 I need to create an empty structure. Is the dummy field really needed?
 If so, did you meant?
 
 struct
 {
int :0;
 }

Yes.

 The C spec declare this kind of structure as undefined.

I can't find anything saying so.

 Would an empty structure and used it be better?

Empty structures (and unions) aren't valid in standard C afaics, up to
and including C11. That was the whole point of suggesting the above
alternative, with me (maybe wrongly) believing that this would be valid.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Problems accessing passthrough PCI device

2014-11-18 Thread Jan Beulich

 On 18.11.14 at 17:24, furryfutt...@gmail.com wrote:
 Hello Jan,
 
 Friday, November 14, 2014, 5:27:36 AM, you wrote:
 
 I implied your earlier statement to mean that. But - did you also
 verify that the three flags actually end up set (ideally from both
 DomU and Dom0 perspective)? The PCI backend may be screwing
 up things...
 
 Yes I do verify the write. How do I check this from Dom0?
 
 Just use lspci.
 
 I've just checked this with lspci. I see that the IO is being enabled.

Memory you mean.

 Any   other   idea   on   why I might be reading back 0xff for all PCI
 memory area reads? The lspci output follows.

Since this isn't behind a bridge - no, not really. Did you try this with
any other device for comparison purposes?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [v7][RFC][PATCH 06/13] hvmloader/ram: check if guest memory is out of reserved device memory maps

2014-11-19 Thread Jan Beulich

 On 19.11.14 at 02:26, tiejun.c...@intel.com wrote:
  So without lookuping devices[i], how can we call func() for each sbdf as
 you mentioned?

 You've got both rmrr and bdf in the body of for_each_rmrr_device().
 After all - as I said - you just open-coded it.

 
 Yeah, so change this again,
 
 int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
 {
  struct acpi_rmrr_unit *rmrr;
  int rc = 0;
  unsigned int i;
  u16 bdf;
 
  for_each_rmrr_device ( rmrr, bdf, i )
  {
  rc = func(PFN_DOWN(rmrr-base_address),
 PFN_UP(rmrr-end_address) -
  PFN_DOWN(rmrr-base_address),
 PCI_SBDF(rmrr-segment, bdf),
ctxt);
  /* Hit this entry so just go next. */
  if ( rc == 1 )
  i = rmrr-scope.devices_cnt;
  else if ( rc  0 )
  return rc;
  }
 
  return rc;
 }

Better. Another improvement would be make it not depend on the
internal workings of for_each_rmrr_device()... And in any case you
should not special case 1 - just return when rc is negative and skip
the rest of the current RMRR when it's positive. And of course make
the function's final return value predictable.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-20 Thread Jan Beulich

 On 20.11.14 at 02:23, sfl...@ihonk.com wrote:
 On 11/17/2014 23:54, Jan Beulich wrote:
 Another thing - now that serial logging appears to be working for
 you, did you try whether the host, once hung, still reacts to serial
 input (perhaps force input to go to Xen right at boot via the
 conswitch= option)? If so, 'd' debug-key output would likely be
 the piece of most interest.
 
 Here you go. Performed with a checkout of 9a727a81 (because it was 
 handy), let me know if you'd rather see the results from 4.5-rc2 or any 
 other Xen debugging info:

Interesting - all CPUs are executing stuff from Dom1, which be itself
is not indicating a problem, but may (considering the host hang) hint
at guest vCPU-s not getting de-scheduled anymore on one or more of
the CPUs. The 'a' debug key output would hopefully give us enough
information to know whether that's the case. Ideally do 'd' and 'a'
a couple of times each (alternating, and with a few seconds in
between).

Also, for double checking purposes, could you make the xen-syms
of a build you observed the problem with available somewhere?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v10 for-xen-4.5 2/2] dpci: Replace tasklet with an softirq

2014-11-20 Thread Jan Beulich

 On 19.11.14 at 17:44, konrad.w...@oracle.com wrote:
 On Fri, Nov 14, 2014 at 11:11:46AM -0500, Konrad Rzeszutek Wilk wrote:
 On Fri, Nov 14, 2014 at 03:13:42PM +, Jan Beulich wrote:
   On 12.11.14 at 03:23, konrad.w...@oracle.com wrote:
   +static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
   +{
   +struct domain *d = pirq_dpci-dom;
   +
   +ASSERT(spin_is_locked(d-event_lock));
   +
   +switch ( cmpxchg(pirq_dpci-state, 1  STATE_SCHED, 0) )
   +{
   +case (1  STATE_SCHED):
   +/*
   + * We are going to try to de-schedule the softirq before it 
   goes 
 in
   + * STATE_RUN. Whoever clears STATE_SCHED MUST refcount the 
   'dom'.
   + */
   +put_domain(d);
   +/* fallthrough. */
  
  Considering Sander's report, the only suspicious place I find is this
  one: When the STATE_SCHED flag is set, pirq_dpci is on some
  CPU's list. What guarantees it to get removed from that list before
  getting inserted on another one?
 
 None. The moment that STATE_SCHED is cleared, 'raise_softirq_for'
 is free to manipulate the list.
 
 I was too quick to say this. A bit more inspection shows that while
 'raise_softirq_for' is free to manipulate the list - it won't be called.
 
 The reason is that the pt_pirq_softirq_reset is called _after_ the IRQ
 action handler are removed for this IRQ. That means we will not receive
 any interrupts for it and call 'raise_softirq_for'. At least until
 'pt_irq_create_bind' is called. And said function has a check for
 this too:
 
 42  * A crude 'while' loop with us dropping the spinlock and giving
 243  * the softirq_dpci a chance to run.
 244  * We MUST check for this condition as the softirq could be scheduled
 245  * and hasn't run yet. Note that this code replaced tasklet_kill which
 246  * would have spun forever and would do the same thing (wait to flush 
 out
 247  * outstanding hvm_dirq_assist calls.
 248  */
 249 if ( pt_pirq_softirq_active(pirq_dpci) )

With that you correct your earlier reply, but I don't see how this
answers my original question.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Problems accessing passthrough PCI device

2014-11-20 Thread Jan Beulich

 On 19.11.14 at 16:12, furryfutt...@gmail.com wrote:
 This   is  getting  more  interesting.  It  seems  that  something  is
 overwriting the pci-back configuration data.
 
 Starting  from a fresh reboot I checked the Dom0 pci configuration and
 got this:
 
 root@smartin-xen:~# lspci -s 00:19.0 -x
 00:19.0 Ethernet controller: Intel Corporation Device 1559 (rev 04)
 00: 86 80 59 15 00 00 10 00 04 00 00 02 00 00 00 00
 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 00 00 00 00 00
 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20
 30: 00 00 00 00 c8 00 00 00 00 00 00 00 05 01 00 00
 
 I then start/stop my DomU and checked the Dom0 pci configuration again
 and got this:
 
 root@smartin-xen:~# lspci -s 00:19.0 -x
 00:19.0 Ethernet controller: Intel Corporation Device 1559 (rev 04)
 00: 86 80 59 15 00 00 10 00 04 00 00 02 00 00 00 00
 10: 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20
 30: 00 00 00 00 c8 00 00 00 00 00 00 00 05 01 00 00

I.e. the BARs got zapped. Not something that should happen. And
certainly explaining why you would not be able to use the device a
second time.

 Inside  my  DomU I added code to print the PCI configuration registers
 and what I get after restarting the DomU is:
 
 (d18) 14:57:04.042 src/e1000e.c@00150: 00: 86 80 59 15 00 00 10 00 04 00 00 
 02 00 00 00 00
 (d18) 14:57:04.042 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 
 00 00 00 00 00
 (d18) 14:57:04.042 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 
 00 86 80 54 20
 (d18) 14:57:04.043 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 
 00 14 01 00 00
 (d18) 14:57:04.043 src/e1000e.c@00324: Enable PCI Memory Access
 (d18) 14:57:05.043 src/e1000e.c@00150: 00: 86 80 59 15 03 00 10 00 04 00 00 
 02 00 00 00 00
 (d18) 14:57:05.044 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00  
 00 00 00 00 00
 (d18) 14:57:05.044 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 
 00 86 80 54 20
 (d18) 14:57:05.045 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 
 00 14 01 00 00
 
 As  you can see the pci configuration read from the pci-back driver by
 my DomU is different to the data in the Dom0 pci configuration!

The only byte that is different afaics is the one at 0x3c, and that's
fine. Plus comparing with what Dom0 sees before the guest was
started or after the guest was stopped isn't really meaningful -
instead you'd want to compare against what Dom0 sees while the
guest is running.

But in the end your problem may have to do with the various
warnings issued regarding unrecognized DMAR table data, all
relating to the use of namespaces that our code doesn't have
support for.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH 0/3] x86: XSA-109/110 follow-up (to be considered for 4.5)

2014-11-20 Thread Jan Beulich

1: tighten page table owner checking in do_mmu_update()
2: don't ignore foreigndom input on various MMUEXT ops
3: HVM: don't crash guest upon problems occurring in user mode

Reason to request considering this for 4.5: Tightened argument
checking (as done in the first two patches) reduces the chances
of them getting abused. Not unduly crashing the guest (as done
in the third one) may avoid future security issues of guest user
mode affecting the guest kernel.

Signed-off-by: Jan Beulich jbeul...@suse.com


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH 1/3] x86: tighten page table owner checking in do_mmu_update()

2014-11-20 Thread Jan Beulich

MMU_MACHPHYS_UPDATE, not manipulating page tables, shouldn't ignore
a bad page table domain being specified.

Also pt_owner can't be NULL when reaching the out label, so the
respective check can be dropped.

Signed-off-by: Jan Beulich jbeul...@suse.com
Acked-by: Tim Deegan t...@xen.org

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3618,6 +3618,11 @@ long do_mmu_update(
 break;
 
 case MMU_MACHPHYS_UPDATE:
+if ( unlikely(d != pt_owner) )
+{
+rc = -EPERM;
+break;
+}
 
 mfn = req.ptr  PAGE_SHIFT;
 gpfn = req.val;
@@ -3694,7 +3699,7 @@ long do_mmu_update(
 perfc_add(num_page_updates, i);
 
  out:
-if ( pt_owner  (pt_owner != d) )
+if ( pt_owner != d )
 rcu_unlock_domain(pt_owner);
 
 /* Add incremental work we have done to the @done output parameter. */



x86: tighten page table owner checking in do_mmu_update()

MMU_MACHPHYS_UPDATE, not manipulating page tables, shouldn't ignore
a bad page table domain being specified.

Also pt_owner can't be NULL when reaching the out label, so the
respective check can be dropped.

Signed-off-by: Jan Beulich jbeul...@suse.com
Acked-by: Tim Deegan t...@xen.org

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3618,6 +3618,11 @@ long do_mmu_update(
 break;
 
 case MMU_MACHPHYS_UPDATE:
+if ( unlikely(d != pt_owner) )
+{
+rc = -EPERM;
+break;
+}
 
 mfn = req.ptr  PAGE_SHIFT;
 gpfn = req.val;
@@ -3694,7 +3699,7 @@ long do_mmu_update(
 perfc_add(num_page_updates, i);
 
  out:
-if ( pt_owner  (pt_owner != d) )
+if ( pt_owner != d )
 rcu_unlock_domain(pt_owner);
 
 /* Add incremental work we have done to the @done output parameter. */
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH 2/3] x86: don't ignore foreigndom input on various MMUEXT ops

2014-11-20 Thread Jan Beulich

Instead properly fail requests that shouldn't be issued on foreign
domains or - for MMUEXT_{CLEAR,COPY}_PAGE - extend the existing
operation to work that way.

In the course of doing this the need to always clear okay even when
wanting an error code other than -EINVAL became unwieldy, so the
respective logic is being adjusted at once, together with a little
other related cleanup.

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3062,23 +3062,23 @@ long do_mmuext_op(
 }
 
 case MMUEXT_NEW_BASEPTR:
-if ( paging_mode_translate(d) )
-okay = 0;
+if ( unlikely(d != pg_owner) )
+rc = -EPERM;
+else if ( unlikely(paging_mode_translate(d)) )
+rc = -EINVAL;
 else
-{
 rc = new_guest_cr3(op.arg1.mfn);
-okay = !rc;
-}
 break;
 
 case MMUEXT_NEW_USER_BASEPTR: {
 unsigned long old_mfn;
 
-if ( paging_mode_translate(current-domain) )
-{
-okay = 0;
+if ( unlikely(d != pg_owner) )
+rc = -EPERM;
+else if ( unlikely(paging_mode_translate(d)) )
+rc = -EINVAL;
+if ( unlikely(rc) )
 break;
-}
 
 old_mfn = pagetable_get_pfn(curr-arch.guest_table_user);
 /*
@@ -3136,12 +3136,17 @@ long do_mmuext_op(
 }
 
 case MMUEXT_TLB_FLUSH_LOCAL:
-flush_tlb_local();
+if ( likely(d == pg_owner) )
+flush_tlb_local();
+else
+rc = -EPERM;
 break;
 
 case MMUEXT_INVLPG_LOCAL:
-if ( !paging_mode_enabled(d) 
- || paging_invlpg(curr, op.arg1.linear_addr) != 0 )
+if ( unlikely(d != pg_owner) )
+rc = -EPERM;
+else if ( !paging_mode_enabled(d) ||
+  paging_invlpg(curr, op.arg1.linear_addr) != 0 )
 flush_tlb_one_local(op.arg1.linear_addr);
 break;
 
@@ -3150,13 +3155,16 @@ long do_mmuext_op(
 {
 cpumask_t pmask;
 
-if ( unlikely(vcpumask_to_pcpumask(d,
-guest_handle_to_param(op.arg2.vcpumask, 
const_void),
-pmask)) )
-{
-okay = 0;
+if ( unlikely(d != pg_owner) )
+rc = -EPERM;
+else if ( unlikely(vcpumask_to_pcpumask(d,
+   guest_handle_to_param(op.arg2.vcpumask,
+ const_void),
+   pmask)) )
+rc = -EINVAL;
+if ( unlikely(rc) )
 break;
-}
+
 if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )
 flush_tlb_mask(pmask);
 else
@@ -3165,18 +3173,26 @@ long do_mmuext_op(
 }
 
 case MMUEXT_TLB_FLUSH_ALL:
-flush_tlb_mask(d-domain_dirty_cpumask);
+if ( likely(d == pg_owner) )
+flush_tlb_mask(d-domain_dirty_cpumask);
+else
+rc = -EPERM;
 break;
 
 case MMUEXT_INVLPG_ALL:
-flush_tlb_one_mask(d-domain_dirty_cpumask, op.arg1.linear_addr);
+if ( likely(d == pg_owner) )
+flush_tlb_one_mask(d-domain_dirty_cpumask, 
op.arg1.linear_addr);
+else
+rc = -EPERM;
 break;
 
 case MMUEXT_FLUSH_CACHE:
-if ( unlikely(!cache_flush_permitted(d)) )
+if ( unlikely(d != pg_owner) )
+rc = -EPERM;
+else if ( unlikely(!cache_flush_permitted(d)) )
 {
 MEM_LOG(Non-physdev domain tried to FLUSH_CACHE.);
-okay = 0;
+rc = -EACCES;
 }
 else
 {
@@ -3185,8 +3201,8 @@ long do_mmuext_op(
 break;
 
 case MMUEXT_FLUSH_CACHE_GLOBAL:
-if ( unlikely(foreigndom != DOMID_SELF) )
-okay = 0;
+if ( unlikely(d != pg_owner) )
+rc = -EPERM;
 else if ( likely(cache_flush_permitted(d)) )
 {
 unsigned int cpu;
@@ -3211,7 +3227,9 @@ long do_mmuext_op(
 unsigned long ptr  = op.arg1.linear_addr;
 unsigned long ents = op.arg2.nr_ents;
 
-if ( paging_mode_external(d) )
+if ( unlikely(d != pg_owner) )
+rc = -EPERM;
+else if ( paging_mode_external(d) )
 {
 MEM_LOG(ignoring SET_LDT hypercall from external domain);
 okay = 0;
@@ -3237,7 +3255,7 @@ long do_mmuext_op(
 case MMUEXT_CLEAR_PAGE: {
 struct page_info *page;
 
-page = get_page_from_gfn

Re: [Xen-devel] [for-xen-4.5 PATCH v2 1/2] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.

2014-11-20 Thread Jan Beulich

 On 19.11.14 at 23:21, konrad.w...@oracle.com wrote:
 @@ -97,13 +97,15 @@ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci 
 *pirq_dpci)
  }
  
  /*
 - * Reset the pirq_dpci-dom parameter to NULL.
 + * Cancels an outstanding pirq_dpci (if scheduled). Also if clear is set,
 + * reset pirq_dpci-dom parameter to NULL (used for teardown).
   *
   * This function checks the different states to make sure it can do it
   * at the right time. If it unschedules the 'hvm_dirq_assist' from running
   * it also refcounts (which is what the softirq would have done) properly.
   */
 -static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
 +static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
 +   unsigned int clear)

Looks reasonable apart from this wanting to be bool_t, and apart
from the fact that iiuc it still doesn't eliminate the observed crashes
(in which case the fix for that may better be folded into here). I'm
not really convinced that this fix is what you currently have as
patch 2 (as the previously mentioned problem of _reset() [now
_cancel()] clearing STATE_SCHED without removing the list entry
from the list is now becoming even more obvious), but I'll also
comment there.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.

2014-11-20 Thread Jan Beulich

 On 19.11.14 at 23:21, konrad.w...@oracle.com wrote:

Leaving aside the question of whether this is the right approach, in
case it is a couple of comments:

 @@ -85,7 +91,7 @@ static void raise_softirq_for(struct hvm_pirq_dpci 
 *pirq_dpci)
   */
  bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
  {
 -if ( pirq_dpci-state  ((1  STATE_RUN) | (1  STATE_SCHED)) )
 +if ( pirq_dpci-state  ((1  STATE_RUN) | (1  STATE_SCHED) | (1  
 STATE_ZOMBIE) ) )

This is becoming unwieldy - perhaps better just if ( pirq_dpci-state )?

 @@ -122,6 +128,7 @@ static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci 
 *pirq_dpci,
  /* fallthrough. */
  case (1  STATE_RUN):
  case (1  STATE_RUN) | (1  STATE_SCHED):
 +case (1  STATE_RUN) | (1  STATE_SCHED) | (1  STATE_ZOMBIE):

How would we get into this state? Afaict STATE_ZOMBIE can't be set
at the same time as STATE_SCHED.

 @@ -786,6 +793,7 @@ unlock:
  static void dpci_softirq(void)
  {
  unsigned int cpu = smp_processor_id();
 +unsigned int reset = 0;

bool_t (and would better be inside the loop, avoiding the pointless
re-init on the last iteration).

But taking it as a whole - aren't we now at risk of losing an interrupt
instance the guest expects, due to raise_softirq_for() bailing on
zombie entries, and dpci_softirq() being the only place where the
zombie state gets cleared?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 4/4] x86/xen: use the maximum MFN to calculate the required DMA mask

2014-11-20 Thread Jan Beulich

 On 19.11.14 at 17:02, david.vra...@citrix.com wrote:
 On a Xen PV guest the DMA addresses and physical addresses are not 1:1
 (such as Xen PV guests) and the generic dma_get_required_mask() does
 not return the correct mask (since it uses max_pfn).
 
 Some device drivers (such as mptsas, mpt2sas) use
 dma_get_required_mask() to set the device's DMA mask to allow them to
 use only 32-bit DMA addresses in hardware structures.  This results in
 unnecessary use of the SWIOTLB if DMA addresses are more than 32-bits,
 impacting performance significantly.
 
 Provide a get_required_mask op that uses the maximum MFN to calculate
 the DMA mask.
 
 Signed-off-by: David Vrabel david.vra...@citrix.com

Reviewed-by: Jan Beulich jbeul...@suse.com


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 3/3] x86/HVM: don't crash guest upon problems occurring in user mode

2014-11-20 Thread Jan Beulich

 On 20.11.14 at 12:34, t...@xen.org wrote:
 At 11:13 +0100 on 20 Nov (1416478386), Jan Beulich wrote:
 This extends commit 5283b310 (x86/HVM: only kill guest when unknown VM
 exit occurred in guest kernel mode) to further cases, including the
 failed VM entry one that XSA-110 was needed to be issued for.
 
 Signed-off-by: Jan Beulich jbeul...@suse.com
 
 This seems like a good idea in general, but I'm not sure it's
 appropriate for _all_ of these.  Unhandled exit types and 
 overlong instruction decode seem obviously good. 
 
 hvm_hap_nested_page_fault() returns 0: seems only to happen for pvh
 guests that write to read-only memory (?).  That seems like a
 different class of failure.  I don't think our response should be
 different based on the privilege level here, although domain_crash()
 does seem harsh.  (I presume this is to avoid emulating an instruction
 in PVH mode?)  If we're changing this, I think it should be to #GP
 rather than #UD.

I dropped those for now. We should re-evaluate this once we
have #VE support on VMX (i.e. we may want to inject that one
there instead).

 p2m_pt_handle_deferred_changes() returns  0: AFAICS this is basically
 ENOMEM when trying to update p2m tables.  It's so unlikely to be
 caused by userspace activity that disguising it with #UD is probably
 just unhelpful.  It turns a clean failure into an undebuggable
 intermittent glitch.

Dropped too.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC for 4.6] xen: Extend DOMCTL createdomain to support arch configuration

2014-11-20 Thread Jan Beulich

 On 18.11.14 at 20:20, julien.gr...@linaro.org wrote:
 --- a/xen/arch/x86/setup.c
 +++ b/xen/arch/x86/setup.c
 @@ -540,6 +540,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
  int i, j, e820_warn = 0, bytes = 0;
  bool_t acpi_boot_table_init_done = 0;
  struct domain *dom0;
 +struct arch_domainconfig config;
  struct ns16550_defaults ns16550 = {
  .data_bits = 8,
  .parity= 'n',
 @@ -1347,7 +1348,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
  domcr_flags |= DOMCRF_pvh | DOMCRF_hap;
  
  /* Create initial domain 0. */
 -dom0 = domain_create(0, domcr_flags, 0);
 +memset(config, 0, sizeof(config));
 +dom0 = domain_create(0, domcr_flags, 0, config);

Why not NULL like almost everywhere else?

 --- a/xen/include/public/arch-x86/xen.h
 +++ b/xen/include/public/arch-x86/xen.h
 @@ -228,6 +228,11 @@ struct arch_shared_info {
  };
  typedef struct arch_shared_info arch_shared_info_t;
  
 +struct arch_domainconfig {
 +char dummy;
 +};
 +typedef struct arch_domainconfig arch_domainconfig_t;

I think we decided that, regardless of all bad precedents, we'd stop
adding things without xen_ prefix to the public interface. I'm also
rather uncertain about all these typedef-s - I think we could very
well get away without adding any new ones (if internally to the
hypervisor or tools they turn out to make the code much better
readable, each such component could still introduce one on its own).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [v7][RFC][PATCH 06/13] hvmloader/ram: check if guest memory is out of reserved device memory maps

2014-11-20 Thread Jan Beulich

 On 20.11.14 at 15:40, kevin.t...@intel.com wrote:
  From: Tian, Kevin
 Sent: Thursday, November 20, 2014 4:51 PM
  From: Jan Beulich [mailto:jbeul...@suse.com]
  Sent: Thursday, November 20, 2014 4:04 PM
   On 20.11.14 at 08:45, kevin.t...@intel.com wrote:
   Current option sounds a reasonable one, i.e. passing a list of BDFs
   assigned to this VM before populating p2m, and then having
   hypervisor to filter out reserved regions associated with those
   BDFs. This way libxc teaches Xen to create reserved regions once,
   and then later the filtered info is returned upon query.

  Reasonable, but partly redundant. The positive aspect being that
  it permits this list and the list of actually being assigned devices to
  be different, i.e. allowing holes to be set up for devices that only
  _may_ get assigned at some point.

 redundant if we think the list only exactly matching the statically
 assigned devices. but that's just current point.

 reasonable if we think there may be other policies impacting the list
 (e.g. if hotplugable device may have a config option and then we have
 a potential list larger than static assigned devices). From this angle
 I think the new interface actually makes sense for this very purpose.

 Jan are you OK with this?

I can live with it, yes.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.

2014-11-20 Thread Jan Beulich

 On 20.11.14 at 20:51, konrad.w...@oracle.com wrote:
 @@ -669,7 +670,7 @@ static void hvm_dirq_assist(struct domain *d, struct 
 hvm_pirq_dpci *pirq_dpci)
  ASSERT(d-arch.hvm_domain.irq.dpci);
  
  spin_lock(d-event_lock);
 -if ( pirq_dpci-state )
 +if ( test_and_clear_bool(pirq_dpci-masked) )
  {
  struct pirq *pirq = dpci_pirq(pirq_dpci);
  const struct dev_intx_gsi_link *digl;

So this now guards solely against the timeout enforced EOI? Why do
you no longer need to guard against cancellation (i.e. why isn't this
looking at both -state and -masked)?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [v7][RFC][PATCH 04/13] hvmloader/util: get reserved device memory maps

2014-11-20 Thread Jan Beulich

 On 21.11.14 at 08:43, kevin.t...@intel.com wrote:
  From: Chen, Tiejun
 Sent: Friday, November 21, 2014 2:26 PM
 
 On 2014/11/3 18:02, Jan Beulich wrote:
  On 03.11.14 at 10:55, tiejun.c...@intel.com wrote:
  On 2014/11/3 17:45, Jan Beulich wrote:
  On 03.11.14 at 10:32, tiejun.c...@intel.com wrote:
  On 2014/11/3 16:53, Jan Beulich wrote:
  On 03.11.14 at 03:22, tiejun.c...@intel.com wrote:
  On 2014/10/31 16:14, Jan Beulich wrote:
  On 31.10.14 at 03:20, tiejun.c...@intel.com wrote:
  On 2014/10/30 17:13, Jan Beulich wrote:
  On 30.10.14 at 06:55, tiejun.c...@intel.com wrote:
  On 2014/10/29 17:05, Jan Beulich wrote:
  On 29.10.14 at 07:54, tiejun.c...@intel.com wrote:
  Looks I can remove those stuff from util.h and just add 'extern'
 to them
  when we really need them.
 
  Please stop thinking this way. Declarations for things defined
 in .c
  files are to be present in headers, and the defining .c file has 
  to
  include that header (making sure declaration and definition are
 and
  remain in sync). I hate having to again repeat my remark that
 you
  shouldn't forget it's not application code that you're modifying.
  Robust and maintainable code are a requirement in the
 hypervisor
  (and, as said it being an extension of it, hvmloader). Which - 
  just
  to avoid any misunderstanding - isn't to say that this shouldn't
 also
  apply to application code. It's just that in the hypervisor and
 kernel
  (and certain other code system components) the consequences
 of
  being lax are much more severe.
 
  Okay. But currently, the pci.c file already include 'util.h' and
  'xen/memory.h,
 
  #include util.h
  ...
  #include xen/memory.h
 
  We can't redefine struct xen_reserved_device_memory in util.h.
 
  Redefine? I said forward declare.
 
  Seems we just need to declare
 hvm_get_reserved_device_memory_map() in
  the head file, tools/firmware/hvmloader/util.h,
 
  unsigned int hvm_get_reserved_device_memory_map(void);
 
  To me this looks very much like poor programming style, even if in
  the context of hvmloader communicating information via global
  variables rather than function arguments and return values is
 
  Do you mean you don't like a global variable? But it can be use to get
  RDM without more hypercall or function call in the context of
 hvmloader.
 
  This argument which you brought up before, and which we commented
  on before, is pretty pointless. We don't really care much about doing
  one or two more hypercalls from hvmloader, unless these would be
  long-running ones.
 
 
  Another benefit to use a global variable is that we wouldn't allocate
  xen_reserved_device_memory * N each time, and reduce some
 duplicated
  codes, unless you mean I should define that as static inside in local.
 
  Now this reason is indeed worth a consideration. How many times is
  the data being needed/retrieved?
 
  Currently there are two cases in tools/hvmloader, setup pci and build
  e820 table. Each time, as you know we don't know how may entries we
  should require, so we always allocate one instance then according to the
  return value to allocate the proper instances to get that.
 
  Hmm, two uses isn't really that bad, i.e. I'd then still be in favor of
  a more normal interface.
 
 
 Just go back here since I realize we always use mem_alloc(), which is
 pick from RESERVED_MEMORY, to allocate all buffer inside this hypercall
 caller in hvmloader, but unfortunately we have no any associated free
 function implementation in hvmloader, so if we call this multiple times
 this means it really waster more memory in RESERVED_MEMORY. So I still
 think one global variable should be fine.
 
 it's natural to get reserved information once, and then saved for either
 one-time or multiple-time references.

Not really natural, but possible. Yet if done this way, then the
interface shouldn't give the appearance of retrieving it every time,
i.e. the global should be set up separately and the users of the
data should access the data rather than calling a (fake) function.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-21 Thread Jan Beulich

 On 20.11.14 at 21:07, sfl...@ihonk.com wrote:
 Running with mwait-idle=0 solves (hides?) the problem. Next step is to 
 fiddle with the C states?

So this also prompted me to go over the list of errata. Just to confirm
- your CPU is family 6 model 44? What stepping? And what nominal
frequency?

There are a couple potentially relevant errata (BC36, BC38, BC54,
BC77, BC110).

To exclude BC36, a boot log with apic-verbosity=debug and debug
key 'i' output would be necessary.

BC38 should not affect us since we don't enter C states from ISRs.

BC54 is probably irrelevant since we meanwhile know that your
system doesn't really hang hard.

For BC77 it would be worth trying to disable turbo mode instead of
disabling the mwait-idle driver (xenpm disable-turbo-mode right
after boot).

And BC110 would be relevant only if without the mwait-idle driver
there would be no use of C3. Plus anyway this would more likely end
up in a hard hang too.

And then, considering that my system with a family 6 model 44 CPU
has never shown anything similar (albeit that doesn't mean all that
much since our workloads are likely very different), you're not
over-clocking? And did you disable hyper-threading on purpose (if
so could you check whether enabling it makes a difference)?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Buggy interaction of live migration and p2m updates

2014-11-21 Thread Jan Beulich

 On 20.11.14 at 19:28, andrew.coop...@citrix.com wrote:
 Should the guest change the p2m structure during live migration, the
 toolstack ends up with a stale p2m with a non-p2m frame in the middle,
 resulting in bogus cross-referencing.  Should the guest change an entry
 in the p2m, the p2m frame itself will be resent as it would be marked as
 dirty in the logdirty bitmap, but the target pfn will remain unsent and
 probably stale on the receiving side.

MMU_MACHPHYS_UPDATE processing marks the page being changed
as dirty. Perhaps guest_physmap_{add,remove}_page() (or certain
callers thereof) should do so too?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list

2014-11-21 Thread Jan Beulich

 On 21.11.14 at 14:37, jgr...@suse.com wrote:
 On 11/21/2014 02:26 PM, Andrew Cooper wrote:
 On 21/11/14 12:57, Juergen Gross wrote:
 On 11/21/2014 01:23 PM, Jan Beulich wrote:
 On 14.11.14 at 10:37, jgr...@suse.com.non-mime.internet wrote:
 --- a/xen/include/public/arch-x86/xen.h
 +++ b/xen/include/public/arch-x86/xen.h
 @@ -224,7 +224,12 @@ struct arch_shared_info {
/* Frame containing list of mfns containing list of mfns
 containing p2m. */
xen_pfn_t pfn_to_mfn_frame_list_list;
unsigned long nmi_reason;
 -uint64_t pad[32];
 +/*
 + * Following two fields are valid if pfn_to_mfn_frame_list_list
 contains
 + * ~0UL.
 + */
 +unsigned long p2m_vaddr;/* virtual address of the p2m list */
 +unsigned long p2m_as_root;  /* mfn of the top level page table */

 xen_pfn_t please. And what does the as in the name stand for?

 as is address space. I can rename it to e.g. p2m_pgd_mfn.

 That is a linuxism in naming, which is also not accurate.

  From my understanding, this frame must be an L4 table for 64bit guests,
 and an L3 table for 32bit guests.  I.e. it is effectively a cr3 with
 which to use the p2m_vaddr field above?
 
 Okay, so p2m_cr3 then?

I'm okay with that, but would think p2m_pt_root would be more
generic. Considering PAE, wouldn't this need to be a 64-bit value
then even in the 32-bit interface? Or alternatively be represented
as MFN instead?

Jan



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] PCI-passthrough for 32 bit guests and high MMIO addresses

2014-11-21 Thread Jan Beulich

 On 21.11.14 at 15:39, jgr...@suse.com wrote:
 Trying to do PCI-passthrough with a 32-bit pv-domain I passed the
 wrong device to the domain. The MMIO address was too large for a
 MFN of a 32-bit system (it was 38000320-3800036f).
 
 Instead of rejecting the operation Xen tried to perform it resulting
 in a (quite understandable) failure in the domU.
 
 I think either the hypervisor or the tools should refuse to do
 PCI-passthrough in this case.

What's wrong with this large an address? 32-bit PV uses PAE, i.e.
can map them. If the kernel isn't capable of that that's not
something to make Xen (or the tools) refuse such assignments. I
would only see an issue if a hypercall interface involved here isn't
using wide enough fields (but these addresses should be read
from the BARs, i.e. no hypercall involved).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] PCI-passthrough for 32 bit guests and high MMIO addresses

2014-11-21 Thread Jan Beulich

 On 21.11.14 at 16:01, andrew.coop...@citrix.com wrote:
 On 21/11/14 14:54, Jan Beulich wrote:
 On 21.11.14 at 15:39, jgr...@suse.com wrote:
 Trying to do PCI-passthrough with a 32-bit pv-domain I passed the
 wrong device to the domain. The MMIO address was too large for a
 MFN of a 32-bit system (it was 38000320-3800036f).

 Instead of rejecting the operation Xen tried to perform it resulting
 in a (quite understandable) failure in the domU.

 I think either the hypervisor or the tools should refuse to do
 PCI-passthrough in this case.
 What's wrong with this large an address?
 
 It is wider than 44 bits, so doesn't fit in a 32bit pfn for p2m/m2p
 update operations.

MMIO regions don't go into these tables.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] PCI-passthrough for 32 bit guests and high MMIO addresses

2014-11-21 Thread Jan Beulich

 On 21.11.14 at 16:17, jgr...@suse.com wrote:
 On 11/21/2014 03:54 PM, Jan Beulich wrote:
 On 21.11.14 at 15:39, jgr...@suse.com wrote:
 Trying to do PCI-passthrough with a 32-bit pv-domain I passed the
 wrong device to the domain. The MMIO address was too large for a
 MFN of a 32-bit system (it was 38000320-3800036f).

 Instead of rejecting the operation Xen tried to perform it resulting
 in a (quite understandable) failure in the domU.

 I think either the hypervisor or the tools should refuse to do
 PCI-passthrough in this case.

 What's wrong with this large an address? 32-bit PV uses PAE, i.e.
 can map them. If the kernel isn't capable of that that's not
 something to make Xen (or the tools) refuse such assignments. I
 would only see an issue if a hypercall interface involved here isn't
 using wide enough fields (but these addresses should be read
 from the BARs, i.e. no hypercall involved).
 
 The MFN format is part of the pv-ABI. And a MFN of a 32-bit pv-guest is
 only 32 bits (even if don't take the invalid bit into account).
 
 Should a pv-guest really be capable to map an address outside it's
 accessible MFN-range?

For MMIO, why not? All that counts there is the page table entry
format.

 Are the tools capable of processing such a mapping in case of saving the
 domain?

MMIO isn't subject to saving/restoring.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.

2014-11-21 Thread Jan Beulich

 On 21.11.14 at 16:14, konrad.w...@oracle.com wrote:
 On Fri, Nov 21, 2014 at 01:03:43PM +0100, Jan Beulich wrote:
  On 21.11.14 at 12:50, konrad.w...@oracle.com wrote:
  On November 21, 2014 2:51:33 AM EST, Jan Beulich jbeul...@suse.com wrote:
  On 20.11.14 at 20:51, konrad.w...@oracle.com wrote:
  @@ -669,7 +670,7 @@ static void hvm_dirq_assist(struct domain *d,
 struct hvm_pirq_dpci *pirq_dpci)
   ASSERT(d-arch.hvm_domain.irq.dpci);
   
   spin_lock(d-event_lock);
  -if ( pirq_dpci-state )
  +if ( test_and_clear_bool(pirq_dpci-masked) )
   {
   struct pirq *pirq = dpci_pirq(pirq_dpci);
   const struct dev_intx_gsi_link *digl;
 
 So this now guards solely against the timeout enforced EOI? Why do
 you no longer need to guard against cancellation (i.e. why isn't this
 looking at both -state and -masked)?
 
  
  The previous state check was superfluous as the dpci_softirq would check 
  for 
  the valid STATE_ before calling hvm_dirq_assist (and deal with 
  cancellation).
 
 I thought so first too, but then how is the guarding against
 cancellation working now?
 
 Since there are two forms of cancellation, lets consider each one seperatly:
 
 1). pt_irq_create_bind and pt_irq_destroy_bind. Each of them calling 
 pt_pirq_softirq_reset in the 'error' case or in the normal path of 
 destruction.
 When that happens the action handler for the IRQ is removed the moment we 
 call
 pirq_guest_unbind. As such we only have to deal with the potential 
 outstanding
 pirq_dpci waiting to be serviced. Since we did the 'pt_pirq_softirq_reset'
 we have changed the state from STATE_SCHED to zero. That means 
 dpci_softirq will
 NOT go in:
   if ( test_and_clear_bit(STATE_SCHED, pirq_dpci-state) )   

Unless the flag got set again in the meantime. Since the event lock
is being acquired right before the line quoted above, _that_ is
what needs closely looking at (and why I was asking about the
deletion of the [at the first glance unnecessary] checking of -state
in hvm_dirq_assist()).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 01/19] xen: dump vNUMA information with debug key u

2014-11-21 Thread Jan Beulich

 On 21.11.14 at 16:06, wei.l...@citrix.com wrote:
 Signed-off-by: Elena Ufimsteva ufimts...@gmail.com
 Signed-off-by: Wei Liu wei.l...@citrix.com
 Cc: Jan Beulich jbeul...@suse.com
 ---
  xen/arch/x86/numa.c |   46 +-
  1 file changed, 45 insertions(+), 1 deletion(-)
 
 diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
 index 628a40a..d27c30f 100644
 --- a/xen/arch/x86/numa.c
 +++ b/xen/arch/x86/numa.c
 @@ -363,10 +363,12 @@ EXPORT_SYMBOL(node_data);
  static void dump_numa(unsigned char key)
  {
  s_time_t now = NOW();
 -int i;
 +int i, j, err, n;

unsigned int please for all but err.

 @@ -408,6 +410,48 @@ static void dump_numa(unsigned char key)
  
  for_each_online_node ( i )
  printk(Node %u: %u\n, i, page_num_node[i]);
 +
 +if ( !d-vnuma )
 +continue;
 +
 +vnuma = d-vnuma;
 +printk( %u vnodes, %u vcpus\n, vnuma-nr_vnodes, d-max_vcpus);
 +for ( i = 0; i  vnuma-nr_vnodes; i++ )
 +{
 +err = snprintf(keyhandler_scratch, 12, %u,
 +vnuma-vnode_to_pnode[i]);
 +if ( err  0 || vnuma-vnode_to_pnode[i] == NUMA_NO_NODE )
 +snprintf(keyhandler_scratch, 3, ???);

strcpy() would be much cheaper here.

 +
 +printk(   vnode %3u - pnode %s\n, i, keyhandler_scratch);
 +for ( j = 0; j  vnuma-nr_vmemranges; j++ )
 +{
 +if ( vnuma-vmemrange[j].nid == i )
 +{
 +mem = vnuma-vmemrange[j].end - 
 vnuma-vmemrange[j].start;
 +printk(%PRIu64 MB: , mem  20);
 +printk( 0x%PRIx64 - 0x%PRIx64\n,
 +   vnuma-vmemrange[j].start,
 +   vnuma-vmemrange[j].end);

Where possible please don't split printk()s of a single output line. Also
%# instead of 0x% please (but maybe padding the values so the
align properly would be the better change; that would also eliminate
the need for explicit leading spaces).

 +}
 +}
 +
 +printk(vcpus: );
 +for ( j = 0, n = 0; j  d-max_vcpus; j++ )
 +{
 +if ( vnuma-vcpu_to_vnode[j] == i )
 +{
 +if ( (n + 1) % 8 == 0 )
 +printk(%d\n, j);
 +else if ( !(n % 8)  n != 0 )
 +printk(%d , j);
 +else
 +printk(%d , j);

Same here - left padding them to make them aligned will likely make
the result quite a bit easier to grok especially for large guests.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 00/19] Virtual NUMA for PV and HVM

2014-11-21 Thread Jan Beulich

 On 21.11.14 at 17:35, wei.l...@citrix.com wrote:
 On Fri, Nov 21, 2014 at 04:25:34PM +, Jan Beulich wrote:
  On 21.11.14 at 16:06, wei.l...@citrix.com wrote:
  vnuma_vdistances = [10, 30] # optional
 
 Being optional, would the real distances be used instead? And what
 
 Default value of [10, 20] will be used.

That's bad. Would it be very difficult to use the host values?

 meaning does this apparently one dimensional array here have for
 the actually two dimensional SLIT? (Read: An example with more
 than two nodes would be useful.)
 
 
 The first element of [X, Y] is local distance, the second element is
 remote distance.
 
 For a 4 node system:
 
  0123
 0XYYY 
 1YXYY
 2YYXY
 3YYYX

That may match up with how most current NUMA systems look like,
but do we really want to bake in an oversimplification like this?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 00/19] Virtual NUMA for PV and HVM

2014-11-21 Thread Jan Beulich

 On 21.11.14 at 17:55, wei.l...@citrix.com wrote:
 Nonetheless I'm all for having a configuration option that would meet
 both present and future need. Do you have anything in mind? Are you
 suggesting we should allow specifying every element in SLIT in xl?

I think that would be desirable.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] xen-pciback: drop SR-IOV VFs when PF driver unloads

2014-11-23 Thread Jan Beulich

 On 21.11.14 at 23:03, konrad.w...@oracle.com wrote:
 I rewrote it a bit to be more in the style of pciback:
[...]
 [v2: Removed the switch statement, moved it about]

What you don't mention here is that you also removed the outer
loop, yet that breaks functionality afaict: There can (and I suppose
normally will) be multiple VFs needing device_release_driver() called
when a single PF goes away.

Also I'm not really happy for a patch with my S-o-b underneath to
introduce goto-s without real need.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-24 Thread Jan Beulich

 On 23.11.14 at 02:28, sfl...@ihonk.com wrote:
 With mwait-idle=0:
 
 (XEN) 'c' pressed - printing ACPI Cx structures
 (XEN) ==cpu0==
 (XEN) active state: C0
 (XEN) max_cstate:   C7
 (XEN) states:
 (XEN) C1:   type[C1] latency[001] usage[] method[  FFH] 
 duration[0]
 (XEN) C2:   type[C0] latency[000] usage[] method[ NONE] 
 duration[0]
 (XEN) C3:   type[C3] latency[064] usage[] method[  FFH] 
 duration[0]
 (XEN) C4:   type[C3] latency[096] usage[] method[  FFH] 
 duration[0]
 (XEN)*C0:   usage[] duration[46930624784]
 (XEN) PC2[0] PC3[0] PC6[0] PC7[0]
 (XEN) CC3[0] CC6[0] CC7[0]
[...]

Very interesting - the hypervisor has C-state information, but never
entered any of them. That certainly explains the difference between
using/not using the ,wait-idle driver, but puts us back to there being
a more general issue with C-state use on this CPU model. Possibly
related to C2 having entry method NONE, but then again I can't
see how such a state could get entered into the table the first place:
set_cx() bails upon check_cx() returning an error, and hence its
switch()'s default statement should never be reached. Plus even if an
array entry was set to NONE, it should simply be ignored when
looking for a state to enter. I'll probably need to put together a
debugging patch to figure out what's going on here.

In any event C2 being set to NONE and that information presumably
coming from firmware is an indication that there's a problem with C2
(note that the numbering doesn't really match up with what the
document says, this likely really is C1E) on that CPU. Which gets us
back to ...

 CPU information for one of the cores, 2.8 GHz is nominal, stepping is 2. 
 Not sure how to translate that stepping number into Intel's format:
 
 processor   : 0
 vendor_id   : GenuineIntel
 cpu family  : 6
 model   : 44
 model name  : Intel(R) Xeon(R) CPU   X5660  @ 2.80GHz
 stepping: 2
[...]
 There are a couple potentially relevant errata (BC36, BC38, BC54,
 BC77, BC110).

 To exclude BC36, a boot log with apic-verbosity=debug and debug
 key 'i' output would be necessary.
 
 Done, see the very end of the email.
 
 BC38 should not affect us since we don't enter C states from ISRs.

 BC54 is probably irrelevant since we meanwhile know that your
 system doesn't really hang hard.

 For BC77 it would be worth trying to disable turbo mode instead of
 disabling the mwait-idle driver (xenpm disable-turbo-mode right
 after boot).
 
 I looked up BC77 but as a result found this document[1], which seems to 
 relate to the i7.  Would this[2] not be the relevant document?
 
 [1] 
 http://www.intel.com/content/dam/www/public/us/en/documents/specification-upd
 ates/core-i7-900-ee-and-desktop-processor-series-32nm-spec-update.pdf
 
 [2] 
 http://www.intel.com/content/dam/www/public/us/en/documents/specification-upd
 ates/xeon-5600-specification-update.pdf

Indeed. I wasn't aware that there are family/model/stepping tuples
that can be both Xeon and desktop CPUs.

 As promised, below is the apic-verbosity=debug log, with 'i'. Thanks!

I'm sorry, I misspelled the option, it's really apic_verbosity=debug.
The 'i' output at least already confirms that there are no ExtINT
entries among the IO-APIC RTEs.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.

2014-11-24 Thread Jan Beulich

 On 21.11.14 at 17:45, konrad.w...@oracle.com wrote:
 From 90d00db0949a8e796d7f812134753a54b2fe3d51 Mon Sep 17 00:00:00 2001
 From: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 Date: Thu, 20 Nov 2014 14:28:11 -0500
 Subject: [PATCH] dpci: Add 'masked' as an gate for hvm_dirq_assist to 
 process.

So am I right in understanding that this is then the only thing that
needs to go into the tree to fix Sander's problem? No zombie state
anymore?

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 14/19] hvmloader: disallow memory relocation when vNUMA is enabled

2014-11-24 Thread Jan Beulich

 On 21.11.14 at 20:56, konrad.w...@oracle.com wrote:
 On Fri, Nov 21, 2014 at 03:06:56PM +, Wei Liu wrote:
 --- a/tools/firmware/hvmloader/pci.c
 +++ b/tools/firmware/hvmloader/pci.c
 @@ -88,6 +88,19 @@ void pci_setup(void)
  printf(Relocating guest memory for lowmem MMIO space %s\n,
 allow_memory_relocate?enabled:disabled);
  
 +/* Disallow low memory relocation when vNUMA is enabled, because
 + * relocated memory ends up off node. Further more, even if we
 + * dynamically expand node coverage in hvmloader, low memory and
 + * high memory may reside in different physical nodes, blindly
 + * relocates low memory to high memory gives us a sub-optimal
 + * configuration.
 
 And this is done in hvmloader, so the toolstack has no inkling that
 we need to relocate memory to make space for the PCI.
 
 In such case I would not have this check here. Instead put it in 
 libxl and disallow vNUMA with PCI passthrough.
 
 And then the fix is to take the logic that is in hvmloader for PCI
 BAR size relocation and move it in libxl. Then it can construct the
 proper vNUMA topology and also fix an outstanding QEMU-xen bug.

The problem then being that two code pieces pretty far apart from
one another need to be kept in perfect sync. Not really nice in
terms of maintainability. I'd really prefer hvmloader to re-write the
vNUMA info (on real hardware firmware also needs to take care of
the memory holes - there's no magic external entity there either).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 14/19] hvmloader: disallow memory relocation when vNUMA is enabled

2014-11-24 Thread Jan Beulich

 On 21.11.14 at 16:06, wei.l...@citrix.com wrote:
 Signed-off-by: Wei Liu wei.l...@citrix.com

So this is the fourth patch now without any description whatsoever.

 --- a/tools/firmware/hvmloader/pci.c
 +++ b/tools/firmware/hvmloader/pci.c
 @@ -88,6 +88,19 @@ void pci_setup(void)
  printf(Relocating guest memory for lowmem MMIO space %s\n,
 allow_memory_relocate?enabled:disabled);
  
 +/* Disallow low memory relocation when vNUMA is enabled, because
 + * relocated memory ends up off node. Further more, even if we
 + * dynamically expand node coverage in hvmloader, low memory and
 + * high memory may reside in different physical nodes, blindly
 + * relocates low memory to high memory gives us a sub-optimal
 + * configuration.
 + */
 +if ( hvm_info-nr_nodes != 0  allow_memory_relocate )
 +{
 +allow_memory_relocate = false;
 +printf(vNUMA enabled, relocating guest memory for lowmem MMIO space 
 disabled\n);
 +}

Apart from the comment violating our coding style, as already
indicated in the reply to Konrad's comment I don't think this is
the right approach. If it is meant to be a temporary measure, the
comment should say so (and perhaps have a TBD or similar grep-
able mark in it).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-24 Thread Jan Beulich

 On 24.11.14 at 10:08, sfl...@ihonk.com wrote:
 On Nov 24, 2014, at 00:45, Jan Beulich jbeul...@suse.com wrote:
 
 On 23.11.14 at 02:28, sfl...@ihonk.com wrote:
 With mwait-idle=0:
 
 (XEN) 'c' pressed - printing ACPI Cx structures
 (XEN) ==cpu0==
 (XEN) active state: C0
 (XEN) max_cstate:   C7
 (XEN) states:
 (XEN) C1:   type[C1] latency[001] usage[] method[  FFH] 
 duration[0]
 (XEN) C2:   type[C0] latency[000] usage[] method[ NONE] 
 duration[0]
 (XEN) C3:   type[C3] latency[064] usage[] method[  FFH] 
 duration[0]
 (XEN) C4:   type[C3] latency[096] usage[] method[  FFH] 
 duration[0]
 (XEN)*C0:   usage[] duration[46930624784]
 (XEN) PC2[0] PC3[0] PC6[0] PC7[0]
 (XEN) CC3[0] CC6[0] CC7[0]
 [...]
 
 Very interesting - the hypervisor has C-state information, but never
 entered any of them. That certainly explains the difference between
 using/not using the ,wait-idle driver, but puts us back to there being
 a more general issue with C-state use on this CPU model. Possibly
 related to C2 having entry method NONE, but then again I can't
 see how such a state could get entered into the table the first place:
 set_cx() bails upon check_cx() returning an error, and hence its
 switch()'s default statement should never be reached. Plus even if an
 array entry was set to NONE, it should simply be ignored when
 looking for a state to enter. I'll probably need to put together a
 debugging patch to figure out what's going on here.
 
 Okay, happy to give it a go whenever you have the time to put something 
 together.

While putting this together I found the reason for the strange
C2: line, and the attached debugging patch already has the fix
for it (which I'll also submit separately, and hence you may need
to drop that specific hunk should you end up applying it on a tree
which already has that fix). You'll need to again run with
mwait-idle=0, and it's the boot messages along with the 'c'
debug key output that's of interest.

Thanks, Jan

--- unstable.orig/xen/arch/x86/acpi/cpu_idle.c
+++ unstable/xen/arch/x86/acpi/cpu_idle.c
@@ -58,7 +58,7 @@
 #include xen/notifier.h
 #include xen/cpu.h
 
-/*#define DEBUG_PM_CX*/
+#define DEBUG_PM_CX
 
 #define GET_HW_RES_IN_NS(msr, val) \
 do { rdmsrl(msr, val); val = tsc_ticks2ns(val); } while( 0 )
@@ -238,6 +238,9 @@ static char* acpi_cstate_method_name[] =
 HALT
 };
 
+struct reasons { unsigned long max, pwr, urg, nxt; };//temp
+static DEFINE_PER_CPU(struct reasons, reasons);//temp
+
 static void print_acpi_power(uint32_t cpu, struct acpi_processor_power *power)
 {
 uint32_t i, idle_usage = 0;
@@ -273,6 +276,8 @@ static void print_acpi_power(uint32_t cp
 printk((last_state_idx == 0) ?* : );
 printk(C0:\tusage[%08d] duration[%PRId64]\n,
idle_usage, NOW() - idle_res);
+printk(max=%lx pwr=%lx urg=%lx nxt=%lx\n,//temp
+   per_cpu(reasons.max, cpu), per_cpu(reasons.pwr, cpu), 
per_cpu(reasons.urg, cpu), per_cpu(reasons.nxt, cpu));
 
 print_hw_residencies(cpu);
 }
@@ -501,6 +506,7 @@ static void acpi_processor_idle(void)
 u32 exp = 0, pred = 0;
 u32 irq_traced[4] = { 0 };
 
+next_state = 1;//temp
 if ( max_cstate  0  power  !sched_has_urgent_vcpu() 
  (next_state = cpuidle_current_governor-select(power))  0 )
 {
@@ -519,6 +525,10 @@ static void acpi_processor_idle(void)
 }
 if ( !cx )
 {
+this_cpu(reasons.max) += max_cstate = 0;//temp
+this_cpu(reasons.pwr) += !power;//temp
+this_cpu(reasons.urg) += !!sched_has_urgent_vcpu();//temp
+this_cpu(reasons.nxt) += next_state = 0;//temp
 if ( pm_idle_save )
 pm_idle_save();
 else
@@ -1007,6 +1017,7 @@ static void set_cx(
 cx-entry_method = ACPI_CSTATE_EM_SYSIO;
 break;
 default:
+printk(CPU%u: C%u space %x?\n, acpi_power-cpu, xen_cx-type, 
xen_cx-reg.space_id);//temp
 cx-entry_method = ACPI_CSTATE_EM_NONE;
 break;
 }
@@ -1015,7 +1026,7 @@ static void set_cx(
 cx-target_residency = cx-latency * latency_factor;
 
 smp_wmb();
-acpi_power-count++;
+acpi_power-count += (cx-type != ACPI_STATE_C1);
 if ( cx-type == ACPI_STATE_C1 || cx-type == ACPI_STATE_C2 )
 acpi_power-safe_state = cx;
 }
@@ -1141,6 +1152,7 @@ long set_cx_pminfo(uint32_t cpu, struct 
 
 /* FIXME: C-state dependency is not supported by far */
 
+printk(CPU%u: %pS, %pS\n, cpu, pm_idle_save, pm_idle);//temp
 if ( cpu_id == 0 )
 {
 if ( pm_idle_save == NULL )
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 0/3] x86: XSA-109/110 follow-up (to be considered for 4.5)

2014-11-24 Thread Jan Beulich

 On 20.11.14 at 11:04, jbeul...@suse.com wrote:
 1: tighten page table owner checking in do_mmu_update()
 2: don't ignore foreigndom input on various MMUEXT ops
 3: HVM: don't crash guest upon problems occurring in user mode
 
 Reason to request considering this for 4.5: Tightened argument
 checking (as done in the first two patches) reduces the chances
 of them getting abused. Not unduly crashing the guest (as done
 in the third one) may avoid future security issues of guest user
 mode affecting the guest kernel.
 
 Signed-off-by: Jan Beulich jbeul...@suse.com

Hi Konrad - looks like I forgot to Cc you...

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 2/3] x86: don't ignore foreigndom input on various MMUEXT ops

2014-11-24 Thread Jan Beulich

 On 24.11.14 at 13:43, dunl...@umich.edu wrote:
 On Thu, Nov 20, 2014 at 10:12 AM, Jan Beulich jbeul...@suse.com wrote:
 Instead properly fail requests that shouldn't be issued on foreign
 domains or - for MMUEXT_{CLEAR,COPY}_PAGE - extend the existing
 operation to work that way.
 
 I take it this is for 4.6?

Not really, as said in the cover letter.

 I've looked through it and everything looks OK.
 
 But I agree with Tim, that having so many different changes all at the
 same time makes the patch hard to review.
 
 In particular, I'd rather start with a patch to get rid of okay
 entirely; then make MMUEXT_{CLEAR,COPY}_PAGE use foreingndom instead
 of current; then have a patch which returns -EPERM for the other ones;
 then a patch to get rid of spage in MMUEXT_[UN]MARK_SUPER.

Yes, that's how I would have done it following Tim's comments if
there wasn't the desire for backporting - that'll become quite a bit
more involved if I do the cleanup patch removing okay first. And
doing the -EPERM one first would mean that the backport needs to
carefully add properly setting okay. Splitting the clear/copy page
change from the -EPERM one doesn't make much sense to me at
all considering the title (and hence purpose) of the patch.

 Regarding MMUEXT_{CLEAR,COPY}_PAGE: This is effectively changing the
 interface.  Are we sure there are no callers which just expect them to
 work on current, and don't set foreigndom properly?

As much or as little as all of the sudden returning -EPERM in such
a case for other sub-ops. They never were meant to be used that
way, and the original omission of those checks shouldn't preclude us
from adding them.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Failure on make clean

2014-11-25 Thread Jan Beulich

 On 25.11.14 at 08:44, jgr...@suse.com wrote:
 Hi,
 
 make clean in xen-unstable is failing:
 
 make[2]: Entering directory '/home/gross/xen/tools'
 set -e; if test -d qemu-xen-traditional-dir/.; then \
  make -C qemu-xen-traditional-dir clean; \
 fi
 make[3]: Entering directory 
 '/home/gross/xen/tools/qemu-xen-traditional-dir-remote'
 Makefile:3: config-host.mak: No such file or directory

SRC_PATH gets defined there. This file missing makes me wonder
whether you ran make clean without a prior make. If so,
perhaps the test in tools/Makefile should be altered:

subdir-clean-qemu-xen-traditional-dir:
set -e; if test -e qemu-xen-traditional-dir/config-host.mak; then \
$(MAKE) -C qemu-xen-traditional-dir clean; \
fi

(possibly in a similar way for subdir-clean-qemu-xen-dir), albeit that
may end up leaving an unclean tree when having interrupted qemu's
configure process (not sure in which order it creates files).

Jan

 Makefile:4: /rules.mak: No such file or directory
 make[3]: *** No rule to make target '/rules.mak'.  Stop.
 make[3]: Leaving directory 
 '/home/gross/xen/tools/qemu-xen-traditional-dir-remote'
 Makefile:181: recipe for target 'subdir-clean-qemu-xen-traditional-dir' 
 failed
 make[2]: *** [subdir-clean-qemu-xen-traditional-dir] Error 2
 make[2]: Leaving directory '/home/gross/xen/tools'
 /home/gross/xen/tools/../tools/Rules.mk:111: recipe for target 
 'subdirs-clean' failed
 make[1]: *** [subdirs-clean] Error 2
 make[1]: Leaving directory '/home/gross/xen/tools'
 Makefile:139: recipe for target 'clean' failed
 make: *** [clean] Error 2
 
 
 I think SRC_PATH is somehow undefined...
 
 
 Juergen
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org 
 http://lists.xen.org/xen-devel 




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5] x86/HVM: Partial revert of 28b4baacd5

2014-11-25 Thread Jan Beulich

 On 25.11.14 at 11:08, andrew.coop...@citrix.com wrote:
 A failed vmentry is overwhelmingly likely to be caused by corrupt VMCS state.
 As a result, injecting a fault and retrying the the vmentry is likely to 
 fail
 in the same way.

That's not all that unlikely - remember that the change was prompted
by the XSA-110 fix. There CS pieces being in a bad state would get
corrected by the exception injection.

 One other alternative, which I would pursue if we were not already in -rc2
 would be to add some extra logic to detect repeated vmentry failure and 
 allow
 one attempt to shoot userspace before giving up and crashing the domain.

That's not even needed afaict (and if it really is, it can't be all that
difficult/intrusive): Did you observe what you attempt to fix here in
practice, or is this just from theoretical considerations? I ask because
I don't think it can actually happen, as the second time we get here
the guest ought to be in kernel mode (due to the exception injection)
and hence would get crashed anyway.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-25 Thread Jan Beulich

 On 25.11.14 at 10:38, sfl...@ihonk.com wrote:
 On 11/25/2014 12:16 AM, Jan Beulich wrote:
 Interesting, so other than for me (perhaps due to other patches
 I have in my tree) the change resulted in C states now being used
 again despite mwait-idle=0, which is good. Question now is - with
 this being the case, did the hangs re-occur?
 
 Unfortunately they did. (Happened unusually quick this time, though I 
 doubt the statistical significance.) Not sure what the desirable output 
 is, so I did a couple of 'a' and 'd' requests, capped off by a 'c':

Okay, so it's not really the mwait-idle driver causing the regression,
but it is C-state related. Hence we're now down to seeing whether all
or just the deeper C states are affected, i.e. I now need to ask you
to play with max_cstate=. For that you'll have to remember that the
option's effect differs between the ACPI and the MWAIT idle drivers.
In the spirit of bisection I'd suggest using max_cstate=2 first no
matter which of the two scenarios you pick. If that still hangs,
max_cstate=1 obviously is the only other thing to try. Should that
not hang (and you left out mwait-idle=0), trying max_cstate=3
in that same scenario would be the other case to check.

No need for 'd' and 'a' output for the time being, but 'c' output would
be much appreciated for all cases where you observe hangs.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5] x86/HVM: Partial revert of 28b4baacd5

2014-11-25 Thread Jan Beulich

 On 25.11.14 at 11:46, andrew.coop...@citrix.com wrote:
 On 25/11/14 10:42, Jan Beulich wrote:
 On 25.11.14 at 11:08, andrew.coop...@citrix.com wrote:
 A failed vmentry is overwhelmingly likely to be caused by corrupt VMCS 
 state.
 As a result, injecting a fault and retrying the the vmentry is likely to 
 fail
 in the same way.
 That's not all that unlikely - remember that the change was prompted
 by the XSA-110 fix. There CS pieces being in a bad state would get
 corrected by the exception injection.

 One other alternative, which I would pursue if we were not already in -rc2
 would be to add some extra logic to detect repeated vmentry failure and 
 allow
 one attempt to shoot userspace before giving up and crashing the domain.
 That's not even needed afaict (and if it really is, it can't be all that
 difficult/intrusive): Did you observe what you attempt to fix here in
 practice, or is this just from theoretical considerations? I ask because
 I don't think it can actually happen, as the second time we get here
 the guest ought to be in kernel mode (due to the exception injection)
 and hence would get crashed anyway.
 
 Only from theoretical considerations.  A bad CS (and possibly SS) would
 be fixed by this, but there are many others which wouldn't

But that doesn't eliminate the fact that in the second pass we'd find
the guest in kernel mode, and hence crash it. Yet your reply sounds
as if you still think your patch is needed.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5] x86/HVM: Partial revert of 28b4baacd5

2014-11-25 Thread Jan Beulich

 On 25.11.14 at 11:58, andrew.coop...@citrix.com wrote:
 On 25/11/14 10:46, Andrew Cooper wrote:
 On 25/11/14 10:42, Jan Beulich wrote:
 On 25.11.14 at 11:08, andrew.coop...@citrix.com wrote:
 A failed vmentry is overwhelmingly likely to be caused by corrupt VMCS 
 state.
 As a result, injecting a fault and retrying the the vmentry is likely to 
 fail
 in the same way.
 That's not all that unlikely - remember that the change was prompted
 by the XSA-110 fix. There CS pieces being in a bad state would get
 corrected by the exception injection.

 One other alternative, which I would pursue if we were not already in -rc2
 would be to add some extra logic to detect repeated vmentry failure and 
 allow
 one attempt to shoot userspace before giving up and crashing the domain.
 That's not even needed afaict (and if it really is, it can't be all that
 difficult/intrusive): Did you observe what you attempt to fix here in
 practice, or is this just from theoretical considerations? I ask because
 I don't think it can actually happen, as the second time we get here
 the guest ought to be in kernel mode (due to the exception injection)
 and hence would get crashed anyway.
 Only from theoretical considerations.  A bad CS (and possibly SS) would
 be fixed by this, but there are many others which wouldn't
 
 Actually, as Tim correctly points out, a bad CS/SS won't be fixed by
 this without emulating the event injection.  Per the XSA-106 followup,
 we only ever emulate enough of event injection to cover the dpl checks
 on software events for older generation SVM.  We never actually emulate
 the context switch itself.

Which suggests that rather than doing the partial revert as you
propose we might better extend the check to become kernel mode
or event injection pending.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5] x86/HVM: Partial revert of 28b4baacd5

2014-11-25 Thread Jan Beulich

 On 25.11.14 at 13:10, andrew.coop...@citrix.com wrote:
 On 25/11/14 11:31, Jan Beulich wrote:
 On 25.11.14 at 11:58, andrew.coop...@citrix.com wrote:
 On 25/11/14 10:46, Andrew Cooper wrote:
 On 25/11/14 10:42, Jan Beulich wrote:
 On 25.11.14 at 11:08, andrew.coop...@citrix.com wrote:
 A failed vmentry is overwhelmingly likely to be caused by corrupt VMCS 
 state.
 As a result, injecting a fault and retrying the the vmentry is likely to 
 fail
 in the same way.
 That's not all that unlikely - remember that the change was prompted
 by the XSA-110 fix. There CS pieces being in a bad state would get
 corrected by the exception injection.

 One other alternative, which I would pursue if we were not already in 
 -rc2
 would be to add some extra logic to detect repeated vmentry failure and 
 allow
 one attempt to shoot userspace before giving up and crashing the domain.
 That's not even needed afaict (and if it really is, it can't be all that
 difficult/intrusive): Did you observe what you attempt to fix here in
 practice, or is this just from theoretical considerations? I ask because
 I don't think it can actually happen, as the second time we get here
 the guest ought to be in kernel mode (due to the exception injection)
 and hence would get crashed anyway.
 Only from theoretical considerations.  A bad CS (and possibly SS) would
 be fixed by this, but there are many others which wouldn't
 Actually, as Tim correctly points out, a bad CS/SS won't be fixed by
 this without emulating the event injection.  Per the XSA-106 followup,
 we only ever emulate enough of event injection to cover the dpl checks
 on software events for older generation SVM.  We never actually emulate
 the context switch itself.
 Which suggests that rather than doing the partial revert as you
 propose we might better extend the check to become kernel mode
 or event injection pending.
 
 At that point, it is safer just to unconditionally crash on a repeated
 vmentry failure, rather than gain a list of conditions which we hope
 wont leave us spinning in a loop.

Crashing on _repeated_ VM entry failure is certainly fine with me.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH] vNUMA: rename interface structures

2014-11-25 Thread Jan Beulich

No-one (including me) paid attention during review that these
structures don't adhere to the naming requirements of the public
interface: Consistently use xen_ prefixes at least for all new
additions.

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1264,7 +1264,7 @@ int xc_domain_setvnuma(xc_interface *xch
 uint32_t nr_vnodes,
 uint32_t nr_regions,
 uint32_t nr_vcpus,
-vmemrange_t *vmemrange,
+xen_vmemrange_t *vmemrange,
 unsigned int *vdistance,
 unsigned int *vcpu_to_vnode,
 unsigned int *vnode_to_pnode);
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -2171,7 +2171,7 @@ int xc_domain_setvnuma(xc_interface *xch
uint32_t nr_vnodes,
uint32_t nr_vmemranges,
uint32_t nr_vcpus,
-   vmemrange_t *vmemrange,
+   xen_vmemrange_t *vmemrange,
unsigned int *vdistance,
unsigned int *vcpu_to_vnode,
unsigned int *vnode_to_pnode)
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -345,7 +345,7 @@ static struct vnuma_info *vnuma_alloc(un
 vnuma-vdistance = xmalloc_array(unsigned int, nr_vnodes * nr_vnodes);
 vnuma-vcpu_to_vnode = xmalloc_array(unsigned int, nr_vcpus);
 vnuma-vnode_to_pnode = xmalloc_array(unsigned int, nr_vnodes);
-vnuma-vmemrange = xmalloc_array(vmemrange_t, nr_ranges);
+vnuma-vmemrange = xmalloc_array(xen_vmemrange_t, nr_ranges);
 
 if ( vnuma-vdistance == NULL || vnuma-vmemrange == NULL ||
  vnuma-vcpu_to_vnode == NULL || vnuma-vnode_to_pnode == NULL )
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -972,7 +972,7 @@ long do_memory_op(unsigned long cmd, XEN
 
 case XENMEM_get_vnumainfo:
 {
-struct vnuma_topology_info topology;
+struct xen_vnuma_topology_info topology;
 struct domain *d;
 unsigned int dom_vnodes, dom_vranges, dom_vcpus;
 struct vnuma_info tmp;
@@ -1033,7 +1033,7 @@ long do_memory_op(unsigned long cmd, XEN
 read_unlock(d-vnuma_rwlock);
 
 tmp.vdistance = xmalloc_array(unsigned int, dom_vnodes * dom_vnodes);
-tmp.vmemrange = xmalloc_array(vmemrange_t, dom_vranges);
+tmp.vmemrange = xmalloc_array(xen_vmemrange_t, dom_vranges);
 tmp.vcpu_to_vnode = xmalloc_array(unsigned int, dom_vcpus);
 
 if ( tmp.vdistance == NULL ||
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -980,7 +980,7 @@ struct xen_domctl_vnuma {
 /*
  * memory rages for each vNUMA node
  */
-XEN_GUEST_HANDLE_64(vmemrange_t) vmemrange;
+XEN_GUEST_HANDLE_64(xen_vmemrange_t) vmemrange;
 };
 typedef struct xen_domctl_vnuma xen_domctl_vnuma_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_vnuma_t);
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -530,14 +530,13 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_
 #define XENMEM_get_vnumainfo26
 
 /* vNUMA node memory ranges */
-struct vmemrange {
+struct xen_vmemrange {
 uint64_t start, end;
 unsigned int flags;
 unsigned int nid;
 };
-
-typedef struct vmemrange vmemrange_t;
-DEFINE_XEN_GUEST_HANDLE(vmemrange_t);
+typedef struct xen_vmemrange xen_vmemrange_t;
+DEFINE_XEN_GUEST_HANDLE(xen_vmemrange_t);
 
 /*
  * vNUMA topology specifies vNUMA node number, distance table,
@@ -548,7 +547,7 @@ DEFINE_XEN_GUEST_HANDLE(vmemrange_t);
  * copied back to guest. Domain returns expected values of nr_vnodes,
  * nr_vmemranges and nr_vcpus to guest if the values where incorrect.
  */
-struct vnuma_topology_info {
+struct xen_vnuma_topology_info {
 /* IN */
 domid_t domid;
 uint16_t pad;
@@ -566,12 +565,12 @@ struct vnuma_topology_info {
 uint64_t pad;
 } vcpu_to_vnode;
 union {
-XEN_GUEST_HANDLE(vmemrange_t) h;
+XEN_GUEST_HANDLE(xen_vmemrange_t) h;
 uint64_t pad;
 } vmemrange;
 };
-typedef struct vnuma_topology_info vnuma_topology_info_t;
-DEFINE_XEN_GUEST_HANDLE(vnuma_topology_info_t);
+typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
+DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
 
 /* Next available subop number is 27 */
 
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -100,7 +100,7 @@ struct vnuma_info {
 unsigned int *vdistance;
 unsigned int *vcpu_to_vnode;
 unsigned int *vnode_to_pnode;
-struct vmemrange *vmemrange;
+struct xen_vmemrange *vmemrange;
 };
 
 void vnuma_destroy(struct vnuma_info *vnuma);


vNUMA: rename interface structures

No-one (including me) paid attention during review that these
structures don't adhere to the naming requirements of the public
interface: Consistently use xen_ prefixes

Re: [Xen-devel] [PATCH v15 09/21] x86/VPMU: Add public xenpmu.h

2014-11-25 Thread Jan Beulich

 On 17.11.14 at 00:07, boris.ostrov...@oracle.com wrote:
 --- a/xen/include/xlat.lst
 +++ b/xen/include/xlat.lst
 @@ -103,6 +103,10 @@
  !vcpu_set_singleshot_timer   vcpu.h
  ?xenoprof_init   xenoprof.h
  ?xenoprof_passivexenoprof.h
 +?pmu_intel_ctxt  arch-x86/pmu.h
 +?pmu_amd_ctxtarch-x86/pmu.h
 +?pmu_cntr_pair   arch-x86/pmu.h
 +?pmu_regsarch-x86/pmu.h

If another revision turns out necessary, please get these additions
sorted alphabetically.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v15 11/21] x86/VPMU: Interface for setting PMU mode and flags

2014-11-25 Thread Jan Beulich

 On 17.11.14 at 00:07, boris.ostrov...@oracle.com wrote:
 @@ -278,3 +290,139 @@ void vpmu_dump(struct vcpu *v)
  vpmu-arch_vpmu_ops-arch_vpmu_dump(v);
  }
  
 +static s_time_t vpmu_start_ctxt_switch;

Blank line here please.

 +static long vpmu_sched_checkin(void *arg)
 +{
 +int cpu = cpumask_next(smp_processor_id(), cpu_online_map);

unsigned int.

 +int ret;
 +
 +/* Mode change shouldn't take more than a few (say, 5) seconds. */
 +if ( NOW()  vpmu_start_ctxt_switch + SECONDS(5) )
 +{
 +ret = -EBUSY;
 +goto fail;
 +}

So what does this guard against? Holding xenpmu_mode_lock for
5 seconds is not a problem, plus I don't think you actually need a
lock at all. Just using a global, atomically updated flag ought to be
sufficient (the way you use the lock is really nothing else afaict).

 +
 +if ( cpu  nr_cpu_ids )
 +{
 +ret = continue_hypercall_on_cpu(cpu, vpmu_sched_checkin, arg);
 +if ( ret )
 +goto fail;
 +}
 +else
 +vpmu_start_ctxt_switch = 0;
 +
 +return 0;
 +
 + fail:
 +vpmu_mode = (uint64_t)arg;

Even if we don't support x86-32 anymore, please avoid giving bad
examples: Casts between pointers and integers should always use
(unsigned) long on the integer side.

 +vpmu_start_ctxt_switch = 0;
 +return ret;
 +}
 +
 +static int vpmu_force_context_switch(uint64_t old_mode)

Same comment as above regarding the type.

 +{
 +int ret;
 +
 +vpmu_start_ctxt_switch = NOW();
 +
 +ret = continue_hypercall_on_cpu(cpumask_first(cpu_online_map),
 +vpmu_sched_checkin, (void *)old_mode);
 +if ( ret )
 +vpmu_start_ctxt_switch = 0;
 +
 +return ret;
 +}

I don't think it is guaranteed (independent of implementation details
of continue_hypercall_on_cpu()) that this way you went through the
context switch path once on each CPU if
cpumask_first(cpu_online_map) == smp_processor_id() here. In
particular it looks like there being a problem calling vcpu_sleep_sync()
in the tasklet handler when v == current (which would be the case
if you started on the correct CPU and the tasklet softirq got
handled before the scheduler one). I think you instead want to use
cpumask_cycle() here and above, and finish the whole operation
once you're back on the CPU you started on (the single-pCPU case
may need some extra consideration).

I realize that you simply follow what microcode_update() does, but
I think we should rather correct that one than copy the latent issue
it has elsewhere.

 +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
 +{
 +int ret;
 +struct xen_pmu_params pmu_params;
 +
 +ret = xsm_pmu_op(XSM_OTHER, current-domain, op);
 +if ( ret )
 +return ret;
 +
 +switch ( op )
 +{
 +case XENPMU_mode_set:
 +{
 +uint64_t old_mode;

Irrespective of the earlier comments I don't see why this isn't just
unsigned int (or typeof(vpmu_mode)).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v15 11/21] x86/VPMU: Interface for setting PMU mode and flags

2014-11-25 Thread Jan Beulich

 On 17.11.14 at 00:07, boris.ostrov...@oracle.com wrote:
 @@ -244,19 +256,19 @@ void vpmu_initialise(struct vcpu *v)
  switch ( vendor )
  {
  case X86_VENDOR_AMD:
 -if ( svm_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
 -opt_vpmu_enabled = 0;
 +if ( svm_vpmu_initialise(v) != 0 )
 +vpmu_mode = XENPMU_MODE_OFF;
  break;
  
  case X86_VENDOR_INTEL:
 -if ( vmx_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
 -opt_vpmu_enabled = 0;
 +if ( vmx_vpmu_initialise(v) != 0 )
 +vpmu_mode = XENPMU_MODE_OFF;
  break;

So this turns off the vPMU globally upon failure of initializing
some random vCPU. Why is that? I see this was the case even
before your entire series, but shouldn't this be fixed _before_
enhancing the whole thing to support PV/PVH?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v15 13/21] x86/VPMU: Initialize PMU for PV(H) guests

2014-11-25 Thread Jan Beulich

 On 17.11.14 at 00:07, boris.ostrov...@oracle.com wrote:
 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
 +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
 @@ -362,24 +362,34 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
  struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
  uint64_t *p = NULL;
  
 -if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
 -return 0;
 -
 -wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
 -if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
 +p = xzalloc_bytes(sizeof(uint64_t));

xzalloc(uint64_t)?

 +if ( !p )
  goto out_err;
  
 -if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
 -goto out_err;
 -vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
 -
 -core2_vpmu_cxt = xzalloc_bytes(sizeof(*core2_vpmu_cxt) +
 -   sizeof(uint64_t) * fixed_pmc_cnt +
 -   sizeof(struct xen_pmu_cntr_pair) *
 -   arch_pmc_cnt);
 -p = xzalloc(uint64_t);

At least it was that way before...

With that fixed, feel free to add
Acked-by: Jan Beulich jbeul...@suse.com

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v15 17/21] x86/VPMU: Handle PMU interrupts for PV guests

2014-11-25 Thread Jan Beulich

 On 17.11.14 at 00:07, boris.ostrov...@oracle.com wrote:
 +if ( (vpmu_mode  XENPMU_MODE_SELF) )
 +cur_regs = guest_cpu_user_regs();
 +else if ( !guest_mode(regs)  
 is_hardware_domain(sampling-domain) )
 +{
 +cur_regs = regs;
 +domid = DOMID_XEN;
 +}
 +else
 +cur_regs = guest_cpu_user_regs();

Of course it would be nice to do this with a single if/else pair.

 +else
 +{
 +struct segment_register seg;
 +
 +hvm_get_segment_register(sampled, x86_seg_cs, seg);
 +r-cs = seg.sel;
 +hvm_get_segment_register(sampled, x86_seg_ss, seg);
 +r-ss = seg.sel;
 +if ( seg.attr.fields.dpl != 0 )
 +*flags |= PMU_SAMPLE_USER;

Is that how hardware treats it (CPL != 0 meaning user, rather
than CPL == 3)? Maybe you should surface CPL instead of a
boolean flag?

Anyway, with or without the adjustments (unless you go a 3rd
way)
Acked-by: Jan Beulich jbeul...@suse.com

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFCv3 3/8] xen: introduce XEN_DOMCTL_set_recipient

2014-11-25 Thread Jan Beulich

 On 07.10.14 at 15:10, vkuzn...@redhat.com wrote:
 New operation sets the 'recipient' domain which will recieve all
 memory pages from a particular domain when these pages are freed.



 --- a/xen/common/domctl.c
 +++ b/xen/common/domctl.c
 @@ -1152,6 +1152,33 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
 u_domctl)
  }
  break;
  
 +case XEN_DOMCTL_set_recipient:
 +{
 +struct domain *recipient_dom;
 +
 +if ( op-u.recipient.recipient == DOMID_INVALID )
 +{
 +if ( d-recipient )
 +{
 +put_domain(d-recipient);
 +}

Please drop pointless braces like these and ...

 +d-recipient = NULL;
 +break;
 +}
 +
 +recipient_dom = get_domain_by_id(op-u.recipient.recipient);
 +if ( recipient_dom == NULL )
 +{
 +ret = -ESRCH;
 +break;
 +}
 +else
 +{
 +d-recipient = recipient_dom;
 +}

... the pointless else-s/break-s like here.

 --- a/xen/common/page_alloc.c
 +++ b/xen/common/page_alloc.c
 @@ -1707,6 +1707,7 @@ void free_domheap_pages(struct page_info *pg, unsigned 
 int order)
  {
  struct domain *d = page_get_owner(pg);
  unsigned int i;
 +unsigned long mfn, gmfn;
  bool_t drop_dom_ref;
  
  ASSERT(!in_irq());
 @@ -1764,11 +1765,28 @@ void free_domheap_pages(struct page_info *pg, 
 unsigned int order)
  scrub = 1;
  }
  
 -if ( unlikely(scrub) )
 -for ( i = 0; i  (1  order); i++ )
 -scrub_one_page(pg[i]);
 +if ( !d || !d-recipient || d-recipient-is_dying )
 +{
 +if ( unlikely(scrub) )
 +for ( i = 0; i  (1  order); i++ )
 +scrub_one_page(pg[i]);
  
 -free_heap_pages(pg, order);
 +free_heap_pages(pg, order);
 +}
 +else
 +{
 +mfn = page_to_mfn(pg);
 +gmfn = mfn_to_gmfn(d, mfn);
 +
 +page_set_owner(pg, NULL);
 +assign_pages(d-recipient, pg, order, 0);

This can fail.

 +
 +if ( guest_physmap_add_page(d-recipient, gmfn, mfn, order) )
 +{
 +gdprintk(XENLOG_INFO, Failed to add page to domain's 
 physmap
 +  mfn: %lx\n, mfn);

The current domain/vCPU don't matter here afaict, hence no need
for gdprintk(). The source and/or recipient domain do matter though,
hence printing either or both would seem desirable. The gMFN may
be relevant for diagnostic purposes too. Hence e.g.

printk(XENLOG_G_INFO Failed to add MFN %lx (GFN %lx) to 
Dom%d's physmap\n
   mfn, gmfn, d-recipient-domain_id);

What's worse though is that you don't guard against
XEN_DOMCTL_set_recipient zapping d-recipient. If that really is
necessary to be supported, some synchronization will be needed.

And finally - what's the intended protocol for the tool stack to
know that all pages got transferred (and hence the new domain
can be launched)?

 --- a/xen/include/public/domctl.h
 +++ b/xen/include/public/domctl.h
 @@ -965,6 +965,23 @@ struct xen_domctl_vnuma {
  typedef struct xen_domctl_vnuma xen_domctl_vnuma_t;
  DEFINE_XEN_GUEST_HANDLE(xen_domctl_vnuma_t);
  
 +/*
 + * XEN_DOMCTL_set_recipient - sets the 'recipient' domain which will recieve
 + * all the domain's memory after its death or when and memory page from
 + * domain's heap is being freed.

So this specifically allows for this to happen on a non-dying domain.
Why, i.e. what's the intended usage scenario? If not really needed,
please remove this and verify in the handling that the source domain
is dying.

 --- a/xen/include/xen/sched.h
 +++ b/xen/include/xen/sched.h
 @@ -366,6 +366,7 @@ struct domain
  bool_t   is_privileged;
  /* Which guest this guest has privileges on */
  struct domain   *target;
 +struct domain   *recipient;

Please add a brief comment similar to the one for target.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-xen-4.5] x86/pvh/vpmu: Disable VPMU for PVH guests

2014-11-25 Thread Jan Beulich

 On 25.11.14 at 15:38, boris.ostrov...@oracle.com wrote:
 On 11/25/2014 03:45 AM, Jan Beulich wrote:
 @@ -1429,6 +1429,12 @@ int vlapic_init(struct vcpu *v)
   
   HVM_DBG_LOG(DBG_LEVEL_VLAPIC, %d, v-vcpu_id);
   
 +if ( is_pvh_vcpu(v) )
 +{
 +vlapic-hw.disabled = VLAPIC_HW_DISABLED;
 
 
 I did consider doing that but I thought that this flag is meant to be 
 set when the guest clears MSR_IA32_APICBASE_ENABLE to disable APIC and 
 therefore I'd be overloading it (the flag) in a way.

There's no overloading here - we're then simply treating all PVH
vCPU-s as having permanently hardware-disabled LAPICs (reflecting
current reality).

 Regardless, do you think that disabling VPMU for PVH is worth anyway?

That depends on what (bad) consequences not doing so has.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFCv3 3/8] xen: introduce XEN_DOMCTL_set_recipient

2014-11-25 Thread Jan Beulich

 On 25.11.14 at 16:41, vkuzn...@redhat.com wrote:
 Jan Beulich jbeul...@suse.com writes:
 On 07.10.14 at 15:10, vkuzn...@redhat.com wrote:
 @@ -1764,11 +1765,28 @@ void free_domheap_pages(struct page_info *pg, 
 unsigned int order)
  scrub = 1;
  }
  
 -if ( unlikely(scrub) )
 -for ( i = 0; i  (1  order); i++ )
 -scrub_one_page(pg[i]);
 +if ( !d || !d-recipient || d-recipient-is_dying )
 +{
 +if ( unlikely(scrub) )
 +for ( i = 0; i  (1  order); i++ )
 +scrub_one_page(pg[i]);
  
 -free_heap_pages(pg, order);
 +free_heap_pages(pg, order);
 +}
 +else
 +{
 +mfn = page_to_mfn(pg);
 +gmfn = mfn_to_gmfn(d, mfn);
 +
 +page_set_owner(pg, NULL);
 +assign_pages(d-recipient, pg, order, 0);

 This can fail.
 
 .. if the recipient is dying or we're over-allocating. Shouldn't happen
 (if toolstack does its job right) but I'll add a check.

You must not rely on the tool stack doing things right (see XSA-77).

 What's worse though is that you don't guard against
 XEN_DOMCTL_set_recipient zapping d-recipient. If that really is
 necessary to be supported, some synchronization will be needed.

 And finally - what's the intended protocol for the tool stack to
 know that all pages got transferred (and hence the new domain
 can be launched)?

 
 (hope this answers both questions above)
 
 Now the protocol is:
 1) Toolstack queries for all page frames (with xc_get_pfn_type_batch())
 for the old domain.
 2) Toolstack issues XEN_DOMCTL_set_recipient with the recipient domain
 3) Toolstack kills the source domain
 4) Toolstack waits for awhile (loop keeps checking while we see new pages
 arriving + some timeout).
 5) Toolstack issues XEN_DOMCTL_set_recipient with DOMID_INVALID
 resetting recipient.
 6) Toolstack checks if all pages were transferred
 7) If some pages are missing (e.g. source domain became zombie holding
 something) request new empty pages instead.
 8) Toolstack starts new domain
 
 So we don't actually need XEN_DOMCTL_set_recipient to switch from one
 recipient to the other, we just need a way to reset it.

No, this doesn't address either question. For the first one, you again
assume the tool stack behaves correctly, which is wrong nowadays.
And for the second you give a description, but that's not really a
dependable protocol. Nor do I really follow why resetting the recipient
is necessary. Can't the tools e.g. wait for the final death of the domain
if there's no other notification?

 --- a/xen/include/public/domctl.h
 +++ b/xen/include/public/domctl.h
 @@ -965,6 +965,23 @@ struct xen_domctl_vnuma {
  typedef struct xen_domctl_vnuma xen_domctl_vnuma_t;
  DEFINE_XEN_GUEST_HANDLE(xen_domctl_vnuma_t);
  
 +/*
 + * XEN_DOMCTL_set_recipient - sets the 'recipient' domain which will 
 recieve
 + * all the domain's memory after its death or when and memory page from
 + * domain's heap is being freed.

 So this specifically allows for this to happen on a non-dying domain.
 Why, i.e. what's the intended usage scenario? If not really needed,
 please remove this and verify in the handling that the source domain
 is dying.
 
 Sorry if I didn't get this comment right. We need a way to tell which
 domain will recieve memory and XEN_DOMCTL_set_recipient sets (and
 resets) this target domain. We call it from toolstack before we start
 killing old domain so it is not dying yet. We can't do it
 hypervistor-side when our source domain exits doing SHUTDOWN_soft_reset
 as there is no recipient domain created yet.
 
 From a general (hypervisor) point of view we don't actually care if the
 domain is dying or not. We just want to recieve all freed pages from
 this domain (so they go to some other domain instead of trash).

We do care I think, primarily because we want a correct GMFN -
MFN mapping. Seeing multiple pages for the same GMFN is for
example going to cause the whole process in its current form to fail.
And considering the need for such a correct mapping - is it always
the case that the mapping gets updated after a page got removed
from a guest? (I can see that the mapping doesn't change anymore
for a dying guest, but you explicitly say that you want/need this to
work before the guest is actually marked dying.)

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5 RFC v2] x86/HVM: Unconditionally crash guests on repeated vmentry failures

2014-11-25 Thread Jan Beulich

 On 25.11.14 at 17:54, andrew.coop...@citrix.com wrote:
 This is RFC as there is still a niggle.  I tested this via a partial revert of
 the XSA-110 fix but the result is quite chatty given a double VMCB dump and
 domain crash.  However, I am not sure we want to make any vmentry failure
 quite, as any vmentry failure constitues a Xen bug.

I think that double printing would be tolerable, but I've had yet
another idea: Couldn't we make the second exception a #DF,
thus having the guest killed via triple fault in the worst case at
the third recurring failure (via hvm_combine_hw_exceptions())?

Also your test results point out that we're delivering such an
exception with wrong context to the guest: Machine state should
match that before the results from the emulation got committed.
But doing so would be a pretty significant change for almost no
benefit.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5 RFC v2] x86/HVM: Unconditionally crash guests on repeated vmentry failures

2014-11-27 Thread Jan Beulich

 On 26.11.14 at 18:43, andrew.coop...@citrix.com wrote:
 My v1 patch only fixes the VMX side of things.  SVM doesn't explicitly
 identify a failed vmentry and lets it fall into the default case of the
 vmexit handler.  As such, with v1, the infinite loop still affects AMD
 hardware.

Right; I should have said something along the lines of v1. An SVM
part is needed, but that should probably extend beyond what you
proposed in v2: There are a number of goto exit_and_crash
statements ahead of where you place your addition. I think they
all need to be treated similarly.

I therefore think we should revert the VMX part of 28b4baacd5
and make SVM behavior consistent with what results for VMX:
Crash the guest unconditionally on VMEXIT_INVALID, without
looking for recurring VM entry failures. See below/attached.

Jan

x86/HVM: prevent infinite VM entry retries

This reverts the VMX side of commit 28b4baac (x86/HVM: don't crash
guest upon problems occurring in user mode) and gets SVM in line with
the resulting VMX behavior. This is because Andrew validly says

A failed vmentry is overwhelmingly likely to be caused by corrupt
 VMC[SB] state.  As a result, injecting a fault and retrying the the
 vmentry is likely to fail in the same way.

Reported-by: Andrew Cooper andrew.coop...@citrix.com
Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2338,6 +2338,7 @@ void svm_vmexit_handler(struct cpu_user_
 struct nestedvcpu *nv = vcpu_nestedhvm(v);
 struct vmcb_struct *ns_vmcb = nv-nv_vvmcx;
 uint64_t exitinfo1, exitinfo2;
+bool_t crash = 0;
 
 paging_update_nestedmode(v);
 
@@ -2371,13 +2372,16 @@ void svm_vmexit_handler(struct cpu_user_
 goto out;
 case NESTEDHVM_VMEXIT_FATALERROR:
 gdprintk(XENLOG_ERR, unexpected nestedsvm_vmexit() error\n);
-goto exit_and_crash;
-
+crash = 1;
+break;
 default:
 BUG();
 case NESTEDHVM_VMEXIT_ERROR:
 break;
 }
+if ( crash )
+break;
+/* fall through */
 case NESTEDHVM_VMEXIT_ERROR:
 gdprintk(XENLOG_ERR,
 nestedsvm_check_intercepts() returned 
NESTEDHVM_VMEXIT_ERROR\n);
@@ -2385,18 +2389,25 @@ void svm_vmexit_handler(struct cpu_user_
 case NESTEDHVM_VMEXIT_FATALERROR:
 gdprintk(XENLOG_ERR,
 unexpected nestedsvm_check_intercepts() error\n);
-goto exit_and_crash;
+crash = 1;
+break;
 default:
 gdprintk(XENLOG_INFO, nestedsvm_check_intercepts() returned %i\n,
 nsret);
-goto exit_and_crash;
+crash = 1;
+break;
 }
+
+if ( unlikely(crash)  exit_reason != VMEXIT_INVALID )
+goto exit_and_crash;
 }
 
 if ( unlikely(exit_reason == VMEXIT_INVALID) )
 {
+gdprintk(XENLOG_ERR, invalid VMCB state:\n);
 svm_vmcb_dump(__func__, vmcb);
-goto exit_and_crash;
+domain_crash(v-domain);
+return;
 }
 
 perfc_incra(svmexits, exit_reason);
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -134,18 +134,6 @@ static void vmx_vcpu_destroy(struct vcpu
 passive_domain_destroy(v);
 }
 
-/* Only crash the guest if the problem originates in kernel mode. */
-static void vmx_crash_or_fault(struct vcpu *v)
-{
-struct segment_register ss;
-
-vmx_get_segment_register(v, x86_seg_ss, ss);
-if ( ss.attr.fields.dpl )
-hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
-else
-domain_crash(v-domain);
-}
-
 static DEFINE_PER_CPU(struct vmx_msr_state, host_msr_state);
 
 static const u32 msr_index[] =
@@ -2520,7 +2508,7 @@ static void vmx_failed_vmentry(unsigned 
 vmcs_dump_vcpu(curr);
 printk(**\n);
 
-vmx_crash_or_fault(curr);
+domain_crash(curr-domain);
 }
 
 void vmx_enter_realmode(struct cpu_user_regs *regs)
@@ -3173,8 +3161,19 @@ void vmx_vmexit_handler(struct cpu_user_
 /* fall through */
 default:
 exit_and_crash:
-gdprintk(XENLOG_WARNING, Bad vmexit (reason %#lx)\n, exit_reason);
-vmx_crash_or_fault(v);
+{
+struct segment_register ss;
+
+gdprintk(XENLOG_WARNING, Bad vmexit (reason %#lx)\n,
+ exit_reason);
+
+vmx_get_segment_register(v, x86_seg_ss, ss);
+if ( ss.attr.fields.dpl )
+hvm_inject_hw_exception(TRAP_invalid_op,
+HVM_DELIVER_NO_ERROR_CODE);
+else
+domain_crash(v-domain);
+}
 break;
 }
 


x86/HVM: prevent infinite VM entry retries

This reverts the VMX side of commit 28b4baac (x86/HVM: don't crash
guest upon problems occurring

Re: [Xen-devel] [PATCH RFCv3 3/8] xen: introduce XEN_DOMCTL_set_recipient

2014-11-27 Thread Jan Beulich

 On 25.11.14 at 18:10, vkuzn...@redhat.com wrote:
 Jan Beulich jbeul...@suse.com writes:
 
 On 25.11.14 at 16:41, vkuzn...@redhat.com wrote:
 Jan Beulich jbeul...@suse.com writes:
 On 07.10.14 at 15:10, vkuzn...@redhat.com wrote:
 @@ -1764,11 +1765,28 @@ void free_domheap_pages(struct page_info *pg, 
 unsigned int order)
  scrub = 1;
  }
  
 -if ( unlikely(scrub) )
 -for ( i = 0; i  (1  order); i++ )
 -scrub_one_page(pg[i]);
 +if ( !d || !d-recipient || d-recipient-is_dying )
 +{
 +if ( unlikely(scrub) )
 +for ( i = 0; i  (1  order); i++ )
 +scrub_one_page(pg[i]);
  
 -free_heap_pages(pg, order);
 +free_heap_pages(pg, order);
 +}
 +else
 +{
 +mfn = page_to_mfn(pg);
 +gmfn = mfn_to_gmfn(d, mfn);
 +
 +page_set_owner(pg, NULL);
 +assign_pages(d-recipient, pg, order, 0);

 This can fail.
 
 .. if the recipient is dying or we're over-allocating. Shouldn't happen
 (if toolstack does its job right) but I'll add a check.

 You must not rely on the tool stack doing things right (see XSA-77).

 What's worse though is that you don't guard against
 XEN_DOMCTL_set_recipient zapping d-recipient. If that really is
 necessary to be supported, some synchronization will be needed.

 And finally - what's the intended protocol for the tool stack to
 know that all pages got transferred (and hence the new domain
 can be launched)?

 
 (hope this answers both questions above)
 
 Now the protocol is:
 1) Toolstack queries for all page frames (with xc_get_pfn_type_batch())
 for the old domain.
 2) Toolstack issues XEN_DOMCTL_set_recipient with the recipient domain
 3) Toolstack kills the source domain
 4) Toolstack waits for awhile (loop keeps checking while we see new pages
 arriving + some timeout).
 5) Toolstack issues XEN_DOMCTL_set_recipient with DOMID_INVALID
 resetting recipient.
 6) Toolstack checks if all pages were transferred
 7) If some pages are missing (e.g. source domain became zombie holding
 something) request new empty pages instead.
 8) Toolstack starts new domain
 
 So we don't actually need XEN_DOMCTL_set_recipient to switch from one
 recipient to the other, we just need a way to reset it.

 No, this doesn't address either question. For the first one, you again
 assume the tool stack behaves correctly, which is wrong nowadays.
 And for the second you give a description, but that's not really a
 dependable protocol. Nor do I really follow why resetting the recipient
 is necessary. Can't the tools e.g. wait for the final death of the domain
 if there's no other notification?
 
 Yes, it's possible and should happen in all normal cases. However my
 idea was that it's possible to start new domain even if the old one
 fails to die holding several (presumably special) pages and we're fine
 with replacing those with empty pages. In case we go for 'always wait
 for the original domain to die' solution resetting
 XEN_DOMCTL_set_recipient is not necessary (it is necessary in case we
 don't as we can recieve a page after we already requested a new one
 instead).

I think this always wait is imo the only reasonable solution: How
would replacing some pages with empty ones fit the kexec/kdump
purposes? However, it may not be necessary to wait for the
domain to completely disappear, it may be sufficient to wait for
its d-tot_pages to become zero.

 From a general (hypervisor) point of view we don't actually care if the
 domain is dying or not. We just want to recieve all freed pages from
 this domain (so they go to some other domain instead of trash).

 We do care I think, primarily because we want a correct GMFN -
 MFN mapping. Seeing multiple pages for the same GMFN is for
 example going to cause the whole process in its current form to fail.
 
 Can adding a check that nothing is mapped to the GMFN before mapping new
 MFN there be a solution?

Not checking for this is presumably the better approach - the new
domain would get what the old one had last at a given GMFN. What
it may not get are pages that intermediately got moved around.
But again, all that is relevant only if the old domain can still alter its
memory layout while the transfer is already in progress, which (as
said before) I think should be avoided.

 And considering the need for such a correct mapping - is it always
 the case that the mapping gets updated after a page got removed
 from a guest? (I can see that the mapping doesn't change anymore
 for a dying guest, but you explicitly say that you want/need this to
 work before the guest is actually marked dying.)
 
 Actual reassignment here happens for a dying guest only as
 XEN_DOMCTL_set_recipient does nothing. If you think it's safer to depend
 on the fact that dying flag is always set we can couple
 XEN_DOMCTL_set_recipient with XEN_DOMCTL_destroydomain in one call. It
 is possible if we go

Re: [Xen-devel] [PATCH v15 11/21] x86/VPMU: Interface for setting PMU mode and flags

2014-11-27 Thread Jan Beulich

 On 26.11.14 at 15:32, boris.ostrov...@oracle.com wrote:

 On 11/25/2014 08:49 AM, Jan Beulich wrote:
 On 17.11.14 at 00:07, boris.ostrov...@oracle.com wrote:
 @@ -244,19 +256,19 @@ void vpmu_initialise(struct vcpu *v)
   switch ( vendor )
   {
   case X86_VENDOR_AMD:
 -if ( svm_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
 -opt_vpmu_enabled = 0;
 +if ( svm_vpmu_initialise(v) != 0 )
 +vpmu_mode = XENPMU_MODE_OFF;
   break;
   
   case X86_VENDOR_INTEL:
 -if ( vmx_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
 -opt_vpmu_enabled = 0;
 +if ( vmx_vpmu_initialise(v) != 0 )
 +vpmu_mode = XENPMU_MODE_OFF;
   break;
 So this turns off the vPMU globally upon failure of initializing
 some random vCPU. Why is that? I see this was the case even
 before your entire series, but shouldn't this be fixed _before_
 enhancing the whole thing to support PV/PVH?
 
 Yes, that's probably too strong. Do you want to fix this as an early 
 patch (before PV(H)) support gets in? I'd rather do it in the patch that 
 moves things into initcalls.

Yes, I think this should be fixed in a prereq patch, thus allowing it
to be easily backported if so desired.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v15 17/21] x86/VPMU: Handle PMU interrupts for PV guests

2014-11-27 Thread Jan Beulich

 On 26.11.14 at 15:39, boris.ostrov...@oracle.com wrote:

 On 11/25/2014 09:28 AM, Jan Beulich wrote:

 +else
 +{
 +struct segment_register seg;
 +
 +hvm_get_segment_register(sampled, x86_seg_cs, seg);
 +r-cs = seg.sel;
 +hvm_get_segment_register(sampled, x86_seg_ss, seg);
 +r-ss = seg.sel;
 +if ( seg.attr.fields.dpl != 0 )
 +*flags |= PMU_SAMPLE_USER;
 Is that how hardware treats it (CPL != 0 meaning user, rather
 than CPL == 3)?
 
 You mean how *software* (e.g. Linux kernel) treats it? If yes, then for 
 32-bit user_mode() checks for (CS == 3) and for 64-bit it's !!(CS  3).

No, I meant hardware. There CPL qualified PMU aspects, and it was
those I had in mind to use as reference here.

 Maybe you should surface CPL instead of a
 boolean flag?
 
 Am I not already doing it by passing SS and CS to the guest?

No, neither SS.RPL nor CS.RPL formally represent CPL.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-27 Thread Jan Beulich

 On 27.11.14 at 06:29, sfl...@ihonk.com wrote:
 On 11/25/2014 03:00 AM, Jan Beulich wrote:
 Okay, so it's not really the mwait-idle driver causing the regression,
 but it is C-state related. Hence we're now down to seeing whether all
 or just the deeper C states are affected, i.e. I now need to ask you
 to play with max_cstate=. For that you'll have to remember that the
 option's effect differs between the ACPI and the MWAIT idle drivers.
 In the spirit of bisection I'd suggest using max_cstate=2 first no
 matter which of the two scenarios you pick. If that still hangs,
 max_cstate=1 obviously is the only other thing to try. Should that
 not hang (and you left out mwait-idle=0), trying max_cstate=3
 in that same scenario would be the other case to check.

 No need for 'd' and 'a' output for the time being, but 'c' output would
 be much appreciated for all cases where you observe hangs.

 
 Okay, working through that now. I tried max_cstate=2 and got no hangs, 
 whether with or without mwait-idle=0. However, I was puzzled by this:
 
 (XEN) 'c' pressed - printing ACPI Cx structures
 (XEN) ==cpu0==
 (XEN) active state: C0
 (XEN) max_cstate:   C2
 (XEN) states:
 (XEN) C1:   type[C1] latency[003] usage[12219860] method[  FFH] 
 duration[1190961948551]
 (XEN) C2:   type[C1] latency[010] usage[10205554] method[  FFH] 
 duration[2015393965907]
 (XEN) C3:   type[C2] latency[020] usage[50926286] method[  FFH] 
 duration[30527997858148]
 (XEN)*C0:   usage[73351700] duration[9974627547595]
 (XEN) max=0 pwr=0 urg=0 nxt=0
 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
 (XEN) CC3[28794734145697] CC6[0] CC7[0]
 (XEN) ==cpu1==
 (XEN) active state: C3
 (XEN) max_cstate:   C2
 (XEN) states:
 (XEN) C1:   type[C1] latency[003] usage[10699950] method[  FFH] 
 duration[1141422044112]
 (XEN) C2:   type[C1] latency[010] usage[06382904] method[  FFH] 
 duration[1329739264322]
 (XEN)*C3:   type[C2] latency[020] usage[44630764] method[  FFH] 
 duration[31676618425954]
 (XEN) C0:   usage[61713618] duration[9561201640320]
 (XEN) max=0 pwr=0 urg=0 nxt=0
 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
 (XEN) CC3[30066495105056] CC6[0] CC7[0]
[...]
 
 Why would some of the cores be in C3 even though they list max_cstate as C2?

This was precisely the reason why I told you that the numbering
differs (and is confusing and has nothing to do with actual C state
numbers): What max_cstate refers to in the mwait-idle driver is
what above is listed as type[Cx], i.e. the state at index 1 is C1, at
2 we've got C1E, and at 3 we've got C2. And those still aren't in
line with the numbering the CPU documentation uses, it's rather
kind of meant to refer to the ACPI numbering (but probably also
not fully matching up).

So max_cstate=2 working suggests a problem with what the CPU
calls C6, which presumably isn't all that surprising considering the
many errata (BD35, BD38, BD40, BD59, BD87, and BD104). Not
sure how to proceed from here - I suppose you already made
sure you run with the latest available BIOS. And with 6 errata
documented it's not all that unlikely that there's a 7th one with
MONITOR/MWAIT behavior. The commit you bisected to (and
which you had verified to be the culprit by just forcing
arch_skip_send_event_check() to always return false) could be
reasonably assumed to be broken only when MWAIT use for all
C states didn't work.

Don, Jun - is there anything known but not yet publicly
documented for Family 6 Model 44 Xeons?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] xsm/flask: improve unknown permission handling

2014-11-27 Thread Jan Beulich

 On 25.11.14 at 19:05, dgde...@tycho.nsa.gov wrote:
 --- a/xen/xsm/flask/hooks.c
 +++ b/xen/xsm/flask/hooks.c
 @@ -135,6 +135,19 @@ static int get_irq_sid(int irq, u32 *sid, struct 
 avc_audit_data *ad)
  return 0;
  }
  
 +static int avc_unknown_permission(const char* name, int id)

const char *name

 +{
 +/* A guest making an invalid hypercall can trigger this message, so it 
 can't
 + * be an ASSERT or BUG_ON, but normally it is caused by a missing case in
 + * one of the switch statements below.
 + */
 +printk(XENLOG_G_ERR FLASK: Unknown %s: %d.\n, name, id);

I think this ought to be XENLOG_G_WARNING when not returning
an error. E.g. switch printing and return code determination, use
the return code to select the correct log level, and return after
logging the message.

Jan

 +if ( !flask_enforcing || security_get_allow_unknown() )
 +return 0;
 +else
 +return -EPERM;
 +}
 +
  static int flask_domain_alloc_security(struct domain *d)
  {
  struct domain_security_struct *dsec;



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5 RFC v2] x86/HVM: Unconditionally crash guests on repeated vmentry failures

2014-11-27 Thread Jan Beulich

 On 27.11.14 at 11:26, t...@xen.org wrote:
 At 08:42 + on 27 Nov (1417074133), Jan Beulich wrote:
  On 26.11.14 at 18:43, andrew.coop...@citrix.com wrote:
  My v1 patch only fixes the VMX side of things.  SVM doesn't explicitly
  identify a failed vmentry and lets it fall into the default case of the
  vmexit handler.  As such, with v1, the infinite loop still affects AMD
  hardware.
 
 Right; I should have said something along the lines of v1. An SVM
 part is needed, but that should probably extend beyond what you
 proposed in v2: There are a number of goto exit_and_crash
 statements ahead of where you place your addition. I think they
 all need to be treated similarly.
 
 I therefore think we should revert the VMX part of 28b4baacd5
 and make SVM behavior consistent with what results for VMX:
 Crash the guest unconditionally on VMEXIT_INVALID, without
 looking for recurring VM entry failures. See below/attached.
 
 Jan
 
 x86/HVM: prevent infinite VM entry retries
 
 This reverts the VMX side of commit 28b4baac (x86/HVM: don't crash
 guest upon problems occurring in user mode) and gets SVM in line with
 the resulting VMX behavior. This is because Andrew validly says
 
 A failed vmentry is overwhelmingly likely to be caused by corrupt
  VMC[SB] state.  As a result, injecting a fault and retrying the the
  vmentry is likely to fail in the same way.
 
 Reported-by: Andrew Cooper andrew.coop...@citrix.com
 Signed-off-by: Jan Beulich jbeul...@suse.com
 
 Looking at the SVM side, AFAICT you're trying to filter out
 VMEXIT_INVALID exits that couldn't be handled by nested SVM, but I
 think it's fine just to always crash on nested-SVM failures: we know
 the guest wasn't in user mode because it successfully executed VMRUN.
 And looking at it, the other users of that label are for unexpected
 debugging exits, which can't be caused by the guest userspace either.
 
 So how about this for the SVM side, reverting to crashing for
 everything except new, unsupported exit types?

Generally a good idea, but there are two paths to exit_and_crash
(for VMEXIT_EXCEPTION_DB and VMEXIT_EXCEPTION_BP) which I
think would better crash only conditionally.

 --- a/xen/arch/x86/hvm/svm/svm.c
 +++ b/xen/arch/x86/hvm/svm/svm.c
 @@ -2675,16 +2675,18 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
  break;
  
  default:
 -exit_and_crash:
  gdprintk(XENLOG_ERR, unexpected VMEXIT: exit reason = %#PRIx64, 
 
   exitinfo1 = %#PRIx64, exitinfo2 = %#PRIx64\n,
   exit_reason, 
   (u64)vmcb-exitinfo1, (u64)vmcb-exitinfo2);
 -if ( vmcb_get_cpl(vmcb) )
 +if ( vmcb_get_cpl(vmcb) ) {
  hvm_inject_hw_exception(TRAP_invalid_op,
  HVM_DELIVER_NO_ERROR_CODE);
 -else
 -domain_crash(v-domain);
 +break;
 +}
 +/* else fall through */
 +exit_and_crash:
 +domain_crash(v-domain);
  break;
  }

Additionally this re-arrangement would make some domain_crash()
invocations silent (i.e. no other accompanying message), but that's
of course easy to fix.

And finally, if doing it that way we might better remove the
exit_and_crash label altogether and call domain_crash() directly
in the places we mean it to be called.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5 RFC v2] x86/HVM: Unconditionally crash guests on repeated vmentry failures

2014-11-27 Thread Jan Beulich

 On 27.11.14 at 12:29, t...@xen.org wrote:
 At 11:16 + on 27 Nov (1417083414), Jan Beulich wrote:
  On 27.11.14 at 11:26, t...@xen.org wrote:
  At 08:42 + on 27 Nov (1417074133), Jan Beulich wrote:
   On 26.11.14 at 18:43, andrew.coop...@citrix.com wrote:
   My v1 patch only fixes the VMX side of things.  SVM doesn't explicitly
   identify a failed vmentry and lets it fall into the default case of the
   vmexit handler.  As such, with v1, the infinite loop still affects AMD
   hardware.
  
  Right; I should have said something along the lines of v1. An SVM
  part is needed, but that should probably extend beyond what you
  proposed in v2: There are a number of goto exit_and_crash
  statements ahead of where you place your addition. I think they
  all need to be treated similarly.
  
  I therefore think we should revert the VMX part of 28b4baacd5
  and make SVM behavior consistent with what results for VMX:
  Crash the guest unconditionally on VMEXIT_INVALID, without
  looking for recurring VM entry failures. See below/attached.
  
  Jan
  
  x86/HVM: prevent infinite VM entry retries
  
  This reverts the VMX side of commit 28b4baac (x86/HVM: don't crash
  guest upon problems occurring in user mode) and gets SVM in line with
  the resulting VMX behavior. This is because Andrew validly says
  
  A failed vmentry is overwhelmingly likely to be caused by corrupt
   VMC[SB] state.  As a result, injecting a fault and retrying the the
   vmentry is likely to fail in the same way.
  
  Reported-by: Andrew Cooper andrew.coop...@citrix.com
  Signed-off-by: Jan Beulich jbeul...@suse.com
  
  Looking at the SVM side, AFAICT you're trying to filter out
  VMEXIT_INVALID exits that couldn't be handled by nested SVM, but I
  think it's fine just to always crash on nested-SVM failures: we know
  the guest wasn't in user mode because it successfully executed VMRUN.
  And looking at it, the other users of that label are for unexpected
  debugging exits, which can't be caused by the guest userspace either.
  
  So how about this for the SVM side, reverting to crashing for
  everything except new, unsupported exit types?
 
 Generally a good idea, but there are two paths to exit_and_crash
 (for VMEXIT_EXCEPTION_DB and VMEXIT_EXCEPTION_BP) which I
 think would better crash only conditionally.
 
 Those are catching Xen bugs -- we don't (or at least shouldn't) enable
 those exit types when the debugger is not attached.  I think that,
 like with the p2m ENOMEM case, turning them into #UD is not really
 helpful.  But if you prefer, those could be made into 'goto default'
 to preserve the current behaviour, which would also retain the
 debugging output.
 
 And finally, if doing it that way we might better remove the
 exit_and_crash label altogether and call domain_crash() directly
 in the places we mean it to be called.
 
 Indeed.  How's this, then?

Looks good - if you don't mind I'll replace the SVM part of the earlier
patch with this, add your S-o-b alongside mine, and send again for
final review.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] x86/HVM: prevent infinite VM entry retries

2014-11-27 Thread Jan Beulich

 On 27.11.14 at 13:46, andrew.coop...@citrix.com wrote:
 On 27/11/14 12:39, Jan Beulich wrote:
  if ( unlikely(exit_reason == VMEXIT_INVALID) )
  {
 
 I think it would be good to retain the printk here from previous
 versions of the patch, specifically identifies the below vmcb dump as
 caused by a failed vmentry.  As it stands, it is a vmcb dump with no
 other context.
 
 Either way, Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

I meant to do so and then forgot. Added to the to be committed
version.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [v3] libxc: Expose the 1GB pages cpuid flag to guest

2014-11-28 Thread Jan Beulich

 On 28.11.14 at 04:28, liang.z...@intel.com wrote:
 --- a/xen/arch/x86/hvm/hvm.c
 +++ b/xen/arch/x86/hvm/hvm.c
 @@ -4287,7 +4287,7 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
 unsigned int *ebx,
   !host_tsc_is_safe() )
  *edx = ~cpufeat_mask(X86_FEATURE_RDTSCP);
  /* Hide 1GB-superpage feature if we can't emulate it. */
 -if (!hvm_pse1gb_supported(d))
 +if (!hvm_pse1gb_supported(d) || paging_mode_shadow(d))
  *edx = ~cpufeat_mask(X86_FEATURE_PAGE1GB);

With

#define hvm_pse1gb_supported(d) \
(cpu_has_page1gb  paging_mode_hap(d))

the change above is pointless. While, considering this, comments on
v2 may have been misleading, you should have simply updated the
patch description instead to clarify why the v2 change was okay
even for the shadow mode case.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [xen-4.4-testing test] 31882: regressions - trouble: blocked/broken/fail/pass

2014-11-28 Thread Jan Beulich

 On 28.11.14 at 10:07, ian.jack...@eu.citrix.com wrote:
 flight 31882 xen-4.4-testing real [real]
 http://www.chiark.greenend.org.uk/~xensrcts/logs/31882/ 
 
 Regressions :-(
 
 Tests which did not succeed and are blocking,
 including tests which could not be run:
  test-amd64-i386-pair   17 guest-migrate/src_host/dst_host fail REGR. vs. 
 31781

Wasn't the swiotlb problem supposed to be dealt with by now?
swiotlb=65536 looks to be in place here, yet that still appears to
not be big enough...

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4] x86: add p2m_mmio_write_dm

2014-11-28 Thread Jan Beulich

 On 28.11.14 at 08:59, yu.c.zh...@linux.intel.com wrote:
 --- a/xen/arch/x86/hvm/hvm.c
 +++ b/xen/arch/x86/hvm/hvm.c
 @@ -2838,7 +2838,8 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned 
 long gla,
   * to the mmio handler.
   */
  if ( (p2mt == p2m_mmio_dm) || 
 - (npfec.write_access  (p2mt == p2m_ram_ro)) )
 + (npfec.write_access 
 + ((p2mt == p2m_ram_ro) || (p2mt == p2m_mmio_write_dm))) )

Why are you corrupting indentation here?

Furthermore the code you modify here suggests that p2m_ram_ro
already has the needed semantics - writes get passed to the DM.
None of the other changes you make, and none of the other uses
of p2m_ram_ro appear to be in conflict with your intentions, so
you'd really need to explain better why you need the new type.

 @@ -5922,6 +5923,8 @@ long do_hvm_op(unsigned long op, 
 XEN_GUEST_HANDLE_PARAM(void) arg)
  a.mem_type =  HVMMEM_ram_rw;
  else if ( p2m_is_grant(t) )
  a.mem_type =  HVMMEM_ram_rw;
 +else if ( t == p2m_mmio_write_dm )
 +a.mem_type = HVMMEM_mmio_write_dm;
  else
  a.mem_type =  HVMMEM_mmio_dm;
  rc = __copy_to_guest(arg, a, 1) ? -EFAULT : 0;
 @@ -5941,7 +5944,8 @@ long do_hvm_op(unsigned long op, 
 XEN_GUEST_HANDLE_PARAM(void) arg)
  static const p2m_type_t memtype[] = {
  [HVMMEM_ram_rw]  = p2m_ram_rw,
  [HVMMEM_ram_ro]  = p2m_ram_ro,
 -[HVMMEM_mmio_dm] = p2m_mmio_dm
 +[HVMMEM_mmio_dm] = p2m_mmio_dm,
 +[HVMMEM_mmio_write_dm] = p2m_mmio_write_dm
  };
  
  if ( copy_from_guest(a, arg, 1) )
 @@ -5987,14 +5991,17 @@ long do_hvm_op(unsigned long op, 
 XEN_GUEST_HANDLE_PARAM(void) arg)
  rc = -EAGAIN;
  goto param_fail4;
  }
 +

Stray addition of a blank line?

  if ( !p2m_is_ram(t) 
 - (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) )
 + (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) 
 + t != p2m_mmio_write_dm )

Do you really want to permit e.g. transitions between mmio_dm and
mmio_write_dm? We should be as restrictive as possible here to not
open up paths to security problems.

  {
  put_gfn(d, pfn);
  goto param_fail4;
  }
  
  rc = p2m_change_type_one(d, pfn, t, memtype[a.hvmmem_type]);
 +
  put_gfn(d, pfn);
  if ( rc )
  goto param_fail4;

Another stray newline addition.

 --- a/xen/include/asm-x86/p2m.h
 +++ b/xen/include/asm-x86/p2m.h
 @@ -72,6 +72,7 @@ typedef enum {
  p2m_ram_shared = 12,  /* Shared or sharable memory */
  p2m_ram_broken = 13,  /* Broken page, access cause domain crash 
 */
  p2m_map_foreign  = 14,/* ram pages from foreign domain */
 +p2m_mmio_write_dm = 15,   /* Read-only; writes go to the device 
 model */
  } p2m_type_t;
  
  /* Modifiers to the query */
 

If the new type is really needed, shouldn't this get added to
P2M_RO_TYPES?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [v3] libxc: Expose the 1GB pages cpuid flag to guest

2014-11-28 Thread Jan Beulich

 On 28.11.14 at 11:29, liang.z...@intel.com wrote:
  -if (!hvm_pse1gb_supported(d))
 +if (!hvm_pse1gb_supported(d) || paging_mode_shadow(d))
  *edx = ~cpufeat_mask(X86_FEATURE_PAGE1GB);

With

#define hvm_pse1gb_supported(d) \
(cpu_has_page1gb  paging_mode_hap(d))
 
the change above is pointless. While, considering this, comments on
v2 may have been misleading, you should have simply updated the patch 
 description instead to clarify why the v2 change was okay even for the shadow 
 mode case.
 
 I checked the code and found that for a normal guest can only be in hap mode 
 or shadow mode before I sending the patch, but I am not share if it's true 
 for dom0. 

The CPUID code in libxc doesn't apply to Dom0 at all, and CPUID
handling is also special cased in the hypervisor for Dom0. Plus
finally Dom0 only possibly being PV or PVH, PVH requiring HAP
and PV generally not allowing large pages anyway, your concern
is unnecessary.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [v4] libxc: Expose the 1GB pages cpuid flag to guest

2014-11-28 Thread Jan Beulich

 On 28.11.14 at 11:52, liang.z...@intel.com wrote:
 If hardware support the 1GB pages, expose the feature to guest by
 default. Users don't have to use a 'cpuid= ' option in config fil
 e to turn it on.
 
 If guest use shadow mode, the 1GB pages feature will be hidden from
 guest, this is done in the function hvm_cpuid(). So the change is
 okay for shadow mode case.
 
 Signed-off-by: Liang Li liang.z...@intel.com
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com

Reviewed-by: Jan Beulich jbeul...@suse.com


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Xen-4.5 HVMOP ABI issues

2014-11-28 Thread Jan Beulich

 On 28.11.14 at 12:36, andrew.coop...@citrix.com wrote:
 However, it occurs to me that HVMOP_op_mask absolutely must live in the
 public header files, as it is guest visible, and forms part of the ABI.
 
 Consider a userspace task which falls into a continuation, is preempted
 and paused by the guest kernel, the entire VM migrated, and the task
 unpaused later.  If HVMOP_op_mask has changed in that time, the
 continuation will become erroneous.
 
 In particular, all hypercall continuation strategies we use *must* be
 forward compatible when moving to newer versions of Xen, because Xen
 cannot possibly guarantee that a continuation which starts will finish
 on the same hypervisor.

Hmm, I see the issue, but surfacing such implementation details
would not only be unfortunate, but eventually prevent us from
making future changes. Just recall the mem-op case where we had
to widen the continuation encoding mask at some point. Hence while
I can't immediately see a way to avoid the situation you describe
(and it doesn't even take a user space task in a preemptible kernel),
I think we should allow ourselves a little more time to find possible
solutions other than locking down these masks (really they don't
need to be exposed in the public headers, but we would need
them to not change if no other solution can be found).

One thing to note is that the _introduction_ of such a mask (as
has happened for hvm-op between 4.4 and 4.5) is not a problem
as long as the respective bits all being zero has no special
meaning. What I considered for mem-op a while ago though (and
then forgot again) was to refuse non-zero start_extent values
for any operations not making use of that mechanism. The same
would of course be applicable to gnttab-op and hvm-op.

What might additionally make this not that urgent an issue for 4.5
is that only XSM_DM_PRIV guarded operations are affected, and
I suppose a stubdom gets re-created on the target host (rather
than migrated) when its client migrates.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Xen-4.5 HVMOP ABI issues

2014-11-28 Thread Jan Beulich

 On 28.11.14 at 14:55, andrew.coop...@citrix.com wrote:
 The problem is with continuations which reuse the upper bits of the
 input register, not with this HVMOP_op_mask specifically; the
 HVMOP_op_mask simply adds to an existing problem.  This is something
 which needs considering urgently because, as you identify above, we have
 already suffered an accidental ABI breakage with the mem-op widening.

Since we can't retroactively fix the mem-op widening, I still don't see
what's urgent here: As long as we don't change any of these masks,
nothing bad is going to happen. Of course one thing to consider with
this aspect in mind is whether to change the hvm-op or gnttab-op
masks again _before_ 4.5 goes out, based on whether we feel they're
wide enough for the (un)foreseeable future.

 32bit HVM guests reliably form a continuation on every single iteration
 of a continuable hypercall (e.g. decrease reservation, which is the base
 of my XSA-111 PoC), making it trivial to construct a migrateable HVM
 domain which exposes the issue.

Hmm, I can't offhand see why that would be, but what you write
reads like you determined the reason? I ask because an unavoidable
use of continuations is certainly nice for making sure those code paths
get tested, but would be quite desirable to get eliminated at least for
non-debug builds.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4] x86: add p2m_mmio_write_dm

2014-12-01 Thread Jan Beulich

 On 01.12.14 at 09:49, yu.c.zh...@linux.intel.com wrote:
 On 11/28/2014 5:57 PM, Jan Beulich wrote:
 On 28.11.14 at 08:59, yu.c.zh...@linux.intel.com wrote:
 --- a/xen/arch/x86/hvm/hvm.c
 +++ b/xen/arch/x86/hvm/hvm.c
 @@ -2838,7 +2838,8 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned 
 long gla,
* to the mmio handler.
*/
   if ( (p2mt == p2m_mmio_dm) ||
 - (npfec.write_access  (p2mt == p2m_ram_ro)) )
 + (npfec.write_access 
 +   ((p2mt == p2m_ram_ro) || (p2mt == p2m_mmio_write_dm))) )

 Why are you corrupting indentation here?
 Thanks for your comments, Jan.
 The indentation here is to make sure the
 ((p2mt == p2m_ram_ro) || (p2mt == p2m_mmio_write_dm)) are grouped 
 together. But I am not sure if this is correct according to xen coding 
 style. I may have misunderstood your previous comments on Sep 3, which 
 said the indentation would need adjustment in reply to [Xen-devel] 
 [PATCH v3 1/2] x86: add p2m_mmio_write_dm

Getting the alignment right is needed, but no via using hard tabs.

 Furthermore the code you modify here suggests that p2m_ram_ro
 already has the needed semantics - writes get passed to the DM.
 None of the other changes you make, and none of the other uses
 of p2m_ram_ro appear to be in conflict with your intentions, so
 you'd really need to explain better why you need the new type.

 Thanks Jan.
 To my understanding, pages with p2m_ram_ro are not supposed to be 
 modified by guest. So in __hvm_copy(), when p2m type of a page is 
 p2m_ram_rom, no copy will occur.
 However, to our usage, we just wanna this page to be write protected, so 
 that our device model can be triggered to do some emulation. The content 
 written to this page is supposed not to be dropped. This way, if 
 sequentially a read operation is performed by guest to this page, the 
 guest will still see its anticipated value.

__hvm_copy() is only a helper function, and doesn't write to
mmio_dm space either; instead its (indirect) callers would invoke
hvmemul_do_mmio() upon seeing HVMCOPY_bad_gfn_to_mfn
returns. The question hence is about the apparent inconsistency
resulting from writes to ram_ro being dropped here but getting
passed to the DM by hvm_hap_nested_page_fault(). Tim - is
that really intentional?

 Maybe I need to update my commit message to explain this more clearly. :)

At the very least, yes.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [Patch V2] expand x86 arch_shared_info to support linear p2m list

2014-12-01 Thread Jan Beulich

 On 01.12.14 at 10:29, jgr...@suse.com wrote:
 The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
 currently contains the mfn of the top level page frame of the 3 level
 p2m tree, which is used by the Xen tools during saving and restoring
 (and live migration) of pv domains and for crash dump analysis. With
 three levels of the p2m tree it is possible to support up to 512 GB of
 RAM for a 64 bit pv domain.
 
 A 32 bit pv domain can support more, as each memory page can hold 1024
 instead of 512 entries, leading to a limit of 4 TB.
 
 To be able to support more RAM on x86-64 switch to a virtual mapped
 p2m list.
 
 This patch expands struct arch_shared_info with a new p2m list virtual
 address, the root of the page table root and a p2m generation count.
 The new information is indicated by the domain to be valid by storing
 ~0UL into pfn_to_mfn_frame_list_list. The hypervisor indicates
 usability of this feature by a new flag XENFEAT_virtual_p2m.
 
 Right now XENFEAT_virtual_p2m will not be set. This will change when
 the Xen tools support the virtual mapped p2m list.

This seems wrong: XENFEAT_* only reflect hypervisor capabilities.
I.e. the availability of the new functionality may need to be
advertised another way - xenstore perhaps?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4] x86: add p2m_mmio_write_dm

2014-12-01 Thread Jan Beulich

 On 01.12.14 at 11:30, t...@xen.org wrote:
 At 09:32 + on 01 Dec (1417422746), Jan Beulich wrote:
  On 01.12.14 at 09:49, yu.c.zh...@linux.intel.com wrote:
  To my understanding, pages with p2m_ram_ro are not supposed to be 
  modified by guest. So in __hvm_copy(), when p2m type of a page is 
  p2m_ram_rom, no copy will occur.
  However, to our usage, we just wanna this page to be write protected, so 
  that our device model can be triggered to do some emulation. The content 
  written to this page is supposed not to be dropped. This way, if 
  sequentially a read operation is performed by guest to this page, the 
  guest will still see its anticipated value.
 
 __hvm_copy() is only a helper function, and doesn't write to
 mmio_dm space either; instead its (indirect) callers would invoke
 hvmemul_do_mmio() upon seeing HVMCOPY_bad_gfn_to_mfn
 returns. The question hence is about the apparent inconsistency
 resulting from writes to ram_ro being dropped here but getting
 passed to the DM by hvm_hap_nested_page_fault(). Tim - is
 that really intentional?
 
 No - and AFAICT it shouldn't be happening.  It _is_ how it was
 implemented originally, because it involved fewer moving parts and
 didn't need to be efficient (and after all, writes to entirely missing
 addresses go to the device model too).
 
 But the code was later updated to log and discard writes to read-only
 memory (see 4d8aa29 from Trolle Selander).
 
 Early version of p2m_ram_ro were documented in the internal headers as
 sending the writes to the DM, but the public interface (HVMMEM_ram_ro)
 has always said that writes are discarded.

Hmm, so which way do you recommend resolving the inconsistency
then - match what the public interface says or what the apparent
original intention for the internal type was? Presumably we need to
follow the public interface mandated model, and hence require the
new type to be introduced.

 During this bit of archaeology I realised that either this new type
 should _not_ be made part of P2M_RO_TYPES, or, better, we need a new
 class of P2M types (P2M_DISCARD_WRITE_TYPES, say) that should be used
 for these paths in emulate_gva_to_mfn() and __hvm_copy(), containing
 just p2m_ram_ro and p2m_grant_map_ro.

And I suppose in that latter case the new type could be made part
of P2M_RO_TYPES()?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [Patch V2] expand x86 arch_shared_info to support linear p2m list

2014-12-01 Thread Jan Beulich

 On 01.12.14 at 12:19, david.vra...@citrix.com wrote:
 On 01/12/14 10:15, Jan Beulich wrote:
 On 01.12.14 at 10:29, jgr...@suse.com wrote:
 The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
 currently contains the mfn of the top level page frame of the 3 level
 p2m tree, which is used by the Xen tools during saving and restoring
 (and live migration) of pv domains and for crash dump analysis. With
 three levels of the p2m tree it is possible to support up to 512 GB of
 RAM for a 64 bit pv domain.

 A 32 bit pv domain can support more, as each memory page can hold 1024
 instead of 512 entries, leading to a limit of 4 TB.

 To be able to support more RAM on x86-64 switch to a virtual mapped
 p2m list.

 This patch expands struct arch_shared_info with a new p2m list virtual
 address, the root of the page table root and a p2m generation count.
 The new information is indicated by the domain to be valid by storing
 ~0UL into pfn_to_mfn_frame_list_list. The hypervisor indicates
 usability of this feature by a new flag XENFEAT_virtual_p2m.

 Right now XENFEAT_virtual_p2m will not be set. This will change when
 the Xen tools support the virtual mapped p2m list.
 
 This seems wrong: XENFEAT_* only reflect hypervisor capabilities.
 I.e. the availability of the new functionality may need to be
 advertised another way - xenstore perhaps?
 
 Xenstore doesn't work for dom0.
 
 Shouldn't this be something the guest kernel reports using a ELF note bit?
 
 When building a domain (either in Xen for dom0 or in the tools), the
 builder may provide a linear p2m iff supported by the guest kernel and
 then (and only then) can it provide a guest with  512 GiB.

Yes, surely this flag could act as a kernel capability indicator (via
the XEN_ELFNOTE_SUPPORTED_FEATURES note), like e.g.
XENFEAT_dom0 already does. Jürgen's final statement, however,
suggested to me that this is meant to be only consumed by kernels.
Otoh the P2M provided by both Dom0 and DomU builders have always
been linear anyway; it's the pv-ops kernel that constructs a tree as
replacement.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4] x86: add p2m_mmio_write_dm

2014-12-01 Thread Jan Beulich

 On 01.12.14 at 13:13, t...@xen.org wrote:
 At 11:17 + on 01 Dec (1417429027), Jan Beulich wrote:
  On 01.12.14 at 11:30, t...@xen.org wrote:
  At 09:32 + on 01 Dec (1417422746), Jan Beulich wrote:
   On 01.12.14 at 09:49, yu.c.zh...@linux.intel.com wrote:
   To my understanding, pages with p2m_ram_ro are not supposed to be 
   modified by guest. So in __hvm_copy(), when p2m type of a page is 
   p2m_ram_rom, no copy will occur.
   However, to our usage, we just wanna this page to be write protected, 
   so 
   that our device model can be triggered to do some emulation. The 
   content 
   written to this page is supposed not to be dropped. This way, if 
   sequentially a read operation is performed by guest to this page, the 
   guest will still see its anticipated value.
  
  __hvm_copy() is only a helper function, and doesn't write to
  mmio_dm space either; instead its (indirect) callers would invoke
  hvmemul_do_mmio() upon seeing HVMCOPY_bad_gfn_to_mfn
  returns. The question hence is about the apparent inconsistency
  resulting from writes to ram_ro being dropped here but getting
  passed to the DM by hvm_hap_nested_page_fault(). Tim - is
  that really intentional?
  
  No - and AFAICT it shouldn't be happening.  It _is_ how it was
  implemented originally, because it involved fewer moving parts and
  didn't need to be efficient (and after all, writes to entirely missing
  addresses go to the device model too).
  
  But the code was later updated to log and discard writes to read-only
  memory (see 4d8aa29 from Trolle Selander).
  
  Early version of p2m_ram_ro were documented in the internal headers as
  sending the writes to the DM, but the public interface (HVMMEM_ram_ro)
  has always said that writes are discarded.
 
 Hmm, so which way do you recommend resolving the inconsistency
 then - match what the public interface says or what the apparent
 original intention for the internal type was? Presumably we need to
 follow the public interface mandated model, and hence require the
 new type to be introduced.
 
 Sorry, I was unclear -- there isn't an inconsistency; both internal
 and public headers currently say that writes are discarded and AFAICT
 that is what the code does.

Not for hvm_hap_nested_page_fault() afaict - the forwarding to
DM there contradicts the writes are discarded model that other
code paths follow.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4] x86: add p2m_mmio_write_dm

2014-12-02 Thread Jan Beulich

 On 02.12.14 at 08:48, yu.c.zh...@linux.intel.com wrote:

 
 On 12/1/2014 8:31 PM, Jan Beulich wrote:
 On 01.12.14 at 13:13, t...@xen.org wrote:
 At 11:17 + on 01 Dec (1417429027), Jan Beulich wrote:
 On 01.12.14 at 11:30, t...@xen.org wrote:
 At 09:32 + on 01 Dec (1417422746), Jan Beulich wrote:
 On 01.12.14 at 09:49, yu.c.zh...@linux.intel.com wrote:
 To my understanding, pages with p2m_ram_ro are not supposed to be
 modified by guest. So in __hvm_copy(), when p2m type of a page is
 p2m_ram_rom, no copy will occur.
 However, to our usage, we just wanna this page to be write protected, so
 that our device model can be triggered to do some emulation. The content
 written to this page is supposed not to be dropped. This way, if
 sequentially a read operation is performed by guest to this page, the
 guest will still see its anticipated value.

 __hvm_copy() is only a helper function, and doesn't write to
 mmio_dm space either; instead its (indirect) callers would invoke
 hvmemul_do_mmio() upon seeing HVMCOPY_bad_gfn_to_mfn
 returns. The question hence is about the apparent inconsistency
 resulting from writes to ram_ro being dropped here but getting
 passed to the DM by hvm_hap_nested_page_fault(). Tim - is
 that really intentional?

 No - and AFAICT it shouldn't be happening.  It _is_ how it was
 implemented originally, because it involved fewer moving parts and
 didn't need to be efficient (and after all, writes to entirely missing
 addresses go to the device model too).

 But the code was later updated to log and discard writes to read-only
 memory (see 4d8aa29 from Trolle Selander).

 Early version of p2m_ram_ro were documented in the internal headers as
 sending the writes to the DM, but the public interface (HVMMEM_ram_ro)
 has always said that writes are discarded.

 Hmm, so which way do you recommend resolving the inconsistency
 then - match what the public interface says or what the apparent
 original intention for the internal type was? Presumably we need to
 follow the public interface mandated model, and hence require the
 new type to be introduced.

 Sorry, I was unclear -- there isn't an inconsistency; both internal
 and public headers currently say that writes are discarded and AFAICT
 that is what the code does.

 Not for hvm_hap_nested_page_fault() afaict - the forwarding to
 DM there contradicts the writes are discarded model that other
 code paths follow.

 Thanks, Jan.
 By inconsistency, do you mean the p2m_ram_ro shall not trigger the 
 handle_mmio_with_translation() in hvm_hap_nested_page_fault()?

Yes, pending Tim's agreement.

 I'm also a bit confused with the writes are discarded/dropped comments 
 in the code. Does this mean writes to the p2m_ram_ro pages should be 
 abandoned without going to the dm, or going to the dm and  ignored 
 later? The code seems to be the second one.

And it would seem to me that it should be the former.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH for-4.5] call xc_domain_irq_permission before xc_domain_irq_permission

2014-12-02 Thread Jan Beulich

 On 01.12.14 at 20:22, stefano.stabell...@eu.citrix.com wrote:
 xc_physdev_unmap_pirq might revoke the permission to map the irq from
 the domain causing the following xc_domain_irq_permission call to fail
 and return error (domain_pirq_to_irq returns 0).

Apart from the patch title being rather confusing, isn't the root
problem then xc_physdev_unmap_pirq() removes permissions? At
least after the strict splitting of permission granting and mapping for
MMIO (commit 0561e1f0) it would seem that if the underlying
hypercall here behaves similarly to the original behavior there, it
ought to be fixed in an analogous way.

Jan

 Call xc_domain_irq_permission first to prevent this from happening
 (xc_physdev_unmap_pirq calls irq_deny_access, that doesn't fail if the
 permission is already removed).
 
 This is a simple bug fix and I think should go in 4.5.
 
 Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
 
 diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
 index 316643c..904d255 100644
 --- a/tools/libxl/libxl_pci.c
 +++ b/tools/libxl/libxl_pci.c
 @@ -1288,14 +1290,14 @@ skip1:
  goto out;
  }
  if ((fscanf(f, %u, irq) == 1)  irq) {
 -rc = xc_physdev_unmap_pirq(ctx-xch, domid, irq);
 -if (rc  0) {
 -LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, 
 xc_physdev_unmap_pirq irq=%d, irq);
 -}
  rc = xc_domain_irq_permission(ctx-xch, domid, irq, 0);
  if (rc  0) {
  LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, 
 xc_domain_irq_permission irq=%d, irq);
  }
 +rc = xc_physdev_unmap_pirq(ctx-xch, domid, irq);
 +if (rc  0) {
 +LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, 
 xc_physdev_unmap_pirq irq=%d, irq);
 +}
  }
  fclose(f);
  }
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org 
 http://lists.xen.org/xen-devel 




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 13153 matches

Mail list logo