Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread Wei Liu
On Thu, Jan 21, 2016 at 10:37:51AM +, Ian Campbell wrote:
> On Thu, 2016-01-21 at 10:25 +, Wei Liu wrote:
> > On Thu, Jan 21, 2016 at 10:12:27AM +, Ian Campbell wrote:
> > [...]
> > > > I've asked the reporter to send logs for the 4.4 case to xen-devel.
> > > 
> > > User confirmed[0] that 4.4 is actually OK.
> > > 
> > > Did someone request stable backports yet, or shall I do so?
> > > 
> > 
> > I vaguely remember we requested backport for relevant patches long time
> > ago, but I admit I have lost track. So it wouldn't hurt if you do it
> > again.
> 
> So I think we'd be looking for:
> 
> 32a8440 xen-netfront: respect user provided max_queues
> 4c82ac3 xen-netback: respect user provided max_queues
> ca88ea1 xen-netfront: update num_queues to real created
> 
> which certainly resolves things such that the workarounds work, and I think
> will also fix the default case such that it works with up to 32 vcpus
> (although it will consume all the grants and only get 31/32 queues).
> 
> Does that sound correct?
> 

Yes, it does.

> As Annie said, we may still want to consider what a sensible default max
> queues would be.
> 

Maybe we should set a cap to 8 or 16 by default.

Wei.

> Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 5/5] libxl: add options to enable/disable emulated devices

2016-01-21 Thread Roger Pau Monné
El 21/01/16 a les 11.29, Ian Campbell ha escrit:
> On Thu, 2016-01-21 at 11:01 +0100, Roger Pau Monné wrote:
>> El 21/01/16 a les 10.39, Ian Campbell ha escrit:
 It also means that HVMlite guests created with current Xen will be
 capable of migrating to newer version of Xen, that might have a
 different default policy. For example in the future we might want to
 enable the lapic by default, so if a guest is created with the current
 Xen version it doesn't get a lapic at all, and then when migrated to
 newer versions a lapic would magically appear after the migration, which
 is not desired.
>>>
>>> ... and the reason these details can't be propagated via the Xen blob is
>>> that this emul stuff needs to be set exactly once at domain create time I
>>> suppose? Changing it to be later binding is considered to be too hard/too
>>> big a yak?
>>
>> Right, emulated devices are initialised as part of the 
>> XEN_DOMCTL_createdomain hypercall. Allowing them to be added later on 
>> and introducing a kind of intermediate domain building phase where only 
>> a certain set of hypercalls are valid is a possibility that Andrew 
>> already pointed out, however this seems like a very big project.
> 
> This seems like the right approach to me, but I appreciate you not wanting
> to tackle this here and now.
> 
> Would it be possible to set the set of "potential" emulated devices at
> create time and only "commit" to it after the save state is loaded? That
> would essentially mean init-all, load state, de-init those which didn't get
> any state loaded? Not as nice as doing it with the split of hypercall
> availability, but might be more achievable, since you already have the de-
> init code for domain teardown time.

Sadly there are devices that AFAICT don't restore any state (like the
VGA), which means a more complex mechanism is needed and we cannot rely
exclusively on restores in order to decide if a device is present or not.

IMHO the current approach is not that bad, and I think we should be able
to migrate to a better one without problems. If in the future we want to
do something like what you describe above (setting the set of emulated
devices based on the Xen context restored), the extra information in the
libxl JSON stream is certainly not going to hurt us. At worst we could
always check that the libxl JSON information matches with what the
hypervisor context contains for extra safety.

> In any case, whatever is chosen as the solution the commit message needs to
> go into a fair amount of detail as to why we picked that way of doing
> things, particularly if it is a compromise vs doing it properly.
> 
> This is important so we can answer "why did we do it this way" in 2 years
> time.

Right, thanks, I will update the commit message with the outcome of this
discussion.

Roger.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [distros-debian-wheezy test] 38678: trouble: blocked/broken/pass

2016-01-21 Thread Platform Team regression test user
flight 38678 distros-debian-wheezy real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/38678/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-i3863 host-install(3) broken REGR. vs. 38635
 build-amd64   3 host-install(3) broken REGR. vs. 38635
 build-i386-pvops  3 host-install(3) broken REGR. vs. 38635
 build-amd64-pvops 3 host-install(3) broken REGR. vs. 38635

Tests which did not succeed, but are not blocking:
 test-amd64-i386-i386-wheezy-netboot-pvgrub  1 build-check(1)   blocked n/a
 test-amd64-amd64-i386-wheezy-netboot-pygrub  1 build-check(1)  blocked n/a
 test-amd64-i386-amd64-wheezy-netboot-pygrub  1 build-check(1)  blocked n/a
 test-amd64-amd64-amd64-wheezy-netboot-pvgrub  1 build-check(1) blocked n/a

baseline version:
 flight   38635

jobs:
 build-amd64  broken  
 build-armhf  pass
 build-i386   broken  
 build-amd64-pvopsbroken  
 build-armhf-pvopspass
 build-i386-pvops broken  
 test-amd64-amd64-amd64-wheezy-netboot-pvgrub blocked 
 test-amd64-i386-i386-wheezy-netboot-pvgrub   blocked 
 test-amd64-i386-amd64-wheezy-netboot-pygrub  blocked 
 test-amd64-amd64-i386-wheezy-netboot-pygrub  blocked 



sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at
http://osstest.xs.citrite.net/~osstest/testlogs/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Push not applicable.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] public/io/netif.h: change semantics of "request-multicast-control" flag

2016-01-21 Thread Paul Durrant
> -Original Message-
> From: xen-devel-boun...@lists.xen.org [mailto:xen-devel-
> boun...@lists.xen.org] On Behalf Of Paul Durrant
> Sent: 20 January 2016 13:14
> To: Ian Campbell; xen-de...@lists.xenproject.org
> Cc: Ian Jackson; Keir (Xen.org); Jan Beulich; Tim (Xen.org)
> Subject: Re: [Xen-devel] [PATCH] public/io/netif.h: change semantics of
> "request-multicast-control" flag
> 
> > -Original Message-
> > From: Ian Campbell [mailto:ian.campb...@citrix.com]
> > Sent: 20 January 2016 13:06
> > To: Paul Durrant; xen-de...@lists.xenproject.org
> > Cc: Ian Jackson; Jan Beulich; Keir (Xen.org); Tim (Xen.org)
> > Subject: Re: [PATCH] public/io/netif.h: change semantics of "request-
> > multicast-control" flag
> >
> > On Wed, 2016-01-20 at 12:50 +, Paul Durrant wrote:
> > > My patch b2700877 "move and amend multicast control documentation"
> > > clarified use of the multicast control protocol between frontend and
> > > backend. However, it transpires that the restrictions that documentation
> > > placed on the "request-multicast-control" flag make it hard for a
> > > frontend to enable 'all multicast' promiscuous mode, in that to do so
> > > would require the frontend and backend to disconnect and re-connect.
> >
> > Do we therefore think that this document reflected reality, i.e. might this
> > not be "just" a documentation bug?
> >
> > (Or maybe we can't tell because the only previous implementation was
> years
> > ago in Solaris or something)
> 
> That's my concern. I hope it's just a documentation bug, but I don't know.
> Also I've already done an implementation in Linux netback according to the
> restricted semantics.
> 
> >
> > > This patch adds a new "feature-dynamic-multicast-control" flag to allow
> > > a backend to advertise that it will watch "request-multicast-control"
> hence
> > > allowing it to be meaningfully modified by the frontend at any time rather
> > > than only when the frontend and backend are disconnected.
> >
> > Would allowing XEN_NETIF_EXTRA_TYPE_MCAST_{ADD,DEL} to take a
> bcast
> > address
> > be easier on the backend, in that it would just need to be a static feature
> > rather than watching stuff on the fly?
> 
> The documented semantics of the list are 'exact match' so sending a bcast
> address doesn't do much good with a backend that doesn't know to treat is
> specially hence a frontend can't tell whether 'all multicast' mode is going to
> work without the extra feature flag. As for watching "request-multcast-
> control" vs. add/remove of bcast, the complexity of implementation is
> cheaper for the latter but I think the former is 'nicer'.
> 

Are you ok with the xenstore watch approach (and leaving the patch as is) or 
would you prefer to spec. the bcast address as a wildcard and submit a new 
patch?

  Paul

>   Paul
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC V2] xen: interface: introduce pvclk interface

2016-01-21 Thread Peng Fan
Hi Ian,

On Thu, Jan 21, 2016 at 10:19:32AM +, Ian Campbell wrote:
>On Thu, 2016-01-21 at 16:59 +0800, Peng Fan wrote:
>> 
>> To platform device of ARM, hypervisor is responsible for the mapping
>> between machine address and guest physical address, also responsible
>> for the irq mapping.
>> 
>> But to embedded ARM SoC, the hardware clk IP is handled in Dom0.
>
>Arguably Xen ought to be the one to do this, but we have determined
>(rightly, I think) that doing so for the entirely clk tree of an SoC would
>involve importing an unmanageable amount of code into Xen[0].
>
>In the meantime we defer this to Dom0.
>
>I wonder though if it would be possible to manage the clocks for a
>passthrough device from the toolstack, i.e. is there a sysfs node where one
>can say "keep the clock for this device enabled (at xMhz) even though you
>think the device is unused"?

I am afraid not. The linux device driver without xen can work well, because
there is clk tree and "clk_get,clk_prepare,clk_set_rate" work well in the 
driver.
I do not want to remove the clk apis usage from the device driver for xen, 
because the driver
also serves when no xen support. The pvclk interface is mainly to let the clk 
api can work as no xen.

Introduce pvclk may introduce more stuff, I have no more idea for now, 
if want to let the device driver unchanged and work well for passed through 
device. (:

>
>If so (or if it can be easily added) then the toolstack would just need to
>manage that value as part of the passthrough process rather than having the
>frontend do it via a PV protocol.
>
>Ian.
>
>[0] I'd like at some point for Xen to gain sufficient knowledge of clock IP
>to minimally control things like the CPU and DRAM clocks etc, but that
>needn't imply full clock tree support and would hopefully be a manageable
>amount of code, which would (hopefully) mostly be about trapping and
>emulating writes to one or two clock controllers per platform and
>arbitrating a small set of bits (while allowing dom0 to control the
>others). This is at least a medium if not long term idea though, and for
>all I know it might turn out to be unworkable in practice.

I am happy to see that if one day, xen takes part of the clock management to 
minimize power usage.
These few days, I also look into the power management of xen. seems it mainly 
serves for x86.
Let xen hypervisor handle cpu and dram clocks may be better than let dom0.

Thanks,
Peng.

>

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen PV PTE ABI (or lack thereof)

2016-01-21 Thread Andrew Cooper
On 21/01/16 11:07, Jan Beulich wrote:
 On 20.01.16 at 21:10,  wrote:
>> First of all, SMEP and SMAP.  32bit PV guests are subject to Xen's
>> SMEP/SMAP choices, because of running in ring 1.
>>
>> SMAP in particular is problematic because older Linux guests do fall
>> foul of it; they don't understand what a SMAP pagefault is, and enter an
>> infinite loop of pagefaults.  SMEP is also problematic because it breaks
>> any guest wishing to use a shared address space between kernel and
>> user.  (I had some fun getting the test framework to function until I
>> twigged what was happening).
>>
>> Both of these are regressions; older guests relying on existing
>> behaviour cease to function on newer hardware/Xen despite identical
>> settings.
> And for both of them there simply should be a way for the guest to
> state whether it's compatible (which should be the case for anything
> we can't deal with completely transparently to guests).
>
>> For the PTE bits, _PAGE_GNTTAB (bit 62) is used exclusively in debug
>> build (so there is a guest observable difference between running on a
>> debug and a non-debug Xen), and the comment beside it even identifies
>> that it breaks BSD guests.  PTE bits 62:59 used by hardware if  CR4.PKE
>> is set.  Currently this means that we are not able to support Protection
>> Key for PV guests (although this restriction technically only applies to
>> debug builds of the hypervisor).
>>
>> The other PTE bit used by Xen is _PAGE_GUEST_KERNEL (bit 52).  This bit
>> is used to notice when a 64bit PV guest attempts to override the fixup
>> Xen applies to its PTEs.  Xen unilaterally sets _PAGE_GLOBAL for user
>> pages, and clears _PAGE_GLOBAL for supervisor mappings, setting
>> _PAGE_USER in both cases as the PV kernel runs in ring3.  The only thing
>> _PAGE_GUEST_KERNEL is used for is to notice when the kernel deliberately
>> tries to create a _PAGE_GUEST_KERNEL|_PAGE_GLOBAL, at which point a
>> warning is logged and the kernel overridden.
>>
>>
>> Neither of the used PTE bits exist in the Xen public ABI.  Neither of
>> them serve a purpose other than a debugging aid.
>>
>> I propose hiding them behind CONFIG_PV_PTE_DEBUG and declaring an ABI of
>> "all bits available for guest use".
> And a kernel using any of the conflicting bits would then become
> unusable on a hypervisor with that debug option enabled? I'd
> rather see us document the state things are in...

_PAGE_GNTMAP is already states:

/*
 * Debug option: Ensure that granted mappings are not implicitly unmapped.
 * WARNING: This will need to be disabled to run OSes that use the spare PTE
 * bits themselves (e.g., *BSD).
 */

I was intending to have CONFIG_PV_PTE_DEBUG as an EXPERT option,
disabled by default even in debug builds.

There should not be an ABI difference between release and "normal" debug
builds.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V13 5/5] xl: add pvusb commands

2016-01-21 Thread Ian Campbell
On Thu, 2016-01-21 at 12:12 +, Wei Liu wrote:
> On Tue, Jan 19, 2016 at 08:10:07PM -0700, Jim Fehlig wrote:
> > On 01/19/2016 11:11 AM, Ian Jackson wrote:
> > > Chunyan Liu writes ("[PATCH V13 5/5] xl: add pvusb commands"):
> > > > Add pvusb commands: usbctrl-attach, usbctrl-detach, usb-list,
> > > > usbdev-attach and usbdev-detach.
> > > Thanks for swapping this with the other patch.  It is better now.
> > > 
> > > > +=item B I I
> > > However, I think you need to explictly state that the user may (and
> > > indeed, must) pass multiple settings as separate arguments.  AFAICT
> > > the parser here doesn't do the ,-splitting.
> > 
> > I just noticed this is the case with network devices as well. E.g.
> > 
> > #xl network-attach hvm-domU mac=00:16:3e:xx:yy:zz,bridge=br0
> > libxl: error: libxl_device.c:1095:device_hotplug_child_death_cb:
> > script: Could
> > not find bridge device xenbr0
> > 
> > main_networkattach() in tools/libxl/xl_cmdimpl.c doesn't split on the
> > ',', so
> > everything after mac=00:16:3e:xx:yy:zz is ignored. I'd need advice on
> > how to fix
> > this though. Based on xl-network-configuration doc and Xen tool's long
> > history
> > of network-attach supporting that syntax, I'd say main_networkattach()
> > should be
> > changed to split on ','. I could also change the docs. Do tools
> > maintainers have
> > a preference, or alternative option?
> > 
> 
> It is splitting on " " at the moment which is not very desirable.
> 
> I think this is a bug. I've added it to my list and will look into
> fixing it.

If you are feeling particularly highly motivated then xl_cmdimpl.c contains
quite a lot of adhoc parsing of various comma a space separated lists
(often dupped for a given instance into cmdline vs cfg file, which leads to
unintentional behavioural differences like above) which could usefully be
consolidated into a series of helpers.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] public/io/netif.h: change semantics of "request-multicast-control" flag

2016-01-21 Thread Ian Campbell
On Thu, 2016-01-21 at 11:48 +, Paul Durrant wrote:
> > -Original Message-
> > From: xen-devel-boun...@lists.xen.org [mailto:xen-devel-
> > boun...@lists.xen.org] On Behalf Of Paul Durrant
> > Sent: 20 January 2016 13:14
> > To: Ian Campbell; xen-de...@lists.xenproject.org
> > Cc: Ian Jackson; Keir (Xen.org); Jan Beulich; Tim (Xen.org)
> > Subject: Re: [Xen-devel] [PATCH] public/io/netif.h: change semantics of
> > "request-multicast-control" flag
> > 
> > > -Original Message-
> > > From: Ian Campbell [mailto:ian.campb...@citrix.com]
> > > Sent: 20 January 2016 13:06
> > > To: Paul Durrant; xen-de...@lists.xenproject.org
> > > Cc: Ian Jackson; Jan Beulich; Keir (Xen.org); Tim (Xen.org)
> > > Subject: Re: [PATCH] public/io/netif.h: change semantics of "request-
> > > multicast-control" flag
> > > 
> > > On Wed, 2016-01-20 at 12:50 +, Paul Durrant wrote:
> > > > My patch b2700877 "move and amend multicast control documentation"
> > > > clarified use of the multicast control protocol between frontend
> > > > and
> > > > backend. However, it transpires that the restrictions that
> > > > documentation
> > > > placed on the "request-multicast-control" flag make it hard for a
> > > > frontend to enable 'all multicast' promiscuous mode, in that to do
> > > > so
> > > > would require the frontend and backend to disconnect and re-
> > > > connect.
> > > 
> > > Do we therefore think that this document reflected reality, i.e.
> > > might this
> > > not be "just" a documentation bug?
> > > 
> > > (Or maybe we can't tell because the only previous implementation was
> > years
> > > ago in Solaris or something)
> > 
> > That's my concern. I hope it's just a documentation bug, but I don't
> > know.
> > Also I've already done an implementation in Linux netback according to
> > the
> > restricted semantics.
> > 
> > > 
> > > > This patch adds a new "feature-dynamic-multicast-control" flag to
> > > > allow
> > > > a backend to advertise that it will watch "request-multicast-
> > > > control"
> > hence
> > > > allowing it to be meaningfully modified by the frontend at any time
> > > > rather
> > > > than only when the frontend and backend are disconnected.
> > > 
> > > Would allowing XEN_NETIF_EXTRA_TYPE_MCAST_{ADD,DEL} to take a
> > bcast
> > > address
> > > be easier on the backend, in that it would just need to be a static 
> > > feature
> > > rather than watching stuff on the fly?
> > 
> > The documented semantics of the list are 'exact match' so sending a bcast
> > address doesn't do much good with a backend that doesn't know to treat is
> > specially hence a frontend can't tell whether 'all multicast' mode is going 
> > to
> > work without the extra feature flag. As for watching "request-multcast-
> > control" vs. add/remove of bcast, the complexity of implementation is
> > cheaper for the latter but I think the former is 'nicer'.
> > 
> 
> Are you ok with the xenstore watch approach (and leaving the patch as is)
> or would you prefer to spec. the bcast address as a wildcard and submit a
> new patch?

I'm fine with the watch approach, was just suggesting the alternative in
case it turned out to be much easier.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] public/io/netif.h: change semantics of "request-multicast-control" flag

2016-01-21 Thread Paul Durrant
> -Original Message-
> From: Ian Campbell [mailto:ian.campb...@citrix.com]
> Sent: 21 January 2016 11:59
> To: Paul Durrant; xen-de...@lists.xenproject.org
> Cc: Ian Jackson; Keir (Xen.org); Jan Beulich; Tim (Xen.org)
> Subject: Re: [Xen-devel] [PATCH] public/io/netif.h: change semantics of
> "request-multicast-control" flag
> 
> On Thu, 2016-01-21 at 11:48 +, Paul Durrant wrote:
> > > -Original Message-
> > > From: xen-devel-boun...@lists.xen.org [mailto:xen-devel-
> > > boun...@lists.xen.org] On Behalf Of Paul Durrant
> > > Sent: 20 January 2016 13:14
> > > To: Ian Campbell; xen-de...@lists.xenproject.org
> > > Cc: Ian Jackson; Keir (Xen.org); Jan Beulich; Tim (Xen.org)
> > > Subject: Re: [Xen-devel] [PATCH] public/io/netif.h: change semantics of
> > > "request-multicast-control" flag
> > >
> > > > -Original Message-
> > > > From: Ian Campbell [mailto:ian.campb...@citrix.com]
> > > > Sent: 20 January 2016 13:06
> > > > To: Paul Durrant; xen-de...@lists.xenproject.org
> > > > Cc: Ian Jackson; Jan Beulich; Keir (Xen.org); Tim (Xen.org)
> > > > Subject: Re: [PATCH] public/io/netif.h: change semantics of "request-
> > > > multicast-control" flag
> > > >
> > > > On Wed, 2016-01-20 at 12:50 +, Paul Durrant wrote:
> > > > > My patch b2700877 "move and amend multicast control
> documentation"
> > > > > clarified use of the multicast control protocol between frontend
> > > > > and
> > > > > backend. However, it transpires that the restrictions that
> > > > > documentation
> > > > > placed on the "request-multicast-control" flag make it hard for a
> > > > > frontend to enable 'all multicast' promiscuous mode, in that to do
> > > > > so
> > > > > would require the frontend and backend to disconnect and re-
> > > > > connect.
> > > >
> > > > Do we therefore think that this document reflected reality, i.e.
> > > > might this
> > > > not be "just" a documentation bug?
> > > >
> > > > (Or maybe we can't tell because the only previous implementation was
> > > years
> > > > ago in Solaris or something)
> > >
> > > That's my concern. I hope it's just a documentation bug, but I don't
> > > know.
> > > Also I've already done an implementation in Linux netback according to
> > > the
> > > restricted semantics.
> > >
> > > >
> > > > > This patch adds a new "feature-dynamic-multicast-control" flag to
> > > > > allow
> > > > > a backend to advertise that it will watch "request-multicast-
> > > > > control"
> > > hence
> > > > > allowing it to be meaningfully modified by the frontend at any time
> > > > > rather
> > > > > than only when the frontend and backend are disconnected.
> > > >
> > > > Would allowing XEN_NETIF_EXTRA_TYPE_MCAST_{ADD,DEL} to take a
> > > bcast
> > > > address
> > > > be easier on the backend, in that it would just need to be a static
> feature
> > > > rather than watching stuff on the fly?
> > >
> > > The documented semantics of the list are 'exact match' so sending a bcast
> > > address doesn't do much good with a backend that doesn't know to treat
> is
> > > specially hence a frontend can't tell whether 'all multicast' mode is 
> > > going
> to
> > > work without the extra feature flag. As for watching "request-multcast-
> > > control" vs. add/remove of bcast, the complexity of implementation is
> > > cheaper for the latter but I think the former is 'nicer'.
> > >
> >
> > Are you ok with the xenstore watch approach (and leaving the patch as is)
> > or would you prefer to spec. the bcast address as a wildcard and submit a
> > new patch?
> 
> I'm fine with the watch approach, was just suggesting the alternative in
> case it turned out to be much easier.
> 

I already have an implementation of the watch approach which is now allowing 
Windows logo testing to pass :-)

  Paul

> Ian.
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] stable backport of 3 xen multiqueue network dev fixes

2016-01-21 Thread Ian Campbell
Hi Dave,

Please could you queue these three:

32a8440 xen-netfront: respect user provided max_queues
4c82ac3 xen-netback: respect user provided max_queues
ca88ea1 xen-netfront: update num_queues to real created

for stable backports to at least 4.1. Multiqueue was added in v3.16 so I
think they could all go to LTS that far back.

Users are tripping over issues in the field with exhausting the number of
Xen grant entries available to guests when creating large numbers of queues
(which happens with guests with >=32 VCPUs).

ca88ea1 is a fix which causes the failure to allocate grants for all of the
desired queues gracefully, rather than crashing or failing to pass any
traffic.

32a8440 and 4c82ac3 fix the handling of the module parameters on both ends
so that users can choose to have fewer queues than p/vcpus if they like,
which would have been an effective workaround for the issue solved by
ca88ea1 if that hadn't been broken.

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread Ian Campbell
On Thu, 2016-01-21 at 10:56 +, David Vrabel wrote:
> On 20/01/16 12:23, Ian Campbell wrote:
> > There have been a few reports recently[0] which relate to a failure of
> > netfront to allocate sufficient grant refs for all the queues:
> > 
> > [0.533589] xen_netfront: can't alloc rx grant refs
> > [0.533612] net eth0: only created 31 queues
> > 
> > Which can be worked around by increasing the number of grants on the
> > hypervisor command line or by limiting the number of queues permitted
> > by
> > either back or front using a module param (which was broken but is now
> > fixed on both sides, but I'm not sure it has been backported everywhere
> > such that it is a reliable thing to always tell users as a workaround).
> > 
> > Is there any plan to do anything about the default/out of the box
> > experience? Either limiting the number of queues or making both ends
> > cope
> > more gracefully with failure to create some queues (or both) might be
> > sufficient?
> > 
> > I think the crash after the above in the first link at [0] is fixed? I
> > think that was the purpose of ca88ea1247df "xen-netfront: update
> > num_queues
> > to real created" which was in 4.3.
> 
> I think the correct solution is to increase the default maximum grant
> table size.

That could well make sense, but then there will just be another higher
limit, so we should perhaps do both.

i.e. factoring in:
 * performance i.e. ability for N queues to saturate whatever sort of link
   contemporary Linux can saturate these days, plus some headroom, or
   whatever other ceiling seems sensible)
 * grant table resource consumption i.e. (sensible max number of blks * nr
   gnts per blk + sensible max number of vifs * nr gnts per vif + other
   devs needs) < per guest grant limit) to pick both the default gnttab
   size and the default max queuers.

(or s/sensible/supportable/g etc).

> Although, unless you're using the not-yet-applied per-cpu rwlock patches
> multiqueue is terrible on many (multisocket) systems and the number of
> queue should be limited in netback to 4 or even just 2.

Presumably the guest can't tell, so it can't do this.

I think when you say "terrible" you don't mean "worse than without mq" but
rather "not realising the expected gains from a larger nunber of queues",
right?.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC] VirtFS support on Xen

2016-01-21 Thread Wei Liu
On Thu, Jan 21, 2016 at 10:50:08AM +, David Vrabel wrote:
> On 21/01/16 10:28, Wei Liu wrote:
> > [RFC] VirtFS support on Xen
> > 
> > # Introduction
> > 
> > QEMU/KVM supports file system passthrough via an interface called
> > VirtFS [0]. VirtFS is in turn implemented with 9pfs protocol [1] and
> > VirtIO transport.
> > 
> > Xen used to have its own implementation of file system passthrough
> > called XenFS, but that has been inactive for a few years. The latest
> > update was in 2009 [2].
> > 
> > This project aims to add VirtFS support on Xen. This is more
> > sustainable than inventing our own wheel.#
> 
> What's the use case for this?  Who wants this feature?
> 

Anyone who wants file system passthrough.  More specifically, VM-based
container solutions can share files from host file system.

http://xendevsummit2015.sched.org/event/3WDg/xen-containers-better-way-to-run-docker-containers-sainath-grandhi-intel?iframe=no==yes=no
http://xendevsummit2015.sched.org/event/3X1L/hyper-make-vm-run-like-containers-xu-wang-hyper?iframe=no==yes=no

Wei.

> David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen PV PTE ABI (or lack thereof)

2016-01-21 Thread Andrew Cooper
On 21/01/16 12:59, Jan Beulich wrote:
>>
 For the PTE bits, _PAGE_GNTTAB (bit 62) is used exclusively in debug
 build (so there is a guest observable difference between running on a
 debug and a non-debug Xen), and the comment beside it even identifies
 that it breaks BSD guests.  PTE bits 62:59 used by hardware if  CR4.PKE
 is set.  Currently this means that we are not able to support Protection
 Key for PV guests (although this restriction technically only applies to
 debug builds of the hypervisor).

 The other PTE bit used by Xen is _PAGE_GUEST_KERNEL (bit 52).  This bit
 is used to notice when a 64bit PV guest attempts to override the fixup
 Xen applies to its PTEs.  Xen unilaterally sets _PAGE_GLOBAL for user
 pages, and clears _PAGE_GLOBAL for supervisor mappings, setting
 _PAGE_USER in both cases as the PV kernel runs in ring3.  The only thing
 _PAGE_GUEST_KERNEL is used for is to notice when the kernel deliberately
 tries to create a _PAGE_GUEST_KERNEL|_PAGE_GLOBAL, at which point a
 warning is logged and the kernel overridden.


 Neither of the used PTE bits exist in the Xen public ABI.  Neither of
 them serve a purpose other than a debugging aid.

 I propose hiding them behind CONFIG_PV_PTE_DEBUG and declaring an ABI of
 "all bits available for guest use".
>>> And a kernel using any of the conflicting bits would then become
>>> unusable on a hypervisor with that debug option enabled? I'd
>>> rather see us document the state things are in...
>> _PAGE_GNTMAP is already states:
>>
>> /*
>>  * Debug option: Ensure that granted mappings are not implicitly unmapped.
>>  * WARNING: This will need to be disabled to run OSes that use the spare PTE
>>  * bits themselves (e.g., *BSD).
>>  */
> But that's (assuming the use of the two bits were spelled out) a
> guest OS not fully playing by the spec. To me, "available" PTE bits
> being shared implies that some of them may be claimed by Xen,
> while others may be claimed by guests. You're right that this needs
> to be written down, but I don't think we need to go as far as
> forbidding Xen to use any of them. And even less so should we
> preclude their use for any purpose going forward.

I do not want to make an ABI which mandates the use of certain bits by Xen.

As we see with the Protection Key feature, newer hardware feature start
using bits which were previously software available, and we absolutely
don't want to be in a position where our ABI prevents us from ever
supporting a new feature.

> In the end, with (as it seems) not much effort this could even be
> made dynamic: A guest could advertise which of the bits it doesn't
> use, and then Xen could pick two of them for the two purposes it
> currently needs them for. Should a guest leave no or just one bit
> available, the debugging aid could then be disabled.

This is unnecessarily complicated.  It only helps going forward, and
leaves Xen with substantially more complicated PTE handling.

What happens if at some point later, we try to boot a PV guest which has
a different PTE layout to what Xen has chosen?  We definitely can't
update every PTE with a newly-chosen layout.

>
>> I was intending to have CONFIG_PV_PTE_DEBUG as an EXPERT option,
>> disabled by default even in debug builds.
>>
>> There should not be an ABI difference between release and "normal" debug
>> builds.
> Well, I see your point, but as said above I'm not convinced
> disabling all that code is the right solution. In fact, what you
> propose is not far away from removing that code altogether.

The two bits are only used for specialised debugging.  They should be
relegated to people doing specific debugging, and not interfere with the
overwhelming majority of cases where Xen doesn't need to use any
software available PTE bits.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] arm: clean up build variables

2016-01-21 Thread Jan Beulich
>>> On 20.01.16 at 22:47,  wrote:
> This consolidates some of the different variables used for the ARM
> builds. This change was prompted by the Kconfig changes but looking back
> in time the CONFIG_ARM_{32,64} variables existed before Kconfig so this
> should just be a generic cleanup.
> 
> Signed-off-by: Doug Goldstein 
> ---
>  xen/arch/arm/Makefile|  8 
>  xen/arch/arm/Rules.mk| 18 --
>  xen/drivers/passthrough/Makefile |  2 +-
>  3 files changed, 9 insertions(+), 19 deletions(-)

Trivial IOMMU part
Acked-by: Jan Beulich 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 12/16] ARM: Xen: Document UEFI support on Xen ARM virtual platforms

2016-01-21 Thread Mark Rutland
On Tue, Jan 19, 2016 at 09:43:59PM +0800, Shannon Zhao wrote:
> 
> On 2016/1/19 21:13, Mark Rutland wrote:
> >On Tue, Jan 19, 2016 at 12:23:17PM +, Stefano Stabellini wrote:
> >>>On Tue, 19 Jan 2016, Mark Rutland wrote:
>  >On Tue, Jan 19, 2016 at 06:25:25PM +0800, Shannon Zhao wrote:
> >>> > > >>We don't do this in Documentation/arm/uefi.txt, and I don't see 
> >>> > > >>why we
> >>> > > >>should do so here.
> >>> > > >>
> >>> > > >>Does Xen handle arbitrary size memory map descriptors? I'm not 
> >>> > > >>sure what
> >>> > > >>new information might be passed in future additions to the 
> >>> > > >>descriptor
> >>> > > >>format, and I'm not sure what should happen in the Dom0 case.
> >> > > >
> >> > > >Xen passes to Dom0 the memory map in the same format as the native
> >> > > >memory map.
>  >
>  >Does Xen parse or modify the EFI memory map in any way?
> >>>
> >>>Xen:
> >>>- calls EFI_BOOT_SERVICES.GetMemoryMap()
> >>>- takes note of the memory regions for its own usage
> >>>- create the fdt notes, including efi-mmap-start, with a pointer to it
> >>>
> >>>
>  >Does it pass the raw values returned by EFI_BOOT_SERVICES.GetMemoryMap()
>  >through to the xen,uefi-* properties, or does is make any static
>  >assumptions about what the values will be?
> >>>
> >>>It just passes the raw values.
> >I take it that means that any memory carved out for Xen itself is
> >described/discovered via a separate mechanism? How does that work
> 
> For Xen hypervisor booting on UEFI, it get the EFI memory map
> through the similar way like Linux, e.g. call
> EFI_BOOT_SERVICES.GetMemoryMap().
> For Dom0, Xen will create a new EFI memory map for Dom0.
> 
> See [PATCH v3 52/62] arm/acpi: Prepare EFI memory descriptor for Dom0
> http://lists.xen.org/archives/html/xen-devel/2015-11/msg01884.html

Ok. So Xen will parse the EFI memory map, and will create a new memory
map to pass to the Dom0 kernel, presumably with some memory having been
carved out for Xen itself, and never described to Dom0.

So if there's any extension to that in future, Dom0 may see problems.
There's not much that can be done about that, however, and extensions to
the descriptors seem unlikely at present.

Thanks,
Mark.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC V2] xen: interface: introduce pvclk interface

2016-01-21 Thread Ian Campbell
On Thu, 2016-01-21 at 12:55 +, Stefano Stabellini wrote:
> On Thu, 21 Jan 2016, Peng Fan wrote:
> > Hi Ian,
> > 
> > On Thu, Jan 21, 2016 at 12:26:04PM +, Ian Campbell wrote:
> > > On Thu, 2016-01-21 at 19:55 +0800, Peng Fan wrote:
> > > > Hi Ian,
> > > > 
> > > > On Thu, Jan 21, 2016 at 10:19:32AM +, Ian Campbell wrote:
> > > > > On Thu, 2016-01-21 at 16:59 +0800, Peng Fan wrote:
> > > > > >  
> > > > > > To platform device of ARM, hypervisor is responsible for the
> > > > > > mapping
> > > > > > between machine address and guest physical address, also
> > > > > > responsible
> > > > > > for the irq mapping.
> > > > > > 
> > > > > > But to embedded ARM SoC, the hardware clk IP is handled in
> > > > > > Dom0.
> > > > > 
> > > > > Arguably Xen ought to be the one to do this, but we have
> > > > > determined
> > > > > (rightly, I think) that doing so for the entirely clk tree of an
> > > > > SoC
> > > > > would
> > > > > involve importing an unmanageable amount of code into Xen[0].
> > > > > 
> > > > > In the meantime we defer this to Dom0.
> > > > > 
> > > > > I wonder though if it would be possible to manage the clocks for
> > > > > a
> > > > > passthrough device from the toolstack, i.e. is there a sysfs node
> > > > > where
> > > > > one
> > > > > can say "keep the clock for this device enabled (at xMhz) even
> > > > > though
> > > > > you
> > > > > think the device is unused"?
> > > > 
> > > > I am afraid not. The linux device driver without xen can work well,
> > > > because
> > > > there is clk tree and "clk_get,clk_prepare,clk_set_rate" work well
> > > > in the
> > > > driver.
> > > > I do not want to remove the clk apis usage from the device driver
> > > > for
> > > > xen, because the driver
> > > > also serves when no xen support. The pvclk interface is mainly to
> > > > let the
> > > > clk api can work as no xen.
> > > 
> > > Would adding a dummy fixed-clock[0] (or several of them) to the guest
> > > passthrough DT satisfy the driver's requirements? They would be
> > > hardcoded
> > > to whatever rate dom0 and/or the tools has decided upon (and had set
> > > in the
> > > real h/w).
> > 
> > If using this way, we have the assumption that DomU device driver would
> > not
> > change the rate of the clock driving the device. I am not sure whether
> > this is
> > ok for so many platform devices based ARM core.
> > 
> > In /sys/kernel/debug/clk/, there are clk tree info, but
> > clk api are not exposed to userspace as far as I know, so
> > if using sysfs interface to set a known fixed rate or enable/disable
> > the clock,
> > we need to expose the clk info to userspace.
> 
> Keeping in mind that we might now want to let DomU change the actual

s/now/not/?

> clock frequency anyway, exposing a dummy clock to keep the DomU driver
> happy might be a good option.
> 
> However whether we set the frequency from the Dom0 kernel or from the
> toolstack, how do we find out the right clock rate? From a grep in the
> kernel sources, it looks like some drivers still have hardcoded values.
> 
> Asking the user to provide the clock rate seems a bit too much to me.

Remember that for this use case they already need to provide the host DT
path to the device, the list of iomem resources, the list of irqs and a
suitable device tree snippet.

platform device passthrough is very much an "expert" option (intended for
folks building embedded device, not really for normal end users), which
already needs a fair bit of familiarity with the system in question, I
don't think providing some info about the clocks goes too much further than
this.

Seamless & easy passthrough is something we are aiming for for PCI, but not
for the general case of platform passthrough.

Making platform device passthrough as easy as PCI passthrough is much
harder and far reaching than avoiding the need to specify some clock
frequencies (remember we had very long discussions about the design and
couldn't come up with anything especially feasible).

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v1 4/8] x86/init: add linker table support

2016-01-21 Thread Boris Ostrovsky

On 01/21/2016 03:38 AM, Roger Pau Monné wrote:

El 20/01/16 a les 22.33, Luis R. Rodriguez ha escrit:

On Wed, Jan 20, 2016 at 1:00 PM, Konrad Rzeszutek Wilk
 wrote:

+static bool x86_init_fn_supports_subarch(struct x86_init_fn *fn)
+{
+ if (!fn->supp_hardware_subarch) {
+ pr_err("Init sequence fails to declares any supported subarchs: 
%pF\n", fn->early_init);
+ WARN_ON(1);
+ }
+ if (BIT(boot_params.hdr.hardware_subarch) & fn->supp_hardware_subarch)
+ return true;
+ return false;
+}

So the logic for this working is that boot_params.hdr.hardware_subarch

And the macros define two: BIT(X86_SUBARCH_PC) or BIT(X86_SUBARCH_XEN).

But hardware_subarch by default is set to zero. Which means if GRUB2, PXELinux, 
Xen multiboot1
don't set it - then the X86_SUBARCH_PC is choosen right?

  1 << 0 & 1 << X86_SUBARCH_PC (which is zero).

For this to nicely work with Xen it ought to do this:

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 993b7a7..6cf9afd 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1676,6 +1676,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
 boot_params.hdr.ramdisk_image = initrd_start;
 boot_params.hdr.ramdisk_size = xen_start_info->mod_len;
 boot_params.hdr.cmd_line_ptr = __pa(xen_start_info->cmd_line);
+   boot_params.hdr.hardware_subarch = X86_SUBARCH_XEN;

 if (!xen_initial_domain()) {
 add_preferred_console("xenboot", 0, NULL);


?

That's correct for PV and PVH, likewise when qemu is required for HVM
qemu could set it. I have the qemu change done but that should only
cover HVM. A common place to set this as well could be the hypervisor,
but currently the hypervisor doesn't set any boot_params, instead a
generic struct is passed and the kernel code (for any OS) is expected
to interpret this and then set the required values for the OS in the
init path. Long term though if we wanted to merge init further one way
could be to have the hypervisor just set the zero page cleanly for the
different modes. If we needed more data other than the
hardware_subarch we also have the hardware_subarch_data, that's a u64
, and how that is used would be up to the subarch. In Xen's case it
could do what it wants with it. That would still mean perhaps defining
as part of a Xen boot protocol a place where xen specific code can
count on finding more Xen data passed by the hypervisor, the
xen_start_info. That is, if we wanted to merge init paths this is
something to consider.

One thing I considered on the question of who should set the zero page
for Xen with the prospect of merging inits, or at least this subarch
for both short term and long term are the obvious implications in
terms of hypervisor / kernel / qemu combination requirements if the
subarch is needed. Having it set in the kernel is an obvious immediate
choice for PV / PVH but it means we can't merge init paths completely
(down to asm inits), we'd still be able to merge some C init paths
though, the first entry would still be different. Having the zero page
set on the hypervisor would go long ways but it would mean a
hypervisor change required.

I don't think the hypervisor should be setting Linux specific boot
related parameters, the boot ABI should be OS agnostic. IMHO, a small
shim should be added to Linux in order to set what Linux requires when
entering from a Xen entry point.


And that's exactly what HVMlite does. Most of this shim layer is setting 
up boot_params, after which we jump to standard x86 boot path (i.e. 
startup_{32|64}). With hardware_subarch set to zero.


-boris


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC V2] xen: interface: introduce pvclk interface

2016-01-21 Thread Peng Fan
Hi Jan,

On Thu, Jan 21, 2016 at 03:21:38AM -0700, Jan Beulich wrote:
 On 21.01.16 at 09:59,  wrote:
>> uart2 needs clock IMX7D_UART2_ROOT_CLK from the ccm.
>> passthrough uart2, hypervisor handles the reg and interrupts, that is 
>> because
>> hypervisor handles the memory map and the interrupt controller(GIC). But 
>> here
>> CCM is not handled by hypervisor, it is handled by Dom0.
>
>This, I take it, describes the current state, which doesn't make clear
>whether this is intentionally that way (and can't be changed), or
>just happens to be that way because so far it didn't matter.
>
>> For ARMV8 server products, I am not sure whether hypercall is better; but to 
>> my case, it's not feasible to use hypercall.
>
>Because of ...?

I guess you mean DomU issues hypercall and Xen forwards the request to Dom0,
then Dom0 reply the response?

If you experts think pvclk is not a good way to handle the clock for passed 
through
device, I can try hypercall way.

>
>> Dom0 handles all the clocks, DomU just send request to Dom0 and ask Dom0 to
>> enable/disable/set rate for a clock for the device. So I think it's okay
>> for multipile parties, the clk subsystem in Dom0 can handle mutiple requests 
>> even if the clock is shared.
>
>I.e. if one party sets one rate, and later another party sets
>a different rate, things are going to work fine? If so, why would
>the different parties need control over the rate in the first place?

oh. thanks for teaching me. If two IPs share one clock, and IP1 for Dom0, IP2 
for DomU,
Dom0 needs clock work at 20Hz, but DomU want clock work at 40Hz. pvclk can not 
avoid this.

If not using pvclk, we develop a new method to handle clock. I guest we can 
also not avoid this?

Thanks,
Peng.

>
>Jan
>

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC V2] xen: interface: introduce pvclk interface

2016-01-21 Thread Ian Campbell
On Thu, 2016-01-21 at 19:55 +0800, Peng Fan wrote:
> Hi Ian,
> 
> On Thu, Jan 21, 2016 at 10:19:32AM +, Ian Campbell wrote:
> > On Thu, 2016-01-21 at 16:59 +0800, Peng Fan wrote:
> > >  
> > > To platform device of ARM, hypervisor is responsible for the mapping
> > > between machine address and guest physical address, also responsible
> > > for the irq mapping.
> > > 
> > > But to embedded ARM SoC, the hardware clk IP is handled in Dom0.
> > 
> > Arguably Xen ought to be the one to do this, but we have determined
> > (rightly, I think) that doing so for the entirely clk tree of an SoC
> > would
> > involve importing an unmanageable amount of code into Xen[0].
> > 
> > In the meantime we defer this to Dom0.
> > 
> > I wonder though if it would be possible to manage the clocks for a
> > passthrough device from the toolstack, i.e. is there a sysfs node where
> > one
> > can say "keep the clock for this device enabled (at xMhz) even though
> > you
> > think the device is unused"?
> 
> I am afraid not. The linux device driver without xen can work well,
> because
> there is clk tree and "clk_get,clk_prepare,clk_set_rate" work well in the
> driver.
> I do not want to remove the clk apis usage from the device driver for
> xen, because the driver
> also serves when no xen support. The pvclk interface is mainly to let the
> clk api can work as no xen.

Would adding a dummy fixed-clock[0] (or several of them) to the guest
passthrough DT satisfy the driver's requirements? They would be hardcoded
to whatever rate dom0 and/or the tools has decided upon (and had set in the
real h/w).

Ian.

[0] Documentation/devicetree/bindings/clock/fixed-clock.txt


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC V2] xen: interface: introduce pvclk interface

2016-01-21 Thread Peng Fan
Hi Ian,

On Thu, Jan 21, 2016 at 12:26:04PM +, Ian Campbell wrote:
>On Thu, 2016-01-21 at 19:55 +0800, Peng Fan wrote:
>> Hi Ian,
>> 
>> On Thu, Jan 21, 2016 at 10:19:32AM +, Ian Campbell wrote:
>> > On Thu, 2016-01-21 at 16:59 +0800, Peng Fan wrote:
>> > >  
>> > > To platform device of ARM, hypervisor is responsible for the mapping
>> > > between machine address and guest physical address, also responsible
>> > > for the irq mapping.
>> > > 
>> > > But to embedded ARM SoC, the hardware clk IP is handled in Dom0.
>> > 
>> > Arguably Xen ought to be the one to do this, but we have determined
>> > (rightly, I think) that doing so for the entirely clk tree of an SoC
>> > would
>> > involve importing an unmanageable amount of code into Xen[0].
>> > 
>> > In the meantime we defer this to Dom0.
>> > 
>> > I wonder though if it would be possible to manage the clocks for a
>> > passthrough device from the toolstack, i.e. is there a sysfs node where
>> > one
>> > can say "keep the clock for this device enabled (at xMhz) even though
>> > you
>> > think the device is unused"?
>> 
>> I am afraid not. The linux device driver without xen can work well,
>> because
>> there is clk tree and "clk_get,clk_prepare,clk_set_rate" work well in the
>> driver.
>> I do not want to remove the clk apis usage from the device driver for
>> xen, because the driver
>> also serves when no xen support. The pvclk interface is mainly to let the
>> clk api can work as no xen.
>
>Would adding a dummy fixed-clock[0] (or several of them) to the guest
>passthrough DT satisfy the driver's requirements? They would be hardcoded
>to whatever rate dom0 and/or the tools has decided upon (and had set in the
>real h/w).

If using this way, we have the assumption that DomU device driver would not
change the rate of the clock driving the device. I am not sure whether this is
ok for so many platform devices based ARM core.

In /sys/kernel/debug/clk/, there are clk tree info, but
clk api are not exposed to userspace as far as I know, so
if using sysfs interface to set a known fixed rate or enable/disable the clock,
we need to expose the clk info to userspace.

Jan said using hypercall in the other mail, do you have any ideas about this?

Thanks,
Peng.

>
>Ian.
>
>[0] Documentation/devicetree/bindings/clock/fixed-clock.txt
>

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC V2] xen: interface: introduce pvclk interface

2016-01-21 Thread Stefano Stabellini
On Thu, 21 Jan 2016, Peng Fan wrote:
> Hi Ian,
>
> On Thu, Jan 21, 2016 at 12:26:04PM +, Ian Campbell wrote:
> >On Thu, 2016-01-21 at 19:55 +0800, Peng Fan wrote:
> >> Hi Ian,
> >>
> >> On Thu, Jan 21, 2016 at 10:19:32AM +, Ian Campbell wrote:
> >> > On Thu, 2016-01-21 at 16:59 +0800, Peng Fan wrote:
> >> > >  
> >> > > To platform device of ARM, hypervisor is responsible for the mapping
> >> > > between machine address and guest physical address, also responsible
> >> > > for the irq mapping.
> >> > >
> >> > > But to embedded ARM SoC, the hardware clk IP is handled in Dom0.
> >> >
> >> > Arguably Xen ought to be the one to do this, but we have determined
> >> > (rightly, I think) that doing so for the entirely clk tree of an SoC
> >> > would
> >> > involve importing an unmanageable amount of code into Xen[0].
> >> >
> >> > In the meantime we defer this to Dom0.
> >> >
> >> > I wonder though if it would be possible to manage the clocks for a
> >> > passthrough device from the toolstack, i.e. is there a sysfs node where
> >> > one
> >> > can say "keep the clock for this device enabled (at xMhz) even though
> >> > you
> >> > think the device is unused"?
> >>
> >> I am afraid not. The linux device driver without xen can work well,
> >> because
> >> there is clk tree and "clk_get,clk_prepare,clk_set_rate" work well in the
> >> driver.
> >> I do not want to remove the clk apis usage from the device driver for
> >> xen, because the driver
> >> also serves when no xen support. The pvclk interface is mainly to let the
> >> clk api can work as no xen.
> >
> >Would adding a dummy fixed-clock[0] (or several of them) to the guest
> >passthrough DT satisfy the driver's requirements? They would be hardcoded
> >to whatever rate dom0 and/or the tools has decided upon (and had set in the
> >real h/w).
>
> If using this way, we have the assumption that DomU device driver would not
> change the rate of the clock driving the device. I am not sure whether this is
> ok for so many platform devices based ARM core.
>
> In /sys/kernel/debug/clk/, there are clk tree info, but
> clk api are not exposed to userspace as far as I know, so
> if using sysfs interface to set a known fixed rate or enable/disable the 
> clock,
> we need to expose the clk info to userspace.

Keeping in mind that we might now want to let DomU change the actual
clock frequency anyway, exposing a dummy clock to keep the DomU driver
happy might be a good option.

However whether we set the frequency from the Dom0 kernel or from the
toolstack, how do we find out the right clock rate? From a grep in the
kernel sources, it looks like some drivers still have hardcoded values.

Asking the user to provide the clock rate seems a bit too much to me.


> Jan said using hypercall in the other mail, do you have any ideas about this?

I recall having discussed this in the past and the conclusion was that
moving the clock framework into Xen, so that the hypervisor could
directly control clock frequencies, although it might make sense from an
architectural point of view, would be too complex to be feasible.___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen PV PTE ABI (or lack thereof)

2016-01-21 Thread Jan Beulich
>>> On 21.01.16 at 12:16,  wrote:
> On 21/01/16 11:07, Jan Beulich wrote:
> On 20.01.16 at 21:10,  wrote:
>>> First of all, SMEP and SMAP.  32bit PV guests are subject to Xen's
>>> SMEP/SMAP choices, because of running in ring 1.
>>>
>>> SMAP in particular is problematic because older Linux guests do fall
>>> foul of it; they don't understand what a SMAP pagefault is, and enter an
>>> infinite loop of pagefaults.  SMEP is also problematic because it breaks
>>> any guest wishing to use a shared address space between kernel and
>>> user.  (I had some fun getting the test framework to function until I
>>> twigged what was happening).
>>>
>>> Both of these are regressions; older guests relying on existing
>>> behaviour cease to function on newer hardware/Xen despite identical
>>> settings.
>> And for both of them there simply should be a way for the guest to
>> state whether it's compatible (which should be the case for anything
>> we can't deal with completely transparently to guests).
>>
>>> For the PTE bits, _PAGE_GNTTAB (bit 62) is used exclusively in debug
>>> build (so there is a guest observable difference between running on a
>>> debug and a non-debug Xen), and the comment beside it even identifies
>>> that it breaks BSD guests.  PTE bits 62:59 used by hardware if  CR4.PKE
>>> is set.  Currently this means that we are not able to support Protection
>>> Key for PV guests (although this restriction technically only applies to
>>> debug builds of the hypervisor).
>>>
>>> The other PTE bit used by Xen is _PAGE_GUEST_KERNEL (bit 52).  This bit
>>> is used to notice when a 64bit PV guest attempts to override the fixup
>>> Xen applies to its PTEs.  Xen unilaterally sets _PAGE_GLOBAL for user
>>> pages, and clears _PAGE_GLOBAL for supervisor mappings, setting
>>> _PAGE_USER in both cases as the PV kernel runs in ring3.  The only thing
>>> _PAGE_GUEST_KERNEL is used for is to notice when the kernel deliberately
>>> tries to create a _PAGE_GUEST_KERNEL|_PAGE_GLOBAL, at which point a
>>> warning is logged and the kernel overridden.
>>>
>>>
>>> Neither of the used PTE bits exist in the Xen public ABI.  Neither of
>>> them serve a purpose other than a debugging aid.
>>>
>>> I propose hiding them behind CONFIG_PV_PTE_DEBUG and declaring an ABI of
>>> "all bits available for guest use".
>> And a kernel using any of the conflicting bits would then become
>> unusable on a hypervisor with that debug option enabled? I'd
>> rather see us document the state things are in...
> 
> _PAGE_GNTMAP is already states:
> 
> /*
>  * Debug option: Ensure that granted mappings are not implicitly unmapped.
>  * WARNING: This will need to be disabled to run OSes that use the spare PTE
>  * bits themselves (e.g., *BSD).
>  */

But that's (assuming the use of the two bits were spelled out) a
guest OS not fully playing by the spec. To me, "available" PTE bits
being shared implies that some of them may be claimed by Xen,
while others may be claimed by guests. You're right that this needs
to be written down, but I don't think we need to go as far as
forbidding Xen to use any of them. And even less so should we
preclude their use for any purpose going forward.

In the end, with (as it seems) not much effort this could even be
made dynamic: A guest could advertise which of the bits it doesn't
use, and then Xen could pick two of them for the two purposes it
currently needs them for. Should a guest leave no or just one bit
available, the debugging aid could then be disabled.

> I was intending to have CONFIG_PV_PTE_DEBUG as an EXPERT option,
> disabled by default even in debug builds.
> 
> There should not be an ABI difference between release and "normal" debug
> builds.

Well, I see your point, but as said above I'm not convinced
disabling all that code is the right solution. In fact, what you
propose is not far away from removing that code altogether.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu

2016-01-21 Thread Haozhong Zhang
On 01/21/16 03:25, Jan Beulich wrote:
> >>> On 21.01.16 at 10:10,  wrote:
> > On 01/21/2016 04:53 PM, Jan Beulich wrote:
> > On 21.01.16 at 09:25,  wrote:
> >>> On 01/21/2016 04:18 PM, Jan Beulich wrote:
>  Yes. But then - other than you said above - it still looks to me as
>  if the split between PMEM and PBLK is arranged for by firmware?
> >>>
> >>> Yes. But OS/Hypervisor is not excepted to dynamically change its configure
> >>> (re-split),
> >>> i,e, for PoV of OS/Hypervisor, it is static.
> >>
> >> Exactly, that has been my understanding. And hence the PMEM part
> >> could be under the hypervisor's control, while the PBLK part could be
> >> Dom0's responsibility.
> >>
> > 
> > I am not sure if i have understood your point. What your suggestion is that
> > leave PMEM for hypervisor and all other parts (PBLK and _DSM handling) to
> > Dom0? If yes, we should:
> > a) handle hotplug in hypervisor (new PMEM add/remove) that causes hyperivsor
> > interpret ACPI SSDT/DSDT.
> 
> Why would this be different from ordinary memory hotplug, where
> Dom0 deals with the ACPI CA interaction, notifying Xen about the
> added memory?
>

The process of NVDIMM hotplug is similar to the ordinary memory
hotplug, and seemingly possible to support it in Xen hypervisor like
ordinary memory hotplug.

> > b) some _DSMs control PMEM so you should filter out these kind of _DSMs and
> > handle them in hypervisor.
> 
> Not if (see above) following the model we currently have in place.
>

You mean let dom0 linux evaluates those _DSMs and interact with
hypervisor if necessary (e.g. XENPF_mem_hotadd for memory hotplug)?

> > c) hypervisor should mange PMEM resource pool and partition it to multiple
> > VMs.
> 
> Yes.
>

But I Still do not quite understand this part: why must pmem resource
management and partition be done in hypervisor?

I mean if we allow the following steps of operations (for example)
(1) partition pmem in dom 0
(2) get address and size of each partition (part_addr, part_size)
(3) call a hypercall like nvdimm_memory_mapping(d, part_addr, part_size, gpfn) 
to
map a partition to the address gpfn in dom d.
Only the last step requires hypervisor. Would anything be wrong if we
allow above operations?

Ha

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC] VirtFS support on Xen

2016-01-21 Thread David Vrabel
On 21/01/16 10:28, Wei Liu wrote:
> [RFC] VirtFS support on Xen
> 
> # Introduction
> 
> QEMU/KVM supports file system passthrough via an interface called
> VirtFS [0]. VirtFS is in turn implemented with 9pfs protocol [1] and
> VirtIO transport.
> 
> Xen used to have its own implementation of file system passthrough
> called XenFS, but that has been inactive for a few years. The latest
> update was in 2009 [2].
> 
> This project aims to add VirtFS support on Xen. This is more
> sustainable than inventing our own wheel.#

What's the use case for this?  Who wants this feature?

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [xen-4.5-testing baseline-only test] 38676: trouble: blocked/broken/fail/pass

2016-01-21 Thread Juergen Gross
On 21/01/16 11:00, Ian Campbell wrote:
> On Thu, 2016-01-21 at 07:49 +0100, Juergen Gross wrote:
>> On 21/01/16 07:31, Platform Team regression test user wrote:
>>> This run is configured for baseline tests only.
>>>
>>> flight 38676 xen-4.5-testing real [real]
>>> http://osstest.xs.citrite.net/~osstest/testlogs/logs/38676/
>>
>> I've seen this more than once now: some test results seem not to be
>> reachable from outside Citrix:
>>
>> $ host osstest.xs.citrite.net
>> Host osstest.xs.citrite.net not found: 3(NXDOMAIN)
>>
>> Is this on purpose?
> 
> More due to a lack of somewhere to post them publicly.

Okay. I just wanted to make sure you are aware of that issue. I'm fine
with ignoring those mails. :-)


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen PV PTE ABI (or lack thereof)

2016-01-21 Thread Jan Beulich
>>> On 20.01.16 at 21:10,  wrote:
> First of all, SMEP and SMAP.  32bit PV guests are subject to Xen's
> SMEP/SMAP choices, because of running in ring 1.
> 
> SMAP in particular is problematic because older Linux guests do fall
> foul of it; they don't understand what a SMAP pagefault is, and enter an
> infinite loop of pagefaults.  SMEP is also problematic because it breaks
> any guest wishing to use a shared address space between kernel and
> user.  (I had some fun getting the test framework to function until I
> twigged what was happening).
> 
> Both of these are regressions; older guests relying on existing
> behaviour cease to function on newer hardware/Xen despite identical
> settings.

And for both of them there simply should be a way for the guest to
state whether it's compatible (which should be the case for anything
we can't deal with completely transparently to guests).

> For the PTE bits, _PAGE_GNTTAB (bit 62) is used exclusively in debug
> build (so there is a guest observable difference between running on a
> debug and a non-debug Xen), and the comment beside it even identifies
> that it breaks BSD guests.  PTE bits 62:59 used by hardware if  CR4.PKE
> is set.  Currently this means that we are not able to support Protection
> Key for PV guests (although this restriction technically only applies to
> debug builds of the hypervisor).
> 
> The other PTE bit used by Xen is _PAGE_GUEST_KERNEL (bit 52).  This bit
> is used to notice when a 64bit PV guest attempts to override the fixup
> Xen applies to its PTEs.  Xen unilaterally sets _PAGE_GLOBAL for user
> pages, and clears _PAGE_GLOBAL for supervisor mappings, setting
> _PAGE_USER in both cases as the PV kernel runs in ring3.  The only thing
> _PAGE_GUEST_KERNEL is used for is to notice when the kernel deliberately
> tries to create a _PAGE_GUEST_KERNEL|_PAGE_GLOBAL, at which point a
> warning is logged and the kernel overridden.
> 
> 
> Neither of the used PTE bits exist in the Xen public ABI.  Neither of
> them serve a purpose other than a debugging aid.
> 
> I propose hiding them behind CONFIG_PV_PTE_DEBUG and declaring an ABI of
> "all bits available for guest use".

And a kernel using any of the conflicting bits would then become
unusable on a hypervisor with that debug option enabled? I'd
rather see us document the state things are in...

> The other question is what we do when it comes to %cr4 and PV guests.
> 
> The current SMAP issue is a blocker for XenServer, and I have some nasty
> logic to fix up behind the guests back.  I have only just discovered the
> SMEP issue, but it is still a regression (again, nothing states that a
> PV guest must have a split address space;  segmentation is a perfectly
> valid option in 32bit guests).  The PK issue is one which shouldn't be
> an issue for us to implement in PV guests.
> 
> I am leaning towards allowing a toolstack to permit a PV guest to be
> able to play with a few more CR4 bits.  We can't give a guest kernel
> complete carte blanche, because of the security implications.  However,
> we do already context switch CR4 for PV guests, so a few extra bits  on
> a "nominated safe" domain is no extra hassle.

Sounds reasonable.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [xen-4.4-testing test] 78620: tolerable FAIL - PUSHED

2016-01-21 Thread osstest service owner
flight 78620 xen-4.4-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/78620/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-multivcpu 15 guest-start/debian.repeatfail  like 77671
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail like 77671
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail like 77671

Tests which did not succeed, but are not blocking:
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 build-amd64-rumpuserxen   6 xen-buildfail   never pass
 test-armhf-armhf-xl-vhd   9 debian-di-installfail   never pass
 build-i386-rumpuserxen6 xen-buildfail   never pass
 test-armhf-armhf-libvirt-qcow2  9 debian-di-installfail never pass
 test-amd64-amd64-qemuu-nested-intel 16 debian-hvm-install/l1/l2 fail never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 11 guest-start  fail   never pass
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-amd64-i386-xend-qemut-winxpsp3 20 leak-check/checkfail never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt-raw  9 debian-di-installfail   never pass

version targeted for testing:
 xen  425f7f77ce950f39236f7c13e288ce5198c3576a
baseline version:
 xen  6d2c41d676339ecc6255b92876b6153d4f85f579

Last test of basis77671  2016-01-09 20:12:32 Z   11 days
Testing same since78620  2016-01-20 13:41:06 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Ian Campbell 
  Jan Beulich 
  Kevin Tian 

jobs:
 build-amd64-xend pass
 build-i386-xend  pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  fail
 build-i386-rumpuserxen   fail
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-amd64-qemuu-nested-amdfail
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass
 test-amd64-amd64-rumpuserxen-amd64   blocked 
 

Re: [Xen-devel] [PATCH V13 5/5] xl: add pvusb commands

2016-01-21 Thread Wei Liu
On Tue, Jan 19, 2016 at 08:10:07PM -0700, Jim Fehlig wrote:
> On 01/19/2016 11:11 AM, Ian Jackson wrote:
> > Chunyan Liu writes ("[PATCH V13 5/5] xl: add pvusb commands"):
> >> Add pvusb commands: usbctrl-attach, usbctrl-detach, usb-list,
> >> usbdev-attach and usbdev-detach.
> > Thanks for swapping this with the other patch.  It is better now.
> >
> >> +=item B I I
> > However, I think you need to explictly state that the user may (and
> > indeed, must) pass multiple settings as separate arguments.  AFAICT
> > the parser here doesn't do the ,-splitting.
> 
> I just noticed this is the case with network devices as well. E.g.
> 
> #xl network-attach hvm-domU mac=00:16:3e:xx:yy:zz,bridge=br0
> libxl: error: libxl_device.c:1095:device_hotplug_child_death_cb: script: Could
> not find bridge device xenbr0
> 
> main_networkattach() in tools/libxl/xl_cmdimpl.c doesn't split on the ',', so
> everything after mac=00:16:3e:xx:yy:zz is ignored. I'd need advice on how to 
> fix
> this though. Based on xl-network-configuration doc and Xen tool's long history
> of network-attach supporting that syntax, I'd say main_networkattach() should 
> be
> changed to split on ','. I could also change the docs. Do tools maintainers 
> have
> a preference, or alternative option?
> 

It is splitting on " " at the moment which is not very desirable.

I think this is a bug. I've added it to my list and will look into
fixing it.

Wei.

> Regards,
> Jim
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] xen-netfront crash when detaching network while some network activity

2016-01-21 Thread Joao Martins


On 01/20/2016 09:59 PM, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 01, 2015 at 11:32:58PM +0100, Marek Marczykowski-Górecki wrote:
>> On Tue, Dec 01, 2015 at 05:00:42PM -0500, Konrad Rzeszutek Wilk wrote:
>>> On Tue, Nov 17, 2015 at 03:45:15AM +0100, Marek Marczykowski-Górecki wrote:
 On Wed, Oct 21, 2015 at 08:57:34PM +0200, Marek Marczykowski-Górecki wrote:
> On Wed, May 27, 2015 at 12:03:12AM +0200, Marek Marczykowski-Górecki 
> wrote:
>> On Tue, May 26, 2015 at 11:56:00AM +0100, David Vrabel wrote:
>>> On 22/05/15 12:49, Marek Marczykowski-Górecki wrote:
 Hi all,

 I'm experiencing xen-netfront crash when doing xl network-detach while
 some network activity is going on at the same time. It happens only 
 when
 domU has more than one vcpu. Not sure if this matters, but the backend
 is in another domU (not dom0). I'm using Xen 4.2.2. It happens on 
 kernel
 3.9.4 and 4.1-rc1 as well.

 Steps to reproduce:
 1. Start the domU with some network interface
 2. Call there 'ping -f some-IP'
 3. Call 'xl network-detach NAME 0'
>>>
>>> Do you see this all the time or just on occassions?
>>
>> Using above procedure - all the time.
>>
>>> I tried to reproduce it and couldn't see it. Is your VM an PV or HVM?
>>
>> PV, started by libvirt. This may have something to do, the problem didn't
>> existed on older Xen (4.1) and started by xl. I'm not sure about kernel
>> version there, but I think I've tried there 3.18 too, which has this
>> problem.
>>
>> But I don't see anything special in domU config file (neither backend
>> nor frontend) - it may be some libvirt default. If that's really the
>> cause. Can I (and how) get any useful information about that?
> 
> libvirt naturally does some libxl calls, and they may be different.
> 
> Any chance you could give me an idea of:
>  - What commands you use in libvirt?
>  - Do you use a bond or bridge?
>  - What version of libvirt you are using?
> 
> Thanks!
> CC-ing Joao just in case he has seen this.
>>
Hm, So far I couldn't reproduce the issue with upstream Xen/linux/libvirt, using
both libvirt or plain xl (both on a bridge setup) and also irrespective of the
both load and direction of traffic (be it a ping flood, pktgen with min.
sized packets or iperf).

>>
>> -- 
>> Best Regards,
>> Marek Marczykowski-Górecki
>> Invisible Things Lab
>> A: Because it messes up the order in which people normally read text.
>> Q: Why is top-posting such a bad thing?
> 
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread David Vrabel
On 20/01/16 12:23, Ian Campbell wrote:
> There have been a few reports recently[0] which relate to a failure of
> netfront to allocate sufficient grant refs for all the queues:
> 
> [0.533589] xen_netfront: can't alloc rx grant refs
> [0.533612] net eth0: only created 31 queues
> 
> Which can be worked around by increasing the number of grants on the
> hypervisor command line or by limiting the number of queues permitted by
> either back or front using a module param (which was broken but is now
> fixed on both sides, but I'm not sure it has been backported everywhere
> such that it is a reliable thing to always tell users as a workaround).
> 
> Is there any plan to do anything about the default/out of the box
> experience? Either limiting the number of queues or making both ends cope
> more gracefully with failure to create some queues (or both) might be
> sufficient?
> 
> I think the crash after the above in the first link at [0] is fixed? I
> think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
> to real created" which was in 4.3.

I think the correct solution is to increase the default maximum grant
table size.

Although, unless you're using the not-yet-applied per-cpu rwlock patches
multiqueue is terrible on many (multisocket) systems and the number of
queue should be limited in netback to 4 or even just 2.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] cleancache: constify cleancache_ops structure

2016-01-21 Thread Julia Lawall


On Thu, 21 Jan 2016, David Vrabel wrote:

> On 23/12/15 21:06, Julia Lawall wrote:
> > The cleancache_ops structure is never modified, so declare it as const.
> >
> > This also removes the __read_mostly declaration on the cleancache_ops
> > variable declaration, since it seems redundant with const.
> >
> > Done with the help of Coccinelle.
> >
> > Signed-off-by: Julia Lawall 
> >
> > ---
> >
> > Not sure that the __read_mostly change is correct.  Does it apply to the
> > variable, or to what the variable points to?
>
> The variable, so...

Thanks.  I'll update the patch, unless you have already fixed it.

julia

> > --- a/mm/cleancache.c
> > +++ b/mm/cleancache.c
> > @@ -22,7 +22,7 @@
> >   * cleancache_ops is set by cleancache_register_ops to contain the pointers
> >   * to the cleancache "backend" implementation functions.
> >   */
> > -static struct cleancache_ops *cleancache_ops __read_mostly;
> > +static const struct cleancache_ops *cleancache_ops;
>
> ...you want to retain the __read_mostly here.
>
> David
> --
> To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 5/5] libxl: add options to enable/disable emulated devices

2016-01-21 Thread Wei Liu
On Thu, Jan 21, 2016 at 11:01:43AM +0100, Roger Pau Monné wrote:
> El 21/01/16 a les 10.39, Ian Campbell ha escrit:
> > On Wed, 2016-01-20 at 19:33 +0100, Roger Pau Monné wrote:
> >> El 20/01/16 a les 14.01, Ian Campbell ha escrit:
> >>> On Wed, 2016-01-20 at 12:57 +0100, Roger Pau Monne wrote:
>  Allow enabling or disabling emulated devices from the libxl domain
>  configuration file. For HVM guests with a device model all the
>  emulated
>  devices are enabled. For HVM guests without a device model no devices
>  are
>  enabled by default, although they can be enabled using the options
>  provided.
>  The arbiter of whether a combination is posible or not is always Xen,
>  libxl
>  doesn't do any kind of check.
> 
>  This set of options is also propagated inside of the libxl migration
>  record
>  as part of the contents of the libxl_domain_build_info struct.
> >>>
> >>> ... and this is the real motivation for this change, not actually
> >>> allowing
> >>> users to control all this AIUI.
> >>>
> >>> Did you check that the fields updated using libxl_defbool_setdefault
> >>> are
> >>> actually updated in the JSON copy and therefore propagated to the other
> >>> side of a migration as specific values and not as "pick a default"? I
> >>> think
> >>> we don't want these changing on migration. I think/hope all this was
> >>> automatically handled by the work Wei did in the last release cycle.
> >>
> >> No, values populated by the {build/create}_info_setdefault functions are
> >> not propagated, OTOH values manually set by the user in the config file
> >> are indeed propagated. Do we have any guarantee that _setdefault is
> >> always going to behave in the same way?
> > 
> > No, part of the purpose of defbool and the other "do the default" values is
> > that we can evolve things over time.
> >
> >> If we don't have that guarantee I think this is already a bug, and we
> >> should call _setdefault before sending the domain info to the other end.
> >> In fact I have a patch that does exactly that, but I'm unsure if it's
> >> needed because I don't know the policy regarding default values in libxl.
> > 
> > Wei, isn't this (turning the defaults into concrete values) supposed to be
> > taken care of by the JSON mangling which you added?
> 
> Heh, I think you mean the JSON mangling added by Wei. In order to 
> propagate the values filled by default in libxl_domain_config I had to 
> add the following patch, which basically calls the _setdefault 
> functions before converting the domain_config into JSON. I'm planning 
> to make it part of this series in the next iteration:

The requirement of recording decision made in libxl and pass that to the
receiving end is not new. We had the same problem for uuid, disk and
some other things.

The first way of doing it is to update JSON before it is sent -- see
libxl.c:libxl_retrieve_domain_configuration. It uses the stashed JSON
file as template and pull in various bits from hypervisor and xenstore.
Your need of recording what emulated devices are available fits here.
You just need to provide a way to retrieve those bits in that function.

Another way of doing it is to update the stashed JSON template before it
is saved. See libxl_internal.c:libxl__update_domain_configuration. I
think this might be easier than the first way of doing it.

You should not export the _setdefault function to xl because it is a
layer violation.

Hope this clarify things.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [qemu-mainline test] 78597: tolerable FAIL - PUSHED

2016-01-21 Thread osstest service owner
flight 78597 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/78597/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-cubietruck 15 guest-start/debian.repeat fail in 78509 pass 
in 78597
 test-armhf-armhf-xl-rtds 11 guest-startfail in 78509 pass in 78597
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 9 debian-hvm-install fail pass in 
78509
 test-armhf-armhf-xl-vhd  14 guest-start/debian.repeat   fail pass in 78509

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeatfail   like 78421

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-qcow2 13 guest-saverestorefail never pass
 test-armhf-armhf-libvirt-raw 13 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass

version targeted for testing:
 qemuu3db34bf64ab4f8797565dd8750003156c32b301d
baseline version:
 qemuu19b6d84316892c8086e0115d6f09cb01abb86cfc

Last test of basis78421  2016-01-18 09:47:22 Z3 days
Testing same since78509  2016-01-19 09:21:37 Z2 days2 attempts


People who touched revisions under test:
  Andreas Färber 
  Christophe Fergeau 
  Daniel P. Berrange 
  Gerd Hoffmann 
  Juan Quintela 
  Mark Cave-Ayland 
  Paolo Bonzini 
  Peter Maydell 
  Wolfgang Bumiller 

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt  

Re: [Xen-devel] [PATCH V13 5/5] xl: add pvusb commands

2016-01-21 Thread Wei Liu
On Thu, Jan 21, 2016 at 12:21:11PM +, Ian Campbell wrote:
> On Thu, 2016-01-21 at 12:12 +, Wei Liu wrote:
> > On Tue, Jan 19, 2016 at 08:10:07PM -0700, Jim Fehlig wrote:
> > > On 01/19/2016 11:11 AM, Ian Jackson wrote:
> > > > Chunyan Liu writes ("[PATCH V13 5/5] xl: add pvusb commands"):
> > > > > Add pvusb commands: usbctrl-attach, usbctrl-detach, usb-list,
> > > > > usbdev-attach and usbdev-detach.
> > > > Thanks for swapping this with the other patch.  It is better now.
> > > > 
> > > > > +=item B I I
> > > > However, I think you need to explictly state that the user may (and
> > > > indeed, must) pass multiple settings as separate arguments.  AFAICT
> > > > the parser here doesn't do the ,-splitting.
> > > 
> > > I just noticed this is the case with network devices as well. E.g.
> > > 
> > > #xl network-attach hvm-domU mac=00:16:3e:xx:yy:zz,bridge=br0
> > > libxl: error: libxl_device.c:1095:device_hotplug_child_death_cb:
> > > script: Could
> > > not find bridge device xenbr0
> > > 
> > > main_networkattach() in tools/libxl/xl_cmdimpl.c doesn't split on the
> > > ',', so
> > > everything after mac=00:16:3e:xx:yy:zz is ignored. I'd need advice on
> > > how to fix
> > > this though. Based on xl-network-configuration doc and Xen tool's long
> > > history
> > > of network-attach supporting that syntax, I'd say main_networkattach()
> > > should be
> > > changed to split on ','. I could also change the docs. Do tools
> > > maintainers have
> > > a preference, or alternative option?
> > > 
> > 
> > It is splitting on " " at the moment which is not very desirable.
> > 
> > I think this is a bug. I've added it to my list and will look into
> > fixing it.
> 
> If you are feeling particularly highly motivated then xl_cmdimpl.c contains
> quite a lot of adhoc parsing of various comma a space separated lists
> (often dupped for a given instance into cmdline vs cfg file, which leads to
> unintentional behavioural differences like above) which could usefully be
> consolidated into a series of helpers.
> 

Or rewrite every adhoc parser with bison/flex or Ocaml / Haskell parsec.

Jokes aside. I will see what I can do. Consolidating them into helper
functions is a good start.

Wei.

> Ian.
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH LIBVIRT v3] libxl: Support cmdline= in xl config files

2016-01-21 Thread Ian Campbell
... and consolidate the cmdline/extra/root parsing to facilitate doing
so.

The logic is the same as xl's parse_cmdline from the current xen.git master
branch (e6f0e099d2c17de47fd86e817b1998db903cab61).

On the formatting side switch to producing cmdline= instead of extra=.

Update a few tests and add serveral more.
  - test-cmdline is added to test the exclusive use of cmdline.
  - test-fullvirt-direct-kernel-boot.cfg is updated due to the switch
on the formatting side and now tests the exclusive use of cmdline=.
  - Tests are added for both paravirt and fullvirt where the .cfg uses
extra= and (paravirt only) root=. These are format (xl->xml) only
since the inverse will generate cmdline= hence is not a round trip
(which was already true if using root=, which used to generate
extra= on the way back).
  - Tests are added for both paravirt and fullvirt where the .cfg
declares cmdline= as well as bogus extra= and (paravirt only) root=
entries which should be ignored. Again these are format only tests
since the inverse won't include the bogus lines.

The last two bullets here required splitting the DO_TEST macro into
two halves, as is done in the xmconfigtest.c case.

In order to introduce a use of VIR_WARN for logging I had to add
virerror.h and VIR_LOG_INIT.

Signed-off-by: Ian Campbell 
---
v2: Use VIR_INFO (adding necessary infra)
Don't initialise things to NULL when there is no need.
v3: I know know the answer re VIR_FROM_THIS, because Jim fixed it.
Initialise cmdline to NULL, since neither I nor gcc were smart
 enough to spot the uninitialised path I did this in preference to
 adding the else case, since that apparently won't be masking the
 compiler's ability to spot uninitialised vars in this function.
Add tests
Addjust xenFormatXLOS to produce cmdline= instead of extra=.
---
 src/xenconfig/xen_xl.c | 70 +-
 ...est-fullvirt-direct-kernel-boot-bogus-extra.cfg | 31 ++
 ...est-fullvirt-direct-kernel-boot-bogus-extra.xml | 51 
 .../test-fullvirt-direct-kernel-boot-extra.cfg | 30 ++
 .../test-fullvirt-direct-kernel-boot-extra.xml | 51 
 .../test-fullvirt-direct-kernel-boot.cfg   |  2 +-
 .../test-paravirt-cmdline-bogus-extra-root.cfg | 13 
 .../test-paravirt-cmdline-bogus-extra-root.xml | 32 ++
 .../test-paravirt-cmdline-extra-root.cfg   | 15 +
 .../test-paravirt-cmdline-extra-root.xml   | 32 ++
 tests/xlconfigdata/test-paravirt-cmdline.cfg   | 14 +
 tests/xlconfigdata/test-paravirt-cmdline.xml   | 32 ++
 tests/xlconfigtest.c   | 24 ++--
 13 files changed, 365 insertions(+), 32 deletions(-)
 create mode 100644 
tests/xlconfigdata/test-fullvirt-direct-kernel-boot-bogus-extra.cfg
 create mode 100644 
tests/xlconfigdata/test-fullvirt-direct-kernel-boot-bogus-extra.xml
 create mode 100644 
tests/xlconfigdata/test-fullvirt-direct-kernel-boot-extra.cfg
 create mode 100644 
tests/xlconfigdata/test-fullvirt-direct-kernel-boot-extra.xml
 create mode 100644 
tests/xlconfigdata/test-paravirt-cmdline-bogus-extra-root.cfg
 create mode 100644 
tests/xlconfigdata/test-paravirt-cmdline-bogus-extra-root.xml
 create mode 100644 tests/xlconfigdata/test-paravirt-cmdline-extra-root.cfg
 create mode 100644 tests/xlconfigdata/test-paravirt-cmdline-extra-root.xml
 create mode 100644 tests/xlconfigdata/test-paravirt-cmdline.cfg
 create mode 100644 tests/xlconfigdata/test-paravirt-cmdline.xml

diff --git a/src/xenconfig/xen_xl.c b/src/xenconfig/xen_xl.c
index 026cbcc..be194e3 100644
--- a/src/xenconfig/xen_xl.c
+++ b/src/xenconfig/xen_xl.c
@@ -27,6 +27,7 @@
 
 #include "virconf.h"
 #include "virerror.h"
+#include "virlog.h"
 #include "domain_conf.h"
 #include "viralloc.h"
 #include "virstring.h"
@@ -35,6 +36,8 @@
 
 #define VIR_FROM_THIS VIR_FROM_XENXL
 
+VIR_LOG_INIT("xen.xen_xl");
+
 /*
  * Xen provides a libxl utility library, with several useful functions,
  * specifically xlu_disk_parse for parsing xl disk config strings.
@@ -58,11 +61,46 @@ extern int xlu_disk_parse(XLU_Config *cfg,
   libxl_device_disk *disk);
 #endif
 
+static int xenParseCmdline(virConfPtr conf, char **r_cmdline)
+{
+char *cmdline = NULL;
+const char *root, *extra, *buf;
+
+if (xenConfigGetString(conf, "cmdline", , NULL) < 0)
+return -1;
+
+if (xenConfigGetString(conf, "root", , NULL) < 0)
+return -1;
+
+if (xenConfigGetString(conf, "extra", , NULL) < 0)
+return -1;
+
+if (buf) {
+if (VIR_STRDUP(cmdline, buf) < 0)
+return -1;
+if (root || extra)
+VIR_WARN("ignoring root= and extra= in favour of cmdline=");
+} else {
+if (root && extra) {
+if (virAsprintf(, "root=%s %s", root, extra) < 0)
+return -1;
+} else 

Re: [Xen-devel] [RFC V2] xen: interface: introduce pvclk interface

2016-01-21 Thread Ian Campbell
On Thu, 2016-01-21 at 20:35 +0800, Peng Fan wrote:
> On Thu, Jan 21, 2016 at 12:26:04PM +, Ian Campbell wrote:
> > Would adding a dummy fixed-clock[0] (or several of them) to the guest
> > passthrough DT satisfy the driver's requirements? They would be hardcoded
> > to whatever rate dom0 and/or the tools has decided upon (and had set in the
> > real h/w).
> 
> If using this way, we have the assumption that DomU device driver would not
> change the rate of the clock driving the device. I am not sure whether this is
> ok for so many platform devices based ARM core.

Don't (non-buggy) drivers already need to cope with the possibility that
the clk core may not be able to satisfy their request for a particular rate
due to constraints from other ip blocks?

I would also expect the user to want to configure things in dom0 such that
the specific drivers used in domU are satisfied with what they get (which
is a bit fiddly perhaps, but platform device passthrough already is
somewhat that way).

> In /sys/kernel/debug/clk/, there are clk tree info, but
> clk api are not exposed to userspace as far as I know, so
> if using sysfs interface to set a known fixed rate or enable/disable the 
> clock,
> we need to expose the clk info to userspace.

That seems possible to arrange given a use case for it though, a SMOP (and
convincing the clk maintainer of the need).

Worst case is a xen-clkback driver or perhaps even vfio will want to do
something like this, we can't normally use vfio on Xen, but in this case
perhaps we could.


> Jan said using hypercall in the other mail, do you have any ideas about
> this?

This would need a whole clock infrastructure in Xen, wouldn't it? I
described why that is not currently available in an earlier mail, and also
my opinion that doing the whole thing in Xen would involve pulling in far
too much SoC specific code for each specific platform.

Ian.
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC V2] xen: interface: introduce pvclk interface

2016-01-21 Thread Jan Beulich
>>> On 21.01.16 at 13:06,  wrote:
> On Thu, Jan 21, 2016 at 03:21:38AM -0700, Jan Beulich wrote:
> On 21.01.16 at 09:59,  wrote:
>>> uart2 needs clock IMX7D_UART2_ROOT_CLK from the ccm.
>>> passthrough uart2, hypervisor handles the reg and interrupts, that is 
>>> because
>>> hypervisor handles the memory map and the interrupt controller(GIC). But 
>>> here
>>> CCM is not handled by hypervisor, it is handled by Dom0.
>>
>>This, I take it, describes the current state, which doesn't make clear
>>whether this is intentionally that way (and can't be changed), or
>>just happens to be that way because so far it didn't matter.
>>
>>> For ARMV8 server products, I am not sure whether hypercall is better; but to
>>> my case, it's not feasible to use hypercall.
>>
>>Because of ...?
> 
> I guess you mean DomU issues hypercall and Xen forwards the request to Dom0,
> then Dom0 reply the response?

Well, no, that wouldn't be a desirable (or even sane) model.

>>> Dom0 handles all the clocks, DomU just send request to Dom0 and ask Dom0 to
>>> enable/disable/set rate for a clock for the device. So I think it's okay
>>> for multipile parties, the clk subsystem in Dom0 can handle mutiple 
>>> requests 
>>> even if the clock is shared.
>>
>>I.e. if one party sets one rate, and later another party sets
>>a different rate, things are going to work fine? If so, why would
>>the different parties need control over the rate in the first place?
> 
> oh. thanks for teaching me. If two IPs share one clock, and IP1 for Dom0, 
> IP2 for DomU,
> Dom0 needs clock work at 20Hz, but DomU want clock work at 40Hz. pvclk can 
> not avoid this.

But you mustn't allow a DomU to affect Dom0, nor may two DomU-s
interact in such a way with one another.

> If not using pvclk, we develop a new method to handle clock. I guest we can 
> also not avoid this?

At the very least it would need to be avoided by denying the request.
Upon shared use, either all parties agree, or only one may use the
clock. And passing through a (platform) device would therefore imply
validating that the needed clock(s) are available to the target domain.
Doing this in a consistent way with all control in one component's
hands seems doable only if hypervisor and/or tool stack are the
controlling (and arbitrating) entity. In the end this is one of the
reasons why to me a simple PV I/O interface doesn't seem suitable
here.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen PV PTE ABI (or lack thereof)

2016-01-21 Thread Jan Beulich
>>> On 21.01.16 at 14:17,  wrote:
> On 21/01/16 12:59, Jan Beulich wrote:
>>>
> For the PTE bits, _PAGE_GNTTAB (bit 62) is used exclusively in debug
> build (so there is a guest observable difference between running on a
> debug and a non-debug Xen), and the comment beside it even identifies
> that it breaks BSD guests.  PTE bits 62:59 used by hardware if  CR4.PKE
> is set.  Currently this means that we are not able to support Protection
> Key for PV guests (although this restriction technically only applies to
> debug builds of the hypervisor).
>
> The other PTE bit used by Xen is _PAGE_GUEST_KERNEL (bit 52).  This bit
> is used to notice when a 64bit PV guest attempts to override the fixup
> Xen applies to its PTEs.  Xen unilaterally sets _PAGE_GLOBAL for user
> pages, and clears _PAGE_GLOBAL for supervisor mappings, setting
> _PAGE_USER in both cases as the PV kernel runs in ring3.  The only thing
> _PAGE_GUEST_KERNEL is used for is to notice when the kernel deliberately
> tries to create a _PAGE_GUEST_KERNEL|_PAGE_GLOBAL, at which point a
> warning is logged and the kernel overridden.
>
>
> Neither of the used PTE bits exist in the Xen public ABI.  Neither of
> them serve a purpose other than a debugging aid.
>
> I propose hiding them behind CONFIG_PV_PTE_DEBUG and declaring an ABI of
> "all bits available for guest use".
 And a kernel using any of the conflicting bits would then become
 unusable on a hypervisor with that debug option enabled? I'd
 rather see us document the state things are in...
>>> _PAGE_GNTMAP is already states:
>>>
>>> /*
>>>  * Debug option: Ensure that granted mappings are not implicitly unmapped.
>>>  * WARNING: This will need to be disabled to run OSes that use the spare PTE
>>>  * bits themselves (e.g., *BSD).
>>>  */
>> But that's (assuming the use of the two bits were spelled out) a
>> guest OS not fully playing by the spec. To me, "available" PTE bits
>> being shared implies that some of them may be claimed by Xen,
>> while others may be claimed by guests. You're right that this needs
>> to be written down, but I don't think we need to go as far as
>> forbidding Xen to use any of them. And even less so should we
>> preclude their use for any purpose going forward.
> 
> I do not want to make an ABI which mandates the use of certain bits by Xen.

Agreed, but that doesn't match what I said: We also shouldn't
make an ABI precluding use of some of the bits by Xen.

>> In the end, with (as it seems) not much effort this could even be
>> made dynamic: A guest could advertise which of the bits it doesn't
>> use, and then Xen could pick two of them for the two purposes it
>> currently needs them for. Should a guest leave no or just one bit
>> available, the debugging aid could then be disabled.
> 
> This is unnecessarily complicated.  It only helps going forward, and
> leaves Xen with substantially more complicated PTE handling.

Substantially? Instead of using a constant, we'd use a per-domain
variable. Not much of a complication afaict.

> What happens if at some point later, we try to boot a PV guest which has
> a different PTE layout to what Xen has chosen?  We definitely can't
> update every PTE with a newly-chosen layout.

Just to re-iterate: A per domain selection of bit mask(s).

>>> I was intending to have CONFIG_PV_PTE_DEBUG as an EXPERT option,
>>> disabled by default even in debug builds.
>>>
>>> There should not be an ABI difference between release and "normal" debug
>>> builds.
>> Well, I see your point, but as said above I'm not convinced
>> disabling all that code is the right solution. In fact, what you
>> propose is not far away from removing that code altogether.
> 
> The two bits are only used for specialised debugging.  They should be
> relegated to people doing specific debugging, and not interfere with the
> overwhelming majority of cases where Xen doesn't need to use any
> software available PTE bits.

Your repeated claim that _PAGE_GUEST_KERNEL is purely debugging only
makes we wonder how you would mean to adjust adjust_guest_l1e() with
that flag gone (most notably the last of its if()-s).

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v1 4/8] x86/init: add linker table support

2016-01-21 Thread Roger Pau Monné
El 20/01/16 a les 22.33, Luis R. Rodriguez ha escrit:
> On Wed, Jan 20, 2016 at 1:00 PM, Konrad Rzeszutek Wilk
>  wrote:
>>> +static bool x86_init_fn_supports_subarch(struct x86_init_fn *fn)
>>> +{
>>> + if (!fn->supp_hardware_subarch) {
>>> + pr_err("Init sequence fails to declares any supported 
>>> subarchs: %pF\n", fn->early_init);
>>> + WARN_ON(1);
>>> + }
>>> + if (BIT(boot_params.hdr.hardware_subarch) & fn->supp_hardware_subarch)
>>> + return true;
>>> + return false;
>>> +}
>>
>> So the logic for this working is that boot_params.hdr.hardware_subarch
>>
>> And the macros define two: BIT(X86_SUBARCH_PC) or BIT(X86_SUBARCH_XEN).
>>
>> But hardware_subarch by default is set to zero. Which means if GRUB2, 
>> PXELinux, Xen multiboot1
>> don't set it - then the X86_SUBARCH_PC is choosen right?
>>
>>  1 << 0 & 1 << X86_SUBARCH_PC (which is zero).
>>
>> For this to nicely work with Xen it ought to do this:
>>
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index 993b7a7..6cf9afd 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -1676,6 +1676,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
>> boot_params.hdr.ramdisk_image = initrd_start;
>> boot_params.hdr.ramdisk_size = xen_start_info->mod_len;
>> boot_params.hdr.cmd_line_ptr = __pa(xen_start_info->cmd_line);
>> +   boot_params.hdr.hardware_subarch = X86_SUBARCH_XEN;
>>
>> if (!xen_initial_domain()) {
>> add_preferred_console("xenboot", 0, NULL);
>>
>>
>> ?
> 
> That's correct for PV and PVH, likewise when qemu is required for HVM
> qemu could set it. I have the qemu change done but that should only
> cover HVM. A common place to set this as well could be the hypervisor,
> but currently the hypervisor doesn't set any boot_params, instead a
> generic struct is passed and the kernel code (for any OS) is expected
> to interpret this and then set the required values for the OS in the
> init path. Long term though if we wanted to merge init further one way
> could be to have the hypervisor just set the zero page cleanly for the
> different modes. If we needed more data other than the
> hardware_subarch we also have the hardware_subarch_data, that's a u64
> , and how that is used would be up to the subarch. In Xen's case it
> could do what it wants with it. That would still mean perhaps defining
> as part of a Xen boot protocol a place where xen specific code can
> count on finding more Xen data passed by the hypervisor, the
> xen_start_info. That is, if we wanted to merge init paths this is
> something to consider.
> 
> One thing I considered on the question of who should set the zero page
> for Xen with the prospect of merging inits, or at least this subarch
> for both short term and long term are the obvious implications in
> terms of hypervisor / kernel / qemu combination requirements if the
> subarch is needed. Having it set in the kernel is an obvious immediate
> choice for PV / PVH but it means we can't merge init paths completely
> (down to asm inits), we'd still be able to merge some C init paths
> though, the first entry would still be different. Having the zero page
> set on the hypervisor would go long ways but it would mean a
> hypervisor change required.

I don't think the hypervisor should be setting Linux specific boot
related parameters, the boot ABI should be OS agnostic. IMHO, a small
shim should be added to Linux in order to set what Linux requires when
entering from a Xen entry point.

Roger.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [xen-4.5-testing baseline-only test] 38676: trouble: blocked/broken/fail/pass

2016-01-21 Thread Ian Campbell
On Thu, 2016-01-21 at 07:49 +0100, Juergen Gross wrote:
> On 21/01/16 07:31, Platform Team regression test user wrote:
> > This run is configured for baseline tests only.
> > 
> > flight 38676 xen-4.5-testing real [real]
> > http://osstest.xs.citrite.net/~osstest/testlogs/logs/38676/
> 
> I've seen this more than once now: some test results seem not to be
> reachable from outside Citrix:
> 
> $ host osstest.xs.citrite.net
> Host osstest.xs.citrite.net not found: 3(NXDOMAIN)
> 
> Is this on purpose?

More due to a lack of somewhere to post them publicly.

These results from "citrix-osst...@xenproject.org", generally with
"baseline-only" in the subject and a flight > 5000 are from the osstest
instance running in Cambridge, which used to be the production instance
prior to the colo.

Reports from the production colo come from "osstest service owner ".

The Cambridge baseline-only tests were originally setup so we could make
use of the historical info e.g. regarding machine specific failures in the
period after the switch (i.e. when there wasn't much historical data for
the colo based machines) and to give some overlap in the results of the old
and new instances.

Since then I have left them running since we have versions of hardware
(particularly older machines) which the COLO doesn't have, and also because
it gives those of us here a set of baselines to use for any adhoc testing
we might run here.

Being "baseline-only" each branch is run exactly once on any new baseline,
i.e. when the XenProject instance gets a push of that branch.

Right now our instance is suffering from a host-install issue exposed by
the Debian Jessie upgrade (arising because some of those machines have two
disks, which caused us to notice that none of the machines in the COLO did,
which is being fixed), so the results look bad right now. The fix is in
transit through the colo push gate where it will be picked up by the
Cambridge instance and the results should improve.

Anyway, I kept the mails going to the list because they might be of
interest, and because they are baseline-only they should be relatively low
volume (compared with the production instance). I'm happy to push logs for
specific flights to my user space on xenbits if someone wants to look at an
issue, but for the most part I would expect them to be ignored. If they are
considered too annoying I can arrange for the reports to go elsewhere.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [RFC] VirtFS support on Xen

2016-01-21 Thread Wei Liu
[RFC] VirtFS support on Xen

# Introduction

QEMU/KVM supports file system passthrough via an interface called
VirtFS [0]. VirtFS is in turn implemented with 9pfs protocol [1] and
VirtIO transport.

Xen used to have its own implementation of file system passthrough
called XenFS, but that has been inactive for a few years. The latest
update was in 2009 [2].

This project aims to add VirtFS support on Xen. This is more
sustainable than inventing our own wheel.

# QEMU

Some preliminary work has been done. The original code made a lot of
assumptions that VirtFS was tied to VirtIO, which wasn't true. Patches
to disentangle VirtFS generic code and VirtIO transport have been
upstreamed. What we now need to do is to implement a Xen transport
inside QEMU.

# Linux kernel

Linux kernel has a better shape than QEMU. The generic code and
transport layers are separated cleanly. We need to add a new transport
for Xen.

# Xen toolstack

New interfaces will be added to exposed relevant configuration options
to users. In essence it is just plumbing the right options to QEMU.

# Security

Xen transport will utilise grant table to limit access from backend to
frontend. Users can use several security modes provided by QEMU.

By and large the security of VirtFS on Xen depends on the quality of
QEMU VirtFS code (this is in line with our expectation of other PV
backends in QEMU) and the particular setup of VirtFS (whether a chroot
fs-proxy is used, which security model is picked etc).

# Device type and wire format

I propose to use "virtfs" as device type. A header file named virtfs.h
shall be committed to xen.git as reference for future implementation.

The design of wire format is limited by two factors: 1) 9pfs protocol
is based on transaction -- client sends out request and expects
response, 2) the arrangement of Linux kernel 9pfs places request
buffer and response buffer on the wire at the same time.

Linux seems to be the only one among several popular UNIX / UNIX-like
systems that has in-kernel 9p protocol implementation (correct me if
I'm wrong), so I couldn't fathom what other OSes such as FreeBSD and
NetBSD would need from this wire format. But it would be safe to bet
that they would like to reuse as much code as possible and follow the
same path as Linux does.

So the conclusion so far is that it would make sense for us to follow
the Linux approach (putting two buffers on wire at the same time) to
minimise our work. Other opinions are welcome.

As for the wire format itself, I've attached a draft from my
code. It's far from complete, but it contains the wire format I
envisage.

# Optimisation

The Xen trasnport can be easily extended to use multiple-page ring
should the need arise. A more sophisticated optimisation is to support
"indirect-descriptor" like interface to boost bandwidth.


Wei.


[0] http://www.linux-kvm.org/page/VirtFS
[1] https://en.wikipedia.org/wiki/9P_(protocol)
[2] https://blog.xenproject.org/2009/03/26/status-of-xenfs/
[3] http://wiki.qemu.org/Documentation/9psetup
[4] http://man.cat-v.org/plan_9/5/intro

DRAFT HEADER

/**
 * virtfs.h
 *
 * Copyright (c) 2016, Wei Liu 
 */

#ifndef __XEN_PUBLIC_IO_VIRTFS_H__
#define __XEN_PUBLIC_IO_VIRTFS_H__

#include 
#include 

struct xen_virtfsif_request {
grant_ref_t gref;  /* Reference to buffer page */
uint16_t offset;   /* Offset within buffer page */
uint16_t flags;/* XEN_VIRTFSF_* */
uint16_t id;   /* Echoed in response message. */
};

struct xen_virtfsif_response {
uint16_t id;
int16_t  status;/* Indicate whether backend successfully copy / 
map */
};

/* TODO: RFC style diagrams go in here */

/* TODO: document xenstore path */
/*
 * /local/domain/$guest/device/virtfs/$x/tag
 * /local/domain/$guest/device/virtfs/$x/ring-ref
 * /local/domain/$guest/device/virtfs/$x/event-channel
 */

/*
 * This is wire format for virtfs requests:
 *   Request 1: xen_virtfsif_request -- XEN_VIRTFSF_* (any flag)
 *   Request 2: xen_virtfsif_request -- if request 1 has XEN_VIRTFSF_MORE
 *   Request 3: xen_virtfsif_request -- if request 2 has XEN_VIRTFSF_MORE
 *   ...
 *   Request N: xen_virtfsif_request -- if request (N-1) has XEN_VIRTFSF_MORE,
 *  while itself doesn't contain
 *  XEN_VIRTFSF_MORE
 *
 *
 *  | slot 0 | slot 1 | ... | slot m | ... | slot n   |
 *  |   9p request buffer| 9p response buffer |
 *
 *  Grefs for 9p requests (if any) go in first half of wire format,
 *  grefs for 9p responses (if any) go in second half.
 */

/* If this bit is set, the next slot if part of the request. */
#define _XEN_VIRTFSF_MORE (0)
#define XEN_VIRTFSF_MORE  (1U << _XEN_VIRTFSF_MORE)

/*
 * If this bit is set, this gref is for backend to write data (9p
 * response buffer); otherwise it is for backend to read (9p 

Re: [Xen-devel] [PATCH v3 5/5] libxl: add options to enable/disable emulated devices

2016-01-21 Thread Ian Campbell
On Thu, 2016-01-21 at 11:01 +0100, Roger Pau Monné wrote:
> El 21/01/16 a les 10.39, Ian Campbell ha escrit:
> > > If we don't have that guarantee I think this is already a bug, and we
> > > should call _setdefault before sending the domain info to the other end.
> > > In fact I have a patch that does exactly that, but I'm unsure if it's
> > > needed because I don't know the policy regarding default values in libxl.
> > 
> > Wei, isn't this (turning the defaults into concrete values) supposed to be
> > taken care of by the JSON mangling which you added?
> 
> Heh, I think you mean the JSON mangling added by Wei.

Correct.

>  In order to 
> propagate the values filled by default in libxl_domain_config I had to 
> add the following patch, which basically calls the _setdefault 
> functions before converting the domain_config into JSON. I'm planning 
> to make it part of this series in the next iteration:

I'll let Wei comment on why this isn't already done.

> > > With the current code, libxl basically limits the set of allowed masks
> > > to what it knows. After the change libxl just becomes a proxy for
> > > transmitting what the user has selected to Xen, and Xen either accepts
> > > or refuses it, basically making Xen the only arbiter that decides which
> > > emulated devices get enabled or not. This means that if we want to make
> > > more emulated devices available to the guest in the future no libxl
> > > changes will be required.
> > 
> > We would need to add a new defbool for the new feature.
> 
> Yes, but I was thinking more in the direction of enabling them, rather 
> than adding new ones.

Which would then require changing the defbool_set_default in libxl after
this change, so you do still need to change libxl.

> > > It also means that HVMlite guests created with current Xen will be
> > > capable of migrating to newer version of Xen, that might have a
> > > different default policy. For example in the future we might want to
> > > enable the lapic by default, so if a guest is created with the current
> > > Xen version it doesn't get a lapic at all, and then when migrated to
> > > newer versions a lapic would magically appear after the migration, which
> > > is not desired.
> > 
> > ... and the reason these details can't be propagated via the Xen blob is
> > that this emul stuff needs to be set exactly once at domain create time I
> > suppose? Changing it to be later binding is considered to be too hard/too
> > big a yak?
> 
> Right, emulated devices are initialised as part of the 
> XEN_DOMCTL_createdomain hypercall. Allowing them to be added later on 
> and introducing a kind of intermediate domain building phase where only 
> a certain set of hypercalls are valid is a possibility that Andrew 
> already pointed out, however this seems like a very big project.

This seems like the right approach to me, but I appreciate you not wanting
to tackle this here and now.

Would it be possible to set the set of "potential" emulated devices at
create time and only "commit" to it after the save state is loaded? That
would essentially mean init-all, load state, de-init those which didn't get
any state loaded? Not as nice as doing it with the split of hypercall
availability, but might be more achievable, since you already have the de-
init code for domain teardown time.

In any case, whatever is chosen as the solution the commit message needs to
go into a fair amount of detail as to why we picked that way of doing
things, particularly if it is a compromise vs doing it properly.

This is important so we can answer "why did we do it this way" in 2 years
time.

> > Even with the set of devices set at domain creation time Xen needs to take
> > care when reading its blob, and not fall apart (from a security PoV, it's
> > allowed to fail the state load) when presented with a save record relating
> > to something which is supposedly disabled. Has this been checked?
> 
> Yes, trying to load a state into a disable device will result in 
> -ENODEV.

Grand.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC V2] xen: interface: introduce pvclk interface

2016-01-21 Thread Peng Fan
Hi Jan,

On Thu, Jan 21, 2016 at 12:53:01AM -0700, Jan Beulich wrote:
 On 21.01.16 at 02:29,  wrote:
>> The platform device passthrough part for arm is to mapping the machine io 
>> address
>> to the guest physical io address. Then guest can map the phsical io address 
>> to its
>> own virtual address, then by accessing virtual address, guest can access 
>> machine io address space.
>> So the platform device passthrough does not needs frontend/backend driver to 
>> support, except smmu is handled by xen.
>> 
>> But the platform device needs clk to drive the hardware IP, also may needs 
>> pinmux settings.
>> the driver in guest needs to drive the hardware IP passed through to the 
>> guest, so it needs to operate on the clk.
>> 
>> Just pasted comments from George for V1:
>> 
>> "
>> Just speaking from the perspective of a Xen dev who's not an ARM dev:
>> a few more words on the relationship between pvclk and
>> device-passthrough would be helpful to set the context.  It sounds
>> like:
>> 
>> * On ARM, passing through a device requires a clocksource (at least
>> for many devices)
>> 
>> * dom0 has the hardware clocksource, but at the moment domUs don't
>> have a suitable clocksource
>> 
>> * This patch implements pvclk front/backend suitable for such devices
>> 
>> Is that right?  In which case something like the following would be helpful:
>> 
>> "This patch introduces pvclk, a paravirtualized clock source suitable
>> for devices to use when passing through to domUs on ARM systems."
>> "
>
>That's a possible perspective to take, but not the only one. In
>fact, coming to what I said previously, I wonder whether placing
>the "backend" in Dom0 is the right thing in the first place -
>fundamentally arbitration of hardware use should be done
>(or at least checked/enforced) by the hypervisor. I.e. just like
>while Dom0 may assign a PCI device to a guest, the hypervisor
>is in charge of actually making all the resources needed for this
>to work accessible to the guest, and/or verifying that permissions
>are in place (like e.g. when setting up interrupts). Yet the model
>proposed here completely bypasses the hypervisor afaict.

To platform device of ARM, hypervisor is responsible for the mapping
between machine address and guest physical address, also responsible
for the irq mapping.

But to embedded ARM SoC, the hardware clk IP is handled in Dom0.
On my i.MX platform, the hardware clk IP named
Clock Controller Module will output clks to drive different IPs, such as
uart,gpu,lcd,sd controller,scsi controller,pcie controller.
The hardware clock IP is not same for all ARM SoC vendors. ARM has
spec, such as GIC and SMMU to ask SoC vendors follow the spec, then
it's easy to let hypervisor handle them. But there is no common spec
for the clock IP.

>
>Are there connections between a platform device and its clock(s),
>e.g. in DT? If so, wouldn't it be possible for granting access to a
>platform device to imply granting control of the respective clock?

clock hardware IP is also a device. The following is partial dts for i.MX7Dual.
clks: ccm@3038 {
compatible = "fsl,imx7d-ccm";
reg = <0x3038 0x1>;
interrupts = ,
 ;
#clock-cells = <1>;
clocks = <>, <>;
clock-names = "ckil", "osc";
};

uart2: serial@3089 {
compatible = "fsl,imx7d-uart",
 "fsl,imx6q-uart";
reg = <0x3089 0x1>;
interrupts = ;
clocks = < IMX7D_UART2_ROOT_CLK>,
< IMX7D_UART2_ROOT_CLK>;
clock-names = "ipg", "per";
status = "disabled";
};
uart2 needs clock IMX7D_UART2_ROOT_CLK from the ccm.
passthrough uart2, hypervisor handles the reg and interrupts, that is because
hypervisor handles the memory map and the interrupt controller(GIC). But here
CCM is not handled by hypervisor, it is handled by Dom0.

>In which case clock control might perhaps better become a
>hypercall based mechanism? And further - are all clocks in use for
>at most one platform device (i.e. is there no sharing possible)? If
>not, how do you envision to make multiple parties agree on the
>clock settings and state?

For ARMV8 server products, I am not sure whether hypercall is better; but to my
case, it's not feasible to use hypercall.

Dom0 handles all the clocks, DomU just send request to Dom0 and ask Dom0 to
enable/disable/set rate for a clock for the device. So I think it's okay
for multipile parties, the clk subsystem in Dom0 can handle 

Re: [Xen-devel] [PATCH v3 5/5] libxl: add options to enable/disable emulated devices

2016-01-21 Thread Ian Campbell
On Wed, 2016-01-20 at 19:33 +0100, Roger Pau Monné wrote:
> El 20/01/16 a les 14.01, Ian Campbell ha escrit:
> > On Wed, 2016-01-20 at 12:57 +0100, Roger Pau Monne wrote:
> > > Allow enabling or disabling emulated devices from the libxl domain
> > > configuration file. For HVM guests with a device model all the
> > > emulated
> > > devices are enabled. For HVM guests without a device model no devices
> > > are
> > > enabled by default, although they can be enabled using the options
> > > provided.
> > > The arbiter of whether a combination is posible or not is always Xen,
> > > libxl
> > > doesn't do any kind of check.
> > > 
> > > This set of options is also propagated inside of the libxl migration
> > > record
> > > as part of the contents of the libxl_domain_build_info struct.
> > 
> > ... and this is the real motivation for this change, not actually
> > allowing
> > users to control all this AIUI.
> > 
> > Did you check that the fields updated using libxl_defbool_setdefault
> > are
> > actually updated in the JSON copy and therefore propagated to the other
> > side of a migration as specific values and not as "pick a default"? I
> > think
> > we don't want these changing on migration. I think/hope all this was
> > automatically handled by the work Wei did in the last release cycle.
> 
> No, values populated by the {build/create}_info_setdefault functions are
> not propagated, OTOH values manually set by the user in the config file
> are indeed propagated. Do we have any guarantee that _setdefault is
> always going to behave in the same way?

No, part of the purpose of defbool and the other "do the default" values is
that we can evolve things over time.

> If we don't have that guarantee I think this is already a bug, and we
> should call _setdefault before sending the domain info to the other end.
> In fact I have a patch that does exactly that, but I'm unsure if it's
> needed because I don't know the policy regarding default values in libxl.

Wei, isn't this (turning the defaults into concrete values) supposed to be
taken care of by the JSON mangling which you added?

> 
> > > Signed-off-by: Roger Pau Monné 
> > > ---
> > > Cc: Ian Jackson 
> > > Cc: Ian Campbell 
> > > Cc: Wei Liu 
> > > ---
> > >  docs/man/xl.cfg.pod.5   | 39
> > > +++
> > >  tools/libxl/libxl.h | 11 +++
> > >  tools/libxl/libxl_create.c  | 21 -
> > >  tools/libxl/libxl_types.idl |  6 ++
> > >  tools/libxl/libxl_x86.c | 33 -
> > >  tools/libxl/xl_cmdimpl.c|  7 +++
> > >  6 files changed, 111 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> > > index 8899f75..46d4529 100644
> > > --- a/docs/man/xl.cfg.pod.5
> > > +++ b/docs/man/xl.cfg.pod.5
> > > @@ -1762,6 +1762,45 @@ See F
> > > for
> > > more information.
> > >  
> > >  =back
> > >  
> > > +=head3 HVM without a device model options
> > > +
> > > +This options can be used to change the set of emulated devices
> > > provided
> > 
> > "These..."
> > 
> > > +to guests without a device model. Note that Xen might not support
> > > all
> > > +possible combinations. By default HVM guests without a device model
> > > +don't have any of them enabled.
> > 
> > ... and for those with a device model? The title and text suggest these
> > options are invalid/ignored in that case, but the code does actually
> > honour
> > what the user specified in this case.
> 
> Right, I've clarified this by adding the following paragraph:
> 
> "It is important to notice that these options (except the hpet one) are
> not available to HVM guests with a device model, and trying to set them
> will cause xl to exit with an error."
> 
> I've also fixed up the code in libxl__domain_build_info_setdefault to
> actually error out if a HVM guest with device model tries to set any of
> them.
> 
> > > diff --git a/tools/libxl/libxl_types.idl
> > > b/tools/libxl/libxl_types.idl
> > > index 92c95e5..8a21cda 100644
> > > --- a/tools/libxl/libxl_types.idl
> > > +++ b/tools/libxl/libxl_types.idl
> > > @@ -519,6 +519,12 @@ libxl_domain_build_info =
> > > Struct("domain_build_info",[
> > > ("serial_list",  libxl_st
> > > ring_list),
> > > ("rdm", libxl_rdm_reserve),
> > > ("rdm_mem_boundary_memkb",
> > > MemKB),
> > > +   ("lapic",libxl_de
> > > fbool),
> > > +   ("ioapic",   libxl_de
> > > fbool),
> > > +   ("rtc",  libxl_de
> > > fbool),
> > > +   ("power_management",
> > > libxl_defbool),
> > > +   ("pic",  libxl_de

Re: [Xen-devel] [RFC V2] xen: interface: introduce pvclk interface

2016-01-21 Thread Jan Beulich
>>> On 21.01.16 at 09:59,  wrote:
> uart2 needs clock IMX7D_UART2_ROOT_CLK from the ccm.
> passthrough uart2, hypervisor handles the reg and interrupts, that is 
> because
> hypervisor handles the memory map and the interrupt controller(GIC). But 
> here
> CCM is not handled by hypervisor, it is handled by Dom0.

This, I take it, describes the current state, which doesn't make clear
whether this is intentionally that way (and can't be changed), or
just happens to be that way because so far it didn't matter.

> For ARMV8 server products, I am not sure whether hypercall is better; but to 
> my case, it's not feasible to use hypercall.

Because of ...?

> Dom0 handles all the clocks, DomU just send request to Dom0 and ask Dom0 to
> enable/disable/set rate for a clock for the device. So I think it's okay
> for multipile parties, the clk subsystem in Dom0 can handle mutiple requests 
> even if the clock is shared.

I.e. if one party sets one rate, and later another party sets
a different rate, things are going to work fine? If so, why would
the different parties need control over the rate in the first place?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu

2016-01-21 Thread Jan Beulich
>>> On 20.01.16 at 18:17,  wrote:

> 
> On 01/21/2016 01:07 AM, Jan Beulich wrote:
> On 20.01.16 at 16:29,  wrote:
>>> On 01/20/2016 07:20 PM, Jan Beulich wrote:
 To answer this I need to have my understanding of the partitioning
 being done by firmware confirmed: If that's the case, then "normal"
 means the part that doesn't get exposed as a block device (SSD).
 In any event there's no correlation to guest exposure here.
>>>
>>> Firmware does not manage NVDIMM. All the operations of nvdimm are handled
>>> by OS.
>>>
>>> Actually, there are lots of things we should take into account if we move
>>> the NVDIMM management to hypervisor:
>>> a) ACPI NFIT interpretation
>>>  A new ACPI table introduced in ACPI 6.0 is named NFIT which exports the
>>>  base information of NVDIMM devices which includes PMEM info, PBLK
>>>  info, nvdimm device interleave, vendor info, etc. Let me explain it one
>>>  by one.
>>>
>>>  PMEM and PBLK are two modes to access NVDIMM devices:
>>>  1) PMEM can be treated as NV-RAM which is directly mapped to CPU's 
>>> address
>>> space so that CPU can r/w it directly.
>>>  2) as NVDIMM has huge capability and CPU's address space is limited, 
>>> NVDIMM
>>> only offers two windows which are mapped to CPU's address space, 
>>> the data
>>> window and access window, so that CPU can use these two windows to 
>>> access
>>> the whole NVDIMM device.
>>
>> You fail to mention PBLK. The question above really was about what
> 
> The 2) is PBLK.
> 
>> entity controls which of the two modes get used (and perhaps for
>> which parts of the overall NVDIMM).
> 
> So i think the "normal" you mentioned is about PMEM. :)

Yes. But then - other than you said above - it still looks to me as
if the split between PMEM and PBLK is arranged for by firmware?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling

2016-01-21 Thread Wu, Feng

> > +spin_lock_irqsave(_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
> flags);
> > +
> > +/*
> > + * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU
> was
> > + * removed from the blocking list while we are acquiring the lock.
> > + */
> > +if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
> 
> With you wanting to deal with changes behind your back here,
> isn't there come kind of barrier needed between reading and using
> pi_block_cpu, such that the compiler won't convert the local
> variable accesses into multiple reads of
> v->arch.hvm_vmx.pi_block_cpu (which iiuc it is allowed to do)?
> 

Thinking about this more. Seems we call spin_lock_irqsave() before
using  ' v->arch.hvm_vmx.pi_block_cpu ', can spin_lock_irqsave()
used as the serializing instruction here? IIRIC, spin lock operations
can be used for this purpose, right?

Thanks,
Feng

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu

2016-01-21 Thread Andrew Cooper
On 21/01/16 09:10, Xiao Guangrong wrote:
>
>
> On 01/21/2016 04:53 PM, Jan Beulich wrote:
> On 21.01.16 at 09:25,  wrote:
>>> On 01/21/2016 04:18 PM, Jan Beulich wrote:
 Yes. But then - other than you said above - it still looks to me as
 if the split between PMEM and PBLK is arranged for by firmware?
>>>
>>> Yes. But OS/Hypervisor is not excepted to dynamically change its
>>> configure
>>> (re-split),
>>> i,e, for PoV of OS/Hypervisor, it is static.
>>
>> Exactly, that has been my understanding. And hence the PMEM part
>> could be under the hypervisor's control, while the PBLK part could be
>> Dom0's responsibility.
>>
>
> I am not sure if i have understood your point. What your suggestion is
> that
> leave PMEM for hypervisor and all other parts (PBLK and _DSM handling) to
> Dom0? If yes, we should:
> a) handle hotplug in hypervisor (new PMEM add/remove) that causes
> hyperivsor
>interpret ACPI SSDT/DSDT.
> b) some _DSMs control PMEM so you should filter out these kind of
> _DSMs and
>handle them in hypervisor.
> c) hypervisor should mange PMEM resource pool and partition it to
> multiple
>VMs.

It is not possible for Xen to handle ACPI such as this.

There can only be one OSPM on a system, and 9/10ths of the functionality
needing it already lives in Dom0.

The only rational course of action is for Xen to treat both PBLK and
PMEM as "devices" and leave them in Dom0's hands.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 5/5] libxl: add options to enable/disable emulated devices

2016-01-21 Thread Roger Pau Monné
El 21/01/16 a les 10.39, Ian Campbell ha escrit:
> On Wed, 2016-01-20 at 19:33 +0100, Roger Pau Monné wrote:
>> El 20/01/16 a les 14.01, Ian Campbell ha escrit:
>>> On Wed, 2016-01-20 at 12:57 +0100, Roger Pau Monne wrote:
 Allow enabling or disabling emulated devices from the libxl domain
 configuration file. For HVM guests with a device model all the
 emulated
 devices are enabled. For HVM guests without a device model no devices
 are
 enabled by default, although they can be enabled using the options
 provided.
 The arbiter of whether a combination is posible or not is always Xen,
 libxl
 doesn't do any kind of check.

 This set of options is also propagated inside of the libxl migration
 record
 as part of the contents of the libxl_domain_build_info struct.
>>>
>>> ... and this is the real motivation for this change, not actually
>>> allowing
>>> users to control all this AIUI.
>>>
>>> Did you check that the fields updated using libxl_defbool_setdefault
>>> are
>>> actually updated in the JSON copy and therefore propagated to the other
>>> side of a migration as specific values and not as "pick a default"? I
>>> think
>>> we don't want these changing on migration. I think/hope all this was
>>> automatically handled by the work Wei did in the last release cycle.
>>
>> No, values populated by the {build/create}_info_setdefault functions are
>> not propagated, OTOH values manually set by the user in the config file
>> are indeed propagated. Do we have any guarantee that _setdefault is
>> always going to behave in the same way?
> 
> No, part of the purpose of defbool and the other "do the default" values is
> that we can evolve things over time.
>
>> If we don't have that guarantee I think this is already a bug, and we
>> should call _setdefault before sending the domain info to the other end.
>> In fact I have a patch that does exactly that, but I'm unsure if it's
>> needed because I don't know the policy regarding default values in libxl.
> 
> Wei, isn't this (turning the defaults into concrete values) supposed to be
> taken care of by the JSON mangling which you added?

Heh, I think you mean the JSON mangling added by Wei. In order to 
propagate the values filled by default in libxl_domain_config I had to 
add the following patch, which basically calls the _setdefault 
functions before converting the domain_config into JSON. I'm planning 
to make it part of this series in the next iteration:

---
commit b1b2cea4b61ce9bd05797d3dc5ff0f5fffccfd05
Author: Roger Pau Monne 
Date:   Wed Jan 20 19:06:50 2016 +0100

libxl: introduce libxl_domain_info_setdefault to the public API

The newly introduced function populates the libxl_domain_config with their
default values, just like it's done during domain creation.

This is needed so the libxl_domain_config sent to the restore side during
migration is accurate, since default values might change between libxl
versions.

Signed-off-by: Roger Pau Monné 
---
---
NB: I'm unsure whether this is a bug or not. IMHO I think it is, because
there's no guarantee that the default values will stay the same between
libxl versions, so a domain created with an old libxl version might see
differences when migrated to a newer libxl version.

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 157f07c..70bb6e1 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -886,6 +886,15 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, 
libxl_mac *src);
  */
 #define LIBXL_HAVE_VGA_INTERFACE_TYPE_UNKNOWN 1
 
+/*
+ * LIBXL_HAVE_DOMAIN_INFO_SETDEFAULT
+ *
+ * In the case that LIBXL_HAVE_DOMAIN_INFO_SETDEFAULT is set libxl
+ * provides the libxl_domain_info_setdefault function that can be used
+ * to set the libxl_domain_config fields to their default values.
+ */
+#define LIBXL_HAVE_DOMAIN_INFO_SETDEFAULT 1
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);
@@ -1202,6 +1211,9 @@ int libxl_domain_soft_reset(libxl_ctx *ctx,
 void libxl_domain_config_init(libxl_domain_config *d_config);
 void libxl_domain_config_dispose(libxl_domain_config *d_config);
 
+/* Fill the libxl_domain_config struct with their default values. */
+int libxl_domain_info_setdefault(libxl_ctx *ctx, libxl_domain_config 
*d_config);
+
 /*
  * Retrieve domain configuration and filled it in d_config. The
  * returned configuration can be used to rebuild a domain. It only
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index c7700a7..c988c2e 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -886,17 +886,10 @@ static void initiate_domain_create(libxl__egc *egc,
 goto error_out;
 }
 
-ret = libxl__domain_create_info_setdefault(gc, _config->c_info);
-if (ret) {
-LOG(ERROR, 

Re: [Xen-devel] [RFC V2] xen: interface: introduce pvclk interface

2016-01-21 Thread Ian Campbell
On Thu, 2016-01-21 at 16:59 +0800, Peng Fan wrote:
> 
> To platform device of ARM, hypervisor is responsible for the mapping
> between machine address and guest physical address, also responsible
> for the irq mapping.
> 
> But to embedded ARM SoC, the hardware clk IP is handled in Dom0.

Arguably Xen ought to be the one to do this, but we have determined
(rightly, I think) that doing so for the entirely clk tree of an SoC would
involve importing an unmanageable amount of code into Xen[0].

In the meantime we defer this to Dom0.

I wonder though if it would be possible to manage the clocks for a
passthrough device from the toolstack, i.e. is there a sysfs node where one
can say "keep the clock for this device enabled (at xMhz) even though you
think the device is unused"?

If so (or if it can be easily added) then the toolstack would just need to
manage that value as part of the passthrough process rather than having the
frontend do it via a PV protocol.

Ian.

[0] I'd like at some point for Xen to gain sufficient knowledge of clock IP
to minimally control things like the CPU and DRAM clocks etc, but that
needn't imply full clock tree support and would hopefully be a manageable
amount of code, which would (hopefully) mostly be about trapping and
emulating writes to one or two clock controllers per platform and
arbitrating a small set of bits (while allowing dom0 to control the
others). This is at least a medium if not long term idea though, and for
all I know it might turn out to be unworkable in practice.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread Ian Campbell
On Thu, 2016-01-21 at 10:25 +, Wei Liu wrote:
> On Thu, Jan 21, 2016 at 10:12:27AM +, Ian Campbell wrote:
> [...]
> > > I've asked the reporter to send logs for the 4.4 case to xen-devel.
> > 
> > User confirmed[0] that 4.4 is actually OK.
> > 
> > Did someone request stable backports yet, or shall I do so?
> > 
> 
> I vaguely remember we requested backport for relevant patches long time
> ago, but I admit I have lost track. So it wouldn't hurt if you do it
> again.

So I think we'd be looking for:

32a8440 xen-netfront: respect user provided max_queues
4c82ac3 xen-netback: respect user provided max_queues
ca88ea1 xen-netfront: update num_queues to real created

which certainly resolves things such that the workarounds work, and I think
will also fix the default case such that it works with up to 32 vcpus
(although it will consume all the grants and only get 31/32 queues).

Does that sound correct?

As Annie said, we may still want to consider what a sensible default max
queues would be.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu

2016-01-21 Thread Xiao Guangrong



On 01/21/2016 04:53 PM, Jan Beulich wrote:

On 21.01.16 at 09:25,  wrote:

On 01/21/2016 04:18 PM, Jan Beulich wrote:

Yes. But then - other than you said above - it still looks to me as
if the split between PMEM and PBLK is arranged for by firmware?


Yes. But OS/Hypervisor is not excepted to dynamically change its configure
(re-split),
i,e, for PoV of OS/Hypervisor, it is static.


Exactly, that has been my understanding. And hence the PMEM part
could be under the hypervisor's control, while the PBLK part could be
Dom0's responsibility.



I am not sure if i have understood your point. What your suggestion is that
leave PMEM for hypervisor and all other parts (PBLK and _DSM handling) to
Dom0? If yes, we should:
a) handle hotplug in hypervisor (new PMEM add/remove) that causes hyperivsor
   interpret ACPI SSDT/DSDT.
b) some _DSMs control PMEM so you should filter out these kind of _DSMs and
   handle them in hypervisor.
c) hypervisor should mange PMEM resource pool and partition it to multiple
   VMs.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH LIBVIRT v2] libxl: Support cmdline= in xl config files

2016-01-21 Thread Ian Campbell
On Wed, 2016-01-20 at 11:16 -0700, Jim Fehlig wrote:
> Ian Campbell wrote:
> > ... and consolidate the cmdline/extra/root parsing to facilitate doing
> > so.
> > 
> > The logic is the same as xl's parse_cmdline from the current xen.git
> > master
> > branch (e6f0e099d2c17de47fd86e817b1998db903cab61).
> > 
> > In order to introduce a use of VIR_WARN for logging I had to add
> > virerror.h and VIR_LOG_INIT.
> > 
> > Signed-off-by: Ian Campbell 
> > ---
> > NB: I was unsure if I was supposed to change VIR_FROM_NONE into
> > VIR_FROM_XEN, so I didn't (but will on advice).
> 
> It seems some of the VIR_FROM_ needs cleanup throughout the various 
> Xen-related
> files. We already have VIR_FROM_SEXPR and VIR_FROM_XENXM.
> src/xenconfig/xen_sxpr.c should use the former, src/xenconfig/xen_xm.c the
> latter. And given the pattern, I think we should introduce VIR_FROM_XENXL to
> cover xl.cfg related code. I can take care of this.

I've acked you patches about this.

> > +static int xenParseCmdline(virConfPtr conf, char **r_cmdline)
> > +{
> > +char *cmdline;
> 
> One too many initializers removed :-).
> 
> > +const char *root, *extra, *buf;
> > +
> > +if (xenConfigGetString(conf, "cmdline", , NULL) < 0)
> > +return -1;
> > +
> > +if (xenConfigGetString(conf, "root", , NULL) < 0)
> > +return -1;
> > +
> > +if (xenConfigGetString(conf, "extra", , NULL) < 0)
> > +return -1;
> > +
> > +if (buf) {
> > +if (VIR_STRDUP(cmdline, buf) < 0)
> > +return -1;
> > +if (root || extra)
> > +VIR_WARN("ignoring root= and extra= in favour of
> > cmdline=");
> > +} else {
> > +if (root && extra) {
> > +if (virAsprintf(, "root=%s %s", root, extra) < 0)
> > +return -1;
> > +} else if (root) {
> > +if (virAsprintf(, "root=%s", root) < 0)
> > +return -1;
> > +} else if (extra) {
> > +if (VIR_STRDUP(cmdline, extra) < 0)
> > +return -1;
> > +}
> > +}
> > +
> > +*r_cmdline = cmdline;
> 
> If none of cmdline, extra, or root are set in the config, def->os.cmdline gets
> set to garbage. xlconfigtest explodes when running 'make check'.

I even thought about this quite carefully but missed this case :-/

Would you prefer and = NULL on the decl or an else cmdline = NULL at the
end of that chain?

> Sorry for not mentioning it in my previous review, but we should add a
> test for cmdline, root, and extra.

Ack, will do so for v3.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] VPMU backports for 4.6

2016-01-21 Thread Ian Campbell
On Thu, 2016-01-21 at 00:35 -0700, Jan Beulich wrote:
> > > > On 20.01.16 at 18:36,  wrote:
> > (As a side --- XSA-163 says that VPMU is "unsupported security-wise".
> > Do 
> > we make any distinction between a feature being generally or 
> > security-wise unsupported?)
> 
> Not sure; considering stable tree maintenance one might imply
> general support to be a superset of security support. But I can
> easily see other views on that being equally possible/legitimate.

I think I would take the view that if it is Unsupported then it is
Unsupported until we explicitly say otherwise.

That doesn't mean we can't backport fixes for issues which people
(presumably those who are working on taking the feature from Unsupported to
Supported in unstable) identify and would like fixed, assuming they don't
impact other Supported features.

I think it is OK for folks working on a currently unsupported feature in
unstable to want to fix known issues even in stable releases, particularly
(although not exclusively) those which would block people from doing
further testing.

Fixing those issues expands the amount of feedback which can be gathered
from users who might be willing to test the feature, but not use a
development version, which helps the developers find the next bug after the
one which was fixed, which will help to move that feature forwards in the
development branch too.

It's also possible that a feature might improve sufficiently that we would
consider it supported from a particular point release. We've done so n the
past, I think, although not always successfully, I'm thinking xsave in and
around the 4.0.x releases.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu

2016-01-21 Thread Jan Beulich
>>> On 21.01.16 at 10:10,  wrote:
> On 01/21/2016 04:53 PM, Jan Beulich wrote:
> On 21.01.16 at 09:25,  wrote:
>>> On 01/21/2016 04:18 PM, Jan Beulich wrote:
 Yes. But then - other than you said above - it still looks to me as
 if the split between PMEM and PBLK is arranged for by firmware?
>>>
>>> Yes. But OS/Hypervisor is not excepted to dynamically change its configure
>>> (re-split),
>>> i,e, for PoV of OS/Hypervisor, it is static.
>>
>> Exactly, that has been my understanding. And hence the PMEM part
>> could be under the hypervisor's control, while the PBLK part could be
>> Dom0's responsibility.
>>
> 
> I am not sure if i have understood your point. What your suggestion is that
> leave PMEM for hypervisor and all other parts (PBLK and _DSM handling) to
> Dom0? If yes, we should:
> a) handle hotplug in hypervisor (new PMEM add/remove) that causes hyperivsor
> interpret ACPI SSDT/DSDT.

Why would this be different from ordinary memory hotplug, where
Dom0 deals with the ACPI CA interaction, notifying Xen about the
added memory?

> b) some _DSMs control PMEM so you should filter out these kind of _DSMs and
> handle them in hypervisor.

Not if (see above) following the model we currently have in place.

> c) hypervisor should mange PMEM resource pool and partition it to multiple
> VMs.

Yes.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread Wei Liu
On Thu, Jan 21, 2016 at 10:12:27AM +, Ian Campbell wrote:
[...]
> > I've asked the reporter to send logs for the 4.4 case to xen-devel.
> 
> User confirmed[0] that 4.4 is actually OK.
> 
> Did someone request stable backports yet, or shall I do so?
> 

I vaguely remember we requested backport for relevant patches long time
ago, but I admit I have lost track. So it wouldn't hurt if you do it
again.

Wei.

> Ian.
> 
> [0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00110.html

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu

2016-01-21 Thread Xiao Guangrong



On 01/21/2016 04:18 PM, Jan Beulich wrote:

On 20.01.16 at 18:17,  wrote:




On 01/21/2016 01:07 AM, Jan Beulich wrote:

On 20.01.16 at 16:29,  wrote:

On 01/20/2016 07:20 PM, Jan Beulich wrote:

To answer this I need to have my understanding of the partitioning
being done by firmware confirmed: If that's the case, then "normal"
means the part that doesn't get exposed as a block device (SSD).
In any event there's no correlation to guest exposure here.


Firmware does not manage NVDIMM. All the operations of nvdimm are handled
by OS.

Actually, there are lots of things we should take into account if we move
the NVDIMM management to hypervisor:
a) ACPI NFIT interpretation
  A new ACPI table introduced in ACPI 6.0 is named NFIT which exports the
  base information of NVDIMM devices which includes PMEM info, PBLK
  info, nvdimm device interleave, vendor info, etc. Let me explain it one
  by one.

  PMEM and PBLK are two modes to access NVDIMM devices:
  1) PMEM can be treated as NV-RAM which is directly mapped to CPU's address
 space so that CPU can r/w it directly.
  2) as NVDIMM has huge capability and CPU's address space is limited, 
NVDIMM
 only offers two windows which are mapped to CPU's address space, the 
data
 window and access window, so that CPU can use these two windows to 
access
 the whole NVDIMM device.


You fail to mention PBLK. The question above really was about what


The 2) is PBLK.


entity controls which of the two modes get used (and perhaps for
which parts of the overall NVDIMM).


So i think the "normal" you mentioned is about PMEM. :)


Yes. But then - other than you said above - it still looks to me as
if the split between PMEM and PBLK is arranged for by firmware?


Yes. But OS/Hypervisor is not excepted to dynamically change its configure 
(re-split),
i,e, for PoV of OS/Hypervisor, it is static.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling

2016-01-21 Thread Jan Beulich
>>> On 21.01.16 at 10:05,  wrote:
>> > +spin_lock_irqsave(_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
>> flags);
>> > +
>> > +/*
>> > + * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU
>> was
>> > + * removed from the blocking list while we are acquiring the lock.
>> > + */
>> > +if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
>> 
>> With you wanting to deal with changes behind your back here,
>> isn't there come kind of barrier needed between reading and using
>> pi_block_cpu, such that the compiler won't convert the local
>> variable accesses into multiple reads of
>> v->arch.hvm_vmx.pi_block_cpu (which iiuc it is allowed to do)?
> 
> Thinking about this more. Seems we call spin_lock_irqsave() before
> using  ' v->arch.hvm_vmx.pi_block_cpu ', can spin_lock_irqsave()
> used as the serializing instruction here? IIRIC, spin lock operations
> can be used for this purpose, right?

The way I placed the comment may have been misleading, and I'm
sorry for that. The comment really is about the use of the local
variable in acquiring the lock (I may have got misguided by local
variable and field name being the same into thinking this latter
access also checked the local variable).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu

2016-01-21 Thread Jan Beulich
>>> On 21.01.16 at 10:29,  wrote:
> On 21/01/16 09:10, Xiao Guangrong wrote:
>> I am not sure if i have understood your point. What your suggestion is
>> that
>> leave PMEM for hypervisor and all other parts (PBLK and _DSM handling) to
>> Dom0? If yes, we should:
>> a) handle hotplug in hypervisor (new PMEM add/remove) that causes
>> hyperivsor
>>interpret ACPI SSDT/DSDT.
>> b) some _DSMs control PMEM so you should filter out these kind of
>> _DSMs and
>>handle them in hypervisor.
>> c) hypervisor should mange PMEM resource pool and partition it to
>> multiple
>>VMs.
> 
> It is not possible for Xen to handle ACPI such as this.
> 
> There can only be one OSPM on a system, and 9/10ths of the functionality
> needing it already lives in Dom0.
> 
> The only rational course of action is for Xen to treat both PBLK and
> PMEM as "devices" and leave them in Dom0's hands.

See my other reply: Why would this be different from "ordinary"
memory hotplug?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu

2016-01-21 Thread Jan Beulich
>>> On 21.01.16 at 09:25,  wrote:
> On 01/21/2016 04:18 PM, Jan Beulich wrote:
>> Yes. But then - other than you said above - it still looks to me as
>> if the split between PMEM and PBLK is arranged for by firmware?
> 
> Yes. But OS/Hypervisor is not excepted to dynamically change its configure 
> (re-split),
> i,e, for PoV of OS/Hypervisor, it is static.

Exactly, that has been my understanding. And hence the PMEM part
could be under the hypervisor's control, while the PBLK part could be
Dom0's responsibility.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread Ian Campbell
On Wed, 2016-01-20 at 15:16 +, Ian Campbell wrote:
> On Wed, 2016-01-20 at 10:10 -0500, Boris Ostrovsky wrote:
> > On 01/20/2016 10:02 AM, David Vrabel wrote:
> > > On 20/01/16 14:52, Ian Campbell wrote:
> > > > On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:
> > > > > On 01/20/2016 07:23 AM, Ian Campbell wrote:
> > > > > > There have been a few reports recently[0] which relate to a
> > > > > > failure of
> > > > > > netfront to allocate sufficient grant refs for all the queues:
> > > > > > 
> > > > > > [0.533589] xen_netfront: can't alloc rx grant refs
> > > > > > [0.533612] net eth0: only created 31 queues
> > > > > > 
> > > > > > Which can be worked around by increasing the number of grants
> > > > > > on
> > > > > > the
> > > > > > hypervisor command line or by limiting the number of queues
> > > > > > permitted
> > > > > > by
> > > > > > either back or front using a module param (which was broken but
> > > > > > is now
> > > > > > fixed on both sides, but I'm not sure it has been backported
> > > > > > everywhere
> > > > > > such that it is a reliable thing to always tell users as a
> > > > > > workaround).
> > > > > > 
> > > > > > Is there any plan to do anything about the default/out of the
> > > > > > box
> > > > > > experience? Either limiting the number of queues or making both
> > > > > > ends
> > > > > > cope
> > > > > > more gracefully with failure to create some queues (or both)
> > > > > > might be
> > > > > > sufficient?
> > > > > > 
> > > > > > I think the crash after the above in the first link at [0] is
> > > > > > fixed? I
> > > > > > think that was the purpose of ca88ea1247df "xen-netfront:
> > > > > > update
> > > > > > num_queues
> > > > > > to real created" which was in 4.3.
> > > > > I think ca88ea1247df is the solution --- it will limit the number
> > > > > of
> > > > > queues.
> > > > That's in 4.4, which the first link at [0] claimed to have tested.
> > > > I
> > > > can
> > > > see this fixing the crash, but does it really fix the "actually
> > > > works
> > > > with
> > > > less queues than it tried to get" issue?
> > 
> > That's what I thought it does too. I didn't notice that 4.4 was tested 
> > as well, so maybe not.
> 
> I've asked the reporter to send logs for the 4.4 case to xen-devel.

User confirmed[0] that 4.4 is actually OK.

Did someone request stable backports yet, or shall I do so?

Ian.

[0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00110.html

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] libxl: dispose libxl_dominfo after libxl_domain_info()

2016-01-21 Thread Joao Martins
As suggested in a previous thread [0] this patch adds some missing calls
to libxl_dominfo_{init,dispose} when doing some of the libxl_domain_info
operations which would otherwise lead to memory leaks.

[0]
https://www.redhat.com/archives/libvir-list/2015-September/msg00519.html

Signed-off-by: Joao Martins 
---
Changes since v1:
 - Add missing libxl_dominfo_init() on MemoryStats and GetInfo()
---
 src/libxl/libxl_driver.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index 73ed448..560f2a6 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -358,6 +358,8 @@ libxlReconnectDomain(virDomainObjPtr vm,
 
 virObjectLock(vm);
 
+libxl_dominfo_init(_info);
+
 /* Does domain still exist? */
 rc = libxl_domain_info(cfg->ctx, _info, vm->def->id);
 if (rc == ERROR_INVAL) {
@@ -389,11 +391,13 @@ libxlReconnectDomain(virDomainObjPtr vm,
 /* Enable domain death events */
 libxl_evenable_domain_death(cfg->ctx, vm->def->id, 0, >deathW);
 
+libxl_dominfo_dispose(_info);
 virObjectUnlock(vm);
 virObjectUnref(cfg);
 return 0;
 
  out:
+libxl_dominfo_dispose(_info);
 libxlDomainCleanup(driver, vm);
 if (!vm->persistent)
 virDomainObjListRemoveLocked(driver->domains, vm);
@@ -1589,6 +1593,8 @@ libxlDomainGetInfo(virDomainPtr dom, virDomainInfoPtr 
info)
 info->memory = vm->def->mem.cur_balloon;
 info->maxMem = virDomainDefGetMemoryActual(vm->def);
 } else {
+libxl_dominfo_init(_info);
+
 if (libxl_domain_info(cfg->ctx, _info, vm->def->id) != 0) {
 virReportError(VIR_ERR_INTERNAL_ERROR,
_("libxl_domain_info failed for domain '%d'"),
@@ -1598,6 +1604,8 @@ libxlDomainGetInfo(virDomainPtr dom, virDomainInfoPtr 
info)
 info->cpuTime = d_info.cpu_time;
 info->memory = d_info.current_memkb;
 info->maxMem = d_info.max_memkb;
+
+libxl_dominfo_dispose(_info);
 }
 
 info->state = virDomainObjGetState(vm, NULL);
@@ -4791,6 +4799,7 @@ libxlDomainMemoryStats(virDomainPtr dom,
 
 virCheckFlags(0, -1);
 
+libxl_dominfo_init(_info);
 cfg = libxlDriverConfigGet(driver);
 
 if (!(vm = libxlDomObjFromDomain(dom)))
@@ -4822,13 +4831,12 @@ libxlDomainMemoryStats(virDomainPtr dom,
 
 ret = i;
 
-libxl_dominfo_dispose(_info);
-
  endjob:
 if (!libxlDomainObjEndJob(driver, vm))
 vm = NULL;
 
  cleanup:
+libxl_dominfo_dispose(_info);
 if (vm)
 virObjectUnlock(vm);
 virObjectUnref(cfg);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] libxl: dispose libxl_dominfo after libxl_domain_info()

2016-01-21 Thread Joao Martins


On 01/21/2016 01:41 AM, Jim Fehlig wrote:
> Joao Martins wrote:
>> As suggested in a previous thread [0] this patch adds some missing calls
>> to libxl_dominfo_dispose when doing some of the libxl_domain_info
>> operations which would otherwise lead to memory leaks.
>>
>> [0]
>> https://www.redhat.com/archives/libvir-list/2015-September/msg00519.html
>>
>> Signed-off-by: Joao Martins 
>> ---
>>  src/libxl/libxl_driver.c | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
>> index 73ed448..a449730 100644
>> --- a/src/libxl/libxl_driver.c
>> +++ b/src/libxl/libxl_driver.c
>> @@ -358,6 +358,8 @@ libxlReconnectDomain(virDomainObjPtr vm,
>>  
>>  virObjectLock(vm);
>>  
>> +libxl_dominfo_init(_info);
>> +
>>  /* Does domain still exist? */
>>  rc = libxl_domain_info(cfg->ctx, _info, vm->def->id);
>>  if (rc == ERROR_INVAL) {
>> @@ -389,11 +391,13 @@ libxlReconnectDomain(virDomainObjPtr vm,
>>  /* Enable domain death events */
>>  libxl_evenable_domain_death(cfg->ctx, vm->def->id, 0, >deathW);
>>  
>> +libxl_dominfo_dispose(_info);
>>  virObjectUnlock(vm);
>>  virObjectUnref(cfg);
>>  return 0;
>>  
>>   out:
>> +libxl_dominfo_dispose(_info);
>>  libxlDomainCleanup(driver, vm);
>>  if (!vm->persistent)
>>  virDomainObjListRemoveLocked(driver->domains, vm);
>> @@ -1598,6 +1602,8 @@ libxlDomainGetInfo(virDomainPtr dom, virDomainInfoPtr 
>> info)
>>  info->cpuTime = d_info.cpu_time;
>>  info->memory = d_info.current_memkb;
>>  info->maxMem = d_info.max_memkb;
>> +
>> +libxl_dominfo_dispose(_info);
> 
> There should be a corresponding libxl_dominfo_init(). While looking at this, I
> audited all uses of libxl_domain_info() and found another missing _init(). 
> With
> the below diff squashed in, I think this code is finally correct :-). Do you 
> agree?
> 
Ah, missed the dominfo_init() calls. Yeah, I definitively agree. I just sent it
over v2.

Joao

> Regards,
> Jim
> 
> diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
> index 21c7610..4a9134e 100644
> --- a/src/libxl/libxl_driver.c
> +++ b/src/libxl/libxl_driver.c
> @@ -1593,6 +1593,8 @@ libxlDomainGetInfo(virDomainPtr dom, virDomainInfoPtr 
> info)
>  info->memory = vm->def->mem.cur_balloon;
>  info->maxMem = virDomainDefGetMemoryActual(vm->def);
>  } else {
> +libxl_dominfo_init(_info);
> +
>  if (libxl_domain_info(cfg->ctx, _info, vm->def->id) != 0) {
>  virReportError(VIR_ERR_INTERNAL_ERROR,
> _("libxl_domain_info failed for domain '%d'"),
> @@ -4797,6 +4799,7 @@ libxlDomainMemoryStats(virDomainPtr dom,
> 
>  virCheckFlags(0, -1);
> 
> +libxl_dominfo_init(_info);
>  cfg = libxlDriverConfigGet(driver);
> 
>  if (!(vm = libxlDomObjFromDomain(dom)))
> @@ -4828,13 +4831,12 @@ libxlDomainMemoryStats(virDomainPtr dom,
> 
>  ret = i;
> 
> -libxl_dominfo_dispose(_info);
> -
>   endjob:
>  if (!libxlDomainObjEndJob(driver, vm))
>  vm = NULL;
> 
>   cleanup:
> +libxl_dominfo_dispose(_info);
>  if (vm)
>  virObjectUnlock(vm);
>  virObjectUnref(cfg);
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] cleancache: constify cleancache_ops structure

2016-01-21 Thread David Vrabel
On 23/12/15 21:06, Julia Lawall wrote:
> The cleancache_ops structure is never modified, so declare it as const.
> 
> This also removes the __read_mostly declaration on the cleancache_ops
> variable declaration, since it seems redundant with const.
> 
> Done with the help of Coccinelle.
> 
> Signed-off-by: Julia Lawall 
> 
> ---
> 
> Not sure that the __read_mostly change is correct.  Does it apply to the
> variable, or to what the variable points to?

The variable, so...

> --- a/mm/cleancache.c
> +++ b/mm/cleancache.c
> @@ -22,7 +22,7 @@
>   * cleancache_ops is set by cleancache_register_ops to contain the pointers
>   * to the cleancache "backend" implementation functions.
>   */
> -static struct cleancache_ops *cleancache_ops __read_mostly;
> +static const struct cleancache_ops *cleancache_ops;

...you want to retain the __read_mostly here.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [xen-unstable bisection] complete test-amd64-i386-xl-qemut-win7-amd64

2016-01-21 Thread osstest service owner
branch xen-unstable
xenbranch xen-unstable
job test-amd64-i386-xl-qemut-win7-amd64
testid windows-install

Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  xen git://xenbits.xen.org/xen.git
  Bug introduced:  d23da94b123a0d9326408c376e5735697bd2d96a
  Bug not present: 83ae0bb2260c71e8dcec538dd477f7253aaef327
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/78681/


  commit d23da94b123a0d9326408c376e5735697bd2d96a
  Author: Roger Pau Monné 
  Date:   Thu Jan 14 10:43:36 2016 +0100

  Revert "libxc: create an initial FPU state for HVM guests"

  This reverts commit d64dbbcc7c9934a46126c59d78536235908377ad:

  Xen always set the FPU as initialized when loading a HVM context, so libxc
  has to provide a valid FPU context when setting the CPU registers.

  This was a stop-gap measure in order to unblock OSSTest Windows 7 failures
  while a proper fix for the HVM CPU save/restore is being worked on.

  This can now be reverted because a proper fix is in place and we can 
signal
  in the save record whether the FPU is initialized or not.

  Signed-off-by: Roger Pau Monné 
  Reviewed-by: Andrew Cooper 
  Acked-by: Wei Liu 


For bisection revision-tuple graph see:
   
http://logs.test-lab.xenproject.org/osstest/results/bisect/xen-unstable/test-amd64-i386-xl-qemut-win7-amd64.windows-install.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Running cs-bisection-step 
--graph-out=/home/logs/results/bisect/xen-unstable/test-amd64-i386-xl-qemut-win7-amd64.windows-install
 --summary-out=tmp/78681.bisection-summary --basis-template=77892 
--blessings=real,real-bisect xen-unstable test-amd64-i386-xl-qemut-win7-amd64 
windows-install
Searching for failure / basis pass:
 78525 fail [host=pinot0] / 78008 ok.
Failure / basis pass flights: 78525 / 78008
(tree with no url: ovmf)
(tree with no url: seabios)
Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git
Latest 5d7b0fcc26d66db767a477574effc764022c19ac 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
569eac99e8ddccd15fe78e8a3af5622afe780e3b 
f165e581d9a6f7cf81aa7496d3eee1e31212c8ad 
0c2f1283486f953604a282188a06e2db143f475d
Basis pass 5d7b0fcc26d66db767a477574effc764022c19ac 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
569eac99e8ddccd15fe78e8a3af5622afe780e3b 
f165e581d9a6f7cf81aa7496d3eee1e31212c8ad 
20c8f1a8a5fd61cb6f0ba6f3c3b3d567b1765116
Generating revisions with ./adhoc-revtuple-generator  
git://xenbits.xen.org/linux-pvops.git#5d7b0fcc26d66db767a477574effc764022c19ac-5d7b0fcc26d66db767a477574effc764022c19ac
 
git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 
git://xenbits.xen.org/qemu-xen-traditional.git#569eac99e8ddccd15fe78e8a3af5622afe780e3b-569eac99e8ddccd15fe78e8a3af5622afe780e3b
 
git://xenbits.xen.org/qemu-xen.git#f165e581d9a6f7cf81aa7496d3eee1e31212c8ad-f165e581d9a6f7cf81aa7496d3eee1e31212c8ad
 
git://xenbits.xen.org/xen.git#20c8f1a8a5fd61cb6f0ba6f3c3b3d567b1765116-0c2f1283486f953604a282188a06e2db143f475d
Loaded 1002 nodes in revision graph
Searching for test results:
 77892 pass irrelevant
 78008 pass 5d7b0fcc26d66db767a477574effc764022c19ac 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
569eac99e8ddccd15fe78e8a3af5622afe780e3b 
f165e581d9a6f7cf81aa7496d3eee1e31212c8ad 
20c8f1a8a5fd61cb6f0ba6f3c3b3d567b1765116
 77945 pass 5d7b0fcc26d66db767a477574effc764022c19ac 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
569eac99e8ddccd15fe78e8a3af5622afe780e3b 
f165e581d9a6f7cf81aa7496d3eee1e31212c8ad 
20c8f1a8a5fd61cb6f0ba6f3c3b3d567b1765116
 78129 fail 5d7b0fcc26d66db767a477574effc764022c19ac 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
569eac99e8ddccd15fe78e8a3af5622afe780e3b 
f165e581d9a6f7cf81aa7496d3eee1e31212c8ad 
7167222b15dc661ff0b44cc93d558f6bb4ff6492
 78159 fail 5d7b0fcc26d66db767a477574effc764022c19ac 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
569eac99e8ddccd15fe78e8a3af5622afe780e3b 
f165e581d9a6f7cf81aa7496d3eee1e31212c8ad 
7167222b15dc661ff0b44cc93d558f6bb4ff6492
 78210 fail 5d7b0fcc26d66db767a477574effc764022c19ac 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
569eac99e8ddccd15fe78e8a3af5622afe780e3b 
f165e581d9a6f7cf81aa7496d3eee1e31212c8ad 
7a1d1becf020f1fa511692bd97d50402588ef28d
 78343 fail 5d7b0fcc26d66db767a477574effc764022c19ac 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 

Re: [Xen-devel] Error booting Xen

2016-01-21 Thread Dario Faggioli
On Wed, 2016-01-20 at 03:06 -0700, Jan Beulich wrote:
> > > > On 19.01.16 at 20:55,  wrote:
> > 
> > $ 'addr2line -e xen-syms 82d0801c1cce' returns
> > 'xen/xen/arch/x86/xstate.c:387' which again points to
> > xsave. Also, adding 'xsave=0' makes it boot just fine.
> 
> Ah, I think I see the issue: We're zeroing the entire save area in
> the fixup code, which will make XRSTORS fault unconditionally.
> Shuai, would you please look into possible ways of fixing this?
> Just setting the compressed flag may not be enough, and falling
> back to plain XRSTOR doesn't seem to be an option either - I'm in
> particular worried that the current model of zeroing everything
> is bogus with the growing number of different components which
> get loaded here.
> 
> But of course another question then is why the XRSTORS faults
> in the first place. I guess we'll need a debugging patch to dump
> the full state to understand that.
> 
If someone can produce and send such patch, I'm sure Harmandeep will be
happy to give it a try on her hardware.

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)



signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu

2016-01-21 Thread Jan Beulich
>>> On 21.01.16 at 15:01,  wrote:
> On 01/21/16 03:25, Jan Beulich wrote:
>> >>> On 21.01.16 at 10:10,  wrote:
>> > b) some _DSMs control PMEM so you should filter out these kind of _DSMs and
>> > handle them in hypervisor.
>> 
>> Not if (see above) following the model we currently have in place.
>>
> 
> You mean let dom0 linux evaluates those _DSMs and interact with
> hypervisor if necessary (e.g. XENPF_mem_hotadd for memory hotplug)?

Yes.

>> > c) hypervisor should mange PMEM resource pool and partition it to multiple
>> > VMs.
>> 
>> Yes.
>>
> 
> But I Still do not quite understand this part: why must pmem resource
> management and partition be done in hypervisor?

Because that's where memory management belongs. And PMEM,
other than PBLK, is just another form of RAM.

> I mean if we allow the following steps of operations (for example)
> (1) partition pmem in dom 0
> (2) get address and size of each partition (part_addr, part_size)
> (3) call a hypercall like nvdimm_memory_mapping(d, part_addr, part_size, 
> gpfn) to
> map a partition to the address gpfn in dom d.
> Only the last step requires hypervisor. Would anything be wrong if we
> allow above operations?

The main issue is that this would imo be a layering violation. I'm
sure it can be made work, but that doesn't mean that's the way
it ought to work.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen PV PTE ABI (or lack thereof)

2016-01-21 Thread Andrew Cooper
On 21/01/16 13:55, Jan Beulich wrote:

 I was intending to have CONFIG_PV_PTE_DEBUG as an EXPERT option,
 disabled by default even in debug builds.

 There should not be an ABI difference between release and "normal" debug
 builds.
>>> Well, I see your point, but as said above I'm not convinced
>>> disabling all that code is the right solution. In fact, what you
>>> propose is not far away from removing that code altogether.
>> The two bits are only used for specialised debugging.  They should be
>> relegated to people doing specific debugging, and not interfere with the
>> overwhelming majority of cases where Xen doesn't need to use any
>> software available PTE bits.
> Your repeated claim that _PAGE_GUEST_KERNEL is purely debugging only
> makes we wonder how you would mean to adjust adjust_guest_l1e() with
> that flag gone (most notably the last of its if()-s).

diff --git a/xen/arch/x86/mm.c
b/xen/arch/x86/mm.c 
 

index b81d1fd..46ef5ce
100644  

 

---
a/xen/arch/x86/mm.c 



+++
b/xen/arch/x86/mm.c 



@@ -1060,10 +1060,11 @@
get_page_from_l4e(  


  == (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL)
) 
\   
  

 MEM_LOG("Global bit is set to kernel page
%lx", 
\   
  


l1e_get_pfn((pl1e)));   
\   
  

-if ( !(l1e_get_flags((pl1e)) & _PAGE_USER)
)
\   
  

-l1e_add_flags((pl1e),
(_PAGE_GUEST_KERNEL|_PAGE_USER)); 
\   
  

-if ( !(l1e_get_flags((pl1e)) & _PAGE_GUEST_KERNEL)
)
\   
  

-l1e_add_flags((pl1e),
(_PAGE_GLOBAL|_PAGE_USER));   
\   
  

+if ( l1e_get_flags((pl1e)) & _PAGE_USER
)   
\   
  

+l1e_add_flags((pl1e),
_PAGE_GLOBAL);
\   
  

+   
else
\   
  

+l1e_remove_flags((pl1e),
_PAGE_GLOBAL); 
\   
  

+l1e_add_flags((pl1e),
_PAGE_USER);  
\   
  


}   
\   
  

 } while ( 0 )


_PAGE_GUEST_KERNEL isn't in the ABI, which means that the 2nd if() is
the only piece of code which validly sets it.

Read-modify-write operations already don't function correctly as
_PAGE_GUEST_KERNEL is a hidden saturating bit from the guests point of view.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen PV PTE ABI (or lack thereof)

2016-01-21 Thread David Vrabel
On 21/01/16 13:17, Andrew Cooper wrote:
> 
> As we see with the Protection Key feature, newer hardware feature start
> using bits which were previously software available, and we absolutely
> don't want to be in a position where our ABI prevents us from ever
> supporting a new feature.

I don't see a problem in not supporting new features in /PV/ guests.

I think we should document the PV ABI as-is. i.e., that these two PTE
bits are not available for PV guest use.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCHv5 1/3] rwlock: Add per-cpu reader-writer lock infrastructure

2016-01-21 Thread Ian Campbell
On Fri, 2015-12-18 at 16:08 +, Malcolm Crossley wrote:
> diff --git a/xen/include/asm-arm/percpu.h b/xen/include/asm-arm/percpu.h
> index 71e7649..c308a56 100644
> --- a/xen/include/asm-arm/percpu.h
> +++ b/xen/include/asm-arm/percpu.h
> @@ -27,6 +27,11 @@ void percpu_init_areas(void);
>  #define __get_cpu_var(var) \
>  (*RELOC_HIDE(_cpu__##var, READ_SYSREG(TPIDR_EL2)))
>  
> +#define per_cpu_ptr(var, cpu)  \
> +(*RELOC_HIDE(, __per_cpu_offset[cpu]))
> +#define __get_cpu_ptr(var) \
> +(*RELOC_HIDE(, READ_SYSREG(TPIDR_EL2)))
> +
>  #define DECLARE_PER_CPU(type, name) extern __typeof__(type)
> per_cpu__##name
>  

For ARM: Acked-by :Ian Campbell 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Error booting Xen

2016-01-21 Thread Harmandeep Kaur
On Thu, Jan 21, 2016 at 8:44 PM, Dario Faggioli
 wrote:
> On Wed, 2016-01-20 at 03:06 -0700, Jan Beulich wrote:
>> > > > On 19.01.16 at 20:55,  wrote:
>> >
>> > $ 'addr2line -e xen-syms 82d0801c1cce' returns
>> > 'xen/xen/arch/x86/xstate.c:387' which again points to
>> > xsave. Also, adding 'xsave=0' makes it boot just fine.
>>
>> Ah, I think I see the issue: We're zeroing the entire save area in
>> the fixup code, which will make XRSTORS fault unconditionally.
>> Shuai, would you please look into possible ways of fixing this?
>> Just setting the compressed flag may not be enough, and falling
>> back to plain XRSTOR doesn't seem to be an option either - I'm in
>> particular worried that the current model of zeroing everything
>> is bogus with the growing number of different components which
>> get loaded here.
>>
>> But of course another question then is why the XRSTORS faults
>> in the first place. I guess we'll need a debugging patch to dump
>> the full state to understand that.
>>
> If someone can produce and send such patch, I'm sure Harmandeep will be
> happy to give it a try on her hardware.
>

Yes, sure. I will be more than happy to do this :)
Thank you Dario.

Regards,
Harman
> Thanks and Regards,
> Dario
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)
>

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] public/io/netif.h: change semantics of "request-multicast-control" flag

2016-01-21 Thread Ian Campbell
On Wed, 2016-01-20 at 12:50 +, Paul Durrant wrote:
> My patch b2700877 "move and amend multicast control documentation"
> clarified use of the multicast control protocol between frontend and
> backend. However, it transpires that the restrictions that documentation
> placed on the "request-multicast-control" flag make it hard for a
> frontend to enable 'all multicast' promiscuous mode, in that to do so
> would require the frontend and backend to disconnect and re-connect.
> 
> This patch adds a new "feature-dynamic-multicast-control" flag to allow
> a backend to advertise that it will watch "request-multicast-control" hence
> allowing it to be meaningfully modified by the frontend at any time rather
> than only when the frontend and backend are disconnected.
> 
> Signed-off-by: Paul Durrant 
> Cc: Ian Campbell 
> Cc: Ian Jackson 
> Cc: Jan Beulich 
> Cc: Keir Fraser 
> Cc: Tim Deegan 


This looks good to me, but also adding Wei (Linux netback + BSD stuff) and
Roger (BSD stuff) for their perspective.

I should probably have done that for the last set of netif.h changes too,
since apart from the nominal maintainers of xen/include/public/io/*.h it's
worth getting input from the maintainers of the consumers. Not sure we can
express that very well in MAINTAINERS :-(.

Ian.

> ---
>  xen/include/public/io/netif.h | 34 ++
>  1 file changed, 22 insertions(+), 12 deletions(-)
> 
> diff --git a/xen/include/public/io/netif.h
> b/xen/include/public/io/netif.h
> index fe0a87f..8816e0f 100644
> --- a/xen/include/public/io/netif.h
> +++ b/xen/include/public/io/netif.h
> @@ -136,18 +136,28 @@
>   */
>  
>  /*
> - * "feature-multicast-control" advertises the capability to filter ethernet
> - * multicast packets in the backend. To enable use of this capability the
> - * frontend must set "request-multicast-control" before moving into the
> - * connected state.
> - *
> - * If "request-multicast-control" is set then the backend transmit side 
> should
> - * no longer flood multicast packets to the frontend, it should instead drop 
> any
> - * multicast packet that does not match in a filter list. The list is
> - * amended by the frontend by sending dummy transmit requests containing
> - * XEN_NETIF_EXTRA_TYPE_MCAST_{ADD,DEL} extra-info fragments as specified 
> below.
> - * Once enabled by the frontend, the feature cannot be disabled except by
> - * closing and re-connecting to the backend.
> + * "feature-multicast-control" and "feature-dynamic-multicast-control"
> + * advertise the capability to filter ethernet multicast packets in the
> + * backend. If the frontend wishes to take advantage of this feature then
> + * it may set "request-multicast-control". If the backend only advertises
> + * "feature-multicast-control" then "request-multicast-control" must be set
> + * before the frontend moves into the connected state. The backend will
> + * sample the value on this state transition and any subsequent change in
> + * value will have no effect. However, if the backend also advertises
> + * "feature-dynamic-multicast-control" then "request-multicast-control"
> + * may be set by the frontend at any time. In this case, the backend will
> + * watch the value and re-sample on watch events.
> + *
> + * If the sampled value of "request-multicast-control" is set then the
> + * backend transmit side should no longer flood multicast packets to the
> + * frontend, it should instead drop any multicast packet that does not
> + * match in a filter list.
> + * The list is amended by the frontend by sending dummy transmit requests
> + * containing XEN_NETIF_EXTRA_TYPE_MCAST_{ADD,DEL} extra-info fragments as
> + * specified below.
> + * Note that the filter list may be amended even if the sampled value of
> + * "request-multicast-control" is not set, however the filter should only
> + * be applied if it is set.
>   */
>  
>  /*

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCHv5 2/3] grant_table: convert grant table rwlock to percpu rwlock

2016-01-21 Thread Ian Campbell
On Fri, 2015-12-18 at 16:08 +, Malcolm Crossley wrote:
> 
>  xen/arch/arm/mm.c |   4 +-
[...]
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 47bfb27..81f9e2e 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -1055,7 +1055,7 @@ int xenmem_add_to_physmap_one(
>  switch ( space )
>  {
>  case XENMAPSPACE_grant_table:
> -write_lock(>grant_table->lock);
> +grant_write_lock(d->grant_table);
>  
>  if ( d->grant_table->gt_version == 0 )
>  d->grant_table->gt_version = 1;
> @@ -1085,7 +1085,7 @@ int xenmem_add_to_physmap_one(
>  
>  t = p2m_ram_rw;
>  
> -write_unlock(>grant_table->lock);
> +grant_write_unlock(d->grant_table);

ARML Acked-by: Ian Campbell 
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] cleancache: constify cleancache_ops structure

2016-01-21 Thread Julia Lawall
The cleancache_ops structure is never modified, so declare it as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall 

---

v2: put back the read mostly

 drivers/xen/tmem.c |2 +-
 include/linux/cleancache.h |2 +-
 mm/cleancache.c|4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/cleancache.h b/include/linux/cleancache.h
index bda5ec0b4..cb3e142 100644
--- a/include/linux/cleancache.h
+++ b/include/linux/cleancache.h
@@ -37,7 +37,7 @@ struct cleancache_ops {
void (*invalidate_fs)(int);
 };

-extern int cleancache_register_ops(struct cleancache_ops *ops);
+extern int cleancache_register_ops(const struct cleancache_ops *ops);
 extern void __cleancache_init_fs(struct super_block *);
 extern void __cleancache_init_shared_fs(struct super_block *);
 extern int  __cleancache_get_page(struct page *);
diff --git a/drivers/xen/tmem.c b/drivers/xen/tmem.c
index 945fc43..4ac2ca8 100644
--- a/drivers/xen/tmem.c
+++ b/drivers/xen/tmem.c
@@ -242,7 +242,7 @@ static int tmem_cleancache_init_shared_fs(char *uuid, 
size_t pagesize)
return xen_tmem_new_pool(shared_uuid, TMEM_POOL_SHARED, pagesize);
 }

-static struct cleancache_ops tmem_cleancache_ops = {
+static const struct cleancache_ops tmem_cleancache_ops = {
.put_page = tmem_cleancache_put_page,
.get_page = tmem_cleancache_get_page,
.invalidate_page = tmem_cleancache_flush_page,
diff --git a/mm/cleancache.c b/mm/cleancache.c
index 8fc5081..c6356d6 100644
--- a/mm/cleancache.c
+++ b/mm/cleancache.c
@@ -22,7 +22,7 @@
  * cleancache_ops is set by cleancache_register_ops to contain the pointers
  * to the cleancache "backend" implementation functions.
  */
-static struct cleancache_ops *cleancache_ops __read_mostly;
+static const struct cleancache_ops *cleancache_ops __read_mostly;

 /*
  * Counters available via /sys/kernel/debug/cleancache (if debugfs is
@@ -49,7 +49,7 @@ static void cleancache_register_ops_sb(struct super_block 
*sb, void *unused)
 /*
  * Register operations for cleancache. Returns 0 on success.
  */
-int cleancache_register_ops(struct cleancache_ops *ops)
+int cleancache_register_ops(const struct cleancache_ops *ops)
 {
if (cmpxchg(_ops, NULL, ops))
return -EBUSY;


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] public/io/netif.h: change semantics of "request-multicast-control" flag

2016-01-21 Thread Wei Liu
On Thu, Jan 21, 2016 at 03:29:36PM +, Ian Campbell wrote:
> On Wed, 2016-01-20 at 12:50 +, Paul Durrant wrote:
> > My patch b2700877 "move and amend multicast control documentation"
> > clarified use of the multicast control protocol between frontend and
> > backend. However, it transpires that the restrictions that documentation
> > placed on the "request-multicast-control" flag make it hard for a
> > frontend to enable 'all multicast' promiscuous mode, in that to do so
> > would require the frontend and backend to disconnect and re-connect.
> > 
> > This patch adds a new "feature-dynamic-multicast-control" flag to allow
> > a backend to advertise that it will watch "request-multicast-control" hence
> > allowing it to be meaningfully modified by the frontend at any time rather
> > than only when the frontend and backend are disconnected.
> > 
> > Signed-off-by: Paul Durrant 
> > Cc: Ian Campbell 
> > Cc: Ian Jackson 
> > Cc: Jan Beulich 
> > Cc: Keir Fraser 
> > Cc: Tim Deegan 
> 
> 
> This looks good to me, but also adding Wei (Linux netback + BSD stuff) and
> Roger (BSD stuff) for their perspective.
> 
> I should probably have done that for the last set of netif.h changes too,
> since apart from the nominal maintainers of xen/include/public/io/*.h it's
> worth getting input from the maintainers of the consumers. Not sure we can
> express that very well in MAINTAINERS :-(.
> 
> Ian.

LGTM

Acked-by: Wei Liu 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 5/5] libxl: add options to enable/disable emulated devices

2016-01-21 Thread Roger Pau Monné
El 21/01/16 a les 12.31, Wei Liu ha escrit:
> On Thu, Jan 21, 2016 at 11:01:43AM +0100, Roger Pau Monné wrote:
>> El 21/01/16 a les 10.39, Ian Campbell ha escrit:
>>> On Wed, 2016-01-20 at 19:33 +0100, Roger Pau Monné wrote:
 El 20/01/16 a les 14.01, Ian Campbell ha escrit:
> On Wed, 2016-01-20 at 12:57 +0100, Roger Pau Monne wrote:
>> Allow enabling or disabling emulated devices from the libxl domain
>> configuration file. For HVM guests with a device model all the
>> emulated
>> devices are enabled. For HVM guests without a device model no devices
>> are
>> enabled by default, although they can be enabled using the options
>> provided.
>> The arbiter of whether a combination is posible or not is always Xen,
>> libxl
>> doesn't do any kind of check.
>>
>> This set of options is also propagated inside of the libxl migration
>> record
>> as part of the contents of the libxl_domain_build_info struct.
>
> ... and this is the real motivation for this change, not actually
> allowing
> users to control all this AIUI.
>
> Did you check that the fields updated using libxl_defbool_setdefault
> are
> actually updated in the JSON copy and therefore propagated to the other
> side of a migration as specific values and not as "pick a default"? I
> think
> we don't want these changing on migration. I think/hope all this was
> automatically handled by the work Wei did in the last release cycle.

 No, values populated by the {build/create}_info_setdefault functions are
 not propagated, OTOH values manually set by the user in the config file
 are indeed propagated. Do we have any guarantee that _setdefault is
 always going to behave in the same way?
>>>
>>> No, part of the purpose of defbool and the other "do the default" values is
>>> that we can evolve things over time.
>>>
 If we don't have that guarantee I think this is already a bug, and we
 should call _setdefault before sending the domain info to the other end.
 In fact I have a patch that does exactly that, but I'm unsure if it's
 needed because I don't know the policy regarding default values in libxl.
>>>
>>> Wei, isn't this (turning the defaults into concrete values) supposed to be
>>> taken care of by the JSON mangling which you added?
>>
>> Heh, I think you mean the JSON mangling added by Wei. In order to 
>> propagate the values filled by default in libxl_domain_config I had to 
>> add the following patch, which basically calls the _setdefault 
>> functions before converting the domain_config into JSON. I'm planning 
>> to make it part of this series in the next iteration:
> 
> The requirement of recording decision made in libxl and pass that to the
> receiving end is not new. We had the same problem for uuid, disk and
> some other things.
> 
> The first way of doing it is to update JSON before it is sent -- see
> libxl.c:libxl_retrieve_domain_configuration. It uses the stashed JSON
> file as template and pull in various bits from hypervisor and xenstore.
> Your need of recording what emulated devices are available fits here.
> You just need to provide a way to retrieve those bits in that function.
> 
> Another way of doing it is to update the stashed JSON template before it
> is saved. See libxl_internal.c:libxl__update_domain_configuration. I
> think this might be easier than the first way of doing it.
> 
> You should not export the _setdefault function to xl because it is a
> layer violation.
> 
> Hope this clarify things.

Hello,

Yes, thank you very much, it has indeed clarified things. I've
implemented it inside of libxl__update_domain_configuration without issues.

Roger.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread David Vrabel
On 21/01/16 12:19, Ian Campbell wrote:
> On Thu, 2016-01-21 at 10:56 +, David Vrabel wrote:
>> On 20/01/16 12:23, Ian Campbell wrote:
>>> There have been a few reports recently[0] which relate to a failure of
>>> netfront to allocate sufficient grant refs for all the queues:
>>>
>>> [0.533589] xen_netfront: can't alloc rx grant refs
>>> [0.533612] net eth0: only created 31 queues
>>>
>>> Which can be worked around by increasing the number of grants on the
>>> hypervisor command line or by limiting the number of queues permitted
>>> by
>>> either back or front using a module param (which was broken but is now
>>> fixed on both sides, but I'm not sure it has been backported everywhere
>>> such that it is a reliable thing to always tell users as a workaround).
>>>
>>> Is there any plan to do anything about the default/out of the box
>>> experience? Either limiting the number of queues or making both ends
>>> cope
>>> more gracefully with failure to create some queues (or both) might be
>>> sufficient?
>>>
>>> I think the crash after the above in the first link at [0] is fixed? I
>>> think that was the purpose of ca88ea1247df "xen-netfront: update
>>> num_queues
>>> to real created" which was in 4.3.
>>
>> I think the correct solution is to increase the default maximum grant
>> table size.
> 
> That could well make sense, but then there will just be another higher
> limit, so we should perhaps do both.
> 
> i.e. factoring in:
>  * performance i.e. ability for N queues to saturate whatever sort of link
>contemporary Linux can saturate these days, plus some headroom, or
>whatever other ceiling seems sensible)
>  * grant table resource consumption i.e. (sensible max number of blks * nr
>gnts per blk + sensible max number of vifs * nr gnts per vif + other
>devs needs) < per guest grant limit) to pick both the default gnttab
>size and the default max queuers.

Yes.

>> Although, unless you're using the not-yet-applied per-cpu rwlock patches
>> multiqueue is terrible on many (multisocket) systems and the number of
>> queue should be limited in netback to 4 or even just 2.
> 
> Presumably the guest can't tell, so it can't do this.
> 
> I think when you say "terrible" you don't mean "worse than without mq" but
> rather "not realising the expected gains from a larger nunber of queues",
> right?.

Malcolm did the analysis but if I remember correctly, 8 queues performed
about the same as 1 queue and 16 were worse than 1 queue.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread annie li


On 2016/1/21 9:17, David Vrabel wrote:

On 21/01/16 12:19, Ian Campbell wrote:

On Thu, 2016-01-21 at 10:56 +, David Vrabel wrote:

On 20/01/16 12:23, Ian Campbell wrote:

There have been a few reports recently[0] which relate to a failure of
netfront to allocate sufficient grant refs for all the queues:

[0.533589] xen_netfront: can't alloc rx grant refs
[0.533612] net eth0: only created 31 queues

Which can be worked around by increasing the number of grants on the
hypervisor command line or by limiting the number of queues permitted
by
either back or front using a module param (which was broken but is now
fixed on both sides, but I'm not sure it has been backported everywhere
such that it is a reliable thing to always tell users as a workaround).

Is there any plan to do anything about the default/out of the box
experience? Either limiting the number of queues or making both ends
cope
more gracefully with failure to create some queues (or both) might be
sufficient?

I think the crash after the above in the first link at [0] is fixed? I
think that was the purpose of ca88ea1247df "xen-netfront: update
num_queues
to real created" which was in 4.3.

I think the correct solution is to increase the default maximum grant
table size.

That could well make sense, but then there will just be another higher
limit, so we should perhaps do both.

i.e. factoring in:
  * performance i.e. ability for N queues to saturate whatever sort of link
contemporary Linux can saturate these days, plus some headroom, or
whatever other ceiling seems sensible)
  * grant table resource consumption i.e. (sensible max number of blks * nr
gnts per blk + sensible max number of vifs * nr gnts per vif + other
devs needs) < per guest grant limit) to pick both the default gnttab
size and the default max queuers.

Yes.


Would it waste lots of resources in the case where guest vif has lots of 
queue but no network load? Here is an example of gntref consumed by vif,

Dom0 20vcpu, domu 20vcpu,
one vif would consumes 20*256*2=10240 gntref.
If setting the maximum grant table size to 64pages(default value of xen 
is 32pages now?), then only 3 vif is supported in guest. Even blk isn't 
taken account in, and also blk multi-page ring feature.


Thanks
Annie



Although, unless you're using the not-yet-applied per-cpu rwlock patches
multiqueue is terrible on many (multisocket) systems and the number of
queue should be limited in netback to 4 or even just 2.

Presumably the guest can't tell, so it can't do this.

I think when you say "terrible" you don't mean "worse than without mq" but
rather "not realising the expected gains from a larger nunber of queues",
right?.

Malcolm did the analysis but if I remember correctly, 8 queues performed
about the same as 1 queue and 16 were worse than 1 queue.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCHv1 1/4] atomic: replace atomic_compareandswap() with atomic_cmpxchg()

2016-01-21 Thread Ian Campbell
On Fri, 2015-12-18 at 14:09 +, David Vrabel wrote:
> atomic_compareandswap() used atomic_t as the new, old and returned
> values which is less convinient than using just int.

"convenient"

diff --git a/xen/include/asm-arm/atomic.h b/xen/include/asm-arm/atomic.h
> index 5a38c67..29ab265 100644
> --- a/xen/include/asm-arm/atomic.h
> +++ b/xen/include/asm-arm/atomic.h
> @@ -138,14 +138,6 @@ static inline void _atomic_set(atomic_t *v, int i)
>  # error "unknown ARM variant"
>  #endif
>  
> -static inline atomic_t atomic_compareandswap(
> -atomic_t old, atomic_t new, atomic_t *v)
> -{
> -atomic_t rc;
> -rc.counter = __cmpxchg(>counter, old.counter, new.counter,
> sizeof(int));
> -return rc;
> -}
> -

ARM: Acked-by: Ian Campbell 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] public/io/netif.h: change semantics of "request-multicast-control" flag

2016-01-21 Thread Wei Liu
On Thu, Jan 21, 2016 at 04:35:40PM +0100, Roger Pau Monné wrote:
> El 21/01/16 a les 16.29, Ian Campbell ha escrit:
> > On Wed, 2016-01-20 at 12:50 +, Paul Durrant wrote:
> >> My patch b2700877 "move and amend multicast control documentation"
> >> clarified use of the multicast control protocol between frontend and
> >> backend. However, it transpires that the restrictions that documentation
> >> placed on the "request-multicast-control" flag make it hard for a
> >> frontend to enable 'all multicast' promiscuous mode, in that to do so
> >> would require the frontend and backend to disconnect and re-connect.
> >>
> >> This patch adds a new "feature-dynamic-multicast-control" flag to allow
> >> a backend to advertise that it will watch "request-multicast-control" hence
> >> allowing it to be meaningfully modified by the frontend at any time rather
> >> than only when the frontend and backend are disconnected.
> >>
> >> Signed-off-by: Paul Durrant 
> >> Cc: Ian Campbell 
> >> Cc: Ian Jackson 
> >> Cc: Jan Beulich 
> >> Cc: Keir Fraser 
> >> Cc: Tim Deegan 
> > 
> > 
> > This looks good to me, but also adding Wei (Linux netback + BSD stuff) and
> > Roger (BSD stuff) for their perspective.
> > 
> > I should probably have done that for the last set of netif.h changes too,
> > since apart from the nominal maintainers of xen/include/public/io/*.h it's
> > worth getting input from the maintainers of the consumers. Not sure we can
> > express that very well in MAINTAINERS :-(.
> 
> I'm going to leave this one to Wei, he has more experience than me
> regarding FreeBSD netfront (and xen-net related topics).
> 
> FWIW, a quick and dirty grep on FreeBSD netfront doesn't show any
> results for "request-multicast-control", so I guess it's not implemented.
> 

No, it is not implemented.

Wei.

> Roger.
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen PV PTE ABI (or lack thereof)

2016-01-21 Thread Andrew Cooper
On 21/01/16 14:29, David Vrabel wrote:
> On 21/01/16 13:17, Andrew Cooper wrote:
>> As we see with the Protection Key feature, newer hardware feature start
>> using bits which were previously software available, and we absolutely
>> don't want to be in a position where our ABI prevents us from ever
>> supporting a new feature.
> I don't see a problem in not supporting new features in /PV/ guests.

I know you want to kill PV guests.  I am not going to deliberately
cripple PV guests in an attempt to achieve that goal.

>
> I think we should document the PV ABI as-is. i.e., that these two PTE
> bits are not available for PV guest use.

This is already known to break some guests.  It shouldn't stay as it
currently is.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] public/io/netif.h: change semantics of "request-multicast-control" flag

2016-01-21 Thread Roger Pau Monné
El 21/01/16 a les 16.29, Ian Campbell ha escrit:
> On Wed, 2016-01-20 at 12:50 +, Paul Durrant wrote:
>> My patch b2700877 "move and amend multicast control documentation"
>> clarified use of the multicast control protocol between frontend and
>> backend. However, it transpires that the restrictions that documentation
>> placed on the "request-multicast-control" flag make it hard for a
>> frontend to enable 'all multicast' promiscuous mode, in that to do so
>> would require the frontend and backend to disconnect and re-connect.
>>
>> This patch adds a new "feature-dynamic-multicast-control" flag to allow
>> a backend to advertise that it will watch "request-multicast-control" hence
>> allowing it to be meaningfully modified by the frontend at any time rather
>> than only when the frontend and backend are disconnected.
>>
>> Signed-off-by: Paul Durrant 
>> Cc: Ian Campbell 
>> Cc: Ian Jackson 
>> Cc: Jan Beulich 
>> Cc: Keir Fraser 
>> Cc: Tim Deegan 
> 
> 
> This looks good to me, but also adding Wei (Linux netback + BSD stuff) and
> Roger (BSD stuff) for their perspective.
> 
> I should probably have done that for the last set of netif.h changes too,
> since apart from the nominal maintainers of xen/include/public/io/*.h it's
> worth getting input from the maintainers of the consumers. Not sure we can
> express that very well in MAINTAINERS :-(.

I'm going to leave this one to Wei, he has more experience than me
regarding FreeBSD netfront (and xen-net related topics).

FWIW, a quick and dirty grep on FreeBSD netfront doesn't show any
results for "request-multicast-control", so I guess it's not implemented.

Roger.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen PV PTE ABI (or lack thereof)

2016-01-21 Thread David Vrabel
On 21/01/16 14:37, Andrew Cooper wrote:

>> I think we should document the PV ABI as-is. i.e., that these two PTE
>> bits are not available for PV guest use.
> 
> This is already known to break some guests.  It shouldn't stay as it
> currently is.

It's unfortunate that the current ABI is incompatible with some guests
in some situations, but it can't be fixed without adding a mechanism for
the guest to query or negotiate PTE bit availability since existing
guests have to be able to run on older hypervisors.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.

2016-01-21 Thread Xu, Quan
> On January 20, 2016 at 7:29 pm,   wrote:
> >>> On 20.01.16 at 11:26,  wrote:
> >> On January 15, 2016 at 9:10,  wrote:
> >> >>> On 23.12.15 at 09:25,  wrote:
> >> > @@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu,
> >> >  return 0;
> >> >  }
> >> >
> >> > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
> >> > + u16 seg, u8 bus, u8
> >> > +devfn)
> >> {
> >> > +struct domain *d;
> >> > +struct pci_dev *pdev;
> >> > +
> >> > +d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> >> > +ASSERT(d);
> >>
> >> Don't you need to acquire some lock in order to safely assert this?
> >
> > Referred to the other use case of 'rcu_lock_domain_by_id()', Xen
> > checks whether The domain is 'NULL'.
> > Could I also replace the 'ASSERT(d)' with
> >   + If ( d == NULL )
> >   +   return;
> > ?
> 
> At a first glance this doesn't look right, but in the end that's something 
> you need
> to answer.
> 

Is it quite similar to whether the domain has been destroyed when Device-TLB is 
flushing?
Correct me if I still doesn't get you point..


> >> Also note that unused slots hold zero, i.e. there's at least a
> >> theoretical
> > risk of
> >> problems here when you don't also look at
> >> iommu->domid_bitmap.
> >>
> > I am not clear how to fix this point. Do you have good idea?
> > Add a lock to 'iommu->domid_bitmap'?
> 
> How would a lock help avoiding mistaking an unused slot to mean Dom0? As
> already suggested - I think you simply need to consult the bitmap along with 
> the
> domain ID array.
> 

Try to get domain id with iommu->domid_map[did] ?


> >> > +{
> >> > +list_del(>domain_list);
> >>
> >> This should happen under pcidevs_lock - you need to either acquire it
> >> or
> >> ASSERT() it being held.
> >>
> >
> > I should check whether it is under pcidevs_lock -- with
> > spin_is_locked(_lock)
> > If it is not under pcidevs_lock, I should acquire it.
> > I think ASSERT() is not a good idea. Hypervisor acquires this lock and
> > then remove the resource.
> 
> I don't understand this last sentence.
> 
For example: in 
pci_remove_device()
{
...
spin_lock(_lock);
..
iommu_remove_device()..
..
spin_unlock(_lock);
}

Device-TLB is maybe flush error in iommu_remove_device()..
Then it is under pcidevs_lock..
In this case, I need to check whether it is under pcidevs_lock.
If not, I need to acquire the pcidevs_lock.

> >> > @@ -349,9 +416,18 @@ static int flush_iotlb_qi(
> >> >  queue_invalidate_iotlb(iommu,
> >> > type >>
> >> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
> >> > dw, did, size_order, 0, addr);
> >> > +
> >> > +/*
> >> > + * Synchronize with hardware for invalidation request
> descriptors
> >> > + * submitted before Device-TLB invalidate descriptor.
> >> > + */
> >> > +rc = invalidate_sync(iommu);
> >> > +if ( rc )
> >> > + return rc;
> >> > +
> >> >  if ( flush_dev_iotlb )
> >> >  ret = dev_invalidate_iotlb(iommu, did, addr, size_order,
> type);
> >> > -rc = invalidate_sync(iommu);
> >> > +
> >> >  if ( !ret )
> >> >  ret = rc;
> >> >  }
> >>
> >> This change is because of ...?
> >>
> > As Kevin's comments,
> > I add 'dev_invalidate_iotlb_sync()' to dev_invalidate_iotlb().
> > Then i can know which device is flush timeout.
> 
> I don't understand: How is your reply related to you moving up the invocation 
> of
> invalidate_sync()?

This invalidate_sync() is to synchronize with hardware for invalidation request 
descriptors
submitted before Device-TLB invalidate descriptor.

If I don't add this invalidate_sync().. 
dev_invalidate_iotlb_timeout() is not clear whether the flush time out is about 
Device-TLB flush error or the other previous iotlb/iec/context flush error. 


> 
> >> Plus logging the error code and device coordinates might turn out
> >> helpful in such cases. But first of all - if there was a timeout,
> >> we'd make it here only for Dom0. Perhaps the printing should move into an
> else to the domain_crash()?
> >> And if there was another error, the message would be outright wrong.
> >>
> > IMO, the print for Dom0 is enough.
> > I think it is not a good idea to move into an else to domain_crash().
> > Domain_crash is quite a common part.
> > Anyway I can improve it in low priority.
> 
> At the very least the message should not end up being actively misleading.
> 
What about the below?
+printk(XENLOG_ERR
+   "Device %04x:%02x:%02x is flush error.",
+   seg, bus, devfn);



Quan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH LIBVIRT v2] libxl: Support cmdline= in xl config files

2016-01-21 Thread Jim Fehlig
Ian Campbell wrote:
> On Wed, 2016-01-20 at 11:16 -0700, Jim Fehlig wrote:
>> Ian Campbell wrote:
>>> ... and consolidate the cmdline/extra/root parsing to facilitate doing
>>> so.
>>>
>>> The logic is the same as xl's parse_cmdline from the current xen.git
>>> master
>>> branch (e6f0e099d2c17de47fd86e817b1998db903cab61).
>>>
>>> In order to introduce a use of VIR_WARN for logging I had to add
>>> virerror.h and VIR_LOG_INIT.
>>>
>>> Signed-off-by: Ian Campbell 
>>> ---
>>> NB: I was unsure if I was supposed to change VIR_FROM_NONE into
>>> VIR_FROM_XEN, so I didn't (but will on advice).
>> It seems some of the VIR_FROM_ needs cleanup throughout the various 
>> Xen-related
>> files. We already have VIR_FROM_SEXPR and VIR_FROM_XENXM.
>> src/xenconfig/xen_sxpr.c should use the former, src/xenconfig/xen_xm.c the
>> latter. And given the pattern, I think we should introduce VIR_FROM_XENXL to
>> cover xl.cfg related code. I can take care of this.
> 
> I've acked you patches about this.

Thanks. I'll push those trivial patches shortly.

> 
>>> +static int xenParseCmdline(virConfPtr conf, char **r_cmdline)
>>> +{
>>> +char *cmdline;
>> One too many initializers removed :-).
>>
>>> +const char *root, *extra, *buf;
>>> +
>>> +if (xenConfigGetString(conf, "cmdline", , NULL) < 0)
>>> +return -1;
>>> +
>>> +if (xenConfigGetString(conf, "root", , NULL) < 0)
>>> +return -1;
>>> +
>>> +if (xenConfigGetString(conf, "extra", , NULL) < 0)
>>> +return -1;
>>> +
>>> +if (buf) {
>>> +if (VIR_STRDUP(cmdline, buf) < 0)
>>> +return -1;
>>> +if (root || extra)
>>> +VIR_WARN("ignoring root= and extra= in favour of
>>> cmdline=");
>>> +} else {
>>> +if (root && extra) {
>>> +if (virAsprintf(, "root=%s %s", root, extra) < 0)
>>> +return -1;
>>> +} else if (root) {
>>> +if (virAsprintf(, "root=%s", root) < 0)
>>> +return -1;
>>> +} else if (extra) {
>>> +if (VIR_STRDUP(cmdline, extra) < 0)
>>> +return -1;
>>> +}
>>> +}
>>> +
>>> +*r_cmdline = cmdline;
>> If none of cmdline, extra, or root are set in the config, def->os.cmdline 
>> gets
>> set to garbage. xlconfigtest explodes when running 'make check'.
> 
> I even thought about this quite carefully but missed this case :-/
> 
> Would you prefer and = NULL on the decl or an else cmdline = NULL at the
> end of that chain?

'char *cmdline = NULL;' is fine.

> 
>> Sorry for not mentioning it in my previous review, but we should add a
>> test for cmdline, root, and extra.
> 
> Ack, will do so for v3.

Thanks!

Regards,
Jim


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 0/6] HVMlite: DomU fixes and a Dom0 preparatory patch

2016-01-21 Thread Roger Pau Monne
Hello,

The series is sorted in the following way:
 - Patch 1 is a preparatory patch for Dom0 HVMlite support.
 - Patch 4 is a fix from a fallout introduced by the HVMlite series, which
   inadvertently disabled the emulated PIT that was added to PV(H) guests.
 - Patches 2, 3 and 5 expand the fields of the libxl_domain_build_info
   structure in order to store what emulated devices inside of Xen are
   enabled.
 - Patch 6 is new in this iteration and aims to provide a simple way to 
   report to the guest which emulated devices are enabled. I belive this is 
   the last part of the guest ABI that I wanted to do (if we leave 
   pci-passthrough out of the picture). IMHO, the DomU ABI is now complete.

The full series can also be found in my git tree:

http://xenbits.xen.org/gitweb/?p=people/royger/xen.git;a=shortlog;h=refs/heads/hvmlite_fixes_v4

I hope this will help review of patch 1/6, which is a significant rework,
and will probably be better reviewed by looking at the resulting code.

Thanks, Roger.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 3/6] libxl: initialise the build info before calling prepare_config

2016-01-21 Thread Roger Pau Monne
libxl__arch_domain_prepare_config has access to the libxl_domain_build_info
struct, so make sure it's properly initialised. Note that prepare_config is
called from within libxl__domain_make.

This is not a bug at the moment, because prepare_config doesn't access
libxl_domain_build_info yet, but this is likely going to change.

Signed-off-by: Roger Pau Monné 
---
Cc: Ian Jackson 
Cc: Ian Campbell 
Cc: Wei Liu 
---
Changes since v3:
 - Reword commit message to make it clear that prepare_config is called from
   domain_make, and that the fact that build_info is not initialised is not
   a bug ATM because prepare_config doesn't make use of it.
---
 tools/libxl/libxl_create.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index b669359..c7700a7 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -892,6 +892,12 @@ static void initiate_domain_create(libxl__egc *egc,
 goto error_out;
 }
 
+ret = libxl__domain_build_info_setdefault(gc, _config->b_info);
+if (ret) {
+LOG(ERROR, "Unable to set domain build info defaults");
+goto error_out;
+}
+
 ret = libxl__domain_make(gc, d_config, , >config);
 if (ret) {
 LOG(ERROR, "cannot make domain: %d", ret);
@@ -903,12 +909,6 @@ static void initiate_domain_create(libxl__egc *egc,
 dcs->guest_domid = domid;
 dcs->dmss.dm.guest_domid = 0; /* means we haven't spawned */
 
-ret = libxl__domain_build_info_setdefault(gc, _config->b_info);
-if (ret) {
-LOG(ERROR, "Unable to set domain build info defaults");
-goto error_out;
-}
-
 if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM &&
 (libxl_defbool_val(d_config->b_info.u.hvm.nested_hvm) &&
  libxl_defbool_val(d_config->b_info.u.hvm.altp2m))) {
-- 
1.9.5 (Apple Git-50.3)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC 20/31] x86: Improvements to in-hypervisor cpuid sanity checks

2016-01-21 Thread Andrew Cooper
On 21/01/16 17:02, Jan Beulich wrote:
 On 16.12.15 at 22:24,  wrote:
>> @@ -864,69 +865,27 @@ void pv_cpuid(struct cpu_user_regs *regs)
>>  
>>  cpuid_count(a, c, , , , );
>>  
>> -if ( (regs->eax & 0x7fff) == 0x0001 )
>> -{
>> -/* Modify Feature Information. */
>> -if ( !cpu_has_apic )
>> -__clear_bit(X86_FEATURE_APIC, );
>> -
>> -if ( !is_pvh_domain(currd) )
>> -{
>> -__clear_bit(X86_FEATURE_PSE, );
>> -__clear_bit(X86_FEATURE_PGE, );
>> -__clear_bit(X86_FEATURE_PSE36, );
>> -__clear_bit(X86_FEATURE_VME, );
>> -}
>> -}
> This I understand goes away because pv_featureset[] never has
> those set?
>
>>  case 0x8001:
>> -/* Modify Feature Information. */
>> -if ( is_pv_32bit_domain(currd) )
>> -{
>> -__clear_bit(X86_FEATURE_LM % 32, );
>> -__clear_bit(X86_FEATURE_LAHF_LM % 32, );
>> -}
>> -if ( is_pv_32bit_domain(currd) &&
>> - boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
>> -__clear_bit(X86_FEATURE_SYSCALL % 32, );
> But what about these 32-bit specific removals?

LM, from the deep feature dependency removal in libxc, when it is known
that the domain is 32bit.

For SYSCALL, as far as I can tell, the logic is wrong.  32bit compat
mode code on Intel can use SYSCALL, as Xen is running in Long mode. 
(This is opposite to the AMD case where 32bit compat code cannot use
SYSENTER, because Xen is in Long mode.)

> Overall this of course makes things quite a bit more readable.

And there is more to come.

By the time my cpuid phase 2 plans are complete, all validitiy checks
will be done at the set_cpuid_policy hypercall boundary, meaning that
all these time-of-use checks can be dropped.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC 23/31] xen/x86: Export cpuid levelling capabilities via SYSCTL

2016-01-21 Thread Jan Beulich
>>> On 16.12.15 at 22:24,  wrote:
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -32,6 +32,9 @@ integer_param("cpuid_mask_ext_ecx", opt_cpuid_mask_ext_ecx);
>  unsigned int __devinitdata opt_cpuid_mask_ext_edx = ~0u;
>  integer_param("cpuid_mask_ext_edx", opt_cpuid_mask_ext_edx);
>  
> +unsigned int __initdata expected_levelling_cap;

Unused variable.

> +unsigned int __read_mostly levelling_caps;

Never written to variable.

> +/*
> + * XEN_SYSCTL_get_levelling_caps (x86 specific)
> + *
> + * Return hardware capabilities concerning masking or faulting of the cpuid
> + * instruction for PV guests.
> + */
> +struct xen_sysctl_levelling_caps {
> +/*
> + * Featureset array index encoding:
> + * - (possibly) 'e' Extended
> + * - leaf, uppercase hex
> + * - register, lowercase
> + * - (possibly) subleaf
> + */
> +#define XEN_SYSCTL_LEVELCAP_faulting (1ul <<  0) /* CPUID faulting*/
> +#define XEN_SYSCTL_LEVELCAP_1c   (1ul <<  1) /* 0x0001.ecx*/
> +#define XEN_SYSCTL_LEVELCAP_1d   (1ul <<  2) /* 0x0001.edx*/
> +#define XEN_SYSCTL_LEVELCAP_e1c  (1ul <<  3) /* 0x8001.ecx*/
> +#define XEN_SYSCTL_LEVELCAP_e1d  (1ul <<  4) /* 0x8001.edx*/
> +#define XEN_SYSCTL_LEVELCAP_Da1  (1ul <<  5) /* 0x000D:1.eax  */
> +#define XEN_SYSCTL_LEVELCAP_6c   (1ul <<  6) /* 0x0006.ecx*/
> +#define XEN_SYSCTL_LEVELCAP_7a0  (1ul <<  7) /* 0x0007:0.eax  */
> +#define XEN_SYSCTL_LEVELCAP_7b0  (1ul <<  8) /* 0x0007:0.ebx  */

Similar comment apply to the naming here as I had given for the
feature sysctl.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [BUG] EDAC infomation partially missing

2016-01-21 Thread Jan Beulich
>>> On 20.01.16 at 16:01,  wrote:
> Initially reported to debian
> (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=810964), redirected here:
> 
> With AMD Opteron 6xxx processors, half of the memory controllers are
> missing from /sys/devices/system/edac/mc
> Checked with single 6120 (dual memory controller) and twin 6344 (2x dual
> MC), other dual-module CPUs might be affected too.
> 
> Booting plain Linux (3.2, 3.16, 4.1, 4.3), all memory controllers are
> listed under /sys/devices/system/edac/mc as expected. Same happens, when
> Xen 4.1 is used: all MCs present.
> 
> Starting with Xen 4.4 (Debian Jessie), only mc1 (on the single CPU
> machine) or mc2/mc3 (dual CPU machine) are present, although the full
> system memory is accessible. Checked versions were 4.1.4 (Debian
> Wheezy), 4.4.1 (Jessie) and 4.6.0 (Sid)

As already indicated by Ian in that bug, you should supply us with
full kernel and hypervisor logs for both the good and bad cases
(ideally with the same kernel version use in both runs, so that we
can exclude kernel behavior differences).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [linux-next test] 78603: regressions - FAIL

2016-01-21 Thread osstest service owner
flight 78603 linux-next real [real]
http://logs.test-lab.xenproject.org/osstest/logs/78603/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-credit2  15 guest-localmigratefail REGR. vs. 78487
 test-amd64-amd64-xl-xsm  15 guest-localmigratefail REGR. vs. 78487
 test-amd64-amd64-xl  15 guest-localmigratefail REGR. vs. 78487
 test-amd64-amd64-xl-multivcpu 15 guest-localmigrate   fail REGR. vs. 78487
 test-amd64-i386-xl-xsm   15 guest-localmigratefail REGR. vs. 78487
 test-amd64-i386-xl   15 guest-localmigratefail REGR. vs. 78487
 test-amd64-amd64-pair  22 guest-migrate/dst_host/src_host fail REGR. vs. 78487
 test-amd64-i386-pair   22 guest-migrate/dst_host/src_host fail REGR. vs. 78487

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-libvirt-xsm  15 guest-saverestore.2   fail REGR. vs. 78487
 test-amd64-amd64-libvirt-xsm 15 guest-saverestore.2   fail REGR. vs. 78487
 test-amd64-amd64-libvirt-pair 22 guest-migrate/dst_host/src_host fail REGR. 
vs. 78487
 test-amd64-i386-libvirt  15 guest-saverestore.2   fail REGR. vs. 78487
 test-amd64-amd64-xl-rtds 15 guest-localmigratefail REGR. vs. 78487
 test-amd64-i386-libvirt-pair 22 guest-migrate/dst_host/src_host fail REGR. vs. 
78487
 test-amd64-amd64-libvirt 15 guest-saverestore.2   fail REGR. vs. 78487
 test-armhf-armhf-libvirt-raw  9 debian-di-install fail REGR. vs. 78487
 test-armhf-armhf-xl-cubietruck 15 guest-start/debian.repeat fail blocked in 
78487
 test-armhf-armhf-xl-credit2 15 guest-start/debian.repeat fail blocked in 78487
 test-amd64-i386-rumpuserxen-i386 10 guest-startfail like 78487
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeatfail   like 78487
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 78487
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail like 78487
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail like 78487

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel 14 guest-saverestorefail  never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-qcow2 13 guest-saverestorefail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-intel 13 xen-boot/l1 fail never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 13 xen-boot/l1   fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass

version targeted for testing:
 linux

Re: [Xen-devel] [PATCH V13 5/5] xl: add pvusb commands

2016-01-21 Thread Ian Jackson
Wei Liu writes ("Re: [Xen-devel] [PATCH V13 5/5] xl: add pvusb commands"):
> Or rewrite every adhoc parser with bison/flex or Ocaml / Haskell parsec.

I endorse this product and/or service.

> Jokes aside. I will see what I can do. Consolidating them into helper
> functions is a good start.

Yes :-).

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC V2] xen: interface: introduce pvclk interface

2016-01-21 Thread Stefano Stabellini
On Thu, 21 Jan 2016, Ian Campbell wrote:
> On Thu, 2016-01-21 at 12:55 +, Stefano Stabellini wrote:
> > On Thu, 21 Jan 2016, Peng Fan wrote:
> > > Hi Ian,
> > >
> > > On Thu, Jan 21, 2016 at 12:26:04PM +, Ian Campbell wrote:
> > > > On Thu, 2016-01-21 at 19:55 +0800, Peng Fan wrote:
> > > > > Hi Ian,
> > > > >
> > > > > On Thu, Jan 21, 2016 at 10:19:32AM +, Ian Campbell wrote:
> > > > > > On Thu, 2016-01-21 at 16:59 +0800, Peng Fan wrote:
> > > > > > >  
> > > > > > > To platform device of ARM, hypervisor is responsible for the
> > > > > > > mapping
> > > > > > > between machine address and guest physical address, also
> > > > > > > responsible
> > > > > > > for the irq mapping.
> > > > > > >
> > > > > > > But to embedded ARM SoC, the hardware clk IP is handled in
> > > > > > > Dom0.
> > > > > >
> > > > > > Arguably Xen ought to be the one to do this, but we have
> > > > > > determined
> > > > > > (rightly, I think) that doing so for the entirely clk tree of an
> > > > > > SoC
> > > > > > would
> > > > > > involve importing an unmanageable amount of code into Xen[0].
> > > > > >
> > > > > > In the meantime we defer this to Dom0.
> > > > > >
> > > > > > I wonder though if it would be possible to manage the clocks for
> > > > > > a
> > > > > > passthrough device from the toolstack, i.e. is there a sysfs node
> > > > > > where
> > > > > > one
> > > > > > can say "keep the clock for this device enabled (at xMhz) even
> > > > > > though
> > > > > > you
> > > > > > think the device is unused"?
> > > > >
> > > > > I am afraid not. The linux device driver without xen can work well,
> > > > > because
> > > > > there is clk tree and "clk_get,clk_prepare,clk_set_rate" work well
> > > > > in the
> > > > > driver.
> > > > > I do not want to remove the clk apis usage from the device driver
> > > > > for
> > > > > xen, because the driver
> > > > > also serves when no xen support. The pvclk interface is mainly to
> > > > > let the
> > > > > clk api can work as no xen.
> > > >
> > > > Would adding a dummy fixed-clock[0] (or several of them) to the guest
> > > > passthrough DT satisfy the driver's requirements? They would be
> > > > hardcoded
> > > > to whatever rate dom0 and/or the tools has decided upon (and had set
> > > > in the
> > > > real h/w).
> > >
> > > If using this way, we have the assumption that DomU device driver would
> > > not
> > > change the rate of the clock driving the device. I am not sure whether
> > > this is
> > > ok for so many platform devices based ARM core.
> > >
> > > In /sys/kernel/debug/clk/, there are clk tree info, but
> > > clk api are not exposed to userspace as far as I know, so
> > > if using sysfs interface to set a known fixed rate or enable/disable
> > > the clock,
> > > we need to expose the clk info to userspace.
> >
> > Keeping in mind that we might now want to let DomU change the actual
>
> s/now/not/?

Sorry, I meant "we might not want to let DomU change" ...


> > clock frequency anyway, exposing a dummy clock to keep the DomU driver
> > happy might be a good option.
> >
> > However whether we set the frequency from the Dom0 kernel or from the
> > toolstack, how do we find out the right clock rate? From a grep in the
> > kernel sources, it looks like some drivers still have hardcoded values.
> >
> > Asking the user to provide the clock rate seems a bit too much to me.
>
> Remember that for this use case they already need to provide the host DT
> path to the device, the list of iomem resources, the list of irqs and a
> suitable device tree snippet.
>
> platform device passthrough is very much an "expert" option (intended for
> folks building embedded device, not really for normal end users), which
> already needs a fair bit of familiarity with the system in question, I
> don't think providing some info about the clocks goes too much further than
> this.
>
> Seamless & easy passthrough is something we are aiming for for PCI, but not
> for the general case of platform passthrough.
>
> Making platform device passthrough as easy as PCI passthrough is much
> harder and far reaching than avoiding the need to specify some clock
> frequencies (remember we had very long discussions about the design and
> couldn't come up with anything especially feasible).

I agree that passthrough of non-PCI devices is a feature for "experts"
and we don't have many good solutions to this problem.

But one thing is asking for a device tree snippet, another is asking
for the clock frequency: as an expert user, finding out the device tree
node can be done by reading the device tree, the docs or googling for
it, but how is a user supposed to find out the clock frequency?  I don't
think that asking expert users to grep for values in the Linux source tree
should be an option. Is the frequency at least supposed to be documented
in the hardware manual?

Should we have a list of known devices and frequencies in libxl?  Maybe
not, but we could consider giving users the option to allow the DomU to

Re: [Xen-devel] [PATCH RFC 21/31] x86/domctl: Break out logic to update domain state from cpuid information

2016-01-21 Thread Jan Beulich
>>> On 16.12.15 at 22:24,  wrote:
> Later changes will add to this logic.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 

And actually looks like this is pretty independent of the earlier patches,
and hence could go in right away?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PULL 11/11] Xen PCI passthru: convert to realize()

2016-01-21 Thread Stefano Stabellini
From: Cao jin 

Signed-off-by: Cao jin 
Reviewed-by: Eric Blake 
Reviewed-by: Stefano Stabellini 
---
 hw/xen/xen_pt.c |   53 -
 1 file changed, 28 insertions(+), 25 deletions(-)

diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index 9eef3df..d33221b 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -760,10 +760,10 @@ static void xen_pt_destroy(PCIDevice *d) {
 }
 /* init */
 
-static int xen_pt_initfn(PCIDevice *d)
+static void xen_pt_realize(PCIDevice *d, Error **errp)
 {
 XenPCIPassthroughState *s = XEN_PT_DEVICE(d);
-int rc = 0;
+int i, rc = 0;
 uint8_t machine_irq = 0, scratch;
 uint16_t cmd = 0;
 int pirq = XEN_PT_UNASSIGNED_PIRQ;
@@ -781,8 +781,8 @@ static int xen_pt_initfn(PCIDevice *d)
 );
 if (err) {
 error_append_hint(, "Failed to \"open\" the real pci device");
-error_report_err(err);
-return -1;
+error_propagate(errp, err);
+return;
 }
 
 s->is_virtfn = s->real_device.is_virtfn;
@@ -802,19 +802,19 @@ static int xen_pt_initfn(PCIDevice *d)
 if ((s->real_device.domain == 0) && (s->real_device.bus == 0) &&
 (s->real_device.dev == 2) && (s->real_device.func == 0)) {
 if (!is_igd_vga_passthrough(>real_device)) {
-XEN_PT_ERR(d, "Need to enable igd-passthru if you're trying"
-   " to passthrough IGD GFX.\n");
+error_setg(errp, "Need to enable igd-passthru if you're trying"
+" to passthrough IGD GFX");
 xen_host_pci_device_put(>real_device);
-return -1;
+return;
 }
 
 xen_pt_setup_vga(s, >real_device, );
 if (err) {
 error_append_hint(, "Setup VGA BIOS of passthrough"
 " GFX failed");
-error_report_err(err);
+error_propagate(errp, err);
 xen_host_pci_device_put(>real_device);
-return -1;
+return;
 }
 
 /* Register ISA bridge for passthrough GFX. */
@@ -836,20 +836,19 @@ static int xen_pt_initfn(PCIDevice *d)
 /* Bind interrupt */
 rc = xen_host_pci_get_byte(>real_device, PCI_INTERRUPT_PIN, );
 if (rc) {
-XEN_PT_ERR(d, "Failed to read PCI_INTERRUPT_PIN! (rc:%d)\n", rc);
+error_setg_errno(errp, errno, "Failed to read PCI_INTERRUPT_PIN");
 goto err_out;
 }
 if (!scratch) {
-XEN_PT_LOG(d, "no pin interrupt\n");
+error_setg(errp, "no pin interrupt");
 goto out;
 }
 
 machine_irq = s->real_device.irq;
 rc = xc_physdev_map_pirq(xen_xc, xen_domid, machine_irq, );
-
 if (rc < 0) {
-XEN_PT_ERR(d, "Mapping machine irq %u to pirq %i failed, (err: %d)\n",
-   machine_irq, pirq, errno);
+error_setg_errno(errp, errno, "Mapping machine irq %u to"
+ " pirq %i failed", machine_irq, pirq);
 
 /* Disable PCI intx assertion (turn on bit10 of devctl) */
 cmd |= PCI_COMMAND_INTX_DISABLE;
@@ -870,8 +869,8 @@ static int xen_pt_initfn(PCIDevice *d)
PCI_SLOT(d->devfn),
e_intx);
 if (rc < 0) {
-XEN_PT_ERR(d, "Binding of interrupt %i failed! (err: %d)\n",
-   e_intx, errno);
+error_setg_errno(errp, errno, "Binding of interrupt %u failed",
+ e_intx);
 
 /* Disable PCI intx assertion (turn on bit10 of devctl) */
 cmd |= PCI_COMMAND_INTX_DISABLE;
@@ -879,8 +878,8 @@ static int xen_pt_initfn(PCIDevice *d)
 
 if (xen_pt_mapped_machine_irq[machine_irq] == 0) {
 if (xc_physdev_unmap_pirq(xen_xc, xen_domid, machine_irq)) {
-XEN_PT_ERR(d, "Unmapping of machine interrupt %i failed!"
-   " (err: %d)\n", machine_irq, errno);
+error_setg_errno(errp, errno, "Unmapping of machine"
+" interrupt %u failed", machine_irq);
 }
 }
 s->machine_irq = 0;
@@ -893,14 +892,14 @@ out:
 
 rc = xen_host_pci_get_word(>real_device, PCI_COMMAND, );
 if (rc) {
-XEN_PT_ERR(d, "Failed to read PCI_COMMAND! (rc: %d)\n", rc);
+error_setg_errno(errp, errno, "Failed to read PCI_COMMAND");
 goto err_out;
 } else {
 val |= cmd;
 rc = xen_host_pci_set_word(>real_device, PCI_COMMAND, val);
 if (rc) {
-XEN_PT_ERR(d, "Failed to write PCI_COMMAND val=0x%x!(rc: 
%d)\n",
-   val, rc);
+error_setg_errno(errp, errno, "Failed to write PCI_COMMAND"
+ " val = 0x%x", val);
 goto err_out;

[Xen-devel] [libvirt test] 78659: tolerable FAIL - PUSHED

2016-01-21 Thread osstest service owner
flight 78659 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/78659/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt-raw 13 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-qcow2 13 guest-saverestorefail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  a6cfd22eba5f204deaf0fde3bd3ca96474d77c55
baseline version:
 libvirt  020135dc855712d09b92a1c7ba3341cd22010f18

Last test of basis78582  2016-01-20 04:21:30 Z1 days
Testing same since78659  2016-01-21 04:21:11 Z0 days1 attempts


People who touched revisions under test:
  Cole Robinson 
  Daniel P. Berrange 
  Ján Tomko 

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-armhf-armhf-libvirt-xsm fail
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-armhf-armhf-libvirt fail
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-armhf-armhf-libvirt-qcow2   fail
 test-armhf-armhf-libvirt-raw fail
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=libvirt
+ revision=a6cfd22eba5f204deaf0fde3bd3ca96474d77c55
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push libvirt 
a6cfd22eba5f204deaf0fde3bd3ca96474d77c55
+ branch=libvirt
+ 

Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.

2016-01-21 Thread Jan Beulich
>>> On 21.01.16 at 17:16,  wrote:
>>  On January 20, 2016 at 7:29 pm,   wrote:
>> >>> On 20.01.16 at 11:26,  wrote:
>> >> On January 15, 2016 at 9:10,  wrote:
>> >> >>> On 23.12.15 at 09:25,  wrote:
>> >> > @@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu,
>> >> >  return 0;
>> >> >  }
>> >> >
>> >> > +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
>> >> > + u16 seg, u8 bus, u8
>> >> > +devfn)
>> >> {
>> >> > +struct domain *d;
>> >> > +struct pci_dev *pdev;
>> >> > +
>> >> > +d = rcu_lock_domain_by_id(iommu->domid_map[did]);
>> >> > +ASSERT(d);
>> >>
>> >> Don't you need to acquire some lock in order to safely assert this?
>> >
>> > Referred to the other use case of 'rcu_lock_domain_by_id()', Xen
>> > checks whether The domain is 'NULL'.
>> > Could I also replace the 'ASSERT(d)' with
>> >   + If ( d == NULL )
>> >   +   return;
>> > ?
>> 
>> At a first glance this doesn't look right, but in the end that's something 
> you need
>> to answer.
>> 
> 
> Is it quite similar to whether the domain has been destroyed when Device-TLB 
> is flushing?

Going back to the original question I raised, you need to ask
yourself: Can d validly be NULL when you get here (e.g. due to
some other processor changing something in parallel)? If yes,
you can't ASSERT(), and the next question then is what exactly
it means that it got ripped off the table. The answer to that
determines whether simply returning would be okay.

>> >> Also note that unused slots hold zero, i.e. there's at least a
>> >> theoretical
>> > risk of
>> >> problems here when you don't also look at
>> >> iommu->domid_bitmap.
>> >>
>> > I am not clear how to fix this point. Do you have good idea?
>> > Add a lock to 'iommu->domid_bitmap'?
>> 
>> How would a lock help avoiding mistaking an unused slot to mean Dom0? As
>> already suggested - I think you simply need to consult the bitmap along with 
> the
>> domain ID array.
>> 
> 
> Try to get domain id with iommu->domid_map[did] ?

???

>> >> > +{
>> >> > +list_del(>domain_list);
>> >>
>> >> This should happen under pcidevs_lock - you need to either acquire it
>> >> or
>> >> ASSERT() it being held.
>> >>
>> >
>> > I should check whether it is under pcidevs_lock -- with
>> > spin_is_locked(_lock)
>> > If it is not under pcidevs_lock, I should acquire it.
>> > I think ASSERT() is not a good idea. Hypervisor acquires this lock and
>> > then remove the resource.
>> 
>> I don't understand this last sentence.
>> 
> For example: in 
> pci_remove_device()
> {
> ...
> spin_lock(_lock);
> ..
> iommu_remove_device()..
> ..
> spin_unlock(_lock);
> }
> 
> Device-TLB is maybe flush error in iommu_remove_device()..
> Then it is under pcidevs_lock..
> In this case, I need to check whether it is under pcidevs_lock.
> If not, I need to acquire the pcidevs_lock.

Ah, okay. But you can't use spin_is_locked() for that purpose.

>> >> > @@ -349,9 +416,18 @@ static int flush_iotlb_qi(
>> >> >  queue_invalidate_iotlb(iommu,
>> >> > type >>
>> >> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
>> >> > dw, did, size_order, 0, addr);
>> >> > +
>> >> > +/*
>> >> > + * Synchronize with hardware for invalidation request
>> descriptors
>> >> > + * submitted before Device-TLB invalidate descriptor.
>> >> > + */
>> >> > +rc = invalidate_sync(iommu);
>> >> > +if ( rc )
>> >> > + return rc;
>> >> > +
>> >> >  if ( flush_dev_iotlb )
>> >> >  ret = dev_invalidate_iotlb(iommu, did, addr, size_order,
>> type);
>> >> > -rc = invalidate_sync(iommu);
>> >> > +
>> >> >  if ( !ret )
>> >> >  ret = rc;
>> >> >  }
>> >>
>> >> This change is because of ...?
>> >>
>> > As Kevin's comments,
>> > I add 'dev_invalidate_iotlb_sync()' to dev_invalidate_iotlb().
>> > Then i can know which device is flush timeout.
>> 
>> I don't understand: How is your reply related to you moving up the 
> invocation of
>> invalidate_sync()?
> 
> This invalidate_sync() is to synchronize with hardware for invalidation 
> request descriptors
> submitted before Device-TLB invalidate descriptor.
> 
> If I don't add this invalidate_sync().. 
> dev_invalidate_iotlb_timeout() is not clear whether the flush time out is 
> about Device-TLB flush error or the other previous iotlb/iec/context flush 
> error. 

This being anything but obvious, maybe a brief comment would
help?

>> >> Plus logging the error code and device coordinates might turn out
>> >> helpful in such cases. But first of all - if there was a timeout,
>> >> we'd make it here only for Dom0. Perhaps the printing should move into an
>> else to the domain_crash()?
>> >> And if there was another error, the 

Re: [Xen-devel] [PATCH RFC 18/31] xen/x86: Improve disabling of features which have dependencies

2016-01-21 Thread Jan Beulich
>>> On 16.12.15 at 22:24,  wrote:
> --- a/xen/arch/x86/xstate.c
> +++ b/xen/arch/x86/xstate.c
> @@ -278,11 +278,16 @@ unsigned int xstate_ctxt_size(u64 xcr0)
>  /* Collect the information of processor's extended state */
>  void xstate_init(struct cpuinfo_x86 *c)
>  {
> +static bool_t __initdata use_xsave = 1;
> +
>  bool_t bsp = c == _cpu_data;
>  u32 eax, ebx, ecx, edx;
>  u64 feature_mask;
>  
> -if ( boot_cpu_data.cpuid_level < XSTATE_CPUID )
> +boolean_param("xsave", use_xsave);

Please make this sit next to the related variable. With that
Reviewed-by: Jan Beulich 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [xen-4.6-testing test] 78618: tolerable trouble: broken/fail/pass - PUSHED

2016-01-21 Thread osstest service owner
flight 78618 xen-4.6-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/78618/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-libvirt-xsm  3 host-install(3) broken REGR. vs. 78528
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail REGR. vs. 78528
 test-armhf-armhf-xl-rtds15 guest-start/debian.repeat fail blocked in 78528
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 78528
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 78528

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-armhf-armhf-libvirt 14 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-qcow2 13 guest-saverestorefail never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 13 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  1fd615aa0108490ffc558d27627f509183cbfdaf
baseline version:
 xen  6150df9f3f99ecbcbd9917002186d1d895b5602e

Last test of basis78528  2016-01-19 14:09:10 Z2 days
Testing same since78618  2016-01-20 13:17:45 Z1 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Feng Wu 
  Ian Campbell 
  Jan Beulich 
  Kevin Tian 
  Yang Zhang 

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-prev pass
 build-i386-prev  pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  pass
 build-i386-rumpuserxen   

[Xen-devel] [PULL 03/11] xen-hvm: Clean up xen_hvm_init() error handling

2016-01-21 Thread Stefano Stabellini
From: Markus Armbruster 

xen_hvm_init() returns -1 without cleaning up on some errors (harmless
long as the caller exit()s on error), dies with hw_error() on others.
hw_error() isn't approprate here.  Clean up to exit() on all errors.

Signed-off-by: Markus Armbruster 
Reviewed-by: Stefano Stabellini 
Signed-off-by: Stefano Stabellini 
---
 hw/i386/pc_piix.c|5 ++---
 hw/i386/pc_q35.c |5 ++---
 include/hw/xen/xen.h |2 +-
 xen-hvm-stub.c   |3 +--
 xen-hvm.c|   61 ++
 5 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index df2b824..db0ae9c 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -121,9 +121,8 @@ static void pc_init1(MachineState *machine,
 pcms->below_4g_mem_size = machine->ram_size;
 }
 
-if (xen_enabled() && xen_hvm_init(pcms, _memory) != 0) {
-fprintf(stderr, "xen hardware virtual machine initialisation 
failed\n");
-exit(1);
+if (xen_enabled()) {
+xen_hvm_init(pcms, _memory);
 }
 
 pc_cpus_init(pcms);
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 412b3cd..abbcbf9 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -111,9 +111,8 @@ static void pc_q35_init(MachineState *machine)
 pcms->below_4g_mem_size = machine->ram_size;
 }
 
-if (xen_enabled() && xen_hvm_init(pcms, _memory) != 0) {
-fprintf(stderr, "xen hardware virtual machine initialisation 
failed\n");
-exit(1);
+if (xen_enabled()) {
+xen_hvm_init(pcms, _memory);
 }
 
 pc_cpus_init(pcms);
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index e90931a..d07bc99 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -39,7 +39,7 @@ qemu_irq *xen_interrupt_controller_init(void);
 void xenstore_store_pv_console_info(int i, struct CharDriverState *chr);
 
 #if defined(NEED_CPU_H) && !defined(CONFIG_USER_ONLY)
-int xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory);
+void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory);
 void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size,
struct MemoryRegion *mr);
 void xen_modified_memory(ram_addr_t start, ram_addr_t length);
diff --git a/xen-hvm-stub.c b/xen-hvm-stub.c
index 6a39425..7ff0602 100644
--- a/xen-hvm-stub.c
+++ b/xen-hvm-stub.c
@@ -47,9 +47,8 @@ void xen_modified_memory(ram_addr_t start, ram_addr_t length)
 {
 }
 
-int xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
+void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
-return 0;
 }
 
 void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
diff --git a/xen-hvm.c b/xen-hvm.c
index 2a93390..cb7128c 100644
--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -17,6 +17,7 @@
 #include "qmp-commands.h"
 
 #include "sysemu/char.h"
+#include "qemu/error-report.h"
 #include "qemu/range.h"
 #include "sysemu/xen-mapcache.h"
 #include "trace.h"
@@ -1189,16 +1190,8 @@ static void xen_wakeup_notifier(Notifier *notifier, void 
*data)
 xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 0);
 }
 
-/* return 0 means OK, or -1 means critical issue -- will exit(1) */
-int xen_hvm_init(PCMachineState *pcms,
- MemoryRegion **ram_memory)
+void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 {
-/*
- * FIXME Returns -1 without cleaning up on some errors (harmless
- * as long as the caller exit()s on error), dies with hw_error()
- * on others.  hw_error() isn't approprate here.  Should probably
- * simply exit() on all errors.
- */
 int i, rc;
 xen_pfn_t ioreq_pfn;
 xen_pfn_t bufioreq_pfn;
@@ -1210,19 +1203,19 @@ int xen_hvm_init(PCMachineState *pcms,
 state->xce_handle = xen_xc_evtchn_open(NULL, 0);
 if (state->xce_handle == XC_HANDLER_INITIAL_VALUE) {
 perror("xen: event channel open");
-return -1;
+goto err;
 }
 
 state->xenstore = xs_daemon_open();
 if (state->xenstore == NULL) {
 perror("xen: xenstore open");
-return -1;
+goto err;
 }
 
 rc = xen_create_ioreq_server(xen_xc, xen_domid, >ioservid);
 if (rc < 0) {
 perror("xen: ioreq server create");
-return -1;
+goto err;
 }
 
 state->exit.notify = xen_exit_notifier;
@@ -1238,8 +1231,9 @@ int xen_hvm_init(PCMachineState *pcms,
_pfn, _pfn,
_evtchn);
 if (rc < 0) {
-hw_error("failed to get ioreq server info: error %d handle=" 
XC_INTERFACE_FMT,
- errno, xen_xc);
+error_report("failed to get ioreq server info: error %d handle=" 
XC_INTERFACE_FMT,
+ errno, xen_xc);
+goto err;
 }
 
 DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
@@ -1249,8 

  1   2   >