Re: [Xen-devel] [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
When you say "not tools", I take it to mean that you're not exposing that option through the libxl interface?

Yes.

tools/libxc/xc_domain.c:xc_assign_dt_device() most certainly does pass it in, and that's the level I'm talking about. Someone reviewing this patch series needs to know: when xc or libxl set NO_RDM, what will be the effect? The fact that libxc *shouldn't* set NO_RDM for PCI devices doesn't mean it won't happen.

Now, looking at the end of the series and grepping for XEN_DOMCTL_DEV.*RDM, these values are *read and acted on* in exactly two places:

xen/arch/x86/mm/p2m.c: The whole point of this series; if the p2m is occupied already, and flag == RDM_STRICT, return an error; otherwise ignore it.

xen/drivers/passthrough/device_tree.c: If flag != NO_RDM, return an error.

So the meaning of the flags is:

For pci devices:
- RDM_RELAXED, NO_RDM: ignore conflicts in set_identity_p2m_entry()
- RDM_STRICT: error on conflicts in set_identity_p2m_entry()

For dt devices:
- Error if not NO_RDM

Correct.

It doesn't look to me like the NO_RDM setting actually adds any semantic meaning. What I see in the list of references you gave is Julien saying this:

I would also add a XEN_DOMCTL_DEV_NO_RDM that would be used for non-PCI assignment.

It looks a bit like what you did is said, "Well, Julien asked for a NO_RDM setting, so here's a NO_RDM setting." Which, while perhaps understandable, doesn't make the value any more useful.

It seems to me that the real problem is that you had two values to begin with, rather than actually having flags (as the name would imply). This is what I would suggest. Make a single flag:

#define _XEN_DOMCTL_DEV_RDM_RELAXED 0
#define XEN_DOMCTL_DEV_RDM_RELAXED  (1U << _XEN_DOMCTL_DEV_RDM_RELAXED)

Then make the meaning of the flags as follows:

* for pci devices:
- RDM_RELAXED flag SET: ignore conflicts in set_identity_p2m_entry()
- RDM_RELAXED flag CLEAR: error on conflicts in set_identity_p2m_entry()

No problem.
* for dt devices:
- Ignore this flag entirely

But we still need to pass a flag to assign_device(), like this:

diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 5d3842a..a182487 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev)
 goto fail;
 }
-rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
+rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev),
+ XEN_DOMCTL_DEV_RDM_RELAXED);
 if ( rc )
 goto fail;

Or rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev), 0)?

Thanks
Tiejun
___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 00/27] Libxl migration v2
Hi Andrew,

Are there any updates of this series that I can check out and rebase mine onto? :)

On 06/15/2015 09:44 PM, Andrew Cooper wrote:

This series adds support for the libxl migration v2 stream, and untangles the existing layering violations of the toolstack and qemu records. At the end of the series, legacy migration is no longer used.

Note: Remus support is broken and (RFC) fixed in separate patches in this series. It was too tangled to fix in a bisectable fashion. Plain suspend/migrate/resume however is (should be) bisectable along the entire series.

There are a couple of outstanding questions:

1) What to do about the toolstack/xenstore record. It is currently being passed around as a blob, but it might be better to split it out.

2) What (if any) ABI/API qualifications are needed? (Particularly in reference to patch 21)

The Remus code is untested by me, but is hopefully in the correct ballpark. All other combinations of suspend/migrate/resume have been tested with PV and HVM guests (qemu-trad and qemu-upstream), including 32 -> 64 bit migration (which was the underlying bug causing us to write migration v2 in the first place).

There are some further improvements which could be made. In particular, it appears that sending the toolstack record on each checkpoint is redundant, and there is certainly room for some more pruning of the legacy migration code.

Anyway, thoughts/comments welcome. Please test!
~Andrew

Andrew Cooper (22):
  tools/libxl: Fix libxl__ev_child_inuse() check for not-yet-initialised children
  tools/libxc: Always compile the compat qemu variables into xc_sr_context
  tools/libxl: Stash all restore parameters in domain_create_state
  tools/xl: Mandatory flag indicating the format of the migration stream
  tools/libxl: Introduce ROUNDUP()
  tools/libxl: Extra APIs for the save helper
  tools/libxl: Pass restore_fd as a parameter to libxl__xc_domain_restore()
  docs: Libxl migration v2 stream specification
  tools/python: Libxc migration v2 infrastructure
  tools/python: Libxl migration v2 infrastructure
  tools/python: Verification utility for v2 stream spec compliance
  tools/python: Conversion utility for legacy migration streams
  tools/libxl: Support converting a legacy stream to a v2 stream
  tools/libxl: Convert a legacy stream if needed
  tools/libxc+libxl+xl: Restore v2 streams
  tools/libxc+libxl+xl: Save v2 streams
  docs/libxl: [RFC] Introduce CHECKPOINT_END to support migration v2 remus streams
  tools/libxl: [RFC] Write checkpoint records into the stream
  tools/libx{c,l}: [RFC] Introduce restore_callbacks.checkpoint()
  tools/libxl: [RFC] Handle checkpoint records in a libxl migration v2 stream
  tools/libxc: Drop all XG_LIBXL_HVM_COMPAT code from libxc
  tools/libxl: Drop all knowledge of toolstack callbacks

Ian Jackson (2):
  libxl: cancellation: Preparations for save/restore cancellation
  libxl: cancellation: Handle SIGTERM in save/restore helper

Ross Lagerwall (3):
  tools/libxl: Migration v2 stream format
  tools/libxl: Infrastructure for reading a libxl migration v2 stream
  tools/libxl: Infrastructure for writing a v2 stream

 docs/specs/libxl-migration-stream.pandoc      | 218
 tools/libxc/Makefile                          |   2 -
 tools/libxc/include/xenguest.h                |   3 +
 tools/libxc/xc_sr_common.h                    |   5 -
 tools/libxc/xc_sr_restore.c                   |  33 +-
 tools/libxc/xc_sr_restore_x86_hvm.c           | 124 -
 tools/libxc/xc_sr_save_x86_hvm.c              |  36 --
 tools/libxl/Makefile                          |   2 +
 tools/libxl/libxl_aoutils.c                   |   7 +
 tools/libxl/libxl_convert_callout.c           | 146 ++
 tools/libxl/libxl_create.c                    |  80 +--
 tools/libxl/libxl_dom.c                       |  61 +--
 tools/libxl/libxl_internal.h                  | 140 -
 tools/libxl/libxl_save_callout.c              |  63 +--
 tools/libxl/libxl_save_helper.c               |  95 ++--
 tools/libxl/libxl_save_msgs_gen.pl            |   9 +-
 tools/libxl/libxl_sr_stream_format.h          |  58 +++
 tools/libxl/libxl_stream_read.c               | 663
 tools/libxl/libxl_stream_write.c              | 640 +++
 tools/libxl/libxl_types.idl                   |   2 +
 tools/libxl/xl_cmdimpl.c                      |   9 +-
 tools/python/Makefile                         |   4 +
 tools/python/scripts/convert-legacy-stream.py | 683 +
 tools/python/scripts/verify-stream-v2.py      | 174 +++
 tools/python/setup.py                         |   1 +
 tools/python/xen/migration/libxc.py           | 446
 tools/python/xen/migration/libxl.py           | 199 +++
 tools/python/xen/migration/tests.py           |  54 ++
 tools/python/xen/migration/verify.py          |  37 ++
 29 files changed, 3638
Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)
On 01/07/15 at 18:15, Boris Ostrovsky wrote: On 07/01/2015 11:57 AM, Ian Campbell wrote:

The next Xen technical call will be at: Wed 8 Jul 17:00:00 BST 2015 `date -d @1436371200` See http://lists.xen.org/archives/html/xen-devel/2015-01/msg00414.html for more information on the call. Please let me know (CC-ing the list) any topics which you would like to discuss. It might be useful to include: * References to any relevant/recent mailing list threads; * Other people who you think should be involved in the discussion (and CC them); If you would like to attend then please let me know so I can send you the dial-in details.

Given that there is a fair amount of PVH-related work happening now (Roger's, Elena's and mine), perhaps we should have a discussion about that to see where we are going? Andrew, Tim, Roger, Jan (if he is back from vacation), Elena, Konrad and David would be good to have present.

I'm sorry, but I will be on vacation next week, hiking in the mountains without Internet or cell phone coverage, so I won't be able to join the meeting. I will be back on the 20th; maybe we can arrange another of these PVH meetings on the next technical call in early August?

Roger.
[Xen-devel] [rumpuserxen test] 59019: regressions - FAIL
flight 59019 rumpuserxen real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59019/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 build-amd64-rumpuserxen       5 rumpuserxen-build   fail REGR. vs. 33866
 build-i386-rumpuserxen        5 rumpuserxen-build   fail REGR. vs. 33866

Tests which did not succeed, but are not blocking:
 test-amd64-i386-rumpuserxen-i386    1 build-check(1)   blocked n/a
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a

version targeted for testing:
 rumpuserxen 3b91e44996ea6ae1276bce1cc44f38701c53ee6f
baseline version:
 rumpuserxen 30d72f3fc5e35cd53afd82c8179cc0e0b11146ad

People who touched revisions under test:
 Antti Kantee po...@iki.fi
 Ian Jackson ian.jack...@eu.citrix.com
 Martin Lucina mar...@lucina.net
 Wei Liu l...@liuw.name

jobs:
 build-amd64-xsm                     pass
 build-i386-xsm                      pass
 build-amd64                         pass
 build-i386                          pass
 build-amd64-pvops                   pass
 build-i386-pvops                    pass
 build-amd64-rumpuserxen             fail
 build-i386-rumpuserxen              fail
 test-amd64-amd64-rumpuserxen-amd64  blocked
 test-amd64-i386-rumpuserxen-i386    blocked

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs
Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Not pushing.

(No revision log; it would be 535 lines long.)
Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling
On Thu, 2015-07-02 at 04:32 +0000, Wu, Feng wrote:

-----Original Message----- From: Dario Faggioli [mailto:dario.faggi...@citrix.com] Sent: Tuesday, June 30, 2015 10:58 AM To: Wu, Feng Cc: xen-devel; k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Tian, Kevin; Zhang, Yang Z; george.dun...@eu.citrix.com; Wu, Feng Subject: Re: Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling

On Mon, 2015-06-29 at 18:36 +0100, Andrew Cooper wrote:

The basic idea here is:

1. When the vCPU's state is RUNSTATE_running:
- Set 'NV' to 'Notification Vector'.
- Clear 'SN' to accept PI.
- Set 'NDST' to the right pCPU.

2. When the vCPU's state is RUNSTATE_blocked:
- Set 'NV' to 'Wake-up Vector', so we can wake up the related vCPU when a posted interrupt happens for it.
- Clear 'SN' to accept PI.

3. When the vCPU's state is RUNSTATE_runnable/RUNSTATE_offline:
- Set 'SN' to suppress non-urgent interrupts. (Currently, we only support non-urgent interrupts.)
- Set 'NV' back to 'Notification Vector' if needed.

It might be me, but it feels a bit odd to see RUNSTATE-s being (ab)used directly for this, as it does feel odd to see arch-specific code being added in there. Can't this be done in context_switch(), which is already architecture specific? I was thinking of something very similar to what has been done for PSR, i.e., on x86, put everything in __context_switch(). Looking at who's prev and who's next, and at what pause_flags each has set, you should be able to implement all of the above logic. Or am I missing something?

As mentioned in the description of this patch, here we need to do something when the vCPU's state is changed; can we get the state transition in __context_switch(), such as running -> blocking?

Well, in the patch description you mention how you've done it, so of course it mentions runstates. That does not necessarily mean we need to do something in vcpu_runstate_change().
Actually, that's exactly what I'm asking: can you check whether this thing that you need doing can be done somewhere else than in vcpu_runstate_change()? In fact, looking at how, where and what for runstates are used, that really does not feel right, at least to me.

What you seem to be interested in is whether a vCPU blocks and/or unblocks. Runstates are an abstraction, built up on top of (mostly) pause_flags, like _VPF_blocked (look at how runstate is updated). I think you should not build on top of such an abstraction, but on top of pause_flags directly. I had a quick look, and it indeed seems to me that you can get all you need from there too. It might even result in the code looking simpler (but that's of course hard to tell without actually trying).

In fact, inside the context switching code, you already know that prev was running so, if it has the proper flag set, it means it's blocking (i.e., going to RUNSTATE_blocked, in runstates language); if not, it maybe is being preempted (i.e., going to RUNSTATE_runnable). Therefore, you can enact all your logic, even without any need to keep track of the previous runstate, and without needing to build up a full state machine and looking at all possible transitions.

So, can you have a look at whether that solution can fly? Because, if it does, I think it would be a lot better.

Regards, Dario

-- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)
On Thu, 2015-07-02 at 09:32 +0200, Roger Pau Monné wrote: On 01/07/15 at 18:15, Boris Ostrovsky wrote: On 07/01/2015 11:57 AM, Ian Campbell wrote:

The next Xen technical call will be at: Wed 8 Jul 17:00:00 BST 2015 `date -d @1436371200` See http://lists.xen.org/archives/html/xen-devel/2015-01/msg00414.html for more information on the call. Please let me know (CC-ing the list) any topics which you would like to discuss. It might be useful to include: * References to any relevant/recent mailing list threads; * Other people who you think should be involved in the discussion (and CC them); If you would like to attend then please let me know so I can send you the dial-in details.

Given that there is a fair amount of PVH-related work happening now (Roger's, Elena's and mine), perhaps we should have a discussion about that to see where we are going? Andrew, Tim, Roger, Jan (if he is back from vacation), Elena, Konrad and David would be good to have present.

Yes, I think this topic would indeed be a worthwhile subject for discussion. I think there is a toolstack element to all this as well, so would it be a good idea to have Ian J, Stefano and Wei in addition to me (who will be there to run the call in any case)? I'd probably mark them in as optional participants for scheduling purposes.

I'm sorry, but I will be on vacation next week, hiking in the mountains without Internet or cell phone coverage, so I won't be able to join the meeting.

I think you are one of the critical participants in the discussion, i.e. we can't go ahead without you.

I will be back on the 20th; maybe we can arrange another of these PVH meetings on the next technical call in early August?

The August call is scheduled for 12 August, which is 6 weeks from yesterday. We could either go for that or, if it is too far out, we could push the July iteration of the call back two weeks to Wednesday 22 July (so 3 weeks from yesterday). (A one week slip would still land in your vacation time.)
There is also the devsummit in the week of 17 August (~7 weeks' time), which is an opportunity for an f2f. Shall I put up a poll of some sort to gather preferred timeslot options out of that set? Ian.
Re: [Xen-devel] [RFC PATCH v3 10/18] xen/arm: ITS: Add APIs to add and assign device
On Thu, 2015-07-02 at 14:10 +0530, Vijay Kilari wrote: On Mon, Jun 29, 2015 at 5:59 PM, Ian Campbell ian.campb...@citrix.com wrote: On Mon, 2015-06-22 at 17:31 +0530, vijay.kil...@gmail.com wrote:

+/* Device assignment. Should be called from pci_device_add */ +int its_add_device(struct domain *d, u32 devid) +{

Prior to the PCI series landing, and to enable dom0 to use ITS, it might be possible to call this from xen/arch/arm/platforms/thunderx.c via the specific_mappings platform hook, which would also expose the PCI controller to dom0 via a series of specific mmio mappings (look at xen/arch/arm/platforms/xgene-storm.c for the sort of thing I mean). That would, I think, give basic PCI functionality for dom0 (i.e. allowing us to boot on thunderx) and decouple things from the PCI series somewhat, which ought to make things easier overall IMHO.

In the ThunderX case, the PCI RC's MMIO mappings are parsed to find devices. How do we know the device ids upfront to call its_add_device?

Ah yes, this approach probably doesn't work for ITS. Which is a shame. Ian.
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On Thu, 2015-07-02 at 08:58 +0000, Wu, Feng wrote:

-----Original Message----- From: Dario Faggioli [mailto:dario.faggi...@citrix.com]

This is the third time that I ask: (1) whether it is possible to have more vcpus queued on one pcpu PI blocked list with desc.on (I really believe it is);

I think it is, please see the following scenario: when the CPU masks interrupts, and an external interrupt occurs for the assigned device while the target vCPU2 is blocked, the wakeup notification event handler has no chance to run; after a while, another wakeup notification event for vCPU4, blocking on the same pCPU, occurs; after the CPU unmasks interrupts, the wakeup notification handler gets called. Then we get: vCPU2, desc.on = 1 and vCPU4, desc.on = 1. Then in the handler we need to kick both of them.

Ok, first of all, thanks for answering! :-) And yes, this makes sense.

(2) if yes, whether it is TheRightThing(TM) to kick all of them, as soon as any notification arrives, instead of putting together a mechanism for kicking only a specific one.

Why can't we kick all of them? 'desc.on = 1' means there is a pending interrupt; when we meet this condition, kicking the related vCPU should be the right thing to do.

Right, I see it now. I felt like I was missing something, and that's why I was asking you to elaborate a bit more. Thanks again for having done this. I was missing/forgetting half of the way desc.on is actually handled, sorry for this.

BTW, I'm finding it hard reading this series from the archives; there appear to be some threading issues and some missing messages. I also don't have it in my inbox, because my filters failed to spot and flag it properly. If you send a new version, please Cc me, so it will be easier for me to look at all the patches and provide a more helpful review.

Thanks and Regards, Dario

-- This happens because I choose it to happen!
(Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [PATCH v3 2/6] libxl: do not add a vkb backend to hvm guests
On Wed, 1 Jul 2015, Konrad Rzeszutek Wilk wrote: On Wed, Jul 01, 2015 at 11:29:46AM +0100, Stefano Stabellini wrote: On Tue, 30 Jun 2015, Konrad Rzeszutek Wilk wrote: On Tue, Jun 30, 2015 at 03:13:53PM +0100, Ian Campbell wrote: On Tue, 2015-06-30 at 15:02 +0100, Stefano Stabellini wrote: On Tue, 30 Jun 2015, Ian Campbell wrote: On Tue, 2015-06-30 at 12:21 +0100, Stefano Stabellini wrote: On Tue, 30 Jun 2015, Ian Campbell wrote: On Mon, 2015-06-29 at 18:59 +0100, Stefano Stabellini wrote: On Thu, 25 Jun 2015, Ian Campbell wrote: On Tue, 2015-06-16 at 16:39 +0100, Stefano Stabellini wrote: On Tue, 16 Jun 2015, Wei Liu wrote: On Wed, Jun 10, 2015 at 11:09:50AM +0100, Stefano Stabellini wrote:

When QEMU restricts its xenstore connection, it cannot provide PV backends. A separate QEMU instance is required to provide PV backends in userspace, such as qdisk. With two separate instances, it is not possible to take advantage of vkb for mouse and keyboard, as the QEMU that emulates the graphic card (the device model) would be separate from the QEMU running the vkb backend (PV QEMU).

The question is how this would affect the non-split setup.

vkb is useful because emulating usb forces QEMU to wake up more often. However there is no way around it. Does pvfb+vkb continue to work due to code somewhere else?

Yes, it continues to work as usual for PV guests.

Do we think anyone will actually be using emulated VGA + PV input devices?

VGA + PV input only works with Linux and is only useful for power efficiency, because if you disable usb emulation in QEMU, then QEMU is able to wake up less often. Given that usb emulation is still on by default, I don't think that this change will have a big impact.

My question was whether we thought anyone would be using this non-default configuration, not what the impact on the default is. You gave a good reason why people might be using this facility; do you think anyone is actually using it?

I don't know of anybody using it.
I don't think we made clear enough how to use this non-default configuration and its advantages for users to go out of their way to use it.

That's good enough for me, thanks. Can I add your acked-by?

If you put some distillation of the reasoning given in this subthread for why we think we can get away with it into the commit message, then yes.

Why don't we also make the Linux code not expose this driver for HVM guests?

I had a go at this last year (can't find the link) as it would unduly

And the link: http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg00160.html

cause the Linux kernel to take an extra 30 seconds to boot. That is because 'xend' by default exposes the PV configuration even for HVM guests - and of course there are no PV drivers (as the VGA in QEMU is enabled).

But even with xend it only happens when a vfb line is in the config file, right? That's why we didn't fix it back when the issue was reported, if I remember correctly.

Both. If you had 'vnc' or 'vfb' it would set up the 'vfb' key.

The only use case I had was for ARM - where there is no VGA - and the patch I think I had just disabled xen-fbfront under x86 HVM.

Yeah, we need xen-fbfront for ARM. Given that xen-fbfront is likely to go away for HVM guests, I wouldn't be opposed to stopping the driver initialization in Linux on x86/HVM. Unless Roger's work on HVMlite is going to need xen-fbfront again, but in that case we'll be able to distinguish a regular HVM guest from an HVMlite guest, I think.

Correct. Right now 'xen_pvh_domain' is based on it being PV and auto-translate. That whole thing will need some help. But I am looking at the xen-fbfront.c driver and it might be that I had already fixed this issue! (inadvertently, it seems)

51c71a3bbaca868043cc45b3ad3786dd48a90235
Author: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Date: Tue Nov 26 15:05:40 2013 -0500

xen/pvhvm: If xen_platform_pci=0 is set don't blow up (v4).
..
- if running in HVM, check if user wanted 'xen_emul_unplug=never', in which case bail out and don't load any PV drivers.
- if running in HVM, and if PCI device 5853:0001 (xen_platform_pci) does not exist, then bail out and not load PV drivers.
- (v2)
Re: [Xen-devel] [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
On 2015/7/2 18:28, George Dunlap wrote: On 07/02/2015 11:01 AM, Chen, Tiejun wrote:

1. After spending yet another half hour doing research, I haven't found any discussion that concluded we should have the global policy override the local policy.

I also took some time to go back and check this point, and indeed it is not in that public design. And as I mentioned in another email in this thread, I also talked to Kevin about this issue, and it looks like this was just concluded from our internal discussion; he didn't post it in the v2 design again because, as you know, that design is at a high level. As I recall, those discussions couldn't cover everything at that moment, because the thinking was that we'd better post preliminary patches to further discuss things, since this is a really complicated case. So afterwards I sent out two RFC revisions to help everyone finalize a good solution. And I can confirm the current policy has been the same since the first RFC, but we didn't see any advice to the contrary until now.

Probably because the reviewers all assumed that the design draft had been followed, and you didn't make it clear that you'd changed it.

Shouldn't the patch description already clarify this point? I also comment on this point in the code. After all, we have already had several rounds of technical review, so it's a little hard to believe it would have been missed.

2. The only discussion I *did* find has *you yourself* saying that the per-device setting should override the global setting, not once, but twice; and nobody contradicting you. Maybe there is somewhere else a discussion where this was changed; but I've already spent half an hour this morning looking at where you said it was (the v2 design discussion), and found the opposite -- just as I remembered. I'm not going to look anymore. You have now caused me to waste an awful lot of time on this series that could profitably have been used elsewhere.
Sorry about this, but we already have 2 RFC revisions and 4 revisions without RFC, and some patches are already Acked; should we really overturn this policy right now?

First of all, I think it's easy to change.

I agree, but what I'm saying is that this involves our policy. We shouldn't change this sort of thing unless all the associated maintainers are in agreement with you.

Even if it weren't, I already said that I'd be OK with accepting the patch series with the existing override semantics, and without the default semantics, *if* it were renamed to make it clear what was going on. But, for future reference, I am not going to approve an interface I think is misleading or wrong -- particularly one like the xl interface, which we want to avoid changing if possible -- just because time is short. One of my own features, HVM USB pass-through, has narrowly missed two releases (including the current one) because we wanted to be careful to get the interface right.

I admit I should consider everything as carefully as you do. Again, I wasn't involved in the v2 design, so I don't want to add to the confusion with my reply.

This is your feature, so it is your responsibility to understand and explain why you are doing what you are doing, if only to say "Jan wanted

Maybe you remember I posted v1, but according to some feedback that was not the best design to show this implementation, so Kevin issued the v2 revision and had a wider discussion with you guys. Since then I have just followed that version. I mean I didn't pursue these high-level questions further, since either policy is fine with me; IMO the two approaches are interchangeable.

X to happen because of Y [see $ref]."

So this is why I said you'd better ask this of Kevin or Jan, since I can't decide what's next at this point.

Let me say that again: I don't care whether anyone pulled rank and ordered you to do something a certain way. YOU are the one submitting this patch.
That means YOU are responsible for understanding why they want it that way, and YOU are responsible for justifying it to other people. If you don't understand it at all, it's YOUR responsibility to get them to explain it, not mine to chase them down.

As I said above, I initially thought the two approaches were interchangeable, and just a matter of preference. So I picked up the patch descriptions reviewed in public to say this is our expectation. But it looks like this does not satisfy you, so I don't think I can explain it appropriately myself, which is why I asked you to ping Jan or Kevin for a formal answer. Is this procedure not reasonable?

Thanks
Tiejun
Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access
On 07/02/2015 12:25 PM, Tim Deegan wrote: At 12:09 +0100 on 02 Jul (1435838956), Andrew Cooper wrote: On 02/07/15 11:48, George Dunlap wrote:

Now in p2m_set_mem_access(), rather than just using an unsigned long in the loop iterating over gfns, you do this thing where you convert gfn_t to unsigned long, add one, and then convert it back to gfn_t again. I can't see any comments in v3 that suggest you doing that, and it seems a bit clunky. Is that really necessary? Wouldn't it be better to declare a local variable? I'm not strongly opinionated on this one, it just seems a bit strange. Everything else looks good, thanks.

Looping over {g,m,p}fn_t's is indeed awkward, as the compiler tricks for typesafety don't allow for simply adding 1 to a typesafe variable. In a case like this, I think it is acceptable to keep an unsigned long shadow variable and manipulate it as a plain integer. The eventual _gfn() required to pass it further down the callchain will help to visually re-enforce the appropriate type. After all, the entire point of these typesafes is to try and avoid accidentally mixing up the different address spaces, but a function which takes a typesafe, loops over a subset and passes the same typesafe further down can probably be trusted to DTRT, catching errors at review time. Off the top of my head, the only functions which would normally expect to mix and match the typesafes are the pagetable walking ones.

It should be easy enough to extend the macros to define a gfn_inc(gfn_t) operator for this kind of thing.

I was thinking that -- although in this case you'd still need to un-pack it to do the loop exit conditional. To really make things pretty you'd want a for_gfn_range() macro or something like that that takes a start gfn and a number. But that's really starting to be feature creep for this patch, which is why I didn't want to suggest it on v4. :-)

-George
Re: [Xen-devel] Xen 4.6 Development Update (2 WEEKS TO FREEZE, important information in preamble)
Hello Ian, Julien,

On 07/02/2015 11:35 AM, Ian Campbell wrote: On Wed, 2015-07-01 at 11:17 +0100, Julien Grall wrote:

As suggested by Wei at the top of his mail [1], can you please CC only relevant people and avoid replying to all?

It seems that many people are unable to follow these simple instructions. Wei, perhaps you could stop CCing people who inappropriately do not trim their quotes or the CC list in the future. If they cannot do us the courtesy of doing so, I don't see why they should receive a courtesy copy of the status mail. A somewhat less aggressive approach might be to use Bcc instead of Cc for the bulk of people (i.e. anyone who needn't be cc-d on every reply). The failure case of someone who cannot read simple instructions then becomes a lack of CCs rather than a plethora of unwanted Ccs.

First of all, let me apologize for doing this in the past; I'll certainly remember not to let it happen again. Second, I'd like to point out that, while I cannot speak for everyone (so maybe it's just me), I find this statement: "please trim your quotes when replying, and also trim the CC list if necessary" a bit ambiguous. The quotes part is obvious (and not that many people have omitted to do that), but to be honest I haven't been clear on who is supposed to be in the trimmed CC list: the maintainers of the code I'm touching with my series and Wei? There doesn't seem to be a clear rule about who should be replied to (or maybe there is and I've missed it? If so, could you please point it out?). Maybe clearing this up could help with this problem in the future.

Thanks, Razvan
Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access
On 02/07/15 11:48, George Dunlap wrote: On 06/29/2015 04:45 PM, Vitaly Kuznetsov wrote: 'pfn' and 'start_pfn' are ambiguous, both these functions expect GFNs as input. On x86 the interface of p2m_set_mem_access() in p2m.c doesn't match the declaration in p2m-common.h as 'pfn' is being used instead of 'start_pfn'. On ARM both p2m_set_mem_access and p2m_get_mem_access interfaces don't match declarations from p2m-common.h: p2m_set_mem_access uses 'pfn' instead of 'start_pfn' and p2m_get_mem_access uses 'gpfn' instead of 'pfn'. Convert p2m_get_mem_access/p2m_set_mem_access (and __p2m_get_mem_access on ARM) interfaces to using gfn_t instead of unsigned long and update all users of these functions. There is also an issue in p2m_get_mem_access on x86: the 'gfn' parameter passed to gfn_lock/gfn_unlock is not defined. This code compiles only because of a coincidence: gfn_lock/gfn_unlock are currently macros which don't use their second argument. Signed-off-by: Vitaly Kuznetsov vkuzn...@redhat.com --- Changes since v3: - Comment coding style fix [Razvan Cojocaru] - Use INVALID_GFN instead of ~0 and -1 [Andrew Cooper] - Convert p2m_get_mem_access/p2m_set_mem_access interfaces to using gfn_t [Andrew Cooper] But you missed a change... @@ -1600,9 +1600,11 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla, return (p2ma == p2m_access_n2rwx); } -/* Set access type for a region of pfns. - * If start_pfn == -1ul, sets the default access type */ -long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr, +/* + * Set access type for a region of gfns. + * If gfn == INVALID_GFN, sets the default access type.
+ */ +long p2m_set_mem_access(struct domain *d, gfn_t gfn, uint32_t nr, uint32_t start, uint32_t mask, xenmem_access_t access) { struct p2m_domain *p2m = p2m_get_hostp2m(d); @@ -1638,18 +1640,19 @@ long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr, return -EINVAL; } -/* If request to set default access */ -if ( pfn == ~0ul ) +/* If request to set default access. */ +if ( gfn_x(gfn) == INVALID_GFN ) { p2m->default_access = a; return 0; } p2m_lock(p2m); -for ( pfn += start; nr > start; ++pfn ) +for ( gfn = _gfn(gfn_x(gfn) + start); nr > start; + gfn = _gfn(gfn_x(gfn) + 1) ) Now in p2m_set_mem_access(), rather than just using an unsigned long in the loop iterating over gfns, you do this thing where you convert gfn_t to unsigned long, add one, and then convert it back to gfn_t again. I can't see any comments on v3 that suggested doing that, and it seems a bit clunky. Is that really necessary? Wouldn't it be better to declare a local variable? I'm not strongly opinionated on this one, it just seems a bit strange. Everything else looks good, thanks. Looping over {g,m,p}fn_t's is indeed awkward, as the compiler tricks for type safety don't allow simply adding 1 to a typesafe variable. In cases like this, I think it is acceptable to keep an unsigned long shadow variable and manipulate it as a plain integer. The eventual _gfn() required to pass it further down the call chain will help to visually reinforce the appropriate type. After all, the entire point of these typesafes is to try to avoid accidentally mixing up the different address spaces, but a function which takes a typesafe, loops over a subset and passes the same typesafe further down can probably be trusted to DTRT, catching errors at review time. Off the top of my head, the only functions which would normally expect to mix and match the typesafes are the pagetable-walking ones. ~Andrew
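The shadow-variable pattern Andrew describes can be illustrated outside of Xen. This is a minimal sketch: the gfn_t wrapper and _gfn()/gfn_x() here are stand-ins for Xen's real typesafe macro machinery, and sum_gfn_range() is a made-up function purely to show the loop shape.

```c
#include <assert.h>

/* Stand-in for Xen's typesafe wrapper; the real TYPE_SAFE machinery
 * differs, this only mimics its shape. */
typedef struct { unsigned long g; } gfn_t;
#define _gfn(x)   ((gfn_t) { (x) })
#define gfn_x(_g) ((_g).g)

/* The pattern under discussion: keep an unsigned long shadow variable
 * for the loop, and re-wrap with _gfn() only when handing the value
 * further down the call chain. */
static unsigned long sum_gfn_range(gfn_t start, unsigned int nr)
{
    unsigned long gfn = gfn_x(start);   /* shadow variable */
    unsigned long total = 0;
    unsigned int i;

    for ( i = 0; i < nr; i++, gfn++ )
        total += gfn_x(_gfn(gfn));      /* _gfn() re-asserts the type */

    return total;
}
```

The _gfn() at the point of use is what visually reinforces which address space the plain integer belongs to.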
Re: [Xen-devel] [RFC PATCH v3 10/18] xen/arm: ITS: Add APIs to add and assign device
On Mon, Jun 29, 2015 at 5:59 PM, Ian Campbell ian.campb...@citrix.com wrote: On Mon, 2015-06-22 at 17:31 +0530, vijay.kil...@gmail.com wrote: +/* Device assignment. Should be called from pci_device_add */ +int its_add_device(struct domain *d, u32 devid) +{ Prior to the PCI series landing, and to enable dom0 to use ITS, it might be possible to call this from xen/arch/arm/platforms/thunderx.c via the specific_mappings platform hook, which would also expose the PCI controller to dom0 via a series of specific mmio mappings (look at xen/arch/arm/platforms/xgene-storm.c for the sort of thing I mean). That would, I think, give basic PCI functionality for dom0 (i.e. allowing us to boot on thunderx) and decouple things from the PCI series somewhat, which ought to make things easier overall IMHO. In the ThunderX case, the MMIO-mapped PCI RC is parsed to find devices. How do we know device IDs upfront in order to call its_add_device? Ian.
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
-Original Message- From: Dario Faggioli [mailto:dario.faggi...@citrix.com] Sent: Thursday, July 02, 2015 4:30 PM To: Wu, Feng Cc: Andrew Cooper; xen-devel@lists.xen.org; Zhang, Yang Z; george.dun...@eu.citrix.com; Tian, Kevin; k...@xen.org; jbeul...@suse.com Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote: +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu), +pi_blocked_vcpu_list) +if ( vmx->pi_desc.on ) +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet); There is a logical bug here. If we have two NVs delivered to this pcpu, we will kick the first vcpu twice. On finding desc.on, a kick should be scheduled, then the vcpu removed from this list. With desc.on set, we know for certain that another NV will not arrive for it until it has been scheduled again and the interrupt posted. Yes, that seems a possible issue (and one that should indeed be avoided). I'm still unsure about the one that I raised myself but, if it is possible to have more than one vcpu in a pcpu list, with desc.on==true, then it looks to me that we kick all of them, for each notification. Add to that what Andrew spotted: if there are a bunch of vcpus queued with desc.on==true, and a bunch of notifications arrive before the tasklet gets executed, we'll be kicking the whole bunch of them a bunch of times! :-/ As Andrew mentioned, removing the vCPUs with desc.on = true from the list can avoid kicking vCPUs multiple times. It avoids kicking vcpus multiple times if more than one notification arrives, yes. It is, therefore, not effective in making sure that, even with only one notification, you only kick the interested vcpu.
This is the third time that I ask: (1) whether it is possible to have more vcpus queued on one pcpu PI blocked list with desc.on (I really believe it is); I think it is, please see the following scenario: when the cpu masks interrupts, and an external interrupt occurs for the assigned device while the target vCPU2 is blocked, the wakeup notification event handler has no chance to run; after a while, another wakeup notification event occurs for vCPU4, blocking on the same pCPU; after the cpu unmasks interrupts, the wakeup notification handler gets called. Then we get: vCPU2, desc.on = 1 and vCPU4, desc.on = 1. Then in the handler we need to kick both of them. (2) if yes, whether it is TheRightThing(TM) to kick all of them, as soon as any notification arrives, instead of putting together a mechanism for kicking only a specific one. Why can't we kick all of them? 'desc.on = 1' means there is a pending interrupt; when we meet this condition, kicking the related vCPU should be the right thing to do. Thanks, Feng The fact that you're not answering is not so much of a big deal for me... I'll just keep asking! :-D Regards, Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK)
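Andrew's suggested fix (kick, then unlink, so a later notification cannot kick the same vCPU again) can be sketched with a plain singly linked list standing in for Xen's struct list_head and tasklets; struct vcpu_pi, the kicks counter, and pi_wakeup_handler() are all illustrative stand-ins, not Xen code.

```c
#include <assert.h>
#include <stddef.h>

struct vcpu_pi {
    int on;                 /* stands in for pi_desc.on: interrupt pending */
    int kicks;              /* counts how many times we "woke" this vCPU */
    struct vcpu_pi *next;
};

/* Walk the per-pCPU blocked list; kick and UNLINK every vCPU with a
 * pending posted interrupt, so a second notification arriving before
 * the vCPU runs cannot kick it again.  Returns the new list head. */
static struct vcpu_pi *pi_wakeup_handler(struct vcpu_pi *head)
{
    struct vcpu_pi **pp = &head;

    while ( *pp )
    {
        struct vcpu_pi *v = *pp;

        if ( v->on )
        {
            v->kicks++;        /* tasklet_schedule()/vcpu_kick() stand-in */
            *pp = v->next;     /* remove from the blocked list */
            v->next = NULL;
        }
        else
            pp = &v->next;     /* leave vCPUs with nothing pending queued */
    }
    return head;
}
```

Running the handler twice (two back-to-back notifications) leaves each vCPU kicked exactly once, which is the property the review asks for.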
[Xen-devel] [PATCH] doc: Fix nonexistent error code in libxl_event_check example
Fix example code in comment. libxl_event_check() can return ERROR_NOT_READY; LIBXL_NOT_READY does not exist. Signed-off-by: Euan Harris euan.har...@citrix.com --- tools/libxl/libxl_event.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h index 3c6fcfe..fad4c14 100644 --- a/tools/libxl/libxl_event.h +++ b/tools/libxl/libxl_event.h @@ -213,7 +213,7 @@ void libxl_evdisable_disk_eject(libxl_ctx *ctx, libxl_evgen_disk_eject*); * libxl_osevent_afterpoll(...); * for (;;) { * r = libxl_event_check(...); - * if (r==LIBXL_NOT_READY) break; + * if (r==ERROR_NOT_READY) break; * if (r) goto error_out; * do something with the event; * } -- 2.4.3
Re: [Xen-devel] [RFC PATCH v3 10/18] xen/arm: ITS: Add APIs to add and assign device
On 02/07/15 10:01, Ian Campbell wrote: On Thu, 2015-07-02 at 14:10 +0530, Vijay Kilari wrote: On Mon, Jun 29, 2015 at 5:59 PM, Ian Campbell ian.campb...@citrix.com wrote: On Mon, 2015-06-22 at 17:31 +0530, vijay.kil...@gmail.com wrote: +/* Device assignment. Should be called from pci_device_add */ +int its_add_device(struct domain *d, u32 devid) +{ Prior to the PCI series landing, and to enable dom0 to use ITS, it might be possible to call this from xen/arch/arm/platforms/thunderx.c via the specific_mappings platform hook, which would also expose the PCI controller to dom0 via a series of specific mmio mappings (look at xen/arch/arm/platforms/xgene-storm.c for the sort of thing I mean). That would, I think, give basic PCI functionality for dom0 (i.e. allowing us to boot on thunderx) and decouple things from the PCI series somewhat, which ought to make things easier overall IMHO. In the ThunderX case, the MMIO-mapped PCI RC is parsed to find devices. How do we know device IDs upfront in order to call its_add_device? Ah yes, this approach probably doesn't work for ITS. Which is a shame. I guess the number of device IDs supported by ThunderX is bounded and not so big, so you could do a loop which registers all the IDs from 0 to N. Though it may be possible to be smarter. Regards, -- Julien Grall
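Julien's interim suggestion — register every possible device ID from a platform hook — might look roughly like this. its_add_device() is stubbed to the signature quoted above, and thunderx_register_all_devids() is a hypothetical name, not an actual Xen function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct domain;                      /* opaque, as in the quoted code */

static unsigned int devices_added;  /* counts successful registrations */

/* Stub with the signature quoted from the patch ("u32" = uint32_t). */
static int its_add_device(struct domain *d, uint32_t devid)
{
    (void)d;
    (void)devid;
    devices_added++;
    return 0;
}

/* Hypothetical platform hook body: register every device ID from 0 to
 * max_devid, since the IDs can't be discovered from the PCI RC before
 * the PCI series lands. */
static int thunderx_register_all_devids(struct domain *d, uint32_t max_devid)
{
    uint32_t id;

    for ( id = 0; id <= max_devid; id++ )
    {
        int rc = its_add_device(d, id);
        if ( rc )
            return rc;
    }
    return 0;
}
```

This only works because, as Julien notes, the ID space on ThunderX is assumed to be bounded and small.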
Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access
On 06/29/2015 04:45 PM, Vitaly Kuznetsov wrote: 'pfn' and 'start_pfn' are ambiguous, both these functions expect GFNs as input. On x86 the interface of p2m_set_mem_access() in p2m.c doesn't match the declaration in p2m-common.h as 'pfn' is being used instead of 'start_pfn'. On ARM both p2m_set_mem_access and p2m_get_mem_access interfaces don't match declarations from p2m-common.h: p2m_set_mem_access uses 'pfn' instead of 'start_pfn' and p2m_get_mem_access uses 'gpfn' instead of 'pfn'. Convert p2m_get_mem_access/p2m_set_mem_access (and __p2m_get_mem_access on ARM) interfaces to using gfn_t instead of unsigned long and update all users of these functions. There is also an issue in p2m_get_mem_access on x86: the 'gfn' parameter passed to gfn_lock/gfn_unlock is not defined. This code compiles only because of a coincidence: gfn_lock/gfn_unlock are currently macros which don't use their second argument. Signed-off-by: Vitaly Kuznetsov vkuzn...@redhat.com --- Changes since v3: - Comment coding style fix [Razvan Cojocaru] - Use INVALID_GFN instead of ~0 and -1 [Andrew Cooper] - Convert p2m_get_mem_access/p2m_set_mem_access interfaces to using gfn_t [Andrew Cooper] But you missed a change... @@ -1600,9 +1600,11 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla, return (p2ma == p2m_access_n2rwx); } -/* Set access type for a region of pfns. - * If start_pfn == -1ul, sets the default access type */ -long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr, +/* + * Set access type for a region of gfns. + * If gfn == INVALID_GFN, sets the default access type. + */ +long p2m_set_mem_access(struct domain *d, gfn_t gfn, uint32_t nr, uint32_t start, uint32_t mask, xenmem_access_t access) { struct p2m_domain *p2m = p2m_get_hostp2m(d); @@ -1638,18 +1640,19 @@ long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr, return -EINVAL; } -/* If request to set default access */ -if ( pfn == ~0ul ) +/* If request to set default access.
*/ +if ( gfn_x(gfn) == INVALID_GFN ) { p2m->default_access = a; return 0; } p2m_lock(p2m); -for ( pfn += start; nr > start; ++pfn ) +for ( gfn = _gfn(gfn_x(gfn) + start); nr > start; + gfn = _gfn(gfn_x(gfn) + 1) ) Now in p2m_set_mem_access(), rather than just using an unsigned long in the loop iterating over gfns, you do this thing where you convert gfn_t to unsigned long, add one, and then convert it back to gfn_t again. I can't see any comments on v3 that suggested doing that, and it seems a bit clunky. Is that really necessary? Wouldn't it be better to declare a local variable? I'm not strongly opinionated on this one, it just seems a bit strange. Everything else looks good, thanks. -George
Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access
At 12:09 +0100 on 02 Jul (1435838956), Andrew Cooper wrote: On 02/07/15 11:48, George Dunlap wrote: Now in p2m_set_mem_access(), rather than just using an unsigned long in the loop iterating over gfns, you do this thing where you convert gfn_t to unsigned long, add one, and then convert it back to gfn_t again. I can't see any comments on v3 that suggested doing that, and it seems a bit clunky. Is that really necessary? Wouldn't it be better to declare a local variable? I'm not strongly opinionated on this one, it just seems a bit strange. Everything else looks good, thanks. Looping over {g,m,p}fn_t's is indeed awkward, as the compiler tricks for type safety don't allow simply adding 1 to a typesafe variable. In cases like this, I think it is acceptable to keep an unsigned long shadow variable and manipulate it as a plain integer. The eventual _gfn() required to pass it further down the call chain will help to visually reinforce the appropriate type. After all, the entire point of these typesafes is to try to avoid accidentally mixing up the different address spaces, but a function which takes a typesafe, loops over a subset and passes the same typesafe further down can probably be trusted to DTRT, catching errors at review time. Off the top of my head, the only functions which would normally expect to mix and match the typesafes are the pagetable-walking ones. It should be easy enough to extend the macros to define a gfn_inc(gfn_t) operator for this kind of thing. Tim.
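Tim's proposed gfn_inc() operator could be defined along these lines; the gfn_t definition is a stand-in and the helper names are assumptions, not Xen's actual macros.

```c
#include <assert.h>

/* Stand-in gfn_t; not Xen's real definition. */
typedef struct { unsigned long g; } gfn_t;
#define _gfn(x)   ((gfn_t) { (x) })
#define gfn_x(_g) ((_g).g)

/* Hypothetical helpers along the lines Tim suggests: a general add
 * operation, with increment as the common special case. */
static inline gfn_t gfn_add(gfn_t gfn, unsigned long i)
{
    return _gfn(gfn_x(gfn) + i);
}

#define gfn_inc(gfn) gfn_add(gfn, 1)
```

As George notes in the sibling reply, a loop would still need gfn_x() for its exit condition, which is why he floats a for_gfn_range()-style macro as the fully "pretty" (but feature-creepy) version.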
Re: [Xen-devel] [PATCH v2 11/22] xen/x86: allow disabling emulated devices for HVM guests
On 01/07/15 at 17:46, Andrew Cooper wrote: On 01/07/15 15:46, Roger Pau Monne wrote: Introduce a new DOMCTL flag that can be used to disable device emulation inside of Xen for HVM guests. The following emulated devices are disabled when XEN_DOMCTL_CDF_noemu is used: hpet, pmtimer, rtc, ioapic, lapic, pic and pmu. Also all the MMIO handlers are disabled. Signed-off-by: Roger Pau Monné roger@citrix.com Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com Cc: Boris Ostrovsky boris.ostrov...@oracle.com Cc: Suravee Suthikulpanit suravee.suthikulpa...@amd.com Cc: Aravind Gopalakrishnan aravind.gopalakrish...@amd.com Cc: Jun Nakajima jun.nakaj...@intel.com Cc: Eddie Dong eddie.d...@intel.com Cc: Kevin Tian kevin.t...@intel.com I would be hesitant to have a blanket change like this. Consider APICV/AVIC. For performance reasons, we absolutely want HVM and PVH to make use of them, as they are substantially more efficient using hardware support than even using plain evtchn hypercalls. However, the flipside is that we must provide an LAPIC emulation to cover the bits which hardware cannot virtualise. As a random idea, how about having a new hypercall or hvmparam which provides a bitmap of permitted emulators? This would allow far finer-grained control over what is and isn't available to a domain. I don't think using a new hypercall or hvmparam is suitable for this: the emulators are initialized in hvm_domain_initialise, which is called by the XEN_DOMCTL_createdomain hypercall. Trying to set them before calling XEN_DOMCTL_createdomain is impossible because there's no domain struct yet, and adding a new hypercall to do that later seems quite convoluted; IMHO it's best to never initialize them in the first place. I would rather add a bitmap field to xen_arch_domainconfig in order to describe which emulators we want to enable.
I've also been wondering why we need to introduce this now; AFAICT we can always introduce this bitmap field later and remove XEN_DOMCTL_CDF_noemu/DOMCRF_noemu, because the DOMCTL interface is not stable anyway. Also, from a guest POV, how is the hw-emulated local apic going to be used? Are we going to route the interrupts from virtual devices (netfront, blkfront) to the lapic? Or do we just want it for the timer and ditch the PV timer? I can see that this is more interesting for a PVH/HVMlite Dom0, but still in that case I'm not sure how a guest is supposed to interact with it. Will the PHYSDEV hypercalls route interrupts to the emulated local apic instead of pirqs/event channels? Will we trap PCI/MSI/MSI-X configuration and emulate it? Roger.
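Roger's idea of a per-emulator bitmap in xen_arch_domainconfig could look something like this; all of these flag names are hypothetical here (none of them existed in the public headers at the time of this thread).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-emulator enable bits for a bitmap field carried in
 * xen_arch_domainconfig at domain-creation time. */
#define XEN_X86_EMU_LAPIC   (1u << 0)
#define XEN_X86_EMU_IOAPIC  (1u << 1)
#define XEN_X86_EMU_PIC     (1u << 2)
#define XEN_X86_EMU_HPET    (1u << 3)
#define XEN_X86_EMU_PMTIMER (1u << 4)
#define XEN_X86_EMU_RTC     (1u << 5)

/* The domain builder sets the flags once at XEN_DOMCTL_createdomain;
 * hvm_domain_initialise() would then only init the selected models,
 * rather than tearing everything down with a blanket noemu flag. */
static int emulator_enabled(uint32_t emulation_flags, uint32_t emu_bit)
{
    return !!(emulation_flags & emu_bit);
}
```

This addresses Andrew's APICV concern: a PVH-style guest could enable only XEN_X86_EMU_LAPIC while leaving the legacy timers and PICs off.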
Re: [Xen-devel] Xen 4.6 Development Update (2 WEEKS TO FREEZE, important information in preamble)
On Thu, 2015-07-02 at 11:51 +0300, Razvan Cojocaru wrote: On 07/02/2015 11:35 AM, Ian Campbell wrote: It seems that many people are unable to follow these simple instructions. The quotes part is obvious (and not that many people have omitted to do that), but to be honest I haven't been clear on who is supposed to be in the trimmed CC list: the maintainers of the code I'm touching with my series and Wei? Exactly. It's the 'interested parties' that should be Cc-ed, which, 99% of the time, is exactly what you just said: - the maintainers because, well, they are the maintainers; they know the code, they most likely know your series and will be able to engage in a conversation on whether the estimation is correct or not; - the release manager, since we're discussing the release; - there might be more people, such as people that have been involved in the review process despite not being maintainers, or... no, that's all that comes to my mind. So, trim to such a set, and you'll make most of the people happy, I bet. There doesn't seem to be a clear rule about who should be replied to (or maybe there is and I've missed it? If so, could you please point it out?). Maybe clearing this up could help with this problem in the future. I think it was pretty obvious. I guess it would not harm to add a line making this crystal clear in the mail, but it's Wei's call to judge whether that is really necessary. Regards, Dario
[Xen-devel] Sharing display between guests
Hi, I've started using Xen on an Allwinner A33, which works great as a headless device using the latest PSCI patches in U-Boot. However, we would like to do something more with it, and we would need to have two VMs accessing the display at once, each one drawing in its own part of the framebuffer. Something that would look like this:

Framebuffer
+-------+
|       |
|Guest 1|
|       |
+-------+
|       |
|       |
|       |
|Guest 2|
|       |
|       |
|       |
+-------+

Where things start to get interesting is that the second guest would be running Android, and as such would need OpenGL support and access to the GPU, and that ideally the first guest would need to be able to draw over all the screen to create some kind of a drop-down menu. Our first thought was to use two different planes of a DRM/KMS driver, one for each VM, with the second guest having the primary plane and the first guest having an overlay, and we would set it up in dom0. That would mean that we would have a static composition, that would be set up once and we could forget about it during the life of the system. This way, we would also have a fixed-size framebuffer assigned to Android, which is much easier to support, and since we have total control over the application in the first guest, we would be able to control how much transparency we want to leave (== how much of Android we want to be displayed), and we would be able to create our drop-down menu. Now the hard part: is such a setup possible at all with Xen? Can we export a single plane to a guest and let it be the only user of it? If that is possible, how would that interact with the 3D acceleration? If not, is it something that is conceptually flawed, or does one just need to write the appropriate amount of code? Do you have a better solution to this problem?
Thanks a lot for your feedback, Maxime -- Maxime Ripard, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com
Re: [Xen-devel] [PATCH 1/4] xen: sched: avoid dumping duplicate information
On Thu, Jun 25, 2015 at 1:15 PM, Dario Faggioli dario.faggi...@citrix.com wrote: When dumping scheduling information (debug key 'r'), what we print as 'Idle cpupool' is pretty much the same of what we print immediately after as 'Cpupool0'. In fact, if there are no pCPUs outside of any cpupools, it is exactly the same. If there are free pCPUs, there is some valuable information, but still a lot of duplication: (XEN) Online Cpus: 0-15 (XEN) Free Cpus: 8 (XEN) Idle cpupool: (XEN) Scheduler: SMP Credit Scheduler (credit) (XEN) info: (XEN) ncpus = 13 (XEN) master = 0 (XEN) credit = 3900 (XEN) credit balance = 45 (XEN) weight = 1280 (XEN) runq_sort = 11820 (XEN) default-weight = 256 (XEN) tslice = 30ms (XEN) ratelimit = 1000us (XEN) credits per msec = 10 (XEN) ticks per tslice = 3 (XEN) migration delay= 0us (XEN) idlers: ,6d29 (XEN) active vcpus: (XEN) 1: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)} (XEN) 2: [1.3] pri=0 flags=0 cpu=1 credit=-113 [w=256,cap=0] (87+300) {a/i=37/36 m=11+544 (k=0)} (XEN) 3: [0.15] pri=-1 flags=0 cpu=4 credit=95 [w=256,cap=0] (210+300) {a/i=127/126 m=108+9 (k=0)} (XEN) 4: [0.10] pri=-2 flags=0 cpu=12 credit=-287 [w=256,cap=0] (-84+300) {a/i=163/162 m=36+568 (k=0)} (XEN) 5: [0.7] pri=-2 flags=0 cpu=2 credit=-242 [w=256,cap=0] (-42+300) {a/i=129/128 m=16+50 (k=0)} (XEN) CPU[08] sort=5791, sibling=,0300, core=,ff00 (XEN) run: [32767.8] pri=-64 flags=0 cpu=8 (XEN) Cpupool 0: (XEN) Cpus: 0-5,10-15 (XEN) Scheduler: SMP Credit Scheduler (credit) (XEN) info: (XEN) ncpus = 13 (XEN) master = 0 (XEN) credit = 3900 (XEN) credit balance = 45 (XEN) weight = 1280 (XEN) runq_sort = 11820 (XEN) default-weight = 256 (XEN) tslice = 30ms (XEN) ratelimit = 1000us (XEN) credits per msec = 10 (XEN) ticks per tslice = 3 (XEN) migration delay= 0us (XEN) idlers: ,6d29 (XEN) active vcpus: (XEN) 1: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)} (XEN) 2: [1.3] pri=0 flags=0 cpu=1 
credit=-113 [w=256,cap=0] (87+300) {a/i=37/36 m=11+544 (k=0)} (XEN) 3: [0.15] pri=-1 flags=0 cpu=4 credit=95 [w=256,cap=0] (210+300) {a/i=127/126 m=108+9 (k=0)} (XEN) 4: [0.10] pri=-2 flags=0 cpu=12 credit=-287 [w=256,cap=0] (-84+300) {a/i=163/162 m=36+568 (k=0)} (XEN) 5: [0.7] pri=-2 flags=0 cpu=2 credit=-242 [w=256,cap=0] (-42+300) {a/i=129/128 m=16+50 (k=0)} (XEN) CPU[00] sort=11801, sibling=,0003, core=,00ff (XEN) run: [32767.0] pri=-64 flags=0 cpu=0 ... ... ... (XEN) CPU[15] sort=11820, sibling=,c000, core=,ff00 (XEN) run: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)} (XEN) 1: [32767.15] pri=-64 flags=0 cpu=15 (XEN) Cpupool 1: (XEN) Cpus: 6-7,9 (XEN) Scheduler: SMP RTDS Scheduler (rtds) (XEN) CPU[06] (XEN) CPU[07] (XEN) CPU[09] With this change, we get rid of the redundancy, and retain only the information about the free pCPUs. (While there, turn a loop index variable from `int' to `unsigned int' in schedule_dump().) Signed-off-by: Dario Faggioli dario.faggi...@citrix.com Acked-by: George Dunlap george.dun...@eu.citrix.com --- Cc: Juergen Gross jgr...@suse.com Cc: George Dunlap george.dun...@eu.citrix.com --- xen/common/cpupool.c |6 +++--- xen/common/schedule.c | 18 +- 2 files changed, 16 insertions(+), 8 deletions(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 563864d..5471f93 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -728,10 +728,10 @@ void dump_runq(unsigned char key) print_cpumap(Online Cpus, cpu_online_map); if ( !cpumask_empty(cpupool_free_cpus) ) +{ print_cpumap(Free Cpus, cpupool_free_cpus); - -printk(Idle cpupool:\n); -schedule_dump(NULL); +schedule_dump(NULL); +} for_each_cpupool(c) { diff --git a/xen/common/schedule.c b/xen/common/schedule.c index ecf1545..4ffcd98 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -1473,16 +1473,24 @@ void scheduler_free(struct scheduler *sched) void schedule_dump(struct cpupool *c) { -int i; +unsigned int i; struct 
scheduler *sched; cpumask_t *cpus; /* Locking, if necessary, must be handled within each scheduler */ -sched = (c == NULL) ? &ops : c->sched; -cpus = cpupool_scheduler_cpumask(c); -printk("Scheduler: %s (%s)\n", sched->name, sched->opt_name); -SCHED_OP(sched, dump_settings);
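The control-flow change being reviewed (dump the free-CPU "idle" scheduler only when free pCPUs actually exist, instead of unconditionally duplicating Cpupool0's dump) boils down to something like this toy model; schedulers_dumped() is purely illustrative and not a Xen function.

```c
#include <assert.h>

/* Mirrors the patched dump_runq() flow in xen/common/cpupool.c:
 * one schedule_dump(NULL) for the free CPUs, but only if there are
 * any, plus one schedule_dump(c) per cpupool. */
static unsigned int schedulers_dumped(unsigned int n_free_cpus,
                                      unsigned int n_cpupools)
{
    unsigned int dumps = 0;

    if ( n_free_cpus > 0 )   /* was unconditional before the patch */
        dumps++;
    dumps += n_cpupools;     /* for_each_cpupool(c): schedule_dump(c) */
    return dumps;
}
```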
Re: [Xen-devel] [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
@@ -1898,7 +1899,13 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev) PCI_BUS(bdf) == pdev->bus && PCI_DEVFN2(bdf) == devfn ) { -ret = rmrr_identity_mapping(pdev->domain, 1, rmrr); +/* + * RMRR is always reserved on e820 so either flag + * is fine for the hardware domain and here we'd like to + * pass XEN_DOMCTL_DEV_RDM_RELAXED. + */ +ret = rmrr_identity_mapping(pdev->domain, 1, rmrr, +XEN_DOMCTL_DEV_RDM_RELAXED); So two things. First, you assert that the value here won't matter, because the hardware domain is guaranteed never to have a conflict. Which is likely to be true almost all the time; but the question is, *if* something goes wrong, what should happen? For instance, suppose that someone accidentally introduces a bug in Xen that messes up or ignores reading a portion of the e820 map under certain circumstances. What should happen? Yes, you can imagine all possible cases. But if this kind of bug came true, I really doubt whether Xen could boot successfully. Because e820 is a fundamental key to running an OS, this case would very easily panic Xen, right? Anyway, I agree we should consider all corner cases. If you set this to RELAXED, this clash will be silently ignored; which means that devices that need RMRRs will simply malfunction in weird ways without any warning messages having been printed that might give No. We always post that message regardless of relaxed or strict, since the message just depends on one condition: that the conflict exists. someone a hint about what is going on. If you set this to STRICT, then this clash will print an error message, but as far as I can tell, the rest of the device assignment will continue as normal. (Please correct me if I've followed the code wrong.) Not all cases behave like this, but here it is true.
Since the device should be just as functional (or not functional) either way, but in the STRICT case should actually print an error message which someone might notice, it seems to me that STRICT is a better option for the hardware domain. Just see above. Secondly, you assert in response to Kevin's question on v3 that this path is only reachable when assigning to the hardware domain. I think you at least need to update the comment here to indicate that's what you think; it's not at all obvious just from looking at the function What about this? PCI_DEVFN2(bdf) == devfn ) { /* - * RMRR is always reserved on e820 so either flag - * is fine for the hardware domain and here we'd like to - * pass XEN_DOMCTL_DEV_RDM_RELAXED. + * Here we're adding a device to the hardware domain, + * and RMRR is always reserved on e820, so either flag + * is fine for the hardware domain; here we'd like + * to pass XEN_DOMCTL_DEV_RDM_RELAXED. */ ret = rmrr_identity_mapping(pdev->domain, 1, rmrr, XEN_DOMCTL_DEV_RDM_RELAXED); that this is true. And if we do end up doing something besides STRICT, we should check to make sure that pdev->domain really *is* the hardware domain before acting like it is. if ( ret ) dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n", pdev->domain->domain_id); @@ -1939,7 +1946,8 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev) PCI_DEVFN2(bdf) != devfn ) continue; -rmrr_identity_mapping(pdev->domain, 0, rmrr); +rmrr_identity_mapping(pdev->domain, 0, rmrr, + XEN_DOMCTL_DEV_RDM_RELAXED); } Same here wrt STRICT. This is inside intel_iommu_remove_device(), so actually the flag has no effect in rmrr_identity_mapping(). But I should add a comment like this, +/* + * The flag makes no difference when clearing these mappings, + * so here it's always safe to set XEN_DOMCTL_DEV_RDM_RELAXED. + */ After those changes (a single RDM_RELAXED flag, passing STRICT in for the hardware domain) then I think this patch is in good shape.
Based on my understanding of your concern, it seems you keep assuming that in the relaxed case we don't post any message, right? But as I replied above this is not correct, so what's your further consideration? Anyway, I'm fine with changing this. And since you suggested keeping one bit just to indicate XEN_DOMCTL_DEV_RDM_RELAXED, we don't have an actual XEN_DOMCTL_DEV_RDM_STRICT, so I can just reset all associated flags to 0 easily. Thanks Tiejun
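The single-flag semantics George proposes (flag clear = error on a p2m conflict, flag set = warn and continue) can be sketched as follows; rdm_check_conflict() is a made-up name for illustration, while the real check lives in set_identity_p2m_entry().

```c
#include <assert.h>

/* George's suggested single flag, replacing the NO_RDM/RELAXED/STRICT
 * triple from the earlier revisions of the series. */
#define _XEN_DOMCTL_DEV_RDM_RELAXED 0
#define XEN_DOMCTL_DEV_RDM_RELAXED  (1u << _XEN_DOMCTL_DEV_RDM_RELAXED)

#define EEXIST 17

/* Illustrative conflict check: a warning is (notionally) printed in
 * either case, per Tiejun's point above; only the return value
 * depends on the flag. */
static int rdm_check_conflict(int p2m_entry_occupied, unsigned int flags)
{
    if ( !p2m_entry_occupied )
        return 0;                       /* no conflict at all */

    /* A warning about the conflicting RMRR would be printed here. */

    return (flags & XEN_DOMCTL_DEV_RDM_RELAXED) ? 0 : -EEXIST;
}
```

With one flag there is no third state for device-tree devices to mishandle: they simply never set RELAXED.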
Re: [Xen-devel] [xen-unstable test] 58974: regressions - FAIL
On Wed, 2015-07-01 at 21:29 -0700, Meng Xu wrote: Hi Dario, Hi, 2015-06-30 10:14 GMT-07:00 Dario Faggioli dario.faggi...@citrix.com: if you'd be able to have a look at what's happening, that would be awesome. If you don't have time, I will have a look myself, but only in a few days. Hmm, this is another bug for RTDS on ARM. :-( I don't have an ARM board set up right now. I'm not sure if I can run/test it on ARM. I'm curious if this bug is similar to the previous lock-related bug of the RTDS scheduler on ARM. Denys fixed the previous bug at http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg03933.html Yes, it reminded me of that one too... It likely is something similar. :-/ I understand you don't have any suitable hardware. Still, if you could at least have a look at the failure and at the code, and try to provide an analysis and/or suggest a way of resolving the issue, we'll be glad to test patches on actual ARM systems. If you need to look at the Xen (quite likely) and Linux (quite unlikely) images, e.g., to disassemble them and check what corresponds to the address where things explode, and similar things, they're available too. Just look here: http://logs.test-lab.xenproject.org/osstest/logs/58974/test-armhf-armhf-xl-rtds/info.html And check what the build jobs, Xen and Linux paths are, in the table at the bottom. You'll find this: buildjob build-armhf xenbuildjob build-armhf xen_kernel_ver 3.16.7-ckt12+ Go to the summary of the flight, which includes build jobs: 'Flight 58974 scoreboard' http://logs.test-lab.xenproject.org/osstest/logs/58974/ Identify and go to that job's report: http://logs.test-lab.xenproject.org/osstest/logs/58974/build-armhf/info.html And go to: 'build/ (outputs from build)' http://logs.test-lab.xenproject.org/osstest/logs/58974/build-armhf/build/ You'll find the binaries (for Xen, in this case) there. The only caveat is that you'll need an ARM toolchain to be able to use, in this case, arm-elf-objdump and similar tools.
The procedure to get one is distro dependent, but it's not that hard these days (there are guides on our wiki too). Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
On 07/02/2015 02:11 AM, Chen, Tiejun wrote: If I'm correct, then #3 means it's not possible to have devices for a domain *default* to strict, but to be relaxed in individual instances. If you had five devices you wanted strict, and only one device you wanted to be relaxed (because you knew it didn't matter), you'd have to set reserve=strict for all the other devices, rather than just being able to set the domain setting to strict and set reserve=relaxed for the one. I think that both violates the principle of least surprise, and is less useful. So what's your idea to follow our requirement? So consider the following config snippet: --- rdm=reserve=relaxed pci=['01:00.1,msitranslate=1'] What should the policy for that device be? According to your policy document, it seems to me like it should be relaxed, since the domain default is set to relaxed and nothing overrides it. Why? strict should be the policy in this case. OK, I think I see where the problem is. I had expected the domain-wide setting to be a default which was overridden by per-device policies (see pci_permissive and friends). So when I saw "global default RDM policy" We knew this behavior but we'd like to take a different consideration in this case. confirmation bias caused me to interpret it as what I expected to see -- the domain setting as the default, which the local setting could override. I see now that in your documentation you consistently talk about two different policies, each of which has its own default, and that the effective permissions for a device end up being the intersection of the two (i.e., only relaxed if both are relaxed; strict under all other circumstances). Why are you saying this is not our expectation? Just let me pick up that description *again*: Default per-device RDM policy is 'strict', while default global RDM policy is 'relaxed'. When both policies are specified on a given region, 'strict' is always preferred. 
Look, if I haven't understood what you meant by the exact same words the first 4 times I read it, simply repeating the same exact words is not going to be helpful. Ideally you need to try to understand where my misunderstanding is coming from and explain where I've misunderstood something; or, at least you need to try to use different words, or explain how the words you're using apply to the given situation. From my point of view, I already replied to this previously by quoting part of the patch head description. As you know this revision is already marked as v4 and although I admit some code implementations still need further review, at least our policy should already be acknowledged by now unless this is really wrong. But in our case, it looks like you're concerned that our mechanism is not what you expected. So This interface doesn't make any sense to me. Why, if the global If you have any objection to our solution, and if you can't find any reasonable answer from our design, just please ping Jan or Kevin to make this clear to us. And then, whatever, I'm going to be fine to step next. I'm really not the person who can address this kind of change at this point at this high level. And you have no idea why that design was chosen; you're just doing what you're told? Certainly I have my own understanding of this issue. But at a high level I have to say Yes. If you really read that v2 design and its associated discussion, you should notice I didn't put any response right there. Look, I'm getting a bit angry at your continual implication that I haven't put in enough work reading the background for this series. If you go back and look at the v2 design discussion, you'll see that I was actively involved in that discussion, and sent at least a dozen emails about it. 
I have now spent nearly two full days just on this series, including going back over lots of conversations that have happened before to find answers to questions which you could have given in a single line; and also to check assertions that you've made which have turned out to be false. In the v2 design discussion, the only thing I could find regarding the relationship between per-device settings and the domain-wide setting was where you said [1]: per-device override is always favored if a conflicting setting in rmrr_host. And in v2, Wei asked you [2]: But this only works with global configuration and individual configuration in PCI spec trumps this, right? And you responded [3]: You're right. Now it happens that in all those cases you were literally talking about the rmrr_host part of the configuration, not the strict/relaxed part of the configuration; but that doesn't even make sense, since there *is* no device-specific rmrr_host setting -- the only configuration which has both a domain-wide and per-device component is the relaxed/strict. So: 1. After spending yet another half hour doing research, I haven't found any discussion that concluded we
Re: [Xen-devel] [PATCH] x86/p2m-ept: Don't unmap the EPT pagetable while it is still in use
On 06/30/2015 06:09 PM, Andrew Cooper wrote: The call to iommu_pte_flush() between the two hunks uses ept_entry->epte, which is a pointer into the mapped page. It is eventually passed to a `clflush` instruction which will suffer a pagefault if the virtual mapping has fallen out of the TLB.

(XEN) [ Xen-4.5.0-xs102594-d x86_64 debug=y Not tainted ]
(XEN) CPU: 7
(XEN) RIP: e008:[82d0801572f0] cacheline_flush+0x4/0x9
snip
(XEN) Xen call trace:
(XEN) [82d0801572f0] cacheline_flush+0x4/0x9
(XEN) [82d08014] __iommu_flush_cache+0x4a/0x6a
(XEN) [82d0801532e2] iommu_pte_flush+0x2b/0xd5
(XEN) [82d0801f909a] ept_set_entry+0x4bc/0x61f
(XEN) [82d0801f0c25] p2m_set_entry+0xd1/0x112
(XEN) [82d0801f25b1] clear_mmio_p2m_entry+0x1a0/0x200
(XEN) [82d0801f4aac] unmap_mmio_regions+0x49/0x73
(XEN) [82d080106292] do_domctl+0x15bd/0x1edb
(XEN) [82d080234fcb] syscall_enter+0xeb/0x145
(XEN)
(XEN) Pagetable walk from 820040004ae0:
(XEN) L4[0x104] = 0008668a5063
(XEN) L3[0x001] = 0008668a3063
(XEN) L2[0x000] = 00086689c063
(XEN) L1[0x004] = 00056f078063 0007f678
(XEN)
(XEN)
(XEN) Panic on CPU 7:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=]
(XEN) Faulting linear address: 820040004ae0

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Jan Beulich jbeul...@suse.com CC: George Dunlap george.dun...@eu.citrix.com CC: Jun Nakajima jun.nakaj...@intel.com CC: Eddie Dong eddie.d...@intel.com CC: Kevin Tian kevin.t...@intel.com Reviewed-by: George Dunlap george.dun...@eu.citrix.com -George
Re: [Xen-devel] Xen 4.6 Development Update (2 WEEKS TO FREEZE, important information in preamble)
On Wed, 2015-07-01 at 11:17 +0100, Julien Grall wrote: As suggested by Wei at the top of his mail [1], can you please CC only relevant people and avoid replying to all? It seems that many people are unable to follow these simple instructions. Wei, perhaps you could stop CCing people who inappropriately do not trim their quotes or the CC list in the future. If they cannot do us the courtesy of doing so I don't see why they should receive a courtesy copy of the status mail. A somewhat less aggressive approach might be to use Bcc instead of Cc for the bulk of people (i.e. anyone who needn't be cc-d on every reply). The failure case of someone who cannot read simple instructions then becomes a lack of CCs rather than a plethora of unwanted Ccs. Ian. Many thanks, [1] (Note, please trim your quotes when replying, and also trim the CC list if necessary. You might also consider changing the subject line of your reply to Status of FOO (Was: Xen 4.6 Development Update (X months reminder)) On 01/07/15 07:12, Chen, Tiejun wrote: * RMRR fix (fair) RFC posted Wei, I think this should be ok or good based on current status, and also should remove RFC here. Thanks Tiejun
Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)
On 02/07/15 at 10.45, Ian Campbell wrote: On Thu, 2015-07-02 at 09:32 +0200, Roger Pau Monné wrote: On 01/07/15 at 18.15, Boris Ostrovsky wrote: On 07/01/2015 11:57 AM, Ian Campbell wrote: The next Xen technical call will be at: Wed 8 Jul 17:00:00 BST 2015 `date -d @1436371200` See http://lists.xen.org/archives/html/xen-devel/2015-01/msg00414.html for more information on the call. Please let me know (CC-ing the list) any topics which you would like to discuss. It might be useful to include: * References to any relevant/recent mailing list threads; * Other people who you think should be involved in the discussion (and CC them); If you would like to attend then please let me know so I can send you the dial-in details. Given that there is a fair amount of PVH-related work happening now (Roger's, Elena's and mine) perhaps we should have a discussion about that to see where we are going? Andrew, Tim, Roger, Jan (if he is back from vacation), Elena, Konrad and David would be good to have present. Yes, I think this topic would indeed be a worthwhile subject for discussion. I think there is a toolstack element to all this as well, so would it be a good idea to have Ian J, Stefano and Wei in addition to me (who will be there to run the call in any case)? I'd probably mark them as optional participants for scheduling purposes. Yes, my series includes way more toolstack than hypervisor changes. I'm sorry but I will be on vacation next week, hiking in the mountains without Internet or cell phone coverage, so I won't be able to join the meeting. I think you are one of the critical participants in the discussion, i.e. we can't go ahead without you. I will be back on the 20th; maybe we can arrange another of these PVH meetings on the next technical call in early August? The August call is scheduled for 12 August, which is 6 weeks from yesterday. 
We could either go for that or, if it is too far out, we could push the July iteration of the call back two weeks to Wednesday 22 July (so 3 weeks from yesterday). (A one-week slip would still land in your vacation time.) I would rather go for the 22nd or 29th of July or the 5th of August, because I'm also on vacation from the 10th to the 16th of August. There is also the devsummit in the week of 17 August (~7 weeks time) which is an opportunity for a f2f. Shall I put up a poll of some sort to gather preferred timeslot options out of that set? That sounds fine, thanks. Roger. 
Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)
On Thu, 2015-07-02 at 09:45 +0100, Ian Campbell wrote: Shall I put up a poll of some sort to gather preferred timeslot options out of that set? Please can everyone who is interested in this topic indicate their date preference/availability at: http://doodle.com/cy88dhwzybg7hh7p I've gone with the usual 5pm BST slot for simplicity. That's 12 noon EDT, 9am PDT and 6pm CEST. If it turns out we can't find an option at that time then we can try again with some other times in the mix. Ian. 
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
-Original Message- From: Dario Faggioli [mailto:dario.faggi...@citrix.com] Sent: Thursday, July 02, 2015 6:10 PM To: Wu, Feng Cc: Andrew Cooper; xen-devel@lists.xen.org; Zhang, Yang Z; george.dun...@eu.citrix.com; Tian, Kevin; k...@xen.org; jbeul...@suse.com Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked On Thu, 2015-07-02 at 08:58 +, Wu, Feng wrote: -Original Message- From: Dario Faggioli [mailto:dario.faggi...@citrix.com] This is the third time that I ask: (1) whether it is possible to have more vcpus queued on one pcpu PI blocked list with desc.on (I really believe it is); I think it is, please see the following scenario: when the cpu masks interrupts and an external interrupt occurs for the assigned device while the target vCPU2 is blocked, the wakeup notification event handler has no chance to run; after a while, another wakeup notification event for vCPU4, blocking on the same pCPU, occurs; after the cpu unmasks interrupts, the wakeup notification handler gets called. Then we get: vCPU2, desc.on = 1 and vCPU4, desc.on = 1. Then in the handler we need to kick both of them. Ok, first of all, thanks for answering! :-) And yes, this makes sense. (2) if yes, whether it is TheRightThing(TM) to kick all of them, as soon as any notification arrives, instead of putting together a mechanism for kicking only a specific one. Why can't we kick all of them? 'desc.on = 1' means there is a pending interrupt; when we meet this condition, kicking the related vCPU should be the right thing to do. Right, I see it now. I felt like I was missing something, and that's why I was asking you to elaborate a bit more. Thanks again for having done this. I was missing/forgetting half of the way desc.on is actually handled, sorry for this. BTW, I'm finding it hard reading this series from the archives; there appear to be some threading issues and some missing messages. 
I also don't have it in my inbox, because my filters failed to spot and flag it properly. If you send a new version, please, Cc me, so it will be easier for me to look at all the patches, and provide a more helpful review. Sure, thanks for the review! Thanks, Feng Thanks and Regards, Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK)
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote:

+ list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
+                     pi_blocked_vcpu_list)
+     if ( vmx->pi_desc.on )
+         tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);

There is a logical bug here. If we have two NVs delivered to this pcpu, we will kick the first vcpu twice. On finding desc.on, a kick should be scheduled, then the vcpu removed from this list. With desc.on set, we know for certain that another NV will not arrive for it until it has been scheduled again and the interrupt posted. Yes, that seems a possible issue (and one that should indeed be avoided). I'm still unsure about the one that I raised myself but, if it is possible to have more than one vcpu in a pcpu list, with desc.on==true, then it looks to me that we kick all of them, for each notification. Add to what Andrew spotted: if there are a bunch of vcpus queued with desc.on==true, and a bunch of notifications arrive before the tasklet gets executed, we'll be kicking the whole bunch of them a bunch of times! :-/ As Andrew mentioned, removing the vCPUs with desc.on = true from the list can avoid kicking vCPUs multiple times. It avoids kicking vcpus multiple times if more than one notification arrives, yes. It is, therefore, not effective in making sure that, even with only one notification, you only kick the interested vcpu. This is the third time that I ask: (1) whether it is possible to have more vcpus queued on one pcpu PI blocked list with desc.on (I really believe it is); (2) if yes, whether it is TheRightThing(TM) to kick all of them, as soon as any notification arrives, instead of putting together a mechanism for kicking only a specific one. The fact that you're not answering is not so much of a big deal for me... I'll just keep asking! :-D Regards, Dario -- This happens because I choose it to happen! 
(Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK)
Re: [Xen-devel] [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
At a high level I have to say Yes. If you really read that v2 design and its associated discussion, you should notice I didn't put any response right there. Look, I'm getting a bit angry at your continual implication that I haven't put in enough work reading the background for this series. Sorry about this. If you go back and look at the v2 design discussion, you'll see that I was actively involved in that discussion, and sent at least a dozen emails about it. I have now spent nearly two full days just on this series, including going back over lots of conversations that have happened before to find answers to questions which you could have given in a single line; and also to check assertions that you've made which have turned out to be false. Sure, and thanks for your review and time. In the v2 design discussion, the only thing I could find regarding the relationship between per-device settings and the domain-wide setting was where you said [1]: per-device override is always favored if a conflicting setting in rmrr_host. And in v2, Wei asked you [2]: But this only works with global configuration and individual configuration in PCI spec trumps this, right? And you responded [3]: You're right. Now it happens that in all those cases you were literally talking about the rmrr_host part of the configuration, not the strict/relaxed part of the configuration; but that doesn't even make sense, since there *is* no device-specific rmrr_host setting -- the only configuration which has both a domain-wide and per-device component is the relaxed/strict. So: 1. After spending yet another half hour doing research, I haven't found any discussion that concluded we should have the global policy override the local policy. I also took some time to go back and check this point, and indeed this is not in that public design. 
And as I mentioned in another email following this, I also had a talk with Kevin about this issue, and it looks like this was just concluded from our internal discussion; he didn't post it in the v2 design again because, as you know, that design is about things at a high level. And as I recall, those discussions couldn't cover everything at that moment, because they thought we'd better post preliminary patches to further discuss things, since this is really a complicated case. So afterwards I sent out two RFC revisions to help everyone finalize a good solution. And I can confirm the current policy has been the same since the first RFC, but we didn't see any opposing advice until now. 2. The only discussion I *did* find has *you yourself* saying that the per-device setting should override the global setting, not once, but twice; and nobody contradicting you. Maybe there is a discussion somewhere else where this was changed; but I've already spent half an hour this morning looking at where you said it was (the v2 design discussion), and found the opposite -- just as I remembered. I'm not going to look any more. You have now caused me to waste an awful lot of time on this series that could profitably have been used elsewhere. Sorry about this, but I just think we already have 2 RFC revisions and 4 revisions without RFC, and some patches are already Acked; should we really overturn this policy right now? [1] marc.info/?i=aadfc41afe54684ab9ee6cbc0274a5d126147...@shsmsx101.ccr.corp.intel.com [2] marc.info/?i=20150519110041.gb21...@zion.uk.xensource.com [3] marc.info/?i=555c1b5c.7070...@intel.com I was involved in the design discussion, and from the very beginning I probably saw your plan but misunderstood it. I wouldn't be surprised if some others didn't quite understand what they were agreeing to. Again, I didn't walk into the v2 design. So here I don't want to bring any confusion to you with my reply. 
This is your feature, so it is your responsibility to understand and explain why you are doing what you are doing, if only to say Jan wanted X to happen because of Y [see $ref]. Maybe you remember I just posted v1, but it looks like that was not a good design for this implementation according to some feedback, so Kevin issued the v2 revision and had a wider discussion with you guys. Since then I have just followed that version. So I mean I don't hold these high-level things strongly, since either policy is fine to me because, IMO, these two approaches are both viable options. So this is why I said you'd better ask this of Kevin or Jan, since I can't decide what's next at this point. Thanks Tiejun
Re: [Xen-devel] [v3 00/13] xen/arm: Add support for GICv2 on GICv3
On 01/07/15 12:00, Julien Grall wrote: Hi all, Hi Ian, This patch series adds support for GICv2 on GICv3. This feature is available only when the GICv3 hardware is compatible with GICv2. When that's the case, the same interface is provided in order to use a virtualized GICv2 (i.e. GICC and GICV). This will allow us to re-use the same vGIC driver. Currently the GIC and vGIC drivers are tightly coupled because the domain initialization is split between GIC and vGIC. This patch series intends to remove this dependency in order to make the vGIC driver agnostic of the GIC driver. It has been tested on the ARMv8 Foundation Model with GICv2 and GICv3, as well as changing the vGIC version emulated for the guest (only on a GICv3 host). A branch with all the patches can be found here: git://xenbits.xen.org/people/julieng/xen-unstable.git branch gicv2-on-gicv3-v3 All the patches have been acked except #11 and #12. I was wondering if you can apply patches #1-#10 as they are already acked. It would save me resending the whole series. If you ack #11, you could even apply #11 and #13 (#12 is independent). Regards, -- Julien Grall
[Xen-devel] [PATCH OSSTEST v3] mg-all-branch-statuses: Show how up to date each branch is
Using report_find_push_age_info allows us to provide counts of attempts since the last baseline on current tip as well as the first attempt of each of those. Since everything serialises on the repo lock I didn't bother trying to parallelise anything. It's a little terse to keep it in 80 chars. Signed-off-by: Ian Campbell ian.campb...@citrix.com --- v2: Use new report_find_push_age_info functionality, output condensed. v3: - Correctly quote parameters to printf, so empty ones don't misalign the rest. - Drop dates in favour of number of days, and n/a in those cases - Print Error! if no tip is available and UpToDate if tip==basis. time with a recently updated set of repos says: 57.34user 28.28system 5:34.16elapsed 25%CPU (0avgtext+0avgdata 47256maxresident)k 100216inputs+600outputs (673major+2332436minor)pagefaults 0swaps So it's not quick... Example output:

Branch                 Basis    Tip      #Tip #Tot 1stTip  1stNew
libvirt                d10a5f58 845184b2 0    10   n/a     8 days
linux-3.0              e1c63f9f 5dba9ddd 2    9    2 days  868 days
linux-3.10             b3d78448 UpToDate
linux-3.14             762167f9 UpToDate
linux-3.16             162d6432 26749e75 1    1    1 day   1 day
linux-3.18             d048c068 ea5dd38e 3    3    2 days  2 days
linux-3.4              bb4a05a0 cf1b3dad 15   161  11 days 212 days
linux-4.1              b953c0d2 6a010c0a 0    -    n/a     n/a
linux-arm-xen          64972ceb UpToDate
linux-linus            6aaf0da8 4da3064d 0    0    n/a     n/a
linux-mingo-tip-master d935d0f7 778a1ac5 0    16   n/a     1173 days
linux-next             c8a9ad22 0 219 n/a 447 days
osstest                15d2dd50 Error!
0 - n/a n/a
ovmf                      269e0aeb 288ed590 0  1   n/a     1 day
qemu-mainline             d2966f80 UpToDate
qemu-upstream-4.2-testing d2382550 UpToDate
qemu-upstream-4.3-testing efae5e0f UpToDate
qemu-upstream-4.4-testing 32226f42 UpToDate
qemu-upstream-4.5-testing d9552b0a UpToDate
qemu-upstream-unstable    c4a962ec UpToDate
rumpuserxen               30d72f3f 3b91e449 61 113 77 days 149 days
seabios                   f24eb2f8 UpToDate
xen-4.0-testing           2692df2a UpToDate
xen-4.1-testing           40feff87 UpToDate
xen-4.2-testing           38fcda22 UpToDate
xen-4.3-testing           e7c02297 UpToDate
xen-4.4-testing           6c1cb3db UpToDate
xen-4.5-testing           e3bd3cef UpToDate
xen-unstable              c40317f1 3d55a179 0  3   n/a     2 days

---
 mg-all-branch-statuses | 135 +
 1 file changed, 135 insertions(+)
 create mode 100755 mg-all-branch-statuses

diff --git a/mg-all-branch-statuses b/mg-all-branch-statuses
new file mode 100755
index 000..9a78ec2
--- /dev/null
+++ b/mg-all-branch-statuses
@@ -0,0 +1,135 @@
+#!/bin/bash
+# -*- bash -*-
+#
+# Prints the status of each branch
+#
+# Usage:
+#   ./mg-all-branch-statuses [BRANCH]
+#
+# If no BRANCHes specified, does all that are normally run by
+# cr-daily-branch or out of crontab.
+
+# This is part of osstest, an automated testing framework for Xen.
+# Copyright (C) 2009-2014 Citrix Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with this program. If not, see http://www.gnu.org/licenses/.
+
+set -e
+
+. 
cri-common + +mkdir -p tmp + +if [ $# = 0 ]; then + set `./mg-list-all-branches` +fi + +gather_info() +{ +local branch=$1; shift +local basis=$1; shift +local tip=$1;shift + +select_xenbranch + +local info=`perl -we ' + use Osstest::Executive; + use Osstest; + use Data::Dumper; + open DEBUG, /dev/null or die $!; + #open DEBUG, STDERR or die $!; + csreadconfig(); + my ($branch,$tree,$basis,$tip) = @ARGV; + print DEBUG branch=$branch tree=$tree basis=$basis tip=$tip\n; + my $info = report_find_push_age_info([qw(real adhoc play)], + undef, [($branch)], + $tree, $basis, $tip); + print DEBUG Dumper $info; + my $onevar = sub { + my ($var,$dflt) = @_; + $dflt //= ; + print export .uc($var).=\; + print
[Xen-devel] [linux-3.16 test] 59012: tolerable FAIL - PUSHED
flight 59012 linux-3.16 real [real] http://logs.test-lab.xenproject.org/osstest/logs/59012/ Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-qemut-rhel6hvm-intel 12 guest-start/redhat.repeat fail in 58996 pass in 59012
 test-armhf-armhf-xl-arndale 6 xen-boot fail pass in 58996
 test-armhf-armhf-xl-multivcpu 7 host-ping-check-xen fail pass in 58996

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 14 guest-localmigrate fail in 58996 baseline untested
 test-armhf-armhf-xl-rtds 14 guest-start.2 fail in 58996 baseline untested
 test-amd64-i386-libvirt 11 guest-start fail like 58447
 test-amd64-amd64-xl-credit2 17 guest-localmigrate/x10 fail like 58447
 test-amd64-i386-libvirt-xsm 11 guest-start fail like 58447
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 58447

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-arndale 12 migrate-support-check fail in 58996 never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-check fail in 58996 never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass
 test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass
 test-amd64-i386-freebsd10-i386 9 freebsd-install fail never pass
 test-amd64-i386-freebsd10-amd64 9 freebsd-install fail never pass
 test-amd64-amd64-xl-multivcpu 17 guest-localmigrate/x10 fail never pass
 test-amd64-amd64-xl-rtds 17 guest-localmigrate/x10 fail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeat fail never pass
 test-amd64-amd64-libvirt 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-xsm 12 migrate-support-check fail never pass
 test-armhf-armhf-xl 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-check fail never pass
test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail never pass version targeted for testing: linux26749e751cc2be7bc0f17a6cca68f6e5c0675191 baseline version: linux162d64326176ee1916fb98323d810c78a7e3d042 People who touched revisions under test: Aaro Koskinen aaro.koski...@nokia.com Adam Jackson a...@redhat.com Alex Deucher alexander.deuc...@amd.com Andrew Morton a...@linux-foundation.org Andy Lutomirski l...@kernel.org Axel Lin axel@ingics.com Chengyu Song cson...@gatech.edu Chris Mason c...@fb.com Clemens Ladisch clem...@ladisch.de Dan Carpenter dan.carpen...@oracle.com Dan Williams dan.j.willi...@intel.com Dave Airlie airl...@redhat.com David S. Miller da...@davemloft.net David Woodhouse david.woodho...@intel.com Dieter Jurzitza Dmitry Torokhov dmitry.torok...@gmail.com Eric Dumazet eduma...@google.com Fabio Estevam fabio.este...@freescale.com Fan Du fan...@intel.com Felipe Balbi ba...@ti.com Filipe Manana fdman...@suse.com Florian Fainelli f.faine...@gmail.com Greg Kroah-Hartman gre...@linuxfoundation.org Gregory CLEMENT gregory.clem...@free-electrons.com Gu Zheng guz.f...@cn.fujitsu.com H. Peter Anvin h...@linux.intel.com Hannes Frederic Sowa han...@stressinduktion.org Hans de Goede hdego...@redhat.com Herbert Xu herb...@gondor.apana.org.au Horia Geanta horia.gea...@freescale.com Hui Wang hui.w...@canonical.com Ian Campbell ian.campb...@citrix.com Ingo Molnar mi...@kernel.org James Hogan james.ho...@imgtec.com Jan Kara j...@suse.cz Jani Nikula jani.nik...@intel.com Jason A. 
Donenfeld ja...@zx2c4.com Jean Delvare jdelv...@suse.de Jeff Mahoney je...@suse.com Jenny Falkovich jen...@mellanox.com Jens Axboe ax...@fb.com Jiang Liu jiang@linux.intel.com Jim Bride jim.br...@linux.intel.com Jiri Pirko j...@resnulli.us Johan Hovold jo...@kernel.org Johannes Berg johannes.b...@intel.com John D. Blair jo...@candicontrols.com Jonathan Cameron ji...@kernel.org Jérôme Glisse jgli...@redhat.com Kim Phillips kim.phill...@freescale.com Lars-Peter Clausen l...@metafoo.de Laura Abbott labb...@fedoraproject.org Li RongQing roy.qing...@gmail.com Linus Torvalds torva...@linux-foundation.org Luis Henriques
[Xen-devel] Xen + futexes
Hi all! I have a theoretical question. What do you think about porting futexes or binder to Xen? With best regards, -- *Vitaly Chernooky | Senior Developer - Product Engineering and Development* GlobalLogic P *+380.44.4929695 ext.1136* M *+380.63.6011802* S cvv_2k www.globallogic.com http://www.globallogic.com/email_disclaimer.txt
Re: [Xen-devel] [PATCH] OSSTest: stop testing SEDF at all
On 06/30/2015 06:36 PM, Dario Faggioli wrote: SEDF has been broken and unmaintained at least until Xen 4.2, and most likely even before! Tests are failing without anyone caring, and yet we keep using test resources on them. Let's stop doing this! Signed-off-by: Dario Faggioli dario.faggi...@citrix.com Acked-by: George Dunlap george.dun...@eu.citrix.com ---

diff --git a/allow.all b/allow.all
index 88a3038..8067d5e 100644
--- a/allow.all
+++ b/allow.all
@@ -1,4 +1,3 @@
-test-@@-sedf@@
 test-@@-rtds@@
 build-@@logs-capture@@
 test-@@-pcipt@@
diff --git a/make-flight b/make-flight
index 31cb942..c763ce9 100755
--- a/make-flight
+++ b/make-flight
@@ -274,21 +274,6 @@ do_hvm_rhel6_tests () {
   done
 }
 
-do_sedf_tests () {
-  if [ $xenarch != $dom0arch -o x$test_sedf != xy ]; then
-    return
-  fi
-
-  for pin in '' -pin; do
-    job_create_test test-$xenarch$kern-$dom0arch-xl-sedf$pin \
-        test-debian xl $xenarch $dom0arch \
-        guests_vcpus=4 \
-        xen_boot_append=sched=sedf loglvl=all ${pin:+dom0_vcpus_pin} \
-        linux_boot_append='loglevel=9 debug' \
-        $debian_runvars all_hostflags=$most_hostflags
-  done
-}
-
 do_credit2_tests () {
   if [ $xenarch != $dom0arch ]; then
     return
@@ -374,19 +359,18 @@ test_matrix_do_one () {
   do_multivcpu_tests
 
-  # RTDS came in 4.4, while SEDF is going away in 4.6
+  # RTDS came in 4.5
   case $xenbranch in
-  xen-3.*-testing) test_sedf=y; test_rtds=n ;;
-  xen-4.0-testing) test_sedf=y; test_rtds=n ;;
-  xen-4.1-testing) test_sedf=y; test_rtds=n ;;
-  xen-4.2-testing) test_sedf=y; test_rtds=n ;;
-  xen-4.3-testing) test_sedf=y; test_rtds=n ;;
-  xen-4.4-testing) test_sedf=y; test_rtds=n ;;
-  xen-4.5-testing) test_sedf=y; test_rtds=y ;;
-  *) test_sedf=n; test_rtds=y ;;
+  xen-3.*-testing) test_rtds=n ;;
+  xen-4.0-testing) test_rtds=n ;;
+  xen-4.1-testing) test_rtds=n ;;
+  xen-4.2-testing) test_rtds=n ;;
+  xen-4.3-testing) test_rtds=n ;;
+  xen-4.4-testing) test_rtds=n ;;
+  xen-4.5-testing) test_rtds=y ;;
+  *) test_rtds=y ;;
   esac
 
-  do_sedf_tests
   do_rtds_tests
 
   do_credit2_tests
___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On 02/07/15 09:30, Dario Faggioli wrote: On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote:

+list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
+                    pi_blocked_vcpu_list)
+    if ( vmx->pi_desc.on )
+        tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);

There is a logical bug here. If we have two NVs delivered to this pcpu, we will kick the first vcpu twice. On finding desc.on, a kick should be scheduled, then the vcpu removed from this list. With desc.on set, we know for certain that another NV will not arrive for it until it has been scheduled again and the interrupt posted. Yes, that seems a possible issue (and one that should indeed be avoided). I'm still unsure about the one that I raised myself but, if it is possible to have more than one vcpu in a pcpu's list with desc.on==true, then it looks to me that we kick all of them, for each notification. Adding what Andrew spotted: if there are a bunch of vcpus queued with desc.on==true, and a bunch of notifications arrive before the tasklet gets executed, we'll be kicking the whole bunch of them a bunch of times! :-/ As Andrew mentioned, removing the vCPUs with desc.on == true from the list can avoid kicking vCPUs multiple times. It avoids kicking vcpus multiple times if more than one notification arrives, yes. It is, therefore, not effective in making sure that, even with only one notification, you only kick the interested vcpu. This is the third time that I ask: (1) whether it is possible to have more vcpus queued on one pcpu's PI blocked list with desc.on set (I really believe it is); (2) if yes, whether it is TheRightThing(TM) to kick all of them as soon as any notification arrives, instead of putting together a mechanism for kicking only a specific one. We will receive one NV for every time the hardware managed to successfully set desc.on. If multiple stack up and we proactively drain the list, we will subsequently search the list to completion for all remaining NVs, due to finding no appropriate entries.
I can't currently decide whether this will be quicker or slower overall, or (most likely) it will even out to equal in the general case. ~Andrew
Re: [Xen-devel] [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
On 07/02/2015 11:01 AM, Chen, Tiejun wrote: 1. After spending yet another half hour doing research, I haven't found any discussion that concluded we should have the global policy override the local policy I also took some time to go back and check this point, and indeed this is not in that public design. And as I mentioned in another email following this one, I also had a talk with Kevin about this issue, and it looks like this was just concluded from our internal discussion; he didn't post it in the v2 design again because, as you know, that design is about things at a high level. And as I recall, those discussions couldn't cover everything at that moment, because they thought we'd better post preliminary patches to discuss things further, since this is really a complicated case. So afterwards I sent out two RFC revisions to help everyone finalize a good solution. And I can confirm the current policy has been the same since the first RFC, but we didn't see any opposing advice until now. Probably because the reviewers all assumed that the design draft had been followed, and you didn't make it clear that you'd changed it. 2. The only discussion I *did* find has *you yourself* saying that the per-device setting should override the global setting, not once, but twice; and nobody contradicting you. Maybe there is a discussion somewhere where this was changed; but I've already spent half an hour this morning looking at where you said it was (the v2 design discussion), and found the opposite -- just as I remembered. I'm not going to look any more. You have now caused me to waste an awful lot of time on this series that could profitably have been used elsewhere. Sorry about this, but I just think: we already have 2 RFC revisions and 4 revisions without RFC, and some patches are already Acked; should we really overturn this policy right now? First of all, I think it's easy to change.
Even if it weren't, I already said that I'd be OK with accepting the patch series with the existing override semantics, and without the default semantics, *if* it were renamed to make it clear what was going on. But, for future reference, I am not going to approve an interface I think is misleading or wrong -- particularly one like the xl interface, which we want to avoid changing if possible -- just because time is short. One of my own features, HVM USB pass-through, has narrowly missed two releases (including the current one) because we wanted to be careful to get the interface right. Again, I didn't walk into the v2 design. So here I don't want to bring any confusion to you with my reply. This is your feature, so it is your responsibility to understand and explain why you are doing what you are doing, if only to say Jan wanted Maybe you remember I just posted v1, but it looked like that was not the best design for this implementation according to some feedback, so Kevin issued the v2 revision and had a wider discussion with you guys. Since then I have just followed that version. So I mean I don't hold these things at a high level any further, since I just think either policy is fine to me because, IMO, these two approaches are optional. X to happen because of Y [see $ref]. So this is why I said you'd better ask Kevin or Jan about this, since I can't decide what's next at this point. Let me say that again: I don't care whether anyone pulled rank and ordered you to do something a certain way. YOU are the one submitting this patch. That means YOU are responsible for understanding why they want it that way, and YOU are responsible for justifying it to other people. If you don't understand it at all, it's YOUR responsibility to get them to explain it, not mine to chase them down. -George
Re: [Xen-devel] [PATCH RFC 4/6] xen: Print and use errno where applicable.
On Wed, 1 Jul 2015, Konrad Rzeszutek Wilk wrote: On Wed, Jul 01, 2015 at 02:01:07PM +0100, Stefano Stabellini wrote: On Mon, 29 Jun 2015, Konrad Rzeszutek Wilk wrote: In Xen 4.6, commit cd2f100f0f61b3f333d52d1737dd73f02daee592 (libxc: Fix do_memory_op to return negative value on errors) made the libxc API less odd-ball: on errors, the return value is -1 and the error code is in errno; on success the return value is either 0 or a positive value. Since we could be running with an old toolstack in which the Exx value is in rc, or with the newer one, we print both and return -Exx depending on the rc == -1 condition. Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 xen-hvm.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/xen-hvm.c b/xen-hvm.c
index 0408462..a92bc14 100644
--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -345,11 +345,12 @@ go_physmap:
         unsigned long idx = pfn + i;
         xen_pfn_t gpfn = start_gpfn + i;

+        /* In Xen 4.6 rc is -1 and errno contains the error value. */
         rc = xc_domain_add_to_physmap(xen_xc, xen_domid, XENMAPSPACE_gmfn, idx, gpfn);
         if (rc) {
             DPRINTF("add_to_physmap MFN %"PRI_xen_pfn" to PFN %"
-                    PRI_xen_pfn" failed: %d\n", idx, gpfn, rc);
-            return -rc;
+                    PRI_xen_pfn" failed: %d (errno: %d)\n", idx, gpfn, rc, errno);
+            return rc == -1 ? -errno : -rc;

Printing both rc and errno is the right thing to do, but I am not sure changing the return value depending on the libxc version is a good idea. Maybe we should be consistent and always return rc? In Xen 4.5 and earlier this function would return -EINVAL (say rc=EINVAL). With Xen 4.6 it would always return 1 on errors (rc is -1, and with -(-1) we get 1), while errno would have EINVAL. To be consistent and have this function return a proper -Exx value we need that check to use errno in case rc == -1.
Maybe the best thing to do is to introduce a versioned xen_xc_domain_add_to_physmap in include/hw/xen/xen_common.h. I am uncomfortable with returning positive values as errors, which reminds me - I need to update the commit to mention the return 1 issue. Agreed

         }
     }

@@ -422,11 +423,12 @@ static int xen_remove_from_physmap(XenIOState *state,
         xen_pfn_t idx = start_addr + i;
         xen_pfn_t gpfn = phys_offset + i;

+        /* In Xen 4.6 rc is -1 and errno contains the error value. */
         rc = xc_domain_add_to_physmap(xen_xc, xen_domid, XENMAPSPACE_gmfn, idx, gpfn);
         if (rc) {
             fprintf(stderr, "add_to_physmap MFN %"PRI_xen_pfn" to PFN %"
-                    PRI_xen_pfn" failed: %d\n", idx, gpfn, rc);
-            return -rc;
+                    PRI_xen_pfn" failed: %d (errno: %d)\n", idx, gpfn, rc, errno);
+            return rc == -1 ? -errno : -rc;
         }
     }
--
2.1.0
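The `rc == -1 ? -errno : -rc` expression above is doing double duty across two libxc error conventions. As a standalone illustration (the helper name is hypothetical, not a QEMU or libxc API), the normalisation discussed in this thread can be written as:

```c
#include <errno.h>

/* Illustrative helper (name hypothetical, not a QEMU/libxc API):
 * normalise both libxc error conventions to "0 on success, -Exx on
 * failure".
 *   pre-4.6: failure returns the positive Exx value in rc
 *   4.6+:    failure returns -1 with the Exx value in errno */
static int normalise_xc_rc(int rc, int saved_errno)
{
    if ( rc == 0 )
        return 0;
    if ( rc == -1 )             /* new (4.6) convention */
        return -saved_errno;
    return -rc;                 /* old convention: rc is a positive Exx */
}
```

A versioned wrapper in xen_common.h, as suggested above, could select the right branch at build time instead of probing rc at runtime.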
Re: [Xen-devel] [PATCH] libxl: Increase device model startup timeout to 1min.
On Wed, Jul 01, 2015 at 04:03:55PM +0100, Stefano Stabellini wrote: On Tue, 30 Jun 2015, Ian Jackson wrote: * The number and nature of parallel operations done in the stress test is unreasonable for the provided hardware: => the timeout is fine I don't know if it is our place to make this call. Should we really be deciding what is considered reasonable? I think not. Defining what is reasonable, and policies that match it, is not a route I think we should take in libxl. Nevertheless, if we are defining timeouts we are implicitly setting some parameters which imply that certain configurations are unreasonable. Hopefully all such configurations are absurd. If what you mean is that our bounds of `reasonable' should be very wide, then I agree. If anyone could reasonably expect it to work, then that is fine. Certainly we should refrain from subjective judgements. OK. How do you measure reasonable for this case? What I actually mean to ask is: how do you suggest we proceed on this problem? Of course it would be nice if we knew exactly why this is happening, but the issue only happens once every 2-3 tempest runs, each of which takes about 1 hour. Tempest executes about 1300 tests in each run, some of them in parallel. We haven't taken the time to read all the tests run by tempest, so we don't know exactly what they do. We don't really know the environment that causes the failure. Reading all the tests is not an option. We could try adding more tracing to the system but, given the type of error, if we do we are not likely to reproduce the error at all, or may reproduce something different. Given the state of things, I suggest we make sure that increasing the timeout actually fixes/works around the problem. I would also like to see some empirical measurements that tell us by how much we should increase the timeout. Is 1 minute actually enough? I tested an increased timeout last night. And here are the results.
The machine is an AMD Opteron(tm) Processor 4284, with 8G of RAM and 8 pCPUs. It's running Ubuntu 14.04, with Xen 4.4. On top of that, OpenStack has been deployed via devstack on a single. The test is to run Tempest with --concurrency=4. Four tests are run in parallel, but they don't necessarily start a VM. When they do, it's a PV guest with 64MB and 1 vCPU, and sometimes with double that amount of RAM.

The stats:
  Tempest runs: 22
  Tempest run time for each run: ~3000s
  Tempest number of tests: 1143

After 22 runs of tempest:
  QEMU starts: 3352
  number of starts that took more than 2s: 20
  number of starts that took more than 9s: 6
  maximum start time: 10.973713s

I have gathered the QEMU start times by having strace running for each of them. I then looked at the time it took from the first syscall, execve('qemu'), until the syscall where QEMU responds on its QMP socket (libxl has acknowledged that QEMU is running at that time). -- Anthony PERARD
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote: On 02/07/15 09:30, Dario Faggioli wrote: It is, therefore, not effective in making sure that, even with only one notification, you only kick the interested vcpu. This is the third time that I ask: (1) whether it is possible to have more vcpus queued on one pcpu's PI blocked list with desc.on set (I really believe it is); (2) if yes, whether it is TheRightThing(TM) to kick all of them as soon as any notification arrives, instead of putting together a mechanism for kicking only a specific one. We will receive one NV for every time the hardware managed to successfully set desc.on. Right, I see it now, thanks. If multiple stack up and we proactively drain the list, we will subsequently search the list to completion for all remaining NVs, due to finding no appropriate entries. I can't currently decide whether this will be quicker or slower overall, or (most likely) it will even out to equal in the general case. Well, given that the thing works as you (two) just described, I think draining the list is the only thing we can do. In fact, AFAICT, since we can't know for which vcpu a particular notification is intended, we don't have alternatives to waking them all, do we? Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
[Xen-devel] [PATCH v2 1/3] xen/domain_page: Convert map_domain_page_global() to using mfn_t
From: Andrew Cooper andrew.coop...@citrix.com The sh_map/unmap wrappers can be dropped, and take the opportunity to turn some #define's into static inlines, for added type safety. As part of adding the type safety, GCC highlights a problematic include cycle with arm/mm.h including domain_page.h which includes xen/mm.h and falls over __page_to_mfn being used before being declared. Simply dropping the inclusion of domain_page.h fixes the compilation issue. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Jan Beulich jbeul...@suse.com CC: Tim Deegan t...@xen.org CC: Ian Campbell ian.campb...@citrix.com CC: Stefano Stabellini stefano.stabell...@citrix.com --- xen/arch/arm/mm.c| 6 ++ xen/arch/x86/domain_page.c | 9 - xen/arch/x86/mm/shadow/multi.c | 10 +- xen/arch/x86/mm/shadow/private.h | 12 xen/include/asm-arm/mm.h | 1 - xen/include/xen/domain_page.h| 22 +- 6 files changed, 28 insertions(+), 32 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index ff1b330..d479048 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -271,11 +271,9 @@ void clear_fixmap(unsigned map) } #ifdef CONFIG_DOMAIN_PAGE -void *map_domain_page_global(unsigned long mfn) +void *map_domain_page_global(mfn_t mfn) { -mfn_t m = _mfn(mfn); - -return vmap(m, 1); +return vmap(mfn, 1); } void unmap_domain_page_global(const void *va) diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c index d684b2f..0f7548b 100644 --- a/xen/arch/x86/domain_page.c +++ b/xen/arch/x86/domain_page.c @@ -302,17 +302,16 @@ int mapcache_vcpu_init(struct vcpu *v) return 0; } -void *map_domain_page_global(unsigned long mfn) +void *map_domain_page_global(mfn_t mfn) { -mfn_t m = _mfn(mfn); ASSERT(!in_irq() && local_irq_is_enabled()); #ifdef NDEBUG -if ( mfn <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) ) -return mfn_to_virt(mfn); +if ( mfn_x(mfn) <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) ) +return mfn_to_virt(mfn_x(mfn)); #endif -return vmap(m, 1); +return vmap(mfn, 1); } void
unmap_domain_page_global(const void *ptr) diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c index 42204d9..54d0bd3 100644 --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -3806,7 +3806,7 @@ sh_detach_old_tables(struct vcpu *v) if ( v->arch.paging.shadow.guest_vtable ) { if ( shadow_mode_external(d) || shadow_mode_translate(d) ) -sh_unmap_domain_page_global(v->arch.paging.shadow.guest_vtable); +unmap_domain_page_global(v->arch.paging.shadow.guest_vtable); v->arch.paging.shadow.guest_vtable = NULL; } #endif // !NDEBUG @@ -3977,8 +3977,8 @@ sh_update_cr3(struct vcpu *v, int do_locking) if ( shadow_mode_external(d) || shadow_mode_translate(d) ) { if ( v->arch.paging.shadow.guest_vtable ) -sh_unmap_domain_page_global(v->arch.paging.shadow.guest_vtable); -v->arch.paging.shadow.guest_vtable = sh_map_domain_page_global(gmfn); +unmap_domain_page_global(v->arch.paging.shadow.guest_vtable); +v->arch.paging.shadow.guest_vtable = map_domain_page_global(gmfn); /* PAGING_LEVELS==4 implies 64-bit, which means that * map_domain_page_global can't fail */ BUG_ON(v->arch.paging.shadow.guest_vtable == NULL); @@ -4010,8 +4010,8 @@ sh_update_cr3(struct vcpu *v, int do_locking) if ( shadow_mode_external(d) || shadow_mode_translate(d) ) { if ( v->arch.paging.shadow.guest_vtable ) -sh_unmap_domain_page_global(v->arch.paging.shadow.guest_vtable); -v->arch.paging.shadow.guest_vtable = sh_map_domain_page_global(gmfn); +unmap_domain_page_global(v->arch.paging.shadow.guest_vtable); +v->arch.paging.shadow.guest_vtable = map_domain_page_global(gmfn); /* Does this really need map_domain_page_global? Handle the * error properly if so.
*/ BUG_ON(v->arch.paging.shadow.guest_vtable == NULL); /* XXX */ diff --git a/xen/arch/x86/mm/shadow/private.h b/xen/arch/x86/mm/shadow/private.h index f72ea9f..eff39dc 100644 --- a/xen/arch/x86/mm/shadow/private.h +++ b/xen/arch/x86/mm/shadow/private.h @@ -517,18 +517,6 @@ sh_unmap_domain_page(void *p) unmap_domain_page(p); } -static inline void * -sh_map_domain_page_global(mfn_t mfn) -{ -return map_domain_page_global(mfn_x(mfn)); -} - -static inline void -sh_unmap_domain_page_global(void *p) -{ -unmap_domain_page_global(p); -} - /**/ /* Shadow-page refcounting. */ diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index 3601140..2e1f21a 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -5,7 +5,6 @@ #include <xen/kernel.h> #include <asm/page.h> #include <public/xen.h> -#include
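The mfn_t plumbing in this patch relies on Xen's typesafe-integer idiom. A minimal self-contained sketch of the pattern (simplified from Xen's TYPE_SAFE macro; this is illustrative, not the actual header):

```c
/* Wrapping the raw value in a one-member struct makes accidental
 * mixing of address spaces (mfn vs gfn vs pfn) a compile-time type
 * error, at the cost of explicit _mfn()/mfn_x() conversions. */
typedef struct { unsigned long m; } mfn_t;

static inline mfn_t _mfn(unsigned long m) { mfn_t f = { m }; return f; }
static inline unsigned long mfn_x(mfn_t f) { return f.m; }

/* Comparisons and arithmetic must unwrap explicitly, as in the
 * patch:  if ( mfn_x(mfn) <= PFN_DOWN(...) )
 *             return mfn_to_virt(mfn_x(mfn));                      */
```

The struct has no runtime cost: it is passed in a register just like the bare unsigned long, so the safety is purely a compile-time property.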
Re: [Xen-devel] [PATCH] Refactor ioreq server for better performance
[snip] Thanks, Paul. Well, I agree the former approach would be simpler. But I still doubt whether it is more reasonable. :) IIUC, one of the reasons for struct domain to have a rangeset list (and a spinlock - rangesets_lock) is that there are iomem_caps and irq_caps for each domain. These 2 rangeset members of struct domain are platform independent. However, struct rb_rangeset is only supposed to be used in the ioreq server, which is only for x86 HVM cases. Adding a rb_rangeset list member (similarly, if so, a rb_rangesets_lock would also be required) in struct domain may be useless for the hardware domain and for platforms other than x86. Fair enough. So, I'd like to register a new debug key to dump the ioreq server information, just like the keys to dump the iommu p2m table or the irq mappings. With a new debug key, we do not need to add a spinlock for rb_rangeset in struct domain; the one in the ioreq server would be enough. Does this sound reasonable? That would be ok with me, but I'm not sure about claiming a whole debug key for this. Is there any other one that you could piggy-back on? If not, then maybe just make it part of the 'q' output. Thanks, my new implementation uses the 'q' debug key. Will send out the new version later. :) Yu Paul [snip]
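As a rough illustration of why an ordered rb_rangeset-style structure pays off for ioreq-server range lookups: with sorted, non-overlapping ranges the lookup is O(log n) instead of a linear list walk. A sorted array stands in for the rb-tree here, and all names are illustrative, not the proposed Xen API:

```c
#include <stddef.h>

struct range { unsigned long s, e; };               /* inclusive [s, e] */

/* Two disjoint sample ranges, kept sorted by start address. */
static const struct range demo_ranges[] = {
    { 0x10, 0x1f }, { 0x40, 0x4f },
};

/* Binary search over sorted, non-overlapping ranges: the same
 * O(log n) behaviour an rb-tree-backed rangeset would give. */
static int range_contains(const struct range *r, size_t n, unsigned long a)
{
    size_t lo = 0, hi = n;

    while ( lo < hi )
    {
        size_t mid = lo + (hi - lo) / 2;

        if ( a < r[mid].s )
            hi = mid;
        else if ( a > r[mid].e )
            lo = mid + 1;
        else
            return 1;                               /* a inside r[mid] */
    }
    return 0;
}
```

For the handful of ranges a typical ioreq server registers the difference is small, but emulators registering many scattered MMIO ranges are exactly the case the patch targets.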
Re: [Xen-devel] [PATCH] libxl: Increase device model startup timeout to 1min.
Anthony PERARD writes ("Re: [PATCH] libxl: Increase device model startup timeout to 1min."): I tested an increased timeout last night. And here are the results. The machine is an AMD Opteron(tm) Processor 4284, with 8G of RAM and 8 pCPUs. It's running Ubuntu 14.04, with Xen 4.4. On top of that, OpenStack has been deployed via devstack on a single. The test is to run Tempest with --concurrency=4. Four tests are run in parallel, but they don't necessarily start a VM. When they do, it's a PV guest with 64MB and 1 vCPU, and sometimes with double that amount of RAM. The stats: Tempest runs: 22 Tempest run time for each run: ~3000s Tempest number of tests: 1143 After 22 runs of tempest: QEMU starts: 3352 number of starts that took more than 2s: 20 number of starts that took more than 9s: 6 maximum start time: 10.973713s I have gathered the QEMU start times by having strace running for each of them. I then looked at the time it took from the first syscall, execve('qemu'), until the syscall where QEMU responds on its QMP socket (libxl has acknowledged that QEMU is running at that time). Thanks for this information. So from what you say it appears that we are running at most 4 copies of libxl and qemu in parallel, along with at most 4 VMs? And out of 3352 qemu starts, we have:
  <= 2s:          3332
  > 2s, <= 9s:      14
  > 9s:              6
? Do you have any information about the maximum system load in general? dom0 load, vcpu overcommit, etc.? Do you know what the guests are doing? I'm starting to think that this might be a real bug, but that the bug might be Linux's I/O subsystem sometimes produces appalling latency under load (which is hardly news). Ian.
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On Thu, 2015-07-02 at 13:16 +0100, Andrew Cooper wrote: On 02/07/15 13:04, Dario Faggioli wrote: On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote: I can't currently decide whether this will be quicker or slower overall, or (most likely) it will even out to equal in the general case. Well, given the thing works as you (two) just described, I think draining the list is the only thing we can do. In fact, AFAICT, since we can't know for which vcpu a particular notification is intended, we don't have alternatives to waking them all, do we? Perhaps you misunderstand. I'm quite sure I was; while I think now I'm getting it. Every single vcpu has a PI descriptor which is shared memory with hardware. Right. A NV is delivered strictly when hardware atomically changes desc.on from 0 to 1, i.e. the first time that an outstanding notification arrives. (iirc, desc.on is later cleared by hardware when the vcpu is scheduled and the vector(s) actually injected.) Part of the scheduling modifications alter when a vcpu is eligible to have NVs delivered on its behalf. Non-scheduled vcpus get NVs, while scheduled vcpus have direct injection instead. Blocked vcpus, AFAICT. But that's not relevant here. Therefore, in the case that an NV arrives, we know for certain that one of the NV-eligible vcpus has had desc.on set by hardware, and we can uniquely identify it by searching for the vcpu for which desc.on is set. Yeah, but we can have more than one of them. You said "I can't currently decide whether this will be quicker or slower", which I read as suggesting that not draining the queue was a plausible alternative, while I now think it's not. Perhaps you were not meaning anything like that, so it was not necessary for me to point this out, in which case, sorry for the noise. :-) In the case of stacked NVs, we cannot associate which specific vcpu caused which NV, but we know that we will get one NV per vcpu needing kicking.
Exactly, and that's what I'm talking about, and why I'm saying that waking everyone is the only solution. The bottom line being that, even in case this is deemed too slow, we don't have the option of waking only one vcpu at each NV, as we wouldn't know who to wake, and hence we'd need to make things faster in some other way. Regards, Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [PATCH] Remove sh_{un}map_domain_page() and hap_{un}map_domain_page()
On 02/07/15 13:43, Ben Catterall wrote: Removed as they were wrappers around map_domain_page() to make it appear to take an mfn_t type. Signed-off-by: Ben Catterall ben.catter...@citrix.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com In the future, a patch like this should either state that it is dependent on another series or, in this case, probably be a 4/4 on the existing series. ~Andrew
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On 02/07/15 13:04, Dario Faggioli wrote: On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote: On 02/07/15 09:30, Dario Faggioli wrote: It is, therefore, not effective in making sure that, even with only one notification, you only kick the interested vcpu. This is the third time that I ask: (1) whether it is possible to have more vcpus queued on one pcpu's PI blocked list with desc.on set (I really believe it is); (2) if yes, whether it is TheRightThing(TM) to kick all of them as soon as any notification arrives, instead of putting together a mechanism for kicking only a specific one. We will receive one NV for every time the hardware managed to successfully set desc.on. Right, I see it now, thanks. If multiple stack up and we proactively drain the list, we will subsequently search the list to completion for all remaining NVs, due to finding no appropriate entries. I can't currently decide whether this will be quicker or slower overall, or (most likely) it will even out to equal in the general case. Well, given that the thing works as you (two) just described, I think draining the list is the only thing we can do. In fact, AFAICT, since we can't know for which vcpu a particular notification is intended, we don't have alternatives to waking them all, do we? Perhaps you misunderstand. Every single vcpu has a PI descriptor which is shared memory with hardware. A NV is delivered strictly when hardware atomically changes desc.on from 0 to 1, i.e. the first time that an outstanding notification arrives. (iirc, desc.on is later cleared by hardware when the vcpu is scheduled and the vector(s) actually injected.) Part of the scheduling modifications alter when a vcpu is eligible to have NVs delivered on its behalf. Non-scheduled vcpus get NVs, while scheduled vcpus have direct injection instead.
Therefore, in the case that an NV arrives, we know for certain that one of the NV-eligible vcpus has had desc.on set by hardware, and we can uniquely identify it by searching for the vcpu for which desc.on is set. In the case of stacked NVs, we cannot associate which specific vcpu caused which NV, but we know that we will get one NV per vcpu needing kicking. ~Andrew
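The drain-the-list behaviour agreed on in this thread can be sketched with a toy model (plain C; all names are illustrative, not the actual Xen vmx structures):

```c
#include <stddef.h>

/* Toy model of the pCPU blocked-vCPU list.  desc_on stands for the
 * PI descriptor's "on" bit, set by hardware once per delivered NV. */
struct pi_vcpu {
    int desc_on;                 /* outstanding notification? */
    int kicked;                  /* times this vCPU was woken */
    struct pi_vcpu *next;
};

/* One NV arrived: kick every vCPU with desc_on set and unlink it,
 * so later (stacked) NVs search a shorter list and nobody is
 * kicked twice.  Returns the number of vCPUs kicked. */
static int pi_wakeup_drain(struct pi_vcpu **head)
{
    int kicks = 0;
    struct pi_vcpu **pp = head;

    while ( *pp != NULL )
    {
        struct pi_vcpu *v = *pp;

        if ( v->desc_on )
        {
            v->kicked++;
            *pp = v->next;       /* unlink: no double kick later */
            kicks++;
        }
        else
            pp = &v->next;
    }
    return kicks;
}

/* Scenario from the thread: two vCPUs pending, one not.  The first
 * drain kicks both pending vCPUs; a second, stacked NV then finds
 * nothing left to do.  Returns first*10 + second. */
static int pi_drain_demo(void)
{
    struct pi_vcpu a = { 1, 0, NULL }, b = { 0, 0, NULL }, c = { 1, 0, NULL };
    struct pi_vcpu *head = &a;

    a.next = &b;
    b.next = &c;
    return pi_wakeup_drain(&head) * 10 + pi_wakeup_drain(&head);
}
```

This mirrors Andrew's point: stacked NVs cost extra full searches of an already-drained list, which is the quicker-or-slower trade-off he mentions.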
Re: [Xen-devel] [RFC PATCH v3 07/18] xen/arm: ITS: implement hw_irq_controller for LPIs
On Fri, Jun 26, 2015 at 8:35 PM, Julien Grall julien.gr...@citrix.com wrote: Hi Vijay, On 26/06/2015 14:54, Vijay Kilari wrote: On Tue, Jun 23, 2015 at 8:02 PM, Julien Grall julien.gr...@citrix.com wrote: Hi Vijay, On 22/06/15 13:01, vijay.kil...@gmail.com wrote: From: Vijaya Kumar K vijaya.ku...@caviumnetworks.com Implements the hw_irq_controller APIs required to handle LPIs This patch doesn't implement a hw_irq_controller for LPIs but just hacks around the current GICv3 host hw_irq_controller. As said on the previous version, the goal of hw_irq_controller is to keep things simple (i.e. little conditional code). Please introduce a separate hw_irq_controller for LPIs. If a new hw_irq_controller is introduced for LPIs, then this has to be exported using some lpi structure which holds pointers to the hw_irq_controller for the guest/host type, similar to gic_hw_ops. The interface is not set in stone, you are free to change what you want as long as we keep something clean and comprehensible. It's the same for the functions (I have in mind route_irq_to_guest). In this case, I would prefer to see 2 callbacks (one for the host, the other for the guest) which return the correct IRQ controller for a specific IRQ. I have in mind something like:

get_guest_hw_irq_controller(unsigned int irq)
{
    if ( !is_lpi )
        return gicv3_guest_irq_controller;
    else
        return gicv3_guest_lpi_controller;
}

Same for the host irq controller. So the selection of the IRQ controller would be hidden from gic.c and keep the code as generic as possible.

+/*
+ * Make the above write visible to the redistributors.
+ * And yes, we're flushing exactly: One. Single. Byte.
+ * Humpf...
+ */
+if ( gic_rdists->flags & RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING )
+    clean_and_invalidate_dcache_va_range(cfg, sizeof(*cfg));
+else
+    dsb(ishst);
+
+/* Get collection id for this event id */
+col = its_dev->its->collections[virq % num_online_cpus()];

This is fragile, you are assuming that num_online_cpus() will never change.
Why don't you store the collection in every irq_desc? This will add an additional 8 bytes of overhead for each irq_desc. Also, is there a macro to get the actual number of processors in the system? AIUI, nr_cpu_ids always returns 128. AFAIU, nr_cpu_ids should reflect the number of CPUs of the platform. x86 correctly sets it when parsing the ACPI tables. So I think this is a bug in the ARM code. In fact, I wasn't able to find a place in the ARM code where this value is changed. nr_cpu_ids is not changed in the ARM case. I think this value has to be updated in start_xen with the cpus value:

void __init start_xen(unsigned long boot_phys_offset,
                      unsigned long fdt_paddr,
                      unsigned long cpuid)
{
    cpus = smp_get_max_cpus();
    ..
}
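Julien's suggested per-IRQ controller selection could look roughly like the following as compilable C. All names are assumptions for illustration; the one architectural fact used is that on GICv3, LPI INTIDs start at 8192:

```c
/* Hiding the "is this an LPI?" test inside one lookup keeps gic.c
 * free of LPI-specific conditionals, which is the stated goal. */
struct hw_irq_controller { const char *name; };

static const struct hw_irq_controller gicv3_guest_irq_controller =
    { "gicv3-guest" };
static const struct hw_irq_controller gicv3_guest_lpi_controller =
    { "gicv3-guest-lpi" };

#define LPI_BASE_INTID 8192u     /* GICv3: LPIs occupy INTIDs >= 8192 */

static int is_lpi(unsigned int irq)
{
    return irq >= LPI_BASE_INTID;
}

static const struct hw_irq_controller *
get_guest_hw_irq_controller(unsigned int irq)
{
    return is_lpi(irq) ? &gicv3_guest_lpi_controller
                       : &gicv3_guest_irq_controller;
}
```

A matching get_host_hw_irq_controller would follow the same shape, so callers never branch on the IRQ class themselves.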
Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access
George Dunlap george.dun...@eu.citrix.com writes: On 07/02/2015 12:25 PM, Tim Deegan wrote: At 12:09 +0100 on 02 Jul (1435838956), Andrew Cooper wrote: On 02/07/15 11:48, George Dunlap wrote: Now in p2m_set_mem_access(), rather than just using an unsigned long in the loop iterating over gfns, you do this thing where you convert gfn_t to unsigned long, add one, and then convert it back to gfn_t again. I can't see any comments in v3 that suggested you do that, and it seems a bit clunky. Is that really necessary? Wouldn't it be better to declare a local variable? I'm not strongly opinionated on this one, it just seems a bit strange. Everything else looks good, thanks. Looping over {g,m,p}fn_t's is indeed awkward, as the compiler tricks for type safety don't allow for simply adding 1 to a typesafe variable. In cases like this, I think it is acceptable to keep an unsigned long shadow variable and manipulate it as a plain integer. The eventual _gfn() required to pass it further down the callchain will help to visually reinforce the appropriate type. After all, the entire point of these typesafes is to try and avoid accidentally mixing up the different address spaces, but a function which takes a typesafe, loops over a subset and passes the same typesafe further down can probably be trusted to DTRT, catching errors at review time. Off the top of my head, the only functions which would normally expect to mix and match the typesafes are the pagetable walking ones. It should be easy enough to extend the macros to define a gfn_inc(gfn_t) operator for this kind of thing. I was thinking that -- although in this case you'd still need to un-pack it to do the loop exit conditional. To really make things pretty you'd want a for_gfn_range() macro or something like that that takes a start gfn and a number. But that's really starting to be feature creep for this patch, which is why I didn't want to suggest it on v4. :-) Well, if you look at what I was fixing in v1 ...
:-) I suggest we add a local unsigned long here and close the deal; when {g,m,p}fn_{inc,dec} macros are available, we'll get rid of it. -- Vitaly ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
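To make the discussion concrete, here is a minimal sketch of the two ideas agreed above: a hypothetical gfn_inc() operator, and the "unsigned long shadow variable" loop pattern that re-boxes with _gfn() at the call site. The gfn_t/_gfn()/gfn_x() stand-ins below are simplified illustrations, not Xen's actual TYPE_SAFE macro expansion; sum_gfns() is an invented example function.

```c
#include <assert.h>

/* Simplified stand-ins for Xen's typesafe wrappers (the real ones come
 * from the TYPE_SAFE machinery); for illustration only. */
typedef struct { unsigned long gfn; } gfn_t;
static inline gfn_t _gfn(unsigned long g) { return (gfn_t){ g }; }
static inline unsigned long gfn_x(gfn_t g) { return g.gfn; }

/* The gfn_inc() operator suggested in the thread: unbox, add 1, re-box. */
static inline gfn_t gfn_inc(gfn_t g) { return _gfn(gfn_x(g) + 1); }

/* The pattern the thread settles on: keep an unsigned long shadow
 * variable for the loop, and re-box with _gfn() when passing the value
 * further down the call chain. */
static unsigned long sum_gfns(gfn_t start, unsigned long nr)
{
    unsigned long gfn_l = gfn_x(start);  /* shadow variable */
    unsigned long acc = 0;

    for ( ; gfn_l < gfn_x(start) + nr; gfn_l++ )
        acc += gfn_x(_gfn(gfn_l));       /* boxed again at the "call site" */

    return acc;
}
```

The _gfn() in the loop body is the visual cue Andrew mentions: it marks the point where the plain integer re-enters typesafe territory.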
Re: [Xen-devel] [RFC PATCH v3 07/18] xen/arm: ITS: implement hw_irq_controller for LPIs
On Thu, 2015-07-02 at 17:51 +0530, Vijay Kilari wrote: On Mon, Jun 29, 2015 at 5:29 PM, Ian Campbell ian.campb...@citrix.com wrote: On Tue, 2015-06-23 at 15:32 +0100, Julien Grall wrote: [...] +{ +struct its_collection *col; +struct its_device *its_dev = get_irq_device(desc); +u8 *cfg; +u32 virq = irq_to_virq(desc); + +ASSERT(virq < its_dev->nr_lpis); + +cfg = gic_rdists->prop_page + desc->irq - NR_GIC_LPI; +if ( enable ) +*cfg |= LPI_PROP_ENABLED; +else +*cfg &= ~LPI_PROP_ENABLED; + +/* + * Make the above write visible to the redistributors. + * And yes, we're flushing exactly: One. Single. Byte. + * Humpf... + */ +if ( gic_rdists->flags & RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING ) +clean_and_invalidate_dcache_va_range(cfg, sizeof(*cfg)); +else +dsb(ishst); + +/* Get collection id for this event id */ +col = its_dev->its->collections[virq % num_online_cpus()]; This is fragile, you are assuming that num_online_cpus() will never change. Why don't you store the collection in every irq_desc? The original Linux code upon which this is based doesn't seem to need to look up the collection here; why is flushing needed for us but not Linux? We are writing to the lpi property table. Even the linux code flushes it. Sorry, I was referring to the collection lookup and inv, not the cache flush, i.e. this bit: +/* Get collection id for this event id */ +col = its_dev->its->collections[virq % num_online_cpus()]; +its_send_inv(its_dev, col, virq); Linux doesn't seem to do that INV there. Ian.
[Xen-devel] [PATCH] Remove sh_{un}map_domain_page() and hap_{un}map_domain_page()
Removed as they were wrappers around map_domain_page() to make it appear to take an mfn_t type. Signed-off-by: Ben Catterall ben.catter...@citrix.com --- xen/arch/x86/mm/hap/hap.c| 4 +- xen/arch/x86/mm/shadow/common.c | 22 +++--- xen/arch/x86/mm/shadow/multi.c | 152 +++ xen/arch/x86/mm/shadow/private.h | 13 xen/include/asm-x86/hap.h| 15 5 files changed, 89 insertions(+), 117 deletions(-) diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c index d0d3f1e..63980af 100644 --- a/xen/arch/x86/mm/hap/hap.c +++ b/xen/arch/x86/mm/hap/hap.c @@ -395,7 +395,7 @@ static void hap_install_xen_entries_in_l4(struct vcpu *v, mfn_t l4mfn) struct domain *d = v-domain; l4_pgentry_t *l4e; -l4e = hap_map_domain_page(l4mfn); +l4e = map_domain_page(l4mfn); /* Copy the common Xen mappings from the idle domain */ memcpy(l4e[ROOT_PAGETABLE_FIRST_XEN_SLOT], @@ -411,7 +411,7 @@ static void hap_install_xen_entries_in_l4(struct vcpu *v, mfn_t l4mfn) l4e[l4_table_offset(LINEAR_PT_VIRT_START)] = l4e_from_pfn(mfn_x(l4mfn), __PAGE_HYPERVISOR); -hap_unmap_domain_page(l4e); +unmap_domain_page(l4e); } static mfn_t hap_make_monitor_table(struct vcpu *v) diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c index da6b847..3da9767 100644 --- a/xen/arch/x86/mm/shadow/common.c +++ b/xen/arch/x86/mm/shadow/common.c @@ -781,11 +781,11 @@ static void oos_hash_add(struct vcpu *v, mfn_t gmfn) if ( swap ) SWAP(oos_snapshot[idx], oos_snapshot[oidx]); -gptr = sh_map_domain_page(oos[oidx]); -gsnpptr = sh_map_domain_page(oos_snapshot[oidx]); +gptr = map_domain_page(oos[oidx]); +gsnpptr = map_domain_page(oos_snapshot[oidx]); memcpy(gsnpptr, gptr, PAGE_SIZE); -sh_unmap_domain_page(gptr); -sh_unmap_domain_page(gsnpptr); +unmap_domain_page(gptr); +unmap_domain_page(gsnpptr); } /* Remove an MFN from the list of out-of-sync guest pagetables */ @@ -1498,7 +1498,7 @@ mfn_t shadow_alloc(struct domain *d, p = __map_domain_page(sp); ASSERT(p != NULL); clear_page(p); 
-sh_unmap_domain_page(p); +unmap_domain_page(p); INIT_PAGE_LIST_ENTRY(sp-list); page_list_add(sp, tmp_list); sp-u.sh.type = shadow_type; @@ -2524,7 +2524,7 @@ static int sh_remove_shadow_via_pointer(struct domain *d, mfn_t smfn) if (sp-up == 0) return 0; pmfn = _mfn(sp-up PAGE_SHIFT); ASSERT(mfn_valid(pmfn)); -vaddr = sh_map_domain_page(pmfn); +vaddr = map_domain_page(pmfn); ASSERT(vaddr); vaddr += sp-up (PAGE_SIZE-1); ASSERT(l1e_get_pfn(*(l1_pgentry_t *)vaddr) == mfn_x(smfn)); @@ -2554,7 +2554,7 @@ static int sh_remove_shadow_via_pointer(struct domain *d, mfn_t smfn) default: BUG(); /* Some wierd unknown shadow type */ } -sh_unmap_domain_page(vaddr); +unmap_domain_page(vaddr); if ( rc ) perfc_incr(shadow_up_pointer); else @@ -3028,7 +3028,7 @@ int shadow_enable(struct domain *d, u32 mode) e[i] = ((0x40U * i) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE); -sh_unmap_domain_page(e); +unmap_domain_page(e); pg-u.inuse.type_info = PGT_l2_page_table | 1 | PGT_validated; } @@ -3631,8 +3631,8 @@ int shadow_track_dirty_vram(struct domain *d, if ( sl1mfn != map_mfn ) { if ( map_sl1p ) -sh_unmap_domain_page(map_sl1p); -map_sl1p = sh_map_domain_page(_mfn(sl1mfn)); +unmap_domain_page(map_sl1p); +map_sl1p = map_domain_page(_mfn(sl1mfn)); map_mfn = sl1mfn; } sl1e = map_sl1p + (sl1ma ~PAGE_MASK); @@ -3663,7 +3663,7 @@ int shadow_track_dirty_vram(struct domain *d, } if ( map_sl1p ) -sh_unmap_domain_page(map_sl1p); +unmap_domain_page(map_sl1p); memcpy(dirty_bitmap, dirty_vram-dirty_bitmap, dirty_size); memset(dirty_vram-dirty_bitmap, 0, dirty_size); diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c index 2e3d3f6..4c8badf 100644 --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -221,16 +221,16 @@ shadow_check_gwalk(struct vcpu *v, unsigned long va, walk_t *gw, int version) #if GUEST_PAGING_LEVELS = 4 /* 64-bit only... 
*/ l4p = (guest_l4e_t *)v-arch.paging.shadow.guest_vtable; mismatch |= (gw-l4e.l4 != l4p[guest_l4_table_offset(va)].l4); -l3p = sh_map_domain_page(gw-l3mfn); +l3p = map_domain_page(gw-l3mfn); mismatch |= (gw-l3e.l3 != l3p[guest_l3_table_offset(va)].l3); -sh_unmap_domain_page(l3p); +
Re: [Xen-devel] [RFC PATCH v3 07/18] xen/arm: ITS: implement hw_irq_controller for LPIs
On Thu, Jul 2, 2015 at 6:05 PM, Ian Campbell ian.campb...@citrix.com wrote: On Thu, 2015-07-02 at 17:51 +0530, Vijay Kilari wrote: On Mon, Jun 29, 2015 at 5:29 PM, Ian Campbell ian.campb...@citrix.com wrote: On Tue, 2015-06-23 at 15:32 +0100, Julien Grall wrote: [...] +{ +struct its_collection *col; +struct its_device *its_dev = get_irq_device(desc); +u8 *cfg; +u32 virq = irq_to_virq(desc); + +ASSERT(virq < its_dev->nr_lpis); + +cfg = gic_rdists->prop_page + desc->irq - NR_GIC_LPI; +if ( enable ) +*cfg |= LPI_PROP_ENABLED; +else +*cfg &= ~LPI_PROP_ENABLED; + +/* + * Make the above write visible to the redistributors. + * And yes, we're flushing exactly: One. Single. Byte. + * Humpf... + */ +if ( gic_rdists->flags & RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING ) +clean_and_invalidate_dcache_va_range(cfg, sizeof(*cfg)); +else +dsb(ishst); + +/* Get collection id for this event id */ +col = its_dev->its->collections[virq % num_online_cpus()]; This is fragile, you are assuming that num_online_cpus() will never change. Why don't you store the collection in every irq_desc? The original Linux code upon which this is based doesn't seem to need to look up the collection here; why is flushing needed for us but not Linux? We are writing to the lpi property table. Even the linux code flushes it. Sorry, I was referring to the collection lookup and inv, not the cache flush, i.e. this bit: +/* Get collection id for this event id */ +col = its_dev->its->collections[virq % num_online_cpus()]; +its_send_inv(its_dev, col, virq); Linux doesn't seem to do that INV there. Linux does INV. https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/irqchip/irq-gic-v3-its.c?id=refs/tags/v4.1 line 555
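The enable/disable path being reviewed boils down to flipping one bit in a per-LPI byte of the property table, then making that byte visible to the redistributor. The sketch below shows just the bit-twiddling part in isolation; LPI_PROP_ENABLED and lpi_set_config() are simplified stand-ins (the real code also performs the cache clean/invalidate or dsb() barrier, and sends the INV command discussed in the thread).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative value: the GICv3 LPI configuration table holds one byte
 * per LPI, with the enable bit in the low bit. */
#define LPI_PROP_ENABLED  (1 << 0)

/* Simplified sketch of set_lpi_config(): locate the LPI's byte in the
 * property table and set or clear the enable bit. */
static void lpi_set_config(uint8_t *prop_page, uint32_t lpi_offset, int enable)
{
    uint8_t *cfg = prop_page + lpi_offset;

    if ( enable )
        *cfg |= LPI_PROP_ENABLED;
    else
        *cfg &= ~LPI_PROP_ENABLED;

    /* In the real driver: flush/barrier so the redistributor observes
     * the write, then issue an INV command for this event id. */
}
```

Note how small the write is: a single byte, which is why the quoted code has to flush "exactly: One. Single. Byte."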
Re: [Xen-devel] [PATCH 1/7] libxc: fix uninitialized variable in xc_cpuid_pv_policy()
On 01/07/15 18:37, Jennifer Herbert wrote: If xc_domain_get_guest_width were to fail, guest_width is not set, and hence guest_64bit becomes undefined. The fix is to initialise it to 0, and report an error if the call fails. Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com --- tools/libxc/xc_cpuid_x86.c |8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c index c97f91a..847b701 100644 --- a/tools/libxc/xc_cpuid_x86.c +++ b/tools/libxc/xc_cpuid_x86.c @@ -437,14 +437,16 @@ static void xc_cpuid_pv_policy( Urgh - yet another libxc function which can fail hard in several ways, yet is void. { DECLARE_DOMCTL; unsigned int guest_width; -int guest_64bit; +int guest_64bit = 0; The default in Xen is that a PV guest is 64bit until explicitly converted to being compat. The better default therefore is 1. char brand[13]; uint64_t xfeature_mask; xc_cpuid_brand_get(brand); -xc_domain_get_guest_width(xch, domid, &guest_width); -guest_64bit = (guest_width == 8); +if (xc_domain_get_guest_width(xch, domid, &guest_width) == 0) +guest_64bit = (guest_width == 8); +else +ERROR("Could not read guest word width."); No full stop please. /* Detecting Xen's atitude towards XSAVE */ memset(&domctl, 0, sizeof(domctl));
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
-Original Message- From: Dario Faggioli [mailto:dario.faggi...@citrix.com] Sent: Thursday, July 02, 2015 8:04 PM To: Andrew Cooper Cc: Wu, Feng; Tian, Kevin; k...@xen.org; george.dun...@eu.citrix.com; xen-devel@lists.xen.org; jbeul...@suse.com; Zhang, Yang Z Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote: On 02/07/15 09:30, Dario Faggioli wrote: It is, therefore, not effective in making sure that, even with only one notification, you only kick the interested vcpu. This is the third time that I ask: (1) whether it is possible to have more vcpus queued on one pcpu PI blocked list with desc.on (I really believe it is); (2) if yes, whether it is TheRightThing(TM) to kick all of them, as soon as any notification arrives, instead that putting together a mechanism for kicking only a specific one. We will receive one NV for every time the hardware managed to successfully set desc.on Right, I see it now, thanks. If multiple stack up and we proactively drain the list, we will subsequently search the list to completion for all remaining NV's, due to finding no appropriate entries. I can't currently decide whether this will be quicker or slower overall, or (most likely) it will even out to equal in the general case. Well, given the thing works as you (two) just described, I think draining the list is the only thing we can do. In fact, AFAICT, since we can't know for what vcpu a particular notification is intended, Exactly, when notification event happens, the hardware sets 'ON', software will find the vCPU with 'ON' set, in fact, software doesn't know which vCPU the wakeup event is targeting, the only thing it can do is kicking the vCPUs with desc.on = 1. Thanks, Feng we don't have alternatives to waking them all, do we? Dario -- This happens because I choose it to happen! 
(Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [PATCH v2 11/22] xen/x86: allow disabling emulated devices for HVM guests
On 07/02/2015 07:49 AM, Stefano Stabellini wrote: On Wed, 1 Jul 2015, Andrew Cooper wrote: On 01/07/15 17:13, Stefano Stabellini wrote: On Wed, 1 Jul 2015, Andrew Cooper wrote: On 01/07/15 16:51, Boris Ostrovsky wrote: On 07/01/2015 11:46 AM, Andrew Cooper wrote: On 01/07/15 15:46, Roger Pau Monne wrote: Introduce a new DOMCTL flag that can be used to disable device emulation inside of Xen for HVM guests. The following emulated devices are disabled when the XEN_DOMCTL_CDF_noemu is used: hpet, pmtimer, rtc, ioapic, lapic, pic and pmu. Also all the MMIO handlers are disabled. Signed-off-by: Roger Pau Monné roger@citrix.com Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com Cc: Boris Ostrovsky boris.ostrov...@oracle.com Cc: Suravee Suthikulpanit suravee.suthikulpa...@amd.com Cc: Aravind Gopalakrishnan aravind.gopalakrish...@amd.com Cc: Jun Nakajima jun.nakaj...@intel.com Cc: Eddie Dong eddie.d...@intel.com Cc: Kevin Tian kevin.t...@intel.com I would be hesitant to have a blanket change like this. Consider APICV/AVIC. For performance reasons, we absolutely want HVM and PVH to make use of them, as they are substantially more efficient using hardware support than even using plain evtchn hypercalls. However, the flipside is that we must provide an LAPIC emulation to cover the bits which hardware cannot virtualise. As a random idea, how about having a new hypercall or hvmparam which provides a bitmap of permitted emulators? This would allow far finer grain control over what is and isn't available to a domain. I think we also need to decide on which subsets of emulators we are going to support, otherwise the test matrix will become pretty big. For example, initially we may want to allow all (for what we now call HVM) or none (PVH). Right, but that can currently be enforced with an if ( arg != 0 && arg != ~0 ) return -EOPNOTSUPP; in the hypercall handler for now.
It still leaves us with the ability to add in LAPIC emulation in the future by changing the auditing. A blanket no emulation boolean is very much harder to relax in the future. APICV is a bit of a special case, because it is partially virtualized in hardware. Not in the slightest. It is *exactly* the same as existing hardware virt. I thought we were speaking about emulation, specifically regarding device emulation in Xen x86, such as the hpet for example. In this context APICV is a bit of a special case. Are there other devices being partially virtualized in hardware on x86? (I admit that I haven't followed x86 development that closely.) Hardware does most of the work, but occasionally needs to break into Xen to manage things. The difference is that we don't call some of the existing vmexits emulating an x86 cpu, despite this being what is actually happening. To me, that is different. But in general, considering that the whole purpose of PVH as DomU is security... From the kernel perspective, the major reason for having PVH is to move away from PV memory management (in the long term) -boris
Re: [Xen-devel] [RFC PATCH v3 07/18] xen/arm: ITS: implement hw_irq_controller for LPIs
On Mon, Jun 29, 2015 at 5:29 PM, Ian Campbell ian.campb...@citrix.com wrote: On Tue, 2015-06-23 at 15:32 +0100, Julien Grall wrote: [...] +{ +struct its_collection *col; +struct its_device *its_dev = get_irq_device(desc); +u8 *cfg; +u32 virq = irq_to_virq(desc); + +ASSERT(virq < its_dev->nr_lpis); + +cfg = gic_rdists->prop_page + desc->irq - NR_GIC_LPI; +if ( enable ) +*cfg |= LPI_PROP_ENABLED; +else +*cfg &= ~LPI_PROP_ENABLED; + +/* + * Make the above write visible to the redistributors. + * And yes, we're flushing exactly: One. Single. Byte. + * Humpf... + */ +if ( gic_rdists->flags & RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING ) +clean_and_invalidate_dcache_va_range(cfg, sizeof(*cfg)); +else +dsb(ishst); + +/* Get collection id for this event id */ +col = its_dev->its->collections[virq % num_online_cpus()]; This is fragile, you are assuming that num_online_cpus() will never change. Why don't you store the collection in every irq_desc? The original Linux code upon which this is based doesn't seem to need to look up the collection here; why is flushing needed for us but not Linux? We are writing to the lpi property table. Even the linux code flushes it. I'm also confused by the use of the variable name virq in a function called set_lpi_config which appears to be dealing with host level physical LPIs. It seems like this function would be broken for LPIs which were delivered to Xen and not to a guest, and that the irq_to_virq in here ought to be desc->irq, no? I am using desc->irq for updating the lpi property table but using vid to send the inv command. BTW, the Linux original calls what this calls virq id instead, which is much less confusing. OK, will rename virq to vid
Re: [Xen-devel] [PATCH 0/6] add xsaves/xrstors support
On 02/07/15 15:02, Shuai Ruan wrote: This patchset enables the xsaves/xrstors feature. It includes three parts: 1. add xsaves/xrstors for xen. 2. add xsaves/xrstors for pv guest. 3. add xsaves/xrstors for hvm guest. What is xsaves/xrstors and why might I want Xen to use it? What advantages does it give? When might these instructions be available? Where can I read more details about this? ~Andrew
[Xen-devel] [PATCH v2 3/3] Convert map_domain_page() to use the new mfn_t type
Reworked the internals and declaration, applying (un)boxing where needed. Converted calls to map_domain_page() to provide mfn_t types, boxing where needed. Signed-off-by: Ben Catterall ben.catter...@citrix.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- Changed since v1: * Created paddr_to_mfn() and mfn_to_paddr() for both x86 and ARM * Converted code to use the new paddr_to_mfn() rather than e.g. paddrPAGE_SHIFT Signed-off-by: Ben Catterall ben.catter...@citrix.com --- xen/arch/arm/domain_build.c | 2 +- xen/arch/arm/kernel.c | 2 +- xen/arch/arm/mm.c | 12 +- xen/arch/arm/p2m.c| 4 ++-- xen/arch/arm/traps.c | 4 ++-- xen/arch/x86/debug.c | 10 xen/arch/x86/domain.c | 4 ++-- xen/arch/x86/domain_build.c | 10 xen/arch/x86/domain_page.c| 22 - xen/arch/x86/domctl.c | 2 +- xen/arch/x86/mm.c | 40 +++ xen/arch/x86/mm/guest_walk.c | 2 +- xen/arch/x86/mm/hap/guest_walk.c | 2 +- xen/arch/x86/mm/mem_sharing.c | 4 ++-- xen/arch/x86/mm/p2m-ept.c | 22 - xen/arch/x86/mm/p2m-pod.c | 8 +++ xen/arch/x86/mm/p2m-pt.c | 28 +++--- xen/arch/x86/mm/p2m.c | 2 +- xen/arch/x86/mm/paging.c | 32 - xen/arch/x86/mm/shadow/common.c | 2 +- xen/arch/x86/mm/shadow/multi.c| 4 ++-- xen/arch/x86/mm/shadow/private.h | 2 +- xen/arch/x86/smpboot.c| 2 +- xen/arch/x86/tboot.c | 4 ++-- xen/arch/x86/traps.c | 12 +- xen/arch/x86/x86_64/mm.c | 14 +-- xen/arch/x86/x86_64/traps.c | 10 xen/arch/x86/x86_emulate.c| 10 xen/common/grant_table.c | 4 ++-- xen/common/kexec.c| 4 ++-- xen/common/kimage.c | 10 xen/common/memory.c | 6 ++--- xen/common/tmem_xen.c | 6 ++--- xen/drivers/passthrough/amd/iommu_guest.c | 10 xen/drivers/passthrough/amd/iommu_map.c | 14 +-- xen/drivers/passthrough/vtd/x86/vtd.c | 2 +- xen/include/asm-arm/mm.h | 2 ++ xen/include/asm-x86/hap.h | 2 +- xen/include/asm-x86/page.h| 10 +--- xen/include/asm-x86/paging.h | 2 +- xen/include/xen/domain_page.h | 8 +++ 41 files changed, 179 insertions(+), 173 deletions(-) diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c index 
e9cb8a9..37db8b7 100644 --- a/xen/arch/arm/domain_build.c +++ b/xen/arch/arm/domain_build.c @@ -1322,7 +1322,7 @@ static void initrd_load(struct kernel_info *kinfo) return; } -dst = map_domain_page(maPAGE_SHIFT); +dst = map_domain_page(_mfn(paddr_to_mfn(ma))); copy_from_paddr(dst + s, paddr + offs, l); diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c index 209c3dd..9826fb2 100644 --- a/xen/arch/arm/kernel.c +++ b/xen/arch/arm/kernel.c @@ -182,7 +182,7 @@ static void kernel_zimage_load(struct kernel_info *info) return; } -dst = map_domain_page(maPAGE_SHIFT); +dst = map_domain_page(_mfn(paddr_to_mfn(ma))); copy_from_paddr(dst + s, paddr + offs, l); diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index d479048..ae0f34c 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -213,7 +213,7 @@ void dump_pt_walk(paddr_t ttbr, paddr_t addr, else root_table = 0; -mapping = map_domain_page(root_pfn + root_table); +mapping = map_domain_page(_mfn(root_pfn + root_table)); for ( level = root_level; ; level++ ) { @@ -230,7 +230,7 @@ void dump_pt_walk(paddr_t ttbr, paddr_t addr, /* For next iteration */ unmap_domain_page(mapping); -mapping = map_domain_page(pte.walk.base); +mapping = map_domain_page(_mfn(pte.walk.base)); } unmap_domain_page(mapping); @@ -282,11 +282,11 @@ void unmap_domain_page_global(const void *va) } /* Map a page of domheap memory */ -void *map_domain_page(unsigned long mfn) +void *map_domain_page(mfn_t mfn) { unsigned long flags; lpae_t *map = this_cpu(xen_dommap); -unsigned long slot_mfn = mfn ~LPAE_ENTRY_MASK; +unsigned long slot_mfn = mfn_x(mfn) ~LPAE_ENTRY_MASK; vaddr_t va; lpae_t pte; int i, slot; @@ -339,7 +339,7 @@ void *map_domain_page(unsigned long mfn) va = (DOMHEAP_VIRT_START + (slot SECOND_SHIFT) - + ((mfn LPAE_ENTRY_MASK) THIRD_SHIFT)); +
Re: [Xen-devel] Sharing display between guests
On Thu, 2015-07-02 at 13:13 +0200, Maxime Ripard wrote: Hi, Hi, I don't have much comments on all this, just that, reading it, it sounded somewhat similar to what GL is doing: http://www.xenproject.org/component/allvideoshare/video/latest/ces15-globalogic.html http://www.xenproject.org/component/allvideoshare/video/globallogic-xen-android.html That would mean that we would have a static composition, that would be setup once and we could forget about it during the life of the system. Well, they use a DomU and Dom0, rather than 2 DomUs (or so I think), but it seems to me that at least some composition is indeed happening. I don't have much more details, I'm afraid, but if you think your use case is related, or at least that you could benefit from interacting with them and sharing experiences/code/whatever, just ask, I've Cc-ed some of them. :-) Regards, Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
[Xen-devel] [PATCH 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
MAX_NR_IO_RANGES is used by the ioreq server as the maximum number of discrete ranges to be tracked. This patch changes its value to 8k, so that more ranges can be tracked on the next generation of Intel platforms in XenGT. Future patches can extend the limit to be toolstack tunable, and MAX_NR_IO_RANGES can serve as a default limit. Signed-off-by: Yu Zhang yu.c.zh...@linux.intel.com --- xen/include/asm-x86/hvm/domain.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h index ad68fcf..d62fda9 100644 --- a/xen/include/asm-x86/hvm/domain.h +++ b/xen/include/asm-x86/hvm/domain.h @@ -49,7 +49,7 @@ struct hvm_ioreq_vcpu { }; #define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1) -#define MAX_NR_IO_RANGES 256 +#define MAX_NR_IO_RANGES 8192 struct hvm_ioreq_server { struct list_head list_entry; -- 1.9.1
[Xen-devel] [PATCH 2/2] Add new data structure to track ranges.
This patch introduces a new data structure, struct rb_rangeset, to represent a group of continuous ranges, e.g. the start and end addresses for PIO/MMIO regions. By now, this structure is supposed to assist ioreq server to forward the I/O request to backend device models more efficiently. Behavior of this new data structure is quite similar to rangeset, with major difference being the time complexity. Based on doubly linked list, struct rangeset provides O(n) time complexity for searching. And struct rb_rangeset is based on red-black tree, with binary searching, the time complexity is improved to O(log(n)) - more suitable to track massive discrete ranges. Ioreq server code is changed to utilize this new type, and a new routine, hvm_ioreq_server_dump_range_info, is added to dump all the ranges tracked in an ioreq server. Signed-off-by: Yu Zhang yu.c.zh...@linux.intel.com --- xen/arch/x86/domain.c| 3 + xen/arch/x86/hvm/hvm.c | 56 ++-- xen/common/Makefile | 1 + xen/common/rb_rangeset.c | 281 +++ xen/include/asm-x86/hvm/domain.h | 2 +- xen/include/asm-x86/hvm/hvm.h| 1 + xen/include/xen/rb_rangeset.h| 49 +++ 7 files changed, 378 insertions(+), 15 deletions(-) create mode 100644 xen/common/rb_rangeset.c create mode 100644 xen/include/xen/rb_rangeset.h diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index a8fe046..f8a8b80 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -2086,6 +2086,9 @@ int domain_relinquish_resources(struct domain *d) void arch_dump_domain_info(struct domain *d) { paging_dump_domain_info(d); + +if ( is_hvm_domain(d) ) +hvm_ioreq_server_dump_range_info(d); } void arch_dump_vcpu_info(struct vcpu *v) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 535d622..c79676e 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -37,6 +37,7 @@ #include xen/wait.h #include xen/mem_access.h #include xen/rangeset.h +#include xen/rb_rangeset.h #include asm/shadow.h #include asm/hap.h #include asm/current.h @@ 
-818,7 +819,7 @@ static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s, return; for ( i = 0; i NR_IO_RANGE_TYPES; i++ ) -rangeset_destroy(s-range[i]); +rb_rangeset_destroy(s-range[i]); } static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, @@ -842,8 +843,7 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, if ( rc ) goto fail; -s-range[i] = rangeset_new(s-domain, name, - RANGESETF_prettyprint_hex); +s-range[i] = rb_rangeset_new(name); xfree(name); @@ -851,7 +851,7 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, if ( !s-range[i] ) goto fail; -rangeset_limit(s-range[i], MAX_NR_IO_RANGES); +rb_rangeset_limit(s-range[i], MAX_NR_IO_RANGES); } done: @@ -1149,7 +1149,7 @@ static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, if ( s-id == id ) { -struct rangeset *r; +struct rb_rangeset *r; switch ( type ) { @@ -1169,10 +1169,10 @@ static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, break; rc = -EEXIST; -if ( rangeset_overlaps_range(r, start, end) ) +if ( rb_rangeset_overlaps_range(r, start, end) ) break; -rc = rangeset_add_range(r, start, end); +rc = rb_rangeset_add_range(r, start, end); break; } } @@ -1200,7 +1200,7 @@ static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, if ( s-id == id ) { -struct rangeset *r; +struct rb_rangeset *r; switch ( type ) { @@ -1220,10 +1220,10 @@ static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, break; rc = -ENOENT; -if ( !rangeset_contains_range(r, start, end) ) +if ( !rb_rangeset_contains_range(r, start, end) ) break; -rc = rangeset_remove_range(r, start, end); +rc = rb_rangeset_remove_range(r, start, end); break; } } @@ -1349,6 +1349,34 @@ static void hvm_destroy_all_ioreq_servers(struct domain *d) spin_unlock(d-arch.hvm_domain.ioreq_server.lock); } +void hvm_ioreq_server_dump_range_info(struct domain *d) +{ +unsigned int i; +struct hvm_ioreq_server 
*s; + +spin_lock(d-arch.hvm_domain.ioreq_server.lock); + +list_for_each_entry ( s, + d-arch.hvm_domain.ioreq_server.list, + list_entry ) +{ +if ( s ==
[Xen-devel] [PATCH 0/2] Refactor ioreq server for better performance.
XenGT leverages the ioreq server to track and forward accesses to GPU I/O resources, e.g. the PPGTT (per-process graphic translation tables). Currently, the ioreq server uses a rangeset to track the BDF/PIO/MMIO ranges to be emulated. To select an ioreq server, the rangeset is searched to see if the I/O range is recorded. However, traversing the linked list inside the rangeset can be time consuming when the number of ranges is high. On the HSW platform, the number of PPGTTs for each vGPU could be several hundred. On BDW, this value could be several thousand. To accommodate more ranges, the limit on the number of ranges in an ioreq server, MAX_NR_IO_RANGES, is changed - future patches will be provided to tune this with other approaches. And to increase ioreq server performance, a new data structure, rb_rangeset, is introduced. Yu Zhang (2): Resize the MAX_NR_IO_RANGES for ioreq server Add new data structure to track ranges. xen/arch/x86/domain.c| 3 + xen/arch/x86/hvm/hvm.c | 56 ++-- xen/common/Makefile | 1 + xen/common/rb_rangeset.c | 281 +++ xen/include/asm-x86/hvm/domain.h | 4 +- xen/include/asm-x86/hvm/hvm.h| 1 + xen/include/xen/rb_rangeset.h| 49 +++ 7 files changed, 379 insertions(+), 16 deletions(-) create mode 100644 xen/common/rb_rangeset.c create mode 100644 xen/include/xen/rb_rangeset.h -- 1.9.1
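The performance argument in this cover letter is that range containment queries over sorted, non-overlapping ranges can be answered in O(log n) rather than by an O(n) list walk. The sketch below illustrates that idea with a binary search over a sorted array of ranges; it is a simplified stand-in, not the red-black-tree-based rb_rangeset the patches actually add (struct range and ranges_contain() are invented names for illustration).

```c
#include <stddef.h>
#include <stdint.h>

/* A range set kept sorted and non-overlapping, like the ranges an
 * ioreq server tracks for PIO/MMIO emulation. */
struct range { uint64_t start, end; };   /* inclusive bounds */

/* O(log n) containment test by binary search, versus the O(n) walk of
 * the linked-list based rangeset. */
static int ranges_contain(const struct range *r, size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;

    while ( lo < hi )
    {
        size_t mid = lo + (hi - lo) / 2;

        if ( addr < r[mid].start )
            hi = mid;                    /* search left half */
        else if ( addr > r[mid].end )
            lo = mid + 1;                /* search right half */
        else
            return 1;                    /* addr inside r[mid] */
    }
    return 0;
}
```

With several thousand PPGTT ranges per vGPU (the BDW case mentioned above), the difference between ~12 comparisons and a walk of thousands of list nodes on every forwarded I/O access is the motivation for the new structure.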
[Xen-devel] [PATCH v5] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access
'pfn' and 'start_pfn' are ambiguous; both these functions expect GFNs as input. On x86 the interface of p2m_set_mem_access() in p2m.c doesn't match the declaration in p2m-common.h as 'pfn' is being used instead of 'start_pfn'. On ARM both p2m_set_mem_access and p2m_get_mem_access interfaces don't match declarations from p2m-common.h: p2m_set_mem_access uses 'pfn' instead of 'start_pfn' and p2m_get_mem_access uses 'gpfn' instead of 'pfn'. Convert p2m_get_mem_access/p2m_set_mem_access (and __p2m_get_mem_access on ARM) interfaces to using gfn_t instead of unsigned long and update all users of these functions. There is also an issue in p2m_get_mem_access on x86: the 'gfn' parameter passed to gfn_lock/gfn_unlock is not defined. This code compiles only because of a coincidence: gfn_lock/gfn_unlock are currently macros which don't use their second argument. Signed-off-by: Vitaly Kuznetsov vkuzn...@redhat.com --- Changes since v4: - In p2m_set_mem_access use a local unsigned long gfn_l for looping instead of boxing/unboxing [George Dunlap] Changes since v3: - Comment coding style fix [Razvan Cojocaru] - Use INVALID_GFN instead of ~0 and -1 [Andrew Cooper] - Convert p2m_get_mem_access/p2m_set_mem_access interfaces to using gfn_t [Andrew Cooper] Changes since v2: - Instead of adding start_ prefix on ARM remove it on x86 [Jan Beulich, Ian Campbell, Razvan Cojocaru] Changes since v1: - This patch is a successor of '[PATCH] x86/mm: use existing 'pfn' in p2m_get_mem_access', instead of fixing gfn_lock/gfn_unlock arguments we do s/pfn/gfn/g for both p2m_get_mem_access/p2m_set_mem_access [Andrew Cooper, Jan Beulich] P.S. - The patch was compile-tested on x86 and ARM64.
--- xen/arch/arm/p2m.c | 33 + xen/arch/x86/mm/p2m.c| 36 xen/common/mem_access.c | 4 ++-- xen/include/xen/p2m-common.h | 13 ++--- 4 files changed, 45 insertions(+), 41 deletions(-) diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index 903fa3f..6b9ef33 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -436,7 +436,7 @@ static int p2m_create_table(struct domain *d, lpae_t *entry, return 0; } -static int __p2m_get_mem_access(struct domain *d, unsigned long gpfn, +static int __p2m_get_mem_access(struct domain *d, gfn_t gfn, xenmem_access_t *access) { struct p2m_domain *p2m = p2m_get_hostp2m(d); @@ -465,14 +465,14 @@ static int __p2m_get_mem_access(struct domain *d, unsigned long gpfn, return 0; } -/* If request to get default access */ -if ( gpfn == ~0ul ) +/* If request to get default access. */ +if ( gfn_x(gfn) == INVALID_GFN ) { *access = memaccess[p2m->default_access]; return 0; } -i = radix_tree_lookup(&p2m->mem_access_settings, gpfn); +i = radix_tree_lookup(&p2m->mem_access_settings, gfn_x(gfn)); if ( !i ) { @@ -480,7 +480,7 @@ static int __p2m_get_mem_access(struct domain *d, unsigned long gpfn, * No setting was found in the Radix tree. Check if the * entry exists in the page-tables. */ -paddr_t maddr = p2m_lookup(d, gpfn << PAGE_SHIFT, NULL); +paddr_t maddr = p2m_lookup(d, gfn_x(gfn) << PAGE_SHIFT, NULL); if ( INVALID_PADDR == maddr ) return -ESRCH; @@ -1386,7 +1386,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag) * We do this first as this is faster in the default case when no * permission is set on the page. */ -rc = __p2m_get_mem_access(current->domain, paddr_to_pfn(ipa), &xma); +rc = __p2m_get_mem_access(current->domain, _gfn(paddr_to_pfn(ipa)), &xma); if ( rc < 0 ) goto err; @@ -1590,7 +1590,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, vaddr_t gla, const struct npfec npfec) if ( !p2m->mem_access_enabled ) return true; -rc = p2m_get_mem_access(v->domain, paddr_to_pfn(gpa), &xma); +rc = p2m_get_mem_access(v->domain, _gfn(paddr_to_pfn(gpa)), &xma); if ( rc ) return true; @@ -1632,13 +1632,13 @@ bool_t p2m_mem_access_check(paddr_t gpa, vaddr_t gla, const struct npfec npfec) /* First, handle rx2rw and n2rwx conversion automatically. */ if ( npfec.write_access && xma == XENMEM_access_rx2rw ) { -rc = p2m_set_mem_access(v->domain, paddr_to_pfn(gpa), 1, +rc = p2m_set_mem_access(v->domain, _gfn(paddr_to_pfn(gpa)), 1, 0, ~0, XENMEM_access_rw); return false; } else if ( xma == XENMEM_access_n2rwx ) { -rc = p2m_set_mem_access(v->domain, paddr_to_pfn(gpa), 1, +rc = p2m_set_mem_access(v->domain, _gfn(paddr_to_pfn(gpa)), 1, 0, ~0, XENMEM_access_rwx); } @@ -1660,7 +1660,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, vaddr_t gla, const struct npfec npfec) { /* A listener is not
Re: [Xen-devel] [RFC PATCH v3 08/18] xen/arm: vITS: Add virtual ITS driver
Hi Julien, On Tue, Jun 23, 2015 at 10:09 PM, Julien Grall julien.gr...@citrix.com wrote: Hi Vijay, +struct vits_device { +uint32_t vdevid; +uint32_t pdevid; +struct its_device *its_dev; +struct rb_node node; +}; We spoke about a specific structure in the design [2] but you introduced a new one. Why? Section 6 of DraftG specifies managing a separate tree for device assignment. This lets us keep a per-domain RB-tree holding the list of devices assigned to that domain, indexed by vdevid, so we can check whether a device is assigned to the domain before processing any ITS command with that vdevid. Having everything in the its_device would help to catch a device attached to 2 different domains... One option is to introduce a new variable inside its_device to record which domain the device is currently assigned to. Also, the field pdevid is not vits specific but its. pdevid can be removed, as the its_device structure already has it. Regards Vijay ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] Mapping Dom0 page in hypervisor from domctl
Hi all, I'm trying to map a page provided by Dom0 into Xen's address space, but I keep freezing the physical computer when I try. Maybe I'm hitting a spinlock or something, but clearly I'm doing something incorrectly. I'll probably be able to get farther once I have another computer with a serial port to get logging, but hopefully it's something obvious. Dom0 sends a domctl containing a virtual address. I attempt to translate it to a GFN, then get the page_info structure, and finally map it. Source code: /* Unmap any existing mapping */ if ( d->arch.hvm_domain.dom0_mapping.page_info ) { unmap_domain_page(d->arch.hvm_domain.dom0_mapping.buffer); put_page(d->arch.hvm_domain.dom0_mapping.page_info); d->arch.hvm_domain.dom0_mapping.buffer = NULL; d->arch.hvm_domain.dom0_mapping.page_info = NULL; } if ( domctl->va ) { struct page_info* page_info; uint64_t gfn; uint32_t pfec = 0; gdprintk(XENLOG_DEBUG, "Using guest VA 0x%lX\n", domctl->va); /* Validate that the given VA is page-aligned */ if ((domctl->va & ~PAGE_MASK) != 0) { gdprintk(XENLOG_DEBUG, "VA is not page aligned\n"); ret = -EINVAL; break; } /* Translate the given virtual address to the guest frame number */ gfn = paging_gva_to_gfn(current, domctl->va, &pfec); if ( gfn == INVALID_GFN ) { gdprintk(XENLOG_DEBUG, "Invalid GFN\n"); ret = -EFAULT; break; } gdprintk(XENLOG_DEBUG, "Translated to gfn 0x%lX\n", gfn); /* Get the page info */ page_info = get_page_from_gfn(current->domain, gfn, NULL, P2M_UNSHARE); if ( !page_info ) { gdprintk(XENLOG_DEBUG, "No page info\n"); ret = -ESRCH; break; } gdprintk(XENLOG_DEBUG, "Translated to mfn 0x%lX\n", page_to_mfn(page_info)); d->arch.hvm_domain.dom0_mapping.page_info = page_info; d->arch.hvm_domain.dom0_mapping.buffer = (uint8_t*)map_domain_page_global(page_to_mfn(page_info)); if ( !d->arch.hvm_domain.dom0_mapping.buffer ) { gdprintk(XENLOG_DEBUG, "Map failed\n"); put_page(page_info); ret = -ENOMEM; break; } gdprintk(XENLOG_DEBUG, "All Mapped: %p\n", d->arch.hvm_domain.dom0_mapping.buffer); } Any insight into what I'm doing wrong would be appreciated! ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/7] libxc: Fix a number of coverity issues.
On 01/07/15 18:37, Jennifer Herbert wrote: Fix a number of coverity issues in libxc. Patches 2 through 7: Reviewed-by: Andrew Cooper andrew.coop...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH RFC 6/6] xen: Add backtrace for serious issues.
On Wed, Jul 01, 2015 at 02:06:30PM +0100, Stefano Stabellini wrote: On Mon, 29 Jun 2015, Konrad Rzeszutek Wilk wrote: When debugging issues that caused the emulator to kill itself or to skip certain operations (unable to write to host registers), a stack trace will most definitely aid in debugging the problem. As such this patch uses the most basic backtrace to print out details. Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com I think it could be useful, but it cannot be done as a xen-hvm.c thing. It should be somewhere generic, maybe under util? Stefan, any suggestions? Yes, it seems like a util/ thing. backtrace() and backtrace_symbols_fd() are glibc-specific so it must not break the build on other platforms. I think the reason we've survived without backtraces so far is because fatal errors are typically handled with abort(3). It causes a core dump so you have the full process state, including backtraces. I'm fine with adding a backtrace function though, since it's more lightweight and allows for graceful shutdown or error recovery. pgpXBsKw5J3DS.pgp Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 2/6] x86/xsaves: enable xsaves/xrstors in xen
This patch uses xsaves/xrstors instead of xsaveopt/xrstor when perform task switch in xen if the feature is supported in hardware. Please note that xsaves/xrstors only use compact format. Signed-off-by: Shuai Ruan shuai.r...@intel.com --- xen/arch/x86/xstate.c| 83 xen/include/asm-x86/xstate.h | 3 +- 2 files changed, 55 insertions(+), 31 deletions(-) diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c index e34eda3..ff67986 100644 --- a/xen/arch/x86/xstate.c +++ b/xen/arch/x86/xstate.c @@ -14,6 +14,7 @@ #include asm/xstate.h #include asm/asm_defns.h +#define XSTATE_COMPACTION_ENABLED (1ULL 63) static bool_t __read_mostly cpu_has_xsaveopt; static bool_t __read_mostly cpu_has_xsavec; bool_t __read_mostly cpu_has_xgetbv1; @@ -102,7 +103,9 @@ void xsave(struct vcpu *v, uint64_t mask) typeof(ptr-fpu_sse.fip.sel) fcs = ptr-fpu_sse.fip.sel; typeof(ptr-fpu_sse.fdp.sel) fds = ptr-fpu_sse.fdp.sel; -if ( cpu_has_xsaveopt ) +if ( cpu_has_xsaves ) +xsaves(lmask, hmask, ptr); +else if ( cpu_has_xsaveopt ) { /* * xsaveopt may not write the FPU portion even when the respective @@ -155,7 +158,9 @@ void xsave(struct vcpu *v, uint64_t mask) } else { -if ( cpu_has_xsaveopt ) +if ( cpu_has_xsaves ) +xsaves(lmask, hmask, ptr); +else if ( cpu_has_xsaveopt ) asm volatile ( .byte 0x0f,0xae,0x37 : =m (*ptr) : a (lmask), d (hmask), D (ptr) ); @@ -198,36 +203,54 @@ void xrstor(struct vcpu *v, uint64_t mask) switch ( __builtin_expect(ptr-fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) ) { default: -asm volatile ( 1: .byte 0x48,0x0f,0xae,0x2f\n - .section .fixup,\ax\ \n - 2: mov %5,%%ecx \n - xor %1,%1\n - rep stosb\n - lea %2,%0\n - mov %3,%1\n - jmp 1b \n - .previous \n - _ASM_EXTABLE(1b, 2b) - : +D (ptr), +a (lmask) - : m (*ptr), g (lmask), d (hmask), - m (xsave_cntxt_size) - : ecx ); +if ( cpu_has_xsaves ) +{ +if ( !(v-arch.xsave_area-xsave_hdr.xcomp_bv + XSTATE_COMPACTION_ENABLED) ) +v-arch.xsave_area-xsave_hdr.xcomp_bv = get_xcr0() | +XSTATE_COMPACTION_ENABLED; +xrstors(lmask, hmask, ptr); +} 
+else +asm volatile ( 1: .byte 0x48,0x0f,0xae,0x2f\n + .section .fixup,\ax\ \n + 2: mov %5,%%ecx \n + xor %1,%1\n + rep stosb\n + lea %2,%0\n + mov %3,%1\n + jmp 1b \n + .previous \n + _ASM_EXTABLE(1b, 2b) + : +D (ptr), +a (lmask) + : m (*ptr), g (lmask), d (hmask), + m (xsave_cntxt_size) + : ecx ); break; case 4: case 2: -asm volatile ( 1: .byte 0x0f,0xae,0x2f\n - .section .fixup,\ax\ \n - 2: mov %5,%%ecx\n - xor %1,%1 \n - rep stosb \n - lea %2,%0 \n - mov %3,%1 \n - jmp 1b \n - .previous \n - _ASM_EXTABLE(1b, 2b) - : +D (ptr), +a (lmask) - : m (*ptr), g (lmask), d (hmask), - m (xsave_cntxt_size) - : ecx ); +if ( cpu_has_xsaves ) +{ +if ( !(v-arch.xsave_area-xsave_hdr.xcomp_bv + XSTATE_COMPACTION_ENABLED) ) +v-arch.xsave_area-xsave_hdr.xcomp_bv = get_xcr0() | +XSTATE_COMPACTION_ENABLED; +xrstors(lmask, hmask, ptr); +} +else +asm volatile ( 1: .byte 0x48,0x0f,0xae,0x2f\n + .section .fixup,\ax\ \n + 2: mov %5,%%ecx \n + xor %1,%1\n + rep stosb\n +
[Xen-devel] [PATCH 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest
This patch enables xsaves for hvm guest, includes: 1.handle xsaves vmcs init and vmexit. 2.add logic to write/read the XSS msr. Signed-off-by: Shuai Ruan shuai.r...@intel.com --- xen/arch/x86/hvm/hvm.c | 40 ++ xen/arch/x86/hvm/vmx/vmcs.c| 7 ++- xen/arch/x86/hvm/vmx/vmx.c | 18 + xen/arch/x86/xstate.c | 4 ++-- xen/include/asm-x86/hvm/vmx/vmcs.h | 5 + xen/include/asm-x86/hvm/vmx/vmx.h | 2 ++ xen/include/asm-x86/xstate.h | 1 + 7 files changed, 74 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 535d622..2958e0d 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4269,6 +4269,10 @@ void hvm_hypervisor_cpuid_leaf(uint32_t sub_idx, } } +#define XSAVEOPT (1 0) +#define XSAVEC (1 1) +#define XGETBV (1 2) +#define XSAVES (1 3) void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx) { @@ -4355,6 +4359,34 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, *ebx = _eax + _ebx; } } +if ( count == 1 ) +{ +if ( cpu_has_xsaves ) +{ +*ebx = XSTATE_AREA_MIN_SIZE; +if ( v-arch.xcr0 | v-arch.msr_ia32_xss ) +for ( sub_leaf = 2; sub_leaf 63; sub_leaf++ ) +{ +if ( !((v-arch.xcr0 | v-arch.msr_ia32_xss) + (1ULL sub_leaf)) ) +continue; +domain_cpuid(d, input, sub_leaf, _eax, _ebx, _ecx, + _edx); +*ebx = *ebx + _eax; +} +} +else +{ +*eax = ~XSAVES; +if ( !cpu_has_xgetbv1 ) +*eax = ~XGETBV; +if ( !cpu_has_xsavec ) +*eax = ~XSAVEC; +if ( !cpu_has_xsaveopt ) +*eax = ~XSAVEOPT; +*ebx = *ecx = *edx = 0; +} +} break; case 0x8001: @@ -4454,6 +4486,10 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content) *msr_content = v-arch.hvm_vcpu.guest_efer; break; +case MSR_IA32_XSS: +*msr_content = v-arch.msr_ia32_xss; +break; + case MSR_IA32_TSC: *msr_content = _hvm_rdtsc_intercept(); break; @@ -4573,6 +4609,10 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content) return X86EMUL_EXCEPTION; break; +case MSR_IA32_XSS: 
+v-arch.msr_ia32_xss = msr_content; +break; + case MSR_IA32_TSC: hvm_set_guest_tsc(v, msr_content); break; diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index 4c5ceb5..8e61e3f 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -230,7 +230,8 @@ static int vmx_init_vmcs_config(void) SECONDARY_EXEC_ENABLE_EPT | SECONDARY_EXEC_ENABLE_RDTSCP | SECONDARY_EXEC_PAUSE_LOOP_EXITING | - SECONDARY_EXEC_ENABLE_INVPCID); + SECONDARY_EXEC_ENABLE_INVPCID | + SECONDARY_EXEC_XSAVES); rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap); if ( _vmx_misc_cap VMX_MISC_VMWRITE_ALL ) opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING; @@ -921,6 +922,7 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val) virtual_vmcs_exit(vvmcs); } +#define VMX_XSS_EXIT_BITMAP 0 static int construct_vmcs(struct vcpu *v) { struct domain *d = v-domain; @@ -1204,6 +1206,9 @@ static int construct_vmcs(struct vcpu *v) __vmwrite(GUEST_PAT, guest_pat); } +if ( cpu_has_vmx_xsaves ) +__vmwrite(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP); + vmx_vmcs_exit(v); /* PVH: paging mode is updated by arch_set_info_guest(). */ diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index fc29b89..7c950b3 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2683,6 +2683,16 @@ static int vmx_handle_apic_write(void) return vlapic_apicv_write(current, exit_qualification 0xfff); } +static void vmx_handle_xsaves(void) +{ +WARN(); +} + +static void vmx_handle_xrstors(void) +{ +WARN(); +} + void vmx_vmexit_handler(struct cpu_user_regs *regs) { unsigned long exit_qualification, exit_reason, idtv_info, intr_info = 0; @@ -3201,6 +3211,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) vmx_vcpu_flush_pml_buffer(v); break; +case EXIT_REASON_XSAVES: +vmx_handle_xsaves(); +break; + +case EXIT_REASON_XRSTORS: +
[Xen-devel] [PATCH 6/6] x86/xsaves: detect xsaves/xgetbv in xen
Now that xsaves/xrstors and xgetbv1 support is in place, switch on their detection. Signed-off-by: Shuai Ruan shuai.r...@intel.com --- xen/arch/x86/xstate.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c index c20f865..ebf9920 100644 --- a/xen/arch/x86/xstate.c +++ b/xen/arch/x86/xstate.c @@ -525,15 +525,15 @@ void xstate_init(bool_t bsp) { cpu_has_xsaveopt = !!(eax & XSTATE_FEATURE_XSAVEOPT); cpu_has_xsavec = !!(eax & XSTATE_FEATURE_XSAVEC); -/* XXX cpu_has_xgetbv1 = !!(eax & XSTATE_FEATURE_XGETBV1); */ -/* XXX cpu_has_xsaves = !!(eax & XSTATE_FEATURE_XSAVES); */ +cpu_has_xgetbv1 = !!(eax & XSTATE_FEATURE_XGETBV1); +cpu_has_xsaves = !!(eax & XSTATE_FEATURE_XSAVES); } else { BUG_ON(!cpu_has_xsaveopt != !(eax & XSTATE_FEATURE_XSAVEOPT)); BUG_ON(!cpu_has_xsavec != !(eax & XSTATE_FEATURE_XSAVEC)); -/* XXX BUG_ON(!cpu_has_xgetbv1 != !(eax & XSTATE_FEATURE_XGETBV1)); */ -/* XXX BUG_ON(!cpu_has_xsaves != !(eax & XSTATE_FEATURE_XSAVES)); */ +BUG_ON(!cpu_has_xgetbv1 != !(eax & XSTATE_FEATURE_XGETBV1)); +BUG_ON(!cpu_has_xsaves != !(eax & XSTATE_FEATURE_XSAVES)); } if ( cpu_has_xsaves ) -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 1/6] x86/xsaves: enable xsaves/xrstors for pv guest
This patch emualtes xsaves/xrstors instruction and XSS msr access. As xsaves/xrstors instructions and XSS msr access required be executed only in ring0. So emulation is needed when pv guest use these instructions. Signed-off-by: Shuai Ruan shuai.r...@intel.com --- xen/arch/x86/domain.c | 3 ++ xen/arch/x86/traps.c| 85 + xen/arch/x86/x86_64/mm.c| 52 + xen/arch/x86/xstate.c | 39 +++ xen/include/asm-x86/domain.h| 1 + xen/include/asm-x86/mm.h| 1 + xen/include/asm-x86/msr-index.h | 2 + xen/include/asm-x86/xstate.h| 3 ++ 8 files changed, 186 insertions(+) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index a8fe046..66f8231 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -426,6 +426,7 @@ int vcpu_initialise(struct vcpu *v) /* By default, do not emulate */ v-arch.vm_event.emulate_flags = 0; +v-arch.msr_ia32_xss = 0; rc = mapcache_vcpu_init(v); if ( rc ) @@ -1494,6 +1495,8 @@ static void __context_switch(void) if ( xcr0 != get_xcr0() !set_xcr0(xcr0) ) BUG(); } +if ( cpu_has_xsaves ) +wrmsr_safe(MSR_IA32_XSS, n-arch.msr_ia32_xss); vcpu_restore_fpu_eager(n); n-arch.ctxt_switch_to(n); } diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index ac62f20..227670b 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -2346,6 +2346,80 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) } break; +case 0xc7: +{ +void *xsave_addr; +int not_page_aligned = 0; +u32 guest_xsaves_size = xstate_ctxt_size_compact(v-arch.xcr0); + +switch ( insn_fetch(u8, code_base, eip, code_limit) ) +{ +case 0x2f:/* XSAVES */ +{ +if ( (regs-edi ~PAGE_MASK) + guest_xsaves_size PAGE_SIZE ) +{ +mfn_t mfn_list[2]; +void *va; + +not_page_aligned = 1; +mfn_list[0] = _mfn(do_page_walk_mfn(v, regs-edi)); +mfn_list[1] = _mfn(do_page_walk_mfn(v, + PAGE_ALIGN(regs-edi))); +va = __vmap(mfn_list, 1, 2, PAGE_SIZE, PAGE_HYPERVISOR); +ASSERT(((unsigned long) va ~PAGE_MASK) == 0); +xsave_addr = (void *)((unsigned long)va + + (regs-edi ~PAGE_MASK)); +} +else +xsave_addr 
= do_page_walk(v, regs-edi); + +if ( !xsave_addr ) +goto fail; + +xsaves(regs-eax, regs-edx, xsave_addr); + +if ( not_page_aligned ) +vunmap((void *)((unsigned long)xsave_addr PAGE_MASK)); +else +unmap_domain_page(xsave_addr); +break; +} +case 0x1f:/* XRSTORS */ +{ +if( (regs-edi ~PAGE_MASK) + guest_xsaves_size PAGE_SIZE ) +{ +mfn_t mfn_list[2]; +void *va; + +not_page_aligned = 1; +mfn_list[0] = _mfn(do_page_walk_mfn(v, regs-edi)); +mfn_list[1] = _mfn(do_page_walk_mfn(v, + PAGE_ALIGN(regs-edi))); +va = __vmap(mfn_list, 1, 2, PAGE_SIZE, PAGE_HYPERVISOR); +ASSERT(((unsigned long) va ~PAGE_MASK) == 0); +xsave_addr = (void *)((unsigned long)va + + (regs-edi ~PAGE_MASK)); +} +else +xsave_addr = do_page_walk(v, regs-edi); + +if ( !xsave_addr ) +goto fail; + +xrstors(regs-eax, regs-edx, xsave_addr); + +if ( not_page_aligned ) +vunmap((void *)((unsigned long)xsave_addr PAGE_MASK)); +else +unmap_domain_page(xsave_addr); +break; +} +default: +goto fail; +} +break; +} + case 0x06: /* CLTS */ (void)do_fpu_taskswitch(0); break; @@ -2638,6 +2712,12 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) wrmsrl(regs-_ecx, msr_content); break; +case MSR_IA32_XSS: +if ( wrmsr_safe(regs-ecx, msr_content) != 0 ) +goto fail; +v-arch.msr_ia32_xss = msr_content; +break; + default: if ( wrmsr_hypervisor_regs(regs-ecx, msr_content) == 1 ) break; @@ -2740,6 +2820,11 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
[Xen-devel] [PATCH 4/6] libxc: expose xsaves/xgetbv/xsavec to hvm guest
This patch exposes xsaves/xgetbv/xsavec to hvm guest. Signed-off-by: Shuai Ruan shuai.r...@intel.com --- tools/libxc/xc_cpuid_x86.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c index c97f91a..0ed8b68 100644 --- a/tools/libxc/xc_cpuid_x86.c +++ b/tools/libxc/xc_cpuid_x86.c @@ -211,6 +211,9 @@ static void intel_xc_cpuid_policy( } #define XSAVEOPT (1 << 0) +#define XSAVEC (1 << 1) +#define XGETBV (1 << 2) +#define XSAVES (1 << 3) /* Configure extended state enumeration leaves (0x0000000D for xsave) */ static void xc_cpuid_config_xsave( xc_interface *xch, domid_t domid, uint64_t xfeature_mask, @@ -247,8 +250,7 @@ static void xc_cpuid_config_xsave( regs[1] = 512 + 64; /* FP/SSE + XSAVE.HEADER */ break; case 1: /* leaf 1 */ -regs[0] = XSAVEOPT; -regs[1] = regs[2] = regs[3] = 0; +regs[0] = (XSAVEOPT | XSAVEC | XGETBV | XSAVES); break; case 2 ... 63: /* sub-leaves */ if ( !(xfeature_mask & (1ULL << input[1])) ) { regs[0] = regs[1] = regs[2] = regs[3] = 0; break; } -/* Don't touch EAX, EBX. Also cleanup ECX and EDX */ -regs[2] = regs[3] = 0; break; } } -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 0/6] add xsaves/xrstors support
This patchset enables the xsaves/xrstors feature. It includes three parts: 1. add xsaves/xrstors for xen. 2. add xsaves/xrstors for pv guests. 3. add xsaves/xrstors for hvm guests. Shuai Ruan (6): x86/xsaves: enable xsaves/xrstors for pv guest x86/xsaves: enable xsaves/xrstors in xen x86/xsaves: enable xsaves/xrstors for hvm guest libxc: expose xsaves/xgetbv/xsavec to hvm guest x86/xsaves: support compact format for hvm save/restore x86/xsaves: detect xsaves/xgetbv in xen tools/libxc/xc_cpuid_x86.c | 8 +- xen/arch/x86/domain.c | 3 + xen/arch/x86/hvm/hvm.c | 56 +++- xen/arch/x86/hvm/vmx/vmcs.c| 7 +- xen/arch/x86/hvm/vmx/vmx.c | 18 +++ xen/arch/x86/traps.c | 85 xen/arch/x86/x86_64/mm.c | 52 +++ xen/arch/x86/xstate.c | 271 - xen/include/asm-x86/domain.h | 1 + xen/include/asm-x86/hvm/vmx/vmcs.h | 5 + xen/include/asm-x86/hvm/vmx/vmx.h | 2 + xen/include/asm-x86/mm.h | 1 + xen/include/asm-x86/msr-index.h| 2 + xen/include/asm-x86/xstate.h | 13 +- 14 files changed, 477 insertions(+), 47 deletions(-) -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 05/16] x86/hvm: unify internal portio and mmio intercepts
-Original Message- From: Roger Pau Monné [mailto:roger@citrix.com] Sent: 02 July 2015 15:52 To: Paul Durrant; xen-de...@lists.xenproject.org Cc: Andrew Cooper; Keir (Xen.org); Jan Beulich Subject: Re: [Xen-devel] [PATCH v5 05/16] x86/hvm: unify internal portio and mmio intercepts El 30/06/15 a les 15.05, Paul Durrant ha escrit: [...] +void msixtbl_init(struct domain *d) +{ +register_mmio_handler(d, msixtbl_mmio_ops); Since you are adding an initialization function to vmsi I think msixtbl_list and msixtbl_list_lock should also be initialized here instead of hvm_domain_initialise. Yes, that sounds quite reasonable. Perhaps as a follow-up patch though, unless I need to post v6 of the series for some other reason. Paul Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 05/16] x86/hvm: unify internal portio and mmio intercepts
El 02/07/15 a les 17.02, Paul Durrant ha escrit: -Original Message- From: Roger Pau Monné [mailto:roger@citrix.com] Sent: 02 July 2015 15:52 To: Paul Durrant; xen-de...@lists.xenproject.org Cc: Andrew Cooper; Keir (Xen.org); Jan Beulich Subject: Re: [Xen-devel] [PATCH v5 05/16] x86/hvm: unify internal portio and mmio intercepts El 30/06/15 a les 15.05, Paul Durrant ha escrit: [...] +void msixtbl_init(struct domain *d) +{ +register_mmio_handler(d, msixtbl_mmio_ops); Since you are adding an initialization function to vmsi I think msixtbl_list and msixtbl_list_lock should also be initialized here instead of hvm_domain_initialise. Yes, that sounds quite reasonable. Perhaps as a follow-up patch though, unless I need to post v6 of the series for some other reason. That's fine, if not I can also add it as a pre-patch to my HVMlite stuff, it's trivial. Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 05/16] x86/hvm: unify internal portio and mmio intercepts
-Original Message- From: Roger Pau Monné [mailto:roger@citrix.com] Sent: 02 July 2015 16:12 To: Paul Durrant; xen-de...@lists.xenproject.org Cc: Andrew Cooper; Keir (Xen.org); Jan Beulich Subject: Re: [Xen-devel] [PATCH v5 05/16] x86/hvm: unify internal portio and mmio intercepts El 02/07/15 a les 17.02, Paul Durrant ha escrit: -Original Message- From: Roger Pau Monné [mailto:roger@citrix.com] Sent: 02 July 2015 15:52 To: Paul Durrant; xen-de...@lists.xenproject.org Cc: Andrew Cooper; Keir (Xen.org); Jan Beulich Subject: Re: [Xen-devel] [PATCH v5 05/16] x86/hvm: unify internal portio and mmio intercepts El 30/06/15 a les 15.05, Paul Durrant ha escrit: [...] +void msixtbl_init(struct domain *d) +{ +register_mmio_handler(d, msixtbl_mmio_ops); Since you are adding an initialization function to vmsi I think msixtbl_list and msixtbl_list_lock should also be initialized here instead of hvm_domain_initialise. Yes, that sounds quite reasonable. Perhaps as a follow-up patch though, unless I need to post v6 of the series for some other reason. That's fine, if not I can also add it as a pre-patch to my HVMlite stuff, it's trivial. Sure. Thanks, Paul Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 2/6] [WIP] libxl: xsrestrict QEMU
Check whether QEMU supports the xsrestrict option, by parsing its --help output. Store the result on xenstore for future reference on a per QEMU binary basis, so that device_model_override still works fine with it. Replace / with _ in the QEMU binary path before writing it to xenstore, so that it doesn't get confused with xenstore paths. If QEMU supports xsrestrict and emulator_id, pass xsrestrict=on to it. Statically reserve two emulator_ids, one for device models and another for pv qemus. Use the emulator_ids appropriately. WIP: direct use of fork is forbidden in libxl Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com --- Changes in v4: - update xenstore-paths.markdown Changes in v3: - add emulator_ids - mark as WIP --- docs/misc/xenstore-paths.markdown |8 + tools/libxl/libxl_dm.c| 72 + tools/libxl/libxl_internal.h |7 tools/libxl/libxl_utils.c | 10 ++ 4 files changed, 97 insertions(+) diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown index d94ea9d..780f601 100644 --- a/docs/misc/xenstore-paths.markdown +++ b/docs/misc/xenstore-paths.markdown @@ -397,6 +397,14 @@ The device model version for a domain. ifb device used by Remus to buffer network output from the associated vif. + ~/libxl/$DEVICE_MODEL_BINARY/* [n,INTERNAL] + +Contains a list of options supported by the device model, in the form: +$OPTION = (1|0). +$DEVICE_MODEL_BINARY is the full path to the device model binary with +'/' replaced by '_'. So for example /usr/lib/xen/bin/qemu-system-i386 +would be /libxl/_usr_lib_xen_bin_qemu-system-i386. 
+ [BLKIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html [FBIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,fbif.h.html [HVMPARAMS]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,hvm,params.h.html diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 7038e5c..d5f230a 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -447,6 +447,65 @@ retry: return 0; } +int libxl__check_qemu_supported(libxl__gc *gc, const char *dm, char *opt) +{ +libxl_ctx *ctx = libxl__gc_owner(gc); +pid_t pid; +int pipefd[2], status; +FILE *fp; +char *buf; +ssize_t buf_size = 512; +int ret = 0; +char *s; + +s = libxl__strdup(gc, dm); +libxl__replace_chr(gc, s, '/', '_'); +s = libxl__sprintf(gc, libxl/%s/%s, s, opt); +buf = libxl__xs_read(gc, XBT_NULL, s); +if (buf != NULL) +return !strcmp(buf, 1); + +if (access(dm, X_OK) 0) { +LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, + device model %s is not executable, dm); +return ERROR_FAIL; +} + +if (libxl_pipe(ctx, pipefd) 0) +return ERROR_FAIL; + +pid = fork(); +if (pid 0) +return ERROR_FAIL; + +/* child spawn QEMU */ +if (!pid) { +char *args[] = {(char*)dm, --help, NULL}; +close(pipefd[0]); +libxl__exec(gc, -1, pipefd[1], pipefd[1], dm, args, NULL); +exit(1); +} + +/* parent parses the output */ +close(pipefd[1]); +fp = fdopen(pipefd[0], r); +buf = libxl__malloc(gc, buf_size); +while (fgets(buf, buf_size, fp) != NULL) { +if (strstr(buf, opt) != NULL) { +ret = 1; +goto out; +} +} +out: +close(pipefd[0]); +waitpid(pid, status, pid); +libxl_report_child_exitstatus(ctx, XTL_WARN, dm, pid, status); + +ret = libxl__xs_write(gc, XBT_NULL, s, %d, ret); + +return ret; +} + static char ** libxl__build_device_model_args_new(libxl__gc *gc, const char *dm, int guest_domid, const libxl_domain_config *guest_config, @@ -934,6 +993,14 @@ end_search: if (user) { flexarray_append(dm_args, -runas); flexarray_append(dm_args, user); +if (libxl__check_qemu_supported(gc, dm, 
xsrestrict) +libxl__check_qemu_supported(gc, dm, emulator_id)) { +flexarray_append(dm_args, -xenopts); +flexarray_append(dm_args, +GCSPRINTF(xsrestrict=on,emulator_id=%u, +(b_info-type == LIBXL_DOMAIN_TYPE_PV) ? +QEMU_XEN_PV_ID : QEMU_XEN_DEVICE_MODEL_ID)); +} } } flexarray_append(dm_args, NULL); @@ -1660,6 +1727,11 @@ void libxl__spawn_qdisk_backend(libxl__egc *egc, libxl__dm_spawn_state *dmss) flexarray_vappend(dm_args, -monitor, /dev/null, NULL); flexarray_vappend(dm_args, -serial, /dev/null, NULL); flexarray_vappend(dm_args, -parallel, /dev/null, NULL); +if (libxl__check_qemu_supported(gc, dm, emulator_id)) { +flexarray_append(dm_args, -xenopts); +flexarray_append(dm_args, +
[Xen-devel] [PATCH v4 5/6] libxl: change qdisk-backend-pid path on xenstore
Change the qdisk-backend-pid path on xenstore from libxl/$DOMID/qdisk-backend-pid to /local/domain/$DOMID/image/pvqemu-pid to be more similar to the device-model path. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com --- Changes in v4: - update xenstore-paths.markdown --- docs/misc/xenstore-paths.markdown |9 + tools/libxl/libxl_dm.c|4 ++-- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown index e6ed25f..7f87f74 100644 --- a/docs/misc/xenstore-paths.markdown +++ b/docs/misc/xenstore-paths.markdown @@ -115,6 +115,11 @@ The domain's own ID. The process ID of the device model associated with this domain, if it has one. + ~/image/pvqemu-pid = INTEGER [INTERNAL] + +The process ID of the userspace PV backends daemon associated with this +domain, if it has one. + ~/device-model = STRING [n,INTERNAL] The full device model binary path. @@ -360,10 +365,6 @@ A domain writable path. Available for arbitrary domain use. Contains the status of the device models running on the domain. - ~/libxl/$DOMID/qdisk-backend-pid [w] - -Contains the PIDs of the device models running on the domain. - ## Virtual Machine Paths The /vm/$UUID namespace is used by toolstacks to store various diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 90f2c97..3a4625a 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -1774,7 +1774,7 @@ void libxl__spawn_qdisk_backend(libxl__egc *egc, libxl__dm_spawn_state *dmss) * because we will call this from unprivileged driver domains, * so save it in the current domain libxl private dir. */ -dmss->spawn.pidpath = GCSPRINTF("libxl/%u/qdisk-backend-pid", domid); +dmss->spawn.pidpath = GCSPRINTF("/local/domain/%d/image/pvqemu-pid", domid); dmss->spawn.midproc_cb = libxl__spawn_record_pid; dmss->spawn.confirm_cb = device_model_confirm; dmss->spawn.failure_cb = device_model_startup_failed; @@ -1834,7 +1834,7 @@ int libxl__destroy_qdisk_backend(libxl__gc *gc, uint32_t domid) char *pid_path; int rc; -pid_path = GCSPRINTF("libxl/%u/qdisk-backend-pid", domid); +pid_path = GCSPRINTF("/local/domain/%d/image/pvqemu-pid", domid); rc = kill_device_model(gc, pid_path); if (rc) -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 4/6] libxl: change xs path for QEMU
Change the QEMU xenstore watch path to /local/domain/$LIBXL_TOOLSTACK_DOMID/device-model/$DOMID/$EMULATOR_ID. Currently two emulator_ids are statically allocated: one for device models and one for pv qemus. Add a parameter to libxl__device_model_xs_path to distinguish the device model from the pv backends provider. Store the device model binary path under /local/domain/$DOMID/device-model on xenstore, so that we can fetch it later and retrieve the list of supported options from /local/domain/$LIBXL_TOOLSTACK_DOMID/libxl/$device_model_binary, introduce in the previous path. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com --- Changes in v4: - update xenstore-paths.markdown Changes in v3: - use LIBXL_TOOLSTACK_DOMID instead of 0 in the commit message - change xs path to include the emulator_id --- docs/misc/xenstore-paths.markdown | 16 +++- tools/libxl/libxl.c |2 +- tools/libxl/libxl_create.c|3 +++ tools/libxl/libxl_device.c|2 +- tools/libxl/libxl_dm.c| 18 ++ tools/libxl/libxl_dom.c | 12 ++-- tools/libxl/libxl_internal.c | 19 +++ tools/libxl/libxl_internal.h |5 ++--- tools/libxl/libxl_pci.c | 14 +++--- 9 files changed, 56 insertions(+), 35 deletions(-) diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown index 99e038b..e6ed25f 100644 --- a/docs/misc/xenstore-paths.markdown +++ b/docs/misc/xenstore-paths.markdown @@ -115,6 +115,10 @@ The domain's own ID. The process ID of the device model associated with this domain, if it has one. + ~/device-model = STRING [n,INTERNAL] + +The full device model binary path. + ~/cpu/[0-9]+/availability = (online|offline) [PV] One node for each virtual CPU up to the guest's configured @@ -306,13 +310,15 @@ A virtual network device backend. Described by A PV console backend. Described in [console.txt](console.txt) - ~/device-model/$DOMID/* [w] + ~/device-model/$DOMID/$EMULATOR_ID/* [w] Information relating to device models running in the domain. 
$DOMID is -the target domain of the device model. When the device model is running -at the same privilege level of the guest domain, this path does not -contain any sensitive information and becomes guest writeable. Otherwise -it is INTERNAL. +the target domain of the device model. $EMULATOR_ID indentifies a +specific device model instance (multiple device model instances for the +same domain are possible). When the device model is running at the same +privilege level of the guest domain, this path does not contain any +sensitive information and becomes guest writeable. Otherwise +it is INTERNAL ~/libxl/disable_udev = (1|0) [] diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 516713e..bca4c88 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -1035,7 +1035,7 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid) if (type == LIBXL_DOMAIN_TYPE_HVM) { uint32_t dm_domid = libxl_get_stubdom_id(ctx, domid); -path = libxl__device_model_xs_path(gc, dm_domid, domid, /state); +path = libxl__device_model_xs_path(gc, false, dm_domid, domid, /state); state = libxl__xs_read(gc, XBT_NULL, path); if (state != NULL !strcmp(state, paused)) { libxl__qemu_traditional_cmd(gc, domid, continue); diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index a74b340..df946e2 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -1002,6 +1002,9 @@ static void domcreate_bootloader_done(libxl__egc *egc, return; } +libxl__xs_write(gc, XBT_NULL, GCSPRINTF(/local/domain/%u/device-model, domid), +%s, libxl__domain_device_model(gc, info)); + /* consume bootloader outputs. state-pv_{kernel,ramdisk} have * been initialised by the bootloader already. 
*/ diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c index 93bb41e..aadcd08 100644 --- a/tools/libxl/libxl_device.c +++ b/tools/libxl/libxl_device.c @@ -1190,7 +1190,7 @@ int libxl__wait_for_device_model_deprecated(libxl__gc *gc, char *path; uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid); -path = libxl__device_model_xs_path(gc, dm_domid, domid, /state); +path = libxl__device_model_xs_path(gc, false, dm_domid, domid, /state); return libxl__xenstore_child_wait_deprecated(gc, domid, LIBXL_DEVICE_MODEL_START_TIMEOUT, Device Model, path, state, spawning, diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index ecae536..90f2c97 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -1274,9 +1274,9 @@ void libxl__spawn_stub_dm(libxl__egc *egc, libxl__stub_dm_spawn_state *sdss) retry_transaction: t = xs_transaction_start(ctx-xsh);
Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access
At 12:32 +0100 on 02 Jul (1435840328), George Dunlap wrote: On 07/02/2015 12:25 PM, Tim Deegan wrote: At 12:09 +0100 on 02 Jul (1435838956), Andrew Cooper wrote: On 02/07/15 11:48, George Dunlap wrote: Now in p2m_set_mem_access(), rather than just using an unsigned long in the loop iterating over gfns, you do this thing where you convert gfn_t to unsigned long, add one, and then convert it back to gfn_t again. I can't see any comments in v3 that suggest you do that, and it seems a bit clunky. Is that really necessary? Wouldn't it be better to declare a local variable? I'm not strongly opinionated on this one, it just seems a bit strange. Everything else looks good, thanks. Looping over {g,m,p}fn_t's is indeed awkward, as the compiler tricks for typesafety don't allow simply adding 1 to a typesafe variable. In cases like this, I think it is acceptable to keep an unsigned long shadow variable and manipulate it as a plain integer. The eventual _gfn() required to pass it further down the call chain will help to visually reinforce the appropriate type. After all, the entire point of these typesafes is to try to avoid accidentally mixing up the different address spaces, but a function which takes a typesafe, loops over a subset and passes the same typesafe further down can probably be trusted to DTRT, catching errors at review time. Off the top of my head, the only functions which would normally expect to mix and match the typesafes are the pagetable-walking ones. It should be easy enough to extend the macros to define a gfn_inc(gfn_t) operator for this kind of thing. I was thinking that -- although in this case you'd still need to unpack it to do the loop exit conditional. To really make things pretty you'd want a for_gfn_range() macro or something like that that takes a start gfn and a number. But that's really starting to be feature creep for this patch, which is why I didn't want to suggest it on v4. :-) Oh, quite. :) Tim.
___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 0/6] libxl: xs_restrict QEMU
Hi all, this patch series changes libxl to start QEMU as device model with the new xsrestrict option (http://marc.info/?l=xen-devel&m=143341692707358). It also starts a second QEMU to provide PV backends in userspace (qdisk) to HVM guests. Changes in v4: - update xenstore-paths.markdown - add error message in case count > MAX_PHYSMAP_ENTRIES - add a note to xenstore-paths.markdown about the possible change in privilege level - only change permissions if xsrestrict is supported Changes in v3: - use LIBXL_TOOLSTACK_DOMID instead of 0 in the commit message - update commit message with more info on why it is safe - add a limit on the number of physmap entries to save and restore - add emulator_ids - mark patch #3 as WIP - use LIBXL_TOOLSTACK_DOMID instead of 0 in the commit message - change xs path to include the emulator_id - change qdisk-backend-pid path on xenstore - use dcs->dmss.pvqemu to spawn the second QEMU - keep track of the rc of both QEMUs before proceeding Stefano Stabellini (6): libxl: do not add a vkb backend to hvm guests [WIP] libxl: xsrestrict QEMU libxl: allow /local/domain/$LIBXL_TOOLSTACK_DOMID/device-model/$DOMID to be written by $DOMID libxl: change xs path for QEMU libxl: change qdisk-backend-pid path on xenstore libxl: spawns two QEMUs for HVM guests docs/misc/xenstore-paths.markdown | 30 -- tools/libxl/libxl.c |2 +- tools/libxl/libxl_create.c| 58 +-- tools/libxl/libxl_device.c|2 +- tools/libxl/libxl_dm.c| 115 + tools/libxl/libxl_dom.c | 19 -- tools/libxl/libxl_internal.c | 19 -- tools/libxl/libxl_internal.h | 15 - tools/libxl/libxl_pci.c | 14 ++--- tools/libxl/libxl_utils.c | 10 10 files changed, 225 insertions(+), 59 deletions(-) Cheers, Stefano
[Xen-devel] [PATCH v4 3/6] libxl: allow /local/domain/$LIBXL_TOOLSTACK_DOMID/device-model/$DOMID to be written by $DOMID
The device model is going to restrict its xenstore connection to $DOMID level, using XS_RESTRICT, only implemented by oxenstored at present. Let qemu-xen access /local/domain/$LIBXL_TOOLSTACK_DOMID/device-model/$DOMID, as it is required by QEMU to read/write the physmap. It doesn't contain any information the guest is not already fully aware of. Add a maximum limit of physmap entries to save, so that the guest cannot DoS the toolstack. Information under /local/domain/$LIBXL_TOOLSTACK_DOMID/device-model/$DOMID includes: - the device model status, which is updated by the device model before the guest is unpaused, then ignored by both QEMU and libxl - the guest physmap, which contains the memory layout of the guest. The guest is already in possession of this information. A malicious guest could modify the physmap entries, causing QEMU to read wrong information at restore time. QEMU would populate its XenPhysmap list with wrong entries that wouldn't match its own device emulation addresses, leading to a failure in restoring the domain. But it could not compromise the security of the system, because the addresses are only used for checks against QEMU's addresses. In the case of qemu-traditional, the state node is used to issue commands to the device model, so avoid changing permissions in that case. 
Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com --- Changes in v4: - add error message in case count MAX_PHYSMAP_ENTRIES - add a note to xenstore-paths.markdown about the possible change in privilege level - only change permissions if xsrestrict is supported Changes in v3: - use LIBXL_TOOLSTACK_DOMID instead of 0 in the commit message - update commit message with more info on why it is safe - add a limit on the number of physmap entries to save and restore Changes in v2: - fix permissions to actually do what intended - use LIBXL_TOOLSTACK_DOMID instead of 0 --- docs/misc/xenstore-paths.markdown |7 +-- tools/libxl/libxl_dm.c| 12 +++- tools/libxl/libxl_dom.c |7 +++ 3 files changed, 23 insertions(+), 3 deletions(-) diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown index 780f601..99e038b 100644 --- a/docs/misc/xenstore-paths.markdown +++ b/docs/misc/xenstore-paths.markdown @@ -306,10 +306,13 @@ A virtual network device backend. Described by A PV console backend. Described in [console.txt](console.txt) - ~/device-model/$DOMID/* [INTERNAL] + ~/device-model/$DOMID/* [w] Information relating to device models running in the domain. $DOMID is -the target domain of the device model. +the target domain of the device model. When the device model is running +at the same privilege level of the guest domain, this path does not +contain any sensitive information and becomes guest writeable. Otherwise +it is INTERNAL. 
~/libxl/disable_udev = (1|0) [] diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index d5f230a..ecae536 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -1531,6 +1531,7 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss) char **pass_stuff; const char *dm; int dm_state_fd = -1; +struct xs_permissions rwperm[2]; if (libxl_defbool_val(b_info-device_model_stubdomain)) { abort(); @@ -1573,7 +1574,16 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss) } path = libxl__device_model_xs_path(gc, LIBXL_TOOLSTACK_DOMID, domid, ); -xs_mkdir(ctx-xsh, XBT_NULL, path); +if (b_info-device_model_version == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN +libxl__check_qemu_supported(gc, dm, xsrestrict)) { +rwperm[0].id = LIBXL_TOOLSTACK_DOMID; +rwperm[0].perms = XS_PERM_NONE; +rwperm[1].id = domid; +rwperm[1].perms = XS_PERM_WRITE; +libxl__xs_mkdir(gc, XBT_NULL, path, rwperm, 2); +} else { +xs_mkdir(ctx-xsh, XBT_NULL, path); +} if (b_info-type == LIBXL_DOMAIN_TYPE_HVM b_info-device_model_version diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index f408646..c8523da 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -1677,6 +1677,8 @@ static inline char *physmap_path(libxl__gc *gc, uint32_t dm_domid, phys_offset, node); } +#define MAX_PHYSMAP_ENTRIES 12 + int libxl__toolstack_save(uint32_t domid, uint8_t **buf, uint32_t *len, void *dss_void) { @@ -1698,6 +1700,11 @@ int libxl__toolstack_save(uint32_t domid, uint8_t **buf, num); count = num; +if (count MAX_PHYSMAP_ENTRIES) { +LOG(ERROR, Max physmap entries reached); +return -1; +} + *len = sizeof(version) + sizeof(count); *buf = calloc(1, *len); ptr = *buf; -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH v3 08/18] xen/arm: vITS: Add virtual ITS driver
On Thu, 2015-07-02 at 19:03 +0530, Vijay Kilari wrote: Hi Julien, On Tue, Jun 23, 2015 at 10:09 PM, Julien Grall julien.gr...@citrix.com wrote: Hi Vijay, +struct vits_device { +uint32_t vdevid; +uint32_t pdevid; +struct its_device *its_dev; +struct rb_node node; +}; We spoke about a specific structure in the design [2] but you introduced a new one. Why? Section 6 of DraftG specifies managing a separate tree for device assignment. This helps to maintain a per-domain RB-tree holding the list of devices assigned to the domain, indexed by vdevid. This helps to check whether a device is assigned to this domain before processing any ITS command with that vdevid. Having everything in the its_device would help to catch a device attached to 2 different domains... One option is to introduce a new variable inside its_device to know to which domain the device is currently assigned. IIRC that's what I intended, e.g. two trees referencing the same underlying data structure. Sorry that wasn't clear. Also, the field pdevid is not vits specific but its. pdevid can be removed as the its_device structure already has it. Regards Vijay
[Xen-devel] [PATCH v5 4/6] xen/PMU: Describe vendor-specific PMU registers
AMD and Intel PMU register initialization and helpers that determine whether a register belongs to PMU. This and some of subsequent PMU emulation code is somewhat similar to Xen's PMU implementation. Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com Reviewed-by: David Vrabel david.vra...@citrix.com --- arch/x86/xen/pmu.c | 153 - 1 file changed, 152 insertions(+), 1 deletion(-) diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c index ba7687c..1fc7e10 100644 --- a/arch/x86/xen/pmu.c +++ b/arch/x86/xen/pmu.c @@ -18,6 +18,155 @@ DEFINE_PER_CPU(struct xen_pmu_data *, xenpmu_shared); #define get_xenpmu_data()per_cpu(xenpmu_shared, smp_processor_id()) + +/* AMD PMU */ +#define F15H_NUM_COUNTERS 6 +#define F10H_NUM_COUNTERS 4 + +static __read_mostly uint32_t amd_counters_base; +static __read_mostly uint32_t amd_ctrls_base; +static __read_mostly int amd_msr_step; +static __read_mostly int k7_counters_mirrored; +static __read_mostly int amd_num_counters; + +/* Intel PMU */ +#define MSR_TYPE_COUNTER0 +#define MSR_TYPE_CTRL 1 +#define MSR_TYPE_GLOBAL 2 +#define MSR_TYPE_ARCH_COUNTER 3 +#define MSR_TYPE_ARCH_CTRL 4 + +/* Number of general pmu registers (CPUID.EAX[0xa].EAX[8..15]) */ +#define PMU_GENERAL_NR_SHIFT8 +#define PMU_GENERAL_NR_BITS 8 +#define PMU_GENERAL_NR_MASK (((1 PMU_GENERAL_NR_BITS) - 1) \ + PMU_GENERAL_NR_SHIFT) + +/* Number of fixed pmu registers (CPUID.EDX[0xa].EDX[0..4]) */ +#define PMU_FIXED_NR_SHIFT 0 +#define PMU_FIXED_NR_BITS 5 +#define PMU_FIXED_NR_MASK (((1 PMU_FIXED_NR_BITS) - 1) \ + PMU_FIXED_NR_SHIFT) + +/* Alias registers (0x4c1) for full-width writes to PMCs */ +#define MSR_PMC_ALIAS_MASK (~(MSR_IA32_PERFCTR0 ^ MSR_IA32_PMC0)) + +static __read_mostly int intel_num_arch_counters, intel_num_fixed_counters; + + +static void xen_pmu_arch_init(void) +{ + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) { + + switch (boot_cpu_data.x86) { + case 0x15: + amd_num_counters = F15H_NUM_COUNTERS; + amd_counters_base = MSR_F15H_PERF_CTR; + 
amd_ctrls_base = MSR_F15H_PERF_CTL; + amd_msr_step = 2; + k7_counters_mirrored = 1; + break; + case 0x10: + case 0x12: + case 0x14: + case 0x16: + default: + amd_num_counters = F10H_NUM_COUNTERS; + amd_counters_base = MSR_K7_PERFCTR0; + amd_ctrls_base = MSR_K7_EVNTSEL0; + amd_msr_step = 1; + k7_counters_mirrored = 0; + break; + } + } else { + uint32_t eax, ebx, ecx, edx; + + cpuid(0xa, eax, ebx, ecx, edx); + + intel_num_arch_counters = (eax PMU_GENERAL_NR_MASK) + PMU_GENERAL_NR_SHIFT; + intel_num_fixed_counters = (edx PMU_FIXED_NR_MASK) + PMU_FIXED_NR_SHIFT; + } +} + +static inline uint32_t get_fam15h_addr(u32 addr) +{ + switch (addr) { + case MSR_K7_PERFCTR0: + case MSR_K7_PERFCTR1: + case MSR_K7_PERFCTR2: + case MSR_K7_PERFCTR3: + return MSR_F15H_PERF_CTR + (addr - MSR_K7_PERFCTR0); + case MSR_K7_EVNTSEL0: + case MSR_K7_EVNTSEL1: + case MSR_K7_EVNTSEL2: + case MSR_K7_EVNTSEL3: + return MSR_F15H_PERF_CTL + (addr - MSR_K7_EVNTSEL0); + default: + break; + } + + return addr; +} + +static inline bool is_amd_pmu_msr(unsigned int msr) +{ + if ((msr = MSR_F15H_PERF_CTL +msr MSR_F15H_PERF_CTR + (amd_num_counters * 2)) || + (msr = MSR_K7_EVNTSEL0 +msr MSR_K7_PERFCTR0 + amd_num_counters)) + return true; + + return false; +} + +static int is_intel_pmu_msr(u32 msr_index, int *type, int *index) +{ + u32 msr_index_pmc; + + switch (msr_index) { + case MSR_CORE_PERF_FIXED_CTR_CTRL: + case MSR_IA32_DS_AREA: + case MSR_IA32_PEBS_ENABLE: + *type = MSR_TYPE_CTRL; + return true; + + case MSR_CORE_PERF_GLOBAL_CTRL: + case MSR_CORE_PERF_GLOBAL_STATUS: + case MSR_CORE_PERF_GLOBAL_OVF_CTRL: + *type = MSR_TYPE_GLOBAL; + return true; + + default: + + if ((msr_index = MSR_CORE_PERF_FIXED_CTR0) + (msr_index MSR_CORE_PERF_FIXED_CTR0 + +intel_num_fixed_counters)) { + *index = msr_index - MSR_CORE_PERF_FIXED_CTR0; + *type = MSR_TYPE_COUNTER; +
[Xen-devel] [PATCH v4 6/6] libxl: spawns two QEMUs for HVM guests
Starts a second QEMU to provide PV backends in userspace to HVM guests. Use both dcs-dmss.pvqemu and dcs-dmss.dm to keep track of the starting QEMUs. Introduce two new fields to struct libxl__dm_spawn_state: dcs to store the pointer to libxl__domain_create_state, and rc to store the return code. Only proceed when both QEMUs have started. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com --- Changes in v3: - use dcs-dmss.pvqemu to spawn the second QEMU - keep track of the rc of both QEMUs before proceeding --- tools/libxl/libxl_create.c | 50 ++ tools/libxl/libxl_dm.c |9 +++- tools/libxl/libxl_internal.h |3 +++ 3 files changed, 47 insertions(+), 15 deletions(-) diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index df946e2..94dd52c 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -751,6 +751,9 @@ static int store_libxl_entry(libxl__gc *gc, uint32_t domid, static void domcreate_devmodel_started(libxl__egc *egc, libxl__dm_spawn_state *dmss, int rc); +static void domcreate_devmodel_callback(libxl__egc *egc, + libxl__dm_spawn_state *dmss, + int ret); static void domcreate_bootloader_console_available(libxl__egc *egc, libxl__bootloader_state *bl); static void domcreate_bootloader_done(libxl__egc *egc, @@ -1016,8 +1019,17 @@ static void domcreate_bootloader_done(libxl__egc *egc, dcs-dmss.dm.spawn.ao = ao; dcs-dmss.dm.guest_config = dcs-guest_config; dcs-dmss.dm.build_state = dcs-build_state; -dcs-dmss.dm.callback = domcreate_devmodel_started; -dcs-dmss.callback = domcreate_devmodel_started; +dcs-dmss.dm.callback = domcreate_devmodel_callback; +dcs-dmss.dm.dcs = dcs; +dcs-dmss.callback = domcreate_devmodel_callback; + +if (info-type == LIBXL_DOMAIN_TYPE_HVM) { +dcs-dmss.pvqemu.guest_domid = domid; +dcs-dmss.pvqemu.spawn.ao = ao; +dcs-dmss.pvqemu.callback = domcreate_devmodel_callback; +dcs-dmss.pvqemu.dcs = dcs; +libxl__spawn_qdisk_backend(egc, dcs-dmss.pvqemu); +} if ( restore_fd 0 ) { rc = 
libxl__domain_build(gc, d_config, domid, state); @@ -1347,20 +1359,13 @@ static void domcreate_devmodel_started(libxl__egc *egc, libxl__dm_spawn_state *dmss, int ret) { -libxl__domain_create_state *dcs = CONTAINER_OF(dmss, *dcs, dmss.dm); +libxl__domain_create_state *dcs = dmss-dcs; STATE_AO_GC(dmss-spawn.ao); -libxl_ctx *ctx = CTX; int domid = dcs-guest_domid; /* convenience aliases */ libxl_domain_config *const d_config = dcs-guest_config; -if (ret) { -LIBXL__LOG(ctx, LIBXL__LOG_ERROR, - device model did not start: %d, ret); -goto error_out; -} - if (dcs-dmss.dm.guest_domid) { if (d_config-b_info.device_model_version == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) { @@ -1379,11 +1384,28 @@ static void domcreate_devmodel_started(libxl__egc *egc, } domcreate_attach_vtpms(egc, dcs-multidev, 0); -return; +} -error_out: -assert(ret); -domcreate_complete(egc, dcs, ret); +static void domcreate_devmodel_callback(libxl__egc *egc, + libxl__dm_spawn_state *dmss, + int ret) +{ +libxl__domain_create_state *dcs = dmss-dcs; +STATE_AO_GC(dmss-spawn.ao); +int worst_rc = 0; + +dmss-rc = ret; + +if (libxl__spawn_inuse(dcs-dmss.dm.spawn) || +libxl__spawn_inuse(dcs-dmss.pvqemu.spawn)) + return; +worst_rc = (dcs-dmss.dm.rc dcs-dmss.pvqemu.rc) ? 
dcs-dmss.dm.rc : dcs-dmss.pvqemu.rc; + +/* all qemus have completed */ +if (worst_rc) +domcreate_complete(egc, dcs, worst_rc); +else +domcreate_devmodel_started(egc, dmss, 0); } static void domcreate_attach_vtpms(libxl__egc *egc, diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 3a4625a..71e7fe2 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -1850,13 +1850,20 @@ out: int libxl__destroy_device_model(libxl__gc *gc, uint32_t domid) { +int rc; char *path = libxl__device_model_xs_path(gc, false, LIBXL_TOOLSTACK_DOMID, domid, ); if (!xs_rm(CTX-xsh, XBT_NULL, path)) LOG(ERROR, xs_rm failed for %s, path); + +kill_device_model(gc, +GCSPRINTF(/local/domain/%d/image/pvqemu-pid, domid)); + /* We should try to destroy the device model anyway. */ -return kill_device_model(gc, +rc = kill_device_model(gc, GCSPRINTF(/local/domain/%d/image/device-model-pid, domid)); + +return rc;
Re: [Xen-devel] [PATCH 0/2] Refactor ioreq server for better performance.
Oh, I forgot the version number and change history. This patchset should be version 2. The change history should be: 1. Split the original patch into 2; 2. Take Paul Durrant's comments: a. Add a name member in the struct rb_rangeset, and use the 'q' debug key to dump the ranges in the ioreq server; b. Keep the original routine names for the hvm ioreq server; c. Commit message changes - mention that a future patch will change the maximum number of ranges inside the ioreq server. Sorry, my fault. Could I add this change history in the next version, or should I resend version 2? :) Thanks Yu On 7/2/2015 8:31 PM, Yu Zhang wrote: XenGT leverages the ioreq server to track and forward accesses to GPU I/O resources, e.g. the PPGTT (per-process graphic translation tables). Currently, the ioreq server uses a rangeset to track the BDF/PIO/MMIO ranges to be emulated. To select an ioreq server, the rangeset is searched to see if the I/O range is recorded. However, traversing the linked list inside the rangeset can be time consuming when the number of ranges is high. On the HSW platform, the number of PPGTTs for each vGPU can be several hundred. On BDW, this value can be several thousand. To accommodate more ranges, the limit on the number of ranges in an ioreq server, MAX_NR_IO_RANGES, is changed - future patches will be provided to tune this with other approaches. And to increase ioreq server performance, a new data structure, rb_rangeset, is introduced. Yu Zhang (2): Resize the MAX_NR_IO_RANGES for ioreq server Add new data structure to track ranges. xen/arch/x86/domain.c| 3 + xen/arch/x86/hvm/hvm.c | 56 ++-- xen/common/Makefile | 1 + xen/common/rb_rangeset.c | 281 +++ xen/include/asm-x86/hvm/domain.h | 4 +- xen/include/asm-x86/hvm/hvm.h| 1 + xen/include/xen/rb_rangeset.h| 49 +++ 7 files changed, 379 insertions(+), 16 deletions(-) create mode 100644 xen/common/rb_rangeset.c create mode 100644 xen/include/xen/rb_rangeset.h
Re: [Xen-devel] [PATCH] osstest: install libnl3 packages
Roger Pau Monne writes ("[PATCH] osstest: install libnl3 packages"): Install the libnl3 packages needed by the remus code. Those are available on both Wheezy and Jessie, although the Wheezy ones are too old. This patch implicitly drops support for lenny and squeeze. I think you should mention that in the commit message. Personally I have no objection to that, but it ought to be mentioned explicitly, both so that people can object and so that anyone trying to fix it up later can find the commit easily. Ian. 
[Xen-devel] [PATCH v5 6/6] xen/PMU: PMU emulation code
Add PMU emulation code that runs when we are processing a PMU interrupt. This code will allow us not to trap to hypervisor on each MSR/LVTPC access (of which there may be quite a few in the handler). Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com --- arch/x86/xen/pmu.c | 214 + 1 file changed, 185 insertions(+), 29 deletions(-) diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c index 69a0b68..4c09167 100644 --- a/arch/x86/xen/pmu.c +++ b/arch/x86/xen/pmu.c @@ -13,11 +13,20 @@ /* x86_pmu.handle_irq definition */ #include ../kernel/cpu/perf_event.h +#define XENPMU_IRQ_PROCESSING1 +struct xenpmu { + /* Shared page between hypervisor and domain */ + struct xen_pmu_data *xenpmu_data; -/* Shared page between hypervisor and domain */ -DEFINE_PER_CPU(struct xen_pmu_data *, xenpmu_shared); -#define get_xenpmu_data()per_cpu(xenpmu_shared, smp_processor_id()) + uint8_t flags; +}; +DEFINE_PER_CPU(struct xenpmu, xenpmu_shared); +#define get_xenpmu_data()(this_cpu_ptr(xenpmu_shared)-xenpmu_data) +#define get_xenpmu_flags() (this_cpu_ptr(xenpmu_shared)-flags) +/* Macro for computing address of a PMU MSR bank */ +#define field_offset(ctxt, field) ((void *)((uintptr_t)ctxt + \ + (uintptr_t)ctxt-field)) /* AMD PMU */ #define F15H_NUM_COUNTERS 6 @@ -169,19 +178,124 @@ static int is_intel_pmu_msr(u32 msr_index, int *type, int *index) } } -bool pmu_msr_read(unsigned int msr, uint64_t *val, int *err) +static bool xen_intel_pmu_emulate(unsigned int msr, u64 *val, int type, + int index, bool is_read) { + uint64_t *reg = NULL; + struct xen_pmu_intel_ctxt *ctxt; + uint64_t *fix_counters; + struct xen_pmu_cntr_pair *arch_cntr_pair; + struct xen_pmu_data *xenpmu_data = get_xenpmu_data(); + uint8_t xenpmu_flags = get_xenpmu_flags(); + + if (!xenpmu_data || !(xenpmu_flags XENPMU_IRQ_PROCESSING)) + return false; + + ctxt = xenpmu_data-pmu.c.intel; + + switch (msr) { + case MSR_CORE_PERF_GLOBAL_OVF_CTRL: + reg = ctxt-global_ovf_ctrl; + break; + case MSR_CORE_PERF_GLOBAL_STATUS: + reg 
= ctxt-global_status; + break; + case MSR_CORE_PERF_GLOBAL_CTRL: + reg = ctxt-global_ctrl; + break; + case MSR_CORE_PERF_FIXED_CTR_CTRL: + reg = ctxt-fixed_ctrl; + break; + default: + switch (type) { + case MSR_TYPE_COUNTER: + fix_counters = field_offset(ctxt, fixed_counters); + reg = fix_counters[index]; + break; + case MSR_TYPE_ARCH_COUNTER: + arch_cntr_pair = field_offset(ctxt, arch_counters); + reg = arch_cntr_pair[index].counter; + break; + case MSR_TYPE_ARCH_CTRL: + arch_cntr_pair = field_offset(ctxt, arch_counters); + reg = arch_cntr_pair[index].control; + break; + default: + return false; + } + } + + if (reg) { + if (is_read) + *val = *reg; + else { + *reg = *val; + + if (msr == MSR_CORE_PERF_GLOBAL_OVF_CTRL) + ctxt-global_status = (~(*val)); + } + return true; + } + + return false; +} + +static bool xen_amd_pmu_emulate(unsigned int msr, u64 *val, bool is_read) +{ + uint64_t *reg = NULL; + int i, off = 0; + struct xen_pmu_amd_ctxt *ctxt; + uint64_t *counter_regs, *ctrl_regs; + struct xen_pmu_data *xenpmu_data = get_xenpmu_data(); + uint8_t xenpmu_flags = get_xenpmu_flags(); + + if (!xenpmu_data || !(xenpmu_flags XENPMU_IRQ_PROCESSING)) + return false; + + if (k7_counters_mirrored + ((msr = MSR_K7_EVNTSEL0) (msr = MSR_K7_PERFCTR3))) + msr = get_fam15h_addr(msr); + + ctxt = xenpmu_data-pmu.c.amd; + for (i = 0; i amd_num_counters; i++) { + if (msr == amd_ctrls_base + off) { + ctrl_regs = field_offset(ctxt, ctrls); + reg = ctrl_regs[i]; + break; + } else if (msr == amd_counters_base + off) { + counter_regs = field_offset(ctxt, counters); + reg = counter_regs[i]; + break; + } + off += amd_msr_step; + } + + if (reg) { + if (is_read) + *val = *reg; + else + *reg = *val; + + return true; + } +
[Xen-devel] [PATCH v5 2/6] xen/PMU: Sysfs interface for setting Xen PMU mode
Set Xen's PMU mode via /sys/hypervisor/pmu/pmu_mode. Add XENPMU hypercall. Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com --- Documentation/ABI/testing/sysfs-hypervisor-pmu | 23 + arch/x86/include/asm/xen/hypercall.h | 6 ++ drivers/xen/sys-hypervisor.c | 127 + include/xen/interface/xen.h| 1 + include/xen/interface/xenpmu.h | 59 5 files changed, 216 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-hypervisor-pmu create mode 100644 include/xen/interface/xenpmu.h diff --git a/Documentation/ABI/testing/sysfs-hypervisor-pmu b/Documentation/ABI/testing/sysfs-hypervisor-pmu new file mode 100644 index 000..d1ceb77 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-hypervisor-pmu @@ -0,0 +1,23 @@ +What: /sys/hypervisor/pmu/pmu_mode +Date: July 2015 +KernelVersion: 4.2 +Contact: Boris Ostrovsky boris.ostrov...@oracle.com +Description: + Describes mode that Xen's performance-monitoring unit (PMU) + uses. Accepted values are + off -- PMU is disabled + self -- The guest can profile itself + hv -- The guest can profile itself and, if it is + privileged (e.g. dom0), the hypervisor + all -- The guest can profile itself, the hypervisor + and all other guests. Only available to + privileged guests. + +What: /sys/hypervisor/pmu/pmu_features +Date: July 2015 +KernelVersion: 4.2 +Contact:Boris Ostrovsky boris.ostrov...@oracle.com +Description: + Describes Xen PMU features (as an integer). A set bit indicates + that the corresponding feature is enabled. 
See + include/xen/interface/xenpmu.h for available features diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h index ca08a27..83aea80 100644 --- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h @@ -465,6 +465,12 @@ HYPERVISOR_tmem_op( return _hypercall1(int, tmem_op, op); } +static inline int +HYPERVISOR_xenpmu_op(unsigned int op, void *arg) +{ + return _hypercall2(int, xenpmu_op, op, arg); +} + static inline void MULTI_fpu_taskswitch(struct multicall_entry *mcl, int set) { diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c index 96453f8..f642085 100644 --- a/drivers/xen/sys-hypervisor.c +++ b/drivers/xen/sys-hypervisor.c @@ -20,6 +20,7 @@ #include xen/xenbus.h #include xen/interface/xen.h #include xen/interface/version.h +#include xen/interface/xenpmu.h #define HYPERVISOR_ATTR_RO(_name) \ static struct hyp_sysfs_attr _name##_attr = __ATTR_RO(_name) @@ -368,6 +369,124 @@ static void xen_properties_destroy(void) sysfs_remove_group(hypervisor_kobj, xen_properties_group); } +struct pmu_mode { + const char *name; + uint32_t mode; +}; + +struct pmu_mode pmu_modes[] = { + {off, XENPMU_MODE_OFF}, + {self, XENPMU_MODE_SELF}, + {hv, XENPMU_MODE_HV}, + {all, XENPMU_MODE_ALL} +}; + +static ssize_t pmu_mode_store(struct hyp_sysfs_attr *attr, + const char *buffer, size_t len) +{ + int ret; + struct xen_pmu_params xp; + int i; + + for (i = 0; i ARRAY_SIZE(pmu_modes); i++) { + if (strncmp(buffer, pmu_modes[i].name, len - 1) == 0) { + xp.val = pmu_modes[i].mode; + break; + } + } + + if (i == ARRAY_SIZE(pmu_modes)) + return -EINVAL; + + xp.version.maj = XENPMU_VER_MAJ; + xp.version.min = XENPMU_VER_MIN; + ret = HYPERVISOR_xenpmu_op(XENPMU_mode_set, xp); + if (ret) + return ret; + + return len; +} + +static ssize_t pmu_mode_show(struct hyp_sysfs_attr *attr, char *buffer) +{ + int ret; + struct xen_pmu_params xp; + int i; + uint32_t mode; + + xp.version.maj = XENPMU_VER_MAJ; + xp.version.min 
= XENPMU_VER_MIN; + ret = HYPERVISOR_xenpmu_op(XENPMU_mode_get, xp); + if (ret) + return ret; + + mode = (uint32_t)xp.val; + for (i = 0; i ARRAY_SIZE(pmu_modes); i++) { + if (mode == pmu_modes[i].mode) + return sprintf(buffer, %s\n, pmu_modes[i].name); + } + + return -EINVAL; +} +HYPERVISOR_ATTR_RW(pmu_mode); + +static ssize_t pmu_features_store(struct hyp_sysfs_attr *attr, + const char *buffer, size_t len) +{ + int ret; + uint32_t features; + struct xen_pmu_params xp; + + ret = kstrtou32(buffer, 0, features); + if (ret) + return ret; + + xp.val = features; + xp.version.maj =
[Xen-devel] [PATCH v5 0/6] xen/PMU: PMU support for Xen PV(H) guests
I haven't posted the Linux part of PV(H) VPMU support in a while, but now that (hopefully) the hypervisor part is getting close to being done I think it's time to post it again. There are very few differences compared to the last version, mostly due to updates in shared structure layouts. Patches 1 and 4 have no changes at all and patch 5 has minor changes due to rebasing, so I kept David's Reviewed-by tag. Boris Ostrovsky (6): xen: xensyms support xen/PMU: Sysfs interface for setting Xen PMU mode xen/PMU: Initialization code for Xen PMU xen/PMU: Describe vendor-specific PMU registers xen/PMU: Intercept PMU-related MSR and APIC accesses xen/PMU: PMU emulation code Documentation/ABI/testing/sysfs-hypervisor-pmu | 23 + arch/x86/include/asm/xen/hypercall.h | 6 + arch/x86/include/asm/xen/interface.h | 123 ++ arch/x86/xen/Makefile | 2 +- arch/x86/xen/apic.c| 6 + arch/x86/xen/enlighten.c | 13 +- arch/x86/xen/pmu.c | 572 + arch/x86/xen/pmu.h | 15 + arch/x86/xen/smp.c | 29 +- arch/x86/xen/suspend.c | 23 +- drivers/xen/Kconfig| 8 + drivers/xen/sys-hypervisor.c | 127 ++ drivers/xen/xenfs/Makefile | 1 + drivers/xen/xenfs/super.c | 3 + drivers/xen/xenfs/xenfs.h | 1 + drivers/xen/xenfs/xensyms.c| 152 +++ include/xen/interface/platform.h | 18 + include/xen/interface/xen.h| 2 + include/xen/interface/xenpmu.h | 94 19 files changed, 1208 insertions(+), 10 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-hypervisor-pmu create mode 100644 arch/x86/xen/pmu.c create mode 100644 arch/x86/xen/pmu.h create mode 100644 drivers/xen/xenfs/xensyms.c create mode 100644 include/xen/interface/xenpmu.h -- 1.8.1.4
[Xen-devel] [PATCH v5 1/6] xen: xensyms support
Export Xen symbols to dom0 via /proc/xen/xensyms (similar to /proc/kallsyms).

Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
Reviewed-by: David Vrabel david.vra...@citrix.com
---
 drivers/xen/Kconfig              |   8 +++
 drivers/xen/xenfs/Makefile       |   1 +
 drivers/xen/xenfs/super.c        |   3 +
 drivers/xen/xenfs/xenfs.h        |   1 +
 drivers/xen/xenfs/xensyms.c      | 152 +++
 include/xen/interface/platform.h |  18 +
 6 files changed, 183 insertions(+)
 create mode 100644 drivers/xen/xenfs/xensyms.c

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 7cd226d..1d825ca 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -280,4 +280,12 @@ config XEN_ACPI
 	def_bool y
 	depends on X86 && ACPI
 
+config XEN_SYMS
+	bool "Xen symbols"
+	depends on XEN_DOM0 && XENFS
+	default y if KALLSYMS
+	help
+	  Exports hypervisor symbols (along with their types and addresses) via
+	  /proc/xen/xensyms file, similar to /proc/kallsyms
+
 endmenu
diff --git a/drivers/xen/xenfs/Makefile b/drivers/xen/xenfs/Makefile
index b019865..1a83010 100644
--- a/drivers/xen/xenfs/Makefile
+++ b/drivers/xen/xenfs/Makefile
@@ -2,3 +2,4 @@ obj-$(CONFIG_XENFS) += xenfs.o
 xenfs-y			  = super.o
 xenfs-$(CONFIG_XEN_DOM0) += xenstored.o
+xenfs-$(CONFIG_XEN_SYMS) += xensyms.o
diff --git a/drivers/xen/xenfs/super.c b/drivers/xen/xenfs/super.c
index 06092e0..8559a71 100644
--- a/drivers/xen/xenfs/super.c
+++ b/drivers/xen/xenfs/super.c
@@ -57,6 +57,9 @@ static int xenfs_fill_super(struct super_block *sb, void *data, int silent)
 		{ "privcmd", &xen_privcmd_fops, S_IRUSR|S_IWUSR },
 		{ "xsd_kva", &xsd_kva_file_ops, S_IRUSR|S_IWUSR},
 		{ "xsd_port", &xsd_port_file_ops, S_IRUSR|S_IWUSR},
+#ifdef CONFIG_XEN_SYMS
+		{ "xensyms", &xensyms_ops, S_IRUSR},
+#endif
 		{""},
 	};
diff --git a/drivers/xen/xenfs/xenfs.h b/drivers/xen/xenfs/xenfs.h
index 6b80c77..2c5934e 100644
--- a/drivers/xen/xenfs/xenfs.h
+++ b/drivers/xen/xenfs/xenfs.h
@@ -3,5 +3,6 @@
 extern const struct file_operations xsd_kva_file_ops;
 extern const struct file_operations xsd_port_file_ops;
+extern const struct file_operations xensyms_ops;
 
 #endif	/* _XENFS_XENBUS_H */
diff --git a/drivers/xen/xenfs/xensyms.c b/drivers/xen/xenfs/xensyms.c
new file mode 100644
index 000..ed16f8d
--- /dev/null
+++ b/drivers/xen/xenfs/xensyms.c
@@ -0,0 +1,152 @@
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/seq_file.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/proc_fs.h>
+#include <linux/slab.h>
+#include <xen/interface/platform.h>
+#include <asm/xen/hypercall.h>
+#include <xen/xen-ops.h>
+#include "xenfs.h"
+
+
+#define XEN_KSYM_NAME_LEN 127 /* Hypervisor may have different name length */
+
+struct xensyms {
+	struct xen_platform_op op;
+	char *name;
+	uint32_t namelen;
+};
+
+/* Grab next output page from the hypervisor */
+static int xensyms_next_sym(struct xensyms *xs)
+{
+	int ret;
+	struct xenpf_symdata *symdata = &xs->op.u.symdata;
+	uint64_t symnum;
+
+	memset(xs->name, 0, xs->namelen);
+	symdata->namelen = xs->namelen;
+
+	symnum = symdata->symnum;
+
+	ret = HYPERVISOR_dom0_op(&xs->op);
+	if (ret < 0)
+		return ret;
+
+	/*
+	 * If hypervisor's symbol didn't fit into the buffer then allocate
+	 * a larger buffer and try again
+	 */
+	if (unlikely(symdata->namelen > xs->namelen)) {
+		kfree(xs->name);
+
+		xs->namelen = symdata->namelen;
+		xs->name = kzalloc(xs->namelen, GFP_KERNEL);
+		if (!xs->name)
+			return 1;
+
+		set_xen_guest_handle(symdata->name, xs->name);
+		symdata->symnum--; /* Rewind */
+
+		ret = HYPERVISOR_dom0_op(&xs->op);
+		if (ret < 0)
+			return ret;
+	}
+
+	if (symdata->symnum == symnum)
+		/* End of symbols */
+		return 1;
+
+	return 0;
+}
+
+static void *xensyms_start(struct seq_file *m, loff_t *pos)
+{
+	struct xensyms *xs = (struct xensyms *)m->private;
+
+	xs->op.u.symdata.symnum = *pos;
+
+	if (xensyms_next_sym(xs))
+		return NULL;
+
+	return m->private;
+}
+
+static void *xensyms_next(struct seq_file *m, void *p, loff_t *pos)
+{
+	struct xensyms *xs = (struct xensyms *)m->private;
+
+	xs->op.u.symdata.symnum = ++(*pos);
+
+	if (xensyms_next_sym(xs))
+		return NULL;
+
+	return p;
+}
+
+static int xensyms_show(struct seq_file *m, void *p)
+{
+	struct xensyms *xs = (struct xensyms *)m->private;
+	struct xenpf_symdata *symdata = &xs->op.u.symdata;
+
+	seq_printf(m, "%016llx %c %s\n", symdata->address,
+		   symdata->type, xs->name);
+
+	return 0;
+}
+
+static void
[Xen-devel] [PATCH v5 3/6] xen/PMU: Initialization code for Xen PMU
Map shared data structure that will hold CPU registers, VPMU context, V/PCPU IDs of the CPU interrupted by PMU interrupt. Hypervisor fills this information in its handler and passes it to the guest for further processing.

Set up PMU VIRQ.

Now that perf infrastructure will assume that PMU is available on a PV guest we need to be careful and make sure that accesses via RDPMC instruction don't cause fatal traps by the hypervisor. Provide a nop RDPMC handler.

For the same reason avoid issuing a warning on a write to APIC's LVTPC.

Both of these will be made functional in later patches.

Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
---
 arch/x86/include/asm/xen/interface.h | 123 +
 arch/x86/xen/Makefile                |   2 +-
 arch/x86/xen/apic.c                  |   3 +
 arch/x86/xen/enlighten.c             |  12 ++-
 arch/x86/xen/pmu.c                   | 172 +++
 arch/x86/xen/pmu.h                   |  11 +++
 arch/x86/xen/smp.c                   |  29 +-
 arch/x86/xen/suspend.c               |  23 +++--
 include/xen/interface/xen.h          |   1 +
 include/xen/interface/xenpmu.h       |  33 +++
 10 files changed, 400 insertions(+), 9 deletions(-)
 create mode 100644 arch/x86/xen/pmu.c
 create mode 100644 arch/x86/xen/pmu.h

diff --git a/arch/x86/include/asm/xen/interface.h b/arch/x86/include/asm/xen/interface.h
index 3400dba..dced9bc 100644
--- a/arch/x86/include/asm/xen/interface.h
+++ b/arch/x86/include/asm/xen/interface.h
@@ -172,6 +172,129 @@ struct vcpu_guest_context {
 #endif
 };
 DEFINE_GUEST_HANDLE_STRUCT(vcpu_guest_context);
+
+/* AMD PMU registers and structures */
+struct xen_pmu_amd_ctxt {
+	/*
+	 * Offsets to counter and control MSRs (relative to xen_pmu_arch.c.amd).
+	 * For PV(H) guests these fields are RO.
+	 */
+	uint32_t counters;
+	uint32_t ctrls;
+
+	/* Counter MSRs */
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+	uint64_t regs[];
+#elif defined(__GNUC__)
+	uint64_t regs[0];
+#endif
+};
+
+/* Intel PMU registers and structures */
+struct xen_pmu_cntr_pair {
+	uint64_t counter;
+	uint64_t control;
+};
+
+struct xen_pmu_intel_ctxt {
+	/*
+	 * Offsets to fixed and architectural counter MSRs (relative to
+	 * xen_pmu_arch.c.intel).
+	 * For PV(H) guests these fields are RO.
+	 */
+	uint32_t fixed_counters;
+	uint32_t arch_counters;
+
+	/* PMU registers */
+	uint64_t global_ctrl;
+	uint64_t global_ovf_ctrl;
+	uint64_t global_status;
+	uint64_t fixed_ctrl;
+	uint64_t ds_area;
+	uint64_t pebs_enable;
+	uint64_t debugctl;
+
+	/* Fixed and architectural counter MSRs */
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+	uint64_t regs[];
+#elif defined(__GNUC__)
+	uint64_t regs[0];
+#endif
+};
+
+/* Sampled domain's registers */
+struct xen_pmu_regs {
+	uint64_t ip;
+	uint64_t sp;
+	uint64_t flags;
+	uint16_t cs;
+	uint16_t ss;
+	uint8_t cpl;
+	uint8_t pad[3];
+};
+
+/* PMU flags */
+#define PMU_CACHED	  (1<<0) /* PMU MSRs are cached in the context */
+#define PMU_SAMPLE_USER	  (1<<1) /* Sample is from user or kernel mode */
+#define PMU_SAMPLE_REAL	  (1<<2) /* Sample is from realmode */
+#define PMU_SAMPLE_PV	  (1<<3) /* Sample from a PV guest */
+
+/*
+ * Architecture-specific information describing state of the processor at
+ * the time of PMU interrupt.
+ * Fields of this structure marked as RW for guest should only be written by
+ * the guest when PMU_CACHED bit in pmu_flags is set (which is done by the
+ * hypervisor during PMU interrupt). Hypervisor will read updated data in
+ * XENPMU_flush hypercall and clear PMU_CACHED bit.
+ */
+struct xen_pmu_arch {
+	union {
+		/*
+		 * Processor's registers at the time of interrupt.
+		 * WO for hypervisor, RO for guests.
+		 */
+		struct xen_pmu_regs regs;
+		/*
+		 * Padding for adding new registers to xen_pmu_regs in
+		 * the future
+		 */
+#define XENPMU_REGS_PAD_SZ  64
+		uint8_t pad[XENPMU_REGS_PAD_SZ];
+	} r;
+
+	/* WO for hypervisor, RO for guest */
+	uint64_t pmu_flags;
+
+	/*
+	 * APIC LVTPC register.
+	 * RW for both hypervisor and guest.
+	 * Only APIC_LVT_MASKED bit is loaded by the hypervisor into hardware
+	 * during XENPMU_flush or XENPMU_lvtpc_set.
+	 */
+	union {
+		uint32_t lapic_lvtpc;
+		uint64_t pad;
+	} l;
+
+	/*
+	 * Vendor-specific PMU registers.
+	 * RW for both hypervisor and guest (see exceptions above).
+	 * Guest's updates to this field are verified and then loaded by the
+	 * hypervisor into hardware during XENPMU_flush
+	 */
Re: [Xen-devel] [PATCH v5 02/16] x86/hvm: remove multiple open coded 'chunking' loops
On 30/06/15 14:05, Paul Durrant wrote:
> ...in hvmemul_read/write()
>
> Add hvmemul_phys_mmio_access() and hvmemul_linear_mmio_access() functions
> to reduce code duplication.
>
> NOTE: This patch also introduces a change in 'chunking' around a page
> boundary. Previously (for example) an 8 byte access at the last byte of
> a page would get carried out as 8 single-byte accesses. It will now be
> carried out as a single-byte access, followed by a 4-byte access, a
> 2-byte access and then another single-byte access.
>
> Signed-off-by: Paul Durrant paul.durr...@citrix.com
> Cc: Keir Fraser k...@xen.org
> Cc: Jan Beulich jbeul...@suse.com
> Cc: Andrew Cooper andrew.coop...@citrix.com
> ---
>  xen/arch/x86/hvm/emulate.c | 223 +++-
>  1 file changed, 116 insertions(+), 107 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 8b60843..b67f5db 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -539,6 +539,117 @@ static int hvmemul_virtual_to_linear(
>      return X86EMUL_EXCEPTION;
>  }
>
> +static int hvmemul_phys_mmio_access(
> +    paddr_t gpa, unsigned int size, uint8_t dir, uint8_t *buffer)
> +{
> +    unsigned long one_rep = 1;
> +    unsigned int chunk;
> +    int rc;
> +
> +    /* Accesses must fall within a page */

Full stop.

> +    BUG_ON((gpa & (PAGE_SIZE - 1)) + size > PAGE_SIZE);

~PAGE_MASK as opposed to (PAGE_SIZE - 1)

> +
> +    /*
> +     * hvmemul_do_io() cannot handle non-power-of-2 accesses or
> +     * accesses larger than sizeof(long), so choose the highest power
> +     * of 2 not exceeding sizeof(long) as the 'chunk' size.
> +     */
> +    chunk = 1 << (fls(size) - 1);

Depending on size, chunk can become undefined (shifting by 31 or -1) or
zero (shifting by 32). How about:

    if ( size > sizeof(long) )
        chunk = sizeof(long);
    else
        chunk = 1U << (fls(size) - 1);

?

> +    if ( chunk > sizeof (long) )
> +        chunk = sizeof (long);
> +
> +    for ( ;; )
> +    {
> +        rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
> +                                    buffer);
> +        if ( rc != X86EMUL_OKAY )
> +            break;
> +
> +        /* Advance to the next chunk */

Full stop.
> +        gpa += chunk;
> +        buffer += chunk;
> +        size -= chunk;
> +
> +        if ( size == 0 )
> +            break;
> +
> +        /*
> +         * If the chunk now exceeds the remaining size, choose the next
> +         * lowest power of 2 that will fit.
> +         */
> +        while ( chunk > size )
> +            chunk >>= 1;
> +    }
> +
> +    return rc;
> +}
> +
> +static int hvmemul_linear_mmio_access(
> +    unsigned long gla, unsigned int size, uint8_t dir, uint8_t *buffer,
> +    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool_t known_gpfn)
> +{
> +    struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
> +    unsigned long page_off = gla & (PAGE_SIZE - 1);

Can be int as opposed to long, and "offset" appears to be the prevailing
name. Also, ~PAGE_MASK.

> +    unsigned int chunk;
> +    paddr_t gpa;
> +    unsigned long one_rep = 1;
> +    int rc;
> +
> +    chunk = min_t(unsigned int, size, PAGE_SIZE - page_off);
> +
> +    if ( known_gpfn )
> +        gpa = pfn_to_paddr(vio->mmio_gpfn) | page_off;
> +    else
> +    {
> +        rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
> +                                    hvmemul_ctxt);
> +        if ( rc != X86EMUL_OKAY )
> +            return rc;
> +    }
> +
> +    for ( ;; )
> +    {
> +        rc = hvmemul_phys_mmio_access(gpa, chunk, dir, buffer);
> +        if ( rc != X86EMUL_OKAY )
> +            break;
> +
> +        gla += chunk;
> +        buffer += chunk;
> +        size -= chunk;
> +
> +        if ( size == 0 )
> +            break;
> +
> +        ASSERT((gla & (PAGE_SIZE - 1)) == 0);

~PAGE_MASK.

> +        ASSERT(size < PAGE_SIZE);

Nothing I can see here prevents size being greater than PAGE_SIZE.
chunk strictly will be, but "size -= chunk" can still leave size greater
than a page.

~Andrew

> +        chunk = size;
> +        rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
> +                                    hvmemul_ctxt);
> +        if ( rc != X86EMUL_OKAY )
> +            return rc;
> +    }
> +
> +    return rc;
> +}
> +
Re: [Xen-devel] [PATCH] Remove sh_{un}map_domain_page() and hap_{un}map_domain_page()
At 13:43 +0100 on 02 Jul (1435844582), Ben Catterall wrote:
> Removed as they were wrappers around map_domain_page() to make it appear
> to take an mfn_t type.
>
> Signed-off-by: Ben Catterall ben.catter...@citrix.com

Reviewed-by: Tim Deegan t...@xen.org
Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)
Ian Campbell writes ("Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)"):
> On Thu, 2015-07-02 at 09:45 +0100, Ian Campbell wrote:
> > Shall I put up a poll of some sort to gather preferred timeslot options
> > out of that set?
>
> Please can everyone who is interested in this topic indicate their date
> preference/availability at:
>
> http://doodle.com/cy88dhwzybg7hh7p
>
> I've gone with the usual 5pm BST slot for simplicity. That's 12 noon EDT,
> 9am PDT and 6pm CEST.

I'm never available at 1700 BST on a Wednesday, I'm afraid. I can make
that time any other day of the week.

Ian.
Re: [Xen-devel] [PATCH v2] xen/arm: Find automatically the gnttab region for DOM0
Hi,

Ping?

Regards,

On 17/06/15 14:58, Julien Grall wrote:
> Currently, the grant table region is hardcoded per-platform. When a new
> board is coming up, we have to check the spec in order to find free
> space in the memory layout. Depending on the platform it may be tedious.
>
> A good candidate for the gnttab region is the region used by the Xen
> binary, as parts of it will never be mapped into the DOM0 address space.
> MMIO is mapped 1:1, and the RAM will be either:
>     - direct mapped: 1:1 mapping is used => no problem
>     - non direct mapped: Xen always relocates itself as high as possible
>       (limited to 4GB on ARM32) and the RAM banks are filled from the
>       first one. It's very unlikely that the gnttab region will overlap
>       with the RAM. Although, for safety, a check may be necessary when
>       we re-enable this option.
>
> Furthermore, there is plenty of space to contain a big gnttab: the
> default size is 32 frames (i.e. 128KB) but it can be changed via a
> command line option.
>
> It's not possible to use the whole region occupied by Xen, as some parts
> of the binary will be freed after Xen boots and can then be used by DOM0
> and other guests. A sensible choice is the text section, as it will
> always reside in memory, will never be mapped to the guest, and its size
> is big enough (~300KB on ARM64). It could be extended later to use other
> contiguous sections such as data...
>
> Note that on ARM64 the grant table region may be above 4GB (Xen is
> relocated to the highest address), so using DOM0 32-bit with short page
> tables may not work. Although, I don't think this is a big deal as
> devices may not work and/or the RAM is too high due to the 1:1 mapping.
>
> This patch also drops the platform files thunderx and xilinx-zynqmp,
> which became dummy after dropping the hardcoded DOM0 grant table region.
> Signed-off-by: Julien Grall julien.gr...@citrix.com
>
> ---
>     Changes in v2:
>         - Typoes
>         - Message and comment improvement
> ---
>  xen/arch/arm/domain_build.c            | 47 ++++++----------
>  xen/arch/arm/kernel.h                  |  4 +++
>  xen/arch/arm/platform.c                | 14 --------
>  xen/arch/arm/platforms/Makefile        |  2 --
>  xen/arch/arm/platforms/midway.c        |  3 ---
>  xen/arch/arm/platforms/omap5.c         |  6 -----
>  xen/arch/arm/platforms/rcar2.c         |  3 ---
>  xen/arch/arm/platforms/seattle.c       |  3 ---
>  xen/arch/arm/platforms/sunxi.c         |  3 ---
>  xen/arch/arm/platforms/thunderx.c      | 41 -----------------
>  xen/arch/arm/platforms/vexpress.c      |  2 --
>  xen/arch/arm/platforms/xgene-storm.c   |  3 ---
>  xen/arch/arm/platforms/xilinx-zynqmp.c | 41 -----------------
>  xen/include/asm-arm/platform.h         |  7 -----
>  14 files changed, 43 insertions(+), 136 deletions(-)
>  delete mode 100644 xen/arch/arm/platforms/thunderx.c
>  delete mode 100644 xen/arch/arm/platforms/xilinx-zynqmp.c
>
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index e9cb8a9..d8b775f 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -21,6 +21,7 @@
>  #include <asm/gic.h>
>  #include <xen/irq.h>
> +#include <xen/grant_table.h>
>  #include "kernel.h"
>
>  static unsigned int __initdata opt_dom0_max_vcpus;
> @@ -605,8 +606,8 @@ static int make_memory_node(const struct domain *d,
>      return res;
>  }
>
> -static int make_hypervisor_node(struct domain *d,
> -                                void *fdt, const struct dt_device_node *parent)
> +static int make_hypervisor_node(const struct kernel_info *kinfo,
> +                                const struct dt_device_node *parent)
>  {
>      const char compat[] =
>          "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0";
>      gic_interrupt_t intr;
>      __be32 *cells;
>      int res;
> +    /* Convenience alias */
>      int addrcells = dt_n_addr_cells(parent);
>      int sizecells = dt_n_size_cells(parent);
> -    paddr_t gnttab_start, gnttab_size;
> +    void *fdt = kinfo->fdt;
>
>      DPRINT("Create hypervisor node\n");
> @@ -639,12 +641,9 @@ static int make_hypervisor_node(struct domain *d,
>      if ( res )
>          return res;
>
> -    platform_dom0_gnttab(&gnttab_start, &gnttab_size);
> -    DPRINT("  Grant table range: %#"PRIpaddr"-%#"PRIpaddr"\n",
> -           gnttab_start, gnttab_start + gnttab_size);
>      /* reg 0 is grant table space */
>      cells = &reg[0];
> -    dt_set_range(&cells, parent, gnttab_start, gnttab_size);
> +    dt_set_range(&cells, parent, kinfo->gnttab_start, kinfo->gnttab_size);
>      res = fdt_property(fdt, "reg", reg,
>                         dt_cells_to_size(addrcells + sizecells));
>      if ( res )
>          return res;
> @@ -1192,7 +1191,7 @@ static int handle_node(struct domain *d, struct kernel_info *kinfo,
>      if ( node == dt_host )
>      {
> -        res = make_hypervisor_node(d, kinfo->fdt, node);
> +        res = make_hypervisor_node(kinfo, node);
>          if ( res )
>              return res;
> @@ -1368,6
Re: [Xen-devel] [PATCH 4/4] xen: sched: get rid of cpupool_scheduler_cpumask()
On Thu, Jun 25, 2015 at 1:15 PM, Dario Faggioli dario.faggi...@citrix.com wrote:
> and of (almost every) direct use of cpupool_online_cpumask().
>
> In fact, what we really want for the most of the times, is the set of
> valid pCPUs of the cpupool a certain domain is part of. Furthermore, in
> case it's called with a NULL pool as argument,
> cpupool_scheduler_cpumask() does more harm than good, by returning the
> bitmask of free pCPUs!
>
> This commit, therefore:
>  * gets rid of cpupool_scheduler_cpumask(), in favour of
>    cpupool_domain_cpumask(), which makes it more evident what we are
>    after, and accommodates some sanity checking;
>  * replaces some of the calls to cpupool_online_cpumask() with calls to
>    the new functions too.
>
> Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
> ---
> Cc: George Dunlap george.dun...@eu.citrix.com
> Cc: Juergen Gross jgr...@suse.com
> Cc: Robert VanVossen robert.vanvos...@dornerworks.com
> Cc: Josh Whitehead josh.whiteh...@dornerworks.com
> Cc: Meng Xu men...@cis.upenn.edu
> Cc: Sisu Xi xis...@gmail.com
> ---
>  xen/common/domain.c         |  5 +++--
>  xen/common/domctl.c         |  4 ++--
>  xen/common/sched_arinc653.c |  2 +-
>  xen/common/sched_credit.c   |  6 +++---
>  xen/common/sched_rt.c       | 12 ++--
>  xen/common/sched_sedf.c     |  2 +-
>  xen/common/schedule.c       |  2 +-
>  xen/include/xen/sched-if.h  | 12 ++--
>  8 files changed, 27 insertions(+), 18 deletions(-)
>
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index a1945ac..8c36635 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -309,7 +309,7 @@ __runq_remove(struct csched_vcpu *svc)
>  static inline int
>  __vcpu_has_soft_affinity(const struct vcpu *vc, const cpumask_t *mask)
>  {
> -    return !cpumask_subset(cpupool_online_cpumask(vc->domain->cpupool),
> +    return !cpumask_subset(cpupool_domain_cpumask(vc->domain),
>                             vc->cpu_soft_affinity) &&
>            !cpumask_subset(vc->cpu_hard_affinity, vc->cpu_soft_affinity) &&
>            cpumask_intersects(vc->cpu_soft_affinity, mask);
> @@ -374,7 +374,7 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
>      /* cpu is vc->processor, so it must be in a cpupool. */
>      ASSERT(per_cpu(cpupool, cpu) != NULL);
> -    online = cpupool_online_cpumask(per_cpu(cpupool, cpu));
> +    online = cpupool_domain_cpumask(new->sdom->dom);

Two comments in passing here (no action required):

1. Looks like passing both cpu and svc to this function is a bit
redundant, as the function can just look up new->vcpu->processor itself

2. After this patch, the ASSERT there will be redundant, as there's
already an identical assert in cpupool_domain_cpumask()

Those won't hurt, but if you end up respinning you might think about at
least taking the ASSERT out.

> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 4ffcd98..7ad298a 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -80,7 +80,7 @@ static struct scheduler __read_mostly ops;
>  #define DOM2OP(_d)    (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))
>  #define VCPU2OP(_v)   (DOM2OP((_v)->domain))
> -#define VCPU2ONLINE(_v) cpupool_online_cpumask((_v)->domain->cpupool)
> +#define VCPU2ONLINE(_v) cpupool_domain_cpumask((_v)->domain)
>
>  static inline void trace_runstate_change(struct vcpu *v, int new_state)
>  {
> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
> index 7cc25c6..20af791 100644
> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -183,9 +183,17 @@ struct cpupool
>      atomic_t         refcnt;
>  };
>
> -#define cpupool_scheduler_cpumask(_pool) \
> -    (((_pool) == NULL) ? &cpupool_free_cpus : (_pool)->cpu_valid)
>  #define cpupool_online_cpumask(_pool) \
>      (((_pool) == NULL) ? &cpu_online_map : (_pool)->cpu_valid)
>
> +static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
> +{
> +    /*
> +     * d->cpupool == NULL only for the idle domain, which should
> +     * have no interest in calling into this.
> +     */
> +    ASSERT(d->cpupool != NULL);
> +    return cpupool_online_cpumask(d->cpupool);
> +}

It's a bit strange to assert that d->cpupool != NULL, and then call a
macro whose return value only depends on whether d->cpupool is NULL or
not. Why not just return d->cpupool->cpu_valid?
Thanks for tracking these down, BTW!

 -George
Re: [Xen-devel] [PATCH v5 03/16] x86/hvm: change hvm_mmio_read_t and hvm_mmio_write_t length argument...
On 30/06/15 14:05, Paul Durrant wrote:
> ...from unsigned long to unsigned int
>
> A 64-bit length is not necessary, 32 bits is enough.
>
> Signed-off-by: Paul Durrant paul.durr...@citrix.com

Reviewed-by: Andrew Cooper andrew.coop...@citrix.com