Re: [Xen-devel] [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
When you say "not tools", I take it to mean that you're not exposing that option through the libxl interface?

Yes.

tools/libxc/xc_domain.c:xc_assign_dt_device() most certainly does pass it in, and that's the level I'm talking about. Someone reviewing this patch series needs to know: when xc or libxl set NO_RDM, what will be the effect? The fact that libxc *shouldn't* set NO_RDM for PCI devices doesn't mean it won't happen.

Now, looking at the end of the series and grepping for XEN_DOMCTL_DEV.*RDM, these values are *read and acted on* in exactly two places:

xen/arch/x86/mm/p2m.c: The whole point of this series; if the p2m is occupied already, and flag == RDM_STRICT, return an error; otherwise ignore it.

xen/drivers/passthrough/device_tree.c: If flag != NO_RDM, return an error.

So the meaning of the flags is:

For pci devices:
- RDM_RELAXED, NO_RDM: ignore conflicts in set_identity_p2m_entry()
- RDM_STRICT: error on conflicts in set_identity_p2m_entry()

For dt devices:
- Error if not NO_RDM

Correct.

It doesn't look to me like the NO_RDM setting actually adds any semantic meaning. What I see in the list of references you gave is Julien saying this:

I would also add a XEN_DOMCTL_DEV_NO_RDM that would be used for non-PCI assignment.

It looks a bit like what you did is said, "Well, Julien asked for a NO_RDM setting, so here's a NO_RDM setting." Which, while perhaps understandable, doesn't make the value any more useful.

It seems to me that the real problem is that you had two values to begin with, rather than actually having flags (as the name would imply). This is what I would suggest. Make a single flag:

#define _XEN_DOMCTL_DEV_RDM_RELAXED 0
#define XEN_DOMCTL_DEV_RDM_RELAXED  (1U << _XEN_DOMCTL_DEV_RDM_RELAXED)

Then make the meaning of the flags as follows:

* for pci devices:
- RDM_RELAXED flag SET: ignore conflicts in set_identity_p2m_entry()
- RDM_RELAXED flag CLEAR: error on conflicts in set_identity_p2m_entry()

No problem.
* for dt devices:
- Ignore this flag entirely

But we still need to pass a flag to assign_device(), like this:

diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 5d3842a..a182487 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev)
 goto fail;
 }
-rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
+rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev),
+ XEN_DOMCTL_DEV_RDM_RELAXED);
 if ( rc )
 goto fail;

Or rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev), 0)?

Thanks
Tiejun
___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 00/27] Libxl migration v2
Hi Andrew,

Are there any updates of this series that I can check out and rebase mine onto? :)

On 06/15/2015 09:44 PM, Andrew Cooper wrote:

This series adds support for the libxl migration v2 stream, and untangles the existing layering violations of the toolstack and qemu records. At the end of the series, legacy migration is no longer used.

Note: Remus support is broken and (RFC) fixed in separate patches in this series. It was too tangled to fix in a bisectable fashion. Plain suspend/migrate/resume however is (should be) bisectable along the entire series.

There are a couple of outstanding questions:

1) What to do about the toolstack/xenstore record. It is currently being passed around as a blob, but it might be better to split it out.

2) What (if any) ABI/API qualifications are needed? (Particularly in reference to patch 21)

The Remus code is untested by me, but is hopefully in the correct ballpark. All other combinations of suspend/migrate/resume have been tested with PV and HVM guests (qemu-trad and qemu-upstream), including 32 -> 64 bit migration (which was the underlying bug causing us to write migration v2 in the first place).

There are some further improvements which could be made. In particular, it appears that sending the toolstack record on each checkpoint is redundant, and there is certainly room for some more pruning of the legacy migration code.

Anyway, thoughts/comments welcome. Please test!
~Andrew

Andrew Cooper (22):
  tools/libxl: Fix libxl__ev_child_inuse() check for not-yet-initialised children
  tools/libxc: Always compile the compat qemu variables into xc_sr_context
  tools/libxl: Stash all restore parameters in domain_create_state
  tools/xl: Mandatory flag indicating the format of the migration stream
  tools/libxl: Introduce ROUNDUP()
  tools/libxl: Extra APIs for the save helper
  tools/libxl: Pass restore_fd as a parameter to libxl__xc_domain_restore()
  docs: Libxl migration v2 stream specification
  tools/python: Libxc migration v2 infrastructure
  tools/python: Libxl migration v2 infrastructure
  tools/python: Verification utility for v2 stream spec compliance
  tools/python: Conversion utility for legacy migration streams
  tools/libxl: Support converting a legacy stream to a v2 stream
  tools/libxl: Convert a legacy stream if needed
  tools/libxc+libxl+xl: Restore v2 streams
  tools/libxc+libxl+xl: Save v2 streams
  docs/libxl: [RFC] Introduce CHECKPOINT_END to support migration v2 remus streams
  tools/libxl: [RFC] Write checkpoint records into the stream
  tools/libx{c,l}: [RFC] Introduce restore_callbacks.checkpoint()
  tools/libxl: [RFC] Handle checkpoint records in a libxl migration v2 stream
  tools/libxc: Drop all XG_LIBXL_HVM_COMPAT code from libxc
  tools/libxl: Drop all knowledge of toolstack callbacks

Ian Jackson (2):
  libxl: cancellation: Preparations for save/restore cancellation
  libxl: cancellation: Handle SIGTERM in save/restore helper

Ross Lagerwall (3):
  tools/libxl: Migration v2 stream format
  tools/libxl: Infrastructure for reading a libxl migration v2 stream
  tools/libxl: Infrastructure for writing a v2 stream

 docs/specs/libxl-migration-stream.pandoc      | 218
 tools/libxc/Makefile                          |   2 -
 tools/libxc/include/xenguest.h                |   3 +
 tools/libxc/xc_sr_common.h                    |   5 -
 tools/libxc/xc_sr_restore.c                   |  33 +-
 tools/libxc/xc_sr_restore_x86_hvm.c           | 124 -
 tools/libxc/xc_sr_save_x86_hvm.c              |  36 --
 tools/libxl/Makefile                          |   2 +
 tools/libxl/libxl_aoutils.c                   |   7 +
 tools/libxl/libxl_convert_callout.c           | 146 ++
 tools/libxl/libxl_create.c                    |  80 +--
 tools/libxl/libxl_dom.c                       |  61 +--
 tools/libxl/libxl_internal.h                  | 140 -
 tools/libxl/libxl_save_callout.c              |  63 +--
 tools/libxl/libxl_save_helper.c               |  95 ++--
 tools/libxl/libxl_save_msgs_gen.pl            |   9 +-
 tools/libxl/libxl_sr_stream_format.h          |  58 +++
 tools/libxl/libxl_stream_read.c               | 663
 tools/libxl/libxl_stream_write.c              | 640 +++
 tools/libxl/libxl_types.idl                   |   2 +
 tools/libxl/xl_cmdimpl.c                      |   9 +-
 tools/python/Makefile                         |   4 +
 tools/python/scripts/convert-legacy-stream.py | 683 +
 tools/python/scripts/verify-stream-v2.py      | 174 +++
 tools/python/setup.py                         |   1 +
 tools/python/xen/migration/libxc.py           | 446
 tools/python/xen/migration/libxl.py           | 199 +++
 tools/python/xen/migration/tests.py           |  54 ++
 tools/python/xen/migration/verify.py          |  37 ++
 29 files changed, 3638
Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)
On 01/07/15 at 18:15, Boris Ostrovsky wrote: On 07/01/2015 11:57 AM, Ian Campbell wrote:

The next Xen technical call will be at: Wed 8 Jul 17:00:00 BST 2015 `date -d @1436371200` See http://lists.xen.org/archives/html/xen-devel/2015-01/msg00414.html for more information on the call. Please let me know (CC-ing the list) any topics which you would like to discuss. It might be useful to include: * References to any relevant/recent mailing list threads; * Other people who you think should be involved in the discussion (and CC them); If you would like to attend then please let me know so I can send you the dial-in details.

Given that there is a fair amount of PVH-related work happening now (Roger's, Elena's and mine), perhaps we should have a discussion about that to see where we are going? Andrew, Tim, Roger, Jan (if he is back from vacation), Elena, Konrad and David would be good to have present.

I'm sorry, but I will be on vacation next week, hiking in the mountains without Internet or cell phone coverage, so I won't be able to join the meeting. I will be back on the 20th; maybe we can arrange another of these PVH meetings on the next technical call in early August?

Roger.
[Xen-devel] [rumpuserxen test] 59019: regressions - FAIL
flight 59019 rumpuserxen real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59019/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 build-amd64-rumpuserxen       5 rumpuserxen-build   fail REGR. vs. 33866
 build-i386-rumpuserxen        5 rumpuserxen-build   fail REGR. vs. 33866

Tests which did not succeed, but are not blocking:
 test-amd64-i386-rumpuserxen-i386    1 build-check(1)   blocked n/a
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a

version targeted for testing:
 rumpuserxen 3b91e44996ea6ae1276bce1cc44f38701c53ee6f
baseline version:
 rumpuserxen 30d72f3fc5e35cd53afd82c8179cc0e0b11146ad

People who touched revisions under test:
 Antti Kantee po...@iki.fi
 Ian Jackson ian.jack...@eu.citrix.com
 Martin Lucina mar...@lucina.net
 Wei Liu l...@liuw.name

jobs:
 build-amd64-xsm                     pass
 build-i386-xsm                      pass
 build-amd64                         pass
 build-i386                          pass
 build-amd64-pvops                   pass
 build-i386-pvops                    pass
 build-amd64-rumpuserxen             fail
 build-i386-rumpuserxen              fail
 test-amd64-amd64-rumpuserxen-amd64  blocked
 test-amd64-i386-rumpuserxen-i386    blocked

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs
Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Not pushing.

(No revision log; it would be 535 lines long.)
Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling
On Thu, 2015-07-02 at 04:32 +0000, Wu, Feng wrote:

-----Original Message----- From: Dario Faggioli [mailto:dario.faggi...@citrix.com] Sent: Tuesday, June 30, 2015 10:58 AM To: Wu, Feng Cc: xen-devel; k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Tian, Kevin; Zhang, Yang Z; george.dun...@eu.citrix.com; Wu, Feng Subject: Re: Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling

On Mon, 2015-06-29 at 18:36 +0100, Andrew Cooper wrote:

The basic idea here is:

1. When the vCPU's state is RUNSTATE_running:
- Set 'NV' to 'Notification Vector'.
- Clear 'SN' to accept PI.
- Set 'NDST' to the right pCPU.

2. When the vCPU's state is RUNSTATE_blocked:
- Set 'NV' to 'Wake-up Vector', so we can wake up the related vCPU when a posted interrupt happens for it.
- Clear 'SN' to accept PI.

3. When the vCPU's state is RUNSTATE_runnable/RUNSTATE_offline:
- Set 'SN' to suppress non-urgent interrupts. (Currently, we only support non-urgent interrupts.)
- Set 'NV' back to 'Notification Vector' if needed.

It might be me, but it feels a bit odd to see RUNSTATE-s being (ab)used directly for this, as it does feel odd to see arch-specific code being added in there. Can't this be done in context_switch(), which is already architecture specific? I was thinking of something very similar to what has been done for PSR, i.e., on x86, put everything in __context_switch(). Looking at who's prev and who's next, and at what pause_flags each has set, you should be able to implement all of the above logic. Or am I missing something?

As mentioned in the description of this patch, here we need to do something when the vCPU's state is changed; can we get the state transition in __context_switch(), such as running -> blocking?

Well, in the patch description you mention how you've done it, so of course it mentions runstates. That does not necessarily mean we need to do something in vcpu_runstate_change().
Actually, that's exactly what I'm asking: can you check whether this thing that you need doing can be done somewhere else than in vcpu_runstate_change()? In fact, looking at how, where and what for runstates are used, that really does not feel right, at least to me.

What you seem to be interested in is whether a vCPU blocks and/or unblocks. Runstates are an abstraction, built up on top of (mostly) pause_flags, like _VPF_blocked (look at how runstate is updated). I think you should not build on top of such an abstraction, but on top of pause_flags directly. I had a quick look, and it indeed seems to me that you can get all you need from there too. It might even result in the code looking simpler (but that's of course hard to tell without actually trying).

In fact, inside the context switching code, you already know that prev was running so, if it has the proper flag set, it means it's blocking (i.e., going to RUNSTATE_blocked, in runstates language); if not, it maybe is being preempted (i.e., going to RUNSTATE_runnable). Therefore, you can enact all your logic, even without any need to keep track of the previous runstate, and without needing to build up a full state machine and looking at all possible transitions.

So, can you have a look at whether that solution can fly? Because, if it does, I think it would be a lot better.

Regards, Dario

-- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)
On Thu, 2015-07-02 at 09:32 +0200, Roger Pau Monné wrote: On 01/07/15 at 18:15, Boris Ostrovsky wrote: On 07/01/2015 11:57 AM, Ian Campbell wrote:

The next Xen technical call will be at: Wed 8 Jul 17:00:00 BST 2015 `date -d @1436371200` See http://lists.xen.org/archives/html/xen-devel/2015-01/msg00414.html for more information on the call. Please let me know (CC-ing the list) any topics which you would like to discuss. It might be useful to include: * References to any relevant/recent mailing list threads; * Other people who you think should be involved in the discussion (and CC them); If you would like to attend then please let me know so I can send you the dial-in details.

Given that there is a fair amount of PVH-related work happening now (Roger's, Elena's and mine), perhaps we should have a discussion about that to see where we are going? Andrew, Tim, Roger, Jan (if he is back from vacation), Elena, Konrad and David would be good to have present.

Yes, I think this topic would indeed be a worthwhile subject for discussion. I think there is a toolstack element to all this as well, so would it be a good idea to have Ian J, Stefano and Wei in addition to me (who will be there to run the call in any case)? I'd probably mark them in as optional participants for scheduling purposes.

I'm sorry, but I will be on vacation next week, hiking in the mountains without Internet or cell phone coverage, so I won't be able to join the meeting.

I think you are one of the critical participants in the discussion, i.e. we can't go ahead without you.

I will be back on the 20th; maybe we can arrange another of these PVH meetings on the next technical call in early August?

The August call is scheduled for 12 August, which is 6 weeks from yesterday. We could either go for that or, if it is too far out, we could push the July iteration of the call back two weeks to Wednesday 22 July (so 3 weeks from yesterday). (A one week slip would still land in your vacation time.)
There is also the devsummit in the week of 17 August (~7 weeks' time), which is an opportunity for an f2f. Shall I put up a poll of some sort to gather preferred timeslot options out of that set? Ian.
Re: [Xen-devel] [RFC PATCH v3 10/18] xen/arm: ITS: Add APIs to add and assign device
On Thu, 2015-07-02 at 14:10 +0530, Vijay Kilari wrote: On Mon, Jun 29, 2015 at 5:59 PM, Ian Campbell ian.campb...@citrix.com wrote: On Mon, 2015-06-22 at 17:31 +0530, vijay.kil...@gmail.com wrote:

+/* Device assignment. Should be called from pci_device_add */ +int its_add_device(struct domain *d, u32 devid) +{

Prior to the PCI series landing, and to enable dom0 to use ITS, it might be possible to call this from xen/arch/arm/platforms/thunderx.c via the specific_mappings platform hook, which would also expose the PCI controller to dom0 via a series of specific mmio mappings (look at xen/arch/arm/platforms/xgene-storm.c for the sort of thing I mean). That would, I think, give basic PCI functionality for dom0 (i.e. allowing us to boot on thunderx) and decouple things from the PCI series somewhat, which ought to make things easier overall IMHO.

In the ThunderX case, the PCI RC's MMIO mappings are parsed to find devices. How do we know the device ids upfront to call its_add_device?

Ah yes, this approach probably doesn't work for ITS. Which is a shame. Ian.
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On Thu, 2015-07-02 at 08:58 +0000, Wu, Feng wrote:

-----Original Message----- From: Dario Faggioli [mailto:dario.faggi...@citrix.com]

This is the third time that I ask: (1) whether it is possible to have more vcpus queued on one pcpu PI blocked list with desc.on (I really believe it is);

I think it is, please see the following scenario: when the CPU masks interrupts, and an external interrupt occurs for the assigned device while the target vCPU2 is blocked, the wakeup notification event handler has no chance to run; after a while, another wakeup notification event for vCPU4, blocking on the same pCPU, occurs; after the CPU unmasks interrupts, the wakeup notification handler gets called. Then we get: vCPU2, desc.on = 1 and vCPU4, desc.on = 1. Then in the handler we need to kick both of them.

Ok, first of all, thanks for answering! :-) And yes, this makes sense.

(2) if yes, whether it is TheRightThing(TM) to kick all of them, as soon as any notification arrives, instead of putting together a mechanism for kicking only a specific one.

Why can't we kick all of them? 'desc.on = 1' means there is a pending interrupt; when we meet this condition, kicking the related vCPU should be the right thing to do.

Right, I see it now. I felt like I was missing something, and that's why I was asking you to elaborate a bit more. Thanks again for having done this. I was missing/forgetting half of the way desc.on is actually handled, sorry for this.

BTW, I'm finding it hard reading this series from the archives; there appear to be some threading issues and some missing messages. I also don't have it in my inbox, because my filters failed to spot and flag it properly. If you send a new version, please Cc me, so it will be easier for me to look at all the patches and provide a more helpful review.

Thanks and Regards, Dario

-- This happens because I choose it to happen!
(Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [PATCH v3 2/6] libxl: do not add a vkb backend to hvm guests
On Wed, 1 Jul 2015, Konrad Rzeszutek Wilk wrote: On Wed, Jul 01, 2015 at 11:29:46AM +0100, Stefano Stabellini wrote: On Tue, 30 Jun 2015, Konrad Rzeszutek Wilk wrote: On Tue, Jun 30, 2015 at 03:13:53PM +0100, Ian Campbell wrote: On Tue, 2015-06-30 at 15:02 +0100, Stefano Stabellini wrote: On Tue, 30 Jun 2015, Ian Campbell wrote: On Tue, 2015-06-30 at 12:21 +0100, Stefano Stabellini wrote: On Tue, 30 Jun 2015, Ian Campbell wrote: On Mon, 2015-06-29 at 18:59 +0100, Stefano Stabellini wrote: On Thu, 25 Jun 2015, Ian Campbell wrote: On Tue, 2015-06-16 at 16:39 +0100, Stefano Stabellini wrote: On Tue, 16 Jun 2015, Wei Liu wrote: On Wed, Jun 10, 2015 at 11:09:50AM +0100, Stefano Stabellini wrote:

When QEMU restricts its xenstore connection, it cannot provide PV backends. A separate QEMU instance is required to provide PV backends in userspace, such as qdisk. With two separate instances, it is not possible to take advantage of vkb for mouse and keyboard, as the QEMU that emulates the graphic card (the device model) would be separate from the QEMU running the vkb backend (PV QEMU).

The question is how this would affect the non-split setup.

vkb is useful because emulating usb forces QEMU to wake up more often. However there is no way around it. Does pvfb+vkb continue to work due to code somewhere else?

Yes, it continues to work as usual for PV guests.

Do we think anyone will actually be using emulated VGA + PV input devices?

VGA + PV input only works with Linux and is only useful for power efficiency, because if you disable usb emulation in QEMU, then QEMU is able to wake up less often. Given that usb emulation is still on by default, I don't think that this change will have a big impact.

My question was whether we thought anyone would be using this non-default configuration, not what the impact on the default is. You gave a good reason why people might be using this facility; do you think anyone is actually using it?

I don't know of anybody using it.
I don't think we made clear enough how to use this non-default configuration and its advantages for users to go out of their way to use it.

That's good enough for me, thanks. Can I add your acked-by?

If you put some distillation of the reasoning given in this subthread for why we think we can get away with it into the commit message, then yes.

Why don't we also make the Linux code not expose this driver for HVM guests?

I had a go at this last year (can't find the link) as it would unduly

And the link: http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg00160.html

cause the Linux kernel to take an extra 30 seconds to boot. That is because 'xend' by default exposes the PV configuration even for HVM guests - and of course there are no PV drivers (as the VGA in QEMU is enabled).

But even with xend it only happens when a vfb line is in the config file, right? That's why we didn't fix it back when the issue was reported, if I remember correctly.

Both. If you had 'vnc' or 'vfb' it would set up the 'vfb' key.

The only use case I had was for ARM - where there is no VGA - and the patch I think I had just disabled xen-fbfront under x86 HVM.

Yeah, we need xen-fbfront for ARM. Given that xen-fbfront is likely to go away for HVM guests, I wouldn't be opposed to stopping the driver initialization in Linux on x86/HVM. Unless Roger's work on HVMlite is going to need xen-fbfront again, but in that case we'll be able to distinguish a regular HVM guest from an HVMlite guest, I think.

Correct. Right now 'xen_pvh_domain' is based on it being PV and auto-translate. That whole thing will need some help. But I am looking at the xen-fbfront.c driver and it might be that I had already fixed this issue! (inadvertently, it seems)

51c71a3bbaca868043cc45b3ad3786dd48a90235
Author: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Date: Tue Nov 26 15:05:40 2013 -0500

xen/pvhvm: If xen_platform_pci=0 is set don't blow up (v4).
..
- if running in HVM, check if user wanted 'xen_emul_unplug=never', in which case bail out and don't load any PV drivers.
- if running in HVM, and if PCI device 5853:0001 (xen_platform_pci) does not exist, then bail out and not load PV drivers.
- (v2)
Re: [Xen-devel] [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
On 2015/7/2 18:28, George Dunlap wrote: On 07/02/2015 11:01 AM, Chen, Tiejun wrote:

1. After spending yet another half hour doing research, I haven't found any discussion that concluded we should have the global policy override the local policy.

I also took some time to go back and check this point, and indeed it is not in that public design. And as I mentioned in another email in this thread, I also talked to Kevin about this issue, and it looks like this was just concluded from our internal discussion; he didn't post it in the v2 design again because, as you know, that design is at a high level. As I recall, those discussions couldn't cover everything at that moment, because the thinking was that we'd better post preliminary patches to further discuss things, since this is a really complicated case. So afterwards I sent out two RFC revisions to help everyone finalize a good solution. And I can confirm the current policy has been the same since the first RFC, but we didn't see any advice to the contrary until now.

Probably because the reviewers all assumed that the design draft had been followed, and you didn't make it clear that you'd changed it.

Shouldn't the patch description already clarify this point? I also comment on this point in the code. After all, we have already had several rounds of technical review, so it's a little hard to believe it would have been missed.

2. The only discussion I *did* find has *you yourself* saying that the per-device setting should override the global setting, not once, but twice; and nobody contradicting you. Maybe there is somewhere else a discussion where this was changed; but I've already spent half an hour this morning looking at where you said it was (the v2 design discussion), and found the opposite -- just as I remembered. I'm not going to look anymore. You have now caused me to waste an awful lot of time on this series that could profitably have been used elsewhere.
Sorry about this, but we already have 2 RFC revisions and 4 revisions without RFC, and some patches are already Acked; should we really overturn this policy right now?

First of all, I think it's easy to change.

I agree, but what I'm saying is that this involves our policy. We shouldn't change this sort of thing unless all the associated maintainers are in agreement with you.

Even if it weren't, I already said that I'd be OK with accepting the patch series with the existing override semantics, and without the default semantics, *if* it were renamed to make it clear what was going on. But, for future reference, I am not going to approve an interface I think is misleading or wrong -- particularly one like the xl interface, which we want to avoid changing if possible -- just because time is short. One of my own features, HVM USB pass-through, has narrowly missed two releases (including the current one) because we wanted to be careful to get the interface right.

I admit I should consider everything as carefully as you do. Again, I wasn't involved in the v2 design, so I don't want to add to the confusion with my reply.

This is your feature, so it is your responsibility to understand and explain why you are doing what you are doing, if only to say "Jan wanted

Maybe you remember I posted v1, but according to some feedback that was not the best design to show this implementation, so Kevin issued the v2 revision and had a wider discussion with you guys. Since then I have just followed that version. I mean I didn't pursue these high-level questions further, since either policy is fine with me; IMO the two approaches are interchangeable.

X to happen because of Y [see $ref]."

So this is why I said you'd better ask this of Kevin or Jan, since I can't decide what's next at this point.

Let me say that again: I don't care whether anyone pulled rank and ordered you to do something a certain way. YOU are the one submitting this patch.
That means YOU are responsible for understanding why they want it that way, and YOU are responsible for justifying it to other people. If you don't understand it at all, it's YOUR responsibility to get them to explain it, not mine to chase them down.

As I said above, I initially thought the two approaches were interchangeable, and just a matter of preference. So I picked up the patch descriptions reviewed in public to say this is our expectation. But it looks like this does not satisfy you, so I don't think I can explain it appropriately myself, which is why I asked you to ping Jan or Kevin for a formal answer. Is this procedure not reasonable?

Thanks
Tiejun
Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access
On 07/02/2015 12:25 PM, Tim Deegan wrote: At 12:09 +0100 on 02 Jul (1435838956), Andrew Cooper wrote: On 02/07/15 11:48, George Dunlap wrote:

Now in p2m_set_mem_access(), rather than just using an unsigned long in the loop iterating over gfns, you do this thing where you convert gfn_t to unsigned long, add one, and then convert it back to gfn_t again. I can't see any comments in v3 that suggest you doing that, and it seems a bit clunky. Is that really necessary? Wouldn't it be better to declare a local variable? I'm not strongly opinionated on this one, it just seems a bit strange. Everything else looks good, thanks.

Looping over {g,m,p}fn_t's is indeed awkward, as the compiler tricks for typesafety don't allow for simply adding 1 to a typesafe variable. In a case like this, I think it is acceptable to keep an unsigned long shadow variable and manipulate it as a plain integer. The eventual _gfn() required to pass it further down the callchain will help to visually re-enforce the appropriate type. After all, the entire point of these typesafes is to try and avoid accidentally mixing up the different address spaces, but a function which takes a typesafe, loops over a subset and passes the same typesafe further down can probably be trusted to DTRT, catching errors at review time. Off the top of my head, the only functions which would normally expect to mix and match the typesafes are the pagetable walking ones.

It should be easy enough to extend the macros to define a gfn_inc(gfn_t) operator for this kind of thing.

I was thinking that -- although in this case you'd still need to un-pack it to do the loop exit conditional. To really make things pretty you'd want a for_gfn_range() macro or something like that that takes a start gfn and a number. But that's really starting to be feature creep for this patch, which is why I didn't want to suggest it on v4. :-)

-George
Re: [Xen-devel] Xen 4.6 Development Update (2 WEEKS TO FREEZE, important information in preamble)
Hello Ian, Julien,

On 07/02/2015 11:35 AM, Ian Campbell wrote: On Wed, 2015-07-01 at 11:17 +0100, Julien Grall wrote:

As suggested by Wei at the top of his mail [1], can you please CC only relevant people and avoid replying to all?

It seems that many people are unable to follow these simple instructions. Wei, perhaps you could stop CCing people who inappropriately do not trim their quotes or the CC list in the future. If they cannot do us the courtesy of doing so, I don't see why they should receive a courtesy copy of the status mail. A somewhat less aggressive approach might be to use Bcc instead of Cc for the bulk of people (i.e. anyone who needn't be cc-d on every reply). The failure case of someone who cannot read simple instructions then becomes a lack of CCs rather than a plethora of unwanted Ccs.

First of all, let me apologize for doing this in the past; I'll certainly remember not to let it happen again. Second, I'd like to point out that, while I cannot speak for everyone (so maybe it's just me), I find this statement: "please trim your quotes when replying, and also trim the CC list if necessary" a bit ambiguous. The quotes part is obvious (and not that many people have omitted to do that), but to be honest I haven't been clear on who is supposed to be in the trimmed CC list: the maintainers of the code I'm touching with my series and Wei? There doesn't seem to be a clear rule about who should be replied to (or maybe there is and I've missed it? If so, could you please point it out?). Maybe clearing this up could help with this problem in the future.

Thanks, Razvan
Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access
On 02/07/15 11:48, George Dunlap wrote: On 06/29/2015 04:45 PM, Vitaly Kuznetsov wrote: 'pfn' and 'start_pfn' are ambiguous, both these functions expect GFNs as input. On x86 the interface of p2m_set_mem_access() in p2m.c doesn't match the declaration in p2m-common.h as 'pfn' is being used instead of 'start_pfn'. On ARM both p2m_set_mem_access and p2m_get_mem_access interfaces don't match declarations from p2m-common.h: p2m_set_mem_access uses 'pfn' instead of 'start_pfn' and p2m_get_mem_access uses 'gpfn' instead of 'pfn'. Convert p2m_get_mem_access/p2m_set_mem_access (and __p2m_get_mem_access on ARM) interfaces to using gfn_t instead of unsigned long and update all users of these functions. There is also an issue in p2m_get_mem_access on x86: the 'gfn' parameter passed to gfn_lock/gfn_unlock is not defined. This code compiles only because of a coincidence: gfn_lock/gfn_unlock are currently macros which don't use their second argument. Signed-off-by: Vitaly Kuznetsov vkuzn...@redhat.com --- Changes since v3: - Comment coding style fix [Razvan Cojocaru] - Use INVALID_GFN instead of ~0 and -1 [Andrew Cooper] - Convert p2m_get_mem_access/p2m_set_mem_access interfaces to using gfn_t [Andrew Cooper] But you missed a change... @@ -1600,9 +1600,11 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla, return (p2ma == p2m_access_n2rwx); } -/* Set access type for a region of pfns. - * If start_pfn == -1ul, sets the default access type */ -long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr, +/* + * Set access type for a region of gfns. + * If gfn == INVALID_GFN, sets the default access type.
+ */ +long p2m_set_mem_access(struct domain *d, gfn_t gfn, uint32_t nr, uint32_t start, uint32_t mask, xenmem_access_t access) { struct p2m_domain *p2m = p2m_get_hostp2m(d); @@ -1638,18 +1640,19 @@ long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr, return -EINVAL; } -/* If request to set default access */ -if ( pfn == ~0ul ) +/* If request to set default access. */ +if ( gfn_x(gfn) == INVALID_GFN ) { p2m->default_access = a; return 0; } p2m_lock(p2m); -for ( pfn += start; nr > start; ++pfn ) +for ( gfn = _gfn(gfn_x(gfn) + start); nr > start; + gfn = _gfn(gfn_x(gfn) + 1) ) Now in p2m_set_mem_access(), rather than just using an unsigned long in the loop iterating over gfns, you do this thing where you convert gfn_t to unsigned long, add one, and then convert it back to gfn_t again. I can't see any comments on v3 that suggested doing that, and it seems a bit clunky. Is that really necessary? Wouldn't it be better to declare a local variable? I'm not strongly opinionated on this one, it just seems a bit strange. Everything else looks good, thanks. Looping over {g,m,p}fn_t's is indeed awkward, as the compiler tricks for type safety don't allow simply adding 1 to a typesafe variable. In cases like this, I think it is acceptable to keep an unsigned long shadow variable and manipulate it as a plain integer. The eventual _gfn() required to pass it further down the call chain will help to visually reinforce the appropriate type. After all, the entire point of these typesafes is to try to avoid accidentally mixing up the different address spaces, but a function which takes a typesafe, loops over a subset and passes the same typesafe further down can probably be trusted to DTRT, catching errors at review time. Off the top of my head, the only functions which would normally expect to mix and match the typesafes are the pagetable-walking ones. ~Andrew
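The shadow-variable pattern Andrew describes can be illustrated outside of Xen. This is a minimal sketch: the gfn_t wrapper and _gfn()/gfn_x() here are stand-ins for Xen's real typesafe macro machinery, and sum_gfn_range() is a made-up function purely to show the loop shape.

```c
#include <assert.h>

/* Stand-in for Xen's typesafe wrapper; the real TYPE_SAFE machinery
 * differs, this only mimics its shape. */
typedef struct { unsigned long g; } gfn_t;
#define _gfn(x)   ((gfn_t) { (x) })
#define gfn_x(_g) ((_g).g)

/* The pattern under discussion: keep an unsigned long shadow variable
 * for the loop, and re-wrap with _gfn() only when handing the value
 * further down the call chain. */
static unsigned long sum_gfn_range(gfn_t start, unsigned int nr)
{
    unsigned long gfn = gfn_x(start);   /* shadow variable */
    unsigned long total = 0;
    unsigned int i;

    for ( i = 0; i < nr; i++, gfn++ )
        total += gfn_x(_gfn(gfn));      /* _gfn() re-asserts the type */

    return total;
}
```

The _gfn() at the point of use is what visually reinforces which address space the plain integer belongs to.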
Re: [Xen-devel] [RFC PATCH v3 10/18] xen/arm: ITS: Add APIs to add and assign device
On Mon, Jun 29, 2015 at 5:59 PM, Ian Campbell ian.campb...@citrix.com wrote: On Mon, 2015-06-22 at 17:31 +0530, vijay.kil...@gmail.com wrote: +/* Device assignment. Should be called from pci_device_add */ +int its_add_device(struct domain *d, u32 devid) +{ Prior to the PCI series landing, and to enable dom0 to use ITS, it might be possible to call this from xen/arch/arm/platforms/thunderx.c via the specific_mappings platform hook, which would also expose the PCI controller to dom0 via a series of specific mmio mappings (look at xen/arch/arm/platforms/xgene-storm.c for the sort of thing I mean). That would, I think, give basic PCI functionality for dom0 (i.e. allowing us to boot on thunderx) and decouple things from the PCI series somewhat, which ought to make things easier overall IMHO. In the ThunderX case, the MMIO-mapped PCI RC is parsed to find devices. How do we know device IDs upfront in order to call its_add_device? Ian.
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
-Original Message- From: Dario Faggioli [mailto:dario.faggi...@citrix.com] Sent: Thursday, July 02, 2015 4:30 PM To: Wu, Feng Cc: Andrew Cooper; xen-devel@lists.xen.org; Zhang, Yang Z; george.dun...@eu.citrix.com; Tian, Kevin; k...@xen.org; jbeul...@suse.com Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote: +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu), +pi_blocked_vcpu_list) +if ( vmx->pi_desc.on ) +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet); There is a logical bug here. If we have two NVs delivered to this pcpu, we will kick the first vcpu twice. On finding desc.on, a kick should be scheduled, then the vcpu removed from this list. With desc.on set, we know for certain that another NV will not arrive for it until it has been scheduled again and the interrupt posted. Yes, that seems a possible issue (and one that should indeed be avoided). I'm still unsure about the one that I raised myself but, if it is possible to have more than one vcpu in a pcpu list, with desc.on==true, then it looks to me that we kick all of them, for each notification. Add to that what Andrew spotted: if there are a bunch of vcpus queued with desc.on==true, and a bunch of notifications arrive before the tasklet gets executed, we'll be kicking the whole bunch of them a bunch of times! :-/ As Andrew mentioned, removing the vCPUs with desc.on = true from the list can avoid kicking vCPUs multiple times. It avoids kicking vcpus multiple times if more than one notification arrives, yes. It is, therefore, not effective in making sure that, even with only one notification, you only kick the interested vcpu.
This is the third time that I ask: (1) whether it is possible to have more vcpus queued on one pcpu PI blocked list with desc.on (I really believe it is); I think it is, please see the following scenario: when the cpu masks interrupts, and an external interrupt occurs for the assigned device while the target vCPU2 is blocked, the wakeup notification event handler has no chance to run; after a while, another wakeup notification event occurs for vCPU4, blocking on the same pCPU; after the cpu unmasks interrupts, the wakeup notification handler gets called. Then we get: vCPU2, desc.on = 1 and vCPU4, desc.on = 1. Then in the handler we need to kick both of them. (2) if yes, whether it is TheRightThing(TM) to kick all of them, as soon as any notification arrives, instead of putting together a mechanism for kicking only a specific one. Why can't we kick all of them? 'desc.on = 1' means there is a pending interrupt; when we meet this condition, kicking the related vCPU should be the right thing to do. Thanks, Feng The fact that you're not answering is not so much of a big deal for me... I'll just keep asking! :-D Regards, Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK)
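Andrew's suggested fix (kick, then unlink, so a later notification cannot kick the same vCPU again) can be sketched with a plain singly linked list standing in for Xen's struct list_head and tasklets; struct vcpu_pi, the kicks counter, and pi_wakeup_handler() are all illustrative stand-ins, not Xen code.

```c
#include <assert.h>
#include <stddef.h>

struct vcpu_pi {
    int on;                 /* stands in for pi_desc.on: interrupt pending */
    int kicks;              /* counts how many times we "woke" this vCPU */
    struct vcpu_pi *next;
};

/* Walk the per-pCPU blocked list; kick and UNLINK every vCPU with a
 * pending posted interrupt, so a second notification arriving before
 * the vCPU runs cannot kick it again.  Returns the new list head. */
static struct vcpu_pi *pi_wakeup_handler(struct vcpu_pi *head)
{
    struct vcpu_pi **pp = &head;

    while ( *pp )
    {
        struct vcpu_pi *v = *pp;

        if ( v->on )
        {
            v->kicks++;        /* tasklet_schedule()/vcpu_kick() stand-in */
            *pp = v->next;     /* remove from the blocked list */
            v->next = NULL;
        }
        else
            pp = &v->next;     /* leave vCPUs with nothing pending queued */
    }
    return head;
}
```

Running the handler twice (two back-to-back notifications) leaves each vCPU kicked exactly once, which is the property the review asks for.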
[Xen-devel] [PATCH] doc: Fix nonexistent error code in libxl_event_check example
Fix example code in comment. libxl_event_check() can return ERROR_NOT_READY; LIBXL_NOT_READY does not exist. Signed-off-by: Euan Harris euan.har...@citrix.com --- tools/libxl/libxl_event.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h index 3c6fcfe..fad4c14 100644 --- a/tools/libxl/libxl_event.h +++ b/tools/libxl/libxl_event.h @@ -213,7 +213,7 @@ void libxl_evdisable_disk_eject(libxl_ctx *ctx, libxl_evgen_disk_eject*); * libxl_osevent_afterpoll(...); * for (;;) { * r = libxl_event_check(...); - * if (r==LIBXL_NOT_READY) break; + * if (r==ERROR_NOT_READY) break; * if (r) goto error_out; * do something with the event; * } -- 2.4.3
Re: [Xen-devel] [RFC PATCH v3 10/18] xen/arm: ITS: Add APIs to add and assign device
On 02/07/15 10:01, Ian Campbell wrote: On Thu, 2015-07-02 at 14:10 +0530, Vijay Kilari wrote: On Mon, Jun 29, 2015 at 5:59 PM, Ian Campbell ian.campb...@citrix.com wrote: On Mon, 2015-06-22 at 17:31 +0530, vijay.kil...@gmail.com wrote: +/* Device assignment. Should be called from pci_device_add */ +int its_add_device(struct domain *d, u32 devid) +{ Prior to the PCI series landing, and to enable dom0 to use ITS, it might be possible to call this from xen/arch/arm/platforms/thunderx.c via the specific_mappings platform hook, which would also expose the PCI controller to dom0 via a series of specific mmio mappings (look at xen/arch/arm/platforms/xgene-storm.c for the sort of thing I mean). That would, I think, give basic PCI functionality for dom0 (i.e. allowing us to boot on thunderx) and decouple things from the PCI series somewhat, which ought to make things easier overall IMHO. In the ThunderX case, the MMIO-mapped PCI RC is parsed to find devices. How do we know device IDs upfront in order to call its_add_device? Ah yes, this approach probably doesn't work for ITS. Which is a shame. I guess the number of device IDs supported by ThunderX is bounded and not so big, so you could do a loop which registers all the IDs from 0 to N. Though it may be possible to be smarter. Regards, -- Julien Grall
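Julien's interim suggestion — register every possible device ID from a platform hook — might look roughly like this. its_add_device() is stubbed to the signature quoted above, and thunderx_register_all_devids() is a hypothetical name, not an actual Xen function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct domain;                      /* opaque, as in the quoted code */

static unsigned int devices_added;  /* counts successful registrations */

/* Stub with the signature quoted from the patch ("u32" = uint32_t). */
static int its_add_device(struct domain *d, uint32_t devid)
{
    (void)d;
    (void)devid;
    devices_added++;
    return 0;
}

/* Hypothetical platform hook body: register every device ID from 0 to
 * max_devid, since the IDs can't be discovered from the PCI RC before
 * the PCI series lands. */
static int thunderx_register_all_devids(struct domain *d, uint32_t max_devid)
{
    uint32_t id;

    for ( id = 0; id <= max_devid; id++ )
    {
        int rc = its_add_device(d, id);
        if ( rc )
            return rc;
    }
    return 0;
}
```

This only works because, as Julien notes, the ID space on ThunderX is assumed to be bounded and small.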
Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access
On 06/29/2015 04:45 PM, Vitaly Kuznetsov wrote: 'pfn' and 'start_pfn' are ambiguous, both these functions expect GFNs as input. On x86 the interface of p2m_set_mem_access() in p2m.c doesn't match the declaration in p2m-common.h as 'pfn' is being used instead of 'start_pfn'. On ARM both p2m_set_mem_access and p2m_get_mem_access interfaces don't match declarations from p2m-common.h: p2m_set_mem_access uses 'pfn' instead of 'start_pfn' and p2m_get_mem_access uses 'gpfn' instead of 'pfn'. Convert p2m_get_mem_access/p2m_set_mem_access (and __p2m_get_mem_access on ARM) interfaces to using gfn_t instead of unsigned long and update all users of these functions. There is also an issue in p2m_get_mem_access on x86: the 'gfn' parameter passed to gfn_lock/gfn_unlock is not defined. This code compiles only because of a coincidence: gfn_lock/gfn_unlock are currently macros which don't use their second argument. Signed-off-by: Vitaly Kuznetsov vkuzn...@redhat.com --- Changes since v3: - Comment coding style fix [Razvan Cojocaru] - Use INVALID_GFN instead of ~0 and -1 [Andrew Cooper] - Convert p2m_get_mem_access/p2m_set_mem_access interfaces to using gfn_t [Andrew Cooper] But you missed a change... @@ -1600,9 +1600,11 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla, return (p2ma == p2m_access_n2rwx); } -/* Set access type for a region of pfns. - * If start_pfn == -1ul, sets the default access type */ -long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr, +/* + * Set access type for a region of gfns. + * If gfn == INVALID_GFN, sets the default access type. + */ +long p2m_set_mem_access(struct domain *d, gfn_t gfn, uint32_t nr, uint32_t start, uint32_t mask, xenmem_access_t access) { struct p2m_domain *p2m = p2m_get_hostp2m(d); @@ -1638,18 +1640,19 @@ long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr, return -EINVAL; } -/* If request to set default access */ -if ( pfn == ~0ul ) +/* If request to set default access.
*/ +if ( gfn_x(gfn) == INVALID_GFN ) { p2m->default_access = a; return 0; } p2m_lock(p2m); -for ( pfn += start; nr > start; ++pfn ) +for ( gfn = _gfn(gfn_x(gfn) + start); nr > start; + gfn = _gfn(gfn_x(gfn) + 1) ) Now in p2m_set_mem_access(), rather than just using an unsigned long in the loop iterating over gfns, you do this thing where you convert gfn_t to unsigned long, add one, and then convert it back to gfn_t again. I can't see any comments on v3 that suggested doing that, and it seems a bit clunky. Is that really necessary? Wouldn't it be better to declare a local variable? I'm not strongly opinionated on this one, it just seems a bit strange. Everything else looks good, thanks. -George
Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access
At 12:09 +0100 on 02 Jul (1435838956), Andrew Cooper wrote: On 02/07/15 11:48, George Dunlap wrote: Now in p2m_set_mem_access(), rather than just using an unsigned long in the loop iterating over gfns, you do this thing where you convert gfn_t to unsigned long, add one, and then convert it back to gfn_t again. I can't see any comments on v3 that suggested doing that, and it seems a bit clunky. Is that really necessary? Wouldn't it be better to declare a local variable? I'm not strongly opinionated on this one, it just seems a bit strange. Everything else looks good, thanks. Looping over {g,m,p}fn_t's is indeed awkward, as the compiler tricks for type safety don't allow simply adding 1 to a typesafe variable. In cases like this, I think it is acceptable to keep an unsigned long shadow variable and manipulate it as a plain integer. The eventual _gfn() required to pass it further down the call chain will help to visually reinforce the appropriate type. After all, the entire point of these typesafes is to try to avoid accidentally mixing up the different address spaces, but a function which takes a typesafe, loops over a subset and passes the same typesafe further down can probably be trusted to DTRT, catching errors at review time. Off the top of my head, the only functions which would normally expect to mix and match the typesafes are the pagetable-walking ones. It should be easy enough to extend the macros to define a gfn_inc(gfn_t) operator for this kind of thing. Tim.
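Tim's proposed gfn_inc() operator could be defined along these lines; the gfn_t definition is a stand-in and the helper names are assumptions, not Xen's actual macros.

```c
#include <assert.h>

/* Stand-in gfn_t; not Xen's real definition. */
typedef struct { unsigned long g; } gfn_t;
#define _gfn(x)   ((gfn_t) { (x) })
#define gfn_x(_g) ((_g).g)

/* Hypothetical helpers along the lines Tim suggests: a general add
 * operation, with increment as the common special case. */
static inline gfn_t gfn_add(gfn_t gfn, unsigned long i)
{
    return _gfn(gfn_x(gfn) + i);
}

#define gfn_inc(gfn) gfn_add(gfn, 1)
```

As George notes in the sibling reply, a loop would still need gfn_x() for its exit condition, which is why he floats a for_gfn_range()-style macro as the fully "pretty" (but feature-creepy) version.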
Re: [Xen-devel] [PATCH v2 11/22] xen/x86: allow disabling emulated devices for HVM guests
On 01/07/15 at 17:46, Andrew Cooper wrote: On 01/07/15 15:46, Roger Pau Monne wrote: Introduce a new DOMCTL flag that can be used to disable device emulation inside of Xen for HVM guests. The following emulated devices are disabled when XEN_DOMCTL_CDF_noemu is used: hpet, pmtimer, rtc, ioapic, lapic, pic and pmu. Also all the MMIO handlers are disabled. Signed-off-by: Roger Pau Monné roger@citrix.com Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com Cc: Boris Ostrovsky boris.ostrov...@oracle.com Cc: Suravee Suthikulpanit suravee.suthikulpa...@amd.com Cc: Aravind Gopalakrishnan aravind.gopalakrish...@amd.com Cc: Jun Nakajima jun.nakaj...@intel.com Cc: Eddie Dong eddie.d...@intel.com Cc: Kevin Tian kevin.t...@intel.com I would be hesitant to have a blanket change like this. Consider APICV/AVIC. For performance reasons, we absolutely want HVM and PVH to make use of them, as they are substantially more efficient using hardware support than even using plain evtchn hypercalls. However, the flipside is that we must provide an LAPIC emulation to cover the bits which hardware cannot virtualise. As a random idea, how about having a new hypercall or hvmparam which provides a bitmap of permitted emulators? This would allow far finer-grained control over what is and isn't available to a domain. I don't think using a new hypercall or hvmparam is suitable for this: the emulators are initialized in hvm_domain_initialise, which is called by the XEN_DOMCTL_createdomain hypercall. Trying to set them before calling XEN_DOMCTL_createdomain is impossible because there's no domain struct yet, and adding a new hypercall to do that later seems quite convoluted; IMHO it's best to never initialize them in the first place. I would rather add a bitmap field to xen_arch_domainconfig in order to describe which emulators we want to enable.
I've also been wondering why we need to introduce this now; AFAICT we can always introduce this bitmap field later and remove XEN_DOMCTL_CDF_noemu/DOMCRF_noemu, because the DOMCTL interface is not stable anyway. Also, from a guest POV, how is the hw-emulated local apic going to be used? Are we going to route the interrupts from virtual devices (netfront, blkfront) to the lapic? Or do we just want it for the timer and ditch the PV timer? I can see that this is more interesting for a PVH/HVMlite Dom0, but still in that case I'm not sure how a guest is supposed to interact with it. Will the PHYSDEV hypercalls route interrupts to the emulated local apic instead of pirqs/event channels? Will we trap PCI/MSI/MSI-X configuration and emulate it? Roger.
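Roger's idea of a per-emulator bitmap in xen_arch_domainconfig could look something like this; all of these flag names are hypothetical here (none of them existed in the public headers at the time of this thread).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-emulator enable bits for a bitmap field carried in
 * xen_arch_domainconfig at domain-creation time. */
#define XEN_X86_EMU_LAPIC   (1u << 0)
#define XEN_X86_EMU_IOAPIC  (1u << 1)
#define XEN_X86_EMU_PIC     (1u << 2)
#define XEN_X86_EMU_HPET    (1u << 3)
#define XEN_X86_EMU_PMTIMER (1u << 4)
#define XEN_X86_EMU_RTC     (1u << 5)

/* The domain builder sets the flags once at XEN_DOMCTL_createdomain;
 * hvm_domain_initialise() would then only init the selected models,
 * rather than tearing everything down with a blanket noemu flag. */
static int emulator_enabled(uint32_t emulation_flags, uint32_t emu_bit)
{
    return !!(emulation_flags & emu_bit);
}
```

This addresses Andrew's APICV concern: a PVH-style guest could enable only XEN_X86_EMU_LAPIC while leaving the legacy timers and PICs off.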
Re: [Xen-devel] Xen 4.6 Development Update (2 WEEKS TO FREEZE, important information in preamble)
On Thu, 2015-07-02 at 11:51 +0300, Razvan Cojocaru wrote: On 07/02/2015 11:35 AM, Ian Campbell wrote: It seems that many people are unable to follow these simple instructions. The quotes part is obvious (and not that many people have omitted to do that), but to be honest I haven't been clear on who is supposed to be in the trimmed CC list: the maintainers of the code I'm touching with my series and Wei? Exactly. It's the 'interested parties' that should be Cc-ed, which, 99% of the time, is exactly what you just said: - the maintainers because, well, they are the maintainers; they know the code, they most likely know your series and will be able to engage in a conversation on whether the estimation is correct or not; - the release manager, since we're discussing the release; - there might be more people, such as people that have been involved in the review process despite not being maintainers, or... no, that's all that comes to my mind. So, trim to such a set, and you'll make most of the people happy, I bet. There doesn't seem to be a clear rule about who should be replied to (or maybe there is and I've missed it? If so, could you please point it out?). Maybe clearing this up could help with this problem in the future. I think it was pretty obvious. I guess it would not harm to add a line making this crystal clear in the mail, but it's Wei's call to judge whether that is really necessary. Regards, Dario
[Xen-devel] Sharing display between guests
Hi, I've started using Xen on an Allwinner A33, which works great as a headless device using the latest PSCI patches in U-Boot. However, we would like to do something more with it, and we would need to have two VMs accessing the display at once, each one drawing in its own part of the framebuffer. Something that would look like this:

Framebuffer
+-------+
|       |
|Guest 1|
|       |
+-------+
|       |
|       |
|       |
|Guest 2|
|       |
|       |
|       |
+-------+

Where things start to get interesting is that the second guest would be running Android, and as such would need OpenGL support and access to the GPU, and that ideally the first guest would need to be able to draw over all the screen to create some kind of a drop-down menu. Our first thought was to use two different planes of a DRM/KMS driver, one for each VM, with the second guest having the primary plane and the first guest having an overlay, and we would set it up in dom0. That would mean that we would have a static composition, that would be set up once and we could forget about it during the life of the system. This way, we would also have a fixed-size framebuffer assigned to Android, which is much easier to support, and since we have total control over the application in the first guest, we would be able to control how much transparency we want to leave (== how much of Android we want to be displayed), and we would be able to create our drop-down menu. Now the hard part: is such a setup possible at all with Xen? Can we export a single plane to a guest and let it be the only user of it? If that is possible, how would that interact with the 3D acceleration? If not, is it something that is conceptually flawed, or does one just need to write the appropriate amount of code? Do you have a better solution to this problem?
Thanks a lot for your feedback, Maxime -- Maxime Ripard, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com
Re: [Xen-devel] [PATCH 1/4] xen: sched: avoid dumping duplicate information
On Thu, Jun 25, 2015 at 1:15 PM, Dario Faggioli dario.faggi...@citrix.com wrote: When dumping scheduling information (debug key 'r'), what we print as 'Idle cpupool' is pretty much the same of what we print immediately after as 'Cpupool0'. In fact, if there are no pCPUs outside of any cpupools, it is exactly the same. If there are free pCPUs, there is some valuable information, but still a lot of duplication: (XEN) Online Cpus: 0-15 (XEN) Free Cpus: 8 (XEN) Idle cpupool: (XEN) Scheduler: SMP Credit Scheduler (credit) (XEN) info: (XEN) ncpus = 13 (XEN) master = 0 (XEN) credit = 3900 (XEN) credit balance = 45 (XEN) weight = 1280 (XEN) runq_sort = 11820 (XEN) default-weight = 256 (XEN) tslice = 30ms (XEN) ratelimit = 1000us (XEN) credits per msec = 10 (XEN) ticks per tslice = 3 (XEN) migration delay= 0us (XEN) idlers: ,6d29 (XEN) active vcpus: (XEN) 1: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)} (XEN) 2: [1.3] pri=0 flags=0 cpu=1 credit=-113 [w=256,cap=0] (87+300) {a/i=37/36 m=11+544 (k=0)} (XEN) 3: [0.15] pri=-1 flags=0 cpu=4 credit=95 [w=256,cap=0] (210+300) {a/i=127/126 m=108+9 (k=0)} (XEN) 4: [0.10] pri=-2 flags=0 cpu=12 credit=-287 [w=256,cap=0] (-84+300) {a/i=163/162 m=36+568 (k=0)} (XEN) 5: [0.7] pri=-2 flags=0 cpu=2 credit=-242 [w=256,cap=0] (-42+300) {a/i=129/128 m=16+50 (k=0)} (XEN) CPU[08] sort=5791, sibling=,0300, core=,ff00 (XEN) run: [32767.8] pri=-64 flags=0 cpu=8 (XEN) Cpupool 0: (XEN) Cpus: 0-5,10-15 (XEN) Scheduler: SMP Credit Scheduler (credit) (XEN) info: (XEN) ncpus = 13 (XEN) master = 0 (XEN) credit = 3900 (XEN) credit balance = 45 (XEN) weight = 1280 (XEN) runq_sort = 11820 (XEN) default-weight = 256 (XEN) tslice = 30ms (XEN) ratelimit = 1000us (XEN) credits per msec = 10 (XEN) ticks per tslice = 3 (XEN) migration delay= 0us (XEN) idlers: ,6d29 (XEN) active vcpus: (XEN) 1: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)} (XEN) 2: [1.3] pri=0 flags=0 cpu=1 
credit=-113 [w=256,cap=0] (87+300) {a/i=37/36 m=11+544 (k=0)} (XEN) 3: [0.15] pri=-1 flags=0 cpu=4 credit=95 [w=256,cap=0] (210+300) {a/i=127/126 m=108+9 (k=0)} (XEN) 4: [0.10] pri=-2 flags=0 cpu=12 credit=-287 [w=256,cap=0] (-84+300) {a/i=163/162 m=36+568 (k=0)} (XEN) 5: [0.7] pri=-2 flags=0 cpu=2 credit=-242 [w=256,cap=0] (-42+300) {a/i=129/128 m=16+50 (k=0)} (XEN) CPU[00] sort=11801, sibling=,0003, core=,00ff (XEN) run: [32767.0] pri=-64 flags=0 cpu=0 ... ... ... (XEN) CPU[15] sort=11820, sibling=,c000, core=,ff00 (XEN) run: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)} (XEN) 1: [32767.15] pri=-64 flags=0 cpu=15 (XEN) Cpupool 1: (XEN) Cpus: 6-7,9 (XEN) Scheduler: SMP RTDS Scheduler (rtds) (XEN) CPU[06] (XEN) CPU[07] (XEN) CPU[09] With this change, we get rid of the redundancy, and retain only the information about the free pCPUs. (While there, turn a loop index variable from `int' to `unsigned int' in schedule_dump().) Signed-off-by: Dario Faggioli dario.faggi...@citrix.com Acked-by: George Dunlap george.dun...@eu.citrix.com --- Cc: Juergen Gross jgr...@suse.com Cc: George Dunlap george.dun...@eu.citrix.com --- xen/common/cpupool.c |6 +++--- xen/common/schedule.c | 18 +- 2 files changed, 16 insertions(+), 8 deletions(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 563864d..5471f93 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -728,10 +728,10 @@ void dump_runq(unsigned char key) print_cpumap(Online Cpus, cpu_online_map); if ( !cpumask_empty(cpupool_free_cpus) ) +{ print_cpumap(Free Cpus, cpupool_free_cpus); - -printk(Idle cpupool:\n); -schedule_dump(NULL); +schedule_dump(NULL); +} for_each_cpupool(c) { diff --git a/xen/common/schedule.c b/xen/common/schedule.c index ecf1545..4ffcd98 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -1473,16 +1473,24 @@ void scheduler_free(struct scheduler *sched) void schedule_dump(struct cpupool *c) { -int i; +unsigned int i; struct 
scheduler *sched; cpumask_t *cpus; /* Locking, if necessary, must be handled within each scheduler */ -sched = (c == NULL) ? &ops : c->sched; -cpus = cpupool_scheduler_cpumask(c); -printk("Scheduler: %s (%s)\n", sched->name, sched->opt_name); -SCHED_OP(sched, dump_settings);
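The control-flow change being reviewed (dump the free-CPU "idle" scheduler only when free pCPUs actually exist, instead of unconditionally duplicating Cpupool0's dump) boils down to something like this toy model; schedulers_dumped() is purely illustrative and not a Xen function.

```c
#include <assert.h>

/* Mirrors the patched dump_runq() flow in xen/common/cpupool.c:
 * one schedule_dump(NULL) for the free CPUs, but only if there are
 * any, plus one schedule_dump(c) per cpupool. */
static unsigned int schedulers_dumped(unsigned int n_free_cpus,
                                      unsigned int n_cpupools)
{
    unsigned int dumps = 0;

    if ( n_free_cpus > 0 )   /* was unconditional before the patch */
        dumps++;
    dumps += n_cpupools;     /* for_each_cpupool(c): schedule_dump(c) */
    return dumps;
}
```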
Re: [Xen-devel] [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy
@@ -1898,7 +1899,13 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev) PCI_BUS(bdf) == pdev->bus && PCI_DEVFN2(bdf) == devfn ) { -ret = rmrr_identity_mapping(pdev->domain, 1, rmrr); +/* + * RMRR is always reserved on e820 so either flag + * is fine for the hardware domain and here we'd like to + * pass XEN_DOMCTL_DEV_RDM_RELAXED. + */ +ret = rmrr_identity_mapping(pdev->domain, 1, rmrr, +XEN_DOMCTL_DEV_RDM_RELAXED); So two things. First, you assert that the value here won't matter, because the hardware domain is guaranteed never to have a conflict. Which is likely to be true almost all the time; but the question is, *if* something goes wrong, what should happen? For instance, suppose that someone accidentally introduces a bug in Xen that messes up or ignores reading a portion of the e820 map under certain circumstances. What should happen? Yes, you can imagine all possible cases. But if this kind of bug came true, I really doubt whether Xen could boot successfully. Because e820 is a fundamental key to running an OS, this case would very easily panic Xen, right? Anyway, I agree we should consider all corner cases. If you set this to RELAXED, this clash will be silently ignored; which means that devices that need RMRRs will simply malfunction in weird ways without any warning messages having been printed that might give No. We always post that message regardless of relaxed or strict, since the message just depends on one condition: that the conflict exists. someone a hint about what is going on. If you set this to STRICT, then this clash will print an error message, but as far as I can tell, the rest of the device assignment will continue as normal. (Please correct me if I've followed the code wrong.) Not all cases behave like this, but here it is true.
Since the device should be just as functional (or not functional) either way, but in the STRICT case should actually print an error message which someone might notice, it seems to me that STRICT is a better option for the hardware domain. Just see above. Secondly, you assert in response to Kevin's question on v3 that this path is only reachable when assigning to the hardware domain. I think you at least need to update the comment here to indicate that's what you think; it's not at all obvious just from looking at the function What about this? PCI_DEVFN2(bdf) == devfn ) { /* - * RMRR is always reserved on e820 so either flag - * is fine for the hardware domain and here we'd like to - * pass XEN_DOMCTL_DEV_RDM_RELAXED. + * Here we're adding a device to the hardware domain, + * and RMRR is always reserved on e820, so either flag + * is fine for the hardware domain; here we'd like + * to pass XEN_DOMCTL_DEV_RDM_RELAXED. */ ret = rmrr_identity_mapping(pdev->domain, 1, rmrr, XEN_DOMCTL_DEV_RDM_RELAXED); that this is true. And if we do end up doing something besides STRICT, we should check to make sure that pdev->domain really *is* the hardware domain before acting like it is. if ( ret ) dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n", pdev->domain->domain_id); @@ -1939,7 +1946,8 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev) PCI_DEVFN2(bdf) != devfn ) continue; -rmrr_identity_mapping(pdev->domain, 0, rmrr); +rmrr_identity_mapping(pdev->domain, 0, rmrr, + XEN_DOMCTL_DEV_RDM_RELAXED); } Same here wrt STRICT. This is inside intel_iommu_remove_device(), so actually the flag has no effect in rmrr_identity_mapping(). But I should add a comment like this, +/* + * The flag makes no difference when clearing these mappings, + * so here it's always safe to set XEN_DOMCTL_DEV_RDM_RELAXED. + */ After those changes (a single RDM_RELAXED flag, passing STRICT in for the hardware domain) then I think this patch is in good shape.
Based on my understanding of your concern, it seems you keep assuming that in the relaxed case we don't post any message, right? But as I replied above this is not correct, so what's your further consideration? Anyway, I'm fine with changing this. And since you suggested keeping one bit just to indicate XEN_DOMCTL_DEV_RDM_RELAXED, we don't have an actual XEN_DOMCTL_DEV_RDM_STRICT, so I can just reset all associated flags to 0 easily. Thanks Tiejun
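The single-flag semantics George proposes (flag clear = error on a p2m conflict, flag set = warn and continue) can be sketched as follows; rdm_check_conflict() is a made-up name for illustration, while the real check lives in set_identity_p2m_entry().

```c
#include <assert.h>

/* George's suggested single flag, replacing the NO_RDM/RELAXED/STRICT
 * triple from the earlier revisions of the series. */
#define _XEN_DOMCTL_DEV_RDM_RELAXED 0
#define XEN_DOMCTL_DEV_RDM_RELAXED  (1u << _XEN_DOMCTL_DEV_RDM_RELAXED)

#define EEXIST 17

/* Illustrative conflict check: a warning is (notionally) printed in
 * either case, per Tiejun's point above; only the return value
 * depends on the flag. */
static int rdm_check_conflict(int p2m_entry_occupied, unsigned int flags)
{
    if ( !p2m_entry_occupied )
        return 0;                       /* no conflict at all */

    /* A warning about the conflicting RMRR would be printed here. */

    return (flags & XEN_DOMCTL_DEV_RDM_RELAXED) ? 0 : -EEXIST;
}
```

With one flag there is no third state for device-tree devices to mishandle: they simply never set RELAXED.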
Re: [Xen-devel] [xen-unstable test] 58974: regressions - FAIL
On Wed, 2015-07-01 at 21:29 -0700, Meng Xu wrote: Hi Dario, Hi, 2015-06-30 10:14 GMT-07:00 Dario Faggioli dario.faggi...@citrix.com: if you'd be able to have a look at what's happening, that would be awesome. If you don't have time, I will have a look myself, but only in a few days. Hmm, this is another bug for RTDS on ARM. :-( I don't have an ARM board set up right now. I'm not sure if I can run/test it on ARM. I'm curious if this bug is similar to the previous lock-related bug of the RTDS scheduler on ARM. Denys fixed the previous bug at http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg03933.html Yes, it reminded me of that one too... It likely is something similar. :-/ I understand you don't have any suitable hardware. Still, if you could at least have a look at the failure and at the code, and try to provide an analysis and/or suggest a way of resolving the issue, we'll be glad to test patches on actual ARM systems. If you need to look at the Xen (quite likely) and Linux (quite unlikely) images, e.g., to disassemble them and check what corresponds to the address where things explode, and similar things, they're available too. Just look here: http://logs.test-lab.xenproject.org/osstest/logs/58974/test-armhf-armhf-xl-rtds/info.html And check what the build jobs, Xen and Linux paths are, in the table at the bottom. You'll find this: buildjob build-armhf xenbuildjob build-armhf xen_kernel_ver 3.16.7-ckt12+ Go to the summary of the flight, which includes build jobs: 'Flight 58974 scoreboard' http://logs.test-lab.xenproject.org/osstest/logs/58974/ Identify and go to that job's report: http://logs.test-lab.xenproject.org/osstest/logs/58974/build-armhf/info.html And go to: 'build/ (outputs from build)' http://logs.test-lab.xenproject.org/osstest/logs/58974/build-armhf/build/ You'll find the binaries (for Xen, in this case) there. The only caveat is that you'll need an ARM toolchain to be able to use, in this case, arm-elf-objdump and similar tools.
The procedure to get one is distro dependent, but it's not that hard these days (there are guides on our wiki too). Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
On 07/02/2015 02:11 AM, Chen, Tiejun wrote: If I'm correct, then #3 means it's not possible to have devices for a domain *default* to strict, but to be relaxed in individual instances. If you had five devices you wanted strict, and only one device you wanted to be relaxed (because you knew it didn't matter), you'd have to set reserve=strict for all the other devices, rather than just being able to set the domain setting to strict and set reserve=relaxed for the one. I think that both violates the principle of least surprise, and is less useful. So what's your idea to follow our requirement? So consider the following config snippet: --- rdm=reserve=relaxed pci=['01:00.1,msitranslate=1'] What should the policy for that device be? According to your policy document, it seems to me like it should be relaxed, since the domain default is set to relaxed and nothing overrides it. Why? strict should be the policy in this case. OK, I think I see where the problem is. I had expected the domain-wide setting to be a default which was overridden by per-device policies (see pci_permissive and friends). So when I saw "global default RDM policy" We knew this behavior but we'd like to take a different consideration in this case. confirmation bias caused me to interpret it as what I expected to see -- the domain setting as the default, which the local setting could override. I see now that in your documentation you consistently talk about two different policies, each of which has its own default, and that the effective permissions for a device end up being the intersection of the two (i.e., only relaxed if both are relaxed; strict under all other circumstances). Why are you saying this is not our expectation? Just let me pick up that description *again*: Default per-device RDM policy is 'strict', while default global RDM policy is 'relaxed'. When both policies are specified on a given region, 'strict' is always preferred. 
Look, if I haven't understood what you meant by the exact same words the first 4 times I read it, simply repeating the same exact words is not going to be helpful. Ideally you need to try to understand where my misunderstanding is coming from and explain where I've misunderstood something; or, at least you need to try to use different words, or explain how the words you're using apply to the given situation. From my point of view, I already replied to this previously by quoting part of the patch head description. As you know this revision is already marked as v4 and although I admit some code implementations still need further review, at least our policy should already be acknowledged by now unless this is really wrong. But in our case, it looks like you're concerned that our mechanism is not what you expected. So This interface doesn't make any sense to me. Why, if the global If you have any objection to our solution, and if you can't find any reasonable answer from our design, just please ping Jan or Kevin to make this clear to us. And then, whatever, I'm going to be fine to step next. I'm really not the person who can address this kind of change at this point at this high level. And you have no idea why that design was chosen; you're just doing what you're told? Certainly I have my own understanding of this issue. But at a high level I have to say Yes. If you really read that v2 design and its associated discussion, you should notice I didn't put any response right there. Look, I'm getting a bit angry at your continual implication that I haven't put in enough work reading the background for this series. If you go back and look at the v2 design discussion, you'll see that I was actively involved in that discussion, and sent at least a dozen emails about it. 
I have now spent nearly two full days just on this series, including going back over lots of conversations that have happened before to find answers to questions which you could have given in a single line; and also to check assertions that you've made which have turned out to be false. In the v2 design discussion, the only thing I could find regarding the relationship between per-device settings and the domain-wide setting was where you said [1]: per-device override is always favored if a conflicting setting in rmrr_host. And in v2, Wei asked you [2]: But this only works with global configuration and individual configuration in PCI spec trumps this, right? And you responded [3]: You're right. Now it happens that in all those cases you were literally talking about the rmrr_host part of the configuration, not the strict/relaxed part of the configuration; but that doesn't even make sense, since there *is* no device-specific rmrr_host setting -- the only configuration which has both a domain-wide and per-device component is the relaxed/strict. So: 1. After spending yet another half hour doing research, I haven't found any discussion that concluded we
Re: [Xen-devel] [PATCH] x86/p2m-ept: Don't unmap the EPT pagetable while it is still in use
On 06/30/2015 06:09 PM, Andrew Cooper wrote: The call to iommu_pte_flush() between the two hunks uses ept_entry->epte, which is a pointer into the mapped page. It is eventually passed to a `clflush` instruction which will suffer a pagefault if the virtual mapping has fallen out of the TLB.

(XEN) [ Xen-4.5.0-xs102594-d x86_64 debug=y Not tainted ]
(XEN) CPU: 7
(XEN) RIP: e008:[82d0801572f0] cacheline_flush+0x4/0x9
snip
(XEN) Xen call trace:
(XEN) [82d0801572f0] cacheline_flush+0x4/0x9
(XEN) [82d08014] __iommu_flush_cache+0x4a/0x6a
(XEN) [82d0801532e2] iommu_pte_flush+0x2b/0xd5
(XEN) [82d0801f909a] ept_set_entry+0x4bc/0x61f
(XEN) [82d0801f0c25] p2m_set_entry+0xd1/0x112
(XEN) [82d0801f25b1] clear_mmio_p2m_entry+0x1a0/0x200
(XEN) [82d0801f4aac] unmap_mmio_regions+0x49/0x73
(XEN) [82d080106292] do_domctl+0x15bd/0x1edb
(XEN) [82d080234fcb] syscall_enter+0xeb/0x145
(XEN)
(XEN) Pagetable walk from 820040004ae0:
(XEN) L4[0x104] = 0008668a5063
(XEN) L3[0x001] = 0008668a3063
(XEN) L2[0x000] = 00086689c063
(XEN) L1[0x004] = 00056f078063 0007f678
(XEN)
(XEN)
(XEN) Panic on CPU 7:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=]
(XEN) Faulting linear address: 820040004ae0

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Jan Beulich jbeul...@suse.com CC: George Dunlap george.dun...@eu.citrix.com CC: Jun Nakajima jun.nakaj...@intel.com CC: Eddie Dong eddie.d...@intel.com CC: Kevin Tian kevin.t...@intel.com Reviewed-by: George Dunlap george.dun...@eu.citrix.com -George
Re: [Xen-devel] Xen 4.6 Development Update (2 WEEKS TO FREEZE, important information in preamble)
On Wed, 2015-07-01 at 11:17 +0100, Julien Grall wrote: As suggested by Wei at the top of his mail [1], can you please CC only relevant people and avoid replying to all? It seems that many people are unable to follow these simple instructions. Wei, perhaps you could stop CCing people who inappropriately do not trim their quotes or the CC list in the future. If they cannot do us the courtesy of doing so I don't see why they should receive a courtesy copy of the status mail. A somewhat less aggressive approach might be to use Bcc instead of Cc for the bulk of people (i.e. anyone who needn't be cc-d on every reply). The failure case of someone who cannot read simple instructions then becomes a lack of CCs rather than a plethora of unwanted Ccs. Ian. Many thanks, [1] (Note, please trim your quotes when replying, and also trim the CC list if necessary. You might also consider changing the subject line of your reply to Status of FOO (Was: Xen 4.6 Development Update (X months reminder)) On 01/07/15 07:12, Chen, Tiejun wrote: * RMRR fix (fair) RFC posted Wei, I think this should be ok or good based on current status, and also should remove RFC here. Thanks Tiejun
Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)
On 02/07/15 at 10.45, Ian Campbell wrote: On Thu, 2015-07-02 at 09:32 +0200, Roger Pau Monné wrote: On 01/07/15 at 18.15, Boris Ostrovsky wrote: On 07/01/2015 11:57 AM, Ian Campbell wrote: The next Xen technical call will be at: Wed 8 Jul 17:00:00 BST 2015 `date -d @1436371200` See http://lists.xen.org/archives/html/xen-devel/2015-01/msg00414.html for more information on the call. Please let me know (CC-ing the list) any topics which you would like to discuss. It might be useful to include: * References to any relevant/recent mailing list threads; * Other people who you think should be involved in the discussion (and CC them); If you would like to attend then please let me know so I can send you the dial-in details. Given that there is a fair amount of PVH-related work happening now (Roger's, Elena's and mine) perhaps we should have a discussion about that to see where we are going? Andrew, Tim, Roger, Jan (if he is back from vacation), Elena, Konrad and David would be good to have present. Yes, I think this topic would indeed be a worthwhile subject for discussion. I think there is a toolstack element to all this as well, so would it be a good idea to have Ian J, Stefano and Wei in addition to me (who will be there to run the call in any case)? I'd probably mark them as optional participants for scheduling purposes. Yes, my series includes way more toolstack than hypervisor changes. I'm sorry but I will be on vacation next week, hiking in the mountains without Internet or cell phone coverage, so I won't be able to join the meeting. I think you are one of the critical participants in the discussion, i.e. we can't go ahead without you. I will be back on the 20th; maybe we can arrange another of these PVH meetings on the next technical call in early August? The August call is scheduled for 12 August, which is 6 weeks from yesterday. 
We could either go for that or, if it is too far out, we could push the July iteration of the call back two weeks to Wednesday 22 July (so 3 weeks from yesterday). (A one-week slip would still land in your vacation time.) I would rather go for the 22nd or 29th of July or the 5th of August, because I'm also on vacation from the 10th to the 16th of August. There is also the devsummit in the week of 17 August (~7 weeks time) which is an opportunity for a f2f. Shall I put up a poll of some sort to gather preferred timeslot options out of that set? That sounds fine, thanks. Roger. 
Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)
On Thu, 2015-07-02 at 09:45 +0100, Ian Campbell wrote: Shall I put up a poll of some sort to gather preferred timeslot options out of that set? Please can everyone who is interested in this topic indicate their date preference/availability at: http://doodle.com/cy88dhwzybg7hh7p I've gone with the usual 5pm BST slot for simplicity. That's 12 noon EDT, 9am PDT and 6pm CEST. If it turns out we can't find an option at that time then we can try again with some other times in the mix. Ian. 
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
-Original Message- From: Dario Faggioli [mailto:dario.faggi...@citrix.com] Sent: Thursday, July 02, 2015 6:10 PM To: Wu, Feng Cc: Andrew Cooper; xen-devel@lists.xen.org; Zhang, Yang Z; george.dun...@eu.citrix.com; Tian, Kevin; k...@xen.org; jbeul...@suse.com Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked On Thu, 2015-07-02 at 08:58 +, Wu, Feng wrote: -Original Message- From: Dario Faggioli [mailto:dario.faggi...@citrix.com] This is the third time that I ask: (1) whether it is possible to have more vcpus queued on one pcpu PI blocked list with desc.on (I really believe it is); I think it is, please see the following scenario: when the cpu masks interrupts and an external interrupt occurs for the assigned device while the target vCPU2 is blocked, the wakeup notification event handler has no chance to run; after a while, another wakeup notification event for vCPU4, blocking on the same pCPU, occurs; after the cpu unmasks interrupts, the wakeup notification handler gets called. Then we get: vCPU2, desc.on = 1 and vCPU4, desc.on = 1. Then in the handler we need to kick both of them. Ok, first of all, thanks for answering! :-) And yes, this makes sense. (2) if yes, whether it is TheRightThing(TM) to kick all of them, as soon as any notification arrives, instead of putting together a mechanism for kicking only a specific one. Why can't we kick all of them? 'desc.on = 1' means there is a pending interrupt; when we meet this condition, kicking the related vCPU should be the right thing to do. Right, I see it now. I felt like I was missing something, and that's why I was asking you to elaborate a bit more. Thanks again for having done this. I was missing/forgetting half of the way desc.on is actually handled, sorry for this. BTW, I'm finding it hard reading this series from the archives; there appear to be some threading issues and some missing messages. 
I also don't have it in my inbox, because my filters failed to spot and flag it properly. If you send a new version, please, Cc me, so it will be easier for me to look at all the patches, and provide a more helpful review. Sure, thanks for the review! Thanks, Feng Thanks and Regards, Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK)
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote:

+ list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
+                     pi_blocked_vcpu_list)
+     if ( vmx->pi_desc.on )
+         tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);

There is a logical bug here. If we have two NVs delivered to this pcpu, we will kick the first vcpu twice. On finding desc.on, a kick should be scheduled, then the vcpu removed from this list. With desc.on set, we know for certain that another NV will not arrive for it until it has been scheduled again and the interrupt posted. Yes, that seems a possible issue (and one that should indeed be avoided). I'm still unsure about the one that I raised myself but, if it is possible to have more than one vcpu in a pcpu list, with desc.on==true, then it looks to me that we kick all of them, for each notification. Add to what Andrew spotted: if there are a bunch of vcpus queued with desc.on==true, and a bunch of notifications arrive before the tasklet gets executed, we'll be kicking the whole bunch of them a bunch of times! :-/ As Andrew mentioned, removing the vCPUs with desc.on = true from the list can avoid kicking vCPUs multiple times. It avoids kicking vcpus multiple times if more than one notification arrives, yes. It is, therefore, not effective in making sure that, even with only one notification, you only kick the interested vcpu. This is the third time that I ask: (1) whether it is possible to have more vcpus queued on one pcpu PI blocked list with desc.on (I really believe it is); (2) if yes, whether it is TheRightThing(TM) to kick all of them, as soon as any notification arrives, instead of putting together a mechanism for kicking only a specific one. The fact that you're not answering is not so much of a big deal for me... I'll just keep asking! :-D Regards, Dario -- This happens because I choose it to happen! 
(Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK)
Re: [Xen-devel] [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
At a high level I have to say Yes. If you really read that v2 design and its associated discussion, you should notice I didn't put any response right there. Look, I'm getting a bit angry at your continual implication that I haven't put in enough work reading the background for this series. Sorry about this. If you go back and look at the v2 design discussion, you'll see that I was actively involved in that discussion, and sent at least a dozen emails about it. I have now spent nearly two full days just on this series, including going back over lots of conversations that have happened before to find answers to questions which you could have given in a single line; and also to check assertions that you've made which have turned out to be false. Sure, and thanks for your review and time. In the v2 design discussion, the only thing I could find regarding the relationship between per-device settings and the domain-wide setting was where you said [1]: per-device override is always favored if a conflicting setting in rmrr_host. And in v2, Wei asked you [2]: But this only works with global configuration and individual configuration in PCI spec trumps this, right? And you responded [3]: You're right. Now it happens that in all those cases you were literally talking about the rmrr_host part of the configuration, not the strict/relaxed part of the configuration; but that doesn't even make sense, since there *is* no device-specific rmrr_host setting -- the only configuration which has both a domain-wide and per-device component is the relaxed/strict. So: 1. After spending yet another half hour doing research, I haven't found any discussion that concluded we should have the global policy override the local policy. I also took some time to go back and check this point, and indeed this is not in that public design. 
And as I mentioned in another email following this, I also had a talk with Kevin about this issue, and it looks like this was just concluded from our internal discussion; he didn't post it in the v2 design again because, as you know, that design is about things at a high level. And as I recall, those discussions couldn't cover everything at that moment, because they thought we'd better post preliminary patches to further discuss things, since this is really a complicated case. So afterwards I sent out two RFC revisions to help everyone finalize a good solution. And I can confirm the current policy has been the same since the first RFC, but we didn't see any opposing advice until now. 2. The only discussion I *did* find has *you yourself* saying that the per-device setting should override the global setting, not once, but twice; and nobody contradicting you. Maybe there is a discussion somewhere else where this was changed; but I've already spent half an hour this morning looking at where you said it was (the v2 design discussion), and found the opposite -- just as I remembered. I'm not going to look any more. You have now caused me to waste an awful lot of time on this series that could profitably have been used elsewhere. Sorry about this, but I just think we already have 2 RFC revisions and 4 revisions without RFC, and some patches are already Acked; should we really overturn this policy right now? [1] marc.info/?i=aadfc41afe54684ab9ee6cbc0274a5d126147...@shsmsx101.ccr.corp.intel.com [2] marc.info/?i=20150519110041.gb21...@zion.uk.xensource.com [3] marc.info/?i=555c1b5c.7070...@intel.com I was involved in the design discussion, and from the very beginning I probably saw your plan but misunderstood it. I wouldn't be surprised if some others didn't quite understand what they were agreeing to. Again, I didn't walk into the v2 design. So here I don't want to bring any confusion to you with my reply. 
This is your feature, so it is your responsibility to understand and explain why you are doing what you are doing, if only to say Jan wanted X to happen because of Y [see $ref]. Maybe you remember I just posted v1, but it looks like that was not a good design for this implementation according to some feedback, so Kevin issued the v2 revision and had a wider discussion with you guys. Since then I have just followed that version. So I mean I don't hold these high-level things strongly, since either policy is fine to me because, IMO, these two approaches are both viable options. So this is why I said you'd better ask this of Kevin or Jan, since I can't decide what's next at this point. Thanks Tiejun
Re: [Xen-devel] [v3 00/13] xen/arm: Add support for GICv2 on GICv3
On 01/07/15 12:00, Julien Grall wrote: Hi all, Hi Ian, This patch series adds support for GICv2 on GICv3. This feature is available only when the GICv3 hardware is compatible with GICv2. When that's the case, the same interface is provided in order to use a virtualized GICv2 (i.e. GICC and GICV). This will allow us to re-use the same vGIC driver. Currently the GIC and vGIC drivers are tightly coupled because the domain initialization is split between GIC and vGIC. This patch series intends to remove this dependency in order to make the vGIC driver agnostic of the GIC driver. It has been tested on the ARMv8 Foundation Model with GICv2 and GICv3, as well as changing the vGIC version emulated for the guest (only on a GICv3 host). A branch with all the patches can be found here: git://xenbits.xen.org/people/julieng/xen-unstable.git branch gicv2-on-gicv3-v3 All the patches have been acked except #11 and #12. I was wondering if you can apply patches #1-#10 as they are already acked. It would save me resending the whole series. If you ack #11, you could even apply #11 and #13 (#12 is independent). Regards, -- Julien Grall
[Xen-devel] [PATCH OSSTEST v3] mg-all-branch-statuses: Show how up to date each branch is
Using report_find_push_age_info allows us to provide counts of attempts since the last baseline on current tip as well as the first attempt of each of those. Since everything serialises on the repo lock I didn't bother trying to parallelise anything. It's a little terse to keep it in 80 chars. Signed-off-by: Ian Campbell ian.campb...@citrix.com --- v2: Use new report_find_push_age_info functionality, output condensed. v3: - Correctly quote parameters to printf, so empty ones don't misalign the rest. - Drop dates in favour of number of days, and n/a in those cases - Print Error! if no tip is available and UpToDate if tip==basis. time with a recently updated set of repos says: 57.34user 28.28system 5:34.16elapsed 25%CPU (0avgtext+0avgdata 47256maxresident)k 100216inputs+600outputs (673major+2332436minor)pagefaults 0swaps So it's not quick... Example output:

Branch                 Basis    Tip      #Tip #Tot 1stTip  1stNew
libvirt                d10a5f58 845184b2 0    10   n/a     8 days
linux-3.0              e1c63f9f 5dba9ddd 2    9    2 days  868 days
linux-3.10             b3d78448 UpToDate
linux-3.14             762167f9 UpToDate
linux-3.16             162d6432 26749e75 1    1    1 day   1 day
linux-3.18             d048c068 ea5dd38e 3    3    2 days  2 days
linux-3.4              bb4a05a0 cf1b3dad 15   161  11 days 212 days
linux-4.1              b953c0d2 6a010c0a 0    -    n/a     n/a
linux-arm-xen          64972ceb UpToDate
linux-linus            6aaf0da8 4da3064d 0    0    n/a     n/a
linux-mingo-tip-master d935d0f7 778a1ac5 0    16   n/a     1173 days
linux-next             c8a9ad22 0 219 n/a 447 days
osstest                15d2dd50 Error!
0 - n/a n/a
ovmf                      269e0aeb 288ed590 0  1   n/a     1 day
qemu-mainline             d2966f80 UpToDate
qemu-upstream-4.2-testing d2382550 UpToDate
qemu-upstream-4.3-testing efae5e0f UpToDate
qemu-upstream-4.4-testing 32226f42 UpToDate
qemu-upstream-4.5-testing d9552b0a UpToDate
qemu-upstream-unstable    c4a962ec UpToDate
rumpuserxen               30d72f3f 3b91e449 61 113 77 days 149 days
seabios                   f24eb2f8 UpToDate
xen-4.0-testing           2692df2a UpToDate
xen-4.1-testing           40feff87 UpToDate
xen-4.2-testing           38fcda22 UpToDate
xen-4.3-testing           e7c02297 UpToDate
xen-4.4-testing           6c1cb3db UpToDate
xen-4.5-testing           e3bd3cef UpToDate
xen-unstable              c40317f1 3d55a179 0  3   n/a     2 days

---
 mg-all-branch-statuses | 135 +
 1 file changed, 135 insertions(+)
 create mode 100755 mg-all-branch-statuses

diff --git a/mg-all-branch-statuses b/mg-all-branch-statuses
new file mode 100755
index 000..9a78ec2
--- /dev/null
+++ b/mg-all-branch-statuses
@@ -0,0 +1,135 @@
+#!/bin/bash
+# -*- bash -*-
+#
+# Prints the status of each branch
+#
+# Usage:
+#   ./mg-all-branch-statuses [BRANCH]
+#
+# If no BRANCHes specified, does all that are normally run by
+# cr-daily-branch or out of crontab.
+
+# This is part of osstest, an automated testing framework for Xen.
+# Copyright (C) 2009-2014 Citrix Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with this program. If not, see http://www.gnu.org/licenses/.
+
+set -e
+
+. 
cri-common + +mkdir -p tmp + +if [ $# = 0 ]; then + set `./mg-list-all-branches` +fi + +gather_info() +{ +local branch=$1; shift +local basis=$1; shift +local tip=$1;shift + +select_xenbranch + +local info=`perl -we ' + use Osstest::Executive; + use Osstest; + use Data::Dumper; + open DEBUG, /dev/null or die $!; + #open DEBUG, STDERR or die $!; + csreadconfig(); + my ($branch,$tree,$basis,$tip) = @ARGV; + print DEBUG branch=$branch tree=$tree basis=$basis tip=$tip\n; + my $info = report_find_push_age_info([qw(real adhoc play)], + undef, [($branch)], + $tree, $basis, $tip); + print DEBUG Dumper $info; + my $onevar = sub { + my ($var,$dflt) = @_; + $dflt //= ; + print export .uc($var).=\; + print
[Xen-devel] [linux-3.16 test] 59012: tolerable FAIL - PUSHED
flight 59012 linux-3.16 real [real] http://logs.test-lab.xenproject.org/osstest/logs/59012/ Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-qemut-rhel6hvm-intel 12 guest-start/redhat.repeat fail in 58996 pass in 59012
 test-armhf-armhf-xl-arndale 6 xen-boot fail pass in 58996
 test-armhf-armhf-xl-multivcpu 7 host-ping-check-xen fail pass in 58996

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 14 guest-localmigrate fail in 58996 baseline untested
 test-armhf-armhf-xl-rtds 14 guest-start.2 fail in 58996 baseline untested
 test-amd64-i386-libvirt 11 guest-start fail like 58447
 test-amd64-amd64-xl-credit2 17 guest-localmigrate/x10 fail like 58447
 test-amd64-i386-libvirt-xsm 11 guest-start fail like 58447
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 58447

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-arndale 12 migrate-support-check fail in 58996 never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-check fail in 58996 never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass
 test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass
 test-amd64-i386-freebsd10-i386 9 freebsd-install fail never pass
 test-amd64-i386-freebsd10-amd64 9 freebsd-install fail never pass
 test-amd64-amd64-xl-multivcpu 17 guest-localmigrate/x10 fail never pass
 test-amd64-amd64-xl-rtds 17 guest-localmigrate/x10 fail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeat fail never pass
 test-amd64-amd64-libvirt 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-xsm 12 migrate-support-check fail never pass
 test-armhf-armhf-xl 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-check fail never pass
test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail never pass version targeted for testing: linux26749e751cc2be7bc0f17a6cca68f6e5c0675191 baseline version: linux162d64326176ee1916fb98323d810c78a7e3d042 People who touched revisions under test: Aaro Koskinen aaro.koski...@nokia.com Adam Jackson a...@redhat.com Alex Deucher alexander.deuc...@amd.com Andrew Morton a...@linux-foundation.org Andy Lutomirski l...@kernel.org Axel Lin axel@ingics.com Chengyu Song cson...@gatech.edu Chris Mason c...@fb.com Clemens Ladisch clem...@ladisch.de Dan Carpenter dan.carpen...@oracle.com Dan Williams dan.j.willi...@intel.com Dave Airlie airl...@redhat.com David S. Miller da...@davemloft.net David Woodhouse david.woodho...@intel.com Dieter Jurzitza Dmitry Torokhov dmitry.torok...@gmail.com Eric Dumazet eduma...@google.com Fabio Estevam fabio.este...@freescale.com Fan Du fan...@intel.com Felipe Balbi ba...@ti.com Filipe Manana fdman...@suse.com Florian Fainelli f.faine...@gmail.com Greg Kroah-Hartman gre...@linuxfoundation.org Gregory CLEMENT gregory.clem...@free-electrons.com Gu Zheng guz.f...@cn.fujitsu.com H. Peter Anvin h...@linux.intel.com Hannes Frederic Sowa han...@stressinduktion.org Hans de Goede hdego...@redhat.com Herbert Xu herb...@gondor.apana.org.au Horia Geanta horia.gea...@freescale.com Hui Wang hui.w...@canonical.com Ian Campbell ian.campb...@citrix.com Ingo Molnar mi...@kernel.org James Hogan james.ho...@imgtec.com Jan Kara j...@suse.cz Jani Nikula jani.nik...@intel.com Jason A. 
Donenfeld ja...@zx2c4.com Jean Delvare jdelv...@suse.de Jeff Mahoney je...@suse.com Jenny Falkovich jen...@mellanox.com Jens Axboe ax...@fb.com Jiang Liu jiang@linux.intel.com Jim Bride jim.br...@linux.intel.com Jiri Pirko j...@resnulli.us Johan Hovold jo...@kernel.org Johannes Berg johannes.b...@intel.com John D. Blair jo...@candicontrols.com Jonathan Cameron ji...@kernel.org Jérôme Glisse jgli...@redhat.com Kim Phillips kim.phill...@freescale.com Lars-Peter Clausen l...@metafoo.de Laura Abbott labb...@fedoraproject.org Li RongQing roy.qing...@gmail.com Linus Torvalds torva...@linux-foundation.org Luis Henriques
[Xen-devel] Xen + futexes
Hi all! I have a theoretical question. What do you think about porting futexes or binder to Xen? With best regards, -- *Vitaly Chernooky | Senior Developer - Product Engineering and Development* GlobalLogic P *+380.44.4929695 ext.1136* M *+380.63.6011802* S cvv_2k www.globallogic.com http://www.globallogic.com/email_disclaimer.txt
Re: [Xen-devel] [PATCH] OSSTest: stop testing SEDF at all
On 06/30/2015 06:36 PM, Dario Faggioli wrote: SEDF has been broken and unmaintained at least until Xen 4.2, and most likely even before! Tests are failing without anyone caring, and yet we keep using test resources on them. Let's stop doing this! Signed-off-by: Dario Faggioli dario.faggi...@citrix.com Acked-by: George Dunlap george.dun...@eu.citrix.com ---

diff --git a/allow.all b/allow.all
index 88a3038..8067d5e 100644
--- a/allow.all
+++ b/allow.all
@@ -1,4 +1,3 @@
-test-@@-sedf@@
 test-@@-rtds@@
 build-@@logs-capture@@
 test-@@-pcipt@@
diff --git a/make-flight b/make-flight
index 31cb942..c763ce9 100755
--- a/make-flight
+++ b/make-flight
@@ -274,21 +274,6 @@ do_hvm_rhel6_tests () {
   done
 }
 
-do_sedf_tests () {
-  if [ $xenarch != $dom0arch -o x$test_sedf != xy ]; then
-    return
-  fi
-
-  for pin in '' -pin; do
-    job_create_test test-$xenarch$kern-$dom0arch-xl-sedf$pin \
-        test-debian xl $xenarch $dom0arch \
-        guests_vcpus=4 \
-        xen_boot_append=sched=sedf loglvl=all ${pin:+dom0_vcpus_pin} \
-        linux_boot_append='loglevel=9 debug' \
-        $debian_runvars all_hostflags=$most_hostflags
-  done
-}
-
 do_credit2_tests () {
   if [ $xenarch != $dom0arch ]; then
     return
@@ -374,19 +359,18 @@ test_matrix_do_one () {
   do_multivcpu_tests
 
-  # RTDS came in 4.4, while SEDF is going away in 4.6
+  # RTDS came in 4.5
   case $xenbranch in
-  xen-3.*-testing) test_sedf=y; test_rtds=n ;;
-  xen-4.0-testing) test_sedf=y; test_rtds=n ;;
-  xen-4.1-testing) test_sedf=y; test_rtds=n ;;
-  xen-4.2-testing) test_sedf=y; test_rtds=n ;;
-  xen-4.3-testing) test_sedf=y; test_rtds=n ;;
-  xen-4.4-testing) test_sedf=y; test_rtds=n ;;
-  xen-4.5-testing) test_sedf=y; test_rtds=y ;;
-  *) test_sedf=n; test_rtds=y ;;
+  xen-3.*-testing) test_rtds=n ;;
+  xen-4.0-testing) test_rtds=n ;;
+  xen-4.1-testing) test_rtds=n ;;
+  xen-4.2-testing) test_rtds=n ;;
+  xen-4.3-testing) test_rtds=n ;;
+  xen-4.4-testing) test_rtds=n ;;
+  xen-4.5-testing) test_rtds=y ;;
+  *) test_rtds=y ;;
   esac
 
-  do_sedf_tests
   do_rtds_tests
 
   do_credit2_tests
___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On 02/07/15 09:30, Dario Faggioli wrote: On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote:

+list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
+                    pi_blocked_vcpu_list)
+    if ( vmx->pi_desc.on )
+        tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);

There is a logical bug here. If we have two NVs delivered to this pcpu, we will kick the first vcpu twice. On finding desc.on, a kick should be scheduled, then the vcpu removed from this list. With desc.on set, we know for certain that another NV will not arrive for it until it has been scheduled again and the interrupt posted. Yes, that seems a possible issue (and one that should indeed be avoided). I'm still unsure about the one that I raised myself but, if it is possible to have more than one vcpu in a pcpu's list with desc.on==true, then it looks to me that we kick all of them, for each notification. Adding what Andrew spotted: if there are a bunch of vcpus queued with desc.on==true, and a bunch of notifications arrive before the tasklet gets executed, we'll be kicking the whole bunch of them a bunch of times! :-/ As Andrew mentioned, removing the vCPUs with desc.on == true from the list can avoid kicking vCPUs multiple times. It avoids kicking vcpus multiple times if more than one notification arrives, yes. It is, therefore, not effective in making sure that, even with only one notification, you only kick the interested vcpu. This is the third time that I ask: (1) whether it is possible to have more vcpus queued on one pcpu's PI blocked list with desc.on set (I really believe it is); (2) if yes, whether it is TheRightThing(TM) to kick all of them as soon as any notification arrives, instead of putting together a mechanism for kicking only a specific one. We will receive one NV for every time the hardware managed to successfully set desc.on. If multiple stack up and we proactively drain the list, we will subsequently search the list to completion for all remaining NVs, due to finding no appropriate entries.
I can't currently decide whether this will be quicker or slower overall, or (most likely) it will even out to equal in the general case. ~Andrew
Re: [Xen-devel] [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
On 07/02/2015 11:01 AM, Chen, Tiejun wrote: 1. After spending yet another half hour doing research, I haven't found any discussion that concluded we should have the global policy override the local policy I also took some time to go back and check this point, and indeed this is not in that public design. And as I mentioned in another email following this one, I also had a talk with Kevin about this issue, and it looks like this was just concluded from our internal discussion; he didn't post it in the v2 design again because, as you know, that design is about things at a high level. And as I recall, those discussions couldn't cover everything at that moment, because they thought we'd better post preliminary patches to discuss things further, since this is really a complicated case. So afterwards I sent out two RFC revisions to help everyone finalize a good solution. And I can confirm the current policy has been the same since the first RFC, but we didn't see any opposing advice until now. Probably because the reviewers all assumed that the design draft had been followed, and you didn't make it clear that you'd changed it. 2. The only discussion I *did* find has *you yourself* saying that the per-device setting should override the global setting, not once, but twice; and nobody contradicting you. Maybe there is a discussion somewhere where this was changed; but I've already spent half an hour this morning looking at where you said it was (the v2 design discussion), and found the opposite -- just as I remembered. I'm not going to look any more. You have now caused me to waste an awful lot of time on this series that could profitably have been used elsewhere. Sorry about this, but I just think: we already have 2 RFC revisions and 4 revisions without RFC, and some patches are already Acked; should we really overturn this policy right now? First of all, I think it's easy to change.
Even if it weren't, I already said that I'd be OK with accepting the patch series with the existing override semantics, and without the default semantics, *if* it were renamed to make it clear what was going on. But, for future reference, I am not going to approve an interface I think is misleading or wrong -- particularly one like the xl interface, which we want to avoid changing if possible -- just because time is short. One of my own features, HVM USB pass-through, has narrowly missed two releases (including the current one) because we wanted to be careful to get the interface right. Again, I didn't walk into the v2 design. So here I don't want to bring any confusion to you with my reply. This is your feature, so it is your responsibility to understand and explain why you are doing what you are doing, if only to say Jan wanted Maybe you remember I just posted v1, but it looked like that was not the best design for this implementation according to some feedback, so Kevin issued the v2 revision and had a wider discussion with you guys. Since then I have just followed that version. So I mean I don't hold these things at a high level any further, since I just think either policy is fine to me because, IMO, these two approaches are optional. X to happen because of Y [see $ref]. So this is why I said you'd better ask Kevin or Jan about this, since I can't decide what's next at this point. Let me say that again: I don't care whether anyone pulled rank and ordered you to do something a certain way. YOU are the one submitting this patch. That means YOU are responsible for understanding why they want it that way, and YOU are responsible for justifying it to other people. If you don't understand it at all, it's YOUR responsibility to get them to explain it, not mine to chase them down. -George
Re: [Xen-devel] [PATCH RFC 4/6] xen: Print and use errno where applicable.
On Wed, 1 Jul 2015, Konrad Rzeszutek Wilk wrote: On Wed, Jul 01, 2015 at 02:01:07PM +0100, Stefano Stabellini wrote: On Mon, 29 Jun 2015, Konrad Rzeszutek Wilk wrote: In Xen 4.6, commit cd2f100f0f61b3f333d52d1737dd73f02daee592 (libxc: Fix do_memory_op to return negative value on errors) made the libxc API less odd-ball: on errors, the return value is -1 and the error code is in errno; on success the return value is either 0 or a positive value. Since we could be running with an old toolstack in which the Exx value is in rc, or with the newer one, we print both and return -Exx depending on the rc == -1 condition. Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 xen-hvm.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/xen-hvm.c b/xen-hvm.c
index 0408462..a92bc14 100644
--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -345,11 +345,12 @@ go_physmap:
         unsigned long idx = pfn + i;
         xen_pfn_t gpfn = start_gpfn + i;

+        /* In Xen 4.6 rc is -1 and errno contains the error value. */
         rc = xc_domain_add_to_physmap(xen_xc, xen_domid, XENMAPSPACE_gmfn, idx, gpfn);
         if (rc) {
             DPRINTF("add_to_physmap MFN %"PRI_xen_pfn" to PFN %"
-                    PRI_xen_pfn" failed: %d\n", idx, gpfn, rc);
-            return -rc;
+                    PRI_xen_pfn" failed: %d (errno: %d)\n", idx, gpfn, rc, errno);
+            return rc == -1 ? -errno : -rc;

Printing both rc and errno is the right thing to do, but I am not sure changing the return value depending on the libxc version is a good idea. Maybe we should be consistent and always return rc? In Xen 4.5 and earlier this function would return -EINVAL (say rc=EINVAL). With Xen 4.6 it would always return 1 on errors (rc is -1, and with -(-1) we get 1), while errno would have EINVAL. To be consistent and have this function return a proper -Exx value we need that check to use errno in case rc == -1.
Maybe the best thing to do is to introduce a versioned xen_xc_domain_add_to_physmap in include/hw/xen/xen_common.h. I am uncomfortable with returning positive values as errors, which reminds me - I need to update the commit to mention the return 1 issue. Agreed

         }
     }

@@ -422,11 +423,12 @@ static int xen_remove_from_physmap(XenIOState *state,
         xen_pfn_t idx = start_addr + i;
         xen_pfn_t gpfn = phys_offset + i;

+        /* In Xen 4.6 rc is -1 and errno contains the error value. */
         rc = xc_domain_add_to_physmap(xen_xc, xen_domid, XENMAPSPACE_gmfn, idx, gpfn);
         if (rc) {
             fprintf(stderr, "add_to_physmap MFN %"PRI_xen_pfn" to PFN %"
-                    PRI_xen_pfn" failed: %d\n", idx, gpfn, rc);
-            return -rc;
+                    PRI_xen_pfn" failed: %d (errno: %d)\n", idx, gpfn, rc, errno);
+            return rc == -1 ? -errno : -rc;
         }
     }
--
2.1.0
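The `rc == -1 ? -errno : -rc` expression above is doing double duty across two libxc error conventions. As a standalone illustration (the helper name is hypothetical, not a QEMU or libxc API), the normalisation discussed in this thread can be written as:

```c
#include <errno.h>

/* Illustrative helper (name hypothetical, not a QEMU/libxc API):
 * normalise both libxc error conventions to "0 on success, -Exx on
 * failure".
 *   pre-4.6: failure returns the positive Exx value in rc
 *   4.6+:    failure returns -1 with the Exx value in errno */
static int normalise_xc_rc(int rc, int saved_errno)
{
    if ( rc == 0 )
        return 0;
    if ( rc == -1 )             /* new (4.6) convention */
        return -saved_errno;
    return -rc;                 /* old convention: rc is a positive Exx */
}
```

A versioned wrapper in xen_common.h, as suggested above, could select the right branch at build time instead of probing rc at runtime.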
Re: [Xen-devel] [PATCH] libxl: Increase device model startup timeout to 1min.
On Wed, Jul 01, 2015 at 04:03:55PM +0100, Stefano Stabellini wrote: On Tue, 30 Jun 2015, Ian Jackson wrote: * The number and nature of parallel operations done in the stress test is unreasonable for the provided hardware: => the timeout is fine I don't know if it is our place to make this call. Should we really be deciding what is considered reasonable? I think not. Defining what is reasonable, and policies that match it, is not a route I think we should take in libxl. Nevertheless, if we are defining timeouts we are implicitly setting some parameters which imply that certain configurations are unreasonable. Hopefully all such configurations are absurd. If what you mean is that our bounds of `reasonable' should be very wide, then I agree. If anyone could reasonably expect it to work, then that is fine. Certainly we should refrain from subjective judgements. OK. How do you measure reasonable for this case? What I actually mean to ask is: how do you suggest we proceed on this problem? Of course it would be nice if we knew exactly why this is happening, but the issue only happens once every 2-3 tempest runs, each of which takes about 1 hour. Tempest executes about 1300 tests in each run, some of them in parallel. We haven't taken the time to read all the tests run by tempest, so we don't know exactly what they do. We don't really know the environment that causes the failure. Reading all the tests is not an option. We could try adding more tracing to the system but, given the type of error, if we do we are not likely to reproduce the error at all, or may reproduce something different. Given the state of things, I suggest we make sure that increasing the timeout actually fixes/works around the problem. I would also like to see some empirical measurements that tell us by how much we should increase the timeout. Is 1 minute actually enough? I tested an increased timeout last night. And here are the results.
The machine is an AMD Opteron(tm) Processor 4284, with 8G of RAM and 8 pCPUs. It's running Ubuntu 14.04, with Xen 4.4. On top of that, OpenStack has been deployed via devstack on a single. The test is to run Tempest with --concurrency=4. Four tests are run in parallel, but they don't necessarily start a VM. When they do, it's a PV guest with 64MB and 1 vCPU, and sometimes with double that amount of RAM.

The stats:
  Tempest runs: 22
  Tempest run time for each run: ~3000s
  Tempest number of tests: 1143

After 22 runs of tempest:
  QEMU starts: 3352
  number of starts that took more than 2s: 20
  number of starts that took more than 9s: 6
  maximum start time: 10.973713s

I have gathered the QEMU start times by having strace running for each of them. I then looked at the time it took from the first syscall, execve('qemu'), until the syscall where QEMU responds on its QMP socket (libxl has acknowledged that QEMU is running at that time). -- Anthony PERARD
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote: On 02/07/15 09:30, Dario Faggioli wrote: It is, therefore, not effective in making sure that, even with only one notification, you only kick the interested vcpu. This is the third time that I ask: (1) whether it is possible to have more vcpus queued on one pcpu's PI blocked list with desc.on set (I really believe it is); (2) if yes, whether it is TheRightThing(TM) to kick all of them as soon as any notification arrives, instead of putting together a mechanism for kicking only a specific one. We will receive one NV for every time the hardware managed to successfully set desc.on. Right, I see it now, thanks. If multiple stack up and we proactively drain the list, we will subsequently search the list to completion for all remaining NVs, due to finding no appropriate entries. I can't currently decide whether this will be quicker or slower overall, or (most likely) it will even out to equal in the general case. Well, given that the thing works as you (two) just described, I think draining the list is the only thing we can do. In fact, AFAICT, since we can't know for which vcpu a particular notification is intended, we don't have alternatives to waking them all, do we? Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
[Xen-devel] [PATCH v2 1/3] xen/domain_page: Convert map_domain_page_global() to using mfn_t
From: Andrew Cooper andrew.coop...@citrix.com The sh_map/unmap wrappers can be dropped, and take the opportunity to turn some #define's into static inlines, for added type safety. As part of adding the type safety, GCC highlights a problematic include cycle with arm/mm.h including domain_page.h which includes xen/mm.h and falls over __page_to_mfn being used before being declared. Simply dropping the inclusion of domain_page.h fixes the compilation issue. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Jan Beulich jbeul...@suse.com CC: Tim Deegan t...@xen.org CC: Ian Campbell ian.campb...@citrix.com CC: Stefano Stabellini stefano.stabell...@citrix.com --- xen/arch/arm/mm.c| 6 ++ xen/arch/x86/domain_page.c | 9 - xen/arch/x86/mm/shadow/multi.c | 10 +- xen/arch/x86/mm/shadow/private.h | 12 xen/include/asm-arm/mm.h | 1 - xen/include/xen/domain_page.h| 22 +- 6 files changed, 28 insertions(+), 32 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index ff1b330..d479048 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -271,11 +271,9 @@ void clear_fixmap(unsigned map) } #ifdef CONFIG_DOMAIN_PAGE -void *map_domain_page_global(unsigned long mfn) +void *map_domain_page_global(mfn_t mfn) { -mfn_t m = _mfn(mfn); - -return vmap(m, 1); +return vmap(mfn, 1); } void unmap_domain_page_global(const void *va) diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c index d684b2f..0f7548b 100644 --- a/xen/arch/x86/domain_page.c +++ b/xen/arch/x86/domain_page.c @@ -302,17 +302,16 @@ int mapcache_vcpu_init(struct vcpu *v) return 0; } -void *map_domain_page_global(unsigned long mfn) +void *map_domain_page_global(mfn_t mfn) { -mfn_t m = _mfn(mfn); ASSERT(!in_irq() && local_irq_is_enabled()); #ifdef NDEBUG -if ( mfn <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) ) -return mfn_to_virt(mfn); +if ( mfn_x(mfn) <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) ) +return mfn_to_virt(mfn_x(mfn)); #endif -return vmap(m, 1); +return vmap(mfn, 1); } void
unmap_domain_page_global(const void *ptr) diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c index 42204d9..54d0bd3 100644 --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -3806,7 +3806,7 @@ sh_detach_old_tables(struct vcpu *v) if ( v->arch.paging.shadow.guest_vtable ) { if ( shadow_mode_external(d) || shadow_mode_translate(d) ) -sh_unmap_domain_page_global(v->arch.paging.shadow.guest_vtable); +unmap_domain_page_global(v->arch.paging.shadow.guest_vtable); v->arch.paging.shadow.guest_vtable = NULL; } #endif // !NDEBUG @@ -3977,8 +3977,8 @@ sh_update_cr3(struct vcpu *v, int do_locking) if ( shadow_mode_external(d) || shadow_mode_translate(d) ) { if ( v->arch.paging.shadow.guest_vtable ) -sh_unmap_domain_page_global(v->arch.paging.shadow.guest_vtable); -v->arch.paging.shadow.guest_vtable = sh_map_domain_page_global(gmfn); +unmap_domain_page_global(v->arch.paging.shadow.guest_vtable); +v->arch.paging.shadow.guest_vtable = map_domain_page_global(gmfn); /* PAGING_LEVELS==4 implies 64-bit, which means that * map_domain_page_global can't fail */ BUG_ON(v->arch.paging.shadow.guest_vtable == NULL); @@ -4010,8 +4010,8 @@ sh_update_cr3(struct vcpu *v, int do_locking) if ( shadow_mode_external(d) || shadow_mode_translate(d) ) { if ( v->arch.paging.shadow.guest_vtable ) -sh_unmap_domain_page_global(v->arch.paging.shadow.guest_vtable); -v->arch.paging.shadow.guest_vtable = sh_map_domain_page_global(gmfn); +unmap_domain_page_global(v->arch.paging.shadow.guest_vtable); +v->arch.paging.shadow.guest_vtable = map_domain_page_global(gmfn); /* Does this really need map_domain_page_global? Handle the * error properly if so.
*/ BUG_ON(v->arch.paging.shadow.guest_vtable == NULL); /* XXX */ diff --git a/xen/arch/x86/mm/shadow/private.h b/xen/arch/x86/mm/shadow/private.h index f72ea9f..eff39dc 100644 --- a/xen/arch/x86/mm/shadow/private.h +++ b/xen/arch/x86/mm/shadow/private.h @@ -517,18 +517,6 @@ sh_unmap_domain_page(void *p) unmap_domain_page(p); } -static inline void * -sh_map_domain_page_global(mfn_t mfn) -{ -return map_domain_page_global(mfn_x(mfn)); -} - -static inline void -sh_unmap_domain_page_global(void *p) -{ -unmap_domain_page_global(p); -} - /**/ /* Shadow-page refcounting. */ diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index 3601140..2e1f21a 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -5,7 +5,6 @@ #include <xen/kernel.h> #include <asm/page.h> #include <public/xen.h> -#include
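The mfn_t plumbing in this patch relies on Xen's typesafe-integer idiom. A minimal self-contained sketch of the pattern (simplified from Xen's TYPE_SAFE macro; this is illustrative, not the actual header):

```c
/* Wrapping the raw value in a one-member struct makes accidental
 * mixing of address spaces (mfn vs gfn vs pfn) a compile-time type
 * error, at the cost of explicit _mfn()/mfn_x() conversions. */
typedef struct { unsigned long m; } mfn_t;

static inline mfn_t _mfn(unsigned long m) { mfn_t f = { m }; return f; }
static inline unsigned long mfn_x(mfn_t f) { return f.m; }

/* Comparisons and arithmetic must unwrap explicitly, as in the
 * patch:  if ( mfn_x(mfn) <= PFN_DOWN(...) )
 *             return mfn_to_virt(mfn_x(mfn));                      */
```

The struct has no runtime cost: it is passed in a register just like the bare unsigned long, so the safety is purely a compile-time property.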
Re: [Xen-devel] [PATCH] Refactor ioreq server for better performance
[snip] Thanks, Paul. Well, I agree the former approach would be simpler. But I still doubt whether it is more reasonable. :) IIUC, one of the reasons for struct domain to have a rangeset list (and a spinlock - rangesets_lock) is that there are iomem_caps and irq_caps for each domain. These 2 rangeset members of struct domain are platform independent. However, struct rb_rangeset is only supposed to be used in the ioreq server, which is only for x86 HVM cases. Adding a rb_rangeset list member (similarly, if so, a rb_rangesets_lock would also be required) in struct domain may be useless for the hardware domain and for platforms other than x86. Fair enough. So, I'd like to register a new debug key to dump the ioreq server information, just like the keys to dump the iommu p2m table or the irq mappings. With a new debug key, we do not need to add a spinlock for rb_rangeset in struct domain; the one in the ioreq server would be enough. Does this sound reasonable? That would be ok with me, but I'm not sure about claiming a whole debug key for this. Is there any other one that you could piggy-back on? If not, then maybe just make it part of the 'q' output. Thanks, my new implementation uses the 'q' debug key. Will send out the new version later. :) Yu Paul [snip]
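As a rough illustration of why an ordered rb_rangeset-style structure pays off for ioreq-server range lookups: with sorted, non-overlapping ranges the lookup is O(log n) instead of a linear list walk. A sorted array stands in for the rb-tree here, and all names are illustrative, not the proposed Xen API:

```c
#include <stddef.h>

struct range { unsigned long s, e; };               /* inclusive [s, e] */

/* Two disjoint sample ranges, kept sorted by start address. */
static const struct range demo_ranges[] = {
    { 0x10, 0x1f }, { 0x40, 0x4f },
};

/* Binary search over sorted, non-overlapping ranges: the same
 * O(log n) behaviour an rb-tree-backed rangeset would give. */
static int range_contains(const struct range *r, size_t n, unsigned long a)
{
    size_t lo = 0, hi = n;

    while ( lo < hi )
    {
        size_t mid = lo + (hi - lo) / 2;

        if ( a < r[mid].s )
            hi = mid;
        else if ( a > r[mid].e )
            lo = mid + 1;
        else
            return 1;                               /* a inside r[mid] */
    }
    return 0;
}
```

For the handful of ranges a typical ioreq server registers the difference is small, but emulators registering many scattered MMIO ranges are exactly the case the patch targets.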
Re: [Xen-devel] [PATCH] libxl: Increase device model startup timeout to 1min.
Anthony PERARD writes ("Re: [PATCH] libxl: Increase device model startup timeout to 1min."): I tested an increased timeout last night. And here are the results. The machine is an AMD Opteron(tm) Processor 4284, with 8G of RAM and 8 pCPUs. It's running Ubuntu 14.04, with Xen 4.4. On top of that, OpenStack has been deployed via devstack on a single. The test is to run Tempest with --concurrency=4. Four tests are run in parallel, but they don't necessarily start a VM. When they do, it's a PV guest with 64MB and 1 vCPU, and sometimes with double that amount of RAM. The stats: Tempest runs: 22 Tempest run time for each run: ~3000s Tempest number of tests: 1143 After 22 runs of tempest: QEMU starts: 3352 number of starts that took more than 2s: 20 number of starts that took more than 9s: 6 maximum start time: 10.973713s I have gathered the QEMU start times by having strace running for each of them. I then looked at the time it took from the first syscall, execve('qemu'), until the syscall where QEMU responds on its QMP socket (libxl has acknowledged that QEMU is running at that time). Thanks for this information. So from what you say it appears that we are running at most 4 copies of libxl and qemu in parallel, along with at most 4 VMs? And out of 3352 qemu starts, we have:
  <= 2s:          3332
  > 2s, <= 9s:      14
  > 9s:              6
? Do you have any information about the maximum system load in general? dom0 load, vcpu overcommit, etc.? Do you know what the guests are doing? I'm starting to think that this might be a real bug, but that the bug might be Linux's I/O subsystem sometimes produces appalling latency under load (which is hardly news). Ian.
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On Thu, 2015-07-02 at 13:16 +0100, Andrew Cooper wrote: On 02/07/15 13:04, Dario Faggioli wrote: On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote: I can't currently decide whether this will be quicker or slower overall, or (most likely) it will even out to equal in the general case. Well, given the thing works as you (two) just described, I think draining the list is the only thing we can do. In fact, AFAICT, since we can't know for which vcpu a particular notification is intended, we don't have alternatives to waking them all, do we? Perhaps you misunderstand. I'm quite sure I was; while I think now I'm getting it. Every single vcpu has a PI descriptor which is shared memory with hardware. Right. A NV is delivered strictly when hardware atomically changes desc.on from 0 to 1, i.e. the first time that an outstanding notification arrives. (iirc, desc.on is later cleared by hardware when the vcpu is scheduled and the vector(s) actually injected.) Part of the scheduling modifications alter when a vcpu is eligible to have NVs delivered on its behalf. Non-scheduled vcpus get NVs, while scheduled vcpus have direct injection instead. Blocked vcpus, AFAICT. But that's not relevant here. Therefore, in the case that an NV arrives, we know for certain that one of the NV-eligible vcpus has had desc.on set by hardware, and we can uniquely identify it by searching for the vcpu for which desc.on is set. Yeah, but we can have more than one of them. You said "I can't currently decide whether this will be quicker or slower", which I read as suggesting that not draining the queue was a plausible alternative, while I now think it's not. Perhaps you were not meaning anything like that, so it was not necessary for me to point this out, in which case, sorry for the noise. :-) In the case of stacked NVs, we cannot associate which specific vcpu caused which NV, but we know that we will get one NV per vcpu needing kicking.
Exactly, and that's what I'm talking about, and why I'm saying that waking everyone is the only solution. The bottom line being that, even in case this is deemed too slow, we don't have the option of waking only one vcpu at each NV, as we wouldn't know who to wake, and hence we'd need to make things faster in some other way. Regards, Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [PATCH] Remove sh_{un}map_domain_page() and hap_{un}map_domain_page()
On 02/07/15 13:43, Ben Catterall wrote: Removed as they were wrappers around map_domain_page() to make it appear to take an mfn_t type. Signed-off-by: Ben Catterall ben.catter...@citrix.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com In the future, a patch like this should either state that it is dependent on another series or, in this case, probably be a 4/4 on the existing series. ~Andrew
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On 02/07/15 13:04, Dario Faggioli wrote: On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote: On 02/07/15 09:30, Dario Faggioli wrote: It is, therefore, not effective in making sure that, even with only one notification, you only kick the interested vcpu. This is the third time that I ask: (1) whether it is possible to have more vcpus queued on one pcpu's PI blocked list with desc.on set (I really believe it is); (2) if yes, whether it is TheRightThing(TM) to kick all of them as soon as any notification arrives, instead of putting together a mechanism for kicking only a specific one. We will receive one NV for every time the hardware managed to successfully set desc.on. Right, I see it now, thanks. If multiple stack up and we proactively drain the list, we will subsequently search the list to completion for all remaining NVs, due to finding no appropriate entries. I can't currently decide whether this will be quicker or slower overall, or (most likely) it will even out to equal in the general case. Well, given that the thing works as you (two) just described, I think draining the list is the only thing we can do. In fact, AFAICT, since we can't know for which vcpu a particular notification is intended, we don't have alternatives to waking them all, do we? Perhaps you misunderstand. Every single vcpu has a PI descriptor which is shared memory with hardware. A NV is delivered strictly when hardware atomically changes desc.on from 0 to 1, i.e. the first time that an outstanding notification arrives. (iirc, desc.on is later cleared by hardware when the vcpu is scheduled and the vector(s) actually injected.) Part of the scheduling modifications alter when a vcpu is eligible to have NVs delivered on its behalf. Non-scheduled vcpus get NVs, while scheduled vcpus have direct injection instead.
Therefore, in the case that an NV arrives, we know for certain that one of the NV-eligible vcpus has had desc.on set by hardware, and we can uniquely identify it by searching for the vcpu for which desc.on is set. In the case of stacked NVs, we cannot associate which specific vcpu caused which NV, but we know that we will get one NV per vcpu needing kicking. ~Andrew
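The drain-the-list behaviour agreed on in this thread can be sketched with a toy model (plain C; all names are illustrative, not the actual Xen vmx structures):

```c
#include <stddef.h>

/* Toy model of the pCPU blocked-vCPU list.  desc_on stands for the
 * PI descriptor's "on" bit, set by hardware once per delivered NV. */
struct pi_vcpu {
    int desc_on;                 /* outstanding notification? */
    int kicked;                  /* times this vCPU was woken */
    struct pi_vcpu *next;
};

/* One NV arrived: kick every vCPU with desc_on set and unlink it,
 * so later (stacked) NVs search a shorter list and nobody is
 * kicked twice.  Returns the number of vCPUs kicked. */
static int pi_wakeup_drain(struct pi_vcpu **head)
{
    int kicks = 0;
    struct pi_vcpu **pp = head;

    while ( *pp != NULL )
    {
        struct pi_vcpu *v = *pp;

        if ( v->desc_on )
        {
            v->kicked++;
            *pp = v->next;       /* unlink: no double kick later */
            kicks++;
        }
        else
            pp = &v->next;
    }
    return kicks;
}

/* Scenario from the thread: two vCPUs pending, one not.  The first
 * drain kicks both pending vCPUs; a second, stacked NV then finds
 * nothing left to do.  Returns first*10 + second. */
static int pi_drain_demo(void)
{
    struct pi_vcpu a = { 1, 0, NULL }, b = { 0, 0, NULL }, c = { 1, 0, NULL };
    struct pi_vcpu *head = &a;

    a.next = &b;
    b.next = &c;
    return pi_wakeup_drain(&head) * 10 + pi_wakeup_drain(&head);
}
```

This mirrors Andrew's point: stacked NVs cost extra full searches of an already-drained list, which is the quicker-or-slower trade-off he mentions.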
Re: [Xen-devel] [RFC PATCH v3 07/18] xen/arm: ITS: implement hw_irq_controller for LPIs
On Fri, Jun 26, 2015 at 8:35 PM, Julien Grall julien.gr...@citrix.com wrote: Hi Vijay, On 26/06/2015 14:54, Vijay Kilari wrote: On Tue, Jun 23, 2015 at 8:02 PM, Julien Grall julien.gr...@citrix.com wrote: Hi Vijay, On 22/06/15 13:01, vijay.kil...@gmail.com wrote: From: Vijaya Kumar K vijaya.ku...@caviumnetworks.com Implements the hw_irq_controller APIs required to handle LPIs This patch doesn't implement a hw_irq_controller for LPIs but just hacks around the current GICv3 host hw_irq_controller. As said on the previous version, the goal of hw_irq_controller is to keep things simple (i.e. little conditional code). Please introduce a separate hw_irq_controller for LPIs. If a new hw_irq_controller is introduced for LPIs, then this has to be exported using some lpi structure which holds pointers to the hw_irq_controller for the guest/host type, similar to gic_hw_ops. The interface is not set in stone, you are free to change what you want as long as we keep something clean and comprehensible. It's the same for the functions (I have in mind route_irq_to_guest). In this case, I would prefer to see 2 callbacks (one for the host, the other for the guest) which return the correct IRQ controller for a specific IRQ. I have in mind something like:

get_guest_hw_irq_controller(unsigned int irq)
{
    if ( !is_lpi )
        return gicv3_guest_irq_controller;
    else
        return gicv3_guest_lpi_controller;
}

Same for the host irq controller. So the selection of the IRQ controller would be hidden from gic.c and keep the code as generic as possible.

+/*
+ * Make the above write visible to the redistributors.
+ * And yes, we're flushing exactly: One. Single. Byte.
+ * Humpf...
+ */
+if ( gic_rdists->flags & RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING )
+    clean_and_invalidate_dcache_va_range(cfg, sizeof(*cfg));
+else
+    dsb(ishst);
+
+/* Get collection id for this event id */
+col = its_dev->its->collections[virq % num_online_cpus()];

This is fragile, you are assuming that num_online_cpus() will never change.
Why don't you store the collection in every irq_desc? This will add an additional 8 bytes of overhead for each irq_desc. Also, is there a macro to get the actual number of processors in the system? AIUI, nr_cpu_ids always returns 128. AFAIU, nr_cpu_ids should reflect the number of CPUs of the platform. x86 correctly sets it when parsing the ACPI tables. So I think this is a bug in the ARM code. In fact, I wasn't able to find a place in the ARM code where this value is changed. nr_cpu_ids is not changed in the ARM case. I think this value has to be updated in start_xen with the cpus value:

void __init start_xen(unsigned long boot_phys_offset,
                      unsigned long fdt_paddr,
                      unsigned long cpuid)
{
    cpus = smp_get_max_cpus();
    ..
}
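Julien's suggested per-IRQ controller selection could look roughly like the following as compilable C. All names are assumptions for illustration; the one architectural fact used is that on GICv3, LPI INTIDs start at 8192:

```c
/* Hiding the "is this an LPI?" test inside one lookup keeps gic.c
 * free of LPI-specific conditionals, which is the stated goal. */
struct hw_irq_controller { const char *name; };

static const struct hw_irq_controller gicv3_guest_irq_controller =
    { "gicv3-guest" };
static const struct hw_irq_controller gicv3_guest_lpi_controller =
    { "gicv3-guest-lpi" };

#define LPI_BASE_INTID 8192u     /* GICv3: LPIs occupy INTIDs >= 8192 */

static int is_lpi(unsigned int irq)
{
    return irq >= LPI_BASE_INTID;
}

static const struct hw_irq_controller *
get_guest_hw_irq_controller(unsigned int irq)
{
    return is_lpi(irq) ? &gicv3_guest_lpi_controller
                       : &gicv3_guest_irq_controller;
}
```

A matching get_host_hw_irq_controller would follow the same shape, so callers never branch on the IRQ class themselves.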
Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access
George Dunlap george.dun...@eu.citrix.com writes: On 07/02/2015 12:25 PM, Tim Deegan wrote: At 12:09 +0100 on 02 Jul (1435838956), Andrew Cooper wrote: On 02/07/15 11:48, George Dunlap wrote: Now in p2m_set_mem_access(), rather than just using an unsigned long in the loop iterating over gfns, you do this thing where you convert gfn_t to unsigned long, add one, and then convert it back to gfn_t again. I can't see any comments in v3 that suggested you do that, and it seems a bit clunky. Is that really necessary? Wouldn't it be better to declare a local variable? I'm not strongly opinionated on this one, it just seems a bit strange. Everything else looks good, thanks. Looping over {g,m,p}fn_t's is indeed awkward, as the compiler tricks for type safety don't allow for simply adding 1 to a typesafe variable. In cases like this, I think it is acceptable to keep an unsigned long shadow variable and manipulate it as a plain integer. The eventual _gfn() required to pass it further down the callchain will help to visually reinforce the appropriate type. After all, the entire point of these typesafes is to try and avoid accidentally mixing up the different address spaces, but a function which takes a typesafe, loops over a subset and passes the same typesafe further down can probably be trusted to DTRT, catching errors at review time. Off the top of my head, the only functions which would normally expect to mix and match the typesafes are the pagetable walking ones. It should be easy enough to extend the macros to define a gfn_inc(gfn_t) operator for this kind of thing. I was thinking that -- although in this case you'd still need to un-pack it to do the loop exit conditional. To really make things pretty you'd want a for_gfn_range() macro or something like that that takes a start gfn and a number. But that's really starting to be feature creep for this patch, which is why I didn't want to suggest it on v4. :-) Well, if you look at what I was fixing in v1 ...
:-) I suggest we add a local unsigned long here and close the deal; when {g,m,p}fn_{inc,dec} macros are available, we'll get rid of it. -- Vitaly ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
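To make the discussion concrete, here is a minimal sketch of the two ideas agreed above: a hypothetical gfn_inc() operator, and the "unsigned long shadow variable" loop pattern that re-boxes with _gfn() at the call site. The gfn_t/_gfn()/gfn_x() stand-ins below are simplified illustrations, not Xen's actual TYPE_SAFE macro expansion; sum_gfns() is an invented example function.

```c
#include <assert.h>

/* Simplified stand-ins for Xen's typesafe wrappers (the real ones come
 * from the TYPE_SAFE machinery); for illustration only. */
typedef struct { unsigned long gfn; } gfn_t;
static inline gfn_t _gfn(unsigned long g) { return (gfn_t){ g }; }
static inline unsigned long gfn_x(gfn_t g) { return g.gfn; }

/* The gfn_inc() operator suggested in the thread: unbox, add 1, re-box. */
static inline gfn_t gfn_inc(gfn_t g) { return _gfn(gfn_x(g) + 1); }

/* The pattern the thread settles on: keep an unsigned long shadow
 * variable for the loop, and re-box with _gfn() when passing the value
 * further down the call chain. */
static unsigned long sum_gfns(gfn_t start, unsigned long nr)
{
    unsigned long gfn_l = gfn_x(start);  /* shadow variable */
    unsigned long acc = 0;

    for ( ; gfn_l < gfn_x(start) + nr; gfn_l++ )
        acc += gfn_x(_gfn(gfn_l));       /* boxed again at the "call site" */

    return acc;
}
```

The _gfn() in the loop body is the visual cue Andrew mentions: it marks the point where the plain integer re-enters typesafe territory.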
Re: [Xen-devel] [RFC PATCH v3 07/18] xen/arm: ITS: implement hw_irq_controller for LPIs
On Thu, 2015-07-02 at 17:51 +0530, Vijay Kilari wrote: On Mon, Jun 29, 2015 at 5:29 PM, Ian Campbell ian.campb...@citrix.com wrote: On Tue, 2015-06-23 at 15:32 +0100, Julien Grall wrote: [...] +{ +struct its_collection *col; +struct its_device *its_dev = get_irq_device(desc); +u8 *cfg; +u32 virq = irq_to_virq(desc); + +ASSERT(virq < its_dev->nr_lpis); + +cfg = gic_rdists->prop_page + desc->irq - NR_GIC_LPI; +if ( enable ) +*cfg |= LPI_PROP_ENABLED; +else +*cfg &= ~LPI_PROP_ENABLED; + +/* + * Make the above write visible to the redistributors. + * And yes, we're flushing exactly: One. Single. Byte. + * Humpf... + */ +if ( gic_rdists->flags & RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING ) +clean_and_invalidate_dcache_va_range(cfg, sizeof(*cfg)); +else +dsb(ishst); + +/* Get collection id for this event id */ +col = its_dev->its->collections[virq % num_online_cpus()]; This is fragile, you are assuming that num_online_cpus() will never change. Why don't you store the collection in every irq_desc? The original Linux code upon which this is based doesn't seem to need to look up the collection here; why is flushing needed for us but not Linux? We are writing to the lpi property table. Even the linux code flushes it. Sorry, I was referring to the collection lookup and inv, not the cache flush, i.e. this bit: +/* Get collection id for this event id */ +col = its_dev->its->collections[virq % num_online_cpus()]; +its_send_inv(its_dev, col, virq); Linux doesn't seem to do that INV there. Ian.
[Xen-devel] [PATCH] Remove sh_{un}map_domain_page() and hap_{un}map_domain_page()
Removed as they were wrappers around map_domain_page() to make it appear to take an mfn_t type. Signed-off-by: Ben Catterall ben.catter...@citrix.com --- xen/arch/x86/mm/hap/hap.c| 4 +- xen/arch/x86/mm/shadow/common.c | 22 +++--- xen/arch/x86/mm/shadow/multi.c | 152 +++ xen/arch/x86/mm/shadow/private.h | 13 xen/include/asm-x86/hap.h| 15 5 files changed, 89 insertions(+), 117 deletions(-) diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c index d0d3f1e..63980af 100644 --- a/xen/arch/x86/mm/hap/hap.c +++ b/xen/arch/x86/mm/hap/hap.c @@ -395,7 +395,7 @@ static void hap_install_xen_entries_in_l4(struct vcpu *v, mfn_t l4mfn) struct domain *d = v-domain; l4_pgentry_t *l4e; -l4e = hap_map_domain_page(l4mfn); +l4e = map_domain_page(l4mfn); /* Copy the common Xen mappings from the idle domain */ memcpy(l4e[ROOT_PAGETABLE_FIRST_XEN_SLOT], @@ -411,7 +411,7 @@ static void hap_install_xen_entries_in_l4(struct vcpu *v, mfn_t l4mfn) l4e[l4_table_offset(LINEAR_PT_VIRT_START)] = l4e_from_pfn(mfn_x(l4mfn), __PAGE_HYPERVISOR); -hap_unmap_domain_page(l4e); +unmap_domain_page(l4e); } static mfn_t hap_make_monitor_table(struct vcpu *v) diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c index da6b847..3da9767 100644 --- a/xen/arch/x86/mm/shadow/common.c +++ b/xen/arch/x86/mm/shadow/common.c @@ -781,11 +781,11 @@ static void oos_hash_add(struct vcpu *v, mfn_t gmfn) if ( swap ) SWAP(oos_snapshot[idx], oos_snapshot[oidx]); -gptr = sh_map_domain_page(oos[oidx]); -gsnpptr = sh_map_domain_page(oos_snapshot[oidx]); +gptr = map_domain_page(oos[oidx]); +gsnpptr = map_domain_page(oos_snapshot[oidx]); memcpy(gsnpptr, gptr, PAGE_SIZE); -sh_unmap_domain_page(gptr); -sh_unmap_domain_page(gsnpptr); +unmap_domain_page(gptr); +unmap_domain_page(gsnpptr); } /* Remove an MFN from the list of out-of-sync guest pagetables */ @@ -1498,7 +1498,7 @@ mfn_t shadow_alloc(struct domain *d, p = __map_domain_page(sp); ASSERT(p != NULL); clear_page(p); 
-sh_unmap_domain_page(p); +unmap_domain_page(p); INIT_PAGE_LIST_ENTRY(sp-list); page_list_add(sp, tmp_list); sp-u.sh.type = shadow_type; @@ -2524,7 +2524,7 @@ static int sh_remove_shadow_via_pointer(struct domain *d, mfn_t smfn) if (sp-up == 0) return 0; pmfn = _mfn(sp-up PAGE_SHIFT); ASSERT(mfn_valid(pmfn)); -vaddr = sh_map_domain_page(pmfn); +vaddr = map_domain_page(pmfn); ASSERT(vaddr); vaddr += sp-up (PAGE_SIZE-1); ASSERT(l1e_get_pfn(*(l1_pgentry_t *)vaddr) == mfn_x(smfn)); @@ -2554,7 +2554,7 @@ static int sh_remove_shadow_via_pointer(struct domain *d, mfn_t smfn) default: BUG(); /* Some wierd unknown shadow type */ } -sh_unmap_domain_page(vaddr); +unmap_domain_page(vaddr); if ( rc ) perfc_incr(shadow_up_pointer); else @@ -3028,7 +3028,7 @@ int shadow_enable(struct domain *d, u32 mode) e[i] = ((0x40U * i) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE); -sh_unmap_domain_page(e); +unmap_domain_page(e); pg-u.inuse.type_info = PGT_l2_page_table | 1 | PGT_validated; } @@ -3631,8 +3631,8 @@ int shadow_track_dirty_vram(struct domain *d, if ( sl1mfn != map_mfn ) { if ( map_sl1p ) -sh_unmap_domain_page(map_sl1p); -map_sl1p = sh_map_domain_page(_mfn(sl1mfn)); +unmap_domain_page(map_sl1p); +map_sl1p = map_domain_page(_mfn(sl1mfn)); map_mfn = sl1mfn; } sl1e = map_sl1p + (sl1ma ~PAGE_MASK); @@ -3663,7 +3663,7 @@ int shadow_track_dirty_vram(struct domain *d, } if ( map_sl1p ) -sh_unmap_domain_page(map_sl1p); +unmap_domain_page(map_sl1p); memcpy(dirty_bitmap, dirty_vram-dirty_bitmap, dirty_size); memset(dirty_vram-dirty_bitmap, 0, dirty_size); diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c index 2e3d3f6..4c8badf 100644 --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -221,16 +221,16 @@ shadow_check_gwalk(struct vcpu *v, unsigned long va, walk_t *gw, int version) #if GUEST_PAGING_LEVELS = 4 /* 64-bit only... 
*/ l4p = (guest_l4e_t *)v-arch.paging.shadow.guest_vtable; mismatch |= (gw-l4e.l4 != l4p[guest_l4_table_offset(va)].l4); -l3p = sh_map_domain_page(gw-l3mfn); +l3p = map_domain_page(gw-l3mfn); mismatch |= (gw-l3e.l3 != l3p[guest_l3_table_offset(va)].l3); -sh_unmap_domain_page(l3p); +
Re: [Xen-devel] [RFC PATCH v3 07/18] xen/arm: ITS: implement hw_irq_controller for LPIs
On Thu, Jul 2, 2015 at 6:05 PM, Ian Campbell ian.campb...@citrix.com wrote: On Thu, 2015-07-02 at 17:51 +0530, Vijay Kilari wrote: On Mon, Jun 29, 2015 at 5:29 PM, Ian Campbell ian.campb...@citrix.com wrote: On Tue, 2015-06-23 at 15:32 +0100, Julien Grall wrote: [...] +{ +struct its_collection *col; +struct its_device *its_dev = get_irq_device(desc); +u8 *cfg; +u32 virq = irq_to_virq(desc); + +ASSERT(virq < its_dev->nr_lpis); + +cfg = gic_rdists->prop_page + desc->irq - NR_GIC_LPI; +if ( enable ) +*cfg |= LPI_PROP_ENABLED; +else +*cfg &= ~LPI_PROP_ENABLED; + +/* + * Make the above write visible to the redistributors. + * And yes, we're flushing exactly: One. Single. Byte. + * Humpf... + */ +if ( gic_rdists->flags & RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING ) +clean_and_invalidate_dcache_va_range(cfg, sizeof(*cfg)); +else +dsb(ishst); + +/* Get collection id for this event id */ +col = its_dev->its->collections[virq % num_online_cpus()]; This is fragile, you are assuming that num_online_cpus() will never change. Why don't you store the collection in every irq_desc? The original Linux code upon which this is based doesn't seem to need to look up the collection here; why is flushing needed for us but not Linux? We are writing to the lpi property table. Even the linux code flushes it. Sorry, I was referring to the collection lookup and inv, not the cache flush, i.e. this bit: +/* Get collection id for this event id */ +col = its_dev->its->collections[virq % num_online_cpus()]; +its_send_inv(its_dev, col, virq); Linux doesn't seem to do that INV there. Linux does INV. https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/irqchip/irq-gic-v3-its.c?id=refs/tags/v4.1 line 555
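The enable/disable path being reviewed boils down to flipping one bit in a per-LPI byte of the property table, then making that byte visible to the redistributor. The sketch below shows just the bit-twiddling part in isolation; LPI_PROP_ENABLED and lpi_set_config() are simplified stand-ins (the real code also performs the cache clean/invalidate or dsb() barrier, and sends the INV command discussed in the thread).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative value: the GICv3 LPI configuration table holds one byte
 * per LPI, with the enable bit in the low bit. */
#define LPI_PROP_ENABLED  (1 << 0)

/* Simplified sketch of set_lpi_config(): locate the LPI's byte in the
 * property table and set or clear the enable bit. */
static void lpi_set_config(uint8_t *prop_page, uint32_t lpi_offset, int enable)
{
    uint8_t *cfg = prop_page + lpi_offset;

    if ( enable )
        *cfg |= LPI_PROP_ENABLED;
    else
        *cfg &= ~LPI_PROP_ENABLED;

    /* In the real driver: flush/barrier so the redistributor observes
     * the write, then issue an INV command for this event id. */
}
```

Note how small the write is: a single byte, which is why the quoted code has to flush "exactly: One. Single. Byte."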
Re: [Xen-devel] [PATCH 1/7] libxc: fix uninitialized variable in xc_cpuid_pv_policy()
On 01/07/15 18:37, Jennifer Herbert wrote: If xc_domain_get_guest_width were to fail, guest_width is not set, and hence guest_64bit becomes undefined. The fix is to initialise it to 0, and report an error if the call fails. Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com --- tools/libxc/xc_cpuid_x86.c |8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c index c97f91a..847b701 100644 --- a/tools/libxc/xc_cpuid_x86.c +++ b/tools/libxc/xc_cpuid_x86.c @@ -437,14 +437,16 @@ static void xc_cpuid_pv_policy( Urgh - yet another libxc function which can fail hard in several ways, yet is void. { DECLARE_DOMCTL; unsigned int guest_width; -int guest_64bit; +int guest_64bit = 0; The default in Xen is that a PV guest is 64bit until explicitly converted to being compat. The better default therefore is 1. char brand[13]; uint64_t xfeature_mask; xc_cpuid_brand_get(brand); -xc_domain_get_guest_width(xch, domid, &guest_width); -guest_64bit = (guest_width == 8); +if (xc_domain_get_guest_width(xch, domid, &guest_width) == 0) +guest_64bit = (guest_width == 8); +else +ERROR("Could not read guest word width."); No full stop please. /* Detecting Xen's atitude towards XSAVE */ memset(&domctl, 0, sizeof(domctl));
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
-Original Message- From: Dario Faggioli [mailto:dario.faggi...@citrix.com] Sent: Thursday, July 02, 2015 8:04 PM To: Andrew Cooper Cc: Wu, Feng; Tian, Kevin; k...@xen.org; george.dun...@eu.citrix.com; xen-devel@lists.xen.org; jbeul...@suse.com; Zhang, Yang Z Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote: On 02/07/15 09:30, Dario Faggioli wrote: It is, therefore, not effective in making sure that, even with only one notification, you only kick the interested vcpu. This is the third time that I ask: (1) whether it is possible to have more vcpus queued on one pcpu PI blocked list with desc.on (I really believe it is); (2) if yes, whether it is TheRightThing(TM) to kick all of them, as soon as any notification arrives, instead that putting together a mechanism for kicking only a specific one. We will receive one NV for every time the hardware managed to successfully set desc.on Right, I see it now, thanks. If multiple stack up and we proactively drain the list, we will subsequently search the list to completion for all remaining NV's, due to finding no appropriate entries. I can't currently decide whether this will be quicker or slower overall, or (most likely) it will even out to equal in the general case. Well, given the thing works as you (two) just described, I think draining the list is the only thing we can do. In fact, AFAICT, since we can't know for what vcpu a particular notification is intended, Exactly, when notification event happens, the hardware sets 'ON', software will find the vCPU with 'ON' set, in fact, software doesn't know which vCPU the wakeup event is targeting, the only thing it can do is kicking the vCPUs with desc.on = 1. Thanks, Feng we don't have alternatives to waking them all, do we? Dario -- This happens because I choose it to happen! 
(Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [PATCH v2 11/22] xen/x86: allow disabling emulated devices for HVM guests
On 07/02/2015 07:49 AM, Stefano Stabellini wrote: On Wed, 1 Jul 2015, Andrew Cooper wrote: On 01/07/15 17:13, Stefano Stabellini wrote: On Wed, 1 Jul 2015, Andrew Cooper wrote: On 01/07/15 16:51, Boris Ostrovsky wrote: On 07/01/2015 11:46 AM, Andrew Cooper wrote: On 01/07/15 15:46, Roger Pau Monne wrote: Introduce a new DOMCTL flag that can be used to disable device emulation inside of Xen for HVM guests. The following emulated devices are disabled when the XEN_DOMCTL_CDF_noemu is used: hpet, pmtimer, rtc, ioapic, lapic, pic and pmu. Also all the MMIO handlers are disabled. Signed-off-by: Roger Pau Monné roger@citrix.com Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com Cc: Boris Ostrovsky boris.ostrov...@oracle.com Cc: Suravee Suthikulpanit suravee.suthikulpa...@amd.com Cc: Aravind Gopalakrishnan aravind.gopalakrish...@amd.com Cc: Jun Nakajima jun.nakaj...@intel.com Cc: Eddie Dong eddie.d...@intel.com Cc: Kevin Tian kevin.t...@intel.com I would be hesitant to have a blanket change like this. Consider APICV/AVIC. For performance reasons, we absolutely want HVM and PVH to make use of them, as they are substantially more efficient using hardware support than even using plain evtchn hypercalls. However, the flipside is that we must provide an LAPIC emulation to cover the bits which hardware cannot virtualise. As a random idea, how about having a new hypercall or hvmparam which provides a bitmap of permitted emulators? This would allow far finer grain control over what is and isn't available to a domain. I think we also need to decide on which subsets of emulators we are going to support, otherwise the test matrix will become pretty big. For example, initially we may want to allow all (for what we now call HVM) or none (PVH). Right, but that can currently be enforced with an if ( arg != 0 && arg != ~0 ) return -EOPNOTSUPP; in the hypercall handler for now.
It still leaves us with the ability to add in LAPIC emulation in the future by changing the auditing. A blanket no emulation boolean is very much harder to relax in the future. APICV is a bit of a special case, because it is partially virtualized in hardware. Not in the slightest. It is *exactly* the same as existing hardware virt. I thought we were speaking about emulation, specifically regarding device emulation in Xen x86, such as the hpet for example. In this context APICV is a bit of a special case. Are there other devices being partially virtualized in hardware on x86? (I admit that I haven't followed x86 development that closely.) Hardware does most of the work, but occasionally needs to break into Xen to manage things. The difference is that we don't call some of the existing vmexits emulating an x86 cpu, despite this being what is actually happening. To me, that is different. But in general, considering that the whole purpose of PVH as DomU is security... From the kernel perspective, the major reason for having PVH is to move away from PV memory management (in the long term) -boris
Re: [Xen-devel] [RFC PATCH v3 07/18] xen/arm: ITS: implement hw_irq_controller for LPIs
On Mon, Jun 29, 2015 at 5:29 PM, Ian Campbell ian.campb...@citrix.com wrote: On Tue, 2015-06-23 at 15:32 +0100, Julien Grall wrote: [...] +{ +struct its_collection *col; +struct its_device *its_dev = get_irq_device(desc); +u8 *cfg; +u32 virq = irq_to_virq(desc); + +ASSERT(virq < its_dev->nr_lpis); + +cfg = gic_rdists->prop_page + desc->irq - NR_GIC_LPI; +if ( enable ) +*cfg |= LPI_PROP_ENABLED; +else +*cfg &= ~LPI_PROP_ENABLED; + +/* + * Make the above write visible to the redistributors. + * And yes, we're flushing exactly: One. Single. Byte. + * Humpf... + */ +if ( gic_rdists->flags & RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING ) +clean_and_invalidate_dcache_va_range(cfg, sizeof(*cfg)); +else +dsb(ishst); + +/* Get collection id for this event id */ +col = its_dev->its->collections[virq % num_online_cpus()]; This is fragile, you are assuming that num_online_cpus() will never change. Why don't you store the collection in every irq_desc? The original Linux code upon which this is based doesn't seem to need to look up the collection here; why is flushing needed for us but not Linux? We are writing to the lpi property table. Even the linux code flushes it. I'm also confused by the use of the variable name virq in a function called set_lpi_config which appears to be dealing with host level physical LPIs. It seems like this function would be broken for LPIs which were delivered to Xen and not to a guest, and that the irq_to_virq in here ought to be desc->irq, no? I am using desc->irq for updating the lpi property table but using vid to send the inv command. BTW, the Linux original calls what this calls virq id instead, which is much less confusing. OK, will rename virq to vid
Re: [Xen-devel] [PATCH 0/6] add xsaves/xrstors support
On 02/07/15 15:02, Shuai Ruan wrote: This patchset enables the xsaves/xrstors feature. It includes three parts: 1. add xsaves/xrstors for xen. 2. add xsaves/xrstors for pv guest. 3. add xsaves/xrstors for hvm guest. What is xsaves/xrstors and why might I want Xen to use it? What advantages does it give? When might these instructions be available? Where can I read more details about this? ~Andrew
[Xen-devel] [PATCH v2 3/3] Convert map_domain_page() to use the new mfn_t type
Reworked the internals and declaration, applying (un)boxing where needed. Converted calls to map_domain_page() to provide mfn_t types, boxing where needed. Signed-off-by: Ben Catterall ben.catter...@citrix.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- Changed since v1: * Created paddr_to_mfn() and mfn_to_paddr() for both x86 and ARM * Converted code to use the new paddr_to_mfn() rather than e.g. paddrPAGE_SHIFT Signed-off-by: Ben Catterall ben.catter...@citrix.com --- xen/arch/arm/domain_build.c | 2 +- xen/arch/arm/kernel.c | 2 +- xen/arch/arm/mm.c | 12 +- xen/arch/arm/p2m.c| 4 ++-- xen/arch/arm/traps.c | 4 ++-- xen/arch/x86/debug.c | 10 xen/arch/x86/domain.c | 4 ++-- xen/arch/x86/domain_build.c | 10 xen/arch/x86/domain_page.c| 22 - xen/arch/x86/domctl.c | 2 +- xen/arch/x86/mm.c | 40 +++ xen/arch/x86/mm/guest_walk.c | 2 +- xen/arch/x86/mm/hap/guest_walk.c | 2 +- xen/arch/x86/mm/mem_sharing.c | 4 ++-- xen/arch/x86/mm/p2m-ept.c | 22 - xen/arch/x86/mm/p2m-pod.c | 8 +++ xen/arch/x86/mm/p2m-pt.c | 28 +++--- xen/arch/x86/mm/p2m.c | 2 +- xen/arch/x86/mm/paging.c | 32 - xen/arch/x86/mm/shadow/common.c | 2 +- xen/arch/x86/mm/shadow/multi.c| 4 ++-- xen/arch/x86/mm/shadow/private.h | 2 +- xen/arch/x86/smpboot.c| 2 +- xen/arch/x86/tboot.c | 4 ++-- xen/arch/x86/traps.c | 12 +- xen/arch/x86/x86_64/mm.c | 14 +-- xen/arch/x86/x86_64/traps.c | 10 xen/arch/x86/x86_emulate.c| 10 xen/common/grant_table.c | 4 ++-- xen/common/kexec.c| 4 ++-- xen/common/kimage.c | 10 xen/common/memory.c | 6 ++--- xen/common/tmem_xen.c | 6 ++--- xen/drivers/passthrough/amd/iommu_guest.c | 10 xen/drivers/passthrough/amd/iommu_map.c | 14 +-- xen/drivers/passthrough/vtd/x86/vtd.c | 2 +- xen/include/asm-arm/mm.h | 2 ++ xen/include/asm-x86/hap.h | 2 +- xen/include/asm-x86/page.h| 10 +--- xen/include/asm-x86/paging.h | 2 +- xen/include/xen/domain_page.h | 8 +++ 41 files changed, 179 insertions(+), 173 deletions(-) diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c index 
e9cb8a9..37db8b7 100644 --- a/xen/arch/arm/domain_build.c +++ b/xen/arch/arm/domain_build.c @@ -1322,7 +1322,7 @@ static void initrd_load(struct kernel_info *kinfo) return; } -dst = map_domain_page(maPAGE_SHIFT); +dst = map_domain_page(_mfn(paddr_to_mfn(ma))); copy_from_paddr(dst + s, paddr + offs, l); diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c index 209c3dd..9826fb2 100644 --- a/xen/arch/arm/kernel.c +++ b/xen/arch/arm/kernel.c @@ -182,7 +182,7 @@ static void kernel_zimage_load(struct kernel_info *info) return; } -dst = map_domain_page(maPAGE_SHIFT); +dst = map_domain_page(_mfn(paddr_to_mfn(ma))); copy_from_paddr(dst + s, paddr + offs, l); diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index d479048..ae0f34c 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -213,7 +213,7 @@ void dump_pt_walk(paddr_t ttbr, paddr_t addr, else root_table = 0; -mapping = map_domain_page(root_pfn + root_table); +mapping = map_domain_page(_mfn(root_pfn + root_table)); for ( level = root_level; ; level++ ) { @@ -230,7 +230,7 @@ void dump_pt_walk(paddr_t ttbr, paddr_t addr, /* For next iteration */ unmap_domain_page(mapping); -mapping = map_domain_page(pte.walk.base); +mapping = map_domain_page(_mfn(pte.walk.base)); } unmap_domain_page(mapping); @@ -282,11 +282,11 @@ void unmap_domain_page_global(const void *va) } /* Map a page of domheap memory */ -void *map_domain_page(unsigned long mfn) +void *map_domain_page(mfn_t mfn) { unsigned long flags; lpae_t *map = this_cpu(xen_dommap); -unsigned long slot_mfn = mfn ~LPAE_ENTRY_MASK; +unsigned long slot_mfn = mfn_x(mfn) ~LPAE_ENTRY_MASK; vaddr_t va; lpae_t pte; int i, slot; @@ -339,7 +339,7 @@ void *map_domain_page(unsigned long mfn) va = (DOMHEAP_VIRT_START + (slot SECOND_SHIFT) - + ((mfn LPAE_ENTRY_MASK) THIRD_SHIFT)); +
Re: [Xen-devel] Sharing display between guests
On Thu, 2015-07-02 at 13:13 +0200, Maxime Ripard wrote: Hi, Hi, I don't have much comments on all this, just that, reading it, it sounded somewhat similar to what GL is doing: http://www.xenproject.org/component/allvideoshare/video/latest/ces15-globalogic.html http://www.xenproject.org/component/allvideoshare/video/globallogic-xen-android.html That would mean that we would have a static composition, that would be setup once and we could forget about it during the life of the system. Well, they use a DomU and Dom0, rather than 2 DomUs (or so I think), but it seems to me that at least some composition is indeed happening. I don't have much more details, I'm afraid, but if you think your use case is related, or at least that you could benefit from interacting with them and sharing experiences/code/whatever, just ask, I've Cc-ed some of them. :-) Regards, Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
[Xen-devel] [PATCH 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
MAX_NR_IO_RANGES is used by the ioreq server as the maximum number of discrete ranges to be tracked. This patch changes its value to 8k, so that more ranges can be tracked on the next generation of Intel platforms in XenGT. Future patches can extend the limit to be toolstack tunable, and MAX_NR_IO_RANGES can serve as a default limit. Signed-off-by: Yu Zhang yu.c.zh...@linux.intel.com --- xen/include/asm-x86/hvm/domain.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h index ad68fcf..d62fda9 100644 --- a/xen/include/asm-x86/hvm/domain.h +++ b/xen/include/asm-x86/hvm/domain.h @@ -49,7 +49,7 @@ struct hvm_ioreq_vcpu { }; #define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1) -#define MAX_NR_IO_RANGES 256 +#define MAX_NR_IO_RANGES 8192 struct hvm_ioreq_server { struct list_head list_entry; -- 1.9.1
[Xen-devel] [PATCH 2/2] Add new data structure to track ranges.
This patch introduces a new data structure, struct rb_rangeset, to represent a group of continuous ranges, e.g. the start and end addresses for PIO/MMIO regions. By now, this structure is supposed to assist ioreq server to forward the I/O request to backend device models more efficiently. Behavior of this new data structure is quite similar to rangeset, with major difference being the time complexity. Based on doubly linked list, struct rangeset provides O(n) time complexity for searching. And struct rb_rangeset is based on red-black tree, with binary searching, the time complexity is improved to O(log(n)) - more suitable to track massive discrete ranges. Ioreq server code is changed to utilize this new type, and a new routine, hvm_ioreq_server_dump_range_info, is added to dump all the ranges tracked in an ioreq server. Signed-off-by: Yu Zhang yu.c.zh...@linux.intel.com --- xen/arch/x86/domain.c| 3 + xen/arch/x86/hvm/hvm.c | 56 ++-- xen/common/Makefile | 1 + xen/common/rb_rangeset.c | 281 +++ xen/include/asm-x86/hvm/domain.h | 2 +- xen/include/asm-x86/hvm/hvm.h| 1 + xen/include/xen/rb_rangeset.h| 49 +++ 7 files changed, 378 insertions(+), 15 deletions(-) create mode 100644 xen/common/rb_rangeset.c create mode 100644 xen/include/xen/rb_rangeset.h diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index a8fe046..f8a8b80 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -2086,6 +2086,9 @@ int domain_relinquish_resources(struct domain *d) void arch_dump_domain_info(struct domain *d) { paging_dump_domain_info(d); + +if ( is_hvm_domain(d) ) +hvm_ioreq_server_dump_range_info(d); } void arch_dump_vcpu_info(struct vcpu *v) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 535d622..c79676e 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -37,6 +37,7 @@ #include xen/wait.h #include xen/mem_access.h #include xen/rangeset.h +#include xen/rb_rangeset.h #include asm/shadow.h #include asm/hap.h #include asm/current.h @@ 
-818,7 +819,7 @@ static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s, return; for ( i = 0; i NR_IO_RANGE_TYPES; i++ ) -rangeset_destroy(s-range[i]); +rb_rangeset_destroy(s-range[i]); } static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, @@ -842,8 +843,7 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, if ( rc ) goto fail; -s-range[i] = rangeset_new(s-domain, name, - RANGESETF_prettyprint_hex); +s-range[i] = rb_rangeset_new(name); xfree(name); @@ -851,7 +851,7 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, if ( !s-range[i] ) goto fail; -rangeset_limit(s-range[i], MAX_NR_IO_RANGES); +rb_rangeset_limit(s-range[i], MAX_NR_IO_RANGES); } done: @@ -1149,7 +1149,7 @@ static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, if ( s-id == id ) { -struct rangeset *r; +struct rb_rangeset *r; switch ( type ) { @@ -1169,10 +1169,10 @@ static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, break; rc = -EEXIST; -if ( rangeset_overlaps_range(r, start, end) ) +if ( rb_rangeset_overlaps_range(r, start, end) ) break; -rc = rangeset_add_range(r, start, end); +rc = rb_rangeset_add_range(r, start, end); break; } } @@ -1200,7 +1200,7 @@ static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, if ( s-id == id ) { -struct rangeset *r; +struct rb_rangeset *r; switch ( type ) { @@ -1220,10 +1220,10 @@ static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, break; rc = -ENOENT; -if ( !rangeset_contains_range(r, start, end) ) +if ( !rb_rangeset_contains_range(r, start, end) ) break; -rc = rangeset_remove_range(r, start, end); +rc = rb_rangeset_remove_range(r, start, end); break; } } @@ -1349,6 +1349,34 @@ static void hvm_destroy_all_ioreq_servers(struct domain *d) spin_unlock(d-arch.hvm_domain.ioreq_server.lock); } +void hvm_ioreq_server_dump_range_info(struct domain *d) +{ +unsigned int i; +struct hvm_ioreq_server 
*s; + +spin_lock(d-arch.hvm_domain.ioreq_server.lock); + +list_for_each_entry ( s, + d-arch.hvm_domain.ioreq_server.list, + list_entry ) +{ +if ( s ==
[Xen-devel] [PATCH 0/2] Refactor ioreq server for better performance.
XenGT leverages the ioreq server to track and forward accesses to GPU I/O resources, e.g. the PPGTT (per-process graphic translation tables). Currently, the ioreq server uses a rangeset to track the BDF/PIO/MMIO ranges to be emulated. To select an ioreq server, the rangeset is searched to see if the I/O range is recorded. However, traversing the linked list inside the rangeset can be time consuming when the number of ranges is high. On the HSW platform, the number of PPGTTs for each vGPU could be several hundred. On BDW, this value could be several thousand. To accommodate more ranges, the limit on the number of ranges in an ioreq server, MAX_NR_IO_RANGES, is changed - future patches will be provided to tune this with other approaches. And to increase ioreq server performance, a new data structure, rb_rangeset, is introduced. Yu Zhang (2): Resize the MAX_NR_IO_RANGES for ioreq server Add new data structure to track ranges. xen/arch/x86/domain.c| 3 + xen/arch/x86/hvm/hvm.c | 56 ++-- xen/common/Makefile | 1 + xen/common/rb_rangeset.c | 281 +++ xen/include/asm-x86/hvm/domain.h | 4 +- xen/include/asm-x86/hvm/hvm.h| 1 + xen/include/xen/rb_rangeset.h| 49 +++ 7 files changed, 379 insertions(+), 16 deletions(-) create mode 100644 xen/common/rb_rangeset.c create mode 100644 xen/include/xen/rb_rangeset.h -- 1.9.1
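The performance argument in this cover letter is that range containment queries over sorted, non-overlapping ranges can be answered in O(log n) rather than by an O(n) list walk. The sketch below illustrates that idea with a binary search over a sorted array of ranges; it is a simplified stand-in, not the red-black-tree-based rb_rangeset the patches actually add (struct range and ranges_contain() are invented names for illustration).

```c
#include <stddef.h>
#include <stdint.h>

/* A range set kept sorted and non-overlapping, like the ranges an
 * ioreq server tracks for PIO/MMIO emulation. */
struct range { uint64_t start, end; };   /* inclusive bounds */

/* O(log n) containment test by binary search, versus the O(n) walk of
 * the linked-list based rangeset. */
static int ranges_contain(const struct range *r, size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;

    while ( lo < hi )
    {
        size_t mid = lo + (hi - lo) / 2;

        if ( addr < r[mid].start )
            hi = mid;                    /* search left half */
        else if ( addr > r[mid].end )
            lo = mid + 1;                /* search right half */
        else
            return 1;                    /* addr inside r[mid] */
    }
    return 0;
}
```

With several thousand PPGTT ranges per vGPU (the BDW case mentioned above), the difference between ~12 comparisons and a walk of thousands of list nodes on every forwarded I/O access is the motivation for the new structure.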
[Xen-devel] [PATCH v5] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access
'pfn' and 'start_pfn' are ambiguous; both these functions expect GFNs as input. On x86 the interface of p2m_set_mem_access() in p2m.c doesn't match the declaration in p2m-common.h as 'pfn' is being used instead of 'start_pfn'. On ARM both p2m_set_mem_access and p2m_get_mem_access interfaces don't match declarations from p2m-common.h: p2m_set_mem_access uses 'pfn' instead of 'start_pfn' and p2m_get_mem_access uses 'gpfn' instead of 'pfn'. Convert p2m_get_mem_access/p2m_set_mem_access (and __p2m_get_mem_access on ARM) interfaces to using gfn_t instead of unsigned long and update all users of these functions. There is also an issue in p2m_get_mem_access on x86: the 'gfn' parameter passed to gfn_lock/gfn_unlock is not defined. This code compiles only because of a coincidence: gfn_lock/gfn_unlock are currently macros which don't use their second argument. Signed-off-by: Vitaly Kuznetsov vkuzn...@redhat.com --- Changes since v4: - In p2m_set_mem_access use a local unsigned long gfn_l for looping instead of boxing/unboxing [George Dunlap] Changes since v3: - Comment coding style fix [Razvan Cojocaru] - Use INVALID_GFN instead of ~0 and -1 [Andrew Cooper] - Convert p2m_get_mem_access/p2m_set_mem_access interfaces to using gfn_t [Andrew Cooper] Changes since v2: - Instead of adding start_ prefix on ARM remove it on x86 [Jan Beulich, Ian Campbell, Razvan Cojocaru] Changes since v1: - This patch is a successor of '[PATCH] x86/mm: use existing 'pfn' in p2m_get_mem_access', instead of fixing gfn_lock/gfn_unlock arguments we do s/pfn/gfn/g for both p2m_get_mem_access/p2m_set_mem_access [Andrew Cooper, Jan Beulich] P.S. - The patch was compile-tested on x86 and ARM64.
--- xen/arch/arm/p2m.c | 33 + xen/arch/x86/mm/p2m.c| 36 xen/common/mem_access.c | 4 ++-- xen/include/xen/p2m-common.h | 13 ++--- 4 files changed, 45 insertions(+), 41 deletions(-) diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index 903fa3f..6b9ef33 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -436,7 +436,7 @@ static int p2m_create_table(struct domain *d, lpae_t *entry, return 0; } -static int __p2m_get_mem_access(struct domain *d, unsigned long gpfn, +static int __p2m_get_mem_access(struct domain *d, gfn_t gfn, xenmem_access_t *access) { struct p2m_domain *p2m = p2m_get_hostp2m(d); @@ -465,14 +465,14 @@ static int __p2m_get_mem_access(struct domain *d, unsigned long gpfn, return 0; } -/* If request to get default access */ -if ( gpfn == ~0ul ) +/* If request to get default access. */ +if ( gfn_x(gfn) == INVALID_GFN ) { *access = memaccess[p2m->default_access]; return 0; } -i = radix_tree_lookup(&p2m->mem_access_settings, gpfn); +i = radix_tree_lookup(&p2m->mem_access_settings, gfn_x(gfn)); if ( !i ) { @@ -480,7 +480,7 @@ static int __p2m_get_mem_access(struct domain *d, unsigned long gpfn, * No setting was found in the Radix tree. Check if the * entry exists in the page-tables. */ -paddr_t maddr = p2m_lookup(d, gpfn << PAGE_SHIFT, NULL); +paddr_t maddr = p2m_lookup(d, gfn_x(gfn) << PAGE_SHIFT, NULL); if ( INVALID_PADDR == maddr ) return -ESRCH; @@ -1386,7 +1386,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag) * We do this first as this is faster in the default case when no * permission is set on the page. */ -rc = __p2m_get_mem_access(current->domain, paddr_to_pfn(ipa), &xma); +rc = __p2m_get_mem_access(current->domain, _gfn(paddr_to_pfn(ipa)), &xma); if ( rc < 0 ) goto err; @@ -1590,7 +1590,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, vaddr_t gla, const struct npfec npfec) if ( !p2m->mem_access_enabled ) return true; -rc = p2m_get_mem_access(v->domain, paddr_to_pfn(gpa), &xma); +rc = p2m_get_mem_access(v->domain, _gfn(paddr_to_pfn(gpa)), &xma); if ( rc ) return true; @@ -1632,13 +1632,13 @@ bool_t p2m_mem_access_check(paddr_t gpa, vaddr_t gla, const struct npfec npfec) /* First, handle rx2rw and n2rwx conversion automatically. */ if ( npfec.write_access && xma == XENMEM_access_rx2rw ) { -rc = p2m_set_mem_access(v->domain, paddr_to_pfn(gpa), 1, +rc = p2m_set_mem_access(v->domain, _gfn(paddr_to_pfn(gpa)), 1, 0, ~0, XENMEM_access_rw); return false; } else if ( xma == XENMEM_access_n2rwx ) { -rc = p2m_set_mem_access(v->domain, paddr_to_pfn(gpa), 1, +rc = p2m_set_mem_access(v->domain, _gfn(paddr_to_pfn(gpa)), 1, 0, ~0, XENMEM_access_rwx); } @@ -1660,7 +1660,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, vaddr_t gla, const struct npfec npfec) { /* A listener is not
Re: [Xen-devel] [RFC PATCH v3 08/18] xen/arm: vITS: Add virtual ITS driver
Hi Julien, On Tue, Jun 23, 2015 at 10:09 PM, Julien Grall julien.gr...@citrix.com wrote: Hi Vijay, +struct vits_device { +uint32_t vdevid; +uint32_t pdevid; +struct its_device *its_dev; +struct rb_node node; +}; We spoke about a specific structure in the design [2] but you introduced a new one. Why? Section 6 of DraftG specifies managing a separate tree for device assignment. This lets us keep a per-domain RB-tree holding the list of devices assigned to that domain, indexed by vdevid, so we can check whether a device is assigned to the domain before processing any ITS command with that vdevid. Having everything in the its_device would help to catch a device attached to 2 different domains... One option is to introduce a new variable inside its_device to record which domain the device is currently assigned to. Also, the field pdevid is not vits specific but its. pdevid can be removed, as the its_device structure already has it. Regards Vijay ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] Mapping Dom0 page in hypervisor from domctl
Hi all, I'm trying to map a page provided by Dom0 into Xen's address space, but I keep freezing the physical computer when I try. Maybe I'm hitting a spinlock or something, but clearly I'm doing something incorrectly. I'll probably be able to get farther once I have another computer with a serial port to get logging, but hopefully it's something obvious. Dom0 sends a domctl containing a virtual address. I attempt to translate it to a GFN, then get the page_info structure, and finally map it. Source code: /* Unmap any existing mapping */ if ( d->arch.hvm_domain.dom0_mapping.page_info ) { unmap_domain_page(d->arch.hvm_domain.dom0_mapping.buffer); put_page(d->arch.hvm_domain.dom0_mapping.page_info); d->arch.hvm_domain.dom0_mapping.buffer = NULL; d->arch.hvm_domain.dom0_mapping.page_info = NULL; } if ( domctl->va ) { struct page_info* page_info; uint64_t gfn; uint32_t pfec = 0; gdprintk(XENLOG_DEBUG, "Using guest VA 0x%lX\n", domctl->va); /* Validate that the given VA is page-aligned */ if ((domctl->va & ~PAGE_MASK) != 0) { gdprintk(XENLOG_DEBUG, "VA is not page aligned\n"); ret = -EINVAL; break; } /* Translate the given virtual address to the guest frame number */ gfn = paging_gva_to_gfn(current, domctl->va, &pfec); if ( gfn == INVALID_GFN ) { gdprintk(XENLOG_DEBUG, "Invalid GFN\n"); ret = -EFAULT; break; } gdprintk(XENLOG_DEBUG, "Translated to gfn 0x%lX\n", gfn); /* Get the page info */ page_info = get_page_from_gfn(current->domain, gfn, NULL, P2M_UNSHARE); if ( !page_info ) { gdprintk(XENLOG_DEBUG, "No page info\n"); ret = -ESRCH; break; } gdprintk(XENLOG_DEBUG, "Translated to mfn 0x%lX\n", page_to_mfn(page_info)); d->arch.hvm_domain.dom0_mapping.page_info = page_info; d->arch.hvm_domain.dom0_mapping.buffer = (uint8_t*)map_domain_page_global(page_to_mfn(page_info)); if ( !d->arch.hvm_domain.dom0_mapping.buffer ) { gdprintk(XENLOG_DEBUG, "Map failed\n"); put_page(page_info); ret = -ENOMEM; break; } gdprintk(XENLOG_DEBUG, "All Mapped: %p\n", d->arch.hvm_domain.dom0_mapping.buffer); } Any insight into what I'm doing wrong would be appreciated! ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/7] libxc: Fix a number of coverity issues.
On 01/07/15 18:37, Jennifer Herbert wrote: Fix a number of coverity issues in libxc. Patches 2 through 7: Reviewed-by: Andrew Cooper andrew.coop...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH RFC 6/6] xen: Add backtrace for serious issues.
On Wed, Jul 01, 2015 at 02:06:30PM +0100, Stefano Stabellini wrote: On Mon, 29 Jun 2015, Konrad Rzeszutek Wilk wrote: When debugging issues that caused the emulator to kill itself or to skip certain operations (unable to write to host registers), a stack trace will most definitely aid in debugging the problem. As such this patch uses the most basic backtrace to print out details. Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com I think it could be useful, but it cannot be done as a xen-hvm.c thing. It should be somewhere generic, maybe under util? Stefan, any suggestions? Yes, it seems like a util/ thing. backtrace() and backtrace_symbols_fd() are glibc-specific so it must not break the build on other platforms. I think the reason we've survived without backtraces so far is because fatal errors are typically handled with abort(3). It causes a core dump so you have the full process state, including backtraces. I'm fine with adding a backtrace function though, since it's more lightweight and allows for graceful shutdown or error recovery. pgpXBsKw5J3DS.pgp Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 2/6] x86/xsaves: enable xsaves/xrstors in xen
This patch uses xsaves/xrstors instead of xsaveopt/xrstor when perform task switch in xen if the feature is supported in hardware. Please note that xsaves/xrstors only use compact format. Signed-off-by: Shuai Ruan shuai.r...@intel.com --- xen/arch/x86/xstate.c| 83 xen/include/asm-x86/xstate.h | 3 +- 2 files changed, 55 insertions(+), 31 deletions(-) diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c index e34eda3..ff67986 100644 --- a/xen/arch/x86/xstate.c +++ b/xen/arch/x86/xstate.c @@ -14,6 +14,7 @@ #include asm/xstate.h #include asm/asm_defns.h +#define XSTATE_COMPACTION_ENABLED (1ULL 63) static bool_t __read_mostly cpu_has_xsaveopt; static bool_t __read_mostly cpu_has_xsavec; bool_t __read_mostly cpu_has_xgetbv1; @@ -102,7 +103,9 @@ void xsave(struct vcpu *v, uint64_t mask) typeof(ptr-fpu_sse.fip.sel) fcs = ptr-fpu_sse.fip.sel; typeof(ptr-fpu_sse.fdp.sel) fds = ptr-fpu_sse.fdp.sel; -if ( cpu_has_xsaveopt ) +if ( cpu_has_xsaves ) +xsaves(lmask, hmask, ptr); +else if ( cpu_has_xsaveopt ) { /* * xsaveopt may not write the FPU portion even when the respective @@ -155,7 +158,9 @@ void xsave(struct vcpu *v, uint64_t mask) } else { -if ( cpu_has_xsaveopt ) +if ( cpu_has_xsaves ) +xsaves(lmask, hmask, ptr); +else if ( cpu_has_xsaveopt ) asm volatile ( .byte 0x0f,0xae,0x37 : =m (*ptr) : a (lmask), d (hmask), D (ptr) ); @@ -198,36 +203,54 @@ void xrstor(struct vcpu *v, uint64_t mask) switch ( __builtin_expect(ptr-fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) ) { default: -asm volatile ( 1: .byte 0x48,0x0f,0xae,0x2f\n - .section .fixup,\ax\ \n - 2: mov %5,%%ecx \n - xor %1,%1\n - rep stosb\n - lea %2,%0\n - mov %3,%1\n - jmp 1b \n - .previous \n - _ASM_EXTABLE(1b, 2b) - : +D (ptr), +a (lmask) - : m (*ptr), g (lmask), d (hmask), - m (xsave_cntxt_size) - : ecx ); +if ( cpu_has_xsaves ) +{ +if ( !(v-arch.xsave_area-xsave_hdr.xcomp_bv + XSTATE_COMPACTION_ENABLED) ) +v-arch.xsave_area-xsave_hdr.xcomp_bv = get_xcr0() | +XSTATE_COMPACTION_ENABLED; +xrstors(lmask, hmask, ptr); +} 
+else +asm volatile ( 1: .byte 0x48,0x0f,0xae,0x2f\n + .section .fixup,\ax\ \n + 2: mov %5,%%ecx \n + xor %1,%1\n + rep stosb\n + lea %2,%0\n + mov %3,%1\n + jmp 1b \n + .previous \n + _ASM_EXTABLE(1b, 2b) + : +D (ptr), +a (lmask) + : m (*ptr), g (lmask), d (hmask), + m (xsave_cntxt_size) + : ecx ); break; case 4: case 2: -asm volatile ( 1: .byte 0x0f,0xae,0x2f\n - .section .fixup,\ax\ \n - 2: mov %5,%%ecx\n - xor %1,%1 \n - rep stosb \n - lea %2,%0 \n - mov %3,%1 \n - jmp 1b \n - .previous \n - _ASM_EXTABLE(1b, 2b) - : +D (ptr), +a (lmask) - : m (*ptr), g (lmask), d (hmask), - m (xsave_cntxt_size) - : ecx ); +if ( cpu_has_xsaves ) +{ +if ( !(v-arch.xsave_area-xsave_hdr.xcomp_bv + XSTATE_COMPACTION_ENABLED) ) +v-arch.xsave_area-xsave_hdr.xcomp_bv = get_xcr0() | +XSTATE_COMPACTION_ENABLED; +xrstors(lmask, hmask, ptr); +} +else +asm volatile ( 1: .byte 0x48,0x0f,0xae,0x2f\n + .section .fixup,\ax\ \n + 2: mov %5,%%ecx \n + xor %1,%1\n + rep stosb\n +
[Xen-devel] [PATCH 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest
This patch enables xsaves for hvm guest, includes: 1.handle xsaves vmcs init and vmexit. 2.add logic to write/read the XSS msr. Signed-off-by: Shuai Ruan shuai.r...@intel.com --- xen/arch/x86/hvm/hvm.c | 40 ++ xen/arch/x86/hvm/vmx/vmcs.c| 7 ++- xen/arch/x86/hvm/vmx/vmx.c | 18 + xen/arch/x86/xstate.c | 4 ++-- xen/include/asm-x86/hvm/vmx/vmcs.h | 5 + xen/include/asm-x86/hvm/vmx/vmx.h | 2 ++ xen/include/asm-x86/xstate.h | 1 + 7 files changed, 74 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 535d622..2958e0d 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4269,6 +4269,10 @@ void hvm_hypervisor_cpuid_leaf(uint32_t sub_idx, } } +#define XSAVEOPT (1 0) +#define XSAVEC (1 1) +#define XGETBV (1 2) +#define XSAVES (1 3) void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx) { @@ -4355,6 +4359,34 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, *ebx = _eax + _ebx; } } +if ( count == 1 ) +{ +if ( cpu_has_xsaves ) +{ +*ebx = XSTATE_AREA_MIN_SIZE; +if ( v-arch.xcr0 | v-arch.msr_ia32_xss ) +for ( sub_leaf = 2; sub_leaf 63; sub_leaf++ ) +{ +if ( !((v-arch.xcr0 | v-arch.msr_ia32_xss) + (1ULL sub_leaf)) ) +continue; +domain_cpuid(d, input, sub_leaf, _eax, _ebx, _ecx, + _edx); +*ebx = *ebx + _eax; +} +} +else +{ +*eax = ~XSAVES; +if ( !cpu_has_xgetbv1 ) +*eax = ~XGETBV; +if ( !cpu_has_xsavec ) +*eax = ~XSAVEC; +if ( !cpu_has_xsaveopt ) +*eax = ~XSAVEOPT; +*ebx = *ecx = *edx = 0; +} +} break; case 0x8001: @@ -4454,6 +4486,10 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content) *msr_content = v-arch.hvm_vcpu.guest_efer; break; +case MSR_IA32_XSS: +*msr_content = v-arch.msr_ia32_xss; +break; + case MSR_IA32_TSC: *msr_content = _hvm_rdtsc_intercept(); break; @@ -4573,6 +4609,10 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content) return X86EMUL_EXCEPTION; break; +case MSR_IA32_XSS: 
+v-arch.msr_ia32_xss = msr_content; +break; + case MSR_IA32_TSC: hvm_set_guest_tsc(v, msr_content); break; diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index 4c5ceb5..8e61e3f 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -230,7 +230,8 @@ static int vmx_init_vmcs_config(void) SECONDARY_EXEC_ENABLE_EPT | SECONDARY_EXEC_ENABLE_RDTSCP | SECONDARY_EXEC_PAUSE_LOOP_EXITING | - SECONDARY_EXEC_ENABLE_INVPCID); + SECONDARY_EXEC_ENABLE_INVPCID | + SECONDARY_EXEC_XSAVES); rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap); if ( _vmx_misc_cap VMX_MISC_VMWRITE_ALL ) opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING; @@ -921,6 +922,7 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val) virtual_vmcs_exit(vvmcs); } +#define VMX_XSS_EXIT_BITMAP 0 static int construct_vmcs(struct vcpu *v) { struct domain *d = v-domain; @@ -1204,6 +1206,9 @@ static int construct_vmcs(struct vcpu *v) __vmwrite(GUEST_PAT, guest_pat); } +if ( cpu_has_vmx_xsaves ) +__vmwrite(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP); + vmx_vmcs_exit(v); /* PVH: paging mode is updated by arch_set_info_guest(). */ diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index fc29b89..7c950b3 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2683,6 +2683,16 @@ static int vmx_handle_apic_write(void) return vlapic_apicv_write(current, exit_qualification 0xfff); } +static void vmx_handle_xsaves(void) +{ +WARN(); +} + +static void vmx_handle_xrstors(void) +{ +WARN(); +} + void vmx_vmexit_handler(struct cpu_user_regs *regs) { unsigned long exit_qualification, exit_reason, idtv_info, intr_info = 0; @@ -3201,6 +3211,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) vmx_vcpu_flush_pml_buffer(v); break; +case EXIT_REASON_XSAVES: +vmx_handle_xsaves(); +break; + +case EXIT_REASON_XRSTORS: +
[Xen-devel] [PATCH 6/6] x86/xsaves: detect xsaves/xgetbv in xen
Now that xsaves/xrstors and xgetbv1 support is in place, switch on their detection. Signed-off-by: Shuai Ruan shuai.r...@intel.com --- xen/arch/x86/xstate.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c index c20f865..ebf9920 100644 --- a/xen/arch/x86/xstate.c +++ b/xen/arch/x86/xstate.c @@ -525,15 +525,15 @@ void xstate_init(bool_t bsp) { cpu_has_xsaveopt = !!(eax & XSTATE_FEATURE_XSAVEOPT); cpu_has_xsavec = !!(eax & XSTATE_FEATURE_XSAVEC); -/* XXX cpu_has_xgetbv1 = !!(eax & XSTATE_FEATURE_XGETBV1); */ -/* XXX cpu_has_xsaves = !!(eax & XSTATE_FEATURE_XSAVES); */ +cpu_has_xgetbv1 = !!(eax & XSTATE_FEATURE_XGETBV1); +cpu_has_xsaves = !!(eax & XSTATE_FEATURE_XSAVES); } else { BUG_ON(!cpu_has_xsaveopt != !(eax & XSTATE_FEATURE_XSAVEOPT)); BUG_ON(!cpu_has_xsavec != !(eax & XSTATE_FEATURE_XSAVEC)); -/* XXX BUG_ON(!cpu_has_xgetbv1 != !(eax & XSTATE_FEATURE_XGETBV1)); */ -/* XXX BUG_ON(!cpu_has_xsaves != !(eax & XSTATE_FEATURE_XSAVES)); */ +BUG_ON(!cpu_has_xgetbv1 != !(eax & XSTATE_FEATURE_XGETBV1)); +BUG_ON(!cpu_has_xsaves != !(eax & XSTATE_FEATURE_XSAVES)); } if ( cpu_has_xsaves ) -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 1/6] x86/xsaves: enable xsaves/xrstors for pv guest
This patch emualtes xsaves/xrstors instruction and XSS msr access. As xsaves/xrstors instructions and XSS msr access required be executed only in ring0. So emulation is needed when pv guest use these instructions. Signed-off-by: Shuai Ruan shuai.r...@intel.com --- xen/arch/x86/domain.c | 3 ++ xen/arch/x86/traps.c| 85 + xen/arch/x86/x86_64/mm.c| 52 + xen/arch/x86/xstate.c | 39 +++ xen/include/asm-x86/domain.h| 1 + xen/include/asm-x86/mm.h| 1 + xen/include/asm-x86/msr-index.h | 2 + xen/include/asm-x86/xstate.h| 3 ++ 8 files changed, 186 insertions(+) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index a8fe046..66f8231 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -426,6 +426,7 @@ int vcpu_initialise(struct vcpu *v) /* By default, do not emulate */ v-arch.vm_event.emulate_flags = 0; +v-arch.msr_ia32_xss = 0; rc = mapcache_vcpu_init(v); if ( rc ) @@ -1494,6 +1495,8 @@ static void __context_switch(void) if ( xcr0 != get_xcr0() !set_xcr0(xcr0) ) BUG(); } +if ( cpu_has_xsaves ) +wrmsr_safe(MSR_IA32_XSS, n-arch.msr_ia32_xss); vcpu_restore_fpu_eager(n); n-arch.ctxt_switch_to(n); } diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index ac62f20..227670b 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -2346,6 +2346,80 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) } break; +case 0xc7: +{ +void *xsave_addr; +int not_page_aligned = 0; +u32 guest_xsaves_size = xstate_ctxt_size_compact(v-arch.xcr0); + +switch ( insn_fetch(u8, code_base, eip, code_limit) ) +{ +case 0x2f:/* XSAVES */ +{ +if ( (regs-edi ~PAGE_MASK) + guest_xsaves_size PAGE_SIZE ) +{ +mfn_t mfn_list[2]; +void *va; + +not_page_aligned = 1; +mfn_list[0] = _mfn(do_page_walk_mfn(v, regs-edi)); +mfn_list[1] = _mfn(do_page_walk_mfn(v, + PAGE_ALIGN(regs-edi))); +va = __vmap(mfn_list, 1, 2, PAGE_SIZE, PAGE_HYPERVISOR); +ASSERT(((unsigned long) va ~PAGE_MASK) == 0); +xsave_addr = (void *)((unsigned long)va + + (regs-edi ~PAGE_MASK)); +} +else +xsave_addr 
= do_page_walk(v, regs-edi); + +if ( !xsave_addr ) +goto fail; + +xsaves(regs-eax, regs-edx, xsave_addr); + +if ( not_page_aligned ) +vunmap((void *)((unsigned long)xsave_addr PAGE_MASK)); +else +unmap_domain_page(xsave_addr); +break; +} +case 0x1f:/* XRSTORS */ +{ +if( (regs-edi ~PAGE_MASK) + guest_xsaves_size PAGE_SIZE ) +{ +mfn_t mfn_list[2]; +void *va; + +not_page_aligned = 1; +mfn_list[0] = _mfn(do_page_walk_mfn(v, regs-edi)); +mfn_list[1] = _mfn(do_page_walk_mfn(v, + PAGE_ALIGN(regs-edi))); +va = __vmap(mfn_list, 1, 2, PAGE_SIZE, PAGE_HYPERVISOR); +ASSERT(((unsigned long) va ~PAGE_MASK) == 0); +xsave_addr = (void *)((unsigned long)va + + (regs-edi ~PAGE_MASK)); +} +else +xsave_addr = do_page_walk(v, regs-edi); + +if ( !xsave_addr ) +goto fail; + +xrstors(regs-eax, regs-edx, xsave_addr); + +if ( not_page_aligned ) +vunmap((void *)((unsigned long)xsave_addr PAGE_MASK)); +else +unmap_domain_page(xsave_addr); +break; +} +default: +goto fail; +} +break; +} + case 0x06: /* CLTS */ (void)do_fpu_taskswitch(0); break; @@ -2638,6 +2712,12 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) wrmsrl(regs-_ecx, msr_content); break; +case MSR_IA32_XSS: +if ( wrmsr_safe(regs-ecx, msr_content) != 0 ) +goto fail; +v-arch.msr_ia32_xss = msr_content; +break; + default: if ( wrmsr_hypervisor_regs(regs-ecx, msr_content) == 1 ) break; @@ -2740,6 +2820,11 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
[Xen-devel] [PATCH 4/6] libxc: expose xsaves/xgetbv/xsavec to hvm guest
This patch exposes xsaves/xgetbv/xsavec to hvm guest. Signed-off-by: Shuai Ruan shuai.r...@intel.com --- tools/libxc/xc_cpuid_x86.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c index c97f91a..0ed8b68 100644 --- a/tools/libxc/xc_cpuid_x86.c +++ b/tools/libxc/xc_cpuid_x86.c @@ -211,6 +211,9 @@ static void intel_xc_cpuid_policy( } #define XSAVEOPT (1 << 0) +#define XSAVEC (1 << 1) +#define XGETBV (1 << 2) +#define XSAVES (1 << 3) /* Configure extended state enumeration leaves (0x0000000D for xsave) */ static void xc_cpuid_config_xsave( xc_interface *xch, domid_t domid, uint64_t xfeature_mask, @@ -247,8 +250,7 @@ static void xc_cpuid_config_xsave( regs[1] = 512 + 64; /* FP/SSE + XSAVE.HEADER */ break; case 1: /* leaf 1 */ -regs[0] = XSAVEOPT; -regs[1] = regs[2] = regs[3] = 0; +regs[0] = (XSAVEOPT | XSAVEC | XGETBV | XSAVES); break; case 2 ... 63: /* sub-leaves */ if ( !(xfeature_mask & (1ULL << input[1])) ) { regs[0] = regs[1] = regs[2] = regs[3] = 0; break; } -/* Don't touch EAX, EBX. Also cleanup ECX and EDX */ -regs[2] = regs[3] = 0; break; } } -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 0/6] add xsaves/xrstors support
This patchset enables the xsaves/xrstors feature. It includes three parts: 1. add xsaves/xrstors for xen. 2. add xsaves/xrstors for pv guests. 3. add xsaves/xrstors for hvm guests. Shuai Ruan (6): x86/xsaves: enable xsaves/xrstors for pv guest x86/xsaves: enable xsaves/xrstors in xen x86/xsaves: enable xsaves/xrstors for hvm guest libxc: expose xsaves/xgetbv/xsavec to hvm guest x86/xsaves: support compact format for hvm save/restore x86/xsaves: detect xsaves/xgetbv in xen tools/libxc/xc_cpuid_x86.c | 8 +- xen/arch/x86/domain.c | 3 + xen/arch/x86/hvm/hvm.c | 56 +++- xen/arch/x86/hvm/vmx/vmcs.c| 7 +- xen/arch/x86/hvm/vmx/vmx.c | 18 +++ xen/arch/x86/traps.c | 85 xen/arch/x86/x86_64/mm.c | 52 +++ xen/arch/x86/xstate.c | 271 - xen/include/asm-x86/domain.h | 1 + xen/include/asm-x86/hvm/vmx/vmcs.h | 5 + xen/include/asm-x86/hvm/vmx/vmx.h | 2 + xen/include/asm-x86/mm.h | 1 + xen/include/asm-x86/msr-index.h| 2 + xen/include/asm-x86/xstate.h | 13 +- 14 files changed, 477 insertions(+), 47 deletions(-) -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 05/16] x86/hvm: unify internal portio and mmio intercepts
-Original Message- From: Roger Pau Monné [mailto:roger@citrix.com] Sent: 02 July 2015 15:52 To: Paul Durrant; xen-de...@lists.xenproject.org Cc: Andrew Cooper; Keir (Xen.org); Jan Beulich Subject: Re: [Xen-devel] [PATCH v5 05/16] x86/hvm: unify internal portio and mmio intercepts El 30/06/15 a les 15.05, Paul Durrant ha escrit: [...] +void msixtbl_init(struct domain *d) +{ +register_mmio_handler(d, msixtbl_mmio_ops); Since you are adding an initialization function to vmsi I think msixtbl_list and msixtbl_list_lock should also be initialized here instead of hvm_domain_initialise. Yes, that sounds quite reasonable. Perhaps as a follow-up patch though, unless I need to post v6 of the series for some other reason. Paul Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 05/16] x86/hvm: unify internal portio and mmio intercepts
El 02/07/15 a les 17.02, Paul Durrant ha escrit: -Original Message- From: Roger Pau Monné [mailto:roger@citrix.com] Sent: 02 July 2015 15:52 To: Paul Durrant; xen-de...@lists.xenproject.org Cc: Andrew Cooper; Keir (Xen.org); Jan Beulich Subject: Re: [Xen-devel] [PATCH v5 05/16] x86/hvm: unify internal portio and mmio intercepts El 30/06/15 a les 15.05, Paul Durrant ha escrit: [...] +void msixtbl_init(struct domain *d) +{ +register_mmio_handler(d, msixtbl_mmio_ops); Since you are adding an initialization function to vmsi I think msixtbl_list and msixtbl_list_lock should also be initialized here instead of hvm_domain_initialise. Yes, that sounds quite reasonable. Perhaps as a follow-up patch though, unless I need to post v6 of the series for some other reason. That's fine, if not I can also add it as a pre-patch to my HVMlite stuff, it's trivial. Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 05/16] x86/hvm: unify internal portio and mmio intercepts
-Original Message- From: Roger Pau Monné [mailto:roger@citrix.com] Sent: 02 July 2015 16:12 To: Paul Durrant; xen-de...@lists.xenproject.org Cc: Andrew Cooper; Keir (Xen.org); Jan Beulich Subject: Re: [Xen-devel] [PATCH v5 05/16] x86/hvm: unify internal portio and mmio intercepts El 02/07/15 a les 17.02, Paul Durrant ha escrit: -Original Message- From: Roger Pau Monné [mailto:roger@citrix.com] Sent: 02 July 2015 15:52 To: Paul Durrant; xen-de...@lists.xenproject.org Cc: Andrew Cooper; Keir (Xen.org); Jan Beulich Subject: Re: [Xen-devel] [PATCH v5 05/16] x86/hvm: unify internal portio and mmio intercepts El 30/06/15 a les 15.05, Paul Durrant ha escrit: [...] +void msixtbl_init(struct domain *d) +{ +register_mmio_handler(d, msixtbl_mmio_ops); Since you are adding an initialization function to vmsi I think msixtbl_list and msixtbl_list_lock should also be initialized here instead of hvm_domain_initialise. Yes, that sounds quite reasonable. Perhaps as a follow-up patch though, unless I need to post v6 of the series for some other reason. That's fine, if not I can also add it as a pre-patch to my HVMlite stuff, it's trivial. Sure. Thanks, Paul Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 2/6] [WIP] libxl: xsrestrict QEMU
Check whether QEMU supports the xsrestrict option, by parsing its --help output. Store the result on xenstore for future reference on a per QEMU binary basis, so that device_model_override still works fine with it. Replace / with _ in the QEMU binary path before writing it to xenstore, so that it doesn't get confused with xenstore paths. If QEMU supports xsrestrict and emulator_id, pass xsrestrict=on to it. Statically reserve two emulator_ids, one for device models and another for pv qemus. Use the emulator_ids appropriately. WIP: direct use of fork is forbidden in libxl Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com --- Changes in v4: - update xenstore-paths.markdown Changes in v3: - add emulator_ids - mark as WIP --- docs/misc/xenstore-paths.markdown |8 + tools/libxl/libxl_dm.c| 72 + tools/libxl/libxl_internal.h |7 tools/libxl/libxl_utils.c | 10 ++ 4 files changed, 97 insertions(+) diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown index d94ea9d..780f601 100644 --- a/docs/misc/xenstore-paths.markdown +++ b/docs/misc/xenstore-paths.markdown @@ -397,6 +397,14 @@ The device model version for a domain. ifb device used by Remus to buffer network output from the associated vif. + ~/libxl/$DEVICE_MODEL_BINARY/* [n,INTERNAL] + +Contains a list of options supported by the device model, in the form: +$OPTION = (1|0). +$DEVICE_MODEL_BINARY is the full path to the device model binary with +'/' replaced by '_'. So for example /usr/lib/xen/bin/qemu-system-i386 +would be /libxl/_usr_lib_xen_bin_qemu-system-i386. 
+ [BLKIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html [FBIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,fbif.h.html [HVMPARAMS]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,hvm,params.h.html diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 7038e5c..d5f230a 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -447,6 +447,65 @@ retry: return 0; } +int libxl__check_qemu_supported(libxl__gc *gc, const char *dm, char *opt) +{ +libxl_ctx *ctx = libxl__gc_owner(gc); +pid_t pid; +int pipefd[2], status; +FILE *fp; +char *buf; +ssize_t buf_size = 512; +int ret = 0; +char *s; + +s = libxl__strdup(gc, dm); +libxl__replace_chr(gc, s, '/', '_'); +s = libxl__sprintf(gc, libxl/%s/%s, s, opt); +buf = libxl__xs_read(gc, XBT_NULL, s); +if (buf != NULL) +return !strcmp(buf, 1); + +if (access(dm, X_OK) 0) { +LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, + device model %s is not executable, dm); +return ERROR_FAIL; +} + +if (libxl_pipe(ctx, pipefd) 0) +return ERROR_FAIL; + +pid = fork(); +if (pid 0) +return ERROR_FAIL; + +/* child spawn QEMU */ +if (!pid) { +char *args[] = {(char*)dm, --help, NULL}; +close(pipefd[0]); +libxl__exec(gc, -1, pipefd[1], pipefd[1], dm, args, NULL); +exit(1); +} + +/* parent parses the output */ +close(pipefd[1]); +fp = fdopen(pipefd[0], r); +buf = libxl__malloc(gc, buf_size); +while (fgets(buf, buf_size, fp) != NULL) { +if (strstr(buf, opt) != NULL) { +ret = 1; +goto out; +} +} +out: +close(pipefd[0]); +waitpid(pid, status, pid); +libxl_report_child_exitstatus(ctx, XTL_WARN, dm, pid, status); + +ret = libxl__xs_write(gc, XBT_NULL, s, %d, ret); + +return ret; +} + static char ** libxl__build_device_model_args_new(libxl__gc *gc, const char *dm, int guest_domid, const libxl_domain_config *guest_config, @@ -934,6 +993,14 @@ end_search: if (user) { flexarray_append(dm_args, -runas); flexarray_append(dm_args, user); +if (libxl__check_qemu_supported(gc, dm, 
xsrestrict) +libxl__check_qemu_supported(gc, dm, emulator_id)) { +flexarray_append(dm_args, -xenopts); +flexarray_append(dm_args, +GCSPRINTF(xsrestrict=on,emulator_id=%u, +(b_info-type == LIBXL_DOMAIN_TYPE_PV) ? +QEMU_XEN_PV_ID : QEMU_XEN_DEVICE_MODEL_ID)); +} } } flexarray_append(dm_args, NULL); @@ -1660,6 +1727,11 @@ void libxl__spawn_qdisk_backend(libxl__egc *egc, libxl__dm_spawn_state *dmss) flexarray_vappend(dm_args, -monitor, /dev/null, NULL); flexarray_vappend(dm_args, -serial, /dev/null, NULL); flexarray_vappend(dm_args, -parallel, /dev/null, NULL); +if (libxl__check_qemu_supported(gc, dm, emulator_id)) { +flexarray_append(dm_args, -xenopts); +flexarray_append(dm_args, +
[Xen-devel] [PATCH v4 5/6] libxl: change qdisk-backend-pid path on xenstore
Change the qdisk-backend-pid path on xenstore from libxl/$DOMID/qdisk-backend-pid to /local/domain/$DOMID/image/pvqemu-pid to be more similar to the device-model path. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com --- Changes in v4: - update xenstore-paths.markdown --- docs/misc/xenstore-paths.markdown |9 + tools/libxl/libxl_dm.c|4 ++-- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown index e6ed25f..7f87f74 100644 --- a/docs/misc/xenstore-paths.markdown +++ b/docs/misc/xenstore-paths.markdown @@ -115,6 +115,11 @@ The domain's own ID. The process ID of the device model associated with this domain, if it has one. + ~/image/pvqemu-pid = INTEGER [INTERNAL] + +The process ID of the userspace PV backends daemon associated with this +domain, if it has one. + ~/device-model = STRING [n,INTERNAL] The full device model binary path. @@ -360,10 +365,6 @@ A domain writable path. Available for arbitrary domain use. Contains the status of the device models running on the domain. - ~/libxl/$DOMID/qdisk-backend-pid [w] - -Contains the PIDs of the device models running on the domain. - ## Virtual Machine Paths The /vm/$UUID namespace is used by toolstacks to store various diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 90f2c97..3a4625a 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -1774,7 +1774,7 @@ void libxl__spawn_qdisk_backend(libxl__egc *egc, libxl__dm_spawn_state *dmss) * because we will call this from unprivileged driver domains, * so save it in the current domain libxl private dir. */ -dmss->spawn.pidpath = GCSPRINTF("libxl/%u/qdisk-backend-pid", domid); +dmss->spawn.pidpath = GCSPRINTF("/local/domain/%d/image/pvqemu-pid", domid); dmss->spawn.midproc_cb = libxl__spawn_record_pid; dmss->spawn.confirm_cb = device_model_confirm; dmss->spawn.failure_cb = device_model_startup_failed; @@ -1834,7 +1834,7 @@ int libxl__destroy_qdisk_backend(libxl__gc *gc, uint32_t domid) char *pid_path; int rc; -pid_path = GCSPRINTF("libxl/%u/qdisk-backend-pid", domid); +pid_path = GCSPRINTF("/local/domain/%d/image/pvqemu-pid", domid); rc = kill_device_model(gc, pid_path); if (rc) -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 4/6] libxl: change xs path for QEMU
Change the QEMU xenstore watch path to /local/domain/$LIBXL_TOOLSTACK_DOMID/device-model/$DOMID/$EMULATOR_ID. Currently two emulator_ids are statically allocated: one for device models and one for pv qemus. Add a parameter to libxl__device_model_xs_path to distinguish the device model from the pv backends provider. Store the device model binary path under /local/domain/$DOMID/device-model on xenstore, so that we can fetch it later and retrieve the list of supported options from /local/domain/$LIBXL_TOOLSTACK_DOMID/libxl/$device_model_binary, introduce in the previous path. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com --- Changes in v4: - update xenstore-paths.markdown Changes in v3: - use LIBXL_TOOLSTACK_DOMID instead of 0 in the commit message - change xs path to include the emulator_id --- docs/misc/xenstore-paths.markdown | 16 +++- tools/libxl/libxl.c |2 +- tools/libxl/libxl_create.c|3 +++ tools/libxl/libxl_device.c|2 +- tools/libxl/libxl_dm.c| 18 ++ tools/libxl/libxl_dom.c | 12 ++-- tools/libxl/libxl_internal.c | 19 +++ tools/libxl/libxl_internal.h |5 ++--- tools/libxl/libxl_pci.c | 14 +++--- 9 files changed, 56 insertions(+), 35 deletions(-) diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown index 99e038b..e6ed25f 100644 --- a/docs/misc/xenstore-paths.markdown +++ b/docs/misc/xenstore-paths.markdown @@ -115,6 +115,10 @@ The domain's own ID. The process ID of the device model associated with this domain, if it has one. + ~/device-model = STRING [n,INTERNAL] + +The full device model binary path. + ~/cpu/[0-9]+/availability = (online|offline) [PV] One node for each virtual CPU up to the guest's configured @@ -306,13 +310,15 @@ A virtual network device backend. Described by A PV console backend. Described in [console.txt](console.txt) - ~/device-model/$DOMID/* [w] + ~/device-model/$DOMID/$EMULATOR_ID/* [w] Information relating to device models running in the domain. 
$DOMID is -the target domain of the device model. When the device model is running -at the same privilege level of the guest domain, this path does not -contain any sensitive information and becomes guest writeable. Otherwise -it is INTERNAL. +the target domain of the device model. $EMULATOR_ID indentifies a +specific device model instance (multiple device model instances for the +same domain are possible). When the device model is running at the same +privilege level of the guest domain, this path does not contain any +sensitive information and becomes guest writeable. Otherwise +it is INTERNAL ~/libxl/disable_udev = (1|0) [] diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 516713e..bca4c88 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -1035,7 +1035,7 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid) if (type == LIBXL_DOMAIN_TYPE_HVM) { uint32_t dm_domid = libxl_get_stubdom_id(ctx, domid); -path = libxl__device_model_xs_path(gc, dm_domid, domid, /state); +path = libxl__device_model_xs_path(gc, false, dm_domid, domid, /state); state = libxl__xs_read(gc, XBT_NULL, path); if (state != NULL !strcmp(state, paused)) { libxl__qemu_traditional_cmd(gc, domid, continue); diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index a74b340..df946e2 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -1002,6 +1002,9 @@ static void domcreate_bootloader_done(libxl__egc *egc, return; } +libxl__xs_write(gc, XBT_NULL, GCSPRINTF(/local/domain/%u/device-model, domid), +%s, libxl__domain_device_model(gc, info)); + /* consume bootloader outputs. state-pv_{kernel,ramdisk} have * been initialised by the bootloader already. 
*/ diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c index 93bb41e..aadcd08 100644 --- a/tools/libxl/libxl_device.c +++ b/tools/libxl/libxl_device.c @@ -1190,7 +1190,7 @@ int libxl__wait_for_device_model_deprecated(libxl__gc *gc, char *path; uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid); -path = libxl__device_model_xs_path(gc, dm_domid, domid, /state); +path = libxl__device_model_xs_path(gc, false, dm_domid, domid, /state); return libxl__xenstore_child_wait_deprecated(gc, domid, LIBXL_DEVICE_MODEL_START_TIMEOUT, Device Model, path, state, spawning, diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index ecae536..90f2c97 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -1274,9 +1274,9 @@ void libxl__spawn_stub_dm(libxl__egc *egc, libxl__stub_dm_spawn_state *sdss) retry_transaction: t = xs_transaction_start(ctx-xsh);
Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access
At 12:32 +0100 on 02 Jul (1435840328), George Dunlap wrote: On 07/02/2015 12:25 PM, Tim Deegan wrote: At 12:09 +0100 on 02 Jul (1435838956), Andrew Cooper wrote: On 02/07/15 11:48, George Dunlap wrote: Now in p2m_set_mem_access(), rather than just using an unsigned long in the loop iterating over gfns, you do this thing where you convert gfn_t to unsigned long, add one, and then convert it back to gfn_t again. I can't see any comments in v3 that suggest you do that, and it seems a bit clunky. Is that really necessary? Wouldn't it be better to declare a local variable? I'm not strongly opinionated on this one, it just seems a bit strange. Everything else looks good, thanks. Looping over {g,m,p}fn_t's is indeed awkward, as the compiler tricks for typesafety don't allow simply adding 1 to a typesafe variable. In cases like this, I think it is acceptable to keep an unsigned long shadow variable and manipulate it as a plain integer. The eventual _gfn() required to pass it further down the call chain will help to visually reinforce the appropriate type. After all, the entire point of these typesafes is to try to avoid accidentally mixing up the different address spaces, but a function which takes a typesafe, loops over a subset and passes the same typesafe further down can probably be trusted to DTRT, catching errors at review time. Off the top of my head, the only functions which would normally expect to mix and match the typesafes are the pagetable-walking ones. It should be easy enough to extend the macros to define a gfn_inc(gfn_t) operator for this kind of thing. I was thinking that -- although in this case you'd still need to unpack it to do the loop exit conditional. To really make things pretty you'd want a for_gfn_range() macro or something like that that takes a start gfn and a number. But that's really starting to be feature creep for this patch, which is why I didn't want to suggest it on v4. :-) Oh, quite. :) Tim.
___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 0/6] libxl: xs_restrict QEMU
Hi all, this patch series changes libxl to start QEMU as device model with the new xsrestrict option (http://marc.info/?l=xen-devel&m=143341692707358). It also starts a second QEMU to provide PV backends in userspace (qdisk) to HVM guests. Changes in v4: - update xenstore-paths.markdown - add error message in case count > MAX_PHYSMAP_ENTRIES - add a note to xenstore-paths.markdown about the possible change in privilege level - only change permissions if xsrestrict is supported Changes in v3: - use LIBXL_TOOLSTACK_DOMID instead of 0 in the commit message - update commit message with more info on why it is safe - add a limit on the number of physmap entries to save and restore - add emulator_ids - mark patch #3 as WIP - use LIBXL_TOOLSTACK_DOMID instead of 0 in the commit message - change xs path to include the emulator_id - change qdisk-backend-pid path on xenstore - use dcs->dmss.pvqemu to spawn the second QEMU - keep track of the rc of both QEMUs before proceeding Stefano Stabellini (6): libxl: do not add a vkb backend to hvm guests [WIP] libxl: xsrestrict QEMU libxl: allow /local/domain/$LIBXL_TOOLSTACK_DOMID/device-model/$DOMID to be written by $DOMID libxl: change xs path for QEMU libxl: change qdisk-backend-pid path on xenstore libxl: spawns two QEMUs for HVM guests docs/misc/xenstore-paths.markdown | 30 -- tools/libxl/libxl.c |2 +- tools/libxl/libxl_create.c| 58 +-- tools/libxl/libxl_device.c|2 +- tools/libxl/libxl_dm.c| 115 + tools/libxl/libxl_dom.c | 19 -- tools/libxl/libxl_internal.c | 19 -- tools/libxl/libxl_internal.h | 15 - tools/libxl/libxl_pci.c | 14 ++--- tools/libxl/libxl_utils.c | 10 10 files changed, 225 insertions(+), 59 deletions(-) Cheers, Stefano
[Xen-devel] [PATCH v4 3/6] libxl: allow /local/domain/$LIBXL_TOOLSTACK_DOMID/device-model/$DOMID to be written by $DOMID
The device model is going to restrict its xenstore connection to $DOMID level, using XS_RESTRICT, only implemented by oxenstored at present. Let qemu-xen access /local/domain/$LIBXL_TOOLSTACK_DOMID/device-model/$DOMID, as it is required by QEMU to read/write the physmap. It doesn't contain any information the guest is not already fully aware of. Add a maximum limit of physmap entries to save, so that the guest cannot DoS the toolstack. Information under /local/domain/$LIBXL_TOOLSTACK_DOMID/device-model/$DOMID includes: - the device model status, which is updated by the device model before the guest is unpaused, then ignored by both QEMU and libxl - the guest physmap, which contains the memory layout of the guest. The guest is already in possession of this information. A malicious guest could modify the physmap entries, causing QEMU to read wrong information at restore time. QEMU would populate its XenPhysmap list with wrong entries that wouldn't match its own device emulation addresses, leading to a failure in restoring the domain. But it could not compromise the security of the system, because the addresses are only used for checks against QEMU's addresses. In the case of qemu-traditional, the state node is used to issue commands to the device model, so avoid changing permissions in that case. 
Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com --- Changes in v4: - add error message in case count MAX_PHYSMAP_ENTRIES - add a note to xenstore-paths.markdown about the possible change in privilege level - only change permissions if xsrestrict is supported Changes in v3: - use LIBXL_TOOLSTACK_DOMID instead of 0 in the commit message - update commit message with more info on why it is safe - add a limit on the number of physmap entries to save and restore Changes in v2: - fix permissions to actually do what intended - use LIBXL_TOOLSTACK_DOMID instead of 0 --- docs/misc/xenstore-paths.markdown |7 +-- tools/libxl/libxl_dm.c| 12 +++- tools/libxl/libxl_dom.c |7 +++ 3 files changed, 23 insertions(+), 3 deletions(-) diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown index 780f601..99e038b 100644 --- a/docs/misc/xenstore-paths.markdown +++ b/docs/misc/xenstore-paths.markdown @@ -306,10 +306,13 @@ A virtual network device backend. Described by A PV console backend. Described in [console.txt](console.txt) - ~/device-model/$DOMID/* [INTERNAL] + ~/device-model/$DOMID/* [w] Information relating to device models running in the domain. $DOMID is -the target domain of the device model. +the target domain of the device model. When the device model is running +at the same privilege level of the guest domain, this path does not +contain any sensitive information and becomes guest writeable. Otherwise +it is INTERNAL. 
~/libxl/disable_udev = (1|0) [] diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index d5f230a..ecae536 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -1531,6 +1531,7 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss) char **pass_stuff; const char *dm; int dm_state_fd = -1; +struct xs_permissions rwperm[2]; if (libxl_defbool_val(b_info-device_model_stubdomain)) { abort(); @@ -1573,7 +1574,16 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss) } path = libxl__device_model_xs_path(gc, LIBXL_TOOLSTACK_DOMID, domid, ); -xs_mkdir(ctx-xsh, XBT_NULL, path); +if (b_info-device_model_version == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN +libxl__check_qemu_supported(gc, dm, xsrestrict)) { +rwperm[0].id = LIBXL_TOOLSTACK_DOMID; +rwperm[0].perms = XS_PERM_NONE; +rwperm[1].id = domid; +rwperm[1].perms = XS_PERM_WRITE; +libxl__xs_mkdir(gc, XBT_NULL, path, rwperm, 2); +} else { +xs_mkdir(ctx-xsh, XBT_NULL, path); +} if (b_info-type == LIBXL_DOMAIN_TYPE_HVM b_info-device_model_version diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index f408646..c8523da 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -1677,6 +1677,8 @@ static inline char *physmap_path(libxl__gc *gc, uint32_t dm_domid, phys_offset, node); } +#define MAX_PHYSMAP_ENTRIES 12 + int libxl__toolstack_save(uint32_t domid, uint8_t **buf, uint32_t *len, void *dss_void) { @@ -1698,6 +1700,11 @@ int libxl__toolstack_save(uint32_t domid, uint8_t **buf, num); count = num; +if (count MAX_PHYSMAP_ENTRIES) { +LOG(ERROR, Max physmap entries reached); +return -1; +} + *len = sizeof(version) + sizeof(count); *buf = calloc(1, *len); ptr = *buf; -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH v3 08/18] xen/arm: vITS: Add virtual ITS driver
On Thu, 2015-07-02 at 19:03 +0530, Vijay Kilari wrote: Hi Julien, On Tue, Jun 23, 2015 at 10:09 PM, Julien Grall julien.gr...@citrix.com wrote: Hi Vijay, +struct vits_device { +uint32_t vdevid; +uint32_t pdevid; +struct its_device *its_dev; +struct rb_node node; +}; We spoke about a specific structure in the design [2] but you introduced a new one. Why? Section 6 of DraftG specifies managing a separate tree for device assignment. This helps to maintain a per-domain RB-tree holding the list of devices assigned to the domain, indexed by vdevid. This helps to check whether a device is assigned to this domain before processing any ITS command with that vdevid. Having everything in the its_device would help to catch a device attached to 2 different domains... One option is to introduce a new variable inside its_device to know to which domain the device is currently assigned. IIRC that's what I intended, e.g. two trees referencing the same underlying data structure. Sorry that wasn't clear. Also, the field pdevid is not vits specific but its. pdevid can be removed as the its_device structure already has it. Regards Vijay
[Xen-devel] [PATCH v5 4/6] xen/PMU: Describe vendor-specific PMU registers
AMD and Intel PMU register initialization and helpers that determine whether a register belongs to PMU. This and some of subsequent PMU emulation code is somewhat similar to Xen's PMU implementation. Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com Reviewed-by: David Vrabel david.vra...@citrix.com --- arch/x86/xen/pmu.c | 153 - 1 file changed, 152 insertions(+), 1 deletion(-) diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c index ba7687c..1fc7e10 100644 --- a/arch/x86/xen/pmu.c +++ b/arch/x86/xen/pmu.c @@ -18,6 +18,155 @@ DEFINE_PER_CPU(struct xen_pmu_data *, xenpmu_shared); #define get_xenpmu_data()per_cpu(xenpmu_shared, smp_processor_id()) + +/* AMD PMU */ +#define F15H_NUM_COUNTERS 6 +#define F10H_NUM_COUNTERS 4 + +static __read_mostly uint32_t amd_counters_base; +static __read_mostly uint32_t amd_ctrls_base; +static __read_mostly int amd_msr_step; +static __read_mostly int k7_counters_mirrored; +static __read_mostly int amd_num_counters; + +/* Intel PMU */ +#define MSR_TYPE_COUNTER0 +#define MSR_TYPE_CTRL 1 +#define MSR_TYPE_GLOBAL 2 +#define MSR_TYPE_ARCH_COUNTER 3 +#define MSR_TYPE_ARCH_CTRL 4 + +/* Number of general pmu registers (CPUID.EAX[0xa].EAX[8..15]) */ +#define PMU_GENERAL_NR_SHIFT8 +#define PMU_GENERAL_NR_BITS 8 +#define PMU_GENERAL_NR_MASK (((1 PMU_GENERAL_NR_BITS) - 1) \ + PMU_GENERAL_NR_SHIFT) + +/* Number of fixed pmu registers (CPUID.EDX[0xa].EDX[0..4]) */ +#define PMU_FIXED_NR_SHIFT 0 +#define PMU_FIXED_NR_BITS 5 +#define PMU_FIXED_NR_MASK (((1 PMU_FIXED_NR_BITS) - 1) \ + PMU_FIXED_NR_SHIFT) + +/* Alias registers (0x4c1) for full-width writes to PMCs */ +#define MSR_PMC_ALIAS_MASK (~(MSR_IA32_PERFCTR0 ^ MSR_IA32_PMC0)) + +static __read_mostly int intel_num_arch_counters, intel_num_fixed_counters; + + +static void xen_pmu_arch_init(void) +{ + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) { + + switch (boot_cpu_data.x86) { + case 0x15: + amd_num_counters = F15H_NUM_COUNTERS; + amd_counters_base = MSR_F15H_PERF_CTR; + 
amd_ctrls_base = MSR_F15H_PERF_CTL; + amd_msr_step = 2; + k7_counters_mirrored = 1; + break; + case 0x10: + case 0x12: + case 0x14: + case 0x16: + default: + amd_num_counters = F10H_NUM_COUNTERS; + amd_counters_base = MSR_K7_PERFCTR0; + amd_ctrls_base = MSR_K7_EVNTSEL0; + amd_msr_step = 1; + k7_counters_mirrored = 0; + break; + } + } else { + uint32_t eax, ebx, ecx, edx; + + cpuid(0xa, eax, ebx, ecx, edx); + + intel_num_arch_counters = (eax PMU_GENERAL_NR_MASK) + PMU_GENERAL_NR_SHIFT; + intel_num_fixed_counters = (edx PMU_FIXED_NR_MASK) + PMU_FIXED_NR_SHIFT; + } +} + +static inline uint32_t get_fam15h_addr(u32 addr) +{ + switch (addr) { + case MSR_K7_PERFCTR0: + case MSR_K7_PERFCTR1: + case MSR_K7_PERFCTR2: + case MSR_K7_PERFCTR3: + return MSR_F15H_PERF_CTR + (addr - MSR_K7_PERFCTR0); + case MSR_K7_EVNTSEL0: + case MSR_K7_EVNTSEL1: + case MSR_K7_EVNTSEL2: + case MSR_K7_EVNTSEL3: + return MSR_F15H_PERF_CTL + (addr - MSR_K7_EVNTSEL0); + default: + break; + } + + return addr; +} + +static inline bool is_amd_pmu_msr(unsigned int msr) +{ + if ((msr = MSR_F15H_PERF_CTL +msr MSR_F15H_PERF_CTR + (amd_num_counters * 2)) || + (msr = MSR_K7_EVNTSEL0 +msr MSR_K7_PERFCTR0 + amd_num_counters)) + return true; + + return false; +} + +static int is_intel_pmu_msr(u32 msr_index, int *type, int *index) +{ + u32 msr_index_pmc; + + switch (msr_index) { + case MSR_CORE_PERF_FIXED_CTR_CTRL: + case MSR_IA32_DS_AREA: + case MSR_IA32_PEBS_ENABLE: + *type = MSR_TYPE_CTRL; + return true; + + case MSR_CORE_PERF_GLOBAL_CTRL: + case MSR_CORE_PERF_GLOBAL_STATUS: + case MSR_CORE_PERF_GLOBAL_OVF_CTRL: + *type = MSR_TYPE_GLOBAL; + return true; + + default: + + if ((msr_index = MSR_CORE_PERF_FIXED_CTR0) + (msr_index MSR_CORE_PERF_FIXED_CTR0 + +intel_num_fixed_counters)) { + *index = msr_index - MSR_CORE_PERF_FIXED_CTR0; + *type = MSR_TYPE_COUNTER; +
[Xen-devel] [PATCH v4 6/6] libxl: spawns two QEMUs for HVM guests
Starts a second QEMU to provide PV backends in userspace to HVM guests. Use both dcs-dmss.pvqemu and dcs-dmss.dm to keep track of the starting QEMUs. Introduce two new fields to struct libxl__dm_spawn_state: dcs to store the pointer to libxl__domain_create_state, and rc to store the return code. Only proceed when both QEMUs have started. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com --- Changes in v3: - use dcs-dmss.pvqemu to spawn the second QEMU - keep track of the rc of both QEMUs before proceeding --- tools/libxl/libxl_create.c | 50 ++ tools/libxl/libxl_dm.c |9 +++- tools/libxl/libxl_internal.h |3 +++ 3 files changed, 47 insertions(+), 15 deletions(-) diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index df946e2..94dd52c 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -751,6 +751,9 @@ static int store_libxl_entry(libxl__gc *gc, uint32_t domid, static void domcreate_devmodel_started(libxl__egc *egc, libxl__dm_spawn_state *dmss, int rc); +static void domcreate_devmodel_callback(libxl__egc *egc, + libxl__dm_spawn_state *dmss, + int ret); static void domcreate_bootloader_console_available(libxl__egc *egc, libxl__bootloader_state *bl); static void domcreate_bootloader_done(libxl__egc *egc, @@ -1016,8 +1019,17 @@ static void domcreate_bootloader_done(libxl__egc *egc, dcs-dmss.dm.spawn.ao = ao; dcs-dmss.dm.guest_config = dcs-guest_config; dcs-dmss.dm.build_state = dcs-build_state; -dcs-dmss.dm.callback = domcreate_devmodel_started; -dcs-dmss.callback = domcreate_devmodel_started; +dcs-dmss.dm.callback = domcreate_devmodel_callback; +dcs-dmss.dm.dcs = dcs; +dcs-dmss.callback = domcreate_devmodel_callback; + +if (info-type == LIBXL_DOMAIN_TYPE_HVM) { +dcs-dmss.pvqemu.guest_domid = domid; +dcs-dmss.pvqemu.spawn.ao = ao; +dcs-dmss.pvqemu.callback = domcreate_devmodel_callback; +dcs-dmss.pvqemu.dcs = dcs; +libxl__spawn_qdisk_backend(egc, dcs-dmss.pvqemu); +} if ( restore_fd 0 ) { rc = 
libxl__domain_build(gc, d_config, domid, state); @@ -1347,20 +1359,13 @@ static void domcreate_devmodel_started(libxl__egc *egc, libxl__dm_spawn_state *dmss, int ret) { -libxl__domain_create_state *dcs = CONTAINER_OF(dmss, *dcs, dmss.dm); +libxl__domain_create_state *dcs = dmss-dcs; STATE_AO_GC(dmss-spawn.ao); -libxl_ctx *ctx = CTX; int domid = dcs-guest_domid; /* convenience aliases */ libxl_domain_config *const d_config = dcs-guest_config; -if (ret) { -LIBXL__LOG(ctx, LIBXL__LOG_ERROR, - device model did not start: %d, ret); -goto error_out; -} - if (dcs-dmss.dm.guest_domid) { if (d_config-b_info.device_model_version == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) { @@ -1379,11 +1384,28 @@ static void domcreate_devmodel_started(libxl__egc *egc, } domcreate_attach_vtpms(egc, dcs-multidev, 0); -return; +} -error_out: -assert(ret); -domcreate_complete(egc, dcs, ret); +static void domcreate_devmodel_callback(libxl__egc *egc, + libxl__dm_spawn_state *dmss, + int ret) +{ +libxl__domain_create_state *dcs = dmss-dcs; +STATE_AO_GC(dmss-spawn.ao); +int worst_rc = 0; + +dmss-rc = ret; + +if (libxl__spawn_inuse(dcs-dmss.dm.spawn) || +libxl__spawn_inuse(dcs-dmss.pvqemu.spawn)) + return; +worst_rc = (dcs-dmss.dm.rc dcs-dmss.pvqemu.rc) ? 
dcs-dmss.dm.rc : dcs-dmss.pvqemu.rc; + +/* all qemus have completed */ +if (worst_rc) +domcreate_complete(egc, dcs, worst_rc); +else +domcreate_devmodel_started(egc, dmss, 0); } static void domcreate_attach_vtpms(libxl__egc *egc, diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 3a4625a..71e7fe2 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -1850,13 +1850,20 @@ out: int libxl__destroy_device_model(libxl__gc *gc, uint32_t domid) { +int rc; char *path = libxl__device_model_xs_path(gc, false, LIBXL_TOOLSTACK_DOMID, domid, ); if (!xs_rm(CTX-xsh, XBT_NULL, path)) LOG(ERROR, xs_rm failed for %s, path); + +kill_device_model(gc, +GCSPRINTF(/local/domain/%d/image/pvqemu-pid, domid)); + /* We should try to destroy the device model anyway. */ -return kill_device_model(gc, +rc = kill_device_model(gc, GCSPRINTF(/local/domain/%d/image/device-model-pid, domid)); + +return rc;
Re: [Xen-devel] [PATCH 0/2] Refactor ioreq server for better performance.
Oh, I forgot the version number and change history. This patchset should be version 2. The change history should be: 1. Split the original patch into 2; 2. Take Paul Durrant's comments: a. Add a name member in the struct rb_rangeset, and use the 'q' debug key to dump the ranges in the ioreq server; b. Keep the original routine names for the hvm ioreq server; c. Commit message changes - mention that a future patch will change the maximum number of ranges inside the ioreq server. Sorry, my fault. Could I add this change history in the next version, or should I resend version 2? :) Thanks Yu On 7/2/2015 8:31 PM, Yu Zhang wrote: XenGT leverages the ioreq server to track and forward accesses to GPU I/O resources, e.g. the PPGTT (per-process graphic translation tables). Currently, the ioreq server uses a rangeset to track the BDF/PIO/MMIO ranges to be emulated. To select an ioreq server, the rangeset is searched to see if the I/O range is recorded. However, traversing the linked list inside the rangeset can be time consuming when the number of ranges is high. On the HSW platform, the number of PPGTTs for each vGPU can be several hundred. On BDW, this value can be several thousand. To accommodate more ranges, the limit on the number of ranges in an ioreq server, MAX_NR_IO_RANGES, is changed - future patches will be provided to tune this with other approaches. And to increase ioreq server performance, a new data structure, rb_rangeset, is introduced. Yu Zhang (2): Resize the MAX_NR_IO_RANGES for ioreq server Add new data structure to track ranges. xen/arch/x86/domain.c| 3 + xen/arch/x86/hvm/hvm.c | 56 ++-- xen/common/Makefile | 1 + xen/common/rb_rangeset.c | 281 +++ xen/include/asm-x86/hvm/domain.h | 4 +- xen/include/asm-x86/hvm/hvm.h| 1 + xen/include/xen/rb_rangeset.h| 49 +++ 7 files changed, 379 insertions(+), 16 deletions(-) create mode 100644 xen/common/rb_rangeset.c create mode 100644 xen/include/xen/rb_rangeset.h
Re: [Xen-devel] [PATCH] osstest: install libnl3 packages
Roger Pau Monne writes ("[PATCH] osstest: install libnl3 packages"): Install the libnl3 packages needed by the remus code. Those are available on both Wheezy and Jessie, although the Wheezy ones are too old. This patch implicitly drops support for lenny and squeeze. I think you should mention that in the commit message. Personally I have no objection to that, but it ought to be mentioned explicitly, both so that people can object and so that anyone trying to fix it up later can find the commit easily. Ian. 
[Xen-devel] [PATCH v5 6/6] xen/PMU: PMU emulation code
Add PMU emulation code that runs when we are processing a PMU interrupt. This code will allow us not to trap to hypervisor on each MSR/LVTPC access (of which there may be quite a few in the handler). Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com --- arch/x86/xen/pmu.c | 214 + 1 file changed, 185 insertions(+), 29 deletions(-) diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c index 69a0b68..4c09167 100644 --- a/arch/x86/xen/pmu.c +++ b/arch/x86/xen/pmu.c @@ -13,11 +13,20 @@ /* x86_pmu.handle_irq definition */ #include ../kernel/cpu/perf_event.h +#define XENPMU_IRQ_PROCESSING1 +struct xenpmu { + /* Shared page between hypervisor and domain */ + struct xen_pmu_data *xenpmu_data; -/* Shared page between hypervisor and domain */ -DEFINE_PER_CPU(struct xen_pmu_data *, xenpmu_shared); -#define get_xenpmu_data()per_cpu(xenpmu_shared, smp_processor_id()) + uint8_t flags; +}; +DEFINE_PER_CPU(struct xenpmu, xenpmu_shared); +#define get_xenpmu_data()(this_cpu_ptr(xenpmu_shared)-xenpmu_data) +#define get_xenpmu_flags() (this_cpu_ptr(xenpmu_shared)-flags) +/* Macro for computing address of a PMU MSR bank */ +#define field_offset(ctxt, field) ((void *)((uintptr_t)ctxt + \ + (uintptr_t)ctxt-field)) /* AMD PMU */ #define F15H_NUM_COUNTERS 6 @@ -169,19 +178,124 @@ static int is_intel_pmu_msr(u32 msr_index, int *type, int *index) } } -bool pmu_msr_read(unsigned int msr, uint64_t *val, int *err) +static bool xen_intel_pmu_emulate(unsigned int msr, u64 *val, int type, + int index, bool is_read) { + uint64_t *reg = NULL; + struct xen_pmu_intel_ctxt *ctxt; + uint64_t *fix_counters; + struct xen_pmu_cntr_pair *arch_cntr_pair; + struct xen_pmu_data *xenpmu_data = get_xenpmu_data(); + uint8_t xenpmu_flags = get_xenpmu_flags(); + + if (!xenpmu_data || !(xenpmu_flags XENPMU_IRQ_PROCESSING)) + return false; + + ctxt = xenpmu_data-pmu.c.intel; + + switch (msr) { + case MSR_CORE_PERF_GLOBAL_OVF_CTRL: + reg = ctxt-global_ovf_ctrl; + break; + case MSR_CORE_PERF_GLOBAL_STATUS: + reg 
= ctxt-global_status; + break; + case MSR_CORE_PERF_GLOBAL_CTRL: + reg = ctxt-global_ctrl; + break; + case MSR_CORE_PERF_FIXED_CTR_CTRL: + reg = ctxt-fixed_ctrl; + break; + default: + switch (type) { + case MSR_TYPE_COUNTER: + fix_counters = field_offset(ctxt, fixed_counters); + reg = fix_counters[index]; + break; + case MSR_TYPE_ARCH_COUNTER: + arch_cntr_pair = field_offset(ctxt, arch_counters); + reg = arch_cntr_pair[index].counter; + break; + case MSR_TYPE_ARCH_CTRL: + arch_cntr_pair = field_offset(ctxt, arch_counters); + reg = arch_cntr_pair[index].control; + break; + default: + return false; + } + } + + if (reg) { + if (is_read) + *val = *reg; + else { + *reg = *val; + + if (msr == MSR_CORE_PERF_GLOBAL_OVF_CTRL) + ctxt-global_status = (~(*val)); + } + return true; + } + + return false; +} + +static bool xen_amd_pmu_emulate(unsigned int msr, u64 *val, bool is_read) +{ + uint64_t *reg = NULL; + int i, off = 0; + struct xen_pmu_amd_ctxt *ctxt; + uint64_t *counter_regs, *ctrl_regs; + struct xen_pmu_data *xenpmu_data = get_xenpmu_data(); + uint8_t xenpmu_flags = get_xenpmu_flags(); + + if (!xenpmu_data || !(xenpmu_flags XENPMU_IRQ_PROCESSING)) + return false; + + if (k7_counters_mirrored + ((msr = MSR_K7_EVNTSEL0) (msr = MSR_K7_PERFCTR3))) + msr = get_fam15h_addr(msr); + + ctxt = xenpmu_data-pmu.c.amd; + for (i = 0; i amd_num_counters; i++) { + if (msr == amd_ctrls_base + off) { + ctrl_regs = field_offset(ctxt, ctrls); + reg = ctrl_regs[i]; + break; + } else if (msr == amd_counters_base + off) { + counter_regs = field_offset(ctxt, counters); + reg = counter_regs[i]; + break; + } + off += amd_msr_step; + } + + if (reg) { + if (is_read) + *val = *reg; + else + *reg = *val; + + return true; + } +
[Xen-devel] [PATCH v5 2/6] xen/PMU: Sysfs interface for setting Xen PMU mode
Set Xen's PMU mode via /sys/hypervisor/pmu/pmu_mode. Add XENPMU hypercall. Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com --- Documentation/ABI/testing/sysfs-hypervisor-pmu | 23 + arch/x86/include/asm/xen/hypercall.h | 6 ++ drivers/xen/sys-hypervisor.c | 127 + include/xen/interface/xen.h| 1 + include/xen/interface/xenpmu.h | 59 5 files changed, 216 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-hypervisor-pmu create mode 100644 include/xen/interface/xenpmu.h diff --git a/Documentation/ABI/testing/sysfs-hypervisor-pmu b/Documentation/ABI/testing/sysfs-hypervisor-pmu new file mode 100644 index 000..d1ceb77 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-hypervisor-pmu @@ -0,0 +1,23 @@ +What: /sys/hypervisor/pmu/pmu_mode +Date: July 2015 +KernelVersion: 4.2 +Contact: Boris Ostrovsky boris.ostrov...@oracle.com +Description: + Describes mode that Xen's performance-monitoring unit (PMU) + uses. Accepted values are + off -- PMU is disabled + self -- The guest can profile itself + hv -- The guest can profile itself and, if it is + privileged (e.g. dom0), the hypervisor + all -- The guest can profile itself, the hypervisor + and all other guests. Only available to + privileged guests. + +What: /sys/hypervisor/pmu/pmu_features +Date: July 2015 +KernelVersion: 4.2 +Contact:Boris Ostrovsky boris.ostrov...@oracle.com +Description: + Describes Xen PMU features (as an integer). A set bit indicates + that the corresponding feature is enabled. 
See + include/xen/interface/xenpmu.h for available features diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h index ca08a27..83aea80 100644 --- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h @@ -465,6 +465,12 @@ HYPERVISOR_tmem_op( return _hypercall1(int, tmem_op, op); } +static inline int +HYPERVISOR_xenpmu_op(unsigned int op, void *arg) +{ + return _hypercall2(int, xenpmu_op, op, arg); +} + static inline void MULTI_fpu_taskswitch(struct multicall_entry *mcl, int set) { diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c index 96453f8..f642085 100644 --- a/drivers/xen/sys-hypervisor.c +++ b/drivers/xen/sys-hypervisor.c @@ -20,6 +20,7 @@ #include xen/xenbus.h #include xen/interface/xen.h #include xen/interface/version.h +#include xen/interface/xenpmu.h #define HYPERVISOR_ATTR_RO(_name) \ static struct hyp_sysfs_attr _name##_attr = __ATTR_RO(_name) @@ -368,6 +369,124 @@ static void xen_properties_destroy(void) sysfs_remove_group(hypervisor_kobj, xen_properties_group); } +struct pmu_mode { + const char *name; + uint32_t mode; +}; + +struct pmu_mode pmu_modes[] = { + {off, XENPMU_MODE_OFF}, + {self, XENPMU_MODE_SELF}, + {hv, XENPMU_MODE_HV}, + {all, XENPMU_MODE_ALL} +}; + +static ssize_t pmu_mode_store(struct hyp_sysfs_attr *attr, + const char *buffer, size_t len) +{ + int ret; + struct xen_pmu_params xp; + int i; + + for (i = 0; i ARRAY_SIZE(pmu_modes); i++) { + if (strncmp(buffer, pmu_modes[i].name, len - 1) == 0) { + xp.val = pmu_modes[i].mode; + break; + } + } + + if (i == ARRAY_SIZE(pmu_modes)) + return -EINVAL; + + xp.version.maj = XENPMU_VER_MAJ; + xp.version.min = XENPMU_VER_MIN; + ret = HYPERVISOR_xenpmu_op(XENPMU_mode_set, xp); + if (ret) + return ret; + + return len; +} + +static ssize_t pmu_mode_show(struct hyp_sysfs_attr *attr, char *buffer) +{ + int ret; + struct xen_pmu_params xp; + int i; + uint32_t mode; + + xp.version.maj = XENPMU_VER_MAJ; + xp.version.min 
= XENPMU_VER_MIN; + ret = HYPERVISOR_xenpmu_op(XENPMU_mode_get, xp); + if (ret) + return ret; + + mode = (uint32_t)xp.val; + for (i = 0; i ARRAY_SIZE(pmu_modes); i++) { + if (mode == pmu_modes[i].mode) + return sprintf(buffer, %s\n, pmu_modes[i].name); + } + + return -EINVAL; +} +HYPERVISOR_ATTR_RW(pmu_mode); + +static ssize_t pmu_features_store(struct hyp_sysfs_attr *attr, + const char *buffer, size_t len) +{ + int ret; + uint32_t features; + struct xen_pmu_params xp; + + ret = kstrtou32(buffer, 0, features); + if (ret) + return ret; + + xp.val = features; + xp.version.maj =
[Xen-devel] [PATCH v5 0/6] xen/PMU: PMU support for Xen PV(H) guests
I haven't posted the Linux part of PV(H) VPMU support in a while, but now that (hopefully) the hypervisor part is getting close to being done I think it's time to post it again. There are very few differences compared to the last version, mostly due to updates in shared structure layouts. Patches 1 and 4 have no changes at all and patch 5 has minor changes due to rebasing, so I kept David's Reviewed-by tag. Boris Ostrovsky (6): xen: xensyms support xen/PMU: Sysfs interface for setting Xen PMU mode xen/PMU: Initialization code for Xen PMU xen/PMU: Describe vendor-specific PMU registers xen/PMU: Intercept PMU-related MSR and APIC accesses xen/PMU: PMU emulation code Documentation/ABI/testing/sysfs-hypervisor-pmu | 23 + arch/x86/include/asm/xen/hypercall.h | 6 + arch/x86/include/asm/xen/interface.h | 123 ++ arch/x86/xen/Makefile | 2 +- arch/x86/xen/apic.c| 6 + arch/x86/xen/enlighten.c | 13 +- arch/x86/xen/pmu.c | 572 + arch/x86/xen/pmu.h | 15 + arch/x86/xen/smp.c | 29 +- arch/x86/xen/suspend.c | 23 +- drivers/xen/Kconfig| 8 + drivers/xen/sys-hypervisor.c | 127 ++ drivers/xen/xenfs/Makefile | 1 + drivers/xen/xenfs/super.c | 3 + drivers/xen/xenfs/xenfs.h | 1 + drivers/xen/xenfs/xensyms.c| 152 +++ include/xen/interface/platform.h | 18 + include/xen/interface/xen.h| 2 + include/xen/interface/xenpmu.h | 94 19 files changed, 1208 insertions(+), 10 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-hypervisor-pmu create mode 100644 arch/x86/xen/pmu.c create mode 100644 arch/x86/xen/pmu.h create mode 100644 drivers/xen/xenfs/xensyms.c create mode 100644 include/xen/interface/xenpmu.h -- 1.8.1.4
[Xen-devel] [PATCH v5 1/6] xen: xensyms support
Export Xen symbols to dom0 via /proc/xen/xensyms (similar to /proc/kallsyms).

Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
Reviewed-by: David Vrabel david.vra...@citrix.com
---
 drivers/xen/Kconfig              |   8 +++
 drivers/xen/xenfs/Makefile       |   1 +
 drivers/xen/xenfs/super.c        |   3 +
 drivers/xen/xenfs/xenfs.h        |   1 +
 drivers/xen/xenfs/xensyms.c      | 152 +++
 include/xen/interface/platform.h |  18 +
 6 files changed, 183 insertions(+)
 create mode 100644 drivers/xen/xenfs/xensyms.c

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 7cd226d..1d825ca 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -280,4 +280,12 @@ config XEN_ACPI
 	def_bool y
 	depends on X86 && ACPI
 
+config XEN_SYMS
+	bool "Xen symbols"
+	depends on XEN_DOM0 && XENFS
+	default y if KALLSYMS
+	help
+	  Exports hypervisor symbols (along with their types and addresses) via
+	  /proc/xen/xensyms file, similar to /proc/kallsyms
+
 endmenu
diff --git a/drivers/xen/xenfs/Makefile b/drivers/xen/xenfs/Makefile
index b019865..1a83010 100644
--- a/drivers/xen/xenfs/Makefile
+++ b/drivers/xen/xenfs/Makefile
@@ -2,3 +2,4 @@ obj-$(CONFIG_XENFS) += xenfs.o
 xenfs-y			  = super.o
 xenfs-$(CONFIG_XEN_DOM0) += xenstored.o
+xenfs-$(CONFIG_XEN_SYMS) += xensyms.o
diff --git a/drivers/xen/xenfs/super.c b/drivers/xen/xenfs/super.c
index 06092e0..8559a71 100644
--- a/drivers/xen/xenfs/super.c
+++ b/drivers/xen/xenfs/super.c
@@ -57,6 +57,9 @@ static int xenfs_fill_super(struct super_block *sb, void *data, int silent)
 		{ "privcmd", &xen_privcmd_fops, S_IRUSR|S_IWUSR },
 		{ "xsd_kva", &xsd_kva_file_ops, S_IRUSR|S_IWUSR},
 		{ "xsd_port", &xsd_port_file_ops, S_IRUSR|S_IWUSR},
+#ifdef CONFIG_XEN_SYMS
+		{ "xensyms", &xensyms_ops, S_IRUSR},
+#endif
 		{""},
 	};
diff --git a/drivers/xen/xenfs/xenfs.h b/drivers/xen/xenfs/xenfs.h
index 6b80c77..2c5934e 100644
--- a/drivers/xen/xenfs/xenfs.h
+++ b/drivers/xen/xenfs/xenfs.h
@@ -3,5 +3,6 @@
 extern const struct file_operations xsd_kva_file_ops;
 extern const struct file_operations xsd_port_file_ops;
+extern const struct file_operations xensyms_ops;
 
 #endif	/* _XENFS_XENBUS_H */
diff --git a/drivers/xen/xenfs/xensyms.c b/drivers/xen/xenfs/xensyms.c
new file mode 100644
index 000..ed16f8d
--- /dev/null
+++ b/drivers/xen/xenfs/xensyms.c
@@ -0,0 +1,152 @@
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/seq_file.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/proc_fs.h>
+#include <linux/slab.h>
+#include <xen/interface/platform.h>
+#include <asm/xen/hypercall.h>
+#include <xen/xen-ops.h>
+#include "xenfs.h"
+
+
+#define XEN_KSYM_NAME_LEN 127 /* Hypervisor may have different name length */
+
+struct xensyms {
+	struct xen_platform_op op;
+	char *name;
+	uint32_t namelen;
+};
+
+/* Grab next output page from the hypervisor */
+static int xensyms_next_sym(struct xensyms *xs)
+{
+	int ret;
+	struct xenpf_symdata *symdata = &xs->op.u.symdata;
+	uint64_t symnum;
+
+	memset(xs->name, 0, xs->namelen);
+	symdata->namelen = xs->namelen;
+
+	symnum = symdata->symnum;
+
+	ret = HYPERVISOR_dom0_op(&xs->op);
+	if (ret < 0)
+		return ret;
+
+	/*
+	 * If hypervisor's symbol didn't fit into the buffer then allocate
+	 * a larger buffer and try again
+	 */
+	if (unlikely(symdata->namelen > xs->namelen)) {
+		kfree(xs->name);
+
+		xs->namelen = symdata->namelen;
+		xs->name = kzalloc(xs->namelen, GFP_KERNEL);
+		if (!xs->name)
+			return 1;
+
+		set_xen_guest_handle(symdata->name, xs->name);
+		symdata->symnum--; /* Rewind */
+
+		ret = HYPERVISOR_dom0_op(&xs->op);
+		if (ret < 0)
+			return ret;
+	}
+
+	if (symdata->symnum == symnum)
+		/* End of symbols */
+		return 1;
+
+	return 0;
+}
+
+static void *xensyms_start(struct seq_file *m, loff_t *pos)
+{
+	struct xensyms *xs = (struct xensyms *)m->private;
+
+	xs->op.u.symdata.symnum = *pos;
+
+	if (xensyms_next_sym(xs))
+		return NULL;
+
+	return m->private;
+}
+
+static void *xensyms_next(struct seq_file *m, void *p, loff_t *pos)
+{
+	struct xensyms *xs = (struct xensyms *)m->private;
+
+	xs->op.u.symdata.symnum = ++(*pos);
+
+	if (xensyms_next_sym(xs))
+		return NULL;
+
+	return p;
+}
+
+static int xensyms_show(struct seq_file *m, void *p)
+{
+	struct xensyms *xs = (struct xensyms *)m->private;
+	struct xenpf_symdata *symdata = &xs->op.u.symdata;
+
+	seq_printf(m, "%016llx %c %s\n", symdata->address,
+		   symdata->type, xs->name);
+
+	return 0;
+}
+
+static void
[Xen-devel] [PATCH v5 3/6] xen/PMU: Initialization code for Xen PMU
Map shared data structure that will hold CPU registers, VPMU context, V/PCPU IDs of the CPU interrupted by PMU interrupt. Hypervisor fills this information in its handler and passes it to the guest for further processing.

Set up PMU VIRQ.

Now that perf infrastructure will assume that PMU is available on a PV guest we need to be careful and make sure that accesses via RDPMC instruction don't cause fatal traps by the hypervisor. Provide a nop RDPMC handler.

For the same reason avoid issuing a warning on a write to APIC's LVTPC.

Both of these will be made functional in later patches.

Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
---
 arch/x86/include/asm/xen/interface.h | 123 +
 arch/x86/xen/Makefile                |   2 +-
 arch/x86/xen/apic.c                  |   3 +
 arch/x86/xen/enlighten.c             |  12 ++-
 arch/x86/xen/pmu.c                   | 172 +++
 arch/x86/xen/pmu.h                   |  11 +++
 arch/x86/xen/smp.c                   |  29 +-
 arch/x86/xen/suspend.c               |  23 +++--
 include/xen/interface/xen.h          |   1 +
 include/xen/interface/xenpmu.h       |  33 +++
 10 files changed, 400 insertions(+), 9 deletions(-)
 create mode 100644 arch/x86/xen/pmu.c
 create mode 100644 arch/x86/xen/pmu.h

diff --git a/arch/x86/include/asm/xen/interface.h b/arch/x86/include/asm/xen/interface.h
index 3400dba..dced9bc 100644
--- a/arch/x86/include/asm/xen/interface.h
+++ b/arch/x86/include/asm/xen/interface.h
@@ -172,6 +172,129 @@ struct vcpu_guest_context {
 #endif
 };
 DEFINE_GUEST_HANDLE_STRUCT(vcpu_guest_context);
+
+/* AMD PMU registers and structures */
+struct xen_pmu_amd_ctxt {
+	/*
+	 * Offsets to counter and control MSRs (relative to xen_pmu_arch.c.amd).
+	 * For PV(H) guests these fields are RO.
+	 */
+	uint32_t counters;
+	uint32_t ctrls;
+
+	/* Counter MSRs */
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+	uint64_t regs[];
+#elif defined(__GNUC__)
+	uint64_t regs[0];
+#endif
+};
+
+/* Intel PMU registers and structures */
+struct xen_pmu_cntr_pair {
+	uint64_t counter;
+	uint64_t control;
+};
+
+struct xen_pmu_intel_ctxt {
+	/*
+	 * Offsets to fixed and architectural counter MSRs (relative to
+	 * xen_pmu_arch.c.intel).
+	 * For PV(H) guests these fields are RO.
+	 */
+	uint32_t fixed_counters;
+	uint32_t arch_counters;
+
+	/* PMU registers */
+	uint64_t global_ctrl;
+	uint64_t global_ovf_ctrl;
+	uint64_t global_status;
+	uint64_t fixed_ctrl;
+	uint64_t ds_area;
+	uint64_t pebs_enable;
+	uint64_t debugctl;
+
+	/* Fixed and architectural counter MSRs */
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+	uint64_t regs[];
+#elif defined(__GNUC__)
+	uint64_t regs[0];
+#endif
+};
+
+/* Sampled domain's registers */
+struct xen_pmu_regs {
+	uint64_t ip;
+	uint64_t sp;
+	uint64_t flags;
+	uint16_t cs;
+	uint16_t ss;
+	uint8_t cpl;
+	uint8_t pad[3];
+};
+
+/* PMU flags */
+#define PMU_CACHED	  (1<<0) /* PMU MSRs are cached in the context */
+#define PMU_SAMPLE_USER	  (1<<1) /* Sample is from user or kernel mode */
+#define PMU_SAMPLE_REAL	  (1<<2) /* Sample is from realmode */
+#define PMU_SAMPLE_PV	  (1<<3) /* Sample from a PV guest */
+
+/*
+ * Architecture-specific information describing state of the processor at
+ * the time of PMU interrupt.
+ * Fields of this structure marked as RW for guest should only be written by
+ * the guest when PMU_CACHED bit in pmu_flags is set (which is done by the
+ * hypervisor during PMU interrupt). Hypervisor will read updated data in
+ * XENPMU_flush hypercall and clear PMU_CACHED bit.
+ */
+struct xen_pmu_arch {
+	union {
+		/*
+		 * Processor's registers at the time of interrupt.
+		 * WO for hypervisor, RO for guests.
+		 */
+		struct xen_pmu_regs regs;
+		/*
+		 * Padding for adding new registers to xen_pmu_regs in
+		 * the future
+		 */
+#define XENPMU_REGS_PAD_SZ  64
+		uint8_t pad[XENPMU_REGS_PAD_SZ];
+	} r;
+
+	/* WO for hypervisor, RO for guest */
+	uint64_t pmu_flags;
+
+	/*
+	 * APIC LVTPC register.
+	 * RW for both hypervisor and guest.
+	 * Only APIC_LVT_MASKED bit is loaded by the hypervisor into hardware
+	 * during XENPMU_flush or XENPMU_lvtpc_set.
+	 */
+	union {
+		uint32_t lapic_lvtpc;
+		uint64_t pad;
+	} l;
+
+	/*
+	 * Vendor-specific PMU registers.
+	 * RW for both hypervisor and guest (see exceptions above).
+	 * Guest's updates to this field are verified and then loaded by the
+	 * hypervisor into hardware during XENPMU_flush
+	 */
Re: [Xen-devel] [PATCH v5 02/16] x86/hvm: remove multiple open coded 'chunking' loops
On 30/06/15 14:05, Paul Durrant wrote:
> ...in hvmemul_read/write()
>
> Add hvmemul_phys_mmio_access() and hvmemul_linear_mmio_access() functions
> to reduce code duplication.
>
> NOTE: This patch also introduces a change in 'chunking' around a page
> boundary. Previously (for example) an 8 byte access at the last byte of
> a page would get carried out as 8 single-byte accesses. It will now be
> carried out as a single-byte access, followed by a 4-byte access, a
> 2-byte access and then another single-byte access.
>
> Signed-off-by: Paul Durrant paul.durr...@citrix.com
> Cc: Keir Fraser k...@xen.org
> Cc: Jan Beulich jbeul...@suse.com
> Cc: Andrew Cooper andrew.coop...@citrix.com
> ---
>  xen/arch/x86/hvm/emulate.c | 223 +++-
>  1 file changed, 116 insertions(+), 107 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 8b60843..b67f5db 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -539,6 +539,117 @@ static int hvmemul_virtual_to_linear(
>      return X86EMUL_EXCEPTION;
>  }
>
> +static int hvmemul_phys_mmio_access(
> +    paddr_t gpa, unsigned int size, uint8_t dir, uint8_t *buffer)
> +{
> +    unsigned long one_rep = 1;
> +    unsigned int chunk;
> +    int rc;
> +
> +    /* Accesses must fall within a page */

Full stop.

> +    BUG_ON((gpa & (PAGE_SIZE - 1)) + size > PAGE_SIZE);

~PAGE_MASK as opposed to (PAGE_SIZE - 1)

> +
> +    /*
> +     * hvmemul_do_io() cannot handle non-power-of-2 accesses or
> +     * accesses larger than sizeof(long), so choose the highest power
> +     * of 2 not exceeding sizeof(long) as the 'chunk' size.
> +     */
> +    chunk = 1 << (fls(size) - 1);

Depending on size, chunk can become undefined (shifting by 31 or -1) or
zero (shifting by 32). How about:

    if ( size > sizeof(long) )
        chunk = sizeof(long);
    else
        chunk = 1U << (fls(size) - 1);

?

> +    if ( chunk > sizeof (long) )
> +        chunk = sizeof (long);
> +
> +    for ( ;; )
> +    {
> +        rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
> +                                    buffer);
> +        if ( rc != X86EMUL_OKAY )
> +            break;
> +
> +        /* Advance to the next chunk */

Full stop.
> +        gpa += chunk;
> +        buffer += chunk;
> +        size -= chunk;
> +
> +        if ( size == 0 )
> +            break;
> +
> +        /*
> +         * If the chunk now exceeds the remaining size, choose the next
> +         * lowest power of 2 that will fit.
> +         */
> +        while ( chunk > size )
> +            chunk >>= 1;
> +    }
> +
> +    return rc;
> +}
> +
> +static int hvmemul_linear_mmio_access(
> +    unsigned long gla, unsigned int size, uint8_t dir, uint8_t *buffer,
> +    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool_t known_gpfn)
> +{
> +    struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
> +    unsigned long page_off = gla & (PAGE_SIZE - 1);

Can be int as opposed to long, and "offset" appears to be the prevailing
name. Also, ~PAGE_MASK.

> +    unsigned int chunk;
> +    paddr_t gpa;
> +    unsigned long one_rep = 1;
> +    int rc;
> +
> +    chunk = min_t(unsigned int, size, PAGE_SIZE - page_off);
> +
> +    if ( known_gpfn )
> +        gpa = pfn_to_paddr(vio->mmio_gpfn) | page_off;
> +    else
> +    {
> +        rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
> +                                    hvmemul_ctxt);
> +        if ( rc != X86EMUL_OKAY )
> +            return rc;
> +    }
> +
> +    for ( ;; )
> +    {
> +        rc = hvmemul_phys_mmio_access(gpa, chunk, dir, buffer);
> +        if ( rc != X86EMUL_OKAY )
> +            break;
> +
> +        gla += chunk;
> +        buffer += chunk;
> +        size -= chunk;
> +
> +        if ( size == 0 )
> +            break;
> +
> +        ASSERT((gla & (PAGE_SIZE - 1)) == 0);

~PAGE_MASK.

> +        ASSERT(size < PAGE_SIZE);

Nothing I can see here prevents size being greater than PAGE_SIZE.
chunk strictly will be, but "size -= chunk" can still leave size greater
than a page.

~Andrew

> +        chunk = size;
> +        rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
> +                                    hvmemul_ctxt);
> +        if ( rc != X86EMUL_OKAY )
> +            return rc;
> +    }
> +
> +    return rc;
> +}
> +
Re: [Xen-devel] [PATCH] Remove sh_{un}map_domain_page() and hap_{un}map_domain_page()
At 13:43 +0100 on 02 Jul (1435844582), Ben Catterall wrote:
> Removed as they were wrappers around map_domain_page() to make it appear
> to take an mfn_t type.
>
> Signed-off-by: Ben Catterall ben.catter...@citrix.com

Reviewed-by: Tim Deegan t...@xen.org
Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)
Ian Campbell writes ("Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)"):
> On Thu, 2015-07-02 at 09:45 +0100, Ian Campbell wrote:
> > Shall I put up a poll of some sort to gather preferred timeslot options
> > out of that set?
>
> Please can everyone who is interested in this topic indicate their date
> preference/availability at:
>
> http://doodle.com/cy88dhwzybg7hh7p
>
> I've gone with the usual 5pm BST slot for simplicity. That's 12 noon EDT,
> 9am PDT and 6pm CEST.

I'm never available at 1700 BST on a Wednesday, I'm afraid. I can make
that time any other day of the week.

Ian.
Re: [Xen-devel] [PATCH v2] xen/arm: Find automatically the gnttab region for DOM0
Hi,

Ping?

Regards,

On 17/06/15 14:58, Julien Grall wrote:
> Currently, the grant table region is hardcoded per-platform. When a new
> board is coming up, we have to check the spec in order to find free
> space in the memory layout. Depending on the platform it may be tedious.
>
> A good candidate for the gnttab region is the region used by the Xen
> binary, as parts of it will never be mapped into the DOM0 address space.
> MMIO is mapped 1:1, and the RAM will be either:
>     - direct mapped: 1:1 mapping is used => no problem
>     - non direct mapped: Xen always relocates itself as high as possible
>       (limited to 4GB on ARM32) and the RAM banks are filled from the
>       first one. It's very unlikely that the gnttab region will overlap
>       with the RAM. Although, for safety, a check may be necessary when
>       we re-enable this option.
>
> Furthermore, there is plenty of space to contain a big gnttab: the
> default size is 32 frames (i.e. 128KB) but it can be changed via a
> command line option.
>
> It's not possible to use the whole region occupied by Xen, as some parts
> of the binary will be freed after Xen boots and can then be used by DOM0
> and other guests. A sensible choice is the text section, as it will
> always reside in memory, will never be mapped to the guest, and its size
> is big enough (~300KB on ARM64). It could be extended later to use other
> contiguous sections such as data...
>
> Note that on ARM64 the grant table region may be above 4GB (Xen is
> relocated to the highest address), so using DOM0 32-bit with short page
> tables may not work. Although, I don't think this is a big deal as
> devices may not work and/or the RAM is too high due to the 1:1 mapping.
>
> This patch also drops the platform files thunderx and xilinx-zynqmp,
> which became dummy after dropping the hardcoded DOM0 grant table region.
> Signed-off-by: Julien Grall julien.gr...@citrix.com
>
> ---
>     Changes in v2:
>         - Typoes
>         - Message and comment improvement
> ---
>  xen/arch/arm/domain_build.c            | 47 ++++++----------
>  xen/arch/arm/kernel.h                  |  4 +++
>  xen/arch/arm/platform.c                | 14 --------
>  xen/arch/arm/platforms/Makefile        |  2 --
>  xen/arch/arm/platforms/midway.c        |  3 ---
>  xen/arch/arm/platforms/omap5.c         |  6 -----
>  xen/arch/arm/platforms/rcar2.c         |  3 ---
>  xen/arch/arm/platforms/seattle.c       |  3 ---
>  xen/arch/arm/platforms/sunxi.c         |  3 ---
>  xen/arch/arm/platforms/thunderx.c      | 41 -----------------
>  xen/arch/arm/platforms/vexpress.c      |  2 --
>  xen/arch/arm/platforms/xgene-storm.c   |  3 ---
>  xen/arch/arm/platforms/xilinx-zynqmp.c | 41 -----------------
>  xen/include/asm-arm/platform.h         |  7 -----
>  14 files changed, 43 insertions(+), 136 deletions(-)
>  delete mode 100644 xen/arch/arm/platforms/thunderx.c
>  delete mode 100644 xen/arch/arm/platforms/xilinx-zynqmp.c
>
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index e9cb8a9..d8b775f 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -21,6 +21,7 @@
>  #include <asm/gic.h>
>  #include <xen/irq.h>
> +#include <xen/grant_table.h>
>  #include "kernel.h"
>
>  static unsigned int __initdata opt_dom0_max_vcpus;
> @@ -605,8 +606,8 @@ static int make_memory_node(const struct domain *d,
>      return res;
>  }
>
> -static int make_hypervisor_node(struct domain *d,
> -                                void *fdt, const struct dt_device_node *parent)
> +static int make_hypervisor_node(const struct kernel_info *kinfo,
> +                                const struct dt_device_node *parent)
>  {
>      const char compat[] =
>          "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0";
>      gic_interrupt_t intr;
>      __be32 *cells;
>      int res;
> +    /* Convenience alias */
>      int addrcells = dt_n_addr_cells(parent);
>      int sizecells = dt_n_size_cells(parent);
> -    paddr_t gnttab_start, gnttab_size;
> +    void *fdt = kinfo->fdt;
>
>      DPRINT("Create hypervisor node\n");
> @@ -639,12 +641,9 @@ static int make_hypervisor_node(struct domain *d,
>      if ( res )
>          return res;
>
> -    platform_dom0_gnttab(&gnttab_start, &gnttab_size);
> -    DPRINT("  Grant table range: %#"PRIpaddr"-%#"PRIpaddr"\n",
> -           gnttab_start, gnttab_start + gnttab_size);
>      /* reg 0 is grant table space */
>      cells = &reg[0];
> -    dt_set_range(&cells, parent, gnttab_start, gnttab_size);
> +    dt_set_range(&cells, parent, kinfo->gnttab_start, kinfo->gnttab_size);
>      res = fdt_property(fdt, "reg", reg,
>                         dt_cells_to_size(addrcells + sizecells));
>      if ( res )
>          return res;
> @@ -1192,7 +1191,7 @@ static int handle_node(struct domain *d, struct kernel_info *kinfo,
>      if ( node == dt_host )
>      {
> -        res = make_hypervisor_node(d, kinfo->fdt, node);
> +        res = make_hypervisor_node(kinfo, node);
>          if ( res )
>              return res;
> @@ -1368,6
Re: [Xen-devel] [PATCH 4/4] xen: sched: get rid of cpupool_scheduler_cpumask()
On Thu, Jun 25, 2015 at 1:15 PM, Dario Faggioli dario.faggi...@citrix.com wrote:
> and of (almost every) direct use of cpupool_online_cpumask().
>
> In fact, what we really want for the most of the times, is the set of
> valid pCPUs of the cpupool a certain domain is part of. Furthermore, in
> case it's called with a NULL pool as argument,
> cpupool_scheduler_cpumask() does more harm than good, by returning the
> bitmask of free pCPUs!
>
> This commit, therefore:
>  * gets rid of cpupool_scheduler_cpumask(), in favour of
>    cpupool_domain_cpumask(), which makes it more evident what we are
>    after, and accommodates some sanity checking;
>  * replaces some of the calls to cpupool_online_cpumask() with calls to
>    the new functions too.
>
> Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
> ---
> Cc: George Dunlap george.dun...@eu.citrix.com
> Cc: Juergen Gross jgr...@suse.com
> Cc: Robert VanVossen robert.vanvos...@dornerworks.com
> Cc: Josh Whitehead josh.whiteh...@dornerworks.com
> Cc: Meng Xu men...@cis.upenn.edu
> Cc: Sisu Xi xis...@gmail.com
> ---
>  xen/common/domain.c         |  5 +++--
>  xen/common/domctl.c         |  4 ++--
>  xen/common/sched_arinc653.c |  2 +-
>  xen/common/sched_credit.c   |  6 +++---
>  xen/common/sched_rt.c       | 12 ++--
>  xen/common/sched_sedf.c     |  2 +-
>  xen/common/schedule.c       |  2 +-
>  xen/include/xen/sched-if.h  | 12 ++--
>  8 files changed, 27 insertions(+), 18 deletions(-)
>
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index a1945ac..8c36635 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -309,7 +309,7 @@ __runq_remove(struct csched_vcpu *svc)
>  static inline int
>  __vcpu_has_soft_affinity(const struct vcpu *vc, const cpumask_t *mask)
>  {
> -    return !cpumask_subset(cpupool_online_cpumask(vc->domain->cpupool),
> +    return !cpumask_subset(cpupool_domain_cpumask(vc->domain),
>                             vc->cpu_soft_affinity) &&
>            !cpumask_subset(vc->cpu_hard_affinity, vc->cpu_soft_affinity) &&
>            cpumask_intersects(vc->cpu_soft_affinity, mask);
> @@ -374,7 +374,7 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
>      /* cpu is vc->processor, so it must be in a cpupool. */
>      ASSERT(per_cpu(cpupool, cpu) != NULL);
> -    online = cpupool_online_cpumask(per_cpu(cpupool, cpu));
> +    online = cpupool_domain_cpumask(new->sdom->dom);

Two comments in passing here (no action required):

1. Looks like passing both cpu and svc to this function is a bit
redundant, as the function can just look up new->vcpu->processor itself

2. After this patch, the ASSERT there will be redundant, as there's
already an identical assert in cpupool_domain_cpumask()

Those won't hurt, but if you end up respinning you might think about at
least taking the ASSERT out.

> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 4ffcd98..7ad298a 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -80,7 +80,7 @@ static struct scheduler __read_mostly ops;
>  #define DOM2OP(_d)    (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))
>  #define VCPU2OP(_v)   (DOM2OP((_v)->domain))
> -#define VCPU2ONLINE(_v) cpupool_online_cpumask((_v)->domain->cpupool)
> +#define VCPU2ONLINE(_v) cpupool_domain_cpumask((_v)->domain)
>
>  static inline void trace_runstate_change(struct vcpu *v, int new_state)
>  {
> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
> index 7cc25c6..20af791 100644
> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -183,9 +183,17 @@ struct cpupool
>      atomic_t         refcnt;
>  };
>
> -#define cpupool_scheduler_cpumask(_pool) \
> -    (((_pool) == NULL) ? &cpupool_free_cpus : (_pool)->cpu_valid)
>  #define cpupool_online_cpumask(_pool) \
>      (((_pool) == NULL) ? &cpu_online_map : (_pool)->cpu_valid)
>
> +static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
> +{
> +    /*
> +     * d->cpupool == NULL only for the idle domain, which should
> +     * have no interest in calling into this.
> +     */
> +    ASSERT(d->cpupool != NULL);
> +    return cpupool_online_cpumask(d->cpupool);
> +}

It's a bit strange to assert that d->cpupool != NULL, and then call a
macro whose return value only depends on whether d->cpupool is NULL or
not. Why not just return d->cpupool->cpu_valid?
Thanks for tracking these down, BTW!

 -George
Re: [Xen-devel] [PATCH v5 03/16] x86/hvm: change hvm_mmio_read_t and hvm_mmio_write_t length argument...
On 30/06/15 14:05, Paul Durrant wrote:
> ...from unsigned long to unsigned int
>
> A 64-bit length is not necessary, 32 bits is enough.
>
> Signed-off-by: Paul Durrant paul.durr...@citrix.com

Reviewed-by: Andrew Cooper andrew.coop...@citrix.com