Re: [Xen-devel] Xen x86 host memory limit issues
On 24/08/15 12:47, Jan Beulich wrote: On 24.08.15 at 12:36, andrew.coop...@citrix.com wrote: The infrastructure around xenheap_max_mfn() is supposed to cause all xenheap page allocations to fall within the Xen direct mapped region, but experimentally it doesn't work correctly. In all cases I have seen, the bad xenheap allocations have come from calls which contain numa information in the memflags, which leads me to suspect it is an interaction issue between the numa hinting information and xenheap_bits. At a guess, I suspect alloc_heap_pages() doesn't correctly override the numa hint when both a numa hint and a zone limit are provided, but I have not investigated this yet. But you're in the ideal position to do so. As said previously on the same topic, looking just at the code I can't see what's wrong, even when taking into account the experimentally observed behavior. It is high on (but not top of) my todo list, as we currently have the workaround in place. From discussions at the Summit, I know that Oracle, SUSE and Citrix all have machines large enough to reproduce the issue. This information is provided at the request of Elena and Konrad (who, it turns out, I forgot to CC on the original message. Sorry!) Fixing that bug will be a useful step, as it will allow Xen to function with host RAM above the direct map limit, but it is still not an optimal solution as it prevents getting numa-local xenheap memory. Long term, it would be optimal to segment the direct map region by numa node so there are equal quantities of xenheap memory available from each numa node. Yes, albeit I'm suspecting there to arise (at least theoretical) issues on systems with many nodes - the per-node ranges directly mapped may become unreasonably small (and we may risk exhausting node 0's memory due to not NUMA-tagged allocation requests). There are a number of allocation constraints.
Off the top of my head:
* DMA pools for dom0 (mitigated in certain circumstances by PVIOMMU)
* 128GB for 32bit PV domheap pages
* 4GB for some 32bit PV L3 pages
Some of this can be avoided by allocating the directmap from the upper RAM in the numa nodes. Exhaustion of node 0 can be mitigated by striping allocations without a numa hint, or by allocating from the node with the most free space remaining. There should actually be very few allocations which can't have a numa hint provided. All allocations for anything hardware related should be on the local node, and everything else should be allocations on behalf of a domain, which itself has numa information. As an orthogonal task, we should see whether it is possible to nab any virtual address space back from 64bit PV guests, or whether it is irreparably fixed at its current value. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
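The suspected interaction can be made concrete with a small node-selection policy. In this policy the zone (direct-map) limit takes priority over the numa hint. Requests without a hint go to the node with the most eligible free memory rather than always to node 0.

This is a minimal sketch, not Xen's alloc_heap_pages(). `struct node_info`, `pick_node()`, `NR_NODES` and `NO_NODE` are hypothetical names. Tracking free pages below the limit per node is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 4     /* hypothetical number of numa nodes */
#define NO_NODE  (-1)  /* hypothetical "no numa hint" marker */

/* Hypothetical per-node accounting: free pages that lie below the
 * direct-map (xenheap) limit on this node. */
struct node_info {
    unsigned long free_below_limit;
};

/* Choose a node for nr_pages that must come from below the limit.
 * The limit wins over the hint: the hinted node is used only if it can
 * satisfy the request below the limit.  Otherwise (or with no hint) pick
 * the node with the most eligible free memory, so hint-less requests do
 * not all drain node 0.  Returns NO_NODE if no node can satisfy it. */
static int pick_node(const struct node_info nodes[NR_NODES], int hint,
                     unsigned long nr_pages)
{
    int best = NO_NODE;
    unsigned long best_free = 0;

    assert(hint == NO_NODE || (hint >= 0 && hint < NR_NODES));

    if ( hint != NO_NODE && nodes[hint].free_below_limit >= nr_pages )
        return hint;

    for ( int n = 0; n < NR_NODES; n++ )
    {
        if ( nodes[n].free_below_limit >= nr_pages &&
             nodes[n].free_below_limit > best_free )
        {
            best = n;
            best_free = nodes[n].free_below_limit;
        }
    }

    return best;
}
```

Under this policy, a request hinted at a node with no memory below the limit is redirected to another node. It is never satisfied from above the direct map.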
Re: [Xen-devel] Build failures with Xen 4.6.0-rc1 (firmware/etherboot/ipxe)
On 21.08.15 at 19:51, konrad.w...@oracle.com wrote: I don't think we can rev ipxe.git to the latest in the Xen 4.6 time-frame. But having that patch should help with compile issues, like mine. Agreed. So how do we want to fix this in the 4.6 time-frame? Pull the one patch (or a few more hand-selected ones if need be). And in the 4.7 time-frame? Rev up to ipxe.git master? I would say so. Them apparently never doing any releases (at least there are no respective tags that I could see) of course makes the "when" part of this a little problematic... Jan
[Xen-devel] Xen 4.6 test report from XenServer
Hi, While the Developer Summit was happening in Seattle, I set up a XenServer nightly test on -rc1(ish). There were no identified issues, which is a clear testament to the quality of 4.6, and shows a substantial improvement over previous releases. As 4.6 has met our quality bar for inclusion, XenServer trunk is now 4.6-based. ~Andrew
Re: [Xen-devel] [PATCH for 4.7] xen: Replace alloc_vcpu_guest_context() with vmalloc()
On 21.08.15 at 19:51, andrew.coop...@citrix.com wrote: This essentially reverts c/s 2037f2adb x86: introduce alloc_vcpu_guest_context(), including the newer arm bits, but achieves the same end goal by using the newer vmalloc() infrastructure. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com Reviewed-by: Jan Beulich jbeul...@suse.com
Re: [Xen-devel] [PATCH for 4.7] xen: Replace alloc_vcpu_guest_context() with vmalloc()
On 21.08.15 at 20:10, andrew.coop...@citrix.com wrote: On 21/08/15 18:55, Konrad Rzeszutek Wilk wrote: On Fri, Aug 21, 2015 at 06:51:46PM +0100, Andrew Cooper wrote: This essentially reverts c/s 2037f2adb "x86: introduce alloc_vcpu_guest_context()", including the newer arm bits, but achieves the same end goal by using the newer vmalloc() infrastructure. Could you explain what this fixes? Or perhaps with an explanation of why this will make Xen [ ] better; [ ] faster; [ ] magical. :-) Ain't the diffstat enough to qualify for [x] better ;) ? Thanks. It is relevant to a patch of Roger's which I am reviewing from the no-DM series. I was writing in reply to that, but this can also do. alloc_vcpu_guest_context() was introduced long before vmalloc(), and attempts to achieve the same end result using per-pcpu fixmap entries, which scale with the compile-time NR_CPUS. This patch causes a net reduction in compiled size for each arch (small for ARM, larger for x86) and removes a scalability limit for compiling with large numbers of cpus. I think I can guess what you are going to ask me to do, given this explanation. To be honest, the patch (and its description) looks fine to me as is. Jan
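The scalability point can be illustrated with a deliberately simplified model, sketched below:

* The old scheme reserves one slot per CPU at build time. In Xen these were fixmap virtual-address entries; here a static array stands in for them.
* The new scheme allocates on demand.

None of these names are Xen's. `NR_CPUS`, `CTX_SIZE` and the helpers are hypothetical, and `calloc()` merely stands in for `vmalloc()`.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS  4     /* hypothetical compile-time CPU limit */
#define CTX_SIZE 4096  /* hypothetical size of one guest context */

/* Old-style model: one slot per CPU, reserved at build time. */
static unsigned char fixed_slots[NR_CPUS][CTX_SIZE];
static int slot_busy[NR_CPUS];

static void *alloc_ctx_fixed(unsigned int cpu)
{
    if ( cpu >= NR_CPUS || slot_busy[cpu] )
        return NULL;           /* capped by NR_CPUS, one user per slot */
    slot_busy[cpu] = 1;
    return fixed_slots[cpu];
}

static void free_ctx_fixed(unsigned int cpu)
{
    assert(cpu < NR_CPUS && slot_busy[cpu]);
    slot_busy[cpu] = 0;
}

/* New-style model: allocate on demand, nothing reserved up front. */
static void *alloc_ctx_dynamic(void)
{
    return calloc(1, CTX_SIZE);
}

static void free_ctx_dynamic(void *ctx)
{
    free(ctx);
}
```

In the first model the reservation grows linearly with NR_CPUS whether or not the slots are ever used.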
Re: [Xen-devel] [PATCH v5 26/28] libxc/xen: introduce a start info structure for HVMlite guests
On 21/08/15 at 23.00, Andrew Cooper wrote: On 21/08/15 17:53, Roger Pau Monne wrote: --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -784,6 +784,25 @@ struct start_info { }; typedef struct start_info start_info_t; +/* + * Start of day structure passed to HVMlite guests in %ebx. As we are planning to rename HVMlite to PVH, I would avoid the use of HVMlite in committed code. Also, as this is HVM-specific, might it be better in public/hvm? I guess that depends on whether ARM is likely to use it. I don't have an opinion on this; I just placed it where it seemed most natural to me, which is after the classic PV start_info definition. Roger.
[Xen-devel] [RFC for-4.6 0/2] In-tree feature documentation
An issue which Xen has is an uncertain support statement for features. Given the success seen with docs/misc/xen-command-line.markdown, and in particular with keeping it up to date, introduce a similar system for features. Patch 1 introduces a proposed template (and a Makefile tweak to include the new docs/features subdirectory), while patch 2 is a feature document covering the topic of migration. This is tagged RFC as I expect people to have different views as to what is useful to include. I would particularly appreciate feedback on the template before it starts getting used widely. Lars: Does this look like a reasonable counterpart to your formal support statement document? Jim: Per your request at the summit for new information, is patch 2 suitable? Andrew Cooper (2): docs: Template for feature documents docs: Migration feature document docs/Makefile |2 +- docs/features/migration.pandoc | 90 docs/features/template.pandoc | 55 3 files changed, 146 insertions(+), 1 deletion(-) create mode 100644 docs/features/migration.pandoc create mode 100644 docs/features/template.pandoc -- 1.7.10.4
[Xen-devel] [PATCH for-4.6 2/2] docs: Migration feature document
Signed-off-by: Andrew Cooper andrew.coop...@citrix.com --- docs/features/migration.pandoc | 90 1 file changed, 90 insertions(+) create mode 100644 docs/features/migration.pandoc diff --git a/docs/features/migration.pandoc b/docs/features/migration.pandoc new file mode 100644 index 000..e0422f9 --- /dev/null +++ b/docs/features/migration.pandoc @@ -0,0 +1,90 @@ +% Migration + +\clearpage + +# Basics +--- - +Status: **Supported** + + Architecture: x86 + + Component: Toolstack +--- - + +# Overview + +Migration is a mechanism to move a virtual machine while the VM is running. +Live migration moves a running virtual machine between two physical servers, +but the same mechanism can be used for non-live migration (pause and copy) and +suspend/resume from disk. + +# User details + +No hardware requirements, although hypervisor logdirty support is required for +live migration. + +From the command line, `xl migrate/save/restore` are the top-level +interactions. e.g. + +xl create my-vm.cfg +xl migrate my-vm localhost + +or + +xl create my-vm.cfg +xl save my-vm /path/to/save/file +xl restore /path/to/save/file + +Xen 4.6 sees the introduction of Migration v2. There is no change for people +using `xl`, although the `libxl` API has had an extension. + +# Technical details + +Migration is formed of several layers. `libxc` is responsible for the +contents of the VM (ram, vcpus, etc) and the live migration loop, while +`libxl` is responsible for items such as emulator state. + +The format of the migration v2 stream is specified in two documents, and is +architecture-neutral. Compatibility with legacy streams is maintained via the +`convert-legacy-stream` script which transforms a legacy stream into a +migration v2 stream.
+ +* Documents +* `docs/specs/libxc-migration-stream.pandoc` +* `docs/specs/libxl-migration-stream.pandoc` +* `libxc` +* `tools/libxc/xc_sr_*.[hc]` +* `libxl` +* `tools/libxl/libxl_stream_{read,write}.c` +* Scripts +* `tools/python/xen/migration/*.py` +* `tools/python/scripts/convert-legacy-stream` +* `tools/python/scripts/verify-stream-v2` + +Users of the `libxl` API have a new parameter `stream_version` in +`domain_restore_params` which is used to distinguish between legacy and v2 +migration streams, and hence whether legacy conversion is required. + +# Limitations + +Hypervisor logdirty support is incompatible with hardware passthrough, as +IOMMU faults cannot be used to track writes. + +While not a bug in migration specifically, VMs are very sensitive to changes +in cpuid information, and cpuid levelling support currently has its issues. +Extreme care should be taken when migrating VMs between non-identical CPUs +until the cpuid levelling improvements are complete. + +# Areas for improvement + +* Arm support +* Linear P2M support for x86 PV +* Live looping parameters + +# Known issues + +* x86 HVM guest physmap operations (not reflected in logdirty bitmap) +* x86 HVM with PoD pages (attempts to map cause PoD allocations) +* x86 HVM with nested-virt (no relevant information included in the stream) +* x86 PV ballooning (P2M marked dirty, target frame not marked) +* x86 PV P2M structure changes (not noticed, stale mappings used) -- 1.7.10.4
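The document assigns the live migration loop to `libxc`. That loop follows the general pre-copy idea, sketched below as a simplified model rather than the `xc_sr_*` implementation.

All names here are hypothetical:

* `struct guest` models the running VM.
* The `dirty[]` array stands in for the logdirty bitmap.
* The `threshold` and `max_iters` knobs loosely correspond to the "live looping parameters" listed under areas for improvement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 64  /* hypothetical guest size in pages */

/* Hypothetical guest model: dirty[] plays the role of the logdirty
 * bitmap; redirty_per_round simulates the guest writing to memory
 * while it keeps running. */
struct guest {
    bool dirty[NR_PAGES];
    size_t redirty_per_round;
};

static size_t count_dirty(const struct guest *g)
{
    size_t n = 0;

    for ( size_t i = 0; i < NR_PAGES; i++ )
        n += g->dirty[i];
    return n;
}

/* "Send" every dirty page and clear its bit; returns pages sent. */
static size_t send_dirty(struct guest *g)
{
    size_t sent = 0;

    for ( size_t i = 0; i < NR_PAGES; i++ )
        if ( g->dirty[i] )
        {
            g->dirty[i] = false;
            sent++;
        }
    return sent;
}

/* Pre-copy: send all memory, then keep resending what the running guest
 * dirtied until few enough pages remain or max_iters rounds have passed;
 * finally pause the guest and send the rest.  Returns the number of
 * pages sent while the guest was paused. */
static size_t live_migrate(struct guest *g, size_t threshold,
                           unsigned int max_iters)
{
    for ( size_t i = 0; i < NR_PAGES; i++ )
        g->dirty[i] = true;

    for ( unsigned int iter = 0; iter < max_iters; iter++ )
    {
        send_dirty(g);
        /* The guest is still running and dirties some pages again. */
        for ( size_t i = 0; i < g->redirty_per_round && i < NR_PAGES; i++ )
            g->dirty[i] = true;
        if ( count_dirty(g) <= threshold )
            break;
    }

    /* Guest paused: no further writes, send the final dirty set. */
    return send_dirty(g);
}
```

The model also shows why logdirty tracking is a hard requirement. A write that never sets its bit in `dirty[]`, such as device DMA under passthrough, would simply never be resent.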
[Xen-devel] [PATCH for-4.6 1/2] docs: Template for feature documents
Signed-off-by: Andrew Cooper andrew.coop...@citrix.com --- docs/Makefile |2 +- docs/features/template.pandoc | 55 + 2 files changed, 56 insertions(+), 1 deletion(-) create mode 100644 docs/features/template.pandoc diff --git a/docs/Makefile b/docs/Makefile index 272292c..5d620e5 100644 --- a/docs/Makefile +++ b/docs/Makefile @@ -16,7 +16,7 @@ MARKDOWNSRC-y := $(sort $(shell find misc -name '*.markdown' -print)) TXTSRC-y := $(sort $(shell find misc -name '*.txt' -print)) -PANDOCSRC-y := $(sort $(shell find specs -name '*.pandoc' -print)) +PANDOCSRC-y := $(sort $(shell find features/ misc/ specs/ -name '*.pandoc' -print)) # Documentation targets DOC_MAN1 := $(patsubst man/%.pod.1,man1/%.1,$(MAN1SRC-y)) diff --git a/docs/features/template.pandoc b/docs/features/template.pandoc new file mode 100644 index 000..d883b82 --- /dev/null +++ b/docs/features/template.pandoc @@ -0,0 +1,55 @@ +% Template for feature documents + +\clearpage + +This is a suggested template for the formatting of a Xen feature document in tree. + +The purpose of this document is to provide a concrete support statement for the +feature (indicating its security status), as well as brief user and technical +documentation. + +# Basics + +A table with an overview of the support status and applicability. + + + Status: e.g. **Supported**/**Tech Preview**/**Experimental** + +Architecture(s): e.g. x86, arm + + Component(s): e.g. Hypervisor, toolstack, guest + + Hardware: _where applicable_ + + +# Overview + +A short description of the feature, similar to an abstract for a +paper/presentation. + +# User information + +Information for a user attempting to use the feature. Should include how to +enable the feature (is it enabled by default? If not, how to turn it on?), and +how to interact with the feature (typically via `xl`). + +# Limitations + +Information concerning incompatibilities with other features or hardware +combinations. + +# Technical information + +Information for a developer or power user.
Should include where to look +in-tree for detailed documents and code. + +# Areas for improvement + +List of enhancements which could be undertaken, e.g. to improve the feature +itself, or improve interaction with other features. + +# Known issues + +List of known issues or bugs. For tech preview or experimental features, this +section must contain the list of items needing fixing for its status to be +upgraded. -- 1.7.10.4
Re: [Xen-devel] [PATCH v2 0/6] multiboot2: Add two extensions and fix some issues
Guys, especially GRUB2 maintainers, On Mon, Jul 20, 2015 at 04:35:48PM +0200, Daniel Kiper wrote: Hi, This patch series: - enables EFI boot services usage in images loaded via the multiboot2 protocol, - adds support for multiboot2 protocol compatible relocatable images, - fixes two minor issues. Is it possible to get your comments on this patch series (I would like to thank Andrei and Konrad for their review)? We need this functionality, both as the Xen community and as Oracle. Hence, it would be nice to know that we are going in a good direction. So, if you think that we should change something, please drop me a line. I know that you are busy but please, at least, tell us when you will take a look at it. Daniel
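The series mentions support for relocatable images. A loader handling such an image has to pick an aligned load address inside an allowed window, as in this rough sketch.

Treat the parameters as assumptions made for this example, not as the header tag layout defined by the patches:

* the window bounds `min_addr`/`max_addr`;
* the alignment;
* the low/high preference.

The sketch also assumes the whole window is free RAM. A real loader must consult the memory map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placement preferences: no preference, lowest, highest. */
enum place_pref { PREF_NONE, PREF_LOW, PREF_HIGH };

/* Round x up/down to a multiple of align (align is a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

static uint64_t align_down(uint64_t x, uint64_t align)
{
    return x & ~(align - 1);
}

/* Pick a start address for an image of `size` bytes so that the image
 * starts at or above min_addr, ends at or below max_addr, and starts on
 * an `align` boundary.  Returns 0 and stores the address in *out, or -1
 * if the image cannot fit. */
static int choose_load_addr(uint64_t min_addr, uint64_t max_addr,
                            uint64_t size, uint64_t align,
                            enum place_pref pref, uint64_t *out)
{
    uint64_t lo, hi;

    assert(align != 0 && (align & (align - 1)) == 0);

    if ( max_addr < size )
        return -1;
    lo = align_up(min_addr, align);
    hi = align_down(max_addr - size, align);
    if ( lo > hi )
        return -1;

    /* PREF_NONE: any fitting address will do, so take the lowest. */
    *out = (pref == PREF_HIGH) ? hi : lo;
    return 0;
}
```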
Re: [Xen-devel] [PATCH v5 07/28] libxc: rework BSP initialization
On Fri, Aug 21, 2015 at 06:53:20PM +0200, Roger Pau Monne wrote: Place the calls to xc_vcpu_setcontext and the allocation of the hypercall buffer into the arch-specific vcpu hooks. This is needed for the next patch, s/next patch/$title/ please so x86 HVM guests can initialize the BSP using XEN_DOMCTL_sethvmcontext instead of XEN_DOMCTL_setvcpucontext. This patch should not introduce any functional change. Signed-off-by: Roger Pau Monné roger@citrix.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com --- Changes since v4: - Add Andrew Cooper Reviewed-by. --- tools/libxc/include/xc_dom.h | 2 +- tools/libxc/xc_dom_arm.c | 22 +- tools/libxc/xc_dom_boot.c| 23 +-- tools/libxc/xc_dom_x86.c | 26 -- 4 files changed, 39 insertions(+), 34 deletions(-) diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h index 5c1bb0f..0245d24 100644 --- a/tools/libxc/include/xc_dom.h +++ b/tools/libxc/include/xc_dom.h @@ -221,7 +221,7 @@ struct xc_dom_arch { /* arch-specific data structs setup */ int (*start_info) (struct xc_dom_image * dom); int (*shared_info) (struct xc_dom_image * dom, void *shared_info); -int (*vcpu) (struct xc_dom_image * dom, void *vcpu_ctxt); +int (*vcpu) (struct xc_dom_image * dom); int (*bootearly) (struct xc_dom_image * dom); int (*bootlate) (struct xc_dom_image * dom); diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c index 7548dae..8865097 100644 --- a/tools/libxc/xc_dom_arm.c +++ b/tools/libxc/xc_dom_arm.c @@ -119,9 +119,10 @@ static int shared_info_arm(struct xc_dom_image *dom, void *ptr) /* */ -static int vcpu_arm32(struct xc_dom_image *dom, void *ptr) +static int vcpu_arm32(struct xc_dom_image *dom) { -vcpu_guest_context_t *ctxt = ptr; +vcpu_guest_context_any_t any_ctx; +vcpu_guest_context_t *ctxt = &any_ctx.c; DOMPRINTF_CALLED(dom->xch); @@ -154,12
+155,18 @@ static int vcpu_arm32(struct xc_dom_image *dom, void *ptr) DOMPRINTF("Initial state CPSR %#"PRIx32" PC %#"PRIx32, ctxt->user_regs.cpsr, ctxt->user_regs.pc32); -return 0; +rc = xc_vcpu_setcontext(dom->xch, dom->guest_domid, 0, &any_ctx); +if ( rc != 0 ) +xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, + "%s: SETVCPUCONTEXT failed (rc=%d)", __func__, rc); + +return rc; } -static int vcpu_arm64(struct xc_dom_image *dom, void *ptr) +static int vcpu_arm64(struct xc_dom_image *dom) { -vcpu_guest_context_t *ctxt = ptr; +vcpu_guest_context_any_t any_ctx; +vcpu_guest_context_t *ctxt = &any_ctx.c; DOMPRINTF_CALLED(dom->xch); /* clear everything */ @@ -189,6 +196,11 @@ static int vcpu_arm64(struct xc_dom_image *dom, void *ptr) DOMPRINTF("Initial state CPSR %#"PRIx32" PC %#"PRIx64, ctxt->user_regs.cpsr, ctxt->user_regs.pc64); +rc = xc_vcpu_setcontext(dom->xch, dom->guest_domid, 0, &any_ctx); +if ( rc != 0 ) +xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, + "%s: SETVCPUCONTEXT failed (rc=%d)", __func__, rc); + return 0; } diff --git a/tools/libxc/xc_dom_boot.c b/tools/libxc/xc_dom_boot.c index e6f7794..791041b 100644 --- a/tools/libxc/xc_dom_boot.c +++ b/tools/libxc/xc_dom_boot.c @@ -62,19 +62,6 @@ static int setup_hypercall_page(struct xc_dom_image *dom) return rc; } -static int launch_vm(xc_interface *xch, domid_t domid, - vcpu_guest_context_any_t *ctxt) -{ -int rc; - -xc_dom_printf(xch, "%s: called, ctxt=%p", __FUNCTION__, ctxt); -rc = xc_vcpu_setcontext(xch, domid, 0, ctxt); -if ( rc != 0 ) -xc_dom_panic(xch, XC_INTERNAL_ERROR, - "%s: SETVCPUCONTEXT failed (rc=%d)", __FUNCTION__, rc); -return rc; -} - static int clear_page(struct xc_dom_image *dom, xen_pfn_t pfn) { xen_pfn_t dst; @@ -197,14 +184,9 @@ void *xc_dom_boot_domU_map(struct xc_dom_image *dom, xen_pfn_t pfn, int xc_dom_boot_image(struct xc_dom_image *dom) { -DECLARE_HYPERCALL_BUFFER(vcpu_guest_context_any_t, ctxt); xc_dominfo_t info; int rc; -ctxt = xc_hypercall_buffer_alloc(dom->xch, ctxt, sizeof(*ctxt)); 
-if ( ctxt == NULL ) -return -1; -
DOMPRINTF_CALLED(dom->xch); /* misc stuff*/ @@ -259,13 +241,10 @@ int xc_dom_boot_image(struct xc_dom_image *dom) return rc; /* let the vm run */ -memset(ctxt, 0, sizeof(*ctxt)); -if ( (rc = dom->arch_hooks->vcpu(dom, ctxt)) != 0 ) +if ( (rc = dom->arch_hooks->vcpu(dom)) != 0 ) return rc; xc_dom_unmap_all(dom); -
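After the rework, each arch hook builds its own context and commits it, instead of filling in a buffer supplied by generic code. The sketch below shows that shape.

All types and functions are hypothetical stand-ins, not libxc code; `commit_context()` plays the role of `xc_vcpu_setcontext()`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the libxc types. */
struct vcpu_ctx { unsigned long pc; };

struct dom_image;

struct arch_hooks {
    /* The hook takes only the domain: it owns building and loading
     * the initial vcpu context. */
    int (*vcpu)(struct dom_image *dom);
};

struct dom_image {
    const struct arch_hooks *hooks;
    unsigned long entry;
    int fail_commit;            /* test knob: make the commit fail */
    struct vcpu_ctx committed;  /* what the "hypervisor" received */
};

/* Stand-in for xc_vcpu_setcontext(): returns 0 on success. */
static int commit_context(struct dom_image *dom, const struct vcpu_ctx *ctx)
{
    if ( dom->fail_commit )
        return -1;
    dom->committed = *ctx;
    return 0;
}

static int vcpu_example(struct dom_image *dom)
{
    struct vcpu_ctx ctx;        /* local, like any_ctx in the patch */
    int rc;

    memset(&ctx, 0, sizeof(ctx));
    ctx.pc = dom->entry;

    rc = commit_context(dom, &ctx);
    if ( rc != 0 )
        fprintf(stderr, "%s: commit failed (rc=%d)\n", __func__, rc);

    return rc;                  /* propagate failure to the caller */
}

static const struct arch_hooks example_hooks = { .vcpu = vcpu_example };

/* Generic code no longer knows how the context is built or loaded. */
static int boot_image(struct dom_image *dom)
{
    return dom->hooks->vcpu(dom);
}
```

Propagating `rc` is the important detail. As quoted, the `vcpu_arm64()` hunk logs a failed `xc_vcpu_setcontext()` but still ends in `return 0;`, so that error would not reach `xc_dom_boot_image()`.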
[Xen-devel] [PATCH v4 04/11] drivers/video/fbdev/gxt4500: Use pci_ioremap_wc_bar() to map framebuffer
From: Luis R. Rodriguez mcg...@suse.com The driver doesn't use mtrr_add() or arch_phys_wc_add(), but since we know the framebuffer is already isolated on its own ioremap() we can take advantage of write-combining for performance where possible. In this case there are a few motivations for this: a) Take advantage of PAT when available. b) Help with the goal of eventually using _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit de33c442e titled "x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"). Signed-off-by: Luis R. Rodriguez mcg...@suse.com Acked-by: Tomi Valkeinen tomi.valkei...@ti.com Cc: Andrew Morton a...@linux-foundation.org Cc: Andy Lutomirski l...@amacapital.net Cc: Antonino Daplas adap...@gmail.com Cc: Arnd Bergmann a...@arndb.de Cc: b...@kernel.crashing.org Cc: bhelg...@google.com Cc: Daniel Vetter daniel.vet...@ffwll.ch Cc: Dave Airlie airl...@redhat.com Cc: Geert Uytterhoeven ge...@linux-m68k.org Cc: H.
Peter Anvin h...@zytor.com Cc: Ingo Molnar mi...@kernel.org Cc: Jean-Christophe Plagniol-Villard plagn...@jcrosoft.com Cc: Juergen Gross jgr...@suse.com Cc: Laurent Pinchart laurent.pinch...@ideasonboard.com Cc: linux-fb...@vger.kernel.org Cc: linux-...@vger.kernel.org Cc: m...@redhat.com Cc: Rob Clark robdcl...@gmail.com Cc: Suresh Siddha sbsid...@gmail.com Cc: Thomas Gleixner t...@linutronix.de Cc: toshi.k...@hp.com Link: http://lkml.kernel.org/r/1435195342-26879-5-git-send-email-mcg...@do-not-panic.com Signed-off-by: Borislav Petkov b...@suse.de --- drivers/video/fbdev/gxt4500.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/video/fbdev/gxt4500.c b/drivers/video/fbdev/gxt4500.c index 135d78a02588..f19133a80e8c 100644 --- a/drivers/video/fbdev/gxt4500.c +++ b/drivers/video/fbdev/gxt4500.c @@ -662,7 +662,7 @@ static int gxt4500_probe(struct pci_dev *pdev, const struct pci_device_id *ent) info->fix.smem_start = fb_phys; info->fix.smem_len = pci_resource_len(pdev, 1); - info->screen_base = pci_ioremap_bar(pdev, 1); + info->screen_base = pci_ioremap_wc_bar(pdev, 1); if (!info->screen_base) { dev_err(&pdev->dev, "gxt4500: cannot map framebuffer\n"); goto err_unmap_regs; -- 2.4.3
[Xen-devel] [PATCH v4 02/11] drivers/video/fbdev/i740fb: Use arch_phys_wc_add() and pci_ioremap_wc_bar()
From: Luis R. Rodriguez mcg...@suse.com Convert the driver from using the x86-specific MTRR code to the architecture-agnostic arch_phys_wc_add(). It will avoid MTRR if write-combining is available. In order to take advantage of that, also ensure the ioremapped area is requested as write-combining. There are a few motivations for this: a) Take advantage of PAT when available. b) Help bury MTRR code away: MTRR is architecture-specific and on x86 it is being replaced by PAT. c) Help with the goal of eventually using _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit de33c442e titled "x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"). The conversion done is expressed by the following Coccinelle SmPL patch; it additionally required manual intervention to address all the ifdeffery and to remove redundant things which arch_phys_wc_add() already handles, such as the verbose message when MTRR fails and doing nothing when we didn't get an MTRR.
@ mtrr_found @ expression index, base, size; @@ -index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1); +index = arch_phys_wc_add(base, size); @ mtrr_rm depends on mtrr_found @ expression mtrr_found.index, mtrr_found.base, mtrr_found.size; @@ -mtrr_del(index, base, size); +arch_phys_wc_del(index); @ mtrr_rm_zero_arg depends on mtrr_found @ expression mtrr_found.index; @@ -mtrr_del(index, 0, 0); +arch_phys_wc_del(index); @ mtrr_rm_fb_info depends on mtrr_found @ struct fb_info *info; expression mtrr_found.index; @@ -mtrr_del(index, info->fix.smem_start, info->fix.smem_len); +arch_phys_wc_del(index); @ ioremap_replace_nocache depends on mtrr_found @ struct fb_info *info; expression base, size; @@ -info->screen_base = ioremap_nocache(base, size); +info->screen_base = ioremap_wc(base, size); @ ioremap_replace_default depends on mtrr_found @ struct fb_info *info; expression base, size; @@ -info->screen_base = ioremap(base, size); +info->screen_base = ioremap_wc(base, size); Signed-off-by: Luis R. Rodriguez mcg...@suse.com Acked-by: Tomi Valkeinen tomi.valkei...@ti.com Cc: Andrew Morton a...@linux-foundation.org Cc: Andy Lutomirski l...@amacapital.net Cc: Antonino Daplas adap...@gmail.com Cc: Arnd Bergmann a...@arndb.de Cc: b...@kernel.crashing.org Cc: Benoit Taine benoit.ta...@lip6.fr Cc: Bjorn Helgaas bhelg...@google.com Cc: Daniel Vetter daniel.vet...@ffwll.ch Cc: Dave Airlie airl...@redhat.com Cc: Geert Uytterhoeven ge...@linux-m68k.org Cc: H.
Peter Anvin h...@zytor.com Cc: Ingo Molnar mi...@kernel.org Cc: Jean-Christophe Plagniol-Villard plagn...@jcrosoft.com Cc: Jingoo Han jg1@samsung.com Cc: Juergen Gross jgr...@suse.com Cc: linux-fb...@vger.kernel.org Cc: linux-...@vger.kernel.org Cc: m...@redhat.com Cc: Rob Clark robdcl...@gmail.com Cc: Suresh Siddha sbsid...@gmail.com Cc: Thomas Gleixner t...@linutronix.de Cc: toshi.k...@hp.com Link: http://lkml.kernel.org/r/1435195342-26879-3-git-send-email-mcg...@do-not-panic.com Signed-off-by: Borislav Petkov b...@suse.de --- drivers/video/fbdev/i740fb.c | 35 ++- 1 file changed, 6 insertions(+), 29 deletions(-) diff --git a/drivers/video/fbdev/i740fb.c b/drivers/video/fbdev/i740fb.c index a2b4204b42bb..452e1163ad02 100644 --- a/drivers/video/fbdev/i740fb.c +++ b/drivers/video/fbdev/i740fb.c @@ -27,24 +27,15 @@ #include <linux/console.h> #include <video/vga.h> -#ifdef CONFIG_MTRR -#include <asm/mtrr.h> -#endif - #include "i740_reg.h" static char *mode_option; - -#ifdef CONFIG_MTRR static int mtrr = 1; -#endif struct i740fb_par { unsigned char __iomem *regs; bool has_sgram; -#ifdef CONFIG_MTRR - int mtrr_reg; -#endif + int wc_cookie; bool ddc_registered; struct i2c_adapter ddc_adapter; struct i2c_algo_bit_data ddc_algo; @@ -1040,7 +1031,7 @@ static int i740fb_probe(struct pci_dev *dev, const struct pci_device_id *ent) goto err_request_regions; } - info->screen_base = pci_ioremap_bar(dev, 0); + info->screen_base = pci_ioremap_wc_bar(dev, 0); if (!info->screen_base) { dev_err(info->device, "error remapping base\n"); ret = -ENOMEM; @@ -1144,13 +1135,9 @@ static int i740fb_probe(struct pci_dev *dev, const struct pci_device_id *ent) fb_info(info, "%s frame buffer device\n", info->fix.id); pci_set_drvdata(dev, info); -#ifdef CONFIG_MTRR - if (mtrr) { - par->mtrr_reg = -1; - par->mtrr_reg = mtrr_add(info->fix.smem_start, - info->fix.smem_len, MTRR_TYPE_WRCOMB, 1); - } -#endif + if (mtrr) + par->wc_cookie = arch_phys_wc_add(info->fix.smem_start, + info->fix.smem_len); return 0;
err_reg_framebuffer: @@ -1177,13 +1164,7 @@ static void i740fb_remove(struct pci_dev *dev) if (info) {
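The conversion relies on a simple cookie contract. As we understand the x86 implementation, the contract is:

* `arch_phys_wc_add()` returns 0 when no MTRR is needed (for example because PAT is in use).
* It returns a negative value on failure.
* Otherwise it returns a positive handle.
* `arch_phys_wc_del()` ignores anything that is not a positive handle.

That is why the driver can store the result unconditionally and drop both the `#ifdef CONFIG_MTRR` blocks and its own error handling. The userspace model below illustrates the contract. The `model_*` functions are hypothetical stand-ins, not kernel APIs. The single MTRR slot and the handle offset are simplifications.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for arch_phys_wc_add()/arch_phys_wc_del(),
 * modelling only their return-value contract with a single MTRR. */
#define MODEL_HANDLE_OFFSET 1000   /* keeps real handles strictly positive */

static bool model_pat_enabled;     /* PAT available: no MTRR required */
static bool model_mtrr_free = true;

static int model_wc_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;
	if (model_pat_enabled)
		return 0;                  /* success, nothing to undo later */
	if (!model_mtrr_free)
		return -1;                 /* failure; driver just runs slower */
	model_mtrr_free = false;
	return MODEL_HANDLE_OFFSET;    /* MTRR index 0 plus the offset */
}

static void model_wc_del(int cookie)
{
	if (cookie <= 0)
		return;                    /* 0 or an error code: nothing to free */
	model_mtrr_free = true;
}

/* Driver side after the conversion: no #ifdef, no error handling. */
struct model_par {
	int wc_cookie;
};

static void model_probe(struct model_par *par, bool mtrr_param)
{
	par->wc_cookie = 0;
	if (mtrr_param)
		par->wc_cookie = model_wc_add(0xd0000000UL, 0x800000UL);
}

static void model_remove(struct model_par *par)
{
	model_wc_del(par->wc_cookie);
}
```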
[Xen-devel] [PATCH v4 07/11] drivers/video/fbdev/s3fb: Use arch_phys_wc_add() and pci_iomap_wc()
From: Luis R. Rodriguez mcg...@suse.com This driver uses the same area for MTRR as for the ioremap(). Convert the driver from using the x86-specific MTRR code to the architecture-agnostic arch_phys_wc_add(). It will avoid MTRRs if write-combining is available. In order to take advantage of that, also ensure the ioremapped area is requested as write-combining. There are a few motivations for this: a) Take advantage of PAT when available. b) Help bury MTRR code away: MTRR is architecture-specific and on x86 it is being replaced by PAT. c) Help with the goal of eventually using _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit de33c442e titled "x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"). The conversion done is expressed by the following Coccinelle SmPL patch; it additionally required manual intervention to address all the ifdeffery and to remove redundant things which arch_phys_wc_add() already handles, such as the verbose message when MTRR fails and doing nothing when we didn't get an MTRR.
@ mtrr_found @ expression index, base, size; @@ -index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1); +index = arch_phys_wc_add(base, size); @ mtrr_rm depends on mtrr_found @ expression mtrr_found.index, mtrr_found.base, mtrr_found.size; @@ -mtrr_del(index, base, size); +arch_phys_wc_del(index); @ mtrr_rm_zero_arg depends on mtrr_found @ expression mtrr_found.index; @@ -mtrr_del(index, 0, 0); +arch_phys_wc_del(index); @ mtrr_rm_fb_info depends on mtrr_found @ struct fb_info *info; expression mtrr_found.index; @@ -mtrr_del(index, info->fix.smem_start, info->fix.smem_len); +arch_phys_wc_del(index); @ ioremap_replace_nocache depends on mtrr_found @ struct fb_info *info; expression base, size; @@ -info->screen_base = ioremap_nocache(base, size); +info->screen_base = ioremap_wc(base, size); @ ioremap_replace_default depends on mtrr_found @ struct fb_info *info; expression base, size; @@ -info->screen_base = ioremap(base, size); +info->screen_base = ioremap_wc(base, size); Signed-off-by: Luis R. Rodriguez mcg...@suse.com Acked-by: Tomi Valkeinen tomi.valkei...@ti.com Cc: Andrew Morton a...@linux-foundation.org Cc: Andy Lutomirski l...@amacapital.net Cc: Antonino Daplas adap...@gmail.com Cc: Arnd Bergmann a...@arndb.de Cc: b...@kernel.crashing.org Cc: bhelg...@google.com Cc: Daniel Vetter daniel.vet...@ffwll.ch Cc: Dave Airlie airl...@redhat.com Cc: Geert Uytterhoeven ge...@linux-m68k.org Cc: H.
Peter Anvin h...@zytor.com Cc: Ingo Molnar mi...@kernel.org Cc: Jean-Christophe Plagniol-Villard plagn...@jcrosoft.com Cc: Jingoo Han jg1@samsung.com Cc: Juergen Gross jgr...@suse.com Cc: Lad, Prabhakar prabhakar.cse...@gmail.com Cc: linux-fb...@vger.kernel.org Cc: linux-...@vger.kernel.org Cc: m...@redhat.com Cc: Rickard Strandqvist rickard_strandqv...@spectrumdigital.se Cc: Suresh Siddha sbsid...@gmail.com Cc: Thomas Gleixner t...@linutronix.de Cc: toshi.k...@hp.com Link: http://lkml.kernel.org/r/1435195342-26879-9-git-send-email-mcg...@do-not-panic.com Signed-off-by: Borislav Petkov b...@suse.de --- drivers/video/fbdev/s3fb.c | 35 ++- 1 file changed, 6 insertions(+), 29 deletions(-) diff --git a/drivers/video/fbdev/s3fb.c b/drivers/video/fbdev/s3fb.c index f0ae61a37f04..13b109073c63 100644 --- a/drivers/video/fbdev/s3fb.c +++ b/drivers/video/fbdev/s3fb.c @@ -28,13 +28,9 @@ #include <linux/i2c.h> #include <linux/i2c-algo-bit.h> -#ifdef CONFIG_MTRR -#include <asm/mtrr.h> -#endif - struct s3fb_info { int chip, rev, mclk_freq; - int mtrr_reg; + int wc_cookie; struct vgastate state; struct mutex open_lock; unsigned int ref_count; @@ -154,11 +150,7 @@ static const struct svga_timing_regs s3_timing_regs = { static char *mode_option; - -#ifdef CONFIG_MTRR static int mtrr = 1; -#endif - static int fasttext = 1; @@ -170,11 +162,8 @@ module_param(mode_option, charp, 0444); MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)"); module_param_named(mode, mode_option, charp, 0444); MODULE_PARM_DESC(mode, "Default video mode ('640x480-8@60', etc) (deprecated)"); - -#ifdef CONFIG_MTRR module_param(mtrr, int, 0444); MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)"); -#endif module_param(fasttext, int, 0644); MODULE_PARM_DESC(fasttext, "Enable S3 fast text mode (1=enable, 0=disable, default=1)"); @@ -1168,7 +1157,7 @@ static int s3_pci_probe(struct pci_dev *dev, const struct pci_device_id *id) info->fix.smem_len = 
pci_resource_len(dev, 0); /* Map physical IO memory address into kernel space */ - info->screen_base = pci_iomap(dev, 0, 0); + info->screen_base = pci_iomap_wc(dev, 0, 0); if (! info->screen_base) { rc = -ENOMEM; dev_err(info->device, "iomap for framebuffer failed\n"); @@ -1365,12 +1354,9 @@
[Xen-devel] [PATCH v4 10/11] dma: rename dma_*_writecombine() to dma_*_wc()
From: Luis R. Rodriguez mcg...@suse.com

Rename dma_*_writecombine() to dma_*_wc(), so that the naming is coherent
across the various write-combining APIs. The following Coccinelle SmPL
patch was used for this simple transformation:

@ rename_dma_alloc_writecombine @
expression dev, size, dma_addr, gfp;
@@

-dma_alloc_writecombine(dev, size, dma_addr, gfp)
+dma_alloc_wc(dev, size, dma_addr, gfp)

@ rename_dma_free_writecombine @
expression dev, size, cpu_addr, dma_addr;
@@

-dma_free_writecombine(dev, size, cpu_addr, dma_addr)
+dma_free_wc(dev, size, cpu_addr, dma_addr)

@ rename_dma_mmap_writecombine @
expression dev, vma, cpu_addr, dma_addr, size;
@@

-dma_mmap_writecombine(dev, vma, cpu_addr, dma_addr, size)
+dma_mmap_wc(dev, vma, cpu_addr, dma_addr, size)

Generated-by: Coccinelle SmPL
Suggested-by: Ingo Molnar mi...@kernel.org
Signed-off-by: Luis R. Rodriguez mcg...@suse.com
---
 arch/arm/mach-lpc32xx/phy3250.c | 13
 arch/arm/mach-netx/fb.c | 14
 arch/arm/mach-nspire/clcd.c | 13
 arch/avr32/include/asm/dma-mapping.h | 20
 arch/avr32/mm/dma-coherent.c | 12
 arch/metag/include/asm/dma-mapping.h | 4
 arch/metag/kernel/dma.c | 6
 drivers/dma/iop-adma.c | 8
 drivers/dma/mv_xor.c | 4
 drivers/dma/qcom_bam_dma.c | 14
 drivers/gpu/drm/drm_gem_cma_helper.c | 13
 drivers/gpu/drm/omapdrm/omap_dmm_tiler.c | 13
 drivers/gpu/drm/omapdrm/omap_gem.c | 8
 drivers/gpu/drm/sti/sti_cursor.c | 13
 drivers/gpu/drm/sti/sti_gdp.c | 3
 drivers/gpu/drm/sti/sti_hqvdp.c | 6
 drivers/gpu/drm/tegra/gem.c | 11
 drivers/gpu/host1x/cdma.c | 8
 drivers/gpu/host1x/job.c | 10
 drivers/media/platform/coda/coda-bit.c | 10
 drivers/video/fbdev/acornfb.c | 4
 drivers/video/fbdev/amba-clcd-versatile.c | 14
 drivers/video/fbdev/amba-clcd.c | 4
 drivers/video/fbdev/atmel_lcdfb.c | 9
 drivers/video/fbdev/ep93xx-fb.c | 9
 drivers/video/fbdev/gbefb.c | 8
 drivers/video/fbdev/imxfb.c | 12
 drivers/video/fbdev/mx3fb.c | 9
 drivers/video/fbdev/nuc900fb.c | 8
 drivers/video/fbdev/omap/lcdc.c | 16
 drivers/video/fbdev/pxa168fb.c | 8
 drivers/video/fbdev/pxafb.c | 4
 drivers/video/fbdev/s3c-fb.c | 7
 drivers/video/fbdev/s3c2410fb.c | 8
 drivers/video/fbdev/sa1100fb.c | 8
 include/linux/dma-mapping.h | 16
 sound/arm/pxa2xx-pcm-lib.c | 20
 sound/soc/fsl/imx-pcm-fiq.c | 10
 sound/soc/nuc900/nuc900-pcm.c | 6
 sound/soc/omap/omap-pcm.c | 12
 40 files changed, 183 insertions(+), 212 deletions(-)

diff --git a/arch/arm/mach-lpc32xx/phy3250.c b/arch/arm/mach-lpc32xx/phy3250.c
index 77d6b1bab278..ee06fabdf60e 100644
--- a/arch/arm/mach-lpc32xx/phy3250.c
+++ b/arch/arm/mach-lpc32xx/phy3250.c
@@ -86,8 +86,8 @@ static int lpc32xx_clcd_setup(struct clcd_fb *fb)
 {
 	dma_addr_t dma;
 
-	fb->fb.screen_base = dma_alloc_writecombine(&fb->dev->dev,
-		PANEL_SIZE, &dma, GFP_KERNEL);
+	fb->fb.screen_base = dma_alloc_wc(&fb->dev->dev, PANEL_SIZE, &dma,
+					  GFP_KERNEL);
 	if (!fb->fb.screen_base) {
 		printk(KERN_ERR "CLCD: unable to map framebuffer\n");
 		return -ENOMEM;
@@ -116,15 +116,14 @@ static int lpc32xx_clcd_setup(struct clcd_fb *fb)
 
 static int lpc32xx_clcd_mmap(struct clcd_fb *fb, struct vm_area_struct *vma)
 {
-	return dma_mmap_writecombine(&fb->dev->dev, vma,
-		fb->fb.screen_base, fb->fb.fix.smem_start,
-		fb->fb.fix.smem_len);
+	return dma_mmap_wc(&fb->dev->dev, vma, fb->fb.screen_base,
+			   fb->fb.fix.smem_start, fb->fb.fix.smem_len);
 }
 
 static void lpc32xx_clcd_remove(struct clcd_fb *fb)
 {
-	dma_free_writecombine(&fb->dev->dev, fb->fb.fix.smem_len,
-		fb->fb.screen_base, fb->fb.fix.smem_start);
+	dma_free_wc(&fb->dev->dev, fb->fb.fix.smem_len, fb->fb.screen_base,
+		    fb->fb.fix.smem_start);
 }
 
 /*
diff --git a/arch/arm/mach-netx/fb.c b/arch/arm/mach-netx/fb.c
index d122ee6ab991..8814ee5e98fd 100644
--- a/arch/arm/mach-netx/fb.c
+++ b/arch/arm/mach-netx/fb.c
@@ -42,8 +42,8 @@ int netx_clcd_setup(struct clcd_fb *fb)
 
 	fb->panel = netx_panel;
 
-	fb->fb.screen_base =
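The cover letter for this series says the rename keeps the old names available as defines for any stragglers. A minimal userspace sketch of that idea, assuming simple object-like alias macros; `struct device`, `dma_addr_t`, `gfp_t` and the malloc()-backed `dma_alloc_wc()`/`dma_free_wc()` below are hypothetical stand-ins, not the kernel's definitions (the real functions return write-combined, DMA-coherent memory for a device):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
typedef uint64_t dma_addr_t;
typedef unsigned int gfp_t;
struct device { const char *name; };

/* Stub: malloc() instead of a WC coherent allocation; fakes a bus address. */
static void *dma_alloc_wc(struct device *dev, size_t size,
			  dma_addr_t *dma_addr, gfp_t gfp)
{
	void *cpu_addr;

	(void)dev;
	(void)gfp;
	cpu_addr = malloc(size);
	if (cpu_addr)
		*dma_addr = (dma_addr_t)(uintptr_t)cpu_addr;
	return cpu_addr;
}

static void dma_free_wc(struct device *dev, size_t size,
			void *cpu_addr, dma_addr_t dma_addr)
{
	(void)dev;
	(void)size;
	(void)dma_addr;
	free(cpu_addr);
}

/*
 * Transitional aliases: callers the SmPL rules have not converted yet keep
 * compiling, and their calls resolve to the new names.
 */
#define dma_alloc_writecombine dma_alloc_wc
#define dma_free_writecombine  dma_free_wc
```

Because the aliases are plain macros, an unconverted call such as `dma_alloc_writecombine(&dev, size, &handle, gfp)` compiles to a `dma_alloc_wc()` call, so the old names can later be deleted without touching already-converted code.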
[Xen-devel] [PATCH v4 09/11] drivers/dma/iop-adma: Use dma_alloc_writecombine() kernel-style
From: Luis R. Rodriguez mcg...@suse.com

The call to dma_alloc_writecombine() and the check of its return value
are tangled together in a single statement. Untangle them, as the kernel
coding style prefers.

Signed-off-by: Luis R. Rodriguez mcg...@suse.com
Acked-by: Vinod Koul vinod.k...@intel.com
Cc: Vinod Koul vinod.k...@intel.com
Cc: Dan Williams dan.j.willi...@intel.com
Cc: dmaeng...@vger.kernel.org
Cc: x...@kernel.org
Link: http://lkml.kernel.org/r/1435258191-543-2-git-send-email-mcg...@do-not-panic.com
Signed-off-by: Borislav Petkov b...@suse.de
---
 drivers/dma/iop-adma.c | 9
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/dma/iop-adma.c b/drivers/dma/iop-adma.c
index 998826854fdd..e4f43125e0fb 100644
--- a/drivers/dma/iop-adma.c
+++ b/drivers/dma/iop-adma.c
@@ -1300,10 +1300,11 @@ static int iop_adma_probe(struct platform_device *pdev)
 	 * note: writecombine gives slightly better performance, but
 	 * requires that we explicitly flush the writes
 	 */
-	if ((adev->dma_desc_pool_virt = dma_alloc_writecombine(&pdev->dev,
-					plat_data->pool_size,
-					&adev->dma_desc_pool,
-					GFP_KERNEL)) == NULL) {
+	adev->dma_desc_pool_virt = dma_alloc_writecombine(&pdev->dev,
+							  plat_data->pool_size,
+							  &adev->dma_desc_pool,
+							  GFP_KERNEL);
+	if (!adev->dma_desc_pool_virt) {
 		ret = -ENOMEM;
 		goto err_free_adev;
 	}
--
2.4.3

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
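The change is behavior-neutral: it only separates the assignment from the NULL check. A userspace sketch contrasting the two shapes, assuming a hypothetical `pool_alloc()` in place of dma_alloc_writecombine() and a hypothetical `struct adma_dev` loosely modeled on the driver's device structure:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct adma_dev {            /* hypothetical, not iop-adma's real struct */
	void *dma_desc_pool_virt;
};

/* Stub allocator standing in for dma_alloc_writecombine(); fails for size 0. */
static void *pool_alloc(size_t size)
{
	return size ? malloc(size) : NULL;
}

/* Old shape: assignment and NULL test tangled into one condition. */
static int probe_tangled(struct adma_dev *adev, size_t pool_size)
{
	if ((adev->dma_desc_pool_virt = pool_alloc(pool_size)) == NULL)
		return -ENOMEM;
	return 0;
}

/* Kernel style: assign first, then test the result on its own. */
static int probe_untangled(struct adma_dev *adev, size_t pool_size)
{
	adev->dma_desc_pool_virt = pool_alloc(pool_size);
	if (!adev->dma_desc_pool_virt)
		return -ENOMEM;
	return 0;
}
```

Both functions return the same values and leave the same pointer behind; the second is simply easier to read and is what checkpatch-style review expects.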
[Xen-devel] [PATCH v4 00/11] x86/dma: RIP MTRR and dma write-combine API rename
From: Luis R. Rodriguez mcg...@suse.com

Ingo,

This is my pending series of patches for both write-combining and moving
Linux' use of MTRR into the grave. It combines three sets of straggler patch
series which have been pending integration for a while now. The rename
patches do not depend in any way on the MTRR patches, but I've combined them
here as they are all pending and all relate to write-combining. Below I
explain why integration of these patches has been delayed, and why I believe
it's time to merge them.

1) The DMA API rename for write-combining comes with defines for the old
naming convention, added as you suggested, for any stragglers which may come
up as this goes through and gets merged. We can remove the old define
mappings a release after this gets merged and things settle. These patches
have been in Boris' tree for a while, but I keep having to refresh them as
the kernel moves on; the addition of the old mappings should allow us to
merge this without any collateral damage.

2) The PCI driver changes come with Tomi Valkeinen's Acks, as well as Arnd
Bergmann's own Acks for the PCI and asm-generic changes. Bjorn technically
acknowledged this series as correct and acceptable, but his preference was
for it not to use EXPORT_SYMBOL_GPL(), as not *all* write-combine APIs use
EXPORT_SYMBOL_GPL(). Our goal on x86, though, is to not deal with bug reports
for new PAT APIs [0], and since documentation now makes it clear that it is
up to maintainers / developers whether to use EXPORT_SYMBOL_GPL() for new
*features* [1], I keep that practice in line with our own x86 goal of
avoiding bug reports and issues with proprietary drivers on new PAT
interfaces. Bjorn was happy for this to go through someone else's tree, in
particular Arnd's. Arnd Acked the series [2] but is unable to take these
patches at this time as he's out on paternity leave, so I'm sending them
through you with the respective Acks. Boris has been hugely instrumental in
helping review all the MTRR-related series, and these were sitting in his
queue for a while, but he's also unavailable now as he's on vacation.

3) Unexporting direct MTRR access. This is the last patch. I first posted it
in March 2015 [3], when I had originally meshed all the MTRR work into one
giant series. I ended up splitting the work into *over* 12 series. Now that
all of it is merged except these few patches, I've added that patch as the
last part of this series. I've amended an obituary note for MTRR regarding
platform firmware access to MTRRs, based on recent discussions [4], and
updated the documentation to reflect the status quo for Linux.

All of these patches have been tested by the 0-day bot. The only thing *new*
here is the obituary note I added to the documentation on platform firmware
access to MTRRs, in the last patch, based largely on review with Toshi.

[0] http://lkml.kernel.org/r/1424961893.17007.139.ca...@misato.fc.hp.com
[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=582ed8d51e2b6cb8a168c94852bca482685c2509
[2] http://lkml.kernel.org/r/2962702.QXZzP3RbKY@wuerfel
[3] http://lkml.kernel.org/r/1426893517-2511-48-git-send-email-mcg...@do-not-panic.com
[4] http://lkml.kernel.org/r/1438991330.3109.196.ca...@hp.com

Luis R. Rodriguez (11):
  PCI: Add pci_ioremap_wc_bar()
  drivers/video/fbdev/i740fb: Use arch_phys_wc_add() and pci_ioremap_wc_bar()
  drivers/video/fbdev/kyrofb: Use arch_phys_wc_add() and pci_ioremap_wc_bar()
  drivers/video/fbdev/gxt4500: Use pci_ioremap_wc_bar() to map framebuffer
  PCI: Add pci_iomap_wc() variants
  drivers/video/fbdev/arkfb.c: Use arch_phys_wc_add() and pci_iomap_wc()
  drivers/video/fbdev/s3fb: Use arch_phys_wc_add() and pci_iomap_wc()
  drivers/video/fbdev/vt8623fb: Use arch_phys_wc_add() and pci_iomap_wc()
  drivers/dma/iop-adma: Use dma_alloc_writecombine() kernel-style
  dma: rename dma_*_writecombine() to dma_*_wc()
  mtrr: bury MTRR - unexport mtrr_add() and mtrr_del()

 Documentation/x86/mtrr.txt | 20
 arch/arm/mach-lpc32xx/phy3250.c | 13
 arch/arm/mach-netx/fb.c | 14
 arch/arm/mach-nspire/clcd.c | 13
 arch/avr32/include/asm/dma-mapping.h | 20
 arch/avr32/mm/dma-coherent.c | 12
 arch/metag/include/asm/dma-mapping.h | 4
 arch/metag/kernel/dma.c | 6
 arch/x86/kernel/cpu/mtrr/main.c | 2
 drivers/dma/iop-adma.c | 9
 drivers/dma/mv_xor.c | 4
 drivers/dma/qcom_bam_dma.c | 14
 drivers/gpu/drm/drm_gem_cma_helper.c | 13
 drivers/gpu/drm/omapdrm/omap_dmm_tiler.c | 13
 drivers/gpu/drm/omapdrm/omap_gem.c | 8
 drivers/gpu/drm/sti/sti_cursor.c | 13
[Xen-devel] [PATCH v4 01/11] PCI: Add pci_ioremap_wc_bar()
From: Luis R. Rodriguez mcg...@suse.com

This lets drivers take advantage of PAT when available. It should help with
the transition of converting video drivers over to ioremap_wc(), and with the
goal of eventually using _PAGE_CACHE_UC instead of _PAGE_CACHE_UC_MINUS for
ioremap_nocache() on x86; see:

de33c442ed2a ("x86 PAT: fix performance drop for glx, use UC minus for
ioremap(), ioremap_nocache() and pci_mmap_page_range()")

Signed-off-by: Luis R. Rodriguez mcg...@suse.com
Acked-by: Arnd Bergmann a...@arndb.de
Cc: Andrew Morton a...@linux-foundation.org
Cc: Andy Lutomirski l...@amacapital.net
Cc: Antonino Daplas adap...@gmail.com
Cc: b...@kernel.crashing.org
Cc: Bjorn Helgaas bhelg...@google.com
Cc: Daniel Vetter daniel.vet...@ffwll.ch
Cc: Dave Airlie airl...@redhat.com
Cc: Davidlohr Bueso dbu...@suse.de
Cc: H. Peter Anvin h...@zytor.com
Cc: Ingo Molnar mi...@kernel.org
Cc: Jean-Christophe Plagniol-Villard plagn...@jcrosoft.com
Cc: Juergen Gross jgr...@suse.com
Cc: linux-fb...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: Mel Gorman mgor...@suse.de
Cc: m...@redhat.com
Cc: Suresh Siddha sbsid...@gmail.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: Tomi Valkeinen tomi.valkei...@ti.com
Cc: Toshi Kani toshi.k...@hp.com
Cc: Ville Syrjälä syrj...@sci.fi
Cc: Vlastimil Babka vba...@suse.cz
Link: http://lkml.kernel.org/r/1435195342-26879-2-git-send-email-mcg...@do-not-panic.com
Signed-off-by: Borislav Petkov b...@suse.de
---
 drivers/pci/pci.c | 14
 include/linux/pci.h | 1
 2 files changed, 15 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8037c27beb05..33867b8a4bc9 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -138,6 +138,20 @@ void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
 	return ioremap_nocache(res->start, resource_size(res));
 }
 EXPORT_SYMBOL_GPL(pci_ioremap_bar);
+
+void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar)
+{
+	/*
+	 * Make sure the BAR is actually a memory resource, not an IO resource
+	 */
+	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) {
+		WARN_ON(1);
+		return NULL;
+	}
+	return ioremap_wc(pci_resource_start(pdev, bar),
+			  pci_resource_len(pdev, bar));
+}
+EXPORT_SYMBOL_GPL(pci_ioremap_wc_bar);
 #endif
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 88bee285b93d..2b2d7d44c21a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1701,6 +1701,7 @@ static inline void pci_mmcfg_late_init(void) { }
 int pci_ext_cfg_avail(void);
 
 void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);
+void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar);
 
 #ifdef CONFIG_PCI_IOV
 int pci_iov_virtfn_bus(struct pci_dev *dev, int id);
--
2.4.3
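A userspace model of the check pci_ioremap_wc_bar() performs: an I/O-port BAR is rejected (the kernel version also WARNs), and only a memory BAR is mapped write-combined. `struct fake_pci_dev`, the stub mapper and the flag values are assumptions for illustration; the kernel reads the flags with pci_resource_flags() and maps with ioremap_wc():

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative flag values; the kernel defines its own IORESOURCE_* bits. */
#define IORESOURCE_IO  0x00000100UL
#define IORESOURCE_MEM 0x00000200UL
#define NUM_BARS 6

struct fake_resource {
	uint64_t start;
	uint64_t len;
	unsigned long flags;
};

struct fake_pci_dev {
	struct fake_resource bar[NUM_BARS];
};

static int wc_mappings;   /* number of successful stub WC mappings */

/* Stub for ioremap_wc(): returns the physical address as a fake cookie. */
static void *stub_ioremap_wc(uint64_t start, uint64_t len)
{
	(void)len;
	wc_mappings++;
	return (void *)(uintptr_t)start;
}

static void *model_pci_ioremap_wc_bar(struct fake_pci_dev *pdev, int bar)
{
	const struct fake_resource *res = &pdev->bar[bar];

	/* Make sure the BAR is a memory resource, not an I/O-port resource. */
	if (!(res->flags & IORESOURCE_MEM)) {
		fprintf(stderr, "WARN: BAR %d is not a memory resource\n", bar);
		return NULL;
	}
	return stub_ioremap_wc(res->start, res->len);
}
```

Rejecting I/O BARs up front matters because port I/O has no memory mapping at all, so there is nothing for write-combining to apply to.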
[Xen-devel] [PATCH v4 11/11] mtrr: bury MTRR - unexport mtrr_add() and mtrr_del()
From: Luis R. Rodriguez mcg...@suse.com

The crusade to replace mtrr_add() with the architecture-agnostic
arch_phys_wc_add() is complete; this ensures that write-combining
implementations (PAT on x86) are taken advantage of instead of MTRR. With
the crusade done, hide direct MTRR access from drivers.

Update the x86 documentation on MTRR to reflect the completion of the
phasing out of direct MTRR access, and add a note on platform firmware use
of MTRRs based on the obituary discussion of MTRRs on Linux [0].

[0] http://lkml.kernel.org/r/1438991330.3109.196.ca...@hp.com

Cc: Toshi Kani toshi.k...@hp.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: Ingo Molnar mi...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Borislav Petkov b...@suse.de
Cc: Dave Hansen dave.han...@linux.intel.com
Cc: Suresh Siddha sbsid...@gmail.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Juergen Gross jgr...@suse.com
Cc: Daniel Vetter daniel.vet...@ffwll.ch
Cc: Andy Lutomirski l...@amacapital.net
Cc: Dave Airlie airl...@redhat.com
Cc: Antonino Daplas adap...@gmail.com
Cc: Jean-Christophe Plagniol-Villard plagn...@jcrosoft.com
Cc: Tomi Valkeinen tomi.valkei...@ti.com
Cc: Ville Syrjälä syrj...@sci.fi
Cc: Mel Gorman mgor...@suse.de
Cc: Vlastimil Babka vba...@suse.cz
Cc: Davidlohr Bueso dbu...@suse.de
Cc: Doug Ledford dledf...@redhat.com
Cc: Andy Walls awa...@md.metrocast.net
Cc: x...@kernel.org
Cc: net...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: linux-fb...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Luis R. Rodriguez mcg...@suse.com
---
 Documentation/x86/mtrr.txt | 20
 arch/x86/kernel/cpu/mtrr/main.c | 2
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/Documentation/x86/mtrr.txt b/Documentation/x86/mtrr.txt
index 860bc3adc223..8a0bdb6e7370 100644
--- a/Documentation/x86/mtrr.txt
+++ b/Documentation/x86/mtrr.txt
@@ -6,10 +6,22 @@ Luis R. Rodriguez mcg...@do-not-panic.com - April 9, 2015
 ===
 Phasing out MTRR use
 
-MTRR use is replaced on modern x86 hardware with PAT. Over time the only type
-of effective MTRR that is expected to be supported will be for write-combining.
-As MTRR use is phased out device drivers should use arch_phys_wc_add() to make
-MTRR effective on non-PAT systems while a no-op on PAT enabled systems.
+MTRR use is replaced on modern x86 hardware with PAT. Direct MTRR use by
+drivers on Linux is now completely phased out, device drivers should use
+arch_phys_wc_add() in combination with ioremap_wc() to make MTRR effective on
+non-PAT systems while a no-op but equally effective on PAT enabled systems.
+
+Even if Linux does not use MTRR directly some x86 platform firmware may still
+set up MTRRs early before booting the OS, they do this as some platform
+firmware may still have implemented access to MTRRs which would be controlled
+and handled by the platform firmware directly. An example of platform use of
+MTRR is through the use of SMI handlers, one case could be for fan control,
+the platform code would need uncachable access to some of its fan control
+registers. Such platform access does not need any Operating System MTRR code in
+place other than mtrr_type_lookup() to ensure any OS specific mapping requests
+are aligned with platform MTRR setup. If MTRRs are only set up by the platform
+firmware code though and the OS does not make any specific MTRR mapping
+requests mtrr_type_lookup() should always return MTRR_TYPE_INVALID.
 
 For details refer to Documentation/x86/pat.txt.
 
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index e7ed0d8ebacb..f891b4750f04 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -448,7 +448,6 @@ int mtrr_add(unsigned long base, unsigned long size, unsigned int type,
 	return mtrr_add_page(base >> PAGE_SHIFT, size >> PAGE_SHIFT, type,
 			     increment);
 }
-EXPORT_SYMBOL(mtrr_add);
 
 /**
  * mtrr_del_page - delete a memory type region
@@ -537,7 +536,6 @@ int mtrr_del(int reg, unsigned long base, unsigned long size)
 		return -EINVAL;
 	return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT);
 }
-EXPORT_SYMBOL(mtrr_del);
 
 /**
  * arch_phys_wc_add - add a WC MTRR and handle errors if PAT is unavailable
--
2.4.3
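The documentation says arch_phys_wc_add() makes an MTRR effective on non-PAT systems and is a no-op with PAT. A simplified model of that cookie contract, assuming a fake table of eight variable-range MTRRs and a cookie offset of 1000 so that a real MTRR handle is never confused with "nothing to undo" (the kernel uses a similar offset, MTRR_TO_PHYS_WC_OFFSET, but treat the value here as illustrative). This is not the x86 implementation, which also returns 0 when MTRRs are unavailable and passes through mtrr_add() errors:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NUM_MTRRS  8     /* hypothetical; real CPUs report their own count */
#define WC_COOKIE_OFFSET 1000  /* illustrative; keeps cookies distinct from 0 */

static bool model_pat_enabled;            /* simulated PAT state */
static bool mtrr_in_use[MODEL_NUM_MTRRS]; /* simulated variable-range MTRRs */

/*
 * Returns 0 when nothing needs to be done (PAT provides WC), a cookie
 * >= WC_COOKIE_OFFSET when an MTRR was taken, or a negative errno when
 * every MTRR is already in use.
 */
static int model_phys_wc_add(unsigned long base, unsigned long size)
{
	(void)base;   /* a real MTRR would cover [base, base + size) */
	(void)size;

	if (model_pat_enabled)
		return 0;

	for (int reg = 0; reg < MODEL_NUM_MTRRS; reg++) {
		if (!mtrr_in_use[reg]) {
			mtrr_in_use[reg] = true;
			return reg + WC_COOKIE_OFFSET;
		}
	}
	return -ENOSPC;   /* MTRRs are scarce, so results depend on load order */
}

/* Accepts anything model_phys_wc_add() returned, including 0 and errors. */
static void model_phys_wc_del(int cookie)
{
	if (cookie >= WC_COOKIE_OFFSET)
		mtrr_in_use[cookie - WC_COOKIE_OFFSET] = false;
}
```

Because 0 and negative values are harmless on the delete side, a driver can store the return value unconditionally and call the delete helper in its remove path without checking whether PAT was in use.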
[Xen-devel] [PATCH v4 06/11] drivers/video/fbdev/arkfb.c: Use arch_phys_wc_add() and pci_iomap_wc()
From: Luis R. Rodriguez mcg...@suse.com

Convert the driver from using the x86-specific MTRR code to the
architecture-agnostic arch_phys_wc_add(). It will avoid MTRRs if
write-combining is available. In order to take advantage of that, also
ensure the ioremapped area is requested as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available.

b) Help bury MTRR code away; MTRR is architecture-specific and on x86 it is
   being replaced by PAT.

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit de33c442e
   titled "x86 PAT: fix performance drop for glx, use UC minus for
   ioremap(), ioremap_nocache() and pci_mmap_page_range()").

The conversion done is expressed by the following Coccinelle SmPL patch. It
additionally required manual intervention to address all the ifdeffery and
to remove redundant things which arch_phys_wc_add() already handles, such as
the verbose message when MTRR setup fails and doing nothing when we didn't
get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Signed-off-by: Luis R. Rodriguez mcg...@suse.com
Acked-by: Tomi Valkeinen tomi.valkei...@ti.com
Cc: Andrew Morton a...@linux-foundation.org
Cc: Andy Lutomirski l...@amacapital.net
Cc: Antonino Daplas adap...@gmail.com
Cc: Arnd Bergmann a...@arndb.de
Cc: b...@kernel.crashing.org
Cc: bhelg...@google.com
Cc: Daniel Vetter daniel.vet...@ffwll.ch
Cc: Dave Airlie airl...@redhat.com
Cc: Geert Uytterhoeven ge...@linux-m68k.org
Cc: H. Peter Anvin h...@zytor.com
Cc: Ingo Molnar mi...@kernel.org
Cc: Jean-Christophe Plagniol-Villard plagn...@jcrosoft.com
Cc: Juergen Gross jgr...@suse.com
Cc: Lad, Prabhakar prabhakar.cse...@gmail.com
Cc: Laurent Pinchart laurent.pinch...@ideasonboard.com
Cc: linux-fb...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: m...@redhat.com
Cc: Suresh Siddha sbsid...@gmail.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: toshi.k...@hp.com
Link: http://lkml.kernel.org/r/1435195342-26879-8-git-send-email-mcg...@do-not-panic.com
Signed-off-by: Borislav Petkov b...@suse.de
---
 drivers/video/fbdev/arkfb.c | 36
 1 file changed, 5 insertions(+), 31 deletions(-)

diff --git a/drivers/video/fbdev/arkfb.c b/drivers/video/fbdev/arkfb.c
index b305a1e7cc76..6a317de7082c 100644
--- a/drivers/video/fbdev/arkfb.c
+++ b/drivers/video/fbdev/arkfb.c
@@ -26,13 +26,9 @@
 #include <linux/console.h> /* Why should fb driver call console functions? because console_lock() */
 #include <video/vga.h>
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 struct arkfb_info {
 	int mclk_freq;
-	int mtrr_reg;
+	int wc_cookie;
 
 	struct dac_info *dac;
 	struct vgastate state;
@@ -102,10 +98,6 @@ static const struct svga_timing_regs ark_timing_regs = {
 
 static char *mode_option = "640x480-8@60";
 
-#ifdef CONFIG_MTRR
-static int mtrr = 1;
-#endif
-
 MODULE_AUTHOR("(c) 2007 Ondrej Zajicek <santi...@crfreenet.org>");
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("fbdev driver for ARK 2000PV");
@@ -115,11 +107,6 @@ MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");
 module_param_named(mode, mode_option, charp, 0444);
 MODULE_PARM_DESC(mode, "Default video mode ('640x480-8@60', etc) (deprecated)");
 
-#ifdef CONFIG_MTRR
-module_param(mtrr, int, 0444);
-MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif
-
 static int threshold = 4;
 
 module_param(threshold, int, 0644);
@@ -1002,7 +989,7 @@ static int ark_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	info->fix.smem_len = pci_resource_len(dev, 0);
 
 	/* Map physical IO memory address into kernel space */
-	info->screen_base = pci_iomap(dev, 0, 0);
+	info->screen_base = pci_iomap_wc(dev, 0, 0);
 	if (! info->screen_base) {
 		rc = -ENOMEM;
 		dev_err(info->device, "iomap for framebuffer failed\n");
@@ -1057,14 +1044,8 @@ static int ark_pci_probe(struct pci_dev *dev, const
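After the conversion, the driver keeps only an opaque wc_cookie and needs no #ifdef CONFIG_MTRR or mtrr module parameter. A userspace sketch of the resulting probe/remove pairing, assuming hypothetical stubs that stand in for pci_iomap_wc(), arch_phys_wc_add(), arch_phys_wc_del() and pci_iounmap(); `struct fb_par` is hypothetical, and the call log exists only to make the ordering checkable:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver state, mirroring the wc_cookie field arkfb gains. */
struct fb_par {
	void *screen_base;
	int wc_cookie;
};

static char call_log[128];   /* records stub calls in order */
static char fake_vram[4096];

static void log_call(const char *name)
{
	strncat(call_log, name, sizeof(call_log) - strlen(call_log) - 1);
}

/* Stubs for pci_iomap_wc(), arch_phys_wc_add(), arch_phys_wc_del(), pci_iounmap(). */
static void *stub_iomap_wc(void) { log_call("map "); return fake_vram; }
static int stub_phys_wc_add(void) { log_call("wc_add "); return 0; /* PAT: no MTRR */ }
static void stub_phys_wc_del(int cookie) { (void)cookie; log_call("wc_del "); }
static void stub_iounmap(void *p) { (void)p; log_call("unmap "); }

static int fb_probe(struct fb_par *par)
{
	par->screen_base = stub_iomap_wc();   /* the mapping itself is WC */
	if (!par->screen_base)
		return -1;
	/* No #ifdef: the helper is a no-op where an MTRR is not needed. */
	par->wc_cookie = stub_phys_wc_add();
	return 0;
}

static void fb_remove(struct fb_par *par)
{
	stub_phys_wc_del(par->wc_cookie);     /* undo in reverse order of probe */
	stub_iounmap(par->screen_base);
}
```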
[Xen-devel] [PATCH v4 05/11] PCI: Add pci_iomap_wc() variants
From: Luis R. Rodriguez mcg...@suse.com

PCI BARs tell us whether prefetching is safe, but they don't say anything
about write combining (WC). WC changes ordering rules and allows writes to
be collapsed, so it's not safe in general to use it on a prefetchable
region.

Add pci_iomap_wc() and pci_iomap_wc_range() so drivers can take advantage of
write combining when they know it's safe.

On architectures that don't fully support WC, e.g., x86 without PAT, drivers
for legacy framebuffers may get some of the benefit by using
arch_phys_wc_add() in addition to pci_iomap_wc(). But arch_phys_wc_add() is
unreliable and should be avoided in general. On x86, it uses MTRRs, which
are limited in number and size, so the results will vary based on driver
loading order.

The goals of adding pci_iomap_wc() are to:

- Give drivers an architecture-independent way to use WC so they can stop
  using interfaces like mtrr_add() (on x86, pci_iomap_wc() uses PAT when
  available).

- Move toward using _PAGE_CACHE_MODE_UC, not _PAGE_CACHE_MODE_UC_MINUS, on
  x86 on ioremap_nocache() (see de33c442ed2a ("x86 PAT: fix performance drop
  for glx, use UC minus for ioremap(), ioremap_nocache() and
  pci_mmap_page_range()")).

Signed-off-by: Luis R. Rodriguez mcg...@suse.com
Acked-by: Arnd Bergmann a...@arndb.de
Cc: Andrew Morton a...@linux-foundation.org
Cc: Andy Lutomirski l...@amacapital.net
Cc: Antonino Daplas adap...@gmail.com
Cc: Arnd Bergmann a...@arndb.de
Cc: b...@kernel.crashing.org
Cc: bhelg...@google.com
Cc: Bjorn Helgaas bhelg...@google.com
Cc: Daniel Vetter daniel.vet...@ffwll.ch
Cc: Dave Airlie airl...@redhat.com
Cc: Dave Hansen dave.han...@linux.intel.com
Cc: Davidlohr Bueso dbu...@suse.de
Cc: david.vra...@citrix.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ingo Molnar mi...@kernel.org
Cc: jbeul...@suse.com
Cc: Jean-Christophe Plagniol-Villard plagn...@jcrosoft.com
Cc: Juergen Gross jgr...@suse.com
Cc: konrad.w...@oracle.com
Cc: linux-a...@vger.kernel.org
Cc: linux-fb...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: Mel Gorman mgor...@suse.de
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Roger Pau Monné roger@citrix.com
Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Stefan Bader stefan.ba...@canonical.com
Cc: Suresh Siddha sbsid...@gmail.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: Tomi Valkeinen tomi.valkei...@ti.com
Cc: Toshi Kani toshi.k...@hp.com
Cc: venkatesh.pallip...@intel.com
Cc: Ville Syrjälä syrj...@sci.fi
Cc: Vlastimil Babka vba...@suse.cz
Link: http://lkml.kernel.org/r/1426893517-2511-6-git-send-email-mcg...@do-not-panic.com
Link: http://lkml.kernel.org/r/1435195342-26879-6-git-send-email-mcg...@do-not-panic.com
[ Move IORESOURCE_IO check up, space out statements for better readability. ]
Signed-off-by: Borislav Petkov b...@suse.de
---
 include/asm-generic/pci_iomap.h | 14
 lib/pci_iomap.c | 66
 2 files changed, 80 insertions(+)

diff --git a/include/asm-generic/pci_iomap.h b/include/asm-generic/pci_iomap.h
index 7389c87116a0..b1e17fcee2d0 100644
--- a/include/asm-generic/pci_iomap.h
+++ b/include/asm-generic/pci_iomap.h
@@ -15,9 +15,13 @@ struct pci_dev;
 #ifdef CONFIG_PCI
 /* Create a virtual mapping cookie for a PCI BAR (memory or IO) */
 extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
+extern void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long max);
 extern void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
 				     unsigned long offset,
 				     unsigned long maxlen);
+extern void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar,
+					unsigned long offset,
+					unsigned long maxlen);
 /* Create a virtual mapping cookie for a port on a given PCI device.
  * Do not call this directly, it exists to make it easier for architectures
  * to override */
@@ -34,12 +38,22 @@ static inline void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned lon
 	return NULL;
 }
 
+static inline void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long max)
+{
+	return NULL;
+}
 static inline void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
 					    unsigned long offset,
 					    unsigned long maxlen)
 {
 	return NULL;
 }
+static inline void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar,
+					       unsigned long offset,
+					       unsigned long maxlen)
+{
+	return NULL;
+}
 #endif
 
 #endif /* __ASM_GENERIC_IO_H */
diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c
index e1930dbab2da..c10fba461454 100644
--- a/lib/pci_iomap.c
+++ b/lib/pci_iomap.c
@@ -49,6 +49,51 @@ void __iomem *pci_iomap_range(struct pci_dev *dev,
 EXPORT_SYMBOL(pci_iomap_range);
 
 /**
+ *
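A model of the window a pci_iomap_wc_range() call would cover, assuming it follows the same offset/maxlen rules as pci_iomap_range() after first rejecting I/O BARs (the commit notes that the IORESOURCE_IO check was moved up), and that pci_iomap_wc() is the range variant with offset 0. `struct bar_res`, the flag values and both helper names are illustrative, and the functions only compute the range instead of mapping it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; the kernel defines its own IORESOURCE_* bits. */
#define IORESOURCE_IO  0x00000100UL
#define IORESOURCE_MEM 0x00000200UL

struct bar_res {
	uint64_t start;
	uint64_t len;
	unsigned long flags;
};

/*
 * Computes the [start, start + len) window a WC mapping would cover.
 * Returns false where the real helper would return NULL.
 * maxlen == 0 means "map everything after offset".
 */
static bool wc_range_window(const struct bar_res *r, uint64_t offset,
			    uint64_t maxlen, uint64_t *start, uint64_t *len)
{
	if (r->flags & IORESOURCE_IO)        /* port I/O cannot be WC-mapped */
		return false;
	if (r->len <= offset || !r->start)   /* offset past the end, or unassigned BAR */
		return false;

	*start = r->start + offset;
	*len = r->len - offset;
	if (maxlen && *len > maxlen)
		*len = maxlen;

	return (r->flags & IORESOURCE_MEM) != 0;
}

/* Assumed shape of pci_iomap_wc(): the range variant with offset 0. */
static bool wc_window(const struct bar_res *r, uint64_t maxlen,
		      uint64_t *start, uint64_t *len)
{
	return wc_range_window(r, 0, maxlen, start, len);
}
```

For an 8 MiB framebuffer BAR at 0xd0000000, an offset of 0x1000 with maxlen 0x2000 yields the window starting at 0xd0001000 with length 0x2000, while an offset equal to the BAR length is rejected.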
[Xen-devel] [PATCH v4 03/11] drivers/video/fbdev/kyrofb: Use arch_phys_wc_add() and pci_ioremap_wc_bar()
From: Luis R. Rodriguez mcg...@suse.com

Convert the driver from using the x86-specific MTRR code to the
architecture-agnostic arch_phys_wc_add(). It will avoid MTRRs if
write-combining is available. In order to take advantage of that, also
ensure the ioremapped area is requested as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available.

b) Help bury MTRR code away; MTRR is architecture-specific and on x86 it is
   being replaced by PAT.

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit de33c442e
   titled "x86 PAT: fix performance drop for glx, use UC minus for
   ioremap(), ioremap_nocache() and pci_mmap_page_range()").

The conversion done is expressed by the following Coccinelle SmPL patch. It
additionally required manual intervention to address all the ifdeffery and
to remove redundant things which arch_phys_wc_add() already handles, such as
the verbose message when MTRR setup fails and doing nothing when we didn't
get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Signed-off-by: Luis R. Rodriguez mcg...@suse.com
Acked-by: Tomi Valkeinen tomi.valkei...@ti.com
Cc: Andrew Morton a...@linux-foundation.org
Cc: Andy Lutomirski l...@amacapital.net
Cc: Antonino Daplas adap...@gmail.com
Cc: Arnd Bergmann a...@arndb.de
Cc: b...@kernel.crashing.org
Cc: bhelg...@google.com
Cc: Daniel Vetter daniel.vet...@ffwll.ch
Cc: Dave Airlie airl...@redhat.com
Cc: Geert Uytterhoeven ge...@linux-m68k.org
Cc: H. Peter Anvin h...@zytor.com
Cc: Ingo Molnar mi...@kernel.org
Cc: Jean-Christophe Plagniol-Villard plagn...@jcrosoft.com
Cc: Jingoo Han jg1@samsung.com
Cc: Juergen Gross jgr...@suse.com
Cc: Laurent Pinchart laurent.pinch...@ideasonboard.com
Cc: linux-fb...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: m...@redhat.com
Cc: Suresh Siddha sbsid...@gmail.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: toshi.k...@hp.com
Link: http://lkml.kernel.org/r/1435195342-26879-4-git-send-email-mcg...@do-not-panic.com
Signed-off-by: Borislav Petkov b...@suse.de
---
 drivers/video/fbdev/kyro/fbdev.c | 33
 include/video/kyro.h | 4
 2 files changed, 12 insertions(+), 25 deletions(-)

diff --git a/drivers/video/fbdev/kyro/fbdev.c b/drivers/video/fbdev/kyro/fbdev.c
index 65041e15fd59..5bb01533271e 100644
--- a/drivers/video/fbdev/kyro/fbdev.c
+++ b/drivers/video/fbdev/kyro/fbdev.c
@@ -22,9 +22,6 @@
 #include <linux/pci.h>
 #include <asm/io.h>
 #include <linux/uaccess.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
 
 #include <video/kyro.h>
 
@@ -84,9 +81,7 @@ static device_info_t deviceInfo;
 static char *mode_option = NULL;
 static int nopan = 0;
 static int nowrap = 1;
-#ifdef CONFIG_MTRR
 static int nomtrr = 0;
-#endif
 
 /* PCI driver prototypes */
 static int kyrofb_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
@@ -570,10 +565,8 @@ static int __init kyrofb_setup(char *options)
 			nopan = 1;
 		} else if (strcmp(this_opt, "nowrap") == 0) {
 			nowrap = 1;
-#ifdef CONFIG_MTRR
 		} else if (strcmp(this_opt, "nomtrr") == 0) {
 			nomtrr = 1;
-#endif
 		} else {
 			mode_option = this_opt;
 		}
@@ -691,17 +684,16 @@ static int kyrofb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	currentpar->regbase = deviceInfo.pSTGReg =
 		ioremap_nocache(kyro_fix.mmio_start, kyro_fix.mmio_len);
+	if (!currentpar->regbase)
+		goto out_free_fb;
 
-	info->screen_base = ioremap_nocache(kyro_fix.smem_start,
-					    kyro_fix.smem_len);
+	info->screen_base = pci_ioremap_wc_bar(pdev, 0);
+	if (!info->screen_base)
+		goto out_unmap_regs;
 
-#ifdef CONFIG_MTRR
 	if (!nomtrr)
-		currentpar->mtrr_handle =
-			mtrr_add(kyro_fix.smem_start,
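The kyrofb hunk adds NULL checks with goto-based unwinding: each label releases only what was acquired before the failing step, in reverse order. A userspace sketch of that pattern, assuming hypothetical stubs with failure injection in place of ioremap_nocache() and pci_ioremap_wc_bar(); `struct kyro_like` and the -ENOMEM return value are illustrative, not taken from the driver:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical failure injection so each error path can be exercised. */
static int fail_regs, fail_fb;
static int live_mappings;   /* mappings currently held */

static void *stub_map(int fail)
{
	if (fail)
		return NULL;
	live_mappings++;
	return malloc(1);
}

static void stub_unmap(void *p)
{
	live_mappings--;
	free(p);
}

struct kyro_like {
	void *regbase;
	void *screen_base;
};

/* Acquire in order; on failure, jump to the label that undoes prior steps. */
static int probe(struct kyro_like *par)
{
	par->regbase = stub_map(fail_regs);    /* ioremap_nocache() of MMIO regs */
	if (!par->regbase)
		goto out_free_fb;

	par->screen_base = stub_map(fail_fb);  /* pci_ioremap_wc_bar() of the FB */
	if (!par->screen_base)
		goto out_unmap_regs;

	return 0;

out_unmap_regs:
	stub_unmap(par->regbase);
out_free_fb:
	/* the real driver would release its fb_info here */
	return -ENOMEM;
}
```

Falling through from `out_unmap_regs` into `out_free_fb` is what keeps the teardown in reverse acquisition order without duplicating cleanup code.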
[Xen-devel] [PATCH v4 08/11] drivers/video/fbdev/vt8623fb: Use arch_phys_wc_add() and pci_iomap_wc()
From: Luis R. Rodriguez mcg...@suse.com

This driver uses the same area for MTRR as for the ioremap().

Convert the driver from using the x86-specific MTRR code to the
architecture-agnostic arch_phys_wc_add(). It will avoid MTRRs if
write-combining is available. In order to take advantage of that, also
ensure the ioremapped area is requested as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available.

b) Help bury MTRR code away; MTRR is architecture-specific and on x86 it is
   being replaced by PAT.

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit de33c442e
   titled "x86 PAT: fix performance drop for glx, use UC minus for
   ioremap(), ioremap_nocache() and pci_mmap_page_range()").

The conversion done is expressed by the following Coccinelle SmPL patch. It
additionally required manual intervention to address all the ifdeffery and
to remove redundant things which arch_phys_wc_add() already handles, such as
the verbose message when MTRR setup fails and doing nothing when we didn't
get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Signed-off-by: Luis R. Rodriguez mcg...@suse.com
Acked-by: Tomi Valkeinen tomi.valkei...@ti.com
Cc: Andrew Morton a...@linux-foundation.org
Cc: Andy Lutomirski l...@amacapital.net
Cc: Antonino Daplas adap...@gmail.com
Cc: Arnd Bergmann a...@arndb.de
Cc: b...@kernel.crashing.org
Cc: bhelg...@google.com
Cc: Daniel Vetter daniel.vet...@ffwll.ch
Cc: Dave Airlie airl...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ingo Molnar mi...@kernel.org
Cc: Jean-Christophe Plagniol-Villard plagn...@jcrosoft.com
Cc: Jingoo Han jg1@samsung.com
Cc: Juergen Gross jgr...@suse.com
Cc: Lad, Prabhakar prabhakar.cse...@gmail.com
Cc: Laurent Pinchart laurent.pinch...@ideasonboard.com
Cc: linux-fb...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: m...@redhat.com
Cc: Rob Clark robdcl...@gmail.com
Cc: Suresh Siddha sbsid...@gmail.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: toshi.k...@hp.com
Link: http://lkml.kernel.org/r/1435195342-26879-10-git-send-email-mcg...@do-not-panic.com
Signed-off-by: Borislav Petkov b...@suse.de
---
 drivers/video/fbdev/vt8623fb.c | 31
 1 file changed, 6 insertions(+), 25 deletions(-)

diff --git a/drivers/video/fbdev/vt8623fb.c b/drivers/video/fbdev/vt8623fb.c
index 8bac309c24b9..dd0f18e42d3e 100644
--- a/drivers/video/fbdev/vt8623fb.c
+++ b/drivers/video/fbdev/vt8623fb.c
@@ -26,13 +26,9 @@
 #include <linux/console.h> /* Why should fb driver call console functions? because console_lock() */
 #include <video/vga.h>
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 struct vt8623fb_info {
 	char __iomem *mmio_base;
-	int mtrr_reg;
+	int wc_cookie;
 	struct vgastate state;
 	struct mutex open_lock;
 	unsigned int ref_count;
@@ -99,10 +95,7 @@ static struct svga_timing_regs vt8623_timing_regs = {
 
 /* Module parameters */
 static char *mode_option = "640x480-8@60";
-
-#ifdef CONFIG_MTRR
 static int mtrr = 1;
-#endif
 
 MODULE_AUTHOR("(c) 2006 Ondrej Zajicek <santi...@crfreenet.org>");
 MODULE_LICENSE("GPL");
@@ -112,11 +105,8 @@ module_param(mode_option, charp, 0644);
 MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");
 module_param_named(mode, mode_option, charp, 0);
 MODULE_PARM_DESC(mode, "Default video mode e.g. '648x480-8@60' (deprecated)");
-
-#ifdef CONFIG_MTRR
 module_param(mtrr, int, 0444);
 MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif
 
 /* - */
 
@@ -710,7 +700,7 @@ static int vt8623_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	info->fix.mmio_len = pci_resource_len(dev, 1);
 
 	/* Map physical IO memory address into kernel space */
-	info->screen_base = pci_iomap(dev, 0, 0);
+	info->screen_base = pci_iomap_wc(dev, 0, 0);
 	if (! info->screen_base) {
Re: [Xen-devel] Question: Redirect guest kernel's message via serial port to a file on dom0
On 24/08/2015 04:01, Meng Xu wrote:
Hi,

I'm trying to use a PV guest VM on Xen to help debug Linux. I was using VirtualBox to help debug the Linux kernel by redirecting the output of the VM's serial port to a file on the host. I can do it in VirtualBox.

[Why do I want to achieve this?]
It is much faster to reboot a VM than to reboot the physical machine. I don't need another machine physically connected to the serial port of the development machine. I want to use Xen for as many things as possible. ;-)

I tried to google for a tutorial or manual about how to configure it, but didn't find any. :-(

In my understanding, I need to do the following things:
1) I need to add a line (something like serial=) in the guest's configuration file to specify the serial port device for the VM;
2) I need some configuration to redirect the output of the serial device to a file in dom0;
3) After that, I can configure the kernel command line in the VM to dump the kernel messages via the VM's serial port. (I know how to do this step.)

Has anyone tried this before and have some configuration I can refer to? Or could anyone give me some references that describe how to configure the above three steps?

I really appreciate any help, suggestion or comment.

Configure xenconsoled to log guest consoles to file with --log=guest, at which point anything sent to hvc0 will be logged to files in /var/log/xen/guest/console (configurable with --log-dir=).

There is usually a XENCONSOLED_ARGS= setting in a configuration file somewhere in /etc.

~Andrew
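Two pieces of configuration are involved: the dom0 side, which tells xenconsoled to log, and the guest side, which sends kernel messages to hvc0. The sketch below assumes a PV Linux guest booted directly by xl. With pygrub, the kernel command line comes from the guest's own bootloader configuration instead.

The file that holds XENCONSOLED_ARGS and the log directory shown are assumptions; check them against your distribution. xenconsoled reads its arguments at startup, so it needs restarting after the change.

```
# Dom0: xenconsoled options. The file that sets XENCONSOLED_ARGS varies by
# distribution (e.g. /etc/default/xencommons or /etc/sysconfig/xencommons);
# both paths and the log directory below are assumptions.
XENCONSOLED_ARGS="--log=guest --log-dir=/var/log/xen/console"

# Guest config (xl format), assuming direct kernel boot of a PV Linux guest:
# append to the kernel command line so messages go to the Xen PV console.
extra = "console=hvc0"
```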
[Xen-devel] Xen x86 host memory limit issues
(Following up from a discussion at the Seattle Summit.)

While the theoretical Xen x86 host memory limit is 16TB (or 123TB with CONFIG_BIGMEM), Xen doesn't actually function correctly if host RAM exceeds the addressable range of the directmap region, which ends at the 5TB boundary (or 3.5TB with CONFIG_BIGMEM).

The ultimate bug is that alloc_xenheap_pages() returns virtual addresses which exceed HYPERVISOR_VIRT_END. Because of the way the idle pagetables and monitor pagetables extend the directmap region, these pointers are safe to use while Xen is running on those pagetables. However, in the context of a 64bit PV guest, these virtual addresses belong to the guest kernel.

In my repro case (6TB box, 8 NUMA nodes), it was particularly easy to trigger the issue from a 64bit dom0 with `xenpm get-cpuidle-states all` or `echo c > /proc/sysrq-trigger`. Both of these accessed per-cpu data which was allocated above HYPERVISOR_VIRT_END and is unmapped in the dom0 kernel pagetables. (On Broadwell hardware, I would expect SMAP violations, as the guest kernel pages are user pages.)

For XenServer, I used the following gross hack to work around the problem:

diff --git a/xen/arch/x86/e820.c b/xen/arch/x86/e820.c
index 3c64f19..715765a 100644
--- a/xen/arch/x86/e820.c
+++ b/xen/arch/x86/e820.c
@@ -15,7 +15,7 @@
  * opt_mem: Limit maximum address of physical RAM.
  * Any RAM beyond this address limit is ignored.
  */
-static unsigned long long __initdata opt_mem;
+static unsigned long long __initdata opt_mem = GB(5 * 1024);
 size_param("mem", opt_mem);
 
 /*

This causes Xen to ignore any RAM above the 5TB boundary. (We used a similar trick with the 1TB limit for 32bit toolstack domains and migration.)

The infrastructure around xenheap_max_mfn() is supposed to cause all xenheap page allocations to fall within the Xen direct mapped region, but experimentally it doesn't work correctly.
In all cases I have seen, the bad xenheap allocations have come from calls which contain NUMA information in the memflags. This leads me to suspect an interaction issue between NUMA hinting information and xenheap_bits. At a guess, alloc_heap_pages() doesn't correctly override the NUMA hint when both a NUMA hint and a zone limit are provided, but I have not investigated this yet.

Fixing that bug will be a useful step, as it will allow Xen to function with host RAM above the direct map limit. It is still not an optimal solution, though, as it prevents getting NUMA-local xenheap memory.

Longterm, it would be optimal to segment the direct map region by NUMA node so that there are equal quantities of xenheap memory available from each NUMA node. This also has an added security benefit: it makes ret2dir exploits harder, as the direct map target address is no longer a static calculation from the point of view of the attacker.

~Andrew
Re: [Xen-devel] [PATCH v5 24/28] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
On 21/08/15 at 22.36, Andrew Cooper wrote:
On 21/08/15 17:53, Roger Pau Monne wrote:
Allow the usage of the VCPUOP_initialise, VCPUOP_up, VCPUOP_down and VCPUOP_is_up hypercalls from HVM guests.

This patch introduces a new structure (vcpu_hvm_context) that should be used in conjunction with the VCPUOP_initialise hypercall in order to initialize vCPUs for HVM guests.

Signed-off-by: Roger Pau Monné <roger@citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Ian Campbell <ian.campb...@citrix.com>
Cc: Stefano Stabellini <stefano.stabell...@citrix.com>
---
Changes since v4:
 - Don't assume mode is 64B, add an explicit check.
 - Don't set TF_kernel_mode, it is only needed for PV guests.
 - Don't set CR0_ET unconditionally.
---
 xen/arch/arm/domain.c             |  24 ++
 xen/arch/x86/domain.c             | 164 +
 xen/arch/x86/hvm/hvm.c            |   8 ++
 xen/common/domain.c               |  16 +---
 xen/include/public/hvm/hvm_vcpu.h | 168 ++
 xen/include/xen/domain.h          |   2 +
 6 files changed, 367 insertions(+), 15 deletions(-)
 create mode 100644 xen/include/public/hvm/hvm_vcpu.h

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index b2bfc7d..b20035d 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -752,6 +752,30 @@ int arch_set_info_guest(
     return 0;
 }
 
+int arch_initialize_vcpu(struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
+{
+    struct vcpu_guest_context *ctxt;
+    struct domain *d = current->domain;
+    int rc;
+
+    if ( (ctxt = alloc_vcpu_guest_context()) == NULL )
+        return -ENOMEM;

I have posted my "remove alloc_vcpu_guest_context()" patch to the list for reference, as it interacts with this patch. I don't mind rebasing it, but it might also influence this patch.

Thanks, I was planning to add such a patch to the series because of your comments in the previous round, but completely forgot about it, sorry. I don't mind picking it up and adding it to my series if it's now too late in the release process to commit it.
+
+    if ( copy_from_guest(ctxt, arg, 1) )
+    {
+        free_vcpu_guest_context(ctxt);
+        return -EFAULT;
+    }
+
+    domain_lock(d);
+    rc = v->is_initialised ? -EEXIST : arch_set_info_guest(v, ctxt);
+    domain_unlock(d);
+
+    free_vcpu_guest_context(ctxt);
+
+    return rc;
+}
+
 int arch_vcpu_reset(struct vcpu *v)
 {
     vcpu_end_shutdown_deferral(v);
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 8fe95f7..23ff14c 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -37,6 +37,7 @@
 #include <xen/wait.h>
 #include <xen/guest_access.h>
 #include <public/sysctl.h>
+#include <public/hvm/hvm_vcpu.h>
 #include <asm/regs.h>
 #include <asm/mc146818rtc.h>
 #include <asm/system.h>
@@ -1140,6 +1141,169 @@ int arch_set_info_guest(
 #undef c
 }
 
+/* Called by VCPUOP_initialise for HVM guests. */
+static int arch_set_info_hvm_guest(struct vcpu *v, vcpu_hvm_context_t *ctx)
+{
+    struct segment_register seg;
+
+#define get_context_seg(ctx, seg, f) \
+    (ctx)->mode == VCPU_HVM_MODE_16B ? (ctx)->cpu_regs.x86_16.seg##_##f : \
+    (ctx)->mode == VCPU_HVM_MODE_32B ? (ctx)->cpu_regs.x86_32.seg##_##f : \
+    (ctx)->mode == VCPU_HVM_MODE_64B ? (ctx)->cpu_regs.x86_64.seg##_##f : \
+    ({ panic("Invalid vCPU mode %u requested\n", (ctx)->mode); 0; })

panic() is far too severe. domain_crash() would be better, with an early exit.

+
+#define get_context_gpr(ctx, gpr) \
+    (ctx)->mode == VCPU_HVM_MODE_16B ? (ctx)->cpu_regs.x86_16.gpr : \
+    (ctx)->mode == VCPU_HVM_MODE_32B ? (ctx)->cpu_regs.x86_32.e##gpr : \
+    (ctx)->mode == VCPU_HVM_MODE_64B ? (ctx)->cpu_regs.x86_64.r##gpr : \
+    ({ panic("Invalid vCPU mode %u requested\n", (ctx)->mode); 0; })
+
+#define get_context_field(ctx, field) \
+    (ctx)->mode == VCPU_HVM_MODE_16B ? (ctx)->cpu_regs.x86_16.field : \
+    (ctx)->mode == VCPU_HVM_MODE_32B ? (ctx)->cpu_regs.x86_32.field : \
+    (ctx)->mode == VCPU_HVM_MODE_64B ? (ctx)->cpu_regs.x86_64.field : \
+    ({ panic("Invalid vCPU mode %u requested\n", (ctx)->mode); 0; })
+
+    if ( ctx->mode != VCPU_HVM_MODE_16B && ctx->mode != VCPU_HVM_MODE_32B &&
+         ctx->mode != VCPU_HVM_MODE_64B )
+        return -EINVAL;

For readability (and style), I would suggest formatting this as

    if ( !((ctx->mode == VCPU_HVM_MODE_16B) ||
           (ctx->mode == VCPU_HVM_MODE_32B) ||
           (ctx->mode == VCPU_HVM_MODE_64B)) )
        return -EINVAL;

+
+    memset(&seg, 0, sizeof(seg));
+
+    if ( !paging_mode_hap(v->domain) )
+        v->arch.guest_table = pagetable_null();
+
+    v->arch.user_regs.rax = get_context_gpr(ctx,
[Xen-devel] [qemu-upstream-unstable test] 60822: tolerable FAIL - PUSHED
flight 60822 qemu-upstream-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/60822/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds  11 guest-start  fail  like 60605

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail  never pass
 test-armhf-armhf-xl-qcow2  9 debian-di-install  fail  never pass
 test-armhf-armhf-xl-vhd  9 debian-di-install  fail  never pass
 test-amd64-amd64-xl-pvh-intel  11 guest-start  fail  never pass
 test-armhf-armhf-libvirt-vhd  9 debian-di-install  fail  never pass
 test-armhf-armhf-libvirt  14 guest-saverestore  fail  never pass
 test-armhf-armhf-libvirt  12 migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt-raw  9 debian-di-install  fail  never pass
 test-armhf-armhf-libvirt-qcow2  9 debian-di-install  fail  never pass
 test-amd64-amd64-libvirt-xsm  12 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-pair  21 guest-migrate/src_host/dst_host  fail  never pass
 test-amd64-i386-libvirt  12 migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt-xsm  12 migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt-xsm  14 guest-saverestore  fail  never pass
 test-armhf-armhf-xl  12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl  13 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt  12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu  13 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu  12 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-raw  11 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-pair  21 guest-migrate/src_host/dst_host  fail  never pass
 test-amd64-amd64-libvirt-raw  11 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-qcow2  11 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  10 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  10 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-vhd  11 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-vhd  11 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck  12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck  13 saverestore-support-check  fail  never pass
 test-amd64-i386-libvirt-qcow2  11 migrate-support-check  fail  never pass
 test-amd64-amd64-xl-qemuu-win7-amd64  17 guest-stop  fail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64  17 guest-stop  fail  never pass
 test-armhf-armhf-xl-raw  9 debian-di-install  fail  never pass

version targeted for testing:
 qemuu  b05befcbea71a979509ce04f02929969a790c923
baseline version:
 qemuu  bcf35eec0b621c46dbf0aeb40c6bc06b5d3981aa

Last test of basis    60605  2015-08-05 12:13:08 Z  18 days
Testing same since    60822  2015-08-21 22:41:47 Z   2 days    1 attempts

People who touched revisions under test:
  Amit Shah <amit.s...@redhat.com>
  Gerd Hoffmann <kra...@redhat.com>
  Stefano Stabellini <stefano.stabell...@eu.citrix.com>

jobs:
 build-amd64-xsm      pass
 build-armhf-xsm      pass
 build-i386-xsm       pass
 build-amd64          pass
 build-armhf          pass
 build-i386           pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops    pass
 build-armhf-pvops    pass
 build-i386-pvops     pass
 test-amd64-amd64-xl
[Xen-devel] [PATCH 3.12 48/82] x86/xen: Probe target addresses in set_aliased_prot() before the hypercall
From: Andy Lutomirski <l...@kernel.org>

3.12-stable review patch. If anyone has any objections, please let me know.

===

commit aa1acff356bbedfd03b544051f5b371746735d89 upstream.

The update_va_mapping hypercall can fail if the VA isn't present in the guest's page tables. Under certain loads, this can result in an OOPS when the target address is in unpopulated vmap space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix. This code should probably be changed to use something like set_memory_ro.

Signed-off-by: Andy Lutomirski <l...@kernel.org>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: David Vrabel <dvra...@cantab.net>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Sasha Levin <sasha.le...@oracle.com>
Cc: Steven Rostedt <rost...@goodmis.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: "secur...@kernel.org" <secur...@kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.l...@kernel.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Jiri Slaby <jsl...@suse.cz>
---
 arch/x86/xen/enlighten.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index fa6ade76ef3f..2cbc2f2cf43e 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -480,6 +480,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 	pte_t pte;
 	unsigned long pfn;
 	struct page *page;
+	unsigned char dummy;
 
 	ptep = lookup_address((unsigned long)v, &level);
 	BUG_ON(ptep == NULL);
@@ -489,6 +490,32 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 
 	pte = pfn_pte(pfn, prot);
 
+	/*
+	 * Careful: update_va_mapping() will fail if the virtual address
+	 * we're poking isn't populated in the page tables.  We don't
+	 * need to worry about the direct map (that's always in the page
+	 * tables), but we need to be careful about vmap space.  In
+	 * particular, the top level page table can lazily propagate
+	 * entries between processes, so if we've switched mms since we
+	 * vmapped the target in the first place, we might not have the
+	 * top-level page table entry populated.
+	 *
+	 * We disable preemption because we want the same mm active when
+	 * we probe the target and when we issue the hypercall.  We'll
+	 * have the same nominal mm, but if we're a kernel thread, lazy
+	 * mm dropping could change our pgd.
+	 *
+	 * Out of an abundance of caution, this uses __get_user() to fault
+	 * in the target address just in case there's some obscure case
+	 * in which the target address isn't readable.
+	 */
+
+	preempt_disable();
+
+	pagefault_disable();	/* Avoid warnings due to being atomic. */
+	__get_user(dummy, (unsigned char __user __force *)v);
+	pagefault_enable();
+
 	if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
 		BUG();
 
@@ -500,6 +527,8 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 				BUG();
 	} else
 		kmap_flush_unused();
+
+	preempt_enable();
 }
 
 static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
@@ -507,6 +536,17 @@ static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
 	const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
 	int i;
 
+	/*
+	 * We need to mark the all aliases of the LDT pages RO.  We
+	 * don't need to call vm_flush_aliases(), though, since that's
+	 * only responsible for flushing aliases out the TLBs, not the
+	 * page tables, and Xen will flush the TLB for us if needed.
+	 *
+	 * To avoid confusing future readers: none of this is necessary
+	 * to load the LDT.  The hypervisor only checks this when the
+	 * LDT is faulted in due to subsequent descriptor access.
+	 */
+
 	for(i = 0; i < entries; i += entries_per_page)
 		set_aliased_prot(ldt + i, PAGE_KERNEL_RO);
 }
-- 
2.5.0
Re: [Xen-devel] [PATCH] xen/tmem: Pass page instead of pfn to xen_tmem_get_page()
On Thu, Aug 20, 2015 at 8:27 AM, David Vrabel <david.vra...@citrix.com> wrote:
On 19/08/15 14:25, Murilo Opsfelder Araujo wrote:
The commit 091208a676dfdabb2b8fe86ee155c6fc80081b69 ("xen/tmem: Use xen_page_to_gfn rather than pfn_to_gfn") left behind a call to xen_tmem_get_page() receiving pfn instead of page.

This change also fixes the following build warning:

drivers/xen/tmem.c: In function ‘tmem_cleancache_get_page’:
drivers/xen/tmem.c:194:47: warning: passing argument 4 of ‘xen_tmem_get_page’ makes pointer from integer without a cast
   ret = xen_tmem_get_page((u32)pool, oid, ind, pfn);
                                               ^
drivers/xen/tmem.c:138:12: note: expected ‘struct page *’ but argument is of type ‘long unsigned int’
 static int xen_tmem_get_page(u32 pool_id, struct tmem_oid oid,

I've folded this in, thanks.

David

Thanks, David.

--
Murilo
Re: [Xen-devel] [PATCH] xen/tmem: Pass page instead of pfn to xen_tmem_get_page()
On Wed, Aug 19, 2015 at 9:23 PM, Julien Grall <julien.gr...@citrix.com> wrote:
Hi,

On 19/08/2015 06:25, Murilo Opsfelder Araujo wrote:
The commit 091208a676dfdabb2b8fe86ee155c6fc80081b69 ("xen/tmem: Use xen_page_to_gfn rather than pfn_to_gfn") left behind a call to xen_tmem_get_page() receiving pfn instead of page.

This change also fixes the following build warning:

drivers/xen/tmem.c: In function ‘tmem_cleancache_get_page’:
drivers/xen/tmem.c:194:47: warning: passing argument 4 of ‘xen_tmem_get_page’ makes pointer from integer without a cast
   ret = xen_tmem_get_page((u32)pool, oid, ind, pfn);
                                               ^
drivers/xen/tmem.c:138:12: note: expected ‘struct page *’ but argument is of type ‘long unsigned int’
 static int xen_tmem_get_page(u32 pool_id, struct tmem_oid oid,
            ^

Signed-off-by: Murilo Opsfelder Araujo <mopsfel...@gmail.com>

Sorry for the breakage, I didn't spot it because CONFIG_CLEANCACHE is not enabled in my config.

Reviewed-by: Julien Grall <julien.gr...@citrix.com>

Regards,

--
Julien Grall

Hi, Julien.

No need to apologize, and thanks for reviewing that.

--
Murilo
Re: [Xen-devel] [PATCH v2 09/23] efi: create efi_enabled()
On 22.08.15 at 14:33, <daniel.ki...@oracle.com> wrote:
On Thu, Aug 20, 2015 at 09:18:17AM -0600, Jan Beulich wrote:
On 20.07.15 at 16:29, <daniel.ki...@oracle.com> wrote:
--- a/xen/arch/x86/efi/stub.c
+++ b/xen/arch/x86/efi/stub.c
@@ -4,9 +4,14 @@
 #include <xen/lib.h>
 #include <asm/page.h>
 
-#ifndef efi_enabled
-const bool_t efi_enabled = 0;
-#endif
+struct efi __read_mostly efi = {
+    .flags   = 0, /* Initialized later. */
+    .acpi    = EFI_INVALID_TABLE_ADDR,
+    .acpi20  = EFI_INVALID_TABLE_ADDR,
+    .mps     = EFI_INVALID_TABLE_ADDR,
+    .smbios  = EFI_INVALID_TABLE_ADDR,
+    .smbios3 = EFI_INVALID_TABLE_ADDR
+};

How is this change related to the subject of the patch?

I need to add this struct because...

--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -191,8 +191,6 @@ SECTIONS
   .pad : {
     . = ALIGN(MB(16));
   } :text
-#else
-  efi = .;
 #endif

Same here.

...this creates the efi symbol just to satisfy the linker, and I am removing it. However, the existing solution does not allocate space for this symbol, so any references to acpi20, etc. do not make sense. As far as I can see, all efi.* references are protected by relevant ifs, but we should not rely on that because it makes the code very fragile. If somebody does not know how the efi symbol is created, he/she may assume that it always represents a valid structure and make invalid references somewhere. So I still think that stub.c should define the efi struct properly, even if we assume that flags should not be there. However, I agree that this could be a separate patch.

By the way, why did you choose such a strange way to satisfy the linker's needs?

To me there's nothing strange about it: I want a symbol that occupies no space in memory.

--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -717,6 +717,10 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable)
     char *option_str;
     bool_t use_cfg_file;
 
+#ifndef CONFIG_ARM /* Disabled until runtime services implemented. */
+    set_bit(EFI_PLATFORM, &efi.flags);
+#endif

Just for this to work?
I don't see the need for all the pointers in the stub case - why can't this be a separate variable? We don't

Could be, but if we create a struct with such a generic name as just plain efi, it suggests that this is a good place to put flags. If it is not, what should we call it? efi_flags? Or maybe we should rename efi to efi_tables too. Then everything will be clear.

I agree that this may be a matter of taste, but to me the current naming looks quite fine. And yes, efi_flags or efi_state would be a fine name.

In general I wouldn't even mind it being a field in the structure, were it not that this results in the _full_ structure being allocated even in the no-EFI build case. I admit, though, that with the goal of always building EFI code (unless the tool chain doesn't support doing so) this becomes less of an issue. OTOH, our probably wanting some Kconfig-like mechanism sooner or later to {en,dis}able certain features would call for this to remain separable. And yes, at that point it could be done by #ifdef-ing out everything but the flags member.

So based on the above I'm withdrawing my implied objection, but please make sure you write better patch descriptions explaining what is done and _why_.

Jan
Re: [Xen-devel] [PATCH v2 10/23] efi: build xen.gz with EFI code
On 22.08.15 at 15:59, <daniel.ki...@oracle.com> wrote:
On Thu, Aug 20, 2015 at 09:39:39AM -0600, Jan Beulich wrote:
On 20.07.15 at 16:29, <daniel.ki...@oracle.com> wrote:
Build xen.gz with EFI code. We need this to support the multiboot2 protocol on EFI platforms.

If we wish to load not ELF file using multiboot (v1) or multiboot2 then

DYM "a non-ELF file"?

it must contain a linear (or flat) representation of code and data.

Why? Please don't just put out statements, but also reasons (i.e. at least which component is unable to deal with the current [valid afaict] PE image we have).

This is a requirement of the multiboot (v1) and multiboot2 protocols. They both know nothing about the PE image format.

And hence how specifically we arrange data inside the image should be benign to them, as they won't be able to load the file _anyway_.

Currently, the PE file contains many sections which are not linear (one after another without any holes) or do not even have a representation in the file (e.g. BSS). In theory there is a chance that we could build a proper PE file using the current build system. However, it means that

What is improper about the currently built PE file? And if there is anything improper, did you inform the binutils maintainers of the problem?

From the PE loader's point of view everything is OK. However, the current Xen PE image (at least as built on my machines) is not usable by a multiboot (v1) or multiboot2 compatible loader, because it is not linear (one section does not live immediately after another without any voids).

Again - either I'm missing something (and then your explanation is not good enough) or this is (as said above) a pointless adjustment.
--- a/xen/arch/x86/efi/Makefile
+++ b/xen/arch/x86/efi/Makefile
@@ -1,14 +1,16 @@
 CFLAGS += -fshort-wchar
 
-obj-y += stub.o
-
-create = test -e $(1) || touch -t 19990101 $(1)
-
 efi := $(filter y,$(x86_64)$(shell rm -f disabled))
 efi := $(if $(efi),$(shell $(CC) $(filter-out $(CFLAGS-y) .%.d,$(CFLAGS)) -c check.c 2>disabled && echo y))
 efi := $(if $(efi),$(shell $(LD) -mi386pep --subsystem=10 -o check.efi check.o 2>disabled && echo y))
-efi := $(if $(efi),$(shell rm disabled)y,$(shell $(call create,boot.init.o); $(call create,runtime.o)))
+efi := $(if $(efi),$(shell rm disabled)y)
 
-extra-$(efi) += boot.init.o relocs-dummy.o runtime.o compat.o
+extra-y += relocs-dummy.o

Why is this no longer extra-$(efi)?

Because we need proper EFI code in xen.gz to support boot via multiboot2 on EFI platforms.

What would we need that for when not building an EFI-capable binary anyway?

-stub.o: $(extra-y)

With this dependency removed (instead of perhaps replaced or extended) - what will trigger relocs-dummy.o to be (re)built?

It is triggered by the prelink.o build rule in xen/arch/x86/Makefile.

No. Or better - if it is, then this is wrong. With the way our build logic works, unless there are exceptional circumstances, things in a certain directory should be built _only_ when recursion reaches that particular directory (i.e. not from any of its parents).

Jan
Re: [Xen-devel] Xen x86 host memory limit issues
On 24.08.15 at 12:36, <andrew.coop...@citrix.com> wrote:
The infrastructure around xenheap_max_mfn() is supposed to cause all xenheap page allocations to fall within the Xen direct mapped region, but experimentally it doesn't work correctly.

In all cases I have seen, the bad xenheap allocations have come from calls which contain NUMA information in the memflags, which leads me to suspect an interaction issue between NUMA hinting information and xenheap_bits. At a guess, alloc_heap_pages() doesn't correctly override the NUMA hint when both a NUMA hint and a zone limit are provided, but I have not investigated this yet.

But you're in the ideal position to do so. As said previously on the same topic, looking just at the code I can't see what's wrong, even when taking into account the experimentally observed behavior.

Fixing that bug will be a useful step, as it will allow Xen to function with host RAM above the direct map limit, but it is still not an optimal solution, as it prevents getting NUMA-local xenheap memory.

Longterm, it would be optimal to segment the direct map region by NUMA node so that there are equal quantities of xenheap memory available from each NUMA node.

Yes, albeit I suspect (at least theoretical) issues will arise on systems with many nodes - the per-node ranges directly mapped may become unreasonably small (and we may risk exhausting node 0's memory due to non-NUMA-tagged allocation requests).

Jan
Re: [Xen-devel] [PATCH v5 24/28] xen/x86: allow HVM guests to use hypercalls to bring up vCPUs
On 24/08/15 11:43, Roger Pau Monné wrote:
On 21/08/15 at 22.36, Andrew Cooper wrote:
On 21/08/15 17:53, Roger Pau Monne wrote:
Allow the usage of the VCPUOP_initialise, VCPUOP_up, VCPUOP_down and VCPUOP_is_up hypercalls from HVM guests.

This patch introduces a new structure (vcpu_hvm_context) that should be used in conjunction with the VCPUOP_initialise hypercall in order to initialize vCPUs for HVM guests.

Signed-off-by: Roger Pau Monné <roger@citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Ian Campbell <ian.campb...@citrix.com>
Cc: Stefano Stabellini <stefano.stabell...@citrix.com>
---
Changes since v4:
 - Don't assume mode is 64B, add an explicit check.
 - Don't set TF_kernel_mode, it is only needed for PV guests.
 - Don't set CR0_ET unconditionally.
---
 xen/arch/arm/domain.c             |  24 ++
 xen/arch/x86/domain.c             | 164 +
 xen/arch/x86/hvm/hvm.c            |   8 ++
 xen/common/domain.c               |  16 +---
 xen/include/public/hvm/hvm_vcpu.h | 168 ++
 xen/include/xen/domain.h          |   2 +
 6 files changed, 367 insertions(+), 15 deletions(-)
 create mode 100644 xen/include/public/hvm/hvm_vcpu.h

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index b2bfc7d..b20035d 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -752,6 +752,30 @@ int arch_set_info_guest(
     return 0;
 }
 
+int arch_initialize_vcpu(struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
+{
+    struct vcpu_guest_context *ctxt;
+    struct domain *d = current->domain;
+    int rc;
+
+    if ( (ctxt = alloc_vcpu_guest_context()) == NULL )
+        return -ENOMEM;

I have posted my "remove alloc_vcpu_guest_context()" patch to the list for reference, as it interacts with this patch. I don't mind rebasing it, but it might also influence this patch.

Thanks, I was planning to add such a patch to the series because of your comments in the previous round, but completely forgot about it, sorry. I don't mind picking it up and adding it to my series if it's now too late in the release process to commit it.
The patch is definitely 4.7 material at this point. I will respin it with an improved commit message, per Konrad's implied request.

+
+    memset(&seg, 0, sizeof(seg));
+
+    if ( !paging_mode_hap(v->domain) )
+        v->arch.guest_table = pagetable_null();
+
+    v->arch.user_regs.rax = get_context_gpr(ctx, ax);
+    v->arch.user_regs.rcx = get_context_gpr(ctx, cx);
+    v->arch.user_regs.rdx = get_context_gpr(ctx, dx);
+    v->arch.user_regs.rbx = get_context_gpr(ctx, bx);
+    v->arch.user_regs.rsp = get_context_gpr(ctx, sp);
+    v->arch.user_regs.rbp = get_context_gpr(ctx, bp);
+    v->arch.user_regs.rsi = get_context_gpr(ctx, si);
+    v->arch.user_regs.rdi = get_context_gpr(ctx, di);
+    v->arch.user_regs.rip = get_context_gpr(ctx, ip);
+    v->arch.user_regs.rflags = get_context_gpr(ctx, flags);

All these hidden conditionals cause the compiler to generate a 2K function, a large quantity of which is conditional jumps.

I was expecting the compiler to be clever here and realize ctx->mode is always the same and perform some kind of clever optimization, but I guess this is too much.

C may not assume that ctx->$FOO doesn't alias v->$BAR, and therefore that ctx->mode doesn't change as a result of writing into v.

I tried experimenting with a const uint32_t mode = ctx->mode, but even that wasn't sufficient for the compiler to optimise the branches away. Perhaps some C99 restrict qualifiers might have helped. I didn't investigate.

I did some experimentation, available from

  git://xenbits.xen.org/people/andrewcoop/xen.git wip-dmlite-v5-refactor

Bloat-o-meter indicates:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-559 (-559)
function                                     old     new   delta
arch_set_info_hvm_guest                     2209    1650    -559

And looking at the disassembly, those -559 are mostly cmp/jXX constructs and the dead panic() calls.

The code is now longer, but I don't think it detracts from the readability, and it will certainly be faster to execute.

What do you think? If others agree, you are welcome to fold the patch into your series.
That looks fine IMHO, I can fold it into this patch and add your SoB if that's fine.

Completely fine.

Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
[Xen-devel] [PATCH v1 for v4.6] etherboot: Build fix for GCC 5.1.1
Specifically, we are pulling in the upstream patch (commit 1b56452121672e6408c38ac8926bdd6998a39004): "[ath9k] Remove confusing logic inversion in an ANI variable"

Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
---
 tools/firmware/etherboot/patches/build_fix_4.patch | 225 +
 tools/firmware/etherboot/patches/series            |   1 +
 2 files changed, 226 insertions(+)
 create mode 100644 tools/firmware/etherboot/patches/build_fix_4.patch

diff --git a/tools/firmware/etherboot/patches/build_fix_4.patch b/tools/firmware/etherboot/patches/build_fix_4.patch
new file mode 100644
index 000..9271c8c
--- /dev/null
+++ b/tools/firmware/etherboot/patches/build_fix_4.patch
@@ -0,0 +1,225 @@
+From 1b56452121672e6408c38ac8926bdd6998a39004 Mon Sep 17 00:00:00 2001
+From: Christian Hesse <m...@eworm.de>
+Date: Thu, 23 Apr 2015 13:33:26 +0200
+Subject: [PATCH] [ath9k] Remove confusing logic inversion in an ANI variable
+
+This changed in Linux kernel the same way in commit 7067e701
+("ath9k_hw: remove confusing logic inversion in an ANI variable") by
+Felix Fietkau.
+
+Additionally this fixes error: logical not is only applied to the
+left hand side of comparison with GCC 5.1.0.
+
+Signed-off-by: Christian Hesse <m...@eworm.de>
+Signed-off-by: Michael Brown <mc...@ipxe.org>
+---
+ src/drivers/net/ath/ath9k/ani.h              |  2 +-
+ src/drivers/net/ath/ath9k/ath9k_ani.c        | 16
+ src/drivers/net/ath/ath9k/ath9k_ar5008_phy.c | 18 +-
+ src/drivers/net/ath/ath9k/ath9k_ar9003_phy.c | 12 ++--
+ 4 files changed, 24 insertions(+), 24 deletions(-)
+
+diff --git a/src/drivers/net/ath/ath9k/ani.h b/src/drivers/net/ath/ath9k/ani.h
+index dbd4d4d..ba87ba0 100644
+--- a/src/drivers/net/ath/ath9k/ani.h
++++ b/src/drivers/net/ath/ath9k/ani.h
+@@ -125,7 +125,7 @@ struct ar5416AniState {
+ 	u8 mrcCCKOff;
+ 	u8 spurImmunityLevel;
+ 	u8 firstepLevel;
+-	u8 ofdmWeakSigDetectOff;
++	u8 ofdmWeakSigDetect;
+ 	u8 cckWeakSigThreshold;
+ 	u32 listenTime;
+ 	int32_t rssiThrLow;
+diff --git a/src/drivers/net/ath/ath9k/ath9k_ani.c b/src/drivers/net/ath/ath9k/ath9k_ani.c
+index ff7df49..76ca79c 100644
+--- a/src/drivers/net/ath/ath9k/ath9k_ani.c
++++ b/src/drivers/net/ath/ath9k/ath9k_ani.c
+@@ -177,7 +177,7 @@ static void ath9k_hw_ani_ofdm_err_trigger_old(struct ath_hw *ah)
+ 
+ 	rssi = BEACON_RSSI(ah);
+ 	if (rssi > aniState->rssiThrHigh) {
+-		if (!aniState->ofdmWeakSigDetectOff) {
++		if (aniState->ofdmWeakSigDetect) {
+ 			if (ath9k_hw_ani_control(ah,
+ 					     ATH9K_ANI_OFDM_WEAK_SIGNAL_DETECTION,
+ 					     0)) {
+@@ -192,7 +192,7 @@ static void ath9k_hw_ani_ofdm_err_trigger_old(struct ath_hw *ah)
+ 			return;
+ 		}
+ 	} else if (rssi > aniState->rssiThrLow) {
+-		if (aniState->ofdmWeakSigDetectOff)
++		if (!aniState->ofdmWeakSigDetect)
+ 			ath9k_hw_ani_control(ah,
+ 				     ATH9K_ANI_OFDM_WEAK_SIGNAL_DETECTION,
+ 				     1);
+@@ -202,7 +202,7 @@ static void ath9k_hw_ani_ofdm_err_trigger_old(struct ath_hw *ah)
+ 		return;
+ 	} else {
+ 		if ((ah->dev->channels + ah->dev->channel)->band == NET80211_BAND_2GHZ) {
+-			if (!aniState->ofdmWeakSigDetectOff)
++			if (aniState->ofdmWeakSigDetect)
+ 				ath9k_hw_ani_control(ah,
+ 				     ATH9K_ANI_OFDM_WEAK_SIGNAL_DETECTION,
+ 				     0);
+@@ -360,7 +360,7 @@ static void ath9k_hw_ani_lower_immunity_old(struct ath_hw *ah)
+ 		if (rssi > aniState->rssiThrHigh) {
+ 			/* XXX: Handle me */
+ 		} else if (rssi > aniState->rssiThrLow) {
+-			if (aniState->ofdmWeakSigDetectOff) {
++			if (!aniState->ofdmWeakSigDetect) {
+ 				if (ath9k_hw_ani_control(ah,
+ 				     ATH9K_ANI_OFDM_WEAK_SIGNAL_DETECTION,
+ 				     1) == 1)
+@@ -436,9 +436,9 @@ static void ath9k_ani_reset_old(struct ath_hw *ah)
+ 	if (aniState->spurImmunityLevel != 0)
+ 		ath9k_hw_ani_control(ah, ATH9K_ANI_SPUR_IMMUNITY_LEVEL,
+ 				     aniState->spurImmunityLevel);
+-	if (aniState->ofdmWeakSigDetectOff)
++	if (!aniState->ofdmWeakSigDetect)
+ 		ath9k_hw_ani_control(ah, ATH9K_ANI_OFDM_WEAK_SIGNAL_DETECTION,
+-				     !aniState->ofdmWeakSigDetectOff);
++				     aniState->ofdmWeakSigDetect);
+ 	if (aniState->cckWeakSigThreshold)
+ 		ath9k_hw_ani_control(ah, ATH9K_ANI_CCK_WEAK_SIGNAL_THR,
+ 				     aniState->cckWeakSigThreshold);
+@@ -709,8 +709,8 @@ void ath9k_hw_ani_init(struct ath_hw *ah)
+ 
+
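The GCC 5 error quoted in the commit message comes from `-Wlogical-not-parentheses`: unary `!` binds tighter than `==`/`!=`, so in `!a != b` the negation applies only to `a`. The comparison that triggers it sits in a part of the ath9k patch not reproduced here; it presumably has the shape `!on != ...Off`. The sketch below uses hypothetical struct and function names to show that renaming the flag to positive sense removes the double negative without changing the result.

```c
#include <stdbool.h>

/* Old, inverted sense: nonzero means weak-signal detection is OFF. */
struct state_old { unsigned char weak_sig_detect_off; };

/* New, positive sense: nonzero means weak-signal detection is ON. */
struct state_new { unsigned char weak_sig_detect; };

/* Does the requested setting 'on' differ from the current one? */
bool needs_update_old(const struct state_old *s, int on)
{
    /* Written without the parentheses, i.e. "!on != ...", GCC >= 5
     * reports -Wlogical-not-parentheses; (!on) is what is meant. */
    return (!on) != s->weak_sig_detect_off;
}

bool needs_update_new(const struct state_new *s, int on)
{
    /* No inversion needed: compare like with like. */
    return on != s->weak_sig_detect;
}
```

For every combination of current state and request, the two functions agree, which is why the rename is a pure readability fix plus a warning fix.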
Re: [Xen-devel] Build failres with Xen 4.6.0-rc1 (firmware/etherboot/ipxe)
On Mon, Aug 24, 2015 at 06:29:31AM -0600, Jan Beulich wrote:
On 21.08.15 at 19:51, konrad.w...@oracle.com wrote:

I don't think we can rev ipxe.git to the latest in Xen 4.6 time-frame. But having that patch should help with compile issues, like mine.

Agreed.

So how do we want to fix this in 4.6 time-frame?

Pull the one patch (or a few more hand selected ones if need be).

Done.

And 4.7 time-frame? Rev up to ipxe.git master?

I would say so. Them apparently never doing any releases (at least there are no respective tags that I could see) of course makes the "when" part of this a little problematic...

Jan
Re: [Xen-devel] [PATCH v2 10/23] efi: build xen.gz with EFI code
On Mon, Aug 24, 2015 at 05:35:21AM -0600, Jan Beulich wrote:
On 22.08.15 at 15:59, daniel.ki...@oracle.com wrote:
On Thu, Aug 20, 2015 at 09:39:39AM -0600, Jan Beulich wrote:
On 20.07.15 at 16:29, daniel.ki...@oracle.com wrote:

Build xen.gz with EFI code. We need this to support multiboot2 protocol on EFI platforms.

If we wish to load not ELF file using multiboot (v1) or multiboot2 then

DYM a non-ELF file?

it must contain linear (or flat) representation of code and data.

Why? Please don't just put out statements, but also reasons (i.e. at least which component is unable to deal with the current [valid afaict] PE image we have).

This is a requirement of multiboot (v1) or multiboot2 protocol. They both know nothing about PE image format.

And hence how specifically we arrange data inside the image should be benign to them, as they won't be able to load the file _anyway_.

Currently, PE file contains many sections which are not linear (one after another without any holes) or even do not have representation in a file (e.g. BSS). In theory there is a chance that we could build proper PE file using current build system. However, it means that

What is improper about the currently built PE file? And if there is anything improper, did you inform the binutils maintainers of the problem?

From the PE loader's point of view everything is OK. However, the current Xen PE image (at least as built on my machines) is not usable by a multiboot (v1) or multiboot2 protocol compatible loader, because it is not linear (one section does not live immediately after another without any voids).

Again - either I'm missing something (and then your explanation is not good enough) or this is (as said above) a pointless adjustment.

Let's focus on the multiboot2 protocol (multiboot (v1) is similar to multiboot2 in the case discussed). In general, multiboot2 is able to load any file which has:

1. a proper multiboot2 header in the first 32 KiB of the file,
2. the text and data segments consecutive in the OS image (The Multiboot Specification version 1.6).

This implies that we can, e.g., build a valid ELF file which is also a multiboot2-compatible image. And we do. However, we can go further: potentially we can build a valid PE image which is also a valid multiboot2 image. However, the current build method does not satisfy requirement number 2, because, e.g.:

Sections:
Idx Name          Size      VMA           LMA           File off  Algn
  0 .text         001513d0  82d08020      82d08020      1000      2**12
                                                        ^^
                  CONTENTS, ALLOC, LOAD, CODE
  1 .rodata       0004de12  82d0803513e0  82d0803513e0  00153000  2**5
                                                        ^^
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

Hence, we must use a special method to build the PE image (I discussed that in my earlier email on this topic) to make it compatible with the multiboot2 protocol. This way one file could be loaded by a native PE loader, by multiboot (v1) protocol compatible loaders (multiboot (v1) requires the relevant header, but it does not interfere with the PE and multiboot2 protocol stuff) and by multiboot2 protocol compatible loaders. Additionally, if it is signed with a Secure Boot signature, then potentially the signature could be verified by UEFI itself and, e.g., GRUB2. However, as I said earlier, this requires more work, and it is the next step I am going to do after this series is applied. For now I am going to embed EFI support into the ELF file, because that is easy to make compatible with the multiboot2 protocol (fewer changes; the currently used ELF file already has the required properties, because multiboot (v1), which we use, has requirements similar to the multiboot2 protocol).

I hope that helps.
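Both multiboot2 requirements listed above can be checked mechanically. This sketch assumes the header layout from the Multiboot2 specification (magic 0xE85250D6, then architecture, header_length and checksum, 64-bit aligned, wholly within the first 32768 bytes, with the four fields summing to zero mod 2^32) and a hypothetical `struct section` table of file offsets. Using the objdump figures above, `.text` ends at file offset 0x1000 + 0x1513d0 = 0x1523d0 while `.rodata` starts at 0x153000, a 0xc30-byte gap.

```c
#include <stddef.h>
#include <stdint.h>

#define MB2_MAGIC        0xE85250D6u
#define MB2_SEARCH_LIMIT 32768u        /* header must lie entirely within this */

/* Read a little-endian 32-bit value from a possibly unaligned buffer. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Return the offset of a valid multiboot2 header, or -1 if none is found. */
long mb2_find_header(const unsigned char *img, size_t len)
{
    size_t limit = len < MB2_SEARCH_LIMIT ? len : MB2_SEARCH_LIMIT;

    for ( size_t off = 0; off + 16 <= limit; off += 8 )   /* 64-bit aligned */
    {
        uint32_t magic = le32(img + off);
        uint32_t arch = le32(img + off + 4);
        uint32_t hlen = le32(img + off + 8);
        uint32_t csum = le32(img + off + 12);

        if ( magic != MB2_MAGIC )
            continue;
        /* Unsigned arithmetic wraps, giving the required mod-2^32 sum. */
        if ( (uint32_t)(magic + arch + hlen + csum) != 0 )
            continue;
        if ( hlen < 16 || off + hlen > limit )
            continue;
        return (long)off;
    }
    return -1;
}

/* Hypothetical section descriptor: where a section's bytes sit in the file. */
struct section {
    const char *name;
    uint64_t file_off;
    uint64_t size;
};

/* Sections must be sorted by file offset and non-overlapping. Returns the
 * size of the first gap between consecutive sections, or 0 if each section
 * starts exactly where the previous one ends. */
uint64_t first_gap(const struct section *s, size_t n)
{
    for ( size_t i = 1; i < n; i++ )
    {
        uint64_t prev_end = s[i - 1].file_off + s[i - 1].size;

        if ( s[i].file_off != prev_end )
            return s[i].file_off - prev_end;
    }
    return 0;
}
```

A nonzero `first_gap()` result is exactly the "void" that makes the current PE image unsuitable as a flat multiboot2 image, even though a PE loader is happy with it.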
--- a/xen/arch/x86/efi/Makefile
+++ b/xen/arch/x86/efi/Makefile
@@ -1,14 +1,16 @@
 CFLAGS += -fshort-wchar
-obj-y += stub.o
-
-create = test -e $(1) || touch -t 19990101 $(1)
-
 efi := $(filter y,$(x86_64)$(shell rm -f disabled))
 efi := $(if $(efi),$(shell $(CC) $(filter-out $(CFLAGS-y) .%.d,$(CFLAGS)) -c check.c 2>disabled && echo y))
 efi := $(if $(efi),$(shell $(LD) -mi386pep --subsystem=10 -o check.efi check.o 2>disabled && echo y))
-efi := $(if $(efi),$(shell rm disabled)y,$(shell $(call create,boot.init.o); $(call create,runtime.o)))
+efi := $(if $(efi),$(shell rm disabled)y)
 
-extra-$(efi) += boot.init.o relocs-dummy.o runtime.o compat.o
+extra-y += relocs-dummy.o

Why is this no longer extra-$(efi)?

Because we need proper EFI code in xen.gz to support boot via multiboot2 on EFI platforms.

What would we need that for when not building an EFI-capable binary anyway?

xen/arch/x86/efi/stub.c

Daniel
[Xen-devel] [PATCH] x86/vmx: fix vmx_is_singlestep_supported return value
The function is supposed to return a boolean, but instead it returned the value 0x800, which is the Intel internal flag for MTF. This has caused various checks using this function to falsely report no MTF capability.

Signed-off-by: Tamas K Lengyel <tleng...@novetta.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 999defe..35bcd79 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1768,7 +1768,7 @@ static void vmx_enable_msr_exit_interception(struct domain *d)
 
 static bool_t vmx_is_singlestep_supported(void)
 {
-    return cpu_has_monitor_trap_flag;
+    return cpu_has_monitor_trap_flag ? 1 : 0;
 }
 
 static void vmx_vcpu_update_eptp(struct vcpu *v)
-- 
2.1.4
[Xen-devel] [linux-next test] 60829: regressions - FAIL
flight 60829 linux-next real [real]
http://logs.test-lab.xenproject.org/osstest/logs/60829/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-pvh-intel 11 guest-start fail REGR. vs. 60773
 test-armhf-armhf-xl-cubietruck 6 xen-boot fail REGR. vs. 60773
 test-armhf-armhf-xl-multivcpu 6 xen-boot fail REGR. vs. 60773
 test-armhf-armhf-xl 6 xen-boot fail REGR. vs. 60773
 test-armhf-armhf-xl-xsm 6 xen-boot fail REGR. vs. 60773
 test-amd64-i386-xl-qemuu-debianhvm-amd64 18 guest-start.2 fail REGR. vs. 60773

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-libvirt-xsm 6 xen-boot fail REGR. vs. 60773
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 16 guest-start/debianhvm.repeat fail REGR. vs. 60773
 test-armhf-armhf-xl-rtds 15 guest-start.2 fail blocked in 60773
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail blocked in 60773
 test-amd64-i386-xl 14 guest-saverestore fail like 60773
 test-amd64-i386-xl-xsm 14 guest-saverestore fail like 60773
 test-amd64-i386-pair 21 guest-migrate/src_host/dst_host fail like 60773
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 60773
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 60773

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt 14 guest-saverestore fail never pass
 test-amd64-i386-libvirt 12 migrate-support-check fail never pass
 test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass
 test-amd64-i386-libvirt-xsm 12 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 14 guest-saverestore fail never pass
 test-armhf-armhf-libvirt-vhd 9 debian-di-install fail never pass
 test-armhf-armhf-libvirt-qcow2 9 debian-di-install fail never pass
 test-armhf-armhf-xl-raw 9 debian-di-install fail never pass
 test-armhf-armhf-libvirt-raw 9 debian-di-install fail never pass
 test-armhf-armhf-libvirt 14 guest-saverestore fail never pass
 test-armhf-armhf-libvirt 12 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail never pass
 test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
 test-amd64-amd64-libvirt 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 13 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 12 migrate-support-check fail never pass
 test-amd64-i386-libvirt-vhd 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 9 debian-di-install fail never pass
 test-armhf-armhf-xl-qcow2 9 debian-di-install fail never pass
 test-amd64-i386-libvirt-raw 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qcow2 11 migrate-support-check fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-raw 11 migrate-support-check fail never pass

version targeted for testing:
 linux 1ef981bcd18de26dc78bc79f092d6f4bb25e0e8f
baseline version:
 linux bf6740281ed599f98ba13eb3f017ca83deb6277f

Last test of basis (not found)
Failing since 0 1970-01-01 00:00:00 Z 16671 days
Testing same since 60829 2015-08-22 08:36:40 Z 2 days 1 attempts

jobs:
 build-amd64-xsm pass
 build-armhf-xsm pass
 build-i386-xsm pass
 build-amd64 pass
 build-armhf pass
 build-i386 pass
 build-amd64-libvirt
Re: [Xen-devel] [PATCH for-4.6 1/2] docs: Template for feature documents
On 08/24/2015 07:37 PM, Andrew Cooper wrote:
Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
---
 docs/Makefile                 |  2 +-
 docs/features/template.pandoc | 55 +
 2 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100644 docs/features/template.pandoc

diff --git a/docs/Makefile b/docs/Makefile
index 272292c..5d620e5 100644
--- a/docs/Makefile
+++ b/docs/Makefile
@@ -16,7 +16,7 @@ MARKDOWNSRC-y := $(sort $(shell find misc -name '*.markdown' -print))
 
 TXTSRC-y := $(sort $(shell find misc -name '*.txt' -print))
 
-PANDOCSRC-y := $(sort $(shell find specs -name '*.pandoc' -print))
+PANDOCSRC-y := $(sort $(shell find features/ misc/ specs/ -name '*.pandoc' -print))
 
 # Documentation targets
 DOC_MAN1 := $(patsubst man/%.pod.1,man1/%.1,$(MAN1SRC-y))
diff --git a/docs/features/template.pandoc b/docs/features/template.pandoc
new file mode 100644
index 000..d883b82
--- /dev/null
+++ b/docs/features/template.pandoc
@@ -0,0 +1,55 @@
+% Template for feature documents
+
+\clearpage
+
+This is a suggested template for formatting of a Xen feature document in tree.
+
+The purpose of this document is to provide a concrete support statement for the
+feature (indicating its security status), as well as brief user and technical
+documentation.
+
+# Basics
+
+A table with an overview of the support status and applicability.
+
+
+         Status: e.g. **Supported**/**Tech Preview**/**Experimental**
+
+Architecture(s): e.g. x86, arm
+
+   Component(s): e.g. Hypervisor, toolstack, guest
+
+       Hardware: _where applicable_
+

What about adding some information about when the feature was introduced, or some other historical information? Something like:

Experimental in Xen 4.1
Supported in Xen 4.3
xl syntax changed in Xen 4.4

Juergen

+
+# Overview
+
+A short description the feature, similar to an abstract for a
+paper/presentation.
+
+# User information
+
+Information for a user attempting to use the feature. Should include how to
+enable the feature (is it enabled by default? If not, how to turn it on?), and
+how to interact with the feature (typically via `xl`).
+
+# Limitations
+
+Information concerning incompatibilities with other features or hardware
+combinations.
+
+# Technical information
+
+Information for a developer or power user. Should include where to look
+in-tree for detailed documents and code.
+
+# Areas for improvement
+
+List of enhancements which could be undertaken, e.g. to improve the feature
+itself, or improve interaction with other features.
+
+# Known issues
+
+List of known issues or bugs. For tech preview or experimental features, this
+section must contain the list of items needing fixing for its status to be
+upgraded.
Re: [Xen-devel] [PATCH] x86/vmx: fix vmx_is_singlestep_supported return value
On 24/08/2015 20:55, Tamas K Lengyel wrote:
The function is supposed to return a boolean, but instead it returned the value 0x800, which is the Intel internal flag for MTF. This has caused various checks using this function to falsely report no MTF capability.

Ouch. Given that bool_t is currently signed char, that won't be of much use.

Signed-off-by: Tamas K Lengyel <tleng...@novetta.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 999defe..35bcd79 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1768,7 +1768,7 @@ static void vmx_enable_msr_exit_interception(struct domain *d)
 
 static bool_t vmx_is_singlestep_supported(void)
 {
-    return cpu_has_monitor_trap_flag;
+    return cpu_has_monitor_trap_flag ? 1 : 0;

Prevailing style would tend towards !!cpu_has_monitor_trap_flag

Either way, Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com>

 }
 
 static void vmx_vcpu_update_eptp(struct vcpu *v)
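A standalone illustration of the truncation Andrew points out. Converting a non-constant 0x800 (the value quoted in the patch description) to a `signed char`-based `bool_t` keeps only the low 8 bits on GCC and Clang, so a set flag reads back as false; ISO C leaves this conversion implementation-defined. `caps` and `MTF_FLAG` are hypothetical stand-ins for the capability word and bit, not Xen's definitions.

```c
#include <stdint.h>

typedef signed char bool_t;   /* bool_t as described in the review above */

#define MTF_FLAG 0x800u       /* value quoted in the patch description */

/* Hypothetical capability word; volatile stops compile-time folding. */
static volatile uint32_t caps = MTF_FLAG;

/* Buggy: 0x800 is converted to signed char; only the low byte (0) survives. */
bool_t singlestep_supported_buggy(void)
{
    return caps & MTF_FLAG;
}

/* As in the patch: the conditional yields exactly 0 or 1. */
bool_t singlestep_supported_ternary(void)
{
    return (caps & MTF_FLAG) ? 1 : 0;
}

/* Prevailing Xen style: double negation normalises to 0 or 1. */
bool_t singlestep_supported_notnot(void)
{
    return !!(caps & MTF_FLAG);
}
```

Both fixes are equivalent; `!!` is just the more common idiom in the tree.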
Re: [Xen-devel] [PATCH] x86/vmx: fix vmx_is_singlestep_supported return value
@@ -1768,7 +1768,7 @@ static void vmx_enable_msr_exit_interception(struct domain *d)
 static bool_t vmx_is_singlestep_supported(void)
 {
-    return cpu_has_monitor_trap_flag;
+    return cpu_has_monitor_trap_flag ? 1 : 0;

Prevailing style would tend towards !!cpu_has_monitor_trap_flag

Yep, you are right. If the maintainers prefer, I can resend with that style.

Either way, Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com>

Thanks!
Tamas
[Xen-devel] VT-d faults with Integrated Intel graphics on 4.6
Hi everyone,

I saw some people mention this in passing on the list before, but just in case it has been missed: my serial console is also being spammed with the following printouts with both Xen 4.6 RC1 and the latest staging build:

...
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 2610742000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 07 - Next page table ptr is invalid
...

The device in question is an integrated Intel graphics card:

00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)

The only way I found to stop the messages from making my serial connection useless was to assign the device to xen-pciback.

Cheers,
Tamas
[Xen-devel] [PATCH v6 05/18] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
Extend struct pi_desc according to the VT-d Posted-Interrupts Spec.

CC: Kevin Tian <kevin.t...@intel.com>
CC: Keir Fraser <k...@xen.org>
CC: Jan Beulich <jbeul...@suse.com>
CC: Andrew Cooper <andrew.coop...@citrix.com>
Signed-off-by: Feng Wu <feng...@intel.com>
Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com>
Acked-by: Kevin Tian <kevin.t...@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
---
v3:
- Use u32 instead of u64 for the bitfield in 'struct pi_desc'

 xen/include/asm-x86/hvm/vmx/vmcs.h | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index f1126d4..7e81752 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -80,8 +80,19 @@ struct vmx_domain {
 
 struct pi_desc {
     DECLARE_BITMAP(pir, NR_VECTORS);
-    u32 control;
-    u32 rsvd[7];
+    union {
+        struct
+        {
+        u16 on     : 1,  /* bit 256 - Outstanding Notification */
+            sn     : 1,  /* bit 257 - Suppress Notification */
+            rsvd_1 : 14; /* bit 271:258 - Reserved */
+        u8  nv;          /* bit 279:272 - Notification Vector */
+        u8  rsvd_2;      /* bit 287:280 - Reserved */
+        u32 ndst;        /* bit 319:288 - Notification Destination */
+        };
+        u64 control;
+    };
+    u32 rsvd[6];
 } __attribute__ ((aligned (64)));
 
 #define ept_get_wl(ept)   ((ept)->ept_wl)
-- 
2.1.0
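One way to confirm that the new union matches the bit positions in the comments (ON at bit 256, NV at 272, NDST at 288, 64 bytes total) is to mirror the layout in a standalone file and let the compiler check it. This sketch swaps Xen's `DECLARE_BITMAP`/`u16`/`u8` for `<stdint.h>` types and assumes GCC or Clang on x86-64, since non-`int` bit-field types and bit-field packing are implementation-defined; it is a check, not the header itself.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_VECTORS 256

struct pi_desc_mirror {
    uint64_t pir[NR_VECTORS / 64];      /* bits 255:0 - posted interrupt requests */
    union {
        struct {
            uint16_t on     : 1,        /* bit 256 */
                     sn     : 1,        /* bit 257 */
                     rsvd_1 : 14;       /* bits 271:258 */
            uint8_t  nv;                /* bits 279:272 */
            uint8_t  rsvd_2;            /* bits 287:280 */
            uint32_t ndst;              /* bits 319:288 */
        };
        uint64_t control;
    };
    uint32_t rsvd[6];                   /* bits 511:320 */
} __attribute__ ((aligned (64)));

/* Byte offset * 8 gives the bit position quoted in the patch comments. */
_Static_assert(offsetof(struct pi_desc_mirror, control) * 8 == 256, "control at bit 256");
_Static_assert(offsetof(struct pi_desc_mirror, nv) * 8 == 272, "NV at bit 272");
_Static_assert(offsetof(struct pi_desc_mirror, ndst) * 8 == 288, "NDST at bit 288");
_Static_assert(sizeof(struct pi_desc_mirror) == 64, "descriptor is 64 bytes");
```

The static assertions cover the byte-aligned fields; the position of the ON and SN bits inside `control` depends on bit-field allocation order, so it is worth checking at run time by writing a field and reading `control` back.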
[Xen-devel] [PATCH v6 13/18] Update IRTE according to guest interrupt config changes
When a guest changes its interrupt configuration (such as the vector) for direct-assigned devices, we need to update the associated IRTE with the new guest vector, so external interrupts from the assigned devices can be injected into guests without a VM-Exit.

For lowest-priority interrupts, we use the vector-hashing mechanism to find the destination vCPU. This follows the hardware behavior, since modern Intel CPUs use vector hashing to handle lowest-priority interrupts.

For multicast/broadcast vCPU, we cannot handle it via interrupt posting, so we still use interrupt remapping.

CC: Jan Beulich <jbeul...@suse.com>
Signed-off-by: Feng Wu <feng...@intel.com>
---
v6:
- Use macro to replace plain numbers
- Correct the overflow error in a loop

v5:
- Make 'struct vcpu *vcpu' const

v4:
- Make some 'int' variables 'unsigned int' in pi_find_dest_vcpu()
- Make 'dest_id' uint32_t
- Rename 'size' to 'bitmap_array_size'
- find_next_bit() and find_first_bit() always return unsigned int, so no need to check whether the return value is less than 0.
- Message error level XENLOG_G_WARNING -> XENLOG_G_INFO
- Remove useless warning message
- Create a separate function vector_hashing_dest() to find the destination of lowest-priority interrupts.
- Change some comments

v3:
- Use bitmap to store all the possible destination vCPUs of an interrupt, then try to find the right destination from the bitmap
- Typo and some small changes

 xen/drivers/passthrough/io.c | 125 ++-
 1 file changed, 124 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index bda9374..8e36948 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -25,6 +25,7 @@
 #include <asm/hvm/iommu.h>
 #include <asm/hvm/support.h>
 #include <xen/hvm/irq.h>
+#include <asm/io_apic.h>
 
 static DEFINE_PER_CPU(struct list_head, dpci_list);
 
@@ -198,6 +199,109 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
     xfree(dpci);
 }
 
+/*
+ * This routine handles lowest-priority interrupts using vector-hashing
+ * mechanism. As an example, modern Intel CPUs use this method to handle
+ * lowest-priority interrupts.
+ *
+ * Here is the details about the vector-hashing mechanism:
+ * 1. For lowest-priority interrupts, store all the possible destination
+ *    vCPUs in an array.
+ * 2. Use gvec % max number of destination vCPUs to find the right
+ *    destination vCPU in the array for the lowest-priority interrupt.
+ */
+static struct vcpu *vector_hashing_dest(const struct domain *d,
+                                        uint32_t dest_id,
+                                        bool_t dest_mode,
+                                        uint8_t gvec)
+
+{
+    unsigned long *dest_vcpu_bitmap;
+    unsigned int dest_vcpus = 0, idx;
+    unsigned int bitmap_array_size = BITS_TO_LONGS(d->max_vcpus);
+    struct vcpu *v, *dest = NULL;
+    unsigned int i;
+
+    dest_vcpu_bitmap = xzalloc_array(unsigned long, bitmap_array_size);
+    if ( !dest_vcpu_bitmap )
+    {
+        dprintk(XENLOG_G_INFO,
+                "dom%d: failed to allocate memory\n", d->domain_id);
+        return NULL;
+    }
+
+    for_each_vcpu ( d, v )
+    {
+        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, APIC_DEST_NOSHORT,
+                                dest_id, dest_mode) )
+            continue;
+
+        __set_bit(v->vcpu_id, dest_vcpu_bitmap);
+        dest_vcpus++;
+    }
+
+    if ( dest_vcpus != 0 )
+    {
+        unsigned int mod = gvec % dest_vcpus;
+        idx = 0;
+
+        for ( i = 0; i <= mod; i++ )
+        {
+            idx = find_next_bit(dest_vcpu_bitmap, d->max_vcpus, idx) + 1;
+            BUG_ON(idx >= d->max_vcpus);
+        }
+        idx--;
+
+        dest = d->vcpu[idx];
+    }
+
+    xfree(dest_vcpu_bitmap);
+
+    return dest;
+}
+
+/*
+ * The purpose of this routine is to find the right destination vCPU for
+ * an interrupt which will be delivered by VT-d posted-interrupt. There
+ * are several cases as below:
+ *
+ * - For lowest-priority interrupts, use vector-hashing mechanism to find
+ *   the destination.
+ * - Otherwise, for single destination interrupt, it is straightforward to
+ *   find the destination vCPU and return true.
+ * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
+ *   so return NULL.
+ */
+static struct vcpu *pi_find_dest_vcpu(const struct domain *d, uint32_t dest_id,
+                                      bool_t dest_mode, uint8_t delivery_mode,
+                                      uint8_t gvec)
+{
+    unsigned int dest_vcpus = 0;
+    struct vcpu *v, *dest = NULL;
+
+    if ( delivery_mode == dest_LowestPrio )
+        return vector_hashing_dest(d, dest_id, dest_mode, gvec);
+
+    for_each_vcpu ( d, v )
+    {
+        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, APIC_DEST_NOSHORT,
+                                dest_id, dest_mode) )
+            continue;
+
+        dest_vcpus++;
+        dest = v;
+    }
+
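The selection step in `vector_hashing_dest()` amounts to "pick the (gvec mod n)-th matching vCPU". A standalone sketch, using a plain `bool` array in place of Xen's bitmap and `vlapic_match_dest()`; the function name `vector_hash_pick` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return the index of the destination vCPU for a lowest-priority
 * interrupt with guest vector gvec, or -1 if no vCPU matches.
 * match[i] says whether vCPU i is a possible destination. */
int vector_hash_pick(const bool *match, unsigned int nr_vcpus, uint8_t gvec)
{
    unsigned int count = 0, target, i;

    for ( i = 0; i < nr_vcpus; i++ )
        count += match[i];

    if ( count == 0 )
        return -1;

    /* The same vector always maps to the same vCPU, while different
     * vectors are spread across all candidates. */
    target = gvec % count;

    for ( i = 0; i < nr_vcpus; i++ )
    {
        if ( !match[i] )
            continue;
        if ( target-- == 0 )
            return (int)i;
    }

    return -1; /* not reached: target < count */
}
```

With candidates {1, 2, 4}, vector 49 gives 49 mod 3 = 1 and selects vCPU 2; vector 50 selects vCPU 4. The case where the highest-numbered vCPU is chosen deserves a test against the patch too: `idx` is one past the found bit inside the loop, so a `BUG_ON(idx >= d->max_vcpus)` bound would appear to fire for that legitimate choice. The sketch sidesteps this by never forming the one-past index.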
[Xen-devel] [PATCH v6 07/18] vmx: Initialize VT-d Posted-Interrupts Descriptor
This patch initializes the VT-d Posted-Interrupt Descriptor.

CC: Kevin Tian <kevin.t...@intel.com>
CC: Keir Fraser <k...@xen.org>
CC: Jan Beulich <jbeul...@suse.com>
CC: Andrew Cooper <andrew.coop...@citrix.com>
Signed-off-by: Feng Wu <feng...@intel.com>
Acked-by: Kevin Tian <kevin.t...@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
---
v3:
- Move pi_desc_init() to xen/arch/x86/hvm/vmx/vmcs.c
- Remove the 'inline' flag of pi_desc_init()

 xen/arch/x86/hvm/vmx/vmcs.c       | 18 ++
 xen/include/asm-x86/hvm/vmx/vmx.h |  2 ++
 2 files changed, 20 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index a0a97e7..28c553f 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -39,6 +39,7 @@
 #include <asm/flushtlb.h>
 #include <asm/shadow.h>
 #include <asm/tboot.h>
+#include <asm/apic.h>
 
 static bool_t __read_mostly opt_vpid_enabled = 1;
 boolean_param("vpid", opt_vpid_enabled);
@@ -951,6 +952,20 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val)
     virtual_vmcs_exit(vvmcs);
 }
 
+static void pi_desc_init(struct vcpu *v)
+{
+    uint32_t dest;
+
+    v->arch.hvm_vmx.pi_desc.nv = posted_intr_vector;
+
+    dest = cpu_physical_id(v->processor);
+
+    if ( x2apic_enabled )
+        v->arch.hvm_vmx.pi_desc.ndst = dest;
+    else
+        v->arch.hvm_vmx.pi_desc.ndst = MASK_INSR(dest, PI_xAPIC_NDST_MASK);
+}
+
 static int construct_vmcs(struct vcpu *v)
 {
     struct domain *d = v->domain;
@@ -1089,6 +1104,9 @@ static int construct_vmcs(struct vcpu *v)
 
     if ( cpu_has_vmx_posted_intr_processing )
     {
+        if ( iommu_intpost )
+            pi_desc_init(v);
+
         __vmwrite(PI_DESC_ADDR, virt_to_maddr(&v->arch.hvm_vmx.pi_desc));
         __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
     }
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index acd4aec..03c529c 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -88,6 +88,8 @@ typedef enum {
 #define EPT_EMT_WB              6
 #define EPT_EMT_RSV2            7
 
+#define PI_xAPIC_NDST_MASK      0xFF00
+
 void vmx_asm_vmexit_handler(struct cpu_user_regs);
 void vmx_asm_do_vmentry(void);
 void vmx_intr_assist(void);
-- 
2.1.0
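In xAPIC mode the 8-bit physical APIC ID goes into bits 15:8 of NDST, which is what `MASK_INSR(dest, PI_xAPIC_NDST_MASK)` produces; in x2APIC mode the full 32-bit ID is used unchanged. The sketch below defines `MASK_INSR`/`MASK_EXTR` with the lowest-set-bit multiplication trick, intended to mirror the definitions in Xen's common headers (treat that as an assumption and check the tree); `pi_ndst()` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* (m) & -(m) isolates the lowest set bit of the mask, so multiplying or
 * dividing by it shifts a value into or out of the mask's position. */
#define MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))
#define MASK_INSR(v, m) (((v) * ((m) & -(m))) & (m))

#define PI_xAPIC_NDST_MASK 0xFF00u

/* Compute the NDST field for a physical APIC ID (hypothetical helper). */
uint32_t pi_ndst(uint32_t apic_id, bool x2apic)
{
    return x2apic ? apic_id : MASK_INSR(apic_id, PI_xAPIC_NDST_MASK);
}
```

For APIC ID 5 this gives 0x0500 in xAPIC mode and 0x5 in x2APIC mode; any bits above the 8-bit xAPIC ID are masked off.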
[Xen-devel] [PATCH v6 06/18] vmx: Add some helper functions for Posted-Interrupts
This patch adds some helper functions to manipulate the Posted-Interrupts Descriptor.

CC: Kevin Tian <kevin.t...@intel.com>
CC: Keir Fraser <k...@xen.org>
CC: Jan Beulich <jbeul...@suse.com>
CC: Andrew Cooper <andrew.coop...@citrix.com>
Signed-off-by: Feng Wu <feng...@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
---
v4:
- Newly added

 xen/include/asm-x86/hvm/vmx/vmx.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 3fbfa44..acd4aec 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -101,6 +101,7 @@ void vmx_update_cpu_exec_control(struct vcpu *v);
 void vmx_update_secondary_exec_control(struct vcpu *v);
 
 #define POSTED_INTR_ON  0
+#define POSTED_INTR_SN  1
 static inline int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
 {
     return test_and_set_bit(vector, pi_desc->pir);
@@ -121,11 +122,31 @@ static inline int pi_test_and_clear_on(struct pi_desc *pi_desc)
     return test_and_clear_bit(POSTED_INTR_ON, &pi_desc->control);
 }
 
+static inline int pi_test_on(struct pi_desc *pi_desc)
+{
+    return test_bit(POSTED_INTR_ON, &pi_desc->control);
+}
+
 static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
 {
     return xchg(&pi_desc->pir[group], 0);
 }
 
+static inline int pi_test_sn(struct pi_desc *pi_desc)
+{
+    return test_bit(POSTED_INTR_SN, &pi_desc->control);
+}
+
+static inline void pi_set_sn(struct pi_desc *pi_desc)
+{
+    set_bit(POSTED_INTR_SN, &pi_desc->control);
+}
+
+static inline void pi_clear_sn(struct pi_desc *pi_desc)
+{
+    clear_bit(POSTED_INTR_SN, &pi_desc->control);
+}
+
 /*
  * Exit Reasons
  */
-- 
2.1.0
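These helpers are thin wrappers around Xen's atomic bit operations on the 64-bit `control` word. For experimenting outside the hypervisor, a rough equivalent using C11 `<stdatomic.h>` might look like the following; the `pi_control_*` names and the `pi_control_t` type are hypothetical, and the bit numbers follow `POSTED_INTR_ON`/`POSTED_INTR_SN` from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define POSTED_INTR_ON 0
#define POSTED_INTR_SN 1

typedef _Atomic uint64_t pi_control_t;   /* stand-in for pi_desc.control */

static inline bool pi_control_test(pi_control_t *c, unsigned int bit)
{
    return (atomic_load(c) >> bit) & 1;
}

static inline void pi_control_set(pi_control_t *c, unsigned int bit)
{
    atomic_fetch_or(c, UINT64_C(1) << bit);
}

static inline void pi_control_clear(pi_control_t *c, unsigned int bit)
{
    atomic_fetch_and(c, ~(UINT64_C(1) << bit));
}

/* Analogue of pi_test_and_clear_on(): clears the bit, returns its old value. */
static inline bool pi_control_test_and_clear(pi_control_t *c, unsigned int bit)
{
    return (atomic_fetch_and(c, ~(UINT64_C(1) << bit)) >> bit) & 1;
}
```

Using atomic read-modify-write operations matters here because the IOMMU hardware may update the descriptor concurrently; each helper touches only its own bit and leaves the rest of `control` alone.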
[Xen-devel] [PATCH v6 03/18] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt. With VT-d Posted-Interrupts enabled, external interrupts from direct-assigned devices can be delivered to guests without VMM intervention when guest is running in non-root mode. This patch adds variable 'iommu_intpost' to control whether enable VT-d posted-interrupt or not in the generic IOMMU code. CC: Jan Beulich jbeul...@suse.com CC: Kevin Tian kevin.t...@intel.com Signed-off-by: Feng Wu feng...@intel.com Reviewed-by: Kevin Tian kevin.t...@intel.com Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- v5: - Remove the if no intremap then no intpost logic in parse_iommu_param(), which can be covered in iommu_setup() v3: - Remove pointless initializer for 'iommu_intpost'. - Some adjustment for if no intremap then no intpost logic. * For parse_iommu_param(), move it to the end of the function, so we don't need to add the some logic when introduing the new kernel parameter 'intpost' in later patch. * Add this logic in iommu_setup() after iommu_hardware_setup() is called. xen/drivers/passthrough/iommu.c | 13 - xen/include/xen/iommu.h | 2 +- 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c index fc7831e..36d5cc0 100644 --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -51,6 +51,14 @@ bool_t __read_mostly iommu_passthrough; bool_t __read_mostly iommu_snoop = 1; bool_t __read_mostly iommu_qinval = 1; bool_t __read_mostly iommu_intremap = 1; + +/* + * In the current implementation of VT-d posted interrupts, in some extreme + * cases, the per cpu list which saves the blocked vCPU will be very long, + * and this will affect the interrupt latency, so let this feature off by + * default until we find a good solution to resolve it. 
+ */
+bool_t __read_mostly iommu_intpost;
 bool_t __read_mostly iommu_hap_pt_share = 1;
 bool_t __read_mostly iommu_debug;
 bool_t __read_mostly amd_iommu_perdev_intremap = 1;
@@ -307,6 +315,9 @@ int __init iommu_setup(void)
         panic("Couldn't enable %s and iommu=required/force",
               !iommu_enabled ? "IOMMU" : "Interrupt Remapping");
 
+    if ( !iommu_intremap )
+        iommu_intpost = 0;
+
     if ( !iommu_enabled )
     {
         iommu_snoop = 0;
@@ -374,7 +385,7 @@ void iommu_crash_shutdown(void)
     const struct iommu_ops *ops = iommu_get_ops();
     if ( iommu_enabled )
         ops->crash_shutdown();
-    iommu_enabled = iommu_intremap = 0;
+    iommu_enabled = iommu_intremap = iommu_intpost = 0;
 }
 
 int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 8f3a20e..1f5d04a 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -30,7 +30,7 @@ extern bool_t iommu_enable, iommu_enabled;
 extern bool_t force_iommu, iommu_verbose;
 extern bool_t iommu_workaround_bios_bug, iommu_igfx, iommu_passthrough;
-extern bool_t iommu_snoop, iommu_qinval, iommu_intremap;
+extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
 extern bool_t iommu_hap_pt_share;
 extern bool_t iommu_debug;
 extern bool_t amd_iommu_perdev_intremap;
-- 
2.1.0
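The `iommu_setup()` and `iommu_crash_shutdown()` hunks boil down to one invariant: posting can only be on while interrupt remapping is on, and a crash shutdown turns everything off. A minimal sketch of that rule, with a hypothetical `struct iommu_flags` standing in for Xen's global `bool_t` variables:

```c
#include <stdbool.h>

/* Hypothetical stand-in for iommu_enabled / iommu_intremap / iommu_intpost. */
struct iommu_flags {
    bool enabled;
    bool intremap;
    bool intpost;
};

/* Mirrors the new check in iommu_setup(): no remapping means no posting. */
static void model_iommu_setup(struct iommu_flags *f)
{
    if ( !f->intremap )
        f->intpost = false;
}

/* Mirrors iommu_crash_shutdown(): all three flags are cleared. */
static void model_iommu_crash_shutdown(struct iommu_flags *f)
{
    f->enabled = f->intremap = f->intpost = false;
}
```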
[Xen-devel] [PATCH v6 02/18] Add cmpxchg16b support for x86-64
This patch adds cmpxchg16b support for x86-64, so that software can perform 128-bit atomic reads and writes.

CC: Keir Fraser <k...@xen.org>
CC: Jan Beulich <jbeul...@suse.com>
CC: Andrew Cooper <andrew.coop...@citrix.com>
Signed-off-by: Feng Wu <feng...@intel.com>
---
v6:
- Fix a typo

v5:
- Change back the parameters of __cmpxchg16b() to __uint128_t *
- Remove pointless cast for 'ptr'
- Remove pointless parentheses
- Use "A" constraint for the output

v4:
- Use pointer as the parameter of __cmpxchg16b().
- Use gcc's __uint128_t built-in type
- Make the parameters of __cmpxchg16b() void *

v3:
- Newly added.

 xen/include/asm-x86/x86_64/system.h | 28 +
 1 file changed, 28 insertions(+)

diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h
index 662813a..e4e959d 100644
--- a/xen/include/asm-x86/x86_64/system.h
+++ b/xen/include/asm-x86/x86_64/system.h
@@ -6,6 +6,34 @@
                                    (unsigned long)(n),sizeof(*(ptr))))
 
 /*
+ * Atomic 16 bytes compare and exchange.  Compare OLD with MEM, if
+ * identical, store NEW in MEM.  Return the initial value in MEM.
+ * Success is indicated by comparing RETURN with OLD.
+ *
+ * This function can only be called when cpu_has_cx16 is true.
+ */
+
+static always_inline __uint128_t __cmpxchg16b(
+    volatile void *ptr, __uint128_t *old, __uint128_t *new)
+{
+    __uint128_t prev;
+    uint64_t new_high = *new >> 64;
+    uint64_t new_low = (uint64_t)*new;
+
+    ASSERT(cpu_has_cx16);
+
+    asm volatile ( "lock; cmpxchg16b %3"
+                   : "=A" (prev)
+                   : "c" (new_high), "b" (new_low), "m" (*__xg(ptr)), "0" (*old)
+                   : "memory" );
+
+    return prev;
+}
+
+#define cmpxchg16b(ptr,o,n)                                             \
+    __cmpxchg16b((ptr), (__uint128_t *)(o), (__uint128_t *)(n))
+
+/*
  * This function causes value _o to be changed to _n at location _p.
  * If this access causes a fault then we return 1, otherwise we return 0.
  * If no fault occurs then _o is updated to the value we saw at _p. If this
-- 
2.1.0
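For experimenting outside the hypervisor, the same instruction can be wrapped in a stand-alone x86-64 function. This is a sketch, not the patch's code: `cmpxchg16b_sketch` is a hypothetical name, it takes values instead of pointers, and it uses explicit `"+a"`/`"+d"` register constraints for RDX:RAX instead of `"=A"`. It assumes a CPU with CX16 and a 16-byte-aligned target, since CMPXCHG16B faults on a misaligned operand.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/*
 * Compare *ptr with old; if equal, store new_val. Either way, return the
 * value *ptr held before the instruction executed.
 * Assumes: x86-64, CX16 support, ptr 16-byte aligned.
 */
static inline u128 cmpxchg16b_sketch(volatile u128 *ptr, u128 old, u128 new_val)
{
    /* RDX:RAX holds the expected value in and the previous value out. */
    uint64_t lo = (uint64_t)old;
    uint64_t hi = (uint64_t)(old >> 64);

    __asm__ __volatile__ ( "lock; cmpxchg16b %[mem]"
                           : "+a" (lo), "+d" (hi), [mem] "+m" (*ptr)
                           /* RCX:RBX holds the replacement value. */
                           : "b" ((uint64_t)new_val),
                             "c" ((uint64_t)(new_val >> 64))
                           : "memory", "cc" );

    return ((u128)hi << 64) | lo;
}
```

As the patch's comment describes, success is detected by comparing the returned value with `old`: equal means the store happened.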
[Xen-devel] [PATCH v6 00/18] Add VT-d Posted-Interrupts support
VT-d Posted-Interrupts is an enhancement to CPU-side Posted-Interrupts. With VT-d Posted-Interrupts enabled, external interrupts from direct-assigned devices can be delivered to guests without VMM intervention while the guest is running in non-root mode.

You can find the VT-d Posted-Interrupts specification at the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

Feng Wu (18):
  VT-d Posted-interrupt (PI) design
  Add cmpxchg16b support for x86-64
  iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
  vt-d: VT-d Posted-Interrupts feature detection
  vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
  vmx: Add some helper functions for Posted-Interrupts
  vmx: Initialize VT-d Posted-Interrupts Descriptor
  vmx: Suppress posting interrupts when 'SN' is set
  VT-d: Remove pointless casts
  vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
  vt-d: Add API to update IRTE when VT-d PI is used
  x86: move some APIC related macros to apicdef.h
  Update IRTE according to guest interrupt config changes
  vmx: posted-interrupt handling when vCPU is blocked
  vmx: Properly handle notification event when vCPU is running
  vmx: Add some scheduler hooks for VT-d posted interrupts
  VT-d: Dump the posted format IRTE
  Add a command line parameter for VT-d posted-interrupts

 docs/misc/vtd-pi.txt                   | 332 +
 docs/misc/xen-command-line.markdown    |   9 +-
 xen/arch/x86/domain.c                  |  19 ++
 xen/arch/x86/hvm/vlapic.c              |   5 -
 xen/arch/x86/hvm/vmx/vmcs.c            |  21 +++
 xen/arch/x86/hvm/vmx/vmx.c             | 289 +++-
 xen/common/schedule.c                  |   2 +
 xen/drivers/passthrough/io.c           | 125 -
 xen/drivers/passthrough/iommu.c        |  16 +-
 xen/drivers/passthrough/vtd/intremap.c | 199 +++-
 xen/drivers/passthrough/vtd/iommu.c    |  17 +-
 xen/drivers/passthrough/vtd/iommu.h    |  50 +++--
 xen/drivers/passthrough/vtd/utils.c    |  59 --
 xen/include/asm-arm/domain.h           |   2 +
 xen/include/asm-x86/apicdef.h          |   4 +
 xen/include/asm-x86/domain.h           |   3 +
 xen/include/asm-x86/hvm/hvm.h          |   2 +
 xen/include/asm-x86/hvm/vmx/vmcs.h     |  26 ++-
 xen/include/asm-x86/hvm/vmx/vmx.h      |  28 +++
 xen/include/asm-x86/iommu.h            |   2 +
 xen/include/asm-x86/x86_64/system.h    |  28 +++
 xen/include/xen/iommu.h                |   2 +-
 22 files changed, 1155 insertions(+), 85 deletions(-)
 create mode 100644 docs/misc/vtd-pi.txt

-- 
2.1.0
[Xen-devel] [PATCH v6 01/18] VT-d Posted-interrupt (PI) design
Add the design doc for VT-d PI.

CC: Kevin Tian <kevin.t...@intel.com>
CC: Yang Zhang <yang.z.zh...@intel.com>
CC: Jan Beulich <jbeul...@suse.com>
CC: Keir Fraser <k...@xen.org>
CC: Andrew Cooper <andrew.coop...@citrix.com>
CC: George Dunlap <george.dun...@eu.citrix.com>
Signed-off-by: Feng Wu <feng...@intel.com>
Reviewed-by: Kevin Tian <kevin.t...@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
---
v6:
- Better description in English

 docs/misc/vtd-pi.txt | 332 +++
 1 file changed, 332 insertions(+)
 create mode 100644 docs/misc/vtd-pi.txt

diff --git a/docs/misc/vtd-pi.txt b/docs/misc/vtd-pi.txt
new file mode 100644
index 000..af5409a
--- /dev/null
+++ b/docs/misc/vtd-pi.txt
@@ -0,0 +1,332 @@
+Authors: Feng Wu <feng...@intel.com>
+
+VT-d Posted-interrupt (PI) design for XEN
+
+Background
+==
+With the development of virtualization, there are more and more device
+assignment requirements. However, today when a VM is running with
+assigned devices (such as a NIC), external interrupt handling for the assigned
+devices always needs VMM intervention.
+
+VT-d Posted-interrupt is an enhanced method for handling interrupts
+in a virtualized environment. Interrupt posting is the process by
+which an interrupt request is recorded in a memory-resident
+posted-interrupt-descriptor structure by the root-complex, followed by
+an optional notification event issued to the CPU complex.
+
+With VT-d Posted-interrupt we get the following advantages:
+- Direct delivery of external interrupts to running vCPUs without VMM
+intervention
+- Reduced interrupt migration complexity. On vCPU migration, software
+can atomically co-migrate all interrupts targeting the migrating vCPU. For
+virtual machines with assigned devices, migrating a vCPU across pCPUs
+either incurs the overhead of forwarding interrupts in software (e.g. via VMM
+generated IPIs), or the complexity of independently migrating each interrupt targeting
+the vCPU to the new pCPU.
+However, after enabling VT-d PI, the destination vCPU
+of an external interrupt from an assigned device is stored in the IRTE (i.e.
+the Posted-interrupt Descriptor Address). When the vCPU is migrated to another pCPU,
+we set the new pCPU in the 'NDST' field of the Posted-interrupt descriptor, which
+makes the interrupt migration automatic.
+
+Here is what Xen currently does for external interrupts from assigned devices:
+
+When a VM is running and an external interrupt from an assigned device occurs
+for it, a VM-Exit happens, and then:
+
+vmx_do_extint() --> do_IRQ() --> __do_IRQ_guest() --> hvm_do_IRQ_dpci() -->
+raise_softirq_for(pirq_dpci) --> raise_softirq(HVM_DPCI_SOFTIRQ)
+
+softirq HVM_DPCI_SOFTIRQ is bound to dpci_softirq()
+
+dpci_softirq() --> hvm_dirq_assist() --> vmsi_deliver_pirq() --> vmsi_deliver() -->
+vmsi_inj_irq() --> vlapic_set_irq()
+
+vlapic_set_irq() does the following things:
+1. If CPU-side posted-interrupt is supported, call vmx_deliver_posted_intr() to deliver
+the virtual interrupt via the posted-interrupt infrastructure.
+2. Otherwise, set the related vIRR bit in the vLAPIC
+page and call vcpu_kick() to kick the related vCPU. Before VM-Entry, vmx_intr_assist()
+will inject the interrupt into the guest.
+
+With VT-d PI, however, when a guest is running in non-root mode and an
+external interrupt from an assigned device occurs for it, no VM-Exit is needed:
+the guest can handle it entirely in non-root mode, avoiding all of the above
+code flow.
+
+Posted-interrupt Introduction
+
+There are two components to the Posted-interrupt architecture:
+Processor Support and Root-Complex Support
+
+- Processor Support
+Posted-interrupt processing is a feature by which a processor processes
+the virtual interrupts by recording them as pending on the virtual-APIC
+page.
+
+Posted-interrupt processing is enabled by setting the "process posted
+interrupts" VM-execution control.
+The processing is performed in response
+to the arrival of an interrupt with the posted-interrupt notification vector.
+In response to such an interrupt, the processor processes virtual interrupts
+recorded in a data structure called a posted-interrupt descriptor.
+
+For more information about APICv and CPU-side Posted-interrupts, please refer
+to Chapter 29, in particular Section 29.6, of the Intel SDM:
+http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
+
+- Root-Complex Support
+Interrupt posting is the process by which an interrupt request (from IOAPIC
+or MSI/MSIx capable sources) is recorded in a memory-resident
+posted-interrupt-descriptor structure by the root-complex, followed by
+an optional notification event issued to the CPU complex. The interrupt
+requests arriving at the root-complex carry the identity of the interrupt
+request source and a 'remapping-index'. The remapping-index is used to
+look-up an entry from
[Xen-devel] [PATCH v6 18/18] Add a command line parameter for VT-d posted-interrupts
Enable VT-d Posted-Interrupts and add a command line parameter for it.

Signed-off-by: Feng Wu <feng...@intel.com>
Reviewed-by: Kevin Tian <kevin.t...@intel.com>
---
v6:
- Change the default value to 'false' in xen-command-line.markdown

 docs/misc/xen-command-line.markdown | 9 ++++++++-
 xen/drivers/passthrough/iommu.c     | 3 +++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index a2e427c..ecaf221 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -855,7 +855,7 @@ debug hypervisor only).
 > Default: `new` unless directed-EOI is supported
 
 ### iommu
-> `= List of [ <boolean> | force | required | intremap | qinval | snoop | sharept | dom0-passthrough | dom0-strict | amd-iommu-perdev-intremap | workaround_bios_bug | igfx | verbose | debug ]`
+> `= List of [ <boolean> | force | required | intremap | intpost | qinval | snoop | sharept | dom0-passthrough | dom0-strict | amd-iommu-perdev-intremap | workaround_bios_bug | igfx | verbose | debug ]`
 
 Sub-options:
 
@@ -882,6 +882,13 @@ debug hypervisor only).
 >> Control the use of interrupt remapping (DMA remapping will always be enabled
 >> if IOMMU functionality is enabled).
 
+> `intpost`
+
+> Default: `false`
+
+>> Control the use of interrupt posting, which depends on the availability of
+>> interrupt remapping.
+
 > `qinval` (VT-d)
 
 > Default: `true`
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 36d5cc0..8d03076 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -38,6 +38,7 @@ static void iommu_dump_p2m_table(unsigned char key);
  *   no-snoop                   Disable VT-d Snoop Control
  *   no-qinval                  Disable VT-d Queued Invalidation
  *   no-intremap                Disable VT-d Interrupt Remapping
+ *   no-intpost                 Disable VT-d Interrupt posting
  */
 custom_param("iommu", parse_iommu_param);
 bool_t __initdata iommu_enable = 1;
@@ -105,6 +106,8 @@ static void __init parse_iommu_param(char *s)
             iommu_qinval = val;
         else if ( !strcmp(s, "intremap") )
             iommu_intremap = val;
+        else if ( !strcmp(s, "intpost") )
+            iommu_intpost = val;
         else if ( !strcmp(s, "debug") )
         {
             iommu_debug = val;
-- 
2.1.0
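`parse_iommu_param()` walks a comma-separated list in which a token may carry a `no-` prefix that turns the option off. The sketch below reproduces that pattern for just the `intremap` and `intpost` sub-options. It is a simplified, hypothetical stand-in (its own function and struct names, no other sub-options, `strncmp`-based prefix handling) and does not describe the rest of Xen's parser.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for iommu_intremap / iommu_intpost. */
struct iommu_opts {
    bool intremap;
    bool intpost;
};

/* Parse e.g. "intpost,no-intremap"; unrecognised tokens are ignored. */
static void model_parse_iommu_param(const char *s, struct iommu_opts *opts)
{
    while ( *s )
    {
        const char *end = strchr(s, ',');
        size_t len = end ? (size_t)(end - s) : strlen(s);
        bool val = true;

        /* A "no-" prefix negates the option that follows it. */
        if ( len > 3 && strncmp(s, "no-", 3) == 0 )
        {
            val = false;
            s += 3;
            len -= 3;
        }

        if ( len == strlen("intremap") && strncmp(s, "intremap", len) == 0 )
            opts->intremap = val;
        else if ( len == strlen("intpost") && strncmp(s, "intpost", len) == 0 )
            opts->intpost = val;

        if ( !end )
            break;
        s = end + 1;   /* skip past the comma to the next token */
    }
}
```

Parsing is only half of the story: since `iommu_setup()` clears `iommu_intpost` whenever interrupt remapping is off, `iommu=intpost,no-intremap` would still end up with posting disabled, matching the documented dependency.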
[Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts
This patch adds the following arch hooks to the scheduler:
- vmx_pre_ctx_switch_pi():
  Called before a context switch; it updates the posted-interrupt descriptor
  when the vCPU is preempted, goes to sleep, or is blocked.
- vmx_post_ctx_switch_pi():
  Called after a context switch; it updates the posted-interrupt descriptor
  when the vCPU is about to run.
- arch_vcpu_wake_prepare():
  Called when waking up the vCPU; it updates the posted-interrupt descriptor
  when the vCPU is unblocked.

CC: Keir Fraser <k...@xen.org>
CC: Jan Beulich <jbeul...@suse.com>
CC: Andrew Cooper <andrew.coop...@citrix.com>
CC: Kevin Tian <kevin.t...@intel.com>
CC: George Dunlap <george.dun...@eu.citrix.com>
CC: Dario Faggioli <dario.faggi...@citrix.com>
Suggested-by: Dario Faggioli <dario.faggi...@citrix.com>
Signed-off-by: Feng Wu <feng...@intel.com>
Reviewed-by: Dario Faggioli <dario.faggi...@citrix.com>
---
v6:
- Add two static inline functions for pi context switch
- Fix typos

v5:
- Rename arch_vcpu_wake to arch_vcpu_wake_prepare
- Make arch_vcpu_wake_prepare() inline for ARM
- Merge the ARM dummy hooks together
- Changes to some code comments
- Leave 'pi_ctxt_switch_from' and 'pi_ctxt_switch_to' NULL if PI is disabled or the vCPU is not an HVM vCPU
- Coding style

v4:
- Newly added

 xen/arch/x86/domain.c              |  19 +
 xen/arch/x86/hvm/vmx/vmx.c         | 147 +
 xen/common/schedule.c              |   2 +
 xen/include/asm-arm/domain.h       |   2 +
 xen/include/asm-x86/domain.h       |   3 +
 xen/include/asm-x86/hvm/hvm.h      |   2 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |   8 ++
 7 files changed, 183 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 045f6ff..443986e 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1573,6 +1573,22 @@ static void __context_switch(void)
     per_cpu(curr_vcpu, cpu) = n;
 }
 
+static inline void pi_ctxt_switch_from(struct vcpu *prev)
+{
+    /*
+     * When switching from non-idle to idle, we only do a lazy context switch.
+ * However, in order for posted interrupt (if available and enabled) to + * work properly, we at least need to update the descriptors. + */ +if ( prev-arch.pi_ctxt_switch_from !is_idle_vcpu(prev) ) +prev-arch.pi_ctxt_switch_from(prev); +} + +static inline void pi_ctxt_switch_to(struct vcpu *next) +{ +if ( next-arch.pi_ctxt_switch_to !is_idle_vcpu(next) ) +next-arch.pi_ctxt_switch_to(next); +} void context_switch(struct vcpu *prev, struct vcpu *next) { @@ -1605,9 +1621,12 @@ void context_switch(struct vcpu *prev, struct vcpu *next) set_current(next); +pi_ctxt_switch_from(prev); + if ( (per_cpu(curr_vcpu, cpu) == next) || (is_idle_domain(nextd) cpu_online(cpu)) ) { +pi_ctxt_switch_to(next); local_irq_enable(); } else diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 5167fae..889ede3 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -67,6 +67,8 @@ enum handler_return { HNDL_done, HNDL_unhandled, HNDL_exception_raised }; static void vmx_ctxt_switch_from(struct vcpu *v); static void vmx_ctxt_switch_to(struct vcpu *v); +static void vmx_pre_ctx_switch_pi(struct vcpu *v); +static void vmx_post_ctx_switch_pi(struct vcpu *v); static int vmx_alloc_vlapic_mapping(struct domain *d); static void vmx_free_vlapic_mapping(struct domain *d); @@ -117,10 +119,20 @@ static int vmx_vcpu_initialise(struct vcpu *v) INIT_LIST_HEAD(v-arch.hvm_vmx.pi_blocked_vcpu_list); INIT_LIST_HEAD(v-arch.hvm_vmx.pi_vcpu_on_set_list); +v-arch.hvm_vmx.pi_block_cpu = -1; + +spin_lock_init(v-arch.hvm_vmx.pi_lock); + v-arch.schedule_tail= vmx_do_resume; v-arch.ctxt_switch_from = vmx_ctxt_switch_from; v-arch.ctxt_switch_to = vmx_ctxt_switch_to; +if ( iommu_intpost is_hvm_vcpu(v) ) +{ +v-arch.pi_ctxt_switch_from = vmx_pre_ctx_switch_pi; +v-arch.pi_ctxt_switch_to = vmx_post_ctx_switch_pi; +} + if ( (rc = vmx_create_vmcs(v)) != 0 ) { dprintk(XENLOG_WARNING, @@ -718,6 +730,140 @@ static void vmx_fpu_leave(struct vcpu *v) } } +void arch_vcpu_wake_prepare(struct 
vcpu *v) +{ +unsigned long gflags; + +if ( !iommu_intpost || !is_hvm_vcpu(v) || !has_arch_pdevs(v-domain) ) +return; + +spin_lock_irqsave(v-arch.hvm_vmx.pi_lock, gflags); + +if ( likely(vcpu_runnable(v)) || + !test_bit(_VPF_blocked, v-pause_flags) ) +{ +struct pi_desc *pi_desc = v-arch.hvm_vmx.pi_desc; +unsigned long flags; + +/* + * We don't need to send notification event to a non-running + * vcpu, the interrupt information will be delivered to it before + * VM-ENTRY when the vcpu is scheduled to run next time. + */ +pi_set_sn(pi_desc); + +/* + * Set 'NV' field back to posted_intr_vector, so the + *
[Xen-devel] [PATCH v6 15/18] vmx: Properly handle notification event when vCPU is running
When a vCPU is running in root mode and a notification event is delivered
to it, we need to raise VCPU_KICK_SOFTIRQ on the current CPU, so that the
pending interrupts in PIRR are synced to vIRR in time, before the next
VM-Entry.

CC: Kevin Tian <kevin.t...@intel.com>
CC: Keir Fraser <k...@xen.org>
CC: Jan Beulich <jbeul...@suse.com>
CC: Andrew Cooper <andrew.coop...@citrix.com>
Signed-off-by: Feng Wu <feng...@intel.com>
Acked-by: Kevin Tian <kevin.t...@intel.com>
---
v6:
- Ack the interrupt at the beginning of pi_notification_interrupt()

v4:
- Coding style.

v3:
- Make pi_notification_interrupt() static

 xen/arch/x86/hvm/vmx/vmx.c | 48 +-
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 9cde9a4..5167fae 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2035,6 +2035,52 @@ static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
     this_cpu(irq_count)++;
 }
 
+/* Handle VT-d posted-interrupt when VCPU is running. */
+static void pi_notification_interrupt(struct cpu_user_regs *regs)
+{
+    ack_APIC_irq();
+
+    /*
+     * We get here when a vCPU is running in root-mode (such as via hypercall,
+     * or any other reasons which can result in VM-Exit), and before vCPU is
+     * back to non-root, external interrupts from an assigned device happen
+     * and a notification event is delivered to this logical CPU.
+     *
+     * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like
+     * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR will
+     * be synced to vIRR before VM-Exit in time.
+     *
+     * Please refer to the following code fragments from
+     * xen/arch/x86/hvm/vmx/entry.S:
+     *
+     *     .Lvmx_do_vmentry
+     *
+     *      ..
+     *      point 1
+     *
+     *      cmp  %ecx,(%rdx,%rax,1)
+     *      jnz  .Lvmx_process_softirqs
+     *
+     *      ..
+     *
+     *      je   .Lvmx_launch
+     *
+     *      ..
+     *
+     *     .Lvmx_process_softirqs:
+     *      sti
+     *      call do_softirq
+     *      jmp  .Lvmx_do_vmentry
+     *
+     * If VT-d engine issues a notification event at point 1 above, it cannot
+     * be delivered to the guest during this VM-entry without raising the
+     * softirq in this notification handler.
+     */
+    raise_softirq(VCPU_KICK_SOFTIRQ);
+
+    this_cpu(irq_count)++;
+}
+
 const struct hvm_function_table * __init start_vmx(void)
 {
     set_in_cr4(X86_CR4_VMXE);
@@ -2073,7 +2119,7 @@ const struct hvm_function_table * __init start_vmx(void)
 
     if ( cpu_has_vmx_posted_intr_processing )
     {
-        alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
+        alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
 
         if ( iommu_intpost )
             alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
-- 
2.1.0
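Raising `VCPU_KICK_SOFTIRQ` matters because interrupts posted in the descriptor's PIR only reach the guest once they are merged into the virtual APIC's IRR before VM-entry. A rough model of that merge step, in the spirit of `pi_get_pir()` (atomically exchange each 64-bit PIR group with zero, then OR it into the IRR), is sketched below. The array-based IRR, the `model_` names and the four-group layout are assumptions for illustration, not Xen's implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NR_VECTORS 256
#define PIR_GROUPS (NR_VECTORS / 64)

/* Hypothetical model of the PIR part of a posted-interrupt descriptor. */
struct pir_model {
    _Atomic uint64_t pir[PIR_GROUPS];
};

/* Like pi_get_pir(): read one group and clear it in a single atomic step. */
static uint64_t model_get_pir(struct pir_model *d, int group)
{
    return atomic_exchange(&d->pir[group], 0);
}

/*
 * Merge all pending posted vectors into a plain IRR bitmap.
 * Returns the number of PIR bits that were consumed.
 */
static int model_sync_pir_to_irr(struct pir_model *d, uint64_t irr[PIR_GROUPS])
{
    int consumed = 0;

    for ( int group = 0; group < PIR_GROUPS; group++ )
    {
        uint64_t pending = model_get_pir(d, group);

        irr[group] |= pending;
        while ( pending )          /* count set bits */
        {
            pending &= pending - 1;
            consumed++;
        }
    }

    return consumed;
}
```

The exchange-with-zero is the important design choice: a vector posted concurrently either lands before the exchange (and is merged now) or after it (and stays in PIR for the next sync), so nothing is dropped.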
[Xen-devel] [PATCH v6 14/18] vmx: posted-interrupt handling when vCPU is blocked
This patch includes the following aspects: - Add a global vector to wake up the blocked vCPU when an interrupt is being posted to it (This part was sugguested by Yang Zhang yang.z.zh...@intel.com). - Adds a new per-vCPU tasklet to wakeup the blocked vCPU. It can be used in the case vcpu_unblock cannot be called directly. - Define two per-cpu variables: * pi_blocked_vcpu: A list storing the vCPUs which were blocked on this pCPU. * pi_blocked_vcpu_lock: The spinlock to protect pi_blocked_vcpu. CC: Kevin Tian kevin.t...@intel.com CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com Signed-off-by: Feng Wu feng...@intel.com --- v6: - Fix some typos - Ack the interrupt right after the spin_unlock in pi_wakeup_interrupt() v4: - Use local variables in pi_wakeup_interrupt() - Remove vcpu from the blocked list when pi_desc.on==1, this - avoid kick vcpu multiple times. - Remove tasklet v3: - This patch is generated by merging the following three patches in v2: [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU [RFC v2 10/15] vmx: Define two per-cpu variables [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet' - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct' - rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler' - Make pi_wakeup_interrupt() static - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list' - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct' - Rename 'blocked_vcpu' to 'pi_blocked_vcpu' - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock' xen/arch/x86/hvm/vmx/vmcs.c| 3 ++ xen/arch/x86/hvm/vmx/vmx.c | 64 ++ xen/include/asm-x86/hvm/vmx/vmcs.h | 3 ++ xen/include/asm-x86/hvm/vmx/vmx.h | 5 +++ 4 files changed, 75 insertions(+) diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index 28c553f..2dabf16 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ 
b/xen/arch/x86/hvm/vmx/vmcs.c @@ -661,6 +661,9 @@ int vmx_cpu_up(void) if ( cpu_has_vmx_vpid ) vpid_sync_all(); +INIT_LIST_HEAD(per_cpu(pi_blocked_vcpu, cpu)); +spin_lock_init(per_cpu(pi_blocked_vcpu_lock, cpu)); + return 0; } diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 2c1c770..9cde9a4 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -83,7 +83,15 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content); static void vmx_invlpg_intercept(unsigned long vaddr); static int vmx_vmfunc_intercept(struct cpu_user_regs *regs); +/* + * We maintain a per-CPU linked-list of vCPU, so in PI wakeup handler we + * can find which vCPU should be woken up. + */ +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu); +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock); + uint8_t __read_mostly posted_intr_vector; +uint8_t __read_mostly pi_wakeup_vector; static int vmx_domain_initialise(struct domain *d) { @@ -106,6 +114,9 @@ static int vmx_vcpu_initialise(struct vcpu *v) spin_lock_init(v-arch.hvm_vmx.vmcs_lock); +INIT_LIST_HEAD(v-arch.hvm_vmx.pi_blocked_vcpu_list); +INIT_LIST_HEAD(v-arch.hvm_vmx.pi_vcpu_on_set_list); + v-arch.schedule_tail= vmx_do_resume; v-arch.ctxt_switch_from = vmx_ctxt_switch_from; v-arch.ctxt_switch_to = vmx_ctxt_switch_to; @@ -1976,6 +1987,54 @@ static struct hvm_function_table __initdata vmx_function_table = { .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc, }; +/* + * Handle VT-d posted-interrupt when VCPU is blocked. + */ +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) +{ +struct arch_vmx_struct *vmx, *tmp; +struct vcpu *v; +spinlock_t *lock = this_cpu(pi_blocked_vcpu_lock); +struct list_head *blocked_vcpus = this_cpu(pi_blocked_vcpu); +LIST_HEAD(list); + +spin_lock(lock); + +/* + * XXX: The length of the list depends on how many vCPU is current + * blocked on this specific pCPU. This may hurt the interrupt latency + * if the list grows to too many entries. 
+ */ +list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list) +{ +if ( pi_test_on(vmx-pi_desc) ) +{ +list_del_init(vmx-pi_blocked_vcpu_list); + +/* + * We cannot call vcpu_unblock here, since it also needs + * 'pi_blocked_vcpu_lock', we store the vCPUs with ON + * set in another list and unblock them after we release + * 'pi_blocked_vcpu_lock'. + */ +list_add_tail(vmx-pi_vcpu_on_set_list, list); +} +} + +spin_unlock(lock); + +ack_APIC_irq(); + +list_for_each_entry_safe(vmx, tmp, list, pi_vcpu_on_set_list) +{ +v = container_of(vmx, struct vcpu, arch.hvm_vmx); +
[Xen-devel] [PATCH v6 17/18] VT-d: Dump the posted format IRTE
Add the utility to dump the posted format IRTE. CC: Yang Zhang yang.z.zh...@intel.com CC: Kevin Tian kevin.t...@intel.com Signed-off-by: Feng Wu feng...@intel.com --- v6: - Fix a typo v4: - Newly added xen/drivers/passthrough/vtd/utils.c | 43 +++-- 1 file changed, 41 insertions(+), 2 deletions(-) diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c index 9d556da..0c7ce3f 100644 --- a/xen/drivers/passthrough/vtd/utils.c +++ b/xen/drivers/passthrough/vtd/utils.c @@ -203,6 +203,9 @@ static void dump_iommu_info(unsigned char key) ecap_intr_remap(iommu-ecap) ? : not , (status DMA_GSTS_IRES) ? and enabled : ); +printk( Interrupt Posting: %ssupported.\n, +cap_intr_post(iommu-ecap) ? : not ); + if ( status DMA_GSTS_IRES ) { /* Dump interrupt remapping table. */ @@ -213,6 +216,7 @@ static void dump_iommu_info(unsigned char key) printk( Interrupt remapping table (nr_entry=%#x. Only dump P=1 entries here):\n, nr_entry); +printk (Entries for remapped format:\n); printk( SVT SQ SID DST V AVL DLM TM RH DM FPD P\n); for ( i = 0; i nr_entry; i++ ) @@ -220,7 +224,7 @@ static void dump_iommu_info(unsigned char key) struct iremap_entry *p; if ( i % (1 IREMAP_ENTRY_ORDER) == 0 ) { -/* This entry across page boundry */ +/* This entry across page boundary. 
*/ if ( iremap_entries ) unmap_vtd_domain_page(iremap_entries); @@ -230,7 +234,7 @@ static void dump_iommu_info(unsigned char key) else p = iremap_entries[i % (1 IREMAP_ENTRY_ORDER)]; -if ( !p-remap.p ) +if ( !p-remap.p || p-remap.im ) continue; printk( %04x: %x %x %04x %08x %02x%x %x %x %x %x %x %x\n, i, @@ -239,6 +243,41 @@ static void dump_iommu_info(unsigned char key) p-remap.rh, p-remap.dm, p-remap.fpd, p-remap.p); print_cnt++; } + +if ( iremap_entries ) +unmap_vtd_domain_page(iremap_entries); + +iremap_entries = NULL; +printk (\nEntries for posted format:\n); +printk( SVT SQ SID PDA V URG AVL FPD P\n); +for ( i = 0; i nr_entry; i++ ) +{ +struct iremap_entry *p; +if ( i % (1 IREMAP_ENTRY_ORDER) == 0 ) +{ +/* This entry across page boundry */ +if ( iremap_entries ) +unmap_vtd_domain_page(iremap_entries); + +GET_IREMAP_ENTRY(iremap_maddr, i, + iremap_entries, p); +} +else +p = iremap_entries[i % (1 IREMAP_ENTRY_ORDER)]; + +if ( !p-post.p || !p-post.im ) +continue; + +printk( %04x: %x %x %04x %16lx %02x%x %x %x %x\n, +i, +p-post.svt, p-post.sq, p-post.sid, +((u64)p-post.pda_h 32) | (p-post.pda_l 6), +p-post.vector, p-post.urg, p-post.avail, p-post.fpd, +p-post.p); + +print_cnt++; +} + if ( iremap_entries ) unmap_vtd_domain_page(iremap_entries); if ( iommu_ir_ctrl(iommu)-iremap_num != print_cnt ) -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
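The posted-format dump prints the descriptor address by reassembling it from two IRTE fields, `((u64)p->post.pda_h << 32) | (p->post.pda_l << 6)`: the high 32 bits, plus the low part stored without its 6 always-zero bits, since the descriptor is 64-byte aligned. A small sketch of the split and the reassembly follows; the helper names and the plain `uint32_t` fields standing in for the `iremap_entry` bit-fields are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the post.pda_l / post.pda_h IRTE fields. */
struct pda_fields {
    uint32_t pda_l;   /* descriptor address bits 31:6 */
    uint32_t pda_h;   /* descriptor address bits 63:32 */
};

static struct pda_fields model_split_pda(uint64_t pda)
{
    struct pda_fields f;

    /* The descriptor is 64-byte aligned, so bits 5:0 carry no information. */
    f.pda_l = (uint32_t)((pda & UINT64_C(0xffffffff)) >> 6);
    f.pda_h = (uint32_t)(pda >> 32);
    return f;
}

/* Same reassembly as the dump, but widening pda_l before the shift. */
static uint64_t model_join_pda(struct pda_fields f)
{
    return ((uint64_t)f.pda_h << 32) | ((uint64_t)f.pda_l << 6);
}
```

The sketch widens `pda_l` to 64 bits before shifting. If `pda_l` is a 26-bit `u32` bit-field, as its bits 31:6 role suggests, C's integer promotions turn it into a signed `int`, so shifting it left by 6 without a cast overflows whenever address bit 31 is set.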
[Xen-devel] [PATCH v6 12/18] x86: move some APIC related macros to apicdef.h
Move some APIC related macros to apicdef.h, so they can be used outside of vlapic.c.

Signed-off-by: Feng Wu <feng...@intel.com>
---
v6:
- Newly introduced.

 xen/arch/x86/hvm/vlapic.c     | 5 -----
 xen/include/asm-x86/apicdef.h | 4 ++++
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index b893b40..9b7c871 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -65,11 +65,6 @@ static const unsigned int vlapic_lvt_mask[VLAPIC_LVT_NUM] =
      LVT_MASK
 };
 
-/* Following could belong in apicdef.h */
-#define APIC_SHORT_MASK                  0xc0000
-#define APIC_DEST_NOSHORT                0x0
-#define APIC_DEST_MASK                   0x800
-
 #define vlapic_lvt_vector(vlapic, lvt_type)                     \
     (vlapic_get_reg(vlapic, lvt_type) & APIC_VECTOR_MASK)
 
diff --git a/xen/include/asm-x86/apicdef.h b/xen/include/asm-x86/apicdef.h
index 6069fce..6d1fd94 100644
--- a/xen/include/asm-x86/apicdef.h
+++ b/xen/include/asm-x86/apicdef.h
@@ -124,6 +124,10 @@
 
 #define MAX_IO_APICS 128
 
+#define APIC_SHORT_MASK                  0xc0000
+#define APIC_DEST_NOSHORT                0x0
+#define APIC_DEST_MASK                   0x800
+
 /*
  * the local APIC register structure, memory mapped. Not terribly well
  * tested, but we might eventually use this one in the future - the
-- 
2.1.0
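Once these masks live in apicdef.h, any file can apply them to an ICR-style low word. Assuming the standard local-APIC ICR layout (destination shorthand in bits 19:18, i.e. mask `0xc0000`; destination mode in bit 11, i.e. mask `0x800`), a check of the kind an interrupt-routing path might make, namely whether a request is a plain, non-shorthand, physically addressed interrupt, could look like this. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ICR low-word fields (standard local-APIC layout). */
#define APIC_SHORT_MASK    0xc0000  /* bits 19:18: destination shorthand */
#define APIC_DEST_NOSHORT  0x0      /* no shorthand */
#define APIC_DEST_MASK     0x800    /* bit 11: 1 = logical, 0 = physical */
#define APIC_VECTOR_MASK   0xff     /* bits 7:0: vector */

/* Hypothetical helper: true for a non-shorthand, physical-mode request. */
static bool model_is_physical_noshort(uint32_t icr_low)
{
    return (icr_low & APIC_SHORT_MASK) == APIC_DEST_NOSHORT &&
           !(icr_low & APIC_DEST_MASK);
}

static uint8_t model_icr_vector(uint32_t icr_low)
{
    return icr_low & APIC_VECTOR_MASK;
}
```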
[Xen-devel] [PATCH v6 09/18] VT-d: Remove pointless casts
Remove pointless casts. Signed-off-by: Feng Wu feng...@intel.com Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- v5: - Newly added. xen/drivers/passthrough/vtd/utils.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c index 44c4ef5..162b764 100644 --- a/xen/drivers/passthrough/vtd/utils.c +++ b/xen/drivers/passthrough/vtd/utils.c @@ -234,10 +234,9 @@ static void dump_iommu_info(unsigned char key) continue; printk( %04x: %x %x %04x %08x %02x%x %x %x %x %x %x %x\n, i, -(u32)p-hi.svt, (u32)p-hi.sq, (u32)p-hi.sid, -(u32)p-lo.dst, (u32)p-lo.vector, (u32)p-lo.avail, -(u32)p-lo.dlm, (u32)p-lo.tm, (u32)p-lo.rh, -(u32)p-lo.dm, (u32)p-lo.fpd, (u32)p-lo.p); +p-hi.svt, p-hi.sq, p-hi.sid, p-lo.dst, p-lo.vector, +p-lo.avail, p-lo.dlm, p-lo.tm, p-lo.rh, p-lo.dm, +p-lo.fpd, p-lo.p); print_cnt++; } if ( iremap_entries ) @@ -281,11 +280,10 @@ static void dump_iommu_info(unsigned char key) printk( %02x: %04x %x%x %x %x %x%x %x %02x\n, i, -(u32)remap-index_0_14 | ((u32)remap-index_15 15), -(u32)remap-format, (u32)remap-mask, (u32)remap-trigger, -(u32)remap-irr, (u32)remap-polarity, -(u32)remap-delivery_status, (u32)remap-delivery_mode, -(u32)remap-vector); +remap-index_0_14 | ((u32)remap-index_15 15), +remap-format, remap-mask, remap-trigger, remap-irr, +remap-polarity, remap-delivery_status, remap-delivery_mode, +remap-vector); } } } -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v6 10/18] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
Extend struct iremap_entry according to the VT-d Posted-Interrupts Spec.

CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Feng Wu feng...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v4:
- res_4 is not a bitfield, correct it.
- Expose 'im' to remapped irte as well.

v3:
- Use u32 instead of u64 to define the bitfields in 'struct iremap_entry'
- Limit using bitfield if possible

 xen/drivers/passthrough/vtd/intremap.c | 92 +-
 xen/drivers/passthrough/vtd/iommu.h    | 43 ++--
 xen/drivers/passthrough/vtd/utils.c    |  8 +--
 3 files changed, 80 insertions(+), 63 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index 987bbe9..e9fffa6 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -122,9 +122,9 @@ static u16 hpetid_to_bdf(unsigned int hpet_id)
 static void set_ire_sid(struct iremap_entry *ire,
                         unsigned int svt, unsigned int sq, unsigned int sid)
 {
-    ire->hi.svt = svt;
-    ire->hi.sq = sq;
-    ire->hi.sid = sid;
+    ire->remap.svt = svt;
+    ire->remap.sq = sq;
+    ire->remap.sid = sid;
 }
 
 static void set_ioapic_source_id(int apic_id, struct iremap_entry *ire)
@@ -219,7 +219,7 @@ static unsigned int alloc_remap_entry(struct iommu *iommu, unsigned int nr)
         else
             p = &iremap_entries[i % (1 << IREMAP_ENTRY_ORDER)];
 
-        if ( p->lo_val || p->hi_val ) /* not a free entry */
+        if ( p->lo || p->hi ) /* not a free entry */
             found = 0;
         else if ( ++found == nr )
             break;
@@ -253,7 +253,7 @@ static int remap_entry_to_ioapic_rte(
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    if ( iremap_entry->hi_val == 0 && iremap_entry->lo_val == 0 )
+    if ( iremap_entry->hi == 0 && iremap_entry->lo == 0 )
     {
         dprintk(XENLOG_ERR VTDPREFIX,
                 "%s: index (%d) get an empty entry!\n",
@@ -263,13 +263,13 @@ static int remap_entry_to_ioapic_rte(
         return -EFAULT;
     }
 
-    old_rte->vector = iremap_entry->lo.vector;
-    old_rte->delivery_mode = iremap_entry->lo.dlm;
-    old_rte->dest_mode = iremap_entry->lo.dm;
-    old_rte->trigger = iremap_entry->lo.tm;
+    old_rte->vector = iremap_entry->remap.vector;
+    old_rte->delivery_mode = iremap_entry->remap.dlm;
+    old_rte->dest_mode = iremap_entry->remap.dm;
+    old_rte->trigger = iremap_entry->remap.tm;
     old_rte->__reserved_2 = 0;
     old_rte->dest.logical.__reserved_1 = 0;
-    old_rte->dest.logical.logical_dest = iremap_entry->lo.dst >> 8;
+    old_rte->dest.logical.logical_dest = iremap_entry->remap.dst >> 8;
 
     unmap_vtd_domain_page(iremap_entries);
     spin_unlock_irqrestore(&ir_ctrl->iremap_lock, flags);
@@ -317,27 +317,28 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
     if ( rte_upper )
     {
         if ( x2apic_enabled )
-            new_ire.lo.dst = value;
+            new_ire.remap.dst = value;
         else
-            new_ire.lo.dst = (value >> 24) << 8;
+            new_ire.remap.dst = (value >> 24) << 8;
     }
     else
     {
         *(((u32 *)&new_rte) + 0) = value;
-        new_ire.lo.fpd = 0;
-        new_ire.lo.dm = new_rte.dest_mode;
-        new_ire.lo.tm = new_rte.trigger;
-        new_ire.lo.dlm = new_rte.delivery_mode;
+        new_ire.remap.fpd = 0;
+        new_ire.remap.dm = new_rte.dest_mode;
+        new_ire.remap.tm = new_rte.trigger;
+        new_ire.remap.dlm = new_rte.delivery_mode;
         /* Hardware require RH = 1 for LPR delivery mode */
-        new_ire.lo.rh = (new_ire.lo.dlm == dest_LowestPrio);
-        new_ire.lo.avail = 0;
-        new_ire.lo.res_1 = 0;
-        new_ire.lo.vector = new_rte.vector;
-        new_ire.lo.res_2 = 0;
+        new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
+        new_ire.remap.avail = 0;
+        new_ire.remap.res_1 = 0;
+        new_ire.remap.vector = new_rte.vector;
+        new_ire.remap.res_2 = 0;
 
         set_ioapic_source_id(IO_APIC_ID(apic), &new_ire);
-        new_ire.hi.res_1 = 0;
-        new_ire.lo.p = 1;     /* finally, set present bit */
+        new_ire.remap.res_3 = 0;
+        new_ire.remap.res_4 = 0;
+        new_ire.remap.p = 1;     /* finally, set present bit */
 
         /* now construct new ioapic rte entry */
         remap_rte->vector = new_rte.vector;
@@ -510,7 +511,7 @@ static int remap_entry_to_msi_msg(
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    if ( iremap_entry->hi_val == 0 && iremap_entry->lo_val == 0 )
+    if ( iremap_entry->hi == 0 && iremap_entry->lo == 0 )
     {
         dprintk(XENLOG_ERR VTDPREFIX,
                 "%s: index (%d) get an empty entry!\n",
@@ -523,25 +524,25 @@ static int remap_entry_to_msi_msg(
     msg->address_hi = MSI_ADDR_BASE_HI;
     msg->address_lo =
         MSI_ADDR_BASE_LO |
         ((iremap_entry->lo.dm == 0) ?
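The patch drops the separate lo/hi bit-field views in favour of a single remap view, while keeping lo and hi as raw 64-bit words for the free-entry tests (hi == 0 && lo == 0). A minimal sketch of that kind of overlay follows. It is not the iommu.h hunk, which is not quoted here: the type irte_sketch and the helper irte_is_free are hypothetical, the bit positions are our reading of the remapped-format IRTE in the VT-d specification, and the layout assumes GCC or Clang on little-endian x86-64, because bit-field allocation order is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical remapped-format IRTE overlay, reusing the field names seen in
 * the patch. Assumes GCC/Clang on little-endian x86-64, where bit-fields are
 * allocated starting from the least significant bit.
 */
struct irte_sketch {
    union {
        struct {
            uint64_t lo, hi;          /* raw 128-bit entry as two qwords */
        };
        struct {
            uint16_t p     : 1,       /* bit 0: present */
                     fpd   : 1,       /* bit 1: fault processing disable */
                     dm    : 1,       /* bit 2: destination mode */
                     rh    : 1,       /* bit 3: redirection hint */
                     tm    : 1,       /* bit 4: trigger mode */
                     dlm   : 3,       /* bits 7:5: delivery mode */
                     avail : 4,       /* bits 11:8: available to software */
                     res_1 : 3,       /* bits 14:12: reserved */
                     im    : 1;       /* bit 15: IRTE mode (0 = remapped) */
            uint8_t  vector;          /* bits 23:16 */
            uint8_t  res_2;           /* bits 31:24: reserved */
            uint32_t dst;             /* bits 63:32: destination ID */
            uint16_t sid;             /* bits 79:64: source ID */
            uint16_t sq    : 2,       /* bits 81:80: source-id qualifier */
                     svt   : 2,       /* bits 83:82: source validation type */
                     res_3 : 12;      /* bits 95:84: reserved */
            uint32_t res_4;           /* bits 127:96: reserved, plain u32 */
        } remap;
    };
};

_Static_assert(sizeof(struct irte_sketch) == 16, "IRTE must be 128 bits");

/* Same emptiness rule as the patch: an all-zero entry is free. */
static int irte_is_free(const struct irte_sketch *e)
{
    return e->lo == 0 && e->hi == 0;
}
```

On such a target, setting remap.p = 1, remap.vector = 0x30 and remap.dst = 0x100 on a zeroed entry should give lo == 0x0000010000300001. The _Static_assert only checks the total size, so a layout check of that kind is worth keeping next to any real definition.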
[Xen-devel] [PATCH v6 08/18] vmx: Suppress posting interrupts when 'SN' is set
Currently we don't support urgent interrupts: all interrupts are treated as non-urgent, so we cannot send a posted interrupt when 'SN' is set.

CC: Kevin Tian kevin.t...@intel.com
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
Signed-off-by: Feng Wu feng...@intel.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
v6:
- Add some comments

v5:
- Keep the vcpu_kick() at the end of vmx_deliver_posted_intr()
- Keep the 'return' after calling __vmx_deliver_posted_interrupt()

v4:
- Coding style.
- V3 incorrectly removed a vcpu_kick() from the eoi_exitmap_changed path; fix it.

v3:
- Use cmpxchg to test SN/ON and set ON

 xen/arch/x86/hvm/vmx/vmx.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c32d863..2c1c770 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1701,8 +1701,36 @@ static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
          */
         pi_set_on(&v->arch.hvm_vmx.pi_desc);
     }
-    else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
+    else
     {
+        struct pi_desc old, new, prev;
+
+        /* To skip over first check in the loop below. */
+        prev.control = 0;
+
+        do {
+            /*
+             * Currently, we don't support urgent interrupt, all
+             * interrupts are recognized as non-urgent interrupt,
+             * so we cannot send posted-interrupt when 'SN' is set.
+             * Besides that, if 'ON' is already set, we cannot set
+             * posted-interrupts as well.
+             */
+            if ( pi_test_sn(&prev) || pi_test_on(&prev) )
+            {
+                vcpu_kick(v);
+                return;
+            }
+
+            old.control = v->arch.hvm_vmx.pi_desc.control &
+                          ~( 1 << POSTED_INTR_ON | 1 << POSTED_INTR_SN );
+            new.control = v->arch.hvm_vmx.pi_desc.control |
+                          1 << POSTED_INTR_ON;
+
+            prev.control = cmpxchg(&v->arch.hvm_vmx.pi_desc.control,
+                                   old.control, new.control);
+        } while ( prev.control != old.control );
+
         __vmx_deliver_posted_interrupt(v);
         return;
     }
-- 
2.1.0
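The cmpxchg loop in this patch sets ON only when neither SN nor ON is already set, and otherwise falls back to vcpu_kick(). The same rule can be sketched with C11 atomics. This is a stand-alone illustration, not Xen code: try_set_on, SKETCH_ON and SKETCH_SN are hypothetical names, and the bit positions (0 for ON, 1 for SN) are assumed for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen for illustration only. */
#define SKETCH_ON 0  /* outstanding notification */
#define SKETCH_SN 1  /* suppress notification */

/*
 * Atomically set ON, but only if neither SN nor ON is already set.
 * Returns true if this caller set ON (and so should send the
 * notification), false if it must fall back, e.g. to a vCPU kick.
 */
static bool try_set_on(_Atomic uint64_t *control)
{
    uint64_t old = atomic_load(control);

    do {
        /* SN or ON already set: posting is not allowed. */
        if (old & ((UINT64_C(1) << SKETCH_ON) | (UINT64_C(1) << SKETCH_SN)))
            return false;
        /* On failure, compare_exchange reloads 'old' with the current value. */
    } while (!atomic_compare_exchange_weak(control, &old,
                                           old | (UINT64_C(1) << SKETCH_ON)));
    return true;
}
```

Because atomic_compare_exchange_weak writes the current value back into old when it fails, the retry loop needs no separate prev variable and no dummy initial value; the patch gets the same effect by starting with prev.control = 0.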
Re: [Xen-devel] [PATCH for-4.6 1/2] docs: Template for feature documents
On 08/25/2015 12:52 AM, Andrew Cooper wrote:
On 24/08/2015 20:27, Juergen Gross wrote:
On 08/24/2015 07:37 PM, Andrew Cooper wrote:
Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
---
 docs/Makefile                 |  2 +-
 docs/features/template.pandoc | 55 +
 2 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100644 docs/features/template.pandoc

diff --git a/docs/Makefile b/docs/Makefile
index 272292c..5d620e5 100644
--- a/docs/Makefile
+++ b/docs/Makefile
@@ -16,7 +16,7 @@ MARKDOWNSRC-y := $(sort $(shell find misc -name '*.markdown' -print))
 
 TXTSRC-y := $(sort $(shell find misc -name '*.txt' -print))
 
-PANDOCSRC-y := $(sort $(shell find specs -name '*.pandoc' -print))
+PANDOCSRC-y := $(sort $(shell find features/ misc/ specs/ -name '*.pandoc' -print))
 
 # Documentation targets
 DOC_MAN1 := $(patsubst man/%.pod.1,man1/%.1,$(MAN1SRC-y))
diff --git a/docs/features/template.pandoc b/docs/features/template.pandoc
new file mode 100644
index 000..d883b82
--- /dev/null
+++ b/docs/features/template.pandoc
@@ -0,0 +1,55 @@
+% Template for feature documents
+
+\clearpage
+
+This is a suggested template for formatting of a Xen feature document in tree.
+
+The purpose of this document is to provide a concrete support statement for the
+feature (indicating its security status), as well as brief user and technical
+documentation.
+
+# Basics
+
+A table with an overview of the support status and applicability.
+
+
+         Status: e.g. **Supported**/**Tech Preview**/**Experimental**
+
+Architecture(s): e.g. x86, arm
+
+   Component(s): e.g. Hypervisor, toolstack, guest
+
+       Hardware: _where applicable_
+

What about adding some information about when the feature was introduced, or some other historical stuff? Something like:

Experimental in Xen 4.1
Supported in Xen 4.3
xl syntax changed in Xen 4.4

In the long term, I would expect that information to be visible via `git log`. Having said that, it probably is useful to have a summary of history available in the written document.
How about a #History section at the bottom? That can at least include "document written" as a starting point, and subsequent major changes in short form.

Sure, I'm fine with this.

Juergen
Re: [Xen-devel] [PATCH v3] xen/blkfront: convert to blk-mq APIs
Hi Rafal,

Please try adding --iodepth_batch=32 --iodepth_batch_complete=32 to the fio command line. I don't see this issue any more, in domU either.

Thanks,
-Bob

On 08/21/2015 04:46 PM, Rafal Mielniczuk wrote:
On 19/08/15 12:12, Bob Liu wrote:
Hi Jens & Christoph,

Rafal reported an issue with this patch: after applying it, no more merges happen and performance drops when null_blk is loaded with "modprobe null_blk irqmode=2 completion_nsec=100", but it works fine with a plain "modprobe null_blk". I'm not sure whether this is expected or not. Do you have any suggestions? Thank you!

Here is the test result:

fio --name=test --ioengine=libaio --rw=read --numjobs=8 --iodepth=32 \
    --time_based=1 --runtime=30 --bs=4KB --filename=/dev/xvdb \
    --direct=1 --group_reporting=1 --iodepth_batch=16

modprobe null_blk

*no patch* (avgrq-sz = 8.00 avgqu-sz=5.00)
READ: io=10655MB, aggrb=363694KB/s, minb=363694KB/s, maxb=363694KB/s, mint=30001msec, maxt=30001msec
Disk stats (read/write):
  xvdb: ios=2715852/0, merge=1089/0, ticks=126572/0, in_queue=127456, util=100.00%

*with patch* (avgrq-sz = 8.00 avgqu-sz=8.00)
READ: io=20655MB, aggrb=705010KB/s, minb=705010KB/s, maxb=705010KB/s, mint=30001msec, maxt=30001msec
Disk stats (read/write):
  xvdb: ios=5274633/0, merge=22/0, ticks=243208/0, in_queue=242908, util=99.98%

modprobe null_blk irqmode=2 completion_nsec=100

*no patch* (avgrq-sz = 34.00 avgqu-sz=38.00)
READ: io=10372MB, aggrb=354008KB/s, minb=354008KB/s, maxb=354008KB/s, mint=30003msec, maxt=30003msec
Disk stats (read/write):
  xvdb: ios=621760/0, *merge=1988170/0*, ticks=1136700/0, in_queue=1146020, util=99.76%

*with patch* (avgrq-sz = 8.00 avgqu-sz=28.00)
READ: io=2876.8MB, aggrb=98187KB/s, minb=98187KB/s, maxb=98187KB/s, mint=30002msec, maxt=30002msec
Disk stats (read/write):
  xvdb: ios=734048/0, merge=0/0, ticks=843584/0, in_queue=843080, util=99.72%

Regards,
-Bob

Hello,

We also saw the lack of merges when we tested on a null_blk device directly in dom0. When we enabled the multi-queue block layer we got no merges, even when we set the number of submission queues to 1. Unless I'm missing something, that would suggest the problem lies somewhere in the blk-mq layer itself?

Please take a look at the results below:

fio --name=test --ioengine=libaio --rw=read --numjobs=8 --iodepth=32 \
    --time_based=1 --runtime=30 --bs=4KB --filename=/dev/nullb0 \
    --direct=1 --group_reporting=1

modprobe null_blk irqmode=2 completion_nsec=100 queue_mode=1 submit_queues=1
READ: io=13692MB, aggrb=467320KB/s, minb=467320KB/s, maxb=467320KB/s, mint=30002msec, maxt=30002msec
Disk stats (read/write):
  nullb0: ios=991026/0, merge=2499524/0, ticks=1846952/0, in_queue=900012, util=100.00%

modprobe null_blk irqmode=2 completion_nsec=100 queue_mode=2 submit_queues=1
READ: io=6839.1MB, aggrb=233452KB/s, minb=233452KB/s, maxb=233452KB/s, mint=30002msec, maxt=30002msec
Disk stats (read/write):
  nullb0: ios=1743967/0, merge=0/0, ticks=1712900/0, in_queue=1839072, util=100.00%

Thanks,
Rafal

On 07/13/2015 05:55 PM, Bob Liu wrote:
Note: This patch is based on original work of Arianna's internship for GNOME's Outreach Program for Women.

Only one hardware queue is used now, so there is no performance change.

The legacy non-mq code is deleted completely, which is the same as other drivers like virtio, mtip, and nvme.

Also dropped one unnecessary holding of info->io_lock when calling blk_mq_stop_hw_queues().

Changes in v2:
- Reorganized blk_mq_queue_rq()
- Restored most io_locks in place

Change in v3:
- Rename blk_mq_queue_rq to blkif_queue_rq

Signed-off-by: Arianna Avanzini avanzini.aria...@gmail.com
Signed-off-by: Bob Liu
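The disk stats make the missing merges visible with a little arithmetic. Assuming every bio is 4KB (bs=4KB, i.e. 8 sectors of 512 bytes) and that each merged bio ends up inside one of the "ios" requests, the average request size is roughly (ios + merge) / ios times 8 sectors. The helper name below is hypothetical, and this is only a back-of-the-envelope cross-check, not how iostat computes avgrq-sz.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough average request size in sectors, from fio's "Disk stats" counters.
 * Assumes all bios have the same size and that every merged bio was folded
 * into one of the 'ios' requests that reached the device.
 */
static double avg_request_sectors(uint64_t ios, uint64_t merges,
                                  unsigned sectors_per_bio)
{
    /* Bios per dispatched request, scaled by the size of one bio. */
    return (double)(ios + merges) / (double)ios * sectors_per_bio;
}
```

For the unpatched irqmode=2 run (ios=621760, merge=1988170) this gives about 33.6 sectors, close to the reported avgrq-sz of 34.00; in the patched run merge=0, so it is exactly 8 sectors, matching avgrq-sz = 8.00.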
Re: [Xen-devel] [PATCH for-4.6 1/2] docs: Template for feature documents
On 24/08/2015 20:27, Juergen Gross wrote:
On 08/24/2015 07:37 PM, Andrew Cooper wrote:
Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
---
 docs/Makefile                 |  2 +-
 docs/features/template.pandoc | 55 +
 2 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100644 docs/features/template.pandoc

diff --git a/docs/Makefile b/docs/Makefile
index 272292c..5d620e5 100644
--- a/docs/Makefile
+++ b/docs/Makefile
@@ -16,7 +16,7 @@ MARKDOWNSRC-y := $(sort $(shell find misc -name '*.markdown' -print))
 
 TXTSRC-y := $(sort $(shell find misc -name '*.txt' -print))
 
-PANDOCSRC-y := $(sort $(shell find specs -name '*.pandoc' -print))
+PANDOCSRC-y := $(sort $(shell find features/ misc/ specs/ -name '*.pandoc' -print))
 
 # Documentation targets
 DOC_MAN1 := $(patsubst man/%.pod.1,man1/%.1,$(MAN1SRC-y))
diff --git a/docs/features/template.pandoc b/docs/features/template.pandoc
new file mode 100644
index 000..d883b82
--- /dev/null
+++ b/docs/features/template.pandoc
@@ -0,0 +1,55 @@
+% Template for feature documents
+
+\clearpage
+
+This is a suggested template for formatting of a Xen feature document in tree.
+
+The purpose of this document is to provide a concrete support statement for the
+feature (indicating its security status), as well as brief user and technical
+documentation.
+
+# Basics
+
+A table with an overview of the support status and applicability.
+
+
+         Status: e.g. **Supported**/**Tech Preview**/**Experimental**
+
+Architecture(s): e.g. x86, arm
+
+   Component(s): e.g. Hypervisor, toolstack, guest
+
+       Hardware: _where applicable_
+

What about adding some information about when the feature was introduced, or some other historical stuff? Something like:

Experimental in Xen 4.1
Supported in Xen 4.3
xl syntax changed in Xen 4.4

In the long term, I would expect that information to be visible via `git log`. Having said that, it probably is useful to have a summary of history available in the written document.

How about a #History section at the bottom?
That can at least include "document written" as a starting point, and subsequent major changes in short form.

~Andrew
Re: [Xen-devel] Question: Redirect guest kernel's message via serial port to a file on dom0
Hi Andrew,

Thank you so much for your suggestion! I tried it, but have some questions.

2015-08-24 4:10 GMT-04:00 Andrew Cooper andrew.coop...@citrix.com:
On 24/08/2015 04:01, Meng Xu wrote:
Hi,

I'm trying to use a PV guest VM on Xen to help debug Linux. I was using VirtualBox to help debug the Linux kernel by redirecting the output of the VM's serial port to a file on the host; that works in VirtualBox.

[Why do I want to achieve this?]
It is much faster to reboot a VM than to reboot the physical machine.
I don't need another machine physically connected to the serial port of the development machine.
I want to use Xen for as many things as possible. ;-)

I tried to google a tutorial or manual about how to configure it, but didn't find any. :-(

In my understanding, I need to do the following things:
1) Add a line (something like serial=) in the guest's configuration file to specify the serial port device for the VM;
2) Add some configuration to redirect the output of the serial device to a file in dom0;
3) Configure the kernel command line in the VM to dump kernel messages via the VM's serial port. (I know how to do this step.)

Has anyone tried this before and have some configuration I can refer to? Or could anyone give me some references that describe how to configure the above three steps? I really appreciate any help, suggestion or comment.

Configure xenconsoled to log guest consoles to file (--log=guest), at which point anything sent to hvc0 will be logged to files in /var/log/xen/guest/console (configurable with --log-dir=).

I set XENCONSOLED_TRACE=guest in /etc/default/xencommons, because /etc/init.d/xencommons has:

test -z "$XENCONSOLED_TRACE" || XENCONSOLED_ARGS=" --log=$XENCONSOLED_TRACE"

So I think this is what you mean by --log=guest. After setting this variable and restarting xencommons with "service xencommons restart" on dom0 (Ubuntu 12.04 LTS), I still couldn't find /var/log/xen/guest/console after rebooting the VM. In fact, "find /var -name console" returns no result.

**My question is:** Is there anything else I need to configure to get /var/log/xen/guest/console? I don't see the folder under /var/log/xen. :-(

BTW, I also tried "find /etc | grep -i log-dir", but found no file with the keyword log-dir.

There is usually XENCONSOLED_ARGS= in a configuration file somewhere in /etc.

Yes, I think I found it in /etc/init.d/xencommons.

---I attached the config file for the VM just in case it is helpful:---
name = "vm1"
memory = 8192
disk = ['file:/images/vm1.img,xvda,w']
vif = ['bridge=xenbr0']
extra = "debian-installer/exit/always_halt=true --console=hvc0"
bootloader = "pygrub"

Thank you very much for your help!

Best regards,

Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
[Xen-devel] [ovmf test] 60835: all pass - PUSHED
flight 60835 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/60835/

Perfect :-)
All tests in this flight passed
version targeted for testing:
 ovmf                 b199d9418820b873d0e05190fe5dc947a6f72b14
baseline version:
 ovmf                 70bd69912ad2fb6e99271b418f87b98ebb36e0d8

Last test of basis    60759  2015-08-17 16:11:04 Z    7 days
Testing same since    60835  2015-08-22 21:41:56 Z    2 days    1 attempts

People who touched revisions under test:
  Bob Feng bob.c.f...@intel.com
  Samer El-Haj-Mahmoud samer.el-haj-mahm...@hp.com
  Yao, Jiewen jiewen@intel.com
  Ard Biesheuvel ard.biesheu...@linaro.org
  Bob Feng bob.c.f...@intel.com
  Chao Zhang chao.b.zh...@intel.com
  Dandan Bi dandan...@intel.com
  Eric Dong eric.d...@intel.com
  Feng Tian feng.t...@intel.com
  Gary Ching-Pang Lin g...@suse.com
  Jiaxin Wu jiaxin...@intel.com
  Leif Lindholm leif.lindh...@linaro.org
  Liming Gao liming@intel.com
  Qin Long qin.l...@intel.com
  Ruiyu Ni ruiyu...@intel.com
  Samer El-Haj-Mahmoud samer.el-haj-mahm...@hp.com
  Scott Duplichan sc...@notabs.org
  Yao, Jiewen jiewen@intel.com

jobs:
 build-amd64-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64                                                  pass
 build-i386                                                   pass
 build-amd64-libvirt                                          pass
 build-i386-libvirt                                           pass
 build-amd64-pvops                                            pass
 build-i386-pvops                                             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass
 test-amd64-i386-xl-qemuu-ovmf-amd64                          pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

+ branch=ovmf + revision=b199d9418820b873d0e05190fe5dc947a6f72b14 + . cri-lock-repos ++ . cri-common +++ .
cri-getconfig +++ umask 002 +++ getrepos getconfig Repos perl -e ' use Osstest; readglobalconfig(); print $c{Repos} or die $!; ' +++ local repos=/home/osstest/repos +++ '[' -z /home/osstest/repos ']' +++ '[' '!' -d /home/osstest/repos ']' +++ echo /home/osstest/repos ++ repos=/home/osstest/repos ++ repos_lock=/home/osstest/repos/lock ++ '[' x '!=' x/home/osstest/repos/lock ']' ++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock ++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push ovmf b199d9418820b873d0e05190fe5dc947a6f72b14 + branch=ovmf + revision=b199d9418820b873d0e05190fe5dc947a6f72b14 + . cri-lock-repos ++ . cri-common +++ . cri-getconfig +++ umask 002 +++ getrepos getconfig Repos perl -e ' use Osstest; readglobalconfig(); print $c{Repos} or die $!; ' +++ local repos=/home/osstest/repos +++ '[' -z /home/osstest/repos ']' +++ '[' '!' -d /home/osstest/repos ']' +++ echo /home/osstest/repos ++ repos=/home/osstest/repos ++ repos_lock=/home/osstest/repos/lock ++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']' + . cri-common ++ . 
cri-getconfig ++ umask 002 + select_xenbranch + case $branch in + tree=ovmf + xenbranch=xen-unstable + '[' xovmf = xlinux ']' + linuxbranch= + '[' x = x ']' + qemuubranch=qemu-upstream-unstable + select_prevxenbranch + local b + local p ++ ./mg-list-all-branches + for b in '$(./mg-list-all-branches)' + case $b in + for b in '$(./mg-list-all-branches)' + case $b in + for b in '$(./mg-list-all-branches)' + case $b in + for b in '$(./mg-list-all-branches)' + case $b in + for b in '$(./mg-list-all-branches)' + case $b in + for b in '$(./mg-list-all-branches)' + case $b in + for b in '$(./mg-list-all-branches)' + case $b in + for b in '$(./mg-list-all-branches)' + case $b in + for b in '$(./mg-list-all-branches)' + case $b in + for b in '$(./mg-list-all-branches)' + case $b in + for b in '$(./mg-list-all-branches)' + case $b in + for b in '$(./mg-list-all-branches)' + case $b in + for b in '$(./mg-list-all-branches)' + case $b in + for b in '$(./mg-list-all-branches)' + case $b in + for b in '$(./mg-list-all-branches)' + case $b in + for b in '$(./mg-list-all-branches)' + case $b in + for b in '$(./mg-list-all-branches)' + case $b in + for b in