CVS commit: src/sys/arch/xen/x86
Module Name:	src
Committed By:	cherry
Date:		Fri Jun  7 12:43:52 UTC 2019

Modified Files:
	src/sys/arch/xen/x86: xen_intr.c

Log Message:
Fix build for the XEN3_PVHVM kernel by conditionally compiling
redundant functions x86_enable_intr()/x86_disable_intr()

To generate a diff of this commit:
cvs rdiff -u -r1.16 -r1.17 src/sys/arch/xen/x86/xen_intr.c

Please note that diffs are not public domain; they are subject to
the copyright notices on the relevant files.

Modified files:

Index: src/sys/arch/xen/x86/xen_intr.c
diff -u src/sys/arch/xen/x86/xen_intr.c:1.16 src/sys/arch/xen/x86/xen_intr.c:1.17
--- src/sys/arch/xen/x86/xen_intr.c:1.16	Thu May  9 17:09:51 2019
+++ src/sys/arch/xen/x86/xen_intr.c	Fri Jun  7 12:43:52 2019
@@ -1,4 +1,4 @@
-/* $NetBSD: xen_intr.c,v 1.16 2019/05/09 17:09:51 bouyer Exp $ */
+/* $NetBSD: xen_intr.c,v 1.17 2019/06/07 12:43:52 cherry Exp $ */
 
 /*-
  * Copyright (c) 1998, 2001 The NetBSD Foundation, Inc.
@@ -30,7 +30,7 @@
  */
 
 #include
-__KERNEL_RCSID(0, "$NetBSD: xen_intr.c,v 1.16 2019/05/09 17:09:51 bouyer Exp $");
+__KERNEL_RCSID(0, "$NetBSD: xen_intr.c,v 1.17 2019/06/07 12:43:52 cherry Exp $");
 
 #include
 #include
@@ -99,6 +99,7 @@ xen_spllower(int nlevel)
 }
 
+#if !defined(XENPVHVM)
 void
 x86_disable_intr(void)
 {
@@ -117,6 +118,8 @@ x86_enable_intr(void)
 	hypervisor_force_callback();
 }
 
+#endif /* !XENPVHVM */
+
 u_long
 xen_read_psl(void)
 {
Re: CVS commit: src/sys/arch/x86
(resent to source-changes-d@)

"Maxime Villard" writes:

[...]

> Contrary to AMD-SVM, Intel-VMX uses a different set of PTE bits from
> native, and this has three important consequences:
>
> - We can't use the native PTE bits, so each time we want to modify the
>   page tables, we need to know whether we're dealing with a native pmap
>   or an EPT pmap. This is accomplished with callbacks, that handle
>   everything PTE-related.

I like this approach - perhaps it's a way to separate out other similar
pmaps (OT).

> - There is no recursive slot possible, so we can't use pmap_map_ptes().
>   Rather, we walk down the EPT trees via the direct map, and that's
>   actually a lot simpler (and probably faster too...).

Does this mean that nvmm hosts have to have __HAVE_DIRECT_MAP ? If so,
it might be worth having a separate kernel conf rather than GENERIC (I
don't know how this works now). I recently built an amd64 kernel without
__HAVE_DIRECT_MAP and was quite impressed that it actually booted.
That's a nice-to-have feature.

> - The kernel is never mapped in an EPT pmap. An EPT pmap cannot be loaded
>   on the host. This has two sub-consequences: at creation time we must
>   zero out all of the top-level PTEs, and at destruction time we force
>   the page out of the pool cache and into the pool, to ensure that a next
>   allocation will invoke pmap_pdp_ctor() to create a native pmap and not
>   recycle some stale EPT entries.

Can you not use a separate pool cache ? This could isolate host/guest
related memory pressure as well ?

-- 
~cherry
Re: CVS commit: src/sys/arch/x86/x86
Maxime Villard writes:

> Can we do something about it now? It's been more than a week, and the
> issue is still there. NVMM still doesn't modload, same for procfs, and
> GENERIC_KASLR doesn't work either.
>
> This needs to be fixed now, and we should not start adding random hacks
> all over the place (like the one below).

Sorry for the delay - I've rolled back the changes.

http://mail-index.netbsd.org/source-changes/2019/01/06/msg102113.html

The XEN related ones I will roll back if enough people complain. I'm
meanwhile investigating other options.

Thanks for your patience.

-- 
~cherry
Re: CVS commit: src/sys/arch
Christoph Badura writes:

> On Tue, Dec 25, 2018 at 11:45:42AM +0530, Cherry G.Mathew wrote:
>> Joerg Sonnenberger writes:
>>> On Sat, Dec 22, 2018 at 09:27:22PM +0000, Cherry G. Mathew wrote:
>>>> Modified Files:
>>>> 	src/sys/arch/amd64/amd64: cpufunc.S
>>>> 	src/sys/arch/i386/i386: cpufunc.S i386func.S
>>>> 	src/sys/arch/xen/x86: xenfunc.c
>>>>
>>>> Log Message:
>>>> Introduce a weak alias method of exporting different implementations
>>>> of the same API.
>>>
>>> Why? As in: what's the advantage of making them weak.
>>
>> I'd posted separately before this commit explaining the rationale.
>
> It took me a while to check the most obvious places and figure out which
> message it might have been. A more precise reference would have helped.
>
>> Here's the scenario above:
>>
>> let's take lcr3(). On native this is a ring0 operation, implemented in
>> assembler. On xen, this is a PV call. Currently there's no need to have
>> both implementations in a single binary, since PV and native binaries
>> are distinct. With PVHVM, we have a situation where at boot time, the
>> native version is required, and after XEN is detected, we can use the
>> XEN version of the call. I've taken the approach of naming the
>> individual functions distinctly, eg: i386_lcr3(), or xen_lcr3(), and
>> exporting them weakly as the consumed version, ie; lcr3().
>>
>> What happens is that when the individual binary is compiled, the
>> appropriate weakly exported symbol takes over, and things work as
>> expected. When the combined binary for PVHVM is required, we write a
>> strongly exported "switch" function, called lcr3() (I haven't committed
>> this yet) which overrides both the weak symbols. It can then do the
>> runtime calls by calling into the appropriate i386_lcr3() or xen_lcr3(),
>> depending on the mode in which it is running.
>
> I don't find this argument for weak aliases convincing. You plan to
> write the function that makes a runtime decision between the
> implementations anyway. You might as well write that function now and
> avoid another bit of hidden magic.

The other options have been suggested earlier (function pointers and
hotpatching). Function pointers require reprototyping every function
required (I don't have an exhaustive list yet). Hotpatching isn't useful
in that it will overwrite the default version, so we'll have to hotpatch
twice: once to override the default, and then to make a copy so that the
default is available. It's also ugly, so for me this is the least
preferable method.

> You can have multiple versions of that "switch" function and select the
> appropriate one with config(8) logic.

It's too late for that. Things like lcr3() need to work way before
config(8) is a twinkle in boot(8)'s eyes.

> Or you can select with #ifdef. Somewhere you have to place the logic
> for conditional compilation/linking. Having that logic concentrated
> in a single place instead of N seems preferable to me.

Yes, it's pretty intense - see x86/mainbus.c:mainbus_attach() for a
sample of what is to come.

-- 
~cherry
Re: Weak x86 aliases
Maxime Villard writes:

> Le 28/12/2018 à 15:06, Cherry G.Mathew a écrit :
>> Maxime Villard writes:
>>> Le 28/12/2018 à 14:57, Cherry G.Mathew a écrit :
>>>> Maxime Villard writes:
>>>>>> Introduce a weak alias method of exporting different implementations
>>>>>> of the same API.
>>>>>
>>>>> Please revert or fix this change.
>>>>
>>>> I'm not sure what the fix is - do you have a suggestion ?
>>>
>>> either re-apply it without using weak symbols, or change the modloader
>>> to accept weak symbols
>>
>> I don't like the imperative in your tone. NVMM is the user of modloader,
>> not PVHVM. So if you feel like your usecase needs fixing, I'd say it's
>> your problem - or don't use modules, but see below.
>
> Wut? Yes my suggestions are either we re-apply the change without using
> weak symbols or we change the modloader to accept weak symbols.

Would a __strong_alias() fix this for you ? ie; is aliasing problematic
in itself ?

-- 
~cherry
Re: Weak x86 aliases
John Nemeth writes:

> On Dec 28, 11:52pm, "Mathew, Cherry G." wrote:
> } On December 28, 2018 9:54:11 PM GMT+05:30, John Nemeth wrote:
> } >On Dec 28, 7:36pm, "Cherry G.Mathew" wrote:
> } >} Maxime Villard writes:
> } >} > Le 28/12/2018 à 14:57, Cherry G.Mathew a écrit :
> } >} >> Maxime Villard writes:
> } >} >>>> Introduce a weak alias method of exporting different
> } >} >>>> implementations of the same API.
> } >} >>>
> } >} >>> Please revert or fix this change.
> } >} >>
> } >} >> I'm not sure what the fix is - do you have a suggestion ?
> } >} >
> } >} > either re-apply it without using weak symbols, or change the
> } >} > modloader to accept weak symbols
> } >}
> } >} I don't like the imperative in your tone. NVMM is the user of
> } >} modloader, not PVHVM. So if you feel like your usecase needs
> } >} fixing, I'd say it's your problem - or don't use modules, but
> } >} see below.
> } >
> } > I suspect there's a language issue here due to people using
> } >English as a second language. However, I don't see an imperative
> } >(command) here. You asked for suggestions on how to fix a problem.
> } >He answered your question with a couple of suggestions. That's
> } >all.
> } >
> } > Also, I would argue that the kernel uses modloader, not the
> } >module. In any event, as mentioned, it is your change that broke
> } >things...
> }
> } Based on Jason's reply I suspect I've broken modules on Xen too.
> } ISTR that you did some work in this area. If you did, can you
> } comment?
>
> I mostly worked on the infrastructure to build a separate set
> of modules for xen and xen-pae. On amd64, Xen modules didn't work
> due to missing modmap, which maxv fixed. On i386, only simple
> modules worked due to not being able to find symbols. I really
> don't know a whole lot about linkers and loaders (manipulating
> makefiles is much simpler).
>
> However, having said that, I suspect that your work with PVHVM
> (and presumably PVH after that) will make the idea of having separate
> modules for Xen obsolete. I.e. it appears to me that we're headed
> to a world where a single kernel will work both on raw iron and
> under Xen.

I would like to maintain PV as is. It has advantages that the PVHVM
stuff doesn't have.

-- 
~cherry
Re: Weak x86 aliases
Cherry G. Mathew writes:

[...]

> I think I'll revert these for now,

PS: I'm in transit and probably not the best time to do this - if you
feel like it's urgent enough, please feel free to roll back. I'll fix
breakage on my end once the dust settles. I'll be in a committable
place in ~48hrs.

-- 
~cherry
Re: Weak x86 aliases
Maxime Villard writes: > Le 28/12/2018 à 14:57, Cherry G.Mathew a écrit : >> Maxime Villard writes: Introduce a weak alias method of exporting different implementations of the same API. >>> >>> Please revert or fix this change. >> >> I'm not sure what the fix is - do you have a suggestion ? > > either re-apply it without using weak symbols, or change the modloader > to accept weak symbols > I don't like the imperative in your tone. NVMM is the user of modloader, not PVHVM. So if you feel like your usecase needs fixing, I'd say it's your problem - or don't use modules, but see below. >>> The kernel modules that use these functions can't be modloaded >>> anymore, because weak symbols aren't resolved. >>> >>> Eg, NVMM can't be modloaded anymore, because it uses rcr0 among others. >>> >> >> I think I'll revert these for now, because PVHVM doesn't/shouldn't use >> them anyway, but I'd like to know how to fix this properly. modload not >> working due to __weak_alias() sounds like something we don't do properly >> in the modload path. > > There is a check in kobj_sym_lookup(): > > 926 case STB_WEAK: > 927 kobj_error(ko, "weak symbols not supported"); > 928 return EINVAL; > > To resolve correctly I guess we need to take the strongest of the > duplicated symbols. > I'll look into this unless someone else beats me to it - but for now I'm ok for the specific change that breaks things for NVMM to be rolled back. (Please see other email about timeframe). -- ~cherry
Re: Weak x86 aliases
Maxime Villard writes:

>> Introduce a weak alias method of exporting different implementations
>> of the same API.
>
> Please revert or fix this change.

I'm not sure what the fix is - do you have a suggestion ?

> The kernel modules that use these functions can't be modloaded
> anymore, because weak symbols aren't resolved.
>
> Eg, NVMM can't be modloaded anymore, because it uses rcr0 among others.

I think I'll revert these for now, because PVHVM doesn't/shouldn't use
them anyway, but I'd like to know how to fix this properly. modload not
working due to __weak_alias() sounds like something we don't do properly
in the modload path.

-- 
~cherry
Re: CVS commit: src/sys/arch
Joerg Sonnenberger writes:

> On Sat, Dec 22, 2018 at 09:27:22PM +0000, Cherry G. Mathew wrote:
>> Module Name:	src
>> Committed By:	cherry
>> Date:		Sat Dec 22 21:27:22 UTC 2018
>>
>> Modified Files:
>> 	src/sys/arch/amd64/amd64: cpufunc.S
>> 	src/sys/arch/i386/i386: cpufunc.S i386func.S
>> 	src/sys/arch/xen/x86: xenfunc.c
>>
>> Log Message:
>> Introduce a weak alias method of exporting different implementations
>> of the same API.
>
> Why? As in: what's the advantage of making them weak.

I'd posted separately before this commit explaining the rationale.
Here's the scenario:

let's take lcr3(). On native this is a ring0 operation, implemented in
assembler. On xen, this is a PV call. Currently there's no need to have
both implementations in a single binary, since PV and native binaries
are distinct. With PVHVM, we have a situation where at boot time, the
native version is required, and after XEN is detected, we can use the
XEN version of the call. I've taken the approach of naming the
individual functions distinctly, eg: i386_lcr3(), or xen_lcr3(), and
exporting them weakly as the consumed version, ie; lcr3().

What happens is that when the individual binary is compiled, the
appropriate weakly exported symbol takes over, and things work as
expected. When the combined binary for PVHVM is required, we write a
strongly exported "switch" function, called lcr3() (I haven't committed
this yet) which overrides both the weak symbols. It can then do the
runtime calls by calling into the appropriate i386_lcr3() or xen_lcr3(),
depending on the mode in which it is running.

In the specific case above however, I realised that most of the
functions I converted are not appropriate for use in PVHVM as the native
versions are likely to be more efficient. I may roll them back once
things stabilise.

I hope this was a useful explanation.

-- 
~cherry
Re: CVS commit: src/sys/arch/xen/x86
On 2 August 2016 at 19:51, Maxime Villard wrote:
> Module Name:	src
> Committed By:	maxv
> Date:		Tue Aug  2 14:21:53 UTC 2016
>
> Modified Files:
> 	src/sys/arch/xen/x86: x86_xpmap.c
>
> Log Message:
> Map the kernel text, rodata and data+bss independently on Xen, with
> respectively RX, R and RW.

Hi - wondering why you're getting more divergence from generic x86 - is
there a way to do this (and the pg_nx stuff for eg:) without having to
special-case this in Xen ?

-- 
~~Cherry
Re: CVS commit: src/sys/external/bsd/drm/dist/shared-core
Hi Matt,

This doesn't fix drm2 i915 breakage. Could you revisit this please,
since it's in external/ ?

Thanks,

On 5 September 2014 15:10, Matt Thomas m...@netbsd.org wrote:
> Module Name:	src
> Committed By:	matt
> Date:		Fri Sep  5 09:40:44 UTC 2014
>
> Modified Files:
> 	src/sys/external/bsd/drm/dist/shared-core: i915_drv.h i915_suspend.c
>
> Log Message:
> Rename enum pipe to enum i915_pipe so it won't conflict with struct pipe.
>
> To generate a diff of this commit:
> cvs rdiff -u -r1.5 -r1.6 src/sys/external/bsd/drm/dist/shared-core/i915_drv.h
> cvs rdiff -u -r1.6 -r1.7 \
> 	src/sys/external/bsd/drm/dist/shared-core/i915_suspend.c
>
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.

-- 
~~Cherry
Re: CVS commit: src/sys/arch
Hi Paul,

Thanks for working on this.

>>>>> "Paul" == Paul Goyette p...@whooppee.com writes:

    Date: Fri, 13 Jun 2014 17:52:53 -0700 (PDT)
    From: Paul Goyette p...@whooppee.com

    I modified the newer patch so that things still compile on non-XEN
    kernels, and added the version check to an additional invocation of
    xen_pagezero(). The patch is attached, and I have verified that it
    successfully boots under the pre-3.4 hypervisor.

    Is there any reason why this should not be committed?

[...]

    Index: src/sys/arch/x86/x86/pmap.c
    ===
    RCS file: /cvsroot/src/sys/arch/x86/x86/pmap.c,v
    retrieving revision 1.182
    diff -u -p -r1.182 pmap.c
    --- src/sys/arch/x86/x86/pmap.c	6 May 2014 04:26:23 -	1.182
    +++ src/sys/arch/x86/x86/pmap.c	14 Jun 2014 00:51:56 -
    @@ -211,6 +211,15 @@ __KERNEL_RCSID(0, $NetBSD: pmap.c,v 1.1
     #ifdef XEN
     #include xen/xen-public/xen.h
     #include xen/hypervisor.h
    +
    +/*
    + * Does the hypervisor we're running on support an api
    + * call at the requested version number ?
    + */
    +#define XEN_VERSION_SUPPORTED(major, minor)		\
    +	(XEN_MAJOR(xen_version) > (major) ||		\
    +	 (XEN_MAJOR(xen_version) == (major) &&		\
    +	  XEN_MINOR(xen_version) >= (minor)))
     #endif

This should probably go in include/hypervisor.h, right under the
definitions for XEN_MAJOR() and XEN_MINOR()

    /*
    @@ -3013,9 +3022,11 @@ pmap_zero_page(paddr_t pa)
     {
     #if defined(__HAVE_DIRECT_MAP)
     	pagezero(PMAP_DIRECT_MAP(pa));
    -#elif defined(XEN)
    -	xen_pagezero(pa);
     #else
    +#if defined(XEN)
    +	if (XEN_VERSION_SUPPORTED(3, 4))
    +		xen_pagezero(pa);
    +#endif

The version check should probably be inside xen_pagezero() but that
would require a tiny bit of re-org in pmap.c since the direct map code
is equally culpable. IMO this can wait for a post-commit sweep.

Looks ok to me otherwise.

Cheers,
-- 
Cherry
Re: CVS commit: src/sys/arch
>>>>> "Paul" == Paul Goyette p...@whooppee.com writes:

    >> Module Name:	src
    >> Committed By:	cherry
    >> Date:		Tue May  6 04:26:24 UTC 2014
    >>
    >> Modified Files:
    >> 	src/sys/arch/x86/x86: pmap.c
    >> 	src/sys/arch/xen/include: xenpmap.h
    >> 	src/sys/arch/xen/x86: x86_xpmap.c
    >>
    >> Log Message:
    >> Use the hypervisor to copy/zero pages. This saves us the extra
    >> overheads of setting up temporary kernel mapping/unmapping. riz@
    >> reports savings of about 2s on a 120s kernel build.

    Paul> This commit seems to have broken the ability to boot NetBSD
    Paul> under at least some hypervisors. My VPS at prgmr.com boots
    Paul> fine with a kernel from just before this change. However,
    Paul> with this commit, the following panic occurs:

Sorry, the first patch was incorrect ( it skips over some newer legit
versions! ) - thanks abs@.

Thanks to Gary Duzan for the initial suggestion and abs@ for catching
the error in my first patch.

-- 
Cherry

diff -r f0203e3974e7 sys/arch/x86/x86/pmap.c
--- a/sys/arch/x86/x86/pmap.c	Sat May 31 20:20:36 2014 +0900
+++ b/sys/arch/x86/x86/pmap.c	Thu Jun 12 15:46:00 2014 +0900
@@ -211,6 +211,15 @@
 #ifdef XEN
 #include xen/xen-public/xen.h
 #include xen/hypervisor.h
+
+/*
+ * Does the hypervisor we're running on support an api
+ * call at the requested version number ?
+ */
+#define XEN_VERSION_SUPPORTED(major, minor)		\
+	(XEN_MAJOR(xen_version) > (major) ||		\
+	 (XEN_MAJOR(xen_version) == (major) &&		\
+	  XEN_MINOR(xen_version) >= (minor)))
 #endif
 
 /*
@@ -3097,8 +3106,11 @@
 	memcpy((void *)dstva, (void *)srcva, PAGE_SIZE);
 #elif defined(XEN)
-	xen_copy_page(srcpa, dstpa);
-#else
+	if (XEN_VERSION_SUPPORTED(3, 4)) {
+		xen_copy_page(srcpa, dstpa);
+		return;
+	}
+#endif
 	pt_entry_t *spte;
 	pt_entry_t *dpte;
 	void *csrcva;
@@ -3128,7 +3140,6 @@
 	pmap_pte_flush();
 #endif
 	kpreempt_enable();
-#endif
 }
 
 static pt_entry_t *
@@ -4108,8 +4119,11 @@
 #ifdef __HAVE_DIRECT_MAP
 	pagezero(PMAP_DIRECT_MAP(*paddrp));
 #elif defined(XEN)
-	xen_pagezero(*paddrp);
-#else
+	if (XEN_VERSION_SUPPORTED(3, 4)) {
+		xen_pagezero(*paddrp);
+		return true;
+	}
+#endif
 	kpreempt_disable();
 	pmap_pte_set(early_zero_pte,
 	    pmap_pa2pte(*paddrp) | PG_V | PG_RW | PG_k);
@@ -4121,7 +4135,6 @@
 	pmap_pte_flush();
 #endif /* defined(DIAGNOSTIC) */
 	kpreempt_enable();
-#endif
 	} else {
 		/* XXX */
 		ptp = uvm_pagealloc(NULL, 0, NULL,
Re: CVS commit: src/sys/arch
>>>>> "Paul" == Paul Goyette p...@whooppee.com writes:

    >> Module Name:	src
    >> Committed By:	cherry
    >> Date:		Tue May  6 04:26:24 UTC 2014
    >>
    >> Modified Files:
    >> 	src/sys/arch/x86/x86: pmap.c
    >> 	src/sys/arch/xen/include: xenpmap.h
    >> 	src/sys/arch/xen/x86: x86_xpmap.c
    >>
    >> Log Message:
    >> Use the hypervisor to copy/zero pages. This saves us the extra
    >> overheads of setting up temporary kernel mapping/unmapping. riz@
    >> reports savings of about 2s on a 120s kernel build.

    Paul> This commit seems to have broken the ability to boot NetBSD
    Paul> under at least some hypervisors. My VPS at prgmr.com boots
    Paul> fine with a kernel from just before this change. However,
    Paul> with this commit, the following panic occurs:

[...]

Hi,

Does the attached patch work for you ?

-- 
Cherry

diff -r f0203e3974e7 sys/arch/x86/x86/pmap.c
--- a/sys/arch/x86/x86/pmap.c	Sat May 31 20:20:36 2014 +0900
+++ b/sys/arch/x86/x86/pmap.c	Thu Jun 12 14:15:30 2014 +0900
@@ -3097,8 +3097,12 @@
 	memcpy((void *)dstva, (void *)srcva, PAGE_SIZE);
 #elif defined(XEN)
-	xen_copy_page(srcpa, dstpa);
-#else
+	if (XEN_MAJOR(xen_version) >= 3 && XEN_MINOR(xen_version) >= 4) {
+		xen_copy_page(srcpa, dstpa);
+		return;
+	}
+#endif
 	pt_entry_t *spte;
 	pt_entry_t *dpte;
 	void *csrcva;
@@ -3128,7 +3132,6 @@
 	pmap_pte_flush();
 #endif
 	kpreempt_enable();
-#endif
 }
 
 static pt_entry_t *
@@ -4108,8 +4111,12 @@
 #ifdef __HAVE_DIRECT_MAP
 	pagezero(PMAP_DIRECT_MAP(*paddrp));
 #elif defined(XEN)
+	if (XEN_MAJOR(xen_version) >= 3 && XEN_MINOR(xen_version) >= 4) {
 	xen_pagezero(*paddrp);
-#else
+		return true;
+	}
+#endif
 	kpreempt_disable();
 	pmap_pte_set(early_zero_pte,
 	    pmap_pa2pte(*paddrp) | PG_V | PG_RW | PG_k);
@@ -4121,7 +4128,6 @@
 	pmap_pte_flush();
 #endif /* defined(DIAGNOSTIC) */
 	kpreempt_enable();
-#endif
 	} else {
 		/* XXX */
 		ptp = uvm_pagealloc(NULL, 0, NULL,
Re: CVS commit: src/sys/arch/xen
Hi Manuel,

On 13 January 2013 02:39, Manuel Bouyer bou...@netbsd.org wrote:
> Module Name:	src
> Committed By:	bouyer
> Date:		Sat Jan 12 21:09:10 UTC 2013
>
> Modified Files:
> 	src/sys/arch/xen/include: hypervisor.h
> 	src/sys/arch/xen/x86: hypervisor_machdep.c
> 	src/sys/arch/xen/xen: evtchn.c
>
> Log Message:
> Revert these commits from november 2012:
> http://mail-index.netbsd.org/source-changes/2012/11/25/msg039125.html
> http://mail-index.netbsd.org/source-changes/2012/11/25/msg039126.html

This commit has no functional difference. Are you sure it needs to be
backed out ?

> http://mail-index.netbsd.org/source-changes/2012/11/25/msg039142.html
>
> they cause a i386PAE domU to hang while running ATF tests, as shown in
> http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/
> (we should pay more attention to test results, myself first).

I was happy to back these out myself, but I couldn't reproduce your bug.

-- 
~Cherry
Re: CVS commit: src/sys/uvm
On 4 September 2012 12:32, Mindaugas Rasiukevicius rm...@netbsd.org wrote:
> Matt Thomas m...@3am-software.com wrote:
>> On Sep 3, 2012, at 3:33 PM, Mindaugas Rasiukevicius wrote:
>>> Matt Thomas m...@netbsd.org wrote:
>>>> Module Name:	src
>>>> Committed By:	matt
>>>> Date:		Mon Sep  3 19:53:43 UTC 2012
>>>>
>>>> Modified Files:
>>>> 	src/sys/uvm: uvm_km.c uvm_map.c
>>>>
>>>> Log Message:
>>>> Switch to a spin lock (uvm_kentry_lock) which, fortunately, was
>>>> sitting there unused.
>>>
>>> - pmap_growkernel() may use adaptive locks, which cannot be acquired
>>>   with the spin lock held; so the change breaks at least x86 and alpha.
>>> - Why in the caller? I think it would be better to leave it for the
>>>   pmaps, e.g. they may re-use the locks which already provide the
>>>   necessary protection and which need to be taken anyway (like in x86
>>>   pmap).
>>
>> uvm_maxkaddr needs a lock for its updating; growkernel can be called
>> from uvm_km_mem_alloc, which might be called at interrupt level.
>
> The second point stands, but I see you already fixed it - thanks!
>
> As for pmap_growkernel() being called from interrupt context - right,
> then it seems Xen is broken, as its path in pmap_growkernel() acquires
> adaptive pmaps_lock and might call pool_cache_invalidate() which can
> block..

I don't see a xen specific lock being taken in pmap_growkernel()

-- 
~Cherry
Re: CVS commit: src/tests/ipf
On 27 March 2012 22:38, Alan Barrett a...@cequrux.com wrote: On Tue, 27 Mar 2012, Jukka Ruohonen wrote: Modified Files: src/tests/ipf: t_filter_exec.sh t_filter_parse.sh t_nat_exec.sh t_nat_ipf_exec.sh Log Message: Mark the failing tests as broken. XXX: If no one is willing to maintain the ipf tests, these should be removed. I object to this. If ipf fails its tests, then the fact should be made clear in the test reports, not hidden by disabling the tests. I don't know whether the bugs are in ipf or in the tests, but either way, removing or disabling the tests seems to me to be counter-productive. --apb (Alan Barrett) +1 -- ~Cherry
Re: CVS commit: src/lib/libc/stdio
On 28 March 2012 07:53, Christos Zoulas chris...@astron.com wrote:
> In article 20120327202907.gt26...@bigmac.stderr.spb.ru,
> Valeriy E. Ushakov u...@stderr.spb.ru wrote:
>> But that is not what the code was. The code was:
>>
>> 	char c;
>> 	if (c == CHAR_MAX)
>> 		...
>>
>> and *that* is portable. As I said in another mail to this thread that
>> went unanswered, it is literally schizophrenic of lint to complain
>> about it. How can lint know that if (c == 255) is portable?
>
> Because CHAR_MAX gets expanded by cpp to 255 before lint parses the
> file.

(How) does the compiler (not) catch this one ?

-- 
~Cherry
Re: CVS commit: src/sys/arch/i386/i386
On 5 March 2012 02:14, Manuel Bouyer bou...@netbsd.org wrote: Module Name: src Committed By: bouyer Date: Sun Mar 4 20:44:17 UTC 2012 Modified Files: src/sys/arch/i386/i386: machdep.c Log Message: Don't try to uvm_page_physload() the tmpgdt page: this always fails because only one physical segment is configured for Xen, and it's probably not worth it to create a second physseg with a single page (uvm has optimisations for the VM_PHYSSEG_MAX == 1 case) ic, so we're potentially leaking 2 pages at boot now -- ~Cherry
Re: CVS commit: src/sys/arch
On 25 February 2012 17:32, Manuel Bouyer bou...@antioche.eu.org wrote: On Sat, Feb 25, 2012 at 10:30:30AM +0530, Cherry G. Mathew wrote: I've made a few changes to pmap.c where it looks harmless to do so, but are in favour of consistency. ftp://ftp.netbsd.org/pub/NetBSD/misc/cherry/tmp/xen-set-pte.diff Did you test it ? I have a vague memory that using spl() in early allocation routines would cause a trap (maybe because curcpu() doesn't work yet) but maybe your recent changes fixed it. Yes and yes. Also you can change #ifndef XEN const pd_entry_t pteflags = PG_V | PG_RW; #else /* XEN */ const pd_entry_t pteflags = PG_u | PG_V | PG_RW; #endif /* XEN */ to: const pd_entry_t pteflags = PG_k | PG_V | PG_RW; (PG_k is defined to PG_u on Xen/amd64 and 0 otherwise). Ah cool, that would remove those ugly #ifdefs Cheers, -- ~Cherry
Re: CVS commit: src/sys/arch/x86/x86
On 26 February 2012 01:33, Cherry G. Mathew che...@netbsd.org wrote: Module Name: src Committed By: cherry Date: Sat Feb 25 20:03:58 UTC 2012 Modified Files: src/sys/arch/x86/x86: pmap.c Log Message: Revert previous since it does a redundant xpq queue flush. xen_bcast_invlpg() flushes the queue for us. As bouyer@ kindly pointed out offline. Cheers, -- ~Cherry
Re: CVS commit: src/sys/arch/x86/x86
Hi Thomas, On 24 February 2012 14:21, Thomas Klausner w...@netbsd.org wrote: Hi Cherry! On Fri, Feb 24, 2012 at 08:44:45AM +, Cherry G. Mathew wrote: Module Name: src Committed By: cherry Date: Fri Feb 24 08:44:44 UTC 2012 Modified Files: src/sys/arch/x86/x86: pmap.c Log Message: kernel page attribute change should be reflected on all cpus, since the page is going to be released after the _dtor() hook is called. Can you please explain why you reverted the previous version and now use xen_bcast*? Thomas That was a typo that I committed by mistake. xpq_bcast*() does not exist. Sorry about that. Cheers, -- ~Cherry
Re: CVS commit: src/sys/arch
On 22 February 2012 18:31, Manuel Bouyer bou...@antioche.eu.org wrote:
> On Wed, Feb 22, 2012 at 06:05:21PM +0530, Cherry G. Mathew wrote:
>> I meant we could make it work, (it would already for amd64/xen since
>> cpu_init_msrs() is called from cpu_hatch()) since xen has its own
>> cpu.c i don't know if we can do the same for i386.

It wasn't fun, but I managed to do it.

btw, do you see a gdt page leaked between machdep.c:initgdt() and
gdt.c:gdt_init() ?

Also xpq_cpu() is time-critical; I guess a function pointer call is
faster than a test.

Well, as a bonus of the early %gs/%fs setup now, I'm thinking of pruning
the xpq_queue_update_xxx() in favour of pmap_set_xxx(). Also, I'll
revisit the atomicity guarantees (eg: pmap_pte_cas()) of these
functions, once we start using only them.

Cheers,
-- 
~Cherry
Re: CVS commit: src/sys/arch
On 24 February 2012 15:33, Manuel Bouyer bou...@antioche.eu.org wrote: On Fri, Feb 24, 2012 at 03:00:03PM +0530, Cherry G. Mathew wrote: On 22 February 2012 18:31, Manuel Bouyer bou...@antioche.eu.org wrote: On Wed, Feb 22, 2012 at 06:05:21PM +0530, Cherry G. Mathew wrote: I meant we could make it work, (it would already for amd64/xen since cpu_init_msrs() is called from cpu_hatch()) since xen has its own cpu.c i don't know if we can do the same for i386. It wasn't fun, but I managed to do it. btw, do you see a gdt page leaked between machdep.c:initgdt() and gdt.c:gdt_init() ? I can't see initgdt(), did you remove it ? No, it's right there: http://nxr.netbsd.org/xref/src/sys/arch/i386/i386/machdep.c#1096 Also xpq_cpu() is time-critical; I guess a function pointer call is faster than a test. Well, as a bonus of the early %gs/%fs setup now, I'm thinking of pruning the xpq_queue_update_xxx() in favour of pmap_set_xxx(). Also, I'll revisiting the atomicity guarantees (eg: pmap_pte_cas() of these functions, once we only start using them. AFAIK they're already all used by pmap ? Mostly, but not everywere. What I want to look at is *why* they're used. In some case it's only to collect PG_M/PG_D bits, and Xen has another, maybe more efficient mechanism for that. This may allow us to batch more MMU updates. Also, I want to look using more multicalls. This may decrease the number of hypercalls significantly. I wonder if there's a way to align that with pmap(9) assumptions. Quoting the manpage: In order to cope with hardware architectures that make the invalidation of virtual address mappings expensive (e.g., TLB invalidations, TLB shootdown operations for multiple processors), the pmap module is allowed to delay mapping invalidation or protection operations until such time as they are actually necessary. The functions that are allowed to delay such actions are pmap_enter(), pmap_remove(), pmap_protect(), pmap_kenter_pa(), and pmap_kremove(). 
Callers of these functions must use the pmap_update() function to notify the pmap module that the mappings need to be made correct. Since the pmap module is provided with information as to which processors are using a given physical map, the pmap module may use whatever optimizations it has available to reduce the expense of virtual-to-physical mapping synchronization. Since the XPQ can be regarded as a kind of TLB, I'm guessing we can attempt to marry the two apis ? Cheers, -- ~Cherry
Re: CVS commit: src/sys/arch
On 24 February 2012 18:29, Manuel Bouyer bou...@antioche.eu.org wrote: On Fri, Feb 24, 2012 at 06:14:11PM +0530, Cherry G. Mathew wrote: On 24 February 2012 15:33, Manuel Bouyer bou...@antioche.eu.org wrote: On Fri, Feb 24, 2012 at 03:00:03PM +0530, Cherry G. Mathew wrote: On 22 February 2012 18:31, Manuel Bouyer bou...@antioche.eu.org wrote: On Wed, Feb 22, 2012 at 06:05:21PM +0530, Cherry G. Mathew wrote: I meant we could make it work, (it would already for amd64/xen since cpu_init_msrs() is called from cpu_hatch()) since xen has its own cpu.c i don't know if we can do the same for i386. It wasn't fun, but I managed to do it. btw, do you see a gdt page leaked between machdep.c:initgdt() and gdt.c:gdt_init() ? I can't see initgdt(), did you remove it ? No, it's right there: http://nxr.netbsd.org/xref/src/sys/arch/i386/i386/machdep.c#1096 OK, I was looking in amd64 code. Yes, it's quite possible that we waste a page here. Also xpq_cpu() is time-critical; I guess a function pointer call is faster than a test. Well, as a bonus of the early %gs/%fs setup now, I'm thinking of pruning the xpq_queue_update_xxx() in favour of pmap_set_xxx(). Also, I'll revisiting the atomicity guarantees (eg: pmap_pte_cas() of these functions, once we only start using them. AFAIK they're already all used by pmap ? Mostly, but not everywere. there are places where they're not used on purpose (e.g. because we know taking the lock or raising the IPL is not needed). I've made a few changes to pmap.c where it looks harmless to do so, but are in favour of consistency. ftp://ftp.netbsd.org/pub/NetBSD/misc/cherry/tmp/xen-set-pte.diff What I want to look at is *why* they're used. In some case it's only to collect PG_M/PG_D bits, and Xen has another, maybe more efficient mechanism for that. This may allow us to batch more MMU updates. Also, I want to look using more multicalls. This may decrease the number of hypercalls significantly. 
I wonder if there's a way to align that with pmap(9) assumptions. Quoting the manpage: In order to cope with hardware architectures that make the invalidation of virtual address mappings expensive (e.g., TLB invalidations, TLB shootdown operations for multiple processors), the pmap module is allowed to delay mapping invalidation or protection operations until such time as they are actually necessary. The functions that are allowed to delay such actions are pmap_enter(), pmap_remove(), pmap_protect(), pmap_kenter_pa(), and pmap_kremove(). Callers of these functions must use the pmap_update() function to notify the pmap module that the mappings need to be made correct. Since the pmap module is provided with information as to which processors are using a given physical map, the pmap module may use whatever optimizations it has available to reduce the expense of virtual-to-physical mapping synchronization. Since the XPQ can be regarded as a kind of TLB, I'm guessing we can attempt to marry the two APIs ? This is more or less what I had in mind. But, for the cases where we need atomic operations, pmap_update() is not appropriate ... Very cool, Cheers, -- ~Cherry
Re: CVS commit: src/sys/arch
Manuel == Manuel Bouyer bou...@antioche.eu.org writes: Manuel On Wed, Feb 22, 2012 at 10:15:50AM +0530, Cherry G. Mathew wrote: Hi Manuel, On 22 February 2012 00:40, Manuel Bouyer bou...@netbsd.org wrote: Module Name: src Committed By: bouyer Date: Tue Feb 21 19:10:13 UTC 2012 Modified Files: src/sys/arch/x86/x86: pmap.c src/sys/arch/xen/x86: cpu.c Log Message: Avoid early use of xen_kpm_sync(); locks are not available at this time. Don't call cpu_init() twice. Makes LOCKDEBUG kernels boot again To generate a diff of this commit: cvs rdiff -u -r1.166 -r1.167 src/sys/arch/x86/x86/pmap.c @@ -4183,14 +4183,30 @@ pte = pmap_pa2pte(pa) | PG_k | PG_V | PG_RW; #ifdef XEN xpq_queue_pte_update(xpmap_ptetomach(&pdep[i]), pte); - if (level == PTP_LEVELS) { #if defined(PAE) || defined(__x86_64__) - if (i >= PDIR_SLOT_KERN) { + if (level == PTP_LEVELS && i >= PDIR_SLOT_KERN) { + if (__predict_true( + cpu_info_primary.ci_flags & CPUF_PRESENT)) { Can we use that as a canonical MI method in Xen to detect if the tls (gs/fs) has been setup ? Manuel probably not; it works here because we want to know if we Manuel already probed CPUs; I'm not sure CPUF_PRESENT will guarantee Manuel that curcpu() will work. I meant we could make it work (it would already for amd64/xen since cpu_init_msrs() is called from cpu_hatch()) since xen has its own cpu.c Cheers, -- Cherry
Re: CVS commit: src/sys/arch
Hi Manuel, On 22 February 2012 00:40, Manuel Bouyer bou...@netbsd.org wrote: Module Name: src Committed By: bouyer Date: Tue Feb 21 19:10:13 UTC 2012 Modified Files: src/sys/arch/x86/x86: pmap.c src/sys/arch/xen/x86: cpu.c Log Message: Avoid early use of xen_kpm_sync(); locks are not available at this time. Don't call cpu_init() twice. Makes LOCKDEBUG kernels boot again To generate a diff of this commit: cvs rdiff -u -r1.166 -r1.167 src/sys/arch/x86/x86/pmap.c @@ -4183,14 +4183,30 @@ pte = pmap_pa2pte(pa) | PG_k | PG_V | PG_RW; #ifdef XEN xpq_queue_pte_update(xpmap_ptetomach(&pdep[i]), pte); - if (level == PTP_LEVELS) { #if defined(PAE) || defined(__x86_64__) - if (i >= PDIR_SLOT_KERN) { + if (level == PTP_LEVELS && i >= PDIR_SLOT_KERN) { + if (__predict_true( + cpu_info_primary.ci_flags & CPUF_PRESENT)) { Can we use that as a canonical MI method in Xen to detect if the tls (gs/fs) has been setup ? If so, I can then clean up the xpq_cpu() mess. Cheers, -- ~Cherry
Re: CVS commit: src/sys/arch
Hi Brian, On 13 January 2012 01:19, Cherry G. Mathew che...@netbsd.org wrote: Module Name: src Committed By: cherry Date: Thu Jan 12 19:49:37 UTC 2012 Modified Files: src/sys/arch/amd64/amd64: machdep.c src/sys/arch/i386/i386: machdep.c src/sys/arch/xen/x86: x86_xpmap.c Log Message: relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath To generate a diff of this commit: cvs rdiff -u -r1.173 -r1.174 src/sys/arch/amd64/amd64/machdep.c cvs rdiff -u -r1.716 -r1.717 src/sys/arch/i386/i386/machdep.c cvs rdiff -u -r1.37 -r1.38 src/sys/arch/xen/x86/x86_xpmap.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. These should fix the crashes with LOCKDEBUG. Thanks for reporting them. -- ~Cherry
Re: CVS commit: src/sys/arch
On 7 January 2012 22:06, David Holland dholland-sourcechan...@netbsd.org wrote: On Sat, Jan 07, 2012 at 12:36:32PM +0100, Manuel Bouyer wrote: the problem is that now the code performs different operations based upon DIAGNOSTIC or not. it's not about whether it's wrong or right, but that it's different. DIAGNOSTIC shouldn't do more or different things, just check stuff and assert if bad. if the hack works for DIAGNOSTIC, it seems generally right and should be enabled for everyone now, until the real fix is implemented. Seconded. Either that, or the DIAGNOSTIC printf which spams the console should be removed (I'm not sure why it has been added in the first place) Yeah... though I don't actually know what it's about, ISTM that if there's a harmless spammy printout, the right hack until things can be fixed properly is to silence the printout. Removing mappings can be expensive. Reverted. Thanks, -- ~Cherry
Re: CVS commit: src/sys/arch
On 6 January 2012 20:56, David Holland dholland-sourcechan...@netbsd.org wrote: On Fri, Jan 06, 2012 at 03:15:28PM +, Cherry G. Mathew wrote: Modified Files: src/sys/arch/x86/x86: pmap.c src/sys/arch/xen/x86: cpu.c Log Message: Address those pesky DIAGNOSTIC messages. \n Take a performance hit at fork() for not DTRT. \n Note: Only applicable for kernels built with options DIAGNOSTIC \n [...] +#ifdef DIAGNOSTIC + pmap_kremove(object, PAGE_SIZE); +#endif /* DIAGNOSTIC */ [plus two more like that] Uh... even if that's correct, which doesn't seem too likely on the surface of things, it doesn't seem desirable. I agree. The correct fix is to implement pmap_protect() or some such for kernel entries. The diagnostic message that it triggers is very verbose for XEN consoles, which ship with DIAGNOSTIC set by default (not sure why this is the case). The pmap_kremove()s are harmless, because we know that they're not in use elsewhere. Cheers, -- ~Cherry
Re: CVS commit: src/sys/arch
On 7 January 2012 05:53, Greg Troxel g...@ir.bbn.com wrote: David Holland dholland-sourcechan...@netbsd.org writes: On Fri, Jan 06, 2012 at 03:15:28PM +, Cherry G. Mathew wrote: Modified Files: src/sys/arch/x86/x86: pmap.c src/sys/arch/xen/x86: cpu.c Log Message: Address those pesky DIAGNOSTIC messages. \n Take a performance hit at fork() for not DTRT. \n Note: Only applicable for kernels built with options DIAGNOSTIC \n [...] +#ifdef DIAGNOSTIC + pmap_kremove(object, PAGE_SIZE); +#endif /* DIAGNOSTIC */ [plus two more like that] Uh... even if that's correct, which doesn't seem too likely on the surface of things, it doesn't seem desirable. I'm not sure if it's what David meant, but it seems wrong to have different behavior based on DIAGNOSTIC. I think DIAGNOSTIC is supposed to be about enabling inexpensive consistency checks, only. So if it's necessary to have consistency to have that call, it should be unconditional. And if not, it shouldn't exist in any case. Do you understand precisely what's going on? Is this about omitting steps necessary for consistency, but in the case where the new state will be immediately discarded? From my reply to David: The pmap_kremove()s are harmless, because we know that they're not in use elsewhere. Please see the diffs in context. The one in pmap.c is in pmap_pdp_ctor() - this call constructs the PDP frame for a new pmap - this happens at fork() as a consequence of a new address space being set up by uvm. At this point nothing else is using the PDP frame, and the pmaps list is locked, so it can't get accidentally traversed or loaded (this would be an error anyway, as the PDP would crash the cpu on which it is loaded, since setup is incomplete at this stage). The other ones are in xen/x86/cpu.c:pmap_cpu_init_late() which is part of vcpu setup. The frame entries in this case are CPU local and can't be accessed by other cpus (except during setup), so we know that the frame is not in use.
In any case, these diffs are all XEN specific, and I'm pretty sure I know what I'm doing, unless I've missed something glaringly obvious - please let me know. The correct solution to this is to update pmap_protect() to handle kernel entries. -- ~Cherry
Re: CVS commit: src/sys/arch
On 7 January 2012 12:31, matthew green m...@eterna.com.au wrote: On Fri, Jan 06, 2012 at 03:15:28PM +, Cherry G. Mathew wrote: Modified Files: src/sys/arch/x86/x86: pmap.c src/sys/arch/xen/x86: cpu.c Log Message: Address those pesky DIAGNOSTIC messages. \n Take a performance hit at fork() for not DTRT. \n Note: Only applicable for kernels built with options DIAGNOSTIC \n [...] +#ifdef DIAGNOSTIC + pmap_kremove(object, PAGE_SIZE); +#endif /* DIAGNOSTIC */ [plus two more like that] Uh... even if that's correct, which doesn't seem too likely on the surface of things, it doesn't seem desirable. I'm not sure if it's what David meant, but it seems wrong to have different behavior based on DIAGNOSTIC. I think DIAGNOSTIC is supposed to be about enabling inexpensive consistency checks, only. So if it's necessary to have consistency to have that call, it should be unconditional. And if not, it shouldn't exist in any case. Do you understand precisely what's going on? Is this about omitting steps necessary for consistency, but in the case where the new state will be immediately discarded? From my reply to David: The pmap_kremove()s are harmless, because we know that they're not in use elsewhere. Please see the diffs in context. The one in pmap.c is in pmap_pdp_ctor() - this call constructs the PDP frame for a new pmap - this happens at fork() as a consequence of a new address space being set up by uvm. At this point nothing else is using the PDP frame, and the pmaps list is locked, so it can't get accidentally traversed or loaded (this would be an error anyway, as the PDP would crash the cpu on which it is loaded, since setup is incomplete at this stage). The other ones are in xen/x86/cpu.c:pmap_cpu_init_late() which is part of vcpu setup. The frame entries in this case are CPU local and can't be accessed by other cpus (except during setup), so we know that the frame is not in use.
In any case, these diffs are all XEN specific, and I'm pretty sure I know what I'm doing, unless I've missed something glaringly obvious - please let me know. The correct solution to this is to update pmap_protect() to handle kernel entries. the problem is that now the code performs different operations based upon DIAGNOSTIC or not. it's not about whether it's wrong or right, but that it's different. DIAGNOSTIC shouldn't do more or different things, just check stuff and assert if bad. if the hack works for DIAGNOSTIC, it seems generally right and should be enabled for everyone now, until the real fix is implemented. Ah, right - sorry, I got distracted a bit by the description of the paper-over. I'll remove the #ifdef's presently. -- ~Cherry
Re: CVS commit: src/sys/arch
On 8 November 2011 05:50, Jean-Yves Migeon jeanyves.mig...@free.fr wrote: On 06.11.2011 16:18, Cherry G. Mathew wrote: Module Name: src Committed By: cherry Date: Sun Nov 6 15:18:19 UTC 2011 Modified Files: src/sys/arch/amd64/include: pmap.h src/sys/arch/i386/include: pmap.h src/sys/arch/x86/include: cpu.h src/sys/arch/x86/x86: pmap.c src/sys/arch/xen/x86: cpu.c x86_xpmap.c Log Message: [merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware. Some comments. -#ifdef PAE -/* Address of the static kernel's L2 page */ -pd_entry_t *pmap_kl2pd; -paddr_t pmap_kl2paddr; -#endif - - [snip] #ifdef PAE - uint32_t ci_pae_l3_pdirpa; /* PA of L3 PD */ + paddr_t ci_pae_l3_pdirpa; /* PA of L3 PD */ pd_entry_t * ci_pae_l3_pdir; /* VA pointer to L3 PD */ #endif [snip] + +#if defined(PAE) + ci-ci_pae_l3_pdir = (paddr_t *)uvm_km_alloc(kernel_map, PAGE_SIZE, 0, + UVM_KMF_WIRED | UVM_KMF_ZERO | UVM_KMF_NOWAIT); + + if (ci-ci_pae_l3_pdir == NULL) { + panic(%s: failed to allocate L3 per-cpu PD for CPU %d\n, + __func__, cpu_index(ci)); + } Please keep ci_pae_l3_pdir to a uint32_t and back out its paddr_t type. Done. Forgot to attribute your suggestion + this thread in the commit log. I hope this makes up for it :-) -- ~Cherry
Re: CVS commit: src/sys/arch/xen/xen
On 7 December 2011 19:19, Christoph Egger ceg...@netbsd.org wrote: Module Name: src Committed By: cegger Date: Wed Dec 7 13:49:04 UTC 2011 Modified Files: src/sys/arch/xen/xen: evtchn.c Log Message: build fix: add back sys/malloc.h. malloc(9) is still in use. Hi, sorry about this, I missed it - can you move the malloc() in pirq_establish() to kmem_alloc() instead ? iiuc, we're moving away from malloc(), so this is a regression. Cheers, -- ~Cherry
Re: CVS commit: src/sys/arch
On 21 November 2011 01:11, Jean-Yves Migeon j...@netbsd.org wrote: Module Name: src Committed By: jym Date: Sun Nov 20 19:41:27 UTC 2011 Modified Files: src/sys/arch/x86/include: pmap.h ... +#ifdef PAE +void pmap_map_recursive_entries(void); +void pmap_unmap_recursive_entries(void); +#endif /* PAE */ + #endif /* XEN */ Since that's xen specific, can that go in xenpmap.h ? Cheers, -- ~Cherry
Re: CVS commit: src/sys/arch
On 21 November 2011 02:45, Jean-Yves Migeon jeanyves.mig...@free.fr wrote: Observed in QEMU, Virtualbox, and my amd64 spare host. Thank you, I will look into this presently. -- ~Cherry
Re: CVS commit: src/sys/arch
On 8 November 2011 05:50, Jean-Yves Migeon jeanyves.mig...@free.fr wrote: On 06.11.2011 16:18, Cherry G. Mathew wrote: Module Name: src Committed By: cherry Date: Sun Nov 6 15:18:19 UTC 2011 Modified Files: src/sys/arch/amd64/include: pmap.h src/sys/arch/i386/include: pmap.h src/sys/arch/x86/include: cpu.h src/sys/arch/x86/x86: pmap.c src/sys/arch/xen/x86: cpu.c x86_xpmap.c Log Message: [merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware. Some comments. ... -/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */ -#if defined(XEN) && defined(__x86_64__) -#define PG_k PG_u -#else -#define PG_k 0 -#endif - Are you sure that all the mapping sites are safe (PT/PD bits), given the pmap split between pmap/xen_xpmap.c? Ok, I realise I've broken the build with this one - apologies. For some odd reason my tree built ok (even after nuking the obj dir). The current bandaid by christos and njoly is incorrect. I propose re-exporting PG_k to x86/include/pmap.h until (if ?) the xen pmap is completely independent of the x86 one.
Please let me know if there are objections to this patch below: Thanks, -- ~Cherry Index: arch/x86/include/pmap.h === RCS file: /cvsroot/src/sys/arch/x86/include/pmap.h,v retrieving revision 1.44 diff -u -r1.44 pmap.h --- arch/x86/include/pmap.h 6 Nov 2011 11:40:47 - 1.44 +++ arch/x86/include/pmap.h 8 Nov 2011 13:35:16 - @@ -173,6 +173,13 @@ ((pmap)->pm_pdirpa[0] + (index) * sizeof(pd_entry_t)) #endif +/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */ +#if defined(XEN) && defined(__x86_64__) +#define PG_k PG_u +#else +#define PG_k 0 +#endif + /* * MD flags that we use for pmap_enter and pmap_kenter_pa: */ Index: arch/x86/x86/pmap.c === RCS file: /cvsroot/src/sys/arch/x86/x86/pmap.c,v retrieving revision 1.140 diff -u -r1.140 pmap.c --- arch/x86/x86/pmap.c 8 Nov 2011 12:44:29 - 1.140 +++ arch/x86/x86/pmap.c 8 Nov 2011 13:35:20 - @@ -211,11 +211,6 @@ #include <xen/hypervisor.h> #endif -/* If this is not needed anymore it should be GC'ed */ -#ifndef PG_k -#define PG_k 0 -#endif - /* * general info: * Index: arch/xen/include/xenpmap.h === RCS file: /cvsroot/src/sys/arch/xen/include/xenpmap.h,v retrieving revision 1.30 diff -u -r1.30 xenpmap.h --- arch/xen/include/xenpmap.h 6 Nov 2011 11:40:47 - 1.30 +++ arch/xen/include/xenpmap.h 8 Nov 2011 13:35:20 - @@ -34,13 +34,6 @@ #include "opt_xen.h" #endif -/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */ -#if defined(XEN) && defined(__x86_64__) -#define PG_k PG_u -#else -#define PG_k 0 -#endif - #define INVALID_P2M_ENTRY (~0UL) void xpq_queue_machphys_update(paddr_t, paddr_t); Index: arch/xen/x86/xen_pmap.c === RCS file: /cvsroot/src/sys/arch/xen/x86/xen_pmap.c,v retrieving revision 1.7 diff -u -r1.7 xen_pmap.c --- arch/xen/x86/xen_pmap.c 6 Nov 2011 11:40:47 - 1.7 +++ arch/xen/x86/xen_pmap.c 8 Nov 2011 13:35:21 - @@ -142,13 +142,6 @@ #include <xen/hypervisor.h> #endif -/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */ -#if defined(XEN) && defined(__x86_64__) -#define PG_k PG_u -#else -#define PG_k 0 -#endif - #define COUNT(x) /* nothing */ static pd_entry_t * const alternate_pdes[] = APDES_INITIALIZER;
Re: CVS commit: src/sys/arch
On 8 November 2011 15:20, Jean-Yves Migeon jeanyves.mig...@free.fr wrote: On Tue, 8 Nov 2011 08:49:22 +0530, Cherry G. Mathew wrote: Please keep ci_pae_l3_pdir to a uint32_t and back out its paddr_t type. Unless you can convince me that your code will do the right thing(tm), the PDP for PAE should be allocated below the 4GiB physical boundary; unless mistaken, the uvm_km_alloc(kernel_map, ...) does not enforce that. Besides, %cr3 is still a 32 bits entity under PAE. Trying to stash a 64 bits paddr_t in %cr3 is a really bad idea. See comments http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap.c#1588 This doesn't hold true for Xen. See: http://lxr.xensource.com/lxr/source/xen/arch/x86/mm.c#L509 Xen does the 4GB translation for the VM by maintaining its own shadow. I just took the easier route (after initially using the x86 pmap 4GB cr3 code), but if you're thinking of a unified xen/non-xen implementation we'll have to re-think this one. Okay; then all users of cr3 have to be documented (_especially_ the native x86 code), as there are at least two places where there's implicit uint64_t => uint32_t truncation: lcr3() and pcb_cr3. A comment above these should be enough. Same goes for having paddr_t for L3 PD in struct cpu_info. hmm... I wonder if it would be cleaner to do this within a #ifdef XEN/#endif ? - are you sure that these have to be defined(PAE) || defined(__x86_64__) ? That's a crude way of making pmap_cpu_init_late() do nothing for i386 (non-pae) since i386 doesn't need shadow PMDs. - please use for (i = 0; i < PTP_LEVELS - 1; i++ ) { ... }. I will have to identify these blocks later when GENERIC will include both PAE and !PAE pmaps. PTP_LEVELS makes it a lot easier. ok, will watch out for it - do you want to do this one yourself ? Please do :) ok, I'll take flak for further breakage ;-P Cheers, -- ~Cherry
Re: CVS commit: src/sys/arch
On 8 November 2011 19:11, Cherry G. Mathew cherry.g.mat...@gmail.com wrote: On 8 November 2011 05:50, Jean-Yves Migeon jeanyves.mig...@free.fr wrote: On 06.11.2011 16:18, Cherry G. Mathew wrote: Module Name: src Committed By: cherry Date: Sun Nov 6 15:18:19 UTC 2011 Modified Files: src/sys/arch/amd64/include: pmap.h src/sys/arch/i386/include: pmap.h src/sys/arch/x86/include: cpu.h src/sys/arch/x86/x86: pmap.c src/sys/arch/xen/x86: cpu.c x86_xpmap.c Log Message: [merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware. Some comments. ... -/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */ -#if defined(XEN) && defined(__x86_64__) -#define PG_k PG_u -#else -#define PG_k 0 -#endif - Are you sure that all the mapping sites are safe (PT/PD bits), given the pmap split between pmap/xen_xpmap.c? Ok, I realise I've broken the build with this one - apologies. For some odd reason my tree built ok (even after nuking the obj dir). The current bandaid by christos and njoly is incorrect. I propose re-exporting PG_k to x86/include/pmap.h until (if ?) the xen pmap is completely independent of the x86 one.
Please let me know if there are objections to this patch below: Thanks, -- ~Cherry Index: arch/x86/include/pmap.h === RCS file: /cvsroot/src/sys/arch/x86/include/pmap.h,v retrieving revision 1.44 diff -u -r1.44 pmap.h --- arch/x86/include/pmap.h 6 Nov 2011 11:40:47 - 1.44 +++ arch/x86/include/pmap.h 8 Nov 2011 13:35:16 - @@ -173,6 +173,13 @@ ((pmap)->pm_pdirpa[0] + (index) * sizeof(pd_entry_t)) #endif +/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */ +#if defined(XEN) && defined(__x86_64__) +#define PG_k PG_u +#else +#define PG_k 0 +#endif + /* * MD flags that we use for pmap_enter and pmap_kenter_pa: */ Index: arch/x86/x86/pmap.c === RCS file: /cvsroot/src/sys/arch/x86/x86/pmap.c,v retrieving revision 1.140 diff -u -r1.140 pmap.c --- arch/x86/x86/pmap.c 8 Nov 2011 12:44:29 - 1.140 +++ arch/x86/x86/pmap.c 8 Nov 2011 13:35:20 - @@ -211,11 +211,6 @@ #include <xen/hypervisor.h> #endif -/* If this is not needed anymore it should be GC'ed */ -#ifndef PG_k -#define PG_k 0 -#endif - /* * general info: * Index: arch/xen/include/xenpmap.h === RCS file: /cvsroot/src/sys/arch/xen/include/xenpmap.h,v retrieving revision 1.30 diff -u -r1.30 xenpmap.h --- arch/xen/include/xenpmap.h 6 Nov 2011 11:40:47 - 1.30 +++ arch/xen/include/xenpmap.h 8 Nov 2011 13:35:20 - @@ -34,13 +34,6 @@ #include "opt_xen.h" #endif -/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */ -#if defined(XEN) && defined(__x86_64__) -#define PG_k PG_u -#else -#define PG_k 0 -#endif - #define INVALID_P2M_ENTRY (~0UL) void xpq_queue_machphys_update(paddr_t, paddr_t); Index: arch/xen/x86/xen_pmap.c === RCS file: /cvsroot/src/sys/arch/xen/x86/xen_pmap.c,v retrieving revision 1.7 diff -u -r1.7 xen_pmap.c --- arch/xen/x86/xen_pmap.c 6 Nov 2011 11:40:47 - 1.7 +++ arch/xen/x86/xen_pmap.c 8 Nov 2011 13:35:21 - @@ -142,13 +142,6 @@ #include <xen/hypervisor.h> #endif -/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */ -#if defined(XEN) && defined(__x86_64__) -#define PG_k PG_u -#else -#define PG_k 0 -#endif - #define COUNT(x) /* nothing */ static pd_entry_t * const alternate_pdes[] = APDES_INITIALIZER; Committed. Thanks, -- ~Cherry
Re: CVS commit: src/sys/arch
Hi, On 9 November 2011 02:02, Jean-Yves Migeon jeanyves.mig...@free.fr wrote: On 08.11.2011 14:53, Cherry G. Mathew wrote: On 8 November 2011 15:20, Jean-Yves Migeon jeanyves.mig...@free.fr wrote: On Tue, 8 Nov 2011 08:49:22 +0530, Cherry G. Mathew wrote: Please keep ci_pae_l3_pdir to a uint32_t and back out its paddr_t type. Unless you can convince me that your code will do the right thing(tm), the PDP for PAE should be allocated below the 4GiB physical boundary; unless mistaken, the uvm_km_alloc(kernel_map, ...) does not enforce that. Besides, %cr3 is still a 32 bits entity under PAE. Trying to stash a 64 bits paddr_t in %cr3 is a really bad idea. See comments http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap.c#1588 This doesn't hold true for Xen. See: http://lxr.xensource.com/lxr/source/xen/arch/x86/mm.c#L509 Xen does the 4GB translation for the VM by maintaining its own shadow. I just took the easier route (after initially using the x86 pmap 4GB cr3 code), but if you're thinking of a unified xen/non-xen implementation we'll have to re-think this one. Okay; then all users of cr3 have to be documented (_especially_ the native x86 code), as there are at least two places where there's implicit uint64_t => uint32_t truncation: lcr3() and pcb_cr3. A comment above these should be enough. Same goes for having paddr_t for L3 PD in struct cpu_info. hmm... I wonder if it would be cleaner to do this within a #ifdef XEN/#endif ? The cleanest way would be to share the code between x86 and Xen, keep the allocation below 4GiB boundary for both, and use it everywhere in the pmap code. Only the manipulation of the vcpu_guest_context_t ctrlregs members would have to force this use. Fair enough. Although the 4G tests would be a bit deceptive (since they're cosmetic) - I guess leaving a note in the code about the rationale behind this will help. - are you sure that these have to be defined(PAE) || defined(__x86_64__) ?
That's a crude way of making pmap_cpu_init_late() do nothing for i386 (non-pae) since i386 doesn't need shadow PMDs. In that case, test against i386_use_pae (0 == !PAE, 1 == PAE), and simply return if !PAE. Avoid having loonnng macro blocks with different levels of #ifdef. It's fairly difficult to untangle and unpleasant to read. I agree - this looks better. Thanks, -- ~Cherry
Re: CVS commit: src/sys/arch
Hi, On 8 November 2011 05:50, Jean-Yves Migeon jeanyves.mig...@free.fr wrote: On 06.11.2011 16:18, Cherry G. Mathew wrote: Module Name: src Committed By: cherry Date: Sun Nov 6 15:18:19 UTC 2011 Modified Files: src/sys/arch/amd64/include: pmap.h src/sys/arch/i386/include: pmap.h src/sys/arch/x86/include: cpu.h src/sys/arch/x86/x86: pmap.c src/sys/arch/xen/x86: cpu.c x86_xpmap.c Log Message: [merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware. Some comments. -#ifdef PAE -/* Address of the static kernel's L2 page */ -pd_entry_t *pmap_kl2pd; -paddr_t pmap_kl2paddr; -#endif - - [snip] #ifdef PAE - uint32_t ci_pae_l3_pdirpa; /* PA of L3 PD */ + paddr_t ci_pae_l3_pdirpa; /* PA of L3 PD */ pd_entry_t * ci_pae_l3_pdir; /* VA pointer to L3 PD */ #endif [snip] + +#if defined(PAE) + ci->ci_pae_l3_pdir = (paddr_t *)uvm_km_alloc(kernel_map, PAGE_SIZE, 0, + UVM_KMF_WIRED | UVM_KMF_ZERO | UVM_KMF_NOWAIT); + + if (ci->ci_pae_l3_pdir == NULL) { + panic("%s: failed to allocate L3 per-cpu PD for CPU %d\n", + __func__, cpu_index(ci)); + } Please keep ci_pae_l3_pdir to a uint32_t and back out its paddr_t type. Unless you can convince me that your code will do the right thing(tm), the PDP for PAE should be allocated below the 4GiB physical boundary; unless mistaken, the uvm_km_alloc(kernel_map, ...) does not enforce that. Besides, %cr3 is still a 32 bits entity under PAE. Trying to stash a 64 bits paddr_t in %cr3 is a really bad idea. See comments http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap.c#1588 This doesn't hold true for Xen. See: http://lxr.xensource.com/lxr/source/xen/arch/x86/mm.c#L509 Xen does the 4GB translation for the VM by maintaining its own shadow. I just took the easier route (after initially using the x86 pmap 4GB cr3 code), but if you're thinking of a unified xen/non-xen implementation we'll have to re-think this one.
-/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */ -#if defined(XEN) && defined(__x86_64__) -#define PG_k PG_u -#else -#define PG_k 0 -#endif - Are you sure that all the mapping sites are safe (PT/PD bits), given the pmap split between pmap/xen_xpmap.c? No. I haven't reviewed this in that context - however the use of PG_k is exactly the same in xen as it was before this commit. On a side note, please stick to KNF especially for 80 columns. Noted, thanks. [snip] +void +pmap_cpu_init_late(struct cpu_info *ci) +{ +#if defined(PAE) || defined(__x86_64__) + /* + * The BP has already its own PD page allocated during early + * MD startup. + */ + + if (ci == &cpu_info_primary) + return; + + KASSERT(ci != NULL); + ci->ci_pae_l3_pdirpa = vtophys((vaddr_t) ci->ci_pae_l3_pdir); + KASSERT(ci->ci_pae_l3_pdirpa != 0); + + /* Initialise L2 entries 0 - 2: Point them to pmap_kernel() */ + ci->ci_pae_l3_pdir[0] = + xpmap_ptom_masked(pmap_kernel()->pm_pdirpa[0]) | PG_V; + ci->ci_pae_l3_pdir[1] = + xpmap_ptom_masked(pmap_kernel()->pm_pdirpa[1]) | PG_V; + ci->ci_pae_l3_pdir[2] = + xpmap_ptom_masked(pmap_kernel()->pm_pdirpa[2]) | PG_V; +#endif /* PAE */ - are you sure that these have to be defined(PAE) || defined(__x86_64__) ? - please use for (i = 0; i < PTP_LEVELS - 1; i++ ) { ... }. I will have to identify these blocks later when GENERIC will include both PAE and !PAE pmaps. PTP_LEVELS makes it a lot easier. ok, will watch out for it - do you want to do this one yourself ? That's all for the first quick review :) Thanks for starting the merge of your branch. Thanks for this - looking forward to more :-) -- ~Cherry
Re: CVS commit: [cherry-xenmp] src/sys/arch/xen/xen
Hi Manuel, On 23 October 2011 04:12, Manuel Bouyer bou...@netbsd.org wrote: Module Name: src Committed By: bouyer Date: Sat Oct 22 22:42:21 UTC 2011 Modified Files: src/sys/arch/xen/xen [cherry-xenmp]: clock.c Log Message: More improvements: - Precise what tmutex protects - sprinkle some KASSERT(mutex_owned(&tmutex)) - the system time uses the CPU timecounter to get some precision; don't mix local CPU timecounter with VCPU0's time, it won't work well. Always use curcpu()'s time. To generate a diff of this commit: cvs rdiff -u -r1.54.6.6 -r1.54.6.7 src/sys/arch/xen/xen/clock.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. Thanks for this - I'm curious to know if this fixes the clock drift that has been reported on port-xen@ Thanks, -- ~Cherry
Re: CVS commit: [cherry-xenmp] src/sys/arch
On 31/08/2011, Jean-Yves Migeon jeanyves.mig...@free.fr wrote: On 30.08.2011 14:53, Cherry G. Mathew wrote: Module Name: src Committed By:cherry Date:Tue Aug 30 12:53:46 UTC 2011 Modified Files: src/sys/arch/i386/i386 [cherry-xenmp]: machdep.c src/sys/arch/xen/x86 [cherry-xenmp]: cpu.c x86_xpmap.c Log Message: Add per-cpu mmu queues Thanks for looking into it! I do owe you a reply on tech-kern@ :-) Will get to it once I have more data! Thanks, -- ~Cherry
Re: CVS commit: src/sys/arch/xen
Cc:ing tech-kern, to get wider feedback. Thread started here: http://mail-index.netbsd.org/source-changes-d/2011/08/21/msg003897.html JM == Jean-Yves Migeon jeanyves.mig...@free.fr writes: JM On Mon, 22 Aug 2011 12:47:40 +0200, Manuel Bouyer wrote: This is slightly more complicated than it appears. Some of the ops in a per-cpu queue may have ordering dependencies with other cpu queues, and I think this would be hard to express trivially. (an example would be a pte update on one queue, and reading the same pte on another queue - these cases are quite analogous (although completely unrelated)) Hi, So I had a better look at this - implemented per-cpu queues and messed with locking a bit: reads don't go through the xpq queue, do they ? JM Nope, PTEs are directly obtained from the recursive mappings JM (vtopte/kvtopte). Let's call these out-of-band reads. But see below for in-band reads. JM Content is obviously only writable by hypervisor (so it can JM keep control of his mapping alone). I think this is similar to a tlb flush but the other way round, I guess we could use an IPI for this too. JM IIRC that's what the current native x86 code does: it uses an JM IPI to signal other processors that a shootdown is necessary. Xen's TLB_FLUSH operation is synchronous, and doesn't require an IPI (within the domain), which makes the queue ordering even more important (to make sure that stale ptes are not reloaded before the per-cpu queue has made progress). Yes, we can implement a roundabout ipi driven queueflush + tlbflush scheme (described below), but that would be performance sensitive, and the basic issue won't go away, imho. Let's stick to the xpq ops for a second, ignoring out-of-band reads (for which I agree that your assertion, that locking needs to be done at a higher level, holds true).
The question here, really is, what are the global ordering requirements of per-cpu memory op queues, given the following basic ops:

i) write memory (via MMU_NORMAL_PT_UPDATE, MMU_MACHPHYS_UPDATE)

ii) read memory via:
    MMUEXT_PIN_L1_TABLE
    MMUEXT_PIN_L2_TABLE
    MMUEXT_PIN_L3_TABLE
    MMUEXT_PIN_L4_TABLE
    MMUEXT_UNPIN_TABLE
    MMUEXT_NEW_BASEPTR
    MMUEXT_TLB_FLUSH_LOCAL
    MMUEXT_INVLPG_LOCAL
    MMUEXT_TLB_FLUSH_MULTI
    MMUEXT_INVLPG_MULTI
    MMUEXT_TLB_FLUSH_ALL
    MMUEXT_INVLPG_ALL
    MMUEXT_FLUSH_CACHE
    MMUEXT_NEW_USER_BASEPTR
(ie; anything that will cause the processor to re-read data updated via another cpu (via, for eg: pte update with i), above)

There's two ways I can think of fixing this:

a) *before* queueing a local read-op, synchronously flush queues on all other CPUs via ipis. This is slightly racy, but can be done, I think. An optimisation for invlpg could be to implement a scoreboard that watches mem. locations that have been queued for update on any cpu. Scan through the scoreboard for the memory range we're invlpg-ing. If it's not there, there's no need to flush any queues on other cpus.

b) read-ops wait on a global condvar. If it's set, a write-op that needs flushing is pending. Wait (with optional timeout and ipi-nudge) until the remote queue is flushed. When flushing a queue, send a cv_broadcast to any waiters.

Option b) is slightly better than my current scheme which is to lock any and all mmu-ops and operate the queues synchronously (via XENDEBUG_SYNC). I cannot think of anything else, other than ad-hoc locking + queue flushing, which could be hard to maintain and debug in the long run. I'm thinking that it might be easier and more justifiable to nuke the current queue scheme and implement shadow page tables, which would fit more naturally and efficiently with CAS pte updates, etc.
I'm not sure this would completely fix the issue: with shadow page
tables you can't use a CAS to assure atomic operation with the hardware
TLB, as this is, precisely, a shadow PT and not the one used by
hardware.

Definitely worth looking into, imho. I'm not very comfortable with the
queue-based scheme for MP. The CAS doesn't provide any guarantees with
the TLB on native h/w, afaict. If you do a CAS pte update, and the
update succeeded, it's a good idea to invalidate + shootdown anyway
(even on baremetal).

Do let me know your thoughts,

Cheers,
-- 
Cherry
Re: CVS commit: src/sys/arch/xen
Hi Manuel,

Manuel == Manuel Bouyer bou...@antioche.eu.org writes:

[...]

Xen's TLB_FLUSH operation is synchronous, and doesn't require an IPI
(within the domain), which makes the queue ordering even more important
(to make sure that stale ptes are not reloaded before the per-cpu queue
has made progress). Yes, we can implement a roundabout ipi-driven
queueflush + tlbflush scheme (described below), but that would be
performance sensitive, and the basic issue won't go away, imho.

Let's stick to the xpq ops for a second, ignoring out-of-band reads
(for which I agree that your assertion, that locking needs to be done
at a higher level, holds true).

The question here, really, is: what are the global ordering
requirements of per-cpu memory op queues, given the following basic
ops:

i) write memory (via MMU_NORMAL_PT_UPDATE, MMU_MACHPHYS_UPDATE)

ii) read memory via:

   MMUEXT_PIN_L1_TABLE
   MMUEXT_PIN_L2_TABLE
   MMUEXT_PIN_L3_TABLE
   MMUEXT_PIN_L4_TABLE
   MMUEXT_UNPIN_TABLE

Manuel This is when adding/removing a page table from a pmap. When
Manuel this occurs, the pmap is locked, isn't it ?

   MMUEXT_NEW_BASEPTR
   MMUEXT_NEW_USER_BASEPTR

Manuel This is a context switch

   MMUEXT_TLB_FLUSH_LOCAL
   MMUEXT_INVLPG_LOCAL
   MMUEXT_TLB_FLUSH_MULTI
   MMUEXT_INVLPG_MULTI
   MMUEXT_TLB_FLUSH_ALL
   MMUEXT_INVLPG_ALL
   MMUEXT_FLUSH_CACHE

Manuel This may, or may not, cause a read. This usually happens
Manuel after updating the pmap, and I guess this also happens with
Manuel the pmap locked (I have not carefully checked).

Manuel So couldn't we just use the pmap lock for this ? I suspect
Manuel the same lock will also be needed for out-of-band reads at
Manuel some point (right now it's protected by splvm()).

I'm a bit confused now - are we assuming that the pmap lock protects
the (pte update op queue-push(es) + pmap_pte_flush()) as a single
atomic operation (ie; no invlpg/tlbflush or out-of-band read can occur
between the update(s) and the pmap_pte_flush()) ?
If so, I think I've slightly misunderstood the scope of the mmu queue
design - I assumed that the queue is longer-lived, and the sync points
(for the queue flush) can span across pmap locking - a sort of lazy pte
update, with the queue being flushed at out-of-band or in-band read
time (I guess that won't work though - how does one know when the
hardware walks the page table?). It seems that the queue is meant for
pte updates in loops, for e.g., quickly followed by a flush. Is this
correct?

If so, there's just one hazard afaict - the synchronous TLB_FLUSH_MULTI
could beat the race between the queue update and the queue flush via
pmap_tlb_shootnow() (see pmap_tlb.c on the cherry-xenmp branch), and
*if* other CPUs reload their TLBs before the flush, they'll have stale
info. So the important question (rmind@ ?) is: is pmap_tlb_shootnow()
guaranteed to always be called with the pmap lock held ?

In real life, I removed the global xpq_queue_lock() and the pmap was
falling apart. So a bit of debugging ahead.

[...]

I'm thinking that it might be easier and more justifiable to nuke the
current queue scheme and implement shadow page tables, which would fit
more naturally and efficiently with CAS pte updates, etc.

I'm not sure this would completely fix the issue: with shadow page
tables you can't use a CAS to assure atomic operation with the hardware
TLB, as this is, precisely, a shadow PT and not the one used by
hardware.

Definitely worth looking into, imho. I'm not very comfortable with the
queue-based scheme for MP. The CAS doesn't provide any guarantees with
the TLB on native h/w, afaict.

Manuel What makes you think so ? I think the hw TLB also does CAS
Manuel to update referenced and dirty bits in the PTE, otherwise we
Manuel couldn't rely on these bits; this would be bad especially
Manuel for the dirty bit.

Yes, I missed that one (which is much of the point of the CAS in the
first place!), you're right.
If you do a CAS pte update, and the update succeeded, it's a good idea
to invalidate + shootdown anyway (even on baremetal).

Manuel Yes, of course inval is needed after updating the PTE. But
Manuel using a true CAS is important to get the referenced and dirty
Manuel bits right.

I haven't looked into this in detail, but I thought that xen
disconnects the shadow (presumably with correct accessed/dirty bits)
and flushes the TLB when a write to the shadow occurs ? In which case
these bits will be in sync until the next h/w access ? I'm speculating,
I haven't looked at the xen code for this yet.

-- 
Cherry
Re: CVS commit: src/sys/arch/xen
JM == Jean-Yves Migeon jeanyves.mig...@free.fr writes:

JM On 21.08.2011 12:26, Jean-Yves Migeon wrote:

- second, the lock is not placed at the correct abstraction level IMHO,
it is way too high in the caller/callee hierarchy. It should remain
hidden from most consumers of the xpq_queue API, and should only be
used to protect the xpq_queue array together with its counters (and
everything that isn't safe for all memory operations happening in xpq).

I agree - part of the reason I did this was to make sure I didn't mess
with the current queueing scheme before having a working
implementation/PoC.

Reason behind this is that your lock protects calls to hypervisor MMU
operations, which are hypercalls (hence, a slow operation with regard
to kernel). You are serializing lots of memory operations, something
that should not happen from a performance point of view (some may take
a fair amount of cycles to complete, like TLB flushes). I'd expect all
Xen MMU hypercalls to be reentrant.

I agree - it's not meant to be an efficient implementation, just a
correct one.

JM An alternative would be to have per-CPU xpq_queue[] also. This
JM is not completely stupid; xpq_queue is meant as a way to put
JM multiple MMU operations in a queue asynchronously before issuing
JM only one hypercall to handle them all.

This is slightly more complicated than it appears. Some of the ops in a
per-cpu queue may have ordering dependencies with other cpu queues, and
I think this would be hard to express trivially (an example would be a
pte update on one queue, and a read of the same pte on another queue -
these cases are quite analogous (although completely unrelated) to
classic RAW and other ordering dependencies in out-of-order execution
scenarios due to pipelining, etc.).

I'm thinking that it might be easier and more justifiable to nuke the
current queue scheme and implement shadow page tables, which would fit
more naturally and efficiently with CAS pte updates, etc.
Otherwise, we'll have to re-invent scoreboarding or other dynamic scheduling tricks - I think that's a bit overkill :-) Cheers, -- Cherry
Re: CVS commit: src/sys/arch
On 10 August 2011 13:39, Cherry G. Mathew che...@netbsd.org wrote:

Module Name:	src
Committed By:	cherry
Date:		Wed Aug 10 11:39:46 UTC 2011

Modified Files:
	src/sys/arch/amd64/amd64: fpu.c machdep.c
	src/sys/arch/i386/isa: npx.c
	src/sys/arch/x86/x86: x86_machdep.c
	src/sys/arch/xen/conf: files.xen
	src/sys/arch/xen/include: intr.h
Added Files:
	src/sys/arch/xen/x86: xen_ipi.c

Log Message:
xen ipi infrastructure

Hi,

I'd appreciate feedback, specifically on multiuser boot of
i386/conf/XEN3* kernels with this change in place.

Many thanks,
-- 
~Cherry
Re: CVS commit: [cherry-xenmp] src/sys/arch/xen/x86
On 27 June 2011 12:23, Cherry G. Mathew che...@netbsd.org wrote:

Module Name:	src
Committed By:	cherry
Date:		Mon Jun 27 10:23:21 UTC 2011

Modified Files:
	src/sys/arch/xen/x86 [cherry-xenmp]: x86_xpmap.c

Log Message:
Add xpq locking around xpq-QUEUE_TLB_FLUSH()

sorry, that was meant to be:

Add xpq locking around xpq_queue_tlb_flush()

cvs surgery done.

Thanks,
-- 
~Cherry
Re: CVS commit: [cherry-xenmp] src/sys/arch/xen
On 6/26/2011 3:00 PM, Jukka Ruohonen wrote:

On Sun, Jun 26, 2011 at 12:56:33PM +, Cherry G. Mathew wrote:

Module Name:	src
Committed By:	cherry
Date:		Sun Jun 26 12:56:33 UTC 2011

Modified Files:
	src/sys/arch/xen/include [cherry-xenmp]: intr.h
	src/sys/arch/xen/xen [cherry-xenmp]: hypervisor.c

Log Message:
Unbreak uniprocessor build

I've asked this privately a few times, but what -- if any -- is the
rationale of #ifdef MULTIPROCESSOR in 2011 (for x86 MD, at least)?

This has to do with the way the Xen domU kernel probes APs. In the UP
case, one assumes that at least one cpu exists ( :-) ), and the attach
succeeds. In the MP case, there is no way to figure out the configured
number of vcpus, since xenstore is not accessible at that point (see
comments within the file). So we brute-force it, by telling Xen to
bring up as many cpus as it lets us. In the case of domU, Xen will
allow up to the maximum number of vcpus specified in the VM config
file.

The reason I've put hypervisor_vcpu_print() within the #ifdef is that
it needs to be quiet, since there's no way to know if the probe
succeeded until the vcpu_match() function returns.

So in short - it's a hack. I agree that the #ifdef should go if
possible.

Cherry

-- 
Arms, drugs and spirituality – these are the three big businesses in
the world ~ Javed Akhtar
Re: CVS commit: src/sys/arch/xen/xen
On 18 April 2011 04:21, Mindaugas Rasiukevicius rm...@netbsd.org wrote:

Masao Uebayashi uebay...@tombi.co.jp wrote:

On Mon, Apr 18, 2011 at 03:04:32AM +, Mindaugas Rasiukevicius wrote:

Module Name:	src
Committed By:	rmind
Date:		Mon Apr 18 03:04:31 UTC 2011

Modified Files:
	src/sys/arch/xen/xen: balloon.c

Log Message:
balloon_xenbus_attach: use KM_SLEEP for allocation. Note: please do not
use KM_NOSLEEP.

And, according to yamt@, KM_SLEEP can fail in the current design...

IIRC yamt@ fixed it a year or few ago.

And in the more specific immediate context:
http://mail-index.netbsd.org/port-xen/2011/04/07/msg006613.html

-- 
~Cherry
Re: CVS commit: src/sys/arch/xen/xen
On 18 April 2011 08:00, Cherry G. Mathew cherry.g.mat...@gmail.com wrote:

On 18 April 2011 04:21, Mindaugas Rasiukevicius rm...@netbsd.org wrote:

Masao Uebayashi uebay...@tombi.co.jp wrote:

On Mon, Apr 18, 2011 at 03:04:32AM +, Mindaugas Rasiukevicius wrote:

Module Name:	src
Committed By:	rmind
Date:		Mon Apr 18 03:04:31 UTC 2011

Modified Files:
	src/sys/arch/xen/xen: balloon.c

Log Message:
balloon_xenbus_attach: use KM_SLEEP for allocation. Note: please do not
use KM_NOSLEEP.

And, according to yamt@, KM_SLEEP can fail in the current design...

IIRC yamt@ fixed it a year or few ago.

And in the more specific immediate context:
http://mail-index.netbsd.org/port-xen/2011/04/07/msg006613.html

Hi,

PS: Can you please revert, unless it's breaking anything else or you
have a fix for the problem mentioned in the thread above?

Cheers,
-- 
~Cherry
Re: CVS commit: src/sys/arch/xen/xen
On 18 April 2011 09:04, Jean-Yves Migeon jeanyves.mig...@free.fr wrote:

On 18.04.2011 09:23, Cherry G. Mathew wrote:

On 18 April 2011 08:00, Cherry G. Mathew cherry.g.mat...@gmail.com wrote:

On 18 April 2011 04:21, Mindaugas Rasiukevicius rm...@netbsd.org wrote:

Masao Uebayashi uebay...@tombi.co.jp wrote:

On Mon, Apr 18, 2011 at 03:04:32AM +, Mindaugas Rasiukevicius wrote:

Module Name:	src
Committed By:	rmind
Date:		Mon Apr 18 03:04:31 UTC 2011

Modified Files:
	src/sys/arch/xen/xen: balloon.c

Log Message:
balloon_xenbus_attach: use KM_SLEEP for allocation. Note: please do not
use KM_NOSLEEP.

And, according to yamt@, KM_SLEEP can fail in the current design...

IIRC yamt@ fixed it a year or few ago.

And in the more specific immediate context:
http://mail-index.netbsd.org/port-xen/2011/04/07/msg006613.html

PS: Can you please revert ? Unless it's breaking anything else, or you
have a fix for the problem mentioned in the thread above, ie;

Uho, I forgot to mention in my commit log that I fixed it. I am
allocating bpg_entries via pool_cache(9), and the constructor
bpge_ctor() will return an error if uvm(9) fails to find a free page.
In that case, the thread will just bail out and start waiting again.

Ah right, I missed your commit, sorry.

Thanks,
-- 
~Cherry