Re: CVS commit: src/sys/arch
On Tue, 17 Aug 2010 18:18:22 +0100, David Laight wrote: >> Unifying sizes of certain types up to 64-bits would be a good choice >> from engineering standpoint. Actually measuring the overhead would >> answer whether it is worth. > > Why not use a 64bit type that is the union of a 64bit number and two > 32bit numbers. > A kernel that uses 32bit paddr_t would use tha approriate (for the > endianness) 32bit value, having initialised the unused half to zero. > > That would remove almost everything except the structure size overhead Which is precisely the "problem" here :) > while maintaining binary compatibility. I don't think so; modules that manipulates 32 bits bus_addr_t will likely fail with PAE when you cross the 4GB boundary. I don't see how the union could solve that. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On Tue, Aug 17, 2010 at 04:08:10PM +0100, Mindaugas Rasiukevicius wrote: > > Unifying sizes of certain types up to 64-bits would be a good choice > from engineering standpoint. Actually measuring the overhead would > answer whether it is worth. Why not use a 64bit type that is the union of a 64bit number and two 32bit numbers. A kernel that uses 32bit paddr_t would use tha approriate (for the endianness) 32bit value, having initialised the unused half to zero. That would remove almost everything except the structure size overhead while maintaining binary compatibility. (Embedding the integer type in a struct/union also makes any places where it is used incorrectly rather more obvious.) David -- David Laight: da...@l8s.co.uk
Re: CVS commit: src/sys/arch
On Tue, 17 Aug 2010 14:26:53 +, Andrew Doran wrote: > Hi, > > Why are any types other than in the pmap different between PAE and !PAE? > > When this was originally proposed I asked for stuff like paddr_t to be 64 > bits no matter what the kernel compile settings where. > > Thanks. I need more time! Let's do this /gradually/. There are already regressions with PAE that needs fixing before moving further, especially "bugs" that people reported to me I cannot reproduce with my hosts (a.out breakage with Xen PAE + NX, and invalid TLB entries that only happens 'randomly' with some AMD 64 Athlon CPUs). That, and also: - rmind@ branch work on uvm/pmap. Such a change will interfere with his work. - the issue is wider, bus_addr_t will move to "all 64 bits" too (performance impact?) - p{d,t}_entry_t vs paddr_t size mix up in pmap, which is bad in the !PAE case (32 vs 64 bits). Breaking i386 pmap will not be particularly good for my life expectancy. - I want to merge xensuspend before entering another bug hunting session :) One at a time is enough :) - this is orthogonal to the kvm(3) issue: having 64 bits in-kernel paddr_t will not automagically solve the "all PAs are unsigned long" assumption of kvm(3). -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/kern
Juergen Hannken-Illjes wrote: > > > > <...> > > > > > > > > It's not obvious from your commit message what prevents it from > > > > gaining a reference after the lock is dropped. > > > > > > The interlock is taken before the freelist lock is dropped and vnodes > > > on the free lists should never appear on other lists. > > > > Not even the name cache? > > > > (I can't remember off the top of my head, but IIRC there's something > > sneaky about it) > > Name cache does vtryget() then vget(). As the usecount is zero, vtryget() > fails and vget() gets called with the interlock held and needs the > freelist lock to proceed. > > I still don't see a race here. It would be quite cool to expand the comment in the top of vfs_subr.c, with sections like "Vnode reclamation mechanism", "Vnode life-cycle", etc and thus document our knowledge. Now, each time we re-read and re-discover the logic and potential corner cases. And, I believe, not all corner cases. Just thinking out load. :) -- Mindaugas
Re: CVS commit: src/sys/arch
"Christoph Egger" wrote: > > > Hi, > > > > Why are any types other than in the pmap different between > > PAE and !PAE? > > > > When this was originally proposed I asked for stuff like > > paddr_t to be 64 bits no matter what the kernel compile > > settings where. > > Interesting to hear that from your side. When asked for this > a debate about overhead started. Unifying sizes of certain types up to 64-bits would be a good choice from engineering standpoint. Actually measuring the overhead would answer whether it is worth. My bet is that the overhead would be quite low on modern x86 machines. > > Christoph -- Mindaugas
Re: CVS commit: src/sys/arch
> Hi, > > Why are any types other than in the pmap different between > PAE and !PAE? > > When this was originally proposed I asked for stuff like > paddr_t to be 64 bits no matter what the kernel compile > settings where. Interesting to hear that from your side. When asked for this a debate about overhead started. Christoph > Thanks. > > > On Mon, Aug 16, 2010 at 10:45:10PM +0200, Jean-Yves Migeon wrote: > > On 16.08.2010 22:22, matthew green wrote: > > >> Module Name: src > > >> Committed By:jym > > >> Date:Mon Aug 16 19:39:06 UTC 2010 > > >> > > >> Modified Files: > > >> src/sys/arch/i386/i386: machdep.c > > >> src/sys/arch/x86/include: cpu.h > > >> > > >> Log Message: > > >> Add machdep.pae sysctl(7) for i386. Thanks to Paul and Joerg for > their > > >> reviews. > > >> > > >> In kernel, it matches the 'i386_use_pae' variable (0: kernel does not > use > > >> PAE, 1: kernel uses PAE). Will be used by i386 kvm(3) to know the > functions > > >> that should get called for VA => PA translations. > > > > > > does this work for core files as well? > > > > > > if not, wouldn't this feature be better done use ugly kvm/nlist so > > > that it works with the same code on dead kernels? getting a 0/1 > > > value via kvm shouldn't really be considered any real problem. > > > > That's the purpose; I don't know of any other way. The sysctl is only > > there for practical purposes: cpuctl(8) can indicate that the CPU > > supports PAE, but there is no easy way to know if it is active or not > > (unless playing with config -x together with the booted kernel, not very > > practical...). Hence, the sysctl machdep. > > > > > (why the extern in cpu.h? i doesn't seem necessary.) > > > > Yes; I wanted to place it at the same level as i386_use_fxsave. > > i386_use_pae may get use elsewhere eventually, so I added it to cpu.h. > > > > -- > > Jean-Yves Migeon > > jeanyves.mig...@free.fr
Re: CVS commit: src/sys/kern
On Tue Aug 17 2010 at 16:26:10 +0200, Juergen Hannken-Illjes wrote: > > > > It's not obvious from your commit message what prevents it from gaining > > > > a reference after the lock is dropped. > > > > > > The interlock is taken before the freelist lock is dropped and vnodes on > > > the free lists should never appear on other lists. > > > > Not even the name cache? > > > > (I can't remember off the top of my head, but IIRC there's something > > sneaky about it) > > Name cache does vtryget() then vget(). As the usecount is zero, vtryget() > fails and vget() gets called with the interlock held and needs the freelist > lock to proceed. > > I still don't see a race here. Ok, it wasn't clear what you meant by "other lists". Your commit message makes sense now.
Re: CVS commit: src/sys/kern
On Tue, Aug 17, 2010 at 04:45:36PM +0300, Antti Kantee wrote: > On Tue Aug 17 2010 at 15:41:11 +0200, Juergen Hannken-Illjes wrote: > > > > vp->v_freelisthd = NULL; > > > > mutex_exit(&vnode_free_list_lock); > > > > > > > > - if (vp->v_usecount != 0) { > > > > - /* > > > > -* was referenced again before we got the interlock > > > > -* Don't return to freelist - the holder of the last > > > > -* reference will destroy it. > > > > -*/ > > > > - mutex_exit(&vp->v_interlock); > > > > - mutex_enter(&vnode_free_list_lock); > > > > - goto retry; > > > > - } > > > > + KASSERT(vp->v_usecount == 0); > > > > > > It's not obvious from your commit message what prevents it from gaining > > > a reference after the lock is dropped. > > > > The interlock is taken before the freelist lock is dropped and vnodes on > > the free lists should never appear on other lists. > > Not even the name cache? > > (I can't remember off the top of my head, but IIRC there's something > sneaky about it) Name cache does vtryget() then vget(). As the usecount is zero, vtryget() fails and vget() gets called with the interlock held and needs the freelist lock to proceed. I still don't see a race here. -- Juergen Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: CVS commit: src/sys/arch
Hi, Why are any types other than in the pmap different between PAE and !PAE? When this was originally proposed I asked for stuff like paddr_t to be 64 bits no matter what the kernel compile settings where. Thanks. On Mon, Aug 16, 2010 at 10:45:10PM +0200, Jean-Yves Migeon wrote: > On 16.08.2010 22:22, matthew green wrote: > >> Module Name: src > >> Committed By: jym > >> Date: Mon Aug 16 19:39:06 UTC 2010 > >> > >> Modified Files: > >>src/sys/arch/i386/i386: machdep.c > >>src/sys/arch/x86/include: cpu.h > >> > >> Log Message: > >> Add machdep.pae sysctl(7) for i386. Thanks to Paul and Joerg for their > >> reviews. > >> > >> In kernel, it matches the 'i386_use_pae' variable (0: kernel does not use > >> PAE, 1: kernel uses PAE). Will be used by i386 kvm(3) to know the functions > >> that should get called for VA => PA translations. > > > > does this work for core files as well? > > > > if not, wouldn't this feature be better done use ugly kvm/nlist so > > that it works with the same code on dead kernels? getting a 0/1 > > value via kvm shouldn't really be considered any real problem. > > That's the purpose; I don't know of any other way. The sysctl is only > there for practical purposes: cpuctl(8) can indicate that the CPU > supports PAE, but there is no easy way to know if it is active or not > (unless playing with config -x together with the booted kernel, not very > practical...). Hence, the sysctl machdep. > > > (why the extern in cpu.h? i doesn't seem necessary.) > > Yes; I wanted to place it at the same level as i386_use_fxsave. > i386_use_pae may get use elsewhere eventually, so I added it to cpu.h. > > -- > Jean-Yves Migeon > jeanyves.mig...@free.fr
Re: CVS commit: src/sys/kern
On Tue, Aug 17, 2010 at 04:45:36PM +0300, Antti Kantee wrote: > On Tue Aug 17 2010 at 15:41:11 +0200, Juergen Hannken-Illjes wrote: > > > > vp->v_freelisthd = NULL; > > > > mutex_exit(&vnode_free_list_lock); > > > > > > > > - if (vp->v_usecount != 0) { > > > > - /* > > > > -* was referenced again before we got the interlock > > > > -* Don't return to freelist - the holder of the last > > > > -* reference will destroy it. > > > > -*/ > > > > - mutex_exit(&vp->v_interlock); > > > > - mutex_enter(&vnode_free_list_lock); > > > > - goto retry; > > > > - } > > > > + KASSERT(vp->v_usecount == 0); > > > > > > It's not obvious from your commit message what prevents it from gaining > > > a reference after the lock is dropped. > > > > The interlock is taken before the freelist lock is dropped and vnodes on > > the free lists should never appear on other lists. > > Not even the name cache? > > (I can't remember off the top of my head, but IIRC there's something > sneaky about it) They can still appear on mount lists I think (it has been a while).. Are these changes in diff format anywhere?
Re: CVS commit: src/sys/kern
On Tue Aug 17 2010 at 15:41:11 +0200, Juergen Hannken-Illjes wrote: > > > vp->v_freelisthd = NULL; > > > mutex_exit(&vnode_free_list_lock); > > > > > > - if (vp->v_usecount != 0) { > > > - /* > > > - * was referenced again before we got the interlock > > > - * Don't return to freelist - the holder of the last > > > - * reference will destroy it. > > > - */ > > > - mutex_exit(&vp->v_interlock); > > > - mutex_enter(&vnode_free_list_lock); > > > - goto retry; > > > - } > > > + KASSERT(vp->v_usecount == 0); > > > > It's not obvious from your commit message what prevents it from gaining > > a reference after the lock is dropped. > > The interlock is taken before the freelist lock is dropped and vnodes on > the free lists should never appear on other lists. Not even the name cache? (I can't remember off the top of my head, but IIRC there's something sneaky about it)
Re: CVS commit: src/sys/kern
On Tue, Aug 17, 2010 at 04:35:31PM +0300, Antti Kantee wrote: > On Tue Aug 17 2010 at 13:17:48 +, Juergen Hannken-Illjes wrote: > > Module Name:src > > Committed By: hannken > > Date: Tue Aug 17 13:17:48 UTC 2010 > > > > Modified Files: > > src/sys/kern: vfs_subr.c > > > > Log Message: > > Now that ffs on disk inodes get freed in the reclaim routine it is no longer > > necessary for vget() to handle VI_INACTNOW as a special case. Remove this > > check and its support in vrelel(). > > Hi, > > I didn't fully review your changes, so just to make sure: it's still > possible to gain a reference during inactive, right? Yes. > > Getting another reference while the freelist is locked is an error. Replace > > the check with a KASSERT. > > > > > vp->v_freelisthd = NULL; > > mutex_exit(&vnode_free_list_lock); > > > > - if (vp->v_usecount != 0) { > > - /* > > -* was referenced again before we got the interlock > > -* Don't return to freelist - the holder of the last > > -* reference will destroy it. > > -*/ > > - mutex_exit(&vp->v_interlock); > > - mutex_enter(&vnode_free_list_lock); > > - goto retry; > > - } > > + KASSERT(vp->v_usecount == 0); > > It's not obvious from your commit message what prevents it from gaining > a reference after the lock is dropped. The interlock is taken before the freelist lock is dropped and vnodes on the free lists should never appear on other lists. -- Juergen Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: CVS commit: src/sys/kern
On Tue Aug 17 2010 at 13:17:48 +, Juergen Hannken-Illjes wrote: > Module Name: src > Committed By: hannken > Date: Tue Aug 17 13:17:48 UTC 2010 > > Modified Files: > src/sys/kern: vfs_subr.c > > Log Message: > Now that ffs on disk inodes get freed in the reclaim routine it is no longer > necessary for vget() to handle VI_INACTNOW as a special case. Remove this > check and its support in vrelel(). Hi, I didn't fully review your changes, so just to make sure: it's still possible to gain a reference during inactive, right? > Getting another reference while the freelist is locked is an error. Replace > the check with a KASSERT. > > vp->v_freelisthd = NULL; > mutex_exit(&vnode_free_list_lock); > > - if (vp->v_usecount != 0) { > - /* > - * was referenced again before we got the interlock > - * Don't return to freelist - the holder of the last > - * reference will destroy it. > - */ > - mutex_exit(&vp->v_interlock); > - mutex_enter(&vnode_free_list_lock); > - goto retry; > - } > + KASSERT(vp->v_usecount == 0); It's not obvious from your commit message what prevents it from gaining a reference after the lock is dropped.
Re: CVS commit: src/sys/arch
On Tue, Aug 17, 2010 at 12:57:21AM +0200, Jean-Yves Migeon wrote: > - kvm(3) API will have to move from using 'unsigned long' to 'paddr_t'. I > am currently checking that for all architectures except i386, > sizeof(paddr_t) == sizeof(u_long). No. The kvm(3) API should be uint64_t or similar exclusively. Joerg
re: CVS commit: src/sys/arch
> >> am currently checking that for all architectures except i386, > >> sizeof(paddr_t) == sizeof(u_long). > > > > sparc and 32 bit sparc64 kernels use 64 bit paddr_t. there are some > > mips ports that do too. you can't assume the above equality at all, > > not since "paddr_t" was introduced a decade+ ago. > > Eeek. I can't see how I can fix that then. Would it be acceptable to say > "the affected functions are private to kvm(3), so don't bother with this > anyway?" kvm is about as special case as we get in userland. > >> - i386 will use 64 bits paddr_t for kvm(3), so we will have to discuss > on > z> how to make it work without too much fuzz with pre-64 bits i386 dumps. > > > > i think this seems the sanest way. make userland see the 64 bit > > definition in all cases, and have the libkvm code DTRT. > > Thing is, kvm(3) has two functions (kvm_kvatop and kvm_pa2off) that use > PAs internally and assume they are unsigned long. I can't move i386 to 64 > bits PA without having to tweak kvm_private.h prototypes, which will affect > all ports in turn. > > IMHO, the path of least resistance would be to move to paddr_t. i think that this sounds sane. > It is only there for convenience, eg. when someone wants to know easily if > its system is running under PAE. ok. .mrg.
re: CVS commit: src/sys/arch
On Tue, 17 Aug 2010 09:59:54 +1000, matthew green wrote: >> For kernel core dumps, at the present time, no; this needs thinking, and >> testing: >> - kvm(3) API will have to move from using 'unsigned long' to 'paddr_t'. I >> am currently checking that for all architectures except i386, >> sizeof(paddr_t) == sizeof(u_long). > > sparc and 32 bit sparc64 kernels use 64 bit paddr_t. there are some > mips ports that do too. you can't assume the above equality at all, > not since "paddr_t" was introduced a decade+ ago. Eeek. I can't see how I can fix that then. Would it be acceptable to say "the affected functions are private to kvm(3), so don't bother with this anyway?" >> - i386 will use 64 bits paddr_t for kvm(3), so we will have to discuss on z> how to make it work without too much fuzz with pre-64 bits i386 dumps. > > i think this seems the sanest way. make userland see the 64 bit > definition in all cases, and have the libkvm code DTRT. Thing is, kvm(3) has two functions (kvm_kvatop and kvm_pa2off) that use PAs internally and assume they are unsigned long. I can't move i386 to 64 bits PA without having to tweak kvm_private.h prototypes, which will affect all ports in turn. IMHO, the path of least resistance would be to move to paddr_t. >> - sparse dumps for memory above 4GB. >> >> Eventually, PAE and !PAE kernels core files should be handled by kvm, if >> that's what you are asking. > > right. i was suggesting that if you simply used nlist(3) to find out the > value of "i386_use_pae" then the same code will work for live debugging > as well as core dumps. > > since core file debugging basically requires this method, and it works > just as well for live debugging, the sysctl seems like something we don't > need to do. I insist: the sysctl(7) was never meant to be used for checking whether PAE is enabled for "live" kvm(3) session. I admit, my commit message is not very clear: the " Will be used by i386 kvm(3) to know the functions that should [...]" sentence applies to i386_use_pae, not machdep.pae. It is only there for convenience, eg. when someone wants to know easily if its system is running under PAE. -- Jean-Yves Migeon jeanyves.mig...@free.fr