Re: [ast-users] [patch] vmalloc |mmap(MAP_ANON)| fixes for fragmentation issues (on Solaris) ... / was: Re: Severe performance regression between ksh 2010-03-05 and 2013-10-08
On Mon, Dec 9, 2013 at 11:08 PM, Glenn Fowler wrote:
> On Mon, Dec 9, 2013 at 4:53 PM, Roland Mainz wrote:
[snip]
>> 4. The patch removes one unnecessary |memset(p, 0, size)| which was
>> touching pages and therefore allocating them
>
> if that memset(0) is in vmopen() then i'm not sure it's unnecessary
>
> run these tests to check your patch with different sizes and with/without
> the memset(0)
>
>     bin/package use
>     cd builtin
>     nmake test

Seems to be no problem... and neither valgrind nor Rational Purify
complained.

I think the issue is that a memory page obtained via |mmap(MAP_ANON)|
is zeroed by the system on the first read/write/execute access. AFAIK
this behaviour is required by some standard (POSIX), which would explain
why Linux has this extra |mmap()| flag to turn it off:
-- snip --
MAP_UNINITIALIZED (since Linux 2.6.33)
    Don't clear anonymous pages. This flag is intended to improve
    performance on embedded devices. This flag is only honored if the
    kernel was configured with the CONFIG_MMAP_ALLOW_UNINITIALIZED
    option. Because of the security implications, that option is
    normally enabled only on embedded devices (i.e., devices where one
    has complete control of the contents of user memory).
-- snip --


Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.ma...@nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
___
ast-users mailing list
ast-users@lists.research.att.com
http://lists.research.att.com/mailman/listinfo/ast-users
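
The zero-fill behaviour discussed in the message above can be checked with a few lines of C. This is only a minimal sketch, not code from libast or from the patch: the helper name |anon_pages_are_zero()| is hypothetical, and the sketch assumes a system where |mmap()| accepts |MAP_PRIVATE|MAP_ANON| with a file descriptor of -1 (Linux, Solaris and the BSDs do). It maps a region, reads every byte without any prior |memset()|, and reports whether all of them are zero.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* expose MAP_ANON/MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/*
 * Hypothetical helper (not part of libast): map `size` bytes of
 * anonymous memory and check that every byte reads back as zero
 * without any explicit memset().
 * Returns 1 if all bytes are zero, 0 if a non-zero byte was found,
 * and -1 if mmap() failed.
 */
int anon_pages_are_zero(size_t size)
{
    unsigned char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANON, -1, 0);
    size_t i;
    int all_zero = 1;

    if (p == MAP_FAILED)
        return -1;

    /* The first read of each page is what makes the system back it. */
    for (i = 0; i < size; i++) {
        if (p[i] != 0) {
            all_zero = 0;
            break;
        }
    }

    munmap(p, size);
    return all_zero;
}
```

On a kernel that honours |MAP_UNINITIALIZED| the guarantee only goes away if the caller passes that flag explicitly, which this sketch does not.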
[ast-users] vmalloc memory allocations via shared memory ? / was: Re: [patch] vmalloc |mmap(MAP_ANON)| fixes for fragmentation issues (on Solaris) ... / was: Re: Severe performance regression between
On Mon, Dec 9, 2013 at 10:53 PM, Roland Mainz wrote:
> On Fri, Dec 6, 2013 at 5:40 AM, Glenn Fowler wrote:
>> On Thu, Dec 5, 2013 at 4:50 PM, Irek Szczesniak wrote:
[snip]
> Erm... Solaris (|__SunOS|) was once (pre-vmalloc-rewrite) "exempt"
> from this functionality since it cannot overcommit memory (except if
> someone uses |MAP_NORESERVE| or uses kernel debugging options in
> /etc/system) ...
>
> ... attached (as
> "astksh20131010_vmalloc_sunos_fragmentation_fix001.diff.txt") is a
> patch which...
> 1. ... restores this exception for Solaris
>
> 2. ... bumps the |mmap()| size to 4MB for 32bit processes and 16MB for
> 64bit processes since both values are more or less the points where
> the fragmentation stops. Note that this does *not* mean it will use so
> much memory... it only means that it reserves this amount of memory
> and the real allocation happens on the first read, write or execute
> access of the matching MMU page. This also means there is no
> performance difference between a 1MB |mmap(MAP_ANON)| and a 128MB
> |mmap(MAP_ANON)| since it only reserves memory but does not
> initialise/allocate it yet... this happens the first time it's
> accessed. The other reasons for the 4MB/16MB size were: x86 has 2MB
> large pages, allowing a ksh process to benefit from such pages;
> additionally most AST (including ksh93) applications consume a few MB
> of memory... so there is a good chance that the "typical"
> application/shell memory consumption completely fits into that 4MB
> chunk. 64bit processes get four times as much memory since it's
> expected that they may operate on much larger datasets (and see the
> comment about fragmentation above)
>
> Just to demonstrate "reservation" vs. "real usage" via Solaris pmap:
> -- snip --
> $ ksh -c 'print hello ; pmap -x $$ ; true' | egrep '16384.*anon'
> FD7FFDA0 16384 148 20 - rw--- [ anon ]
> -- snip --
> The test shows that of 16384k only 148k have really been touched...
> the difference (16384-148) is reserved by the shell process but not
> used.
>
> 3. Linux has /proc/sys/vm/overcommit_memory which is either 0 or 1 to
> describe whether the kernel permits overcommitment of memory or not.
> AFAIK a simple function could be written which returns |-1| (does not
> permit overcommitment), |0| (don't know) or |1| (does permit
> overcommitment) ... and if the function returns |-1| vmalloc should do
> the same as on Solaris
>
> 4. The patch removes one unnecessary |memset(p, 0, size)| which was
> touching pages and therefore allocating them

Note that if I use VMALLOC_OPTIONS=getmem=safe with the patch above,
vmalloc seems to resort to trying shared memory:
-- snip --
shmget(IPC_PRIVATE, 67108864, 0600|IPC_CREAT) = 8
brk(0x00603480) = 0
shmat(8, 0, 0600) = 0xFD7FFAA0
shmdt(0xFD7FFAA0) = 0
shmat(8, 0xDFAFFFA83000, 0600) Err#22 EINVAL
shmat(8, 0xEE97FD241000, 0600) Err#22 EINVAL
shmat(8, 0xF7FFFE0BFBE2, 0600) Err#22 EINVAL
shmat(8, 0xFBFFFDC5FB41, 0600) Err#22 EINVAL
shmat(8, 0xFDFFFDA2FAF08000, 0600) Err#22 EINVAL
shmat(8, 0xFEFFFD917AC84000, 0600) Err#22 EINVAL
shmat(8, 0xFF7FFD88BAB42000, 0600) Err#22 EINVAL
shmat(8, 0xFFBFFD845AAA1000, 0600) Err#22 EINVAL
shmat(8, 0xFFDFFD822AA5, 0600) Err#22 EINVAL
shmat(8, 0xFFEFFD8112A28000, 0600) Err#22 EINVAL
shmat(8, 0xFFF7FD8086A14000, 0600) Err#22 EINVAL
shmat(8, 0xFFFBFD8040A0A000, 0600) Err#22 EINVAL
shmat(8, 0xFFFDFD801DA05000, 0600) Err#22 EINVAL
shmat(8, 0xFFFEFD800C202000, 0600) Err#22 EINVAL
shmat(8, 0x7D8003601000, 0600) Err#22 EINVAL
shmat(8, 0xBD7FFF00, 0600) = 0xBD7FFF00
shmdt(0xBD7FFF00) = 0
shmat(8, 0xBD7FFB00, 0600) = 0xBD7FFB00
shmdt(0xBD7FFB00) = 0
shmat(8, 0xBD7FF700, 0600) = 0xBD7FF700
shmdt(0xBD7FF700) = 0
-- snip --
... note that such an allocation is... erm... not wise... because shared
memory is usually a system-wide resource... which means if many shell
processes use shared memory it won't be available for other processes
(like databases) anymore.

Are there any platforms which really have to resort to using shared
memory?


Bye,
Roland
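
Point 3 of the quoted patch description proposes a small probe function with a -1/0/1 result. The following is one possible sketch of it, not the code from the attached patch; both function names are hypothetical. One caveat on the value range: as far as the Linux kernel documentation on overcommit accounting goes, the file holds 0 (heuristic overcommit), 1 (always overcommit) or 2 (strict accounting, no overcommit), so the sketch treats 2 as "does not permit" and 0 or 1 as "does permit". That mapping is an assumption about how the proposal would be applied, and anything unreadable or unexpected is reported as "don't know".

```c
#include <stdio.h>

/*
 * Hypothetical helper: translate a raw overcommit mode into the
 * -1/0/1 convention proposed in the thread.
 *   -1  kernel does not permit overcommitment (mode 2, strict accounting)
 *    0  don't know (unreadable or unexpected value)
 *    1  kernel permits overcommitment (mode 0 heuristic, mode 1 always)
 */
int overcommit_from_mode(int mode)
{
    if (mode == 2)
        return -1;
    if (mode == 0 || mode == 1)
        return 1;
    return 0;
}

/*
 * Hypothetical probe: read /proc/sys/vm/overcommit_memory and classify
 * it. On systems without that file (for example Solaris) the result is
 * 0, i.e. "don't know", and the caller has to decide by other means.
 */
int system_overcommits(void)
{
    FILE *fp = fopen("/proc/sys/vm/overcommit_memory", "r");
    int mode = -1;

    if (fp == NULL)
        return 0;
    if (fscanf(fp, "%d", &mode) != 1)
        mode = -1;
    fclose(fp);
    return overcommit_from_mode(mode);
}
```

A caller following the proposal would take the Solaris code path whenever |system_overcommits()| returns -1.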
Re: [ast-users] [patch] vmalloc |mmap(MAP_ANON)| fixes for fragmentation issues (on Solaris) ... / was: Re: Severe performance regression between ksh 2010-03-05 and 2013-10-08
if that memset(0) is in vmopen() then i'm not sure it's unnecessary

run these tests to check your patch with different sizes and with/without
the memset(0)

    bin/package use
    cd builtin
    nmake test

On Mon, Dec 9, 2013 at 4:53 PM, Roland Mainz wrote:
[snip]
Re: [ast-users] Avoid clearing memory from |mmap(MAP_ANON)| ? / was: Re: Severe performance regression between ksh 2010-03-05 and 2013-10-08
On Sat, Dec 7, 2013 at 4:09 AM, Phong Vo wrote:
> On Fri, Dec 6, 2013 at 10:18 AM, Glenn Fowler wrote:
>> On Thu, Dec 5, 2013 at 6:41 PM, Roland Mainz wrote:
>>> On Wed, Dec 4, 2013 at 3:02 PM, Glenn Fowler wrote:
>>> > On Sun, Dec 1, 2013 at 4:58 PM, Lionel Cons wrote:
[snip]
>>> Just to make it clear: Allocating a 1MB chunk of memory via
>>> |mmap(MAP_ANON)| and a 128MB chunk of memory via |mmap(MAP_ANON)| has
>>> *no* (visible) difference in performance until we touch the pages via
>>> either read/execute or write accesses.
>>> Currently the libast allocator code writes zeros into the whole chunk
>>> of memory obtained via |mmap(MAP_ANON)| which pretty much ruins
>>> performance because *all* pages are created physically instead of just
>>> being some memory marked as "reserved". If libast would stop writing
>>> into memory chunks directly after the |mmap(MAP_ANON)| we could easily
>>> bump the allocation size up to 32MB or better without any performance
>>> penalty...
>
> Vmalloc disciplines do not zero out memory. The only explicit zeroing of
> memory occurs in the call vmresize() and only with the flag VM_RSZERO or in
> the malloc-compatible calloc() call.

Erm... see
http://lists.research.att.com/pipermail/ast-developers/2013q4/003770.html
... there is one in src/lib/libast/vmalloc/vmopen.c ...


Bye,
Roland
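
The "reserved" versus "physically created" distinction in the quoted text above can be observed from C as well. The sketch below is an illustration only, not libast code: the function name is hypothetical, and it assumes a platform that provides |mincore()| with the Linux/glibc signature (the residency vector is |unsigned char *| there; Solaris declares it as |char *|). It maps a region, writes to the first few pages only, and counts how many pages the kernel reports as resident.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* expose MAP_ANON and mincore() on glibc */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/*
 * Hypothetical helper: map `size` bytes anonymously, write one byte
 * into each of the first `touch` pages, and return the number of pages
 * that mincore() reports as resident. Returns -1 on any failure.
 */
long resident_pages_after_touch(size_t size, size_t touch)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    size_t npages, i;
    long resident = 0;
    unsigned char *p, *vec;

    if (pagesize <= 0 || size == 0)
        return -1;
    npages = (size + (size_t)pagesize - 1) / (size_t)pagesize;
    if (touch > npages)
        touch = npages;

    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    vec = malloc(npages);
    if (vec == NULL) {
        munmap(p, size);
        return -1;
    }

    /* Only these writes force the kernel to back pages with memory. */
    for (i = 0; i < touch; i++)
        p[i * (size_t)pagesize] = 1;

    if (mincore(p, size, vec) != 0) {
        free(vec);
        munmap(p, size);
        return -1;
    }
    for (i = 0; i < npages; i++)
        if (vec[i] & 1)
            resident++;

    free(vec);
    munmap(p, size);
    return resident;
}
```

With a 16MB mapping and three touched pages the count is typically far below the total number of pages, although the exact figure depends on the kernel (transparent huge pages, for instance, can make a single touch resident in a larger unit). A |memset()| over the whole region right after the |mmap()| would instead make every page resident.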
[ast-users] [patch] vmalloc |mmap(MAP_ANON)| fixes for fragmentation issues (on Solaris) ... / was: Re: Severe performance regression between ksh 2010-03-05 and 2013-10-08
On Fri, Dec 6, 2013 at 5:40 AM, Glenn Fowler wrote:
> On Thu, Dec 5, 2013 at 4:50 PM, Irek Szczesniak wrote:
>> On Wed, Dec 4, 2013 at 3:02 PM, Glenn Fowler wrote:
>> > On Sun, Dec 1, 2013 at 4:58 PM, Lionel Cons wrote:
>> >> On 1 December 2013 17:26, Glenn Fowler wrote:
>> >> > I believe this is related to vmalloc changes between 2013-05-31 and
>> >> > 2013-06-09
>> >> > re-run the tests with
>> >> >     export VMALLOC_OPTIONS=getmem=safe
>> >> > if that's the problem then it gives a clue on a general solution
>> >> > details after confirmation
>> >>
>> >> timex ~/bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
>> >> while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
>> >> "%s\n" "${a[@]}" ; } ; print "${.sh.version}" ; nanosort yyy'
>> >> Version AIJMP 93v- 2013-10-08
>> >>
>> >> real 34.60
>> >> user 33.27
>> >> sys 1.19
>> >>
>> >> VMALLOC_OPTIONS=getmem=safe timex ~/bin/ksh -c 'function nanosort {
>> >> typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
>> >> a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
>> >> "${.sh.version}" ; nanosort yyy'
>> >> Version AIJMP 93v- 2013-10-08
>> >> real 15.34
>> >> user 14.67
>> >> sys 0.52
>> >>
>> >> So your hunch that VMALLOC_OPTIONS=getmem=safe fixes the problem is
>> >> correct.
>> >>
>> >> What does VMALLOC_OPTIONS=getmem=safe do?
>> >
>> > vmalloc has an internal discipline/method for getting memory from the
>> > system
>> > several methods are available with varying degrees of thread safety etc.
>> > see src/lib/libast/vmalloc/vmdcsystem.c for the code
>> > and src/lib/libast/vmalloc/malloc.c for the latest VMALLOC_OPTIONS
>> > description (vmalloc.3 update shortly)
>> >
>> > ** getmemory=f enable f[,g] getmemory() functions if supported,
>> > **             all by default
>> > **     anon: mmap(MAP_ANON)
>> > **     break|sbrk: sbrk()
>> > **     native: native malloc()
>> > **     safe: safe sbrk() emulation via mmap(MAP_ANON)
>> > **     zero: mmap(/dev/zero)
>> >
>> > i believe the performance regression with "anon" is that on linux
>> > mmap(0MAP_ANON|MAP_PRIVATE...),
>> > which lets the system decide the address, returns adjacent (when
>> > possible) region addresses from highest to lowest order
>> > and the reverse order at minimum tends to fragment more memory
>> > "zero" has the same hi=>lo characteristic
>> > i suspect it adversely affects the vmalloc coalescing algorithm but
>> > have not dug deeper
>> > for now the probe order in vmalloc/vmdcsystem.c was simply changed to
>> > favor "safe"
>>
>> MAP_FIXED should be avoided because it's only there for special
>> purposes like the runtime linker ld.so.1 or debuggers.
>>
>> Using this for a general-purpose memory allocator causes serious problems:
>> 1. On some systems this is a privileged operation and only available
>> for users with root privileges
>>
>> 2. On a SPARC T4 with 256GB and Solaris 11.1 the use of 'safe' degraded
>> the performance from 9 seconds to almost 15 minutes because it utterly
>> destroys the system's concept of large pages. If two MAP_FIXED mappings
>> directly follow each other the system downgrades the page size to the
>> smallest possible size, even trying to break up larger pages, which in
>> turn must be done by a special daemon (vmtasks)
>>
>> 3. MAP_PRIVATE|MAP_FIXED|MAP_ANON may no longer be available in future
>> versions of Solaris
>>
>> 4. Using the 'safe' allocator on SmartOS (Solaris 11 clone) triggers a
>> SEGV:
>> mmap(0xCD800B482000, 1048576, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xCD800B482000
>> sigaction(SIGSEGV, 0xFD7FFFDFDE50, 0xFD7FFFDFDED0) = 0
>> Incurred fault #6, FLTBOUNDS %pc = 0x0052FE06
>> siginfo: SIGSEGV SEGV_MAPERR addr=0xCD800B582000
>> Received signal #11, SIGSEGV [caught]
>> siginfo: SIGSEGV SEGV_MAPERR addr=0xCD800B582000
>> lwp_sigmask(SIG_SETMASK, 0x0400, 0x, 0x,
>> 0x) = 0xFFBFFEFF [0x]
>
> edit src/lib/libast/vmalloc/vmmaddress.c and change
>     #define VMCHKMEM 0
> this affects vmalloc detecting overbooked memory but will disable the
> MAP_FIXED codepath

Erm... Solaris (|__SunOS|) was once (pre-vmalloc-rewrite) "exempt"
from this functionality since it cannot overcommit memory (except if
someone uses |MAP_NORESERVE| or uses kernel debugging options in
/etc/system) ...

... attached (as
"astksh20131010_vmalloc_sunos_fragmentation_fix001.diff.txt") is a
patch which...
1. ... restores this exception for Solaris

2. ... bumps the |mmap()| size to 4MB for 32bit processes and 16MB for
64bit processes since both values are more or less the points where
the fragmentation stops. Note that this does *not* mean it will use so
much memory... it only means t