Re: [ast-users] [patch] vmalloc |mmap(MAP_ANON)| fixes for fragmentation issues (on Solaris) ... / was: Re: Severe performance regression between ksh 2010-03-05 and 2013-10-08

2013-12-09 Thread Roland Mainz
On Mon, Dec 9, 2013 at 11:08 PM, Glenn Fowler  wrote:
> On Mon, Dec 9, 2013 at 4:53 PM, Roland Mainz 
> wrote:
[snip]
>> 4. The patch removes one unnecessary |memset(p, 0, size)| which was
>> touching pages and therefore allocating them
>
> if that memset(0) is in vmopen() then im not sure its unnecessary
>
> run these tests to check your patch with different sizes and with/without
> the memset(0)
>
> bin/package use
> cd builtin
> nmake test

Seems to be no problem... and neither valgrind nor Rational Purify
complained. I think the reason is that a memory page obtained via
|mmap(MAP_ANON)| is zeroed by the system on the first
read/write/execute access.

This behaviour is AFAIK defined by some standard (POSIX) because Linux
has this extra |mmap()| flag:
-- snip --
   MAP_UNINITIALIZED (since Linux 2.6.33)
          Don't clear anonymous pages.  This flag is intended to
          improve performance on embedded devices.  This flag is only
          honored if the kernel was configured with the
          CONFIG_MMAP_ALLOW_UNINITIALIZED option.  Because of the
          security implications, that option is normally enabled only
          on embedded devices (i.e., devices where one has complete
          control of the contents of user memory).
-- snip --
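
A minimal C sketch of the point above, assuming a Linux/glibc-style
|mmap()| with |MAP_ANON| (or |MAP_ANONYMOUS|): it maps anonymous memory
and checks that the region already reads as zero without any
|memset()|. The helper names |anon_map()| and |region_is_zero()| are
made up for illustration and are not part of vmalloc.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANON/MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Hypothetical helper: map `size` bytes of private anonymous memory.
 * Returns NULL on failure. */
static void *anon_map(size_t size)
{
    void *p;

    assert(size > 0);   /* a zero-length mmap() is an error anyway */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: return 1 if every byte in [p, p+size) reads
 * as zero, else 0. */
static int region_is_zero(const void *p, size_t size)
{
    const unsigned char *b = p;
    size_t i;

    for (i = 0; i < size; i++)
        if (b[i] != 0)
            return 0;
    return 1;
}
```

Note that the scan in |region_is_zero()| itself reads every page, so it
is only useful as a test and not something an allocator would want to
do on each new chunk.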



Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.ma...@nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
___
ast-users mailing list
ast-users@lists.research.att.com
http://lists.research.att.com/mailman/listinfo/ast-users


[ast-users] vmalloc memory allocations via shared memory ? / was: Re: [patch] vmalloc |mmap(MAP_ANON)| fixes for fragmentation issues (on Solaris) ... / was: Re: Severe performance regression between

2013-12-09 Thread Roland Mainz
On Mon, Dec 9, 2013 at 10:53 PM, Roland Mainz  wrote:
> On Fri, Dec 6, 2013 at 5:40 AM, Glenn Fowler  wrote:
>> On Thu, Dec 5, 2013 at 4:50 PM, Irek Szczesniak 
>> wrote:
[snip]
> Erm... Solaris (|__SunOS|) was once (pre-vmalloc-rewrite) "exempt"
> from this functionality since it cannot overcommit memory (except if
> someone uses |MAP_NORESERVE| or uses kernel debugging options in
> /etc/system) ...
>
> ... attached (as
> "astksh20131010_vmalloc_sunos_fragmentation_fix001.diff.txt") is a
> patch which...
> 1. ... restores this exception for Solaris
>
> 2. ... bumps the |mmap()| size to 4MB for 32bit processes and 16MB for
> 64bit processes since both values are more or less the points where
> the fragmentation stops. Note that this does *not* mean it will use so
> much memory... it only means that it reserves this amount of memory
> and the real allocation happens on the first read, write or execute
> access of the matching MMU page. This also means there is no
> performance difference between a 1MB |mmap(MAP_ANON)| and a 128MB
> |mmap(MAP_ANON)| since it only reserves memory but does not
> initialise/allocate it yet... this happens the first time it's
> accessed. The other reasons for the 4MB/16MB size were: x86 has 2MB
> large pages, allowing a ksh process to benefit from such pages;
> additionally, most AST (including ksh93) applications consume a few MB
> of memory... so there is a good chance that the "typical"
> application/shell memory consumption completely fits into that 4MB
> chunk. 64bit processes get four times as much memory since it's
> expected that they may operate on much larger datasets (and see the
> comment about fragmentation above)
>
> Just to demonstrate "reservation" vs. "real usage" via Solaris pmap:
> -- snip --
> $ ksh -c 'print hello ; pmap -x $$ ; true' | egrep '16384.*anon'
> FD7FFDA0  16384    148     20      - rw---    [ anon ]
> -- snip --
> The test shows that of 16384k only 148k have really been touched...
> the difference (16384-148) is reserved by the shell process but not
> used.
>
> 3. Linux has /proc/sys/vm/overcommit_memory, which is 0 (heuristic
> overcommit), 1 (always overcommit) or 2 (strict accounting, no
> overcommit) and so describes whether the kernel permits overcommitment
> of memory. AFAIK a simple function could be written which returns |-1|
> (does not permit overcommitment), |0| (don't know) or |1| (does permit
> overcommitment) ... and if the function returns |-1| vmalloc should do
> the same as on Solaris
>
> 4. The patch removes one unnecessary |memset(p, 0, size)| which was
> touching pages and therefore allocating them

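The overcommit check suggested in item 3 of the quoted text could be
sketched roughly as below. This is only an illustration under stated
assumptions: the function names are hypothetical, the file is assumed
to hold one of the Linux modes 0 (heuristic), 1 (always overcommit) or
2 (strict accounting), and treating the heuristic mode 0 as "permits
overcommitment" is my own reading, not something the patch defines.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: translate a Linux overcommit_memory mode into
 * the tri-state from the text:
 *   -1 = does not permit overcommitment
 *    0 = don't know
 *    1 = permits overcommitment */
static int classify_overcommit_mode(int mode)
{
    switch (mode) {
    case 0:             /* heuristic overcommit (assumed "permits") */
    case 1:             /* always overcommit */
        return 1;
    case 2:             /* strict accounting, no overcommit */
        return -1;
    default:            /* unknown value */
        return 0;
    }
}

/* Hypothetical helper: read the mode from procfs; returns 0 ("don't
 * know") when the file is missing or unparsable, e.g. on non-Linux
 * systems. */
static int overcommit_policy(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    int mode;
    int rc = 0;

    if (f == NULL)
        return 0;
    if (fscanf(f, "%d", &mode) == 1)
        rc = classify_overcommit_mode(mode);
    fclose(f);
    assert(rc >= -1 && rc <= 1);
    return rc;
}
```

A caller would then take the Solaris-style code path whenever
|overcommit_policy()| returns |-1|.
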
Note that if I use VMALLOC_OPTIONS=getmem=safe with the patch above
vmalloc seems to fall back to trying shared memory:
-- snip --
shmget(IPC_PRIVATE, 67108864, 0600|IPC_CREAT)   = 8
brk(0x00603480) = 0
shmat(8, 0, 0600)   = 0xFD7FFAA0
shmdt(0xFD7FFAA0)   = 0
shmat(8, 0xDFAFFFA83000, 0600)  Err#22 EINVAL
shmat(8, 0xEE97FD241000, 0600)  Err#22 EINVAL
shmat(8, 0xF7FFFE0BFBE2, 0600)  Err#22 EINVAL
shmat(8, 0xFBFFFDC5FB41, 0600)  Err#22 EINVAL
shmat(8, 0xFDFFFDA2FAF08000, 0600)  Err#22 EINVAL
shmat(8, 0xFEFFFD917AC84000, 0600)  Err#22 EINVAL
shmat(8, 0xFF7FFD88BAB42000, 0600)  Err#22 EINVAL
shmat(8, 0xFFBFFD845AAA1000, 0600)  Err#22 EINVAL
shmat(8, 0xFFDFFD822AA5, 0600)  Err#22 EINVAL
shmat(8, 0xFFEFFD8112A28000, 0600)  Err#22 EINVAL
shmat(8, 0xFFF7FD8086A14000, 0600)  Err#22 EINVAL
shmat(8, 0xFFFBFD8040A0A000, 0600)  Err#22 EINVAL
shmat(8, 0xFFFDFD801DA05000, 0600)  Err#22 EINVAL
shmat(8, 0xFFFEFD800C202000, 0600)  Err#22 EINVAL
shmat(8, 0x7D8003601000, 0600)  Err#22 EINVAL
shmat(8, 0xBD7FFF00, 0600)  = 0xBD7FFF00
shmdt(0xBD7FFF00)   = 0
shmat(8, 0xBD7FFB00, 0600)  = 0xBD7FFB00
shmdt(0xBD7FFB00)   = 0
shmat(8, 0xBD7FF700, 0600)  = 0xBD7FF700
shmdt(0xBD7FF700)   = 0
-- snip --
... note that such an allocation is... erm... not wise... because
shared memory is usually a resource which is limited system-wide...
which means that if many shell processes use shared memory it won't be
available for other processes (like databases) anymore.

Are there any platforms which really have to resort to use shared memory ?



Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.ma...@nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)


Re: [ast-users] [patch] vmalloc |mmap(MAP_ANON)| fixes for fragmentation issues (on Solaris) ... / was: Re: Severe performance regression between ksh 2010-03-05 and 2013-10-08

2013-12-09 Thread Glenn Fowler
if that memset(0) is in vmopen() then im not sure its unnecessary

run these tests to check your patch with different sizes and with/without
the memset(0)

bin/package use
cd builtin
nmake test


On Mon, Dec 9, 2013 at 4:53 PM, Roland Mainz wrote:

> On Fri, Dec 6, 2013 at 5:40 AM, Glenn Fowler 
> wrote:
> > On Thu, Dec 5, 2013 at 4:50 PM, Irek Szczesniak 
> > wrote:
> >>
> >> On Wed, Dec 4, 2013 at 3:02 PM, Glenn Fowler 
> >> wrote:
> >> > On Sun, Dec 1, 2013 at 4:58 PM, Lionel Cons
> >> > wrote:
> >> >>
> >> >> On 1 December 2013 17:26, Glenn Fowler 
> >> >> wrote:
> >> >> > I believe this is related to vmalloc changes between 2013-05-31 and
> >> >> > 2013-06-09
> >> >> > re-run the tests with
> >> >> > export VMALLOC_OPTIONS=getmem=safe
> >> >> > if that's the problem then it gives a clue on a general solution
> >> >> > details after confirmation
> >> >> >
> >> >>
> >> >> timex ~/bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
> >> >> while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
> >> >> "%s\n" "${a[@]}" ; } ; print "${.sh.version}" ; nanosort yyy'
> >> >> Version AIJMP 93v- 2013-10-08
> >> >>
> >> >> real  34.60
> >> >> user  33.27
> >> >> sys    1.19
> >> >>
> >> >> VMALLOC_OPTIONS=getmem=safe timex ~/bin/ksh -c 'function nanosort {
> >> >> typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
> >> >> a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
> >> >> "${.sh.version}" ; nanosort yyy'
> >> >> Version AIJMP 93v- 2013-10-08
> >> >> real  15.34
> >> >> user  14.67
> >> >> sys    0.52
> >> >>
> >> >> So your hunch that VMALLOC_OPTIONS=getmem=safe fixes the problem is
> >> >> correct.
> >> >>
> >> >> What does VMALLOC_OPTIONS=getmem=safe do?
> >> >
> >> >
> >> > vmalloc has an internal discipline/method for getting memory from the
> >> > system
> >> > several methods are available with varying degrees of thread safety etc.
> >> > see src/lib/libast/vmalloc/vmdcsystem.c for the code
> >> > and src/lib/libast/vmalloc/malloc.c for the latest VMALLOC_OPTIONS
> >> > description (vmalloc.3 update shortly)
> >> >
> >> > **  getmemory=f   enable f[,g] getmemory() functions if supported,
> >> > **                all by default
> >> > **      anon:        mmap(MAP_ANON)
> >> > **      break|sbrk:  sbrk()
> >> > **      native:      native malloc()
> >> > **      safe:        safe sbrk() emulation via mmap(MAP_ANON)
> >> > **      zero:        mmap(/dev/zero)
> >> >
> >> > i believe the performance regression with "anon" is that on linux
> >> > mmap(0, ... MAP_ANON|MAP_PRIVATE ...), which lets the system decide
> >> > the address, returns adjacent (when possible) region addresses from
> >> > highest to lowest order
> >> > and the reverse order at minimum tends to fragment more memory
> >> > "zero" has the same hi=>lo characteristic
> >> > i suspect it adversely affects the vmalloc coalescing algorithm but
> >> > have not dug deeper
> >> > for now the probe order in vmalloc/vmdcsystem.c was simply changed
> >> > to favor "safe"
> >>
> >> MAP_FIXED should be avoided because it's only there for special
> >> purposes like the runtime linker ld.so.1 or debuggers.
> >>
> >> Using this for a general-purpose memory allocator causes serious problems:
> >> 1. On some systems this is a privileged operation and only available
> >> for users with root privileges
> >>
> >> 2. On a SPARC T4 with 256GB and Solaris 11.1 the use of 'safe' degraded
> >> the performance from 9 seconds to almost 15 minutes because it utterly
> >> destroys the system's concept of large pages. If two MAP_FIXED mappings
> >> directly follow each other the system downgrades the page size to the
> >> smallest possible size, even trying to break up larger pages, which in
> >> turn must be done by a special daemon (vmtasks)
> >>
> >> 3. MAP_PRIVATE|MAP_FIXED|MAP_ANON may no longer be available in future
> >> versions of Solaris
> >>
> >> 4. Using the 'safe' allocator on SmartOS (solaris 11 clone) triggers a
> >> SEGV:
> >> mmap(0xCD800B482000, 1048576, PROT_READ|PROT_WRITE,
> >> MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xCD800B482000
> >> sigaction(SIGSEGV, 0xFD7FFFDFDE50, 0xFD7FFFDFDED0) = 0
> >> Incurred fault #6, FLTBOUNDS  %pc = 0x0052FE06
> >>   siginfo: SIGSEGV SEGV_MAPERR addr=0xCD800B582000
> >> Received signal #11, SIGSEGV [caught]
> >>   siginfo: SIGSEGV SEGV_MAPERR addr=0xCD800B582000
> >> lwp_sigmask(SIG_SETMASK, 0x0400, 0x, 0x,
> >> 0x) = 0xFFBFFEFF [0x]
> >
> > edit src/lib/libast/vmalloc/vmmaddress.c and change
> > #define VMCHKMEM 0
> > this affects vmalloc detecting overbooked memory but will disable the
> > MAP_FIXED codepath
>
> Erm... Solaris (|__SunOS|) was once (pre-vmalloc-rewrite) "exempt"
> from this functionality since it cannot overcommit memo

Re: [ast-users] Avoid clearing memory from |mmap(MAP_ANON)| ? / was: Re: Severe performance regression between ksh 2010-03-05 and 2013-10-08

2013-12-09 Thread Roland Mainz
On Sat, Dec 7, 2013 at 4:09 AM, Phong Vo  wrote:
> On Fri, Dec 6, 2013 at 10:18 AM, Glenn Fowler 
> wrote:
>> On Thu, Dec 5, 2013 at 6:41 PM, Roland Mainz 
>> wrote:
>>> On Wed, Dec 4, 2013 at 3:02 PM, Glenn Fowler 
>>> wrote:
>>> > On Sun, Dec 1, 2013 at 4:58 PM, Lionel Cons 
>>> > wrote:
[snip]
>>> Just to make it clear: Allocating a 1MB chunk of memory via
>>> |mmap(MAP_ANON)| and a 128MB chunk of memory via |mmap(MAP_ANON)| has
>>> *no* (visible) difference in performance until we touch the pages via
>>> either read/execute or write accesses.
>>> Currently the libast allocator code writes zeros into the whole chunk
>>> of memory obtained via |mmap(MAP_ANON)| which pretty much ruins
>>> performance because *all* pages are created physically instead of just
>>> being some memory marked as "reserved". If libast would stop writing
>>> into memory chunks directly after the |mmap(MAP_ANON)| we could easily
>>> bump the allocation size up to 32MB or better without any performance
>>> penalty...
>
> Vmalloc disciplines do not zero out memory. The only explicit zeroing of
> memory occurs in the call vmresize() and only with the flag VM_RSZERO or in
> the malloc-compatible calloc() call.

Erm... see 
http://lists.research.att.com/pipermail/ast-developers/2013q4/003770.html
... there is one in src/lib/libast/vmalloc/vmopen.c ...
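
The "reserved versus really touched" distinction discussed in this
thread can be observed from C with |mincore()|. The sketch below is an
illustration only, assuming the Linux prototype of |mincore()| (the
vector is |unsigned char *| there, |char *| on some other systems); the
function name |resident_after_touch()| is hypothetical and not part of
libast.

```c
#define _DEFAULT_SOURCE   /* exposes mincore() and MAP_ANON on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Hypothetical helper: map `size` bytes anonymously, write one byte
 * into each of the first `touch` pages, and return how many pages
 * mincore() reports as resident. Returns -1 on any failure. */
static long resident_after_touch(size_t size, size_t touch)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    size_t ps, npages, len, i;
    unsigned char *p, *vec;
    long resident = 0;

    if (pagesize <= 0 || size == 0)
        return -1;
    ps = (size_t)pagesize;
    npages = (size + ps - 1) / ps;
    len = npages * ps;          /* round up to whole pages */
    assert(touch <= npages);

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    vec = malloc(npages);
    if (vec == NULL) {
        munmap(p, len);
        return -1;
    }

    for (i = 0; i < touch; i++)
        p[i * ps] = 1;          /* first write faults the page in */

    if (mincore(p, len, vec) == 0) {
        for (i = 0; i < npages; i++)
            if (vec[i] & 1)     /* low bit set: page is resident */
                resident++;
    } else {
        resident = -1;
    }

    free(vec);
    munmap(p, len);
    return resident;
}
```

Calling |resident_after_touch(16 << 20, 4)| would be expected to report
far fewer resident pages than the mapping contains, although the exact
number depends on the system (transparent huge pages, for example, can
make a single touch bring in a whole large page). Running a
|memset(p, 0, len)| over the mapping first would make every page
resident, which is the cost being discussed.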



Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.ma...@nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)


[ast-users] [patch] vmalloc |mmap(MAP_ANON)| fixes for fragmentation issues (on Solaris) ... / was: Re: Severe performance regression between ksh 2010-03-05 and 2013-10-08

2013-12-09 Thread Roland Mainz
On Fri, Dec 6, 2013 at 5:40 AM, Glenn Fowler  wrote:
> On Thu, Dec 5, 2013 at 4:50 PM, Irek Szczesniak 
> wrote:
>>
>> On Wed, Dec 4, 2013 at 3:02 PM, Glenn Fowler 
>> wrote:
>> > On Sun, Dec 1, 2013 at 4:58 PM, Lionel Cons 
>> > wrote:
>> >>
>> >> On 1 December 2013 17:26, Glenn Fowler 
>> >> wrote:
>> >> > I believe this is related to vmalloc changes between 2013-05-31 and
>> >> > 2013-06-09
>> >> > re-run the tests with
>> >> > export VMALLOC_OPTIONS=getmem=safe
>> >> > if that's the problem then it gives a clue on a general solution
>> >> > details after confirmation
>> >> >
>> >>
>> >> timex ~/bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
>> >> while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
>> >> "%s\n" "${a[@]}" ; } ; print "${.sh.version}" ; nanosort yyy'
>> >> Version AIJMP 93v- 2013-10-08
>> >>
>> >> real  34.60
>> >> user  33.27
>> >> sys    1.19
>> >>
>> >> VMALLOC_OPTIONS=getmem=safe timex ~/bin/ksh -c 'function nanosort {
>> >> typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
>> >> a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
>> >> "${.sh.version}" ; nanosort yyy'
>> >> Version AIJMP 93v- 2013-10-08
>> >> real  15.34
>> >> user  14.67
>> >> sys    0.52
>> >>
>> >> So your hunch that VMALLOC_OPTIONS=getmem=safe fixes the problem is
>> >> correct.
>> >>
>> >> What does VMALLOC_OPTIONS=getmem=safe do?
>> >
>> >
>> > vmalloc has an internal discipline/method for getting memory from the
>> > system
>> > several methods are available with varying degrees of thread safety etc.
>> > see src/lib/libast/vmalloc/vmdcsystem.c for the code
>> > and src/lib/libast/vmalloc/malloc.c for the latest VMALLOC_OPTIONS
>> > description (vmalloc.3 update shortly)
>> >
>> > **  getmemory=f   enable f[,g] getmemory() functions if supported,
>> > **                all by default
>> > **      anon:        mmap(MAP_ANON)
>> > **      break|sbrk:  sbrk()
>> > **      native:      native malloc()
>> > **      safe:        safe sbrk() emulation via mmap(MAP_ANON)
>> > **      zero:        mmap(/dev/zero)
>> >
>> > i believe the performance regression with "anon" is that on linux
>> > mmap(0, ... MAP_ANON|MAP_PRIVATE ...), which lets the system decide
>> > the address, returns adjacent (when possible) region addresses from
>> > highest to lowest order
>> > and the reverse order at minimum tends to fragment more memory
>> > "zero" has the same hi=>lo characteristic
>> > i suspect it adversely affects the vmalloc coalescing algorithm but
>> > have not dug deeper
>> > for now the probe order in vmalloc/vmdcsystem.c was simply changed
>> > to favor "safe"
>>
>> MAP_FIXED should be avoided because it's only there for special
>> purposes like the runtime linker ld.so.1 or debuggers.
>>
>> Using this for a general-purpose memory allocator causes serious problems:
>> 1. On some systems this is a privileged operation and only available
>> for users with root privileges
>>
>> 2. On a SPARC T4 with 256GB and Solaris 11.1 the use of 'safe' degraded
>> the performance from 9 seconds to almost 15 minutes because it utterly
>> destroys the system's concept of large pages. If two MAP_FIXED mappings
>> directly follow each other the system downgrades the page size to the
>> smallest possible size, even trying to break up larger pages, which in
>> turn must be done by a special daemon (vmtasks)
>>
>> 3. MAP_PRIVATE|MAP_FIXED|MAP_ANON may no longer be available in future
>> versions of Solaris
>>
>> 4. Using the 'safe' allocator on SmartOS (solaris 11 clone) triggers a
>> SEGV:
>> mmap(0xCD800B482000, 1048576, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xCD800B482000
>> sigaction(SIGSEGV, 0xFD7FFFDFDE50, 0xFD7FFFDFDED0) = 0
>> Incurred fault #6, FLTBOUNDS  %pc = 0x0052FE06
>>   siginfo: SIGSEGV SEGV_MAPERR addr=0xCD800B582000
>> Received signal #11, SIGSEGV [caught]
>>   siginfo: SIGSEGV SEGV_MAPERR addr=0xCD800B582000
>> lwp_sigmask(SIG_SETMASK, 0x0400, 0x, 0x,
>> 0x) = 0xFFBFFEFF [0x]
>
> edit src/lib/libast/vmalloc/vmmaddress.c and change
> #define VMCHKMEM 0
> this affects vmalloc detecting overbooked memory but will disable the
> MAP_FIXED codepath

Erm... Solaris (|__SunOS|) was once (pre-vmalloc-rewrite) "exempt"
from this functionality since it cannot overcommit memory (except if
someone uses |MAP_NORESERVE| or uses kernel debugging options in
/etc/system) ...

... attached (as
"astksh20131010_vmalloc_sunos_fragmentation_fix001.diff.txt") is a
patch which...
1. ... restores this exception for Solaris

2. ... bumps the |mmap()| size to 4MB for 32bit processes and 16MB for
64bit processes since both values are more or less the points where
the fragmentation stops. Note that this does *not* mean it will use so
much memory... it only means t