Re: [ppc] Default stack size on ppc64

2015-03-26 Thread Richard W.M. Jones
On Wed, Mar 25, 2015 at 07:57:25PM -0500, Steven Munroe wrote:
> On Wed, 2015-03-25 at 19:45 +0000, Richard W.M. Jones wrote:
> > On Wed, Mar 25, 2015 at 06:30:25PM +0000, David Woodhouse wrote:
> > > If the compiler is single-threaded, and increasing the stack ulimit
> > > fixes the problem, that implies that the default stack ulimit is less
> > > than the 8MiB-64KiB that it takes to reach the guard page...
> >
> > Just so I'm clear, is the stack supposed to grow down automatically
> > (ie. does the stack automatically use MAP_GROWSDOWN), or is OCaml
> > supposed to do something when the stack hits the guard page?
>
> It depends on which code is allocating the stack.
>
> The default GLIBC implementation under pthread_create will allocate it
> as a single anonymous mmap of 8MB then use mprotect on the lowest page to
> mark the guard page no-access. As in:

Is pthread_create used even for the main thread in the process? (I
thought the kernel created that.)  In any case threads aren't being
used explicitly by this process.

Anyway I guess the guard page is just a hole that causes the process
to abort (as observed) and is not a mechanism for automatically
growing the stack.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
-- 
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct

Re: [ppc] Default stack size on ppc64

2015-03-26 Thread David Woodhouse
On Thu, 2015-03-26 at 08:57 +0000, Richard W.M. Jones wrote:
> Is pthread_create used even for the main thread in the process? (I
> thought the kernel created that.)  In any case threads aren't being
> used explicitly by this process.
>
> Anyway I guess the guard page is just a hole that causes the process
> to abort (as observed) and is not a mechanism for automatically
> growing the stack.

Right, it's a mechanism for catching stack overflow reliably, rather
than letting it cause random memory corruption.

But for the original stack created by the kernel when the process is 
created, I think you just don't have the guard page. All that stops 
you is the ulimit.
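
One way to look at the kernel-created stack of a running process is to
read its [stack] entry in /proc/<pid>/maps. A minimal sketch in Python,
assuming a Linux host; the helper names are made up for illustration,
and the size shown is only what is mapped right now, since the main
stack grows on demand up to the limit:

```python
import re

def find_stack_mapping(maps_text):
    """Return (start, end) of the [stack] mapping in maps_text, or None."""
    for line in maps_text.splitlines():
        if line.rstrip().endswith("[stack]"):
            m = re.match(r"([0-9a-f]+)-([0-9a-f]+)\s", line)
            if m:
                return int(m.group(1), 16), int(m.group(2), 16)
    return None

def mapped_stack_kib(path="/proc/self/maps"):
    """Size in KiB of the currently mapped main-thread stack (Linux only)."""
    with open(path) as f:
        region = find_stack_mapping(f.read())
    if region is None:
        return None
    start, end = region
    return (end - start) // 1024

# Usage (Linux): print(mapped_stack_kib()) and compare with 'ulimit -s'.
```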

So... we could potentially look for ways to reduce the stack usage of 
OCaml-compiled code, or you get to keep increasing the limit.

It might be worth taking a look at the stack frames and seeing if I 
did anything entirely stupid when setting them up, and whether they 
can be made smaller. Although if you said it was averaging ~168 bytes, 
that doesn't seem *particularly* excessive. Can you see how big the 
stack actually does get in your worst case, and how much we'd have to 
shrink by in order to get away without tweaking the ulimit?

Perhaps we could also look at using a smaller stack frame between 
OCaml functions. If we're never using certain registers in the OCaml 
code generator, we don't need to save them or leave space for them on 
the stack. All we need to do is ensure that if we're calling out to a C
function, we *do* leave enough space in the stack frame for
callee-saved registers, right?

I concede to being a bit rusty here... :)

-- 
dwmw2



Re: [ppc] Default stack size on ppc64

2015-03-25 Thread Richard W.M. Jones
On Wed, Mar 25, 2015 at 05:54:58PM +0000, Richard W.M. Jones wrote:
> OCaml uses its own code generator.  However the description of
> -fsplit-stack from GCC sounds interesting.  Are there any more details
> of how exactly it works?  Does it catch the segfault from hitting the
> guard page and do something clever?

Just after posting that I found the details:

https://gcc.gnu.org/wiki/SplitStacks

Rich.


Re: [ppc] Default stack size on ppc64

2015-03-25 Thread Richard W.M. Jones
On Tue, Mar 24, 2015 at 11:35:30AM -0500, Steven Munroe wrote:
> On Tue, 2015-03-24 at 12:19 +0000, Richard W.M. Jones wrote:
> > For the OCaml packages on ppc64/ppc64le, we keep having bugs like this
> > one: https://bugzilla.redhat.com/show_bug.cgi?id=1204876
> >
> > The OCaml compiler is quite recursive, and so it can easily overflow
> > the default stack.  For reasons that are not entirely clear this
> > happens only on ppc64/ppc64le (not on x86 or aarch64).  Perhaps POWER
> > stack frames are bigger, or the default stack size is smaller.
> >
> The default thread stack is 8MB-64KB plus a 64KB guard page. PPC uses a
> 64K default page size.
>
> So there are two possible problems:
>
> 1) running into the guard page for a single thread.
> 2) the sum of all thread stacks exceeds the ulimit -s

The OCaml compiler is single threaded AFAIK.

> If changing the ulimit -s solves the problem, the sum of all thread
> stacks has exceeded the ulimit.
>
> If increasing the ulimit -s to unlimited does not help,
> then you have at least one thread that has overflowed the default stack.
>
> Power has lots of registers (32x64b GPRs + 32x64b FPRs + 32x128b VRs) so
> we are likely to need a bigger stack frame at each level. The minimums
> are 112 bytes for ELF V1 ABI (PPC64BE) and 48 for ELF V2 ABI (PPC64LE),
> but any interesting function will need additional space to spill
> non-volatiles across function calls and store any local variables.
>
> You (ocaml runtime) can control the stack size on a per thread basis via
> pthread_attr_setstack().
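
For illustration only, here is a rough Python analogue of choosing a
thread's stack size before the thread starts. It uses
threading.stack_size() rather than the pthread_attr_setstack() call
named above, and both the 64 MiB figure and the helper name
run_with_big_stack are assumptions, not anything taken from the OCaml
runtime:

```python
import threading

def run_with_big_stack(fn, stack_bytes=64 * 1024 * 1024):
    """Run fn() in a new thread with a stack of stack_bytes; return its result."""
    box = {}

    def worker():
        box["value"] = fn()

    # stack_size() sets the size used for threads created afterwards and
    # returns the previous setting (0 means "platform default").
    previous = threading.stack_size(stack_bytes)
    try:
        t = threading.Thread(target=worker)
        t.start()
        t.join()
    finally:
        threading.stack_size(previous)  # restore the earlier setting
    return box.get("value")
```

The point of the sketch is only that the size has to be chosen before
the thread is created; it does not help a program that does all its
work on the main thread.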
 
> I am not sure how ocaml is generating code for PPC64, you could look
> into split stack support,

OCaml uses its own code generator.  However the description of
-fsplit-stack from GCC sounds interesting.  Are there any more details
of how exactly it works?  Does it catch the segfault from hitting the
guard page and do something clever?

> but at this time GCC does not implement split
> stack.

Did you mean "does implement"?  GCC 5 documents it at least ...

Rich.


Re: [ppc] Default stack size on ppc64

2015-03-25 Thread David Woodhouse
On Tue, 2015-03-24 at 11:35 -0500, Steven Munroe wrote:
> I am not sure how ocaml is generating code for PPC64, you could look
> into split stack support, but at this time GCC does not implement split
> stack.
 ... for PPC64.

I wouldn't want to do it in OCaml before it's supported in GCC and the 
runtime. But once it *is*, it shouldn't be hard to make OCaml support 
it.

It's mostly just a matter of emitting the right instructions in the 
function prologue and epilogue, in emit.mlp.

But it does depend on the runtime support for allocating more stack 
while not overflowing the 'slop' space on the existing stack, and 
linker support for expanding the stack frame size when calling through 
to legacy non-split-stack functions, and probably other things. So not 
something we'd want to do purely within OCaml.

I'm a little confused about what the problem is, though.

If the compiler is single-threaded, and increasing the stack ulimit 
fixes the problem, that implies that the default stack ulimit is less 
than the 8MiB-64KiB that it takes to reach the guard page... can that 
be right? Richard, what does 'ulimit -s' report *before* you increase 
it?

-- 
dwmw2



Re: [ppc] Default stack size on ppc64

2015-03-25 Thread Richard W.M. Jones
On Wed, Mar 25, 2015 at 06:30:25PM +0000, David Woodhouse wrote:
> If the compiler is single-threaded, and increasing the stack ulimit
> fixes the problem, that implies that the default stack ulimit is less
> than the 8MiB-64KiB that it takes to reach the guard page...

Just so I'm clear, is the stack supposed to grow down automatically
(ie. does the stack automatically use MAP_GROWSDOWN), or is OCaml
supposed to do something when the stack hits the guard page?

I guess it's also possible that an OCaml stack frame is so big that it
skips the guard page, or that GCC does some kind of stack filling to
trigger the guard page which OCaml does not do.

Rich.


Re: [ppc] Default stack size on ppc64

2015-03-25 Thread Richard W.M. Jones
On Wed, Mar 25, 2015 at 06:30:25PM +0000, David Woodhouse wrote:
> On Tue, 2015-03-24 at 11:35 -0500, Steven Munroe wrote:
> > I am not sure how ocaml is generating code for PPC64, you could look
> > into split stack support, but at this time GCC does not implement split
> > stack.
  ... for PPC64.
 
> I wouldn't want to do it in OCaml before it's supported in GCC and the
> runtime. But once it *is*, it shouldn't be hard to make OCaml support
> it.
>
> It's mostly just a matter of emitting the right instructions in the
> function prologue and epilogue, in emit.mlp.
>
> But it does depend on the runtime support for allocating more stack
> while not overflowing the 'slop' space on the existing stack, and
> linker support for expanding the stack frame size when calling through
> to legacy non-split-stack functions, and probably other things. So not
> something we'd want to do purely within OCaml.
>
> I'm a little confused about what the problem is, though.

In summary: when you run ocamlopt to compile a sufficiently
complicated OCaml program, ocamlopt segfaults.  We found the cure is
to do 'ulimit -s 65536' before running ocamlopt.
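
The same workaround can be applied from a wrapper script instead of the
shell. A minimal sketch in Python, assuming a Linux host; the helper
names are hypothetical, and unlike a plain 'ulimit -s' it only raises
the soft limit and never goes above the existing hard limit:

```python
import resource
import subprocess

def raised_stack_limit(want_kib, current):
    """Return the (soft, hard) RLIMIT_STACK pair to use for want_kib KiB."""
    soft, hard = current
    if soft == resource.RLIM_INFINITY:
        return (soft, hard)          # already unlimited, nothing to raise
    want = want_kib * 1024           # setrlimit works in bytes
    if hard != resource.RLIM_INFINITY:
        want = min(want, hard)       # the soft limit cannot exceed the hard limit
    return (max(soft, want), hard)   # never lower an existing limit

def run_with_stack(cmd, want_kib=65536):
    """Run cmd (a list of arguments) with a raised soft stack limit."""
    def set_limit():
        # Runs in the child between fork and exec, so the new limit
        # applies to the program being started and not to the caller.
        current = resource.getrlimit(resource.RLIMIT_STACK)
        resource.setrlimit(resource.RLIMIT_STACK,
                           raised_stack_limit(want_kib, current))
    return subprocess.run(cmd, preexec_fn=set_limit)

# Hypothetical usage: run_with_stack(["ocamlopt", "-c", "foo.ml"])
```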

> If the compiler is single-threaded, and increasing the stack ulimit
> fixes the problem, that implies that the default stack ulimit is less
> than the 8MiB-64KiB that it takes to reach the guard page... can that
> be right? Richard, what does 'ulimit -s' report *before* you increase
> it?

$ ulimit -s
8192

(For reference, all the limits are attached below).

That is from Fedora 20 ppc64 (not -le).  The server is a POWER 7
LPAR.  The page size is 64K.

I don't currently have access to a newer Fedora machine, but the same
segfault was reported in current Rawhide and it was cured in the same
way, so it seems unlikely that the default stack size is different there.

Rich.

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 496059
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 4096
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited


Re: [ppc] Default stack size on ppc64

2015-03-25 Thread David Woodhouse
On Wed, 2015-03-25 at 19:45 +0000, Richard W.M. Jones wrote:
> On Wed, Mar 25, 2015 at 06:30:25PM +0000, David Woodhouse wrote:
> > If the compiler is single-threaded, and increasing the stack ulimit
> > fixes the problem, that implies that the default stack ulimit is less
> > than the 8MiB-64KiB that it takes to reach the guard page...
>
> Just so I'm clear, is the stack supposed to grow down automatically
> (ie. does the stack automatically use MAP_GROWSDOWN), or is OCaml
> supposed to do something when the stack hits the guard page?
>
> I guess it's also possible that an OCaml stack frame is so big that it
> skips the guard page, or that GCC does some kind of stack filling to
> trigger the guard page which OCaml does not do.

This is easy to trigger, right? Can you catch it in gdb and 't a a bt'?


-- 
dwmw2



Re: [ppc] Default stack size on ppc64

2015-03-25 Thread Richard W.M. Jones
On Wed, Mar 25, 2015 at 08:03:31PM +0000, David Woodhouse wrote:
> On Wed, 2015-03-25 at 19:45 +0000, Richard W.M. Jones wrote:
> > On Wed, Mar 25, 2015 at 06:30:25PM +0000, David Woodhouse wrote:
> > > If the compiler is single-threaded, and increasing the stack ulimit
> > > fixes the problem, that implies that the default stack ulimit is less
> > > than the 8MiB-64KiB that it takes to reach the guard page...
> >
> > Just so I'm clear, is the stack supposed to grow down automatically
> > (ie. does the stack automatically use MAP_GROWSDOWN), or is OCaml
> > supposed to do something when the stack hits the guard page?
> >
> > I guess it's also possible that an OCaml stack frame is so big that it
> > skips the guard page, or that GCC does some kind of stack filling to
> > trigger the guard page which OCaml does not do.
>
> This is easy to trigger, right? Can you catch it in gdb and 't a a bt'?

I won't bore you with the full stack trace, but the top and bottom
and some interesting registers are below.

It confirms a few things:

 - Stack is overflowing because of a highly recursive function (which
   does eventually bottom-out, if you give it a big enough stack).

 - Compiler is single threaded.

 - %r1 (at top) - %r1 (at bottom) is approximately 8 MB

 - Avg. size of each stack frame is ~ 168 bytes.
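
As a rough cross-check of these figures, here is a small sketch in
Python. It takes the 8192 KiB from 'ulimit -s' and the ~168 byte
average as given; the function name is made up for illustration, and
real frames vary in size, so the results are estimates only:

```python
KIB = 1024

def frames_that_fit(stack_bytes, avg_frame_bytes):
    """Approximate number of stack frames that fit in stack_bytes."""
    return stack_bytes // avg_frame_bytes

# Default limit reported by 'ulimit -s' on the Fedora 20 ppc64 machine.
print(frames_that_fit(8192 * KIB, 168))    # 49932

# Limit used as the workaround ('ulimit -s 65536').
print(frames_that_fit(65536 * KIB, 168))   # 399457
```

The first number is in line with the backtrace, which stops at about
frame #49900.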

Rich.

GNU gdb (GDB) Fedora 7.7.1-17.fc20
This GDB was configured as ppc64-redhat-linux-gnu.
Reading symbols from /home/rjones/local/bin/ocamlopt.opt...(no debugging 
symbols found)...done.
(gdb) run
Starting program: /home/rjones/local/bin/ocamlopt.opt -c -g -w a -I 
camlp4/import -warn-error A-3 -I camlp4/config -I camlp4/boot -o 
camlp4/boot/camlp4boot.cmx camlp4/boot/camlp4boot.ml

Program received signal SIGSEGV, Segmentation fault.
0x100587cc in .camlCSEgen__compare_1011 ()
Missing separate debuginfos, use: debuginfo-install glibc-2.18-12.fc20.ppc64p7
(gdb) t a a bt

Thread 1 (process 48132):
#0  0x100587cc in .camlCSEgen__compare_1011 ()
#1  0x10007b48 in .caml_apply2 ()
#2  0x10263e44 in .camlMap__find_1091 ()
#3  0x10058d4c in .camlCSEgen__find_equation_1124 ()
#4  0x10057624 in .camlCSEgen__fun_1397 ()
#5  0x10007a84 in .caml_apply3 ()
#6  0x10057c5c in .camlCSEgen__fun_1397 ()
#7  0x10007a84 in .caml_apply3 ()
#8  0x10057c5c in .camlCSEgen__fun_1397 ()
#9  0x10007a84 in .caml_apply3 ()
#10 0x1005799c in .camlCSEgen__fun_1397 ()

[these two stack frames are repeated over and over]

#49873 0x10007a85 in .caml_apply3 ()
#49874 0x10057d51 in .camlCSEgen__fun_1397 ()
#49875 0x10007a85 in .caml_apply3 ()
#49876 0x10057d51 in .camlCSEgen__fun_1397 ()
#49877 0x10007a85 in .caml_apply3 ()
#49878 0x10057315 in .camlCSEgen__fun_1406 ()
#49879 0x10006a41 in .caml_send1 ()
#49880 0x10059ea9 in .camlCSE__fundecl_1029 ()
#49881 0x1007c59d in .camlAsmgen__compile_fundecl_1050 ()
#49882 0x1007c895 in .camlAsmgen__compile_phrase_1053 ()
#49883 0x1007bbe9 in .camlAsmgen__fun_1285 ()
#49884 0x10258a01 in .camlList__iter_1061 ()
#49885 0x1007bc2d in .camlAsmgen__fun_1290 ()
#49886 0x1007cc49 in .camlAsmgen__compile_implementation_1063 ()
#49887 0x10082da9 in .camlOptcompile__fun_1274 ()
#49888 0x10083525 in .camlOptcompile__comp_1048 ()
#49889 0x10083de1 in .camlOptcompile__implementation_1040 ()
#49890 0x10007cd9 in .camlOptmain__process_implementation_file_1011 ()
#49891 0x10008445 in .camlOptmain__process_file_1016 ()
#49892 0x100084c5 in .camlOptmain__anonymous_1022 ()
#49893 0x10283ec5 in .camlArg__parse_argv_dynamic_1078 ()
#49894 0x10284161 in .camlArg__parse_1140 ()
#49895 0x100092bd in .camlOptmain__main_1474 ()
#49896 0x1000a785 in .camlOptmain__entry ()
#49897 0x10003439 in caml_startup.code_begin ()
#49898 0x102b633d in .caml_start_program ()
#49899 0x102b6a30 in .caml_main ()
#49900