Re: [ppc] Default stack size on ppc64
On Wed, Mar 25, 2015 at 07:57:25PM -0500, Steven Munroe wrote:
> On Wed, 2015-03-25 at 19:45 +, Richard W.M. Jones wrote:
> > On Wed, Mar 25, 2015 at 06:30:25PM +, David Woodhouse wrote:
> > > If the compiler is single-threaded, and increasing the stack ulimit
> > > fixes the problem, that implies that the default stack ulimit is
> > > less than the 8MiB-64KiB that it takes to reach the guard page...
> >
> > Just so I'm clear, is the stack supposed to grow down automatically
> > (ie. does the stack automatically use MAP_GROWSDOWN), or is OCaml
> > supposed to do something when the stack hits the guard page?
>
> It depends on which code is allocating the stack. The default GLIBC
> implementation under pthread_create will allocate it as a single
> anonymous mmap of 8MB then use mprotect on the lowest page to mark the
> guard page no-access.

As in: Is pthread_create used even for the main thread in the process
(I thought the kernel created that). In any case threads aren't being
used explicitly by this process.

Anyway I guess the guard page is just a hole that causes the process
to abort (as observed) and is not a mechanism for automatically
growing the stack.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
--
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct
Re: [ppc] Default stack size on ppc64
On Thu, 2015-03-26 at 08:57 +, Richard W.M. Jones wrote:
> Is pthread_create used even for the main thread in the process
> (I thought the kernel created that). In any case threads aren't being
> used explicitly by this process.
>
> Anyway I guess the guard page is just a hole that causes the process
> to abort (as observed) and is not a mechanism for automatically
> growing the stack.

Right. It's a mechanism for catching stack overflow reliably, rather
than letting it cause random memory corruption. But for the original
stack created by the kernel when the process is created, I think you
just don't have the guard page. All that stops you is the ulimit.

So... we could potentially look for ways to reduce the stack usage of
OCaml-compiled code, or you get to keep increasing the limit.

It might be worth taking a look at the stack frames and seeing if I did
anything entirely stupid when setting them up, and whether they can be
made smaller. Although if you said it was averaging ~168 bytes, that
doesn't seem *particularly* excessive.

Can you see how big the stack actually does get in your worst case, and
how much we'd have to shrink by in order to get away without tweaking
the ulimit?

Perhaps we could also look at using a smaller stack frame between OCaml
functions. If we're never using certain registers in the OCaml code
generator, we don't need to save them or leave space for them on the
stack. All we need to do is ensure that if we're calling out to a C
function, we *do* leave enough space in the stack frame for
callee-saved registers, right? I concede to being a bit rusty here... :)

--
dwmw2
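[Editorial aside, not part of the original message.] The question of how
much the frames would have to shrink can be roughed out with simple
arithmetic. The sketch below assumes the ~168-byte average frame quoted
above and the 8192 KB and 65536 KB stack limits mentioned elsewhere in
this thread; the function names and the 100,000-frame depth are
hypothetical, chosen only for illustration. It ignores everything else
that lives on the stack (environment, auxv, C frames), so treat the
numbers as upper bounds.

```python
KIB = 1024

def max_frames(stack_kib, frame_bytes):
    """Upper bound on recursion depth for a given stack limit and frame size."""
    return (stack_kib * KIB) // frame_bytes

def max_frame_size(stack_kib, depth):
    """Largest average frame (bytes) that lets `depth` frames fit in the stack."""
    return (stack_kib * KIB) // depth

if __name__ == "__main__":
    # Default limit from the thread (8192 KB) with ~168-byte frames.
    print(max_frames(8192, 168))          # 49932
    # Raised limit from the thread (65536 KB) with the same frames.
    print(max_frames(65536, 168))         # 399457
    # Hypothetical worst case of 100,000 frames under the default limit.
    print(max_frame_size(8192, 100_000))  # 83
```

With 168-byte frames the default limit runs out at just under 50,000
frames, so avoiding the ulimit change would need either a much smaller
average frame or a shallower recursion.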
Re: [ppc] Default stack size on ppc64
On Wed, Mar 25, 2015 at 05:54:58PM +, Richard W.M. Jones wrote:
> OCaml uses its own code generator. However the description of
> -fsplit-stack from GCC sounds interesting. Are there any more details
> of how exactly it works? Does it catch the segfault from hitting the
> guard page and do something clever?

Just after posting that I found the details:

https://gcc.gnu.org/wiki/SplitStacks

Rich.
Re: [ppc] Default stack size on ppc64
On Tue, Mar 24, 2015 at 11:35:30AM -0500, Steven Munroe wrote:
> On Tue, 2015-03-24 at 12:19 +, Richard W.M. Jones wrote:
> > For the OCaml packages on ppc64/ppc64le, we keep having bugs like
> > this one:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1204876
> >
> > The OCaml compiler is quite recursive, and so it can easily overflow
> > the default stack. For reasons that are not entirely clear this
> > happens only on ppc64/ppc64le (not on x86 or aarch64). Perhaps POWER
> > stack frames are bigger, or the default stack size is smaller.
>
> The default thread stack is 8MB-64KB plus a 64KB guard page. PPC uses
> a 64K default page size.
>
> So there are two possible problems:
>
> 1) running into the guard page for a single thread.
> 2) the sum of all thread stacks exceeds the ulimit -s

The OCaml compiler is single threaded AFAIK.

> If changing the ulimit -s solves the problem the sum of all thread
> stacks has exceeded the ulimit. In cases where increasing the
> ulimit -s to unlimited does not help, you have at least one thread
> that has overflowed the default stack.
>
> Power has lots of registers (32x64b GPRs + 32x64b FPRs + 32x128b VRs)
> so we are likely to need a bigger stack frame at each level. The
> minimums are 112 bytes for ELF V1 ABI (PPC64BE) and 48 for ELF V2 ABI
> (PPC64LE), but any interesting function will need additional space to
> spill non-volatiles across function calls and store any local
> variables.
>
> You (ocaml runtime) can control the stack size on a per thread basis
> via pthread_attr_setstack().
>
> I am not sure how ocaml is generating code for PPC64, you could look
> in to split stack support,

OCaml uses its own code generator. However the description of
-fsplit-stack from GCC sounds interesting. Are there any more details
of how exactly it works? Does it catch the segfault from hitting the
guard page and do something clever?

> but at this time GCC does not implement split stack.

Did you mean does implement? GCC 5 documents it at least ...

Rich.
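[Editorial aside, not part of the original message.] The idea of giving
one thread a larger stack so that a deep recursion can finish can be
tried out quickly in Python. This is only an analogy: it uses Python's
threading.stack_size(), a different and higher-level API than the
pthread_attr_setstack() call named above, and the helper names and the
64 MiB figure are assumptions made for this sketch.

```python
import sys
import threading

def depth(n):
    """Deliberately recursive: returns n by recursing n levels deep."""
    return 0 if n == 0 else 1 + depth(n - 1)

def run_deep(n, stack_bytes=64 * 1024 * 1024):
    """Run depth(n) in a worker thread created with a larger stack."""
    result = []
    old_size = threading.stack_size(stack_bytes)  # applies to threads created next
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(n + 100)                # lift Python's own depth guard
    try:
        worker = threading.Thread(target=lambda: result.append(depth(n)))
        worker.start()
        worker.join()
    finally:
        sys.setrecursionlimit(old_limit)
        threading.stack_size(old_size)            # restore the previous setting
    return result[0]

if __name__ == "__main__":
    print(run_deep(20000))
```

The stack size is fixed when the thread is created, so a thread stack
sized this way does not grow later; it has to be chosen large enough up
front.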
Re: [ppc] Default stack size on ppc64
On Tue, 2015-03-24 at 11:35 -0500, Steven Munroe wrote:
> I am not sure how ocaml is generating code for PPC64, you could look
> in to split stack support, but at this time GCC does not implement
> split stack.

... for PPC64.

I wouldn't want to do it in OCaml before it's supported in GCC and the
runtime. But once it *is*, it shouldn't be hard to make OCaml support
it. It's mostly just a matter of emitting the right instructions in the
function prologue and epilogue, in emit.mlp.

But it does depend on the runtime support for allocating more stack
while not overflowing the 'slop' space on the existing stack, and
linker support for expanding the stack frame size when calling through
to legacy non-split-stack functions, and probably other things. So not
something we'd want to do purely within OCaml.

I'm a little confused about what the problem is, though. If the
compiler is single-threaded, and increasing the stack ulimit fixes the
problem, that implies that the default stack ulimit is less than the
8MiB-64KiB that it takes to reach the guard page... can that be right?

Richard, what does 'ulimit -s' report *before* you increase it?

--
dwmw2
Re: [ppc] Default stack size on ppc64
On Wed, Mar 25, 2015 at 06:30:25PM +, David Woodhouse wrote:
> If the compiler is single-threaded, and increasing the stack ulimit
> fixes the problem, that implies that the default stack ulimit is less
> than the 8MiB-64KiB that it takes to reach the guard page...

Just so I'm clear, is the stack supposed to grow down automatically
(ie. does the stack automatically use MAP_GROWSDOWN), or is OCaml
supposed to do something when the stack hits the guard page?

I guess it's also possible that an OCaml stack frame is so big that it
skips the guard page, or that GCC does some kind of stack filling to
trigger the guard page which OCaml does not do.

Rich.
Re: [ppc] Default stack size on ppc64
On Wed, Mar 25, 2015 at 06:30:25PM +, David Woodhouse wrote:
> On Tue, 2015-03-24 at 11:35 -0500, Steven Munroe wrote:
> > I am not sure how ocaml is generating code for PPC64, you could look
> > in to split stack support, but at this time GCC does not implement
> > split stack.
>
> ... for PPC64.
>
> I wouldn't want to do it in OCaml before it's supported in GCC and the
> runtime. But once it *is*, it shouldn't be hard to make OCaml support
> it. It's mostly just a matter of emitting the right instructions in
> the function prologue and epilogue, in emit.mlp.
>
> But it does depend on the runtime support for allocating more stack
> while not overflowing the 'slop' space on the existing stack, and
> linker support for expanding the stack frame size when calling through
> to legacy non-split-stack functions, and probably other things. So not
> something we'd want to do purely within OCaml.
>
> I'm a little confused about what the problem is, though.

In summary: when you run ocamlopt to compile a sufficiently complicated
OCaml program, ocamlopt segfaults. We found the cure is to do
'ulimit -s 65536' before running ocamlopt.

> If the compiler is single-threaded, and increasing the stack ulimit
> fixes the problem, that implies that the default stack ulimit is less
> than the 8MiB-64KiB that it takes to reach the guard page... can that
> be right?
>
> Richard, what does 'ulimit -s' report *before* you increase it?

$ ulimit -s
8192

(For reference, all the limits are attached below).

That is from Fedora 20 ppc64 (not -le). The server is a POWER 7 LPAR.
The page size is 64K.

I don't currently have access to a newer Fedora machine, but the same
segfault was reported in current Rawhide and it was cured in the same
way, so it seems unlikely that the default stack size is different
there.

Rich.
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 496059
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
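[Editorial aside, not part of the original message.] The 'ulimit -s
65536' workaround can also be applied to a single child process rather
than to the whole shell. The sketch below does that with Python's
standard resource and subprocess modules; the helper names are made up
for illustration, and the command in the usage line is a placeholder,
not the actual build invocation.

```python
import resource
import subprocess
import sys

def raised_stack_limit(want_kib, soft, hard):
    """Pick a new soft stack limit in bytes: at least want_kib, capped by hard."""
    if soft == resource.RLIM_INFINITY:
        return soft                       # already unlimited, nothing to raise
    want = want_kib * 1024
    if hard != resource.RLIM_INFINITY and want > hard:
        want = hard                       # an unprivileged process cannot exceed hard
    return max(soft, want)

def run_with_stack(cmd, want_kib=65536):
    """Run cmd with its soft RLIMIT_STACK raised, like 'ulimit -s' in a shell."""
    def set_limit():
        # Runs in the child between fork and exec, so only the child is affected.
        soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
        new_soft = raised_stack_limit(want_kib, soft, hard)
        resource.setrlimit(resource.RLIMIT_STACK, (new_soft, hard))
    return subprocess.run(cmd, preexec_fn=set_limit).returncode

if __name__ == "__main__":
    # Placeholder command; substitute the real compiler invocation.
    sys.exit(run_with_stack([sys.executable, "-c", "pass"]))
```

Only the soft limit is changed, which is what 'ulimit -s' does by
default in bash when the hard limit allows it.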
Re: [ppc] Default stack size on ppc64
On Wed, 2015-03-25 at 19:45 +, Richard W.M. Jones wrote:
> On Wed, Mar 25, 2015 at 06:30:25PM +, David Woodhouse wrote:
> > If the compiler is single-threaded, and increasing the stack ulimit
> > fixes the problem, that implies that the default stack ulimit is
> > less than the 8MiB-64KiB that it takes to reach the guard page...
>
> Just so I'm clear, is the stack supposed to grow down automatically
> (ie. does the stack automatically use MAP_GROWSDOWN), or is OCaml
> supposed to do something when the stack hits the guard page?
>
> I guess it's also possible that an OCaml stack frame is so big that it
> skips the guard page, or that GCC does some kind of stack filling to
> trigger the guard page which OCaml does not do.

This is easy to trigger, right? Can you catch it in gdb and 't a a bt'

--
dwmw2
Re: [ppc] Default stack size on ppc64
On Wed, Mar 25, 2015 at 08:03:31PM +, David Woodhouse wrote:
> On Wed, 2015-03-25 at 19:45 +, Richard W.M. Jones wrote:
> > On Wed, Mar 25, 2015 at 06:30:25PM +, David Woodhouse wrote:
> > > If the compiler is single-threaded, and increasing the stack
> > > ulimit fixes the problem, that implies that the default stack
> > > ulimit is less than the 8MiB-64KiB that it takes to reach the
> > > guard page...
> >
> > Just so I'm clear, is the stack supposed to grow down automatically
> > (ie. does the stack automatically use MAP_GROWSDOWN), or is OCaml
> > supposed to do something when the stack hits the guard page?
> >
> > I guess it's also possible that an OCaml stack frame is so big that
> > it skips the guard page, or that GCC does some kind of stack filling
> > to trigger the guard page which OCaml does not do.
>
> This is easy to trigger, right? Can you catch it in gdb and
> 't a a bt'

I won't bore you with the full stack trace, but the top and bottom and
some interesting registers are below. It confirms a few things:

 - Stack is overflowing because of a highly recursive function (which
   does eventually bottom-out, if you give it a big enough stack).

 - Compiler is single threaded.

 - %r1 (at top) - %r1 (at bottom) is approximately 8 MB

 - Avg. size of each stack frame is ~ 168 bytes.

Rich.

GNU gdb (GDB) Fedora 7.7.1-17.fc20
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type show copying
and show warranty for details.
This GDB was configured as ppc64-redhat-linux-gnu.
Type show configuration for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type help.
Type apropos word to search for commands related to word...
Reading symbols from /home/rjones/local/bin/ocamlopt.opt...(no debugging symbols found)...done.
(gdb) run
Starting program: /home/rjones/local/bin/ocamlopt.opt -c -g -w a -I camlp4/import -warn-error A-3 -I camlp4/config -I camlp4/boot -o camlp4/boot/camlp4boot.cmx camlp4/boot/camlp4boot.ml

Program received signal SIGSEGV, Segmentation fault.
0x100587cc in .camlCSEgen__compare_1011 ()
Missing separate debuginfos, use: debuginfo-install glibc-2.18-12.fc20.ppc64p7
(gdb) t a a bt

Thread 1 (process 48132):
#0  0x100587cc in .camlCSEgen__compare_1011 ()
#1  0x10007b48 in .caml_apply2 ()
#2  0x10263e44 in .camlMap__find_1091 ()
#3  0x10058d4c in .camlCSEgen__find_equation_1124 ()
#4  0x10057624 in .camlCSEgen__fun_1397 ()
#5  0x10007a84 in .caml_apply3 ()
#6  0x10057c5c in .camlCSEgen__fun_1397 ()
#7  0x10007a84 in .caml_apply3 ()
#8  0x10057c5c in .camlCSEgen__fun_1397 ()
#9  0x10007a84 in .caml_apply3 ()
#10 0x1005799c in .camlCSEgen__fun_1397 ()
[these two stack frames are repeated over and over]
#49873 0x10007a85 in .caml_apply3 ()
#49874 0x10057d51 in .camlCSEgen__fun_1397 ()
#49875 0x10007a85 in .caml_apply3 ()
#49876 0x10057d51 in .camlCSEgen__fun_1397 ()
#49877 0x10007a85 in .caml_apply3 ()
#49878 0x10057315 in .camlCSEgen__fun_1406 ()
#49879 0x10006a41 in .caml_send1 ()
#49880 0x10059ea9 in .camlCSE__fundecl_1029 ()
#49881 0x1007c59d in .camlAsmgen__compile_fundecl_1050 ()
#49882 0x1007c895 in .camlAsmgen__compile_phrase_1053 ()
#49883 0x1007bbe9 in .camlAsmgen__fun_1285 ()
#49884 0x10258a01 in .camlList__iter_1061 ()
#49885 0x1007bc2d in .camlAsmgen__fun_1290 ()
#49886 0x1007cc49 in .camlAsmgen__compile_implementation_1063 ()
#49887 0x10082da9 in .camlOptcompile__fun_1274 ()
#49888 0x10083525 in .camlOptcompile__comp_1048 ()
#49889 0x10083de1 in .camlOptcompile__implementation_1040 ()
#49890 0x10007cd9 in .camlOptmain__process_implementation_file_1011 ()
#49891 0x10008445 in .camlOptmain__process_file_1016 ()
#49892 0x100084c5 in .camlOptmain__anonymous_1022 ()
#49893 0x10283ec5 in .camlArg__parse_argv_dynamic_1078 ()
#49894 0x10284161 in .camlArg__parse_1140 ()
#49895 0x100092bd in .camlOptmain__main_1474 ()
#49896 0x1000a785 in .camlOptmain__entry ()
#49897 0x10003439 in caml_startup.code_begin ()
#49898 0x102b633d in .caml_start_program ()
#49899 0x102b6a30 in .caml_main ()
#49900