[ https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343304#comment-16343304 ]
Gergo Repas commented on YARN-7796:
-----------------------------------

[~Jim_Brennan] I have the same gcc version (gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC)), and yes, when I removed the -fstack-check flag (with this patch reverted), I no longer experienced the segfault.

[~miklos.szeg...@cloudera.com] Your results may suggest that the container-executor issue is not related to the max stack size. I wonder if the following happens: "If neither STACK_CHECK_BUILTIN nor STACK_CHECK_STATIC_BUILTIN is defined, GCC will change its allocation strategy for large objects if the option -fstack-check is specified: they will always be allocated dynamically if their size exceeds STACK_CHECK_MAX_VAR_SIZE bytes." (from https://gcc.gnu.org/onlinedocs/gccint/Stack-Checking.html).

> Container-executor fails with segfault on certain OS configurations
> -------------------------------------------------------------------
>
>                 Key: YARN-7796
>                 URL: https://issues.apache.org/jira/browse/YARN-7796
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>   Affects Versions: 3.0.0
>            Reporter: Gergo Repas
>            Assignee: Gergo Repas
>            Priority: Major
>             Fix For: 3.1.0, 3.0.1
>
>         Attachments: YARN-7796.000.patch, YARN-7796.001.patch, YARN-7796.002.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in container-executor.c for the purpose of copying files. As indicated by the gdb stack trace below, this allocation can fail with SIGSEGV. This happens only on certain OS configurations; I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x00000000004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 "/yarn/nm/nmPrivate/container_1516711246952_0001_02_000001.tokens", out_filename=0x932930 "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_000001.tokens", perm=384)
>     at /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0 0x00000000004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 "/yarn/nm/nmPrivate/container_1516711246952_0001_02_000001.tokens", out_filename=0x932930 "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_000001.tokens", perm=384)
>     at /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1 0x0000000000409a81 in initialize_app (user=<value optimized out>, app_id=0x7ffd669fd2b7 "application_1516711246952_0001", nmPrivate_credentials_file=0x7ffd669fd2d6 "/yarn/nm/nmPrivate/container_1516711246952_0001_02_000001.tokens", local_dirs=0x9331c8, log_roots=<value optimized out>, args=0x7ffd669fb168)
>     at /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2 0x0000000000403f90 in main (argc=<value optimized out>, argv=<value optimized out>)
>     at /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}
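
For illustration only: the trace points at the variable-length array `char buffer[buffer_size];` inside `copy_file`. A minimal sketch of the general alternative, allocating the copy buffer on the heap so that its size no longer depends on stack limits or on how the compiler handles large stack objects, might look like the following. `copy_fd_heap` is a hypothetical helper written for this note; it is not taken from container-executor.c or from the attached patches, and its error handling is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the code from container-executor.c):
 * copies everything readable from in_fd to out_fd through a
 * heap-allocated buffer instead of a large array on the stack.
 * Returns 0 on success, -1 on allocation or I/O failure.
 */
static int copy_fd_heap(int in_fd, int out_fd, size_t buffer_size) {
  assert(buffer_size > 0);

  char *buffer = malloc(buffer_size);
  if (buffer == NULL) {
    return -1; /* allocation failure is reported, not a SIGSEGV */
  }

  int rc = 0;
  while (rc == 0) {
    ssize_t n = read(in_fd, buffer, buffer_size);
    if (n == 0) {
      break; /* end of input */
    }
    if (n < 0) {
      if (errno == EINTR) {
        continue; /* interrupted by a signal, retry the read */
      }
      rc = -1;
      break;
    }

    /* write() may be partial, so loop until the chunk is flushed */
    char *p = buffer;
    while (n > 0) {
      ssize_t w = write(out_fd, p, (size_t)n);
      if (w < 0) {
        if (errno == EINTR) {
          continue;
        }
        rc = -1;
        break;
      }
      p += w;
      n -= w;
    }
  }

  free(buffer);
  return rc;
}
```

With a heap buffer, an allocation that cannot be satisfied shows up as a `NULL` return from `malloc` that the caller can handle, rather than as a fault at the point of declaration.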