[ https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341932#comment-16341932 ]
Miklos Szegedi commented on YARN-7796:
--------------------------------------
[~Jim_Brennan], [~grepas], the maximum stack size is set by {{ulimit -s}}, and it
differs between Red Hat 6 and 7. I also tested with {{-fstack-check}} (below); it
has no impact on the limit.
{code:java}
*** REDHAT 6 ***
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
[root@mybox-rh69 ~]# curl https://gist.githubusercontent.com/szegedim/c583ccead8316b1035bc9148bcf588b9/raw/c0455196b47c76194e37a100964f3b3bf51d4a53/checkstack.cpp >./checkstack.cpp && gcc ./checkstack.cpp -lstdc++ -fstack-check && ./a.out
12051K succeededSegmentation fault (core dumped)
[root@mybox-rh69 ~]# curl https://gist.githubusercontent.com/szegedim/c583ccead8316b1035bc9148bcf588b9/raw/c0455196b47c76194e37a100964f3b3bf51d4a53/checkstack.cpp >./checkstack.cpp && gcc ./checkstack.cpp -lstdc++ && ./a.out
12051K succeededSegmentation fault (core dumped)
[root@mybox-rh69 ~]# ulimit -s
10240
*** REDHAT 7 ***
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
[root@mybox-rh74 ~]# curl https://gist.githubusercontent.com/szegedim/c583ccead8316b1035bc9148bcf588b9/raw/c0455196b47c76194e37a100964f3b3bf51d4a53/checkstack.cpp >./checkstack.cpp && gcc ./checkstack.cpp -lstdc++ -fstack-check && ./a.out
8016K Segmentation fault
[root@mybox-rh74 ~]# curl https://gist.githubusercontent.com/szegedim/c583ccead8316b1035bc9148bcf588b9/raw/c0455196b47c76194e37a100964f3b3bf51d4a53/checkstack.cpp >./checkstack.cpp && gcc ./checkstack.cpp -lstdc++ && ./a.out
8016K Segmentation fault
[root@mybox-rh74 ~]# ulimit -s
8192
*** REDHAT 6 BUILT CODE ON REDHAT 7 ***
[root@mybox-rh74 ~]# scp root@mybox-rh69:/root/a.out ./b.out
a.out    100% 6989    4.4MB/s   00:00
[root@mybox-rh74 ~]# ./b.out
8016K Segmentation fault
{code}
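For reference, {{ulimit -s}} reports the soft {{RLIMIT_STACK}} resource limit in units of 1024 bytes, and that limit is inherited by the processes the shell starts. Because it is applied when the program runs rather than fixed at compile time, this fits the Red Hat 6 build stopping at 8016K when run on Red Hat 7. A minimal C sketch of reading the same value with POSIX {{getrlimit()}}; the helper name {{stack_limit_kib}} is hypothetical and is not part of container-executor:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: returns the soft RLIMIT_STACK in KiB, the same unit
 * `ulimit -s` prints. Returns 0 when the limit is RLIM_INFINITY ("unlimited")
 * and -1 if getrlimit() fails. */
static long stack_limit_kib(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return 0; /* `ulimit -s` would print "unlimited" */
    }
    return (long)(rl.rlim_cur / 1024);
}
```

Called from a program started by a shell where {{ulimit -s}} prints 8192, this sketch should return 8192; an unlimited stack is reported as 0.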
> Container-executor fails with segfault on certain OS configurations
> -------------------------------------------------------------------
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 3.0.0
> Reporter: Gergo Repas
> Assignee: Gergo Repas
> Priority: Major
> Fix For: 3.1.0, 3.0.1
>
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch,
> YARN-7796.002.patch
>
>
> There is a relatively large (128K) buffer allocated on the stack in
> container-executor.c for copying files. As the gdb stack trace below shows,
> this allocation can fail with SIGSEGV. This happens only on certain OS
> configurations; I can reproduce it on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x00000000004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 "/yarn/nm/nmPrivate/container_1516711246952_0001_02_000001.tokens", out_filename=0x932930 "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_000001.tokens", perm=384)
>     at /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0 0x00000000004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 "/yarn/nm/nmPrivate/container_1516711246952_0001_02_000001.tokens", out_filename=0x932930 "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_000001.tokens", perm=384)
>     at /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1 0x0000000000409a81 in initialize_app (user=<value optimized out>, app_id=0x7ffd669fd2b7 "application_1516711246952_0001", nmPrivate_credentials_file=0x7ffd669fd2d6 "/yarn/nm/nmPrivate/container_1516711246952_0001_02_000001.tokens", local_dirs=0x9331c8, log_roots=<value optimized out>, args=0x7ffd669fb168)
>     at /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2 0x0000000000403f90 in main (argc=<value optimized out>, argv=<value optimized out>) at /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}
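One way to keep a 128K copy buffer from counting against the stack limit is to take it from the heap instead. The following is a minimal sketch under that assumption, not the attached YARN-7796 patch: {{copy_fd_heap}} and {{COPY_BUFFER_SIZE}} are hypothetical names, and it covers only the read/write loop, not the file handling around it in {{copy_file}} (for example applying {{perm}} to {{out_filename}}).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical constant mirroring the 128K buffer described in the issue. */
#define COPY_BUFFER_SIZE (128 * 1024)

/* Hypothetical helper: copies everything from in_fd to out_fd through a
 * heap-allocated buffer, so the 128K scratch space is not placed on the stack.
 * Returns 0 on success, -1 on error (errno describes the failure). */
static int copy_fd_heap(int in_fd, int out_fd)
{
    char *buffer = malloc(COPY_BUFFER_SIZE);
    int rc = 0;

    if (buffer == NULL) {
        return -1;
    }
    for (;;) {
        ssize_t nread = read(in_fd, buffer, COPY_BUFFER_SIZE);
        if (nread == 0) {
            break; /* end of input */
        }
        if (nread < 0) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal, retry the read */
            }
            rc = -1;
            break;
        }
        /* write() may accept fewer bytes than requested; loop until done. */
        char *p = buffer;
        while (nread > 0) {
            ssize_t nwritten = write(out_fd, p, (size_t)nread);
            if (nwritten < 0) {
                if (errno == EINTR) {
                    continue;
                }
                rc = -1;
                goto out;
            }
            p += nwritten;
            nread -= nwritten;
        }
    }
out:
    {
        int saved_errno = errno; /* keep the caller-visible error intact */
        free(buffer);
        errno = saved_errno;
    }
    return rc;
}
```

The loop also retries short writes and {{EINTR}}, which a fixed-size copy loop has to handle whether the buffer lives on the stack or on the heap.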
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)