[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-30 Thread Jim Brennan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345361#comment-16345361
 ] 

Jim Brennan commented on YARN-7796:
---

I've filed a new Jira for this binary compatibility issue: YARN-7857

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.1.0, 3.0.1
>
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch, 
> YARN-7796.002.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-29 Thread Jim Brennan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343452#comment-16343452
 ] 

Jim Brennan commented on YARN-7796:
---

[~miklos.szeg...@cloudera.com] - I believe the max stack size reported by 
ulimit is in KBs, so 10 MB for RHEL 6 and 8 MB for RHEL 7.   The 
container-executor 128 KB stack allocation should be well within those limits.

It does appear that gcc is generating dynamic stack-checking code when compiled 
on RHEL 6, and that seems to work when run on RHEL 6.  But the same binary does 
not seem to work on RHEL 7.  This implies to me that the code generated for 
RHEL 6 is in some way incompatible with the RHEL 7 kernel.  

 

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.1.0, 3.0.1
>
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch, 
> YARN-7796.002.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-29 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343304#comment-16343304
 ] 

Gergo Repas commented on YARN-7796:
---

[~Jim_Brennan] I have the same gcc version (gcc version 4.4.7 20120313 (Red Hat 
4.4.7-18) (GCC)), and yes, when I removed the -fstack-check flag (and having 
this patch reverted) I haven't experienced the segfault anymore.

[~miklos.szeg...@cloudera.com] Your results may suggest that the 
container-executor issue is not related to the max stack size. I wonder if the 
following happens: "If neither STACK_CHECK_BUILTIN nor 
STACK_CHECK_STATIC_BUILTIN is defined, GCC will change its allocation strategy 
for large objects if the option -fstack-check is specified: they will always be 
allocated dynamically if their size exceeds STACK_CHECK_MAX_VAR_SIZE bytes." 
(from https://gcc.gnu.org/onlinedocs/gccint/Stack-Checking.html).

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.1.0, 3.0.1
>
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch, 
> YARN-7796.002.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-26 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341948#comment-16341948
 ] 

Miklos Szegedi commented on YARN-7796:
--

Now the question is, how does a 128K allocation fill in a stack that is 
normally 8K? If it is the one that brought up the issue, there should be 
another big allocation. Do you have a ulimit -s value from a system that 
reproduces this?

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.1.0, 3.0.1
>
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch, 
> YARN-7796.002.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-26 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341932#comment-16341932
 ] 

Miklos Szegedi commented on YARN-7796:
--

[~Jim_Brennan], [~grepas], the stack depth is specified by {{ulimit -s}}. It is 
different on Redhat 6 and 7. I also checked below with -fstack-check and it has 
no impact on the limit.
{code:java}
*** REDHAT 6 ***
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
[root@mybox-rh69 ~]# curl 
https://gist.githubusercontent.com/szegedim/c583ccead8316b1035bc9148bcf588b9/raw/c0455196b47c76194e37a100964f3b3bf51d4a53/checkstack.cpp
 >./checkstack.cpp && gcc ./checkstack.cpp -lstdc++ -fstack-check && ./a.out
12051K succeededSegmentation fault (core dumped)
[root@mybox-rh69 ~]# curl 
https://gist.githubusercontent.com/szegedim/c583ccead8316b1035bc9148bcf588b9/raw/c0455196b47c76194e37a100964f3b3bf51d4a53/checkstack.cpp
 >./checkstack.cpp && gcc ./checkstack.cpp -lstdc++ && ./a.out
12051K succeededSegmentation fault (core dumped)
[root@mybox-rh69 ~]# ulimit -s
10240

*** REDHAT 7 ***
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
[root@mybox-rh74 ~]# curl 
https://gist.githubusercontent.com/szegedim/c583ccead8316b1035bc9148bcf588b9/raw/c0455196b47c76194e37a100964f3b3bf51d4a53/checkstack.cpp
 >./checkstack.cpp && gcc ./checkstack.cpp -lstdc++ -fstack-check && ./a.out
8016K Segmentation fault
[root@mybox-rh74 ~]# curl 
https://gist.githubusercontent.com/szegedim/c583ccead8316b1035bc9148bcf588b9/raw/c0455196b47c76194e37a100964f3b3bf51d4a53/checkstack.cpp
 >./checkstack.cpp && gcc ./checkstack.cpp -lstdc++ && ./a.out
8016K Segmentation fault
[root@mybox-rh74 ~]# ulimit -s
8192

*** REDHAT 6 BUILT CODE ON REDHAT 7 ***
[root@mybox-rh74 ~]# scp root@mybox-rh69:/root/a.out ./b.out
a.out   
100% 6989 4.4MB/s   00:00
[root@mybox-rh74 ~]# ./b.out 
8016K Segmentation fault
{code}

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.1.0, 3.0.1
>
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch, 
> YARN-7796.002.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional 

[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-26 Thread Jim Brennan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341628#comment-16341628
 ] 

Jim Brennan commented on YARN-7796:
---

[~grepas] that is interesting.  I wonder if it is the version of gcc that is 
the issue?  This is what I was using on RHEL 6, which causes the problem when 
running on RHEL 7:

gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)

I'll be interested to hear if removing the -fstack-check flag works in your 
case.

 

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.1.0, 3.0.1
>
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch, 
> YARN-7796.002.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-26 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340756#comment-16340756
 ] 

Gergo Repas commented on YARN-7796:
---

[~Jim_Brennan] Thanks for the info on -fstack-check, I'll look into it. In my 
case the compilation and execution both happened on RHEL 6.9.

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.1.0, 3.0.1
>
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch, 
> YARN-7796.002.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-25 Thread Jim Brennan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340101#comment-16340101
 ] 

Jim Brennan commented on YARN-7796:
---

We ran into this same issue when running a container-executor that was compiled 
on RHEL 6 on a RHEL 7 system.  While we have verified that the patch in this 
Jira does avoid the segmentation fault, we are concerned that the root cause of 
the problem remains, and may bite us later.

The -fstack-check flag was added to the command line in YARN-6721

Based on my testing, a container-executor (without the patch from this Jira) 
compiled on RHEL 6 with the -fstack-check flag always hits this segmentation 
fault when run on RHEL 7.  But if you compile without this flag, the 
container-executor runs on RHEL 7 with no problems.  I also verified this with 
a simple program that just does the copy_file.

[~grepas] - was this the case for you? Were you running a container-executor 
that was compiled on an earlier redhat release?

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.1.0, 3.0.1
>
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch, 
> YARN-7796.002.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-24 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337103#comment-16337103
 ] 

Gergo Repas commented on YARN-7796:
---

Thanks [~miklos.szeg...@cloudera.com] for the reviews, commit, and pushing the 
failing builds through!

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.1.0, 3.0.1
>
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch, 
> YARN-7796.002.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336958#comment-16336958
 ] 

Hudson commented on YARN-7796:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13544 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13544/])
YARN-7796. Container-executor fails with segfault on certain OS (szegedim: rev 
e7642a3e6f540b4b56367babfbaf35ee6b3c7675)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch, 
> YARN-7796.002.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-23 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336942#comment-16336942
 ] 

Miklos Szegedi commented on YARN-7796:
--

+1 LGTM. The docker issue is unrelated.

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch, 
> YARN-7796.002.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336708#comment-16336708
 ] 

genericqa commented on YARN-7796:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
24m 53s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m  5s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 56m 49s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7796 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12907385/YARN-7796.002.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux f3eb4970474c 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 39b999a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/19414/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19414/testReport/ |
| Max. process+thread count | 395 (vs. ulimit of 5000) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19414/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 

[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-23 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336628#comment-16336628
 ] 

Miklos Szegedi commented on YARN-7796:
--

{{[ERROR] unable to create new native thread -> [Help 1]}}

It looks like we have a busy node. I kicked off the test again. 
https://builds.apache.org/job/PreCommit-YARN-Build/19414/

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch, 
> YARN-7796.002.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336618#comment-16336618
 ] 

genericqa commented on YARN-7796:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
20s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
18s{color} | {color:red} hadoop-yarn-server-nodemanager in trunk failed. 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 24m 
21s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
20s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  1m 11s{color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager
 generated 16 new + 0 unchanged - 0 fixed = 16 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 12s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 68m  3s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7796 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12907385/YARN-7796.002.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 225059e2d361 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 39b999a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| compile | 
https://builds.apache.org/job/PreCommit-YARN-Build/19413/artifact/out/branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-YARN-Build/19413/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/19413/artifact/out/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/19413/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19413/testReport/ |
| Max. process+thread count | 408 (vs. ulimit of 5000) |
| modules | C: 

[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-23 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336605#comment-16336605
 ] 

Miklos Szegedi commented on YARN-7796:
--

+1 pending Jenkins [https://builds.apache.org/job/PreCommit-YARN-Build/19413/]

 

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch, 
> YARN-7796.002.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-23 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336516#comment-16336516
 ] 

Gergo Repas commented on YARN-7796:
---

[~miklos.szeg...@cloudera.com] Thanks for the comments. I also had to reformat 
the sorroundings of the line with the leading tab, as the sorrounding lines 
were using tab too.

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch, 
> YARN-7796.002.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-23 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336506#comment-16336506
 ] 

Gergo Repas commented on YARN-7796:
---

[~miklos.szeg...@cloudera.com] I'm addressing those.

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-23 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336469#comment-16336469
 ] 

Miklos Szegedi commented on YARN-7796:
--

Thank you [~grepas] for the patch.

Can you fix the outstanding whitespace issue? Also these declarations need to 
be in the beginning of the function per C coding style:
{code:java}
1018const int buffer_size = 128*1024;
1019char* buffer = malloc(buffer_size);{code}

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336230#comment-16336230
 ] 

genericqa commented on YARN-7796:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
25m 14s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 31s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
21s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 56m 40s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7796 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12907307/YARN-7796.001.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux fde509a413ac 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f63d13f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/19405/artifact/out/whitespace-tabs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19405/testReport/ |
| Max. process+thread count | 409 (vs. ulimit of 5000) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19405/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch
>
>
> There is a relatively big (128K) buffer 

[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-23 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336151#comment-16336151
 ] 

Miklos Szegedi commented on YARN-7796:
--

I tried to start jenkins manually. 
https://builds.apache.org/job/PreCommit-YARN-Build/19405/

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336143#comment-16336143
 ] 

genericqa commented on YARN-7796:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
25m 45s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  8s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 37s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 29s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestContainerManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7796 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12907307/YARN-7796.001.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 1497d87b9a0a 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6347b22 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/19402/artifact/out/whitespace-tabs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/19402/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19402/testReport/ |
| Max. process+thread count | 440 (vs. ulimit of 5000) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19402/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  

[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336111#comment-16336111
 ] 

genericqa commented on YARN-7796:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
24m 44s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 34s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
21s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 56m 18s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7796 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12907304/YARN-7796.000.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 91d2e1d45175 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6347b22 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19401/testReport/ |
| Max. process+thread count | 443 (vs. ulimit of 5000) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19401/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by 

[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-23 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336046#comment-16336046
 ] 

Gergo Repas commented on YARN-7796:
---

[~miklos.szeg...@cloudera.com] Thanks for the comments, I addressed those 
issues in v001.

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-7796.000.patch, YARN-7796.001.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-23 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336028#comment-16336028
 ] 

Miklos Szegedi commented on YARN-7796:
--

[~grepas], thank you for the patch. {{malloc}} may return NULL in case of out 
of memory. Could you update the patch to log an error and exit in this case?
{code:java}
 if (write_result <= 0) {
fprintf(LOGFILE, "Error writing to %s - %s\n", out_filename,
   strerror(errno));
close(out_fd);
return -1;
 }{code}
Also please make sure we do not leak the buffer in the error case above.

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-7796.000.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations

2018-01-23 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336007#comment-16336007
 ] 

Gergo Repas commented on YARN-7796:
---

The issue is resolved if the above mentioned buffer is allocated on the heap 
instead of the stack, I'm attaching a patch for this.

> Container-executor fails with segfault on certain OS configurations
> ---
>
> Key: YARN-7796
> URL: https://issues.apache.org/jira/browse/YARN-7796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-7796.000.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in 
> container-executor.c for the purpose of copying files. As indicated by the 
> below gdb stack trace, this allocation can fail with SIGSEGV. This happens 
> only on certain OS configurations - I can reproduce this issue on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is systest
> main : requested yarn user is systest
> Program received signal SIGSEGV, Segmentation fault.
> 0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966 char buffer[buffer_size];
> (gdb) bt
> #0  0x004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> out_filename=0x932930 
> "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_01.tokens",
>  perm=384)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x00409a81 in initialize_app (user=, 
> app_id=0x7ffd669fd2b7 "application_1516711246952_0001", 
> nmPrivate_credentials_file=0x7ffd669fd2d6 
> "/yarn/nm/nmPrivate/container_1516711246952_0001_02_01.tokens", 
> local_dirs=0x9331c8, log_roots=, args=0x7ffd669fb168)
> at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x00403f90 in main (argc=, argv= optimized out>) at 
> /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org