[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults

2017-07-03 Thread James Peach (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16072119#comment-16072119 ]

James Peach commented on MESOS-7160:


It reproduced on my VM:
{noformat}
[vagrant@fedora-26 src]$ sudo GLOG_v=1 ../bin/gdb-mesos-tests.sh --verbose --gtest_filter=NetworkPortsIsolatorTest.ROOT_NC_TaskGroup
...
E0703 08:25:11.022508  5754 perf.cpp:245] Failed to get perf version: Failed to execute perf: terminated with signal Aborted (core dumped)
...
[vagrant@fedora-26 src]$ gdb .libs/lt-mesos-tests core.5918
...
(gdb) where
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7efed2b1f450 in __GI_abort () at abort.c:89
#2  0x00a94436 in _Abort (prefix=0x7efedbb06084 "ABORT: (/opt/home/src/mesos/3rdparty/libprocess/include/process/posix/subprocess.hpp:195): ", message=0x7efea4000a30 "Failed to os::execvpe on path 'perf': No such file or directory") at /opt/home/src/mesos/3rdparty/stout/include/stout/abort.hpp:74
#3  0x00a940a0 in _Abort (prefix=0x7efedbb06084 "ABORT: (/opt/home/src/mesos/3rdparty/libprocess/include/process/posix/subprocess.hpp:195): ", message="Failed to os::execvpe on path 'perf': No such file or directory") at /opt/home/src/mesos/3rdparty/stout/include/stout/abort.hpp:80
...
{noformat}


> Parsing of perf version segfaults
> -
>
> Key: MESOS-7160
> URL: https://issues.apache.org/jira/browse/MESOS-7160
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>Assignee: Andrei Budnik
>
> Parsing the perf version [fails with a segfault in ASF CI|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/3294/],
> {noformat}
> E0222 20:54:03.033464   805 perf.cpp:237] Failed to get perf version: Failed to execute perf: terminated with signal Aborted (core dumped)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults

2017-06-26 Thread Vinod Kone (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064011#comment-16064011 ]

Vinod Kone commented on MESOS-7160:
---

[~abudnik] Are you actively working on this? If yes, can you please add this to 
the sprint?


[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults

2017-06-26 Thread James Peach (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063895#comment-16063895 ]

James Peach commented on MESOS-7160:


This morning my VM doesn't reproduce this; however, it definitely happened :)

The normal code path is that the {{exec}} failure causes an abort. The supervisor then gets SIGTERM (I need to read more code to see why), and the signal handler it has installed issues SIGKILL. If the SIGTERM delivery is delayed, the second abort in the supervisor could trigger.

{noformat}
[pid  2738] execve("/bin/perf", ["perf", "--version"], 0x4bb6fc0 /* 21 vars */ 
[pid  2737] wait4(2738,  
[pid  2738] <... execve resumed> )  = -1 ENOENT (No such file or directory)
[pid  2738] execve("/usr/sbin/perf", ["perf", "--version"], 0x4bb6fc0 /* 21 vars */) = -1 ENOENT (No such file or directory)
[pid  2738] execve("/usr/bin/perf", ["perf", "--version"], 0x4bb6fc0 /* 21 vars */) = -1 ENOENT (No such file or directory)
[pid  2738] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=2738, si_uid=0} ---
...
[pid  2737] <... wait4 resumed> 0x7f27e8901f44, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid  2737] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=2708, si_uid=0} ---
[pid  2738] +++ killed by SIGKILL +++
[pid  2737] +++ killed by SIGKILL +++
{noformat}



[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults

2017-06-26 Thread Andrei Budnik (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063108#comment-16063108 ]

Andrei Budnik commented on MESOS-7160:
--

It's very unusual for the parent to fail to wait for a child. Sure, {{waitpid}} can be called before or after the child exits, but AFAIK that shouldn't be a race condition, as the kernel keeps the child's {{task_struct}} until the parent process invokes {{wait*()}}.

In addition, if the path to the executable is invalid, {{execv}} fails, causing {{abort()}} to be invoked twice:
[https://github.com/apache/mesos/blob/18695ae8d5cfc209072950e887495a42dd83a1d9/3rdparty/libprocess/include/process/posix/subprocess.hpp#L195]
[https://github.com/apache/mesos/blob/18695ae8d5cfc209072950e887495a42dd83a1d9/3rdparty/libprocess/src/subprocess.cpp#L181]


[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults

2017-06-25 Thread James Peach (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062533#comment-16062533 ]

James Peach commented on MESOS-7160:


Ping [~abudnik] ^^^


[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults

2017-06-25 Thread James Peach (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062532#comment-16062532 ]

James Peach commented on MESOS-7160:


I looked


[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults

2017-05-17 Thread James Peach (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015165#comment-16015165 ]

James Peach commented on MESOS-7160:


FWIW, {{perf}} can crash or misbehave when it is not matched to the kernel version.


[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults

2017-05-17 Thread Till Toenshoff (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014946#comment-16014946 ]

Till Toenshoff commented on MESOS-7160:
---

Related?
{noformat}
3:07:20 [ RUN  ] PerfTest.Version
23:07:20 ../../src/tests/containerizer/perf_tests.cpp:134: Failure
23:07:20 (perf::version()).failure(): Failed to execute perf: exited with status 2
{noformat}



[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults

2017-05-17 Thread Benjamin Bannier (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013880#comment-16013880 ]

Benjamin Bannier commented on MESOS-7160:
-

[This|https://www.spinics.net/lists/linux-perf-users/msg02998.html] might be 
related, but it is hard to be certain without knowing the exact versions of the 
host kernel, linux-tools inside the container, and possibly docker.


[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults

2017-02-22 Thread James Peach (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879278#comment-15879278 ]

James Peach commented on MESOS-7160:


perf aborted for some reason.
