[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults
[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16072119#comment-16072119 ] James Peach commented on MESOS-7160: It reproduced on my VM:
{noformat}
[vagrant@fedora-26 src]$ sudo GLOG_v=1 ../bin/gdb-mesos-tests.sh --verbose --gtest_filter=NetworkPortsIsolatorTest.ROOT_NC_TaskGroup
...
E0703 08:25:11.022508 5754 perf.cpp:245] Failed to get perf version: Failed to execute perf: terminated with signal Aborted (core dumped)
...
[vagrant@fedora-26 src]$ gdb .libs/lt-mesos-tests core.5918
...
(gdb) where
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x7efed2b1f450 in __GI_abort () at abort.c:89
#2 0x00a94436 in _Abort (prefix=0x7efedbb06084 "ABORT: (/opt/home/src/mesos/3rdparty/libprocess/include/process/posix/subprocess.hpp:195): ", message=0x7efea4000a30 "Failed to os::execvpe on path 'perf': No such file or directory") at /opt/home/src/mesos/3rdparty/stout/include/stout/abort.hpp:74
#3 0x00a940a0 in _Abort (prefix=0x7efedbb06084 "ABORT: (/opt/home/src/mesos/3rdparty/libprocess/include/process/posix/subprocess.hpp:195): ", message="Failed to os::execvpe on path 'perf': No such file or directory") at /opt/home/src/mesos/3rdparty/stout/include/stout/abort.hpp:80
...
{noformat}
> Parsing of perf version segfaults
> -
>
> Key: MESOS-7160
> URL: https://issues.apache.org/jira/browse/MESOS-7160
> Project: Mesos
> Issue Type: Bug
> Components: test
> Reporter: Benjamin Bannier
> Assignee: Andrei Budnik
>
> Parsing the perf version [fails with a segfault in ASF CI|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/3294/],
> {noformat}
> E0222 20:54:03.033464 805 perf.cpp:237] Failed to get perf version: Failed to execute perf: terminated with signal Aborted (core dumped)
> {noformat}
--
This message was sent by Atlassian JIRA (v6.4.14#64029)
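The backtrace shows the forked child aborting after {{os::execvpe}} fails with ENOENT, which is why the parent logs "terminated with signal Aborted" rather than an exit code even though {{perf}} is simply not installed. A minimal POSIX sketch of that sequence follows, assuming a hypothetical {{run_and_describe}} helper; it is not libprocess's {{subprocess()}} API, and its status strings only imitate the wording of the Mesos log line.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not the libprocess API): fork, exec `file --version`,
// and, like the child in the backtrace, abort() if the exec fails.
std::string run_and_describe(const char* file) {
  pid_t pid = fork();
  if (pid < 0) {
    return "fork failed";
  }

  if (pid == 0) {
    // Child: execvp searches PATH and only returns on failure.
    char* const argv[] = {const_cast<char*>(file),
                          const_cast<char*>("--version"), nullptr};
    execvp(file, argv);
    // abort() raises SIGABRT, so the parent observes a signal, not an
    // exit code, even though the real problem was ENOENT.
    abort();
  }

  // Parent: collect the child's status and describe how it ended.
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) {
    return "waitpid failed";
  }
  if (WIFEXITED(status)) {
    return "exited with status " + std::to_string(WEXITSTATUS(status));
  }
  if (WIFSIGNALED(status)) {
    return "terminated with signal " + std::string(strsignal(WTERMSIG(status)));
  }
  return "unknown status";
}
```

On a host without {{perf}}, {{run_and_describe("perf")}} takes the {{WIFSIGNALED}} branch; with glibc, {{strsignal(SIGABRT)}} yields "Aborted". The same helper also shows why the later {{PerfTest.Version}} failure ("exited with status 2") is a different failure mode: that child ran and returned an exit code.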
[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults
[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064011#comment-16064011 ] Vinod Kone commented on MESOS-7160: [~abudnik] Are you actively working on this? If yes, can you please add this to the sprint?
[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults
[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063895#comment-16063895 ] James Peach commented on MESOS-7160: This morning my VM doesn't reproduce this; however, it definitely happened :) The normal code path is that the {{exec}} failure causes an abort. The supervisor then gets SIGTERM (I need to read more code to see why). The signal handler it has installed issues SIGKILL. If the SIGTERM delivery is delayed, then the second abort in the supervisor could trigger.
{noformat}
[pid 2738] execve("/bin/perf", ["perf", "--version"], 0x4bb6fc0 /* 21 vars */
[pid 2737] wait4(2738,
[pid 2738] <... execve resumed> ) = -1 ENOENT (No such file or directory)
[pid 2738] execve("/usr/sbin/perf", ["perf", "--version"], 0x4bb6fc0 /* 21 vars */) = -1 ENOENT (No such file or directory)
[pid 2738] execve("/usr/bin/perf", ["perf", "--version"], 0x4bb6fc0 /* 21 vars */) = -1 ENOENT (No such file or directory)
[pid 2738] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=2738, si_uid=0} ---
...
[pid 2737] <... wait4 resumed> 0x7f27e8901f44, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid 2737] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=2708, si_uid=0} ---
[pid 2738] +++ killed by SIGKILL +++
[pid 2737] +++ killed by SIGKILL +++
{noformat}
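The strace output shows the child probing each {{PATH}} directory ({{/bin}}, {{/usr/sbin}}, {{/usr/bin}}) and getting ENOENT every time before it raises SIGABRT. One possible mitigation is to do the same lookup before forking, so a missing {{perf}} is reported as such instead of as an aborted child. The sketch below assumes a hypothetical {{find_in_path}} helper; it is not taken from Mesos or stout.

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not part of Mesos or stout): walk a colon-separated
// PATH in order, as the strace shows execvp doing, and return the first
// regular, executable file named `name`. Returns "" if none is found.
std::string find_in_path(const std::string& name, const std::string& path) {
  std::string::size_type start = 0;
  while (true) {
    std::string::size_type end = path.find(':', start);
    std::string dir = path.substr(
        start, end == std::string::npos ? std::string::npos : end - start);

    // POSIX treats an empty PATH entry as the current directory.
    std::string candidate = (dir.empty() ? "." : dir) + "/" + name;

    // Require a regular file with execute permission (stat follows symlinks).
    struct stat st;
    if (stat(candidate.c_str(), &st) == 0 && S_ISREG(st.st_mode) &&
        access(candidate.c_str(), X_OK) == 0) {
      return candidate;
    }

    if (end == std::string::npos) {
      break;  // that was the last PATH entry
    }
    start = end + 1;
  }
  return "";
}
```

A caller could pass the value of {{getenv("PATH")}} (after checking that it is not null) and fail early with a clear "perf not found" error. The check is advisory only: the binary can still disappear between the lookup and the {{exec}}, so the exec failure path must still be handled.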
[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults
[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063108#comment-16063108 ] Andrei Budnik commented on MESOS-7160: It's very unusual for the parent to fail to wait for a child. Sure, {{waitpid}} can be called before or after the child exits, but AFAIK that shouldn't be a race condition, as the kernel keeps the child's {{task_struct}} until the parent process invokes {{wait*()}}. In addition, if the path to an executable is invalid, then {{execv}} will fail, causing {{abort()}} to be invoked twice: [https://github.com/apache/mesos/blob/18695ae8d5cfc209072950e887495a42dd83a1d9/3rdparty/libprocess/include/process/posix/subprocess.hpp#L195] [https://github.com/apache/mesos/blob/18695ae8d5cfc209072950e887495a42dd83a1d9/3rdparty/libprocess/src/subprocess.cpp#L181]
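The point that calling {{waitpid}} after the child has already exited is not a race can be checked with a small sketch: the child exits at once, the parent sleeps, and {{waitpid}} still returns the exit status because the terminated child stays a zombie until it is reaped. {{reap_after_exit}} is a hypothetical helper for illustration only. One caveat: if the parent has set SIGCHLD to SIG_IGN, POSIX allows children to be reaped automatically, and {{waitpid}} then fails with ECHILD.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork a child that exits immediately with `code`,
// wait `delay_seconds` so the child is (very likely) already dead, then reap
// it. Returns the child's exit code, or -1 on error.
int reap_after_exit(int code, unsigned delay_seconds) {
  pid_t pid = fork();
  if (pid < 0) {
    return -1;
  }
  if (pid == 0) {
    _exit(code);  // child terminates right away
  }

  sleep(delay_seconds);  // the exited child lingers as a zombie meanwhile

  int status = 0;
  if (waitpid(pid, &status, 0) != pid) {
    return -1;  // e.g. ECHILD if SIGCHLD is ignored and the child was auto-reaped
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With default SIGCHLD handling, the status is collected whether {{waitpid}} runs before or after the child exits, which is consistent with the comment above.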
[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults
[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062533#comment-16062533 ] James Peach commented on MESOS-7160: Ping [~abudnik] ^^^
[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults
[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062532#comment-16062532 ] James Peach commented on MESOS-7160: I looked
[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults
[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015165#comment-16015165 ] James Peach commented on MESOS-7160: FWIW, {{perf}} can crash or misbehave when it is not matched to the kernel version.
[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults
[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014946#comment-16014946 ] Till Toenshoff commented on MESOS-7160: Related?
{noformat}
3:07:20 [ RUN ] PerfTest.Version
23:07:20 ../../src/tests/containerizer/perf_tests.cpp:134: Failure
23:07:20 (perf::version()).failure(): Failed to execute perf: exited with status 2
{noformat}
[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults
[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013880#comment-16013880 ] Benjamin Bannier commented on MESOS-7160: [This|https://www.spinics.net/lists/linux-perf-users/msg02998.html] might be related, but it is hard to be certain without knowing the exact versions of the host kernel, linux-tools inside the container, and possibly docker.
[jira] [Commented] (MESOS-7160) Parsing of perf version segfaults
[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879278#comment-15879278 ] James Peach commented on MESOS-7160: perf aborted for some reason.