[jira] [Created] (MESOS-6925) Break down the `mesos-protobufs` target in CMake further.
Michael Park created MESOS-6925:

Summary: Break down the `mesos-protobufs` target in CMake further.
Key: MESOS-6925
URL: https://issues.apache.org/jira/browse/MESOS-6925
Project: Mesos
Issue Type: Task
Components: cmake
Reporter: Michael Park

In the {{mesos-tidy}} setup, we need to perform the protobuf generation, but we don't need to compile the generated files. If we had separate targets for protobuf generation and compilation, we would be able to do less work.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6924) Add a target for external dependencies in CMake.
[ https://issues.apache.org/jira/browse/MESOS-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Park updated MESOS-6924:

Description:
It would be nice to be able to have a target for external dependencies, i.e. 3rdparty dependencies except {{stout}}/{{libprocess}}. This would help with {{mesos-tidy}} in specific, to do less work. We can currently spell out all of them + a hack around {{libev}} vs {{libevent}}, it would be cleaner to simply build an external dependencies target.

{code}
# Build the external dependencies.
# TODO(mpark): Use an external dependencies target once MESOS-6924 is resolved.
cmake --build 3rdparty --target boost-1.53.0
cmake --build 3rdparty --target elfio-3.2
cmake --build 3rdparty --target glog-0.3.3
cmake --build 3rdparty --target gmock-1.7.0
cmake --build 3rdparty --target http_parser-2.6.2
# TODO(mpark): The `|| true` is a hack to try both `libev` and `libevent` and
#              use whichever one happens to be configured. This would also go
#              away with MESOS-6924.
cmake --build 3rdparty --target libev-4.22 || true
cmake --build 3rdparty --target libevent-2.1.5-beta || true
cmake --build 3rdparty --target leveldb-1.4
cmake --build 3rdparty --target nvml-352.79
cmake --build 3rdparty --target picojson-1.3.0
cmake --build 3rdparty --target protobuf-2.6.1
cmake --build 3rdparty --target zookeeper-3.4.8
{code}

was:
It would be nice to be able to have a target for external dependencies, i.e. 3rdparty dependencies except {{stout}}/{{libprocess}}. This would help with {{mesos-tidy}} in specific, to do less work. We can currently spell out all of them + a hack around {{libev}} vs {{libevent}}, it would be cleaner to simply build an external dependencies target.

{code}
cmake --build 3rdparty --target boost-1.53.0
cmake --build 3rdparty --target elfio-3.2
cmake --build 3rdparty --target glog-0.3.3
cmake --build 3rdparty --target gmock-1.7.0
cmake --build 3rdparty --target http_parser-2.6.2
# NOTE: Try both `libev` and `libevent`. This is a terrible hack.
cmake --build 3rdparty --target libev-4.22 || true
cmake --build 3rdparty --target libevent-2.1.5-beta || true
cmake --build 3rdparty --target leveldb-1.4
cmake --build 3rdparty --target nvml-352.79
cmake --build 3rdparty --target picojson-1.3.0
cmake --build 3rdparty --target protobuf-2.6.1
cmake --build 3rdparty --target zookeeper-3.4.8
{code}

> Add a target for external dependencies in CMake.
>
> Key: MESOS-6924
> URL: https://issues.apache.org/jira/browse/MESOS-6924
> Project: Mesos
> Issue Type: Task
> Components: cmake
> Reporter: Michael Park
>
> It would be nice to be able to have a target for external dependencies, i.e. 3rdparty dependencies except {{stout}}/{{libprocess}}. This would help with {{mesos-tidy}} in specific, to do less work. We can currently spell out all of them + a hack around {{libev}} vs {{libevent}}, it would be cleaner to simply build an external dependencies target.
> {code}
> # Build the external dependencies.
> # TODO(mpark): Use an external dependencies target once MESOS-6924 is resolved.
> cmake --build 3rdparty --target boost-1.53.0
> cmake --build 3rdparty --target elfio-3.2
> cmake --build 3rdparty --target glog-0.3.3
> cmake --build 3rdparty --target gmock-1.7.0
> cmake --build 3rdparty --target http_parser-2.6.2
> # TODO(mpark): The `|| true` is a hack to try both `libev` and `libevent` and
> #              use whichever one happens to be configured. This would also go
> #              away with MESOS-6924.
> cmake --build 3rdparty --target libev-4.22 || true
> cmake --build 3rdparty --target libevent-2.1.5-beta || true
> cmake --build 3rdparty --target leveldb-1.4
> cmake --build 3rdparty --target nvml-352.79
> cmake --build 3rdparty --target picojson-1.3.0
> cmake --build 3rdparty --target protobuf-2.6.1
> cmake --build 3rdparty --target zookeeper-3.4.8
> {code}
[jira] [Commented] (MESOS-6789) SSL socket's 'shutdown()' method is broken
[ https://issues.apache.org/jira/browse/MESOS-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822560#comment-15822560 ]

Joseph Wu commented on MESOS-6789:

{code}
commit c600d12a01865daad8ba7607b53eff35686f0f35
Author: Greg Mann
Date:   Fri Jan 13 15:56:34 2017 -0800

    Fixed SSL socket 'shutdown()'.

    Recently, a change was made to the signature of `Socket::shutdown`,
    but the corresponding override in `LibeventSSLSocketImpl` was not
    updated, so the implementation-specific method is no longer being
    executed. Further, the SSL socket's `shutdown` code did not
    actually shut down the socket; rather, the shutdown was performed
    in the destructor.

    This patch updates the function's signature to match that of the
    base class's method, adds the `override` specifier to the
    implementation's method declaration, and updates the function to
    properly shut down the SSL socket.

    Review: https://reviews.apache.org/r/55343/
{code}

> SSL socket's 'shutdown()' method is broken
>
> Key: MESOS-6789
> URL: https://issues.apache.org/jira/browse/MESOS-6789
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Reporter: Greg Mann
> Assignee: Greg Mann
> Labels: encryption, libprocess, ssl
> Fix For: 1.2.0
>
> We recently uncovered two issues with the {{LibeventSSLSocketImpl::shutdown}} method:
> * The introduction of a shutdown method parameter with [this commit|https://reviews.apache.org/r/54113/] means that the implementation's method is no longer overriding the default implementation. In addition to fixing the implementation method's signature, we should add the {{override}} specifier to all of our socket implementations' methods to ensure that this doesn't happen in the future.
> * The {{LibeventSSLSocketImpl::shutdown}} function does not actually shut down the SSL socket. The proper function to shut down an SSL socket is {{SSL_shutdown}}, which is called in the implementation's destructor. We should move this into {{shutdown()}} so that by the time that method returns, the socket has actually been shut down.
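The first issue in MESOS-6789 (a derived method that silently stops overriding once the base signature changes) can be shown in isolation. What follows is a minimal sketch, not the libprocess code: `SocketBase`, `StaleSocket`, `FixedSocket`, `Shutdown` and `shutdownVia` are hypothetical names invented for illustration, and the string return values exist only so the dispatch is observable.

```cpp
#include <string>

// Hypothetical stand-ins for the socket classes; these are NOT the
// libprocess types, only an illustration of the language rule.
enum class Shutdown { READ, WRITE, READ_WRITE };

struct SocketBase
{
  virtual ~SocketBase() = default;

  // The base method after its signature gained a parameter.
  virtual std::string shutdown(Shutdown) { return "base"; }
};

struct StaleSocket : SocketBase
{
  // Still has the old parameterless signature. Without `override`
  // this compiles, but it declares a new function that hides the
  // base method instead of overriding it.
  virtual std::string shutdown() { return "stale"; }
};

struct FixedSocket : SocketBase
{
  // `override` makes the compiler reject any signature mismatch.
  std::string shutdown(Shutdown) override { return "fixed"; }
};

// Calls through the base interface, as generic socket code would.
std::string shutdownVia(SocketBase& socket)
{
  return socket.shutdown(Shutdown::READ_WRITE);
}
```

Calling `shutdownVia` on a `StaleSocket` runs the base implementation, which mirrors the reported symptom. Had `StaleSocket::shutdown()` been marked `override`, the mismatch would have been a compile error rather than a silent behavior change.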
[jira] [Updated] (MESOS-6789) SSL socket's 'shutdown()' method is broken
[ https://issues.apache.org/jira/browse/MESOS-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-6789: - Fix Version/s: 1.2.0 > SSL socket's 'shutdown()' method is broken > -- > > Key: MESOS-6789 > URL: https://issues.apache.org/jira/browse/MESOS-6789 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Greg Mann >Assignee: Greg Mann > Labels: encryption, libprocess, ssl > Fix For: 1.2.0 > > > We recently uncovered two issues with the {{LibeventSSLSocketImpl::shutdown}} > method: > * The introduction of a shutdown method parameter with [this > commit|https://reviews.apache.org/r/54113/] means that the implementation's > method is no longer overriding the default implementation. In addition to > fixing the implementation method's signature, we should add the {{override}} > specifier to all of our socket implementations' methods to ensure that this > doesn't happen in the future. > * The {{LibeventSSLSocketImpl::shutdown}} function does not actually shutdown > the SSL socket. The proper function to shutdown an SSL socket is > {{SSL_shutdown}}, which is called in the implementation's destructor. We > should move this into {{shutdown()}} so that by the time that method returns, > the socket has actually been shutdown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6924) Add a target for external dependencies in CMake.
Michael Park created MESOS-6924: --- Summary: Add a target for external dependencies in CMake. Key: MESOS-6924 URL: https://issues.apache.org/jira/browse/MESOS-6924 Project: Mesos Issue Type: Task Components: cmake Reporter: Michael Park It would be nice to be able to have a target for external dependencies, i.e. 3rdparty dependencies except {{stout}}/{{libprocess}}. This would help with {{mesos-tidy}} in specific, to do less work. We can currently spell out all of them + a hack around {{libev}} vs {{libevent}}, it would be cleaner to simply build an external dependencies target. {code} cmake --build 3rdparty --target boost-1.53.0 cmake --build 3rdparty --target elfio-3.2 cmake --build 3rdparty --target glog-0.3.3 cmake --build 3rdparty --target gmock-1.7.0 cmake --build 3rdparty --target http_parser-2.6.2 # NOTE: Try both `libev` and `libevent`. This is a terrible hack. cmake --build 3rdparty --target libev-4.22 || true cmake --build 3rdparty --target libevent-2.1.5-beta || true cmake --build 3rdparty --target leveldb-1.4 cmake --build 3rdparty --target nvml-352.79 cmake --build 3rdparty --target picojson-1.3.0 cmake --build 3rdparty --target protobuf-2.6.1 cmake --build 3rdparty --target zookeeper-3.4.8 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6924) Add a target for external dependencies in CMake.
[ https://issues.apache.org/jira/browse/MESOS-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-6924: Description: It would be nice to be able to have a target for external dependencies, i.e. 3rdparty dependencies except {{stout}}/{{libprocess}}. This would help with {{mesos-tidy}} in specific, to do less work. We can currently spell out all of them + a hack around {{libev}} vs {{libevent}}, it would be cleaner to simply build an external dependencies target. {code} cmake --build 3rdparty --target boost-1.53.0 cmake --build 3rdparty --target elfio-3.2 cmake --build 3rdparty --target glog-0.3.3 cmake --build 3rdparty --target gmock-1.7.0 cmake --build 3rdparty --target http_parser-2.6.2 # NOTE: Try both `libev` and `libevent`. This is a terrible hack. cmake --build 3rdparty --target libev-4.22 || true cmake --build 3rdparty --target libevent-2.1.5-beta || true cmake --build 3rdparty --target leveldb-1.4 cmake --build 3rdparty --target nvml-352.79 cmake --build 3rdparty --target picojson-1.3.0 cmake --build 3rdparty --target protobuf-2.6.1 cmake --build 3rdparty --target zookeeper-3.4.8 {code} was: It would be nice to be able to have a target for external dependencies, i.e. 3rdparty dependencies except {{stout}}/{{libprocess}}. This would help with {{mesos-tidy}} in specific, to do less work. We can currently spell out all of them + a hack around {{libev}} vs {{libevent}}, it would be cleaner to simply build an external dependencies target. {code} cmake --build 3rdparty --target boost-1.53.0 cmake --build 3rdparty --target elfio-3.2 cmake --build 3rdparty --target glog-0.3.3 cmake --build 3rdparty --target gmock-1.7.0 cmake --build 3rdparty --target http_parser-2.6.2 # NOTE: Try both `libev` and `libevent`. This is a terrible hack. 
cmake --build 3rdparty --target libev-4.22 || true cmake --build 3rdparty --target libevent-2.1.5-beta || true cmake --build 3rdparty --target leveldb-1.4 cmake --build 3rdparty --target nvml-352.79 cmake --build 3rdparty --target picojson-1.3.0 cmake --build 3rdparty --target protobuf-2.6.1 cmake --build 3rdparty --target zookeeper-3.4.8 {code} > Add a target for external dependencies in CMake. > > > Key: MESOS-6924 > URL: https://issues.apache.org/jira/browse/MESOS-6924 > Project: Mesos > Issue Type: Task > Components: cmake >Reporter: Michael Park > > It would be nice to be able to have a target for external dependencies, i.e. > 3rdparty dependencies except {{stout}}/{{libprocess}}. This would help with > {{mesos-tidy}} in specific, to do less work. We can currently spell out all > of them + a hack around {{libev}} vs {{libevent}}, it would be cleaner to > simply build an external dependencies target. > {code} > cmake --build 3rdparty --target boost-1.53.0 > cmake --build 3rdparty --target elfio-3.2 > cmake --build 3rdparty --target glog-0.3.3 > cmake --build 3rdparty --target gmock-1.7.0 > cmake --build 3rdparty --target http_parser-2.6.2 > # NOTE: Try both `libev` and `libevent`. This is a terrible hack. > cmake --build 3rdparty --target libev-4.22 || true > cmake --build 3rdparty --target libevent-2.1.5-beta || true > cmake --build 3rdparty --target leveldb-1.4 > cmake --build 3rdparty --target nvml-352.79 > cmake --build 3rdparty --target picojson-1.3.0 > cmake --build 3rdparty --target protobuf-2.6.1 > cmake --build 3rdparty --target zookeeper-3.4.8 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6802) SSL socket can lose bytes in the case of EOF
[ https://issues.apache.org/jira/browse/MESOS-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822555#comment-15822555 ]

Joseph Wu commented on MESOS-6802:

{code}
commit 5023e004030e6018ea64f6824c353ffe4165c907
Author: Greg Mann
Date:   Fri Jan 13 15:47:57 2017 -0800

    Added new libprocess socket tests.

    This patch adds NetSocketTest.EOFBeforeRecv and
    NetSocketTest.EOFAfterRecv to verify that EOFs are
    reliably received whether or not there is a pending
    recv() request at the time the EOF is received.

    Review: https://reviews.apache.org/r/53803/
{code}

> SSL socket can lose bytes in the case of EOF
>
> Key: MESOS-6802
> URL: https://issues.apache.org/jira/browse/MESOS-6802
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Reporter: Greg Mann
> Assignee: Greg Mann
> Labels: libevent, libprocess, ssl
> Fix For: 1.2.0
>
> During recent work on SSL-enabled tests in libprocess (MESOS-5966), we discovered a bug in {{LibeventSSLSocketImpl}}, wherein the socket can either fail to receive an EOF, or lose data when an EOF is received.
> The {{LibeventSSLSocketImpl::event_callback(short events)}} method immediately sets any pending {{RecvRequest}}'s promise to zero upon receipt of an EOF. However, at the time the promise is set, there may actually be data waiting to be read by libevent. Upon receipt of an EOF, we should attempt to read the socket's bufferevent first to ensure that we aren't losing any data previously received by the socket.
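The ordering proposed in MESOS-6802 (drain buffered bytes before reporting EOF) can be sketched with a toy model. This is an illustration only and assumes nothing about the real `LibeventSSLSocketImpl` or libevent APIs: `BufferedSocket`, `RecvState` and `RecvResult` are hypothetical names, and the buffer is a plain string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of a buffered socket; not the libprocess or
// libevent API.
enum class RecvState { DATA, END_OF_STREAM, WOULD_BLOCK };

struct RecvResult
{
  RecvState state;
  std::string data;
};

struct BufferedSocket
{
  std::string pending;  // Bytes received but not yet handed to the caller.
  bool closed = false;  // Set once the peer's EOF has been seen.

  // Completes one recv(). Buffered bytes are always delivered first;
  // END_OF_STREAM is reported only once the buffer is empty, so an
  // EOF that arrives while data is still buffered cannot drop it.
  RecvResult recv(std::size_t size)
  {
    if (!pending.empty()) {
      std::string data = pending.substr(0, size);
      pending.erase(0, data.size());
      return {RecvState::DATA, data};
    }

    if (closed) {
      return {RecvState::END_OF_STREAM, ""};
    }

    return {RecvState::WOULD_BLOCK, ""};
  }
};
```

The bug described above corresponds to checking `closed` before `pending`: a reader would then see end-of-stream while bytes were still sitting in the buffer.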
[jira] [Commented] (MESOS-4705) Linux 'perf' parsing logic may fail when OS distribution has perf backports.
[ https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822425#comment-15822425 ]

Benjamin Mahler commented on MESOS-4705:

[~fan.du] It appears that this doesn't handle my version of perf on CentOS 7.3.1611 with perf version 3.10.0-514.2.2.el7.x86_64.debug:

{noformat}
(statistics).failure(): Failed to parse perf sample: Failed to parse perf sample line '3710583015,,cycles,mesos_test,1459686383,100.00,2.539,GHz': Unexpected number of fields
{noformat}

> Linux 'perf' parsing logic may fail when OS distribution has perf backports.
>
> Key: MESOS-4705
> URL: https://issues.apache.org/jira/browse/MESOS-4705
> Project: Mesos
> Issue Type: Bug
> Components: cgroups, isolation
> Affects Versions: 0.27.1
> Reporter: Fan Du
> Assignee: Fan Du
> Fix For: 0.26.2, 0.27.3, 0.28.2, 1.0.0
>
> When sampling a container with perf events on CentOS 7 with kernel 3.10.0-123.el7.x86_64, the slave complained with the error below:
> {code}
> E0218 16:32:00.591181 8376 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00': Unexpected number of fields
> {code}
> This is caused by the current perf format [assumption | https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430] for kernel versions below 3.12.
> On the 3.10.0-123.el7.x86_64 kernel, the format has 6 tokens, as below:
> value,unit,event,cgroup,running,ratio
> A local modification fixed this error on my test bed; please review this ticket.
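Both failing lines quoted in MESOS-4705 share their first four fields (value, unit, event, cgroup) and differ only in how many fields follow. One way to tolerate that is to read the leading fields by position and ignore the rest. The sketch below assumes exactly that, which the ticket does not confirm; it is not the parser in `src/linux/perf.cpp`, and `PerfSample`, `splitFields` and `parseSample` are hypothetical names. The meaning of the two extra trailing fields in the 8-field line is not stated in the ticket, so they are left uninterpreted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical holder for the leading fields of a perf sample line.
struct PerfSample
{
  std::string value;
  std::string unit;
  std::string event;
  std::string cgroup;
};

// Splits on commas and keeps empty fields, such as the blank unit.
std::vector<std::string> splitFields(const std::string& line)
{
  std::vector<std::string> fields;
  std::size_t start = 0;

  while (true) {
    std::size_t comma = line.find(',', start);
    if (comma == std::string::npos) {
      fields.push_back(line.substr(start));
      return fields;
    }
    fields.push_back(line.substr(start, comma - start));
    start = comma + 1;
  }
}

// Assumption: value, unit, event and cgroup are always the first
// four fields, and any further fields can be ignored. Returns false
// if the line has fewer than four fields.
bool parseSample(const std::string& line, PerfSample* sample)
{
  std::vector<std::string> fields = splitFields(line);
  if (fields.size() < 4) {
    return false;
  }

  sample->value = fields[0];
  sample->unit = fields[1];
  sample->event = fields[2];
  sample->cgroup = fields[3];
  return true;
}
```

Under that assumption, the 6-field line from the original report and the 8-field line from the comment above both parse, while a line with too few fields is still rejected.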
[jira] [Commented] (MESOS-6843) Fetcher should not assume stdout/stderr in the sandbox.
[ https://issues.apache.org/jira/browse/MESOS-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822406#comment-15822406 ] Joseph Wu commented on MESOS-6843: -- Thinking out loud: The ideal solution would be to pipe logs from the fetcher into the container logger. In the past, this would have required a pretty large refactor, as the container logger simply outputs a description of FDs (or FDs that can only be inherited once). But now, in the Mesos containerizer at least, we have the IO Switchboard sitting in between the container and the container logger. (Logs go container -> IO Switchboard -> container logger.) It is conceivable to add a way of injecting stdout/stderr to the IO Switchboard (cc [~klueska]). We'd still need another solution for docker containers though. > Fetcher should not assume stdout/stderr in the sandbox. > --- > > Key: MESOS-6843 > URL: https://issues.apache.org/jira/browse/MESOS-6843 > Project: Mesos > Issue Type: Bug > Components: fetcher >Affects Versions: 1.0.2, 1.1.0 >Reporter: Jie Yu >Priority: Critical > Labels: mesosphere > > If container logger is used, this assumption might not be true. For instance, > a journald logger might redirect all task logs to journald. So in theory, the > fetcher log should go to journald as well, rather than writing to > sandbox/stdout and sandbox/stderr. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (MESOS-6923) mesos 1.0.1-2.0.94.ubuntu1604 + python 2.7 install == no bueno
[ https://issues.apache.org/jira/browse/MESOS-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Scherger updated MESOS-6923:

Comment: was deleted

(was: this hack seems to work:
{code}
mv /usr/lib/python2.7/site-packages /tmp/
apt-get install python-minimal
mv /tmp/site-packages /usr/lib/python2.7/site-packages
{code})

> mesos 1.0.1-2.0.94.ubuntu1604 + python 2.7 install == no bueno
>
> Key: MESOS-6923
> URL: https://issues.apache.org/jira/browse/MESOS-6923
> Project: Mesos
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Reporter: Alan Scherger
>
> When you install the mesos package on xenial we drop files into:
> # ls -al /usr/lib/python2.7/site-packages/
> total 64
> drwxr-xr-x 9 root root 4096 Sep 21 02:54 .
> drwxr-xr-x 4 root root 12288 Jan 13 21:28 ..
> drwxr-xr-x 6 root root 4096 Jan 13 21:02 mesos
> drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos-1.0.1.dist-info
> drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos.cli-1.0.1.dist-info
> -rw-rw-r-- 1 root root 302 Sep 2 03:02 mesos.cli-1.0.1-py2.7-nspkg.pth
> drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos.executor-1.0.1.dist-info
> -rw-rw-r-- 1 root root 302 Sep 2 03:02 mesos.executor-1.0.1-py2.7-nspkg.pth
> drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos.interface-1.0.1.dist-info
> -rw-rw-r-- 1 root root 302 Sep 2 03:02 mesos.interface-1.0.1-py2.7-nspkg.pth
> drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos.native-1.0.1.dist-info
> -rw-rw-r-- 1 root root 302 Sep 2 03:02 mesos.native-1.0.1-py2.7-nspkg.pth
> drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos.scheduler-1.0.1.dist-info
> -rw-rw-r-- 1 root root 302 Sep 2 03:02 mesos.scheduler-1.0.1-py2.7-nspkg.pth
> when you go to install "python-minimal" after the fact it fails with:
> new installation of python2.7-minimal; /usr/lib/python2.7/site-packages is a directory
> which is expected a symlink to /usr/local/lib/python2.7/dist-packages.
> please find the package shipping files in /usr/lib/python2.7/site-packages and
> file a bug report to ship these in /usr/lib/python2.7/dist-packages instead
> aborting installation of python2.7-minimal
> idk if we care, but it makes my life :sadpanda:
[jira] [Commented] (MESOS-6923) mesos 1.0.1-2.0.94.ubuntu1604 + python 2.7 install == no bueno
[ https://issues.apache.org/jira/browse/MESOS-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822390#comment-15822390 ]

Alan Scherger commented on MESOS-6923:

this hack seems to work:
{code}
mv /usr/lib/python2.7/site-packages /tmp/
apt-get install python-minimal
mv /tmp/site-packages /usr/lib/python2.7/site-packages
{code}

> mesos 1.0.1-2.0.94.ubuntu1604 + python 2.7 install == no bueno
>
> Key: MESOS-6923
> URL: https://issues.apache.org/jira/browse/MESOS-6923
> Project: Mesos
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Reporter: Alan Scherger
>
> When you install the mesos package on xenial we drop files into:
> # ls -al /usr/lib/python2.7/site-packages/
> total 64
> drwxr-xr-x 9 root root 4096 Sep 21 02:54 .
> drwxr-xr-x 4 root root 12288 Jan 13 21:28 ..
> drwxr-xr-x 6 root root 4096 Jan 13 21:02 mesos
> drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos-1.0.1.dist-info
> drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos.cli-1.0.1.dist-info
> -rw-rw-r-- 1 root root 302 Sep 2 03:02 mesos.cli-1.0.1-py2.7-nspkg.pth
> drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos.executor-1.0.1.dist-info
> -rw-rw-r-- 1 root root 302 Sep 2 03:02 mesos.executor-1.0.1-py2.7-nspkg.pth
> drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos.interface-1.0.1.dist-info
> -rw-rw-r-- 1 root root 302 Sep 2 03:02 mesos.interface-1.0.1-py2.7-nspkg.pth
> drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos.native-1.0.1.dist-info
> -rw-rw-r-- 1 root root 302 Sep 2 03:02 mesos.native-1.0.1-py2.7-nspkg.pth
> drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos.scheduler-1.0.1.dist-info
> -rw-rw-r-- 1 root root 302 Sep 2 03:02 mesos.scheduler-1.0.1-py2.7-nspkg.pth
> when you go to install "python-minimal" after the fact it fails with:
> new installation of python2.7-minimal; /usr/lib/python2.7/site-packages is a directory
> which is expected a symlink to /usr/local/lib/python2.7/dist-packages.
> please find the package shipping files in /usr/lib/python2.7/site-packages and
> file a bug report to ship these in /usr/lib/python2.7/dist-packages instead
> aborting installation of python2.7-minimal
> idk if we care, but it makes my life :sadpanda:
[jira] [Created] (MESOS-6923) mesos 1.0.1-2.0.94.ubuntu1604 + python 2.7 install == no bueno
Alan Scherger created MESOS-6923:

Summary: mesos 1.0.1-2.0.94.ubuntu1604 + python 2.7 install == no bueno
Key: MESOS-6923
URL: https://issues.apache.org/jira/browse/MESOS-6923
Project: Mesos
Issue Type: Improvement
Affects Versions: 1.0.1
Reporter: Alan Scherger

When you install the mesos package on xenial we drop files into:

# ls -al /usr/lib/python2.7/site-packages/
total 64
drwxr-xr-x 9 root root 4096 Sep 21 02:54 .
drwxr-xr-x 4 root root 12288 Jan 13 21:28 ..
drwxr-xr-x 6 root root 4096 Jan 13 21:02 mesos
drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos-1.0.1.dist-info
drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos.cli-1.0.1.dist-info
-rw-rw-r-- 1 root root 302 Sep 2 03:02 mesos.cli-1.0.1-py2.7-nspkg.pth
drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos.executor-1.0.1.dist-info
-rw-rw-r-- 1 root root 302 Sep 2 03:02 mesos.executor-1.0.1-py2.7-nspkg.pth
drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos.interface-1.0.1.dist-info
-rw-rw-r-- 1 root root 302 Sep 2 03:02 mesos.interface-1.0.1-py2.7-nspkg.pth
drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos.native-1.0.1.dist-info
-rw-rw-r-- 1 root root 302 Sep 2 03:02 mesos.native-1.0.1-py2.7-nspkg.pth
drwxrwxr-x 2 root root 4096 Sep 21 02:54 mesos.scheduler-1.0.1.dist-info
-rw-rw-r-- 1 root root 302 Sep 2 03:02 mesos.scheduler-1.0.1-py2.7-nspkg.pth

when you go to install "python-minimal" after the fact it fails with:

new installation of python2.7-minimal; /usr/lib/python2.7/site-packages is a directory
which is expected a symlink to /usr/local/lib/python2.7/dist-packages.
please find the package shipping files in /usr/lib/python2.7/site-packages and
file a bug report to ship these in /usr/lib/python2.7/dist-packages instead
aborting installation of python2.7-minimal

idk if we care, but it makes my life :sadpanda:
[jira] [Commented] (MESOS-6843) Fetcher should not assume stdout/stderr in the sandbox.
[ https://issues.apache.org/jira/browse/MESOS-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822326#comment-15822326 ] Adam B commented on MESOS-6843: --- This is rather annoying with a custom container logger, because even if my task doesn't fetch anything, there's still an empty stdout/stderr file in the sandbox, but all my task output is actually in journald. It's confusing as a user. Would this be complicated to do, or just a couple of new if checks? > Fetcher should not assume stdout/stderr in the sandbox. > --- > > Key: MESOS-6843 > URL: https://issues.apache.org/jira/browse/MESOS-6843 > Project: Mesos > Issue Type: Bug > Components: fetcher >Affects Versions: 1.0.2, 1.1.0 >Reporter: Jie Yu >Priority: Critical > Labels: mesosphere > > If container logger is used, this assumption might not be true. For instance, > a journald logger might redirect all task logs to journald. So in theory, the > fetcher log should go to journald as well, rather than writing to > sandbox/stdout and sandbox/stderr. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6843) Fetcher should not assume stdout/stderr in the sandbox.
[ https://issues.apache.org/jira/browse/MESOS-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6843: -- Target Version/s: 1.2.0 Labels: mesosphere (was: ) Priority: Critical (was: Major) Component/s: fetcher > Fetcher should not assume stdout/stderr in the sandbox. > --- > > Key: MESOS-6843 > URL: https://issues.apache.org/jira/browse/MESOS-6843 > Project: Mesos > Issue Type: Bug > Components: fetcher >Affects Versions: 1.0.2, 1.1.0 >Reporter: Jie Yu >Priority: Critical > Labels: mesosphere > > If container logger is used, this assumption might not be true. For instance, > a journald logger might redirect all task logs to journald. So in theory, the > fetcher log should go to journald as well, rather than writing to > sandbox/stdout and sandbox/stderr. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6652) Perf version not correctly parsed on Fedora 24 (and probably others)
[ https://issues.apache.org/jira/browse/MESOS-6652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach updated MESOS-6652: --- Shepherd: Yan Xu > Perf version not correctly parsed on Fedora 24 (and probably others) > > > Key: MESOS-6652 > URL: https://issues.apache.org/jira/browse/MESOS-6652 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 1.1.0 > Environment: Fedora 24 >Reporter: Jan Schlicht >Assignee: James Peach >Priority: Minor > > Happened on a current Fedora 24 machine, when trying to run tests. > {noformat} > $ perf --version > perf version 4.8.10.200.fc24.x86_64.gc23c > {noformat} > doesn't seem to be parsed correctly by {{perf::supported()}}, because when > running {{./bin/mesos-tests.sh}} it reads > {noformat} > - > Could not find the 'perf' command or its version lower that 2.6.39 so tests > using it to sample the 'cpu-cycles' hardware event will not be run. > - > - > require 'perf' version >= 2.6.39 so no 'perf' tests will be run > - > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-6652) Perf version not correctly parsed on Fedora 24 (and probably others)
[ https://issues.apache.org/jira/browse/MESOS-6652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach reassigned MESOS-6652: -- Assignee: James Peach > Perf version not correctly parsed on Fedora 24 (and probably others) > > > Key: MESOS-6652 > URL: https://issues.apache.org/jira/browse/MESOS-6652 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 1.1.0 > Environment: Fedora 24 >Reporter: Jan Schlicht >Assignee: James Peach >Priority: Minor > > Happened on a current Fedora 24 machine, when trying to run tests. > {noformat} > $ perf --version > perf version 4.8.10.200.fc24.x86_64.gc23c > {noformat} > doesn't seem to be parsed correctly by {{perf::supported()}}, because when > running {{./bin/mesos-tests.sh}} it reads > {noformat} > - > Could not find the 'perf' command or its version lower that 2.6.39 so tests > using it to sample the 'cpu-cycles' hardware event will not be run. > - > - > require 'perf' version >= 2.6.39 so no 'perf' tests will be run > - > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6652) Perf version not correctly parsed on Fedora 24 (and probably others)
[ https://issues.apache.org/jira/browse/MESOS-6652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822315#comment-15822315 ] James Peach commented on MESOS-6652: | Handle perf versions with more than 3 components. | https://reviews.apache.org/r/55521/ | > Perf version not correctly parsed on Fedora 24 (and probably others) > > > Key: MESOS-6652 > URL: https://issues.apache.org/jira/browse/MESOS-6652 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 1.1.0 > Environment: Fedora 24 >Reporter: Jan Schlicht >Priority: Minor > > Happened on a current Fedora 24 machine, when trying to run tests. > {noformat} > $ perf --version > perf version 4.8.10.200.fc24.x86_64.gc23c > {noformat} > doesn't seem to be parsed correctly by {{perf::supported()}}, because when > running {{./bin/mesos-tests.sh}} it reads > {noformat} > - > Could not find the 'perf' command or its version lower that 2.6.39 so tests > using it to sample the 'cpu-cycles' hardware event will not be run. > - > - > require 'perf' version >= 2.6.39 so no 'perf' tests will be run > - > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
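The review title for MESOS-6652 ("Handle perf versions with more than 3 components") suggests comparing only the leading numeric components of the version string. The sketch below shows one way that could work; it is not the code under review, and `leadingNumbers` and `isAtLeast` are hypothetical names. It expects just the version token (for example `4.8.10.200.fc24.x86_64.gc23c`), not the full `perf --version` output, and treats missing components as zero, which is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Collects the leading dot-separated components that consist only
// of digits, stopping at the first one that does not (e.g. "fc24").
std::vector<int> leadingNumbers(const std::string& version)
{
  std::vector<int> numbers;
  std::size_t start = 0;

  while (start < version.size()) {
    std::size_t dot = version.find('.', start);
    std::size_t end = (dot == std::string::npos) ? version.size() : dot;
    std::string part = version.substr(start, end - start);

    // Reject empty or overlong components so std::stoi cannot overflow.
    bool numeric = !part.empty() && part.size() <= 9;
    for (char c : part) {
      if (!std::isdigit(static_cast<unsigned char>(c))) {
        numeric = false;
      }
    }
    if (!numeric) {
      break;
    }

    numbers.push_back(std::stoi(part));
    start = end + 1;
  }

  return numbers;
}

// Compares the first three numeric components against a minimum.
// Extra components are ignored; missing ones count as zero.
bool isAtLeast(const std::string& version, int a, int b, int c)
{
  std::vector<int> have = leadingNumbers(version);
  have.resize(3, 0);

  const int want[3] = {a, b, c};
  for (int i = 0; i < 3; ++i) {
    if (have[i] != want[i]) {
      return have[i] > want[i];
    }
  }
  return true;
}
```

With this approach the Fedora 24 string from the report yields the components 4, 8, 10 and 200, and only the first three take part in the comparison against 2.6.39.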
[jira] [Updated] (MESOS-6907) FutureTest.After3 is flaky
[ https://issues.apache.org/jira/browse/MESOS-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B updated MESOS-6907:

Target Version/s: (was: 1.2.0)

> FutureTest.After3 is flaky
>
> Key: MESOS-6907
> URL: https://issues.apache.org/jira/browse/MESOS-6907
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Reporter: Alexander Rojas
>
> There is apparently a race condition between the time an instance of {{Future<T>}} goes out of scope and when the enclosing data is actually deleted, if {{Future<T>::after(Duration, lambda::function<Future<T>(const Future<T>&)>)}} is called.
> The issue is more likely to occur if the machine is under load or if it is not a very powerful one. The easiest way to reproduce it is to run:
> {code}
> $ stress -c 4 -t 2600 -d 2 -i 2 &
> $ ./libprocess-tests --gtest_filter="FutureTest.After3" --gtest_repeat=-1 --gtest_break_on_failure
> {code}
> An exploratory fix for the issue is to change the test to:
> {code}
> TEST(FutureTest, After3)
> {
>   Future<Nothing> future;
>   process::WeakFuture<Nothing> weak_future(future);
>
>   EXPECT_SOME(weak_future.get());
>
>   {
>     Clock::pause();
>
>     // The original future disappears here. After this call the
>     // original future goes out of scope and should not be reachable
>     // anymore.
>     future = future
>       .after(Milliseconds(1), [](Future<Nothing> f) {
>         f.discard();
>         return Nothing();
>       });
>
>     Clock::advance(Seconds(2));
>     Clock::settle();
>
>     AWAIT_READY(future);
>   }
>
>   if (weak_future.get().isSome()) {
>     os::sleep(Seconds(1));
>   }
>
>   EXPECT_NONE(weak_future.get());
>   EXPECT_FALSE(future.hasDiscard());
> }
> {code}
> The interesting thing about the fix is that both extra snippets are needed (either one alone is not enough) to prevent the issue from happening.
[jira] [Updated] (MESOS-6907) FutureTest.After3 is flaky
[ https://issues.apache.org/jira/browse/MESOS-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-6907: -- Target Version/s: 1.2.0 > FutureTest.After3 is flaky > -- > > Key: MESOS-6907 > URL: https://issues.apache.org/jira/browse/MESOS-6907 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Alexander Rojas > > There is apparently a race condition between the time an instance of > {{Future}} goes out of scope and when the enclosing data is actually > deleted, if {{Future<T>::after(Duration, lambda::function<Future<T>(const Future<T>&)>)}} is called. > The issue is more likely to occur if the machine is under load or if it is > not a very powerful one. The easiest way to reproduce it is to run: > {code} > $ stress -c 4 -t 2600 -d 2 -i 2 & > $ ./libprocess-tests --gtest_filter="FutureTest.After3" --gtest_repeat=-1 > --gtest_break_on_failure > {code} > An exploratory fix for the issue is to change the test to: > {code} > TEST(FutureTest, After3) > { > Future<Nothing> future; > process::WeakFuture<Nothing> weak_future(future); > EXPECT_SOME(weak_future.get()); > { > Clock::pause(); > // The original future disappears here. After this call the > // original future goes out of scope and should not be reachable > // anymore. > future = future > .after(Milliseconds(1), [](Future<Nothing> f) { > f.discard(); > return Nothing(); > }); > Clock::advance(Seconds(2)); > Clock::settle(); > AWAIT_READY(future); > } > if (weak_future.get().isSome()) { > os::sleep(Seconds(1)); > } > EXPECT_NONE(weak_future.get()); > EXPECT_FALSE(future.hasDiscard()); > } > {code} > The interesting thing about the fix is that both extra snippets are needed > (either one alone is not enough) to prevent the issue from happening. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6914) Command 'hadoop version 2>&1' failed
[ https://issues.apache.org/jira/browse/MESOS-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822267#comment-15822267 ] Kevin Klues commented on MESOS-6914: Maybe you could give a bit more context about how to reproduce the error? > Command 'hadoop version 2>&1' failed > > > Key: MESOS-6914 > URL: https://issues.apache.org/jira/browse/MESOS-6914 > Project: Mesos > Issue Type: Bug >Reporter: yangjunfeng > > I am new to Spark on Mesos. > When I run spark-shell on Mesos, I get the error below: > Command 'hadoop version 2>&1' failed; this is the output: > sh: hadoop: command not found > Failed to fetch > 'hdfs://188.188.0.189:9000/usr/yjf/spark-2.1.0-bin-hadoop2.7.tgz': Failed to > create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was > either not found or exited with a non-zero exit status: 127 > Failed to synchronize with agent (it's probably exited) > How can I fix this problem? > Thanks a lot! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6915) Encountered a problem while starting mesos-master
[ https://issues.apache.org/jira/browse/MESOS-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822262#comment-15822262 ] Kevin Klues commented on MESOS-6915: I am unsure what the error you think you are encountering is. Those appear to just be log messages printed while running the Mesos master. > Encountered a problem while starting mesos-master > - > > Key: MESOS-6915 > URL: https://issues.apache.org/jira/browse/MESOS-6915 > Project: Mesos > Issue Type: Wish > Components: agent, master >Affects Versions: 1.1.0 >Reporter: Jijo Joy >Assignee: Kevin Klues > > I0112 17:23:43.639902 17432 http.cpp:391] HTTP GET for /master/state from > 192.168.10.35:44407 with User-Agent='Mozilla/5.0 (Windows NT 6.1; WOW64) > AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36' > I0112 17:23:51.350908 17432 http.cpp:391] HTTP GET for /master/state from > 192.168.10.35:29323 with User-Agent='Mozilla/5.0 (Windows NT 6.1; WOW64) > AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36' > I0112 17:23:52.892664 17430 http.cpp:391] HTTP GET for /master/state from > 192.168.10.35:29323 with User-Agent='Mozilla/5.0 (Windows NT 6.1; WOW64; > Trident/7.0; rv:11.0) like Gecko' > I am getting the above notifications while running mesos-master.sh, > but I am still able to run the Java and Python examples successfully. > I am new to Apache Mesos and clustering environments. Kindly help! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-6915) Encountered a problem while starting mesos-master
[ https://issues.apache.org/jira/browse/MESOS-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues reassigned MESOS-6915: -- Assignee: Kevin Klues > Encountered a problem while starting mesos-master > - > > Key: MESOS-6915 > URL: https://issues.apache.org/jira/browse/MESOS-6915 > Project: Mesos > Issue Type: Wish > Components: agent, master >Affects Versions: 1.1.0 >Reporter: Jijo Joy >Assignee: Kevin Klues > > I0112 17:23:43.639902 17432 http.cpp:391] HTTP GET for /master/state from > 192.168.10.35:44407 with User-Agent='Mozilla/5.0 (Windows NT 6.1; WOW64) > AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36' > I0112 17:23:51.350908 17432 http.cpp:391] HTTP GET for /master/state from > 192.168.10.35:29323 with User-Agent='Mozilla/5.0 (Windows NT 6.1; WOW64) > AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36' > I0112 17:23:52.892664 17430 http.cpp:391] HTTP GET for /master/state from > 192.168.10.35:29323 with User-Agent='Mozilla/5.0 (Windows NT 6.1; WOW64; > Trident/7.0; rv:11.0) like Gecko' > I am getting the above notifications while running mesos-master.sh, > but I am still able to run the Java and Python examples successfully. > I am new to Apache Mesos and clustering environments. Kindly help! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6914) Command 'hadoop version 2>&1' failed
[ https://issues.apache.org/jira/browse/MESOS-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822255#comment-15822255 ] Kevin Klues commented on MESOS-6914: How to resolve this depends on exactly what you'd like to accomplish. Do you actually want/need HDFS running on your system, or are you just bothered by the error message being there? The agent should still start up despite this error, correct? > Command 'hadoop version 2>&1' failed > > > Key: MESOS-6914 > URL: https://issues.apache.org/jira/browse/MESOS-6914 > Project: Mesos > Issue Type: Bug >Reporter: yangjunfeng > > I am new to Spark on Mesos. > When I run spark-shell on Mesos, I get the error below: > Command 'hadoop version 2>&1' failed; this is the output: > sh: hadoop: command not found > Failed to fetch > 'hdfs://188.188.0.189:9000/usr/yjf/spark-2.1.0-bin-hadoop2.7.tgz': Failed to > create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was > either not found or exited with a non-zero exit status: 127 > Failed to synchronize with agent (it's probably exited) > How can I fix this problem? > Thanks a lot! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6054) Agent Crash with Malformed UUID when doing TaskUpdate
[ https://issues.apache.org/jira/browse/MESOS-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822240#comment-15822240 ] Aaron Wood commented on MESOS-6054: --- [~dvonthenen] seems that Mesos expects valid v4 UUIDs. I've submitted a patch to fix the segfault that occurs https://reviews.apache.org/r/55480/ > Agent Crash with Malformed UUID when doing TaskUpdate > - > > Key: MESOS-6054 > URL: https://issues.apache.org/jira/browse/MESOS-6054 > Project: Mesos > Issue Type: Bug > Components: framework api >Affects Versions: 1.0.0 > Environment: Ubuntu 14.04, Mesos 1.0.0-2.0.89.ubuntu1404, Marathon > 1.1.2 >Reporter: David vonThenen >Priority: Minor > Attachments: _usr_sbin_mesos-slave.0.crash > > > When using the HTTP API using protobufs, if the UUID in a TaskUpdate is > malformed (in this case, was using a UUID that was base64 encoded), it would > cause the Agent where the executor is running on to crash and restart. > Here is a JSON dump of the protobuf used: > {code} > { > "executor_id": { > "value": "executor-scaleio1" > }, > "framework_id": { > "value": "ac8545a7-f8fc-431e-bc36-0239c4460658-0002" > }, > "type": 2, > "update": { > "status": { > "task_id": { > "value": "scaleio1" > }, > "state": 1, > "source": 2, > "executor_id": { > "value": "executor-scaleio1" > }, > "uuid": > "WVdVd01EQTFNakF0TkdVeU9TMDBNell3TFdJMk4yUXRPR05sT1RFNU56VmlPREUw" > } > } > } > {code} > In the master it looks like is processes the accept calls… but after it > processes all of them, it looks like the agents are immediately being > disconnected: > {code} > ... > ... 
> I0816 17:53:09.974340 4010 master.cpp:3342] Processing ACCEPT call for > offers: [ 2bf179c3-004a-49e3-98ab-5a75fa773522-O80 ] on agent > 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 at slave(1)@172.31.22.211:5051 > (ec2-52-89-227-184.us-west-2.compute.amazonaws.com) for framework > 2bf179c3-004a-49e3-98ab-5a75fa773522-0001 (ScaleIO Framework) > W0816 17:53:09.974578 4010 validation.cpp:647] Executor executor-scaleio4 > for task scaleio4 uses less CPUs (None) than the minimum required (0.01). > Please update your executor, as this will be mandatory in future releases. > W0816 17:53:09.974604 4010 validation.cpp:659] Executor executor-scaleio4 > for task scaleio4 uses less memory (None) than the minimum required (32MB). > Please update your executor, as this will be mandatory in future releases. > I0816 17:53:09.974645 4010 master.cpp:7439] Adding task scaleio4 with > resources cpus(*):1; mem(*):2048 on agent > 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 > (ec2-52-89-227-184.us-west-2.compute.amazonaws.com) > I0816 17:53:09.974668 4010 master.cpp:3831] Launching task scaleio4 of > framework 2bf179c3-004a-49e3-98ab-5a75fa773522-0001 (ScaleIO Framework) with > resources cpus(*):1; mem(*):2048 on agent > 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 at slave(1)@172.31.22.211:5051 > (ec2-52-89-227-184.us-west-2.compute.amazonaws.com) > I0816 17:53:11.306182 4010 master.cpp:1245] Agent > 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 at slave(1)@172.31.22.211:5051 > (ec2-52-89-227-184.us-west-2.compute.amazonaws.com) disconnected > I0816 17:53:11.306335 4010 master.cpp:2784] Disconnecting agent > 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 at slave(1)@172.31.22.211:5051 > (ec2-52-89-227-184.us-west-2.compute.amazonaws.com) > I0816 17:53:11.306520 4010 master.cpp:2803] Deactivating agent > 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 at slave(1)@172.31.22.211:5051 > (ec2-52-89-227-184.us-west-2.compute.amazonaws.com) > I0816 17:53:11.306676 4010 master.cpp:1264] Removing framework > 
2bf179c3-004a-49e3-98ab-5a75fa773522-0001 (ScaleIO Framework) from > disconnected agent 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 at > slave(1)@172.31.22.211:5051 > (ec2-52-89-227-184.us-west-2.compute.amazonaws.com) because the framework is > not checkpointing > I0816 17:53:11.306798 4010 master.cpp:6448] Removing framework > 2bf179c3-004a-49e3-98ab-5a75fa773522-0001 (ScaleIO Framework) from agent > 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 at slave(1)@172.31.22.211:5051 > (ec2-52-89-227-184.us-west-2.compute.amazonaws.com) > I0816 17:53:11.306882 4010 master.cpp:6833] Updating the state of task > scaleio4 of framework 2bf179c3-004a-49e3-98ab-5a75fa773522-0001 (latest > state: TASK_LOST, status update state: TASK_LOST) > I0816 17:53:11.306778 4013 hierarchical.cpp:571] Agent > 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 deactivated > I0816 17:53:11.307140 4010 master.cpp:6899] Removing task scaleio4 with > resources cpus(*):1; mem(*):2048 of framework > 2bf179c3-004a-49e3-98ab-5a75fa773522-0001 on agent > 2bf179c3-004a-49e3-98ab-5a75fa773522-S7 at
[jira] [Updated] (MESOS-6917) Segfault when the executor sets an invalid UUID when sending a status update.
[ https://issues.apache.org/jira/browse/MESOS-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Wood updated MESOS-6917: -- Description: A segfault occurs when an executor sets a UUID that's not a valid v4 UUID and sends it off to the agent: {code} ABORT: (../../3rdparty/stout/include/stout/try.hpp:77): Try::get() but state == ERROR: Not a valid UUID *** Aborted at 1484262968 (unix time) try "date -d @1484262968" if you are using GNU date *** PC: @ 0x7efeb6101428 (unknown) *** SIGABRT (@0x36b7) received by PID 14007 (TID 0x7efeabd29700) from PID 14007; stack trace: *** @ 0x7efeb64a6390 (unknown) @ 0x7efeb6101428 (unknown) @ 0x7efeb610302a (unknown) @ 0x560df739fa6e _Abort() @ 0x560df739fa9c _Abort() @ 0x7efebb53a5ad Try<>::get() @ 0x7efebb5363d6 Try<>::get() @ 0x7efebbd84809 mesos::internal::slave::validation::executor::call::validate() @ 0x7efebbb59b36 mesos::internal::slave::Slave::Http::executor() @ 0x7efebbc773b8 _ZZN5mesos8internal5slave5Slave10initializeEvENKUlRKN7process4http7RequestEE1_clES7_ @ 0x7efebbcb5808 _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEERKNS2_7RequestEEZN5mesos8internal5slave5Slave10initializeEvEUlS7_E1_E9_M_invokeERKSt9_Any_dataS7_ @ 0x7efebbfb2aea std::function<>::operator()() @ 0x7efebcb158b8 _ZZZN7process11ProcessBase6_visitERKNS0_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSD_14authentication20AuthenticationResultEEE0_clESN_ENKUlbE1_clEb @ 0x7efebcb1a10a _ZZZNK7process9_DeferredIZZNS_11ProcessBase6_visitERKNS1_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSE_14authentication20AuthenticationResultEEE0_clESO_EUlbE1_EcvSt8functionIFT_T0_EEINS_6FutureINSE_8ResponseEEERKbEEvENKUlS12_E_clES12_ENKUlvE_clEv @ 0x7efebcb1c5f8 
_ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEEvEZZNKS0_9_DeferredIZZNS0_11ProcessBase6_visitERKNS7_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS0_5OwnedINS2_7RequestNKUlRK6OptionINS2_14authentication20AuthenticationResultEEE0_clEST_EUlbE1_EcvSt8functionIFT_T0_EEIS4_RKbEEvENKUlS14_E_clES14_EUlvE_E9_M_invokeERKSt9_Any_data @ 0x7efebb5ce8ca std::function<>::operator()() @ 0x7efebb5c4b27 _ZZN7process8internal8DispatchINS_6FutureINS_4http8ResponseclIRSt8functionIFS5_vS5_RKNS_4UPIDEOT_ENKUlPNS_11ProcessBaseEE_clESI_ @ 0x7efebb5d4e1e _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8internal8DispatchINS0_6FutureINS0_4http8ResponseclIRSt8functionIFS9_vS9_RKNS0_4UPIDEOT_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_ @ 0x7efebcb30baf std::function<>::operator()() @ 0x7efebcb13fd6 process::ProcessBase::visit() @ 0x7efebcb1f3c8 process::DispatchEvent::visit() @ 0x7efebb3ab2ea process::ProcessBase::serve() @ 0x7efebcb0fe8a process::ProcessManager::resume() @ 0x7efebcb0c5a3 _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv @ 0x7efebcb1ea34 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE @ 0x7efebcb1e98a _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv @ 0x7efebcb1e91a _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv @ 0x7efeb6980c80 (unknown) @ 0x7efeb649c6ba start_thread @ 0x7efeb61d282d (unknown) Aborted (core dumped) {code} https://reviews.apache.org/r/55480/ was: A segfault occurs when an executor sends an update with a UUID that's not a valid v4 UUID and sends it off to the agent: {code} ABORT: (../../3rdparty/stout/include/stout/try.hpp:77): Try::get() but state == ERROR: Not a valid UUID *** Aborted at 1484262968 (unix time) try "date -d @1484262968" if you are using GNU date *** PC: @ 0x7efeb6101428 (unknown) *** SIGABRT (@0x36b7) received by PID 14007 (TID 0x7efeabd29700) from PID 
14007; stack trace: *** @ 0x7efeb64a6390 (unknown) @ 0x7efeb6101428 (unknown) @ 0x7efeb610302a (unknown) @ 0x560df739fa6e _Abort() @ 0x560df739fa9c _Abort() @ 0x7efebb53a5ad Try<>::get() @ 0x7efebb5363d6 Try<>::get() @ 0x7efebbd84809 mesos::internal::slave::validation::executor::call::validate() @ 0x7efebbb59b36 mesos::internal::slave::Slave::Http::executor() @ 0x7efebbc773b8 _ZZN5mesos8internal5slave5Slave10initializeEvENKUlRKN7process4http7RequestEE1_clES7_ @ 0x7efebbcb5808 _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEERKNS2_7RequestEEZN5mesos8internal5slave5Slave10initializeEvEUlS7_E1_E9_M_invokeERKSt9_Any_dataS7_ @ 0x7efebbfb2aea std::function<>::operator()() @
[jira] [Updated] (MESOS-6917) Segfault when the executor sets an invalid UUID when sending a status update.
[ https://issues.apache.org/jira/browse/MESOS-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-6917: -- Target Version/s: 1.1.1, 1.2.0, 1.0.3 (was: 1.0.0, 1.1.0, 1.2.0) Labels: mesosphere (was: ) Description: A segfault occurs when an executor sends an update with a UUID that's not a valid v4 UUID and sends it off to the agent: {code} ABORT: (../../3rdparty/stout/include/stout/try.hpp:77): Try::get() but state == ERROR: Not a valid UUID *** Aborted at 1484262968 (unix time) try "date -d @1484262968" if you are using GNU date *** PC: @ 0x7efeb6101428 (unknown) *** SIGABRT (@0x36b7) received by PID 14007 (TID 0x7efeabd29700) from PID 14007; stack trace: *** @ 0x7efeb64a6390 (unknown) @ 0x7efeb6101428 (unknown) @ 0x7efeb610302a (unknown) @ 0x560df739fa6e _Abort() @ 0x560df739fa9c _Abort() @ 0x7efebb53a5ad Try<>::get() @ 0x7efebb5363d6 Try<>::get() @ 0x7efebbd84809 mesos::internal::slave::validation::executor::call::validate() @ 0x7efebbb59b36 mesos::internal::slave::Slave::Http::executor() @ 0x7efebbc773b8 _ZZN5mesos8internal5slave5Slave10initializeEvENKUlRKN7process4http7RequestEE1_clES7_ @ 0x7efebbcb5808 _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEERKNS2_7RequestEEZN5mesos8internal5slave5Slave10initializeEvEUlS7_E1_E9_M_invokeERKSt9_Any_dataS7_ @ 0x7efebbfb2aea std::function<>::operator()() @ 0x7efebcb158b8 _ZZZN7process11ProcessBase6_visitERKNS0_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSD_14authentication20AuthenticationResultEEE0_clESN_ENKUlbE1_clEb @ 0x7efebcb1a10a _ZZZNK7process9_DeferredIZZNS_11ProcessBase6_visitERKNS1_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSE_14authentication20AuthenticationResultEEE0_clESO_EUlbE1_EcvSt8functionIFT_T0_EEINS_6FutureINSE_8ResponseEEERKbEEvENKUlS12_E_clES12_ENKUlvE_clEv @ 0x7efebcb1c5f8 
_ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEEvEZZNKS0_9_DeferredIZZNS0_11ProcessBase6_visitERKNS7_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS0_5OwnedINS2_7RequestNKUlRK6OptionINS2_14authentication20AuthenticationResultEEE0_clEST_EUlbE1_EcvSt8functionIFT_T0_EEIS4_RKbEEvENKUlS14_E_clES14_EUlvE_E9_M_invokeERKSt9_Any_data @ 0x7efebb5ce8ca std::function<>::operator()() @ 0x7efebb5c4b27 _ZZN7process8internal8DispatchINS_6FutureINS_4http8ResponseclIRSt8functionIFS5_vS5_RKNS_4UPIDEOT_ENKUlPNS_11ProcessBaseEE_clESI_ @ 0x7efebb5d4e1e _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8internal8DispatchINS0_6FutureINS0_4http8ResponseclIRSt8functionIFS9_vS9_RKNS0_4UPIDEOT_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_ @ 0x7efebcb30baf std::function<>::operator()() @ 0x7efebcb13fd6 process::ProcessBase::visit() @ 0x7efebcb1f3c8 process::DispatchEvent::visit() @ 0x7efebb3ab2ea process::ProcessBase::serve() @ 0x7efebcb0fe8a process::ProcessManager::resume() @ 0x7efebcb0c5a3 _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv @ 0x7efebcb1ea34 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE @ 0x7efebcb1e98a _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv @ 0x7efebcb1e91a _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv @ 0x7efeb6980c80 (unknown) @ 0x7efeb649c6ba start_thread @ 0x7efeb61d282d (unknown) Aborted (core dumped) {code} was: A segfault occurs when an executor sets a UUID that's not a valid v4 UUID and sends it off to the agent: {code} ABORT: (../../3rdparty/stout/include/stout/try.hpp:77): Try::get() but state == ERROR: Not a valid UUID *** Aborted at 1484262968 (unix time) try "date -d @1484262968" if you are using GNU date *** PC: @ 0x7efeb6101428 (unknown) *** SIGABRT (@0x36b7) received by PID 14007 (TID 0x7efeabd29700) from PID 14007; stack trace: *** @ 0x7efeb64a6390 (unknown) @ 
0x7efeb6101428 (unknown) @ 0x7efeb610302a (unknown) @ 0x560df739fa6e _Abort() @ 0x560df739fa9c _Abort() @ 0x7efebb53a5ad Try<>::get() @ 0x7efebb5363d6 Try<>::get() @ 0x7efebbd84809 mesos::internal::slave::validation::executor::call::validate() @ 0x7efebbb59b36 mesos::internal::slave::Slave::Http::executor() @ 0x7efebbc773b8 _ZZN5mesos8internal5slave5Slave10initializeEvENKUlRKN7process4http7RequestEE1_clES7_ @ 0x7efebbcb5808
[jira] [Updated] (MESOS-6917) Segfault when the executor sets a UUID that is not a valid v4 UUID
[ https://issues.apache.org/jira/browse/MESOS-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-6917: --- Affects Version/s: (was: 1.2.0) > Segfault when the executor sets a UUID that is not a valid v4 UUID > -- > > Key: MESOS-6917 > URL: https://issues.apache.org/jira/browse/MESOS-6917 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0 >Reporter: Aaron Wood >Assignee: Aaron Wood >Priority: Blocker > > A segfault occurs when an executor sets a UUID that's not a valid v4 UUID and > sends it off to the agent: > {code} > ABORT: (../../3rdparty/stout/include/stout/try.hpp:77): Try::get() but state > == ERROR: Not a valid UUID > *** Aborted at 1484262968 (unix time) try "date -d @1484262968" if you are > using GNU date *** > PC: @ 0x7efeb6101428 (unknown) > *** SIGABRT (@0x36b7) received by PID 14007 (TID 0x7efeabd29700) from PID > 14007; stack trace: *** > @ 0x7efeb64a6390 (unknown) > @ 0x7efeb6101428 (unknown) > @ 0x7efeb610302a (unknown) > @ 0x560df739fa6e _Abort() > @ 0x560df739fa9c _Abort() > @ 0x7efebb53a5ad Try<>::get() > @ 0x7efebb5363d6 Try<>::get() > @ 0x7efebbd84809 > mesos::internal::slave::validation::executor::call::validate() > @ 0x7efebbb59b36 mesos::internal::slave::Slave::Http::executor() > @ 0x7efebbc773b8 > _ZZN5mesos8internal5slave5Slave10initializeEvENKUlRKN7process4http7RequestEE1_clES7_ > @ 0x7efebbcb5808 > _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEERKNS2_7RequestEEZN5mesos8internal5slave5Slave10initializeEvEUlS7_E1_E9_M_invokeERKSt9_Any_dataS7_ > @ 0x7efebbfb2aea std::function<>::operator()() > @ 0x7efebcb158b8 > _ZZZN7process11ProcessBase6_visitERKNS0_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSD_14authentication20AuthenticationResultEEE0_clESN_ENKUlbE1_clEb > @ 0x7efebcb1a10a > 
_ZZZNK7process9_DeferredIZZNS_11ProcessBase6_visitERKNS1_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSE_14authentication20AuthenticationResultEEE0_clESO_EUlbE1_EcvSt8functionIFT_T0_EEINS_6FutureINSE_8ResponseEEERKbEEvENKUlS12_E_clES12_ENKUlvE_clEv > @ 0x7efebcb1c5f8 > _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEEvEZZNKS0_9_DeferredIZZNS0_11ProcessBase6_visitERKNS7_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS0_5OwnedINS2_7RequestNKUlRK6OptionINS2_14authentication20AuthenticationResultEEE0_clEST_EUlbE1_EcvSt8functionIFT_T0_EEIS4_RKbEEvENKUlS14_E_clES14_EUlvE_E9_M_invokeERKSt9_Any_data > @ 0x7efebb5ce8ca std::function<>::operator()() > @ 0x7efebb5c4b27 > _ZZN7process8internal8DispatchINS_6FutureINS_4http8ResponseclIRSt8functionIFS5_vS5_RKNS_4UPIDEOT_ENKUlPNS_11ProcessBaseEE_clESI_ > @ 0x7efebb5d4e1e > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8internal8DispatchINS0_6FutureINS0_4http8ResponseclIRSt8functionIFS9_vS9_RKNS0_4UPIDEOT_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_ > @ 0x7efebcb30baf std::function<>::operator()() > @ 0x7efebcb13fd6 process::ProcessBase::visit() > @ 0x7efebcb1f3c8 process::DispatchEvent::visit() > @ 0x7efebb3ab2ea process::ProcessBase::serve() > @ 0x7efebcb0fe8a process::ProcessManager::resume() > @ 0x7efebcb0c5a3 > _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv > @ 0x7efebcb1ea34 > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE > @ 0x7efebcb1e98a > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv > @ 0x7efebcb1e91a > _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv > @ 0x7efeb6980c80 (unknown) > @ 0x7efeb649c6ba start_thread > @ 0x7efeb61d282d (unknown) > Aborted (core dumped) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6917) Segfault when the executor sets a UUID that is not a valid v4 UUID
[ https://issues.apache.org/jira/browse/MESOS-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-6917: --- Affects Version/s: 1.2.0 > Segfault when the executor sets a UUID that is not a valid v4 UUID > -- > > Key: MESOS-6917 > URL: https://issues.apache.org/jira/browse/MESOS-6917 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.2.0 >Reporter: Aaron Wood >Assignee: Aaron Wood >Priority: Blocker > > A segfault occurs when an executor sets a UUID that's not a valid v4 UUID and > sends it off to the agent: > {code} > ABORT: (../../3rdparty/stout/include/stout/try.hpp:77): Try::get() but state > == ERROR: Not a valid UUID > *** Aborted at 1484262968 (unix time) try "date -d @1484262968" if you are > using GNU date *** > PC: @ 0x7efeb6101428 (unknown) > *** SIGABRT (@0x36b7) received by PID 14007 (TID 0x7efeabd29700) from PID > 14007; stack trace: *** > @ 0x7efeb64a6390 (unknown) > @ 0x7efeb6101428 (unknown) > @ 0x7efeb610302a (unknown) > @ 0x560df739fa6e _Abort() > @ 0x560df739fa9c _Abort() > @ 0x7efebb53a5ad Try<>::get() > @ 0x7efebb5363d6 Try<>::get() > @ 0x7efebbd84809 > mesos::internal::slave::validation::executor::call::validate() > @ 0x7efebbb59b36 mesos::internal::slave::Slave::Http::executor() > @ 0x7efebbc773b8 > _ZZN5mesos8internal5slave5Slave10initializeEvENKUlRKN7process4http7RequestEE1_clES7_ > @ 0x7efebbcb5808 > _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEERKNS2_7RequestEEZN5mesos8internal5slave5Slave10initializeEvEUlS7_E1_E9_M_invokeERKSt9_Any_dataS7_ > @ 0x7efebbfb2aea std::function<>::operator()() > @ 0x7efebcb158b8 > _ZZZN7process11ProcessBase6_visitERKNS0_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSD_14authentication20AuthenticationResultEEE0_clESN_ENKUlbE1_clEb > @ 0x7efebcb1a10a > 
_ZZZNK7process9_DeferredIZZNS_11ProcessBase6_visitERKNS1_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSE_14authentication20AuthenticationResultEEE0_clESO_EUlbE1_EcvSt8functionIFT_T0_EEINS_6FutureINSE_8ResponseEEERKbEEvENKUlS12_E_clES12_ENKUlvE_clEv > @ 0x7efebcb1c5f8 > _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEEvEZZNKS0_9_DeferredIZZNS0_11ProcessBase6_visitERKNS7_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS0_5OwnedINS2_7RequestNKUlRK6OptionINS2_14authentication20AuthenticationResultEEE0_clEST_EUlbE1_EcvSt8functionIFT_T0_EEIS4_RKbEEvENKUlS14_E_clES14_EUlvE_E9_M_invokeERKSt9_Any_data > @ 0x7efebb5ce8ca std::function<>::operator()() > @ 0x7efebb5c4b27 > _ZZN7process8internal8DispatchINS_6FutureINS_4http8ResponseclIRSt8functionIFS5_vS5_RKNS_4UPIDEOT_ENKUlPNS_11ProcessBaseEE_clESI_ > @ 0x7efebb5d4e1e > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8internal8DispatchINS0_6FutureINS0_4http8ResponseclIRSt8functionIFS9_vS9_RKNS0_4UPIDEOT_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_ > @ 0x7efebcb30baf std::function<>::operator()() > @ 0x7efebcb13fd6 process::ProcessBase::visit() > @ 0x7efebcb1f3c8 process::DispatchEvent::visit() > @ 0x7efebb3ab2ea process::ProcessBase::serve() > @ 0x7efebcb0fe8a process::ProcessManager::resume() > @ 0x7efebcb0c5a3 > _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv > @ 0x7efebcb1ea34 > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE > @ 0x7efebcb1e98a > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv > @ 0x7efebcb1e91a > _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv > @ 0x7efeb6980c80 (unknown) > @ 0x7efeb649c6ba start_thread > @ 0x7efeb61d282d (unknown) > Aborted (core dumped) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
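The abort in the trace above comes from calling `get()` on a `Try` that holds an error. A minimal sketch of the validate-before-use pattern follows; it is not the patch under review. `std::optional` stands in for stout's `Try`, the names `parseUuidBytes` and `validateStatusUuid` are hypothetical, and the check assumes the UUID arrives as 16 raw bytes in the RFC 4122 version 4 layout:

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <string>

using Uuid = std::array<std::uint8_t, 16>;

// Hypothetical parser: accepts only 16 raw bytes laid out as an RFC 4122
// version 4 UUID. Returns std::nullopt on bad input instead of aborting.
std::optional<Uuid> parseUuidBytes(const std::string& bytes)
{
  if (bytes.size() != 16) {
    return std::nullopt;
  }

  Uuid uuid;
  for (std::size_t i = 0; i < uuid.size(); ++i) {
    uuid[i] = static_cast<std::uint8_t>(bytes[i]);
  }

  // The version is the high nibble of byte 6; the variant is the top two
  // bits of byte 8 (binary 10 for RFC 4122 UUIDs).
  if ((uuid[6] >> 4) != 4 || (uuid[8] >> 6) != 2) {
    return std::nullopt;
  }

  return uuid;
}

// Hypothetical validation step: returns an error message for a malformed
// UUID, or std::nullopt when the value is acceptable. The parse result is
// inspected before use, never dereferenced blindly.
std::optional<std::string> validateStatusUuid(const std::string& bytes)
{
  if (!parseUuidBytes(bytes).has_value()) {
    return std::string("Not a valid UUID");
  }
  return std::nullopt;
}
```

With a guard of this shape, a malformed value such as a base64-encoded string is rejected with an error that the caller can turn into a failed request, rather than terminating the process.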
[jira] [Commented] (MESOS-6907) FutureTest.After3 is flaky
[ https://issues.apache.org/jira/browse/MESOS-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822216#comment-15822216 ] Alexander Rojas commented on MESOS-6907: From the behavior of the tests, and the snippets that _fix_ the issue, someone must be keeping a reference to {{future.data}} around for longer than expected. The known references are kept in copies of the future: [one in the caller|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/tests/future_tests.cpp#L275-L279], which is destroyed with the call to {{after()}}. The [other copy of the future|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/include/process/future.hpp#L1481] is kept in a {{Timer}} instance. However, copies of this timer are moved around. One copy of the timer is controlled by the {{future}} itself, and it is stored in the vector of [{{onAny()}} callbacks|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/include/process/future.hpp#L1483]. This copy of the timer is destroyed when the [timer expires|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/include/process/future.hpp#L1345] or when the original [future is satisfied|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/include/process/future.hpp#L1371-L1372] (which doesn't happen in this test). There is at least one more known copy of the {{timer}}, which is kept by the [{{Clock}} itself|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/clock.cpp#L281-L294]. However, libprocess itself gets involved in managing the lifetime of the timers through a [callback function|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/clock.cpp#L70-L71] which is set when [libprocess is starting|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L1069]. My theory is that libprocess is keeping these callbacks for longer than expected, but so far I haven't been able to prove it. 
However, I think it is perfectly normal for this behavior to occur, and the test probably needs to be updated (this last paragraph is just a conjecture at this point). > FutureTest.After3 is flaky > -- > > Key: MESOS-6907 > URL: https://issues.apache.org/jira/browse/MESOS-6907 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Alexander Rojas > > There is apparently a race condition between the time an instance of > {{Future}} goes out of scope and when the enclosing data is actually > deleted, if {{Future<T>::after(Duration, lambda::function<Future<T>(const Future<T>&)>)}} is called. > The issue is more likely to occur if the machine is under load or if it is > not a very powerful one. The easiest way to reproduce it is to run: > {code} > $ stress -c 4 -t 2600 -d 2 -i 2 & > $ ./libprocess-tests --gtest_filter="FutureTest.After3" --gtest_repeat=-1 > --gtest_break_on_failure > {code} > An exploratory fix for the issue is to change the test to: > {code} > TEST(FutureTest, After3) > { > Future<Nothing> future; > process::WeakFuture<Nothing> weak_future(future); > EXPECT_SOME(weak_future.get()); > { > Clock::pause(); > // The original future disappears here. After this call the > // original future goes out of scope and should not be reachable > // anymore. > future = future > .after(Milliseconds(1), [](Future<Nothing> f) { > f.discard(); > return Nothing(); > }); > Clock::advance(Seconds(2)); > Clock::settle(); > AWAIT_READY(future); > } > if (weak_future.get().isSome()) { > os::sleep(Seconds(1)); > } > EXPECT_NONE(weak_future.get()); > EXPECT_FALSE(future.hasDiscard()); > } > {code} > The interesting thing about the fix is that both extra snippets are needed > (either one alone is not enough) to prevent the issue from happening. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
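The theory in the comment above is that a copy held by a pending timer keeps the shared data alive after the caller's handle is gone. That mechanism can be illustrated in isolation with standard smart pointers. This is only an analogy, not libprocess code: `std::shared_ptr` and `std::weak_ptr` stand in for the future's shared data and `WeakFuture`, and `TimerQueue` is a hypothetical stand-in for whatever holds the pending timer callbacks.

```cpp
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for a clock that holds pending timer callbacks.
struct TimerQueue
{
  std::vector<std::function<void()>> callbacks;

  void runAndClear()
  {
    for (auto& callback : callbacks) {
      callback();
    }
    callbacks.clear();  // Destroys the copies captured by the callbacks.
  }
};

// Returns true if the weak reference still resolves after the caller's
// handle has gone out of scope but before the queue has been drained.
bool aliveWhileTimerPending(TimerQueue& timers, std::weak_ptr<int>& weak)
{
  {
    auto data = std::make_shared<int>(42);
    weak = data;

    // The callback captures a copy of the handle, much like a timer
    // capturing a copy of a future.
    timers.callbacks.push_back([data]() { (void)*data; });
  }  // The caller's handle is destroyed here.

  return !weak.expired();
}
```

In this model the weak reference only expires once `runAndClear()` has destroyed the queued callbacks, which mirrors why a test that checks the weak reference before the timer bookkeeping has finished can fail intermittently.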
[jira] [Comment Edited] (MESOS-6010) Docker registry puller shows decode error "No response decoded".
[ https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822158#comment-15822158 ] Jie Yu edited comment on MESOS-6010 at 1/13/17 7:11 PM: Did some digging today on this issue. 1) As [~nfnt] mentioned, there is a proxy between agent and docker registry (e.g., squid) that is doing HTTP CONNECT tunneling (e.g., http://wiki.squid-cache.org/Features/HTTPS). 2) Recent versions of curl (after 7.11.1) starts to include the proxy response (200 Connection established) as well in the response (https://curl.haxx.se/mail/lib-2005-10/0024.html). 3) According to RFC, it's ok to have zero headers (See 4.1, "zero or more header fields", https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.1). 4) According to RFC, if no header is specified, client should read the response body till EOF (See 4.4, https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4). 5) I think our http parser does the right thing, since we didn't feed the decoder an EOF (i.e., decoder->decode("", 0)), the parser thinks that there will be more body content from the socket. 6) That results in "No response decoded". Some relevant thread: https://github.com/nodejs/node-v0.x-archive/issues/1711 https://github.com/nodejs/http-parser/issues/327 https://github.com/nodejs/node-v0.x-archive/issues/1956 https://curl.haxx.se/mail/lib-2005-10/0023.html http://stackoverflow.com/questions/16965530/what-to-do-with-extra-http-header-from-proxy was (Author: jieyu): Did some digging today on this issue. 1) As [~nfnt] mentioned, there is a proxy between agent and docker registry (e.g., squid) that is doing HTTP CONNECT tunneling (e.g., http://wiki.squid-cache.org/Features/HTTPS). 2) Recent versions of curl (after 7.11.1) starts to include the proxy response (200 Connection established) as well in the response (https://curl.haxx.se/mail/lib-2005-10/0024.html). 
3) According to RFC, it's ok to have zero headers (See 4.1, "zero or more header fields", https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.1). 4) According to RFC, if no header is specified, client should read the response body till EOF (See 4.4, https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4). 5) I think our http parser does the right thing, since we didn't feed the decoder an EOF (i.e., decoder->decode("", 0)), the parser thinks that there will be more body content from the socket. 6) That results in "No response decoded". Some relevant thread: https://github.com/nodejs/node-v0.x-archive/issues/1711 https://github.com/nodejs/http-parser/issues/327 https://github.com/nodejs/node-v0.x-archive/issues/1956 https://curl.haxx.se/mail/lib-2005-10/0023.html > Docker registry puller shows decode error "No response decoded". > > > Key: MESOS-6010 > URL: https://issues.apache.org/jira/browse/MESOS-6010 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 1.0.0, 1.0.1 >Reporter: Sunzhe >Assignee: Jan Schlicht >Priority: Critical > Labels: Docker, mesos-containerizer > > The {{mesos-agent}} flags: > {code} > GLOG_v=1 ./bin/mesos-agent.sh \ > --master=zk://${MESOS_MASTER_IP}:2181/mesos \ > --ip=10.100.3.3 \ > --work_dir=${MESOS_WORK_DIR} \ > > --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux > \ > --enforce_container_disk_quota \ > --containerizers=mesos \ > --image_providers=docker \ > --executor_environment_variables="{}" > {code} > And the {{mesos-execute}} flags: > {code} > ./src/mesos-execute \ >--master=${MESOS_MASTER_IP}:5050 \ >--name=${INSTANCE_NAME} \ >--docker_image=${DOCKER_IMAGE} \ >--framework_capabilities=GPU_RESOURCES \ >--shell=false > {code} > But when {{./src/mesos-execute}}, the errors like below: > {code} > I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0 > I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at > 
master@10.103.0.125:5050 > Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009' > Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1' > Received status update TASK_FAILED for task 'test' > message: 'Failed to launch container: Failed to decode HTTP responses: No > response decoded > HTTP/1.1 200 Connection established > HTTP/1.1 401 Unauthorized > Content-Type: application/json; charset=utf-8 > Docker-Distribution-Api-Version: registry/2.0 > Www-Authenticate: Bearer > realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull; > Date: Tue, 09 Aug 2016 08:10:32 GMT >
[jira] [Comment Edited] (MESOS-6010) Docker registry puller shows decode error "No response decoded".
[ https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822158#comment-15822158 ] Jie Yu edited comment on MESOS-6010 at 1/13/17 7:07 PM: Did some digging today on this issue. 1) As [~nfnt] mentioned, there is a proxy between agent and docker registry (e.g., squid) that is doing HTTP CONNECT tunneling (e.g., http://wiki.squid-cache.org/Features/HTTPS). 2) Recent versions of curl (after 7.11.1) starts to include the proxy response (200 Connection established) as well in the response (https://curl.haxx.se/mail/lib-2005-10/0024.html). 3) According to RFC, it's ok to have zero headers (See 4.1, "zero or more header fields", https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.1). 4) According to RFC, if no header is specified, client should read the response body till EOF (See 4.4, https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4). 5) I think our http parser does the right thing, since we didn't feed the decoder an EOF (i.e., decoder->decode("", 0)), the parser thinks that there will be more body content from the socket. 6) That results in "No response decoded". Some relevant thread: https://github.com/nodejs/node-v0.x-archive/issues/1711 https://github.com/nodejs/http-parser/issues/327 https://github.com/nodejs/node-v0.x-archive/issues/1956 https://curl.haxx.se/mail/lib-2005-10/0023.html was (Author: jieyu): Did some digging today on this issue. 1) As [~nfnt] mentioned, there is a proxy between agent and docker registry (e.g., squid) that is doing HTTP CONNECT tunneling (e.g., http://wiki.squid-cache.org/Features/HTTPS). 2) Recent versions of curl (after 7.11.1) starts to include the proxy response (200 Connection established) as well in the response (https://curl.haxx.se/mail/lib-2005-10/0024.html). 3) According to RFC, it's ok to have zero headers (See 4.1, "zero or more header fields", https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.1). 
4) According to RFC, if no header is specified, client should read the response body till EOF (See 4.4, https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4). 5) I think our http parser does the right thing, since we didn't feed the decoder an EOF (i.e., decoder->decode("", 0)), the parser thinks that there will be more body content from the socket. 6) That results in "No response decoded". Some relevant thread: https://github.com/nodejs/node-v0.x-archive/issues/1711 https://github.com/nodejs/http-parser/issues/327 https://github.com/nodejs/node-v0.x-archive/issues/1956 > Docker registry puller shows decode error "No response decoded". > > > Key: MESOS-6010 > URL: https://issues.apache.org/jira/browse/MESOS-6010 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 1.0.0, 1.0.1 >Reporter: Sunzhe >Assignee: Jan Schlicht >Priority: Critical > Labels: Docker, mesos-containerizer > > The {{mesos-agent}} flags: > {code} > GLOG_v=1 ./bin/mesos-agent.sh \ > --master=zk://${MESOS_MASTER_IP}:2181/mesos \ > --ip=10.100.3.3 \ > --work_dir=${MESOS_WORK_DIR} \ > > --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux > \ > --enforce_container_disk_quota \ > --containerizers=mesos \ > --image_providers=docker \ > --executor_environment_variables="{}" > {code} > And the {{mesos-execute}} flags: > {code} > ./src/mesos-execute \ >--master=${MESOS_MASTER_IP}:5050 \ >--name=${INSTANCE_NAME} \ >--docker_image=${DOCKER_IMAGE} \ >--framework_capabilities=GPU_RESOURCES \ >--shell=false > {code} > But when {{./src/mesos-execute}}, the errors like below: > {code} > I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0 > I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at > master@10.103.0.125:5050 > Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009' > Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1' > Received status update TASK_FAILED for task 'test' > 
message: 'Failed to launch container: Failed to decode HTTP responses: No > response decoded > HTTP/1.1 200 Connection established > HTTP/1.1 401 Unauthorized > Content-Type: application/json; charset=utf-8 > Docker-Distribution-Api-Version: registry/2.0 > Www-Authenticate: Bearer > realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull; > Date: Tue, 09 Aug 2016 08:10:32 GMT > Content-Length: 145 > Strict-Transport-Security: max-age=31536000 > {"errors":[{"code":"UNAUTHORIZED","message":"authentication >
[jira] [Comment Edited] (MESOS-6010) Docker registry puller shows decode error "No response decoded".
[ https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822158#comment-15822158 ] Jie Yu edited comment on MESOS-6010 at 1/13/17 6:57 PM: Did some digging today on this issue. 1) As [~nfnt] mentioned, there is a proxy between agent and docker registry (e.g., squid) that is doing HTTP CONNECT tunneling (e.g., http://wiki.squid-cache.org/Features/HTTPS). 2) Recent versions of curl (after 7.11.1) starts to include the proxy response (200 Connection established) as well in the response (https://curl.haxx.se/mail/lib-2005-10/0024.html). 3) According to RFC, it's ok to have zero headers (See 4.1, "zero or more header fields", https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.1). 4) According to RFC, if no header is specified, client should read the response body till EOF (See 4.4, https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4). 5) I think our http parser does the right thing, since we didn't feed the decoder an EOF (i.e., decoder->decode("", 0)), the parser thinks that there will be more body content from the socket. 6) That results in "No response decoded". Some relevant thread: https://github.com/nodejs/node-v0.x-archive/issues/1711 https://github.com/nodejs/http-parser/issues/327 https://github.com/nodejs/node-v0.x-archive/issues/1956 was (Author: jieyu): Did some digging today on this issue. 1) As [~nfnt] mentioned, there is a proxy between agent and docker registry (e.g., squid) that is doing HTTP CONNECT tunneling (e.g., http://wiki.squid-cache.org/Features/HTTPS). 2) Recent versions of curl (after 7.11.1) starts to include the proxy response (200 Connection established) as well in the response (https://curl.haxx.se/mail/lib-2005-10/0024.html). 3) According to RFC, it's ok to have zero headers (See 4.1, "zero or more header fields", https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.1). 
4) According to RFC, if no header is specified, client should read the response body till EOF (See 4.4, https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4). 5) I think out http parser does the right thing, since we didn't feed the decoder an EOF (i.e., decoder->decode("", 0)), the parser thinks that there will be more body content from the socket. 6) That results in "No response decoded". Some relevant thread: https://github.com/nodejs/node-v0.x-archive/issues/1711 https://github.com/nodejs/http-parser/issues/327 https://github.com/nodejs/node-v0.x-archive/issues/1956 > Docker registry puller shows decode error "No response decoded". > > > Key: MESOS-6010 > URL: https://issues.apache.org/jira/browse/MESOS-6010 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 1.0.0, 1.0.1 >Reporter: Sunzhe >Assignee: Jan Schlicht >Priority: Critical > Labels: Docker, mesos-containerizer > > The {{mesos-agent}} flags: > {code} > GLOG_v=1 ./bin/mesos-agent.sh \ > --master=zk://${MESOS_MASTER_IP}:2181/mesos \ > --ip=10.100.3.3 \ > --work_dir=${MESOS_WORK_DIR} \ > > --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux > \ > --enforce_container_disk_quota \ > --containerizers=mesos \ > --image_providers=docker \ > --executor_environment_variables="{}" > {code} > And the {{mesos-execute}} flags: > {code} > ./src/mesos-execute \ >--master=${MESOS_MASTER_IP}:5050 \ >--name=${INSTANCE_NAME} \ >--docker_image=${DOCKER_IMAGE} \ >--framework_capabilities=GPU_RESOURCES \ >--shell=false > {code} > But when {{./src/mesos-execute}}, the errors like below: > {code} > I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0 > I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at > master@10.103.0.125:5050 > Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009' > Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1' > Received status update TASK_FAILED for task 'test' > 
message: 'Failed to launch container: Failed to decode HTTP responses: No > response decoded > HTTP/1.1 200 Connection established > HTTP/1.1 401 Unauthorized > Content-Type: application/json; charset=utf-8 > Docker-Distribution-Api-Version: registry/2.0 > Www-Authenticate: Bearer > realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull; > Date: Tue, 09 Aug 2016 08:10:32 GMT > Content-Length: 145 > Strict-Transport-Security: max-age=31536000 > {"errors":[{"code":"UNAUTHORIZED","message":"authentication >
[jira] [Commented] (MESOS-6010) Docker registry puller shows decode error "No response decoded".
[ https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822158#comment-15822158 ] Jie Yu commented on MESOS-6010: --- Did some digging today on this issue. 1) As [~nfnt] mentioned, there is a proxy between the agent and the docker registry (e.g., squid) that is doing HTTP CONNECT tunneling (e.g., http://wiki.squid-cache.org/Features/HTTPS). 2) Recent versions of curl (after 7.11.1) include the proxy response (200 Connection established) in the response as well (https://curl.haxx.se/mail/lib-2005-10/0024.html). 3) According to the RFC, it's ok to have zero headers (See 4.1, "zero or more header fields", https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.1). 4) According to the RFC, if no header is specified, the client should read the response body until EOF (See 4.4, https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4). 5) I think our HTTP parser does the right thing: since we didn't feed the decoder an EOF (i.e., decoder->decode("", 0)), the parser thinks that there will be more body content from the socket. 6) That results in "No response decoded". Some relevant threads: https://github.com/nodejs/node-v0.x-archive/issues/1711 https://github.com/nodejs/http-parser/issues/327 https://github.com/nodejs/node-v0.x-archive/issues/1956 > Docker registry puller shows decode error "No response decoded". 
> > > Key: MESOS-6010 > URL: https://issues.apache.org/jira/browse/MESOS-6010 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 1.0.0, 1.0.1 >Reporter: Sunzhe >Assignee: Jan Schlicht >Priority: Critical > Labels: Docker, mesos-containerizer > > The {{mesos-agent}} flags: > {code} > GLOG_v=1 ./bin/mesos-agent.sh \ > --master=zk://${MESOS_MASTER_IP}:2181/mesos \ > --ip=10.100.3.3 \ > --work_dir=${MESOS_WORK_DIR} \ > > --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux > \ > --enforce_container_disk_quota \ > --containerizers=mesos \ > --image_providers=docker \ > --executor_environment_variables="{}" > {code} > And the {{mesos-execute}} flags: > {code} > ./src/mesos-execute \ >--master=${MESOS_MASTER_IP}:5050 \ >--name=${INSTANCE_NAME} \ >--docker_image=${DOCKER_IMAGE} \ >--framework_capabilities=GPU_RESOURCES \ >--shell=false > {code} > But when {{./src/mesos-execute}}, the errors like below: > {code} > I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0 > I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at > master@10.103.0.125:5050 > Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009' > Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1' > Received status update TASK_FAILED for task 'test' > message: 'Failed to launch container: Failed to decode HTTP responses: No > response decoded > HTTP/1.1 200 Connection established > HTTP/1.1 401 Unauthorized > Content-Type: application/json; charset=utf-8 > Docker-Distribution-Api-Version: registry/2.0 > Www-Authenticate: Bearer > realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull; > Date: Tue, 09 Aug 2016 08:10:32 GMT > Content-Length: 145 > Strict-Transport-Security: max-age=31536000 > {"errors":[{"code":"UNAUTHORIZED","message":"authentication > required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]} > ; Container 
destroyed while provisioning images' > source: SOURCE_AGENT > reason: REASON_CONTAINER_LAUNCH_FAILED > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
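Points 4) to 6) of the MESOS-6010 comment can be sketched with a toy decoder. This is not the libprocess decoder and not http-parser: {{ToyDecoder}} and its methods are hypothetical, and the sketch only models a response without a {{Content-Length}} header whose body is delimited by the end of the stream. The convention that a zero-length feed means EOF is borrowed from the {{decoder->decode("", 0)}} call mentioned in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical decoder for a response whose body ends only at EOF.
class ToyDecoder
{
public:
  // Feeds bytes to the decoder. A length of 0 signals EOF.
  void decode(const char* data, std::size_t length)
  {
    if (length == 0) {
      // Only at EOF is an EOF-delimited body known to be complete.
      complete_ = headersDone_;
      return;
    }

    if (headersDone_) {
      body_.append(data, length);
      return;
    }

    pending_.append(data, length);

    // A blank line terminates the status line and (possibly zero) headers.
    const std::size_t end = pending_.find("\r\n\r\n");
    if (end != std::string::npos) {
      headersDone_ = true;
      body_ = pending_.substr(end + 4);
      pending_.clear();
    }
  }

  // True once a full response has been decoded.
  bool done() const { return complete_; }

  const std::string& body() const { return body_; }

private:
  std::string pending_;
  std::string body_;
  bool headersDone_ = false;
  bool complete_ = false;
};
```

Feeding only "HTTP/1.1 200 Connection established" followed by a blank line leaves this decoder waiting for more body, which mirrors the "No response decoded" outcome; it reports a decoded response only after the zero-length EOF feed.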
[jira] [Created] (MESOS-6922) SlaveRecoveryTest/0.RecoverTerminatedExecutor is flaky
Greg Mann created MESOS-6922: Summary: SlaveRecoveryTest/0.RecoverTerminatedExecutor is flaky Key: MESOS-6922 URL: https://issues.apache.org/jira/browse/MESOS-6922 Project: Mesos Issue Type: Bug Components: tests Environment: CentOS 7 Reporter: Greg Mann This was observed on ASF CI. Find attached the log from a failed run; it appears that too many status updates are being received: {code} /mesos/src/tests/slave_recovery_tests.cpp:1350: Failure Mock function called more times than expected - returning directly. Function call: statusUpdate(0x7ffcf00155b8, @0x2b3f4f7ab8c0 120-byte object <50-66 6A-45 3F-2B 00-00 00-00 00-00 00-00 00-00 DF-13 00-00 00-00 00-00 70-59 01-90 3F-2B 00-00 A0-D7 00-90 3F-2B 00-00 05-00 00-00 01-00 00-00 D0-01 91-04 00-00 00-00 D0-9C 00-90 3F-2B 00-00 C0-EB 01-90 3F-2B 00-00 18-00 00-00 00-2B 00-00 47-98 7C-B9 92-29 D6-41 90-5B 02-90 3F-2B 00-00 00-00 00-00 00-00 00-00 70-6E 01-90 3F-2B 00-00 00-00 00-00 00-00 00-00>) Expected: to be called once Actual: called twice - over-saturated and active {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6922) SlaveRecoveryTest/0.RecoverTerminatedExecutor is flaky
[ https://issues.apache.org/jira/browse/MESOS-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-6922: - Attachment: SlaveRecoveryTest.RecoverTerminatedExecutor.txt > SlaveRecoveryTest/0.RecoverTerminatedExecutor is flaky > -- > > Key: MESOS-6922 > URL: https://issues.apache.org/jira/browse/MESOS-6922 > Project: Mesos > Issue Type: Bug > Components: tests > Environment: CentOS 7 >Reporter: Greg Mann > Labels: tests > Attachments: SlaveRecoveryTest.RecoverTerminatedExecutor.txt > > > This was observed on ASF CI. Find attached the log from a failed run; it > appears that too many status updates are being received: > {code} > /mesos/src/tests/slave_recovery_tests.cpp:1350: Failure > Mock function called more times than expected - returning directly. > Function call: statusUpdate(0x7ffcf00155b8, @0x2b3f4f7ab8c0 120-byte > object <50-66 6A-45 3F-2B 00-00 00-00 00-00 00-00 00-00 DF-13 00-00 00-00 > 00-00 70-59 01-90 3F-2B 00-00 A0-D7 00-90 3F-2B 00-00 05-00 00-00 01-00 00-00 > D0-01 91-04 00-00 00-00 D0-9C 00-90 3F-2B 00-00 C0-EB 01-90 3F-2B 00-00 18-00 > 00-00 00-2B 00-00 47-98 7C-B9 92-29 D6-41 90-5B 02-90 3F-2B 00-00 00-00 00-00 > 00-00 00-00 70-6E 01-90 3F-2B 00-00 00-00 00-00 00-00 00-00>) > Expected: to be called once >Actual: called twice - over-saturated and active > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6921) Document posix isolators could not isolate resources in configuration.md
haosdent huang created MESOS-6921: - Summary: Document posix isolators could not isolate resources in configuration.md Key: MESOS-6921 URL: https://issues.apache.org/jira/browse/MESOS-6921 Project: Mesos Issue Type: Improvement Components: documentation Reporter: haosdent huang Priority: Trivial POSIX isolators only report resource usage without doing any actual isolation. We should make this more obvious in {{slave/flags.cpp}} and configuration.md. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6920) Validate the UUID in Master::statusUpdate.
[ https://issues.apache.org/jira/browse/MESOS-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822023#comment-15822023 ] James Peach commented on MESOS-6920: | Validate the StatusUpdate UUID in Master::statusUpdate. | https://reviews.apache.org/r/55509/ | > Validate the UUID in Master::statusUpdate. > -- > > Key: MESOS-6920 > URL: https://issues.apache.org/jira/browse/MESOS-6920 > Project: Mesos > Issue Type: Bug >Reporter: James Peach >Assignee: James Peach > > Validate the UUID in Master::statusUpdate() to avoid the possibility of > triggering a CHECK when logging the {{StatusUpdate}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6920) Validate the UUID in Master::statusUpdate.
[ https://issues.apache.org/jira/browse/MESOS-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach updated MESOS-6920: --- Shepherd: Yan Xu > Validate the UUID in Master::statusUpdate. > -- > > Key: MESOS-6920 > URL: https://issues.apache.org/jira/browse/MESOS-6920 > Project: Mesos > Issue Type: Bug >Reporter: James Peach >Assignee: James Peach > > Validate the UUID in Master::statusUpdate() to avoid the possibility of > triggering a CHECK when logging the {{StatusUpdate}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess
[ https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821985#comment-15821985 ] haosdent huang commented on MESOS-5342: --- Hi [~klueska], [~ct.clmsn], I read the file briefly before and am going to read it again tomorrow. The CgroupsCpushareIsolatorProcess has changed to CpuSubsystem. Also, some Huawei guys have recently been adding NUMA/cpuset support to Mesos; their implementation treats cpusets like network ports, which is simpler than the proposal in https://docs.google.com/document/d/1G3L1Tdulg5iW7hZ2WXbG-bqROILu7zdBh2aWYu3An6A/ . I will try to add some comments tomorrow and see if we can merge both the Huawei guys' work and [~ct.clmsn]'s work into the proposal. > CPU pinning/binding support for CgroupsCpushareIsolatorProcess > -- > > Key: MESOS-5342 > URL: https://issues.apache.org/jira/browse/MESOS-5342 > Project: Mesos > Issue Type: Improvement > Components: cgroups, containerization >Affects Versions: 0.28.1 >Reporter: Chris > Labels: cgroups, cpu, cpu-usage, gpu, isolation, isolator, > mentor, perfomance > > The cgroups isolator currently lacks support for binding (also called > pinning) containers to a set of cores. The GNU/Linux kernel is known to make > sub-optimal core assignments for processes and threads. Poor assignments > impact program performance, specifically in terms of cache locality. > Applications requiring GPU resources can benefit from this feature by getting > access to cores closest to the GPU hardware, which reduces cpu-gpu copy > latency. > Most cluster management systems from the HPC community (SLURM) provide both > cgroup isolation and cpu binding. This feature would provide similar > capabilities. 
The current interest in supporting Intel's Cache Allocation > Technology, and the advent of Intel's Knights-series processors, will require > making choices about where containers are going to run on the mesos-agent's > processor cores - this feature is a step toward developing a robust > solution. > The improvement in this JIRA ticket will handle hardware topology detection, > track container-to-core utilization in a histogram, and use a mathematical > optimization technique to select cores for container assignment based on > latency and the container-to-core utilization histogram. > For GPU tasks, the improvement will prioritize selection of cores based on > latency between the GPU and cores in an effort to minimize copy latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
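The core-selection idea in the MESOS-5342 description (pick cores using a container-to-core utilization histogram) can be sketched as follows. The ticket does not specify the optimization technique, so this swaps in a simple greedy choice of the least-utilized cores and ignores latency and topology entirely; {{selectCores}} and its parameters are hypothetical names, not part of Mesos.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the indices of the `count` least-utilized cores.
// `containersPerCore[i]` is the number of containers currently assigned to
// core i (the histogram). Ties are broken in favor of the lower core index.
// This greedy rule is an assumption made for illustration only.
inline std::vector<std::size_t> selectCores(
    const std::vector<unsigned>& containersPerCore,
    std::size_t count)
{
  std::vector<std::size_t> cores(containersPerCore.size());
  std::iota(cores.begin(), cores.end(), 0);

  // A stable sort keeps lower indices first among equally loaded cores.
  std::stable_sort(
      cores.begin(),
      cores.end(),
      [&containersPerCore](std::size_t a, std::size_t b) {
        return containersPerCore[a] < containersPerCore[b];
      });

  cores.resize(std::min(count, cores.size()));
  return cores;
}
```

A real isolator would additionally weigh core-to-core and GPU-to-core latency, as the description states; the histogram here only captures how many containers already share each core.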
[jira] [Created] (MESOS-6920) Validate the UUID in Master::statusUpdate.
James Peach created MESOS-6920: -- Summary: Validate the UUID in Master::statusUpdate. Key: MESOS-6920 URL: https://issues.apache.org/jira/browse/MESOS-6920 Project: Mesos Issue Type: Bug Reporter: James Peach Validate the UUID in Master::statusUpdate() to avoid the possibility of triggering a CHECK when logging the {{StatusUpdate}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
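The kind of check described in MESOS-6920 can be sketched as below. This is not the change under review and does not use the stout UUID helpers: {{validateUuidBytes}} is a hypothetical function, and the only fact it relies on is that a UUID in binary form is 16 bytes long. The point is to reject a malformed value with an error before any code path that would assert on it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A UUID in binary form is 16 bytes.
constexpr std::size_t UUID_SIZE = 16;

// Returns an empty string if `bytes` has the size of a binary UUID,
// otherwise a human-readable error. Hypothetical helper for illustration.
inline std::string validateUuidBytes(const std::string& bytes)
{
  if (bytes.size() != UUID_SIZE) {
    return "Expected " + std::to_string(UUID_SIZE) +
           " bytes for the UUID but got " + std::to_string(bytes.size());
  }

  return "";
}
```

A caller would drop or reject the update when the returned error is non-empty instead of passing the raw bytes on to logging.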
[jira] [Assigned] (MESOS-6920) Validate the UUID in Master::statusUpdate.
[ https://issues.apache.org/jira/browse/MESOS-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach reassigned MESOS-6920: -- Assignee: James Peach > Validate the UUID in Master::statusUpdate. > -- > > Key: MESOS-6920 > URL: https://issues.apache.org/jira/browse/MESOS-6920 > Project: Mesos > Issue Type: Bug >Reporter: James Peach >Assignee: James Peach > > Validate the UUID in Master::statusUpdate() to avoid the possibility of > triggering a CHECK when logging the {{StatusUpdate}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6010) Docker registry puller shows decode error "No response decoded".
[ https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-6010: -- Target Version/s: 1.1.1, 1.2.0 (was: 1.1.1, 1.2.0, 1.0.3) > Docker registry puller shows decode error "No response decoded". > > > Key: MESOS-6010 > URL: https://issues.apache.org/jira/browse/MESOS-6010 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 1.0.0, 1.0.1 >Reporter: Sunzhe >Assignee: Jan Schlicht >Priority: Critical > Labels: Docker, mesos-containerizer > > The {{mesos-agent}} flags: > {code} > GLOG_v=1 ./bin/mesos-agent.sh \ > --master=zk://${MESOS_MASTER_IP}:2181/mesos \ > --ip=10.100.3.3 \ > --work_dir=${MESOS_WORK_DIR} \ > > --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux > \ > --enforce_container_disk_quota \ > --containerizers=mesos \ > --image_providers=docker \ > --executor_environment_variables="{}" > {code} > And the {{mesos-execute}} flags: > {code} > ./src/mesos-execute \ >--master=${MESOS_MASTER_IP}:5050 \ >--name=${INSTANCE_NAME} \ >--docker_image=${DOCKER_IMAGE} \ >--framework_capabilities=GPU_RESOURCES \ >--shell=false > {code} > But when {{./src/mesos-execute}}, the errors like below: > {code} > I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0 > I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at > master@10.103.0.125:5050 > Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009' > Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1' > Received status update TASK_FAILED for task 'test' > message: 'Failed to launch container: Failed to decode HTTP responses: No > response decoded > HTTP/1.1 200 Connection established > HTTP/1.1 401 Unauthorized > Content-Type: application/json; charset=utf-8 > Docker-Distribution-Api-Version: registry/2.0 > Www-Authenticate: Bearer > 
realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull; > Date: Tue, 09 Aug 2016 08:10:32 GMT > Content-Length: 145 > Strict-Transport-Security: max-age=31536000 > {"errors":[{"code":"UNAUTHORIZED","message":"authentication > required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]} > ; Container destroyed while provisioning images' > source: SOURCE_AGENT > reason: REASON_CONTAINER_LAUNCH_FAILED > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6010) Docker registry puller shows decode error "No response decoded".
[ https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-6010: -- Target Version/s: 1.1.1, 1.2.0, 1.0.3 (was: 1.2.0) > Docker registry puller shows decode error "No response decoded". > > > Key: MESOS-6010 > URL: https://issues.apache.org/jira/browse/MESOS-6010 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 1.0.0, 1.0.1 >Reporter: Sunzhe >Assignee: Jan Schlicht >Priority: Critical > Labels: Docker, mesos-containerizer > > The {{mesos-agent}} flags: > {code} > GLOG_v=1 ./bin/mesos-agent.sh \ > --master=zk://${MESOS_MASTER_IP}:2181/mesos \ > --ip=10.100.3.3 \ > --work_dir=${MESOS_WORK_DIR} \ > > --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux > \ > --enforce_container_disk_quota \ > --containerizers=mesos \ > --image_providers=docker \ > --executor_environment_variables="{}" > {code} > And the {{mesos-execute}} flags: > {code} > ./src/mesos-execute \ >--master=${MESOS_MASTER_IP}:5050 \ >--name=${INSTANCE_NAME} \ >--docker_image=${DOCKER_IMAGE} \ >--framework_capabilities=GPU_RESOURCES \ >--shell=false > {code} > But when {{./src/mesos-execute}}, the errors like below: > {code} > I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0 > I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at > master@10.103.0.125:5050 > Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009' > Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1' > Received status update TASK_FAILED for task 'test' > message: 'Failed to launch container: Failed to decode HTTP responses: No > response decoded > HTTP/1.1 200 Connection established > HTTP/1.1 401 Unauthorized > Content-Type: application/json; charset=utf-8 > Docker-Distribution-Api-Version: registry/2.0 > Www-Authenticate: Bearer > realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull; > 
Date: Tue, 09 Aug 2016 08:10:32 GMT > Content-Length: 145 > Strict-Transport-Security: max-age=31536000 > {"errors":[{"code":"UNAUTHORIZED","message":"authentication > required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]} > ; Container destroyed while provisioning images' > source: SOURCE_AGENT > reason: REASON_CONTAINER_LAUNCH_FAILED > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5342) CPU pinning/binding support for CgroupsCpushareIsolatorProcess
[ https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821948#comment-15821948 ] Kevin Klues commented on MESOS-5342: Is anyone currently shepherding this? [~jieyu] [~kaysoky] Has anyone reviewed Chris's design doc? > CPU pinning/binding support for CgroupsCpushareIsolatorProcess > -- > > Key: MESOS-5342 > URL: https://issues.apache.org/jira/browse/MESOS-5342 > Project: Mesos > Issue Type: Improvement > Components: cgroups, containerization >Affects Versions: 0.28.1 >Reporter: Chris > Labels: cgroups, cpu, cpu-usage, gpu, isolation, isolator, > mentor, perfomance > > The cgroups isolator currently lacks support for binding (also called > pinning) containers to a set of cores. The GNU/Linux kernel is known to make > sub-optimal core assignments for processes and threads. Poor assignments > impact program performance, specifically in terms of cache locality. > Applications requiring GPU resources can benefit from this feature by getting > access to cores closest to the GPU hardware, which reduces CPU-GPU copy > latency. > Most cluster management systems from the HPC community (e.g. SLURM) provide both > cgroup isolation and CPU binding. This feature would provide similar > capabilities. The current interest in supporting Intel's Cache Allocation > Technology, and the advent of Intel's Knights-series processors, will require > making choices about where containers are going to run on the mesos-agent's > processor cores - this feature is a step toward developing a robust > solution. > The improvement in this JIRA ticket will handle hardware topology detection, > track container-to-core utilization in a histogram, and use a mathematical > optimization technique to select cores for container assignment based on > latency and the container-to-core utilization histogram. 
> For GPU tasks, the improvement will prioritize selection of cores based on > latency between the GPU and cores in an effort to minimize copy latency.
[jira] [Assigned] (MESOS-6919) Libprocess reinit code leaks SSL server socket FD
[ https://issues.apache.org/jira/browse/MESOS-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann reassigned MESOS-6919: Assignee: Greg Mann > Libprocess reinit code leaks SSL server socket FD > - > > Key: MESOS-6919 > URL: https://issues.apache.org/jira/browse/MESOS-6919 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Greg Mann >Assignee: Greg Mann > Labels: libprocess, ssl > > After [this commit|https://github.com/apache/mesos/commit/789e9f7], it was > discovered that tests which use {{process::reinitialize}} to switch between > SSL and non-SSL modes will leak the file descriptor associated with the > server socket {{\_\_s\_\_}}. This can be reproduced by running the following > trivial test in repetition: > {code} > diff --git a/src/tests/scheduler_tests.cpp b/src/tests/scheduler_tests.cpp > index 1ff423f..d5fd575 100644 > --- a/src/tests/scheduler_tests.cpp > +++ b/src/tests/scheduler_tests.cpp > @@ -1821,6 +1821,12 @@ INSTANTIATE_TEST_CASE_P( > #endif // USE_SSL_SOCKET > +TEST_P(SchedulerSSLTest, LeakTest) > +{ > + ::sleep(1); > +} > + > + > // Tests that a scheduler can subscribe, run a task, and then tear itself > down. > TEST_P(SchedulerSSLTest, RunTaskAndTeardown) > { > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6919) Libprocess reinit code leaks SSL server socket FD
[ https://issues.apache.org/jira/browse/MESOS-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-6919: - Description: After [this commit|https://github.com/apache/mesos/commit/789e9f7], it was discovered that tests which use {{process::reinitialize}} to switch between SSL and non-SSL modes will leak the file descriptor associated with the server socket {{\_\_s\_\_}}. This can be reproduced by running the following trivial test in repetition: {code} diff --git a/src/tests/scheduler_tests.cpp b/src/tests/scheduler_tests.cpp index 1ff423f..d5fd575 100644 --- a/src/tests/scheduler_tests.cpp +++ b/src/tests/scheduler_tests.cpp @@ -1821,6 +1821,12 @@ INSTANTIATE_TEST_CASE_P( #endif // USE_SSL_SOCKET +TEST_P(SchedulerSSLTest, LeakTest) +{ + ::sleep(1); +} + + // Tests that a scheduler can subscribe, run a task, and then tear itself down. TEST_P(SchedulerSSLTest, RunTaskAndTeardown) { {code} was: After [this commit|https://github.com/apache/mesos/commit/789e9f7], it was discovered that tests which use {{process::reinitialize}} to switch between SSL and non-SSL modes will leak the file descriptor associated with the server socket {{__s__}}. This can be reproduced by running the following trivial test in repetition: {code} diff --git a/src/tests/scheduler_tests.cpp b/src/tests/scheduler_tests.cpp index 1ff423f..d5fd575 100644 --- a/src/tests/scheduler_tests.cpp +++ b/src/tests/scheduler_tests.cpp @@ -1821,6 +1821,12 @@ INSTANTIATE_TEST_CASE_P( #endif // USE_SSL_SOCKET +TEST_P(SchedulerSSLTest, LeakTest) +{ + ::sleep(1); +} + + // Tests that a scheduler can subscribe, run a task, and then tear itself down. 
TEST_P(SchedulerSSLTest, RunTaskAndTeardown) { {code} > Libprocess reinit code leaks SSL server socket FD > - > > Key: MESOS-6919 > URL: https://issues.apache.org/jira/browse/MESOS-6919 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Greg Mann > Labels: libprocess, ssl > > After [this commit|https://github.com/apache/mesos/commit/789e9f7], it was > discovered that tests which use {{process::reinitialize}} to switch between > SSL and non-SSL modes will leak the file descriptor associated with the > server socket {{\_\_s\_\_}}. This can be reproduced by running the following > trivial test in repetition: > {code} > diff --git a/src/tests/scheduler_tests.cpp b/src/tests/scheduler_tests.cpp > index 1ff423f..d5fd575 100644 > --- a/src/tests/scheduler_tests.cpp > +++ b/src/tests/scheduler_tests.cpp > @@ -1821,6 +1821,12 @@ INSTANTIATE_TEST_CASE_P( > #endif // USE_SSL_SOCKET > +TEST_P(SchedulerSSLTest, LeakTest) > +{ > + ::sleep(1); > +} > + > + > // Tests that a scheduler can subscribe, run a task, and then tear itself > down. > TEST_P(SchedulerSSLTest, RunTaskAndTeardown) > { > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6919) Libprocess reinit code leaks SSL server socket FD
Greg Mann created MESOS-6919: Summary: Libprocess reinit code leaks SSL server socket FD Key: MESOS-6919 URL: https://issues.apache.org/jira/browse/MESOS-6919 Project: Mesos Issue Type: Bug Components: libprocess Reporter: Greg Mann After [this commit|https://github.com/apache/mesos/commit/789e9f7], it was discovered that tests which use {{process::reinitialize}} to switch between SSL and non-SSL modes will leak the file descriptor associated with the server socket {{__s__}}. This can be reproduced by running the following trivial test in repetition: {code} diff --git a/src/tests/scheduler_tests.cpp b/src/tests/scheduler_tests.cpp index 1ff423f..d5fd575 100644 --- a/src/tests/scheduler_tests.cpp +++ b/src/tests/scheduler_tests.cpp @@ -1821,6 +1821,12 @@ INSTANTIATE_TEST_CASE_P( #endif // USE_SSL_SOCKET +TEST_P(SchedulerSSLTest, LeakTest) +{ + ::sleep(1); +} + + // Tests that a scheduler can subscribe, run a task, and then tear itself down. TEST_P(SchedulerSSLTest, RunTaskAndTeardown) { {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6918) Prometheus exporter endpoints for metrics
James Peach created MESOS-6918: -- Summary: Prometheus exporter endpoints for metrics Key: MESOS-6918 URL: https://issues.apache.org/jira/browse/MESOS-6918 Project: Mesos Issue Type: Bug Components: statistics Reporter: James Peach There are a couple of [Prometheus|https://prometheus.io] metrics exporters for Mesos, of varying quality. Since the Mesos stats system actually knows about statistics data types and semantics, and Mesos has reasonable HTTP support, we could add Prometheus metrics endpoints that directly expose statistics in [Prometheus wire format|https://prometheus.io/docs/instrumenting/exposition_formats/], removing the need for operators to run separate exporter processes.
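To illustrate what such an endpoint would emit, here is a minimal sketch of rendering metrics in the Prometheus text exposition format. It is not Mesos code: `sanitizeMetricName` and `renderGauges` are hypothetical helpers, and treating every metric as a gauge is an assumption made for brevity. The sanitizing step is needed because Mesos metric keys such as `master/tasks_running` contain characters that Prometheus metric names do not allow.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: maps a Mesos-style metric key such as
// "master/tasks_running" to a name matching the Prometheus pattern
// [a-zA-Z_:][a-zA-Z0-9_:]* by replacing every other character with '_'.
std::string sanitizeMetricName(const std::string& key)
{
  std::string name;
  for (char c : key) {
    const unsigned char u = static_cast<unsigned char>(c);
    const bool allowed = std::isalnum(u) || c == '_' || c == ':';
    name += allowed ? c : '_';
  }

  // A metric name must not be empty or start with a digit.
  if (name.empty() || std::isdigit(static_cast<unsigned char>(name[0]))) {
    name.insert(name.begin(), '_');
  }

  return name;
}

// Hypothetical helper: renders each metric as a gauge, one "# TYPE"
// line followed by one sample line. Assumption: all metrics are gauges.
std::string renderGauges(const std::map<std::string, double>& metrics)
{
  std::ostringstream out;
  for (const auto& entry : metrics) {
    const std::string name = sanitizeMetricName(entry.first);
    out << "# TYPE " << name << " gauge\n"
        << name << " " << entry.second << "\n";
  }
  return out.str();
}
```

Note that distinct keys such as `a/b` and `a_b` map to the same name after sanitizing; a real endpoint would have to resolve such collisions and distinguish counters from gauges.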
[jira] [Created] (MESOS-6917) Segfault when the executor sets a UUID that is not a valid v4 UUID
Aaron Wood created MESOS-6917: - Summary: Segfault when the executor sets a UUID that is not a valid v4 UUID Key: MESOS-6917 URL: https://issues.apache.org/jira/browse/MESOS-6917 Project: Mesos Issue Type: Bug Affects Versions: 1.1.0, 1.0.2, 1.0.1, 1.0.0 Reporter: Aaron Wood Assignee: Aaron Wood Priority: Blocker A segfault occurs when an executor sets a UUID that's not a valid v4 UUID and sends it off to the agent: {code} ABORT: (../../3rdparty/stout/include/stout/try.hpp:77): Try::get() but state == ERROR: Not a valid UUID *** Aborted at 1484262968 (unix time) try "date -d @1484262968" if you are using GNU date *** PC: @ 0x7efeb6101428 (unknown) *** SIGABRT (@0x36b7) received by PID 14007 (TID 0x7efeabd29700) from PID 14007; stack trace: *** @ 0x7efeb64a6390 (unknown) @ 0x7efeb6101428 (unknown) @ 0x7efeb610302a (unknown) @ 0x560df739fa6e _Abort() @ 0x560df739fa9c _Abort() @ 0x7efebb53a5ad Try<>::get() @ 0x7efebb5363d6 Try<>::get() @ 0x7efebbd84809 mesos::internal::slave::validation::executor::call::validate() @ 0x7efebbb59b36 mesos::internal::slave::Slave::Http::executor() @ 0x7efebbc773b8 _ZZN5mesos8internal5slave5Slave10initializeEvENKUlRKN7process4http7RequestEE1_clES7_ @ 0x7efebbcb5808 _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEERKNS2_7RequestEEZN5mesos8internal5slave5Slave10initializeEvEUlS7_E1_E9_M_invokeERKSt9_Any_dataS7_ @ 0x7efebbfb2aea std::function<>::operator()() @ 0x7efebcb158b8 _ZZZN7process11ProcessBase6_visitERKNS0_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSD_14authentication20AuthenticationResultEEE0_clESN_ENKUlbE1_clEb @ 0x7efebcb1a10a _ZZZNK7process9_DeferredIZZNS_11ProcessBase6_visitERKNS1_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5OwnedINS_4http7RequestNKUlRK6OptionINSE_14authentication20AuthenticationResultEEE0_clESO_EUlbE1_EcvSt8functionIFT_T0_EEINS_6FutureINSE_8ResponseEEERKbEEvENKUlS12_E_clES12_ENKUlvE_clEv @ 
0x7efebcb1c5f8 _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEEvEZZNKS0_9_DeferredIZZNS0_11ProcessBase6_visitERKNS7_12HttpEndpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS0_5OwnedINS2_7RequestNKUlRK6OptionINS2_14authentication20AuthenticationResultEEE0_clEST_EUlbE1_EcvSt8functionIFT_T0_EEIS4_RKbEEvENKUlS14_E_clES14_EUlvE_E9_M_invokeERKSt9_Any_data @ 0x7efebb5ce8ca std::function<>::operator()() @ 0x7efebb5c4b27 _ZZN7process8internal8DispatchINS_6FutureINS_4http8ResponseclIRSt8functionIFS5_vS5_RKNS_4UPIDEOT_ENKUlPNS_11ProcessBaseEE_clESI_ @ 0x7efebb5d4e1e _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8internal8DispatchINS0_6FutureINS0_4http8ResponseclIRSt8functionIFS9_vS9_RKNS0_4UPIDEOT_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_ @ 0x7efebcb30baf std::function<>::operator()() @ 0x7efebcb13fd6 process::ProcessBase::visit() @ 0x7efebcb1f3c8 process::DispatchEvent::visit() @ 0x7efebb3ab2ea process::ProcessBase::serve() @ 0x7efebcb0fe8a process::ProcessManager::resume() @ 0x7efebcb0c5a3 _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv @ 0x7efebcb1ea34 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE @ 0x7efebcb1e98a _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv @ 0x7efebcb1e91a _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv @ 0x7efeb6980c80 (unknown) @ 0x7efeb649c6ba start_thread @ 0x7efeb61d282d (unknown) Aborted (core dumped) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
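The abort message above shows `Try::get()` being called on a value in the error state. A minimal sketch of the defensive pattern, checking `isError()` before calling `get()`, follows. The `Try` class here is a simplified stand-in written for this example and not stout's actual implementation; `parseUuid`, `validateUuid`, and the `failure`/`message` member names are hypothetical, and the version check is only an assumption about what "valid v4 UUID" might mean.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Simplified stand-in for stout's Try<T>; NOT the real class.
template <typename T>
class Try
{
public:
  static Try some(const T& value) { return Try(true, value, ""); }
  static Try failure(const std::string& message)
  {
    return Try(false, T(), message);
  }

  bool isError() const { return !ok_; }
  const std::string& message() const { return message_; }

  // Mirrors the behavior seen in the trace: aborts on an error state.
  const T& get() const
  {
    if (!ok_) {
      std::cerr << "Try::get() but state == ERROR: " << message_ << std::endl;
      std::abort();
    }
    return value_;
  }

private:
  Try(bool ok, const T& value, const std::string& message)
    : ok_(ok), value_(value), message_(message) {}

  bool ok_;
  T value_;
  std::string message_;
};

// Hypothetical parser: accepts only 16-byte strings whose version
// nibble (high four bits of octet 6) is 4.
Try<std::string> parseUuid(const std::string& bytes)
{
  if (bytes.size() != 16 ||
      (static_cast<unsigned char>(bytes[6]) >> 4) != 4) {
    return Try<std::string>::failure("Not a valid UUID");
  }
  return Try<std::string>::some(bytes);
}

// Returns an empty string when the UUID is valid, otherwise a
// validation error. get() is never reached on the error path.
std::string validateUuid(const std::string& bytes)
{
  const Try<std::string> uuid = parseUuid(bytes);
  if (uuid.isError()) {
    return "Invalid UUID: " + uuid.message();
  }
  return "";
}
```

With this shape, a malformed UUID from an executor would surface as a validation error that the caller can turn into an error response, rather than terminating the process.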
[jira] [Updated] (MESOS-6907) FutureTest.After3 is flaky
[ https://issues.apache.org/jira/browse/MESOS-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rojas updated MESOS-6907: --- Description: There is apparently a race condition between the time an instance of {{Future<T>}} goes out of scope and when the enclosing data is actually deleted, if {{Future<T>::after(Duration, lambda::function<Future<T>(const Future<T>&)>)}} is called. The issue is more likely to occur if the machine is under load or if it is not a very powerful one. The easiest way to reproduce it is to run: {code} $ stress -c 4 -t 2600 -d 2 -i 2 & $ ./libprocess-tests --gtest_filter="FutureTest.After3" --gtest_repeat=-1 --gtest_break_on_failure {code} An exploratory fix for the issue is to change the test to: {code} TEST(FutureTest, After3) { Future<Nothing> future; process::WeakFuture<Nothing> weak_future(future); EXPECT_SOME(weak_future.get()); { Clock::pause(); // The original future disappears here. After this call the // original future goes out of scope and should not be reachable // anymore. future = future .after(Milliseconds(1), [](Future<Nothing> f) { f.discard(); return Nothing(); }); Clock::advance(Seconds(2)); Clock::settle(); AWAIT_READY(future); } if (weak_future.get().isSome()) { os::sleep(Seconds(1)); } EXPECT_NONE(weak_future.get()); EXPECT_FALSE(future.hasDiscard()); } {code} The interesting thing about the fix is that both extra snippets are needed (either one alone is not enough) to prevent the issue from happening. was: After playing with the latest patch solving MESOS-6484 we found out that the modifications introduce flakiness in the test {{FutureTest.After3}}. The flakiness occurs, depending on the machine and its load, between once every 1 runs and once every 50 runs, and is most likely a race condition in the code. 
To reproduce run: {code} ${MESOS_BUILD_DIR}/3rdparty/libprocess/libprocess-tests --gtest_filter="*.After3" --gtest_repeat=-1 --gtest_break_on_failure {code} > FutureTest.After3 is flaky > -- > > Key: MESOS-6907 > URL: https://issues.apache.org/jira/browse/MESOS-6907 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Alexander Rojas > > There is apparently a race condition between the time an instance of > {{Future<T>}} goes out of scope and when the enclosing data is actually > deleted, if {{Future<T>::after(Duration, lambda::function<Future<T>(const > Future<T>&)>)}} is called. > The issue is more likely to occur if the machine is under load or if it is > not a very powerful one. The easiest way to reproduce it is to run: > {code} > $ stress -c 4 -t 2600 -d 2 -i 2 & > $ ./libprocess-tests --gtest_filter="FutureTest.After3" --gtest_repeat=-1 > --gtest_break_on_failure > {code} > An exploratory fix for the issue is to change the test to: > {code} > TEST(FutureTest, After3) > { > Future<Nothing> future; > process::WeakFuture<Nothing> weak_future(future); > EXPECT_SOME(weak_future.get()); > { > Clock::pause(); > // The original future disappears here. After this call the > // original future goes out of scope and should not be reachable > // anymore. > future = future > .after(Milliseconds(1), [](Future<Nothing> f) { > f.discard(); > return Nothing(); > }); > Clock::advance(Seconds(2)); > Clock::settle(); > AWAIT_READY(future); > } > if (weak_future.get().isSome()) { > os::sleep(Seconds(1)); > } > EXPECT_NONE(weak_future.get()); > EXPECT_FALSE(future.hasDiscard()); > } > {code} > The interesting thing about the fix is that both extra snippets are needed > (either one alone is not enough) to prevent the issue from happening.
[jira] [Commented] (MESOS-6864) Container Exec should be possible with tasks belonging to a task group
[ https://issues.apache.org/jira/browse/MESOS-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821809#comment-15821809 ] Gastón Kleiman commented on MESOS-6864: --- https://reviews.apache.org/r/55464/ > Container Exec should be possible with tasks belonging to a task group > -- > > Key: MESOS-6864 > URL: https://issues.apache.org/jira/browse/MESOS-6864 > Project: Mesos > Issue Type: Bug >Reporter: Gastón Kleiman >Assignee: Gastón Kleiman >Priority: Blocker > Labels: debugging, mesosphere > > {{LaunchNestedContainerSession}} currently requires the parent container to > be an Executor > (https://github.com/apache/mesos/blob/f89f28724f5837ff414dc6cc84e1afb63f3306e5/src/slave/http.cpp#L2189-L2211). > This works for command tasks, because the task container id is the same as > the executor container id. > But it won't work for pod tasks whose container id is different from > executor’s container id. > In order to resolve this ticket, we need to allow launching a child container > at an arbitrary level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6916) Improve health checks validation
Gastón Kleiman created MESOS-6916: - Summary: Improve health checks validation Key: MESOS-6916 URL: https://issues.apache.org/jira/browse/MESOS-6916 Project: Mesos Issue Type: Bug Reporter: Gastón Kleiman The "general" fields (e.g., `timeout_seconds`) should also be validated, similar to what's done in https://reviews.apache.org/r/55458/
[jira] [Commented] (MESOS-6010) Docker registry puller shows decode error "No response decoded".
[ https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821580#comment-15821580 ] Jan Schlicht commented on MESOS-6010: - The root cause isn't a problem or bug of {{http-parser}} but ambiguities on how to deal with HTTP responses. An HTTP response _should_ indicate the length of its body by setting the {{Content-Length}} header. But when this header isn't set, this could mean different things: a) The response doesn't have a body or b) We have to somehow figure out the length of the body. b) is something we cannot do because {{process::ResponseDecoder}} should support parsing multiple HTTP responses in a single string and we wouldn't be able to tell where a body ends and a new response starts. Hence a), assuming that a response doesn't have a body when {{Content-Length}} isn't set, can resolve this problem. > Docker registry puller shows decode error "No response decoded". > > > Key: MESOS-6010 > URL: https://issues.apache.org/jira/browse/MESOS-6010 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 1.0.0, 1.0.1 >Reporter: Sunzhe >Assignee: Jan Schlicht >Priority: Critical > Labels: Docker, mesos-containerizer > > The {{mesos-agent}} flags: > {code} > GLOG_v=1 ./bin/mesos-agent.sh \ > --master=zk://${MESOS_MASTER_IP}:2181/mesos \ > --ip=10.100.3.3 \ > --work_dir=${MESOS_WORK_DIR} \ > > --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux > \ > --enforce_container_disk_quota \ > --containerizers=mesos \ > --image_providers=docker \ > --executor_environment_variables="{}" > {code} > And the {{mesos-execute}} flags: > {code} > ./src/mesos-execute \ >--master=${MESOS_MASTER_IP}:5050 \ >--name=${INSTANCE_NAME} \ >--docker_image=${DOCKER_IMAGE} \ >--framework_capabilities=GPU_RESOURCES \ >--shell=false > {code} > But when {{./src/mesos-execute}}, the errors like below: > {code} > I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 
1.0.0 > I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at > master@10.103.0.125:5050 > Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009' > Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1' > Received status update TASK_FAILED for task 'test' > message: 'Failed to launch container: Failed to decode HTTP responses: No > response decoded > HTTP/1.1 200 Connection established > HTTP/1.1 401 Unauthorized > Content-Type: application/json; charset=utf-8 > Docker-Distribution-Api-Version: registry/2.0 > Www-Authenticate: Bearer > realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull; > Date: Tue, 09 Aug 2016 08:10:32 GMT > Content-Length: 145 > Strict-Transport-Security: max-age=31536000 > {"errors":[{"code":"UNAUTHORIZED","message":"authentication > required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]} > ; Container destroyed while provisioning images' > source: SOURCE_AGENT > reason: REASON_CONTAINER_LAUNCH_FAILED > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
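The approach described in the comment above, treating a response without {{Content-Length}} as having no body so that several responses in one buffer can be told apart, can be sketched roughly as follows. This is an illustration only and not how {{process::ResponseDecoder}} is implemented: `Response`, `contentLength`, and `splitResponses` are hypothetical names, and chunked transfer encoding is deliberately ignored.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical result type: just the status line and the body.
struct Response
{
  std::string statusLine;
  std::string body;
};

// Returns the Content-Length value found in a header block, or 0 when
// the header is absent (assumption: a missing header means no body).
static std::size_t contentLength(const std::string& head)
{
  std::string lower(head);
  std::transform(
      lower.begin(), lower.end(), lower.begin(),
      [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

  const std::string key = "\r\ncontent-length:";
  const std::size_t at = lower.find(key);
  if (at == std::string::npos) {
    return 0;
  }

  // strtoul skips the optional whitespace after the colon.
  return static_cast<std::size_t>(
      std::strtoul(lower.c_str() + at + key.size(), nullptr, 10));
}

// Splits a buffer holding back-to-back HTTP responses. Stops at the
// first incomplete header block or body.
std::vector<Response> splitResponses(const std::string& data)
{
  std::vector<Response> responses;
  std::size_t pos = 0;

  while (pos < data.size()) {
    const std::size_t headEnd = data.find("\r\n\r\n", pos);
    if (headEnd == std::string::npos) {
      break; // Incomplete header block.
    }

    const std::string head = data.substr(pos, headEnd - pos);
    const std::size_t length = contentLength(head);
    const std::size_t bodyStart = headEnd + 4;
    if (length > data.size() - bodyStart) {
      break; // Incomplete body.
    }

    responses.push_back(
        {head.substr(0, head.find("\r\n")), data.substr(bodyStart, length)});
    pos = bodyStart + length;
  }

  return responses;
}
```

Applied to a buffer shaped like the one in the error message, a `200 Connection established` response with no headers followed by a `401 Unauthorized` response carrying {{Content-Length}}, this yields two responses, the first with an empty body. The trade-off is that a body delimited only by connection close, which HTTP/1.1 permits when no length is given, would be dropped.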