[jira] [Updated] (MESOS-7643) The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically
[ https://issues.apache.org/jira/browse/MESOS-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-7643: Target Version/s: 1.2.2, 1.3.1, 1.4.0, 1.1.3 Labels: isolation (was: ) > The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically > > > Key: MESOS-7643 > URL: https://issues.apache.org/jira/browse/MESOS-7643 > Project: Mesos > Issue Type: Bug > Components: containerization > Affects Versions: 1.1.2, 1.2.0, 1.3.0 > Reporter: Michael Cherny > Assignee: Gilbert Song > Labels: isolation > > According to the documentation and comments in the code, the order of the entries in the --isolation flag should specify the ordering of the isolators. Specifically, the `create` and `prepare` calls for each isolator should run serially in the order in which they appear in the --isolation flag, while the `cleanup` calls should run in reverse order (with the exception of the filesystem isolator, which is always first). > But in fact, the isolators provided in the '--isolation' flag are sorted alphabetically. > That happens in [this line of code|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L377]. > That line uses a 'set' (apparently instead of a list or vector), and a set is a sorted container. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7173) CMake does not define `GIT_SHA` etc. in build.cpp
[ https://issues.apache.org/jira/browse/MESOS-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Schwartzmeyer updated MESOS-7173: Shepherd: Joseph Wu (was: Alex Clemmer) > CMake does not define `GIT_SHA` etc. in build.cpp > - > > Key: MESOS-7173 > URL: https://issues.apache.org/jira/browse/MESOS-7173 > Project: Mesos > Issue Type: Bug > Environment: CMake >Reporter: Andrew Schwartzmeyer >Assignee: Andrew Schwartzmeyer >Priority: Minor > Labels: cmake, microsoft, windows > > `build.cpp` expects `BUILD_GIT_{SHA,BRANCH,TAG}` to be defined by the build > system (CMake) but they are not currently. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
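One way CMake could supply these macros — a sketch of the general technique, not the patch that eventually landed — is to query git at configure time and forward the results as compile definitions (only the `BUILD_GIT_*` names come from the report; everything else here is illustrative):

```cmake
# Query git at configure time; each result is empty for non-git source trees.
execute_process(
  COMMAND git rev-parse HEAD
  WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
  OUTPUT_VARIABLE BUILD_GIT_SHA
  OUTPUT_STRIP_TRAILING_WHITESPACE
  ERROR_QUIET)

execute_process(
  COMMAND git rev-parse --abbrev-ref HEAD
  WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
  OUTPUT_VARIABLE BUILD_GIT_BRANCH
  OUTPUT_STRIP_TRAILING_WHITESPACE
  ERROR_QUIET)

execute_process(
  COMMAND git describe --exact-match --tags
  WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
  OUTPUT_VARIABLE BUILD_GIT_TAG
  OUTPUT_STRIP_TRAILING_WHITESPACE
  ERROR_QUIET)

# Only define the macros that were actually resolved, so a tarball build
# (no .git directory) still compiles.
if (BUILD_GIT_SHA)
  add_definitions(-DBUILD_GIT_SHA="${BUILD_GIT_SHA}")
endif ()
if (BUILD_GIT_BRANCH)
  add_definitions(-DBUILD_GIT_BRANCH="${BUILD_GIT_BRANCH}")
endif ()
if (BUILD_GIT_TAG)
  add_definitions(-DBUILD_GIT_TAG="${BUILD_GIT_TAG}")
endif ()
```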
[jira] [Commented] (MESOS-7603) longjmp error in libcurl
[ https://issues.apache.org/jira/browse/MESOS-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065463#comment-16065463 ] Charles Allen commented on MESOS-7603: -- This was resolved internally by linking libcurl against c-ares > longjmp error in libcurl > > > Key: MESOS-7603 > URL: https://issues.apache.org/jira/browse/MESOS-7603 > Project: Mesos > Issue Type: Bug > Components: fetcher >Affects Versions: 1.2.0 >Reporter: Charles Allen > > We encountered the following error when the fetcher tries to run on a mesos > 1.2.0 agent through systemd: > {code} > Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: *** longjmp causes > uninitialized stack frame ***: /usr/sbin/mesos-agent terminated > Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: === Backtrace: > = > Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: > /lib64/libc.so.6(+0x71c07)[0x7f8d08f5fc07] > Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: > /lib64/libc.so.6(__fortify_fail+0x47)[0x7f8d08fedb17] > Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: > /lib64/libc.so.6(+0xff56d)[0x7f8d08fed56d] > Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: > /lib64/libc.so.6(__longjmp_chk+0x38)[0x7f8d08fed4c8] > Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: > /lib64/libcurl.so.4(+0xae34)[0x7f8d08519e34] > Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: > /lib64/libpthread.so.0(+0x116b0)[0x7f8d098386b0] > Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: > /lib64/libpthread.so.0(pthread_cond_wait+0xbf)[0x7f8d0983448f] > Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: > /lib64/libstdc++.so.6(_ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE+0x2b)[0x7f8d095968ab] > Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: > /lib64/libmesos-1.2.0.so(_ZN7process14ProcessManager4waitERKNS_4UPIDE+0x328)[0x7f8d0b47f3d8] > Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: > 
/lib64/libmesos-1.2.0.so(_ZN7process4waitERKNS_4UPIDERK8Duration+0x2e7)[0x7f8d0b486117] > Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: > /usr/sbin/mesos-agent(+0x12810)[0x557e1d691810] > Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: > /lib64/libc.so.6(__libc_start_main+0xfc)[0x7f8d08f0e93c] > Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: > /usr/sbin/mesos-agent(+0x139c9)[0x557e1d6929c9] > Jun 01 22:55:53 ip-172-19-68-109 mesos-agent[103454]: === Memory map: > > {code} > It looks like this error: > https://stackoverflow.com/questions/9191668/error-longjmp-causes-uninitialized-stack-frame > > Where the solution is either set {{curl_easy_setopt(curl, CURLOPT_NOSIGNAL, > 1)}} or use a special config option to libcurl -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-7643) The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically
[ https://issues.apache.org/jira/browse/MESOS-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu reassigned MESOS-7643: - Assignee: Gilbert Song > The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically > > > Key: MESOS-7643 > URL: https://issues.apache.org/jira/browse/MESOS-7643 > Project: Mesos > Issue Type: Bug > Components: containerization > Affects Versions: 1.1.2, 1.2.0, 1.3.0 > Reporter: Michael Cherny > Assignee: Gilbert Song > > According to the documentation and comments in the code, the order of the entries in the --isolation flag should specify the ordering of the isolators. Specifically, the `create` and `prepare` calls for each isolator should run serially in the order in which they appear in the --isolation flag, while the `cleanup` calls should run in reverse order (with the exception of the filesystem isolator, which is always first). > But in fact, the isolators provided in the '--isolation' flag are sorted alphabetically. > That happens in [this line of code|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L377]. > That line uses a 'set' (apparently instead of a list or vector), and a set is a sorted container. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-4092) Try to re-establish connection on ping timeouts with agent before removing it
[ https://issues.apache.org/jira/browse/MESOS-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065357#comment-16065357 ] Ilya Pronin commented on MESOS-4092: Looks like our problem here is that we use our health-check for detecting remote-peer failure and link failure, but don't distinguish them. When a connection breaks, libprocess issues {{ExitedEvent}} and opens a new connection when required. But in the case of a network problem a relatively long time may pass before TCP retransmissions limit is reached and the connection is declared dead. One possible solution can be to try using the aforementioned "relink" functionality at some point during agent pinging. We can use a strategy similar to the one used by TCP: after N consecutive failed pings "relink" before sending the next ping. Plus a similar thing on the agent's side. Another possible solution can be to use TCP keepalive mechanism tuned to "detect" broken connections faster than {{agent_ping_timeout * max_agent_ping_timeouts}}. Or we can mess with TCP user timeout, but IMO it's a road to hell and AFAIK user timeout is available only on Linux. > Try to re-establish connection on ping timeouts with agent before removing it > - > > Key: MESOS-4092 > URL: https://issues.apache.org/jira/browse/MESOS-4092 > Project: Mesos > Issue Type: Improvement > Components: master >Affects Versions: 0.25.0 >Reporter: Ian Downes > > The SlaveObserver will trigger an agent to be removed after > {{flags.max_slave_ping_timeouts}} timeouts of {{flags.slave_ping_timeout}}. > This can occur because of transient network failures, e.g., gray failures of > a switch uplink exhibiting heavy or total packet loss. Some network > architectures are designed to tolerate such gray failures and support > multiple paths between hosts. This can be implemented with equal-cost > multi-path routing (ECMP) where flows are hashed by their 5-tuple to multiple > possible uplinks. 
In such networks re-establishing a TCP connection will > almost certainly use a new source port and thus will likely be hashed to a > different uplink, avoiding the failed uplink and re-establishing connectivity > with the agent. > After failing to receive pongs the SlaveObserver should next try to > re-establish a TCP connection (with exponential back-off) before declaring > the agent as lost. This can avoid significant disruption where large numbers > of agents reached through a single failed link could be removed unnecessarily > while still ensuring that agents that are truly lost are recognized as such. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (MESOS-4092) Try to re-establish connection on ping timeouts with agent before removing it
[ https://issues.apache.org/jira/browse/MESOS-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065357#comment-16065357 ] Ilya Pronin edited comment on MESOS-4092 at 6/27/17 7:43 PM: - Seems that our problem here is that we use our health-check for detecting remote-peer failure and link failure, but don't distinguish them. When a connection breaks, libprocess issues {{ExitedEvent}} and opens a new connection when required. But in the case of a network problem a relatively long time may pass before TCP retransmissions limit is reached and the connection is declared dead. One possible solution can be to try using the aforementioned "relink" functionality at some point during agent pinging. We can use a strategy similar to the one used by TCP: after N consecutive failed pings "relink" before sending the next ping. Plus a similar thing on the agent's side. Another possible solution can be to use TCP keepalive mechanism tuned to "detect" broken connections faster than {{agent_ping_timeout * max_agent_ping_timeouts}}. Or we can mess with TCP user timeout, but IMO it's a road to hell and AFAIK user timeout is available only on Linux. was (Author: ipronin): Looks like our problem here is that we use our health-check for detecting remote-peer failure and link failure, but don't distinguish them. When a connection breaks, libprocess issues {{ExitedEvent}} and opens a new connection when required. But in the case of a network problem a relatively long time may pass before TCP retransmissions limit is reached and the connection is declared dead. One possible solution can be to try using the aforementioned "relink" functionality at some point during agent pinging. We can use a strategy similar to the one used by TCP: after N consecutive failed pings "relink" before sending the next ping. Plus a similar thing on the agent's side. 
Another possible solution can be to use TCP keepalive mechanism tuned to "detect" broken connections faster than {{agent_ping_timeout * max_agent_ping_timeouts}}. Or we can mess with TCP user timeout, but IMO it's a road to hell and AFAIK user timeout is available only on Linux. > Try to re-establish connection on ping timeouts with agent before removing it > - > > Key: MESOS-4092 > URL: https://issues.apache.org/jira/browse/MESOS-4092 > Project: Mesos > Issue Type: Improvement > Components: master >Affects Versions: 0.25.0 >Reporter: Ian Downes > > The SlaveObserver will trigger an agent to be removed after > {{flags.max_slave_ping_timeouts}} timeouts of {{flags.slave_ping_timeout}}. > This can occur because of transient network failures, e.g., gray failures of > a switch uplink exhibiting heavy or total packet loss. Some network > architectures are designed to tolerate such gray failures and support > multiple paths between hosts. This can be implemented with equal-cost > multi-path routing (ECMP) where flows are hashed by their 5-tuple to multiple > possible uplinks. In such networks re-establishing a TCP connection will > almost certainly use a new source port and thus will likely be hashed to a > different uplink, avoiding the failed uplink and re-establishing connectivity > with the agent. > After failing to receive pongs the SlaveObserver should next try to > re-establish a TCP connection (with exponential back-off) before declaring > the agent as lost. This can avoid significant disruption where large numbers > of agents reached through a single failed link could be removed unnecessarily > while still ensuring that agents that are truly lost are recognized as such. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7634) OsTest.ChownNoAccess fails on s390x machines
[ https://issues.apache.org/jira/browse/MESOS-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065327#comment-16065327 ] Vinod Kone commented on MESOS-7634: --- Sorry for the delay in getting back. It definitely did fail on the CI job. https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-s390x-WIP/14/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,label_exp=mesos/console https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-s390x-WIP/14/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,label_exp=mesos/console Do you have ssh access to the VM that runs these Jenkins jobs? It is labeled "mesos1". Maybe try to see if you can repro on that particular VM (running as Jenkins user). > OsTest.ChownNoAccess fails on s390x machines > > > Key: MESOS-7634 > URL: https://issues.apache.org/jira/browse/MESOS-7634 > Project: Mesos > Issue Type: Bug >Reporter: Vinod Kone > > Running a custom branch of Mesos (with some fixes in docker build scripts for > s390x) on s390x based CI machines throws the following error when running > stout tests. > {code} > [ RUN ] OsTest.ChownNoAccess > ../../../../3rdparty/stout/tests/os_tests.cpp:839: Failure > Value of: os::chown(uid.get(), gid.get(), "one", true).isError() > Actual: false > Expected: true > ../../../../3rdparty/stout/tests/os_tests.cpp:840: Failure > Value of: os::chown(uid.get(), gid.get(), "one/two", true).isError() > Actual: false > {code} > One can repro this by building Mesos from my custom branch here: > https://github.com/vinodkone/mesos/tree/vinod/s390x -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7728) Java HTTP adapter crashes JVM when leading master disconnects.
[ https://issues.apache.org/jira/browse/MESOS-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7728: --- Target Version/s: 1.2.2, 1.3.1, 1.4.0, 1.1.3 > Java HTTP adapter crashes JVM when leading master disconnects. > -- > > Key: MESOS-7728 > URL: https://issues.apache.org/jira/browse/MESOS-7728 > Project: Mesos > Issue Type: Bug > Components: java api >Affects Versions: 1.1.2, 1.2.1, 1.3.0 >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: mesosphere > > When a Java scheduler using HTTP v0-v1 adapter loses the leading Mesos > master, {{V0ToV1AdapterProcess::disconnected()}} is invoked, which in turn > invokes Java scheduler [code via > JNI|https://github.com/apache/mesos/blob/87c38b9e2bc5b1030a071ddf0aab69db70d64781/src/java/jni/org_apache_mesos_v1_scheduler_V0Mesos.cpp#L446]. > This call uses the wrong object, {{jmesos}} instead of {{jscheduler}}, which > crashes JVM: > {noformat} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f4bca3849bf, pid=21, tid=0x7f4b2ac45700 > # > # JRE version: Java(TM) SE Runtime Environment (8.0_131-b11) (build > 1.8.0_131-b11) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode > linux-amd64 compressed oops) > # Problematic frame: > # V [libjvm.so+0x6d39bf] jni_invoke_nonstatic(JNIEnv_*, JavaValue*, > _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x1af > {noformat} > {noformat} > Stack: [0x7f4b2a445000,0x7f4b2ac46000], sp=0x7f4b2ac44a80, free > space=8190k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native > code) > V [libjvm.so+0x6d39bf] jni_invoke_nonstatic(JNIEnv_*, JavaValue*, > _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x1af > V [libjvm.so+0x6d7fef] jni_CallVoidMethodV+0x10f > C [libmesos-1.2.0.so+0x1aa32d3] JNIEnv_::CallVoidMethod(_jobject*, > _jmethodID*, ...)+0x93 > {noformat} -- This message was sent by 
Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-7729) ExamplesTest.DynamicReservationFramework is flaky
Vinod Kone created MESOS-7729: - Summary: ExamplesTest.DynamicReservationFramework is flaky Key: MESOS-7729 URL: https://issues.apache.org/jira/browse/MESOS-7729 Project: Mesos Issue Type: Bug Reporter: Vinod Kone Observed this on ASF CI {code} [ RUN ] ExamplesTest.DynamicReservationFramework Using temporary directory '/tmp/ExamplesTest_DynamicReservationFramework_uPVIaN' /mesos/mesos-1.4.0/src/tests/dynamic_reservation_framework_test.sh: line 19: /mesos/mesos-1.4.0/_build/src/colors.sh: No such file or directory /mesos/mesos-1.4.0/src/tests/dynamic_reservation_framework_test.sh: line 20: /mesos/mesos-1.4.0/_build/src/atexit.sh: No such file or directory WARNING: Logging before InitGoogleLogging() is written to STDERR I0627 16:04:20.661948 8847 process.cpp:1282] libprocess is initialized on 172.17.0.3:37113 with 16 worker threads I0627 16:04:20.662199 8847 logging.cpp:199] Logging to STDERR I0627 16:04:20.674317 8847 leveldb.cpp:174] Opened db in 3.343216ms I0627 16:04:20.675655 8847 leveldb.cpp:181] Compacted db in 1.264481ms I0627 16:04:20.675797 8847 leveldb.cpp:196] Created db iterator in 89655ns I0627 16:04:20.675829 8847 leveldb.cpp:202] Seeked to beginning of db in 5551ns I0627 16:04:20.675848 8847 leveldb.cpp:271] Iterated through 0 keys in the db in 1133ns I0627 16:04:20.676103 8847 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0627 16:04:20.680465 8873 recover.cpp:451] Starting replica recovery I0627 16:04:20.681649 8873 recover.cpp:477] Replica is in EMPTY status I0627 16:04:20.682160 8847 local.cpp:272] Creating default 'local' authorizer I0627 16:04:20.684504 8884 replica.cpp:676] Replica in EMPTY status received a broadcasted recover request from __req_res__(1)@172.17.0.3:37113 I0627 16:04:20.685750 8882 recover.cpp:197] Received a recover response from a replica in EMPTY status I0627 16:04:20.686617 8877 recover.cpp:568] Updating replica status to STARTING I0627 16:04:20.688508 8877 leveldb.cpp:304] 
Persisting metadata (8 bytes) to leveldb took 741914ns I0627 16:04:20.688544 8881 master.cpp:438] Master c1c3a180-5bd3-42fa-b84e-e2c30aba7364 (089cb2cc2625) started on 172.17.0.3:37113 I0627 16:04:20.688551 8877 replica.cpp:322] Persisted replica status to STARTING I0627 16:04:20.689095 8878 recover.cpp:477] Replica is in STARTING status I0627 16:04:20.688582 8881 master.cpp:440] Flags at startup: --acls="permissive: true register_frameworks { principals { type: ANY } roles { type: SOME values: "test" } } " --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/ExamplesTest_DynamicReservationFramework_uPVIaN/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-1.4.0/src/webui" --work_dir="/tmp/mesos-9H3Est/master" --zk_session_timeout="10secs" I0627 16:04:20.689460 8881 master.cpp:492] Master allowing unauthenticated frameworks to register I0627 16:04:20.689476 8881 master.cpp:506] Master allowing unauthenticated agents to register I0627 
16:04:20.689482 8881 master.cpp:520] Master allowing HTTP frameworks to register without authentication I0627 16:04:20.689494 8881 credentials.hpp:37] Loading credentials for authentication from '/tmp/ExamplesTest_DynamicReservationFramework_uPVIaN/credentials' W0627 16:04:20.689620 8881 credentials.hpp:52] Permissions on credentials file '/tmp/ExamplesTest_DynamicReservationFramework_uPVIaN/credentials' are too open; it is recommended that your credentials file is NOT accessible by others I0627 16:04:20.689817 8881 master.cpp:562] Using default 'crammd5' authenticator I0627 16:04:20.690021 8881 authenticator.cpp:520] Initializing server SASL I0627 16:04:20.690615 8878 replica.cpp:676] Replica in STARTING status received a broadcasted recover request from __req_res__(2)@172.17.0.3:37113 I0627
[jira] [Created] (MESOS-7728) Java HTTP adapter crashes JVM when leading master disconnects.
Alexander Rukletsov created MESOS-7728: -- Summary: Java HTTP adapter crashes JVM when leading master disconnects. Key: MESOS-7728 URL: https://issues.apache.org/jira/browse/MESOS-7728 Project: Mesos Issue Type: Bug Components: java api Affects Versions: 1.3.0, 1.2.1, 1.1.2 Reporter: Alexander Rukletsov Assignee: Alexander Rukletsov When a Java scheduler using HTTP v0-v1 adapter loses the leading Mesos master, {{V0ToV1AdapterProcess::disconnected()}} is invoked, which in turn invokes Java scheduler [code via JNI|https://github.com/apache/mesos/blob/87c38b9e2bc5b1030a071ddf0aab69db70d64781/src/java/jni/org_apache_mesos_v1_scheduler_V0Mesos.cpp#L446]. This call uses the wrong object, {{jmesos}} instead of {{jscheduler}}, which crashes JVM: {noformat} # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x7f4bca3849bf, pid=21, tid=0x7f4b2ac45700 # # JRE version: Java(TM) SE Runtime Environment (8.0_131-b11) (build 1.8.0_131-b11) # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops) # Problematic frame: # V [libjvm.so+0x6d39bf] jni_invoke_nonstatic(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x1af {noformat} {noformat} Stack: [0x7f4b2a445000,0x7f4b2ac46000], sp=0x7f4b2ac44a80, free space=8190k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x6d39bf] jni_invoke_nonstatic(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x1af V [libjvm.so+0x6d7fef] jni_CallVoidMethodV+0x10f C [libmesos-1.2.0.so+0x1aa32d3] JNIEnv_::CallVoidMethod(_jobject*, _jmethodID*, ...)+0x93 {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7725) PersistentVolumeEndpointsTest.ReserveAndSlaveRemoval test is flaky
[ https://issues.apache.org/jira/browse/MESOS-7725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-7725: -- Description: Observed this on ASF CI. {code} [ RUN ] PersistentVolumeEndpointsTest.ReserveAndSlaveRemoval I0627 15:20:33.687146 30773 cluster.cpp:162] Creating default 'local' authorizer I0627 15:20:33.691745 30795 master.cpp:438] Master d8d232e5-1689-4780-b232-c91e5c3277b1 (0b1049f05548) started on 172.17.0.2:44357 I0627 15:20:33.691800 30795 master.cpp:440] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="50ms" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/Wg4Ouh/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --roles="role1" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/Wg4Ouh/master" --zk_session_timeout="10secs" I0627 15:20:33.692142 30795 master.cpp:490] Master only allowing authenticated frameworks to register I0627 15:20:33.692150 30795 master.cpp:504] Master only allowing 
authenticated agents to register I0627 15:20:33.692154 30795 master.cpp:517] Master only allowing authenticated HTTP frameworks to register I0627 15:20:33.692160 30795 credentials.hpp:37] Loading credentials for authentication from '/tmp/Wg4Ouh/credentials' I0627 15:20:33.692463 30795 master.cpp:562] Using default 'crammd5' authenticator I0627 15:20:33.692612 30795 http.cpp:974] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I0627 15:20:33.692831 30795 http.cpp:974] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I0627 15:20:33.692942 30795 http.cpp:974] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' I0627 15:20:33.693061 30795 master.cpp:642] Authorization enabled W0627 15:20:33.693076 30795 master.cpp:705] The '--roles' flag is deprecated. This flag will be removed in the future. See the Mesos 0.27 upgrade notes for more information I0627 15:20:33.693354 30780 hierarchical.cpp:169] Initialized hierarchical allocator process I0627 15:20:33.693359 30782 whitelist_watcher.cpp:77] No whitelist given I0627 15:20:33.695943 30795 master.cpp:2161] Elected as the leading master! 
I0627 15:20:33.695960 30795 master.cpp:1700] Recovering from registrar I0627 15:20:33.696193 30795 registrar.cpp:345] Recovering registrar I0627 15:20:33.697032 30795 registrar.cpp:389] Successfully fetched the registry (0B) in 811008ns I0627 15:20:33.697147 30795 registrar.cpp:493] Applied 1 operations in 40183ns; attempting to update the registry I0627 15:20:33.697922 30792 registrar.cpp:550] Successfully updated the registry in 709120ns I0627 15:20:33.698020 30792 registrar.cpp:422] Successfully recovered registrar I0627 15:20:33.698490 30789 master.cpp:1799] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register I0627 15:20:33.698511 30784 hierarchical.cpp:207] Skipping recovery of hierarchical allocator: nothing to recover I0627 15:20:33.707849 30773 containerizer.cpp:230] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret W0627 15:20:33.708729 30773 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges W0627 15:20:33.708909 30773 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges I0627 15:20:33.708955 30773 provisioner.cpp:255] Using default backend 'copy' I0627 15:20:33.711526 30773 cluster.cpp:448] Creating default 'local' authorizer I0627 15:20:33.714450 30776 slave.cpp:249] Mesos agent started on (451)@172.17.0.2:44357 I0627 15:20:33.714649 30776 slave.cpp:250] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://; --appc_store_dir="/tmp/PersistentVolumeEndpointsTest_ReserveAndSlaveRemoval_RnxQRd/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs"
[jira] [Updated] (MESOS-7725) PersistentVolumeEndpointsTest.ReserveAndSlaveRemoval test is flaky
[ https://issues.apache.org/jira/browse/MESOS-7725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-7725: -- Shepherd: Vinod Kone Sprint: Mesosphere Sprint 58 Story Points: 3 > PersistentVolumeEndpointsTest.ReserveAndSlaveRemoval test is flaky > -- > > Key: MESOS-7725 > URL: https://issues.apache.org/jira/browse/MESOS-7725 > Project: Mesos > Issue Type: Bug >Reporter: Vinod Kone >Assignee: Neil Conway > Labels: flaky-test, mesosphere-oncall > > Observed this on ASF CI. > {code} > [ RUN ] PersistentVolumeEndpointsTest.ReserveAndSlaveRemoval > I0627 15:20:33.687146 30773 cluster.cpp:162] Creating default 'local' > authorizer > I0627 15:20:33.691745 30795 master.cpp:438] Master > d8d232e5-1689-4780-b232-c91e5c3277b1 (0b1049f05548) started on > 172.17.0.2:44357 > I0627 15:20:33.691800 30795 master.cpp:440] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="50ms" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/Wg4Ouh/credentials" > --filter_gpu_resources="true" --framework_sorter="drf" --help="false" > --hostname_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_agent_ping_timeouts="5" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" 
--registry_strict="false" --roles="role1" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/Wg4Ouh/master" > --zk_session_timeout="10secs" > I0627 15:20:33.692142 30795 master.cpp:490] Master only allowing > authenticated frameworks to register > I0627 15:20:33.692150 30795 master.cpp:504] Master only allowing > authenticated agents to register > I0627 15:20:33.692154 30795 master.cpp:517] Master only allowing > authenticated HTTP frameworks to register > I0627 15:20:33.692160 30795 credentials.hpp:37] Loading credentials for > authentication from '/tmp/Wg4Ouh/credentials' > I0627 15:20:33.692463 30795 master.cpp:562] Using default 'crammd5' > authenticator > I0627 15:20:33.692612 30795 http.cpp:974] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0627 15:20:33.692831 30795 http.cpp:974] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0627 15:20:33.692942 30795 http.cpp:974] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0627 15:20:33.693061 30795 master.cpp:642] Authorization enabled > W0627 15:20:33.693076 30795 master.cpp:705] The '--roles' flag is deprecated. > This flag will be removed in the future. See the Mesos 0.27 upgrade notes for > more information > I0627 15:20:33.693354 30780 hierarchical.cpp:169] Initialized hierarchical > allocator process > I0627 15:20:33.693359 30782 whitelist_watcher.cpp:77] No whitelist given > I0627 15:20:33.695943 30795 master.cpp:2161] Elected as the leading master! 
> I0627 15:20:33.695960 30795 master.cpp:1700] Recovering from registrar > I0627 15:20:33.696193 30795 registrar.cpp:345] Recovering registrar > I0627 15:20:33.697032 30795 registrar.cpp:389] Successfully fetched the > registry (0B) in 811008ns > I0627 15:20:33.697147 30795 registrar.cpp:493] Applied 1 operations in > 40183ns; attempting to update the registry > I0627 15:20:33.697922 30792 registrar.cpp:550] Successfully updated the > registry in 709120ns > I0627 15:20:33.698020 30792 registrar.cpp:422] Successfully recovered > registrar > I0627 15:20:33.698490 30789 master.cpp:1799] Recovered 0 agents from the > registry (129B); allowing 10mins for agents to re-register > I0627 15:20:33.698511 30784 hierarchical.cpp:207] Skipping recovery of > hierarchical allocator: nothing to recover > I0627 15:20:33.707849 30773 containerizer.cpp:230] Using isolation: > posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret > W0627 15:20:33.708729 30773 backend.cpp:76] Failed to create 'aufs' backend: > AufsBackend requires root privileges > W0627 15:20:33.708909 30773 backend.cpp:76] Failed to create 'bind' backend:
[jira] [Commented] (MESOS-3968) DiskQuotaTest.SlaveRecovery is flaky
[ https://issues.apache.org/jira/browse/MESOS-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065232#comment-16065232 ] Vinod Kone commented on MESOS-3968: --- Observed the same thing Neil observed in ASF CI. {code} [ RUN ] DiskQuotaTest.SlaveRecovery I0627 11:28:25.636018 4587 cluster.cpp:162] Creating default 'local' authorizer I0627 11:28:25.641643 4609 master.cpp:438] Master 08082988-3d7b-4a23-8092-66781efb5f6f (18fe836728c1) started on 172.17.0.4:34439 I0627 11:28:25.641682 4609 master.cpp:440] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/I85Ccm/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-1.4.0/_inst/share/mesos/webui" --work_dir="/tmp/I85Ccm/master" --zk_session_timeout="10secs" I0627 11:28:25.642195 4609 master.cpp:490] Master only allowing authenticated frameworks to register I0627 11:28:25.642216 4609 master.cpp:504] Master only 
allowing authenticated agents to register I0627 11:28:25.642230 4609 master.cpp:517] Master only allowing authenticated HTTP frameworks to register I0627 11:28:25.642247 4609 credentials.hpp:37] Loading credentials for authentication from '/tmp/I85Ccm/credentials' I0627 11:28:25.642676 4609 master.cpp:562] Using default 'crammd5' authenticator I0627 11:28:25.642896 4609 http.cpp:974] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I0627 11:28:25.643079 4609 http.cpp:974] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I0627 11:28:25.643203 4609 http.cpp:974] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' I0627 11:28:25.643312 4609 master.cpp:642] Authorization enabled I0627 11:28:25.643540 4611 hierarchical.cpp:169] Initialized hierarchical allocator process I0627 11:28:25.643767 4613 whitelist_watcher.cpp:77] No whitelist given I0627 11:28:25.647075 4607 master.cpp:2161] Elected as the leading master! 
I0627 11:28:25.647130 4607 master.cpp:1700] Recovering from registrar I0627 11:28:25.647503 4610 registrar.cpp:345] Recovering registrar I0627 11:28:25.652940 4610 registrar.cpp:389] Successfully fetched the registry (0B) in 5.362176ms I0627 11:28:25.653300 4610 registrar.cpp:493] Applied 1 operations in 161908ns; attempting to update the registry I0627 11:28:25.654299 4610 registrar.cpp:550] Successfully updated the registry in 913920ns I0627 11:28:25.654633 4610 registrar.cpp:422] Successfully recovered registrar I0627 11:28:25.655278 4611 master.cpp:1799] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register I0627 11:28:25.655741 4612 hierarchical.cpp:207] Skipping recovery of hierarchical allocator: nothing to recover I0627 11:28:25.661547 4587 containerizer.cpp:230] Using isolation: posix/cpu,posix/mem,disk/du,filesystem/posix,network/cni,environment_secret W0627 11:28:25.662441 4587 backend.cpp:76] Failed to create 'overlay' backend: OverlayBackend requires root privileges W0627 11:28:25.662691 4587 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges I0627 11:28:25.662744 4587 provisioner.cpp:255] Using default backend 'copy' I0627 11:28:25.672664 4587 cluster.cpp:448] Creating default 'local' authorizer I0627 11:28:25.677569 4612 slave.cpp:249] Mesos agent started on (42)@172.17.0.4:34439 I0627 11:28:25.677613 4612 slave.cpp:250] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://; --appc_store_dir="/tmp/DiskQuotaTest_SlaveRecovery_IXTvqp/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
[jira] [Updated] (MESOS-3968) DiskQuotaTest.SlaveRecovery is flaky
[ https://issues.apache.org/jira/browse/MESOS-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-3968: -- Labels: flaky-test mesosphere mesosphere-oncall (was: flaky-test mesosphere) > DiskQuotaTest.SlaveRecovery is flaky > > > Key: MESOS-3968 > URL: https://issues.apache.org/jira/browse/MESOS-3968 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Benjamin Mahler > Labels: flaky-test, mesosphere, mesosphere-oncall > > {noformat: title=Failed Run} > [ RUN ] DiskQuotaTest.SlaveRecovery > I1120 12:02:54.015383 29806 leveldb.cpp:176] Opened db in 2.965411ms > I1120 12:02:54.018033 29806 leveldb.cpp:183] Compacted db in 2.585354ms > I1120 12:02:54.018175 29806 leveldb.cpp:198] Created db iterator in 27134ns > I1120 12:02:54.018275 29806 leveldb.cpp:204] Seeked to beginning of db in > 3025ns > I1120 12:02:54.018375 29806 leveldb.cpp:273] Iterated through 0 keys in the > db in 679ns > I1120 12:02:54.018491 29806 replica.cpp:780] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1120 12:02:54.021386 29838 recover.cpp:449] Starting replica recovery > I1120 12:02:54.021692 29838 recover.cpp:475] Replica is in EMPTY status > I1120 12:02:54.022189 29827 master.cpp:367] Master > 9a3c45ec-28b3-49e6-a83f-1f2035cc1105 (a51e6bb03b55) started on > 172.17.5.188:41228 > I1120 12:02:54.022212 29827 master.cpp:369] Flags at startup: --acls="" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/DsMniF/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" 
--registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" > --work_dir="/tmp/DsMniF/master" --zk_session_timeout="10secs" > I1120 12:02:54.022557 29827 master.cpp:414] Master only allowing > authenticated frameworks to register > I1120 12:02:54.022569 29827 master.cpp:419] Master only allowing > authenticated slaves to register > I1120 12:02:54.022578 29827 credentials.hpp:37] Loading credentials for > authentication from '/tmp/DsMniF/credentials' > I1120 12:02:54.022896 29827 master.cpp:458] Using default 'crammd5' > authenticator > I1120 12:02:54.023217 29827 master.cpp:495] Authorization enabled > I1120 12:02:54.023512 29831 whitelist_watcher.cpp:79] No whitelist given > I1120 12:02:54.023814 29833 replica.cpp:676] Replica in EMPTY status received > a broadcasted recover request from (562)@172.17.5.188:41228 > I1120 12:02:54.023519 29832 hierarchical.cpp:153] Initialized hierarchical > allocator process > I1120 12:02:54.025997 29831 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I1120 12:02:54.027042 29832 recover.cpp:566] Updating replica status to > STARTING > I1120 12:02:54.027354 29830 master.cpp:1612] The newly elected leader is > master@172.17.5.188:41228 with id 9a3c45ec-28b3-49e6-a83f-1f2035cc1105 > I1120 12:02:54.027385 29830 master.cpp:1625] Elected as the leading master! 
> I1120 12:02:54.027403 29830 master.cpp:1385] Recovering from registrar > I1120 12:02:54.027679 29830 registrar.cpp:309] Recovering registrar > I1120 12:02:54.028439 29840 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.195171ms > I1120 12:02:54.028539 29840 replica.cpp:323] Persisted replica status to > STARTING > I1120 12:02:54.028944 29840 recover.cpp:475] Replica is in STARTING status > I1120 12:02:54.030910 29840 replica.cpp:676] Replica in STARTING status > received a broadcasted recover request from (563)@172.17.5.188:41228 > I1120 12:02:54.031429 29840 recover.cpp:195] Received a recover response from > a replica in STARTING status > I1120 12:02:54.032032 29840 recover.cpp:566] Updating replica status to VOTING > I1120 12:02:54.032816 29840 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 496492ns > I1120 12:02:54.032982 29840 replica.cpp:323] Persisted replica status to > VOTING > I1120 12:02:54.033254 29840 recover.cpp:580] Successfully joined the Paxos > group > I1120 12:02:54.033562 29840 recover.cpp:464] Recover process terminated > I1120 12:02:54.034631 29839 log.cpp:661] Attempting to start the writer > I1120 12:02:54.036386 29834 replica.cpp:496] Replica received implicit > promise request from (564)@172.17.5.188:41228
[jira] [Created] (MESOS-7727) Scheme/HTTPTest.Get segfaults
Vinod Kone created MESOS-7727: - Summary: Scheme/HTTPTest.Get segfaults Key: MESOS-7727 URL: https://issues.apache.org/jira/browse/MESOS-7727 Project: Mesos Issue Type: Bug Reporter: Vinod Kone Assignee: Till Toenshoff Observed this on ASF CI {code} [ RUN ] Scheme/HTTPTest.Get/0 I0627 09:58:16.931704 2483 openssl.cpp:419] CA file path is unspecified! NOTE: Set CA file path with LIBPROCESS_SSL_CA_FILE= I0627 09:58:16.931727 2483 openssl.cpp:424] CA directory path unspecified! NOTE: Set CA directory path with LIBPROCESS_SSL_CA_DIR= I0627 09:58:16.931732 2483 openssl.cpp:429] Will not verify peer certificate! NOTE: Set LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer certificate verification I0627 09:58:16.931740 2483 openssl.cpp:435] Will only verify peer certificate if presented! NOTE: Set LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer certificate verification I0627 09:58:16.932193 3504 process.cpp:968] Failed to accept socket: future discarded *** Aborted at 1498557496 (unix time) try "date -d @1498557496" if you are using GNU date *** PC: @ 0x7f5397f30912 (unknown) *** SIGSEGV (@0x7f5349e18068) received by PID 2483 (TID 0x7f53937cd700) from PID 1239515240; stack trace: *** I0627 09:58:16.934547 2483 process.cpp:1282] libprocess is initialized on 172.17.0.4:50357 with 16 worker threads @ 0x7f53987ac370 (unknown) @ 0x7f5397f30912 (unknown) @ 0x7f5397f30f8c (unknown) @ 0x42b1a3 process::UPID::UPID() @ 0x8fcdec process::DispatchEvent::DispatchEvent() I0627 09:58:16.940096 3518 process.cpp:3779] Handling HTTP event for process '(80)' with path: '/(80)/get' @ 0x8f5275 process::internal::dispatch() @ 0x910002 process::dispatch<>() I0627 09:58:16.945485 3519 process.cpp:3779] Handling HTTP event for process '(80)' with path: '/(80)/get' @ 0x8f4184 process::ProcessBase::route() [ OK ] Scheme/HTTPTest.Get/0 (463 ms) [ RUN ] Scheme/HTTPTest.Get/1 @ 0x9e88b9 process::ProcessBase::route<>() @ 0x9e4bb2 process::Help::initialize() @ 0x8ed69a process::ProcessManager::resume() @ 
0x8e9a98 _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv @ 0x8fc38c _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE @ 0x8fc2d0 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv @ 0x8fc25a _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv @ 0x7f5397f27230 (unknown) @ 0x7f53987a4dc5 start_thread @ 0x7f539769076d __clone make[7]: *** [check-local] Segmentation fault {code} [~tillt] can you triage this? looks related to SSL -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-7726) MasterTest.IgnoreOldAgentReregistration test is flaky
Vinod Kone created MESOS-7726: - Summary: MasterTest.IgnoreOldAgentReregistration test is flaky Key: MESOS-7726 URL: https://issues.apache.org/jira/browse/MESOS-7726 Project: Mesos Issue Type: Bug Reporter: Vinod Kone Assignee: Neil Conway Observed this on ASF CI. {code} [ RUN ] MasterTest.IgnoreOldAgentReregistration I0627 05:23:06.031154 4917 cluster.cpp:162] Creating default 'local' authorizer I0627 05:23:06.033433 4945 master.cpp:438] Master a8778782-0da1-49a5-9cb8-9f6d11701733 (c43debbe7e32) started on 172.17.0.4:41747 I0627 05:23:06.033457 4945 master.cpp:440] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/2BARnF/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-1.4.0/_inst/share/mesos/webui" --work_dir="/tmp/2BARnF/master" --zk_session_timeout="10secs" I0627 05:23:06.033771 4945 master.cpp:490] Master only allowing authenticated frameworks to register I0627 
05:23:06.033787 4945 master.cpp:504] Master only allowing authenticated agents to register I0627 05:23:06.033798 4945 master.cpp:517] Master only allowing authenticated HTTP frameworks to register I0627 05:23:06.033812 4945 credentials.hpp:37] Loading credentials for authentication from '/tmp/2BARnF/credentials' I0627 05:23:06.034080 4945 master.cpp:562] Using default 'crammd5' authenticator I0627 05:23:06.034221 4945 http.cpp:974] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I0627 05:23:06.034409 4945 http.cpp:974] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I0627 05:23:06.034569 4945 http.cpp:974] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' I0627 05:23:06.034688 4945 master.cpp:642] Authorization enabled I0627 05:23:06.034862 4938 whitelist_watcher.cpp:77] No whitelist given I0627 05:23:06.034868 4950 hierarchical.cpp:169] Initialized hierarchical allocator process I0627 05:23:06.037211 4957 master.cpp:2161] Elected as the leading master! 
I0627 05:23:06.037236 4957 master.cpp:1700] Recovering from registrar I0627 05:23:06.037333 4938 registrar.cpp:345] Recovering registrar I0627 05:23:06.038146 4938 registrar.cpp:389] Successfully fetched the registry (0B) in 768256ns I0627 05:23:06.038290 4938 registrar.cpp:493] Applied 1 operations in 30798ns; attempting to update the registry I0627 05:23:06.038861 4938 registrar.cpp:550] Successfully updated the registry in 510976ns I0627 05:23:06.038960 4938 registrar.cpp:422] Successfully recovered registrar I0627 05:23:06.039364 4941 hierarchical.cpp:207] Skipping recovery of hierarchical allocator: nothing to recover I0627 05:23:06.039594 4958 master.cpp:1799] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register I0627 05:23:06.043999 4917 containerizer.cpp:230] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret W0627 05:23:06.044456 4917 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges W0627 05:23:06.044548 4917 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges I0627 05:23:06.044580 4917 provisioner.cpp:255] Using default backend 'copy' I0627 05:23:06.046222 4917 cluster.cpp:448] Creating default 'local' authorizer I0627 05:23:06.047572 4950 slave.cpp:249] Mesos agent started on (269)@172.17.0.4:41747 I0627 05:23:06.047591 4950 slave.cpp:250] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://; --appc_store_dir="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local"
[jira] [Created] (MESOS-7725) PersistentVolumeEndpointsTest.ReserveAndSlaveRemoval test is flaky
Vinod Kone created MESOS-7725:
---------------------------------

             Summary: PersistentVolumeEndpointsTest.ReserveAndSlaveRemoval test is flaky
                 Key: MESOS-7725
                 URL: https://issues.apache.org/jira/browse/MESOS-7725
             Project: Mesos
          Issue Type: Bug
            Reporter: Vinod Kone
            Assignee: Neil Conway


Observed this on ASF CI. Will paste the log once I find a failing build whose logs are not rotated out.
[jira] [Created] (MESOS-7724) MasterAPITest.Subscribe test segfaults
Vinod Kone created MESOS-7724: - Summary: MasterAPITest.Subscribe test segfaults Key: MESOS-7724 URL: https://issues.apache.org/jira/browse/MESOS-7724 Project: Mesos Issue Type: Bug Components: HTTP API Reporter: Vinod Kone Found this on ASF CI {code} [ RUN ] ContentType/MasterAPITest.Subscribe/1 I0625 05:38:37.009217 30646 cluster.cpp:162] Creating default 'local' authorizer I0625 05:38:37.014230 30650 master.cpp:438] Master 7395bba4-e83c-4a4a-9010-d2e89629edeb (7bd48084f726) started on 172.17.0.2:41689 I0625 05:38:37.014291 30650 master.cpp:440] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/D1sbIK/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/D1sbIK/master" --zk_session_timeout="10secs" I0625 05:38:37.014972 30650 master.cpp:490] Master only allowing authenticated frameworks to register I0625 05:38:37.014992 30650 master.cpp:504] Master 
only allowing authenticated agents to register I0625 05:38:37.015008 30650 master.cpp:517] Master only allowing authenticated HTTP frameworks to register I0625 05:38:37.015019 30650 credentials.hpp:37] Loading credentials for authentication from '/tmp/D1sbIK/credentials' I0625 05:38:37.015575 30650 master.cpp:562] Using default 'crammd5' authenticator I0625 05:38:37.016842 30650 http.cpp:974] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I0625 05:38:37.017230 30650 http.cpp:974] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I0625 05:38:37.017542 30650 http.cpp:974] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' I0625 05:38:37.017822 30650 master.cpp:642] Authorization enabled I0625 05:38:37.018196 30651 hierarchical.cpp:169] Initialized hierarchical allocator process I0625 05:38:37.018357 30653 whitelist_watcher.cpp:77] No whitelist given *** Aborted at 1498369117 (unix time) try "date -d @1498369117" if you are using GNU date *** PC: @ 0x2b0f5603bc41 std::_Hashtable<>::_M_bucket_begin() *** SIGSEGV (@0x5d0) received by PID 30646 (TID 0x2b0f641b6700) from PID 1488; stack trace: *** @ 0x2b0f5bfcc330 (unknown) @ 0x2b0f5603bc41 std::_Hashtable<>::_M_bucket_begin() @ 0x2b0f5603bb2d std::_Hashtable<>::count() @ 0x2b0f5603bacd std::unordered_map<>::count() @ 0x2b0f56033b68 hashmap<>::contains() @ 0x2b0f56008d5e mesos::internal::master::Master::exited() @ 0x2b0f5600aeaa mesos::internal::master::Master::subscribe()::$_36::operator()() @ 0x2b0f5600ae71 _ZZZNK7process9_DeferredIZN5mesos8internal6master6Master9subscribeERKNS3_14HttpConnectionEE4$_36EcvSt8functionIFvT_EEIRKNS_6FutureI7NothingvENKUlSJ_E_clESJ_ENKUlvE_clEv @ 0x2b0f5600abdd _ZNSt17_Function_handlerIFvvEZZNK7process9_DeferredIZN5mesos8internal6master6Master9subscribeERKNS5_14HttpConnectionEE4$_36EcvSt8functionIFvT_EEIRKNS1_6FutureI7NothingvENKUlSL_E_clESL_EUlvE_E9_M_invokeERKSt9_Any_data @ 0x1af707e 
std::function<>::operator()() @ 0x1eabc09 _ZZN7process8internal8DispatchIvEclIRSt8functionIFvvvRKNS_4UPIDEOT_ENKUlPNS_11ProcessBaseEE_clESE_ @ 0x1eab9c2 _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8internal8DispatchIvEclIRSt8functionIFvvvRKNS0_4UPIDEOT_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ @ 0x2b0f582e7608 std::function<>::operator()() @ 0x2b0f582cdab4 process::ProcessBase::visit() @ 0x2b0f5835da8e process::DispatchEvent::visit() @ 0x1af66b1 process::ProcessBase::serve() @ 0x2b0f582cb7a4 process::ProcessManager::resume() @ 0x2b0f582d982c process::ProcessManager::init_threads()::$_2::operator()() @ 0x2b0f582d9735
[jira] [Created] (MESOS-7723) Support lxcfs for serving special proc files for containers.
Jie Yu created MESOS-7723:
---------------------------------

             Summary: Support lxcfs for serving special proc files for containers.
                 Key: MESOS-7723
                 URL: https://issues.apache.org/jira/browse/MESOS-7723
             Project: Mesos
          Issue Type: Improvement
          Components: containerization
            Reporter: Jie Yu


LXCFS is a small FUSE filesystem written with the intention of making Linux containers feel more like a virtual machine. It started as a side-project of LXC but is usable by any runtime.

https://github.com/lxc/lxcfs

Some legacy applications read /proc/cpuinfo or /proc/meminfo to get the available cpus and memory. Without lxcfs, such an application will assume it has all the cores and memory on the host. We can potentially build an isolator for this.
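To make the motivation concrete, here is a minimal sketch (helper names are illustrative, not Mesos code) of how a legacy application typically derives "available" memory and CPUs by parsing procfs text. Inside a container without lxcfs these files show *host* values, not the cgroup limits Mesos applied; with lxcfs bind-mounted over /proc/meminfo and /proc/cpuinfo, the same parsing would return the container's limits instead.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: kilobytes}."""
    result = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            result[key.strip()] = int(parts[0])  # values are reported in kB
    return result


def count_cpus(cpuinfo_text):
    """Count logical CPUs the way many legacy apps do: 'processor' entries."""
    return sum(1 for line in cpuinfo_text.splitlines()
               if line.startswith("processor"))


# Sample host-side contents; an app sizing its thread pools or caches from
# these numbers will overshoot its container limits.
sample_meminfo = "MemTotal:       16384000 kB\nMemFree:         1024000 kB\n"
sample_cpuinfo = "processor\t: 0\nmodel name\t: X\nprocessor\t: 1\n"

mem = parse_meminfo(sample_meminfo)
cpus = count_cpus(sample_cpuinfo)
```

A proc-files isolator could hide this mismatch from the application entirely, which is the point of the proposal above.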
[jira] [Commented] (MESOS-6345) ExamplesTest.PersistentVolumeFramework failing due to double free corruption on Ubuntu 14.04
[ https://issues.apache.org/jira/browse/MESOS-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064894#comment-16064894 ] Dmitry Zhuk commented on MESOS-6345: https://reviews.apache.org/r/60467/ > ExamplesTest.PersistentVolumeFramework failing due to double free corruption > on Ubuntu 14.04 > > > Key: MESOS-6345 > URL: https://issues.apache.org/jira/browse/MESOS-6345 > Project: Mesos > Issue Type: Bug > Components: framework >Reporter: Avinash Sridharan > Labels: mesosphere > > PersistentVolumeFramework tests if failing on Ubuntu 14 > {code} > [Step 10/10] *** Error in > `/mnt/teamcity/work/4240ba9ddd0997c3/build/src/.libs/lt-persistent-volume-framework': > double free or corruption (fasttop): 0x7f1ae0006a20 *** > [04:56:48]W: [Step 10/10] *** Aborted at 1475902608 (unix time) try "date > -d @1475902608" if you are using GNU date *** > [04:56:48]W: [Step 10/10] I1008 04:56:48.592744 25425 state.cpp:57] > Recovering state from '/mnt/teamcity/temp/buildTmp/mesos-8KiPML/2/meta' > [04:56:48]W: [Step 10/10] I1008 04:56:48.592808 25423 state.cpp:57] > Recovering state from '/mnt/teamcity/temp/buildTmp/mesos-8KiPML/1/meta' > [04:56:48]W: [Step 10/10] I1008 04:56:48.592952 25425 > status_update_manager.cpp:203] Recovering status update manager > [04:56:48]W: [Step 10/10] I1008 04:56:48.592957 25423 > status_update_manager.cpp:203] Recovering status update manager > [04:56:48]W: [Step 10/10] I1008 04:56:48.593010 25424 > containerizer.cpp:557] Recovering containerizer > [04:56:48]W: [Step 10/10] I1008 04:56:48.593143 25396 sched.cpp:226] > Version: 1.1.0 > [04:56:48]W: [Step 10/10] I1008 04:56:48.593158 25425 master.cpp:2013] > Elected as the leading master! 
> [04:56:48]W: [Step 10/10] I1008 04:56:48.593173 25425 master.cpp:1560] > Recovering from registrar > [04:56:48]W: [Step 10/10] I1008 04:56:48.593211 25424 registrar.cpp:329] > Recovering registrar > [04:56:48]W: [Step 10/10] I1008 04:56:48.593250 25425 sched.cpp:330] New > master detected at master@172.30.2.21:45167 > [04:56:48]W: [Step 10/10] I1008 04:56:48.593282 25425 sched.cpp:341] No > credentials provided. Attempting to register without authentication > [04:56:48]W: [Step 10/10] I1008 04:56:48.593293 25425 sched.cpp:820] > Sending SUBSCRIBE call to master@172.30.2.21:45167 > [04:56:48]W: [Step 10/10] PC: @ 0x7f1b0bbaccc9 (unknown) > [04:56:48]W: [Step 10/10] I1008 04:56:48.593339 25425 sched.cpp:853] Will > retry registration in 32.354951ms if necessary > [04:56:48]W: [Step 10/10] I1008 04:56:48.593364 25421 master.cpp:1387] > Dropping 'mesos.scheduler.Call' message since not recovered yet > [04:56:48]W: [Step 10/10] I1008 04:56:48.593413 25428 provisioner.cpp:253] > Provisioner recovery complete > [04:56:48]W: [Step 10/10] *** SIGABRT (@0x6334) received by PID 25396 (TID > 0x7f1b02ed6700) from PID 25396; stack trace: *** > [04:56:48]W: [Step 10/10] I1008 04:56:48.593520 25421 > containerizer.cpp:557] Recovering containerizer > [04:56:48]W: [Step 10/10] I1008 04:56:48.593529 25425 slave.cpp:5276] > Finished recovery > [04:56:48]W: [Step 10/10] I1008 04:56:48.593627 25422 leveldb.cpp:304] > Persisting metadata (8 bytes) to leveldb took 4.546422ms > [04:56:48]W: [Step 10/10] I1008 04:56:48.593695 25428 provisioner.cpp:253] > Provisioner recovery complete > [04:56:48]W: [Step 10/10] I1008 04:56:48.593701 25422 replica.cpp:320] > Persisted replica status to VOTING > [04:56:48]W: [Step 10/10] I1008 04:56:48.593760 25424 slave.cpp:5276] > Finished recovery > [04:56:48]W: [Step 10/10] I1008 04:56:48.593864 25427 recover.cpp:582] > Successfully joined the Paxos group > [04:56:48]W: [Step 10/10] I1008 04:56:48.593896 25425 slave.cpp:5448] > Querying resource 
estimator for oversubscribable resources > [04:56:48]W: [Step 10/10] I1008 04:56:48.593922 25427 recover.cpp:466] > Recover process terminated > [04:56:48]W: [Step 10/10] I1008 04:56:48.593976 25427 slave.cpp:5462] > Received oversubscribable resources {} from the resource estimator > [04:56:48]W: [Step 10/10] I1008 04:56:48.594002 25424 slave.cpp:5448] > Querying resource estimator for oversubscribable resources > [04:56:48]W: [Step 10/10] I1008 04:56:48.594017 25422 log.cpp:553] > Attempting to start the writer > [04:56:48]W: [Step 10/10] I1008 04:56:48.594030 25428 > status_update_manager.cpp:177] Pausing sending status updates > [04:56:48]W: [Step 10/10] I1008 04:56:48.594032 25427 slave.cpp:915] New > master detected at master@172.30.2.21:45167 > [04:56:48]W: [Step 10/10] I1008 04:56:48.594055 25423 slave.cpp:915] New > master detected at master@172.30.2.21:45167 > [04:56:48]W: [Step 10/10] I1008 04:56:48.594048 25428 >
[jira] [Commented] (MESOS-6345) ExamplesTest.PersistentVolumeFramework failing due to double free corruption on Ubuntu 14.04
[ https://issues.apache.org/jira/browse/MESOS-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064891#comment-16064891 ] Dmitry Zhuk commented on MESOS-6345: Similar crash on CentOS7 (in ExamplesTest.PersistentVolumeFramework and ExamplesTest.DynamicReservationFramework) presumably due to race condition for {{signaledWrapper}} in {{configureSignal}}. {noformat} [ RUN ] ExamplesTest.DynamicReservationFramework *** Error in `mesos/build/src/.libs/lt-dynamic-reservation-framework': double free or corruption (fasttop): 0x7fdfa0002e60 *** === Backtrace: = /lib64/libc.so.6(+0x7c503)[0x7fdfc6da7503] mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt14_Function_base13_Base_managerIZN7process5deferIN5mesos8internal5slave5SlaveEiiSt12_PlaceholderILi1EES7_ILi2NS1_9_DeferredIDTcl4bindadsrSt8functionIFvT0_T1_EEclcvSF__Efp1_fp2_RKNS1_3PIDIT_EEMSJ_FvSC_SD_ET2_T3_EUliiE_E10_M_destroyERSt9_Any_dataSt17integral_constantIbLb0EE+0x31)[0x7fdfcca9165c] mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt14_Function_base13_Base_managerIZN7process5deferIN5mesos8internal5slave5SlaveEiiSt12_PlaceholderILi1EES7_ILi2NS1_9_DeferredIDTcl4bindadsrSt8functionIFvT0_T1_EEclcvSF__Efp1_fp2_RKNS1_3PIDIT_EEMSJ_FvSC_SD_ET2_T3_EUliiE_E10_M_managerERSt9_Any_dataRKST_St18_Manager_operation+0xa2)[0x7fdfcca79857] mesos/build/src/.libs/lt-dynamic-reservation-framework(_ZNSt14_Function_baseD1Ev+0x33)[0x560e50f40ae7] mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt8functionIFviiEED1Ev+0x18)[0x7fdfcca2ec98] mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt10_Head_baseILm0ESt8functionIFviiEELb0EED1Ev+0x18)[0x7fdfcca300ce] mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt11_Tuple_implILm0EISt8functionIFviiEESt12_PlaceholderILi1EES3_ILi2D1Ev+0x18)[0x7fdfcca300e8] mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt5tupleIISt8functionIFviiEESt12_PlaceholderILi1EES3_ILi2D1Ev+0x18)[0x7fdfcca30102] 
mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt5_BindIFSt7_Mem_fnIMSt8functionIFviiEEKFviiEES3_St12_PlaceholderILi1EES7_ILi2D1Ev+0x1c)[0x7fdfcca30120] mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt14_Function_base13_Base_managerISt5_BindIFSt7_Mem_fnIMSt8functionIFviiEEKFviiEES5_St12_PlaceholderILi1EES9_ILi2E10_M_destroyERSt9_Any_dataSt17integral_constantIbLb0EE+0x29)[0x7fdfcca91873] mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt14_Function_base13_Base_managerISt5_BindIFSt7_Mem_fnIMSt8functionIFviiEEKFviiEES5_St12_PlaceholderILi1EES9_ILi2E10_M_managerERSt9_Any_dataRKSF_St18_Manager_operation+0xa2)[0x7fdfcca79ba3] mesos/build/src/.libs/lt-dynamic-reservation-framework(_ZNSt14_Function_baseD1Ev+0x33)[0x560e50f40ae7] mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt8functionIFviiEED1Ev+0x18)[0x7fdfcca2ec98] mesos/build/src/.libs/libmesos-1.4.0.so(_ZN2os8internal15configureSignalEPKSt8functionIFviiEE+0x4a)[0x7fdfcc9db47d] mesos/build/src/.libs/libmesos-1.4.0.so(_ZN5mesos8internal5slave5Slave10initializeEv+0x3d5e)[0x7fdfcc9e0a78] mesos/build/src/.libs/libmesos-1.4.0.so(_ZN7process14ProcessManager6resumeEPNS_11ProcessBaseE+0x284)[0x7fdfcd93fedc] mesos/build/src/.libs/libmesos-1.4.0.so(+0x61152da)[0x7fdfcd93c2da] mesos/build/src/.libs/libmesos-1.4.0.so(+0x6127bce)[0x7fdfcd94ebce] mesos/build/src/.libs/libmesos-1.4.0.so(+0x6127b12)[0x7fdfcd94eb12] mesos/build/src/.libs/libmesos-1.4.0.so(+0x6127a9c)[0x7fdfcd94ea9c] /lib64/libstdc++.so.6(+0xb5230)[0x7fdfc73b7230] /lib64/libpthread.so.0(+0x7dc5)[0x7fdfc7612dc5] /lib64/libc.so.6(clone+0x6d)[0x7fdfc6e2276d] {noformat} > ExamplesTest.PersistentVolumeFramework failing due to double free corruption > on Ubuntu 14.04 > > > Key: MESOS-6345 > URL: https://issues.apache.org/jira/browse/MESOS-6345 > Project: Mesos > Issue Type: Bug > Components: framework >Reporter: Avinash Sridharan > Labels: mesosphere > > PersistentVolumeFramework tests if failing on Ubuntu 14 > {code} > [Step 10/10] *** Error in > 
`/mnt/teamcity/work/4240ba9ddd0997c3/build/src/.libs/lt-persistent-volume-framework': > double free or corruption (fasttop): 0x7f1ae0006a20 *** > [04:56:48]W: [Step 10/10] *** Aborted at 1475902608 (unix time) try "date > -d @1475902608" if you are using GNU date *** > [04:56:48]W: [Step 10/10] I1008 04:56:48.592744 25425 state.cpp:57] > Recovering state from '/mnt/teamcity/temp/buildTmp/mesos-8KiPML/2/meta' > [04:56:48]W: [Step 10/10] I1008 04:56:48.592808 25423 state.cpp:57] > Recovering state from '/mnt/teamcity/temp/buildTmp/mesos-8KiPML/1/meta' > [04:56:48]W: [Step 10/10] I1008 04:56:48.592952 25425 > status_update_manager.cpp:203] Recovering status update manager > [04:56:48]W: [Step 10/10] I1008 04:56:48.592957 25423 > status_update_manager.cpp:203] Recovering status update manager > [04:56:48]W: [Step 10/10] I1008 04:56:48.593010 25424
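The race suggested by the backtrace above (two actors running through {{os::internal::configureSignal}} concurrently, so the previously installed {{std::function}} target gets destroyed twice) can be sketched as follows. This is a hypothetical illustration, not the actual Mesos code; the names `signaledWrapper` and `configureSignalOnce` here only mirror the symbols in the trace, and the `std::call_once` guard is one conventional remedy under that assumption:

```cpp
#include <functional>
#include <mutex>

namespace sketch {

// Shared slot for the signal-handler wrapper. In the racy version, two
// threads assigning a std::function into a shared slot without
// synchronization can both run the destructor of the previous target,
// which matches the "double free or corruption (fasttop)" abort above.
std::function<void(int, int)>* signaledWrapper = nullptr;

std::once_flag installFlag;
int installCount = 0;  // Instrumentation for the sketch only.

// Install `handler` at most once, no matter how many threads race here.
void configureSignalOnce(const std::function<void(int, int)>& handler)
{
  std::call_once(installFlag, [&handler]() {
    signaledWrapper = new std::function<void(int, int)>(handler);
    ++installCount;
  });
}

} // namespace sketch
```

A serialized (mutex or `call_once`) installation would make the second caller a no-op instead of a second destruction of the same target.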
[jira] [Updated] (MESOS-7160) Parsing of perf version segfaults
[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrei Budnik updated MESOS-7160: - Sprint: Mesosphere Sprint 58 > Parsing of perf version segfaults > - > > Key: MESOS-7160 > URL: https://issues.apache.org/jira/browse/MESOS-7160 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Benjamin Bannier >Assignee: Andrei Budnik > > Parsing the perf version [fails with a segfault in ASF > CI|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/3294/], > {noformat} > E0222 20:54:03.033464 805 perf.cpp:237] Failed to get perf version: Failed > to execute perf: terminated with signal Aborted (core dumped) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
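One way to make version parsing robust against the failure mode above (perf aborting and producing no usable output) is to treat unparseable output as an error rather than crashing. The following is a hedged sketch, not the Mesos implementation in `src/linux/perf.cpp`; `parsePerfVersion` is a hypothetical helper that returns an empty string on anything unexpected:

```cpp
#include <regex>
#include <string>

namespace sketch {

// Extract "X.Y.Z" from output like "perf version 3.13.11-ckt39".
// Returns an empty string when the output does not match, e.g. when the
// perf subprocess aborted and emitted an error message instead.
std::string parsePerfVersion(const std::string& output)
{
  static const std::regex pattern("perf version ([0-9]+(\\.[0-9]+)*)");

  std::smatch match;
  if (std::regex_search(output, match, pattern)) {
    return match[1].str();
  }

  return "";  // Unparseable: let the caller surface a proper error.
}

} // namespace sketch
```

The point is that a caller can turn the empty result into a `Failure` instead of dereferencing missing match groups.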
[jira] [Comment Edited] (MESOS-7709) Add --dns flag to the agent.
[ https://issues.apache.org/jira/browse/MESOS-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061870#comment-16061870 ] Qian Zhang edited comment on MESOS-7709 at 6/27/17 6:52 AM: {quote} The problem becomes even more exacerbated when you have a mix of v4 and v6 containers, since if now you rely only on `/etc/resolv.conf` to provide the default you will have to pick some of the 3 possible nameservers for v4 and some for v6, again making it inflexible. {quote} Do you mean the case where there are some v4 containers and some v6 containers on the same agent host? And if we introduce a {{--dns}} agent flag, how will the issue you mentioned be resolved? Thanks. Update: Had a sync-up with Avinash in Slack; the idea is: in a Mesos cluster which has both IPv4 containers and IPv6 containers, without the {{\--dns}} agent flag either the frameworks will have to explicitly set an IPv6 DNS entry for v6 containers using the {{\--dns}} parameter to {{docker run}}, or we will need to have an IPv6 entry for {{nameservers}} in our {{/etc/resolv.conf}}. With the introduction of the {{\--dns}} flag this problem goes away, since for IPv6 networks the operator can just set a nameserver (multiple of them if necessary) for a given network, and we can pass these values to the docker daemon when launching the docker container on that IPv6 network. > Add --dns flag to the agent. 
> > > Key: MESOS-7709 > URL: https://issues.apache.org/jira/browse/MESOS-7709 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan > > Mesos supports both the CNI (through the `network/cni` isolator) and CNM (through > docker) specifications. Both specifications allow DNS entries for > containers to be set on a per-container and per-network basis. > Currently, the behavior of the agent is to use the DNS nameservers set in > /etc/resolv.conf when the CNI or CNM plugin that is used to attach the > container to the CNI/CNM network doesn't explicitly set the DNS for the > container. This is a bit inflexible, especially when we have a mix of v4 and > v6 networks. > The operator should be able to specify DNS nameservers for the networks they > install, either to override the ones provided by the plugin or to serve as defaults > when the plugins do not specify DNS name servers. > In order to achieve the above goal we need to introduce a `\--dns` flag to > the agent. The `\--dns` flag should support a JSON (or a JSON file) with the > following schema: > {code} > { > "mesos": [ > { > "network" : , > "nameservers": [] > } > ], > "docker": [ > { > "network" : , > "nameservers": [] > } > ] > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
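Under one plausible reading of the schema proposed above, a populated `\--dns` configuration might look like the fragment below. The network names and nameserver addresses are purely hypothetical placeholders, not values taken from any real deployment:

```json
{
  "mesos": [
    {
      "network": "cni-ipv6-net",
      "nameservers": ["2001:4860:4860::8888"]
    }
  ],
  "docker": [
    {
      "network": "bridge",
      "nameservers": ["8.8.8.8", "8.8.4.4"]
    }
  ]
}
```

Grouping entries per network (rather than one global nameserver list) is what lets the operator give IPv6 networks IPv6 nameservers while v4 networks keep their own, which is the inflexibility the description calls out.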