[jira] [Assigned] (MESOS-2929) Update libprocess #include headers
[ https://issues.apache.org/jira/browse/MESOS-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett reassigned MESOS-2929: - Assignee: (was: Paul Brett) > Update libprocess #include headers > -- > > Key: MESOS-2929 > URL: https://issues.apache.org/jira/browse/MESOS-2929 > Project: Mesos > Issue Type: Bug >Reporter: Paul Brett > > Update libprocess to #include headers for symbols we rely on and reorder to > comply with the style guide. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-2853) Report per-container metrics from host egress filter
[ https://issues.apache.org/jira/browse/MESOS-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett reassigned MESOS-2853: - Assignee: (was: Paul Brett) > Report per-container metrics from host egress filter > > > Key: MESOS-2853 > URL: https://issues.apache.org/jira/browse/MESOS-2853 > Project: Mesos > Issue Type: Improvement > Components: isolation >Reporter: Paul Brett > Labels: twitter > > Export in statistics.json the fq_codel flow statistics for each container.
[jira] [Assigned] (MESOS-2926) Extend mesos-style.py/cpplint.py to check #include files
[ https://issues.apache.org/jira/browse/MESOS-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett reassigned MESOS-2926: - Assignee: (was: Paul Brett) > Extend mesos-style.py/cpplint.py to check #include files > > > Key: MESOS-2926 > URL: https://issues.apache.org/jira/browse/MESOS-2926 > Project: Mesos > Issue Type: Bug >Reporter: Paul Brett > > cpplint.py can enforce the style guide requirements > for #including everything you use and ordering files based on type, but it > does not work for Mesos because we use #include <...> for project files > where it expects #include "...". > We should update the style checker to support our include usage and then turn > it on by default in the commit hook.
[jira] [Assigned] (MESOS-2927) Update mesos #include headers
[ https://issues.apache.org/jira/browse/MESOS-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett reassigned MESOS-2927: - Assignee: (was: Paul Brett) > Update mesos #include headers > - > > Key: MESOS-2927 > URL: https://issues.apache.org/jira/browse/MESOS-2927 > Project: Mesos > Issue Type: Bug >Reporter: Paul Brett > > Update mesos to #include headers for symbols we rely on and reorder to comply > with the style guide.
[jira] [Assigned] (MESOS-2952) Provide user namespaces for privileged access inside containers
[ https://issues.apache.org/jira/browse/MESOS-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett reassigned MESOS-2952: - Assignee: (was: Paul Brett) > Provide user namespaces for privileged access inside containers > --- > > Key: MESOS-2952 > URL: https://issues.apache.org/jira/browse/MESOS-2952 > Project: Mesos > Issue Type: Epic >Reporter: Paul Brett > > User namespaces allow per-namespace mappings of user and group IDs. This > means that a process's user and group IDs inside a user namespace can be > different from its IDs outside of the namespace. Most notably, a process can > have a nonzero user ID outside a namespace while at the same time having a > user ID of zero inside the namespace; in other words, the process is > unprivileged for operations outside the user namespace but has root > privileges inside the namespace.
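The ID mapping described in MESOS-2952 can be sketched as a small lookup. This is only an illustration, not code from Mesos: the `IdMapping` struct and `toOutsideId` function are hypothetical names, and the sketch assumes the three-field layout that Linux exposes in /proc/[pid]/uid_map (first ID inside the namespace, first ID outside, length of the range).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one uid_map line: `count` IDs starting at `inside`
// (as seen in the namespace) correspond to IDs starting at `outside`
// (as seen in the parent namespace).
struct IdMapping {
  uint32_t inside;
  uint32_t outside;
  uint32_t count;
};

// Translates a user ID seen inside the namespace to the ID seen outside.
// Returns false when no mapping covers `insideId`.
bool toOutsideId(
    const std::vector<IdMapping>& mappings,
    uint32_t insideId,
    uint32_t* outsideId)
{
  assert(outsideId != nullptr);

  for (const IdMapping& m : mappings) {
    if (insideId >= m.inside && insideId - m.inside < m.count) {
      *outsideId = m.outside + (insideId - m.inside);
      return true;
    }
  }

  return false;
}
```

With a single illustrative mapping of {0, 100000, 65536}, user ID 0 inside the namespace (root) translates to the unprivileged ID 100000 outside it, which is the situation the epic describes.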
[jira] [Assigned] (MESOS-2994) Design doc for creating user namespaces inside containers
[ https://issues.apache.org/jira/browse/MESOS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett reassigned MESOS-2994: - Assignee: (was: Paul Brett) > Design doc for creating user namespaces inside containers > - > > Key: MESOS-2994 > URL: https://issues.apache.org/jira/browse/MESOS-2994 > Project: Mesos > Issue Type: Improvement >Reporter: Paul Brett > Labels: twitter >
[jira] [Assigned] (MESOS-1977) Disk Isolator Usage Metrics
[ https://issues.apache.org/jira/browse/MESOS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett reassigned MESOS-1977: - Assignee: (was: Paul Brett) > Disk Isolator Usage Metrics > > > Key: MESOS-1977 > URL: https://issues.apache.org/jira/browse/MESOS-1977 > Project: Mesos > Issue Type: Improvement > Components: isolation >Reporter: Joris Van Remoortere > Labels: mesosphere > > Implement just the usage statistics aspect of the block io isolator for the > mesos containerizer.
[jira] [Assigned] (MESOS-2599) Make exit codes unique
[ https://issues.apache.org/jira/browse/MESOS-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett reassigned MESOS-2599: - Assignee: (was: Paul Brett) > Make exit codes unique > -- > > Key: MESOS-2599 > URL: https://issues.apache.org/jira/browse/MESOS-2599 > Project: Mesos > Issue Type: Improvement >Reporter: Paul Brett > Labels: twitter > > Currently, we use EXIT(1) for all exits from the slave. If we make the exit > code unique for each reason, we can use the exit code to analyze failures. > The exit codes should be grouped into startup exits (before the slave ever offered > service) and in-service exits. Additionally, it would be useful to identify > which exits are expected to clear on a retry. > We should validate whether the exit code is being inspected by calling scripts, > which could break with the updated exit codes.
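One way the grouping proposed in MESOS-2599 might look is sketched here. Every name and value is hypothetical and nothing in it comes from the Mesos source: the sketch assumes startup exits occupy codes 10 to 39 and in-service exits 40 and above, and which reasons clear on a retry is likewise an assumption made for illustration. The values stay below 126 because shells conventionally use 126, 127 and 128+N (termination by signal N) for their own purposes.

```cpp
#include <cassert>

// Hypothetical exit reasons; names and values are illustrative only.
enum class ExitReason : int {
  // Startup exits: the slave never offered service.
  STARTUP_BAD_FLAGS = 10,
  STARTUP_RESOURCE_BUSY = 11,

  // In-service exits: the slave had been offering service.
  SERVICE_ISOLATOR_FAILURE = 40,
  SERVICE_LOST_DEPENDENCY = 41,
};

// Converts a reason to the process exit status.
int exitStatus(ExitReason reason)
{
  const int code = static_cast<int>(reason);
  assert(code > 1 && code < 126);  // Stay clear of shell-reserved values.
  return code;
}

// True for exits that happened before the slave ever offered service.
bool isStartupExit(ExitReason reason)
{
  const int code = static_cast<int>(reason);
  return code >= 10 && code < 40;
}

// True for exits that are expected to clear on a retry (assumed here).
bool clearsOnRetry(ExitReason reason)
{
  switch (reason) {
    case ExitReason::STARTUP_RESOURCE_BUSY:
    case ExitReason::SERVICE_LOST_DEPENDENCY:
      return true;
    case ExitReason::STARTUP_BAD_FLAGS:
    case ExitReason::SERVICE_ISOLATOR_FAILURE:
      return false;
  }
  return false;
}
```

A calling script could then branch on the numeric range alone, which is also why the issue's caution about existing scripts that test for an exit status of exactly 1 matters.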
[jira] [Assigned] (MESOS-1977) Disk Isolator Usage Metrics
[ https://issues.apache.org/jira/browse/MESOS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett reassigned MESOS-1977: - Assignee: Paul Brett > Disk Isolator Usage Metrics > > > Key: MESOS-1977 > URL: https://issues.apache.org/jira/browse/MESOS-1977 > Project: Mesos > Issue Type: Improvement > Components: isolation >Reporter: Joris Van Remoortere >Assignee: Paul Brett > Labels: mesosphere > > Implement just the usage statistics aspect of the block io isolator for the > mesos containerizer.
[jira] [Created] (MESOS-3588) Port mapping isolator check failed: createQdisc.get()
Paul Brett created MESOS-3588:
Summary: Port mapping isolator check failed: createQdisc.get()
Key: MESOS-3588
URL: https://issues.apache.org/jira/browse/MESOS-3588
Project: Mesos
Issue Type: Bug
Reporter: Paul Brett

Container creation occasionally fails because the required name already exists, e.g.:
{code}
F1005 13:25:04.331053 48582 port_mapping.cpp:2245] Check failed: createQdisc.get()
*** Check failure stack trace: ***
@ 0x7f3b5c3b668d google::LogMessage::Fail()
@ 0x7f3b5c3b84d4 google::LogMessage::SendToLog()
@ 0x7f3b5c3b627c google::LogMessage::Flush()
@ 0x7f3b5c3b8dc9 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f3b5c0bdc8c mesos::internal::slave::PortMappingIsolatorProcess::isolate()
@ 0x7f3b5bf28fd6 _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingN5mesos8internal5slave20MesosIsolatorProcessERKNS6_11ContainerIDEiSA_iEENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSH_FSF_T1_T2_ET3_T4_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
@ 0x7f3b5c3690b1 process::ProcessManager::resume()
@ 0x7f3b5c3693af process::internal::schedule()
@ 0x7f3b5c478cd0 execute_native_thread_routine
@ 0x7f3b5b14283d start_thread
@ 0x7f3b5abb7fdd clone
/usr/local/bin/mesos-slave.sh: line 102: 48575 Aborted (core dumped) $debug /usr/local/sbin/mesos-slave "${MESOS_FLAGS[@]}"
Slave Exit Status: 134
{code}
It appears that there are valid circumstances under which the kernel can reallocate the namespace PID before the container's external interface (mesos_n) has been destroyed.
{code}
2236 // Prepare the ingress queueing disciplines on veth.
2237 Try<bool> createQdisc = ingress::create(veth(pid));
2238 if (createQdisc.isError()) {
2239 return Failure(
2240 "Failed to create the ingress qdisc on " + veth(pid) +
2241 ": " + createQdisc.error());
2242 }
2243
2244 // Veth device should exist since we just created it.
2245 CHECK(createQdisc.get());
{code}
We should check for link-already-exists errors in port mapping (e.g. link::create returns false) and fail the container creation rather than killing the slave.
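The change proposed in MESOS-3588 can be sketched in isolation as follows. This is not the Mesos implementation: `TryBool` and `StepResult` are hypothetical stand-ins so the example compiles on its own, and it assumes that a successful result of false means the link or qdisc already existed. The point is only that the false case becomes a failure of the one container instead of a CHECK that aborts the slave.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a Try<bool> style result: either an error
// message or a boolean value.
struct TryBool {
  bool isErr;
  bool value;
  std::string message;

  bool isError() const { return isErr; }
  bool get() const { assert(!isErr); return value; }
  const std::string& error() const { assert(isErr); return message; }
};

// Hypothetical outcome of one isolation step; an empty `failure` string
// means the step succeeded.
struct StepResult {
  std::string failure;

  bool ok() const { return failure.empty(); }
};

// Turns the result of creating the ingress qdisc into a per-container
// outcome. The "already exists" case is reported as a failure rather
// than being asserted away.
StepResult checkIngressQdisc(
    const TryBool& createQdisc,
    const std::string& veth)
{
  if (createQdisc.isError()) {
    return StepResult{
        "Failed to create the ingress qdisc on " + veth + ": " +
        createQdisc.error()};
  }

  if (!createQdisc.get()) {
    return StepResult{"The ingress qdisc already exists on " + veth};
  }

  return StepResult{""};
}
```

In the isolator itself the failure string would presumably be wrapped in the same Failure(...) return that the existing isError() branch uses, so the container launch fails while the slave keeps running.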
[jira] [Created] (MESOS-3513) Cgroups Test Filters aborts tests on Centos 6.6
Paul Brett created MESOS-3513:
Summary: Cgroups Test Filters aborts tests on Centos 6.6
Key: MESOS-3513
URL: https://issues.apache.org/jira/browse/MESOS-3513
Project: Mesos
Issue Type: Bug
Components: slave, test
Environment: Centos 6.6
Reporter: Paul Brett
Assignee: Paul Brett

Running make check on CentOS 6.6 causes all tests to abort due to the CHECK_SOME test in CgroupsFilter:
{code}
Build directory: /home/jenkins/workspace/mesos-config-centos6/build
F0923 23:00:49.748896 27362 environment.cpp:132] CHECK_SOME(hierarchies_): Failed to determine canonical path of /sys/fs/cgroup/freezer: No such file or directory
*** Check failure stack trace: ***
@ 0x7fb786ca0c4d google::LogMessage::Fail()
@ 0x7fb786ca298c google::LogMessage::SendToLog()
@ 0x7fb786ca083c google::LogMessage::Flush()
@ 0x7fb786ca3289 google::LogMessageFatal::~LogMessageFatal()
@ 0x58e66c mesos::internal::tests::CgroupsFilter::CgroupsFilter()
@ 0x58712f mesos::internal::tests::Environment::Environment()
@ 0x4c882f main
@ 0x7fb782767d5d __libc_start_main
@ 0x4d6331 (unknown)
make[3]: *** [check-local] Aborted
{code}
[jira] [Commented] (MESOS-3422) MasterSlaveReconciliationTest.ReconcileLostTask test is flaky
[ https://issues.apache.org/jira/browse/MESOS-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900954#comment-14900954 ] Paul Brett commented on MESOS-3422: --- Tested HEAD on Centos6 (original reporting platform) with no errors. {code} [--] 1 test from MasterSlaveReconciliationTest [ RUN ] MasterSlaveReconciliationTest.ReconcileLostTask Using temporary directory '/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_i61HPI' I0921 16:38:36.016902 51925 leveldb.cpp:176] Opened db in 73.30966ms I0921 16:38:36.023943 51925 leveldb.cpp:183] Compacted db in 6.963667ms I0921 16:38:36.024034 51925 leveldb.cpp:198] Created db iterator in 48856ns I0921 16:38:36.024061 51925 leveldb.cpp:204] Seeked to beginning of db in 3684ns I0921 16:38:36.024077 51925 leveldb.cpp:273] Iterated through 0 keys in the db in 337ns I0921 16:38:36.024189 51925 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0921 16:38:36.025542 51935 recover.cpp:449] Starting replica recovery I0921 16:38:36.026080 51935 recover.cpp:475] Replica is in EMPTY status I0921 16:38:36.028053 51930 master.cpp:380] Master 20150921-163836-2081170186-40941-51925 (smfd-aki-27-sr1.devel.twitter.com) started on 10.35.12.124:40941 I0921 16:38:36.028286 51934 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0921 16:38:36.028094 51930 master.cpp:382] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_i61HPI/credentials" --framework_sorter="drf" --help="false" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="25secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_i61HPI/master" --zk_session_timeout="10secs" I0921 16:38:36.029104 51930 master.cpp:427] Master only allowing authenticated frameworks to register I0921 16:38:36.029132 51930 master.cpp:432] Master only allowing authenticated slaves to register I0921 16:38:36.029155 51930 credentials.hpp:37] Loading credentials for authentication from '/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_i61HPI/credentials' I0921 16:38:36.029250 51936 recover.cpp:195] Received a recover response from a replica in EMPTY status I0921 16:38:36.029738 51930 master.cpp:471] Using default 'crammd5' authenticator I0921 16:38:36.029908 51930 authenticator.cpp:512] Initializing server SASL I0921 16:38:36.029947 51940 recover.cpp:566] Updating replica status to STARTING I0921 16:38:36.030782 51930 master.cpp:508] Authorization enabled I0921 16:38:36.036074 51926 master.cpp:1607] The newly elected leader is master@10.35.12.124:40941 with id 20150921-163836-2081170186-40941-51925 I0921 16:38:36.036110 51926 master.cpp:1620] Elected as the leading master! 
I0921 16:38:36.036145 51926 master.cpp:1380] Recovering from registrar I0921 16:38:36.036335 51930 registrar.cpp:309] Recovering registrar I0921 16:38:36.067191 51938 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 36.988836ms I0921 16:38:36.067246 51938 replica.cpp:323] Persisted replica status to STARTING I0921 16:38:36.067517 51938 recover.cpp:475] Replica is in STARTING status I0921 16:38:36.068230 51936 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0921 16:38:36.068429 51928 recover.cpp:195] Received a recover response from a replica in STARTING status I0921 16:38:36.068729 51927 recover.cpp:566] Updating replica status to VOTING I0921 16:38:36.074915 51940 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 6.095154ms I0921 16:38:36.074942 51940 replica.cpp:323] Persisted replica status to VOTING I0921 16:38:36.075021 51936 recover.cpp:580] Successfully joined the Paxos group I0921 16:38:36.075228 51936 recover.cpp:464] Recover process terminated I0921 16:38:36.075657 51926 log.cpp:661] Attempting to start the writer I0921 16:38:36.077828 51927 replica.cpp:477] Replica received implicit promise request with proposal 1 I0921 16:38:36.091645 51927 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 13.779849ms I0921 16:38:36.091686 51927 replica.cpp:345] Persisted promised to 1 I0921 16:38:36.092543 51934 coordinator.cpp:231] Coordinator attemping to fill missing position I0921 16:38:36.094199 51939 replica.cpp:378] Replica received explicit promise request for position 0 with
[jira] [Assigned] (MESOS-3422) MasterSlaveReconciliationTest.ReconcileLostTask test is flaky
[ https://issues.apache.org/jira/browse/MESOS-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett reassigned MESOS-3422: - Assignee: Paul Brett > MasterSlaveReconciliationTest.ReconcileLostTask test is flaky > - > > Key: MESOS-3422 > URL: https://issues.apache.org/jira/browse/MESOS-3422 > Project: Mesos > Issue Type: Bug > Components: technical debt, test >Affects Versions: 0.25.0 > Environment: CentOS >Reporter: Vinod Kone >Assignee: Paul Brett > > Observed this on internal CI > {code} > DEBUG: [--] 5 tests from MasterSlaveReconciliationTest > DEBUG: [ RUN ] MasterSlaveReconciliationTest.SlaveReregisterTerminatedExecutor > DEBUG: Using temporary directory > '/tmp/MasterSlaveReconciliationTest_SlaveReregisterTerminatedExecutor_QJPUzf' > DEBUG: [ OK ] MasterSlaveReconciliationTest.SlaveReregisterTerminatedExecutor > (78 ms) > DEBUG: [ RUN ] MasterSlaveReconciliationTest.ReconcileLostTask > DEBUG: Using temporary directory > '/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_16KDgE' > DEBUG: tests/master_slave_reconciliation_tests.cpp:226: Failure > DEBUG: Failed to wait 15secs for statusUpdateMessage > DEBUG: tests/master_slave_reconciliation_tests.cpp:216: Failure > DEBUG: Actual function call count doesn't match EXPECT_CALL(sched, > statusUpdate(, _))... 
> DEBUG: Expected: to be called once > DEBUG: Actual: never called - unsatisfied and active > DEBUG: I0914 08:51:27.825984 16062 leveldb.cpp:438] Reading position from > leveldb took 16151ns > DEBUG: I0914 08:51:27.828069 16049 registrar.cpp:342] Successfully fetched > the registry (0B) in 7648us > DEBUG: I0914 08:51:27.828119 16049 registrar.cpp:441] Applied 1 operations in > 2805ns; attempting to update the 'registry' > DEBUG: I0914 08:51:27.829991 16066 log.cpp:685] Attempting to append 222 > bytes to the log > DEBUG: I0914 08:51:27.830029 16066 coordinator.cpp:341] Coordinator > attempting to write APPEND action at position 1 > DEBUG: I0914 08:51:27.830729 16053 replica.cpp:511] Replica received write > request for position 1 > DEBUG: I0914 08:51:27.831167 16053 leveldb.cpp:343] Persisting action (241 > bytes) to leveldb took 414748ns > DEBUG: I0914 08:51:27.831185 16053 replica.cpp:679] Persisted action at 1 > DEBUG: I0914 08:51:27.831493 16058 replica.cpp:658] Replica received learned > notice for position 1 > DEBUG: I0914 08:51:27.831698 16058 leveldb.cpp:343] Persisting action (243 > bytes) to leveldb took 185223ns > DEBUG: I0914 08:51:27.831714 16058 replica.cpp:679] Persisted action at 1 > DEBUG: I0914 08:51:27.831722 16058 replica.cpp:664] Replica learned APPEND > action at position 1 > DEBUG: I0914 08:51:27.831989 16056 registrar.cpp:486] Successfully updated > the 'registry' in 3.827968ms > DEBUG: I0914 08:51:27.832041 16052 log.cpp:704] Attempting to truncate the > log to 1 > DEBUG: I0914 08:51:27.832093 16056 registrar.cpp:372] Successfully recovered > registrar > DEBUG: I0914 08:51:27.832259 16072 coordinator.cpp:341] Coordinator > attempting to write TRUNCATE action at position 2 > DEBUG: I0914 08:51:27.832259 16062 master.cpp:1404] Recovered 0 slaves from > the Registry (183B) ; allowing 10mins for slaves to re-register > DEBUG: I0914 08:51:27.832882 16060 replica.cpp:511] Replica received write > request for position 2 > DEBUG: I0914 
08:51:27.833243 16060 leveldb.cpp:343] Persisting action (16 > bytes) to leveldb took 340843ns > DEBUG: I0914 08:51:27.833261 16060 replica.cpp:679] Persisted action at 2 > DEBUG: I0914 08:51:27.833593 16050 replica.cpp:658] Replica received learned > notice for position 2 > DEBUG: I0914 08:51:27.833724 16050 leveldb.cpp:343] Persisting action (18 > bytes) to leveldb took 112560ns > DEBUG: I0914 08:51:27.833755 16050 leveldb.cpp:401] Deleting ~1 keys from > leveldb took 16580ns > DEBUG: I0914 08:51:27.833765 16050 replica.cpp:679] Persisted action at 2 > DEBUG: I0914 08:51:27.833775 16050 replica.cpp:664] Replica learned TRUNCATE > action at position 2 > DEBUG: I0914 08:51:27.843340 16057 http.cpp:333] HTTP POST for > /master/maintenance/schedule from 172.18.4.102:46471 > DEBUG: I0914 08:51:27.843801 16050 registrar.cpp:441] Applied 1 operations in > 25197ns; attempting to update the 'registry' > DEBUG: I0914 08:51:27.845721 16068 log.cpp:685] Attempting to append 328 > bytes to the log > DEBUG: I0914 08:51:27.845772 16068 coordinator.cpp:341] Coordinator > attempting to write APPEND action at position 3 > DEBUG: I0914 08:51:27.846606 16052 replica.cpp:511] Replica received write > request for position 3 > DEBUG: I0914 08:51:27.847012 16052 leveldb.cpp:343] Persisting action (347 > bytes) to leveldb took 387519ns > DEBUG: I0914 08:51:27.847026 16052 replica.cpp:679] Persisted action at 3 > DEBUG: I0914 08:51:27.847698 16048
[jira] [Commented] (MESOS-3253) Add pid to network helper error messages
[ https://issues.apache.org/jira/browse/MESOS-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14725694#comment-14725694 ] Paul Brett commented on MESOS-3253: --- I don't think we need this anymore. > Add pid to network helper error messages > > > Key: MESOS-3253 > URL: https://issues.apache.org/jira/browse/MESOS-3253 > Project: Mesos > Issue Type: Bug >Reporter: Paul Brett > > Network helper logs errors to stderr without the associated namespace pid or > container id which prevents the errors from being associated with the > appropriate container.
[jira] [Created] (MESOS-3347) Remove dead code in src/linux/perf.cpp
Paul Brett created MESOS-3347: - Summary: Remove dead code in src/linux/perf.cpp Key: MESOS-3347 URL: https://issues.apache.org/jira/browse/MESOS-3347 Project: Mesos Issue Type: Improvement Reporter: Paul Brett Assignee: Paul Brett The performance monitoring routines include support for sampling in the single-pid, single-cgroup and multiple-pid cases, but these are never used.
[jira] [Created] (MESOS-3292) Perf isolator event validation issues
Paul Brett created MESOS-3292: - Summary: Perf isolator event validation issues Key: MESOS-3292 URL: https://issues.apache.org/jira/browse/MESOS-3292 Project: Mesos Issue Type: Bug Reporter: Paul Brett The Linux perf isolator currently validates events by running the installed perf command once at slave startup to verify that no error is raised when the event is requested. No check is made at startup that the perf event is supported by Mesos in the PerfStatistics message. In addition, perf is an external program and can be upgraded while the slave is running, possibly changing the perf version and the supported events or output formats. We should validate events against PerfStatistics at startup and handle on-the-fly perf upgrades.
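The startup check requested in MESOS-3292 might be sketched as a plain set lookup. Everything here is hypothetical: `supported` stands in for the field names of the PerfStatistics message, which would have to be obtained from the message definition, and `normalize` assumes that a perf event name maps to a field name by lower-casing it and replacing hyphens with underscores.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Assumed mapping from a perf event name to a statistics field name:
// lower-case, with '-' replaced by '_'.
std::string normalize(const std::string& event)
{
  std::string result = event;
  for (char& c : result) {
    c = (c == '-')
      ? '_'
      : static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  return result;
}

// Returns the requested events, in their original spelling, that have no
// matching entry in `supported`.
std::vector<std::string> unsupportedEvents(
    const std::vector<std::string>& requested,
    const std::set<std::string>& supported)
{
  std::vector<std::string> missing;
  for (const std::string& event : requested) {
    if (supported.count(normalize(event)) == 0) {
      missing.push_back(event);
    }
  }
  return missing;
}
```

A non-empty result at startup would be the signal to reject the configuration. Re-running the same check whenever the installed perf version changes is one possible way to cover the on-the-fly upgrade case, though the issue does not prescribe a mechanism.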
[jira] [Created] (MESOS-3271) SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.
Paul Brett created MESOS-3271:
Summary: SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.
Key: MESOS-3271
URL: https://issues.apache.org/jira/browse/MESOS-3271
Project: Mesos
Issue Type: Bug
Components: slave
Reporter: Paul Brett

Test failure on Ubuntu 14 configured with --disable-java --disable-python --enable-ssl --enable-libevent --enable-optimize --enable-network-isolation
Commit: 9b78b301469667b5a44f0a351de5f3a71edae499
[ RUN ] SlaveRecoveryTest/0.NonCheckpointingFramework
I0815 06:41:47.413602 17091 exec.cpp:133] Version: 0.24.0
I0815 06:41:47.416780 17111 exec.cpp:207] Executor registered on slave 20150815-064146-544909504-51064-12195-S0
Registered executor on slave1-ubuntu12
Starting task 044bd49e-2f38-4671-802a-ac6524d61a85
Forked command at 17114
sh -c 'sleep 1000'
[err] event_active called on a non-initialized event 0x7f6b740232d0 (events: 0x2, fd: 21, flags: 0x80)
*** Aborted at 1439646107 (unix time) try date -d @1439646107 if you are using GNU date ***
PC: @ 0x7f6ba512d0d5 (unknown)
*** SIGABRT (@0x2fa3) received by PID 12195 (TID 0x7f6b9d613700) from PID 12195; stack trace: ***
@ 0x7f6ba54c4cb0 (unknown)
@ 0x7f6ba512d0d5 (unknown)
@ 0x7f6ba513083b (unknown)
@ 0x7f6ba448e1ba (unknown)
@ 0x7f6ba448e52b (unknown)
@ 0x7f6ba447dcc9 (unknown)
@ 0x4c4033 process::internal::run()
@ 0x7f6ba72642ab process::Future::discard()
@ 0x7f6ba72643be process::internal::discard()
@ 0x7f6ba7262298 _ZNSt17_Function_handlerIFvvEZNK7process6FutureImE9onDiscardISt5_BindIFPFvNS1_10WeakFutureIsEEES7_RKS3_OT_EUlvE_E9_M_invokeERKSt9_Any_data
@ 0x4c4033 process::internal::run()
@ 0x6fa0cb process::Future::discard()
@ 0x7f6ba6fb5736 cgroups::event::Listener::finalize()
@ 0x7f6ba728fb11 process::ProcessManager::resume()
@ 0x7f6ba728fe0f process::internal::schedule()
@ 0x7f6ba5c9d490 (unknown)
@ 0x7f6ba54bce9a start_thread
@ 0x7f6ba51ea38d (unknown)
+ /bin/true
[jira] [Updated] (MESOS-3271) SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.
[ https://issues.apache.org/jira/browse/MESOS-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-3271: -- Attachment: build.txt SlaveRecoveryTest/0.NonCheckpointingFramework is flaky. --- Key: MESOS-3271 URL: https://issues.apache.org/jira/browse/MESOS-3271 Project: Mesos Issue Type: Bug Components: slave Reporter: Paul Brett Attachments: build.txt Test failure on Ubuntu 14 configured with --disable-java --disable-python --enable-ssl --enable-libevent --enable-optimize --enable-network-isolation Commit: 9b78b301469667b5a44f0a351de5f3a71edae499 [ RUN ] SlaveRecoveryTest/0.NonCheckpointingFramework I0815 06:41:47.413602 17091 exec.cpp:133] Version: 0.24.0 I0815 06:41:47.416780 17111 exec.cpp:207] Executor registered on slave 20150815-064146-544909504-51064-12195-S0 Registered executor on slave1-ubuntu12 Starting task 044bd49e-2f38-4671-802a-ac6524d61a85 Forked command at 17114 sh -c 'sleep 1000' [err] event_active called on a non-initialized event 0x7f6b740232d0 (events: 0x2, fd: 21, flags: 0x80) *** Aborted at 1439646107 (unix time) try date -d @1439646107 if you are using GNU date *** PC: @ 0x7f6ba512d0d5 (unknown) *** SIGABRT (@0x2fa3) received by PID 12195 (TID 0x7f6b9d613700) from PID 12195; stack trace: *** @ 0x7f6ba54c4cb0 (unknown) @ 0x7f6ba512d0d5 (unknown) @ 0x7f6ba513083b (unknown) @ 0x7f6ba448e1ba (unknown) @ 0x7f6ba448e52b (unknown) @ 0x7f6ba447dcc9 (unknown) @ 0x4c4033 process::internal::run() @ 0x7f6ba72642ab process::Future::discard() @ 0x7f6ba72643be process::internal::discard() @ 0x7f6ba7262298 _ZNSt17_Function_handlerIFvvEZNK7process6FutureImE9onDiscardISt5_BindIFPFvNS1_10WeakFutureIsEEES7_RKS3_OT_EUlvE_E9_M_invokeERKSt9_Any_data @ 0x4c4033 process::internal::run() @ 0x6fa0cb process::Future::discard() @ 0x7f6ba6fb5736 cgroups::event::Listener::finalize() @ 0x7f6ba728fb11 process::ProcessManager::resume() @ 0x7f6ba728fe0f process::internal::schedule() @ 0x7f6ba5c9d490 (unknown) @ 0x7f6ba54bce9a start_thread @ 0x7f6ba51ea38d (unknown) + /bin/true -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3272) CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FreezeNonFreezer is flaky.
[ https://issues.apache.org/jira/browse/MESOS-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-3272: -- Attachment: build.log CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FreezeNonFreezer is flaky. Key: MESOS-3272 URL: https://issues.apache.org/jira/browse/MESOS-3272 Project: Mesos Issue Type: Bug Components: isolation Reporter: Paul Brett Attachments: build.log Test aborts when configured with python, libevent and SSL on Ubuntu12. [ RUN ] CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FreezeNonFreezer *** Aborted at 1439667937 (unix time) try date -d @1439667937 if you are using GNU date *** PC: @ 0x7feba972a753 (unknown) *** SIGSEGV (@0x0) received by PID 4359 (TID 0x7febabf897c0) from PID 0; stack trace: *** @ 0x7feba8f7dcb0 (unknown) @ 0x7feba972a753 (unknown) @ 0x7febaaa69328 process::dispatch() @ 0x7febaaa5e9a7 cgroups::freezer::thaw() @ 0xba64ff mesos::internal::tests::CgroupsAnyHierarchyWithCpuMemoryTest_ROOT_CGROUPS_FreezeNonFreezer_Test::TestBody() @ 0xc199a3 testing::internal::HandleExceptionsInMethodIfSupported() @ 0xc0f947 testing::Test::Run() @ 0xc0f9ee testing::TestInfo::Run() @ 0xc0faf5 testing::TestCase::Run() @ 0xc0fda8 testing::internal::UnitTestImpl::RunAllTests() @ 0xc10064 testing::UnitTest::Run() @ 0x4b3273 main @ 0x7feba8bd176d (unknown) @ 0x4bf1f1 (unknown) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3272) CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FreezeNonFreezer is flaky.
Paul Brett created MESOS-3272: - Summary: CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FreezeNonFreezer is flaky. Key: MESOS-3272 URL: https://issues.apache.org/jira/browse/MESOS-3272 Project: Mesos Issue Type: Bug Components: isolation Reporter: Paul Brett Attachments: build.log Test aborts when configured with python, libevent and SSL on Ubuntu12. [ RUN ] CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FreezeNonFreezer *** Aborted at 1439667937 (unix time) try date -d @1439667937 if you are using GNU date *** PC: @ 0x7feba972a753 (unknown) *** SIGSEGV (@0x0) received by PID 4359 (TID 0x7febabf897c0) from PID 0; stack trace: *** @ 0x7feba8f7dcb0 (unknown) @ 0x7feba972a753 (unknown) @ 0x7febaaa69328 process::dispatch() @ 0x7febaaa5e9a7 cgroups::freezer::thaw() @ 0xba64ff mesos::internal::tests::CgroupsAnyHierarchyWithCpuMemoryTest_ROOT_CGROUPS_FreezeNonFreezer_Test::TestBody() @ 0xc199a3 testing::internal::HandleExceptionsInMethodIfSupported() @ 0xc0f947 testing::Test::Run() @ 0xc0f9ee testing::TestInfo::Run() @ 0xc0faf5 testing::TestCase::Run() @ 0xc0fda8 testing::internal::UnitTestImpl::RunAllTests() @ 0xc10064 testing::UnitTest::Run() @ 0x4b3273 main @ 0x7feba8bd176d (unknown) @ 0x4bf1f1 (unknown) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3254) Cgroup CHECK fails test harness
[ https://issues.apache.org/jira/browse/MESOS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697661#comment-14697661 ] Paul Brett commented on MESOS-3254: --- Updated change at https://reviews.apache.org/r/37490/ Cgroup CHECK fails test harness --- Key: MESOS-3254 URL: https://issues.apache.org/jira/browse/MESOS-3254 Project: Mesos Issue Type: Bug Components: test Reporter: Paul Brett CHECK in clean up of ContainerizerTest causes test harness to abort rather than fail or skip only perf related tests. [ RUN ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch [ OK ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch (628 ms) [--] 24 tests from SlaveRecoveryTest/0 (38986 ms total) [--] 4 tests from MesosContainerizerSlaveRecoveryTest [ RUN ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics ../../src/tests/mesos.cpp:720: Failure cgroups::mount(hierarchy, subsystem): 'perf_event' is already attached to another hierarchy - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/tmp/mesos_test_cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-MesosContainerizerSlaveRecoveryTest.*). 
- F0811 17:23:43.874696 12955 mesos.cpp:774] CHECK_SOME(cgroups): '/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy *** Check failure stack trace: *** @ 0x7fb2fb4835fd google::LogMessage::Fail() @ 0x7fb2fb48543d google::LogMessage::SendToLog() @ 0x7fb2fb4831ec google::LogMessage::Flush() @ 0x7fb2fb485d39 google::LogMessageFatal::~LogMessageFatal() @ 0x4e3f98 _CheckFatal::~_CheckFatal() @ 0x82f25a mesos::internal::tests::ContainerizerTest::TearDown() @ 0xc030e3 testing::internal::HandleExceptionsInMethodIfSupported() @ 0xbf9050 testing::Test::Run() @ 0xbf912e testing::TestInfo::Run() @ 0xbf9235 testing::TestCase::Run() @ 0xbf94e8 testing::internal::UnitTestImpl::RunAllTests() @ 0xbf97a4 testing::UnitTest::Run() @ 0x4a9df3 main @ 0x7fb2f9371ec5 (unknown) @ 0x4b63ee (unknown) Build step 'Execute shell' marked build as failure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3254) Cgroup CHECK fails test harness
[ https://issues.apache.org/jira/browse/MESOS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-3254: -- Sprint: Twitter Mesos Q3 Sprint 3 Story Points: 2 Description: CHECK in clean up of ContainerizerTest causes test harness to abort rather than fail or skip only perf related tests. [ RUN ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch [ OK ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch (628 ms) [--] 24 tests from SlaveRecoveryTest/0 (38986 ms total) [--] 4 tests from MesosContainerizerSlaveRecoveryTest [ RUN ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics ../../src/tests/mesos.cpp:720: Failure cgroups::mount(hierarchy, subsystem): 'perf_event' is already attached to another hierarchy - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/tmp/mesos_test_cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-MesosContainerizerSlaveRecoveryTest.*). 
- F0811 17:23:43.874696 12955 mesos.cpp:774] CHECK_SOME(cgroups): '/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy *** Check failure stack trace: *** @ 0x7fb2fb4835fd google::LogMessage::Fail() @ 0x7fb2fb48543d google::LogMessage::SendToLog() @ 0x7fb2fb4831ec google::LogMessage::Flush() @ 0x7fb2fb485d39 google::LogMessageFatal::~LogMessageFatal() @ 0x4e3f98 _CheckFatal::~_CheckFatal() @ 0x82f25a mesos::internal::tests::ContainerizerTest::TearDown() @ 0xc030e3 testing::internal::HandleExceptionsInMethodIfSupported() @ 0xbf9050 testing::Test::Run() @ 0xbf912e testing::TestInfo::Run() @ 0xbf9235 testing::TestCase::Run() @ 0xbf94e8 testing::internal::UnitTestImpl::RunAllTests() @ 0xbf97a4 testing::UnitTest::Run() @ 0x4a9df3 main @ 0x7fb2f9371ec5 (unknown) @ 0x4b63ee (unknown) Build step 'Execute shell' marked build as failure was: CHECK in clean up of ContainerizerTest causes test harness to abort rather than fail or skip only perf related tests. [ RUN ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch [ OK ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch (628 ms) [--] 24 tests from SlaveRecoveryTest/0 (38986 ms total) [--] 4 tests from MesosContainerizerSlaveRecoveryTest [ RUN ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics ../../src/tests/mesos.cpp:720: Failure cgroups::mount(hierarchy, subsystem): 'perf_event' is already attached to another hierarchy - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/tmp/mesos_test_cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-MesosContainerizerSlaveRecoveryTest.*). 
- F0811 17:23:43.874696 12955 mesos.cpp:774] CHECK_SOME(cgroups): '/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy *** Check failure stack trace: *** @ 0x7fb2fb4835fd google::LogMessage::Fail() @ 0x7fb2fb48543d google::LogMessage::SendToLog() @ 0x7fb2fb4831ec google::LogMessage::Flush() @ 0x7fb2fb485d39 google::LogMessageFatal::~LogMessageFatal() @ 0x4e3f98 _CheckFatal::~_CheckFatal() @ 0x82f25a mesos::internal::tests::ContainerizerTest::TearDown() @ 0xc030e3 testing::internal::HandleExceptionsInMethodIfSupported() @ 0xbf9050 testing::Test::Run() @ 0xbf912e testing::TestInfo::Run() @ 0xbf9235 testing::TestCase::Run() @ 0xbf94e8 testing::internal::UnitTestImpl::RunAllTests() @ 0xbf97a4 testing::UnitTest::Run() @ 0x4a9df3 main @ 0x7fb2f9371ec5 (unknown) @ 0x4b63ee (unknown) Build step 'Execute shell' marked build as failure Cgroup CHECK fails test harness --- Key: MESOS-3254 URL: https://issues.apache.org/jira/browse/MESOS-3254 Project: Mesos Issue Type: Bug Components: test Reporter: Paul Brett CHECK in clean up of ContainerizerTest causes test harness to abort rather than fail or skip only perf related tests. [ RUN ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch [ OK ]
[jira] [Created] (MESOS-3257) Zookeeper JVM test failure causes test harness to fail
Paul Brett created MESOS-3257: - Summary: Zookeeper JVM test failure causes test harness to fail Key: MESOS-3257 URL: https://issues.apache.org/jira/browse/MESOS-3257 Project: Mesos Issue Type: Bug Reporter: Paul Brett A failure in the ZooKeeper JVM test setup causes the test harness to exit, preventing subsequent tests from running. {code} [--] 2 tests from LogZooKeeperTest F0813 16:09:33.647265 13790 zookeeper.cpp:78] CHECK_SOME(jvm): Error looking up symbol 'JNI_CreateJavaVM' in '' : /home/pbrett/sandbox/perf.refactor2/build/src/.libs/mesos-tests: undefined symbol: JNI_CreateJavaVM *** Check failure stack trace: *** @ 0x7f2d8cca7aac google::LogMessage::Fail() @ 0x7f2d8cca79fb google::LogMessage::SendToLog() @ 0x7f2d8cca740c google::LogMessage::Flush() @ 0x7f2d8ccaa140 google::LogMessageFatal::~LogMessageFatal() @ 0x8a938c _CheckFatal::~_CheckFatal() @ 0x12f68c0 mesos::internal::tests::ZooKeeperTest::SetUpTestCase() @ 0x132a88a testing::TestCase::RunSetUpTestCase() @ 0x1334cf7 testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x132fb94 testing::internal::HandleExceptionsInMethodIfSupported() @ 0x1311635 testing::TestCase::Run() @ 0x1317fca testing::internal::UnitTestImpl::RunAllTests() @ 0x1335427 testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x1330128 testing::internal::HandleExceptionsInMethodIfSupported() @ 0x1316cf0 testing::UnitTest::Run() @ 0xc3a9d8 RUN_ALL_TESTS() @ 0xc3a6c8 main @ 0x7f2d8818d9f4 __libc_start_main @ 0x8a5fa9 (unknown) make[3]: *** [check-local] Aborted {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3254) Cgroup CHECK fails test harness
[ https://issues.apache.org/jira/browse/MESOS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-3254: -- Description: CHECK in clean up of ContainerizerTest causes test harness to abort rather than fail or skip only perf related tests. [ RUN ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch [ OK ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch (628 ms) [--] 24 tests from SlaveRecoveryTest/0 (38986 ms total) [--] 4 tests from MesosContainerizerSlaveRecoveryTest [ RUN ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics ../../src/tests/mesos.cpp:720: Failure cgroups::mount(hierarchy, subsystem): 'perf_event' is already attached to another hierarchy - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/tmp/mesos_test_cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-MesosContainerizerSlaveRecoveryTest.*). 
- F0811 17:23:43.874696 12955 mesos.cpp:774] CHECK_SOME(cgroups): '/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy *** Check failure stack trace: *** @ 0x7fb2fb4835fd google::LogMessage::Fail() @ 0x7fb2fb48543d google::LogMessage::SendToLog() @ 0x7fb2fb4831ec google::LogMessage::Flush() @ 0x7fb2fb485d39 google::LogMessageFatal::~LogMessageFatal() @ 0x4e3f98 _CheckFatal::~_CheckFatal() @ 0x82f25a mesos::internal::tests::ContainerizerTest::TearDown() @ 0xc030e3 testing::internal::HandleExceptionsInMethodIfSupported() @ 0xbf9050 testing::Test::Run() @ 0xbf912e testing::TestInfo::Run() @ 0xbf9235 testing::TestCase::Run() @ 0xbf94e8 testing::internal::UnitTestImpl::RunAllTests() @ 0xbf97a4 testing::UnitTest::Run() @ 0x4a9df3 main @ 0x7fb2f9371ec5 (unknown) @ 0x4b63ee (unknown) Build step 'Execute shell' marked build as failure was: CHECK in clean up of ContainerizerTest causes test harness to abort rather than fail or skip only perf related tests. [ RUN ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch [ OK ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch (628 ms) [--] 24 tests from SlaveRecoveryTest/0 (38986 ms total) [--] 4 tests from MesosContainerizerSlaveRecoveryTest [ RUN ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics ../../src/tests/mesos.cpp:720: Failure cgroups::mount(hierarchy, subsystem): 'perf_event' is already attached to another hierarchy - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/tmp/mesos_test_cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-MesosContainerizerSlaveRecoveryTest.*). 
- F0811 17:23:43.874696 12955 mesos.cpp:774] CHECK_SOME(cgroups): '/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy *** Check failure stack trace: *** @ 0x7fb2fb4835fd google::LogMessage::Fail() @ 0x7fb2fb48543d google::LogMessage::SendToLog() @ 0x7fb2fb4831ec google::LogMessage::Flush() @ 0x7fb2fb485d39 google::LogMessageFatal::~LogMessageFatal() @ 0x4e3f98 _CheckFatal::~_CheckFatal() @ 0x82f25a mesos::internal::tests::ContainerizerTest::TearDown() @ 0xc030e3 testing::internal::HandleExceptionsInMethodIfSupported() @ 0xbf9050 testing::Test::Run() @ 0xbf912e testing::TestInfo::Run() @ 0xbf9235 testing::TestCase::Run() @ 0xbf94e8 testing::internal::UnitTestImpl::RunAllTests() @ 0xbf97a4 testing::UnitTest::Run() @ 0x4a9df3 main @ 0x7fb2f9371ec5 (unknown) @ 0x4b63ee (unknown) Build step 'Execute shell' marked build as failure Cgroup CHECK fails test harness --- Key: MESOS-3254 URL: https://issues.apache.org/jira/browse/MESOS-3254 Project: Mesos Issue Type: Bug Components: test Reporter: Paul Brett CHECK in clean up of ContainerizerTest causes test harness to abort rather than fail or skip only perf related tests. [ RUN ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch [ OK ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch (628 ms) [--] 24 tests from SlaveRecoveryTest/0
[jira] [Commented] (MESOS-3185) Refactor Subprocess logic in linux/perf.cpp to use common subroutine
[ https://issues.apache.org/jira/browse/MESOS-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694498#comment-14694498 ] Paul Brett commented on MESOS-3185: --- Updated, reviews are: https://reviews.apache.org/r/37423/ https://reviews.apache.org/r/37424/ https://reviews.apache.org/r/37417/ https://reviews.apache.org/r/37416/ Refactor Subprocess logic in linux/perf.cpp to use common subroutine Key: MESOS-3185 URL: https://issues.apache.org/jira/browse/MESOS-3185 Project: Mesos Issue Type: Bug Components: slave Reporter: Paul Brett Assignee: Paul Brett MESOS-2834 will enhance the perf isolator to support the different output formats provided by different kernel versions. In order to achieve this, it needs to execute the perf --version command. We should decompose the existing Subcommand processing in perf so that we can share the implementation between the multiple uses of perf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3253) Add pid to network helper error messages
Paul Brett created MESOS-3253: - Summary: Add pid to network helper error messages Key: MESOS-3253 URL: https://issues.apache.org/jira/browse/MESOS-3253 Project: Mesos Issue Type: Bug Reporter: Paul Brett Network helper logs errors to stderr without the associated namespace pid or container id which prevents the errors from being associated with the appropriate container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3252) Ignore no statistics condition for containers with no qdisc
Paul Brett created MESOS-3252: - Summary: Ignore no statistics condition for containers with no qdisc Key: MESOS-3252 URL: https://issues.apache.org/jira/browse/MESOS-3252 Project: Mesos Issue Type: Bug Components: isolation Reporter: Paul Brett In PortMappingStatistics::execute, we log the following errors to stderr if the egress rate limiting qdiscs are not configured inside the container. {code} Failed to get the network statistics for the htb qdisc on eth0 Failed to get the network statistics for the fq_codel qdisc on eth0 {code} This can occur because of an error reading the qdisc (the statistics function returns an error) or because the qdisc does not exist (the function returns none). We should not log an error when the qdisc does not exist since this is normal behaviour if the container is created without rate limiting. We do not want to gate this function on the slave rate limiting flag since we would have to compare the behaviour against the flag value at the time the container was created. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
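The distinction described in MESOS-3252 (a read error versus a qdisc that simply does not exist) can be sketched as follows. This is only an illustration, not the code in PortMappingStatistics::execute: the `Outcome` enum, the `QdiscLookup` struct and the `errorToLog` helper are hypothetical names standing in for whatever three-state result type the real statistics functions return.

```cpp
#include <string>

// Hypothetical three-state outcome of a qdisc statistics lookup:
// statistics were found, the qdisc does not exist, or reading it failed.
enum class Outcome { SOME, NONE, FAILED };

struct QdiscLookup
{
  Outcome outcome;
  std::string error;  // Only meaningful when outcome == Outcome::FAILED.
};

// Returns true and fills 'message' only for genuine read errors.
// A missing qdisc (NONE) is expected for containers created without
// rate limiting, so it produces no log message.
bool errorToLog(
    const QdiscLookup& lookup,
    const std::string& qdisc,
    const std::string& link,
    std::string* message)
{
  if (lookup.outcome != Outcome::FAILED) {
    return false;
  }

  *message = "Failed to get the network statistics for the " + qdisc +
             " qdisc on " + link + ": " + lookup.error;
  return true;
}
```

With this shape the caller never needs to consult the slave's rate limiting flag: the absence of the qdisc is itself the signal that there is nothing to report.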
[jira] [Created] (MESOS-3254) Cgroup CHECK fails test harness
Paul Brett created MESOS-3254: - Summary: Cgroup CHECK fails test harness Key: MESOS-3254 URL: https://issues.apache.org/jira/browse/MESOS-3254 Project: Mesos Issue Type: Bug Components: test Reporter: Paul Brett CHECK in clean up of ContainerizerTest causes test harness to abort rather than fail or skip only perf related tests. [ RUN ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch [ OK ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch (628 ms) [--] 24 tests from SlaveRecoveryTest/0 (38986 ms total) [--] 4 tests from MesosContainerizerSlaveRecoveryTest [ RUN ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics ../../src/tests/mesos.cpp:720: Failure cgroups::mount(hierarchy, subsystem): 'perf_event' is already attached to another hierarchy - We cannot run any cgroups tests that require a hierarchy with subsystem 'perf_event' because we failed to find an existing hierarchy or create a new one (tried '/tmp/mesos_test_cgroup/perf_event'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-MesosContainerizerSlaveRecoveryTest.*). 
- F0811 17:23:43.874696 12955 mesos.cpp:774] CHECK_SOME(cgroups): '/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy *** Check failure stack trace: *** @ 0x7fb2fb4835fd google::LogMessage::Fail() @ 0x7fb2fb48543d google::LogMessage::SendToLog() @ 0x7fb2fb4831ec google::LogMessage::Flush() @ 0x7fb2fb485d39 google::LogMessageFatal::~LogMessageFatal() @ 0x4e3f98 _CheckFatal::~_CheckFatal() @ 0x82f25a mesos::internal::tests::ContainerizerTest::TearDown() @ 0xc030e3 testing::internal::HandleExceptionsInMethodIfSupported() @ 0xbf9050 testing::Test::Run() @ 0xbf912e testing::TestInfo::Run() @ 0xbf9235 testing::TestCase::Run() @ 0xbf94e8 testing::internal::UnitTestImpl::RunAllTests() @ 0xbf97a4 testing::UnitTest::Run() @ 0x4a9df3 main @ 0x7fb2f9371ec5 (unknown) @ 0x4b63ee (unknown) Build step 'Execute shell' marked build as failure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2994) Design doc for creating user namespaces inside containers
[ https://issues.apache.org/jira/browse/MESOS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2994: -- Sprint: Twitter Mesos Q3 Sprint 1, Twitter Mesos Q3 Sprint 2 (was: Twitter Mesos Q3 Sprint 1, Twitter Mesos Q3 Sprint 2, Twitter Mesos Q3 Sprint 3) Design doc for creating user namespaces inside containers - Key: MESOS-2994 URL: https://issues.apache.org/jira/browse/MESOS-2994 Project: Mesos Issue Type: Improvement Reporter: Paul Brett Assignee: Paul Brett Labels: twitter -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3204) PortMappingIsolatorProcess shell script can silently fail
Paul Brett created MESOS-3204: - Summary: PortMappingIsolatorProcess shell script can silently fail Key: MESOS-3204 URL: https://issues.apache.org/jira/browse/MESOS-3204 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.24.0 Reporter: Paul Brett Assignee: Paul Brett PortMappingIsolatorProcess::scripts generates a shell script to configure the target environment but does not set the shell '-e' flag. Hence errors generated by the script will be silently ignored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
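As a rough illustration of the fix suggested in MESOS-3204, a generated script can start with `set -e` so that the shell exits on the first failing command. The `makeScript` helper below is hypothetical and is not the body of PortMappingIsolatorProcess::scripts; it only shows where the flag would go.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: joins commands into a shell script that aborts on
// the first command returning a non-zero exit status.
std::string makeScript(const std::vector<std::string>& commands)
{
  std::ostringstream script;
  script << "#!/bin/sh\n";
  script << "set -e\n";  // Exit immediately if a command fails.

  for (const std::string& command : commands) {
    script << command << "\n";
  }

  return script.str();
}
```

Note that `set -e` does not cover every case: a failing command used as an `if` condition or on the left of `||` or `&&` does not terminate the script, so commands whose failure matters should not be wrapped that way.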
[jira] [Commented] (MESOS-3204) PortMappingIsolatorProcess shell script can silently fail
[ https://issues.apache.org/jira/browse/MESOS-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654606#comment-14654606 ] Paul Brett commented on MESOS-3204: --- https://reviews.apache.org/r/37106 PortMappingIsolatorProcess shell script can silently fail - Key: MESOS-3204 URL: https://issues.apache.org/jira/browse/MESOS-3204 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.24.0 Reporter: Paul Brett Assignee: Paul Brett PortMappingIsolatorProcess::scripts generates a shell script to configure the target environment but does not set the shell '-e' flag. Hence errors generated by the script will be silently ignored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3185) Refactor Subprocess logic in linux/perf.cpp to use common subroutine
[ https://issues.apache.org/jira/browse/MESOS-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649943#comment-14649943 ] Paul Brett commented on MESOS-3185: --- [~bmahler] suggested looking at process::await as a way to simplify the current code, so I am pulling the review while I take a look at this. Refactor Subprocess logic in linux/perf.cpp to use common subroutine Key: MESOS-3185 URL: https://issues.apache.org/jira/browse/MESOS-3185 Project: Mesos Issue Type: Bug Components: slave Reporter: Paul Brett Assignee: Paul Brett MESOS-2834 will enhance the perf isolator to support the different output formats provided by different kernel versions. In order to achieve this, it needs to execute the perf --version command. We should decompose the existing Subcommand processing in perf so that we can share the implementation between the multiple uses of perf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3185) Refactor Subprocess logic in linux/perf.cpp to use common subroutine
[ https://issues.apache.org/jira/browse/MESOS-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649885#comment-14649885 ] Paul Brett commented on MESOS-3185: --- I was working with Marco for a while trying to do that and have held off this change to see where it is going, but need to move forward with the change. My hope is that we can get the stout version to support what I need and then I will fix perf to use his code. Refactor Subprocess logic in linux/perf.cpp to use common subroutine Key: MESOS-3185 URL: https://issues.apache.org/jira/browse/MESOS-3185 Project: Mesos Issue Type: Bug Components: slave Reporter: Paul Brett Assignee: Paul Brett MESOS-2834 will enhance the perf isolator to support the different output formats provided by different kernel versions. In order to achieve this, it needs to execute the perf --version command. We should decompose the existing Subcommand processing in perf so that we can share the implementation between the multiple uses of perf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3185) Refactor Subprocess logic in linux/perf.cpp to use common subroutine
[ https://issues.apache.org/jira/browse/MESOS-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649886#comment-14649886 ] Paul Brett commented on MESOS-3185: --- Added review https://reviews.apache.org/r/37000/ Refactor Subprocess logic in linux/perf.cpp to use common subroutine Key: MESOS-3185 URL: https://issues.apache.org/jira/browse/MESOS-3185 Project: Mesos Issue Type: Bug Components: slave Reporter: Paul Brett Assignee: Paul Brett MESOS-2834 will enhance the perf isolator to support the different output formats provided by different kernel versions. In order to achieve this, it needs to execute the perf --version command. We should decompose the existing Subcommand processing in perf so that we can share the implementation between the multiple uses of perf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3185) Refactor Subprocess logic in linux/perf.cpp to use common subroutine
Paul Brett created MESOS-3185: - Summary: Refactor Subprocess logic in linux/perf.cpp to use common subroutine Key: MESOS-3185 URL: https://issues.apache.org/jira/browse/MESOS-3185 Project: Mesos Issue Type: Bug Components: slave Reporter: Paul Brett Assignee: Paul Brett MESOS-2834 will enhance the perf isolator to support the different output formats provided by different kernel versions. In order to achieve this, it needs to execute the perf --version command. We should decompose the existing Subcommand processing in perf so that we can share the implementation between the multiple uses of perf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
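One piece a shared perf subroutine would need is turning the text printed by `perf --version` into something comparable. The sketch below assumes the output has the form `perf version <major>.<minor>...`; that format, the `PerfVersion` struct and the `parseVersion` helper are assumptions made for illustration and are not taken from linux/perf.cpp.

```cpp
#include <cstdio>
#include <string>

// Hypothetical holder for the leading two components of the perf version.
struct PerfVersion
{
  int majorVersion;
  int minorVersion;
};

// Parses output assumed to look like "perf version 3.14.1".
// Returns false if the text does not match that assumed shape.
bool parseVersion(const std::string& output, PerfVersion* version)
{
  int major = 0;
  int minor = 0;

  // sscanf returns the number of fields converted; both are required.
  if (std::sscanf(output.c_str(), "perf version %d.%d", &major, &minor) != 2) {
    return false;
  }

  version->majorVersion = major;
  version->minorVersion = minor;
  return true;
}
```

Keeping the parsing separate from the code that launches the subprocess means the same launcher can be reused for both the version query and the sampling command, which is the sharing the ticket asks for.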
[jira] [Commented] (MESOS-3175) subprocess_tests.cpp:598 delete used but allocated with new[]
[ https://issues.apache.org/jira/browse/MESOS-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648033#comment-14648033 ] Paul Brett commented on MESOS-3175: --- https://reviews.apache.org/r/36947/ subprocess_tests.cpp:598 delete used but allocated with new[] - Key: MESOS-3175 URL: https://issues.apache.org/jira/browse/MESOS-3175 Project: Mesos Issue Type: Bug Components: libprocess Affects Versions: 0.24.0 Reporter: Paul Brett Assignee: Paul Brett Compiler detected error: ../3rdparty/libprocess/src/tests/subprocess_tests.cpp|619 col 3| warning: 'delete' applied to a pointer that was allocated with 'new[]'; did you mean 'delete[]'? [-Wmismatched-new-delete] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3175) subprocess_tests.cpp:598 delete used but allocated with new[]
Paul Brett created MESOS-3175: - Summary: subprocess_tests.cpp:598 delete used but allocated with new[] Key: MESOS-3175 URL: https://issues.apache.org/jira/browse/MESOS-3175 Project: Mesos Issue Type: Bug Components: libprocess Affects Versions: 0.24.0 Reporter: Paul Brett Assignee: Paul Brett Compiler detected error: ../3rdparty/libprocess/src/tests/subprocess_tests.cpp|619 col 3| warning: 'delete' applied to a pointer that was allocated with 'new[]'; did you mean 'delete[]'? [-Wmismatched-new-delete] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
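For reference, the class of bug flagged by -Wmismatched-new-delete looks like the following. This is a generic sketch, not the code from subprocess_tests.cpp; both function names are made up for the example.

```cpp
#include <cstring>
#include <memory>
#include <string>

// Manual management: memory obtained from new[] must be released with
// delete[]. Releasing it with plain 'delete' is undefined behaviour.
std::string copyWithRawArray(const std::string& input)
{
  char* buffer = new char[input.size() + 1];
  std::memcpy(buffer, input.c_str(), input.size() + 1);

  std::string result(buffer);

  delete[] buffer;  // Matches the new[] above.
  return result;
}

// std::unique_ptr<char[]> selects delete[] automatically, which removes
// the opportunity for the mismatch altogether.
std::string copyWithUniquePtr(const std::string& input)
{
  std::unique_ptr<char[]> buffer(new char[input.size() + 1]);
  std::memcpy(buffer.get(), input.c_str(), input.size() + 1);

  return std::string(buffer.get());
}
```

Where the buffer is only needed as a contiguous block of characters, a std::string or std::vector<char> would avoid the explicit allocation as well.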
[jira] [Created] (MESOS-3160) CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS Flaky
Paul Brett created MESOS-3160: - Summary: CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS Flaky Key: MESOS-3160 URL: https://issues.apache.org/jira/browse/MESOS-3160 Project: Mesos Issue Type: Bug Affects Versions: 0.24.0 Reporter: Paul Brett Test will occasionally fail with: [ RUN ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure helper.increaseRSS(getpagesize()): Failed to sync with the subprocess ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure helper.increaseRSS(getpagesize()): The subprocess has not been spawned yet [ FAILED ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS (223 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2834) Support different perf output formats
[ https://issues.apache.org/jira/browse/MESOS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2834: -- Sprint: Twitter Mesos Q2 Sprint 6, Twitter Mesos Q3 Sprint 1, Twitter Mesos Q3 Sprint 2 (was: Twitter Mesos Q2 Sprint 6, Twitter Mesos Q3 Sprint 1) Support different perf output formats - Key: MESOS-2834 URL: https://issues.apache.org/jira/browse/MESOS-2834 Project: Mesos Issue Type: Improvement Components: isolation Reporter: Ian Downes Assignee: Paul Brett Labels: twitter The output format of perf changes in 3.14 (inserting an additional field) and again in 4.1 (appending additional fields). See kernel commits: 410136f5dd96b6013fe6d1011b523b1c247e1ccb d73515c03c6a2706e088094ff6095a3abefd398b Update the perf::parse() function to understand all these formats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
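A parser that copes with the three formats mentioned in MESOS-2834 might be structured along these lines. The field layouts used here are assumptions for illustration only: `value,event,cgroup` for the oldest format, one extra field inserted after the value for 3.14, and further fields appended after the cgroup for 4.1. The ticket does not spell the layouts out, and `Format`, `Sample`, `split` and `parseLine` are hypothetical names rather than the real perf::parse() interface.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical labels for the three output formats named in the ticket.
enum class Format { LEGACY, WITH_INSERTED_FIELD, WITH_APPENDED_FIELDS };

struct Sample
{
  std::string value;
  std::string event;
  std::string cgroup;
};

// Splits a line on 'delimiter', keeping empty fields between delimiters.
std::vector<std::string> split(const std::string& line, char delimiter)
{
  std::vector<std::string> tokens;
  std::string token;
  std::istringstream stream(line);

  while (std::getline(stream, token, delimiter)) {
    tokens.push_back(token);
  }

  return tokens;
}

// Extracts value, event and cgroup according to the assumed layout of the
// given format. Returns false when the field count does not fit.
bool parseLine(const std::string& line, Format format, Sample* sample)
{
  const std::vector<std::string> tokens = split(line, ',');

  switch (format) {
    case Format::LEGACY:
      if (tokens.size() != 3) { return false; }
      *sample = Sample{tokens[0], tokens[1], tokens[2]};
      return true;
    case Format::WITH_INSERTED_FIELD:
      if (tokens.size() != 4) { return false; }
      *sample = Sample{tokens[0], tokens[2], tokens[3]};
      return true;
    case Format::WITH_APPENDED_FIELDS:
      if (tokens.size() < 4) { return false; }
      *sample = Sample{tokens[0], tokens[2], tokens[3]};
      return true;
  }

  return false;
}
```

Selecting the format explicitly, for example from the perf version, avoids guessing from the number of fields alone, which would be ambiguous if a later release changed the count again.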
[jira] [Commented] (MESOS-3050) Failing Docker/Cgroups/Volume tests in 0.23.0-rc3 on CentOS 7.1
[ https://issues.apache.org/jira/browse/MESOS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630212#comment-14630212 ] Paul Brett commented on MESOS-3050: --- PerfEventIsolatorTest is due to incompatible perf output version, will be fixed by MESOS-2834. Failing Docker/Cgroups/Volume tests in 0.23.0-rc3 on CentOS 7.1 --- Key: MESOS-3050 URL: https://issues.apache.org/jira/browse/MESOS-3050 Project: Mesos Issue Type: Bug Components: containerization, docker, test Affects Versions: 0.23.0 Environment: CentOS Linux release 7.1.1503 0.23.0-rc3 Reporter: Adam B Assignee: Timothy Chen {code} [ RUN ] DockerTest.ROOT_DOCKER_CheckPortResource ../../src/tests/docker_tests.cpp:303: Failure (run).failure(): Container exited on error: exited with status 1 [ FAILED ] DockerTest.ROOT_DOCKER_CheckPortResource (709 ms) {code} ... {code} [ RUN ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample ../../src/tests/isolator_tests.cpp:837: Failure isolator: Failed to create PerfEvent isolator, invalid events: { cycles, task-clock } [ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample (9 ms) [--] 1 test from PerfEventIsolatorTest (9 ms total) [--] 2 tests from SharedFilesystemIsolatorTest [ RUN ] SharedFilesystemIsolatorTest.ROOT_RelativeVolume + mount -n --bind /tmp/SharedFilesystemIsolatorTest_ROOT_RelativeVolume_4yTEAC/var/tmp /var/tmp + touch /var/tmp/492407e1-5dec-4b34-8f2f-130430f41aac ../../src/tests/isolator_tests.cpp:1001: Failure Value of: os::exists(file) Actual: true Expected: false [ FAILED ] SharedFilesystemIsolatorTest.ROOT_RelativeVolume (92 ms) [ RUN ] SharedFilesystemIsolatorTest.ROOT_AbsoluteVolume + mount -n --bind /tmp/SharedFilesystemIsolatorTest_ROOT_AbsoluteVolume_OwYrXK /var/tmp + touch /var/tmp/7de712aa-52eb-4976-b0f9-32b6a006418d ../../src/tests/isolator_tests.cpp:1086: Failure Value of: os::exists(path::join(containerPath, filename)) Actual: true Expected: false [ FAILED ] SharedFilesystemIsolatorTest.ROOT_AbsoluteVolume (100 
ms) {code} ... {code} [--] 1 test from UserCgroupIsolatorTest/0, where TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup -bash: /sys/fs/cgroup/blkio/user.slice/cgroup.procs: Permission denied mkdir: cannot create directory ‘/sys/fs/cgroup/blkio/user.slice/user’: Permission denied ../../src/tests/isolator_tests.cpp:1274: Failure Value of: os::system( su - + UNPRIVILEGED_USERNAME + -c 'mkdir + path::join(flags.cgroups_hierarchy, userCgroup) + ') Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/blkio/user.slice/user/cgroup.procs: No such file or directory ../../src/tests/isolator_tests.cpp:1283: Failure Value of: os::system( su - + UNPRIVILEGED_USERNAME + -c 'echo $$ + path::join(flags.cgroups_hierarchy, userCgroup, cgroup.procs) + ') Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/memory/mesos/bbf8c8f0-3d67-40df-a269-b3dc6a9597aa/cgroup.procs: Permission denied -bash: /sys/fs/cgroup/cpuacct,cpu/user.slice/cgroup.procs: No such file or directory mkdir: cannot create directory ‘/sys/fs/cgroup/cpuacct,cpu/user.slice/user’: No such file or directory ../../src/tests/isolator_tests.cpp:1274: Failure Value of: os::system( su - + UNPRIVILEGED_USERNAME + -c 'mkdir + path::join(flags.cgroups_hierarchy, userCgroup) + ') Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/cpuacct,cpu/user.slice/user/cgroup.procs: No such file or directory ../../src/tests/isolator_tests.cpp:1283: Failure Value of: os::system( su - + UNPRIVILEGED_USERNAME + -c 'echo $$ + path::join(flags.cgroups_hierarchy, userCgroup, cgroup.procs) + ') Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/name=systemd/user.slice/user-2004.slice/session-3865.scope/cgroup.procs: No such file or directory mkdir: cannot create directory ‘/sys/fs/cgroup/name=systemd/user.slice/user-2004.slice/session-3865.scope/user’: No such file or directory ../../src/tests/isolator_tests.cpp:1274: Failure Value of: 
os::system( su - + UNPRIVILEGED_USERNAME + -c 'mkdir + path::join(flags.cgroups_hierarchy, userCgroup) + ') Actual: 256 Expected: 0 -bash: /sys/fs/cgroup/name=systemd/user.slice/user-2004.slice/session-3865.scope/user/cgroup.procs: No such file or directory ../../src/tests/isolator_tests.cpp:1283: Failure Value of: os::system( su - + UNPRIVILEGED_USERNAME + -c 'echo $$ + path::join(flags.cgroups_hierarchy, userCgroup, cgroup.procs) + ') Actual: 256 Expected: 0 [ FAILED ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where TypeParam =
[jira] [Commented] (MESOS-3035) As a Developer I would like a standard way to run a Subprocess in libprocess
[ https://issues.apache.org/jira/browse/MESOS-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630375#comment-14630375 ] Paul Brett commented on MESOS-3035: --- Removed MESOS-2834 as a dependent by implementing a revised version of this code. [~marco-mesos] you might want to take a look at https://reviews.apache.org/r/36378 to see what I was thinking. As a Developer I would like a standard way to run a Subprocess in libprocess Key: MESOS-3035 URL: https://issues.apache.org/jira/browse/MESOS-3035 Project: Mesos Issue Type: Story Components: libprocess Reporter: Marco Massenzio Assignee: Marco Massenzio As part of MESOS-2830 and MESOS-2902 I have been researching the ability to run a {{Subprocess}} and capture the {{stdout / stderr}} along with the exit status code. {{process::subprocess()}} offers much of the functionality, but in a way that still requires a lot of handiwork on the developer's part; we would like to further abstract away the ability to just pass a string, an optional set of command-line arguments and then collect the output of the command (bonus: without blocking). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
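To make the request in MESOS-3035 concrete, here is a minimal sketch of the "pass a string, collect the output and exit status" idea. It is not the libprocess API and does not use {{process::subprocess()}}; it relies on POSIX popen()/pclose(), it blocks until the command finishes, and the names CommandResult and runCommand are hypothetical. Only stdout is captured; a caller who also wants stderr could append "2>&1" to the command.

```cpp
#include <cassert>
#include <stdio.h>
#include <string>
#include <sys/wait.h>

// Hypothetical result type: exit status (-1 on failure) plus captured stdout.
struct CommandResult
{
  int status;
  std::string out;
};

// Runs `command` through /bin/sh (via popen) and blocks until it exits.
CommandResult runCommand(const std::string& command)
{
  CommandResult result{-1, ""};

  FILE* pipe = popen(command.c_str(), "r");
  if (pipe == nullptr) {
    return result;
  }

  // Drain the child's stdout into the result string.
  char buffer[256];
  size_t n = 0;
  while ((n = fread(buffer, 1, sizeof(buffer), pipe)) > 0) {
    result.out.append(buffer, n);
  }

  // pclose() returns the wait status of the shell.
  const int rc = pclose(pipe);
  if (rc != -1 && WIFEXITED(rc)) {
    result.status = WEXITSTATUS(rc);
  }
  return result;
}

// Usage example.
void example()
{
  const CommandResult r = runCommand("echo hello");
  assert(r.status == 0);
  assert(r.out == "hello\n");
}
```

The non-blocking "bonus" asked for in the ticket would need a future-based interface instead of a blocking call, which this sketch does not attempt.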
[jira] [Updated] (MESOS-2834) Support different perf output formats
[ https://issues.apache.org/jira/browse/MESOS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2834: -- Shepherd: Jie Yu (was: Ian Downes) Support different perf output formats - Key: MESOS-2834 URL: https://issues.apache.org/jira/browse/MESOS-2834 Project: Mesos Issue Type: Improvement Components: isolation Reporter: Ian Downes Assignee: Paul Brett Labels: twitter The output format of perf changed in 3.14 (inserting an additional field) and again in 4.1 (appending additional fields). See kernel commits: 410136f5dd96b6013fe6d1011b523b1c247e1ccb d73515c03c6a2706e088094ff6095a3abefd398b Update the perf::parse() function to understand all these formats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
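One way to handle several output formats is to dispatch on the number of comma-separated fields in each line. The sketch below illustrates that idea only; it is not the actual perf::parse() implementation. The three field layouts (value,event,cgroup; value,unit,event,cgroup; and the same with two appended fields) are assumptions made for illustration and should be checked against the kernel commits cited in the ticket. PerfSample, splitFields and parseLine are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one parsed line of CSV-style perf output.
struct PerfSample
{
  bool valid;
  std::string value;
  std::string unit;   // Empty for the oldest assumed layout.
  std::string event;
  std::string cgroup;
};

// Splits a line on commas; empty fields in the middle are preserved.
std::vector<std::string> splitFields(const std::string& line)
{
  std::vector<std::string> fields;
  std::istringstream in(line);
  std::string field;
  while (std::getline(in, field, ',')) {
    fields.push_back(field);
  }
  return fields;
}

// Picks a layout based on the field count (layouts are ASSUMED, see above).
PerfSample parseLine(const std::string& line)
{
  const std::vector<std::string> f = splitFields(line);
  PerfSample sample{false, "", "", "", ""};

  switch (f.size()) {
    case 3:  // Assumed oldest layout: value,event,cgroup
      sample = PerfSample{true, f[0], "", f[1], f[2]};
      break;
    case 4:  // Assumed layout with one inserted field: value,unit,event,cgroup
    case 6:  // Assumed layout with two appended fields, ignored here
      sample = PerfSample{true, f[0], f[1], f[2], f[3]};
      break;
    default:  // Unknown layout: report as invalid rather than guess.
      break;
  }
  return sample;
}

// Usage example.
void example()
{
  assert(parseLine("1000,cycles,mesos/test").event == "cycles");
  assert(parseLine("1000,,cycles,mesos/test").event == "cycles");
}
```

Dispatching on field count avoids having to know the kernel version at parse time, at the cost of being ambiguous if two formats ever share the same number of fields.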
[jira] [Created] (MESOS-3029) Stout os::release returns unsortable version
Paul Brett created MESOS-3029: - Summary: Stout os::release returns unsortable version Key: MESOS-3029 URL: https://issues.apache.org/jira/browse/MESOS-3029 Project: Mesos Issue Type: Bug Reporter: Paul Brett Priority: Minor When the Linux kernel version was incremented from 2.6.39 to 3.0.0, it was discovered that a few applications could not process kernel versions starting with anything but 2.x. For compatibility, Red Hat and others mapped kernel version 3.n to 2.6.n+40. This introduces the interesting property that kernel 2.6.50 is later than kernel 3.9, but Version(2, 6, 50) is not greater than Version(3, 9, 0). Since we want to be able to order kernel versions, we need to undo this mapping. The following function is proposed for use in linux/perf.cpp to address this issue: Version canonicalLinuxRelease(const Version& v) { if ((v > Version(2, 6, 39)) && (v < Version(3, 0, 0))) { return Version(3, v.patchVersion - 40, 0); } return v; } We could either add this to stout/os.hpp or add a custom sort order to Version (which we might need later when we generalize it). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3026) ProcessTest.Cache fails and hangs
[ https://issues.apache.org/jira/browse/MESOS-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621123#comment-14621123 ] Paul Brett commented on MESOS-3026: --- I see the same hang on g++ 4.8, Ubuntu 14.04. ProcessTest.Cache fails and hangs - Key: MESOS-3026 URL: https://issues.apache.org/jira/browse/MESOS-3026 Project: Mesos Issue Type: Bug Components: libprocess Environment: ubuntu 15.04/ ubuntu 14.04.2 clang-3.6 / gcc 4.8.2 Reporter: Joris Van Remoortere Assignee: Alexander Rojas Labels: libprocess, tests {code} [ RUN ] ProcessTest.Cache ../../../3rdparty/libprocess/src/tests/process_tests.cpp:1726: Failure Value of: response.get().status Actual: 200 OK Expected: 304 Not Modified [ FAILED ] ProcessTest.Cache (1 ms) {code} The tests then finish running, but the gtest framework fails to terminate and uses 100% CPU. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2834) Support different perf output formats
[ https://issues.apache.org/jira/browse/MESOS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621430#comment-14621430 ] Paul Brett commented on MESOS-2834: --- Updated to address reviewer issues and reposted at: https://reviews.apache.org/r/36378/ https://reviews.apache.org/r/36380/ Support different perf output formats - Key: MESOS-2834 URL: https://issues.apache.org/jira/browse/MESOS-2834 Project: Mesos Issue Type: Improvement Components: isolation Reporter: Ian Downes Assignee: Paul Brett Labels: twitter The output format of perf changed in 3.14 (inserting an additional field) and again in 4.1 (appending additional fields). See kernel commits: 410136f5dd96b6013fe6d1011b523b1c247e1ccb d73515c03c6a2706e088094ff6095a3abefd398b Update the perf::parse() function to understand all these formats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2993) Document per container unique egress flow and network queueing statistics
[ https://issues.apache.org/jira/browse/MESOS-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619017#comment-14619017 ] Paul Brett commented on MESOS-2993: --- Completed review updates. [~adam-mesos] - could you please check whether the updates are OK and commit? Thanks. Document per container unique egress flow and network queueing statistics -- Key: MESOS-2993 URL: https://issues.apache.org/jira/browse/MESOS-2993 Project: Mesos Issue Type: Bug Components: documentation, isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Paul Brett Labels: twitter Document new network isolation capabilities in 0.23 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3020) Expose major, minor and patch components from stout Version
[ https://issues.apache.org/jira/browse/MESOS-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619515#comment-14619515 ] Paul Brett edited comment on MESOS-3020 at 7/8/15 10:48 PM: Patch available for review https://reviews.apache.org/r/36336 was (Author: pbrett): Patch available for review https://reviews.apache.org/r/36281/ Expose major, minor and patch components from stout Version - Key: MESOS-3020 URL: https://issues.apache.org/jira/browse/MESOS-3020 Project: Mesos Issue Type: Improvement Reporter: Paul Brett Assignee: Paul Brett Labels: twitter The stout Version class does not expose its version components, preventing computation on or manipulation of version information. The solution is to make major, minor and patch public. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3020) Expose major, minor and patch components from stout Version
[ https://issues.apache.org/jira/browse/MESOS-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619502#comment-14619502 ] Paul Brett commented on MESOS-3020: --- The need I have at the moment is in processing kernel versions. Red Hat uses kernel versions 2.6.40 upwards as an alias for the 3.0 series, so I want to write something like this: Try<Version> v = os::release; if ((v >= Version(2, 6, 40)) && (v < Version(3, 0, 0))) v = Version(3, v.minor-40, v.patch); Expose major, minor and patch components from stout Version - Key: MESOS-3020 URL: https://issues.apache.org/jira/browse/MESOS-3020 Project: Mesos Issue Type: Improvement Reporter: Paul Brett Assignee: Paul Brett Labels: twitter The stout Version class does not expose its version components, preventing computation on or manipulation of version information. The solution is to make major, minor and patch public. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3020) Expose major, minor and patch components from stout Version
[ https://issues.apache.org/jira/browse/MESOS-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619515#comment-14619515 ] Paul Brett commented on MESOS-3020: --- Patch available for review https://reviews.apache.org/r/36281/ Expose major, minor and patch components from stout Version - Key: MESOS-3020 URL: https://issues.apache.org/jira/browse/MESOS-3020 Project: Mesos Issue Type: Improvement Reporter: Paul Brett Assignee: Paul Brett Labels: twitter The stout Version class does not expose its version components, preventing computation on or manipulation of version information. The solution is to make major, minor and patch public. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3002) Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
Paul Brett created MESOS-3002: - Summary: Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator Key: MESOS-3002 URL: https://issues.apache.org/jira/browse/MESOS-3002 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.23.0 Reporter: Paul Brett The change in Option from get() to getOrElse() breaks the network isolator. Building with '../configure --with-network-isolator' generates the following error: ../../src/slave/containerizer/isolators/network/port_mapping.cpp: In static member function 'static Try<mesos::slave::Isolator*> mesos::internal::slave::PortMappingIsolatorProcess::create(const mesos::internal::slave::Flags&)': ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: error: no matching function for call to 'Option<std::basic_string<char> >::get(const char [1]) const' flags.resources.get(""), ^ ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: note: candidates are: In file included from ../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:26:0, from ../../3rdparty/libprocess/include/process/check.hpp:19, from ../../3rdparty/libprocess/include/process/collect.hpp:7, from ../../src/slave/containerizer/isolators/network/port_mapping.cpp:30: ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: const T& Option<T>::get() const [with T = std::basic_string<char>] const T& get() const { assert(isSome()); return t; } ^ ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: candidate expects 0 arguments, 1 provided ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: T& Option<T>::get() [with T = std::basic_string<char>] T& get() { assert(isSome()); return t; } ^ ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: candidate expects 0 arguments, 1 provided make[2]: *** [slave/containerizer/isolators/network/libmesos_no_3rdparty_la-port_mapping.lo] Error 1 make[2]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src' make[1]: *** [check] Error 2 make[1]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src' make: *** [check-recursive] Error 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
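The compiler error above arises because a call site still passes a default value to get() after that overload was renamed. A minimal sketch of the situation follows. The Option class here is a simplified, hypothetical stand-in for stout's Option<T> (it requires T to be default-constructible, which the real class may not), and resourcesOrEmpty is an illustrative helper, not code from port_mapping.cpp. The presumed repair at the failing call site is to call getOrElse("") instead of get("").

```cpp
#include <cassert>
#include <string>

// Simplified, hypothetical stand-in for stout's Option<T>.
template <typename T>
class Option
{
public:
  Option() : some(false), t() {}
  Option(const T& _t) : some(true), t(_t) {}

  bool isSome() const { return some; }

  // Takes no arguments: only valid when a value is present.
  const T& get() const { assert(isSome()); return t; }

  // The renamed overload: returns the fallback when no value is present.
  T getOrElse(const T& _t) const { return isSome() ? t : _t; }

private:
  bool some;
  T t;
};

// Illustrative call site: a default is supplied through getOrElse(),
// since get("") no longer matches any overload.
std::string resourcesOrEmpty(const Option<std::string>& resources)
{
  return resources.getOrElse("");
}
```

Keeping get() argument-free makes the "value must be present" contract explicit, which is presumably why the defaulting overload was given a separate name.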
[jira] [Updated] (MESOS-3002) Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
[ https://issues.apache.org/jira/browse/MESOS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-3002: -- Assignee: Mark Wang Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator Key: MESOS-3002 URL: https://issues.apache.org/jira/browse/MESOS-3002 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Mark Wang The change in Option from get() to getOrElse() breaks the network isolator. Building with '../configure --with-network-isolator' generates the following error: ../../src/slave/containerizer/isolators/network/port_mapping.cpp: In static member function 'static Try<mesos::slave::Isolator*> mesos::internal::slave::PortMappingIsolatorProcess::create(const mesos::internal::slave::Flags&)': ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: error: no matching function for call to 'Option<std::basic_string<char> >::get(const char [1]) const' flags.resources.get(""), ^ ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: note: candidates are: In file included from ../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:26:0, from ../../3rdparty/libprocess/include/process/check.hpp:19, from ../../3rdparty/libprocess/include/process/collect.hpp:7, from ../../src/slave/containerizer/isolators/network/port_mapping.cpp:30: ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: const T& Option<T>::get() const [with T = std::basic_string<char>] const T& get() const { assert(isSome()); return t; } ^ ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: candidate expects 0 arguments, 1 provided ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: T& Option<T>::get() [with T = std::basic_string<char>] T& get() { assert(isSome()); return t; } ^ ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: candidate expects 0 arguments, 1 provided make[2]: *** [slave/containerizer/isolators/network/libmesos_no_3rdparty_la-port_mapping.lo] Error 1 make[2]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src' make[1]: *** [check] Error 2 make[1]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src' make: *** [check-recursive] Error 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3002) Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
[ https://issues.apache.org/jira/browse/MESOS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617165#comment-14617165 ] Paul Brett commented on MESOS-3002: --- Mark, can you take a look at this? Thanks. Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator Key: MESOS-3002 URL: https://issues.apache.org/jira/browse/MESOS-3002 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Mark Wang The change in Option from get() to getOrElse() breaks the network isolator. Building with '../configure --with-network-isolator' generates the following error: ../../src/slave/containerizer/isolators/network/port_mapping.cpp: In static member function 'static Try<mesos::slave::Isolator*> mesos::internal::slave::PortMappingIsolatorProcess::create(const mesos::internal::slave::Flags&)': ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: error: no matching function for call to 'Option<std::basic_string<char> >::get(const char [1]) const' flags.resources.get(""), ^ ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: note: candidates are: In file included from ../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:26:0, from ../../3rdparty/libprocess/include/process/check.hpp:19, from ../../3rdparty/libprocess/include/process/collect.hpp:7, from ../../src/slave/containerizer/isolators/network/port_mapping.cpp:30: ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: const T& Option<T>::get() const [with T = std::basic_string<char>] const T& get() const { assert(isSome()); return t; } ^ ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: candidate expects 0 arguments, 1 provided ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: T& Option<T>::get() [with T = std::basic_string<char>] T& get() { assert(isSome()); return t; } ^ ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: candidate expects 0 arguments, 1 provided make[2]: *** [slave/containerizer/isolators/network/libmesos_no_3rdparty_la-port_mapping.lo] Error 1 make[2]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src' make[1]: *** [check] Error 2 make[1]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src' make: *** [check-recursive] Error 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2993) Document per container unique egress flow and network queueing statistics
[ https://issues.apache.org/jira/browse/MESOS-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617498#comment-14617498 ] Paul Brett commented on MESOS-2993: --- Review draft available at https://reviews.apache.org/r/36281/ Document per container unique egress flow and network queueing statistics -- Key: MESOS-2993 URL: https://issues.apache.org/jira/browse/MESOS-2993 Project: Mesos Issue Type: Bug Components: documentation, isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Paul Brett Labels: twitter Document new network isolation capabilities in 0.23 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3011) Publish release documentation for major releases on website
Paul Brett created MESOS-3011: - Summary: Publish release documentation for major releases on website Key: MESOS-3011 URL: https://issues.apache.org/jira/browse/MESOS-3011 Project: Mesos Issue Type: Documentation Reporter: Paul Brett Currently, the website only provides a single version of the documentation. We should publish documentation for each release on the website independently (for example as https://mesos.apache.org/documentation/0.22/index.html, https://mesos.apache.org/documentation/0.23/index.html) and make latest redirect to the current version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2993) Document per container unique egress flow and network queueing statistics
[ https://issues.apache.org/jira/browse/MESOS-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617770#comment-14617770 ] Paul Brett commented on MESOS-2993: --- Update incorporating reviewer comments. Document per container unique egress flow and network queueing statistics -- Key: MESOS-2993 URL: https://issues.apache.org/jira/browse/MESOS-2993 Project: Mesos Issue Type: Bug Components: documentation, isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Paul Brett Labels: twitter Document new network isolation capabilities in 0.23 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2952) Provide user namespaces for privileged access inside containers
[ https://issues.apache.org/jira/browse/MESOS-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2952: -- Issue Type: Epic (was: Bug) Provide user namespaces for privileged access inside containers --- Key: MESOS-2952 URL: https://issues.apache.org/jira/browse/MESOS-2952 Project: Mesos Issue Type: Epic Reporter: Paul Brett Assignee: Paul Brett User namespaces allow per-namespace mappings of user and group IDs. This means that a process's user and group IDs inside a user namespace can be different from its IDs outside of the namespace. Most notably, a process can have a nonzero user ID outside a namespace while at the same time having a user ID of zero inside the namespace; in other words, the process is unprivileged for operations outside the user namespace but has root privileges inside the namespace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2993) Document per container unique egress flow and network queueing statistics
Paul Brett created MESOS-2993: - Summary: Document per container unique egress flow and network queueing statistics Key: MESOS-2993 URL: https://issues.apache.org/jira/browse/MESOS-2993 Project: Mesos Issue Type: Bug Components: documentation, isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Paul Brett Document new network isolation capabilities in 0.23 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2994) Design doc for creating user namespaces inside containers
Paul Brett created MESOS-2994: - Summary: Design doc for creating user namespaces inside containers Key: MESOS-2994 URL: https://issues.apache.org/jira/browse/MESOS-2994 Project: Mesos Issue Type: Improvement Reporter: Paul Brett Assignee: Paul Brett -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2994) Design doc for creating user namespaces inside containers
[ https://issues.apache.org/jira/browse/MESOS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2994: -- Labels: twitter (was: ) Design doc for creating user namespaces inside containers - Key: MESOS-2994 URL: https://issues.apache.org/jira/browse/MESOS-2994 Project: Mesos Issue Type: Improvement Reporter: Paul Brett Assignee: Paul Brett Labels: twitter -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2993) Document per container unique egress flow and network queueing statistics
[ https://issues.apache.org/jira/browse/MESOS-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2993: -- Labels: twitter (was: ) Document per container unique egress flow and network queueing statistics -- Key: MESOS-2993 URL: https://issues.apache.org/jira/browse/MESOS-2993 Project: Mesos Issue Type: Bug Components: documentation, isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Paul Brett Labels: twitter Document new network isolation capabilities in 0.23 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2956) Stack trace in isolator tests on Linux VM
[ https://issues.apache.org/jira/browse/MESOS-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606156#comment-14606156 ] Paul Brett commented on MESOS-2956: --- https://reviews.apache.org/r/36014/ Stack trace in isolator tests on Linux VM - Key: MESOS-2956 URL: https://issues.apache.org/jira/browse/MESOS-2956 Project: Mesos Issue Type: Bug Reporter: Paul Brett Assignee: Paul Brett PerfEventIsolatorTest fails with stack trace when run in Linux VM [--] 1 test from PerfEventIsolatorTest [ RUN ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample F0629 11:38:17.088412 14114 isolator_tests.cpp:837] CHECK_SOME(isolator): Failed to create PerfEvent isolator, invalid events: { cycles, task-clock } *** Check failure stack trace: *** @ 0x2ab5e5aeeb1a google::LogMessage::Fail() @ 0x2ab5e5aeea66 google::LogMessage::SendToLog() @ 0x2ab5e5aee468 google::LogMessage::Flush() @ 0x2ab5e5af137c google::LogMessageFatal::~LogMessageFatal() @ 0x864b0c _CheckFatal::~_CheckFatal() @ 0xc458ed mesos::internal::tests::PerfEventIsolatorTest_ROOT_CGROUPS_Sample_Test::TestBody() @ 0x119fb17 testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x119ac9e testing::internal::HandleExceptionsInMethodIfSupported() @ 0x118305f testing::Test::Run() @ 0x1183782 testing::TestInfo::Run() @ 0x1183d0a testing::TestCase::Run() @ 0x11889d4 testing::internal::UnitTestImpl::RunAllTests() @ 0x11a09ae testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x119b9c3 testing::internal::HandleExceptionsInMethodIfSupported() @ 0x11878e0 testing::UnitTest::Run() @ 0xcdc8c7 main @ 0x2ab5e7fdbec5 (unknown) @ 0x861a89 (unknown) make[3]: *** [check-local] Aborted (core dumped) [ RUN ] UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup F0629 11:49:38.763434 18836 isolator_tests.cpp:1200] CHECK_SOME(isolator): Failed to create PerfEvent isolator, invalid events: { cpu-cycles } *** Check failure stack trace: *** @ 0x2ba40eb2db1a google::LogMessage::Fail() @ 0x2ba40eb2da66 
google::LogMessage::SendToLog() @ 0x2ba40eb2d468 google::LogMessage::Flush() @ 0x2ba40eb3037c google::LogMessageFatal::~LogMessageFatal() @ 0x864b0c _CheckFatal::~_CheckFatal() @ 0xc5ddb1 mesos::internal::tests::UserCgroupIsolatorTest_ROOT_CGROUPS_UserCgroup_Test::TestBody() @ 0x119fc43 testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x119adca testing::internal::HandleExceptionsInMethodIfSupported() @ 0x118318b testing::Test::Run() @ 0x11838ae testing::TestInfo::Run() @ 0x1183e36 testing::TestCase::Run() @ 0x1188b00 testing::internal::UnitTestImpl::RunAllTests() @ 0x11a0ada testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x119baef testing::internal::HandleExceptionsInMethodIfSupported() @ 0x1187a0c testing::UnitTest::Run() @ 0xcdc9f3 main @ 0x2ba41101aec5 (unknown) @ 0x861a89 (unknown) make[3]: *** [check-local] Aborted (core dumped) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2956) Stack trace in isolator tests on Linux VM
Paul Brett created MESOS-2956: - Summary: Stack trace in isolator tests on Linux VM Key: MESOS-2956 URL: https://issues.apache.org/jira/browse/MESOS-2956 Project: Mesos Issue Type: Bug Reporter: Paul Brett Assignee: Paul Brett PerfEventIsolatorTest fails with stack trace when run in Linux VM [--] 1 test from PerfEventIsolatorTest [ RUN ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample F0629 11:38:17.088412 14114 isolator_tests.cpp:837] CHECK_SOME(isolator): Failed to create PerfEvent isolator, invalid events: { cycles, task-clock } *** Check failure stack trace: *** @ 0x2ab5e5aeeb1a google::LogMessage::Fail() @ 0x2ab5e5aeea66 google::LogMessage::SendToLog() @ 0x2ab5e5aee468 google::LogMessage::Flush() @ 0x2ab5e5af137c google::LogMessageFatal::~LogMessageFatal() @ 0x864b0c _CheckFatal::~_CheckFatal() @ 0xc458ed mesos::internal::tests::PerfEventIsolatorTest_ROOT_CGROUPS_Sample_Test::TestBody() @ 0x119fb17 testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x119ac9e testing::internal::HandleExceptionsInMethodIfSupported() @ 0x118305f testing::Test::Run() @ 0x1183782 testing::TestInfo::Run() @ 0x1183d0a testing::TestCase::Run() @ 0x11889d4 testing::internal::UnitTestImpl::RunAllTests() @ 0x11a09ae testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x119b9c3 testing::internal::HandleExceptionsInMethodIfSupported() @ 0x11878e0 testing::UnitTest::Run() @ 0xcdc8c7 main @ 0x2ab5e7fdbec5 (unknown) @ 0x861a89 (unknown) make[3]: *** [check-local] Aborted (core dumped) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2956) Stack trace in isolator tests on Linux VM
[ https://issues.apache.org/jira/browse/MESOS-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2956: -- Description: PerfEventIsolatorTest fails with stack trace when run in Linux VM [--] 1 test from PerfEventIsolatorTest [ RUN ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample F0629 11:38:17.088412 14114 isolator_tests.cpp:837] CHECK_SOME(isolator): Failed to create PerfEvent isolator, invalid events: { cycles, task-clock } *** Check failure stack trace: *** @ 0x2ab5e5aeeb1a google::LogMessage::Fail() @ 0x2ab5e5aeea66 google::LogMessage::SendToLog() @ 0x2ab5e5aee468 google::LogMessage::Flush() @ 0x2ab5e5af137c google::LogMessageFatal::~LogMessageFatal() @ 0x864b0c _CheckFatal::~_CheckFatal() @ 0xc458ed mesos::internal::tests::PerfEventIsolatorTest_ROOT_CGROUPS_Sample_Test::TestBody() @ 0x119fb17 testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x119ac9e testing::internal::HandleExceptionsInMethodIfSupported() @ 0x118305f testing::Test::Run() @ 0x1183782 testing::TestInfo::Run() @ 0x1183d0a testing::TestCase::Run() @ 0x11889d4 testing::internal::UnitTestImpl::RunAllTests() @ 0x11a09ae testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x119b9c3 testing::internal::HandleExceptionsInMethodIfSupported() @ 0x11878e0 testing::UnitTest::Run() @ 0xcdc8c7 main @ 0x2ab5e7fdbec5 (unknown) @ 0x861a89 (unknown) make[3]: *** [check-local] Aborted (core dumped) [ RUN ] UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup F0629 11:49:38.763434 18836 isolator_tests.cpp:1200] CHECK_SOME(isolator): Failed to create PerfEvent isolator, invalid events: { cpu-cycles } *** Check failure stack trace: *** @ 0x2ba40eb2db1a google::LogMessage::Fail() @ 0x2ba40eb2da66 google::LogMessage::SendToLog() @ 0x2ba40eb2d468 google::LogMessage::Flush() @ 0x2ba40eb3037c google::LogMessageFatal::~LogMessageFatal() @ 0x864b0c _CheckFatal::~_CheckFatal() @ 0xc5ddb1 
mesos::internal::tests::UserCgroupIsolatorTest_ROOT_CGROUPS_UserCgroup_Test::TestBody() @ 0x119fc43 testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x119adca testing::internal::HandleExceptionsInMethodIfSupported() @ 0x118318b testing::Test::Run() @ 0x11838ae testing::TestInfo::Run() @ 0x1183e36 testing::TestCase::Run() @ 0x1188b00 testing::internal::UnitTestImpl::RunAllTests() @ 0x11a0ada testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x119baef testing::internal::HandleExceptionsInMethodIfSupported() @ 0x1187a0c testing::UnitTest::Run() @ 0xcdc9f3 main @ 0x2ba41101aec5 (unknown) @ 0x861a89 (unknown) make[3]: *** [check-local] Aborted (core dumped) was: PerfEventIsolatorTest fails with stack trace when run in Linux VM [--] 1 test from PerfEventIsolatorTest [ RUN ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample F0629 11:38:17.088412 14114 isolator_tests.cpp:837] CHECK_SOME(isolator): Failed to create PerfEvent isolator, invalid events: { cycles, task-clock } *** Check failure stack trace: *** @ 0x2ab5e5aeeb1a google::LogMessage::Fail() @ 0x2ab5e5aeea66 google::LogMessage::SendToLog() @ 0x2ab5e5aee468 google::LogMessage::Flush() @ 0x2ab5e5af137c google::LogMessageFatal::~LogMessageFatal() @ 0x864b0c _CheckFatal::~_CheckFatal() @ 0xc458ed mesos::internal::tests::PerfEventIsolatorTest_ROOT_CGROUPS_Sample_Test::TestBody() @ 0x119fb17 testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x119ac9e testing::internal::HandleExceptionsInMethodIfSupported() @ 0x118305f testing::Test::Run() @ 0x1183782 testing::TestInfo::Run() @ 0x1183d0a testing::TestCase::Run() @ 0x11889d4 testing::internal::UnitTestImpl::RunAllTests() @ 0x11a09ae testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x119b9c3 testing::internal::HandleExceptionsInMethodIfSupported() @ 0x11878e0 testing::UnitTest::Run() @ 0xcdc8c7 main @ 0x2ab5e7fdbec5 (unknown) @ 0x861a89 (unknown) make[3]: *** [check-local] Aborted (core dumped) Stack trace in isolator tests on Linux 
VM - Key: MESOS-2956 URL: https://issues.apache.org/jira/browse/MESOS-2956 Project: Mesos Issue Type: Bug Reporter: Paul Brett Assignee: Paul Brett
[jira] [Created] (MESOS-2952) Provide user namespaces for privileged access inside containers
Paul Brett created MESOS-2952: - Summary: Provide user namespaces for privileged access inside containers Key: MESOS-2952 URL: https://issues.apache.org/jira/browse/MESOS-2952 Project: Mesos Issue Type: Bug Reporter: Paul Brett Assignee: Paul Brett User namespaces allow per-namespace mappings of user and group IDs. This means that a process's user and group IDs inside a user namespace can be different from its IDs outside of the namespace. Most notably, a process can have a nonzero user ID outside a namespace while at the same time having a user ID of zero inside the namespace; in other words, the process is unprivileged for operations outside the user namespace but has root privileges inside the namespace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
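To make the ID mapping described above concrete, here is a minimal sketch of the lookup implied by a user namespace's map. It assumes the kernel's `uid_map` line layout (ID inside the namespace, ID outside, range length); the `IdMapping` struct, `toInside` function, and `UNMAPPED` sentinel are hypothetical names for illustration and are not part of Mesos.

```cpp
#include <cstdint>
#include <vector>

// One mapping entry: IDs [inside, inside + length) within the namespace
// correspond to IDs [outside, outside + length) in the parent namespace.
struct IdMapping {
  std::uint32_t inside;
  std::uint32_t outside;
  std::uint32_t length;
};

// Sentinel for "this ID has no mapping" (hypothetical, chosen for the sketch).
constexpr std::int64_t UNMAPPED = -1;

// Translate an ID as seen outside the namespace into the ID seen inside it.
std::int64_t toInside(const std::vector<IdMapping>& map,
                      std::uint32_t outsideId) {
  for (const IdMapping& m : map) {
    if (outsideId >= m.outside && outsideId - m.outside < m.length) {
      return static_cast<std::int64_t>(m.inside) + (outsideId - m.outside);
    }
  }
  return UNMAPPED;
}
```

With a single entry `{0, 1000, 1}`, the unprivileged user 1000 outside the namespace appears as user 0 (root) inside it, which is the case the description calls out.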
[jira] [Updated] (MESOS-2928) Update stout to #include headers for symbols we rely on
[ https://issues.apache.org/jira/browse/MESOS-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2928: -- Description: Update mesos to #include headers for symbols we rely on and reorder to comply with the style guide. (was: Update mesos to #include headers for symbols we rely on) Update stout to #include headers for symbols we rely on --- Key: MESOS-2928 URL: https://issues.apache.org/jira/browse/MESOS-2928 Project: Mesos Issue Type: Bug Reporter: Paul Brett Assignee: Paul Brett Update mesos to #include headers for symbols we rely on and reorder to comply with the style guide. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2928) Update stout #include headers
[ https://issues.apache.org/jira/browse/MESOS-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2928: -- Description: Update stout to #include headers for symbols we rely on and reorder to comply with the style guide. (was: Update mesos to #include headers for symbols we rely on and reorder to comply with the style guide.) Update stout #include headers - Key: MESOS-2928 URL: https://issues.apache.org/jira/browse/MESOS-2928 Project: Mesos Issue Type: Bug Reporter: Paul Brett Assignee: Paul Brett Update stout to #include headers for symbols we rely on and reorder to comply with the style guide. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2927) Update mesos #include headers
[ https://issues.apache.org/jira/browse/MESOS-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2927: -- Description: Update mesos to #include headers for symbols we rely on and reorder to comply with the style guide. (was: Update mesos to #include headers for symbols we rely on) Update mesos #include headers - Key: MESOS-2927 URL: https://issues.apache.org/jira/browse/MESOS-2927 Project: Mesos Issue Type: Bug Reporter: Paul Brett Assignee: Paul Brett Update mesos to #include headers for symbols we rely on and reorder to comply with the style guide. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
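As a small illustration of "#include headers for symbols we rely on", the sketch below includes a header for every standard symbol it names rather than relying on another header to pull it in transitively. The `totalLength` helper is hypothetical and only exists to give the includes something to justify.

```cpp
// Each header below is included because this file names a symbol from it:
// std::size_t (<cstddef>), std::string (<string>), std::vector (<vector>).
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: sums the lengths of all strings in `parts`.
std::size_t totalLength(const std::vector<std::string>& parts) {
  std::size_t total = 0;
  for (const std::string& part : parts) {
    total += part.size();
  }
  return total;
}
```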
[jira] [Updated] (MESOS-2929) Update libprocess #include headers
[ https://issues.apache.org/jira/browse/MESOS-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2929: -- Summary: Update libprocess #include headers (was: Update libprocess to #include headers for symbols we rely on) Update libprocess #include headers -- Key: MESOS-2929 URL: https://issues.apache.org/jira/browse/MESOS-2929 Project: Mesos Issue Type: Bug Reporter: Paul Brett Assignee: Paul Brett -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2928) Update stout #include headers
[ https://issues.apache.org/jira/browse/MESOS-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2928: -- Summary: Update stout #include headers (was: Update stout to #include headers for symbols we rely on) Update stout #include headers - Key: MESOS-2928 URL: https://issues.apache.org/jira/browse/MESOS-2928 Project: Mesos Issue Type: Bug Reporter: Paul Brett Assignee: Paul Brett Update mesos to #include headers for symbols we rely on and reorder to comply with the style guide. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2929) Update libprocess #include headers
[ https://issues.apache.org/jira/browse/MESOS-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2929: -- Description: Update libprocess to #include headers for symbols we rely on and reorder to comply with the style guide. Update libprocess #include headers -- Key: MESOS-2929 URL: https://issues.apache.org/jira/browse/MESOS-2929 Project: Mesos Issue Type: Bug Reporter: Paul Brett Assignee: Paul Brett Update libprocess to #include headers for symbols we rely on and reorder to comply with the style guide. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2925) Invalid usage of ATOMIC_FLAG_INIT in member initialization
[ https://issues.apache.org/jira/browse/MESOS-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600121#comment-14600121 ] Paul Brett commented on MESOS-2925: --- [~jvanremoortere] - I think using the init macro in the initializer list looks much better, but the compiler warns against it because the behavior is undefined and therefore unsafe. BTW, I'm using clang on Linux, so I don't know if the proposed Apple tweak would help me. Invalid usage of ATOMIC_FLAG_INIT in member initialization -- Key: MESOS-2925 URL: https://issues.apache.org/jira/browse/MESOS-2925 Project: Mesos Issue Type: Bug Components: libprocess Affects Versions: 0.23.0 Reporter: Paul Brett The C++ specification states: The macro ATOMIC_FLAG_INIT shall be defined in such a way that it can be used to initialize an object of type atomic_flag to the clear state. The macro can be used in the form: atomic_flag guard = ATOMIC_FLAG_INIT; It is unspecified whether the macro can be used in other initialization contexts. Clang catches this (although it erroneously reports it as a braced scalar init issue) and refuses to compile libprocess. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2925) Invalid usage of ATOMIC_FLAG_INIT in member initialization
[ https://issues.apache.org/jira/browse/MESOS-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600104#comment-14600104 ] Paul Brett commented on MESOS-2925: --- Up for review at https://reviews.apache.org/r/35841/ Invalid usage of ATOMIC_FLAG_INIT in member initialization -- Key: MESOS-2925 URL: https://issues.apache.org/jira/browse/MESOS-2925 Project: Mesos Issue Type: Bug Components: libprocess Affects Versions: 0.23.0 Reporter: Paul Brett The C++ specification states: The macro ATOMIC_FLAG_INIT shall be defined in such a way that it can be used to initialize an object of type atomic_flag to the clear state. The macro can be used in the form: atomic_flag guard = ATOMIC_FLAG_INIT; It is unspecified whether the macro can be used in other initialization contexts. Clang catches this (although it erroneously reports it as a braced scalar init issue) and refuses to compile libprocess. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2926) Extend mesos-style.py/cpplint.py to check #include files
Paul Brett created MESOS-2926: - Summary: Extend mesos-style.py/cpplint.py to check #include files Key: MESOS-2926 URL: https://issues.apache.org/jira/browse/MESOS-2926 Project: Mesos Issue Type: Bug Reporter: Paul Brett Assignee: Paul Brett cpplint.py provides the capability to enforce the style guide requirements for #including everything you use and ordering files based on type but it does not work for mesos because we do use #include <...> for project files where it expects #include "...". We should update the style checker to support our include usage and then turn it on by default in the commit hook. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2927) Update mesos to #include headers for symbols we rely on
Paul Brett created MESOS-2927: - Summary: Update mesos to #include headers for symbols we rely on Key: MESOS-2927 URL: https://issues.apache.org/jira/browse/MESOS-2927 Project: Mesos Issue Type: Bug Reporter: Paul Brett Assignee: Paul Brett Update mesos to #include headers for symbols we rely on -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2929) Update libprocess to #include headers for symbols we rely on
Paul Brett created MESOS-2929: - Summary: Update libprocess to #include headers for symbols we rely on Key: MESOS-2929 URL: https://issues.apache.org/jira/browse/MESOS-2929 Project: Mesos Issue Type: Bug Reporter: Paul Brett Assignee: Paul Brett -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-2928) Update stout to #include headers for symbols we rely on
[ https://issues.apache.org/jira/browse/MESOS-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600455#comment-14600455 ] Paul Brett edited comment on MESOS-2928 at 6/25/15 12:58 AM: - https://reviews.apache.org/r/35861/ was (Author: pbrett): https://reviews.apache.org/r/35860/ Update stout to #include headers for symbols we rely on --- Key: MESOS-2928 URL: https://issues.apache.org/jira/browse/MESOS-2928 Project: Mesos Issue Type: Bug Reporter: Paul Brett Assignee: Paul Brett Update mesos to #include headers for symbols we rely on -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2928) Update stout to #include headers for symbols we rely on
[ https://issues.apache.org/jira/browse/MESOS-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600455#comment-14600455 ] Paul Brett commented on MESOS-2928: --- https://reviews.apache.org/r/35860/ Update stout to #include headers for symbols we rely on --- Key: MESOS-2928 URL: https://issues.apache.org/jira/browse/MESOS-2928 Project: Mesos Issue Type: Bug Reporter: Paul Brett Assignee: Paul Brett Update mesos to #include headers for symbols we rely on -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-2925) Invalid usage of ATOMIC_FLAG_INIT in member initialization
[ https://issues.apache.org/jira/browse/MESOS-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett reassigned MESOS-2925: - Assignee: Paul Brett Invalid usage of ATOMIC_FLAG_INIT in member initialization -- Key: MESOS-2925 URL: https://issues.apache.org/jira/browse/MESOS-2925 Project: Mesos Issue Type: Bug Components: libprocess Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Paul Brett The C++ specification states: The macro ATOMIC_FLAG_INIT shall be defined in such a way that it can be used to initialize an object of type atomic_flag to the clear state. The macro can be used in the form: atomic_flag guard = ATOMIC_FLAG_INIT; It is unspecified whether the macro can be used in other initialization contexts. Clang catches this (although it erroneously reports it as a braced scalar init issue) and refuses to compile libprocess. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2925) Invalid usage of ATOMIC_FLAG_INIT in member initialization
Paul Brett created MESOS-2925: - Summary: Invalid usage of ATOMIC_FLAG_INIT in member initialization Key: MESOS-2925 URL: https://issues.apache.org/jira/browse/MESOS-2925 Project: Mesos Issue Type: Bug Components: libprocess Affects Versions: 0.23.0 Reporter: Paul Brett The C++ specification states: The macro ATOMIC_FLAG_INIT shall be defined in such a way that it can be used to initialize an object of type atomic_flag to the clear state. The macro can be used in the form: atomic_flag guard = ATOMIC_FLAG_INIT; It is unspecified whether the macro can be used in other initialization contexts. Clang catches this (although it erroneously reports it as a braced scalar init issue) and refuses to compile libprocess. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
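The distinction the issue draws can be sketched as follows. This is an illustration only, not the change that went up for review: the `Once` class and its members are hypothetical, and clearing the flag in the constructor body is just one portable way to avoid passing the macro through a member initializer list.

```cpp
#include <atomic>

// The form quoted from the specification: an object initialized directly
// from the macro is guaranteed to start in the clear state.
std::atomic_flag guard = ATOMIC_FLAG_INIT;

// Hypothetical class for illustration. Instead of writing
// `claimed(ATOMIC_FLAG_INIT)` in the member initializer list (a context
// where use of the macro is unspecified), it clears the flag explicitly.
class Once {
public:
  Once() { claimed.clear(); }

  // Returns true for the first caller only.
  bool claim() { return !claimed.test_and_set(); }

private:
  std::atomic_flag claimed;
};
```

The first call to `claim()` on a fresh `Once` returns true and every later call returns false, because `test_and_set()` reports the previous state of the flag.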
[jira] [Updated] (MESOS-2903) Network isolator should not fail when target state already exists
[ https://issues.apache.org/jira/browse/MESOS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2903: -- Story Points: 3 (was: 2) Network isolator should not fail when target state already exists - Key: MESOS-2903 URL: https://issues.apache.org/jira/browse/MESOS-2903 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Paul Brett Priority: Critical Network isolator has multiple instances of the following pattern:
{noformat}
Try<bool> something = ::create();
if (something.isError()) {
  ++metrics.something_errors;
  return Failure("Failed to create something ...");
} else if (!icmpVethToEth0.get()) {
  ++metrics.adding_veth_icmp_filters_already_exist;
  return Failure("Something already exists");
}
{noformat}
These failures have occurred in operation due to the failure to recover or delete an orphan, causing the slave to remain online but unable to create new resources. We should convert the second failure message in this pattern to an information message since the final state of the system is the state that we requested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2904) Add slave metric to count container launch failures
[ https://issues.apache.org/jira/browse/MESOS-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596747#comment-14596747 ] Paul Brett commented on MESOS-2904: --- Fix without test harness is out for review: https://reviews.apache.org/r/35738/ Add slave metric to count container launch failures --- Key: MESOS-2904 URL: https://issues.apache.org/jira/browse/MESOS-2904 Project: Mesos Issue Type: Bug Components: slave, statistics Reporter: Paul Brett Assignee: Paul Brett We have seen circumstances where a machine has been consistently unable to launch containers due to an inconsistent state (for example, unexpected network configuration). Adding a metric to track container launch failures will allow us to detect and alert on slaves in such a state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2903) Network isolator should not fail when target state already exists
Paul Brett created MESOS-2903: - Summary: Network isolator should not fail when target state already exists Key: MESOS-2903 URL: https://issues.apache.org/jira/browse/MESOS-2903 Project: Mesos Issue Type: Bug Components: isolation Reporter: Paul Brett Network isolator has multiple instances of the following pattern:
{noformat}
Try<bool> something = ::create();
if (something.isError()) {
  ++metrics.something_errors;
  return Failure("Failed to create something ...");
} else if (!icmpVethToEth0.get()) {
  ++metrics.adding_veth_icmp_filters_already_exist;
  return Failure("Something already exists");
}
{noformat}
These failures have occurred in operation due to the failure to recover or delete an orphan, causing the slave to remain online but unable to create new resources. We should convert the second failure message in this pattern to an information message since the final state of the system is the state that we requested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2904) Add slave metric to count container launch failures
Paul Brett created MESOS-2904: - Summary: Add slave metric to count container launch failures Key: MESOS-2904 URL: https://issues.apache.org/jira/browse/MESOS-2904 Project: Mesos Issue Type: Bug Components: slave, statistics Reporter: Paul Brett Assignee: Paul Brett We have seen circumstances where a machine has been consistently unable to launch containers due to an inconsistent state (for example, unexpected network configuration). Adding a metric to track container launch failures will allow us to detect and alert on slaves in such a state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2903) Network isolator should not fail when target state already exists
[ https://issues.apache.org/jira/browse/MESOS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14594171#comment-14594171 ] Paul Brett commented on MESOS-2903: --- The new logic will be:
{noformat}
Try<bool> something = ::create();
if (something.isError()) {
  ++metrics.something_errors;
  return Failure("Failed to create something ...");
} else if (!icmpVethToEth0.get()) {
  // already exists
  Try<bool> something = ::update();
  if (something.isError()) {
    ++metrics.something_errors;
    return Failure("Failed to update something ...");
  }
}
{noformat}
Network isolator should not fail when target state already exists - Key: MESOS-2903 URL: https://issues.apache.org/jira/browse/MESOS-2903 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.23.0 Reporter: Paul Brett Priority: Critical Network isolator has multiple instances of the following pattern:
{noformat}
Try<bool> something = ::create();
if (something.isError()) {
  ++metrics.something_errors;
  return Failure("Failed to create something ...");
} else if (!icmpVethToEth0.get()) {
  ++metrics.adding_veth_icmp_filters_already_exist;
  return Failure("Something already exists");
}
{noformat}
These failures have occurred in operation due to the failure to recover or delete an orphan, causing the slave to remain online but unable to create new resources. We should convert the second failure message in this pattern to an information message since the final state of the system is the state that we requested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
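The create-then-update logic in the comment above can be sketched as a self-contained function. Everything here is a stand-in for illustration: `Try` is a minimal imitation of the stout type (only the `isError()`, `get()`, and `error()` accessors used above), and `Metrics`, `ensure`, and the callback parameters are hypothetical names, not the port mapping isolator's actual code.

```cpp
#include <string>
#include <utility>

// Minimal stand-in for a Try<T>-style result type (illustration only).
template <typename T>
class Try {
public:
  static Try some(T value) { return Try(true, std::move(value), ""); }
  static Try failure(std::string message) {
    return Try(false, T(), std::move(message));
  }

  bool isError() const { return !ok_; }
  const T& get() const { return value_; }
  const std::string& error() const { return message_; }

private:
  Try(bool ok, T value, std::string message)
    : ok_(ok), value_(std::move(value)), message_(std::move(message)) {}

  bool ok_;
  T value_;
  std::string message_;
};

// Hypothetical counters mirroring the metrics in the pattern above.
struct Metrics {
  int create_errors = 0;
  int update_errors = 0;
  int already_exist = 0;
};

// `create` returns false (not an error) when the target already exists;
// in that case we fall back to `update` instead of failing.
template <typename Create, typename Update>
Try<bool> ensure(Create create, Update update, Metrics& metrics) {
  Try<bool> created = create();
  if (created.isError()) {
    ++metrics.create_errors;
    return Try<bool>::failure("Failed to create: " + created.error());
  }

  if (!created.get()) {
    ++metrics.already_exist;  // informational, no longer a failure
    Try<bool> updated = update();
    if (updated.isError()) {
      ++metrics.update_errors;
      return Try<bool>::failure("Failed to update: " + updated.error());
    }
  }

  return Try<bool>::some(true);
}
```

The design point is that "already exists" is counted but not treated as an error, since the system ends up in the requested state either way; only a failed create or a failed update aborts.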
[jira] [Assigned] (MESOS-2903) Network isolator should not fail when target state already exists
[ https://issues.apache.org/jira/browse/MESOS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett reassigned MESOS-2903: - Assignee: Paul Brett Network isolator should not fail when target state already exists - Key: MESOS-2903 URL: https://issues.apache.org/jira/browse/MESOS-2903 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Paul Brett Priority: Critical Network isolator has multiple instances of the following pattern:
{noformat}
Try<bool> something = ::create();
if (something.isError()) {
  ++metrics.something_errors;
  return Failure("Failed to create something ...");
} else if (!icmpVethToEth0.get()) {
  ++metrics.adding_veth_icmp_filters_already_exist;
  return Failure("Something already exists");
}
{noformat}
These failures have occurred in operation due to the failure to recover or delete an orphan, causing the slave to remain online but unable to create new resources. We should convert the second failure message in this pattern to an information message since the final state of the system is the state that we requested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2903) Network isolator should not fail when target state already exists
[ https://issues.apache.org/jira/browse/MESOS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2903: -- Story Points: 2 Network isolator should not fail when target state already exists - Key: MESOS-2903 URL: https://issues.apache.org/jira/browse/MESOS-2903 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Paul Brett Priority: Critical Network isolator has multiple instances of the following pattern:
{noformat}
Try<bool> something = ::create();
if (something.isError()) {
  ++metrics.something_errors;
  return Failure("Failed to create something ...");
} else if (!icmpVethToEth0.get()) {
  ++metrics.adding_veth_icmp_filters_already_exist;
  return Failure("Something already exists");
}
{noformat}
These failures have occurred in operation due to the failure to recover or delete an orphan, causing the slave to remain online but unable to create new resources. We should convert the second failure message in this pattern to an information message since the final state of the system is the state that we requested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2853) Report per-container metrics from host egress filter
[ https://issues.apache.org/jira/browse/MESOS-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14594175#comment-14594175 ] Paul Brett commented on MESOS-2853: --- Container metrics are not tracked by fq_codel on a per-filter basis, hence this information is not available. Will wait to see the interaction of fq_codel on host eth0 with real workloads before deciding if further work is required. Report per-container metrics from host egress filter Key: MESOS-2853 URL: https://issues.apache.org/jira/browse/MESOS-2853 Project: Mesos Issue Type: Improvement Components: isolation Reporter: Paul Brett Assignee: Paul Brett Labels: twitter Export in statistics.json the fq_codel flow statistics for each container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2332) Report per-container metrics for network bandwidth throttling
[ https://issues.apache.org/jira/browse/MESOS-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590894#comment-14590894 ] Paul Brett commented on MESOS-2332: --- Network performance statistics are now reported in statistics.json on the slave. Report per-container metrics for network bandwidth throttling - Key: MESOS-2332 URL: https://issues.apache.org/jira/browse/MESOS-2332 Project: Mesos Issue Type: Improvement Components: isolation Reporter: Paul Brett Assignee: Paul Brett Labels: features, twitter Export metrics from the network isolation to identify scope and duration of container throttling. Packet loss can be identified from the overlimits and requeues fields of the htb qdisc report for the virtual interface, e.g.
{noformat}
$ tc -s -d qdisc show dev mesos19223
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 158213287452 bytes 1030876393 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc ingress : parent :fff1
 Sent 119381747824 bytes 1144549901 pkt (dropped 2044879, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
{noformat}
Note that since a packet can be examined multiple times before transmission, overlimits can exceed total packets sent. Add to the port_mapping isolator usage() and the container statistics protobuf. Carefully consider the naming (esp tx/rx) + commenting of the protobuf fields so it's clear what these represent and how they are different to the existing dropped packet counts from the network stack. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
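As an illustration of where the dropped, overlimits, and requeues counters sit in the `tc -s` output quoted above, here is a small sketch that extracts them from a "Sent ..." line. It is a text-scraping example only; the `QdiscCounters` struct and `parseSentLine` function are hypothetical, and nothing here is meant to describe how the isolator itself gathers these statistics.

```cpp
#include <cstdio>
#include <string>

// Counters carried on the "Sent ..." line of `tc -s qdisc show` output.
struct QdiscCounters {
  unsigned long long bytes;
  unsigned long long packets;
  unsigned long long dropped;
  unsigned long long overlimits;
  unsigned long long requeues;
};

// Parses one "Sent ..." line. Returns false (leaving *out untouched) if the
// line does not have the expected shape.
bool parseSentLine(const std::string& line, QdiscCounters* out) {
  QdiscCounters c = {};
  int matched = std::sscanf(
      line.c_str(),
      " Sent %llu bytes %llu pkt (dropped %llu, overlimits %llu requeues %llu)",
      &c.bytes, &c.packets, &c.dropped, &c.overlimits, &c.requeues);
  if (matched != 5) {
    return false;
  }
  *out = c;
  return true;
}
```

Fed the ingress line from the report above, this yields a dropped count of 2044879 with zero overlimits and requeues; lines such as the `backlog` line are rejected.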
[jira] [Created] (MESOS-2874) Convert PortMappingStatistics to use automatic JSON encoding/decoding
Paul Brett created MESOS-2874: - Summary: Convert PortMappingStatistics to use automatic JSON encoding/decoding Key: MESOS-2874 URL: https://issues.apache.org/jira/browse/MESOS-2874 Project: Mesos Issue Type: Bug Reporter: Paul Brett Assignee: Paul Brett Simplify PortMappingStatistics by using JSON::Protocol and protobuf::parse to convert ResourceStatistics to/from line format. This change will simplify the implementation of MESOS-2332. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2874) Convert PortMappingStatistics to use automatic JSON encoding/decoding
[ https://issues.apache.org/jira/browse/MESOS-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2874: -- Component/s: test isolation Convert PortMappingStatistics to use automatic JSON encoding/decoding - Key: MESOS-2874 URL: https://issues.apache.org/jira/browse/MESOS-2874 Project: Mesos Issue Type: Bug Components: isolation, test Reporter: Paul Brett Assignee: Paul Brett Labels: twitter Simplify PortMappingStatistics by using JSON::Protocol and protobuf::parse to convert ResourceStatistics to/from line format. This change will simplify the implementation of MESOS-2332. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2784) Add constexpr to C++11 whitelist
[ https://issues.apache.org/jira/browse/MESOS-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586692#comment-14586692 ] Paul Brett commented on MESOS-2784: --- Comments from review incorporated and review updated. Add constexpr to C++11 whitelist Key: MESOS-2784 URL: https://issues.apache.org/jira/browse/MESOS-2784 Project: Mesos Issue Type: Improvement Components: documentation Reporter: Paul Brett Assignee: Paul Brett Labels: twitter constexpr is currently used to eliminate initialization dependency issues for non-POD objects. We should add it to the whitelist of acceptable c++11 features in the style guide. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
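A brief sketch of the initialization-dependency point: a namespace-scope object of a non-POD type that has a constexpr constructor is constant-initialized, so other globals can read it without depending on dynamic initialization order. The `Megabytes` class and the two constants are hypothetical examples, not Mesos code.

```cpp
// Hypothetical non-POD literal type: private data plus a user-provided
// constructor, but still usable in constant expressions.
class Megabytes {
public:
  constexpr explicit Megabytes(unsigned long long count)
    : bytes_(count * 1024 * 1024) {}

  constexpr unsigned long long bytes() const { return bytes_; }

private:
  unsigned long long bytes_;
};

// Constant-initialized: the value exists before any dynamic initializer runs.
constexpr Megabytes DEFAULT_LIMIT(64);
static_assert(DEFAULT_LIMIT.bytes() == 67108864ULL,
              "evaluated at compile time");

// Another global may depend on it regardless of which translation unit's
// initializers happen to run first.
const unsigned long long HALF_LIMIT = DEFAULT_LIMIT.bytes() / 2;
```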
[jira] [Created] (MESOS-2853) Report per-container metrics from host egress filter
Paul Brett created MESOS-2853: - Summary: Report per-container metrics from host egress filter Key: MESOS-2853 URL: https://issues.apache.org/jira/browse/MESOS-2853 Project: Mesos Issue Type: Improvement Components: isolation, twitter Reporter: Paul Brett Assignee: Paul Brett Export in statistics.json the fq_codel flow statistics for each container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (MESOS-2821) Document and consolidate qdisc handles
[ https://issues.apache.org/jira/browse/MESOS-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett closed MESOS-2821. - Document and consolidate qdisc handles -- Key: MESOS-2821 URL: https://issues.apache.org/jira/browse/MESOS-2821 Project: Mesos Issue Type: Improvement Reporter: Paul Brett Assignee: Paul Brett Labels: twitter The structure of traffic control qdiscs and filters is non-trivial: the knowledge of which handles are the parents of which filters or qdiscs is embedded in the create and recovery functions, and will be needed to collect statistics on the links. Let's pull out the constants and document them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2836) Report per-container metrics for network bandwidth throttling to the slave
Paul Brett created MESOS-2836: - Summary: Report per-container metrics for network bandwidth throttling to the slave Key: MESOS-2836 URL: https://issues.apache.org/jira/browse/MESOS-2836 Project: Mesos Issue Type: Improvement Reporter: Paul Brett Assignee: Paul Brett Report per-container metrics for network bandwidth throttling to the slave in the output of mesos-network-helper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2836) Report per-container metrics for network bandwidth throttling to the slave
[ https://issues.apache.org/jira/browse/MESOS-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579626#comment-14579626 ] Paul Brett commented on MESOS-2836: --- https://reviews.apache.org/r/35229/ Report per-container metrics for network bandwidth throttling to the slave -- Key: MESOS-2836 URL: https://issues.apache.org/jira/browse/MESOS-2836 Project: Mesos Issue Type: Improvement Reporter: Paul Brett Assignee: Paul Brett Report per-container metrics for network bandwidth throttling to the slave in the output of mesos-network-helper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2837) Decode network statistics from mesos-network-helper
Paul Brett created MESOS-2837: - Summary: Decode network statistics from mesos-network-helper Key: MESOS-2837 URL: https://issues.apache.org/jira/browse/MESOS-2837 Project: Mesos Issue Type: Improvement Reporter: Paul Brett Assignee: Paul Brett Decode network statistics from mesos-network-helper and output to slave statistics.json -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2837) Decode network statistics from mesos-network-helper
[ https://issues.apache.org/jira/browse/MESOS-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579628#comment-14579628 ] Paul Brett commented on MESOS-2837: --- https://reviews.apache.org/r/35257/ Decode network statistics from mesos-network-helper --- Key: MESOS-2837 URL: https://issues.apache.org/jira/browse/MESOS-2837 Project: Mesos Issue Type: Improvement Reporter: Paul Brett Assignee: Paul Brett Decode network statistics from mesos-network-helper and output to slave statistics.json -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2752) Add HTB queueing discipline wrapper class
[ https://issues.apache.org/jira/browse/MESOS-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2752: -- Issue Type: Improvement (was: Bug) Add HTB queueing discipline wrapper class - Key: MESOS-2752 URL: https://issues.apache.org/jira/browse/MESOS-2752 Project: Mesos Issue Type: Improvement Reporter: Paul Brett Assignee: Paul Brett Labels: twitter Network isolator uses a Hierarchical Token Bucket (HTB) traffic control discipline on the egress filter inside each container as the root for adding traffic filters. A HTB wrapper is needed to access the network statistics for this interface. -- This message was sent by Atlassian JIRA (v6.3.4#6332)