[jira] [Assigned] (MESOS-2929) Update libprocess #include headers

2015-10-13 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-2929:
-

Assignee: (was: Paul Brett)

> Update libprocess #include headers
> --
>
> Key: MESOS-2929
> URL: https://issues.apache.org/jira/browse/MESOS-2929
> Project: Mesos
>  Issue Type: Bug
>Reporter: Paul Brett
>
> Update libprocess to #include headers for symbols we rely on and reorder to 
> comply with the style guide.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2853) Report per-container metrics from host egress filter

2015-10-13 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-2853:
-

Assignee: (was: Paul Brett)

> Report per-container metrics from host egress filter
> 
>
> Key: MESOS-2853
> URL: https://issues.apache.org/jira/browse/MESOS-2853
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Paul Brett
>  Labels: twitter
>
> Export in statistics.json the fq_codel flow statistics for each container.





[jira] [Assigned] (MESOS-2926) Extend mesos-style.py/cpplint.py to check #include files

2015-10-13 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-2926:
-

Assignee: (was: Paul Brett)

> Extend mesos-style.py/cpplint.py to check #include files
> 
>
> Key: MESOS-2926
> URL: https://issues.apache.org/jira/browse/MESOS-2926
> Project: Mesos
>  Issue Type: Bug
>Reporter: Paul Brett
>
> cpplint.py can enforce the style guide requirements for #including 
> everything you use and ordering files by type, but it does not work for 
> Mesos because we use #include <...> for project files where it expects 
> #include "...".
> We should update the style checker to support our include usage and then turn 
> it on by default in the commit hook.
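
The check described above depends on telling project headers from system headers without relying on the delimiter. A minimal C++ sketch of that idea follows. It is not how cpplint.py works; the function name classifyInclude, the IncludeKind enum, and the prefix list are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

enum class IncludeKind { SYSTEM, PROJECT, NOT_AN_INCLUDE };

// Classifies a source line by the path it includes, ignoring whether the
// path is wrapped in <...> or "...". 'projectPrefixes' is a hypothetical
// list such as {"mesos/", "process/", "stout/"}.
// Limitation: lines with trailing whitespace or comments are not handled.
inline IncludeKind classifyInclude(
    const std::string& line,
    const std::vector<std::string>& projectPrefixes) {
  const std::string directive = "#include ";
  if (line.compare(0, directive.size(), directive) != 0 ||
      line.size() < directive.size() + 2) {
    return IncludeKind::NOT_AN_INCLUDE;
  }

  // Skip the opening '<' or '"' and drop the closing delimiter.
  const std::string path =
      line.substr(directive.size() + 1, line.size() - directive.size() - 2);

  for (const std::string& prefix : projectPrefixes) {
    if (path.compare(0, prefix.size(), prefix) == 0) {
      return IncludeKind::PROJECT;
    }
  }
  return IncludeKind::SYSTEM;
}
```

With a classification like this, an ordering check could group headers by kind regardless of which delimiter the file uses.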





[jira] [Assigned] (MESOS-2927) Update mesos #include headers

2015-10-13 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-2927:
-

Assignee: (was: Paul Brett)

> Update mesos #include headers
> -
>
> Key: MESOS-2927
> URL: https://issues.apache.org/jira/browse/MESOS-2927
> Project: Mesos
>  Issue Type: Bug
>Reporter: Paul Brett
>
> Update mesos to #include headers for symbols we rely on and reorder to comply 
> with the style guide.





[jira] [Assigned] (MESOS-2952) Provide user namespaces for privileged access inside containers

2015-10-13 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-2952:
-

Assignee: (was: Paul Brett)

> Provide user namespaces for privileged access inside containers
> ---
>
> Key: MESOS-2952
> URL: https://issues.apache.org/jira/browse/MESOS-2952
> Project: Mesos
>  Issue Type: Epic
>Reporter: Paul Brett
>
> User namespaces allow per-namespace mappings of user and group IDs. This 
> means that a process's user and group IDs inside a user namespace can be 
> different from its IDs outside of the namespace. Most notably, a process can 
> have a nonzero user ID outside a namespace while at the same time having a 
> user ID of zero inside the namespace; in other words, the process is 
> unprivileged for operations outside the user namespace but has root 
> privileges inside the namespace.
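
The ID mapping described above can be illustrated with a small C++ sketch. It assumes the three-field layout of a /proc/<pid>/uid_map line (first ID inside the namespace, first ID outside, range length). The names IdMapping and toOutsideId are hypothetical and not part of Mesos.

```cpp
#include <cstdint>
#include <vector>

// One line of /proc/<pid>/uid_map: IDs [inside, inside + length) in the
// namespace map to [outside, outside + length) in the parent namespace.
struct IdMapping {
  std::uint32_t inside;
  std::uint32_t outside;
  std::uint32_t length;
};

// Translates an ID seen inside the namespace to the ID seen outside.
// Returns -1 when the ID is not covered by any mapping.
inline std::int64_t toOutsideId(const std::vector<IdMapping>& mappings,
                                std::uint32_t insideId) {
  for (const IdMapping& m : mappings) {
    if (insideId >= m.inside && insideId - m.inside < m.length) {
      return static_cast<std::int64_t>(m.outside) + (insideId - m.inside);
    }
  }
  return -1;
}
```

With a single mapping of {0, 100000, 65536}, user ID 0 inside the namespace corresponds to the unprivileged user ID 100000 outside it, which is the situation the description calls out.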





[jira] [Assigned] (MESOS-2994) Design doc for creating user namespaces inside containers

2015-10-13 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-2994:
-

Assignee: (was: Paul Brett)

> Design doc for creating user namespaces inside containers
> -
>
> Key: MESOS-2994
> URL: https://issues.apache.org/jira/browse/MESOS-2994
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Paul Brett
>  Labels: twitter
>






[jira] [Assigned] (MESOS-1977) Disk Isolator Usage Metrics

2015-10-13 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-1977:
-

Assignee: (was: Paul Brett)

> Disk Isolator Usage Metrics
> 
>
> Key: MESOS-1977
> URL: https://issues.apache.org/jira/browse/MESOS-1977
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Joris Van Remoortere
>  Labels: mesosphere
>
> Implement just the usage statistics aspect of the block io isolator for the 
> mesos containerizer.





[jira] [Assigned] (MESOS-2599) Make exit codes unique

2015-10-12 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-2599:
-

Assignee: (was: Paul Brett)

> Make exit codes unique
> --
>
> Key: MESOS-2599
> URL: https://issues.apache.org/jira/browse/MESOS-2599
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Paul Brett
>  Labels: twitter
>
> Currently, we use EXIT(1) for all exits from the slave. If we make the exit 
> code unique for each reason, we can use the exit code to analyze failures. 
> We could group the exit codes into startup exits (before the slave ever 
> offered service) and in-service exits. Additionally, it would be useful to 
> identify which exits are expected to clear on a retry.
> We should validate whether the exit code is being inspected by calling 
> scripts, which could break with the updated exit codes.
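
As a rough illustration of the grouping proposed above, here is a minimal C++ sketch. The enum ExitCode, its members, the numeric ranges, and the retry classification are all made up for the example and do not come from the Mesos source.

```cpp
#include <cstdint>

// Hypothetical exit codes; names and values are illustrative only.
//   10-19: startup exits (the slave never offered service).
//   20-29: in-service exits.
enum class ExitCode : std::uint8_t {
  STARTUP_BAD_FLAGS = 10,
  STARTUP_RECOVERY_FAILED = 11,
  STARTUP_ISOLATOR_INIT_FAILED = 12,
  SERVICE_MASTER_SHUTDOWN = 20,
  SERVICE_CONTAINERIZER_FAILED = 21,
};

// The integer status that would be passed to exit().
constexpr int toStatus(ExitCode code) { return static_cast<int>(code); }

// True for codes in the (assumed) startup range.
constexpr bool isStartupExit(ExitCode code) {
  return toStatus(code) >= 10 && toStatus(code) < 20;
}

// Whether a retry is expected to clear the condition. The classification
// here is an assumption made for the sake of the example.
inline bool isRetryable(ExitCode code) {
  switch (code) {
    case ExitCode::STARTUP_RECOVERY_FAILED:
    case ExitCode::SERVICE_CONTAINERIZER_FAILED:
      return true;
    default:
      return false;
  }
}
```

Keeping the two groups in separate numeric ranges would let a wrapper script distinguish them from the status alone. Scripts that currently test for status 1 would need to be audited first, as noted above.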





[jira] [Assigned] (MESOS-1977) Disk Isolator Usage Metrics

2015-10-12 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-1977:
-

Assignee: Paul Brett

> Disk Isolator Usage Metrics
> 
>
> Key: MESOS-1977
> URL: https://issues.apache.org/jira/browse/MESOS-1977
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Joris Van Remoortere
>Assignee: Paul Brett
>  Labels: mesosphere
>
> Implement just the usage statistics aspect of the block io isolator for the 
> mesos containerizer.





[jira] [Created] (MESOS-3588) Port mapping isolator check failed: createQdisc.get()

2015-10-05 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3588:
-

 Summary: Port mapping isolator check failed: createQdisc.get()
 Key: MESOS-3588
 URL: https://issues.apache.org/jira/browse/MESOS-3588
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett


Container creation is failing occasionally due to the required name already 
existing, e.g.:

{code}
F1005 13:25:04.331053 48582 port_mapping.cpp:2245] Check failed: 
createQdisc.get()
*** Check failure stack trace: ***  
@ 0x7f3b5c3b668d  google::LogMessage::Fail()
@ 0x7f3b5c3b84d4  google::LogMessage::SendToLog()   
@ 0x7f3b5c3b627c  google::LogMessage::Flush()   
@ 0x7f3b5c3b8dc9  google::LogMessageFatal::~LogMessageFatal()   
@ 0x7f3b5c0bdc8c  
mesos::internal::slave::PortMappingIsolatorProcess::isolate()
@ 0x7f3b5bf28fd6  
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingN5mesos8internal5slave20MesosIsolatorProcessERKNS6_11ContainerIDEiSA_iEENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSH_FSF_T1_T2_ET3_T4_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
@ 0x7f3b5c3690b1  process::ProcessManager::resume() 
@ 0x7f3b5c3693af  process::internal::schedule() 
@ 0x7f3b5c478cd0  execute_native_thread_routine 
@ 0x7f3b5b14283d  start_thread  
@ 0x7f3b5abb7fdd  clone 
/usr/local/bin/mesos-slave.sh: line 102: 48575 Aborted (core 
dumped) $debug /usr/local/sbin/mesos-slave "${MESOS_FLAGS[@]}"
Slave Exit Status: 134  
{code}  
  

It appears that there are valid circumstances under which the kernel can 
reallocate the namespace PID before the container's external interface 
(mesos_n) has been destroyed.

{code}
  2236   // Prepare the ingress queueing disciplines on veth.
  2237   Try<bool> createQdisc = ingress::create(veth(pid));
  2238   if (createQdisc.isError()) {
  2239     return Failure(
  2240         "Failed to create the ingress qdisc on " + veth(pid) +
  2241         ": " + createQdisc.error());
  2242   }
  2243
  2244   // Veth device should exist since we just created it.
  2245   CHECK(createQdisc.get());
{code}

We should test for "link already exists" errors in port mapping (e.g. 
link::create returns false) and fail the container creation rather than killing 
the slave.
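
A minimal sketch of that handling follows, written as self-contained C++ rather than against the Mesos tree. TryBool is a simplified stand-in for stout's Try<bool>, and checkQdiscCreation is a hypothetical helper. The sketch also assumes that a false result means the qdisc or link already exists, as this report suggests.

```cpp
#include <string>

// Minimal stand-in for stout's Try<bool>, only to keep the sketch
// self-contained; it is not the real class.
struct TryBool {
  bool isError;
  bool value;         // Meaningful only when !isError.
  std::string error;  // Meaningful only when isError.
};

// Hypothetical helper: maps the result of creating the ingress qdisc to an
// error message for the caller. An empty string means success. A 'false'
// result is reported as an error for this container instead of aborting
// the whole process, which is what CHECK() would do.
inline std::string checkQdiscCreation(const TryBool& created,
                                      const std::string& veth) {
  if (created.isError) {
    return "Failed to create the ingress qdisc on " + veth + ": " +
           created.error;
  }
  if (!created.value) {
    return "The ingress qdisc already exists on " + veth;
  }
  return "";
}
```

In the isolator, a non-empty message would presumably become a Failure(...) returned from isolate(), so only the affected container fails.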





[jira] [Created] (MESOS-3513) Cgroups Test Filters aborts tests on Centos 6.6

2015-09-24 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3513:
-

 Summary: Cgroups Test Filters aborts tests on Centos 6.6 
 Key: MESOS-3513
 URL: https://issues.apache.org/jira/browse/MESOS-3513
 Project: Mesos
  Issue Type: Bug
  Components: slave, test
 Environment: Centos 6.6
Reporter: Paul Brett
Assignee: Paul Brett


Running make check on CentOS 6.6 causes all tests to abort due to the CHECK_SOME 
test in CgroupsFilter:

{code}
Build directory: /home/jenkins/workspace/mesos-config-centos6/build
F0923 23:00:49.748896 27362 environment.cpp:132] CHECK_SOME(hierarchies_): 
Failed to determine canonical path of /sys/fs/cgroup/freezer: No such file or 
directory 
*** Check failure stack trace: ***
@ 0x7fb786ca0c4d  google::LogMessage::Fail()
@ 0x7fb786ca298c  google::LogMessage::SendToLog()
@ 0x7fb786ca083c  google::LogMessage::Flush()
@ 0x7fb786ca3289  google::LogMessageFatal::~LogMessageFatal()
@   0x58e66c  mesos::internal::tests::CgroupsFilter::CgroupsFilter()
@   0x58712f  mesos::internal::tests::Environment::Environment()
@   0x4c882f  main
@ 0x7fb782767d5d  __libc_start_main
@   0x4d6331  (unknown)
make[3]: *** [check-local] Aborted
{code}





[jira] [Commented] (MESOS-3422) MasterSlaveReconciliationTest.ReconcileLostTask test is flaky

2015-09-21 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900954#comment-14900954
 ] 

Paul Brett commented on MESOS-3422:
---

Tested HEAD on Centos6 (original reporting platform) with no errors.  

{code}
[--] 1 test from MasterSlaveReconciliationTest
[ RUN  ] MasterSlaveReconciliationTest.ReconcileLostTask
Using temporary directory 
'/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_i61HPI'
I0921 16:38:36.016902 51925 leveldb.cpp:176] Opened db in 73.30966ms
I0921 16:38:36.023943 51925 leveldb.cpp:183] Compacted db in 6.963667ms
I0921 16:38:36.024034 51925 leveldb.cpp:198] Created db iterator in 48856ns
I0921 16:38:36.024061 51925 leveldb.cpp:204] Seeked to beginning of db in 3684ns
I0921 16:38:36.024077 51925 leveldb.cpp:273] Iterated through 0 keys in the db 
in 337ns
I0921 16:38:36.024189 51925 replica.cpp:744] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0921 16:38:36.025542 51935 recover.cpp:449] Starting replica recovery
I0921 16:38:36.026080 51935 recover.cpp:475] Replica is in EMPTY status
I0921 16:38:36.028053 51930 master.cpp:380] Master 
20150921-163836-2081170186-40941-51925 (smfd-aki-27-sr1.devel.twitter.com) 
started on 10.35.12.124:40941
I0921 16:38:36.028286 51934 replica.cpp:641] Replica in EMPTY status received a 
broadcasted recover request
I0921 16:38:36.028094 51930 master.cpp:382] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
--authorizers="local" 
--credentials="/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_i61HPI/credentials"
 --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_slave_ping_timeouts="5" --quiet="false" 
--recovery_slave_removal_limit="100%" --registry="replicated_log" 
--registry_fetch_timeout="1mins" --registry_store_timeout="25secs" 
--registry_strict="true" --root_submissions="true" 
--slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
--user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_i61HPI/master" 
--zk_session_timeout="10secs"
I0921 16:38:36.029104 51930 master.cpp:427] Master only allowing authenticated 
frameworks to register
I0921 16:38:36.029132 51930 master.cpp:432] Master only allowing authenticated 
slaves to register
I0921 16:38:36.029155 51930 credentials.hpp:37] Loading credentials for 
authentication from 
'/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_i61HPI/credentials'
I0921 16:38:36.029250 51936 recover.cpp:195] Received a recover response from a 
replica in EMPTY status
I0921 16:38:36.029738 51930 master.cpp:471] Using default 'crammd5' 
authenticator
I0921 16:38:36.029908 51930 authenticator.cpp:512] Initializing server SASL
I0921 16:38:36.029947 51940 recover.cpp:566] Updating replica status to STARTING
I0921 16:38:36.030782 51930 master.cpp:508] Authorization enabled
I0921 16:38:36.036074 51926 master.cpp:1607] The newly elected leader is 
master@10.35.12.124:40941 with id 20150921-163836-2081170186-40941-51925
I0921 16:38:36.036110 51926 master.cpp:1620] Elected as the leading master!
I0921 16:38:36.036145 51926 master.cpp:1380] Recovering from registrar
I0921 16:38:36.036335 51930 registrar.cpp:309] Recovering registrar
I0921 16:38:36.067191 51938 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 36.988836ms
I0921 16:38:36.067246 51938 replica.cpp:323] Persisted replica status to 
STARTING
I0921 16:38:36.067517 51938 recover.cpp:475] Replica is in STARTING status
I0921 16:38:36.068230 51936 replica.cpp:641] Replica in STARTING status 
received a broadcasted recover request
I0921 16:38:36.068429 51928 recover.cpp:195] Received a recover response from a 
replica in STARTING status
I0921 16:38:36.068729 51927 recover.cpp:566] Updating replica status to VOTING
I0921 16:38:36.074915 51940 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 6.095154ms
I0921 16:38:36.074942 51940 replica.cpp:323] Persisted replica status to VOTING
I0921 16:38:36.075021 51936 recover.cpp:580] Successfully joined the Paxos group
I0921 16:38:36.075228 51936 recover.cpp:464] Recover process terminated
I0921 16:38:36.075657 51926 log.cpp:661] Attempting to start the writer
I0921 16:38:36.077828 51927 replica.cpp:477] Replica received implicit promise 
request with proposal 1
I0921 16:38:36.091645 51927 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 13.779849ms
I0921 16:38:36.091686 51927 replica.cpp:345] Persisted promised to 1
I0921 16:38:36.092543 51934 coordinator.cpp:231] Coordinator attemping to fill 
missing position
I0921 16:38:36.094199 51939 replica.cpp:378] Replica received explicit promise 
request for position 0 with 

[jira] [Assigned] (MESOS-3422) MasterSlaveReconciliationTest.ReconcileLostTask test is flaky

2015-09-18 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-3422:
-

Assignee: Paul Brett

> MasterSlaveReconciliationTest.ReconcileLostTask test is flaky
> -
>
> Key: MESOS-3422
> URL: https://issues.apache.org/jira/browse/MESOS-3422
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Affects Versions: 0.25.0
> Environment: CentOS
>Reporter: Vinod Kone
>Assignee: Paul Brett
>
> Observed this on internal CI
> {code}
> DEBUG: [--] 5 tests from MasterSlaveReconciliationTest
> DEBUG: [ RUN ] MasterSlaveReconciliationTest.SlaveReregisterTerminatedExecutor
> DEBUG: Using temporary directory 
> '/tmp/MasterSlaveReconciliationTest_SlaveReregisterTerminatedExecutor_QJPUzf'
> DEBUG: [ OK ] MasterSlaveReconciliationTest.SlaveReregisterTerminatedExecutor 
> (78 ms)
> DEBUG: [ RUN ] MasterSlaveReconciliationTest.ReconcileLostTask
> DEBUG: Using temporary directory 
> '/tmp/MasterSlaveReconciliationTest_ReconcileLostTask_16KDgE'
> DEBUG: tests/master_slave_reconciliation_tests.cpp:226: Failure
> DEBUG: Failed to wait 15secs for statusUpdateMessage
> DEBUG: tests/master_slave_reconciliation_tests.cpp:216: Failure
> DEBUG: Actual function call count doesn't match EXPECT_CALL(sched, 
> statusUpdate(, _))...
> DEBUG: Expected: to be called once
> DEBUG: Actual: never called - unsatisfied and active
> DEBUG: I0914 08:51:27.825984 16062 leveldb.cpp:438] Reading position from 
> leveldb took 16151ns
> DEBUG: I0914 08:51:27.828069 16049 registrar.cpp:342] Successfully fetched 
> the registry (0B) in 7648us
> DEBUG: I0914 08:51:27.828119 16049 registrar.cpp:441] Applied 1 operations in 
> 2805ns; attempting to update the 'registry'
> DEBUG: I0914 08:51:27.829991 16066 log.cpp:685] Attempting to append 222 
> bytes to the log
> DEBUG: I0914 08:51:27.830029 16066 coordinator.cpp:341] Coordinator 
> attempting to write APPEND action at position 1
> DEBUG: I0914 08:51:27.830729 16053 replica.cpp:511] Replica received write 
> request for position 1
> DEBUG: I0914 08:51:27.831167 16053 leveldb.cpp:343] Persisting action (241 
> bytes) to leveldb took 414748ns
> DEBUG: I0914 08:51:27.831185 16053 replica.cpp:679] Persisted action at 1
> DEBUG: I0914 08:51:27.831493 16058 replica.cpp:658] Replica received learned 
> notice for position 1
> DEBUG: I0914 08:51:27.831698 16058 leveldb.cpp:343] Persisting action (243 
> bytes) to leveldb took 185223ns
> DEBUG: I0914 08:51:27.831714 16058 replica.cpp:679] Persisted action at 1
> DEBUG: I0914 08:51:27.831722 16058 replica.cpp:664] Replica learned APPEND 
> action at position 1
> DEBUG: I0914 08:51:27.831989 16056 registrar.cpp:486] Successfully updated 
> the 'registry' in 3.827968ms
> DEBUG: I0914 08:51:27.832041 16052 log.cpp:704] Attempting to truncate the 
> log to 1
> DEBUG: I0914 08:51:27.832093 16056 registrar.cpp:372] Successfully recovered 
> registrar
> DEBUG: I0914 08:51:27.832259 16072 coordinator.cpp:341] Coordinator 
> attempting to write TRUNCATE action at position 2
> DEBUG: I0914 08:51:27.832259 16062 master.cpp:1404] Recovered 0 slaves from 
> the Registry (183B) ; allowing 10mins for slaves to re-register
> DEBUG: I0914 08:51:27.832882 16060 replica.cpp:511] Replica received write 
> request for position 2
> DEBUG: I0914 08:51:27.833243 16060 leveldb.cpp:343] Persisting action (16 
> bytes) to leveldb took 340843ns
> DEBUG: I0914 08:51:27.833261 16060 replica.cpp:679] Persisted action at 2
> DEBUG: I0914 08:51:27.833593 16050 replica.cpp:658] Replica received learned 
> notice for position 2
> DEBUG: I0914 08:51:27.833724 16050 leveldb.cpp:343] Persisting action (18 
> bytes) to leveldb took 112560ns
> DEBUG: I0914 08:51:27.833755 16050 leveldb.cpp:401] Deleting ~1 keys from 
> leveldb took 16580ns
> DEBUG: I0914 08:51:27.833765 16050 replica.cpp:679] Persisted action at 2
> DEBUG: I0914 08:51:27.833775 16050 replica.cpp:664] Replica learned TRUNCATE 
> action at position 2
> DEBUG: I0914 08:51:27.843340 16057 http.cpp:333] HTTP POST for 
> /master/maintenance/schedule from 172.18.4.102:46471
> DEBUG: I0914 08:51:27.843801 16050 registrar.cpp:441] Applied 1 operations in 
> 25197ns; attempting to update the 'registry'
> DEBUG: I0914 08:51:27.845721 16068 log.cpp:685] Attempting to append 328 
> bytes to the log
> DEBUG: I0914 08:51:27.845772 16068 coordinator.cpp:341] Coordinator 
> attempting to write APPEND action at position 3
> DEBUG: I0914 08:51:27.846606 16052 replica.cpp:511] Replica received write 
> request for position 3
> DEBUG: I0914 08:51:27.847012 16052 leveldb.cpp:343] Persisting action (347 
> bytes) to leveldb took 387519ns
> DEBUG: I0914 08:51:27.847026 16052 replica.cpp:679] Persisted action at 3
> DEBUG: I0914 08:51:27.847698 16048 

[jira] [Commented] (MESOS-3253) Add pid to network helper error messages

2015-09-01 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14725694#comment-14725694
 ] 

Paul Brett commented on MESOS-3253:
---

I don't think we need this anymore.

> Add pid to network helper error messages
> 
>
> Key: MESOS-3253
> URL: https://issues.apache.org/jira/browse/MESOS-3253
> Project: Mesos
>  Issue Type: Bug
>Reporter: Paul Brett
>
> Network helper logs errors to stderr without the associated namespace pid or 
> container id, which prevents the errors from being associated with the 
> appropriate container.





[jira] [Created] (MESOS-3347) Remove dead code in src/linux/perf.cpp

2015-08-31 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3347:
-

 Summary: Remove dead code in src/linux/perf.cpp
 Key: MESOS-3347
 URL: https://issues.apache.org/jira/browse/MESOS-3347
 Project: Mesos
  Issue Type: Improvement
Reporter: Paul Brett
Assignee: Paul Brett


The performance monitoring routines include support for sampling a single pid, 
a single cgroup, and multiple pids, but these cases are never used.





[jira] [Created] (MESOS-3292) Perf isolator event validation issues

2015-08-18 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3292:
-

 Summary: Perf isolator event validation issues
 Key: MESOS-3292
 URL: https://issues.apache.org/jira/browse/MESOS-3292
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett


The Linux perf isolator currently validates events by running the installed perf 
command once at slave startup and verifying that no error is raised when the 
event is requested.  Nothing checks at startup that the perf event is also 
supported by Mesos in the PerfStatistics message.

However, perf is an external program and can be upgraded while the slave is 
running, which may change the perf version and the supported events or output 
formats.

We should validate events against PerfStatistics at startup and handle 
on-the-fly perf upgrades.
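
A minimal sketch of the startup validation proposed here, written in Python 
rather than Mesos's C++ for brevity. The set of supported field names and the 
hyphen-to-underscore normalization are assumptions made for illustration; the 
real list would come from the PerfStatistics message.

```python
from typing import Iterable, List, Set

# Hypothetical subset of PerfStatistics field names, for illustration only.
SUPPORTED_FIELDS: Set[str] = {"cycles", "instructions", "cache_misses"}


def normalize(event: str) -> str:
    """Map a perf event name to a field-style name (assumed convention)."""
    return event.lower().replace("-", "_")


def unsupported_events(requested: Iterable[str],
                       supported: Set[str] = SUPPORTED_FIELDS) -> List[str]:
    """Return the requested events that have no matching field."""
    return [e for e in requested if normalize(e) not in supported]
```

A slave could refuse to start when unsupported_events(...) is non-empty, and 
repeat the check whenever it notices that the perf version has changed.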





[jira] [Created] (MESOS-3271) SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.

2015-08-15 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3271:
-

 Summary: SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.
 Key: MESOS-3271
 URL: https://issues.apache.org/jira/browse/MESOS-3271
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Paul Brett


Test failure on Ubuntu 14 configured with --disable-java --disable-python 
--enable-ssl --enable-libevent --enable-optimize --enable-network-isolation

Commit: 9b78b301469667b5a44f0a351de5f3a71edae499

[ RUN  ] SlaveRecoveryTest/0.NonCheckpointingFramework
I0815 06:41:47.413602 17091 exec.cpp:133] Version: 0.24.0
I0815 06:41:47.416780 17111 exec.cpp:207] Executor registered on slave 
20150815-064146-544909504-51064-12195-S0
Registered executor on slave1-ubuntu12
Starting task 044bd49e-2f38-4671-802a-ac6524d61a85
Forked command at 17114
sh -c 'sleep 1000'
[err] event_active called on a non-initialized event 0x7f6b740232d0 (events: 
0x2, fd: 21, flags: 0x80)
*** Aborted at 1439646107 (unix time) try date -d @1439646107 if you are 
using GNU date ***
PC: @ 0x7f6ba512d0d5 (unknown)
*** SIGABRT (@0x2fa3) received by PID 12195 (TID 0x7f6b9d613700) from PID 
12195; stack trace: ***
@ 0x7f6ba54c4cb0 (unknown)
@ 0x7f6ba512d0d5 (unknown)
@ 0x7f6ba513083b (unknown)
@ 0x7f6ba448e1ba (unknown)
@ 0x7f6ba448e52b (unknown)
@ 0x7f6ba447dcc9 (unknown)
@   0x4c4033 process::internal::run()
@ 0x7f6ba72642ab process::Future::discard()
@ 0x7f6ba72643be process::internal::discard()
@ 0x7f6ba7262298 
_ZNSt17_Function_handlerIFvvEZNK7process6FutureImE9onDiscardISt5_BindIFPFvNS1_10WeakFutureIsEEES7_RKS3_OT_EUlvE_E9_M_invokeERKSt9_Any_data
@   0x4c4033 process::internal::run()
@   0x6fa0cb process::Future::discard()
@ 0x7f6ba6fb5736 cgroups::event::Listener::finalize()
@ 0x7f6ba728fb11 process::ProcessManager::resume()
@ 0x7f6ba728fe0f process::internal::schedule()
@ 0x7f6ba5c9d490 (unknown)
@ 0x7f6ba54bce9a start_thread
@ 0x7f6ba51ea38d (unknown)
+ /bin/true
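
The abort banner reports the time as a Unix timestamp. As a small worked 
example (Python, standard library only), the value can be converted to UTC to 
line it up with the log lines:

```python
from datetime import datetime, timezone


def unix_to_utc(seconds: int) -> str:
    """Format a Unix timestamp as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


print(unix_to_utc(1439646107))  # 2015-08-15T13:41:47+00:00
```

The log lines show 06:41:47, which is consistent with a host clock seven hours 
behind UTC.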





[jira] [Updated] (MESOS-3271) SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.

2015-08-15 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-3271:
--
Attachment: build.txt

 SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.
 ---

 Key: MESOS-3271
 URL: https://issues.apache.org/jira/browse/MESOS-3271
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Paul Brett
 Attachments: build.txt


 Test failure on Ubuntu 14 configured with --disable-java --disable-python 
 --enable-ssl --enable-libevent --enable-optimize --enable-network-isolation
 Commit: 9b78b301469667b5a44f0a351de5f3a71edae499
 [ RUN  ] SlaveRecoveryTest/0.NonCheckpointingFramework
 I0815 06:41:47.413602 17091 exec.cpp:133] Version: 0.24.0
 I0815 06:41:47.416780 17111 exec.cpp:207] Executor registered on slave 
 20150815-064146-544909504-51064-12195-S0
 Registered executor on slave1-ubuntu12
 Starting task 044bd49e-2f38-4671-802a-ac6524d61a85
 Forked command at 17114
 sh -c 'sleep 1000'
 [err] event_active called on a non-initialized event 0x7f6b740232d0 (events: 
 0x2, fd: 21, flags: 0x80)
 *** Aborted at 1439646107 (unix time) try date -d @1439646107 if you are 
 using GNU date ***
 PC: @ 0x7f6ba512d0d5 (unknown)
 *** SIGABRT (@0x2fa3) received by PID 12195 (TID 0x7f6b9d613700) from PID 
 12195; stack trace: ***
 @ 0x7f6ba54c4cb0 (unknown)
 @ 0x7f6ba512d0d5 (unknown)
 @ 0x7f6ba513083b (unknown)
 @ 0x7f6ba448e1ba (unknown)
 @ 0x7f6ba448e52b (unknown)
 @ 0x7f6ba447dcc9 (unknown)
 @   0x4c4033 process::internal::run()
 @ 0x7f6ba72642ab process::Future::discard()
 @ 0x7f6ba72643be process::internal::discard()
 @ 0x7f6ba7262298 
 _ZNSt17_Function_handlerIFvvEZNK7process6FutureImE9onDiscardISt5_BindIFPFvNS1_10WeakFutureIsEEES7_RKS3_OT_EUlvE_E9_M_invokeERKSt9_Any_data
 @   0x4c4033 process::internal::run()
 @   0x6fa0cb process::Future::discard()
 @ 0x7f6ba6fb5736 cgroups::event::Listener::finalize()
 @ 0x7f6ba728fb11 process::ProcessManager::resume()
 @ 0x7f6ba728fe0f process::internal::schedule()
 @ 0x7f6ba5c9d490 (unknown)
 @ 0x7f6ba54bce9a start_thread
 @ 0x7f6ba51ea38d (unknown)
 + /bin/true





[jira] [Updated] (MESOS-3272) CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FreezeNonFreezer is flaky.

2015-08-15 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-3272:
--
Attachment: build.log

 CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FreezeNonFreezer is flaky.
 

 Key: MESOS-3272
 URL: https://issues.apache.org/jira/browse/MESOS-3272
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Reporter: Paul Brett
 Attachments: build.log


 Test aborts when configured with python, libevent and SSL on Ubuntu12.
 [ RUN  ] 
 CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FreezeNonFreezer
 *** Aborted at 1439667937 (unix time) try date -d @1439667937 if you are 
 using GNU date ***
 PC: @ 0x7feba972a753 (unknown)
 *** SIGSEGV (@0x0) received by PID 4359 (TID 0x7febabf897c0) from PID 0; 
 stack trace: ***
 @ 0x7feba8f7dcb0 (unknown)
 @ 0x7feba972a753 (unknown)
 @ 0x7febaaa69328 process::dispatch()
 @ 0x7febaaa5e9a7 cgroups::freezer::thaw()
 @   0xba64ff 
 mesos::internal::tests::CgroupsAnyHierarchyWithCpuMemoryTest_ROOT_CGROUPS_FreezeNonFreezer_Test::TestBody()
 @   0xc199a3 
 testing::internal::HandleExceptionsInMethodIfSupported()
 @   0xc0f947 testing::Test::Run()
 @   0xc0f9ee testing::TestInfo::Run()
 @   0xc0faf5 testing::TestCase::Run()
 @   0xc0fda8 testing::internal::UnitTestImpl::RunAllTests()
 @   0xc10064 testing::UnitTest::Run()
 @   0x4b3273 main
 @ 0x7feba8bd176d (unknown)
 @   0x4bf1f1 (unknown)





[jira] [Created] (MESOS-3272) CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FreezeNonFreezer is flaky.

2015-08-15 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3272:
-

 Summary: 
CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FreezeNonFreezer is flaky.
 Key: MESOS-3272
 URL: https://issues.apache.org/jira/browse/MESOS-3272
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Reporter: Paul Brett
 Attachments: build.log

Test aborts when configured with python, libevent and SSL on Ubuntu12.

[ RUN  ] CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FreezeNonFreezer
*** Aborted at 1439667937 (unix time) try date -d @1439667937 if you are 
using GNU date ***
PC: @ 0x7feba972a753 (unknown)
*** SIGSEGV (@0x0) received by PID 4359 (TID 0x7febabf897c0) from PID 0; stack 
trace: ***
@ 0x7feba8f7dcb0 (unknown)
@ 0x7feba972a753 (unknown)
@ 0x7febaaa69328 process::dispatch()
@ 0x7febaaa5e9a7 cgroups::freezer::thaw()
@   0xba64ff 
mesos::internal::tests::CgroupsAnyHierarchyWithCpuMemoryTest_ROOT_CGROUPS_FreezeNonFreezer_Test::TestBody()
@   0xc199a3 
testing::internal::HandleExceptionsInMethodIfSupported()
@   0xc0f947 testing::Test::Run()
@   0xc0f9ee testing::TestInfo::Run()
@   0xc0faf5 testing::TestCase::Run()
@   0xc0fda8 testing::internal::UnitTestImpl::RunAllTests()
@   0xc10064 testing::UnitTest::Run()
@   0x4b3273 main
@ 0x7feba8bd176d (unknown)
@   0x4bf1f1 (unknown)







[jira] [Commented] (MESOS-3254) Cgroup CHECK fails test harness

2015-08-14 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697661#comment-14697661
 ] 

Paul Brett commented on MESOS-3254:
---

Updated change at https://reviews.apache.org/r/37490/

 Cgroup CHECK fails test harness
 ---

 Key: MESOS-3254
 URL: https://issues.apache.org/jira/browse/MESOS-3254
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Paul Brett

 CHECK in clean up of ContainerizerTest causes test harness to abort rather 
 than fail or skip only perf related tests.
 [ RUN  ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch
 [   OK ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch (628 ms)
 [--] 24 tests from SlaveRecoveryTest/0 (38986 ms total)
 [--] 4 tests from MesosContainerizerSlaveRecoveryTest
 [ RUN  ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics
 ../../src/tests/mesos.cpp:720: Failure
 cgroups::mount(hierarchy, subsystem): 'perf_event' is already attached to 
 another hierarchy
 -
 We cannot run any cgroups tests that require
 a hierarchy with subsystem 'perf_event'
 because we failed to find an existing hierarchy
 or create a new one (tried '/tmp/mesos_test_cgroup/perf_event').
 You can either remove all existing
 hierarchies, or disable this test case
 (i.e., --gtest_filter=-MesosContainerizerSlaveRecoveryTest.*).
 -
 F0811 17:23:43.874696 12955 mesos.cpp:774] CHECK_SOME(cgroups): 
 '/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy 
 *** Check failure stack trace: ***
 @ 0x7fb2fb4835fd  google::LogMessage::Fail()
 @ 0x7fb2fb48543d  google::LogMessage::SendToLog()
 @ 0x7fb2fb4831ec  google::LogMessage::Flush()
 @ 0x7fb2fb485d39  google::LogMessageFatal::~LogMessageFatal()
 @   0x4e3f98  _CheckFatal::~_CheckFatal()
 @   0x82f25a  
 mesos::internal::tests::ContainerizerTest::TearDown()
 @   0xc030e3  
 testing::internal::HandleExceptionsInMethodIfSupported()
 @   0xbf9050  testing::Test::Run()
 @   0xbf912e  testing::TestInfo::Run()
 @   0xbf9235  testing::TestCase::Run()
 @   0xbf94e8  testing::internal::UnitTestImpl::RunAllTests()
 @   0xbf97a4  testing::UnitTest::Run()
 @   0x4a9df3  main
 @ 0x7fb2f9371ec5  (unknown)
 @   0x4b63ee  (unknown)
 Build step 'Execute shell' marked build as failure





[jira] [Updated] (MESOS-3254) Cgroup CHECK fails test harness

2015-08-14 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-3254:
--
  Sprint: Twitter Mesos Q3 Sprint 3
Story Points: 2
 Description: 

A CHECK in the cleanup of ContainerizerTest causes the test harness to abort, 
rather than failing or skipping only the perf-related tests.

[ RUN  ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch
[   OK ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch (628 ms)
[--] 24 tests from SlaveRecoveryTest/0 (38986 ms total)

[--] 4 tests from MesosContainerizerSlaveRecoveryTest
[ RUN  ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics
../../src/tests/mesos.cpp:720: Failure
cgroups::mount(hierarchy, subsystem): 'perf_event' is already attached to 
another hierarchy
-
We cannot run any cgroups tests that require
a hierarchy with subsystem 'perf_event'
because we failed to find an existing hierarchy
or create a new one (tried '/tmp/mesos_test_cgroup/perf_event').
You can either remove all existing
hierarchies, or disable this test case
(i.e., --gtest_filter=-MesosContainerizerSlaveRecoveryTest.*).
-
F0811 17:23:43.874696 12955 mesos.cpp:774] CHECK_SOME(cgroups): 
'/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy 
*** Check failure stack trace: ***
@ 0x7fb2fb4835fd  google::LogMessage::Fail()
@ 0x7fb2fb48543d  google::LogMessage::SendToLog()
@ 0x7fb2fb4831ec  google::LogMessage::Flush()
@ 0x7fb2fb485d39  google::LogMessageFatal::~LogMessageFatal()
@   0x4e3f98  _CheckFatal::~_CheckFatal()
@   0x82f25a  
mesos::internal::tests::ContainerizerTest::TearDown()
@   0xc030e3  
testing::internal::HandleExceptionsInMethodIfSupported()
@   0xbf9050  testing::Test::Run()
@   0xbf912e  testing::TestInfo::Run()
@   0xbf9235  testing::TestCase::Run()
@   0xbf94e8  testing::internal::UnitTestImpl::RunAllTests()
@   0xbf97a4  testing::UnitTest::Run()
@   0x4a9df3  main
@ 0x7fb2f9371ec5  (unknown)
@   0x4b63ee  (unknown)
Build step 'Execute shell' marked build as failure


 Cgroup CHECK fails test harness
 ---

 Key: MESOS-3254
 URL: https://issues.apache.org/jira/browse/MESOS-3254
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Paul Brett

 CHECK in clean up of ContainerizerTest causes test harness to abort rather 
 than fail or skip only perf related tests.
 [ RUN  ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch
 [   OK ] 

[jira] [Created] (MESOS-3257) Zookeeper JVM test failure causes test harness to fail

2015-08-13 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3257:
-

 Summary: Zookeeper JVM test failure causes test harness to fail
 Key: MESOS-3257
 URL: https://issues.apache.org/jira/browse/MESOS-3257
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett


Failure of the ZooKeeper Java test setup causes the test harness to exit, 
preventing subsequent tests from running.

{code}
[--] 2 tests from LogZooKeeperTest
F0813 16:09:33.647265 13790 zookeeper.cpp:78] CHECK_SOME(jvm): Error looking up 
symbol 'JNI_CreateJavaVM' in '' : 
/home/pbrett/sandbox/perf.refactor2/build/src/.libs/mesos-tests: undefined 
symbol: JNI_CreateJavaVM
*** Check failure stack trace: ***
@ 0x7f2d8cca7aac  google::LogMessage::Fail()
@ 0x7f2d8cca79fb  google::LogMessage::SendToLog()
@ 0x7f2d8cca740c  google::LogMessage::Flush()
@ 0x7f2d8ccaa140  google::LogMessageFatal::~LogMessageFatal()
@   0x8a938c  _CheckFatal::~_CheckFatal()
@  0x12f68c0  mesos::internal::tests::ZooKeeperTest::SetUpTestCase()
@  0x132a88a  testing::TestCase::RunSetUpTestCase()
@  0x1334cf7  
testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x132fb94  
testing::internal::HandleExceptionsInMethodIfSupported()
@  0x1311635  testing::TestCase::Run()
@  0x1317fca  testing::internal::UnitTestImpl::RunAllTests()
@  0x1335427  
testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x1330128  
testing::internal::HandleExceptionsInMethodIfSupported()
@  0x1316cf0  testing::UnitTest::Run()
@   0xc3a9d8  RUN_ALL_TESTS()
@   0xc3a6c8  main
@ 0x7f2d8818d9f4  __libc_start_main
@   0x8a5fa9  (unknown)
make[3]: *** [check-local] Aborted
{code}
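
One way to get the behaviour asked for here is to report a setup problem 
through the test framework instead of aborting the process. The sketch below 
uses Python's unittest as a stand-in for gtest; the class names and the 
jvm_available flag are hypothetical.

```python
import io
import unittest


def run_suite(jvm_available: bool) -> unittest.TestResult:
    """Run two test cases; the first needs a JVM, the second does not."""

    class NeedsJvm(unittest.TestCase):
        @classmethod
        def setUpClass(cls):
            # Skip the whole case instead of killing the process.
            if not jvm_available:
                raise unittest.SkipTest("JVM could not be created")

        def test_log(self):
            self.assertTrue(jvm_available)

    class Unrelated(unittest.TestCase):
        def test_still_runs(self):
            self.assertEqual(1 + 1, 2)

    loader = unittest.TestLoader()
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(NeedsJvm),
        loader.loadTestsFromTestCase(Unrelated),
    ])
    # Send the runner's report to a buffer so only the result object matters.
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

With jvm_available=False the first case is recorded as skipped and the 
unrelated case still runs, which is the outcome the ticket wants from the 
harness.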





[jira] [Updated] (MESOS-3254) Cgroup CHECK fails test harness

2015-08-12 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-3254:
--
Description: 


A CHECK in the cleanup of ContainerizerTest causes the test harness to abort, 
rather than failing or skipping only the perf-related tests.

[ RUN  ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch
[   OK ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch (628 ms)
[--] 24 tests from SlaveRecoveryTest/0 (38986 ms total)

[--] 4 tests from MesosContainerizerSlaveRecoveryTest
[ RUN  ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics
../../src/tests/mesos.cpp:720: Failure
cgroups::mount(hierarchy, subsystem): 'perf_event' is already attached to 
another hierarchy
-
We cannot run any cgroups tests that require
a hierarchy with subsystem 'perf_event'
because we failed to find an existing hierarchy
or create a new one (tried '/tmp/mesos_test_cgroup/perf_event').
You can either remove all existing
hierarchies, or disable this test case
(i.e., --gtest_filter=-MesosContainerizerSlaveRecoveryTest.*).
-
F0811 17:23:43.874696 12955 mesos.cpp:774] CHECK_SOME(cgroups): 
'/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy 
*** Check failure stack trace: ***
@ 0x7fb2fb4835fd  google::LogMessage::Fail()
@ 0x7fb2fb48543d  google::LogMessage::SendToLog()
@ 0x7fb2fb4831ec  google::LogMessage::Flush()
@ 0x7fb2fb485d39  google::LogMessageFatal::~LogMessageFatal()
@   0x4e3f98  _CheckFatal::~_CheckFatal()
@   0x82f25a  
mesos::internal::tests::ContainerizerTest::TearDown()
@   0xc030e3  
testing::internal::HandleExceptionsInMethodIfSupported()
@   0xbf9050  testing::Test::Run()
@   0xbf912e  testing::TestInfo::Run()
@   0xbf9235  testing::TestCase::Run()
@   0xbf94e8  testing::internal::UnitTestImpl::RunAllTests()
@   0xbf97a4  testing::UnitTest::Run()
@   0x4a9df3  main
@ 0x7fb2f9371ec5  (unknown)
@   0x4b63ee  (unknown)
Build step 'Execute shell' marked build as failure


 Cgroup CHECK fails test harness
 ---

 Key: MESOS-3254
 URL: https://issues.apache.org/jira/browse/MESOS-3254
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Paul Brett

 CHECK in clean up of ContainerizerTest causes test harness to abort rather 
 than fail or skip only perf related tests.
 [ RUN  ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch
 [   OK ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch (628 ms)
 [--] 24 tests from SlaveRecoveryTest/0 

[jira] [Commented] (MESOS-3185) Refactor Subprocess logic in linux/perf.cpp to use common subroutine

2015-08-12 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694498#comment-14694498
 ] 

Paul Brett commented on MESOS-3185:
---

Updated, reviews are:

https://reviews.apache.org/r/37423/
https://reviews.apache.org/r/37424/
https://reviews.apache.org/r/37417/
https://reviews.apache.org/r/37416/

 Refactor Subprocess logic in linux/perf.cpp to use common subroutine
 

 Key: MESOS-3185
 URL: https://issues.apache.org/jira/browse/MESOS-3185
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Paul Brett
Assignee: Paul Brett

 MESOS-2834 will enhance the perf isolator to support the different output 
 formats produced by different kernel versions.  To achieve this, it needs to 
 run the perf --version command. 
 We should decompose the existing Subprocess logic in perf so that we can 
 share the implementation between the multiple uses of perf.





[jira] [Created] (MESOS-3253) Add pid to network helper error messages

2015-08-11 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3253:
-

 Summary: Add pid to network helper error messages
 Key: MESOS-3253
 URL: https://issues.apache.org/jira/browse/MESOS-3253
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett


Network helper logs errors to stderr without the associated namespace pid or 
container id, which prevents the errors from being associated with the 
appropriate container.
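
A minimal sketch of the kind of prefix this ticket asks for, in Python for 
brevity. The function name, the message layout and the optional container id 
argument are illustrative and are not taken from the network helper.

```python
import sys
from typing import Optional, TextIO


def log_error(message: str, pid: int,
              container_id: Optional[str] = None,
              stream: Optional[TextIO] = None) -> None:
    """Write an error line tagged with the namespace pid and container id."""
    out = stream if stream is not None else sys.stderr
    tags = [f"pid={pid}"]
    if container_id is not None:
        tags.append(f"container={container_id}")
    out.write(f"[{' '.join(tags)}] {message}\n")
```

Calling log_error("Failed to get statistics", 17114, "c1") would write 
"[pid=17114 container=c1] Failed to get statistics" to stderr.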





[jira] [Created] (MESOS-3252) Ignore no statistics condition for containers with no qdisc

2015-08-11 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3252:
-

 Summary: Ignore no statistics condition for containers with no 
qdisc
 Key: MESOS-3252
 URL: https://issues.apache.org/jira/browse/MESOS-3252
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Reporter: Paul Brett


In PortMappingStatistics::execute, we log the following errors to stderr if the 
egress rate limiting qdiscs are not configured inside the container.

{code}
Failed to get the network statistics for the htb qdisc on eth0
Failed to get the network statistics for the fq_codel qdisc on eth0
{code}

This can occur because of an error reading the qdisc (the statistics function 
returns an error) or because the qdisc does not exist (the function returns 
none).  

We should not log an error when the qdisc does not exist, since this is normal 
behaviour if the container is created without rate limiting.  We do not want to 
gate this function on the slave rate limiting flag, since we would have to 
compare the behaviour against the flag value at the time the container was 
created.
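
The distinction between "error" and "none" can be sketched as follows. This is 
Python, not the C++ in PortMappingStatistics::execute; the reader callback, 
the use of OSError for a failed read and the errors list are assumptions made 
to keep the example self-contained.

```python
from typing import Callable, Dict, List, Optional

Stats = Dict[str, int]


def qdisc_statistics(read: Callable[[str, str], Optional[Stats]],
                     link: str, qdisc: str,
                     errors: List[str]) -> Stats:
    """Return qdisc statistics, recording an error only for a failed read."""
    try:
        stats = read(link, qdisc)
    except OSError as e:
        # A real failure while reading the qdisc: worth reporting.
        errors.append("Failed to get the network statistics for the "
                      f"{qdisc} qdisc on {link}: {e}")
        return {}
    if stats is None:
        # The qdisc does not exist, e.g. no rate limiting: stay silent.
        return {}
    return stats
```

Because the decision depends only on what the read returns, it needs no 
knowledge of the slave flag value at the time the container was created.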





[jira] [Created] (MESOS-3254) Cgroup CHECK fails test harness

2015-08-11 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3254:
-

 Summary: Cgroup CHECK fails test harness
 Key: MESOS-3254
 URL: https://issues.apache.org/jira/browse/MESOS-3254
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Paul Brett


A CHECK in the cleanup of ContainerizerTest causes the test harness to abort, 
rather than failing or skipping only the perf-related tests.

[ RUN  ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch
[   OK ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch (628 ms)
[--] 24 tests from SlaveRecoveryTest/0 (38986 ms total)

[--] 4 tests from MesosContainerizerSlaveRecoveryTest
[ RUN  ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics
../../src/tests/mesos.cpp:720: Failure
cgroups::mount(hierarchy, subsystem): 'perf_event' is already attached to 
another hierarchy
-
We cannot run any cgroups tests that require
a hierarchy with subsystem 'perf_event'
because we failed to find an existing hierarchy
or create a new one (tried '/tmp/mesos_test_cgroup/perf_event').
You can either remove all existing
hierarchies, or disable this test case
(i.e., --gtest_filter=-MesosContainerizerSlaveRecoveryTest.*).
-
F0811 17:23:43.874696 12955 mesos.cpp:774] CHECK_SOME(cgroups): 
'/tmp/mesos_test_cgroup/perf_event' is not a valid hierarchy 
*** Check failure stack trace: ***
@ 0x7fb2fb4835fd  google::LogMessage::Fail()
@ 0x7fb2fb48543d  google::LogMessage::SendToLog()
@ 0x7fb2fb4831ec  google::LogMessage::Flush()
@ 0x7fb2fb485d39  google::LogMessageFatal::~LogMessageFatal()
@   0x4e3f98  _CheckFatal::~_CheckFatal()
@   0x82f25a  
mesos::internal::tests::ContainerizerTest::TearDown()
@   0xc030e3  
testing::internal::HandleExceptionsInMethodIfSupported()
@   0xbf9050  testing::Test::Run()
@   0xbf912e  testing::TestInfo::Run()
@   0xbf9235  testing::TestCase::Run()
@   0xbf94e8  testing::internal::UnitTestImpl::RunAllTests()
@   0xbf97a4  testing::UnitTest::Run()
@   0x4a9df3  main
@ 0x7fb2f9371ec5  (unknown)
@   0x4b63ee  (unknown)
Build step 'Execute shell' marked build as failure
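
The difference between aborting and failing can be illustrated with a small 
sketch. It uses Python's unittest as a stand-in for gtest; the class names and 
the hierarchy_valid flag are hypothetical. A problem found during teardown is 
reported as a failure of that one test, and the rest of the suite still runs.

```python
import io
import unittest


def run_suite(hierarchy_valid: bool) -> unittest.TestResult:
    """Run a cgroup-style test with a checking teardown, then another test."""

    class PerfCgroup(unittest.TestCase):
        def test_statistics(self):
            pass

        def tearDown(self):
            # Report the problem as a test failure; do not abort the process.
            if not hierarchy_valid:
                self.fail("'perf_event' hierarchy is not valid")

    class Unrelated(unittest.TestCase):
        def test_still_runs(self):
            self.assertTrue(True)

    loader = unittest.TestLoader()
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(PerfCgroup),
        loader.loadTestsFromTestCase(Unrelated),
    ])
    # Send the runner's report to a buffer so only the result object matters.
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

With hierarchy_valid=False both tests run and exactly one failure is recorded, 
instead of the whole run ending at the first bad teardown.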





[jira] [Updated] (MESOS-2994) Design doc for creating user namespaces inside containers

2015-08-11 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2994:
--
Sprint: Twitter Mesos Q3 Sprint 1, Twitter Mesos Q3 Sprint 2  (was: Twitter 
Mesos Q3 Sprint 1, Twitter Mesos Q3 Sprint 2, Twitter Mesos Q3 Sprint 3)

 Design doc for creating user namespaces inside containers
 -

 Key: MESOS-2994
 URL: https://issues.apache.org/jira/browse/MESOS-2994
 Project: Mesos
  Issue Type: Improvement
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: twitter







[jira] [Created] (MESOS-3204) PortMappingIsolatorProcess shell script can silently fail

2015-08-04 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3204:
-

 Summary: PortMappingIsolatorProcess shell script can silently fail
 Key: MESOS-3204
 URL: https://issues.apache.org/jira/browse/MESOS-3204
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Affects Versions: 0.24.0
Reporter: Paul Brett
Assignee: Paul Brett


PortMappingIsolatorProcess::scripts generates a shell script to configure the 
target environment but does not set the shell '-e' flag.  Hence errors 
generated by the script will be silently ignored.
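
The effect of the missing flag can be shown with a two-line script. This is a 
Python sketch that shells out to sh; the script body is a stand-in, not the 
script the isolator generates.

```python
import subprocess


def run_script(body: str, errexit: bool) -> int:
    """Run body under sh, optionally with 'set -e'; return the exit status."""
    script = ("set -e\n" if errexit else "") + body
    result = subprocess.run(["sh", "-c", script],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode


# The first command fails; the second succeeds.
BODY = "false\ntrue\n"
```

Without 'set -e' the script exits with the status of its last command, so the 
failure of the first command is lost. With 'set -e' the shell stops at the 
first failing command and returns a non-zero status.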





[jira] [Commented] (MESOS-3204) PortMappingIsolatorProcess shell script can silently fail

2015-08-04 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654606#comment-14654606
 ] 

Paul Brett commented on MESOS-3204:
---

https://reviews.apache.org/r/37106

 PortMappingIsolatorProcess shell script can silently fail
 -

 Key: MESOS-3204
 URL: https://issues.apache.org/jira/browse/MESOS-3204
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Affects Versions: 0.24.0
Reporter: Paul Brett
Assignee: Paul Brett

 PortMappingIsolatorProcess::scripts generates a shell script to configure the 
 target environment but does not set the shell '-e' flag.  Hence errors 
 generated by the script will be silently ignored.





[jira] [Commented] (MESOS-3185) Refactor Subprocess logic in linux/perf.cpp to use common subroutine

2015-07-31 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649943#comment-14649943
 ] 

Paul Brett commented on MESOS-3185:
---

[~bmahler] suggested looking at process::await as a way to simplify the current 
code, so I am pulling the review while I take a look at this.

 Refactor Subprocess logic in linux/perf.cpp to use common subroutine
 

 Key: MESOS-3185
 URL: https://issues.apache.org/jira/browse/MESOS-3185
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Paul Brett
Assignee: Paul Brett

 MESOS-2834 will enhance the perf isolator to support the different output 
 formats produced by different kernel versions.  To achieve this, it needs to 
 run the perf --version command. 
 We should decompose the existing Subprocess logic in perf so that we can 
 share the implementation between the multiple uses of perf.





[jira] [Commented] (MESOS-3185) Refactor Subprocess logic in linux/perf.cpp to use common subroutine

2015-07-31 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649885#comment-14649885
 ] 

Paul Brett commented on MESOS-3185:
---

I worked with Marco for a while trying to do that and held off on this change 
to see where it was going, but I need to move forward with it now.  My hope is 
that we can get the stout version to support what I need, and then I will fix 
perf to use his code.

 Refactor Subprocess logic in linux/perf.cpp to use common subroutine
 

 Key: MESOS-3185
 URL: https://issues.apache.org/jira/browse/MESOS-3185
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Paul Brett
Assignee: Paul Brett

 MESOS-2834 will enhance the perf isolator to support the different output 
 formats produced by different kernel versions.  To achieve this, it needs to 
 run the perf --version command. 
 We should decompose the existing Subprocess logic in perf so that we can 
 share the implementation between the multiple uses of perf.





[jira] [Commented] (MESOS-3185) Refactor Subprocess logic in linux/perf.cpp to use common subroutine

2015-07-31 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649886#comment-14649886
 ] 

Paul Brett commented on MESOS-3185:
---

Added review https://reviews.apache.org/r/37000/

 Refactor Subprocess logic in linux/perf.cpp to use common subroutine
 

 Key: MESOS-3185
 URL: https://issues.apache.org/jira/browse/MESOS-3185
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Paul Brett
Assignee: Paul Brett

 MESOS-2834 will enhance the perf isolator to support the different output 
 formats produced by different kernel versions.  To achieve this, it needs to 
 run the perf --version command. 
 We should decompose the existing Subprocess logic in perf so that we can 
 share the implementation between the multiple uses of perf.





[jira] [Created] (MESOS-3185) Refactor Subprocess logic in linux/perf.cpp to use common subroutine

2015-07-31 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3185:
-

 Summary: Refactor Subprocess logic in linux/perf.cpp to use common 
subroutine
 Key: MESOS-3185
 URL: https://issues.apache.org/jira/browse/MESOS-3185
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Paul Brett
Assignee: Paul Brett


MESOS-2834 will enhance the perf isolator to support the different output 
formats produced by different kernel versions.  To achieve this, it needs to 
execute the perf --version command. 

We should decompose the existing Subprocess logic in perf so that we can 
share the implementation between the multiple uses of perf.
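
As an illustration of what a shared helper would need to hand back to its 
callers, the sketch below extracts the major and minor numbers from a version 
banner.  It assumes the banner has the form "perf version X.Y...", which may 
not hold for every build; parsePerfVersion is a hypothetical name and is not 
part of linux/perf.cpp.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Parses a banner of the assumed form "perf version <major>.<minor>..." and
// stores the two numbers.  Returns false if the banner does not match.
// The trailing underscores avoid clashing with the major()/minor() macros
// that some system headers define.
bool parsePerfVersion(const std::string& banner, int* major_, int* minor_)
{
  assert(major_ != nullptr && minor_ != nullptr);
  return std::sscanf(
      banner.c_str(), "perf version %d.%d", major_, minor_) == 2;
}
```

A caller would run the command once, pass its output to this function, and 
choose the output parser based on the two numbers.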





[jira] [Commented] (MESOS-3175) subprocess_tests.cpp:598 delete used but allocated with new[]

2015-07-30 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648033#comment-14648033
 ] 

Paul Brett commented on MESOS-3175:
---

https://reviews.apache.org/r/36947/

 subprocess_tests.cpp:598 delete used but allocated with new[]
 -

 Key: MESOS-3175
 URL: https://issues.apache.org/jira/browse/MESOS-3175
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Affects Versions: 0.24.0
Reporter: Paul Brett
Assignee: Paul Brett

 Compiler detected error:
 ../3rdparty/libprocess/src/tests/subprocess_tests.cpp|619 col 3| warning: 
 'delete' applied to a pointer that was allocated with 'new[]'; did you mean 
 'delete[]'? [-Wmismatched-new-delete]





[jira] [Created] (MESOS-3175) subprocess_tests.cpp:598 delete used but allocated with new[]

2015-07-30 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3175:
-

 Summary: subprocess_tests.cpp:598 delete used but allocated with 
new[]
 Key: MESOS-3175
 URL: https://issues.apache.org/jira/browse/MESOS-3175
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Affects Versions: 0.24.0
Reporter: Paul Brett
Assignee: Paul Brett


Compiler detected error:

../3rdparty/libprocess/src/tests/subprocess_tests.cpp|619 col 3| warning: 
'delete' applied to a pointer that was allocated with 'new[]'; did you mean 
'delete[]'? [-Wmismatched-new-delete]
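
The warning flags memory obtained with the array form of new being released 
with the scalar form of delete.  A minimal sketch of the pattern and two ways 
to avoid it follows; the function names are hypothetical and the code is not 
taken from subprocess_tests.cpp.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>

// Copies `s` into a heap buffer allocated with new[].  Such a buffer must be
// released with delete[], not delete.
std::string roundTripManual(const std::string& s)
{
  char* buffer = new char[s.size() + 1];
  std::memcpy(buffer, s.c_str(), s.size() + 1);
  assert(buffer[s.size()] == '\0');

  std::string result(buffer);

  // Writing `delete buffer;` here would be undefined behaviour and is the
  // pattern that -Wmismatched-new-delete reports.
  delete[] buffer;

  return result;
}

// Same logic with ownership held by std::unique_ptr<char[]>, whose deleter
// calls delete[] automatically.
std::string roundTripOwned(const std::string& s)
{
  std::unique_ptr<char[]> buffer(new char[s.size() + 1]);
  std::memcpy(buffer.get(), s.c_str(), s.size() + 1);
  return std::string(buffer.get());
}
```

The second form removes the need to remember which delete to call.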





[jira] [Created] (MESOS-3160) CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS Flaky

2015-07-27 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3160:
-

 Summary: 
CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS Flaky
 Key: MESOS-3160
 URL: https://issues.apache.org/jira/browse/MESOS-3160
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.24.0
Reporter: Paul Brett


Test will occasionally fail with:

[ RUN  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS
../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
helper.increaseRSS(getpagesize()): Failed to sync with the subprocess
../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
helper.increaseRSS(getpagesize()): The subprocess has not been spawned yet
[  FAILED  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS 
(223 ms)





[jira] [Updated] (MESOS-2834) Support different perf output formats

2015-07-20 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2834:
--
Sprint: Twitter Mesos Q2 Sprint 6, Twitter Mesos Q3 Sprint 1, Twitter Mesos 
Q3 Sprint 2  (was: Twitter Mesos Q2 Sprint 6, Twitter Mesos Q3 Sprint 1)

 Support different perf output formats
 -

 Key: MESOS-2834
 URL: https://issues.apache.org/jira/browse/MESOS-2834
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Reporter: Ian Downes
Assignee: Paul Brett
  Labels: twitter

 The output format of perf changed in 3.14 (inserting an additional field) and 
 again in 4.1 (appending additional fields). See kernel commits:
 410136f5dd96b6013fe6d1011b523b1c247e1ccb
 d73515c03c6a2706e088094ff6095a3abefd398b
 Update the perf::parse() function to understand all these formats.
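
One way to handle several layouts is to dispatch on the number of 
comma-separated fields.  The sketch below is not the actual perf::parse() 
implementation: the struct, the function names, and the three field layouts 
listed in the comments are assumptions chosen to match the description above, 
and the sample lines in the usage note are made up.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one line of comma-separated perf output.
struct Sample
{
  std::string value;
  std::string event;
  std::string cgroup;
};

// Splits `line` on commas, keeping empty fields.
std::vector<std::string> splitFields(const std::string& line)
{
  std::vector<std::string> fields;
  std::string field;
  std::istringstream in(line);
  while (std::getline(in, field, ',')) {
    fields.push_back(field);
  }
  // std::getline drops a trailing empty field; restore it.
  if (!line.empty() && line.back() == ',') {
    fields.push_back("");
  }
  return fields;
}

// Returns true and fills `sample` if the line matches an assumed layout:
//   3 fields: value,event,cgroup
//   4 fields: value,unit,event,cgroup        (one field inserted)
//   6 fields: value,unit,event,cgroup,x,y    (two fields appended)
bool parseLine(const std::string& line, Sample* sample)
{
  assert(sample != nullptr);
  const std::vector<std::string> f = splitFields(line);
  switch (f.size()) {
    case 3:
      *sample = Sample{f[0], f[1], f[2]};
      return true;
    case 4:
    case 6:
      *sample = Sample{f[0], f[2], f[3]};
      return true;
    default:
      return false;
  }
}
```

For example, parseLine("1000,cycles,mesos/abc", &s) and 
parseLine("1000,,cycles,mesos/abc", &s) both yield the event "cycles".  
Dispatching on field count avoids needing the kernel version, at the cost of 
being ambiguous if two layouts ever share a field count.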





[jira] [Commented] (MESOS-3050) Failing Docker/Cgroups/Volume tests in 0.23.0-rc3 on CentOS 7.1

2015-07-16 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630212#comment-14630212
 ] 

Paul Brett commented on MESOS-3050:
---

The PerfEventIsolatorTest failure is due to an incompatible perf output 
version and will be fixed by MESOS-2834.

 Failing Docker/Cgroups/Volume tests in 0.23.0-rc3 on CentOS 7.1
 ---

 Key: MESOS-3050
 URL: https://issues.apache.org/jira/browse/MESOS-3050
 Project: Mesos
  Issue Type: Bug
  Components: containerization, docker, test
Affects Versions: 0.23.0
 Environment: CentOS Linux release 7.1.1503
 0.23.0-rc3
Reporter: Adam B
Assignee: Timothy Chen

 {code}
 [ RUN  ] DockerTest.ROOT_DOCKER_CheckPortResource
 ../../src/tests/docker_tests.cpp:303: Failure
 (run).failure(): Container exited on error: exited with status 1
 [  FAILED  ] DockerTest.ROOT_DOCKER_CheckPortResource (709 ms)
 {code}
 ...
 {code}
 [ RUN  ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
 ../../src/tests/isolator_tests.cpp:837: Failure
 isolator: Failed to create PerfEvent isolator, invalid events: { cycles, 
 task-clock }
 [  FAILED  ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample (9 ms)
 [--] 1 test from PerfEventIsolatorTest (9 ms total)

 [--] 2 tests from SharedFilesystemIsolatorTest
 [ RUN  ] SharedFilesystemIsolatorTest.ROOT_RelativeVolume
 + mount -n --bind 
 /tmp/SharedFilesystemIsolatorTest_ROOT_RelativeVolume_4yTEAC/var/tmp /var/tmp
 + touch /var/tmp/492407e1-5dec-4b34-8f2f-130430f41aac
 ../../src/tests/isolator_tests.cpp:1001: Failure
 Value of: os::exists(file)
   Actual: true
 Expected: false
 [  FAILED  ] SharedFilesystemIsolatorTest.ROOT_RelativeVolume (92 ms)
 [ RUN  ] SharedFilesystemIsolatorTest.ROOT_AbsoluteVolume
 + mount -n --bind 
 /tmp/SharedFilesystemIsolatorTest_ROOT_AbsoluteVolume_OwYrXK /var/tmp
 + touch /var/tmp/7de712aa-52eb-4976-b0f9-32b6a006418d
 ../../src/tests/isolator_tests.cpp:1086: Failure
 Value of: os::exists(path::join(containerPath, filename))
   Actual: true
 Expected: false
 [  FAILED  ] SharedFilesystemIsolatorTest.ROOT_AbsoluteVolume (100 ms)
 {code}
 ...
 {code}
 [--] 1 test from UserCgroupIsolatorTest/0, where TypeParam = 
 mesos::internal::slave::CgroupsMemIsolatorProcess
 userdel: user 'mesos.test.unprivileged.user' does not exist
 [ RUN  ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup
 -bash: /sys/fs/cgroup/blkio/user.slice/cgroup.procs: Permission denied
 mkdir: cannot create directory ‘/sys/fs/cgroup/blkio/user.slice/user’: 
 Permission denied
 ../../src/tests/isolator_tests.cpp:1274: Failure
 Value of: os::system( "su - " + UNPRIVILEGED_USERNAME + " -c 'mkdir " + 
 path::join(flags.cgroups_hierarchy, userCgroup) + "'")
   Actual: 256
 Expected: 0
 -bash: /sys/fs/cgroup/blkio/user.slice/user/cgroup.procs: No such file or 
 directory
 ../../src/tests/isolator_tests.cpp:1283: Failure
 Value of: os::system( "su - " + UNPRIVILEGED_USERNAME + " -c 'echo $$ >" + 
 path::join(flags.cgroups_hierarchy, userCgroup, "cgroup.procs") + "'")
   Actual: 256
 Expected: 0
 -bash: 
 /sys/fs/cgroup/memory/mesos/bbf8c8f0-3d67-40df-a269-b3dc6a9597aa/cgroup.procs:
  Permission denied
 -bash: /sys/fs/cgroup/cpuacct,cpu/user.slice/cgroup.procs: No such file or 
 directory
 mkdir: cannot create directory ‘/sys/fs/cgroup/cpuacct,cpu/user.slice/user’: 
 No such file or directory
 ../../src/tests/isolator_tests.cpp:1274: Failure
 Value of: os::system( "su - " + UNPRIVILEGED_USERNAME + " -c 'mkdir " + 
 path::join(flags.cgroups_hierarchy, userCgroup) + "'")
   Actual: 256
 Expected: 0
 -bash: /sys/fs/cgroup/cpuacct,cpu/user.slice/user/cgroup.procs: No such file 
 or directory
 ../../src/tests/isolator_tests.cpp:1283: Failure
 Value of: os::system( "su - " + UNPRIVILEGED_USERNAME + " -c 'echo $$ >" + 
 path::join(flags.cgroups_hierarchy, userCgroup, "cgroup.procs") + "'")
   Actual: 256
 Expected: 0
 -bash: 
 /sys/fs/cgroup/name=systemd/user.slice/user-2004.slice/session-3865.scope/cgroup.procs:
  No such file or directory
 mkdir: cannot create directory 
 ‘/sys/fs/cgroup/name=systemd/user.slice/user-2004.slice/session-3865.scope/user’:
  No such file or directory
 ../../src/tests/isolator_tests.cpp:1274: Failure
 Value of: os::system( "su - " + UNPRIVILEGED_USERNAME + " -c 'mkdir " + 
 path::join(flags.cgroups_hierarchy, userCgroup) + "'")
   Actual: 256
 Expected: 0
 -bash: 
 /sys/fs/cgroup/name=systemd/user.slice/user-2004.slice/session-3865.scope/user/cgroup.procs:
  No such file or directory
 ../../src/tests/isolator_tests.cpp:1283: Failure
 Value of: os::system( "su - " + UNPRIVILEGED_USERNAME + " -c 'echo $$ >" + 
 path::join(flags.cgroups_hierarchy, userCgroup, "cgroup.procs") + "'")
   Actual: 256
 Expected: 0
 [  FAILED  ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where 
 TypeParam = 

[jira] [Commented] (MESOS-3035) As a Developer I would like a standard way to run a Subprocess in libprocess

2015-07-16 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630375#comment-14630375
 ] 

Paul Brett commented on MESOS-3035:
---

Removed MESOS-2834 as a dependent by implementing a revised version of this 
code.  

[~marco-mesos] you might want to take a look at 
https://reviews.apache.org/r/36378 to see what I was thinking.

 As a Developer I would like a standard way to run a Subprocess in libprocess
 

 Key: MESOS-3035
 URL: https://issues.apache.org/jira/browse/MESOS-3035
 Project: Mesos
  Issue Type: Story
  Components: libprocess
Reporter: Marco Massenzio
Assignee: Marco Massenzio

 As part of MESOS-2830 and MESOS-2902 I have been researching the ability to 
 run a {{Subprocess}} and capture the {{stdout / stderr}} along with the exit 
 status code.
 {{process::subprocess()}} offers much of the functionality, but in a way that 
 still requires a lot of handiwork on the developer's part; we would like to 
 further abstract away the ability to just pass a string, an optional set of 
 command-line arguments and then collect the output of the command (bonus: 
 without blocking).
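
To make the request concrete, here is a minimal blocking sketch of "run a 
command, then collect its output and exit status".  It uses POSIX popen() 
rather than process::subprocess(), so it shows the desired shape of the result 
and not the libprocess API; CommandResult and runCommand are hypothetical 
names, and only stdout is captured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <sys/wait.h>

// Hypothetical result type: captured stdout plus the exit status.
struct CommandResult
{
  bool launched;       // False if the shell could not be started.
  int status;          // Exit status if the command exited normally, else -1.
  std::string output;  // Everything the command wrote to stdout.
};

// Runs `command` through the shell and blocks until it finishes.
CommandResult runCommand(const std::string& command)
{
  assert(!command.empty());
  CommandResult result{false, -1, ""};

  FILE* pipe = popen(command.c_str(), "r");
  if (pipe == nullptr) {
    return result;
  }
  result.launched = true;

  char buffer[4096];
  size_t n;
  while ((n = fread(buffer, 1, sizeof(buffer), pipe)) > 0) {
    result.output.append(buffer, n);
  }

  // pclose() returns the wait status of the shell that ran the command.
  const int waitStatus = pclose(pipe);
  if (waitStatus != -1 && WIFEXITED(waitStatus)) {
    result.status = WEXITSTATUS(waitStatus);
  }
  return result;
}
```

A call such as runCommand("perf --version") returns once the command has 
exited.  The non-blocking variant asked for in the story would instead return 
a future that completes with the same result.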





[jira] [Updated] (MESOS-2834) Support different perf output formats

2015-07-16 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2834:
--
Shepherd: Jie Yu  (was: Ian Downes)

 Support different perf output formats
 -

 Key: MESOS-2834
 URL: https://issues.apache.org/jira/browse/MESOS-2834
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Reporter: Ian Downes
Assignee: Paul Brett
  Labels: twitter

 The output format of perf changed in 3.14 (inserting an additional field) and 
 again in 4.1 (appending additional fields). See kernel commits:
 410136f5dd96b6013fe6d1011b523b1c247e1ccb
 d73515c03c6a2706e088094ff6095a3abefd398b
 Update the perf::parse() function to understand all these formats.





[jira] [Created] (MESOS-3029) Stout os::release returns unsortable version

2015-07-10 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3029:
-

 Summary: Stout os::release returns unsortable version
 Key: MESOS-3029
 URL: https://issues.apache.org/jira/browse/MESOS-3029
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Priority: Minor


When the Linux kernel version was incremented from 2.6.39 to 3.0.0 it was 
discovered that a few applications could not process kernel versions starting 
with anything but 2.x.  For compatibility, Red Hat and others mapped kernel 
version 3.n to 2.6.n+40.  

This introduces the interesting property that kernel 2.6.50 is later than 
kernel 3.9 but Version(2, 6, 50) is not greater than Version(3, 9, 0).  Since 
we want to be able to order kernel versions, we need to undo this mapping.

The following function is proposed for use in linux/perf.cpp to address this 
issue:

Version canonicalLinuxRelease(const Version& v) {
  if ((v > Version(2, 6, 39)) && (v < Version(3, 0, 0))) {
    return Version(3, v.patchVersion - 40, 0);
  }
  return v;
}

We could either add this to stout/os.hpp or add a custom sort order to Version 
(which we might need later when we generalize it).
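
A self-contained sketch of the proposed mapping follows.  The Version struct 
here is a hypothetical stand-in with public fields, not stout's Version class, 
and the ordering is plain lexicographic comparison of (major, minor, patch).  
The range check is written out on the fields instead of using the two 
comparisons above, so that only 2.6.x versions are remapped.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical stand-in for stout's Version with public components.  The
// trailing underscores avoid the major()/minor() macros that some system
// headers define.
struct Version
{
  int major_;
  int minor_;
  int patch_;
};

bool operator<(const Version& a, const Version& b)
{
  return std::tie(a.major_, a.minor_, a.patch_) <
         std::tie(b.major_, b.minor_, b.patch_);
}

bool operator==(const Version& a, const Version& b)
{
  return std::tie(a.major_, a.minor_, a.patch_) ==
         std::tie(b.major_, b.minor_, b.patch_);
}

// Maps the 2.6.(n+40) aliases back to 3.n.0; other versions pass through.
Version canonicalLinuxRelease(const Version& v)
{
  if (v.major_ == 2 && v.minor_ == 6 && v.patch_ >= 40) {
    const Version mapped{3, v.patch_ - 40, 0};
    assert(!(mapped < Version{3, 0, 0}));  // Result lies in the 3.x series.
    return mapped;
  }
  return v;
}
```

With this mapping, 2.6.50 becomes 3.10.0 and sorts after 3.9.0, while 2.6.32 
and 3.9.0 are returned unchanged.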





[jira] [Commented] (MESOS-3026) ProcessTest.Cache fails and hangs

2015-07-09 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621123#comment-14621123
 ] 

Paul Brett commented on MESOS-3026:
---

I see the same hang on g++ 4.8, ubuntu 14.04.

 ProcessTest.Cache fails and hangs
 -

 Key: MESOS-3026
 URL: https://issues.apache.org/jira/browse/MESOS-3026
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
 Environment: ubuntu 15.04/ ubuntu 14.04.2
 clang-3.6 / gcc 4.8.2
Reporter: Joris Van Remoortere
Assignee: Alexander Rojas
  Labels: libprocess, tests

 {code}
 [ RUN  ] ProcessTest.Cache
 ../../../3rdparty/libprocess/src/tests/process_tests.cpp:1726: Failure
 Value of: response.get().status
   Actual: 200 OK
 Expected: 304 Not Modified
 [  FAILED  ] ProcessTest.Cache (1 ms)
 {code}
 The tests then finish running, but the gtest framework fails to terminate and 
 uses 100% CPU.





[jira] [Commented] (MESOS-2834) Support different perf output formats

2015-07-09 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621430#comment-14621430
 ] 

Paul Brett commented on MESOS-2834:
---

Updated to address reviewer issues and reposted at:

https://reviews.apache.org/r/36378/
https://reviews.apache.org/r/36380/

 Support different perf output formats
 -

 Key: MESOS-2834
 URL: https://issues.apache.org/jira/browse/MESOS-2834
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Reporter: Ian Downes
Assignee: Paul Brett
  Labels: twitter

 The output format of perf changed in 3.14 (inserting an additional field) and 
 again in 4.1 (appending additional fields). See kernel commits:
 410136f5dd96b6013fe6d1011b523b1c247e1ccb
 d73515c03c6a2706e088094ff6095a3abefd398b
 Update the perf::parse() function to understand all these formats.





[jira] [Commented] (MESOS-2993) Document per container unique egress flow and network queueing statistics

2015-07-08 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619017#comment-14619017
 ] 

Paul Brett commented on MESOS-2993:
---

Completed review updates.  

[~adam-mesos] - could you please check whether the updates are OK and commit? Thanks.

 Document  per container unique egress flow and network queueing statistics
 --

 Key: MESOS-2993
 URL: https://issues.apache.org/jira/browse/MESOS-2993
 Project: Mesos
  Issue Type: Bug
  Components: documentation, isolation
Affects Versions: 0.23.0
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: twitter

 Document new network isolation capabilities in 0.23





[jira] [Comment Edited] (MESOS-3020) Expose major, minor and patch components from stout Version

2015-07-08 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619515#comment-14619515
 ] 

Paul Brett edited comment on MESOS-3020 at 7/8/15 10:48 PM:


Patch available for review https://reviews.apache.org/r/36336


was (Author: pbrett):
Patch available for review https://reviews.apache.org/r/36281/

 Expose major, minor and patch components from stout Version  
 -

 Key: MESOS-3020
 URL: https://issues.apache.org/jira/browse/MESOS-3020
 Project: Mesos
  Issue Type: Improvement
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: twitter

 The stout Version class does not expose its version components, preventing 
 computation on or manipulation of version information.  The solution is to 
 make major, minor and patch public.





[jira] [Commented] (MESOS-3020) Expose major, minor and patch components from stout Version

2015-07-08 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619502#comment-14619502
 ] 

Paul Brett commented on MESOS-3020:
---

The need I have at the moment is in processing kernel versions.  Red Hat uses 
kernel versions 2.6.40 and upwards as aliases for the 3.x series, so I want to 
write something like this:

Try<Version> v = os::release();
if (v.isSome() &&
    (v.get() >= Version(2, 6, 40)) && (v.get() < Version(3, 0, 0))) {
  // 2.6.(n+40) is an alias for 3.n, so the offset applies to the patch field.
  v = Version(3, v.get().patch - 40, 0);
}

 Expose major, minor and patch components from stout Version  
 -

 Key: MESOS-3020
 URL: https://issues.apache.org/jira/browse/MESOS-3020
 Project: Mesos
  Issue Type: Improvement
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: twitter

 The stout Version class does not expose its version components, preventing 
 computation on or manipulation of version information.  The solution is to 
 make major, minor and patch public.





[jira] [Commented] (MESOS-3020) Expose major, minor and patch components from stout Version

2015-07-08 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619515#comment-14619515
 ] 

Paul Brett commented on MESOS-3020:
---

Patch available for review https://reviews.apache.org/r/36281/

 Expose major, minor and patch components from stout Version  
 -

 Key: MESOS-3020
 URL: https://issues.apache.org/jira/browse/MESOS-3020
 Project: Mesos
  Issue Type: Improvement
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: twitter

 The stout Version class does not expose its version components, preventing 
 computation on or manipulation of version information.  The solution is to 
 make major, minor and patch public.





[jira] [Created] (MESOS-3002) Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator

2015-07-07 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3002:
-

 Summary: Rename Option<T>::get(const T& _t) to getOrElse() broke 
network isolator
 Key: MESOS-3002
 URL: https://issues.apache.org/jira/browse/MESOS-3002
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Affects Versions: 0.23.0
Reporter: Paul Brett


Change to Option from get() to getOrElse() breaks network isolator.  Building 
with '../configure --with-network-isolator' generates the following error:

../../src/slave/containerizer/isolators/network/port_mapping.cpp: In static 
member function 'static Try<mesos::slave::Isolator*> 
mesos::internal::slave::PortMappingIsolatorProcess::create(const 
mesos::internal::slave::Flags&)':
../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: 
error: no matching function for call to 'Option<std::basic_string<char> 
>::get(const char [1]) const'
   flags.resources.get(""),
 ^
../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: note: 
candidates are:
In file included from 
../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:26:0,
 from ../../3rdparty/libprocess/include/process/check.hpp:19,
 from ../../3rdparty/libprocess/include/process/collect.hpp:7,
 from 
../../src/slave/containerizer/isolators/network/port_mapping.cpp:30:
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: 
const T& Option<T>::get() const [with T = std::basic_string<char>]
   const T& get() const { assert(isSome()); return t; }
^
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: note: 
  candidate expects 0 arguments, 1 provided
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note: 
T& Option<T>::get() [with T = std::basic_string<char>]
   T& get() { assert(isSome()); return t; }
  ^
../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: note:  
 candidate expects 0 arguments, 1 provided
make[2]: *** 
[slave/containerizer/isolators/network/libmesos_no_3rdparty_la-port_mapping.lo] 
Error 1
make[2]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
make: *** [check-recursive] Error 1
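
The error arises because a call that passes a fallback value to get() no 
longer matches any overload.  The sketch below uses a tiny hypothetical Option 
class (not stout's) to show the distinction between get(), which requires a 
value to be present, and getOrElse(), which takes a fallback.  Under that 
assumption the failing call would presumably become 
flags.resources.getOrElse("").

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified Option type for illustration only.  It requires
// T to be default-constructible, which the real class may not.
template <typename T>
class Option
{
public:
  Option() : some(false), t() {}
  Option(const T& _t) : some(true), t(_t) {}

  bool isSome() const { return some; }
  bool isNone() const { return !some; }

  // Only valid when a value is present.
  const T& get() const { assert(isSome()); return t; }

  // Returns the stored value, or `fallback` when no value is present.
  T getOrElse(const T& fallback) const { return isSome() ? t : fallback; }

private:
  bool some;
  T t;
};
```

Giving the two operations different names means a stale call site fails to 
compile, as it did here, instead of silently changing meaning.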






[jira] [Updated] (MESOS-3002) Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator

2015-07-07 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-3002:
--
Assignee: Mark Wang

 Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
 

 Key: MESOS-3002
 URL: https://issues.apache.org/jira/browse/MESOS-3002
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Affects Versions: 0.23.0
Reporter: Paul Brett
Assignee: Mark Wang

 Change to Option from get() to getOrElse() breaks network isolator.  Building 
 with '../configure --with-network-isolator' generates the following error:
 ../../src/slave/containerizer/isolators/network/port_mapping.cpp: In static 
 member function 'static Try<mesos::slave::Isolator*> 
 mesos::internal::slave::PortMappingIsolatorProcess::create(const 
 mesos::internal::slave::Flags&)':
 ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: 
 error: no matching function for call to 'Option<std::basic_string<char> 
 >::get(const char [1]) const'
flags.resources.get(""),
  ^
 ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: 
 note: candidates are:
 In file included from 
 ../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:26:0,
  from ../../3rdparty/libprocess/include/process/check.hpp:19,
  from ../../3rdparty/libprocess/include/process/collect.hpp:7,
  from 
 ../../src/slave/containerizer/isolators/network/port_mapping.cpp:30:
 ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: 
 note: const T& Option<T>::get() const [with T = std::basic_string<char>]
const T& get() const { assert(isSome()); return t; }
 ^
 ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: 
 note:   candidate expects 0 arguments, 1 provided
 ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: 
 note: T& Option<T>::get() [with T = std::basic_string<char>]
T& get() { assert(isSome()); return t; }
   ^
 ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: 
 note:   candidate expects 0 arguments, 1 provided
 make[2]: *** 
 [slave/containerizer/isolators/network/libmesos_no_3rdparty_la-port_mapping.lo]
  Error 1
 make[2]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
 make[1]: *** [check] Error 2
 make[1]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
 make: *** [check-recursive] Error 1





[jira] [Commented] (MESOS-3002) Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator

2015-07-07 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617165#comment-14617165
 ] 

Paul Brett commented on MESOS-3002:
---

Mark - can you take a look at this?  Thanks.

 Rename Option<T>::get(const T& _t) to getOrElse() broke network isolator
 

 Key: MESOS-3002
 URL: https://issues.apache.org/jira/browse/MESOS-3002
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Affects Versions: 0.23.0
Reporter: Paul Brett
Assignee: Mark Wang

 Change to Option from get() to getOrElse() breaks network isolator.  Building 
 with '../configure --with-network-isolator' generates the following error:
 ../../src/slave/containerizer/isolators/network/port_mapping.cpp: In static 
 member function 'static Try<mesos::slave::Isolator*> 
 mesos::internal::slave::PortMappingIsolatorProcess::create(const 
 mesos::internal::slave::Flags&)':
 ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: 
 error: no matching function for call to 'Option<std::basic_string<char> 
 >::get(const char [1]) const'
flags.resources.get(""),
  ^
 ../../src/slave/containerizer/isolators/network/port_mapping.cpp:1103:29: 
 note: candidates are:
 In file included from 
 ../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:26:0,
  from ../../3rdparty/libprocess/include/process/check.hpp:19,
  from ../../3rdparty/libprocess/include/process/collect.hpp:7,
  from 
 ../../src/slave/containerizer/isolators/network/port_mapping.cpp:30:
 ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: 
 note: const T& Option<T>::get() const [with T = std::basic_string<char>]
const T& get() const { assert(isSome()); return t; }
 ^
 ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:130:12: 
 note:   candidate expects 0 arguments, 1 provided
 ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: 
 note: T& Option<T>::get() [with T = std::basic_string<char>]
T& get() { assert(isSome()); return t; }
   ^
 ../../3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:131:6: 
 note:   candidate expects 0 arguments, 1 provided
 make[2]: *** 
 [slave/containerizer/isolators/network/libmesos_no_3rdparty_la-port_mapping.lo]
  Error 1
 make[2]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
 make[1]: *** [check] Error 2
 make[1]: Leaving directory `/home/pbrett/sandbox/mesos.master/build/src'
 make: *** [check-recursive] Error 1





[jira] [Commented] (MESOS-2993) Document per container unique egress flow and network queueing statistics

2015-07-07 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617498#comment-14617498
 ] 

Paul Brett commented on MESOS-2993:
---

Review draft available at https://reviews.apache.org/r/36281/

 Document  per container unique egress flow and network queueing statistics
 --

 Key: MESOS-2993
 URL: https://issues.apache.org/jira/browse/MESOS-2993
 Project: Mesos
  Issue Type: Bug
  Components: documentation, isolation
Affects Versions: 0.23.0
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: twitter

 Document new network isolation capabilities in 0.23





[jira] [Created] (MESOS-3011) Publish release documentation for major releases on website

2015-07-07 Thread Paul Brett (JIRA)
Paul Brett created MESOS-3011:
-

 Summary: Publish release documentation for major releases on 
website
 Key: MESOS-3011
 URL: https://issues.apache.org/jira/browse/MESOS-3011
 Project: Mesos
  Issue Type: Documentation
Reporter: Paul Brett


Currently, the website only provides a single version of the documentation.  We 
should publish documentation for each release on the website independently (for 
example as https://mesos.apache.org/documentation/0.22/index.html, 
https://mesos.apache.org/documentation/0.23/index.html) and make latest 
redirect to the current version.





[jira] [Commented] (MESOS-2993) Document per container unique egress flow and network queueing statistics

2015-07-07 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617770#comment-14617770
 ] 

Paul Brett commented on MESOS-2993:
---

Update incorporating reviewer comments.

 Document  per container unique egress flow and network queueing statistics
 --

 Key: MESOS-2993
 URL: https://issues.apache.org/jira/browse/MESOS-2993
 Project: Mesos
  Issue Type: Bug
  Components: documentation, isolation
Affects Versions: 0.23.0
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: twitter

 Document new network isolation capabilities in 0.23





[jira] [Updated] (MESOS-2952) Provide user namespaces for privileged access inside containers

2015-07-06 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2952:
--
Issue Type: Epic  (was: Bug)

 Provide user namespaces for privileged access inside containers
 ---

 Key: MESOS-2952
 URL: https://issues.apache.org/jira/browse/MESOS-2952
 Project: Mesos
  Issue Type: Epic
Reporter: Paul Brett
Assignee: Paul Brett

 User namespaces allow per-namespace mappings of user and group IDs. This 
 means that a process's user and group IDs inside a user namespace can be 
 different from its IDs outside of the namespace. Most notably, a process can 
 have a nonzero user ID outside a namespace while at the same time having a 
 user ID of zero inside the namespace; in other words, the process is 
 unprivileged for operations outside the user namespace but has root 
 privileges inside the namespace.
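
The kernel expresses such mappings as lines of three numbers in 
/proc/[pid]/uid_map: the first ID inside the namespace, the first ID outside 
it, and the length of the range.  The sketch below models that lookup in user 
space only; IdMapping and toOutsideId are hypothetical names, and the example 
mapping in the usage note (inside 0 to outside 1000) is made up.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One uid_map/gid_map entry: `length` consecutive IDs starting at `inside`
// map to the IDs starting at `outside`.
struct IdMapping
{
  uint32_t inside;
  uint32_t outside;
  uint32_t length;
};

// Translates an ID as seen inside the namespace to the ID seen outside.
// Returns false if no mapping covers `id`.
bool toOutsideId(
    const std::vector<IdMapping>& mappings,
    uint32_t id,
    uint32_t* result)
{
  assert(result != nullptr);
  for (const IdMapping& m : mappings) {
    if (id >= m.inside && id - m.inside < m.length) {
      *result = m.outside + (id - m.inside);
      return true;
    }
  }
  return false;
}
```

With the single entry {0, 1000, 1}, user ID 0 inside the namespace translates 
to 1000 outside it, which is the "root inside, unprivileged outside" case 
described above; any other inside ID has no mapping.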





[jira] [Created] (MESOS-2993) Document per container unique egress flow and network queueing statistics

2015-07-06 Thread Paul Brett (JIRA)
Paul Brett created MESOS-2993:
-

 Summary: Document  per container unique egress flow and network 
queueing statistics
 Key: MESOS-2993
 URL: https://issues.apache.org/jira/browse/MESOS-2993
 Project: Mesos
  Issue Type: Bug
  Components: documentation, isolation
Affects Versions: 0.23.0
Reporter: Paul Brett
Assignee: Paul Brett


Document new network isolation capabilities in 0.23





[jira] [Created] (MESOS-2994) Design doc for creating user namespaces inside containers

2015-07-06 Thread Paul Brett (JIRA)
Paul Brett created MESOS-2994:
-

 Summary: Design doc for creating user namespaces inside containers
 Key: MESOS-2994
 URL: https://issues.apache.org/jira/browse/MESOS-2994
 Project: Mesos
  Issue Type: Improvement
Reporter: Paul Brett
Assignee: Paul Brett








[jira] [Updated] (MESOS-2994) Design doc for creating user namespaces inside containers

2015-07-06 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2994:
--
Labels: twitter  (was: )

 Design doc for creating user namespaces inside containers
 -

 Key: MESOS-2994
 URL: https://issues.apache.org/jira/browse/MESOS-2994
 Project: Mesos
  Issue Type: Improvement
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: twitter







[jira] [Updated] (MESOS-2993) Document per container unique egress flow and network queueing statistics

2015-07-06 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2993:
--
Labels: twitter  (was: )

 Document per container unique egress flow and network queueing statistics
 --

 Key: MESOS-2993
 URL: https://issues.apache.org/jira/browse/MESOS-2993
 Project: Mesos
  Issue Type: Bug
  Components: documentation, isolation
Affects Versions: 0.23.0
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: twitter

 Document new network isolation capabilities in 0.23



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2956) Stack trace in isolator tests on Linux VM

2015-06-29 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606156#comment-14606156
 ] 

Paul Brett commented on MESOS-2956:
---

https://reviews.apache.org/r/36014/

 Stack trace in isolator tests on Linux VM
 -

 Key: MESOS-2956
 URL: https://issues.apache.org/jira/browse/MESOS-2956
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett

 PerfEventIsolatorTest fails with stack trace when run in Linux VM
 [----------] 1 test from PerfEventIsolatorTest
 [ RUN      ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
 F0629 11:38:17.088412 14114 isolator_tests.cpp:837] CHECK_SOME(isolator): 
 Failed to create PerfEvent isolator, invalid events: { cycles, task-clock } 
 *** Check failure stack trace: ***
 @ 0x2ab5e5aeeb1a  google::LogMessage::Fail()
 @ 0x2ab5e5aeea66  google::LogMessage::SendToLog()
 @ 0x2ab5e5aee468  google::LogMessage::Flush()
 @ 0x2ab5e5af137c  google::LogMessageFatal::~LogMessageFatal()
 @   0x864b0c  _CheckFatal::~_CheckFatal()
 @   0xc458ed  
 mesos::internal::tests::PerfEventIsolatorTest_ROOT_CGROUPS_Sample_Test::TestBody()
 @  0x119fb17  
 testing::internal::HandleSehExceptionsInMethodIfSupported()
 @  0x119ac9e  
 testing::internal::HandleExceptionsInMethodIfSupported()
 @  0x118305f  testing::Test::Run()
 @  0x1183782  testing::TestInfo::Run()
 @  0x1183d0a  testing::TestCase::Run()
 @  0x11889d4  testing::internal::UnitTestImpl::RunAllTests()
 @  0x11a09ae  
 testing::internal::HandleSehExceptionsInMethodIfSupported()
 @  0x119b9c3  
 testing::internal::HandleExceptionsInMethodIfSupported()
 @  0x11878e0  testing::UnitTest::Run()
 @   0xcdc8c7  main
 @ 0x2ab5e7fdbec5  (unknown)
 @   0x861a89  (unknown)
 make[3]: *** [check-local] Aborted (core dumped)
 [ RUN      ] UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup
 F0629 11:49:38.763434 18836 isolator_tests.cpp:1200] CHECK_SOME(isolator): 
 Failed to create PerfEvent isolator, invalid events: { cpu-cycles } 
 *** Check failure stack trace: ***
 @ 0x2ba40eb2db1a  google::LogMessage::Fail()
 @ 0x2ba40eb2da66  google::LogMessage::SendToLog()
 @ 0x2ba40eb2d468  google::LogMessage::Flush()
 @ 0x2ba40eb3037c  google::LogMessageFatal::~LogMessageFatal()
 @   0x864b0c  _CheckFatal::~_CheckFatal()
 @   0xc5ddb1  
 mesos::internal::tests::UserCgroupIsolatorTest_ROOT_CGROUPS_UserCgroup_Test::TestBody()
 @  0x119fc43  
 testing::internal::HandleSehExceptionsInMethodIfSupported()
 @  0x119adca  
 testing::internal::HandleExceptionsInMethodIfSupported()
 @  0x118318b  testing::Test::Run()
 @  0x11838ae  testing::TestInfo::Run()
 @  0x1183e36  testing::TestCase::Run()
 @  0x1188b00  testing::internal::UnitTestImpl::RunAllTests()
 @  0x11a0ada  
 testing::internal::HandleSehExceptionsInMethodIfSupported()
 @  0x119baef  
 testing::internal::HandleExceptionsInMethodIfSupported()
 @  0x1187a0c  testing::UnitTest::Run()
 @   0xcdc9f3  main
 @ 0x2ba41101aec5  (unknown)
 @   0x861a89  (unknown)
 make[3]: *** [check-local] Aborted (core dumped)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2956) Stack trace in isolator tests on Linux VM

2015-06-29 Thread Paul Brett (JIRA)
Paul Brett created MESOS-2956:
-

 Summary: Stack trace in isolator tests on Linux VM
 Key: MESOS-2956
 URL: https://issues.apache.org/jira/browse/MESOS-2956
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett


PerfEventIsolatorTest fails with stack trace when run in Linux VM

[----------] 1 test from PerfEventIsolatorTest
[ RUN      ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
F0629 11:38:17.088412 14114 isolator_tests.cpp:837] CHECK_SOME(isolator): 
Failed to create PerfEvent isolator, invalid events: { cycles, task-clock } 
*** Check failure stack trace: ***
@ 0x2ab5e5aeeb1a  google::LogMessage::Fail()
@ 0x2ab5e5aeea66  google::LogMessage::SendToLog()
@ 0x2ab5e5aee468  google::LogMessage::Flush()
@ 0x2ab5e5af137c  google::LogMessageFatal::~LogMessageFatal()
@   0x864b0c  _CheckFatal::~_CheckFatal()
@   0xc458ed  
mesos::internal::tests::PerfEventIsolatorTest_ROOT_CGROUPS_Sample_Test::TestBody()
@  0x119fb17  
testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x119ac9e  
testing::internal::HandleExceptionsInMethodIfSupported()
@  0x118305f  testing::Test::Run()
@  0x1183782  testing::TestInfo::Run()
@  0x1183d0a  testing::TestCase::Run()
@  0x11889d4  testing::internal::UnitTestImpl::RunAllTests()
@  0x11a09ae  
testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x119b9c3  
testing::internal::HandleExceptionsInMethodIfSupported()
@  0x11878e0  testing::UnitTest::Run()
@   0xcdc8c7  main
@ 0x2ab5e7fdbec5  (unknown)
@   0x861a89  (unknown)
make[3]: *** [check-local] Aborted (core dumped)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2956) Stack trace in isolator tests on Linux VM

2015-06-29 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2956:
--
Description: 
PerfEventIsolatorTest fails with stack trace when run in Linux VM

[----------] 1 test from PerfEventIsolatorTest
[ RUN      ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
F0629 11:38:17.088412 14114 isolator_tests.cpp:837] CHECK_SOME(isolator): 
Failed to create PerfEvent isolator, invalid events: { cycles, task-clock } 
*** Check failure stack trace: ***
@ 0x2ab5e5aeeb1a  google::LogMessage::Fail()
@ 0x2ab5e5aeea66  google::LogMessage::SendToLog()
@ 0x2ab5e5aee468  google::LogMessage::Flush()
@ 0x2ab5e5af137c  google::LogMessageFatal::~LogMessageFatal()
@   0x864b0c  _CheckFatal::~_CheckFatal()
@   0xc458ed  
mesos::internal::tests::PerfEventIsolatorTest_ROOT_CGROUPS_Sample_Test::TestBody()
@  0x119fb17  
testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x119ac9e  
testing::internal::HandleExceptionsInMethodIfSupported()
@  0x118305f  testing::Test::Run()
@  0x1183782  testing::TestInfo::Run()
@  0x1183d0a  testing::TestCase::Run()
@  0x11889d4  testing::internal::UnitTestImpl::RunAllTests()
@  0x11a09ae  
testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x119b9c3  
testing::internal::HandleExceptionsInMethodIfSupported()
@  0x11878e0  testing::UnitTest::Run()
@   0xcdc8c7  main
@ 0x2ab5e7fdbec5  (unknown)
@   0x861a89  (unknown)
make[3]: *** [check-local] Aborted (core dumped)

[ RUN      ] UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup
F0629 11:49:38.763434 18836 isolator_tests.cpp:1200] CHECK_SOME(isolator): 
Failed to create PerfEvent isolator, invalid events: { cpu-cycles } 
*** Check failure stack trace: ***
@ 0x2ba40eb2db1a  google::LogMessage::Fail()
@ 0x2ba40eb2da66  google::LogMessage::SendToLog()
@ 0x2ba40eb2d468  google::LogMessage::Flush()
@ 0x2ba40eb3037c  google::LogMessageFatal::~LogMessageFatal()
@   0x864b0c  _CheckFatal::~_CheckFatal()
@   0xc5ddb1  
mesos::internal::tests::UserCgroupIsolatorTest_ROOT_CGROUPS_UserCgroup_Test::TestBody()
@  0x119fc43  
testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x119adca  
testing::internal::HandleExceptionsInMethodIfSupported()
@  0x118318b  testing::Test::Run()
@  0x11838ae  testing::TestInfo::Run()
@  0x1183e36  testing::TestCase::Run()
@  0x1188b00  testing::internal::UnitTestImpl::RunAllTests()
@  0x11a0ada  
testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x119baef  
testing::internal::HandleExceptionsInMethodIfSupported()
@  0x1187a0c  testing::UnitTest::Run()
@   0xcdc9f3  main
@ 0x2ba41101aec5  (unknown)
@   0x861a89  (unknown)
make[3]: *** [check-local] Aborted (core dumped)



  was:
PerfEventIsolatorTest fails with stack trace when run in Linux VM

[----------] 1 test from PerfEventIsolatorTest
[ RUN      ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
F0629 11:38:17.088412 14114 isolator_tests.cpp:837] CHECK_SOME(isolator): 
Failed to create PerfEvent isolator, invalid events: { cycles, task-clock } 
*** Check failure stack trace: ***
@ 0x2ab5e5aeeb1a  google::LogMessage::Fail()
@ 0x2ab5e5aeea66  google::LogMessage::SendToLog()
@ 0x2ab5e5aee468  google::LogMessage::Flush()
@ 0x2ab5e5af137c  google::LogMessageFatal::~LogMessageFatal()
@   0x864b0c  _CheckFatal::~_CheckFatal()
@   0xc458ed  
mesos::internal::tests::PerfEventIsolatorTest_ROOT_CGROUPS_Sample_Test::TestBody()
@  0x119fb17  
testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x119ac9e  
testing::internal::HandleExceptionsInMethodIfSupported()
@  0x118305f  testing::Test::Run()
@  0x1183782  testing::TestInfo::Run()
@  0x1183d0a  testing::TestCase::Run()
@  0x11889d4  testing::internal::UnitTestImpl::RunAllTests()
@  0x11a09ae  
testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x119b9c3  
testing::internal::HandleExceptionsInMethodIfSupported()
@  0x11878e0  testing::UnitTest::Run()
@   0xcdc8c7  main
@ 0x2ab5e7fdbec5  (unknown)
@   0x861a89  (unknown)
make[3]: *** [check-local] Aborted (core dumped)



 Stack trace in isolator tests on Linux VM
 -

 Key: MESOS-2956
 URL: https://issues.apache.org/jira/browse/MESOS-2956
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett

 

[jira] [Created] (MESOS-2952) Provide user namespaces for privileged access inside containers

2015-06-26 Thread Paul Brett (JIRA)
Paul Brett created MESOS-2952:
-

 Summary: Provide user namespaces for privileged access inside 
containers
 Key: MESOS-2952
 URL: https://issues.apache.org/jira/browse/MESOS-2952
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett


User namespaces allow per-namespace mappings of user and group IDs. This means 
that a process's user and group IDs inside a user namespace can be different 
from its IDs outside of the namespace. Most notably, a process can have a 
nonzero user ID outside a namespace while at the same time having a user ID of 
zero inside the namespace; in other words, the process is unprivileged for 
operations outside the user namespace but has root privileges inside the 
namespace.
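
As a rough illustration of the ID mapping described above, the following sketch 
translates an ID seen inside a user namespace to the ID seen outside it. It 
assumes the three-column layout the kernel uses in /proc/<pid>/uid_map (ID 
inside, ID outside, range length). The names IdMapEntry, toOutside and UNMAPPED, 
and the example range starting at 100000, are hypothetical and not part of 
Mesos.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One uid_map line: `length` consecutive IDs starting at `inside` (as seen
// in the namespace) map to IDs starting at `outside` (as seen by the parent).
struct IdMapEntry {
  uint32_t inside;
  uint32_t outside;
  uint32_t length;
};

// Hypothetical sentinel for "no mapping", chosen for this sketch only.
const int64_t UNMAPPED = -1;

// Translate an ID seen inside the namespace to the ID seen outside it.
int64_t toOutside(const std::vector<IdMapEntry>& map, uint32_t insideId) {
  for (const IdMapEntry& entry : map) {
    if (insideId >= entry.inside && insideId - entry.inside < entry.length) {
      return static_cast<int64_t>(entry.outside) + (insideId - entry.inside);
    }
  }
  return UNMAPPED;
}

// Usage: root (0) inside the namespace is an unprivileged user outside.
void userNamespaceExample() {
  std::vector<IdMapEntry> map;
  map.push_back(IdMapEntry{0, 100000, 65536});
  assert(toOutside(map, 0) == 100000);
}
```

With such a mapping, user ID 0 inside the container corresponds to a nonzero, 
unprivileged user ID on the host.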



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2928) Update stout to #include headers for symbols we rely on

2015-06-25 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2928:
--
Description: Update mesos to #include headers for symbols we rely on and 
reorder to comply with the style guide.  (was: Update mesos to #include headers 
for symbols we rely on)

 Update stout to #include headers for symbols we rely on
 ---

 Key: MESOS-2928
 URL: https://issues.apache.org/jira/browse/MESOS-2928
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett

 Update mesos to #include headers for symbols we rely on and reorder to comply 
 with the style guide.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2928) Update stout #include headers

2015-06-25 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2928:
--
Description: Update stout to #include headers for symbols we rely on and 
reorder to comply with the style guide.  (was: Update mesos to #include headers 
for symbols we rely on and reorder to comply with the style guide.)

 Update stout #include headers
 -

 Key: MESOS-2928
 URL: https://issues.apache.org/jira/browse/MESOS-2928
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett

 Update stout to #include headers for symbols we rely on and reorder to comply 
 with the style guide.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2927) Update mesos #include headers

2015-06-25 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2927:
--
Description: Update mesos to #include headers for symbols we rely on and 
reorder to comply with the style guide.  (was: Update mesos to #include headers 
for symbols we rely on)

 Update mesos #include headers
 -

 Key: MESOS-2927
 URL: https://issues.apache.org/jira/browse/MESOS-2927
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett

 Update mesos to #include headers for symbols we rely on and reorder to comply 
 with the style guide.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2929) Update libprocess #include headers

2015-06-25 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2929:
--
Summary: Update libprocess #include headers  (was: Update libprocess to 
#include headers for symbols we rely on)

 Update libprocess #include headers
 --

 Key: MESOS-2929
 URL: https://issues.apache.org/jira/browse/MESOS-2929
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2928) Update stout #include headers

2015-06-25 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2928:
--
Summary: Update stout #include headers  (was: Update stout to #include 
headers for symbols we rely on)

 Update stout #include headers
 -

 Key: MESOS-2928
 URL: https://issues.apache.org/jira/browse/MESOS-2928
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett

 Update mesos to #include headers for symbols we rely on and reorder to comply 
 with the style guide.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2929) Update libprocess #include headers

2015-06-25 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2929:
--
Description: 
Update libprocess to #include headers for symbols we rely on and reorder to 
comply with the style guide.


 Update libprocess #include headers
 --

 Key: MESOS-2929
 URL: https://issues.apache.org/jira/browse/MESOS-2929
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett

 Update libprocess to #include headers for symbols we rely on and reorder to 
 comply with the style guide.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2925) Invalid usage of ATOMIC_FLAG_INIT in member initialization

2015-06-24 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600121#comment-14600121
 ] 

Paul Brett commented on MESOS-2925:
---

[~jvanremoortere] - I think using the init macro in the initializer list looks 
much better, but the compiler warns against it because the behavior is 
unspecified and therefore unsafe to rely on. BTW, I'm using clang on Linux, so 
I don't know if the proposed Apple tweak would help me.

 Invalid usage of ATOMIC_FLAG_INIT in member initialization
 --

 Key: MESOS-2925
 URL: https://issues.apache.org/jira/browse/MESOS-2925
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Affects Versions: 0.23.0
Reporter: Paul Brett

 The C++ specification states:
 The macro ATOMIC_FLAG_INIT shall be defined in such a way that it can be used 
 to initialize an object of type atomic_flag to the clear state. The macro can 
 be used in the form: atomic_flag guard = ATOMIC_FLAG_INIT; It is 
 unspecified whether the macro can be used in other initialization contexts. 
 Clang catches this (although it erroneously reports it as a braced scalar 
 init issue) and refuses to compile libprocess.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2925) Invalid usage of ATOMIC_FLAG_INIT in member initialization

2015-06-24 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600104#comment-14600104
 ] 

Paul Brett commented on MESOS-2925:
---

Up for review at https://reviews.apache.org/r/35841/

 Invalid usage of ATOMIC_FLAG_INIT in member initialization
 --

 Key: MESOS-2925
 URL: https://issues.apache.org/jira/browse/MESOS-2925
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Affects Versions: 0.23.0
Reporter: Paul Brett

 The C++ specification states:
 The macro ATOMIC_FLAG_INIT shall be defined in such a way that it can be used 
 to initialize an object of type atomic_flag to the clear state. The macro can 
 be used in the form: atomic_flag guard = ATOMIC_FLAG_INIT; It is 
 unspecified whether the macro can be used in other initialization contexts. 
 Clang catches this (although it erroneously reports it as a braced scalar 
 init issue) and refuses to compile libprocess.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2926) Extend mesos-style.py/cpplint.py to check #include files

2015-06-24 Thread Paul Brett (JIRA)
Paul Brett created MESOS-2926:
-

 Summary: Extend mesos-style.py/cpplint.py to check #include files
 Key: MESOS-2926
 URL: https://issues.apache.org/jira/browse/MESOS-2926
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett


cpplint.py provides the capability to enforce the style guide requirements for 
#including everything you use and ordering files based on type, but it does not 
work for mesos because we use #include <...> for project files where it 
expects #include "...".

We should update the style checker to support our include usage and then turn 
it on by default in the commit hook.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2927) Update mesos to #include headers for symbols we rely on

2015-06-24 Thread Paul Brett (JIRA)
Paul Brett created MESOS-2927:
-

 Summary: Update mesos to #include headers for symbols we rely on
 Key: MESOS-2927
 URL: https://issues.apache.org/jira/browse/MESOS-2927
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett


Update mesos to #include headers for symbols we rely on



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2929) Update libprocess to #include headers for symbols we rely on

2015-06-24 Thread Paul Brett (JIRA)
Paul Brett created MESOS-2929:
-

 Summary: Update libprocess to #include headers for symbols we rely 
on
 Key: MESOS-2929
 URL: https://issues.apache.org/jira/browse/MESOS-2929
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-2928) Update stout to #include headers for symbols we rely on

2015-06-24 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600455#comment-14600455
 ] 

Paul Brett edited comment on MESOS-2928 at 6/25/15 12:58 AM:
-

https://reviews.apache.org/r/35861/


was (Author: pbrett):
https://reviews.apache.org/r/35860/

 Update stout to #include headers for symbols we rely on
 ---

 Key: MESOS-2928
 URL: https://issues.apache.org/jira/browse/MESOS-2928
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett

 Update mesos to #include headers for symbols we rely on



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2928) Update stout to #include headers for symbols we rely on

2015-06-24 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600455#comment-14600455
 ] 

Paul Brett commented on MESOS-2928:
---

https://reviews.apache.org/r/35860/

 Update stout to #include headers for symbols we rely on
 ---

 Key: MESOS-2928
 URL: https://issues.apache.org/jira/browse/MESOS-2928
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett

 Update mesos to #include headers for symbols we rely on



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2925) Invalid usage of ATOMIC_FLAG_INIT in member initialization

2015-06-24 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-2925:
-

Assignee: Paul Brett

 Invalid usage of ATOMIC_FLAG_INIT in member initialization
 --

 Key: MESOS-2925
 URL: https://issues.apache.org/jira/browse/MESOS-2925
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Affects Versions: 0.23.0
Reporter: Paul Brett
Assignee: Paul Brett

 The C++ specification states:
 The macro ATOMIC_FLAG_INIT shall be defined in such a way that it can be used 
 to initialize an object of type atomic_flag to the clear state. The macro can 
 be used in the form: atomic_flag guard = ATOMIC_FLAG_INIT; It is 
 unspecified whether the macro can be used in other initialization contexts. 
 Clang catches this (although it erroneously reports it as a braced scalar 
 init issue) and refuses to compile libprocess.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2925) Invalid usage of ATOMIC_FLAG_INIT in member initialization

2015-06-24 Thread Paul Brett (JIRA)
Paul Brett created MESOS-2925:
-

 Summary: Invalid usage of ATOMIC_FLAG_INIT in member initialization
 Key: MESOS-2925
 URL: https://issues.apache.org/jira/browse/MESOS-2925
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Affects Versions: 0.23.0
Reporter: Paul Brett


The C++ specification states:

The macro ATOMIC_FLAG_INIT shall be defined in such a way that it can be used 
to initialize an object of type atomic_flag to the clear state. The macro can 
be used in the form: atomic_flag guard = ATOMIC_FLAG_INIT; It is unspecified 
whether the macro can be used in other initialization contexts. 

Clang catches this (although it erroneously reports it as a braced scalar init 
issue) and refuses to compile libprocess.
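
To make the distinction concrete, here is a minimal sketch of a class that 
initializes its flag with the `= ATOMIC_FLAG_INIT` form quoted from the 
specification, written as a default member initializer rather than in the 
constructor's initializer list. SpinGuard is a hypothetical class for 
illustration; it is not libprocess code and not necessarily the change that 
was made for this issue.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical guard type used only to illustrate the initialization forms.
class SpinGuard {
public:
  // Returns true if the caller acquired the guard (the flag was clear).
  bool tryAcquire() {
    return !flag.test_and_set(std::memory_order_acquire);
  }

  // Returns the flag to the clear state.
  void release() { flag.clear(std::memory_order_release); }

private:
  // Same shape as the form the specification names:
  //   atomic_flag guard = ATOMIC_FLAG_INIT;
  // By contrast, `SpinGuard() : flag(ATOMIC_FLAG_INIT) {}` is one of the
  // "other initialization contexts" whose validity is unspecified.
  std::atomic_flag flag = ATOMIC_FLAG_INIT;
};

// Usage: the flag starts clear, so the first acquire succeeds.
void spinGuardExample() {
  SpinGuard guard;
  assert(guard.tryAcquire());
  guard.release();
}
```

Keeping the initializer next to the member declaration avoids the 
parenthesized form that triggers the clang diagnostic.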



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2903) Network isolator should not fail when target state already exists

2015-06-22 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2903:
--
Story Points: 3  (was: 2)

 Network isolator should not fail when target state already exists
 -

 Key: MESOS-2903
 URL: https://issues.apache.org/jira/browse/MESOS-2903
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Affects Versions: 0.23.0
Reporter: Paul Brett
Assignee: Paul Brett
Priority: Critical

 Network isolator has multiple instances of the following pattern:
 {noformat}
   Try<bool> something = ::create();
   if (something.isError()) {
     ++metrics.something_errors;
     return Failure("Failed to create something ...");
   } else if (!icmpVethToEth0.get()) {
     ++metrics.adding_veth_icmp_filters_already_exist;
     return Failure("Something already exists");
   }
 {noformat}
 These failures have occurred in operation due to the failure to recover or 
 delete an orphan, causing the slave to remain online but unable to create 
 new resources. We should convert the second failure message in this 
 pattern to an information message since the final state of the system is the 
 state that we requested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2904) Add slave metric to count container launch failures

2015-06-22 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596747#comment-14596747
 ] 

Paul Brett commented on MESOS-2904:
---

Fix without test harness is out for review: https://reviews.apache.org/r/35738/


 Add slave metric to count container launch failures
 ---

 Key: MESOS-2904
 URL: https://issues.apache.org/jira/browse/MESOS-2904
 Project: Mesos
  Issue Type: Bug
  Components: slave, statistics
Reporter: Paul Brett
Assignee: Paul Brett

 We have seen circumstances where a machine has been consistently unable to 
 launch containers due to an inconsistent state (for example, unexpected 
 network configuration). Adding a metric to track container launch failures 
 will allow us to detect and alert on slaves in such a state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2903) Network isolator should not fail when target state already exists

2015-06-19 Thread Paul Brett (JIRA)
Paul Brett created MESOS-2903:
-

 Summary: Network isolator should not fail when target state 
already exists
 Key: MESOS-2903
 URL: https://issues.apache.org/jira/browse/MESOS-2903
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Reporter: Paul Brett


Network isolator has multiple instances of the following pattern:

{noformat}
  Try<bool> something = ::create();
  if (something.isError()) {
    ++metrics.something_errors;
    return Failure("Failed to create something ...");
  } else if (!icmpVethToEth0.get()) {
    ++metrics.adding_veth_icmp_filters_already_exist;
    return Failure("Something already exists");
  }
{noformat}

These failures have occurred in operation due to the failure to recover or 
delete an orphan, causing the slave to remain online but unable to create new 
resources. We should convert the second failure message in this pattern to 
an information message since the final state of the system is the state that we 
requested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2904) Add slave metric to count container launch failures

2015-06-19 Thread Paul Brett (JIRA)
Paul Brett created MESOS-2904:
-

 Summary: Add slave metric to count container launch failures
 Key: MESOS-2904
 URL: https://issues.apache.org/jira/browse/MESOS-2904
 Project: Mesos
  Issue Type: Bug
  Components: slave, statistics
Reporter: Paul Brett
Assignee: Paul Brett


We have seen circumstances where a machine has been consistently unable to 
launch containers due to an inconsistent state (for example, unexpected network 
configuration). Adding a metric to track container launch failures will allow 
us to detect and alert on slaves in such a state.
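
A minimal sketch of the kind of counter this describes is shown below. It does 
not use the Mesos metrics library; LaunchMetrics, container_launch_errors and 
launchAndCount are hypothetical names chosen for illustration, and the launch 
step is passed in as a callable that returns false on failure.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical metric holder; the real slave metrics types are not shown
// in this thread.
struct LaunchMetrics {
  std::atomic<uint64_t> container_launch_errors{0};
};

// Runs `launch` and increments the failure counter when it reports false.
bool launchAndCount(LaunchMetrics& metrics,
                    const std::function<bool()>& launch) {
  const bool ok = launch();
  if (!ok) {
    ++metrics.container_launch_errors;
  }
  return ok;
}

// Usage: one failed launch raises the counter to one.
void launchMetricExample() {
  LaunchMetrics metrics;
  launchAndCount(metrics, [] { return false; });
  assert(metrics.container_launch_errors.load() == 1);
}
```

A counter that keeps climbing on one slave while others stay flat is the 
signal an alert could key on.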



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2903) Network isolator should not fail when target state already exists

2015-06-19 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14594171#comment-14594171
 ] 

Paul Brett commented on MESOS-2903:
---

The new logic will be:

{noformat}
  Try<bool> something = ::create();
  if (something.isError()) {
    ++metrics.something_errors;
    return Failure("Failed to create something ...");
  } else if (!icmpVethToEth0.get()) {
    // already exists
    Try<bool> something = ::update();
    if (something.isError()) {
      ++metrics.something_errors;
      return Failure("Failed to update something ...");
    }
  }
{noformat}
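
The create-then-update logic above can be sketched as a self-contained 
example. This is only an illustration under stated assumptions: Result is a 
minimal stand-in for a Try<bool>-style return value (it is not stout's Try), 
and Filter, Metrics and ensure are hypothetical names, not the network 
isolator's real types.

```cpp
#include <cassert>
#include <string>

// Minimal stand-in for a Try<bool>-like result (hypothetical).
struct Result {
  bool error;           // true if the operation itself failed
  bool value;           // for create(): false means "already exists"
  std::string message;  // failure description when `error` is true
};

// Hypothetical wrapper around a piece of state that may already exist.
struct Filter {
  bool exists = false;
  bool broken = false;
  int updates = 0;

  Result create() {
    if (broken) return Result{true, false, "create failed"};
    if (exists) return Result{false, false, ""};
    exists = true;
    return Result{false, true, ""};
  }

  Result update() {
    if (broken) return Result{true, false, "update failed"};
    ++updates;
    return Result{false, true, ""};
  }
};

struct Metrics {
  int errors = 0;
};

// Returns an empty string on success, otherwise a failure message.
std::string ensure(Filter& filter, Metrics& metrics) {
  Result created = filter.create();
  if (created.error) {
    ++metrics.errors;
    return "Failed to create filter: " + created.message;
  }
  if (!created.value) {
    // Already exists: update in place instead of failing.
    Result updated = filter.update();
    if (updated.error) {
      ++metrics.errors;
      return "Failed to update filter: " + updated.message;
    }
  }
  return "";
}

// Usage: calling ensure() twice succeeds both times.
void ensureExample() {
  Filter filter;
  Metrics metrics;
  assert(ensure(filter, metrics).empty());
  assert(ensure(filter, metrics).empty());
}
```

Because the second call takes the update path, leftover state from an orphan 
no longer turns a repeated request into a failure.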

 Network isolator should not fail when target state already exists
 -

 Key: MESOS-2903
 URL: https://issues.apache.org/jira/browse/MESOS-2903
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Affects Versions: 0.23.0
Reporter: Paul Brett
Priority: Critical

 Network isolator has multiple instances of the following pattern:
 {noformat}
   Try<bool> something = ::create();
   if (something.isError()) {
     ++metrics.something_errors;
     return Failure("Failed to create something ...");
   } else if (!icmpVethToEth0.get()) {
     ++metrics.adding_veth_icmp_filters_already_exist;
     return Failure("Something already exists");
   }
 {noformat}
 These failures have occurred in operation due to the failure to recover or 
 delete an orphan, causing the slave to remain online but unable to create 
 new resources. We should convert the second failure message in this 
 pattern to an information message since the final state of the system is the 
 state that we requested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2903) Network isolator should not fail when target state already exists

2015-06-19 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-2903:
-

Assignee: Paul Brett

 Network isolator should not fail when target state already exists
 -

 Key: MESOS-2903
 URL: https://issues.apache.org/jira/browse/MESOS-2903
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Affects Versions: 0.23.0
Reporter: Paul Brett
Assignee: Paul Brett
Priority: Critical

 Network isolator has multiple instances of the following pattern:
 {noformat}
   Try<bool> something = ::create();
   if (something.isError()) {
     ++metrics.something_errors;
     return Failure("Failed to create something ...");
   } else if (!icmpVethToEth0.get()) {
     ++metrics.adding_veth_icmp_filters_already_exist;
     return Failure("Something already exists");
   }
 {noformat}
 These failures have occurred in operation due to the failure to recover or 
 delete an orphan, causing the slave to remain online but unable to create 
 new resources. We should convert the second failure message in this 
 pattern to an information message since the final state of the system is the 
 state that we requested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2903) Network isolator should not fail when target state already exists

2015-06-19 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2903:
--
Story Points: 2

 Network isolator should not fail when target state already exists
 -

 Key: MESOS-2903
 URL: https://issues.apache.org/jira/browse/MESOS-2903
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Affects Versions: 0.23.0
Reporter: Paul Brett
Assignee: Paul Brett
Priority: Critical

 Network isolator has multiple instances of the following pattern:
 {noformat}
   Try<bool> something = ::create();
   if (something.isError()) {
     ++metrics.something_errors;
     return Failure("Failed to create something ...");
   } else if (!icmpVethToEth0.get()) {
     ++metrics.adding_veth_icmp_filters_already_exist;
     return Failure("Something already exists");
   }
 {noformat}
 These failures have occurred in operation due to the failure to recover or 
 delete an orphan, causing the slave to remain online but unable to create 
 new resources. We should convert the second failure message in this 
 pattern to an information message since the final state of the system is the 
 state that we requested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2853) Report per-container metrics from host egress filter

2015-06-19 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14594175#comment-14594175
 ] 

Paul Brett commented on MESOS-2853:
---

Container metrics are not tracked by fq_codel on a per-filter basis, hence this 
information is not available.  Will wait to see the interaction of fq_codel on 
host eth0 with real workloads before deciding if further work is required.

 Report per-container metrics from host egress filter
 

 Key: MESOS-2853
 URL: https://issues.apache.org/jira/browse/MESOS-2853
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: twitter

 Export in statistics.json the fq_codel flow statistics for each container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2332) Report per-container metrics for network bandwidth throttling

2015-06-17 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590894#comment-14590894
 ] 

Paul Brett commented on MESOS-2332:
---

Network performance statistics are now reported in statistics.json on the slave.

 Report per-container metrics for network bandwidth throttling
 -

 Key: MESOS-2332
 URL: https://issues.apache.org/jira/browse/MESOS-2332
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: features, twitter

 Export metrics from the network isolation to identify scope and duration of 
 container throttling.  
 Packet loss can be identified from the overlimits and requeues fields of the 
 htb qdisc report for the virtual interface, e.g.
 {noformat}
 $ tc -s -d qdisc show dev mesos19223
 qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
  Sent 158213287452 bytes 1030876393 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
 qdisc ingress : parent :fff1 
  Sent 119381747824 bytes 1144549901 pkt (dropped 2044879, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
 {noformat}
 Note that since a packet can be examined multiple times before transmission, 
 overlimits can exceed total packets sent.  
 Add to the port_mapping isolator usage() and the container statistics 
 protobuf. Carefully consider the naming (esp tx/rx) + commenting of the 
 protobuf fields so it's clear what these represent and how they are different 
 to the existing dropped packet counts from the network stack.
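
 A minimal sketch of how the counters in a "Sent ..." line of the tc report
 above could be extracted, assuming the line is available as one string.
 QdiscStats and parseSentLine are hypothetical names and are not part of
 Mesos or libnl; the isolator itself may well read these values through
 netlink rather than by parsing text.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Counters reported by `tc -s qdisc show` for a single qdisc.
struct QdiscStats {
  unsigned long long bytes;
  unsigned long long packets;
  unsigned long long dropped;
  unsigned long long overlimits;
  unsigned long long requeues;
};

// Parses a line of the form
//   Sent <bytes> bytes <pkts> pkt (dropped <n>, overlimits <n> requeues <n>)
// Returns true only if all five counters were read.  Whitespace in the
// format string matches any run of whitespace, including a line break.
bool parseSentLine(const std::string& line, QdiscStats* stats) {
  assert(stats != nullptr);

  const int matched = std::sscanf(
      line.c_str(),
      " Sent %llu bytes %llu pkt (dropped %llu, overlimits %llu requeues %llu)",
      &stats->bytes,
      &stats->packets,
      &stats->dropped,
      &stats->overlimits,
      &stats->requeues);

  return matched == 5;
}
```

 Keeping dropped, overlimits and requeues as separate fields mirrors the
 request above to name the protobuf fields so that they are not confused
 with the existing dropped packet counts from the network stack.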



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2874) Convert PortMappingStatistics to use automatic JSON encoding/decoding

2015-06-16 Thread Paul Brett (JIRA)
Paul Brett created MESOS-2874:
-

 Summary: Convert PortMappingStatistics to use automatic JSON 
encoding/decoding
 Key: MESOS-2874
 URL: https://issues.apache.org/jira/browse/MESOS-2874
 Project: Mesos
  Issue Type: Bug
Reporter: Paul Brett
Assignee: Paul Brett


Simplify PortMappingStatistics by using JSON::Protobuf and protobuf::parse to 
convert ResourceStatistics to/from line format.

This change will simplify the implementation of MESOS-2332.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2874) Convert PortMappingStatistics to use automatic JSON encoding/decoding

2015-06-16 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2874:
--
Component/s: test
 isolation

 Convert PortMappingStatistics to use automatic JSON encoding/decoding
 -

 Key: MESOS-2874
 URL: https://issues.apache.org/jira/browse/MESOS-2874
 Project: Mesos
  Issue Type: Bug
  Components: isolation, test
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: twitter

 Simplify PortMappingStatistics by using JSON::Protobuf and protobuf::parse to 
 convert ResourceStatistics to/from line format.
 This change will simplify the implementation of MESOS-2332.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2784) Add constexpr to C++11 whitelist

2015-06-15 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586692#comment-14586692
 ] 

Paul Brett commented on MESOS-2784:
---

Comments from review incorporated and review updated.

 Add constexpr to C++11 whitelist
 

 Key: MESOS-2784
 URL: https://issues.apache.org/jira/browse/MESOS-2784
 Project: Mesos
  Issue Type: Improvement
  Components: documentation
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: twitter

 constexpr is currently used to eliminate initialization dependency issues for 
 non-POD objects.  We should add it to the whitelist of acceptable C++11 
 features in the style guide.
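
 A small illustration of the point, assuming a hypothetical Bytes class
 (not the stout type of the same name): the class is non-POD because it has
 a user-provided constructor, but since that constructor is constexpr,
 namespace-scope instances are constant-initialized and cannot be observed
 uninitialized by code in another translation unit.

```cpp
#include <cstdint>

// Hypothetical non-POD value type with a constexpr constructor.
class Bytes {
public:
  constexpr explicit Bytes(uint64_t value) : value_(value) {}

  constexpr uint64_t value() const { return value_; }

private:
  uint64_t value_;
};

// Constant-initialized globals: no dynamic initializer runs at startup, so
// there is no static initialization order dependency between them or with
// globals defined in other files.
constexpr Bytes KILOBYTE(1024);
constexpr Bytes MEGABYTE(KILOBYTE.value() * 1024);

// The values are usable in constant expressions.
static_assert(MEGABYTE.value() == 1048576, "computed at compile time");
```

 Without constexpr, a global such as MEGABYTE would be initialized
 dynamically, and its value as seen from another file's static initializer
 would depend on link order.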



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2853) Report per-container metrics from host egress filter

2015-06-10 Thread Paul Brett (JIRA)
Paul Brett created MESOS-2853:
-

 Summary: Report per-container metrics from host egress filter
 Key: MESOS-2853
 URL: https://issues.apache.org/jira/browse/MESOS-2853
 Project: Mesos
  Issue Type: Improvement
  Components: isolation, twitter
Reporter: Paul Brett
Assignee: Paul Brett


Export in statistics.json the fq_codel flow statistics for each container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (MESOS-2821) Document and consolidate qdisc handles

2015-06-10 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett closed MESOS-2821.
-

 Document and consolidate qdisc handles
 --

 Key: MESOS-2821
 URL: https://issues.apache.org/jira/browse/MESOS-2821
 Project: Mesos
  Issue Type: Improvement
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: twitter

 The structure of the traffic control qdiscs and filters is non-trivial.  The 
 knowledge of which handles are the parents of which filters or qdiscs is 
 currently embedded in the create and recovery functions, and it will also be 
 needed to collect statistics on the links.  Let's pull out the constants and 
 document them.
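
 A rough sketch of what "pull out the constants and document them" could
 look like.  The Handle class and the handle values below are illustrative
 assumptions, not the assignments used by the isolator.  The accessors are
 named primary and secondary rather than major and minor because the latter
 are macros on some glibc versions.

```cpp
#include <cstdint>

// A traffic control handle packs two 16-bit numbers into 32 bits and is
// conventionally written "primary:secondary" in hexadecimal.
class Handle {
public:
  constexpr Handle(uint16_t primary, uint16_t secondary)
    : handle_((static_cast<uint32_t>(primary) << 16) | secondary) {}

  constexpr uint32_t get() const { return handle_; }

  constexpr uint16_t primary() const {
    return static_cast<uint16_t>(handle_ >> 16);
  }

  constexpr uint16_t secondary() const {
    return static_cast<uint16_t>(handle_ & 0xffff);
  }

private:
  uint32_t handle_;
};

namespace handles {

// Hypothetical handle for the root egress qdisc inside a container ("1:").
constexpr Handle CONTAINER_EGRESS_ROOT(1, 0);

// The ingress qdisc conventionally uses the handle "ffff:".
constexpr Handle HOST_INGRESS(0xffff, 0);

}  // namespace handles
```

 With the handles named in one place, the create, recovery and statistics
 code can all refer to the same constants instead of repeating literal
 values.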



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2836) Report per-container metrics for network bandwidth throttling to the slave

2015-06-09 Thread Paul Brett (JIRA)
Paul Brett created MESOS-2836:
-

 Summary: Report per-container metrics for network bandwidth 
throttling to the slave
 Key: MESOS-2836
 URL: https://issues.apache.org/jira/browse/MESOS-2836
 Project: Mesos
  Issue Type: Improvement
Reporter: Paul Brett
Assignee: Paul Brett


Report per-container metrics for network bandwidth throttling to the slave in 
the output of mesos-network-helper.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2836) Report per-container metrics for network bandwidth throttling to the slave

2015-06-09 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579626#comment-14579626
 ] 

Paul Brett commented on MESOS-2836:
---

https://reviews.apache.org/r/35229/

 Report per-container metrics for network bandwidth throttling to the slave
 --

 Key: MESOS-2836
 URL: https://issues.apache.org/jira/browse/MESOS-2836
 Project: Mesos
  Issue Type: Improvement
Reporter: Paul Brett
Assignee: Paul Brett

 Report per-container metrics for network bandwidth throttling to the slave in 
 the output of mesos-network-helper.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2837) Decode network statistics from mesos-network-helper

2015-06-09 Thread Paul Brett (JIRA)
Paul Brett created MESOS-2837:
-

 Summary: Decode network statistics from mesos-network-helper
 Key: MESOS-2837
 URL: https://issues.apache.org/jira/browse/MESOS-2837
 Project: Mesos
  Issue Type: Improvement
Reporter: Paul Brett
Assignee: Paul Brett


Decode network statistics from mesos-network-helper and output them to the 
slave's statistics.json.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2837) Decode network statistics from mesos-network-helper

2015-06-09 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579628#comment-14579628
 ] 

Paul Brett commented on MESOS-2837:
---

https://reviews.apache.org/r/35257/

 Decode network statistics from mesos-network-helper
 ---

 Key: MESOS-2837
 URL: https://issues.apache.org/jira/browse/MESOS-2837
 Project: Mesos
  Issue Type: Improvement
Reporter: Paul Brett
Assignee: Paul Brett

 Decode network statistics from mesos-network-helper and output them to the 
 slave's statistics.json.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2752) Add HTB queueing discipline wrapper class

2015-06-08 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2752:
--
Issue Type: Improvement  (was: Bug)

 Add HTB queueing discipline wrapper class
 -

 Key: MESOS-2752
 URL: https://issues.apache.org/jira/browse/MESOS-2752
 Project: Mesos
  Issue Type: Improvement
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: twitter

 The network isolator uses a Hierarchical Token Bucket (HTB) traffic control 
 discipline on the egress filter inside each container as the root for adding 
 traffic filters.  An HTB wrapper is needed to access the network statistics 
 for this interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

