[jira] [Commented] (MESOS-8139) Upgrade protobuf to 3.4.x.

2017-11-13 Thread Dmitry Zhuk (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249555#comment-16249555
 ] 

Dmitry Zhuk commented on MESOS-8139:


https://reviews.apache.org/r/63752/
https://reviews.apache.org/r/63753/

> Upgrade protobuf to 3.4.x.
> --
>
> Key: MESOS-8139
> URL: https://issues.apache.org/jira/browse/MESOS-8139
> Project: Mesos
>  Issue Type: Task
>Reporter: Benjamin Mahler
>  Labels: performance
>
> The 3.4.x release includes move support:
> https://github.com/google/protobuf/releases/tag/v3.4.0
> This will provide some performance improvements for us, and will allow us to 
> start using move semantics for messages.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7195) Use C++11 variadic templates for process::dispatch/defer/delay/async/run

2017-07-19 Thread Dmitry Zhuk (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093109#comment-16093109
 ] 

Dmitry Zhuk commented on MESOS-7195:


[~mcypark] sounds good. Let's proceed with the patches then.

> Use C++11 variadic templates for process::dispatch/defer/delay/async/run
> 
>
> Key: MESOS-7195
> URL: https://issues.apache.org/jira/browse/MESOS-7195
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Yan Xu
>
> These methods are currently implemented using {{REPEAT_FROM_TO}} (i.e., 
> {{BOOST_PP_REPEAT_FROM_TO}}):
> {code}
> REPEAT_FROM_TO(1, 11, TEMPLATE, _) // Args A0 -> A9.
> {code}
> This means we have to bump up the number of repetitions whenever we add a new 
> method with more args.
> Seems like we can replace this with C++11 variadic templates.





[jira] [Commented] (MESOS-7713) Optimize number of copies made in dispatch/defer mechanism

2017-06-29 Thread Dmitry Zhuk (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068196#comment-16068196
 ] 

Dmitry Zhuk commented on MESOS-7713:


https://docs.google.com/spreadsheets/d/1xqFxcWxOyjbozro0SkshTIKkaGgShRMN8bqBsdtnl8k/edit?usp=sharing
 - this demonstrates the performance improvement for master failover with the 
patches applied. Reregistration time was reduced from 1:20 to 1:00 (not 
including the time to recover the registry).

Test environment: scale test cluster simulating ~40K agents and ~100K tasks, 
dedicated master hosts, {{--reregistration_backoff_factor=45secs}} on agents.

Versions tested:
1.2.0 - Mesos 1.2.0 +  https://reviews.apache.org/r/58355/
1.2.0-fix - same as above + https://reviews.apache.org/r/60002/, 
https://reviews.apache.org/r/60003/ + https://reviews.apache.org/r/60472/, 
https://reviews.apache.org/r/60473/,  https://reviews.apache.org/r/60474/ + 
changes to install the {{Master::reregisterSlave}} handler with {{mutable_}} 
versions of the protobuf message field accessors, take parameters by value, and 
{{std::move}} them to {{defer}}.

Each version was tested 3 times by killing the leading master and collecting 
metrics from the newly elected master's logs.
Metrics are calculated by counting the number of different messages appearing in 
the logs:
{{reregistering}} - "Re-registering agent ..."
{{ignoring}} - "Ignoring re-register agent message from agent ... as 
readmission is already in progress"
{{reregistered}} - "Re-registered agent ..."
{{sending}} - "Sending updated checkpointed resources ... to agent ..."
{{update}} - "Received update of agent ... with total oversubscribed resources 
..."
{{pending}} = {{reregistering}} - {{sending}} - indicates the number of 
in-progress reregistrations.
{{offers}} - "Sending ... offers to framework ..."
{{applied_cnt}}, {{applied}} - "Applied ... operations in ...; attempting to 
update the registry" (correspond to the number of messages and the total number 
of operations)
{{reg_updated}} - "Successfully updated the registry in ..." (extracted 
duration from message).

> Optimize number of copies made in dispatch/defer mechanism
> --
>
> Key: MESOS-7713
> URL: https://issues.apache.org/jira/browse/MESOS-7713
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess
>Affects Versions: 1.2.0, 1.2.1, 1.3.0
>Reporter: Dmitry Zhuk
>Assignee: Dmitry Zhuk
>
> Profiling agent reregistration for a large cluster shows that many CPU 
> cycles are spent copying protobuf objects. This is partially due to copies 
> made by code like this:
> {code}
> future.then(defer(self(), &Process::method, param));
> {code}
> {{param}} could be copied 8-10 times before it reaches {{method}}. 
> Specifically, {{reregisterSlave}} accepts vectors of rather complex objects, 
> which are passed to {{defer}}.
> Currently there are some places in {{defer}}, {{dispatch}} and {{Future}} 
> code, which could use {{std::move}} and {{std::forward}} to evade some of the 
> copies.





[jira] [Commented] (MESOS-6345) ExamplesTest.PersistentVolumeFramework failing due to double free corruption on Ubuntu 14.04

2017-06-27 Thread Dmitry Zhuk (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064894#comment-16064894
 ] 

Dmitry Zhuk commented on MESOS-6345:


https://reviews.apache.org/r/60467/

> ExamplesTest.PersistentVolumeFramework failing due to double free corruption 
> on Ubuntu 14.04
> 
>
> Key: MESOS-6345
> URL: https://issues.apache.org/jira/browse/MESOS-6345
> Project: Mesos
>  Issue Type: Bug
>  Components: framework
>Reporter: Avinash Sridharan
>  Labels: mesosphere
>
> The PersistentVolumeFramework test is failing on Ubuntu 14
> {code}
> [Step 10/10] *** Error in 
> `/mnt/teamcity/work/4240ba9ddd0997c3/build/src/.libs/lt-persistent-volume-framework':
>  double free or corruption (fasttop): 0x7f1ae0006a20 ***
> [04:56:48]W:   [Step 10/10] *** Aborted at 1475902608 (unix time) try "date 
> -d @1475902608" if you are using GNU date ***
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.592744 25425 state.cpp:57] 
> Recovering state from '/mnt/teamcity/temp/buildTmp/mesos-8KiPML/2/meta'
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.592808 25423 state.cpp:57] 
> Recovering state from '/mnt/teamcity/temp/buildTmp/mesos-8KiPML/1/meta'
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.592952 25425 
> status_update_manager.cpp:203] Recovering status update manager
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.592957 25423 
> status_update_manager.cpp:203] Recovering status update manager
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593010 25424 
> containerizer.cpp:557] Recovering containerizer
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593143 25396 sched.cpp:226] 
> Version: 1.1.0
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593158 25425 master.cpp:2013] 
> Elected as the leading master!
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593173 25425 master.cpp:1560] 
> Recovering from registrar
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593211 25424 registrar.cpp:329] 
> Recovering registrar
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593250 25425 sched.cpp:330] New 
> master detected at master@172.30.2.21:45167
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593282 25425 sched.cpp:341] No 
> credentials provided. Attempting to register without authentication
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593293 25425 sched.cpp:820] 
> Sending SUBSCRIBE call to master@172.30.2.21:45167
> [04:56:48]W:   [Step 10/10] PC: @ 0x7f1b0bbaccc9 (unknown)
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593339 25425 sched.cpp:853] Will 
> retry registration in 32.354951ms if necessary
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593364 25421 master.cpp:1387] 
> Dropping 'mesos.scheduler.Call' message since not recovered yet
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593413 25428 provisioner.cpp:253] 
> Provisioner recovery complete
> [04:56:48]W:   [Step 10/10] *** SIGABRT (@0x6334) received by PID 25396 (TID 
> 0x7f1b02ed6700) from PID 25396; stack trace: ***
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593520 25421 
> containerizer.cpp:557] Recovering containerizer
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593529 25425 slave.cpp:5276] 
> Finished recovery
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593627 25422 leveldb.cpp:304] 
> Persisting metadata (8 bytes) to leveldb took 4.546422ms
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593695 25428 provisioner.cpp:253] 
> Provisioner recovery complete
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593701 25422 replica.cpp:320] 
> Persisted replica status to VOTING
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593760 25424 slave.cpp:5276] 
> Finished recovery
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593864 25427 recover.cpp:582] 
> Successfully joined the Paxos group
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593896 25425 slave.cpp:5448] 
> Querying resource estimator for oversubscribable resources
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593922 25427 recover.cpp:466] 
> Recover process terminated
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.593976 25427 slave.cpp:5462] 
> Received oversubscribable resources {} from the resource estimator
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.594002 25424 slave.cpp:5448] 
> Querying resource estimator for oversubscribable resources
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.594017 25422 log.cpp:553] 
> Attempting to start the writer
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.594030 25428 
> status_update_manager.cpp:177] Pausing sending status updates
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.594032 25427 slave.cpp:915] New 
> master detected at master@172.30.2.21:45167
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.594055 25423 slave.cpp:915] New 
> master detected at master@172.30.2.21:45167
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.594048 25428 
> status

[jira] [Commented] (MESOS-6345) ExamplesTest.PersistentVolumeFramework failing due to double free corruption on Ubuntu 14.04

2017-06-27 Thread Dmitry Zhuk (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064891#comment-16064891
 ] 

Dmitry Zhuk commented on MESOS-6345:


Similar crash on CentOS 7 (in ExamplesTest.PersistentVolumeFramework and 
ExamplesTest.DynamicReservationFramework), presumably due to a race condition on 
{{signaledWrapper}} in {{configureSignal}}.
{noformat}
[ RUN  ] ExamplesTest.DynamicReservationFramework
*** Error in `mesos/build/src/.libs/lt-dynamic-reservation-framework': double 
free or corruption (fasttop): 0x7fdfa0002e60 ***
=== Backtrace: =
/lib64/libc.so.6(+0x7c503)[0x7fdfc6da7503]
mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt14_Function_base13_Base_managerIZN7process5deferIN5mesos8internal5slave5SlaveEiiSt12_PlaceholderILi1EES7_ILi2NS1_9_DeferredIDTcl4bindadsrSt8functionIFvT0_T1_EEclcvSF__Efp1_fp2_RKNS1_3PIDIT_EEMSJ_FvSC_SD_ET2_T3_EUliiE_E10_M_destroyERSt9_Any_dataSt17integral_constantIbLb0EE+0x31)[0x7fdfcca9165c]
mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt14_Function_base13_Base_managerIZN7process5deferIN5mesos8internal5slave5SlaveEiiSt12_PlaceholderILi1EES7_ILi2NS1_9_DeferredIDTcl4bindadsrSt8functionIFvT0_T1_EEclcvSF__Efp1_fp2_RKNS1_3PIDIT_EEMSJ_FvSC_SD_ET2_T3_EUliiE_E10_M_managerERSt9_Any_dataRKST_St18_Manager_operation+0xa2)[0x7fdfcca79857]
mesos/build/src/.libs/lt-dynamic-reservation-framework(_ZNSt14_Function_baseD1Ev+0x33)[0x560e50f40ae7]
mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt8functionIFviiEED1Ev+0x18)[0x7fdfcca2ec98]
mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt10_Head_baseILm0ESt8functionIFviiEELb0EED1Ev+0x18)[0x7fdfcca300ce]
mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt11_Tuple_implILm0EISt8functionIFviiEESt12_PlaceholderILi1EES3_ILi2D1Ev+0x18)[0x7fdfcca300e8]
mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt5tupleIISt8functionIFviiEESt12_PlaceholderILi1EES3_ILi2D1Ev+0x18)[0x7fdfcca30102]
mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt5_BindIFSt7_Mem_fnIMSt8functionIFviiEEKFviiEES3_St12_PlaceholderILi1EES7_ILi2D1Ev+0x1c)[0x7fdfcca30120]
mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt14_Function_base13_Base_managerISt5_BindIFSt7_Mem_fnIMSt8functionIFviiEEKFviiEES5_St12_PlaceholderILi1EES9_ILi2E10_M_destroyERSt9_Any_dataSt17integral_constantIbLb0EE+0x29)[0x7fdfcca91873]
mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt14_Function_base13_Base_managerISt5_BindIFSt7_Mem_fnIMSt8functionIFviiEEKFviiEES5_St12_PlaceholderILi1EES9_ILi2E10_M_managerERSt9_Any_dataRKSF_St18_Manager_operation+0xa2)[0x7fdfcca79ba3]
mesos/build/src/.libs/lt-dynamic-reservation-framework(_ZNSt14_Function_baseD1Ev+0x33)[0x560e50f40ae7]
mesos/build/src/.libs/libmesos-1.4.0.so(_ZNSt8functionIFviiEED1Ev+0x18)[0x7fdfcca2ec98]
mesos/build/src/.libs/libmesos-1.4.0.so(_ZN2os8internal15configureSignalEPKSt8functionIFviiEE+0x4a)[0x7fdfcc9db47d]
mesos/build/src/.libs/libmesos-1.4.0.so(_ZN5mesos8internal5slave5Slave10initializeEv+0x3d5e)[0x7fdfcc9e0a78]
mesos/build/src/.libs/libmesos-1.4.0.so(_ZN7process14ProcessManager6resumeEPNS_11ProcessBaseE+0x284)[0x7fdfcd93fedc]
mesos/build/src/.libs/libmesos-1.4.0.so(+0x61152da)[0x7fdfcd93c2da]
mesos/build/src/.libs/libmesos-1.4.0.so(+0x6127bce)[0x7fdfcd94ebce]
mesos/build/src/.libs/libmesos-1.4.0.so(+0x6127b12)[0x7fdfcd94eb12]
mesos/build/src/.libs/libmesos-1.4.0.so(+0x6127a9c)[0x7fdfcd94ea9c]
/lib64/libstdc++.so.6(+0xb5230)[0x7fdfc73b7230]
/lib64/libpthread.so.0(+0x7dc5)[0x7fdfc7612dc5]
/lib64/libc.so.6(clone+0x6d)[0x7fdfc6e2276d]
{noformat}

> ExamplesTest.PersistentVolumeFramework failing due to double free corruption 
> on Ubuntu 14.04
> 
>
> Key: MESOS-6345
> URL: https://issues.apache.org/jira/browse/MESOS-6345
> Project: Mesos
>  Issue Type: Bug
>  Components: framework
>Reporter: Avinash Sridharan
>  Labels: mesosphere
>
> The PersistentVolumeFramework test is failing on Ubuntu 14
> {code}
> [Step 10/10] *** Error in 
> `/mnt/teamcity/work/4240ba9ddd0997c3/build/src/.libs/lt-persistent-volume-framework':
>  double free or corruption (fasttop): 0x7f1ae0006a20 ***
> [04:56:48]W:   [Step 10/10] *** Aborted at 1475902608 (unix time) try "date 
> -d @1475902608" if you are using GNU date ***
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.592744 25425 state.cpp:57] 
> Recovering state from '/mnt/teamcity/temp/buildTmp/mesos-8KiPML/2/meta'
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.592808 25423 state.cpp:57] 
> Recovering state from '/mnt/teamcity/temp/buildTmp/mesos-8KiPML/1/meta'
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.592952 25425 
> status_update_manager.cpp:203] Recovering status update manager
> [04:56:48]W:   [Step 10/10] I1008 04:56:48.592957 25423 
> status_update_manager.cpp:203] Recovering status update manager
> [04:56:48]W:   [Step 10/10] I1008 04:56

[jira] [Commented] (MESOS-7688) Improve master failover performance by reducing unnecessary agent retries.

2017-06-22 Thread Dmitry Zhuk (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059977#comment-16059977
 ] 

Dmitry Zhuk commented on MESOS-7688:


Thanks [~xujyan], I've submitted MESOS-7713 to track this.
Are these statistics for 1.3.0? What is your {{--registration_backoff_factor}} 
on agents? We have a much larger cluster and bumped this parameter to 45secs. 
This got us to 70% extra reregistrations, 40% ignores, and 2-3 mins of 
failover.
Also make sure that you have https://reviews.apache.org/r/58355/ applied - this 
significantly improves reregistration time (was >8min before the patch, if I 
remember correctly).

> Improve master failover performance by reducing unnecessary agent retries.
> --
>
> Key: MESOS-7688
> URL: https://issues.apache.org/jira/browse/MESOS-7688
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, master
>Reporter: Benjamin Mahler
>  Labels: scalability
> Attachments: 1.2.0.png, reregistration.perf.gz, reregistration.svg
>
>
> Currently, during a failover the agents will (re-)register with the master. 
> While the master is recovering, the master may drop messages from the agents, 
> and so the agents must retry registration using a backoff mechanism. For 
> large clusters, there can be a lot of overhead in processing unnecessary 
> retries from the agents, given that these messages must be deserialized and 
> contain all of the task / executor information many times over.
> In order to reduce this overhead, the idea is to avoid the need for agents to 
> blindly retry (re-)registration with the master. Two approaches for this are:
> (1) Update the MasterInfo in ZK when the master is recovered. This is a bit 
> of an abuse of MasterInfo unfortunately, but the idea is for agents to only 
> (re-)register when they see that the master reaches a recovered state. Once 
> recovered, the master will not drop messages, and therefore agents only need 
> to retry when the connection breaks.
> (2) Have the master reply with a retry message when it's in the recovering 
> state, so that agents get a clear signal that their messages were dropped. 
> The agents only retry when the connection breaks or they get a retry message. 
> This one is less optimal, because the master may have to process a lot of 
> messages and send retries, but once the master is recovered, the master will 
> process only a single (re-)registration from each agent. The number of 
> (re-)registrations that occur while the master is recovering can be reduced 
> to 1 in this approach if the master sends the retry message only after the 
> master completes recovery.





[jira] [Commented] (MESOS-7713) Optimize number of copies made in dispatch/defer mechanism

2017-06-22 Thread Dmitry Zhuk (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059952#comment-16059952
 ] 

Dmitry Zhuk commented on MESOS-7713:


https://reviews.apache.org/r/60003/
https://reviews.apache.org/r/60002/

> Optimize number of copies made in dispatch/defer mechanism
> --
>
> Key: MESOS-7713
> URL: https://issues.apache.org/jira/browse/MESOS-7713
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess
>Affects Versions: 1.2.0, 1.2.1, 1.3.0
>Reporter: Dmitry Zhuk
>Assignee: Dmitry Zhuk
>
> Profiling agent reregistration for a large cluster shows that many CPU 
> cycles are spent copying protobuf objects. This is partially due to copies 
> made by code like this:
> {code}
> future.then(defer(self(), &Process::method, param));
> {code}
> {{param}} could be copied 8-10 times before it reaches {{method}}. 
> Specifically, {{reregisterSlave}} accepts vectors of rather complex objects, 
> which are passed to {{defer}}.
> Currently there are some places in {{defer}}, {{dispatch}} and {{Future}} 
> code, which could use {{std::move}} and {{std::forward}} to evade some of the 
> copies.





[jira] [Created] (MESOS-7713) Optimize number of copies made in dispatch/defer mechanism

2017-06-22 Thread Dmitry Zhuk (JIRA)
Dmitry Zhuk created MESOS-7713:
--

 Summary: Optimize number of copies made in dispatch/defer mechanism
 Key: MESOS-7713
 URL: https://issues.apache.org/jira/browse/MESOS-7713
 Project: Mesos
  Issue Type: Task
  Components: libprocess
Affects Versions: 1.3.0, 1.2.1, 1.2.0
Reporter: Dmitry Zhuk
Assignee: Dmitry Zhuk


Profiling agent reregistration for a large cluster shows that many CPU cycles 
are spent copying protobuf objects. This is partially due to copies made by 
code like this:
{code}
future.then(defer(self(), &Process::method, param));
{code}
{{param}} could be copied 8-10 times before it reaches {{method}}. 
Specifically, {{reregisterSlave}} accepts vectors of rather complex objects, 
which are passed to {{defer}}.
Currently there are some places in {{defer}}, {{dispatch}} and {{Future}} code, 
which could use {{std::move}} and {{std::forward}} to evade some of the copies.





[jira] [Commented] (MESOS-7688) Improve master failover performance by reducing unnecessary agent retries.

2017-06-22 Thread Dmitry Zhuk (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059249#comment-16059249
 ] 

Dmitry Zhuk commented on MESOS-7688:


[~bmahler], dropping messages during recovery is not a big deal actually... 
They are processed pretty fast, and have no significant impact on what's 
happening next (it may even make things easier for the master, as backoff is 
increased on these agents).

See [^1.2.0.png] for some details of what's happening after recovery. This is 
basically based on counting messages in the logs, on 1.2.0. I expect it to be 
even worse on HEAD, due to the extra continuation added here: 
https://github.com/apache/mesos/commit/29fc2dfcb110a51923d4d7c144bdd797b348f96b#diff-28ebad5255c4c1a70b4abf35651198fdR5430
00:00 - time when the first re-registration request is seen by the master.
*reregistering* ("Re-registering agent ..." in the log) - total number of 
reregistrations the master started. There are two types of these: initial 
reregistrations, and extra reregistrations that occur because an agent sent 
several requests and the master processed the last of them after the initial 
reregistration had completed.
*ignoring* ("Ignoring re-register agent message from agent ...") - total number 
of reregistrations ignored because a reregistration of this agent is already in 
progress. Some of these are caused by an oversight in the backoff algorithm: 
once an agent has sent a reregistration request, it chooses a random delay 
between 0 and the current backoff, which is close to 0 for many agents in a 
large cluster. See MESOS-7087.
*reregistered* ("Re-registered agent ...") - total number of agents 
reregistered (includes only initial reregistrations).
*sending* ("Sending updated checkpointed resources ...") - total number of 
replies sent to reregistration requests. This should eventually match 
*reregistering*.
*update* ("Received update of agent ...") - the agent sends this when it has 
received the reregistration confirmation.
*applied* ("Applied ... operations in ...") - total number of operations 
applied to the registry. This stops growing after some time, as no new agents 
are being registered.

Almost all of this time the master's libprocess queue is clogged with 
{{MessageEvent}} and {{DispatchEvent}} (tens of thousands in the queue).

So why is this happening?
{{Master::reregisterSlave}} and {{Master::_reregisterSlave}} are processed 
slower than re-registrations are received.
One optimization is here: https://reviews.apache.org/r/60003/. Using it to the 
full also requires adding some {{std::move}}s and some tricks to move protobuf 
objects. I think it can be improved to handle moves of protobufs transparently.
Another one was discussed in MESOS-6972.
Other messages processed by the master during this time (such as updates from 
agents, or sending offers) do not have a major impact. I tried prioritising 
events in the master queue (those related to reregistration were processed 
first), and it somewhat improved things, but the low-priority queue was drained 
in 1-3secs once all agents had reregistered.
There are also some minor things, like converting protobuf {{Resource}} lists 
to {{Resources}} multiple times, and the unjustified use of {{Owned}} in 
{{Framework::completedTasks}}, but these are really micro-optimizations.

> Improve master failover performance by reducing unnecessary agent retries.
> --
>
> Key: MESOS-7688
> URL: https://issues.apache.org/jira/browse/MESOS-7688
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, master
>Reporter: Benjamin Mahler
>  Labels: scalability
> Attachments: 1.2.0.png
>
>
> Currently, during a failover the agents will (re-)register with the master. 
> While the master is recovering, the master may drop messages from the agents, 
> and so the agents must retry registration using a backoff mechanism. For 
> large clusters, there can be a lot of overhead in processing unnecessary 
> retries from the agents, given that these messages must be deserialized and 
> contain all of the task / executor information many times over.
> In order to reduce this overhead, the idea is to avoid the need for agents to 
> blindly retry (re-)registration with the master. Two approaches for this are:
> (1) Update the MasterInfo in ZK when the master is recovered. This is a bit 
> of an abuse of MasterInfo unfortunately, but the idea is for agents to only 
> (re-)register when they see that the master reaches a recovered state. Once 
> recovered, the master will not drop messages, and therefore agents only need 
> to retry when the connection breaks.
> (2) Have the master reply with a retry message when it's in the recovering 
> state, so that agents get a clear signal that their messages were dropped. 
> The agents only retry when the connection breaks or they get a retry message.

[jira] [Updated] (MESOS-7688) Improve master failover performance by reducing unnecessary agent retries.

2017-06-22 Thread Dmitry Zhuk (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Zhuk updated MESOS-7688:
---
Attachment: 1.2.0.png

> Improve master failover performance by reducing unnecessary agent retries.
> --
>
> Key: MESOS-7688
> URL: https://issues.apache.org/jira/browse/MESOS-7688
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, master
>Reporter: Benjamin Mahler
>  Labels: scalability
> Attachments: 1.2.0.png
>
>
> Currently, during a failover the agents will (re-)register with the master. 
> While the master is recovering, the master may drop messages from the agents, 
> and so the agents must retry registration using a backoff mechanism. For 
> large clusters, there can be a lot of overhead in processing unnecessary 
> retries from the agents, given that these messages must be deserialized and 
> contain all of the task / executor information many times over.
> In order to reduce this overhead, the idea is to avoid the need for agents to 
> blindly retry (re-)registration with the master. Two approaches for this are:
> (1) Update the MasterInfo in ZK when the master is recovered. This is a bit 
> of an abuse of MasterInfo unfortunately, but the idea is for agents to only 
> (re-)register when they see that the master reaches a recovered state. Once 
> recovered, the master will not drop messages, and therefore agents only need 
> to retry when the connection breaks.
> (2) Have the master reply with a retry message when it's in the recovering 
> state, so that agents get a clear signal that their messages were dropped. 
> The agents only retry when the connection breaks or they get a retry message. 
> This one is less optimal, because the master may have to process a lot of 
> messages and send retries, but once the master is recovered, the master will 
> process only a single (re-)registration from each agent. The number of 
> (re-)registrations that occur while the master is recovering can be reduced 
> to 1 in this approach if the master sends the retry message only after the 
> master completes recovery.





[jira] [Commented] (MESOS-6972) Improve performance of protobuf message passing by removing RepeatedPtrField to vector conversion.

2017-06-08 Thread Dmitry Zhuk (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043468#comment-16043468
 ] 

Dmitry Zhuk commented on MESOS-6972:


Yeah, this will not work with arenas, but I'm not sure if arenas are a better 
alternative.
I presume you're referring to using arenas for parsing incoming messages?
Consider {{reregisterSlave}}, which basically takes all incoming data and puts 
it inside the master's internal data structures. This seems like a typical flow 
for handling a message: doing some validation, etc., and then passing the 
incoming data to the same or another process with {{defer}}. So technically we 
get faster message parsing with an arena, but then we need to deep-copy this 
data to pass it further anyway. Without arenas, parsing is slower, but then we 
can use {{std::move}} and {{Swap}} (I have some ideas about making something 
like {{protobuf::move}} that invokes {{Swap}} behind the scenes) to avoid any 
extra copying overhead. Am I missing something here?

> Improve performance of protobuf message passing by removing RepeatedPtrField 
> to vector conversion.
> --
>
> Key: MESOS-6972
> URL: https://issues.apache.org/jira/browse/MESOS-6972
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>  Labels: tech-debt
>
> Currently, all protobuf message handlers must take a {{vector}} for repeated 
> fields, rather than a {{RepeatedPtrField}}.
> This requires that a copy be performed of the repeated field's entries (see 
> [here|https://github.com/apache/mesos/blob/9228ebc239dac42825390bebc72053dbf3ae7b09/3rdparty/libprocess/include/process/protobuf.hpp#L78-L87]),
>  which can be very expensive in some cases. We should avoid requiring this 
> expense on the callers.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-6972) Improve performance of protobuf message passing by removing RepeatedPtrField to vector conversion.

2017-06-08 Thread Dmitry Zhuk (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042695#comment-16042695
 ] 

Dmitry Zhuk commented on MESOS-6972:


[~bmahler], what do you think about taking a different approach: since in many 
cases {{RepeatedPtrField}} needs to be converted to {{vector}} anyway, e.g. to 
pass to internal data structures, we can instead {{Swap}} values from the 
{{RepeatedPtrField}} into the {{vector}}. This does not add much overhead 
compared to passing the {{RepeatedPtrField}} directly.

Currently a handler can be installed by calling something like this:
{code}
  template <typename T, typename M, typename P1, typename P1C>
  void install(
  void (T::*method)(const process::UPID&, P1C),
  P1 (M::*param1)() const);
{code}
{{P1 (M::*param1)() const}} is an issue here, as it prevents swapping from the 
returned const {{RepeatedPtrField}}.
We can either remove {{const}} from {{param1}} and update the code to use 
{{mutable_*}} versions to access message properties, or keep it as is and do a 
{{const_cast}} as part of the conversion. The latter, however, somewhat depends 
on protobuf internals and could break for non-primitive defaults.

I've already tested this change ({{const_cast}} variant), and results look 
promising for hotspots like {{reregisterSlave}}.

> Improve performance of protobuf message passing by removing RepeatedPtrField 
> to vector conversion.
> --
>
> Key: MESOS-6972
> URL: https://issues.apache.org/jira/browse/MESOS-6972
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>  Labels: tech-debt
>
> Currently, all protobuf message handlers must take a {{vector}} for repeated 
> fields, rather than a {{RepeatedPtrField}}.
> This requires that a copy be performed of the repeated field's entries (see 
> [here|https://github.com/apache/mesos/blob/9228ebc239dac42825390bebc72053dbf3ae7b09/3rdparty/libprocess/include/process/protobuf.hpp#L78-L87]),
>  which can be very expensive in some cases. We should avoid requiring this 
> expense on the callers.





[jira] [Created] (MESOS-7432) Agent state can become corrupted

2017-04-27 Thread Dmitry Zhuk (JIRA)
Dmitry Zhuk created MESOS-7432:
--

 Summary: Agent state can become corrupted
 Key: MESOS-7432
 URL: https://issues.apache.org/jira/browse/MESOS-7432
 Project: Mesos
  Issue Type: Bug
  Components: agent, master
Affects Versions: 1.1.0
Reporter: Dmitry Zhuk


Under some circumstances the master can crash with the following (Mesos 1.1):
{noformat}
I0415 02:51:35.302822 21132 master.cpp:8272] Adding task 
task1-d74842f3-52e7-49cf-9f96-8176b5e9aa1a with resources … on agent 
b8f47842-5f08-446a-9b1f-532fe3bd47ed-S3371 (agent_host)
I0415 02:51:35.311982 21132 master.cpp:5426] Re-registered agent 
b8f47842-5f08-446a-9b1f-532fe3bd47ed-S3371 at slave(1)@agent_ip:5051 
(agent_host) with cpus(*):24; mem(*):61261; ports(*):[31000-32000]; 
disk(*):66123; ephemeral_ports(*):[32768-57344]
I0415 02:51:35.312072 21132 master.cpp:5440] Shutting down framework 
201205082337-03- at reregistered agent 
b8f47842-5f08-446a-9b1f-532fe3bd47ed-S3371 at slave(1)@agent_ip:5051 
(agent_host) because the framework is not partition-aware
I0415 02:51:35.320315 21119 hierarchical.cpp:485] Added agent 
b8f47842-5f08-446a-9b1f-532fe3bd47ed-S3371 (agent_host) with cpus(*):24; 
mem(*):61261; ports(*):[31000-32000]; disk(*):66123; 
ephemeral_ports(*):[32768-57344] (allocated: {})
F0415 02:51:35.525313 21132 master.cpp:7729] Check failed: 'slave' Must be non 
NULL 
*** Check failure stack trace: ***
@ 0x7f1270cb92bd  (unknown)
@ 0x7f1270cbb104  (unknown)
@ 0x7f1270cb8eac  (unknown)
@ 0x7f1270cbb9f9  (unknown)
@ 0x7f127024aded  (unknown)
@ 0x7f127021ad2f  (unknown)
@ 0x7f1270240724  (unknown)
@ 0x7f1270242be3  (unknown)
@ 0x7f1270c56371  (unknown)
@ 0x7f1270c56677  (unknown)
@ 0x7f1270d8f760  (unknown)
@ 0x7f126f23c83d  start_thread
@ 0x7f126ea2efdd  clone
I0415 02:51:38.058533 2 mesos-master.sh:101] Master Exit Status: 134
{noformat}

Decoded stack trace:
{noformat}
Master::removeTask master/master.cpp:7729
Master::_reregisterSlave master/master.cpp:5450
{noformat}

This part of the code has been changed in 1.2, and tasks for removed agents are 
no longer added on re-registration; however, the issues that led to this crash 
still exist. The crash seems confusing at first, because the master fails to 
look up an agent that was just added. However, the agent is registered using 
{{SlaveInfo}}, while the lookup is based on the {{slave_id}} in {{TaskInfo}}, 
indicating corrupted agent registration data.

Below is the sequence of events that led to the inconsistent agent state, which 
crashed the master when the agent sent it upon re-registration.
1. Agent b8f47842-5f08-446a-9b1f-532fe3bd47ed-S12 registered with the master, 
and task task1-d74842f3-52e7-49cf-9f96-8176b5e9aa1a was assigned to it:
{noformat}
I0411 23:52:28.516815  1748 slave.cpp:1115] Registered with master 
master@master_ip:5050; given agent ID b8f47842-5f08-446a-9b1f-532fe3bd47ed-S12
...
I0415 00:37:34.436111  1735 slave.cpp:1539] Got assigned task 
'task1-d74842f3-52e7-49cf-9f96-8176b5e9aa1a' for framework 
201205082337-03-
{noformat}

2. The agent host was rebooted and the agent started recovery when restarted. 
The agent correctly detected the reboot, so it didn't recover any data and 
started registration as a new agent. However, the agent was killed before it 
received a registration confirmation from the master.
{noformat}
I0415 01:08:13.300375  1772 state.cpp:57] Recovering state from 
'/var/lib/mesos/meta'
I0415 01:08:13.300979  1772 state.cpp:698] No committed checkpointed resources 
found at '/var/lib/mesos/meta/resources/resources.info'
I0415 01:08:13.301540  1772 state.cpp:88] Agent host rebooted
I0415 01:08:13.301772  1783 status_update_manager.cpp:203] Recovering status 
update manager
I0415 01:08:13.302006  1767 containerizer.cpp:555] Recovering containerizer
I0415 01:08:13.304095  1766 port_mapping.cpp:2302] Network isolator recovery 
complete
I0415 01:08:13.306538  1781 provisioner.cpp:253] Provisioner recovery complete
I0415 01:08:13.306686  1785 slave.cpp:5281] Finished recovery
I0415 01:08:13.308131  1785 slave.cpp:5314] Garbage collecting old agent 
b8f47842-5f08-446a-9b1f-532fe3bd47ed-S12
I0415 01:08:13.308218  1768 gc.cpp:55] Scheduling 
'/var/lib/mesos/slaves/b8f47842-5f08-446a-9b1f-532fe3bd47ed-S12' for gc 
6.9643315852days in the future
I0415 01:08:13.308629  1781 gc.cpp:55] Scheduling 
'/var/lib/mesos/meta/slaves/b8f47842-5f08-446a-9b1f-532fe3bd47ed-S12' for gc 
6.9642825185days in the future
2017-04-15 
01:08:16,587:1512(0x7fb7a28c5700):ZOO_INFO@auth_completion_func@1300: 
Authentication scheme digest succeeded
I0415 01:08:16.587770  1771 group.cpp:418] Trying to create path 
'/home/mesos/prod/master' in ZooKeeper
I0415 01:08:16.589260  1777 detector.cpp:152] Detected a new leader: (id='316')
I0415 01:08:16.589419  1788 group.cpp:697] Trying to get 

[jira] [Commented] (MESOS-7348) Network isolator crashes agent on startup when network interface cannot be found

2017-04-05 Thread Dmitry Zhuk (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956864#comment-15956864
 ] 

Dmitry Zhuk commented on MESOS-7348:


https://reviews.apache.org/r/58209/

> Network isolator crashes agent on startup when network interface cannot be 
> found
> 
>
> Key: MESOS-7348
> URL: https://issues.apache.org/jira/browse/MESOS-7348
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, network
>Affects Versions: 1.1.0
>Reporter: Dmitry Zhuk
>Priority: Minor
>
> When there's no public network interface, the network isolator does not 
> properly handle {{None}} and crashes when trying to obtain an error message 
> from {{Result}}:
> {code}
>   } else if (!eth0.isSome()){
> // eth0 is not specified in the flag and we did not get a valid
> // eth0 from the library.
> return Error(
> "Network Isolator failed to find a public interface: " + 
> eth0.error());
>   }
> {code}
> There's also a similar issue in the code handling the loopback interface.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (MESOS-7348) Network isolator crashes agent on startup when network interface cannot be found

2017-04-05 Thread Dmitry Zhuk (JIRA)
Dmitry Zhuk created MESOS-7348:
--

 Summary: Network isolator crashes agent on startup when network 
interface cannot be found
 Key: MESOS-7348
 URL: https://issues.apache.org/jira/browse/MESOS-7348
 Project: Mesos
  Issue Type: Bug
  Components: isolation, network
Affects Versions: 1.1.0
Reporter: Dmitry Zhuk
Priority: Minor


When there's no public network interface, the network isolator does not properly 
handle {{None}} and crashes when trying to obtain an error message from 
{{Result}}:
{code}
  } else if (!eth0.isSome()){
// eth0 is not specified in the flag and we did not get a valid
// eth0 from the library.
return Error(
"Network Isolator failed to find a public interface: " + eth0.error());
  }
{code}
There's also a similar issue in the code handling the loopback interface.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] (MESOS-7020) cgroups::internal::write can incorrectly report success

2017-01-30 Thread Dmitry Zhuk (JIRA)

Dmitry Zhuk commented on MESOS-7020:


No, not in production. Found it during experiments with cgroups code, when my 
code didn't fail on writing an invalid value.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)

[jira] [Created] (MESOS-7020) cgroups::internal::write can incorrectly report success

2017-01-27 Thread Dmitry Zhuk (JIRA)
Dmitry Zhuk created MESOS-7020:
--

 Summary: cgroups::internal::write can incorrectly report success
 Key: MESOS-7020
 URL: https://issues.apache.org/jira/browse/MESOS-7020
 Project: Mesos
  Issue Type: Bug
  Components: cgroups
Affects Versions: 1.2.0
 Environment: CentOS7
Reporter: Dmitry Zhuk
Priority: Minor


{{cgroups::internal::write}} does not flush the stream before checking for 
errors after writing:
{code}
  ofstream file(path.c_str());
...
  file << value;

  if (file.fail()) {
// TODO(jieyu): Does ofstream actually set errno?
return ErrnoError();
  }
{code}

Since {{ofstream}} does internal buffering, {{file.fail()}} can return 
{{false}}, as the value hasn't been written to the file yet.

Replacing {{file << value;}} with {{file << value << std::flush;}} makes 
{{file.fail()}} behave as expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)