regarding reliability: obviously TCP provides a point to point guarantee... however there is no application level (ISO model) guarantee that the message was received or processed. A loss of slave or executor at the wrong time would result in no processing of the message without the senders awareness.
In fact, I found that there were some FrameworkToExecutorMessage and ExecutorToFrameworkMessage dropped in our circumstance, Can I guess the possible reason may be that it didn't get processed even if reliably received with TCP, in the occasion that all components in mesos from scheduler to executor working properly. There is a guarantee of status messages but not framework messages. It is a an application equivalent of UDP at the protocol level. Do you mean that these guarantee are status update and slave/framework registered successfully acknowledgement? If I want to guarantee reliability of framework message at application level, I have to do it by myself ? At 2016-01-05 23:59:28, "Ken Sipe" <kens...@gmail.com> wrote: regarding reliability: obviously TCP provides a point to point guarantee... however there is no application level (ISO model) guarantee that the message was received or processed. A loss of slave or executor at the wrong time would result in no processing of the message without the senders awareness. There is a guarantee of status messages but not framework messages. It is a an application equivalent of UDP at the protocol level. ken On Jan 5, 2016, at 9:34 AM, sujz <drizzle...@126.com> wrote: Hi, all: I am using mesos-0.22.0, I noticed that FrameworkToExecutorMessage is sent along path: Scheduler->Master->Slave->Executor, while ExecutorToFrameworkMessage is sent along path: Executor->Slave->Scheduler, So is there some reason or benefit for bypassing master while transmitting ExecutorToFrameworkMessage? One more question, FrameworkToExecutorMessage and ExecutorToFrameworkMessage are instantiated in function SendFrameworkMessage, declaration of SendFrameworkMessage in include/mesos/scheduler.hpp and include/mesos/executor.hpp: // Sends a message from the framework to one of its executors. These // messages are best effort; do not expect a framework message to be // retransmitted in any reliable fashion. virtual Status sendFrameworkMessage( const ExecutorID& executorId, const SlaveID& slaveId, const std::string& data) = 0; I guess that protobuf message are transmitted with TCP, so does this comment mean I have to guarantee reliability by myself even with TCP? What's special for these two messages compared with other protobuf messages, If no, do we have to guarantee reliability all by ourselves? Thank you very much and best regards !