Question about "Framework directly access Meso agent"

2016-02-16 Thread Suteng
Hi,

Currently, a Mesos framework's task-related operations (launchTask,
updateStatus, executorSendMessage, etc.) and resource-related operations
(resourceOffer, etc.) all pass through the Mesos master.
When the cluster and the number of tasks become huge, or when multiple
frameworks call launchTask concurrently under optimistic resource offers, the
Mesos master may become a bottleneck.
Is it possible for a framework scheduler to access the Mesos agent directly, so
that launchTask, updateStatus and SendMessage2Executor go straight to the agent
and bypass the master?
Would this conflict badly with the current mechanism?

Looking forward to your comments and opinions.

Best Regards,
Teng



Su Teng  00241668


Distributed and Parallel Software Lab
Huawei Technologies Co., Ltd.
Email:sut...@huawei.com





Re: Question about "Framework directly access Meso agent"

2016-02-17 Thread Suteng
Alex,
We haven't tested the performance of Mesos. But we have developed an in-house
framework, something like a simplified Mesos, that schedules a large number of
fine-grained computation tasks. We found that its master becomes a bottleneck.
One reason is that each task carries several KB of data and the number of tasks
is huge.
If we replace it with Mesos, the master may still be a bottleneck.

The master would still do the resource bookkeeping. We could decompose
launching a task into two steps. First, the scheduler tells the master which
offer it wants, and the master tells the scheduler the address of the agent.
Second, the scheduler launches the task on the agent directly, and can also
send messages to the agent directly.
Maybe I can run some tests of the Mesos master's task-launch throughput with
different amounts of task data.
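
The two-step launch described above can be sketched as follows. This is only
an illustration in plain C++ with no Mesos code: Master, Agent, Claim,
claimOffer, expect, launch and runDemo are hypothetical names, and the
one-time token is an assumed way for the agent to check that the master
approved the claim.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// All names below are hypothetical; none of them is a Mesos API.

// What the master hands back to the scheduler in step 1.
struct Claim {
  std::string agentAddress;  // where to send the task in step 2
  std::uint64_t token;       // assumed one-time proof of the master's approval
};

class Agent {
public:
  // Master -> agent: announce a claim the master has approved.
  void expect(std::uint64_t token) { pending.insert(token); }

  // Scheduler -> agent (step 2): launch directly, bypassing the master.
  // Returns false if the token is unknown or was already used.
  bool launch(std::uint64_t token, const std::string& taskData) {
    if (pending.erase(token) == 0) {
      return false;
    }
    launchedBytes += taskData.size();  // task data never passes the master
    return true;
  }

  std::size_t launchedBytes = 0;

private:
  std::set<std::uint64_t> pending;
};

class Master {
public:
  void addOffer(const std::string& offerId,
                const std::string& address,
                Agent* agent) {
    assert(agent != nullptr);
    offers[offerId] = Offer{address, agent};
  }

  // Scheduler -> master (step 1): claim an offer. The master keeps the
  // bookkeeping (the offer is consumed) and returns the agent's address.
  bool claimOffer(const std::string& offerId, Claim* claim) {
    auto it = offers.find(offerId);
    if (it == offers.end()) {
      return false;  // unknown or already claimed offer
    }
    claim->agentAddress = it->second.address;
    claim->token = nextToken++;
    it->second.agent->expect(claim->token);
    offers.erase(it);
    return true;
  }

private:
  struct Offer {
    std::string address;
    Agent* agent;
  };

  std::map<std::string, Offer> offers;
  std::uint64_t nextToken = 1;
};

// Runs both steps once and returns how many launches the agent accepted.
int runDemo() {
  Agent agent;
  Master master;
  master.addOffer("offer-1", "10.0.0.5:5051", &agent);

  int accepted = 0;
  Claim claim;
  if (master.claimOffer("offer-1", &claim)) {                    // step 1
    if (agent.launch(claim.token, "several KB of task data")) {  // step 2
      accepted++;
    }
    if (agent.launch(claim.token, "replayed request")) {  // token reused
      accepted++;
    }
  }
  return accepted;
}
```

In this sketch runDemo() returns 1: the first direct launch is accepted and
the replayed one is rejected. It leaves open how the master would learn about
task status afterwards, which is the bookkeeping concern raised in the
original reply.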


-----Original Message-----
From: C Rukletsov [mailto:a...@mesosphere.com]
Sent: February 17, 2016 18:04
To: dev
Subject: Re: Question about "Framework directly access Meso agent"

Suteng—

such optimization makes sense in certain cases (e.g. sending a framework 
message), but it can be rather tricky in general, because the master has to 
maintain bookkeeping. Moreover, with the upcoming HTTP API it becomes harder 
for a framework to determine where to send messages to reach a specific agent.

Have you done any performance tests and seen master becoming a bottleneck?



Re: encounter “Decoder error while receiving” error when using libprocess send, receive api

2016-10-26 Thread Suteng
Ben,
Thank you. After we updated libprocess, the problem was solved.


-----Original Message-----
From: Benjamin Mahler [mailto:bmah...@apache.org]
Sent: October 17, 2016 16:13
To: dev@mesos.apache.org
Cc: zhoujie (S); Zhoubin (Distributed and Parallel Software Lab)
Subject: Re: encounter “Decoder error while receiving” error when using
libprocess send, receive api

Hi Su Teng,

Glad to hear you're making use of libprocess, be aware that we currently bundle 
it in the mesos repository and development occurs within the mesos project at 
the current time.

This issue sounds like https://issues.apache.org/jira/browse/MESOS-5943

Are you obtaining libprocess from the mesos repository? Do you have the 
following patch in the version of libprocess you are running?

https://reviews.apache.org/r/50634/

(Image attachments are dropped by the mail servers by the way)

Ben



Does libprocess support multi-port?

2016-10-26 Thread Suteng
Hi,
Does libprocess support multiple ports? That is, can some processes bind to one
port while other processes bind to another port within the same OS process?

Thanks,
Teng








encounter “Decoder error while receiving” error when using libprocess send, receive api

2016-10-16 Thread Suteng
Hi,
We are building a cache system based on libprocess, and we find that when
messages are sent and received at a high frequency, a decode error in
libprocess always occurs and makes us lose messages.
So we wrote a test case. It is just a ping-pong test with a server on one node
and several clients on another node. The clients send messages to the server in
parallel. When the server receives a message, it simply sends a response to the
client. We count the responses on the client side to see whether any message is
lost. In the end, we got the “Decoder error while receiving” error on the
client side. It happens when we use two clients that send 200 messages each.
[image attachment not preserved]

Are we using libprocess in the wrong way, or is there a bug in libprocess?

Here is our test code.
Client:
class ClientProcess : public ProtobufProcess<ClientProcess>
{
public:
  ClientProcess(int index_, int requestNum_)
    : ProcessBase("ClientProcess" + stringify(index_)),
      index(index_),
      requestNum(requestNum_),
      responseNum(0) {} // Start the response counter at zero.

  ~ClientProcess() {}

  virtual void initialize()
  {
    LOG(INFO) << "ClientProcess" << stringify(index) << " initialize";
    // Handle incoming "pong" messages in pong().
    install("pong", &ClientProcess::pong);
    sendToServer();
  }

  // Counts responses and logs once all of them have arrived.
  void pong(const UPID& from, const string& body)
  {
    responseNum++;

    LOG(INFO) << "--" << self().id << " recv response " << body;

    if (responseNum == requestNum) {
      LOG(INFO) << "ClientProcess" << stringify(index) << " receives "
                << responseNum << " responses";
    }
  }

  // Links to the server and sends requestNum "ping" messages.
  void sendToServer()
  {
    // "server ip" and `server port` are placeholders for the real address.
    Address serverAddr = Address(net::IP::parse("server ip", AF_INET).get(),
                                 server port);
    UPID serverUpid = UPID("ServerProcess", serverAddr);
    link(serverUpid);

    for (int i = 0; i < requestNum; i++) {
      string msg = stringify(i);
      send(serverUpid, "ping", msg.c_str(), msg.size());
      LOG(INFO) << self().id << " send msg = " << msg;
    }
  }

  int index;
  int requestNum;
  int responseNum;
};

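int main(int argc, char** argv)
{
  // Expect two arguments: requests per client and number of clients.
  if (argc < 3) {
    LOG(ERROR) << "Usage: " << argv[0] << " <requestNum> <concurrent>";
    return 1;
  }

  // "client ip" and "client port" are placeholders for the real address.
  os::setenv("LIBPROCESS_IP", "client ip");
  os::setenv("LIBPROCESS_PORT", "client port");

  int requestNum = stoi(argv[1]);
  int concurrent = stoi(argv[2]);
  ClientProcess* client = NULL;
  UPID clientUpid;

  for (int i = 0; i < concurrent; i++) {
    client = new ClientProcess(i, requestNum);
    clientUpid = spawn(client);
  }

  // Waits on the last spawned client only.
  wait(clientUpid);
  return 0;
}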


Server:
class ServerProcess : public ProtobufProcess<ServerProcess>
{
public:
  ServerProcess() : ProcessBase("ServerProcess") {}

  ~ServerProcess() {}

  virtual void initialize()
  {
    LOG(INFO) << "ServerProcess initialize";
    // Handle incoming "ping" messages in ping().
    install("ping", &ServerProcess::ping);
  }

  // Links to each new sender once, then echoes the body back as "pong".
  void ping(const UPID& from, const string& body)
  {
    if (!links.contains(from)) {
      link(from);
      links.insert(from);
    }
    LOG(INFO) << "recv from " << from.id << ", msg = " << body;
    send(from, "pong", body.c_str(), body.size());
  }

  hashset<UPID> links;
};

int main(int argc, char** argv)
{
  // "server ip" and "server port" are placeholders for the real address.
  os::setenv("LIBPROCESS_IP", "server ip");
  os::setenv("LIBPROCESS_PORT", "server port");

  ServerProcess server;
  UPID serverUpid = spawn(&server);

  wait(serverUpid);
  return 0;
}


Below are the messages as seen in the libprocess internal buffer.
The correct HTTP message:
[image attachment not preserved]


The erroneous HTTP message:
[image attachment not preserved]








libprocess “Failed connect: connection closed”

2017-06-19 Thread Suteng
Hi,

We hit a libprocess failure with the libprocess bundled in Mesos 1.0.
Libprocess is running in SSL-enabled mode.
The failure is “Failed connect: connection closed”, and it occurs when the
program starts to run.
I think this failure is raised from void
LibeventSSLSocketImpl::event_callback(short events);

Has anyone encountered this failure? Or is it a known issue?

Thanks,
Teng






libevent-2.1.8 SSL mode can't trigger recv callback

2017-11-06 Thread Suteng
Hi,
We upgraded libevent to version 2.1.8. With SSL enabled, libevent does not
trigger the recv() callback when data is received.
We used tcpdump to confirm that the data is received by the kernel.

Has anyone met this problem?


Best regards,
Teng










CHECK_NOTNULL(self->bev) Check failed inside LibeventSSLSocketImpl::shutdown

2018-06-27 Thread Suteng
F0622 11:22:30.985245 16127 libevent_ssl_socket.cpp:190] Check failed: 
'self->bev' Must be non NULL
Try LibeventSSLSocketImpl::shutdown(int how)
CHECK_NOTNULL(self->bev)

Test case:
Server A is non-SSL and server B has SSL downgrade enabled. When B frequently
links to A with reconnect, this error occurs, with very low probability. It
looks like bev has already been freed and then shutdown is called again.

class OpensslProcess : public ProtobufProcess<OpensslProcess>
{
public:
  OpensslProcess()
    : ProcessBase("OpensslProcess"), sendCnt(0), recvCnt(0) {}

  ~OpensslProcess() {}

  virtual void initialize()
  {
    install(&OpensslProcess::HandleMessage);
    install("ping", &OpensslProcess::pong);
    //SendMessage();
  }

  void SendMessage()
  {
    string data = "hello world";
    UdpMessage msg;
    msg.set_size(data.size());
    msg.set_data(data);

    Address serverAddr = Address(net::IP::parse("127.0.0.1", AF_INET).get(),
                                 7012);
    UPID destUpid = UPID("OpensslProcess", serverAddr);

    // Send once, then force a reconnect before each further send.
    send(destUpid, msg);
    sleep(5);
    link(destUpid, RemoteConnection::RECONNECT);
    send(destUpid, msg);
    link(destUpid, RemoteConnection::RECONNECT);
    send(destUpid, msg);
  }
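
The suspected sequence (bev is released, then shutdown is called again) can be
illustrated with a minimal standalone sketch. It is not libprocess code and
not a proposed patch: SketchSocket, Bev, release and the error string are
hypothetical, and the guard only shows one way such a call could fail
gracefully instead of aborting on a check.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the buffer event owned by a socket.
struct Bev {
  bool open = true;
};

// Hypothetical socket wrapper; not the libprocess implementation.
class SketchSocket {
public:
  SketchSocket() : bev(new Bev()) {}

  // Frees the buffer event, as would happen when the connection is torn down.
  void release() { bev.reset(); }

  // Shuts the connection down. If bev was already released, this reports an
  // error through `error` and returns false instead of dereferencing it.
  bool shutdown(std::string* error) {
    if (!bev) {
      *error = "socket already released";
      return false;
    }
    bev->open = false;
    return true;
  }

private:
  std::unique_ptr<Bev> bev;
};
```

With this guard, a shutdown that races with a release returns an error to the
caller. Whether that is the right behavior for the real socket depends on why
shutdown is reached after the free in the first place.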











Re: libprocess libevent backend

2018-05-03 Thread Suteng

In libprocess's libevent.cpp, epoll is explicitly avoided. This is the code:

/home/suteng/code/mesos/3rdparty/libprocess/src/libevent.cpp

  // TODO(jmlvanre): Allow support for 'epoll' once SSL related
  // issues are resolved.
  struct event_config* config = event_config_new();
  event_config_avoid_method(config, "epoll");



-----Original Message-----
From: Benjamin Mahler [mailto:bmah...@apache.org]
Sent: May 4, 2018 3:33
To: dev <dev@mesos.apache.org>
Subject: Re: libprocess libevent backend

Which issue are you referring to?

Libprocess uses libev by default, with --enable-libevent as a configure option 
to use libevent instead. Both of these backends should use epoll if the system 
has it available. Are you seeing otherwise?



libprocess libevent backend

2018-05-03 Thread Suteng
libprocess uses poll as the libevent backend. Can it be changed to epoll to
improve performance?
There is a TODO about this. Has it been resolved?

Thanks,
SU Teng






Re: libprocess libevent backend

2018-05-04 Thread Suteng
I created an issue:
https://issues.apache.org/jira/browse/MESOS-8881

-----Original Message-----
From: Benjamin Mahler [mailto:bmah...@apache.org]
Sent: May 4, 2018 11:47
To: Joris Van Remoortere <joris.van.remoort...@gmail.com>; dev@mesos.apache.org
Subject: Re: libprocess libevent backend

+Joris

Wow, Joris do you know why this is disabled? What were the issues?

Suteng, can you file a JIRA ticket?
