[jira] [Comment Edited] (MESOS-1199) Subprocess is slow - gated by process::reap poll interval

2014-09-23 Thread Ian Downes (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145123#comment-14145123 ]

Ian Downes edited comment on MESOS-1199 at 9/23/14 6:07 PM:


Understood. This race has existed in the codebase for a long time. We could 
consider looking at /proc/{pid}/exe to confirm that the pid at least 
corresponds to the expected executable - still not perfect though.
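A minimal sketch of the /proc/{pid}/exe check suggested above, assuming Linux. The function name `pid_matches_executable`, the injectable `readlink` parameter (there only so the check can be exercised without a real process), and the example path are hypothetical and not part of Mesos.

```python
import os


def pid_matches_executable(pid, expected_exe, readlink=os.readlink):
    """Compare the target of /proc/<pid>/exe with the expected executable path.

    Returns True or False when the link can be read, and None when it cannot
    (process already gone, insufficient permission, or no /proc on this platform).
    """
    try:
        actual = readlink(f"/proc/{pid}/exe")
    except OSError:
        return None
    # Linux appends " (deleted)" when the binary was unlinked after exec.
    suffix = " (deleted)"
    if actual.endswith(suffix):
        actual = actual[: -len(suffix)]
    return actual == expected_exe


# Hypothetical usage: only trust a poll result if the pid still runs the fetcher.
# pid_matches_executable(fetcher_pid, "/usr/libexec/mesos/mesos-fetcher")
```

As the comment says, this only narrows the race: a recycled pid running the same executable still passes, and the pid can be reused between the check and whatever acts on it.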



 Subprocess is slow - gated by process::reap poll interval
 

              Key: MESOS-1199
              URL: https://issues.apache.org/jira/browse/MESOS-1199
          Project: Mesos
       Issue Type: Improvement
 Affects Versions: 0.18.0
         Reporter: Ian Downes
         Assignee: Craig Hansen-Sturm
      Attachments: wiatpid.pdf


 Subprocess uses process::reap to wait on the subprocess pid and set the exit 
 status. However, process::reap polls at a one-second interval, resulting in 
 a delay of up to one interval before the status future is set.
 This means that if you need to wait for the subprocess to complete, you get hit 
 with E(delay) = 0.5 seconds, independent of the execution time. For example, 
 the MesosContainerizer uses mesos-fetcher in a Subprocess to fetch the 
 executor during launch. At Twitter we fetch a local file, i.e., a very fast 
 operation, but the launch is blocked until the mesos-fetcher pid is reaped, 
 adding 0 to 1 seconds to every launch!
 The problem is even worse with a chain of short Subprocesses because after 
 the first Subprocess completes you'll be synchronized with the reap interval 
 and you'll see nearly the full interval before each notification, i.e., 10 
 Subprocesses, each lasting under 1 second, will take ~10 seconds!
 This has become particularly apparent in some new tests I'm working on, where 
 test durations are now greatly extended, with each taking several seconds.
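The arithmetic in the description can be checked with a small simulation. This is only a model, not libprocess code: it assumes the reaper ticks at exact multiples of the interval and that each Subprocess in a chain starts the moment the previous one is reported. The names `notified_at` and `chain_elapsed`, the 0.3 s start offset, and the 10 ms durations are illustrative assumptions.

```python
import math


def notified_at(finish_time, interval=1.0):
    """Time of the first poll tick at or after finish_time (ticks at k * interval)."""
    return math.ceil(finish_time / interval) * interval


def chain_elapsed(start, durations, interval=1.0):
    """Wall time for a chain where each subprocess starts on the previous notification."""
    now = start
    for duration in durations:
        now = notified_at(now + duration, interval)
    return now - start


# Average notification delay for finish times spread evenly across one interval.
samples = [(i + 0.5) / 1000 for i in range(1000)]
mean_delay = sum(notified_at(f) - f for f in samples) / len(samples)

# Ten subprocesses of 10 ms each, the first one started 0.3 s after a tick.
total = chain_elapsed(0.3, [0.01] * 10)
```

Under these assumptions `mean_delay` comes out at about 0.5 seconds and `total` at about 9.7 seconds for 0.1 seconds of actual work, because every subprocess after the first starts right on a tick and then waits almost a full interval.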



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-1199) Subprocess is slow - gated by process::reap poll interval

2014-08-06 Thread Nikita Vetoshkin (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087322#comment-14087322 ]

Nikita Vetoshkin edited comment on MESOS-1199 at 8/6/14 6:30 AM:

Just a quick note: polling the pid of a non-child process is racy. The process 
can die and a new, unrelated one with the same pid can spin up between poll 
attempts. I wonder if we could extend the executor protocol, e.g. ask the 
executor to bind a specified Unix domain socket. This socket can be polled and 
reconnected, and the slave will receive a disconnect when the executor dies. 
Any thoughts?
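A rough illustration of the disconnect signal this proposal relies on, assuming a POSIX system. It is simplified: a connected `socketpair` inherited by the child stands in for the executor binding a named Unix domain socket and the slave connecting to it, and `peer_exit_detected` is a hypothetical name, not a Mesos API.

```python
import socket
import subprocess
import sys


def peer_exit_detected(timeout=5.0):
    """Return True if the watcher sees end-of-file once the child process dies."""
    watcher_end, child_end = socket.socketpair()  # AF_UNIX stream sockets
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        pass_fds=[child_end.fileno()],  # the child inherits its end of the pair
    )
    child_end.close()  # from here on only the child holds this end open
    child.kill()  # simulate the executor dying
    watcher_end.settimeout(timeout)
    try:
        # The kernel closes the child's descriptors on exit, so recv returns
        # b"" (end-of-file) without any pid polling.
        data = watcher_end.recv(1)
    finally:
        watcher_end.close()
        child.wait()
    return data == b""
```

The notification is tied to the lifetime of the process holding the socket rather than to a pid number, so a recycled pid cannot be mistaken for the original process.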


was (Author: nekto0n):
Just a quick note: polling pid of non-children is a racy deal. Process can die 
and a new one unrelated with the same pid can spin up in between poll attempts.
I wonder if we could extend executors protocol - e.g. to bind specified Unix 
Domain sockets. They can be polled, reconnected and slave will receive 
disconnect when executor dies. Any thoughts?



[jira] [Comment Edited] (MESOS-1199) Subprocess is slow - gated by process::reap poll interval

2014-08-04 Thread Yifan Gu (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085494#comment-14085494 ]

Yifan Gu edited comment on MESOS-1199 at 8/5/14 12:53 AM:

How about using inotify to watch /proc/pid?
One concern is that inotify works only on Linux, but there might be 
equivalent mechanisms on other platforms (to make Dropbox work, at least...).

Update:
Thanks to BenM for the reminder that /proc/pid is not an actual file, so this 
may not work. Let me test it...

Result: 
inotify doesn't report any event when the process is killed.



