[jira] [Commented] (MESOS-8828) Clock::advance can race with process::delay in tests.

2018-05-17 Thread Andrei Budnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479319#comment-16479319
 ] 

Andrei Budnik commented on MESOS-8828:
--

Another possible solution would be to introduce a `FUTURE_DELAY(M)` primitive 
that returns a future which becomes ready when `delay(duration, pid, M)` is 
called. This primitive would be similar to `FUTURE_DISPATCH()`.
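
To make the idea concrete, here is a minimal single-threaded sketch. It does 
not use libprocess: `ToyClock`, `futureDelay()` and `testWithFutureDelay()` 
are hypothetical names invented for illustration, and a plain `bool` flag 
stands in for a real future. The point is only the ordering: the test waits 
until the delay has been registered, and only then advances the clock.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy stand-in for a paused clock. NOT the libprocess API.
class ToyClock {
public:
  // Hypothetical analogue of FUTURE_DELAY(M): returns a flag that flips to
  // true once delay() is called for `name`.
  const bool& futureDelay(const std::string& name) {
    return watched_[name];  // Default-inserted as false.
  }

  // Schedule `fn` to fire `ticks` after the current (paused) time.
  void delay(const std::string& name, long ticks, std::function<void()> fn) {
    timers_.emplace(now_ + ticks, std::move(fn));
    auto it = watched_.find(name);
    if (it != watched_.end()) {
      it->second = true;  // "Set the future to ready".
    }
  }

  // Move time forward and fire every timer whose deadline has been reached.
  void advance(long ticks) {
    assert(ticks >= 0);
    now_ += ticks;
    while (!timers_.empty() && timers_.begin()->first <= now_) {
      auto fn = std::move(timers_.begin()->second);
      timers_.erase(timers_.begin());
      fn();
    }
  }

private:
  long now_ = 0;
  std::multimap<long, std::function<void()>> timers_;
  std::map<std::string, bool> watched_;
};

// Simulated test body: the "agent" has not scheduled its delay yet when the
// test is ready to advance the clock.
bool testWithFutureDelay() {
  ToyClock clock;
  bool authenticated = false;
  const bool& delayed = clock.futureDelay("authenticate");

  // Agent work that is still pending at this point.
  std::function<void()> agentStep = [&] {
    clock.delay("authenticate", 10, [&] { authenticated = true; });
  };

  // Stand-in for awaiting the future: let the agent run until delay() is seen.
  while (!delayed) {
    agentStep();
  }

  clock.advance(10);  // Safe now: the timer is known to be registered.
  return authenticated;
}
```

In this toy, `testWithFutureDelay()` returns `true` because the advance can 
no longer overtake the call to `delay()`.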

> Clock::advance can race with process::delay in tests.
> -
>
> Key: MESOS-8828
> URL: https://issues.apache.org/jira/browse/MESOS-8828
> Project: Mesos
> Issue Type: Bug
> Reporter: Andrei Budnik
> Priority: Major
> Labels: flaky
> Attachments: failed_tests.txt
>
>
> There are lots of tests that use the following pattern:
>  1) [Pause 
> clocks|https://github.com/apache/mesos/blob/c662048ae365630e3249b51102c9f7f962cc24d3/src/tests/persistent_volume_tests.cpp#L1108]
>  2) [Start an 
> agent|https://github.com/apache/mesos/blob/c662048ae365630e3249b51102c9f7f962cc24d3/src/tests/persistent_volume_tests.cpp#L1122]
>  3) [Advance clocks to trigger an 
> event|https://github.com/apache/mesos/blob/c662048ae365630e3249b51102c9f7f962cc24d3/src/tests/persistent_volume_tests.cpp#L1125]
>  4) [Wait for the 
> event|https://github.com/apache/mesos/blob/c662048ae365630e3249b51102c9f7f962cc24d3/src/tests/persistent_volume_tests.cpp#L1127]
> If an event is scheduled via `process::delay()` after advancing the clocks, 
> then a test hangs in the endless wait for the event that is never triggered, 
> because libprocess clocks are paused. For example, 
> `DiskResource/PersistentVolumeTest.SharedPersistentVolumeRescindOnDestroy/0` 
> test hangs at step 4, because the clocks at step 3 have already been advanced 
> before the agent scheduled a call of 
> [Slave::authenticate()|https://github.com/apache/mesos/blob/ebe92c9b39933136968e4ba3a52527e52b361d22/src/slave/slave.cpp#L1301]
>  method. After a successful authentication with a master, the agent sends an 
> [UpdateSlaveMessage|https://github.com/apache/mesos/blob/ebe92c9b39933136968e4ba3a52527e52b361d22/src/slave/slave.cpp#L1546-L1550].
>  But the authentication process never finishes because 
> `[Slave::authenticate()|https://github.com/apache/mesos/blob/ebe92c9b39933136968e4ba3a52527e52b361d22/src/slave/slave.cpp#L1301]`
>  is never called.
> A list of tests that might be affected by the issue is attached to this ticket.
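
The race described in the quoted issue can be reproduced in miniature with a 
toy paused clock. This is only a sketch under stated assumptions: `ToyClock` 
and `eventFires()` are hypothetical names, nothing here is the libprocess 
API, and the two interleavings are chosen explicitly instead of arising from 
real concurrency.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Toy stand-in for a paused clock. NOT the libprocess API.
class ToyClock {
public:
  // Schedule `fn` to fire `ticks` after the current (paused) time.
  void delay(long ticks, std::function<void()> fn) {
    timers_.emplace(now_ + ticks, std::move(fn));
  }

  // Move time forward and fire every timer whose deadline has been reached.
  void advance(long ticks) {
    assert(ticks >= 0);
    now_ += ticks;
    while (!timers_.empty() && timers_.begin()->first <= now_) {
      auto fn = std::move(timers_.begin()->second);
      timers_.erase(timers_.begin());
      fn();
    }
  }

private:
  long now_ = 0;
  std::multimap<long, std::function<void()>> timers_;
};

// Returns whether the delayed event fired for the given interleaving of the
// agent's delay() call and the test's advance() call.
bool eventFires(bool delayScheduledBeforeAdvance) {
  ToyClock clock;
  bool fired = false;
  auto schedule = [&] { clock.delay(10, [&] { fired = true; }); };

  if (delayScheduledBeforeAdvance) {
    schedule();         // Agent registers the timer first...
    clock.advance(10);  // ...so the advance reaches its deadline.
  } else {
    clock.advance(10);  // Test advances first...
    schedule();         // ...so the new deadline lies beyond the paused time.
  }
  return fired;
}
```

With the timer registered first, `eventFires(true)` returns `true`. With the 
advance first, `eventFires(false)` returns `false`: the clock stays paused 
short of the new deadline, which corresponds to the hang at step 4.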



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8828) Clock::advance can race with process::delay in tests.

2018-05-16 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477512#comment-16477512
 ] 

Alexander Rukletsov commented on MESOS-8828:


A possible solution here would be to introduce an {{ADVANCE_AWAIT(M)}} 
primitive that processes messages one by one, automatically advancing the 
clock to the next upcoming deadline each time, until message {{M}} is 
observed.
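
A rough sketch of that loop, using a toy single-threaded clock in place of 
libprocess. `ToyClock`, `advanceToNext()`, `advanceAwait()` and 
`chainedMessageObserved()` are hypothetical names introduced for this 
example. A real primitive would presumably also have to let running 
processes settle between jumps (for example via `Clock::settle()`), which 
this toy does not model.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Toy stand-in for a paused clock. NOT the libprocess API.
class ToyClock {
public:
  // Schedule `fn` to fire `ticks` after the current (paused) time.
  void delay(long ticks, std::function<void()> fn) {
    assert(ticks >= 0);
    timers_.emplace(now_ + ticks, std::move(fn));
  }

  // Jump straight to the earliest pending deadline and fire that one timer.
  // Returns false if nothing is scheduled.
  bool advanceToNext() {
    if (timers_.empty()) {
      return false;
    }
    auto it = timers_.begin();
    now_ = it->first;
    auto fn = std::move(it->second);
    timers_.erase(it);
    fn();  // May schedule further timers.
    return true;
  }

  long now() const { return now_; }

private:
  long now_ = 0;
  std::multimap<long, std::function<void()>> timers_;
};

// Hypothetical analogue of ADVANCE_AWAIT(M): keep jumping to the next
// deadline until `observed()` holds, or give up when no timers remain.
bool advanceAwait(ToyClock& clock, const std::function<bool()>& observed) {
  while (!observed()) {
    if (!clock.advanceToNext()) {
      return false;
    }
  }
  return true;
}

// A timer at t=5 schedules a second timer 10 ticks later, which delivers the
// "message". A single fixed advance(5) would miss it; the loop does not.
bool chainedMessageObserved() {
  ToyClock clock;
  bool message = false;
  clock.delay(5, [&] { clock.delay(10, [&] { message = true; }); });
  return advanceAwait(clock, [&] { return message; }) && clock.now() == 15;
}
```

Because the loop re-reads the timer queue after every fired timer, it also 
picks up delays that are scheduled late, as long as they exist by the time 
the loop looks for the next deadline.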



