[ 
https://issues.apache.org/jira/browse/MESOS-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler resolved MESOS-1803.
------------------------------------
    Resolution: Cannot Reproduce

The log timings here look as if the threads were starved of CPU:

{noformat}
I0916 22:59:02.136256 21049 leveldb.cpp:343] Persisting action (165 bytes) to 
leveldb took 141908ns
I0916 22:59:02.136267 21047 leveldb.cpp:343] Persisting action (165 bytes) to 
leveldb took 111061ns
I../../src/tests/registrar_tests.cpp:257: Failure
0916 22:59:02.136276 21049 replica.cpp:676] Persisted action at 1
Failed to wait 10secs for registrar.recover(master)
I0916 22:59:14.265326 21049 replica.cpp:661] Replica learned APPEND action at 
position 1
I0916 22:59:02.136291 21047 replica.cpp:676] Persisted action at 1
E0916 22:59:07.135143 21046 registrar.cpp:500] Registrar aborting: Failed to 
update 'registry': Failed to perform store within 5secs
I0916 22:59:14.265393 21047 replica.cpp:661] Replica learned APPEND action at 
position 1
{noformat}

The logging time stamp is determined at the beginning of the LOG(INFO) 
expression, when the initial LogMessage object is created. The interleaving of 
times looks to be a stall of the VM or thread starvation:

{noformat}
22:59:02.136267 21047 // Thread 1, 1st LogMessage flushed.
22:59:02.136276 21049 // Thread 2, 2nd LogMessage flushed.
22:59:14.265326 21049 // Thread 2, 5th LogMessage flushed.
22:59:02.136291 21047 // Thread 1, 3rd LogMessage flushed.
22:59:07.135143 21046 // Thread 3, 4th LogMessage flushed.
22:59:14.265393 21047 // Thread 1, 6th LogMessage flushed.
{noformat}

> Strict/RegistrarTest.remove test is flaky on jenkins.
> -----------------------------------------------------
>
>                 Key: MESOS-1803
>                 URL: https://issues.apache.org/jira/browse/MESOS-1803
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>
> https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull
> {noformat}
> [ RUN      ] Strict/RegistrarTest.remove/1
> Using temporary directory '/tmp/Strict_RegistrarTest_remove_1_3QvnOW'
> I0916 22:59:02.112568 21026 leveldb.cpp:176] Opened db in 1.779835ms
> I0916 22:59:02.112896 21026 leveldb.cpp:183] Compacted db in 301862ns
> I0916 22:59:02.112916 21026 leveldb.cpp:198] Created db iterator in 3065ns
> I0916 22:59:02.112926 21026 leveldb.cpp:204] Seeked to beginning of db in 
> 475ns
> I0916 22:59:02.112936 21026 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 330ns
> I0916 22:59:02.112951 21026 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0916 22:59:02.113654 21054 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 421460ns
> I0916 22:59:02.113674 21054 replica.cpp:320] Persisted replica status to 
> VOTING
> I0916 22:59:02.115900 21026 leveldb.cpp:176] Opened db in 1.947919ms
> I0916 22:59:02.116263 21026 leveldb.cpp:183] Compacted db in 338043ns
> I0916 22:59:02.116283 21026 leveldb.cpp:198] Created db iterator in 2809ns
> I0916 22:59:02.116293 21026 leveldb.cpp:204] Seeked to beginning of db in 
> 468ns
> I0916 22:59:02.116302 21026 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 195ns
> I0916 22:59:02.116317 21026 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0916 22:59:02.117013 21043 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 472891ns
> I0916 22:59:02.117034 21043 replica.cpp:320] Persisted replica status to 
> VOTING
> I0916 22:59:02.119240 21026 leveldb.cpp:176] Opened db in 1.950367ms
> I0916 22:59:02.120455 21026 leveldb.cpp:183] Compacted db in 1.188056ms
> I0916 22:59:02.120481 21026 leveldb.cpp:198] Created db iterator in 4370ns
> I0916 22:59:02.120499 21026 leveldb.cpp:204] Seeked to beginning of db in 
> 7977ns
> I0916 22:59:02.120517 21026 leveldb.cpp:273] Iterated through 1 keys in the 
> db in 8479ns
> I0916 22:59:02.120533 21026 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0916 22:59:02.122890 21026 leveldb.cpp:176] Opened db in 2.301327ms
> I0916 22:59:02.124325 21026 leveldb.cpp:183] Compacted db in 1.406223ms
> I0916 22:59:02.124351 21026 leveldb.cpp:198] Created db iterator in 4185ns
> I0916 22:59:02.124368 21026 leveldb.cpp:204] Seeked to beginning of db in 
> 7167ns
> I0916 22:59:02.124387 21026 leveldb.cpp:273] Iterated through 1 keys in the 
> db in 8182ns
> I0916 22:59:02.124403 21026 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0916 22:59:02.124579 21047 recover.cpp:425] Starting replica recovery
> I0916 22:59:02.124651 21047 recover.cpp:451] Replica is in VOTING status
> I0916 22:59:02.124793 21047 recover.cpp:440] Recover process terminated
> I0916 22:59:02.126404 21046 registrar.cpp:313] Recovering registrar
> I0916 22:59:02.126597 21050 log.cpp:656] Attempting to start the writer
> I0916 22:59:02.127259 21041 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0916 22:59:02.127321 21050 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0916 22:59:02.127835 21041 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 547018ns
> I0916 22:59:02.127858 21041 replica.cpp:342] Persisted promised to 1
> I0916 22:59:02.127835 21050 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 487588ns
> I0916 22:59:02.127887 21050 replica.cpp:342] Persisted promised to 1
> I0916 22:59:02.128387 21055 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0916 22:59:02.129546 21042 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0916 22:59:02.129600 21053 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0916 22:59:02.129982 21042 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 406954ns
> I0916 22:59:02.129982 21053 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 357253ns
> I0916 22:59:02.130009 21042 replica.cpp:676] Persisted action at 0
> I0916 22:59:02.130029 21053 replica.cpp:676] Persisted action at 0
> I0916 22:59:02.130543 21041 replica.cpp:508] Replica received write request 
> for position 0
> I0916 22:59:02.130585 21041 leveldb.cpp:438] Reading position from leveldb 
> took 17424ns
> I0916 22:59:02.130599 21046 replica.cpp:508] Replica received write request 
> for position 0
> I0916 22:59:02.130635 21046 leveldb.cpp:438] Reading position from leveldb 
> took 12702ns
> I0916 22:59:02.130728 21041 leveldb.cpp:343] Persisting action (14 bytes) to 
> leveldb took 112749ns
> I0916 22:59:02.130749 21041 replica.cpp:676] Persisted action at 0
> I0916 22:59:02.130795 21046 leveldb.cpp:343] Persisting action (14 bytes) to 
> leveldb took 131300ns
> I0916 22:59:02.130820 21046 replica.cpp:676] Persisted action at 0
> I0916 22:59:02.131093 21042 replica.cpp:655] Replica received learned notice 
> for position 0
> I0916 22:59:02.131163 21048 replica.cpp:655] Replica received learned notice 
> for position 0
> I0916 22:59:02.131232 21042 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 117351ns
> I0916 22:59:02.131256 21042 replica.cpp:676] Persisted action at 0
> I0916 22:59:02.131273 21042 replica.cpp:661] Replica learned NOP action at 
> position 0
> I0916 22:59:02.131340 21048 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 148690ns
> I0916 22:59:02.131358 21048 replica.cpp:676] Persisted action at 0
> I0916 22:59:02.131369 21048 replica.cpp:661] Replica learned NOP action at 
> position 0
> I0916 22:59:02.131569 21046 log.cpp:672] Writer started with ending position 0
> I0916 22:59:02.131950 21055 leveldb.cpp:438] Reading position from leveldb 
> took 15477ns
> I0916 22:59:02.133436 21052 registrar.cpp:346] Successfully fetched the 
> registry (0B)
> I0916 22:59:02.133471 21052 registrar.cpp:422] Attempting to update the 
> 'registry'
> I0916 22:59:02.134899 21049 log.cpp:680] Attempting to append 144 bytes to 
> the log
> I0916 22:59:02.134968 21048 coordinator.cpp:340] Coordinator attempting to 
> write APPEND action at position 1
> I0916 22:59:02.135306 21042 replica.cpp:508] Replica received write request 
> for position 1
> I0916 22:59:02.135407 21052 replica.cpp:508] Replica received write request 
> for position 1
> I0916 22:59:02.135717 21052 leveldb.cpp:343] Persisting action (163 bytes) to 
> leveldb took 279603ns
> I0916 22:59:02.135721 21042 leveldb.cpp:343] Persisting action (163 bytes) to 
> leveldb took 388839ns
> I0916 22:59:02.135742 21052 replica.cpp:676] Persisted action at 1
> I0916 22:59:02.135752 21042 replica.cpp:676] Persisted action at 1
> I0916 22:59:02.136090 21049 replica.cpp:655] Replica received learned notice 
> for position 1
> I0916 22:59:02.136127 21047 replica.cpp:655] Replica received learned notice 
> for position 1
> I0916 22:59:02.136256 21049 leveldb.cpp:343] Persisting action (165 bytes) to 
> leveldb took 141908ns
> I0916 22:59:02.136267 21047 leveldb.cpp:343] Persisting action (165 bytes) to 
> leveldb took 111061ns
> I../../src/tests/registrar_tests.cpp:257: Failure
> 0916 22:59:02.136276 21049 replica.cpp:676] Persisted action at 1
> Failed to wait 10secs for registrar.recover(master)
> I0916 22:59:14.265326 21049 replica.cpp:661] Replica learned APPEND action at 
> position 1
> I0916 22:59:02.136291 21047 replica.cpp:676] Persisted action at 1
> E0916 22:59:07.135143 21046 registrar.cpp:500] Registrar aborting: Failed to 
> update 'registry': Failed to perform store within 5secs
> I0916 22:59:14.265393 21047 replica.cpp:661] Replica learned APPEND action at 
> position 1
> [  FAILED  ] Strict/RegistrarTest.remove/1, where GetParam() = true (12156 ms)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to