[jira] [Commented] (MESOS-1848) DRFAllocatorTest.DRFAllocatorProcess is flaky

2015-10-13 Thread Liqiang Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956386#comment-14956386
 ] 

Liqiang Lin commented on MESOS-1848:


I suggest closing this bug since it is too old to match the latest test cases; I 
can no longer find the related test code. 

$ ./mesos-tests  --gtest_filter=DRFAllocatorTest.DRFAllocatorProcess
Source directory: /Users/liqlin/code/mesos
Build directory: /Users/liqlin/code/mesos/build
-
We cannot run any Docker tests because:
Docker tests not supported on non-Linux systems
-
/usr/bin/nc
Note: Google Test filter = 
DRFAllocatorTest.DRFAllocatorProcess-HealthCheckTest.ROOT_DOCKER_DockerHealthyTask:HealthCheckTest.ROOT_DOCKER_DockerHealthStatusChange:HookTest.ROOT_DOCKER_VerifySlavePreLaunchDockerHook:SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser:SlaveTest.DISABLED_ROOT_RunTaskWithCommandInfoWithUser:DockerContainerizerTest.ROOT_DOCKER_Launch:DockerContainerizerTest.ROOT_DOCKER_Kill:DockerContainerizerTest.ROOT_DOCKER_Usage:DockerContainerizerTest.ROOT_DOCKER_Recover:DockerContainerizerTest.ROOT_DOCKER_SkipRecoverNonDocker:DockerContainerizerTest.ROOT_DOCKER_Logs:DockerContainerizerTest.ROOT_DOCKER_Default_CMD:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Override:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Args:DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer:DockerContainerizerTest.DISABLED_ROOT_DOCKER_SlaveRecoveryExecutorContainer:DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping:DockerContainerizerTest.ROOT_DOCKER_LaunchSandboxWithColon:DockerContainerizerTest.ROOT_DOCKER_DestroyWhileFetching:DockerContainerizerTest.ROOT_DOCKER_DestroyWhilePulling:DockerContainerizerTest.ROOT_DOCKER_ExecutorCleanupWhenLaunchFailed:DockerContainerizerTest.ROOT_DOCKER_FetchFailure:DockerContainerizerTest.ROOT_DOCKER_DockerPullFailure:DockerContainerizerTest.ROOT_DOCKER_DockerInspectDiscard:DockerTest.ROOT_DOCKER_interface:DockerTest.ROOT_DOCKER_parsing_version:DockerTest.ROOT_DOCKER_CheckCommandWithShell:DockerTest.ROOT_DOCKER_CheckPortResource:DockerTest.ROOT_DOCKER_CancelPull:DockerTest.ROOT_DOCKER_MountRelative:DockerTest.ROOT_DOCKER_MountAbsolute:CopyBackendTest.ROOT_CopyBackend:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/0:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/1:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/2:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/3:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/4:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/5:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/6:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/7:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/8:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/9:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/10:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/11:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/12:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/13:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/14:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/15:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/16:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/17:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/18:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/19:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/20:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/21:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/22:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/23:SlaveAndFramew
orkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/24:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/25:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/26:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/27:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/28:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/29:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/30:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/31:SlaveAndFrameworkCount

[jira] [Assigned] (MESOS-1848) DRFAllocatorTest.DRFAllocatorProcess is flaky

2015-10-13 Thread Liqiang Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liqiang Lin reassigned MESOS-1848:
--

Assignee: Liqiang Lin

> DRFAllocatorTest.DRFAllocatorProcess is flaky
> -
>
> Key: MESOS-1848
> URL: https://issues.apache.org/jira/browse/MESOS-1848
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Fedora 20
>Reporter: Vinod Kone
>Assignee: Liqiang Lin
>  Labels: flaky
>
> Observed this on CI. This is pretty strange because the authentication of 
> both the framework and slave timed out at the very beginning, even though we 
> don't manipulate clocks.
> {code}
> [ RUN  ] DRFAllocatorTest.DRFAllocatorProcess
> Using temporary directory '/tmp/DRFAllocatorTest_DRFAllocatorProcess_igiR9X'
> I0929 20:11:12.801327 16997 leveldb.cpp:176] Opened db in 489720ns
> I0929 20:11:12.801627 16997 leveldb.cpp:183] Compacted db in 168280ns
> I0929 20:11:12.801784 16997 leveldb.cpp:198] Created db iterator in 5820ns
> I0929 20:11:12.801898 16997 leveldb.cpp:204] Seeked to beginning of db in 
> 1285ns
> I0929 20:11:12.802039 16997 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 792ns
> I0929 20:11:12.802160 16997 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0929 20:11:12.802441 17012 recover.cpp:425] Starting replica recovery
> I0929 20:11:12.802623 17012 recover.cpp:451] Replica is in EMPTY status
> I0929 20:11:12.803251 17012 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0929 20:11:12.803427 17012 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0929 20:11:12.803632 17012 recover.cpp:542] Updating replica status to 
> STARTING
> I0929 20:11:12.803911 17012 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 33999ns
> I0929 20:11:12.804033 17012 replica.cpp:320] Persisted replica status to 
> STARTING
> I0929 20:11:12.804245 17012 recover.cpp:451] Replica is in STARTING status
> I0929 20:11:12.804592 17012 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0929 20:11:12.804775 17012 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0929 20:11:12.804952 17012 recover.cpp:542] Updating replica status to VOTING
> I0929 20:11:12.805115 17012 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 15990ns
> I0929 20:11:12.805234 17012 replica.cpp:320] Persisted replica status to 
> VOTING
> I0929 20:11:12.805366 17012 recover.cpp:556] Successfully joined the Paxos 
> group
> I0929 20:11:12.805539 17012 recover.cpp:440] Recover process terminated
> I0929 20:11:12.809062 17017 master.cpp:312] Master 
> 20140929-201112-2759502016-47295-16997 (fedora-20) started on 
> 192.168.122.164:47295
> I0929 20:11:12.809432 17017 master.cpp:358] Master only allowing 
> authenticated frameworks to register
> I0929 20:11:12.809546 17017 master.cpp:363] Master only allowing 
> authenticated slaves to register
> I0929 20:11:12.810169 17017 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/DRFAllocatorTest_DRFAllocatorProcess_igiR9X/credentials'
> I0929 20:11:12.810510 17017 master.cpp:392] Authorization enabled
> I0929 20:11:12.811841 17016 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0929 20:11:12.812099 17013 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@192.168.122.164:47295
> I0929 20:11:12.813006 17017 master.cpp:1241] The newly elected leader is 
> master@192.168.122.164:47295 with id 20140929-201112-2759502016-47295-16997
> I0929 20:11:12.813164 17017 master.cpp:1254] Elected as the leading master!
> I0929 20:11:12.813279 17017 master.cpp:1072] Recovering from registrar
> I0929 20:11:12.813487 17013 registrar.cpp:312] Recovering registrar
> I0929 20:11:12.813824 17013 log.cpp:656] Attempting to start the writer
> I0929 20:11:12.814256 17013 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0929 20:11:12.814419 17013 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 25049ns
> I0929 20:11:12.814581 17013 replica.cpp:342] Persisted promised to 1
> I0929 20:11:12.814909 17013 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0929 20:11:12.815340 17013 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0929 20:11:12.815497 17013 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 19855ns
> I0929 20:11:12.815636 17013 replica.cpp:676] Persisted action at 0
> I0929 20:11:12.816066 17013 replica.cpp:508] Replica received write request 
> for position 0
> I0929 20:11:12.816220 17013 leveldb.cpp:438] Reading

[jira] [Commented] (MESOS-1848) DRFAllocatorTest.DRFAllocatorProcess is flaky

2015-10-13 Thread Liqiang Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956344#comment-14956344
 ] 

Liqiang Lin commented on MESOS-1848:


I'd like to take this JIRA, but I cannot reproduce this bug on my OS X machine. 

> DRFAllocatorTest.DRFAllocatorProcess is flaky
> -
>
> Key: MESOS-1848
> URL: https://issues.apache.org/jira/browse/MESOS-1848
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Fedora 20
>Reporter: Vinod Kone
>  Labels: flaky
>
> Observed this on CI. This is pretty strange because the authentication of 
> both the framework and slave timed out at the very beginning, even though we 
> don't manipulate clocks.
> {code}
> [ RUN  ] DRFAllocatorTest.DRFAllocatorProcess
> Using temporary directory '/tmp/DRFAllocatorTest_DRFAllocatorProcess_igiR9X'
> I0929 20:11:12.801327 16997 leveldb.cpp:176] Opened db in 489720ns
> I0929 20:11:12.801627 16997 leveldb.cpp:183] Compacted db in 168280ns
> I0929 20:11:12.801784 16997 leveldb.cpp:198] Created db iterator in 5820ns
> I0929 20:11:12.801898 16997 leveldb.cpp:204] Seeked to beginning of db in 
> 1285ns
> I0929 20:11:12.802039 16997 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 792ns
> I0929 20:11:12.802160 16997 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0929 20:11:12.802441 17012 recover.cpp:425] Starting replica recovery
> I0929 20:11:12.802623 17012 recover.cpp:451] Replica is in EMPTY status
> I0929 20:11:12.803251 17012 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0929 20:11:12.803427 17012 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0929 20:11:12.803632 17012 recover.cpp:542] Updating replica status to 
> STARTING
> I0929 20:11:12.803911 17012 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 33999ns
> I0929 20:11:12.804033 17012 replica.cpp:320] Persisted replica status to 
> STARTING
> I0929 20:11:12.804245 17012 recover.cpp:451] Replica is in STARTING status
> I0929 20:11:12.804592 17012 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0929 20:11:12.804775 17012 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0929 20:11:12.804952 17012 recover.cpp:542] Updating replica status to VOTING
> I0929 20:11:12.805115 17012 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 15990ns
> I0929 20:11:12.805234 17012 replica.cpp:320] Persisted replica status to 
> VOTING
> I0929 20:11:12.805366 17012 recover.cpp:556] Successfully joined the Paxos 
> group
> I0929 20:11:12.805539 17012 recover.cpp:440] Recover process terminated
> I0929 20:11:12.809062 17017 master.cpp:312] Master 
> 20140929-201112-2759502016-47295-16997 (fedora-20) started on 
> 192.168.122.164:47295
> I0929 20:11:12.809432 17017 master.cpp:358] Master only allowing 
> authenticated frameworks to register
> I0929 20:11:12.809546 17017 master.cpp:363] Master only allowing 
> authenticated slaves to register
> I0929 20:11:12.810169 17017 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/DRFAllocatorTest_DRFAllocatorProcess_igiR9X/credentials'
> I0929 20:11:12.810510 17017 master.cpp:392] Authorization enabled
> I0929 20:11:12.811841 17016 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0929 20:11:12.812099 17013 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@192.168.122.164:47295
> I0929 20:11:12.813006 17017 master.cpp:1241] The newly elected leader is 
> master@192.168.122.164:47295 with id 20140929-201112-2759502016-47295-16997
> I0929 20:11:12.813164 17017 master.cpp:1254] Elected as the leading master!
> I0929 20:11:12.813279 17017 master.cpp:1072] Recovering from registrar
> I0929 20:11:12.813487 17013 registrar.cpp:312] Recovering registrar
> I0929 20:11:12.813824 17013 log.cpp:656] Attempting to start the writer
> I0929 20:11:12.814256 17013 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0929 20:11:12.814419 17013 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 25049ns
> I0929 20:11:12.814581 17013 replica.cpp:342] Persisted promised to 1
> I0929 20:11:12.814909 17013 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0929 20:11:12.815340 17013 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0929 20:11:12.815497 17013 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 19855ns
> I0929 20:11:12.815636 17013 replica.cpp:676] Persisted action at 0
> I0929 20:11:12.816066 17013 replica.cpp:508] Replica received write request 

[jira] [Comment Edited] (MESOS-2255) SlaveRecoveryTest/0.MasterFailover is flaky

2015-10-13 Thread Yong Qiao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956331#comment-14956331
 ] 

Yong Qiao Wang edited comment on MESOS-2255 at 10/14/15 6:06 AM:
-

I will re-run this test case and fix it if it is still a problem. [~xujyan], do 
you have any recent comments on this ticket?


was (Author: jamesyongqiaowang):
I will re-run this test case and fix it if it is still a problem.

> SlaveRecoveryTest/0.MasterFailover is flaky
> ---
>
> Key: MESOS-2255
> URL: https://issues.apache.org/jira/browse/MESOS-2255
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Yan Xu
>Assignee: Yong Qiao Wang
>  Labels: flaky, twitter
>
> {noformat:title=}
> [ RUN  ] SlaveRecoveryTest/0.MasterFailover
> Using temporary directory '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0'
> I0123 07:45:49.818686 17634 leveldb.cpp:176] Opened db in 31.195549ms
> I0123 07:45:49.821962 17634 leveldb.cpp:183] Compacted db in 3.190936ms
> I0123 07:45:49.822049 17634 leveldb.cpp:198] Created db iterator in 47324ns
> I0123 07:45:49.822069 17634 leveldb.cpp:204] Seeked to beginning of db in 
> 2038ns
> I0123 07:45:49.822084 17634 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 484ns
> I0123 07:45:49.822160 17634 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0123 07:45:49.824241 17660 recover.cpp:449] Starting replica recovery
> I0123 07:45:49.825217 17660 recover.cpp:475] Replica is in EMPTY status
> I0123 07:45:49.827020 17660 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0123 07:45:49.827453 17659 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0123 07:45:49.828047 17659 recover.cpp:566] Updating replica status to 
> STARTING
> I0123 07:45:49.838543 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 10.24963ms
> I0123 07:45:49.838580 17659 replica.cpp:323] Persisted replica status to 
> STARTING
> I0123 07:45:49.848836 17659 recover.cpp:475] Replica is in STARTING status
> I0123 07:45:49.850039 17659 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0123 07:45:49.850286 17659 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0123 07:45:49.850754 17659 recover.cpp:566] Updating replica status to VOTING
> I0123 07:45:49.853698 17655 master.cpp:262] Master 
> 20150123-074549-16842879-44955-17634 (utopic) started on 127.0.1.1:44955
> I0123 07:45:49.853981 17655 master.cpp:308] Master only allowing 
> authenticated frameworks to register
> I0123 07:45:49.853997 17655 master.cpp:313] Master only allowing 
> authenticated slaves to register
> I0123 07:45:49.854038 17655 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0/credentials'
> I0123 07:45:49.854557 17655 master.cpp:357] Authorization enabled
> I0123 07:45:49.859633 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 8.742923ms
> I0123 07:45:49.859853 17659 replica.cpp:323] Persisted replica status to 
> VOTING
> I0123 07:45:49.860327 17658 recover.cpp:580] Successfully joined the Paxos 
> group
> I0123 07:45:49.860703 17654 recover.cpp:464] Recover process terminated
> I0123 07:45:49.859591 17655 master.cpp:1219] The newly elected leader is 
> master@127.0.1.1:44955 with id 20150123-074549-16842879-44955-17634
> I0123 07:45:49.864702 17655 master.cpp:1232] Elected as the leading master!
> I0123 07:45:49.864904 17655 master.cpp:1050] Recovering from registrar
> I0123 07:45:49.865406 17660 registrar.cpp:313] Recovering registrar
> I0123 07:45:49.866576 17660 log.cpp:660] Attempting to start the writer
> I0123 07:45:49.868638 17658 replica.cpp:477] Replica received implicit 
> promise request with proposal 1
> I0123 07:45:49.872521 17658 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 3.848859ms
> I0123 07:45:49.872555 17658 replica.cpp:345] Persisted promised to 1
> I0123 07:45:49.873769 17661 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0123 07:45:49.875474 17658 replica.cpp:378] Replica received explicit 
> promise request for position 0 with proposal 2
> I0123 07:45:49.880878 17658 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 5.364021ms
> I0123 07:45:49.880913 17658 replica.cpp:679] Persisted action at 0
> I0123 07:45:49.882619 17657 replica.cpp:511] Replica received write request 
> for position 0
> I0123 07:45:49.882998 17657 leveldb.cpp:438] Reading position from leveldb 
> took 150092ns
> I0123 07:45:49.886488 17657 leveldb.cpp:343] Persisting action (14 bytes) to 
> leveldb took 3.26

[jira] [Commented] (MESOS-2255) SlaveRecoveryTest/0.MasterFailover is flaky

2015-10-13 Thread Yong Qiao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956331#comment-14956331
 ] 

Yong Qiao Wang commented on MESOS-2255:
---

I will re-run this test case and fix it if it is still a problem.

> SlaveRecoveryTest/0.MasterFailover is flaky
> ---
>
> Key: MESOS-2255
> URL: https://issues.apache.org/jira/browse/MESOS-2255
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Yan Xu
>Assignee: Yong Qiao Wang
>  Labels: flaky, twitter
>
> {noformat:title=}
> [ RUN  ] SlaveRecoveryTest/0.MasterFailover
> Using temporary directory '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0'
> I0123 07:45:49.818686 17634 leveldb.cpp:176] Opened db in 31.195549ms
> I0123 07:45:49.821962 17634 leveldb.cpp:183] Compacted db in 3.190936ms
> I0123 07:45:49.822049 17634 leveldb.cpp:198] Created db iterator in 47324ns
> I0123 07:45:49.822069 17634 leveldb.cpp:204] Seeked to beginning of db in 
> 2038ns
> I0123 07:45:49.822084 17634 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 484ns
> I0123 07:45:49.822160 17634 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0123 07:45:49.824241 17660 recover.cpp:449] Starting replica recovery
> I0123 07:45:49.825217 17660 recover.cpp:475] Replica is in EMPTY status
> I0123 07:45:49.827020 17660 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0123 07:45:49.827453 17659 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0123 07:45:49.828047 17659 recover.cpp:566] Updating replica status to 
> STARTING
> I0123 07:45:49.838543 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 10.24963ms
> I0123 07:45:49.838580 17659 replica.cpp:323] Persisted replica status to 
> STARTING
> I0123 07:45:49.848836 17659 recover.cpp:475] Replica is in STARTING status
> I0123 07:45:49.850039 17659 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0123 07:45:49.850286 17659 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0123 07:45:49.850754 17659 recover.cpp:566] Updating replica status to VOTING
> I0123 07:45:49.853698 17655 master.cpp:262] Master 
> 20150123-074549-16842879-44955-17634 (utopic) started on 127.0.1.1:44955
> I0123 07:45:49.853981 17655 master.cpp:308] Master only allowing 
> authenticated frameworks to register
> I0123 07:45:49.853997 17655 master.cpp:313] Master only allowing 
> authenticated slaves to register
> I0123 07:45:49.854038 17655 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0/credentials'
> I0123 07:45:49.854557 17655 master.cpp:357] Authorization enabled
> I0123 07:45:49.859633 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 8.742923ms
> I0123 07:45:49.859853 17659 replica.cpp:323] Persisted replica status to 
> VOTING
> I0123 07:45:49.860327 17658 recover.cpp:580] Successfully joined the Paxos 
> group
> I0123 07:45:49.860703 17654 recover.cpp:464] Recover process terminated
> I0123 07:45:49.859591 17655 master.cpp:1219] The newly elected leader is 
> master@127.0.1.1:44955 with id 20150123-074549-16842879-44955-17634
> I0123 07:45:49.864702 17655 master.cpp:1232] Elected as the leading master!
> I0123 07:45:49.864904 17655 master.cpp:1050] Recovering from registrar
> I0123 07:45:49.865406 17660 registrar.cpp:313] Recovering registrar
> I0123 07:45:49.866576 17660 log.cpp:660] Attempting to start the writer
> I0123 07:45:49.868638 17658 replica.cpp:477] Replica received implicit 
> promise request with proposal 1
> I0123 07:45:49.872521 17658 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 3.848859ms
> I0123 07:45:49.872555 17658 replica.cpp:345] Persisted promised to 1
> I0123 07:45:49.873769 17661 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0123 07:45:49.875474 17658 replica.cpp:378] Replica received explicit 
> promise request for position 0 with proposal 2
> I0123 07:45:49.880878 17658 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 5.364021ms
> I0123 07:45:49.880913 17658 replica.cpp:679] Persisted action at 0
> I0123 07:45:49.882619 17657 replica.cpp:511] Replica received write request 
> for position 0
> I0123 07:45:49.882998 17657 leveldb.cpp:438] Reading position from leveldb 
> took 150092ns
> I0123 07:45:49.886488 17657 leveldb.cpp:343] Persisting action (14 bytes) to 
> leveldb took 3.269189ms
> I0123 07:45:49.886536 17657 replica.cpp:679] Persisted action at 0
> I0123 07:45:49.887181 17657 replica.cpp:658] Replica received learned notice 
> for position 0
> I0123 07:45:49.892900 17657 leveldb.c

[jira] [Assigned] (MESOS-2239) MasterAuthorizationTest.DuplicateRegistration is flaky

2015-10-13 Thread Chen Zhiwei (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhiwei reassigned MESOS-2239:
--

Assignee: Chen Zhiwei

> MasterAuthorizationTest.DuplicateRegistration is flaky
> --
>
> Key: MESOS-2239
> URL: https://issues.apache.org/jira/browse/MESOS-2239
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: CentOS5 gcc-4.8
>Reporter: Jie Yu
>Assignee: Chen Zhiwei
>  Labels: flaky, flaky-test
>
> {noformat}
> 19:30:44 DEBUG: [ RUN  ] MasterAuthorizationTest.DuplicateRegistration
> 19:30:44 DEBUG: Using temporary directory 
> '/tmp/MasterAuthorizationTest_DuplicateRegistration_lTKlxz'
> 19:30:44 DEBUG: I0121 19:30:44.583595 54842 leveldb.cpp:176] Opened db in 
> 2.002477ms
> 19:30:44 DEBUG: I0121 19:30:44.584470 54842 leveldb.cpp:183] Compacted db in 
> 848351ns
> 19:30:44 DEBUG: I0121 19:30:44.584492 54842 leveldb.cpp:198] Created db 
> iterator in 3830ns
> 19:30:44 DEBUG: I0121 19:30:44.584506 54842 leveldb.cpp:204] Seeked to 
> beginning of db in 962ns
> 19:30:44 DEBUG: I0121 19:30:44.584519 54842 leveldb.cpp:273] Iterated through 
> 0 keys in the db in 598ns
> 19:30:44 DEBUG: I0121 19:30:44.584537 54842 replica.cpp:744] Replica 
> recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> 19:30:44 DEBUG: I0121 19:30:44.584684 54873 recover.cpp:449] Starting replica 
> recovery
> 19:30:44 DEBUG: I0121 19:30:44.584774 54859 recover.cpp:475] Replica is in 
> EMPTY status
> 19:30:44 DEBUG: I0121 19:30:44.586305 54881 replica.cpp:641] Replica in EMPTY 
> status received a broadcasted recover request
> 19:30:44 DEBUG: I0121 19:30:44.586943 54866 recover.cpp:195] Received a 
> recover response from a replica in EMPTY status
> 19:30:44 DEBUG: I0121 19:30:44.587247 54872 recover.cpp:566] Updating replica 
> status to STARTING
> 19:30:44 DEBUG: I0121 19:30:44.587838 54867 leveldb.cpp:306] Persisting 
> metadata (8 bytes) to leveldb took 393697ns
> 19:30:44 DEBUG: I0121 19:30:44.587862 54867 replica.cpp:323] Persisted 
> replica status to STARTING
> 19:30:44 DEBUG: I0121 19:30:44.587920 54877 recover.cpp:475] Replica is in 
> STARTING status
> 19:30:44 DEBUG: I0121 19:30:44.588341 54868 replica.cpp:641] Replica in 
> STARTING status received a broadcasted recover request
> 19:30:44 DEBUG: I0121 19:30:44.588577 54877 recover.cpp:195] Received a 
> recover response from a replica in STARTING status
> 19:30:44 DEBUG: I0121 19:30:44.589040 54863 recover.cpp:566] Updating replica 
> status to VOTING
> 19:30:44 DEBUG: I0121 19:30:44.589344 54871 leveldb.cpp:306] Persisting 
> metadata (8 bytes) to leveldb took 268257ns
> 19:30:44 DEBUG: I0121 19:30:44.589361 54871 replica.cpp:323] Persisted 
> replica status to VOTING
> 19:30:44 DEBUG: I0121 19:30:44.589426 54858 recover.cpp:580] Successfully 
> joined the Paxos group
> 19:30:44 DEBUG: I0121 19:30:44.589735 54858 recover.cpp:464] Recover process 
> terminated
> 19:30:44 DEBUG: I0121 19:30:44.593657 54866 master.cpp:262] Master 
> 20150121-193044-1711542956-52053-54842 (atlc-bev-05-sr1.corpdc.twttr.net) 
> started on 172.18.4.102:52053
> 19:30:44 DEBUG: I0121 19:30:44.593690 54866 master.cpp:308] Master only 
> allowing authenticated frameworks to register
> 19:30:44 DEBUG: I0121 19:30:44.593699 54866 master.cpp:313] Master only 
> allowing authenticated slaves to register
> 19:30:44 DEBUG: I0121 19:30:44.593708 54866 credentials.hpp:36] Loading 
> credentials for authentication from 
> '/tmp/MasterAuthorizationTest_DuplicateRegistration_lTKlxz/credentials'
> 19:30:44 DEBUG: I0121 19:30:44.593808 54866 master.cpp:357] Authorization 
> enabled
> 19:30:44 DEBUG: I0121 19:30:44.594319 54871 master.cpp:1219] The newly 
> elected leader is master@172.18.4.102:52053 with id 
> 20150121-193044-1711542956-52053-54842
> 19:30:44 DEBUG: I0121 19:30:44.594336 54871 master.cpp:1232] Elected as the 
> leading master!
> 19:30:44 DEBUG: I0121 19:30:44.594343 54871 master.cpp:1050] Recovering from 
> registrar
> 19:30:44 DEBUG: I0121 19:30:44.594403 54867 registrar.cpp:313] Recovering 
> registrar
> 19:30:44 DEBUG: I0121 19:30:44.594558 54858 log.cpp:660] Attempting to start 
> the writer
> 19:30:44 DEBUG: I0121 19:30:44.595000 54859 replica.cpp:477] Replica received 
> implicit promise request with proposal 1
> 19:30:44 DEBUG: I0121 19:30:44.595340 54859 leveldb.cpp:306] Persisting 
> metadata (8 bytes) to leveldb took 319942ns
> 19:30:44 DEBUG: I0121 19:30:44.595360 54859 replica.cpp:345] Persisted 
> promised to 1
> 19:30:44 DEBUG: I0121 19:30:44.595700 54878 coordinator.cpp:230] Coordinator 
> attemping to fill missing position
> 19:30:44 DEBUG: I0121 19:30:44.596330 54859 replica.cpp:378] Replica received 
> explicit promise request for position 0 with proposal 2
> 19:30:44 DEBUG: I0121 19:30:4

[jira] [Assigned] (MESOS-2331) MasterSlaveReconciliationTest.ReconcileRace is flaky

2015-10-13 Thread Qian Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Zhang reassigned MESOS-2331:
-

Assignee: Qian Zhang

> MasterSlaveReconciliationTest.ReconcileRace is flaky
> 
>
> Key: MESOS-2331
> URL: https://issues.apache.org/jira/browse/MESOS-2331
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Yan Xu
>Assignee: Qian Zhang
>  Labels: flaky
>
> {noformat:title=}
> [ RUN  ] MasterSlaveReconciliationTest.ReconcileRace
> Using temporary directory 
> '/tmp/MasterSlaveReconciliationTest_ReconcileRace_NE9nhV'
> I0206 19:09:44.196542 32362 leveldb.cpp:175] Opened db in 38.230192ms
> I0206 19:09:44.206826 32362 leveldb.cpp:182] Compacted db in 9.988493ms
> I0206 19:09:44.207164 32362 leveldb.cpp:197] Created db iterator in 29979ns
> I0206 19:09:44.207641 32362 leveldb.cpp:203] Seeked to beginning of db in 
> 4478ns
> I0206 19:09:44.207929 32362 leveldb.cpp:272] Iterated through 0 keys in the 
> db in 737ns
> I0206 19:09:44.208222 32362 replica.cpp:743] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0206 19:09:44.209132 32384 recover.cpp:448] Starting replica recovery
> I0206 19:09:44.209524 32384 recover.cpp:474] Replica is in EMPTY status
> I0206 19:09:44.211094 32384 replica.cpp:640] Replica in EMPTY status received 
> a broadcasted recover request
> I0206 19:09:44.211385 32384 recover.cpp:194] Received a recover response from 
> a replica in EMPTY status
> I0206 19:09:44.211902 32384 recover.cpp:565] Updating replica status to 
> STARTING
> I0206 19:09:44.236177 32381 master.cpp:344] Master 
> 20150206-190944-16842879-36452-32362 (lucid) started on 127.0.1.1:36452
> I0206 19:09:44.236291 32381 master.cpp:390] Master only allowing 
> authenticated frameworks to register
> I0206 19:09:44.236305 32381 master.cpp:395] Master only allowing 
> authenticated slaves to register
> I0206 19:09:44.236327 32381 credentials.hpp:35] Loading credentials for 
> authentication from 
> '/tmp/MasterSlaveReconciliationTest_ReconcileRace_NE9nhV/credentials'
> I0206 19:09:44.236601 32381 master.cpp:439] Authorization enabled
> I0206 19:09:44.238539 32381 hierarchical_allocator_process.hpp:284] 
> Initialized hierarchical allocator process
> I0206 19:09:44.238662 32381 whitelist_watcher.cpp:64] No whitelist given
> I0206 19:09:44.239364 32381 master.cpp:1350] The newly elected leader is 
> master@127.0.1.1:36452 with id 20150206-190944-16842879-36452-32362
> I0206 19:09:44.239392 32381 master.cpp:1363] Elected as the leading master!
> I0206 19:09:44.239413 32381 master.cpp:1181] Recovering from registrar
> I0206 19:09:44.239645 32381 registrar.cpp:312] Recovering registrar
> I0206 19:09:44.241142 32384 leveldb.cpp:305] Persisting metadata (8 bytes) to 
> leveldb took 29.029117ms
> I0206 19:09:44.241189 32384 replica.cpp:322] Persisted replica status to 
> STARTING
> I0206 19:09:44.241478 32384 recover.cpp:474] Replica is in STARTING status
> I0206 19:09:44.243075 32384 replica.cpp:640] Replica in STARTING status 
> received a broadcasted recover request
> I0206 19:09:44.243398 32384 recover.cpp:194] Received a recover response from 
> a replica in STARTING status
> I0206 19:09:44.243964 32384 recover.cpp:565] Updating replica status to VOTING
> I0206 19:09:44.255692 32384 leveldb.cpp:305] Persisting metadata (8 bytes) to 
> leveldb took 11.502759ms
> I0206 19:09:44.255765 32384 replica.cpp:322] Persisted replica status to 
> VOTING
> I0206 19:09:44.256009 32384 recover.cpp:579] Successfully joined the Paxos 
> group
> I0206 19:09:44.256253 32384 recover.cpp:463] Recover process terminated
> I0206 19:09:44.257669 32384 log.cpp:659] Attempting to start the writer
> I0206 19:09:44.259944 32377 replica.cpp:476] Replica received implicit 
> promise request with proposal 1
> I0206 19:09:44.268805 32377 leveldb.cpp:305] Persisting metadata (8 bytes) to 
> leveldb took 8.45858ms
> I0206 19:09:44.269067 32377 replica.cpp:344] Persisted promised to 1
> I0206 19:09:44.277974 32383 coordinator.cpp:229] Coordinator attemping to 
> fill missing position
> I0206 19:09:44.279767 32383 replica.cpp:377] Replica received explicit 
> promise request for position 0 with proposal 2
> I0206 19:09:44.288940 32383 leveldb.cpp:342] Persisting action (8 bytes) to 
> leveldb took 9.128603ms
> I0206 19:09:44.289294 32383 replica.cpp:678] Persisted action at 0
> I0206 19:09:44.296417 32377 replica.cpp:510] Replica received write request 
> for position 0
> I0206 19:09:44.296944 32377 leveldb.cpp:437] Reading position from leveldb 
> took 48457ns
> I0206 19:09:44.305337 32377 leveldb.cpp:342] Persisting action (14 bytes) to 
> leveldb took 8.141689ms
> I0206 19:09:44.305662 32377 replica.cpp:678] Persisted action at 0
> I0206 19:

[jira] [Assigned] (MESOS-2255) SlaveRecoveryTest/0.MasterFailover is flaky

2015-10-13 Thread Yong Qiao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yong Qiao Wang reassigned MESOS-2255:
-

Assignee: Yong Qiao Wang

> SlaveRecoveryTest/0.MasterFailover is flaky
> ---
>
> Key: MESOS-2255
> URL: https://issues.apache.org/jira/browse/MESOS-2255
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Yan Xu
>Assignee: Yong Qiao Wang
>  Labels: flaky, twitter
>
> {noformat:title=}
> [ RUN  ] SlaveRecoveryTest/0.MasterFailover
> Using temporary directory '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0'
> I0123 07:45:49.818686 17634 leveldb.cpp:176] Opened db in 31.195549ms
> I0123 07:45:49.821962 17634 leveldb.cpp:183] Compacted db in 3.190936ms
> I0123 07:45:49.822049 17634 leveldb.cpp:198] Created db iterator in 47324ns
> I0123 07:45:49.822069 17634 leveldb.cpp:204] Seeked to beginning of db in 
> 2038ns
> I0123 07:45:49.822084 17634 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 484ns
> I0123 07:45:49.822160 17634 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0123 07:45:49.824241 17660 recover.cpp:449] Starting replica recovery
> I0123 07:45:49.825217 17660 recover.cpp:475] Replica is in EMPTY status
> I0123 07:45:49.827020 17660 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0123 07:45:49.827453 17659 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0123 07:45:49.828047 17659 recover.cpp:566] Updating replica status to 
> STARTING
> I0123 07:45:49.838543 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 10.24963ms
> I0123 07:45:49.838580 17659 replica.cpp:323] Persisted replica status to 
> STARTING
> I0123 07:45:49.848836 17659 recover.cpp:475] Replica is in STARTING status
> I0123 07:45:49.850039 17659 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0123 07:45:49.850286 17659 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0123 07:45:49.850754 17659 recover.cpp:566] Updating replica status to VOTING
> I0123 07:45:49.853698 17655 master.cpp:262] Master 
> 20150123-074549-16842879-44955-17634 (utopic) started on 127.0.1.1:44955
> I0123 07:45:49.853981 17655 master.cpp:308] Master only allowing 
> authenticated frameworks to register
> I0123 07:45:49.853997 17655 master.cpp:313] Master only allowing 
> authenticated slaves to register
> I0123 07:45:49.854038 17655 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0/credentials'
> I0123 07:45:49.854557 17655 master.cpp:357] Authorization enabled
> I0123 07:45:49.859633 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 8.742923ms
> I0123 07:45:49.859853 17659 replica.cpp:323] Persisted replica status to 
> VOTING
> I0123 07:45:49.860327 17658 recover.cpp:580] Successfully joined the Paxos 
> group
> I0123 07:45:49.860703 17654 recover.cpp:464] Recover process terminated
> I0123 07:45:49.859591 17655 master.cpp:1219] The newly elected leader is 
> master@127.0.1.1:44955 with id 20150123-074549-16842879-44955-17634
> I0123 07:45:49.864702 17655 master.cpp:1232] Elected as the leading master!
> I0123 07:45:49.864904 17655 master.cpp:1050] Recovering from registrar
> I0123 07:45:49.865406 17660 registrar.cpp:313] Recovering registrar
> I0123 07:45:49.866576 17660 log.cpp:660] Attempting to start the writer
> I0123 07:45:49.868638 17658 replica.cpp:477] Replica received implicit 
> promise request with proposal 1
> I0123 07:45:49.872521 17658 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 3.848859ms
> I0123 07:45:49.872555 17658 replica.cpp:345] Persisted promised to 1
> I0123 07:45:49.873769 17661 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0123 07:45:49.875474 17658 replica.cpp:378] Replica received explicit 
> promise request for position 0 with proposal 2
> I0123 07:45:49.880878 17658 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 5.364021ms
> I0123 07:45:49.880913 17658 replica.cpp:679] Persisted action at 0
> I0123 07:45:49.882619 17657 replica.cpp:511] Replica received write request 
> for position 0
> I0123 07:45:49.882998 17657 leveldb.cpp:438] Reading position from leveldb 
> took 150092ns
> I0123 07:45:49.886488 17657 leveldb.cpp:343] Persisting action (14 bytes) to 
> leveldb took 3.269189ms
> I0123 07:45:49.886536 17657 replica.cpp:679] Persisted action at 0
> I0123 07:45:49.887181 17657 replica.cpp:658] Replica received learned notice 
> for position 0
> I0123 07:45:49.892900 17657 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 5.690093ms
> I0123 07:45:49.8929

[jira] [Created] (MESOS-3727) File permission inconsistency for mesos-master executable and mesos-init-wrapper.

2015-10-13 Thread Sarjeet Singh (JIRA)
Sarjeet Singh created MESOS-3727:


 Summary: File permission inconsistency for mesos-master executable 
and mesos-init-wrapper.
 Key: MESOS-3727
 URL: https://issues.apache.org/jira/browse/MESOS-3727
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.25.0
Reporter: Sarjeet Singh
Priority: Trivial


There seems to be a file permission inconsistency between the mesos-master 
executable and the mesos-init-wrapper script in Mesos 0.25.

node-1:~# dpkg -l | grep mesos
ii  mesos   0.25.0-0.2.70.ubuntu1404

node-1:~# ls -ld /usr/sbin/mesos-master
-rwxr-xr-x 1 root root 289173 Oct 12 14:07 /usr/sbin/mesos-master

node-1:~# ls -ld /usr/bin/mesos-init-wrapper
-rwxrwx--- 1 root root 5202 Oct  1 11:17 /usr/bin/mesos-init-wrapper

Observed the issue when trying to execute the mesos-master executable as a 
non-root user: since mesos-init-wrapper doesn't grant any permission to 
non-root users, it couldn't be executed and mesos-master didn't start.

Should we make these file permissions consistent between the executable and the 
init script? 





[jira] [Commented] (MESOS-3724) subprocess fail to process "docker inspect" output if >64KB

2015-10-13 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956061#comment-14956061
 ] 

Benjamin Mahler commented on MESOS-3724:


Appears to be fixed here:

https://github.com/apache/mesos/commit/f13239e14f9a982465e93184ef6e34395eb561f4

> subprocess fail to process "docker inspect" output if >64KB
> ---
>
> Key: MESOS-3724
> URL: https://issues.apache.org/jira/browse/MESOS-3724
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.21.1
>Reporter: Bhuvan Arumugam
>
> When running a task with Docker, if the {{docker inspect}} output size is 
> more than 64KB, it fails: the {{docker inspect}} command is blocked, the task 
> remains in ASSIGNED state, and after 15 minutes it is KILLED. The subprocess 
> library [1] used in Mesos to run this command does not handle output beyond 
> this size.
> {code}
> docker pull livecipher/mesos-docker-inspect-64k:1.0 > /dev/null && docker 
> inspect livecipher/mesos-docker-inspect-64k:1.0 > inspect.out && du -sh 
> inspect.out && rm -f inspect.out
>  76K  inspect.out
> {code}
> You can reproduce it using the above image with any framework. I tested it 
> with aurora.
> Here is a sample failure: http://pastebin.com/w1Ty41rb
> [1] https://github.com/apache/mesos/blob/0.21.1/src/docker/docker.cpp#L804
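
The failure mode described above is consistent with a fixed pipe capacity (64KB 
on Linux): a child that writes more than the buffer holds blocks until the 
parent drains it, so a parent that waits for the child to exit before reading 
deadlocks. Below is a minimal POSIX sketch of the drain-while-running pattern; 
the function name and buffer size are assumptions, and this is not the actual 
Mesos subprocess code.

{code}
// Sketch: drain a child's stdout while it runs so it can never
// block on a full pipe buffer. Illustrative; not Mesos code.
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <string>

std::string runAndDrain(char* const argv[]) {
  int fds[2];
  if (pipe(fds) != 0) {
    return "";
  }

  pid_t pid = fork();
  if (pid == 0) {                 // Child: route stdout into the pipe.
    dup2(fds[1], STDOUT_FILENO);
    close(fds[0]);
    close(fds[1]);
    execvp(argv[0], argv);
    _exit(127);
  }

  close(fds[1]);

  // Read continuously; calling waitpid() *before* reading would
  // deadlock once the child's output exceeds the pipe capacity.
  std::string output;
  char buffer[4096];
  ssize_t n;
  while ((n = read(fds[0], buffer, sizeof(buffer))) > 0) {
    output.append(buffer, n);
  }

  close(fds[0]);
  waitpid(pid, nullptr, 0);
  return output;
}
{code}

The commit linked above is presumably the real fix inside Mesos; the sketch 
only illustrates the general shape of the problem.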





[jira] [Created] (MESOS-3726) RegistryClientTest.SimpleGetBlob is flaky

2015-10-13 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3726:
-

 Summary: RegistryClientTest.SimpleGetBlob is flaky
 Key: MESOS-3726
 URL: https://issues.apache.org/jira/browse/MESOS-3726
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Reporter: Anand Mazumdar


Showed up on ASF CI:
https://builds.apache.org/job/Mesos/910/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=centos:7,label_exp=docker%7C%7CHadoop/console

{code}
[ RUN  ] RegistryClientTest.SimpleGetBlob
../../src/tests/containerizer/provisioner_docker_tests.cpp:585: Failure
(socket).failure(): Failed accept: connection error: Connection reset by peer
[  FAILED  ] RegistryClientTest.SimpleGetBlob (10 ms)
{code}

Logs from a good run:
https://builds.apache.org/job/Mesos/919/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=centos:7,label_exp=docker%7C%7CHadoop/consoleFull

{code}
[ RUN  ] RegistryClientTest.SimpleGetBlob
I1013 01:42:08.282057 31645 registry_client.cpp:262] Response status: 401 
Unauthorized
I1013 01:42:08.294426 31646 registry_client.cpp:262] Response status: 307 
Temporary Redirect
I1013 01:42:08.300989 31647 registry_client.cpp:262] Response status: 200 OK
[   OK ] RegistryClientTest.SimpleGetBlob (29 ms)
{code}





[jira] [Commented] (MESOS-3411) ReservationEndpointsTest.AvailableResources appears to be faulty

2015-10-13 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955942#comment-14955942
 ] 

Michael Park commented on MESOS-3411:
-

Reopened to track the failure encountered on the Apache buildbot: 
https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/921/console

{code}
I1013 09:54:08.882694 29149 master.cpp:5559] Processing TEARDOWN call for 
framework 4b0845cd-7ce9-4e7a-b5d1-bcf1c413ca39- (default) at 
scheduler-62c161d7-60e9-4361-aae1-6431e60035f6@172.17.5.161:57074
I1013 09:54:08.882822 29149 master.cpp:5571] Removing framework 
4b0845cd-7ce9-4e7a-b5d1-bcf1c413ca39- (default) at 
scheduler-62c161d7-60e9-4361-aae1-6431e60035f6@172.17.5.161:57074
../../src/tests/reservation_endpoints_tests.cpp:184: Failure
Mock function called more times than expected - taking default action specified 
at:
../../src/tests/mesos.hpp:1518:
Function call: recoverResources(@0x2b1be400cc10 
4b0845cd-7ce9-4e7a-b5d1-bcf1c413ca39-, @0x2b1be4018f40 
4b0845cd-7ce9-4e7a-b5d1-bcf1c413ca39-S0, @0x2b1bc22fc390 { cpus(*):2, 
mem(*):1024, disk(*):1024, ports(*):[31000-32000] }, @0x2b1bc22fc3d0 40-byte 
object <01-00 00-00 1C-2B 00-00 0B-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
00-00 00-00 00-00 00-00 00-00 80-3F 00-00 00-00>)
 Expected: to be called once
   Actual: called twice - over-saturated and active
I1013 09:54:08.884042 29149 hierarchical.hpp:599] Deactivated framework 
4b0845cd-7ce9-4e7a-b5d1-bcf1c413ca39-
I1013 09:54:08.884371 29149 hierarchical.hpp:1103] Recovered cpus(*):2; 
mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total: cpus(*):2; 
mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: ) on slave 
4b0845cd-7ce9-4e7a-b5d1-bcf1c413ca39-S0 from framework 
4b0845cd-7ce9-4e7a-b5d1-bcf1c413ca39-
I1013 09:54:08.884469 29149 hierarchical.hpp:552] Removed framework 
4b0845cd-7ce9-4e7a-b5d1-bcf1c413ca39-
{code}
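
For context, "over-saturated" is Google Mock's diagnostic for an expectation 
whose call cardinality has been exhausted. Here is a standalone sketch of the 
same semantics; the mock class and signature are hypothetical, not the real 
allocator interface from {{src/tests/mesos.hpp}}.

{code}
// Hypothetical mock illustrating gmock cardinality "over-saturation".
#include <gmock/gmock.h>
#include <gtest/gtest.h>

class MockAllocator {
public:
  MOCK_METHOD(void, recoverResources, (int frameworkId, int agentId));
};

TEST(CardinalityExample, OverSaturated) {
  MockAllocator allocator;

  // "Expected: to be called once": a second call over-saturates the
  // expectation and fails the test, as in the log above.
  EXPECT_CALL(allocator, recoverResources(testing::_, testing::_))
      .Times(1);

  allocator.recoverResources(1, 2);
  // allocator.recoverResources(1, 2);  // Would trigger the failure.
}
{code}

When an extra call can legitimately race in (e.g. resources recovered both when 
an offer is rescinded and when the framework is removed), a tolerant 
cardinality such as {{.Times(testing::AtLeast(1))}} avoids the flake.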

> ReservationEndpointsTest.AvailableResources appears to be faulty
> 
>
> Key: MESOS-3411
> URL: https://issues.apache.org/jira/browse/MESOS-3411
> Project: Mesos
>  Issue Type: Bug
>Reporter: Joseph Wu
>Assignee: Michael Park
> Fix For: 0.25.0
>
>
> The reviewbot failed a test, when building/testing an unrelated review 
> (https://reviews.apache.org/r/38077/)
> {code}
> [--] 11 tests from ReservationEndpointsTest
> [ RUN  ] ReservationEndpointsTest.AvailableResources
> ../../src/tests/reservation_endpoints_tests.cpp:195: Failure
> Failed to wait 15secs for recoverResources
> ../../src/tests/reservation_endpoints_tests.cpp:191: Failure
> Actual function call count doesn't match EXPECT_CALL(allocator, 
> recoverResources(_, _, _, _))...
>  Expected: to be called once
>Actual: never called - unsatisfied and active
> F0910 21:23:50.468192 21487 logging.cpp:57] RAW: Pure virtual method called
> @ 0x2b964be3f82e  google::LogMessage::Fail()
> @ 0x2b964be44ede  google::RawLog__()
> @ 0x2b964b20d1a6  __cxa_pure_virtual
> @ 0x2b964b2cd1ab  mesos::internal::master::Master::removeFramework()
> @ 0x2b964b2c6166  
> mesos::internal::master::Master::frameworkFailoverTimeout()
> @ 0x2b964b31446d  
> _ZZN7process8dispatchIN5mesos8internal6master6MasterERKNS1_11FrameworkIDERKNS_4TimeES5_S8_EEvRKNS_3PIDIT_EEMSC_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESN_
> @ 0x2b964b366a5f  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master6MasterERKNS5_11FrameworkIDERKNS0_4TimeES9_SC_EEvRKNS0_3PIDIT_EEMSG_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x2b964bdcc55b  std::function<>::operator()()
> @ 0x2b964bdb5e41  process::ProcessBase::visit()
> @ 0x2b964bdb8cb0  process::DispatchEvent::visit()
> @   0xb22bc6  process::ProcessBase::serve()
> @ 0x2b964bdb2348  process::ProcessManager::resume()
> @ 0x2b964bda6854  process::internal::schedule()
> @ 0x2b964be020e3  
> _ZNSt12_Bind_simpleIFPFvvEvEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x2b964be0203d  std::_Bind_simple<>::operator()()
> @ 0x2b964be01fd6  std::thread::_Impl<>::_M_run()
> @ 0x2b964d975a40  (unknown)
> @ 0x2b964e0ec182  start_thread
> @ 0x2b964e3fc47d  (unknown)
> make[4]: *** [check-local] Aborted
> make[4]: Leaving directory 
> `/home/jenkins/jenkins-slave/workspace/mesos-reviewbot/mesos-0.25.0/_build/src'
> make[3]: *** [check-am] Error 2
> make[3]: Leaving directory 
> `/home/jenkins/jenkins-slave/workspace/mesos-reviewbot/mesos-0.25.0/_build/src'
> make[2]: *** [check] Error 2
> make[2]: Leaving directory 
> `/home/jenkins/jenkins-slave/workspace/mes

[jira] [Commented] (MESOS-3725) shared library loading depends on environment variable updates

2015-10-13 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955857#comment-14955857
 ] 

James Peach commented on MESOS-3725:


The alternative fix is to retain the {{libraries::path()}} APIs and persist a 
set of search paths to manually implement the dynamic dlopen(3) search. IMHO 
this is less preferable since it makes it harder for operators to reason about 
where libraries will be loaded from; it's also not generally needed since it is 
so easy to always use an absolute path.
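
A sketch of what that alternative would look like (illustrative only; names and 
paths are assumptions): the process persists its own search-path list and 
resolves a library name to an absolute path itself before calling dlopen(3), 
which is exactly the indirection that makes load locations harder for operators 
to reason about.

{code}
// Sketch of a manual dlopen(3) search over persisted paths.
#include <dlfcn.h>
#include <sys/stat.h>

#include <string>
#include <vector>

void* openFromPaths(const std::vector<std::string>& paths,
                    const std::string& name) {
  for (const std::string& dir : paths) {
    const std::string candidate = dir + "/" + name;
    struct stat s;
    if (stat(candidate.c_str(), &s) == 0) {
      return dlopen(candidate.c_str(), RTLD_NOW);  // Absolute path.
    }
  }
  return nullptr;  // Not found on any persisted search path.
}
{code}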

> shared library loading depends on environment variable updates
> --
>
> Key: MESOS-3725
> URL: https://issues.apache.org/jira/browse/MESOS-3725
> Project: Mesos
>  Issue Type: Bug
>  Components: modules, stout
>Reporter: James Peach
>Assignee: James Peach
>
> {{ModuleTest::SetUpTestCase()}} and the various {{libraries::paths()}} in 
> stout assume that updating {{LD_LIBRARY_PATH}} or {{DYLD_LIBRARY_PATH}} is 
> sufficient to alter the search path used by dlopen(3). It is not; those 
> environment variables are only bound at program load.
> My preference is to fix this by requiring the clients of {{DynamicLibrary}} 
> to always pass in an absolute path and to remove all mention of these 
> environment variables.
> FWIW, the tests in {{ModuleTest::SetUpTestCase()}} only work because the 
> libtool wrapper script sets up the library path to the expected value prior 
> to running the tests.





[jira] [Assigned] (MESOS-3725) shared library loading depends on environment variable updates

2015-10-13 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-3725:
--

Assignee: James Peach

> shared library loading depends on environment variable updates
> --
>
> Key: MESOS-3725
> URL: https://issues.apache.org/jira/browse/MESOS-3725
> Project: Mesos
>  Issue Type: Bug
>  Components: modules, stout
>Reporter: James Peach
>Assignee: James Peach
>
> {{ModuleTest::SetUpTestCase()}} and the various {{libraries::paths()}} in 
> stout assume that updating {{LD_LIBRARY_PATH}} or {{DYLD_LIBRARY_PATH}} is 
> sufficient to alter the search path used by dlopen(3). It is not; those 
> environment variables are only bound at program load.
> My preference is to fix this by requiring the clients of {{DynamicLibrary}} 
> to always pass in an absolute path and to remove all mention of these 
> environment variables.
> FWIW, the tests in {{ModuleTest::SetUpTestCase()}} only work because the 
> libtool wrapper script sets up the library path to the expected value prior 
> to running the tests.





[jira] [Created] (MESOS-3725) shared library loading depends on environment variable updates

2015-10-13 Thread James Peach (JIRA)
James Peach created MESOS-3725:
--

 Summary: shared library loading depends on environment variable 
updates
 Key: MESOS-3725
 URL: https://issues.apache.org/jira/browse/MESOS-3725
 Project: Mesos
  Issue Type: Bug
  Components: modules, stout
Reporter: James Peach


{{ModuleTest::SetUpTestCase()}} and the various {{libraries::paths()}} in stout 
assume that updating {{LD_LIBRARY_PATH}} or {{DYLD_LIBRARY_PATH}} is sufficient 
to alter the search path used by dlopen(3). It is not; those environment 
variables are only bound at program load.

My preference is to fix this by requiring the clients of {{DynamicLibrary}} to 
always pass in an absolute path and to remove all mention of these environment 
variables.

FWIW, the tests in {{ModuleTest::SetUpTestCase()}} only work because the 
libtool wrapper script sets up the library path to the expected value prior to 
running the tests.
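
A small sketch of the reported behavior follows (paths and library names are 
placeholders): on Linux the dynamic loader binds {{LD_LIBRARY_PATH}} when the 
process starts, so a later setenv(3) only affects child processes.

{code}
// Sketch of the behavior described above; paths are placeholders.
#include <dlfcn.h>

#include <cstdio>
#include <cstdlib>

int main() {
  // Affects only processes exec'd after this point; this process's
  // dynamic loader already bound its search path at load time.
  setenv("LD_LIBRARY_PATH", "/opt/modules", 1);

  // Expected to fail: the updated search path is not consulted.
  if (dlopen("libexamplemodule.so", RTLD_NOW) == nullptr) {
    fprintf(stderr, "relative dlopen failed: %s\n", dlerror());
  }

  // Works regardless of the environment: the absolute-path convention
  // proposed above for DynamicLibrary clients.
  void* handle = dlopen("/opt/modules/libexamplemodule.so", RTLD_NOW);
  if (handle != nullptr) {
    dlclose(handle);
  }
  return 0;
}
{code}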





[jira] [Updated] (MESOS-3724) subprocess fail to process "docker inspect" output if >64KB

2015-10-13 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-3724:
---
Description: 
When running a task with Docker, if the {{docker inspect}} output size is more 
than 64KB, it fails: the {{docker inspect}} command is blocked, the task 
remains in ASSIGNED state, and after 15 minutes it is KILLED. The subprocess 
library [1] used in Mesos to run this command does not handle output beyond 
this size.

{code}
docker pull livecipher/mesos-docker-inspect-64k:1.0 > /dev/null && docker 
inspect livecipher/mesos-docker-inspect-64k:1.0 > inspect.out && du -sh 
inspect.out && rm -f inspect.out
 76K    inspect.out
{code}

You can reproduce it using the above image with any framework. I tested it with 
aurora.

Here is a sample failure: http://pastebin.com/w1Ty41rb

[1] https://github.com/apache/mesos/blob/0.21.1/src/docker/docker.cpp#L804

  was:
When running a task with docker and if {{docker inspect}} output size is more 
than 64k, it fails. The {{docker inspect}} The subprocess library [1] used in 
mesos to run this command is not handling the output beyond this size.

{code}
docker pull livecipher/mesos-docker-inspect-64k:1.0 > /dev/null && docker 
inspect livecipher/mesos-docker-inspect-64k:1.0 > inspect.out && du -sh 
inspect.out && rm -f inspect.out
 76K    inspect.out
{code}

You can reproduce it using the above image with any framework. I tested it with 
aurora.

Here is a sample failure: http://pastebin.com/w1Ty41rb

[1] https://github.com/apache/mesos/blob/0.21.1/src/docker/docker.cpp#L804


> subprocess fail to process "docker inspect" output if >64KB
> ---
>
> Key: MESOS-3724
> URL: https://issues.apache.org/jira/browse/MESOS-3724
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.21.1
>Reporter: Bhuvan Arumugam
>
> When running a task with Docker, if the {{docker inspect}} output size is 
> more than 64KB, it fails: the {{docker inspect}} command is blocked, the task 
> remains in ASSIGNED state, and after 15 minutes it is KILLED. The subprocess 
> library [1] used in Mesos to run this command does not handle output beyond 
> this size.
> {code}
> docker pull livecipher/mesos-docker-inspect-64k:1.0 > /dev/null && docker 
> inspect livecipher/mesos-docker-inspect-64k:1.0 > inspect.out && du -sh 
> inspect.out && rm -f inspect.out
>  76K  inspect.out
> {code}
> You can reproduce it using the above image with any framework. I tested it 
> with aurora.
> Here is a sample failure: http://pastebin.com/w1Ty41rb
> [1] https://github.com/apache/mesos/blob/0.21.1/src/docker/docker.cpp#L804





[jira] [Updated] (MESOS-3724) subprocess fail to process "docker inspect" output if >64KB

2015-10-13 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-3724:
---
Description: 
When running a task with docker and if {{docker inspect}} output size is more 
than 64k, it fails. The {{docker inspect}} The subprocess library [1] used in 
mesos to run this command is not handling the output beyond this size.

{code}
docker pull livecipher/mesos-docker-inspect-64k:1.0 > /dev/null && docker 
inspect livecipher/mesos-docker-inspect-64k:1.0 > inspect.out && du -sh 
inspect.out && rm -f inspect.out
 76K    inspect.out
{code}

You can reproduce it using the above image with any framework. I tested it with 
aurora.

Here is a sample failure: http://pastebin.com/w1Ty41rb

[1] https://github.com/apache/mesos/blob/0.21.1/src/docker/docker.cpp#L804

  was:
When running a task with docker and if {{docker inspect}} output size is more 
than 64k, it fails. The {{docker inspect}} The subprocess library [1] used in 
mesos to run this command is not handling the output beyond this size.

{code}
docker pull livecipher/mesos-docker-inspect-64k:1.0 > /dev/null && docker 
inspect livecipher/mesos-docker-inspect-64k:1.0 > inspect.out && du -sh 
inspect.out && rm -f inspect.out
 76K    inspect.out
{code}

You can reproduce it using the above image with any framework. I tested it with 
aurora.

Here is a sample failure: http://pastebin.com/CPmZy25z

[1] https://github.com/apache/mesos/blob/0.21.1/src/docker/docker.cpp#L804


> subprocess fail to process "docker inspect" output if >64KB
> ---
>
> Key: MESOS-3724
> URL: https://issues.apache.org/jira/browse/MESOS-3724
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.21.1
>Reporter: Bhuvan Arumugam
>
> When running a task with Docker, if the {{docker inspect}} output size is 
> more than 64K, it fails: the {{docker inspect}} command blocks. The 
> subprocess library [1] used in Mesos to run this command does not handle 
> output beyond this size.
> {code}
> docker pull livecipher/mesos-docker-inspect-64k:1.0 > /dev/null && docker 
> inspect livecipher/mesos-docker-inspect-64k:1.0 > inspect.out && du -sh 
> inspect.out && rm -f inspect.out
>  76K  inspect.out
> {code}
> You can reproduce it using the above image with any framework. I tested it 
> with Aurora.
> Here is a sample failure: http://pastebin.com/w1Ty41rb
> [1] https://github.com/apache/mesos/blob/0.21.1/src/docker/docker.cpp#L804



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3724) subprocess fail to process "docker inspect" output if >64KB

2015-10-13 Thread Bhuvan Arumugam (JIRA)
Bhuvan Arumugam created MESOS-3724:
--

 Summary: subprocess fail to process "docker inspect" output if 
>64KB
 Key: MESOS-3724
 URL: https://issues.apache.org/jira/browse/MESOS-3724
 Project: Mesos
  Issue Type: Bug
  Components: slave
Affects Versions: 0.21.1
Reporter: Bhuvan Arumugam


When running a task with Docker, if the {{docker inspect}} output size is more 
than 64K, it fails: the {{docker inspect}} command blocks. The subprocess 
library [1] used in Mesos to run this command does not handle output beyond 
this size.

{code}
docker pull livecipher/mesos-docker-inspect-64k:1.0 > /dev/null && docker 
inspect livecipher/mesos-docker-inspect-64k:1.0 > inspect.out && du -sh 
inspect.out && rm -f inspect.out
 76K  inspect.out
{code}

You can reproduce it using the above image with any framework. I tested it with 
Aurora.

Here is a sample failure: http://pastebin.com/CPmZy25z

[1] https://github.com/apache/mesos/blob/0.21.1/src/docker/docker.cpp#L804



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3560) JSON-based credential files do not work correctly

2015-10-13 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955799#comment-14955799
 ] 

Michael Park commented on MESOS-3560:
-

{noformat}
commit a21d41f136e5000ea6ac2fbeace738579ce6df55
Author: Isabel Jimenez 
Date:   Tue Oct 13 20:26:26 2015 +0200

Changed secret field in V1 `Credential` from `bytes` to `string`

Review: https://reviews.apache.org/r/39099
{noformat}
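
For readers wondering where the base64 requirement came from: the 
JSON-to-protobuf conversion treats {{bytes}} fields as base64-encoded text, 
which appears to be why a plain-text secret failed before the field was 
changed to {{string}}. A standalone sketch (illustration only, not Mesos code) 
showing the encoding the old field type expected:

{code}
#include <iostream>
#include <string>

// Minimal base64 encoder, enough to show why the JSON credentials file
// needed "cGFzc3dvcmQ=" where the text-based file could say "password".
std::string base64Encode(const std::string& in) {
  static const char table[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  int val = 0;
  int bits = 0;
  for (unsigned char c : in) {
    val = (val << 8) | c;
    bits += 8;
    while (bits >= 6) {
      bits -= 6;
      out.push_back(table[(val >> bits) & 0x3F]);
    }
  }
  if (bits > 0) {
    out.push_back(table[(val << (6 - bits)) & 0x3F]);
  }
  while (out.size() % 4 != 0) {
    out.push_back('=');  // Pad to a multiple of four characters.
  }
  return out;
}

int main() {
  std::cout << base64Encode("password") << std::endl;  // Prints cGFzc3dvcmQ=
}
{code}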

> JSON-based credential files do not work correctly
> -
>
> Key: MESOS-3560
> URL: https://issues.apache.org/jira/browse/MESOS-3560
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Michael Park
>Assignee: Isabel Jimenez
>  Labels: mesosphere
> Fix For: 0.26.0
>
>
> Specifying the following credentials file:
> {code}
> {
>   "credentials": [
>     {
>       "principal": "user",
>       "secret": "password"
>     }
>   ]
> }
> {code}
> Then hitting a master endpoint with:
> {code}
> curl -i -u "user:password" ...
> {code}
> Does not work. This is contrary to the text-based credentials file which 
> works:
> {code}
> user password
> {code}
> Currently, the password in a JSON-based credentials file needs to be 
> base64-encoded in order for it to work:
> {code}
> {
>   "credentials": [
>     {
>       "principal": "user",
>       "secret": "cGFzc3dvcmQ="
>     }
>   ]
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3271) SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.

2015-10-13 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955774#comment-14955774
 ] 

Benjamin Mahler commented on MESOS-3271:


[~jvanremoortere] this looks to be a bug in the libevent integration?

{noformat}
[err] event_active called on a non-initialized event 0x7f6b740232d0 (events: 
0x2, fd: 21, flags: 0x80)
{noformat}

> SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.
> ---
>
> Key: MESOS-3271
> URL: https://issues.apache.org/jira/browse/MESOS-3271
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Paul Brett
> Attachments: build.txt
>
>
> Test failure on Ubuntu 14 configured with {{--disable-java --disable-python 
> --enable-ssl --enable-libevent --enable-optimize --enable-network-isolation}}
> Commit: {{9b78b301469667b5a44f0a351de5f3a71edae499}}
> {code}
> [ RUN  ] SlaveRecoveryTest/0.NonCheckpointingFramework
> I0815 06:41:47.413602 17091 exec.cpp:133] Version: 0.24.0
> I0815 06:41:47.416780 17111 exec.cpp:207] Executor registered on slave 
> 20150815-064146-544909504-51064-12195-S0
> Registered executor on slave1-ubuntu12
> Starting task 044bd49e-2f38-4671-802a-ac6524d61a85
> Forked command at 17114
> sh -c 'sleep 1000'
> [err] event_active called on a non-initialized event 0x7f6b740232d0 (events: 
> 0x2, fd: 21, flags: 0x80)
> *** Aborted at 1439646107 (unix time) try "date -d @1439646107" if you are 
> using GNU date ***
> PC: @ 0x7f6ba512d0d5 (unknown)
> *** SIGABRT (@0x2fa3) received by PID 12195 (TID 0x7f6b9d613700) from PID 
> 12195; stack trace: ***
> @ 0x7f6ba54c4cb0 (unknown)
> @ 0x7f6ba512d0d5 (unknown)
> @ 0x7f6ba513083b (unknown)
> @ 0x7f6ba448e1ba (unknown)
> @ 0x7f6ba448e52b (unknown)
> @ 0x7f6ba447dcc9 (unknown)
> @   0x4c4033 process::internal::run<>()
> @ 0x7f6ba72642ab process::Future<>::discard()
> @ 0x7f6ba72643be process::internal::discard<>()
> @ 0x7f6ba7262298 
> _ZNSt17_Function_handlerIFvvEZNK7process6FutureImE9onDiscardISt5_BindIFPFvNS1_10WeakFutureIsEEES7_RKS3_OT_EUlvE_E9_M_invokeERKSt9_Any_data
> @   0x4c4033 process::internal::run<>()
> @   0x6fa0cb process::Future<>::discard()
> @ 0x7f6ba6fb5736 cgroups::event::Listener::finalize()
> @ 0x7f6ba728fb11 process::ProcessManager::resume()
> @ 0x7f6ba728fe0f process::internal::schedule()
> @ 0x7f6ba5c9d490 (unknown)
> @ 0x7f6ba54bce9a start_thread
> @ 0x7f6ba51ea38d (unknown)
> + /bin/true
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3104) Add an endpoint that exposes component flags.

2015-10-13 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-3104:
---
Shepherd: Benjamin Mahler

> Add an endpoint that exposes component flags.
> -
>
> Key: MESOS-3104
> URL: https://issues.apache.org/jira/browse/MESOS-3104
> Project: Mesos
>  Issue Type: Task
>Reporter: David Robinson
>Assignee: haosdent
>Priority: Minor
>  Labels: twitter
>
> Apparently there's an ongoing effort to break /state.json apart into separate 
> endpoints. As part of this effort it would be great if an endpoint was 
> created that only exposed the flags. Configuration management tools could use 
> the endpoint to determine whether the master/slave is correctly configured.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1563) Failed to configure on FreeBSD

2015-10-13 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955680#comment-14955680
 ] 

Ian Downes commented on MESOS-1563:
---

Sure, put it up for review now so we can make early comments and iterate. 
Please include the {{make check}} output in the "testing" section.

At one point I had a working branch for FreeBSD 9.X so I'll dredge that out for 
comparison. 

> Failed to configure on FreeBSD
> --
>
> Key: MESOS-1563
> URL: https://issues.apache.org/jira/browse/MESOS-1563
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.19.0
> Environment: FreeBSD-10/stable
>Reporter: Dmitry Sivachenko
>
> When trying to configure mesos on FreeBSD, I get the following error:
> configure: Setting up build environment for x86_64 freebsd10.0
> configure: error: "Mesos is currently unsupported on your platform."
> Why? Is there anything really Linux-specific inside? It's written in Java 
> after all.
> And MacOS is supported, but it is rather close to FreeBSD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1563) Failed to configure on FreeBSD

2015-10-13 Thread David Forsythe (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955670#comment-14955670
 ] 

David Forsythe commented on MESOS-1563:
---

Great.  Is it alright to submit if make check isn't completely passing on 
FreeBSD?

I haven't made a port because I don't know how much actually works (I haven't 
gotten through a complete check run without disabling some tests that probably 
shouldn't be disabled, and some output makes me think that certain tests aren't 
actually working).

> Failed to configure on FreeBSD
> --
>
> Key: MESOS-1563
> URL: https://issues.apache.org/jira/browse/MESOS-1563
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.19.0
> Environment: FreeBSD-10/stable
>Reporter: Dmitry Sivachenko
>
> When trying to configure mesos on FreeBSD, I get the following error:
> configure: Setting up build environment for x86_64 freebsd10.0
> configure: error: "Mesos is currently unsupported on your platform."
> Why? Is there anything really Linux-specific inside? It's written in Java 
> after all.
> And MacOS is supported, but it is rather close to FreeBSD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3468) Improve apply_reviews.sh script to apply chain of reviews

2015-10-13 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3468:
-
Shepherd: Vinod Kone

> Improve apply_reviews.sh script to apply chain of reviews
> -
>
> Key: MESOS-3468
> URL: https://issues.apache.org/jira/browse/MESOS-3468
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Artem Harutyunyan
>  Labels: mesosphere
>
> Currently the support/apply-review.sh script allows a user (typically a 
> committer) to apply a single review on top of HEAD. Since Mesos contributors 
> typically submit a chain of reviews for a given issue, it makes sense for the 
> script to apply the whole chain recursively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3722) Authenticate quota requests

2015-10-13 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-3722:

Shepherd: Benjamin Hindman

> Authenticate quota requests
> ---
>
> Key: MESOS-3722
> URL: https://issues.apache.org/jira/browse/MESOS-3722
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> Quota requests need to be authenticated.
> This ticket will authenticate quota requests using credentials provided by 
> the `Authorization` field of the HTTP request. This is similar to how 
> authentication is implemented in `Master::Http`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3723) Authorize quota requests

2015-10-13 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-3723:

Shepherd: Benjamin Hindman

> Authorize quota requests
> 
>
> Key: MESOS-3723
> URL: https://issues.apache.org/jira/browse/MESOS-3723
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> When quotas are requested, the requests should be authorized for the roles 
> they affect.
> This ticket will authorize quota requests with ACLs. The existing 
> authorization support that has been implemented in MESOS-1342 will be 
> extended to add a `request_quotas` ACL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3199) Validate Quota Requests.

2015-10-13 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-3199:
---
Issue Type: Task  (was: Epic)

> Validate Quota Requests.
> 
>
> Key: MESOS-3199
> URL: https://issues.apache.org/jira/browse/MESOS-3199
> Project: Mesos
>  Issue Type: Task
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: mesosphere
>
> We need to validate quota requests in terms of syntactic and semantic 
> correctness.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3199) Validate Quota Requests.

2015-10-13 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-3199:
---
Epic Name: Quota  (was: MESOS-1791)

> Validate Quota Requests.
> 
>
> Key: MESOS-3199
> URL: https://issues.apache.org/jira/browse/MESOS-3199
> Project: Mesos
>  Issue Type: Epic
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: mesosphere
>
> We need to validate quota requests in terms of syntactic and semantic 
> correctness.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3199) Validate Quota Requests.

2015-10-13 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-3199:
---
Description: We need to validate quota requests in terms of syntactic and 
semantic correctness.  (was: We need to validate quota requests in terms of 
syntax correctness, update Master bookkeeping structures, and persist quota 
requests in the {{Registry}}.)

> Validate Quota Requests.
> 
>
> Key: MESOS-3199
> URL: https://issues.apache.org/jira/browse/MESOS-3199
> Project: Mesos
>  Issue Type: Epic
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: mesosphere
>
> We need to validate quota requests in terms of syntactic and semantic 
> correctness.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3199) Validate Quota Requests.

2015-10-13 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-3199:
---
Epic Name: MESOS-1791  (was: Quota)

> Validate Quota Requests.
> 
>
> Key: MESOS-3199
> URL: https://issues.apache.org/jira/browse/MESOS-3199
> Project: Mesos
>  Issue Type: Epic
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: mesosphere
>
> We need to validate quota requests in terms of syntax correctness, update 
> Master bookkeeping structures, and persist quota requests in the {{Registry}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3199) Validate Quota Requests.

2015-10-13 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-3199:
---
Epic Name: Quota  (was: Quota Validation)

> Validate Quota Requests.
> 
>
> Key: MESOS-3199
> URL: https://issues.apache.org/jira/browse/MESOS-3199
> Project: Mesos
>  Issue Type: Epic
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: mesosphere
>
> We need to validate quota requests in terms of syntax correctness, update 
> Master bookkeeping structures, and persist quota requests in the {{Registry}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3074) Check satisfiability of quota requests in Master

2015-10-13 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-3074:
---
Description: 
We need to validate quota requests in the Mesos Master as outlined in the 
Design Doc: 
https://docs.google.com/document/d/16iRNmziasEjVOblYp5bbkeBZ7pnjNlaIzPQqMTHQ-9I

This ticket aims to validate satisfiability (in terms of available resources) 
of a quota request using a heuristic algorithm in the Mesos Master, rather than 
validating the syntax of the request.

  was:
We need to validate quota requests in the Mesos Master as outlined in 
the Design Doc: 
https://docs.google.com/document/d/16iRNmziasEjVOblYp5bbkeBZ7pnjNlaIzPQqMTHQ-9I

This ticket aims to validate satisfiability (in terms of available resources) 
of a quota request using a heuristic algorithm in the Mesos Master, rather than 
validating the syntax of the request.
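
As a rough illustration of what a satisfiability heuristic can look like 
(invented names and a single scalar resource; the actual algorithm is in the 
design doc above):

{code}
#include <iostream>
#include <numeric>
#include <vector>

// A quota request is declared unsatisfiable if the quotas already granted
// plus the new request would exceed the cluster's total resources (shown
// here for cpus only).
bool satisfiable(double totalClusterCpus,
                 const std::vector<double>& grantedQuotaCpus,
                 double requestedCpus) {
  double granted = std::accumulate(
      grantedQuotaCpus.begin(), grantedQuotaCpus.end(), 0.0);
  return granted + requestedCpus <= totalClusterCpus;
}

int main() {
  // 16 cpus in the cluster, 10 already promised via quota: a request for
  // 8 more cpus fails the check, a request for 4 passes.
  std::cout << satisfiable(16.0, {6.0, 4.0}, 8.0) << std::endl;  // 0
  std::cout << satisfiable(16.0, {6.0, 4.0}, 4.0) << std::endl;  // 1
}
{code}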


> Check satisfiability of quota requests in Master
> 
>
> Key: MESOS-3074
> URL: https://issues.apache.org/jira/browse/MESOS-3074
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Joerg Schad
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> We need to validate quota requests in the Mesos Master as outlined in the 
> Design Doc: 
> https://docs.google.com/document/d/16iRNmziasEjVOblYp5bbkeBZ7pnjNlaIzPQqMTHQ-9I
> This ticket aims to validate satisfiability (in terms of available resources) 
> of a quota request using a heuristic algorithm in the Mesos Master, rather 
> than validating the syntax of the request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3723) Authorize quota requests

2015-10-13 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-3723:
---

 Summary: Authorize quota requests
 Key: MESOS-3723
 URL: https://issues.apache.org/jira/browse/MESOS-3723
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Jan Schlicht
Assignee: Jan Schlicht


When quotas are requested, the requests should be authorized for the roles 
they affect.

This ticket will authorize quota requests with ACLs. The existing authorization 
support that has been implemented in MESOS-1342 will be extended to add a 
`request_quotas` ACL.
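
As a rough illustration only (the real implementation extends the ACL 
protobufs from MESOS-1342; everything below is invented), the authorization 
decision boils down to a principal-to-roles lookup:

{code}
#include <iostream>
#include <map>
#include <set>
#include <string>

// Invented stand-in for a `request_quotas` ACL table: which principals may
// request quota for which roles.
using RequestQuotasAcls = std::map<std::string, std::set<std::string>>;

bool authorized(const RequestQuotasAcls& acls,
                const std::string& principal,
                const std::string& role) {
  auto it = acls.find(principal);
  return it != acls.end() && it->second.count(role) > 0;
}

int main() {
  RequestQuotasAcls acls{{"ops", {"analytics", "web"}}};
  std::cout << authorized(acls, "ops", "web") << std::endl;  // 1
  std::cout << authorized(acls, "dev", "web") << std::endl;  // 0
}
{code}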



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3608) optionally install test binaries

2015-10-13 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3608:
--
Shepherd: Till Toenshoff

> optionally install test binaries
> 
>
> Key: MESOS-3608
> URL: https://issues.apache.org/jira/browse/MESOS-3608
> Project: Mesos
>  Issue Type: Improvement
>  Components: build, test
>Reporter: James Peach
>Priority: Minor
>
> Many of the tests in Mesos could be described as integration tests, since 
> they have external dependencies on kernel features, installed tools, 
> permissions, etc. I'd like to be able to generate a {{mesos-tests}} RPM along 
> with my {{mesos}} RPM so that I can run the same tests in different 
> deployment environments.
> I propose a new configuration option named {{--enable-test-tools}} that will 
> install the tests into {{libexec/mesos/tests}}. I'll also need to make some 
> minor changes to tests so that helper tools can be found in this location as 
> well as in the build directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3554) Allocator changes trigger large re-compiles

2015-10-13 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955402#comment-14955402
 ] 

Benjamin Mahler commented on MESOS-3554:


For reference, my comment about the dependencies is on 
https://reviews.apache.org/r/38869/

> Allocator changes trigger large re-compiles
> ---
>
> Key: MESOS-3554
> URL: https://issues.apache.org/jira/browse/MESOS-3554
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>
> Due to the templatized nature of the allocator, even small changes trigger 
> large recompiles of the code-base. This makes iterating on changes expensive 
> for developers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3600) unable to build with non-default protobuf

2015-10-13 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3600:
--
Shepherd: Till Toenshoff

> unable to build with non-default protobuf
> -
>
> Key: MESOS-3600
> URL: https://issues.apache.org/jira/browse/MESOS-3600
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: James Peach
>
> If I install a custom protobuf into {{/opt/protobuf}}, I should be able to 
> pass {{--with-protobuf=/opt/protobuf}} to configure the build to use it.
> On OS X, this fails:
> {code}
> ...
> checking google/protobuf/message.h usability... yes
> checking google/protobuf/message.h presence... yes
> checking for google/protobuf/message.h... yes
> checking for _init in -lprotobuf... no
> configure: error: cannot find protobuf
> ---
> You have requested the use of a non-bundled protobuf but no suitable
> protobuf could be found.
> You may want specify the location of protobuf by providing a prefix
> path via --with-protobuf=DIR, or check that the path you provided is
> correct if youre already doing this.
> ---
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3554) Allocator changes trigger large re-compiles

2015-10-13 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955398#comment-14955398
 ] 

Benjamin Mahler commented on MESOS-3554:


I don't think we can close this since the main issue behind the large 
re-compilation was that the allocator had leaked into a widespread dependency. 
Even with the patch applied, many changes to the allocator require touching the 
header since the entire class declaration is there. This means we still have 
large re-compiles, which is why I suggested focusing on removing the 
unnecessary dependencies :)

Thanks for doing that change, but IMHO we should close this only once a change 
to the allocator (including the declaration) no longer triggers a large 
re-compile. Sounds reasonable?
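
To make the dependency point concrete, a toy sketch (invented types, not the 
Mesos headers) of the forward-declaration approach: consumers that only hold a 
pointer never see the class body, so edits to the allocator no longer force 
them to recompile.

{code}
// What a widely-included header would contain:
class HierarchicalAllocator;  // Forward declaration only; no members here.

struct Master {
  HierarchicalAllocator* allocator;  // A pointer to an incomplete type is OK.
};

// What would live outside the hot include path (a .hpp/.cpp pair in
// practice; inlined here so the sketch compiles as one file):
class HierarchicalAllocator {
 public:
  void addFramework() { /* ... */ }
};

int main() {
  HierarchicalAllocator allocator;
  Master master{&allocator};
  master.allocator->addFramework();
  return 0;
}
{code}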

> Allocator changes trigger large re-compiles
> ---
>
> Key: MESOS-3554
> URL: https://issues.apache.org/jira/browse/MESOS-3554
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>
> Due to the templatized nature of the allocator, even small changes trigger 
> large recompiles of the code-base. This makes iterating on changes expensive 
> for developers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3722) Authenticate quota requests

2015-10-13 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-3722:
---

 Summary: Authenticate quota requests
 Key: MESOS-3722
 URL: https://issues.apache.org/jira/browse/MESOS-3722
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Jan Schlicht
Assignee: Jan Schlicht


Quota requests need to be authenticated.

This ticket will authenticate quota requests using credentials provided by the 
`Authorization` field of the HTTP request. This is similar to how 
authentication is implemented in `Master::Http`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3721) We need the active flag on frameworks in the /state-summary endpoint

2015-10-13 Thread JIRA
Michael Lunøe created MESOS-3721:


 Summary: We need the active flag on frameworks in the 
/state-summary endpoint
 Key: MESOS-3721
 URL: https://issues.apache.org/jira/browse/MESOS-3721
 Project: Mesos
  Issue Type: Improvement
Reporter: Michael Lunøe


We have the active flag on the node objects in the Mesos /state-summary 
endpoint, but we need it on the framework objects too, so that inactive 
frameworks can be identified.
Here is a reference to the definition of active for frameworks in the JSON 
schema: 
https://gist.github.com/mlunoe/645e2aeaac1fa682ef72#file-state-summary-json-L132




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2927) Update mesos #include headers

2015-10-13 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-2927:
-

Assignee: (was: Paul Brett)

> Update mesos #include headers
> -
>
> Key: MESOS-2927
> URL: https://issues.apache.org/jira/browse/MESOS-2927
> Project: Mesos
>  Issue Type: Bug
>Reporter: Paul Brett
>
> Update mesos to #include headers for symbols we rely on and reorder to comply 
> with the style guide.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2952) Provide user namespaces for privileged access inside containers

2015-10-13 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-2952:
-

Assignee: (was: Paul Brett)

> Provide user namespaces for privileged access inside containers
> ---
>
> Key: MESOS-2952
> URL: https://issues.apache.org/jira/browse/MESOS-2952
> Project: Mesos
>  Issue Type: Epic
>Reporter: Paul Brett
>
> User namespaces allow per-namespace mappings of user and group IDs. This 
> means that a process's user and group IDs inside a user namespace can be 
> different from its IDs outside of the namespace. Most notably, a process can 
> have a nonzero user ID outside a namespace while at the same time having a 
> user ID of zero inside the namespace; in other words, the process is 
> unprivileged for operations outside the user namespace but has root 
> privileges inside the namespace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2926) Extend mesos-style.py/cpplint.py to check #include files

2015-10-13 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-2926:
-

Assignee: (was: Paul Brett)

> Extend mesos-style.py/cpplint.py to check #include files
> 
>
> Key: MESOS-2926
> URL: https://issues.apache.org/jira/browse/MESOS-2926
> Project: Mesos
>  Issue Type: Bug
>Reporter: Paul Brett
>
> cpplint.py provides the capability to enforce the style guide requirements 
> for #including everything you use and ordering files based on type, but it 
> does not work for Mesos because we use #include <...> for project files 
> where it expects #include "...".
> We should update the style checker to support our include usage and then turn 
> it on by default in the commit hook.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2853) Report per-container metrics from host egress filter

2015-10-13 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-2853:
-

Assignee: (was: Paul Brett)

> Report per-container metrics from host egress filter
> 
>
> Key: MESOS-2853
> URL: https://issues.apache.org/jira/browse/MESOS-2853
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Paul Brett
>  Labels: twitter
>
> Export in statistics.json the fq_codel flow statistics for each container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2929) Update libprocess #include headers

2015-10-13 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-2929:
-

Assignee: (was: Paul Brett)

> Update libprocess #include headers
> --
>
> Key: MESOS-2929
> URL: https://issues.apache.org/jira/browse/MESOS-2929
> Project: Mesos
>  Issue Type: Bug
>Reporter: Paul Brett
>
> Update libprocess to #include headers for symbols we rely on and reorder to 
> comply with the style guide.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2994) Design doc for creating user namespaces inside containers

2015-10-13 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-2994:
-

Assignee: (was: Paul Brett)

> Design doc for creating user namespaces inside containers
> -
>
> Key: MESOS-2994
> URL: https://issues.apache.org/jira/browse/MESOS-2994
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Paul Brett
>  Labels: twitter
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-1977) Disk Isolator Usage Metrics

2015-10-13 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett reassigned MESOS-1977:
-

Assignee: (was: Paul Brett)

> Disk Isolator Usage  Metrics
> 
>
> Key: MESOS-1977
> URL: https://issues.apache.org/jira/browse/MESOS-1977
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Joris Van Remoortere
>  Labels: mesosphere
>
> Implement just the usage statistics aspect of the block io isolator for the 
> mesos containerizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3720) Tests for Quota support in master

2015-10-13 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-3720:
--

 Summary: Tests for Quota support in master
 Key: MESOS-3720
 URL: https://issues.apache.org/jira/browse/MESOS-3720
 Project: Mesos
  Issue Type: Improvement
  Components: master
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov


Allocator-agnostic tests for quota support in the master. They can be divided 
into several groups (a sketch of one such test follows the list):
* Request validation;
* Satisfiability validation;
* Master failover;
* Persisting in the registry;
* Functionality and quota guarantees.
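
A sketch of what a test in the "request validation" group might look like 
(hypothetical names throughout; compiles against googletest, with the 
validation helper invented for illustration):

{code}
#include <gtest/gtest.h>

#include <string>

// Invented stand-in for the master's quota request validation.
static bool validQuotaRequest(const std::string& role, double cpus) {
  return !role.empty() && cpus > 0.0;
}

TEST(MasterQuotaTest, RejectsInvalidRequest) {
  EXPECT_FALSE(validQuotaRequest("", 1.0));     // Missing role.
  EXPECT_FALSE(validQuotaRequest("dev", 0.0));  // Non-positive guarantee.
  EXPECT_TRUE(validQuotaRequest("dev", 1.0));
}
{code}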



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-191) Add support for multiple disk resources

2015-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-191:
-
Shepherd: Jie Yu

> Add support for multiple disk resources
> ---
>
> Key: MESOS-191
> URL: https://issues.apache.org/jira/browse/MESOS-191
> Project: Mesos
>  Issue Type: Story
>Reporter: Vinod Kone
>  Labels: mesosphere, persistent-volumes
>
> It would be nice to schedule Mesos tasks with fine-grained disk scheduling. 
> The idea is that a slave with multiple spindles would specify spindle-specific 
> config. Mesos would then include this info in its resource offers to 
> frameworks. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1563) Failed to configure on FreeBSD

2015-10-13 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-1563:
--
Shepherd: Ian Downes

> Failed to configure on FreeBSD
> --
>
> Key: MESOS-1563
> URL: https://issues.apache.org/jira/browse/MESOS-1563
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.19.0
> Environment: FreeBSD-10/stable
>Reporter: Dmitry Sivachenko
>
> When trying to configure mesos on FreeBSD, I get the following error:
> configure: Setting up build environment for x86_64 freebsd10.0
> configure: error: "Mesos is currently unsupported on your platform."
> Why? Is there anything really Linux-specific inside? It's written in Java 
> after all.
> And MacOS is supported, but it is rather close to FreeBSD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-191) Add support for multiple disk resources

2015-10-13 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955323#comment-14955323
 ] 

Jie Yu commented on MESOS-191:
--

Sounds good! I'll think about the architecture today and post a high level 
solution for discussion. We can then create a design doc together with details.

> Add support for multiple disk resources
> ---
>
> Key: MESOS-191
> URL: https://issues.apache.org/jira/browse/MESOS-191
> Project: Mesos
>  Issue Type: Story
>Reporter: Vinod Kone
>  Labels: mesosphere, persistent-volumes
>
> It would be nice to schedule Mesos tasks with fine-grained disk scheduling. 
> The idea is that a slave with multiple spindles would specify spindle-specific 
> config. Mesos would then include this info in its resource offers to 
> frameworks. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-191) Add support for multiple disk resources

2015-10-13 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955318#comment-14955318
 ] 

Adam B commented on MESOS-191:
--

+1 I think Jie would make an excellent shepherd.

> Add support for multiple disk resources
> ---
>
> Key: MESOS-191
> URL: https://issues.apache.org/jira/browse/MESOS-191
> Project: Mesos
>  Issue Type: Story
>Reporter: Vinod Kone
>  Labels: mesosphere, persistent-volumes
>
> It would be nice to schedule Mesos tasks with fine-grained disk scheduling. 
> The idea is that a slave with multiple spindles would specify spindle-specific 
> config. Mesos would then include this info in its resource offers to 
> frameworks. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1582) Improve build time.

2015-10-13 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955313#comment-14955313
 ] 

James Peach commented on MESOS-1582:


Here's my plan ... add build support for generating a file containing 
wall-clock build time for each file. Taking the longest compilation times, use 
{{-ftime-report}} with clang and gcc to figure out where the time is being 
spent. Rinse and repeat.

> Improve build time.
> ---
>
> Key: MESOS-1582
> URL: https://issues.apache.org/jira/browse/MESOS-1582
> Project: Mesos
>  Issue Type: Epic
>  Components: build
>Reporter: Benjamin Hindman
>
> The build takes a ridiculously long time unless you have a large, parallel 
> machine. This is a combination of many factors, all of which we'd like to 
> discuss and track here.
> I'd also love to actually track build times so we can get an appreciation of 
> the improvements. Please leave a comment below with your build times!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-191) Add support for multiple disk resources

2015-10-13 Thread David Greenberg (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955312#comment-14955312
 ] 

David Greenberg commented on MESOS-191:
---

[~jieyu] I'd also like to be involved in the continued work on the ticket, but 
this is my first time contributing to Mesos core.

> Add support for multiple disk resources
> ---
>
> Key: MESOS-191
> URL: https://issues.apache.org/jira/browse/MESOS-191
> Project: Mesos
>  Issue Type: Story
>Reporter: Vinod Kone
>  Labels: mesosphere, persistent-volumes
>
> It would be nice to schedule Mesos tasks with fine-grained disk scheduling. 
> The idea is that a slave with multiple spindles would specify spindle-specific 
> config. Mesos would then include this info in its resource offers to 
> frameworks. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3696) Add the possibility to specify filesystems for available resources on slaves

2015-10-13 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955300#comment-14955300
 ] 

Adam B commented on MESOS-3696:
---

[~gbellon], Mesos doesn't support multiple disk resources explicitly yet. We're 
finally starting to move on MESOS-191, a long-standing issue for just that.

Resolving this issue as a duplicate. Please reopen if you disagree.

> Add the possibility to specify filesystems for available resources on slaves
> 
>
> Key: MESOS-3696
> URL: https://issues.apache.org/jira/browse/MESOS-3696
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave, webui
>Affects Versions: 0.24.1
> Environment: Debian 8.2
>Reporter: Grégoire Bellon-Gervais
>
> Hello,
> I'm on Debian 8.2; I installed Mesos using the Mesosphere repository.
> I have 3 slave servers which all have their filesystems built like this:
> - RAID 1 for / with disk size: 29 GB 
> - RAID 5 for /data with disk size: 492 GB
> {quote}
> root@my_server:~# df -hT
> Filesystem Type  Size  Used Avail Use% Mounted on
> /dev/md1   ext4   29G  1.9G   26G   7% /
> udev   devtmpfs   10M 0   10M   0% /dev
> tmpfs  tmpfs 6.3G   97M  6.2G   2% /run
> tmpfs  tmpfs  16G 0   16G   0% /dev/shm
> tmpfs  tmpfs 5.0M 0  5.0M   0% /run/lock
> tmpfs  tmpfs  16G 0   16G   0% /sys/fs/cgroup
> /dev/md5   ext4  492G   70M  467G   1% /data
> {quote}
> But in the web UI, the slaves show only the disk availability for the / 
> filesystem, not /data.
> So, in available resources, I have only 30 GB of disk.
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1563) Failed to configure on FreeBSD

2015-10-13 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955298#comment-14955298
 ] 

Ian Downes commented on MESOS-1563:
---

I can shepherd this. Please submit a review request to reviewboard and include 
me as a reviewer.

Have you also written a FreeBSD port?

> Failed to configure on FreeBSD
> --
>
> Key: MESOS-1563
> URL: https://issues.apache.org/jira/browse/MESOS-1563
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.19.0
> Environment: FreeBSD-10/stable
>Reporter: Dmitry Sivachenko
>
> When trying to configure mesos on FreeBSD, I get the following error:
> configure: Setting up build environment for x86_64 freebsd10.0
> configure: error: "Mesos is currently unsupported on your platform."
> Why? Is there anything really Linux-specific inside? It's written in Java 
> after all.
> And MacOS is supported, but it is rather close to FreeBSD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-191) Add support for multiple disk resources

2015-10-13 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955297#comment-14955297
 ] 

Jie Yu commented on MESOS-191:
--

I would like to shepherd this ticket if there's no objection.

> Add support for multiple disk resources
> ---
>
> Key: MESOS-191
> URL: https://issues.apache.org/jira/browse/MESOS-191
> Project: Mesos
>  Issue Type: Story
>Reporter: Vinod Kone
>  Labels: mesosphere, persistent-volumes
>
> It would be nice to schedule Mesos tasks with fine-grained disk scheduling. 
> The idea is that a slave with multiple spindles would specify spindle-specific 
> config. Mesos would then include this info in its resource offers to 
> frameworks. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-191) Add support for multiple disk resources

2015-10-13 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-191:
-
Summary: Add support for multiple disk resources  (was: Add support for 
disk spindles in resources)

> Add support for multiple disk resources
> ---
>
> Key: MESOS-191
> URL: https://issues.apache.org/jira/browse/MESOS-191
> Project: Mesos
>  Issue Type: Story
>Reporter: Vinod Kone
>  Labels: mesosphere, persistent-volumes
>
> It would be nice to schedule Mesos tasks with fine-grained disk scheduling. 
> The idea is that a slave with multiple spindles would specify spindle-specific 
> config. Mesos would then include this info in its resource offers to 
> frameworks. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3711) Docker containers running as other than root can't access sandbox

2015-10-13 Thread Vasilis Vasaitis (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955179#comment-14955179
 ] 

Vasilis Vasaitis commented on MESOS-3711:
-

Yes, the Mesos agent itself in our setup is typically run as root. And also 
yes, our agent hosts don't really have any user information (they don't 
synchronize with LDAP or anything like that) so there is no expectation that 
the specified user will exist on the host side of the agent. And even if they 
did, the idea here is to shield the creator of the Docker image from the 
configuration of the host side of the agent: the image is created with some 
user inside its own {{/etc/passwd}}, and with various files owned by that user; 
if that username is then baked into the image, and the execution environment 
supports that, then the container can run blissfully unaware of the user setup 
on the host side of the agent. And I'm under the impression that the only 
obstacle that is right now preventing such a setup from working is the 
ownership/permissions of the top-level sandbox, as described above.

Also, I'm not sure if Aurora specifies a user the proper Mesos way, or if it 
passes that information out-of-band to the Thermos executor via its own 
structures. But in any case, having that user information propagated properly is 
kinda orthogonal to the use case I'm describing, because the whole point is to 
make the Docker image self-contained and _independent_ of the user who will be 
running the task, so that the running container will exhibit the same behaviour 
regardless of which user is launching it.

Am I making much sense? Do you think that's a reasonable use case worth 
enabling?

> Docker containers running as other than root can't access sandbox
> -
>
> Key: MESOS-3711
> URL: https://issues.apache.org/jira/browse/MESOS-3711
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Reporter: Vasilis Vasaitis
>
> (Disclaimer: I'm not the one running the Mesos infrastructure in my org, and 
> I don't necessarily fully understand how all the moving parts fit together, 
> so please bear with me if there any gaps in my understanding of the issues at 
> hand.)
> We have a setup here where we deploy Docker-based tasks on Mesos, using 
> Aurora (and thus the Thermos executor, on the agent side). As part of the 
> process of launching a task, it looks like the Mesos agent creates / 
> volume-mounts an {{/mnt/mesos/sandbox}} directory, which is what's used as 
> the task's sandbox. Thermos then creates a {{sandbox}} subdirectory _inside_ 
> that, and the aggregate {{/mnt/mesos/sandbox/sandbox}} is in fact the 
> directory that the user application is given. So far so good.
> Now, Docker has the option, during the creation of a Docker image, to specify 
> the _user_ that any container launched using this image will be run as. This 
> is a useful feature, because often the image is built so that only one 
> particular user has ownership of important files etc. One could of course 
> sidestep this issue by always launching the container as root, but that can 
> be unsavoury for its own reasons.
> However, with the setup I described above, specifying a user for the Docker 
> container quickly goes south: the Thermos executor itself is launched as that 
> user, tries to create that extra {{sandbox}} directory, and fails, because 
> the parent directory is owned by root.
> I won't claim to know whether this is the _best_ approach, but one possible 
> solution to this problem is to chmod 1777 the parent sandbox directory (i.e., 
> set the sticky bit, like {{/tmp}}) after creating it; this way any user can 
> create files/directories under it, without compromising the isolation between 
> users.
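
For concreteness, a minimal sketch of what the suggested workaround amounts to 
(illustration only, not a proposed patch; the path is the one from the report 
above and must already exist):

{code}
#include <sys/stat.h>

#include <cstdio>

int main() {
  const char* sandbox = "/mnt/mesos/sandbox";

  // 1777: rwx for everyone plus the sticky bit, the same mode as /tmp, so a
  // non-root container user can create subdirectories while users still
  // cannot delete each other's files.
  if (chmod(sandbox, S_ISVTX | S_IRWXU | S_IRWXG | S_IRWXO) != 0) {
    perror("chmod");
    return 1;
  }
  return 0;
}
{code}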



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3719) Core dump on /teardown

2015-10-13 Thread Ken Sipe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ken Sipe updated MESOS-3719:

Description: 
Invoked `/master/teardown` for 2 frameworks. A sample invocation (on the 
master node, using mesos-dns) is:
{code}curl -d "frameworkId=20151013-143739-1510211594-5050-1515-0002" -X POST 
http://master.mesos:5050/master/teardown{code}


logs at the master:

{code}
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.677880  1525 http.cpp:321] HTTP POST for /master/teardown from 
10.0.4.90:53789 with User-Agent='curl/7.42.1'
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679015  1525 master.cpp:5112] Removing framework 
20151013-143739-1510211594-5050-1515-0002 (hdfs) at 
scheduler-a5388720-fcbf-4fd0-b01e-75712b12c99d@10.0.4.90:53903
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679280  1525 master.cpp:5576] Updating the latest state of task 
task.journalnode.journalnode.NodeExecutor.1444747955695 of framework 
20151013-143739-1510211594-5050-1515-0002 to TASK_KILLED
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679385  1525 master.cpp:5644] Removing task 
task.journalnode.journalnode.NodeExecutor.1444747955695 with resources 
cpus(*):0.25; mem(*):691.2 of framework 
20151013-143739-1510211594-5050-1515-0002 on
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679404  1521 hierarchical.hpp:814] Recovered cpus(*):0.25; 
mem(*):691.2 (total: ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 
8082-8180, 8182-32000]; cpus(*):4; mem(*):14019; disk(*):32541, a
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679536  1525 master.cpp:5673] Removing executor 
'executor.journalnode.NodeExecutor.1444747955695' with resources cpus(*):0.1; 
mem(*):345.6 of framework 20151013-143739-1510211594-5050-1515-0002 on sl
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679641  1521 hierarchical.hpp:814] Recovered cpus(*):0.1; 
mem(*):345.6 (total: ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 
8082-8180, 8182-32000]; cpus(*):4; mem(*):14019; disk(*):32541, al
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
F1013 15:36:24.679719  1521 sorter.cpp:213] Check failed: 
total.resources.contains(slaveId)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: *** 
Check failure stack trace: ***
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6d86c9fd  google::LogMessage::Fail()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6d86e89d  google::LogMessage::SendToLog()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6d86c5ec  google::LogMessage::Flush()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6d86f1be  google::LogMessageFatal::~LogMessageFatal()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6e3f2910  mesos::internal::master::allocator::DRFSorter::remove()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6e2cc0bc  
mesos::internal::master::allocator::HierarchicalAllocatorProcess<>::removeFramework()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6e977551  process::ProcessManager::resume()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6e97784f  process::internal::schedule()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.684733  1520 http.cpp:321] HTTP GET for /master/state-summary 
from 10.0.4.90:53790 with User-Agent='Python-urllib/3.4'
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6d30ebc3  (unknown)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6cb1266c  (unknown)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6c8552ed  (unknown)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: 
dcos-mesos-master.service: Main process exited, code=killed, status=6/ABRT
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: 
dcos-mesos-master.service: Unit entered failed state.
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: 
dcos-mesos-master.service: Failed with result 'signal'.
Oct 13 15:36:40 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: 
dcos-mesos-master.service: Service hold-off time over, scheduling restart.
Oct 13 15:36:40 ip-10-0-4-90.us-west-2.compute.internal system

[jira] [Updated] (MESOS-3719) Core dump on /teardown

2015-10-13 Thread Ken Sipe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ken Sipe updated MESOS-3719:

Description: 
Invoked `/master/teardown` for 2 frameworks. A sample invocation (on the 
master node, using mesos-dns) is:
`curl -d "frameworkId=20151013-143739-1510211594-5050-1515-0002" -X POST 
http://master.mesos:5050/master/teardown`


logs at the master:

{code}
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.677880  1525 http.cpp:321] HTTP POST for /master/teardown from 
10.0.4.90:53789 with User-Agent='curl/7.42.1'
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679015  1525 master.cpp:5112] Removing framework 
20151013-143739-1510211594-5050-1515-0002 (hdfs) at 
scheduler-a5388720-fcbf-4fd0-b01e-75712b12c99d@10.0.4.90:53903
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679280  1525 master.cpp:5576] Updating the latest state of task 
task.journalnode.journalnode.NodeExecutor.1444747955695 of framework 
20151013-143739-1510211594-5050-1515-0002 to TASK_KILLED
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679385  1525 master.cpp:5644] Removing task 
task.journalnode.journalnode.NodeExecutor.1444747955695 with resources 
cpus(*):0.25; mem(*):691.2 of framework 
20151013-143739-1510211594-5050-1515-0002 on
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679404  1521 hierarchical.hpp:814] Recovered cpus(*):0.25; 
mem(*):691.2 (total: ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 
8082-8180, 8182-32000]; cpus(*):4; mem(*):14019; disk(*):32541, a
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679536  1525 master.cpp:5673] Removing executor 
'executor.journalnode.NodeExecutor.1444747955695' with resources cpus(*):0.1; 
mem(*):345.6 of framework 20151013-143739-1510211594-5050-1515-0002 on sl
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679641  1521 hierarchical.hpp:814] Recovered cpus(*):0.1; 
mem(*):345.6 (total: ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 
8082-8180, 8182-32000]; cpus(*):4; mem(*):14019; disk(*):32541, al
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
F1013 15:36:24.679719  1521 sorter.cpp:213] Check failed: 
total.resources.contains(slaveId)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: *** 
Check failure stack trace: ***
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6d86c9fd  google::LogMessage::Fail()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6d86e89d  google::LogMessage::SendToLog()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6d86c5ec  google::LogMessage::Flush()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6d86f1be  google::LogMessageFatal::~LogMessageFatal()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6e3f2910  mesos::internal::master::allocator::DRFSorter::remove()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6e2cc0bc  
mesos::internal::master::allocator::HierarchicalAllocatorProcess<>::removeFramework()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6e977551  process::ProcessManager::resume()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6e97784f  process::internal::schedule()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.684733  1520 http.cpp:321] HTTP GET for /master/state-summary 
from 10.0.4.90:53790 with User-Agent='Python-urllib/3.4'
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6d30ebc3  (unknown)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6cb1266c  (unknown)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6c8552ed  (unknown)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: 
dcos-mesos-master.service: Main process exited, code=killed, status=6/ABRT
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: 
dcos-mesos-master.service: Unit entered failed state.
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: 
dcos-mesos-master.service: Failed with result 'signal'.
Oct 13 15:36:40 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: 
dcos-mesos-master.service: Service hold-off time over, scheduling restart.
Oct 13 15:36:40 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: Sta

[jira] [Commented] (MESOS-3719) Core dump on /teardown

2015-10-13 Thread Ken Sipe (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955167#comment-14955167
 ] 

Ken Sipe commented on MESOS-3719:
-

I'll see if I can get exact steps to duplicate.
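
For readers unfamiliar with the log format quoted below: {{CHECK}} is glog's 
fatal assertion, so the master aborts with SIGABRT the moment the sorter's 
bookkeeping invariant is violated. A toy illustration (invented data, not the 
DRFSorter code):

{code}
#include <glog/logging.h>

#include <map>
#include <string>

int main(int argc, char** argv) {
  google::InitGoogleLogging(argv[0]);

  std::map<std::string, double> totalResources;  // slaveId -> tracked cpus.
  totalResources["S0"] = 4.0;

  const std::string slaveId = "S1";  // Never added, or already removed.

  // Mirrors the failed check at sorter.cpp:213: when the condition is
  // false, glog logs a fatal message and abort()s, producing a stack trace
  // like the one in this ticket.
  CHECK(totalResources.count(slaveId) > 0)
      << "total.resources.contains(slaveId)";
  return 0;
}
{code}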

> Core dump on /teardown
> --
>
> Key: MESOS-3719
> URL: https://issues.apache.org/jira/browse/MESOS-3719
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.24.1
>Reporter: Ken Sipe
>
> Invoked `/master/teardown` for 2 frameworks. A sample invocation (on the 
> master node, using mesos-dns) is:
> {code}curl -d "frameworkId=20151013-143739-1510211594-5050-1515-0002" -X POST 
> http://master.mesos:5050/master/teardown{code}
> logs at the master:
> {code}
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
> I1013 15:36:24.677880  1525 http.cpp:321] HTTP POST for /master/teardown from 
> 10.0.4.90:53789 with User-Agent='curl/7.42.1'
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
> I1013 15:36:24.679015  1525 master.cpp:5112] Removing framework 
> 20151013-143739-1510211594-5050-1515-0002 (hdfs) at 
> scheduler-a5388720-fcbf-4fd0-b01e-75712b12c99d@10.0.4.90:53903
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
> I1013 15:36:24.679280  1525 master.cpp:5576] Updating the latest state of 
> task task.journalnode.journalnode.NodeExecutor.1444747955695 of framework 
> 20151013-143739-1510211594-5050-1515-0002 to TASK_KILLED
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
> I1013 15:36:24.679385  1525 master.cpp:5644] Removing task 
> task.journalnode.journalnode.NodeExecutor.1444747955695 with resources 
> cpus(*):0.25; mem(*):691.2 of framework 
> 20151013-143739-1510211594-5050-1515-0002 on
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
> I1013 15:36:24.679404  1521 hierarchical.hpp:814] Recovered cpus(*):0.25; 
> mem(*):691.2 (total: ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 
> 8082-8180, 8182-32000]; cpus(*):4; mem(*):14019; disk(*):32541, a
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
> I1013 15:36:24.679536  1525 master.cpp:5673] Removing executor 
> 'executor.journalnode.NodeExecutor.1444747955695' with resources cpus(*):0.1; 
> mem(*):345.6 of framework 20151013-143739-1510211594-5050-1515-0002 on sl
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
> I1013 15:36:24.679641  1521 hierarchical.hpp:814] Recovered cpus(*):0.1; 
> mem(*):345.6 (total: ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 
> 8082-8180, 8182-32000]; cpus(*):4; mem(*):14019; disk(*):32541, al
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
> F1013 15:36:24.679719  1521 sorter.cpp:213] Check failed: 
> total.resources.contains(slaveId)
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
> *** Check failure stack trace: ***
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 
> 0x7fba6d86c9fd  google::LogMessage::Fail()
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 
> 0x7fba6d86e89d  google::LogMessage::SendToLog()
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 
> 0x7fba6d86c5ec  google::LogMessage::Flush()
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 
> 0x7fba6d86f1be  google::LogMessageFatal::~LogMessageFatal()
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 
> 0x7fba6e3f2910  mesos::internal::master::allocator::DRFSorter::remove()
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 
> 0x7fba6e2cc0bc  
> mesos::internal::master::allocator::HierarchicalAllocatorProcess<>::removeFramework()
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 
> 0x7fba6e977551  process::ProcessManager::resume()
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 
> 0x7fba6e97784f  process::internal::schedule()
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
> I1013 15:36:24.684733  1520 http.cpp:321] HTTP GET for /master/state-summary 
> from 10.0.4.90:53790 with User-Agent='Python-urllib/3.4'
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 
> 0x7fba6d30ebc3  (unknown)
> Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 
>

[jira] [Created] (MESOS-3719) Core dump on /teardown

2015-10-13 Thread Ken Sipe (JIRA)
Ken Sipe created MESOS-3719:
---

 Summary: Core dump on /teardown
 Key: MESOS-3719
 URL: https://issues.apache.org/jira/browse/MESOS-3719
 Project: Mesos
  Issue Type: Bug
  Components: master
Affects Versions: 0.24.1
Reporter: Ken Sipe


Invoked `/master/teardown` for 2 frameworks. A sample invocation (on the 
master node, using mesos-dns) is:
`curl -d "frameworkId=20151013-143739-1510211594-5050-1515-0002" -X POST 
http://master.mesos:5050/master/teardown`


logs at the master:

```
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.677880  1525 http.cpp:321] HTTP POST for /master/teardown from 
10.0.4.90:53789 with User-Agent='curl/7.42.1'
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679015  1525 master.cpp:5112] Removing framework 
20151013-143739-1510211594-5050-1515-0002 (hdfs) at 
scheduler-a5388720-fcbf-4fd0-b01e-75712b12c99d@10.0.4.90:53903
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679280  1525 master.cpp:5576] Updating the latest state of task 
task.journalnode.journalnode.NodeExecutor.1444747955695 of framework 
20151013-143739-1510211594-5050-1515-0002 to TASK_KILLED
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679385  1525 master.cpp:5644] Removing task 
task.journalnode.journalnode.NodeExecutor.1444747955695 with resources 
cpus(*):0.25; mem(*):691.2 of framework 
20151013-143739-1510211594-5050-1515-0002 on
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679404  1521 hierarchical.hpp:814] Recovered cpus(*):0.25; 
mem(*):691.2 (total: ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 
8082-8180, 8182-32000]; cpus(*):4; mem(*):14019; disk(*):32541, a
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679536  1525 master.cpp:5673] Removing executor 
'executor.journalnode.NodeExecutor.1444747955695' with resources cpus(*):0.1; 
mem(*):345.6 of framework 20151013-143739-1510211594-5050-1515-0002 on sl
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.679641  1521 hierarchical.hpp:814] Recovered cpus(*):0.1; 
mem(*):345.6 (total: ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 
8082-8180, 8182-32000]; cpus(*):4; mem(*):14019; disk(*):32541, al
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
F1013 15:36:24.679719  1521 sorter.cpp:213] Check failed: 
total.resources.contains(slaveId)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: *** 
Check failure stack trace: ***
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6d86c9fd  google::LogMessage::Fail()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6d86e89d  google::LogMessage::SendToLog()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6d86c5ec  google::LogMessage::Flush()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6d86f1be  google::LogMessageFatal::~LogMessageFatal()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6e3f2910  mesos::internal::master::allocator::DRFSorter::remove()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6e2cc0bc  
mesos::internal::master::allocator::HierarchicalAllocatorProcess<>::removeFramework()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6e977551  process::ProcessManager::resume()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6e97784f  process::internal::schedule()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: 
I1013 15:36:24.684733  1520 http.cpp:321] HTTP GET for /master/state-summary 
from 10.0.4.90:53790 with User-Agent='Python-urllib/3.4'
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6d30ebc3  (unknown)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6cb1266c  (unknown)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @   
  0x7fba6c8552ed  (unknown)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: 
dcos-mesos-master.service: Main process exited, code=killed, status=6/ABRT
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: 
dcos-mesos-master.service: Unit entered failed state.
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: 
dcos-mesos-master.service: Failed with result 'signal'.
Oct 13 15:36:40 ip-10-0-4-90.us-west-2.compute.internal systemd[1]

[jira] [Commented] (MESOS-3554) Allocator changes trigger large re-compiles

2015-10-13 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955140#comment-14955140
 ] 

Joris Van Remoortere commented on MESOS-3554:
-

The latter.
As we have a few working groups working on allocator-related projects, it would 
be great if the burden of adding a diagnostic or trying things out were low.

There are two approaches to resolving this: reducing the inclusion of this 
header file, or making the header file cheap to include.
We chose the latter, as the former can creep back up over time.
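
For illustration only (this is not the actual Mesos change): one common way to 
make such a header cheap to include is the pimpl idiom, where the header 
exposes only declarations and the heavy template machinery moves behind an 
opaque pointer into a single .cpp:

{code}
// allocator.hpp: cheap to include, declarations only.
#include <memory>
#include <string>

class Allocator
{
public:
  Allocator();

  // Defined in allocator.cpp, where Impl is a complete type, so that
  // std::unique_ptr<Impl> can be destroyed.
  ~Allocator();

  void addFramework(const std::string& frameworkId);

private:
  class Impl;                  // Defined only in allocator.cpp.
  std::unique_ptr<Impl> impl;  // Heavy includes stay in the .cpp.
};
{code}

With this shape, changes to the allocator internals recompile one translation 
unit instead of everything that includes the header.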

> Allocator changes trigger large re-compiles
> ---
>
> Key: MESOS-3554
> URL: https://issues.apache.org/jira/browse/MESOS-3554
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>
> Due to the templatized nature of the allocator, even small changes trigger 
> large recompiles of the code-base. This make iterating on changes expensive 
> for developers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-1582) Improve build time.

2015-10-13 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955125#comment-14955125
 ] 

James Peach edited comment on MESOS-1582 at 10/13/15 3:31 PM:
--

One of the worst offenders is the test suite. I experimented a bit with gmock 
[recommendations|https://github.com/google/googletest/blob/master/googlemock/docs/CookBook.md#making-the-compilation-faster]
 but did not measure any improvement.

I'd be happy to take a crack at improving this provided there was a shepherd.


was (Author: jamespeach):
One of the worst offenders is the test suite. I experimented a bit with mock 
[recommendations|https://github.com/google/googletest/blob/master/googlemock/docs/CookBook.md#making-the-compilation-faster]
 but did not measure any improvement.

I'd be happy to take a crack at improving this provided there was a shepherd.

> Improve build time.
> ---
>
> Key: MESOS-1582
> URL: https://issues.apache.org/jira/browse/MESOS-1582
> Project: Mesos
>  Issue Type: Epic
>  Components: build
>Reporter: Benjamin Hindman
>
> The build takes a ridiculously long time unless you have a large, parallel 
> machine. This is a combination of many factors, all of which we'd like to 
> discuss and track here.
> I'd also love to actually track build times so we can get an appreciation of 
> the improvements. Please leave a comment below with your build times!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1582) Improve build time.

2015-10-13 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955125#comment-14955125
 ] 

James Peach commented on MESOS-1582:


One of the worst offenders is the test suite. I experimented a bit with mock 
[recommendations|https://github.com/google/googletest/blob/master/googlemock/docs/CookBook.md#making-the-compilation-faster]
 but did not measure any improvement.

I'd be happy to take a crack at improving this provided there was a shepherd.
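
For reference, the cookbook's main suggestion, sketched here with a 
hypothetical {{MockAllocator}} over an assumed {{Allocator}} interface (this is 
not code from the Mesos tests): declare the mock's constructor and destructor 
in the header but define them in a single .cc file, so gmock's expensive 
template code is instantiated once instead of in every test that includes the 
mock.

{code}
// mock_allocator.hpp
#include <gmock/gmock.h>

class MockAllocator : public Allocator
{
public:
  MockAllocator();           // Declared here...
  virtual ~MockAllocator();  // ...defined out of line in the .cc below.

  MOCK_METHOD1(addFramework, void(const FrameworkID&));
};

// mock_allocator.cc: the one place the heavy gmock internals are compiled.
MockAllocator::MockAllocator() {}
MockAllocator::~MockAllocator() {}
{code}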

> Improve build time.
> ---
>
> Key: MESOS-1582
> URL: https://issues.apache.org/jira/browse/MESOS-1582
> Project: Mesos
>  Issue Type: Epic
>  Components: build
>Reporter: Benjamin Hindman
>
> The build takes a ridiculously long time unless you have a large, parallel 
> machine. This is a combination of many factors, all of which we'd like to 
> discuss and track here.
> I'd also love to actually track build times so we can get an appreciation of 
> the improvements. Please leave a comment below with your build times!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3705) HTTP Pipelining doesn't keep order of requests

2015-10-13 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3705:
--
  Sprint: Mesosphere Sprint 21
Story Points: 3

> HTTP Pipelining doesn't keep order of requests
> --
>
> Key: MESOS-3705
> URL: https://issues.apache.org/jira/browse/MESOS-3705
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 0.24.0
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>  Labels: http, libprocess, mesosphere
>
> [HTTP 1.1 Pipelining|https://en.wikipedia.org/wiki/HTTP_pipelining] describes 
> a mechanism by which multiple HTTP requests can be performed over a single 
> socket. The requirement here is that responses should be sent in the same 
> order as the requests are made.
> Libprocess has some mechanisms built in to deal with pipelining when multiple 
> HTTP requests are made; it is still, however, possible to create a situation 
> in which responses are scrambled with respect to the requests' arrival order.
> Consider the situation in which there are two libprocess processes, 
> {{processA}} and {{processB}}, each running in a different thread, 
> {{thread2}} and {{thread3}} respectively. The 
> [{{ProcessManager}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L374]
>  runs in {{thread1}}.
> {{processA}} is of type {{ProcessA}} which looks roughly as follows:
> {code}
> class ProcessA : public ProcessBase
> {
> public:
>   ProcessA() {}
>   Future<http::Response> foo(const http::Request&) {
>     // … Do something …
>     return http::OK();
>   }
> protected:
>   virtual void initialize() {
>     route("/foo", None(), &ProcessA::foo);
>   }
> };
> {code}
> {{processB}} is of type {{ProcessB}}, which is just like {{ProcessA}} but 
> routes {{"bar"}} instead of {{"foo"}}.
> The situation in which the bug arises is the following:
> # Two requests, one for {{"http://server_uri/(1)/foo"}} and one for 
> {{"http://server_uri/(2)//bar"}} are made over the same socket.
> # The first request arrives at 
> [{{ProcessManager::handle}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2202]
>  which is still running in {{thread1}}. This one creates an {{HttpEvent}} and 
> delivers it to the handler, in this case {{processA}}.
> # 
> [{{ProcessManager::deliver}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2361]
>  enqueues the HTTP event into the {{processA}} queue. This happens in 
> {{thread1}}.
> # The second request arrives at 
> [{{ProcessManager::handle}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2202]
>  which is still running in {{thread1}}. Another {{HttpEvent}} is created and 
> delivered to the handler, in this case {{processB}}.
> # 
> [{{ProcessManager::deliver}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2361]
>  enqueues the HTTP event into the {{processB}} queue. This happens in 
> {{thread1}}.
> # {{Thread2}} is blocked, so {{processA}} cannot handle the first request; it 
> is stuck in the queue.
> # {{Thread3}} is idle, so it picks up the request to {{processB}} immediately.
> # 
> [{{ProcessBase::visit(HttpEvent)}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L3073]
>  is called in {{thread3}}, which in turn 
> [dispatches|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L3106]
>  the response's future to the {{HttpProxy}} associated with the socket the 
> request came in on.
> At the last point the bug is evident: the response to the {{processB}} 
> request will be sent before the response to the {{processA}} request, even if 
> {{processB}}'s handler takes a long time and {{processA}}'s {{foo()}} 
> actually finishes first. The responses are not sent in the order the requests 
> were made.
> h1. Reproducer
> The following is a test which successfully reproduces the issue:
> {code}
> class PipelineScramblerProcess : public Process<PipelineScramblerProcess>
> {
> public:
>   PipelineScramblerProcess()
>     : ProcessBase(ID::generate("PipelineScramblerProcess")) {}
>   void block(const Future<Nothing>& trigger)
>   {
>     trigger.await();
>   }
>   Future<http::Response> get(const http::Request& request)
>   {
>     if (promise_) {
>       promise_->set(Nothing());
>     }
>     return http::OK(self().id);
>   }
>   void setPromise(std::unique_ptr<Promise<Nothing>>& promise)
>   {
>     promise_ = std::move(promise);
>   }
> protected:
>   virtual void initialize()
>   {
>     route("/get", None(), &PipelineScramblerProcess::get);
>   }
> private:
>   std

[jira] [Commented] (MESOS-191) Add support for disk spindles in resources

2015-10-13 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955042#comment-14955042
 ] 

Guangya Liu commented on MESOS-191:
---

Thanks [~dgreenbean], it works for me now ;-)

> Add support for disk spindles in resources
> --
>
> Key: MESOS-191
> URL: https://issues.apache.org/jira/browse/MESOS-191
> Project: Mesos
>  Issue Type: Story
>Reporter: Vinod Kone
>  Labels: mesosphere, persistent-volumes
>
> It would be nice to schedule Mesos tasks with fine-grained disk scheduling. 
> The idea is that a slave with multiple spindles would specify spindle-specific 
> config. Mesos would then include this info in its resource offers to 
> frameworks. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3601) Formalize all headers and metadata for HTTP API Event Stream

2015-10-13 Thread Qian Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Zhang updated MESOS-3601:
--
Description: 
From an HTTP standpoint, the current set of headers returned when connecting to 
the HTTP scheduler API is insufficient. 
{code:title=current headers}
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Date: Wed, 30 Sep 2015 21:07:16 GMT
Content-Type: application/json
{code}

Since the response from Mesos is intended to function as a stream, {{Connection: 
keep-alive}} should be specified so that the connection can remain open.

If RecordIO is going to be applied to the messages, the headers should include 
the information necessary for a client to be able to detect RecordIO and set up 
its response handlers appropriately.

How RecordIO is expressed will come down to the semantics of what is actually 
"Returned" as the response from {{POST /api/v1/scheduler}}.

h4. Proposal
One approach would be to leverage http as much as possible, having a client 
specify an {{Accept-Encoding}} along with the {{Accept}} header to indicate 
that it can handle RecordIO {{Content-Encoding}} of {{Content-Type}} messages.  
(This approach allows for things like gzip to be woven in fairly easily in the 
future)

For this approach I would expect the following:
{code:title=Request}
POST /api/v1/scheduler HTTP/1.1
Host: localhost:5050
Accept: application/x-protobuf
Accept-Encoding: recordio
Content-Type: application/x-protobuf
Content-Length: 35
User-Agent: RxNetty Client
{code}
{code:title=Response}
HTTP/1.1 200 OK
Connection: keep-alive
Transfer-Encoding: chunked
Content-Type: application/x-protobuf
Content-Encoding: recordio
Cache-Control: no-transform
{code}

When Content-Encoding is used, it is recommended to set {{Cache-Control: 
no-transform}} to signal to any proxies that no transformation should be 
applied to the content encoding [Section 14.11 RFC 
2616|http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11].
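
To make the proposed framing concrete, here is a minimal sketch of a recordio 
encoder, assuming the length-prefixed format (record size in base-10 ASCII, a 
newline, then the record bytes). This is an illustration, not the libprocess 
implementation:

{code}
#include <string>

// Frames one record as "<length>\n<bytes>".
std::string encode(const std::string& record)
{
  return std::to_string(record.size()) + "\n" + record;
}

// encode("{\"type\":\"SUBSCRIBED\"}") yields "21\n{\"type\":\"SUBSCRIBED\"}".
{code}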



  was:
From and HTTP standpoint the current set of headers returned when connecting 
to the HTTP scheduler API are insufficient. 
{code:title=current headers}
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Date: Wed, 30 Sep 2015 21:07:16 GMT
Content-Type: application/json
{code}

Since the response from mesos is intended to function as a stream {{Connection: 
keep-alive}} should be specified so that the connection can remain open.

If RecordIO is going to be applied to the messages, the headers should include 
the information necessary for a client to be able to detect RecordIO and setup 
it response handlers appropriately.

How RecordIO is expressed will come down to the semantics of what is actually 
"Returned" as the response from {{POST /api/v1/scheduler}}.

h4. Proposal
One approach would be to leverage http as much as possible, having a client 
specify an {{Accept-Encoding}} along with the {{Accept}} header to indicate 
that it can handle RecordIO {{Content-Encoding}} of {{Content-Type}} messages.  
(This approach allows for things like gzip to be woven in fairly easily in the 
future)

For this approach I would expect the following:
{code:title=Request}
POST /api/v1/scheduler HTTP/1.1
Host: localhost:5050
Accept: application/x-protobuf
Accept-Encoding: recordio
Content-Type: application/x-protobuf
Content-Length: 35
User-Agent: RxNetty Client
{code}
{code:title=Response}
HTTP/1.1 200 OK
Connection: keep-alive
Transfer-Encoding: chunked
Content-Type: application/x-protobuf
Content-Encoding: recordio
Cache-Control: no-transform
{code}

When Content-Encoding is used it is recommended to set {{Cache-Control: 
no-transform}} to signal to any proxies that no transformation should be 
applied to the the content encoding [Section 14.11 RFC 
2616|http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11].




> Formalize all headers and metadata for HTTP API Event Stream
> 
>
> Key: MESOS-3601
> URL: https://issues.apache.org/jira/browse/MESOS-3601
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.24.0
> Environment: Mesos 0.24.0
>Reporter: Ben Whitehead
>  Labels: api, http, wireprotocol
>
> From an HTTP standpoint, the current set of headers returned when connecting 
> to the HTTP scheduler API is insufficient. 
> {code:title=current headers}
> HTTP/1.1 200 OK
> Transfer-Encoding: chunked
> Date: Wed, 30 Sep 2015 21:07:16 GMT
> Content-Type: application/json
> {code}
> Since the response from Mesos is intended to function as a stream, 
> {{Connection: keep-alive}} should be specified so that the connection can 
> remain open.
> If RecordIO is going to be applied to the messages, the headers should 
> include the information necessary for a client to be able to detect RecordIO 
> and set up its response handlers appropriately.
> How RecordIO is

[jira] [Commented] (MESOS-191) Add support for disk spindles in resources

2015-10-13 Thread David Greenberg (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955022#comment-14955022
 ] 

David Greenberg commented on MESOS-191:
---

[~gyliu] sorry about that! I didn't realize that wasn't fully public--I don't 
work at Mesosphere, but I guess since I was invited to the doc, it still needed 
me to click a few more buttons.

> Add support for disk spindles in resources
> --
>
> Key: MESOS-191
> URL: https://issues.apache.org/jira/browse/MESOS-191
> Project: Mesos
>  Issue Type: Story
>Reporter: Vinod Kone
>  Labels: mesosphere, persistent-volumes
>
> It would be nice to schedule Mesos tasks with fine-grained disk scheduling. 
> The idea is that a slave with multiple spindles would specify spindle-specific 
> config. Mesos would then include this info in its resource offers to 
> frameworks. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-191) Add support for disk spindles in resources

2015-10-13 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954986#comment-14954986
 ] 

Guangya Liu commented on MESOS-191:
---

@David Greenberg, can you make the document public?

> Add support for disk spindles in resources
> --
>
> Key: MESOS-191
> URL: https://issues.apache.org/jira/browse/MESOS-191
> Project: Mesos
>  Issue Type: Story
>Reporter: Vinod Kone
>  Labels: mesosphere, persistent-volumes
>
> It would be nice to schedule Mesos tasks with fine-grained disk scheduling. 
> The idea is that a slave with multiple spindles would specify spindle-specific 
> config. Mesos would then include this info in its resource offers to 
> frameworks. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3199) Validate Quota Requests.

2015-10-13 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3199:
--
Issue Type: Epic  (was: Task)

> Validate Quota Requests.
> 
>
> Key: MESOS-3199
> URL: https://issues.apache.org/jira/browse/MESOS-3199
> Project: Mesos
>  Issue Type: Epic
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: mesosphere
>
> We need to validate quota requests in terms of syntax correctness, update 
> Master bookkeeping structures, and persist quota requests in the {{Registry}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3199) Validate Quota Requests.

2015-10-13 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3199:
--
Epic Name: Quota Validation

> Validate Quota Requests.
> 
>
> Key: MESOS-3199
> URL: https://issues.apache.org/jira/browse/MESOS-3199
> Project: Mesos
>  Issue Type: Epic
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: mesosphere
>
> We need to validate quota requests in terms of syntax correctness, update 
> Master bookkeeping structures, and persist quota requests in the {{Registry}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3338) Dynamic reservations are not counted as used resources in the master

2015-10-13 Thread Yong Qiao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954935#comment-14954935
 ] 

Yong Qiao Wang commented on MESOS-3338:
---

In the optimistic offer design, dynamically reserved resources will be treated 
as Reserved Resources rather than Used Resources; Used Resources in that 
design should be the allocated resources.

> Dynamic reservations are not counted as used resources in the master
> 
>
> Key: MESOS-3338
> URL: https://issues.apache.org/jira/browse/MESOS-3338
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, master
>Reporter: Alexander Rukletsov
>Assignee: Guangya Liu
>Priority: Minor
>  Labels: mesosphere, persistent-volumes
>
> Dynamically reserved resources should be considered used or allocated and 
> hence reflected in Mesos bookkeeping structures and {{state.json}}.
> I expanded the {{ReservationTest.ReserveThenUnreserve}} test with the 
> following section:
> {code}
>   // Check that the Master counts the reservation as a used resource.
>   {
>     Future<Response> response =
>       process::http::get(master.get(), "state.json");
>     AWAIT_READY(response);
>     Try<JSON::Object> parse = JSON::parse<JSON::Object>(response.get().body);
>     ASSERT_SOME(parse);
>     Result<JSON::Number> cpus =
>       parse.get().find<JSON::Number>("slaves[0].used_resources.cpus");
>     ASSERT_SOME_EQ(JSON::Number(1), cpus);
>   }
> {code}
> and got
> {noformat}
> ../../../src/tests/reservation_tests.cpp:168: Failure
> Value of: (cpus).get()
>   Actual: 0
> Expected: JSON::Number(1)
> Which is: 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2864) Master should not change the state of a terminal task if it receives another terminal update

2015-10-13 Thread Yong Qiao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954910#comment-14954910
 ] 

Yong Qiao Wang commented on MESOS-2864:
---

Hi [~vinodkone], I have addressed your comments. Any feedback on the updated 
code changes?

> Master should not change the state of a terminal task if it receives another 
> terminal update
> 
>
> Key: MESOS-2864
> URL: https://issues.apache.org/jira/browse/MESOS-2864
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Yong Qiao Wang
>
> Currently, when the master receives a terminal update for an already 
> terminated (but unacknowledged) task it changes the state to the latest 
> update. This is confusing because the slave doesn't change the state of the 
> task in such a case. Master should just forward the update without changing 
> the task state.
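
A minimal sketch of the guard this implies (illustrative, not the actual 
patch; it assumes the {{protobuf::isTerminalState()}} helper from the Mesos 
codebase):

{code}
// In the master's update path, roughly: only record the new state if
// the task is not already terminal; the update is still forwarded to
// the framework either way.
if (!protobuf::isTerminalState(task->state())) {
  task->set_state(status.state());
}
{code}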



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-2864) Master should not change the state of a terminal task if it receives another terminal update

2015-10-13 Thread Yong Qiao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yong Qiao Wang updated MESOS-2864:
--
Comment: was deleted

(was: Hi [~vinodkone], any comments for the added test?)

> Master should not change the state of a terminal task if it receives another 
> terminal update
> 
>
> Key: MESOS-2864
> URL: https://issues.apache.org/jira/browse/MESOS-2864
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Yong Qiao Wang
>
> Currently, when the master receives a terminal update for an already 
> terminated (but unacknowledged) task it changes the state to the latest 
> update. This is confusing because the slave doesn't change the state of the 
> task in such a case. Master should just forward the update without changing 
> the task state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1832) Slave should accept PingSlaveMessage but not "PING" message.

2015-10-13 Thread Yong Qiao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954906#comment-14954906
 ] 

Yong Qiao Wang commented on MESOS-1832:
---

[~vinodkone], 0.25.0 has been released, so can this ticket be fixed now?

> Slave should accept PingSlaveMessage but not "PING" message.
> 
>
> Key: MESOS-1832
> URL: https://issues.apache.org/jira/browse/MESOS-1832
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Yong Qiao Wang
>  Labels: mesosphere
>
> Slave handles both the "PING" message and PingSlaveMessage until 0.22.0 for 
> backwards compatibility (https://reviews.apache.org/r/25867/).
> In 0.23.0, the slave no longer needs to handle "PING".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3709) Modulize the containerizer interface.

2015-10-13 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3709:

Shepherd: Till Toenshoff

> Modulize the containerizer interface.
> -
>
> Key: MESOS-3709
> URL: https://issues.apache.org/jira/browse/MESOS-3709
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Benjamin Bannier
>
> So that people can implement their own containerizer as a module. That's more 
> efficient than having an external containerizer and shelling out. The module 
> system also provides versioning support, which is definitely better than an 
> unversioned external containerizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3709) Modulize the containerizer interface.

2015-10-13 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-3709:
---

Assignee: Benjamin Bannier

> Modulize the containerizer interface.
> -
>
> Key: MESOS-3709
> URL: https://issues.apache.org/jira/browse/MESOS-3709
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Benjamin Bannier
>
> So that people can implement their own containerizer as a module. That's more 
> efficient than having an external containerizer and shelling out. The module 
> system also provides versioning support, which is definitely better than an 
> unversioned external containerizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3338) Dynamic reservations are not counted as used resources in the master

2015-10-13 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954874#comment-14954874
 ] 

Guangya Liu commented on MESOS-3338:


[~alex-mesos] It is still under design, but I think the design is going to be 
finalized in the coming weeks. Please refer to 
https://docs.google.com/document/d/1RGrkDNnfyjpOQVxk_kUFJCalNMqnFlzaMRww7j7HSKU/edit
 for more detail. Thanks.

> Dynamic reservations are not counted as used resources in the master
> 
>
> Key: MESOS-3338
> URL: https://issues.apache.org/jira/browse/MESOS-3338
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, master
>Reporter: Alexander Rukletsov
>Assignee: Guangya Liu
>Priority: Minor
>  Labels: mesosphere, persistent-volumes
>
> Dynamically reserved resources should be considered used or allocated and 
> hence reflected in Mesos bookkeeping structures and {{state.json}}.
> I expanded the {{ReservationTest.ReserveThenUnreserve}} test with the 
> following section:
> {code}
>   // Check that the Master counts the reservation as a used resource.
>   {
>     Future<Response> response =
>       process::http::get(master.get(), "state.json");
>     AWAIT_READY(response);
>     Try<JSON::Object> parse = JSON::parse<JSON::Object>(response.get().body);
>     ASSERT_SOME(parse);
>     Result<JSON::Number> cpus =
>       parse.get().find<JSON::Number>("slaves[0].used_resources.cpus");
>     ASSERT_SOME_EQ(JSON::Number(1), cpus);
>   }
> {code}
> and got
> {noformat}
> ../../../src/tests/reservation_tests.cpp:168: Failure
> Value of: (cpus).get()
>   Actual: 0
> Expected: JSON::Number(1)
> Which is: 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3435) Add Hyper as Mesos Docker alike support

2015-10-13 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954858#comment-14954858
 ] 

Deshi Xiao commented on MESOS-3435:
---

I prefer using a module to implement Hyper support.

> Add Hyper as Mesos Docker alike support
> ---
>
> Key: MESOS-3435
> URL: https://issues.apache.org/jira/browse/MESOS-3435
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Deshi Xiao
>
> Hyper is a hypervisor-agnostic Docker engine; I hope Marathon can support it 
> (https://github.com/mesosphere/marathon/issues/1815).
> https://hyper.sh/
> In an earlier talk with Tim Chen about a possible implementation, he 
> suggested first implementing the engine like mesos-src/docker/docker.hpp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3705) HTTP Pipelining doesn't keep order of requests

2015-10-13 Thread Alexander Rojas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rojas updated MESOS-3705:
---
Description: 
[HTTP 1.1 Pipelining|https://en.wikipedia.org/wiki/HTTP_pipelining] describes a 
mechanism by which multiple HTTP requests can be performed over a single socket. 
The requirement here is that responses should be sent in the same order as the 
requests are made.

Libprocess has some mechanisms built in to deal with pipelining when multiple 
HTTP requests are made; it is still, however, possible to create a situation in 
which responses are scrambled with respect to the requests' arrival order.

Consider the situation in which there are two libprocess processes, 
{{processA}} and {{processB}}, each running in a different thread, {{thread2}} 
and {{thread3}} respectively. The 
[{{ProcessManager}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L374]
 runs in {{thread1}}.

{{processA}} is of type {{ProcessA}} which looks roughly as follows:

{code}
class ProcessA : public ProcessBase
{
public:
  ProcessA() {}

  Future<http::Response> foo(const http::Request&) {
    // … Do something …
    return http::OK();
  }

protected:
  virtual void initialize() {
    route("/foo", None(), &ProcessA::foo);
  }
};
{code}

{{processB}} is of type {{ProcessB}}, which is just like {{ProcessA}} but 
routes {{"bar"}} instead of {{"foo"}}.

The situation in which the bug arises is the following:

# Two requests, one for {{"http://server_uri/(1)/foo"}} and one for 
{{"http://server_uri/(2)//bar"}} are made over the same socket.
# The first request arrives at 
[{{ProcessManager::handle}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2202]
 which is still running in {{thread1}}. This one creates an {{HttpEvent}} and 
delivers it to the handler, in this case {{processA}}.
# 
[{{ProcessManager::deliver}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2361]
 enqueues the HTTP event into the {{processA}} queue. This happens in 
{{thread1}}.
# The second request arrives at 
[{{ProcessManager::handle}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2202]
 which is still running in {{thread1}}. Another {{HttpEvent}} is created and 
delivered to the handler, in this case {{processB}}.
# 
[{{ProcessManager::deliver}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2361]
 enqueues the HTTP event into the {{processB}} queue. This happens in 
{{thread1}}.
# {{Thread2}} is blocked, so {{processA}} cannot handle the first request; it 
is stuck in the queue.
# {{Thread3}} is idle, so it picks up the request to {{processB}} immediately.
# 
[{{ProcessBase::visit(HttpEvent)}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L3073]
 is called in {{thread3}}, which in turn 
[dispatches|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L3106]
 the response's future to the {{HttpProxy}} associated with the socket the 
request came in on.

At the last point the bug is evident: the response to the {{processB}} request 
will be sent before the response to the {{processA}} request, even if 
{{processB}}'s handler takes a long time and {{processA}}'s {{foo()}} actually 
finishes first. The responses are not sent in the order the requests were made.

h1. Reproducer

The following is a test which successfully reproduces the issue:

{code}
class PipelineScramblerProcess : public Process<PipelineScramblerProcess>
{
public:
  PipelineScramblerProcess()
    : ProcessBase(ID::generate("PipelineScramblerProcess")) {}

  void block(const Future<Nothing>& trigger)
  {
    trigger.await();
  }

  Future<http::Response> get(const http::Request& request)
  {
    if (promise_) {
      promise_->set(Nothing());
    }

    return http::OK(self().id);
  }

  void setPromise(std::unique_ptr<Promise<Nothing>>& promise)
  {
    promise_ = std::move(promise);
  }

protected:
  virtual void initialize()
  {
    route("/get", None(), &PipelineScramblerProcess::get);
  }

private:
  std::unique_ptr<Promise<Nothing>> promise_;
};

TEST(HTTPConnectionTest, ComplexPipelining)
{
  PipelineScramblerProcess blocked;
  spawn(blocked);
  PipelineScramblerProcess unblocked;
  spawn(unblocked);

  ASSERT_EQ(blocked.self().address.ip, unblocked.self().address.ip);
  ASSERT_EQ(blocked.self().address.port, unblocked.self().address.port);

  std::unique_ptr<Promise<Nothing>> promise(new Promise<Nothing>());

  // Block the first process so it cannot process the first request until
  // the second request is finished.
  dispatch(blocked, &PipelineScramblerProcess::block, promise->future());

  // Promise will be set once 'fast' serves the second request.
  unblocked.setPromise(promise);

  // 

[jira] [Created] (MESOS-3718) Implement Quota support in allocator

2015-10-13 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-3718:
--

 Summary: Implement Quota support in allocator
 Key: MESOS-3718
 URL: https://issues.apache.org/jira/browse/MESOS-3718
 Project: Mesos
  Issue Type: Bug
  Components: allocation
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov


The built-in Hierarchical DRF allocator should support Quota. This includes 
(but is not limited to): adding, updating, removing, and satisfying quota; 
avoiding both overcommitting resources and handing them to non-quota'ed roles 
in the presence of master failover.

A [design doc for Quota support in the 
Allocator|https://issues.apache.org/jira/browse/MESOS-2937] provides an 
overview of the feature set required to be implemented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3271) SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.

2015-10-13 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954826#comment-14954826
 ] 

Benjamin Bannier commented on MESOS-3271:
-

I wasn't able to reproduce this at all in a vagrant container (6 CPUs, 10 GB 
RAM) on an OS X host. Can you provide any guidance on how to increase the 
failure rate, [~pbrett]? What is the approximate failure rate you are seeing? 

> SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.
> ---
>
> Key: MESOS-3271
> URL: https://issues.apache.org/jira/browse/MESOS-3271
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Paul Brett
> Attachments: build.txt
>
>
> Test failure on Ubuntu 14 configured with {{--disable-java --disable-python 
> --enable-ssl --enable-libevent --enable-optimize --enable-network-isolation}}
> Commit: {{9b78b301469667b5a44f0a351de5f3a71edae499}}
> {code}
> [ RUN  ] SlaveRecoveryTest/0.NonCheckpointingFramework
> I0815 06:41:47.413602 17091 exec.cpp:133] Version: 0.24.0
> I0815 06:41:47.416780 17111 exec.cpp:207] Executor registered on slave 
> 20150815-064146-544909504-51064-12195-S0
> Registered executor on slave1-ubuntu12
> Starting task 044bd49e-2f38-4671-802a-ac6524d61a85
> Forked command at 17114
> sh -c 'sleep 1000'
> [err] event_active called on a non-initialized event 0x7f6b740232d0 (events: 
> 0x2, fd: 21, flags: 0x80)
> *** Aborted at 1439646107 (unix time) try "date -d @1439646107" if you are 
> using GNU date ***
> PC: @ 0x7f6ba512d0d5 (unknown)
> *** SIGABRT (@0x2fa3) received by PID 12195 (TID 0x7f6b9d613700) from PID 
> 12195; stack trace: ***
> @ 0x7f6ba54c4cb0 (unknown)
> @ 0x7f6ba512d0d5 (unknown)
> @ 0x7f6ba513083b (unknown)
> @ 0x7f6ba448e1ba (unknown)
> @ 0x7f6ba448e52b (unknown)
> @ 0x7f6ba447dcc9 (unknown)
> @   0x4c4033 process::internal::run<>()
> @ 0x7f6ba72642ab process::Future<>::discard()
> @ 0x7f6ba72643be process::internal::discard<>()
> @ 0x7f6ba7262298 
> _ZNSt17_Function_handlerIFvvEZNK7process6FutureImE9onDiscardISt5_BindIFPFvNS1_10WeakFutureIsEEES7_RKS3_OT_EUlvE_E9_M_invokeERKSt9_Any_data
> @   0x4c4033 process::internal::run<>()
> @   0x6fa0cb process::Future<>::discard()
> @ 0x7f6ba6fb5736 cgroups::event::Listener::finalize()
> @ 0x7f6ba728fb11 process::ProcessManager::resume()
> @ 0x7f6ba728fe0f process::internal::schedule()
> @ 0x7f6ba5c9d490 (unknown)
> @ 0x7f6ba54bce9a start_thread
> @ 0x7f6ba51ea38d (unknown)
> + /bin/true
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3717) Master recovery in presence of quota

2015-10-13 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-3717:
--

 Summary: Master recovery in presence of quota
 Key: MESOS-3717
 URL: https://issues.apache.org/jira/browse/MESOS-3717
 Project: Mesos
  Issue Type: Bug
  Components: master
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov


Quota complicates master failover in several ways. The new master should 
determine whether it is possible to satisfy the total quota and notify an 
operator in case it's not (imagine simultaneous failures of multiple agents). 
The new master should hint to the allocator how many agents might reconnect in 
the future, to help it decide how to satisfy quota before the majority of 
agents reconnect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3716) Update Allocator interface to support quota

2015-10-13 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov reassigned MESOS-3716:
--

Assignee: Alexander Rukletsov

> Update Allocator interface to support quota
> ---
>
> Key: MESOS-3716
> URL: https://issues.apache.org/jira/browse/MESOS-3716
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> An allocator should be notified when a quota is being set, updated, or 
> removed. Also, to support master failover in the presence of quota, the 
> allocator should be notified about re-registering agents and allocations 
> towards quota.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3716) Update Allocator interface to support quota

2015-10-13 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-3716:
--

 Summary: Update Allocator interface to support quota
 Key: MESOS-3716
 URL: https://issues.apache.org/jira/browse/MESOS-3716
 Project: Mesos
  Issue Type: Bug
  Components: allocation
Reporter: Alexander Rukletsov


An allocator should be notified when a quota is being set, updated, or removed. 
Also, to support master failover in the presence of quota, the allocator should 
be notified about re-registering agents and allocations towards quota.
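
The shape such a change might take, sketched with hypothetical signatures 
(these are not the final API):

{code}
// Hypothetical additions to the allocator interface.
virtual void setQuota(
    const std::string& role,
    const Quota& quota) = 0;

virtual void removeQuota(
    const std::string& role) = 0;
{code}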



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3338) Dynamic reservations are not counted as used resources in the master

2015-10-13 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954806#comment-14954806
 ] 

Alexander Rukletsov commented on MESOS-3338:


What's the status here? [~gyliu], have you made any progress on this issue?

> Dynamic reservations are not counted as used resources in the master
> 
>
> Key: MESOS-3338
> URL: https://issues.apache.org/jira/browse/MESOS-3338
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, master
>Reporter: Alexander Rukletsov
>Assignee: Guangya Liu
>Priority: Minor
>  Labels: mesosphere, persistent-volumes
>
> Dynamically reserved resources should be considered used or allocated and 
> hence reflected in Mesos bookkeeping structures and {{state.json}}.
> I expanded the {{ReservationTest.ReserveThenUnreserve}} test with the 
> following section:
> {code}
>   // Check that the Master counts the reservation as a used resource.
>   {
>     Future<Response> response =
>       process::http::get(master.get(), "state.json");
>     AWAIT_READY(response);
>     Try<JSON::Object> parse = JSON::parse<JSON::Object>(response.get().body);
>     ASSERT_SOME(parse);
>     Result<JSON::Number> cpus =
>       parse.get().find<JSON::Number>("slaves[0].used_resources.cpus");
>     ASSERT_SOME_EQ(JSON::Number(1), cpus);
>   }
> {code}
> and got
> {noformat}
> ../../../src/tests/reservation_tests.cpp:168: Failure
> Value of: (cpus).get()
>   Actual: 0
> Expected: JSON::Number(1)
> Which is: 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3063) Add an example framework using dynamic reservation

2015-10-13 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3063:
--
Labels: mesosphere persistent-volumes  (was: )

> Add an example framework using dynamic reservation
> --
>
> Key: MESOS-3063
> URL: https://issues.apache.org/jira/browse/MESOS-3063
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Park
>Assignee: Klaus Ma
>  Labels: mesosphere, persistent-volumes
>
> An example framework using dynamic reservation should be added to
> # test dynamic reservations further, and
> # be used as a reference for those who want to use the dynamic reservation 
> feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3062) Add authorization for dynamic reservation

2015-10-13 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3062:
--
Labels: mesosphere persistent-volumes  (was: mesosphere)

> Add authorization for dynamic reservation
> -
>
> Key: MESOS-3062
> URL: https://issues.apache.org/jira/browse/MESOS-3062
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere, persistent-volumes
>
> Dynamic reservations should be authorized with the {{principal}} of the 
> reserving entity (framework or master). The idea is to introduce {{Reserve}} 
> and {{Unreserve}} into the ACL.
> {code}
>   message Reserve {
>     // Subjects.
>     required Entity principals = 1;
>
>     // Objects.  MVP: Only possible values = ANY, NONE
>     required Entity resources = 2;
>   }
>
>   message Unreserve {
>     // Subjects.
>     required Entity principals = 1;
>
>     // Objects.
>     required Entity reserver_principals = 2;
>   }
> {code}
> When a framework/operator reserves resources, "reserve" ACLs are checked to 
> see if the framework ({{FrameworkInfo.principal}}) or the operator 
> ({{Credential.user}}) is authorized to reserve the specified resources. If 
> not authorized, the reserve operation is rejected.
> When a framework/operator unreserves resources, "unreserve" ACLs are checked 
> to see if the framework ({{FrameworkInfo.principal}}) or the operator 
> ({{Credential.user}}) is authorized to unreserve the resources reserved by a 
> framework or operator ({{Resource.ReservationInfo.principal}}). If not 
> authorized, the unreserve operation is rejected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3338) Dynamic reservations are not counted as used resources in the master

2015-10-13 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3338:
--
Labels: mesosphere persistent-volumes  (was: )

> Dynamic reservations are not counted as used resources in the master
> 
>
> Key: MESOS-3338
> URL: https://issues.apache.org/jira/browse/MESOS-3338
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, master
>Reporter: Alexander Rukletsov
>Assignee: Guangya Liu
>Priority: Minor
>  Labels: mesosphere, persistent-volumes
>
> Dynamically reserved resources should be considered used or allocated and 
> hence reflected in Mesos bookkeeping structures and {{state.json}}.
> I expanded the {{ReservationTest.ReserveThenUnreserve}} test with the 
> following section:
> {code}
>   // Check that the Master counts the reservation as a used resource.
>   {
>     Future<Response> response =
>       process::http::get(master.get(), "state.json");
>     AWAIT_READY(response);
>     Try<JSON::Object> parse = JSON::parse<JSON::Object>(response.get().body);
>     ASSERT_SOME(parse);
>     Result<JSON::Number> cpus =
>       parse.get().find<JSON::Number>("slaves[0].used_resources.cpus");
>     ASSERT_SOME_EQ(JSON::Number(1), cpus);
>   }
> {code}
> and got
> {noformat}
> ../../../src/tests/reservation_tests.cpp:168: Failure
> Value of: (cpus).get()
>   Actual: 0
> Expected: JSON::Number(1)
> Which is: 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2210) Disallow special characters in role.

2015-10-13 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2210:
--
Labels: mesosphere newbie persistent-volumes  (was: newbie)

> Disallow special characters in role.
> 
>
> Key: MESOS-2210
> URL: https://issues.apache.org/jira/browse/MESOS-2210
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: haosdent
>  Labels: mesosphere, newbie, persistent-volumes
>
> As we introduce persistent volumes in MESOS-1524, we will use roles as 
> directory names on the slave (https://reviews.apache.org/r/28562/). As a 
> result, the master should disallow special characters (like space and slash) 
> in role.
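
A minimal sketch of the kind of check this implies; the rejected character set 
is illustrative, not the final rule:

{code}
#include <string>

// Reject roles that would be unsafe as directory names on the slave.
bool isValidRole(const std::string& role)
{
  return !role.empty() &&
         role != "." &&
         role != ".." &&
         role.find_first_of(" /\\") == std::string::npos;
}
{code}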



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3620) Create slave/containerizer/isolators/filesystem/windows.cpp

2015-10-13 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3620:
--
Labels: mesosphere windows  (was: mesospehre windows)

> Create slave/containerizer/isolators/filesystem/windows.cpp
> ---
>
> Key: MESOS-3620
> URL: https://issues.apache.org/jira/browse/MESOS-3620
> Project: Mesos
>  Issue Type: Task
>  Components: isolation
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesosphere, windows
>
> Should look a lot like the posix.cpp flavor. An important subset of the 
> dependency tree for the posix flavor follows:
> slave/containerizer/isolators/filesystem/posix.cpp: filesystem/posix, fs, os, 
> path
> filesystem/posix: flags, isolator



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2406) Add CLI tool for creating persistent volumes for pre-existing data

2015-10-13 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2406:
--
Labels: mesosphere persistent-volumes  (was: )

> Add CLI tool for creating persistent volumes for pre-existing data
> --
>
> Key: MESOS-2406
> URL: https://issues.apache.org/jira/browse/MESOS-2406
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>  Labels: mesosphere, persistent-volumes
>
> This is for the case where the user has some pre-existing data under a 
> certain directory (e.g., /var/lib/cassandra) and wants to expose that 
> directory as a persistent volume to the framework.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2408) Slave should garbage collect released persistent volumes.

2015-10-13 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2408:
--
Labels: mesosphere persistent-volumes  (was: )

> Slave should garbage collect released persistent volumes.
> -
>
> Key: MESOS-2408
> URL: https://issues.apache.org/jira/browse/MESOS-2408
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: mesosphere, persistent-volumes
>
> This is tricky in the case when a persistence id is re-used. When a 
> persistent volume is destroyed explicitly by the framework, the master 
> deletes all information about this volume. That means the master no longer 
> has the ability to check if the persistence id is re-used (and reject the 
> later attempt). On the slave side, we'll use some GC policy to remove 
> directories associated with deleted persistent volumes (similar to how we GC 
> sandboxes). That means the persistent volume directory won't be deleted 
> immediately when the volume is destroyed by the framework explicitly. When 
> the same persistence id is reused, we'll see that the persistent volume 
> still exists and we need to cancel the GC of that directory (similar to how 
> we cancel the GC for meta directories during runTask).
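
A hedged sketch of the schedule/cancel dance, assuming the slave's 
{{GarbageCollector}} interface ({{schedule()}}/{{unschedule()}}); 
{{volumePath}} is a placeholder:

{code}
// On explicit destroy: schedule the volume directory for delayed
// removal instead of deleting it immediately.
gc.schedule(flags.gc_delay, volumePath);

// On re-use of the same persistence id: cancel the pending removal so
// the existing data survives. The future is true if a removal was
// actually pending.
Future<bool> cancelled = gc.unschedule(volumePath);
{code}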



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3065) Add authorization for persistent volume

2015-10-13 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3065:
--
Labels: mesosphere persistent-volumes  (was: mesosphere)

> Add authorization for persistent volume
> ---
>
> Key: MESOS-3065
> URL: https://issues.apache.org/jira/browse/MESOS-3065
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere, persistent-volumes
>
> Persistent volumes should be authorized with the {{principal}} of the 
> creating entity (framework or master). The idea is to introduce {{Create}} 
> and {{Destroy}} into the ACL.
> {code}
>   message Create {
>     // Subjects.
>     required Entity principals = 1;
>
>     // Objects? Perhaps the kind of volume? allowed permissions?
>   }
>
>   message Destroy {
>     // Subjects.
>     required Entity principals = 1;
>
>     // Objects.
>     required Entity creator_principals = 2;
>   }
> {code}
> When a framework/operator creates a persistent volume, "create" ACLs are 
> checked to see if the framework (FrameworkInfo.principal) or the operator 
> (Credential.user) is authorized to create persistent volumes. If not 
> authorized, the create operation is rejected.
> When a framework/operator destroys a persistent volume, "destroy" ACLs are 
> checked to see if the framework (FrameworkInfo.principal) or the operator 
> (Credential.user) is authorized to destroy the persistent volume created by a 
> framework or operator (Resource.DiskInfo.principal). If not authorized, the 
> destroy operation is rejected.
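
As a purely illustrative model (not Mesos code; all names are hypothetical), 
here is a self-contained C++ sketch of how such a "destroy" ACL might be 
evaluated, matching the requesting principal against the volume creator's 
principal:

{code}
#include <iostream>
#include <set>
#include <string>
#include <vector>

struct DestroyACL {
  std::set<std::string> principals;          // Subjects: who may destroy.
  std::set<std::string> creator_principals;  // Objects: whose volumes.
};

// Returns true if some ACL allows `requester` to destroy a volume
// created by `creator`; otherwise the destroy operation is rejected.
bool authorizedToDestroy(const std::vector<DestroyACL>& acls,
                         const std::string& requester,
                         const std::string& creator) {
  for (const DestroyACL& acl : acls) {
    if (acl.principals.count(requester) > 0 &&
        acl.creator_principals.count(creator) > 0) {
      return true;
    }
  }
  return false;
}

int main() {
  std::vector<DestroyACL> acls = {{{"ops"}, {"cassandra-framework"}}};
  std::cout << std::boolalpha
            << authorizedToDestroy(acls, "ops", "cassandra-framework") << "\n"
            << authorizedToDestroy(acls, "eve", "cassandra-framework")
            << std::endl;  // true, then false
  return 0;
}
{code}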



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2455) Add operator endpoint to destroy persistent volumes.

2015-10-13 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2455:
--
Labels: mesosphere persistent-volumes  (was: mesosphere)

> Add operator endpoint to destroy persistent volumes.
> 
>
> Key: MESOS-2455
> URL: https://issues.apache.org/jira/browse/MESOS-2455
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Michael Park
>Priority: Critical
>  Labels: mesosphere, persistent-volumes
>
> Persistent volumes will not be released automatically, so we probably need an 
> endpoint for operators to forcefully release persistent volumes. We probably 
> also need to add a principal to the Persistence struct and use ACLs to 
> control who can release what.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3064) Add 'principal' field to 'Resource.DiskInfo'

2015-10-13 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3064:
--
Labels: mesosphere persistent-volumes  (was: mesosphere)

> Add 'principal' field to 'Resource.DiskInfo'
> 
>
> Key: MESOS-3064
> URL: https://issues.apache.org/jira/browse/MESOS-3064
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere, persistent-volumes
>
> In order to support authorization for persistent volumes, we should add the 
> {{principal}} to {{Resource.DiskInfo}}, analogous to 
> {{Resource.ReservationInfo.principal}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2610) Add a Java example framework to test persistent volumes.

2015-10-13 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2610:
--
Labels: mesosphere persistent-volumes  (was: )

> Add a Java example framework to test persistent volumes.
> 
>
> Key: MESOS-2610
> URL: https://issues.apache.org/jira/browse/MESOS-2610
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: haosdent
>  Labels: mesosphere, persistent-volumes
>
> We already have a C++ example framework for testing persistent volumes. Since 
> many frameworks are written in Java, we should probably add an example 
> framework in Java too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-191) Add support for disk spindles in resources

2015-10-13 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-191:
-
Labels: mesosphere persistent-volumes  (was: )

> Add support for disk spindles in resources
> --
>
> Key: MESOS-191
> URL: https://issues.apache.org/jira/browse/MESOS-191
> Project: Mesos
>  Issue Type: Story
>Reporter: Vinod Kone
>  Labels: mesosphere, persistent-volumes
>
> It would be nice to schedule Mesos tasks with fine-grained disk scheduling. 
> The idea is that a slave with multiple spindles would specify spindle-specific 
> config. Mesos would then include this info in its resource offers to 
> frameworks. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3401) Add labels to Resources

2015-10-13 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954696#comment-14954696
 ] 

Adam B commented on MESOS-3401:
---

While not as explicitly tied to resources as labels would be, I agree that 
agent attributes are currently sufficient for expressing this metadata. It's a 
little clunkier for custom resource types, but manageable. As long as all 
resources of the same type on a node are fungible, we can defer this issue.

However, we'll need to revisit this question when we add external/cluster-wide 
resources as per MESOS-2728, since those resources are not associated with an 
agent whose attributes can be tied back to them. We could add attributes to 
the "resource provider" instead, but take two separate volumes from the same 
provider: besides interpreting the resource name/id, how is a framework 
supposed to know their different speeds/formats/permissions?

We'd probably also have to revisit this when we implement MESOS-191 to allow 
multiple types of disk resources on a single agent.

> Add labels to Resources
> ---
>
> Key: MESOS-3401
> URL: https://issues.apache.org/jira/browse/MESOS-3401
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Reporter: Adam B
>Assignee: Greg Mann
>  Labels: external-volumes, mesosphere, resources
>
> Similar to how we have added labels to tasks/executors (MESOS-2120), and even 
> FrameworkInfo (MESOS-2841), we should extend Resource to allow arbitrary 
> key/value pairs.
> This could be used to specify that a cpu resource has a certain speed, that a 
> disk resource is SSD, or to express any other metadata about a built-in or 
> custom resource type. Only the scalar quantity will be used for determining 
> fair share in the Mesos allocator. The rest will be passed on to frameworks as 
> info they can use for scheduling decisions.
> This would require changes to how the slave specifies its `--resources` 
> (probably as JSON), how the slave/master reports resources in its web/JSON 
> API, and how resources are offered to frameworks.
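
To make the allocator point concrete, here is a small, self-contained C++ 
model (hypothetical, not Mesos code) in which a resource carries arbitrary 
labels but only its scalar quantity contributes to fair share:

{code}
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct LabeledResource {
  std::string name;                           // e.g. "disk"
  double scalar;                              // quantity used for fair share
  std::map<std::string, std::string> labels;  // opaque metadata for frameworks
};

// Fair share would only sum scalar quantities per resource name; the
// labels are deliberately ignored here and merely passed through.
double totalQuantity(const std::vector<LabeledResource>& resources,
                     const std::string& name) {
  double total = 0.0;
  for (const LabeledResource& r : resources) {
    if (r.name == name) {
      total += r.scalar;
    }
  }
  return total;
}

int main() {
  std::vector<LabeledResource> resources = {
      {"disk", 512.0, {{"type", "ssd"}}},
      {"disk", 2048.0, {{"type", "hdd"}, {"spindle", "2"}}},
  };
  // Both disks count toward the same fair-share pool despite their labels.
  std::cout << totalQuantity(resources, "disk") << std::endl;  // 2560
  return 0;
}
{code}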



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

