[jira] [Commented] (MESOS-4391) docker pull a remote image conflict

2016-01-16 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103064#comment-15103064
 ] 

Timothy Chen commented on MESOS-4391:
-

Concurrent pulls of the same image are handled by the docker daemon and don't 
require any special synchronization on our side. We only need to handle the 
cases that the docker daemon doesn't handle, or particular loads / patterns 
that break it. AFAIK it's just a warning that shows up on the daemon side, 
and from the client side everything still works.
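The daemon-side handling described above can be modeled with a toy sketch (hypothetical, not Docker code): later requests for an in-flight image simply wait on the first, so clients need no coordination of their own.

```python
# Toy model (not Docker source) of why concurrent pulls of one image need
# no client-side synchronization: the daemon tracks in-flight pulls and
# makes later requests wait for the first to finish.
import threading

class ToyDaemon:
    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}      # image -> Event set when the pull is done
        self.pulls_performed = 0  # how many real downloads happened

    def pull(self, image):
        with self._lock:
            event = self._in_flight.get(image)
            first = event is None
            if first:
                event = self._in_flight[image] = threading.Event()
        if first:
            self.pulls_performed += 1  # only the first request downloads
            event.set()
        else:
            event.wait()               # duplicates just wait; no error

daemon = ToyDaemon()
threads = [threading.Thread(target=daemon.pull, args=("solr:latest",))
           for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert daemon.pulls_performed == 1  # three requests, one actual pull
```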

> docker pull a remote image conflict
> ---
>
> Key: MESOS-4391
> URL: https://issues.apache.org/jira/browse/MESOS-4391
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, framework
>Affects Versions: 0.26.0
> Environment: CentOS Linux release 7.2.1511 (Core)
> 3.10.0-327.el7.x86_64
>Reporter: qinlu
>
> I run a docker app with 3 tasks, and the docker image does not exist on the 
> slave, so it must be pulled from docker.io.
> Marathon assigns 2 tasks to run on one slave, and the last on another.
> The log, viewed via journalctl, shows: level=error msg="HTTP 
> Error" err="No such image: solr:latest" statusCode=404.
> There are two processes pulling the image:
> [root@** ~]# ps -ef|grep solr
> root 30113 10735  0 12:17 ?        00:00:00 docker -H 
> unix:///var/run/docker.sock pull solr:latest
> root 30114 10735  0 12:17 ?        00:00:00 docker -H 
> unix:///var/run/docker.sock pull solr:latest



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2091) docs/mesos-developers.guide.md does not mention github pull-requests.

2016-01-16 Thread Disha Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103102#comment-15103102
 ] 

Disha Singh commented on MESOS-2091:


All patches must be posted to Review Board, and a committer must be asked to 
push them to the master branch.
We can't really use GitHub to push changes to master.

> docs/mesos-developers.guide.md does not mention github pull-requests.
> -
>
> Key: MESOS-2091
> URL: https://issues.apache.org/jira/browse/MESOS-2091
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Till Toenshoff
>Priority: Minor
>  Labels: documentation, newbie
>
> Given that we do actually support github pull-requests as a way to suggest 
> patches, our guidelines should be updated accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1108) Update homepage of website to include logos for adopters

2016-01-16 Thread Disha Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103144#comment-15103144
 ] 

Disha Singh commented on MESOS-1108:


Please mention the adopters whose logos should be put up on the home 
page. I would like to work on this issue.

> Update homepage of website to include logos for adopters
> 
>
> Key: MESOS-1108
> URL: https://issues.apache.org/jira/browse/MESOS-1108
> Project: Mesos
>  Issue Type: Task
>  Components: project website
>Reporter: Dave Lester
>
> Increase the presence of adopters on the homepage, possibly featuring a 
> montage of logos. This could be static, or optionally dynamic with 
> side-scrolling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4391) docker pull a remote image conflict

2016-01-16 Thread qinlu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103203#comment-15103203
 ] 

qinlu commented on MESOS-4391:
--

If Mesos uses docker to pull the same image, why not wait for the first 
process to finish?

> docker pull a remote image conflict
> ---
>
> Key: MESOS-4391
> URL: https://issues.apache.org/jira/browse/MESOS-4391
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, framework
>Affects Versions: 0.26.0
> Environment: CentOS Linux release 7.2.1511 (Core)
> 3.10.0-327.el7.x86_64
>Reporter: qinlu
>
> I run a docker app with 3 tasks, and the docker image does not exist on the 
> slave, so it must be pulled from docker.io.
> Marathon assigns 2 tasks to run on one slave, and the last on another.
> The log, viewed via journalctl, shows: level=error msg="HTTP 
> Error" err="No such image: solr:latest" statusCode=404.
> There are two processes pulling the image:
> [root@** ~]# ps -ef|grep solr
> root 30113 10735  0 12:17 ?        00:00:00 docker -H 
> unix:///var/run/docker.sock pull solr:latest
> root 30114 10735  0 12:17 ?        00:00:00 docker -H 
> unix:///var/run/docker.sock pull solr:latest



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3379) LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is failed

2016-01-16 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103069#comment-15103069
 ] 

Timothy Chen commented on MESOS-3379:
-

https://reviews.apache.org/r/42389/

> LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is failed
> --
>
> Key: MESOS-3379
> URL: https://issues.apache.org/jira/browse/MESOS-3379
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>  Labels: flaky-test
> Fix For: 0.27.0
>
>
> {code}
> sudo GLOG_v=1 ./bin/mesos-tests.sh 
> --gtest_filter="LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint"
>  --verbose
> {code}
> This fails on Ubuntu 14.04.
> It is just a problem found while investigating [MESOS-3349 
> PersistentVolumeTest.AccessPersistentVolume fails when run as 
> root.|https://issues.apache.org/jira/browse/MESOS-3349]
> In LinuxFilesystemIsolatorProcess::cleanup, when we read the mount table and 
> umount, we should umount in reverse order. Suppose our mount order is 
> {code}
> mount /tmp/a /tmp/b
> mount /tmp/c /tmp/b/c
> {code}
> Currently our umount logic in cleanup is 
> {code}
> umount /tmp/b
> umount /tmp/b/c <- Wrong
> {code}
> This is why ROOT_VolumeFromHostSandboxMountPoint fails.
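The fix described above amounts to walking the mount table backwards so nested mounts are removed before their parents; a minimal sketch (not Mesos source) of the ordering:

```python
# Sketch (not Mesos source) of the cleanup fix: unmount in the reverse of
# the order the mount table records, so the nested mount (/tmp/b/c) is
# removed before its parent (/tmp/b).
mount_table = ["/tmp/b", "/tmp/b/c"]      # order in which mounts were made
cleanup_order = list(reversed(mount_table))
assert cleanup_order == ["/tmp/b/c", "/tmp/b"]  # child first, then parent
```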



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3349) Removing mount point fails with EBUSY in LinuxFilesystemIsolator.

2016-01-16 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-3349:

Fix Version/s: 0.27.0

> Removing mount point fails with EBUSY in LinuxFilesystemIsolator.
> -
>
> Key: MESOS-3349
> URL: https://issues.apache.org/jira/browse/MESOS-3349
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 14.04, CentOS 5
>Reporter: Benjamin Mahler
>Assignee: Jie Yu
>  Labels: flaky-test
> Fix For: 0.27.0
>
>
> When running the tests as root, we found 
> PersistentVolumeTest.AccessPersistentVolume fails consistently on some 
> platforms.
> {noformat}
> [ RUN  ] PersistentVolumeTest.AccessPersistentVolume
> I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0
> I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 
> 20150901-021726-1828659978-52102-32604-S0
> Registered executor on hostname
> Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3
> sh -c 'echo abc > path1/file'
> Forked command at 39484
> Command exited with status 0 (pid: 39484)
> ../../src/tests/persistent_volume_tests.cpp:579: Failure
> Value of: os::exists(path::join(directory, "path1"))
>   Actual: true
> Expected: false
> [  FAILED  ] PersistentVolumeTest.AccessPersistentVolume (777 ms)
> {noformat}
> Turns out that the 'rmdir' after the 'umount' fails with EBUSY because 
> there are still some references to the mount.
> FYI [~jieyu] [~mcypark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3379) LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is failed

2016-01-16 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-3379:

Fix Version/s: 0.27.0

> LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is failed
> --
>
> Key: MESOS-3379
> URL: https://issues.apache.org/jira/browse/MESOS-3379
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>  Labels: flaky-test
> Fix For: 0.27.0
>
>
> {code}
> sudo GLOG_v=1 ./bin/mesos-tests.sh 
> --gtest_filter="LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint"
>  --verbose
> {code}
> This fails on Ubuntu 14.04.
> It is just a problem found while investigating [MESOS-3349 
> PersistentVolumeTest.AccessPersistentVolume fails when run as 
> root.|https://issues.apache.org/jira/browse/MESOS-3349]
> In LinuxFilesystemIsolatorProcess::cleanup, when we read the mount table and 
> umount, we should umount in reverse order. Suppose our mount order is 
> {code}
> mount /tmp/a /tmp/b
> mount /tmp/c /tmp/b/c
> {code}
> Currently our umount logic in cleanup is 
> {code}
> umount /tmp/b
> umount /tmp/b/c <- Wrong
> {code}
> This is why ROOT_VolumeFromHostSandboxMountPoint fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1111) Add map to Mesos user group page

2016-01-16 Thread Disha Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103201#comment-15103201
 ] 

Disha Singh commented on MESOS-1111:


I would like to work on this issue.
A map can be created online, with links to the different places kept 
displayed; hovering over a location will enlarge it to show the complete 
address.
On the website, the names can be removed so that only the picture is displayed.

> Add map to Mesos user group page
> 
>
> Key: MESOS-1111
> URL: https://issues.apache.org/jira/browse/MESOS-1111
> Project: Mesos
>  Issue Type: Task
>  Components: project website
>Reporter: Dave Lester
>
> This may not be as compelling with currently 2 user groups, but as we become 
> more international with having Mesos User Groups we should add a map to our 
> MUG page: http://mesos.apache.org/community/user-groups/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4391) docker pull a remote image conflict

2016-01-16 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103200#comment-15103200
 ] 

Klaus Ma commented on MESOS-4391:
-

Learned, thanks very much :).

> docker pull a remote image conflict
> ---
>
> Key: MESOS-4391
> URL: https://issues.apache.org/jira/browse/MESOS-4391
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, framework
>Affects Versions: 0.26.0
> Environment: CentOS Linux release 7.2.1511 (Core)
> 3.10.0-327.el7.x86_64
>Reporter: qinlu
>
> I run a docker app with 3 tasks, and the docker image does not exist on the 
> slave, so it must be pulled from docker.io.
> Marathon assigns 2 tasks to run on one slave, and the last on another.
> The log, viewed via journalctl, shows: level=error msg="HTTP 
> Error" err="No such image: solr:latest" statusCode=404.
> There are two processes pulling the image:
> [root@** ~]# ps -ef|grep solr
> root 30113 10735  0 12:17 ?        00:00:00 docker -H 
> unix:///var/run/docker.sock pull solr:latest
> root 30114 10735  0 12:17 ?        00:00:00 docker -H 
> unix:///var/run/docker.sock pull solr:latest



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4399) ReviewBot should ignore a review chain if any of the reviews in the chain is unpublished

2016-01-16 Thread Shuai Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuai Lin reassigned MESOS-4399:


Assignee: Shuai Lin

> ReviewBot should ignore a review chain if any of the reviews in the chain is 
> unpublished
> 
>
> Key: MESOS-4399
> URL: https://issues.apache.org/jira/browse/MESOS-4399
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Shuai Lin
>
> Observed this recently: the review bot was continuously failing on a review 
> chain because one of the reviews in the chain was unpublished (by mistake).
> Instead of failing, the bot should just skip such a chain and move on to 
> other reviews.
> {noformat}
> Verifying review 42241
> Dependent review: https://reviews.apache.org/api/review-requests/42240/ 
> Error handling URL https://reviews.apache.org/api/review-requests/42240/: 
> FORBIDDEN
> git clean -fd
> git reset --hard f2cf6cbb1ca9d04033e293a8b79b97b958a72df7
> Build step 'Execute shell' marked build as failure
> Sending e-mails to: bui...@mesos.apache.org
> Finished: FAILURE
> {noformat}
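The suggested skip-instead-of-fail behavior can be sketched as follows (hypothetical helper names, not the real bot's code; `PermissionError` stands in for the FORBIDDEN response, and 42240/42241 are the review IDs from the log above):

```python
# Hypothetical sketch of the proposed ReviewBot behavior: skip a whole
# review chain when any review in it is unpublished, instead of failing
# the build. Names here are illustrative, not the real bot's code.
def verify_chain(chain, fetch):
    for review_id in chain:
        try:
            fetch(review_id)            # look up the review request
        except PermissionError:         # unpublished -> API says FORBIDDEN
            return "skipped"            # move on to other reviews
    return "verified"

def fetch(review_id):
    # Stub API: review 42240 is the unpublished one from the log.
    if review_id == 42240:
        raise PermissionError("FORBIDDEN")
    return {"id": review_id}

assert verify_chain([42241, 42240], fetch) == "skipped"
assert verify_chain([42241], fetch) == "verified"
```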



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-1111) Add map to Mesos user group page

2016-01-16 Thread Disha Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103201#comment-15103201
 ] 

Disha Singh edited comment on MESOS-1111 at 1/16/16 2:57 PM:
-

I would like to work on this issue.
A map can be created online, with links to the different places kept 
displayed; hovering over a location will enlarge it to show the complete 
address.
On the website, the names can be removed so that only the picture is displayed.
An example of how it would look is the following link, which only shows the 
Albany, NY user group and will take you to the desired link when clicked.
A map showing all locations can be created.


was (Author: dishjira):
I would like to work on this issue.
A map can be created online, with links to the different places kept 
displayed; hovering over a location will enlarge it to show the complete 
address.
On the website, the names can be removed so that only the picture is displayed.

> Add map to Mesos user group page
> 
>
> Key: MESOS-1111
> URL: https://issues.apache.org/jira/browse/MESOS-1111
> Project: Mesos
>  Issue Type: Task
>  Components: project website
>Reporter: Dave Lester
>
> This may not be as compelling with currently 2 user groups, but as we become 
> more international with having Mesos User Groups we should add a map to our 
> MUG page: http://mesos.apache.org/community/user-groups/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-1111) Add map to Mesos user group page

2016-01-16 Thread Disha Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103201#comment-15103201
 ] 

Disha Singh edited comment on MESOS-1111 at 1/16/16 2:58 PM:
-

I would like to work on this issue.
A map can be created online, with links to the different places kept 
displayed; hovering over a location will enlarge it to show the complete 
address.
On the website, the names can be removed so that only the picture is displayed.
The embed code for an example of how it would look (it only shows the 
Albany, NY user group, and clicking takes you to the desired link) uses:
https://widgets.scribblemaps.com/sm/?d=true=true=true=true=true=41.60972649518783=-12.93591919944=2=hybrid=true=true=550=400=Mesos
A map showing all locations can be created.


was (Author: dishjira):
I would like to work on this issue.
A map can be created online, with links to the different places kept 
displayed; hovering over a location will enlarge it to show the complete 
address.
On the website, the names can be removed so that only the picture is displayed.
An example of how it would look is the following link, which only shows the 
Albany, NY user group and will take you to the desired link when clicked.
A map showing all locations can be created.

> Add map to Mesos user group page
> 
>
> Key: MESOS-1111
> URL: https://issues.apache.org/jira/browse/MESOS-1111
> Project: Mesos
>  Issue Type: Task
>  Components: project website
>Reporter: Dave Lester
>
> This may not be as compelling with currently 2 user groups, but as we become 
> more international with having Mesos User Groups we should add a map to our 
> MUG page: http://mesos.apache.org/community/user-groups/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2444) Update mesos presentations documentation

2016-01-16 Thread Disha Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Disha Singh reassigned MESOS-2444:
--

Assignee: Disha Singh  (was: Nancy Ko)

> Update mesos presentations documentation
> 
>
> Key: MESOS-2444
> URL: https://issues.apache.org/jira/browse/MESOS-2444
> Project: Mesos
>  Issue Type: Task
>  Components: documentation, project website
>Reporter: Dave Lester
>Assignee: Disha Singh
>  Labels: newbie
>
> The list of Mesos presentations in `docs/mesos-presentations.md` only 
> reflects presentations as of mid-2014 and could be more comprehensive. It 
> would be great to include additional presentations (both slides and videos) 
> on this page.
> Optionally, the display of content on this page could be improved -- 
> potentially using a table and generating thumbnails for each video/slideshow 
> to make it more visual. If this route is taken, images can be added to 
> docs/images; ideally within a subfolder to organize them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-4404) SlaveTest.HTTPSchedulerSlaveRestart is flaky

2016-01-16 Thread Jian Qiu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Qiu updated MESOS-4404:

Comment: was deleted

(was: Sure, I will take a look at this.)

> SlaveTest.HTTPSchedulerSlaveRestart is flaky
> 
>
> Key: MESOS-4404
> URL: https://issues.apache.org/jira/browse/MESOS-4404
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, slave
>Affects Versions: 0.26.0
> Environment: From the Jenkins CI: gcc,--verbose --enable-libevent 
> --enable-ssl,centos:7,docker
>Reporter: Greg Mann
>  Labels: flaky-test, mesosphere
>
> Saw this failure on the Jenkins CI:
> {code}
> [ RUN  ] SlaveTest.HTTPSchedulerSlaveRestart
> I0115 18:42:25.393354  1762 leveldb.cpp:174] Opened db in 3.456169ms
> I0115 18:42:25.394310  1762 leveldb.cpp:181] Compacted db in 922588ns
> I0115 18:42:25.394361  1762 leveldb.cpp:196] Created db iterator in 18529ns
> I0115 18:42:25.394378  1762 leveldb.cpp:202] Seeked to beginning of db in 
> 1933ns
> I0115 18:42:25.394390  1762 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 280ns
> I0115 18:42:25.394430  1762 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0115 18:42:25.394963  1791 recover.cpp:447] Starting replica recovery
> I0115 18:42:25.395396  1791 recover.cpp:473] Replica is in EMPTY status
> I0115 18:42:25.396589  1795 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (11302)@172.17.0.2:49129
> I0115 18:42:25.397101  1785 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0115 18:42:25.397721  1791 recover.cpp:564] Updating replica status to 
> STARTING
> I0115 18:42:25.398764  1789 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 684584ns
> I0115 18:42:25.398807  1789 replica.cpp:320] Persisted replica status to 
> STARTING
> I0115 18:42:25.398947  1795 master.cpp:374] Master 
> 544823be-76b5-47be-b326-2cd6d6a700b8 (e648fe109cb1) started on 
> 172.17.0.2:49129
> I0115 18:42:25.399209  1788 recover.cpp:473] Replica is in STARTING status
> I0115 18:42:25.398980  1795 master.cpp:376] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/BOGaaq/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/BOGaaq/master" --zk_session_timeout="10secs"
> I0115 18:42:25.399435  1795 master.cpp:421] Master only allowing 
> authenticated frameworks to register
> I0115 18:42:25.399451  1795 master.cpp:426] Master only allowing 
> authenticated slaves to register
> I0115 18:42:25.399461  1795 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/BOGaaq/credentials'
> I0115 18:42:25.399884  1795 master.cpp:466] Using default 'crammd5' 
> authenticator
> I0115 18:42:25.400060  1795 master.cpp:535] Using default 'basic' HTTP 
> authenticator
> I0115 18:42:25.400254  1795 master.cpp:569] Authorization enabled
> I0115 18:42:25.400439  1785 hierarchical.cpp:147] Initialized hierarchical 
> allocator process
> I0115 18:42:25.400470  1789 whitelist_watcher.cpp:77] No whitelist given
> I0115 18:42:25.400656  1792 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (11303)@172.17.0.2:49129
> I0115 18:42:25.400943  1781 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0115 18:42:25.401612  1791 recover.cpp:564] Updating replica status to VOTING
> I0115 18:42:25.402313  1785 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 458849ns
> I0115 18:42:25.402345  1785 replica.cpp:320] Persisted replica status to 
> VOTING
> I0115 18:42:25.402510  1788 recover.cpp:578] Successfully joined the Paxos 
> group
> I0115 18:42:25.402848  1788 recover.cpp:462] Recover process terminated
> I0115 18:42:25.402997  1784 master.cpp:1710] The newly elected leader is 
> master@172.17.0.2:49129 with id 544823be-76b5-47be-b326-2cd6d6a700b8
> I0115 18:42:25.403038  1784 master.cpp:1723] 

[jira] [Commented] (MESOS-4404) SlaveTest.HTTPSchedulerSlaveRestart is flaky

2016-01-16 Thread Jian Qiu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103208#comment-15103208
 ] 

Jian Qiu commented on MESOS-4404:
-

Sure, I will take a look at this.

> SlaveTest.HTTPSchedulerSlaveRestart is flaky
> 
>
> Key: MESOS-4404
> URL: https://issues.apache.org/jira/browse/MESOS-4404
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, slave
>Affects Versions: 0.26.0
> Environment: From the Jenkins CI: gcc,--verbose --enable-libevent 
> --enable-ssl,centos:7,docker
>Reporter: Greg Mann
>  Labels: flaky-test, mesosphere
>
> Saw this failure on the Jenkins CI:
> {code}
> [ RUN  ] SlaveTest.HTTPSchedulerSlaveRestart
> I0115 18:42:25.393354  1762 leveldb.cpp:174] Opened db in 3.456169ms
> I0115 18:42:25.394310  1762 leveldb.cpp:181] Compacted db in 922588ns
> I0115 18:42:25.394361  1762 leveldb.cpp:196] Created db iterator in 18529ns
> I0115 18:42:25.394378  1762 leveldb.cpp:202] Seeked to beginning of db in 
> 1933ns
> I0115 18:42:25.394390  1762 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 280ns
> I0115 18:42:25.394430  1762 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0115 18:42:25.394963  1791 recover.cpp:447] Starting replica recovery
> I0115 18:42:25.395396  1791 recover.cpp:473] Replica is in EMPTY status
> I0115 18:42:25.396589  1795 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (11302)@172.17.0.2:49129
> I0115 18:42:25.397101  1785 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0115 18:42:25.397721  1791 recover.cpp:564] Updating replica status to 
> STARTING
> I0115 18:42:25.398764  1789 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 684584ns
> I0115 18:42:25.398807  1789 replica.cpp:320] Persisted replica status to 
> STARTING
> I0115 18:42:25.398947  1795 master.cpp:374] Master 
> 544823be-76b5-47be-b326-2cd6d6a700b8 (e648fe109cb1) started on 
> 172.17.0.2:49129
> I0115 18:42:25.399209  1788 recover.cpp:473] Replica is in STARTING status
> I0115 18:42:25.398980  1795 master.cpp:376] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/BOGaaq/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/BOGaaq/master" --zk_session_timeout="10secs"
> I0115 18:42:25.399435  1795 master.cpp:421] Master only allowing 
> authenticated frameworks to register
> I0115 18:42:25.399451  1795 master.cpp:426] Master only allowing 
> authenticated slaves to register
> I0115 18:42:25.399461  1795 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/BOGaaq/credentials'
> I0115 18:42:25.399884  1795 master.cpp:466] Using default 'crammd5' 
> authenticator
> I0115 18:42:25.400060  1795 master.cpp:535] Using default 'basic' HTTP 
> authenticator
> I0115 18:42:25.400254  1795 master.cpp:569] Authorization enabled
> I0115 18:42:25.400439  1785 hierarchical.cpp:147] Initialized hierarchical 
> allocator process
> I0115 18:42:25.400470  1789 whitelist_watcher.cpp:77] No whitelist given
> I0115 18:42:25.400656  1792 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (11303)@172.17.0.2:49129
> I0115 18:42:25.400943  1781 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0115 18:42:25.401612  1791 recover.cpp:564] Updating replica status to VOTING
> I0115 18:42:25.402313  1785 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 458849ns
> I0115 18:42:25.402345  1785 replica.cpp:320] Persisted replica status to 
> VOTING
> I0115 18:42:25.402510  1788 recover.cpp:578] Successfully joined the Paxos 
> group
> I0115 18:42:25.402848  1788 recover.cpp:462] Recover process terminated
> I0115 18:42:25.402997  1784 master.cpp:1710] The newly elected leader is 
> master@172.17.0.2:49129 with id 544823be-76b5-47be-b326-2cd6d6a700b8
> I0115 18:42:25.403038  1784 


[jira] [Commented] (MESOS-4249) Mesos fetcher step skipped with MESOS_DOCKER_MESOS_IMAGE flag

2016-01-16 Thread Shuai Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103241#comment-15103241
 ] 

Shuai Lin commented on MESOS-4249:
--

https://reviews.apache.org/r/42390/

> Mesos fetcher step skipped with MESOS_DOCKER_MESOS_IMAGE flag
> -
>
> Key: MESOS-4249
> URL: https://issues.apache.org/jira/browse/MESOS-4249
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.26.0
> Environment: mesos 0.26.0-0.2.145.ubuntu1404
>Reporter: Marica Antonacci
>Assignee: Shuai Lin
>
> The following behaviour has been observed using a dockerized mesos slave.
> If the slave is running inside a docker container with the docker_mesos_image 
> startup flag and you submit the deployment of a dockerized application or job 
> (through Marathon/Chronos), the fetcher step is not performed. On the other 
> hand, if you request the deployment of a non-dockerized application, the URIs 
> are correctly fetched. Moreover, if I don’t provide the docker_mesos_image 
> flag, the fetcher works fine again for both dockerized and non-dockerized 
> applications.
> More details in the user mailing list 
> (https://www.mail-archive.com/user@mesos.apache.org/msg05429.html).





[jira] [Commented] (MESOS-4414) Visualize resource usage per role in the UI

2016-01-16 Thread Ian Babrou (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103394#comment-15103394
 ] 

Ian Babrou commented on MESOS-4414:
---

MESOS-4341 is somewhat related

> Visualize resource usage per role in the UI
> ---
>
> Key: MESOS-4414
> URL: https://issues.apache.org/jira/browse/MESOS-4414
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Ian Babrou
>
> When running Mesos in a multi-tenant environment with several roles and 
> frameworks, it's easy to be tricked into believing you have enough slack 
> capacity. This leads to unexpected delays in scheduling, since you are low on 
> resources for specific roles and the unused chunks of resources are too small.
> While this is also a monitoring issue, I think the Mesos UI can do a better 
> job of visualizing what resources are available per role. I made a CLI tool 
> to do that for myself: https://github.com/bobrik/scrappy
> In addition, it could be worth showing top available resource chunks per 
> resource:
> * Biggest CPU-intensive task that can be scheduled (with mem and disk)
> * Biggest mem-intensive task that can be scheduled (with CPU and disk)
> * etc.
> These things can be monitored and alerted on, but it probably isn't the best 
> solution. Maybe it's a job for a separate service.





[jira] [Created] (MESOS-4414) Visualize resource usage per role in the UI

2016-01-16 Thread Ian Babrou (JIRA)
Ian Babrou created MESOS-4414:
-

 Summary: Visualize resource usage per role in the UI
 Key: MESOS-4414
 URL: https://issues.apache.org/jira/browse/MESOS-4414
 Project: Mesos
  Issue Type: Improvement
  Components: webui
Reporter: Ian Babrou


When running Mesos in a multi-tenant environment with several roles and 
frameworks, it's easy to be tricked into believing you have enough slack 
capacity. This leads to unexpected delays in scheduling, since you are low on 
resources for specific roles and the unused chunks of resources are too small.

While this is also a monitoring issue, I think the Mesos UI can do a better job 
of visualizing what resources are available per role. I made a CLI tool to do 
that for myself: https://github.com/bobrik/scrappy

In addition, it could be worth showing top available resource chunks per 
resource:

* Biggest CPU-intensive task that can be scheduled (with mem and disk)
* Biggest mem-intensive task that can be scheduled (with CPU and disk)
* etc.

These things can be monitored and alerted on, but it probably isn't the best 
solution. Maybe it's a job for a separate service.
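The "top available chunk" idea can be sketched in a few lines of Python. The agent data and field names below are invented for illustration and do not reflect the actual {{/state}} endpoint schema:

```python
# Illustrative sketch: given the free (unallocated) resources reported per
# agent, find, for each resource, the single agent offering the most of it,
# and report the other resources that would come along with that chunk.
# Agent records here are made up; real data would come from the master's
# state endpoint.

def biggest_chunks(agents):
    """Map each resource name to the free-resource vector of the agent
    that offers the most of that resource."""
    chunks = {}
    for resource in ("cpus", "mem", "disk"):
        best = max(agents, key=lambda a: a["free"].get(resource, 0))
        chunks[resource] = dict(best["free"])
    return chunks

agents = [
    {"id": "agent-1", "free": {"cpus": 4.0, "mem": 2048, "disk": 10240}},
    {"id": "agent-2", "free": {"cpus": 0.5, "mem": 30720, "disk": 512}},
]

# The biggest CPU-intensive task that could be scheduled, with the mem and
# disk that agent can offer alongside it:
print(biggest_chunks(agents)["cpus"])
```

This is essentially what a per-role view in the UI (or a separate service) would compute, restricted to the resources offered to each role.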





[jira] [Created] (MESOS-4415) Implement stout/os/windows/rmdir.hpp

2016-01-16 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-4415:
---

 Summary: Implement stout/os/windows/rmdir.hpp
 Key: MESOS-4415
 URL: https://issues.apache.org/jira/browse/MESOS-4415
 Project: Mesos
  Issue Type: Task
  Components: stout
Reporter: Joris Van Remoortere
Assignee: Alex Clemmer
 Fix For: 0.27.0








[jira] [Commented] (MESOS-4382) Change the `principal` in `ReservationInfo` to optional

2016-01-16 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103438#comment-15103438
 ] 

Jie Yu commented on MESOS-4382:
---

commit 1b56d99bfa8f64b81413908987f11539d4978a61
Author: Greg Mann 
Date:   Sat Jan 16 12:45:59 2016 -0800

Documented endpoint failure when HTTP authentication disabled.

The `/reserve` and `/unreserve` endpoints currently do not work when
HTTP authentication is disabled. To enable correct behavior, the
`principal` field of `ReservationInfo` is being migrated from `required`
to `optional`. This patch documents this behavior of these endpoints.

Review: https://reviews.apache.org/r/42336/

commit 1dd713516d4f5e175ce45086263d71717bd9591e
Author: Greg Mann 
Date:   Sat Jan 16 12:40:02 2016 -0800

Changed 'ReservationInfo.principal' from required to optional.

In order to allow dynamic reservation without a principal, this field is
being changed to optional. However, the current patch alters the master
to invalidate any reserve operations that do not set this field. After a
deprecation cycle, the master will allow the field to be unset.

Review: https://reviews.apache.org/r/42334/

> Change the `principal` in `ReservationInfo` to optional
> ---
>
> Key: MESOS-4382
> URL: https://issues.apache.org/jira/browse/MESOS-4382
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere, reservations
>
> With the addition of HTTP endpoints for {{/reserve}} and {{/unreserve}}, it 
> is now desirable to allow dynamic reservations without a principal, in the 
> case where HTTP authentication is disabled. To allow for this, we will change 
> the {{principal}} field in {{ReservationInfo}} from required to optional. For 
> backwards-compatibility, however, the master should currently invalidate any 
> {{ReservationInfo}} messages that do not have this field set.





[jira] [Updated] (MESOS-4310) Disable support for --switch-user on Windows.

2016-01-16 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4310:

Summary: Disable support for --switch-user on Windows.  (was: Add support 
for --switch-user on Windows.)

> Disable support for --switch-user on Windows.
> -
>
> Key: MESOS-4310
> URL: https://issues.apache.org/jira/browse/MESOS-4310
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesosphere, slave
> Fix For: 0.27.0
>
>






[jira] [Updated] (MESOS-3645) Implement stout/os/windows/stat.hpp

2016-01-16 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3645:

Shepherd: Michael Park  (was: Joris Van Remoortere)

> Implement stout/os/windows/stat.hpp
> ---
>
> Key: MESOS-3645
> URL: https://issues.apache.org/jira/browse/MESOS-3645
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesosphere, windows
>






[jira] [Commented] (MESOS-3877) Draft operator documentation for quota

2016-01-16 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103486#comment-15103486
 ] 

Joris Van Remoortere commented on MESOS-3877:
-

{code}
commit e4bc88986276c85897a063707a8d0fc905dda692
Author: Joerg Schad 
Date:   Sat Jan 16 18:29:07 2016 -0500

Quota: Added operator documentation.

Review: https://reviews.apache.org/r/42040/
{code}

> Draft operator documentation for quota
> --
>
> Key: MESOS-3877
> URL: https://issues.apache.org/jira/browse/MESOS-3877
> Project: Mesos
>  Issue Type: Task
>  Components: documentation
>Reporter: Alexander Rukletsov
>Assignee: Joerg Schad
>  Labels: mesosphere
>
> Draft an operator guide for quota which describes basic usage of the 
> endpoints and a few basic and advanced use cases.





[jira] [Comment Edited] (MESOS-4314) Publish Quota Documentation

2016-01-16 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103487#comment-15103487
 ] 

Joris Van Remoortere edited comment on MESOS-4314 at 1/16/16 11:31 PM:
---

{code}
commit e4bc88986276c85897a063707a8d0fc905dda692
Author: Joerg Schad 
Date:   Sat Jan 16 18:29:07 2016 -0500

Quota: Added operator documentation.

Review: https://reviews.apache.org/r/42040/

{code}


was (Author: jvanremoortere):
{code}
{code}

> Publish Quota Documentation
> ---
>
> Key: MESOS-4314
> URL: https://issues.apache.org/jira/browse/MESOS-4314
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Joerg Schad
>Assignee: Joerg Schad
> Fix For: 0.27.0
>
>
> Publish and finish the draft operator guide for quota, which describes basic 
> usage of the endpoints and a few basic and advanced use cases.





[jira] [Updated] (MESOS-4314) Publish Quota Documentation

2016-01-16 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4314:

Shepherd: Joris Van Remoortere  (was: Bernd Mathiske)
  Labels: mesosphere quota  (was: )

> Publish Quota Documentation
> ---
>
> Key: MESOS-4314
> URL: https://issues.apache.org/jira/browse/MESOS-4314
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: mesosphere, quota
> Fix For: 0.27.0
>
>
> Publish and finish the draft operator guide for quota, which describes basic 
> usage of the endpoints and a few basic and advanced use cases.





[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery

2016-01-16 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103504#comment-15103504
 ] 

Neil Conway commented on MESOS-3870:


See also {{Slave::checkpointResources}}, which has a long comment (and some 
workaround code) to deal with the fact that ordered message delivery is not 
currently guaranteed.

> Prevent out-of-order libprocess message delivery
> 
>
> Key: MESOS-3870
> URL: https://issues.apache.org/jira/browse/MESOS-3870
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Neil Conway
>Priority: Minor
>  Labels: mesosphere
>
> I was under the impression that {{send()}} provided in-order, unreliable 
> message delivery. So if P1 sends {{<M1, M2>}} to P2, P2 might see {{<>}}, 
> {{<M1>}}, {{<M2>}}, or {{<M1, M2>}} — but not {{<M2, M1>}}.
> I suspect much of the code makes a similar assumption. However, it appears 
> that this behavior is not guaranteed. slave.cpp:2217 has the following 
> comment:
> {noformat}
>   // TODO(jieyu): Here we assume that CheckpointResourcesMessages are
>   // ordered (i.e., slave receives them in the same order master sends
>   // them). This should be true in most of the cases because TCP
>   // enforces in order delivery per connection. However, the ordering
>   // is technically not guaranteed because master creates multiple
>   // connections to the slave in some cases (e.g., persistent socket
>   // to slave breaks and master uses ephemeral socket). This could
>   // potentially be solved by using a version number and rejecting
>   // stale messages according to the version number.
> {noformat}
> We can improve this situation by _either_: (1) fixing libprocess to guarantee 
> ordered message delivery, e.g., by adding a sequence number, or (2) 
> clarifying that ordered message delivery is not guaranteed, and ideally 
> providing a tool to force messages to be delivered out-of-order.
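Option (1) from the description could be sketched as follows. This is purely illustrative Python, not libprocess code, and all names are made up: a per-sender sequence number lets the receiver reject anything stale, so a reconnect (e.g., the master falling back to an ephemeral socket) cannot reorder delivery.

```python
# Illustrative sketch of sequence-number-based rejection of stale messages.
# Each sender tags messages with a monotonically increasing sequence number;
# the receiver drops anything at or below the highest number already seen.

class OrderedReceiver:
    def __init__(self):
        self.last_seq = {}  # sender -> highest sequence number seen so far

    def receive(self, sender, seq, message):
        """Deliver `message` only if it is newer than everything already
        seen from `sender`; otherwise discard it as stale or duplicate."""
        if seq <= self.last_seq.get(sender, -1):
            return None  # stale: a reconnect replayed or reordered it
        self.last_seq[sender] = seq
        return message

r = OrderedReceiver()
r.receive("master", 0, "CheckpointResources v1")  # delivered
r.receive("master", 2, "CheckpointResources v3")  # delivered
r.receive("master", 1, "CheckpointResources v2")  # dropped as stale
```

Note this gives "newest wins" semantics, which matches the version-number idea in the {{Slave::checkpointResources}} comment; guaranteeing delivery of every message in order would additionally require buffering and retransmission.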





[jira] [Commented] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.

2016-01-16 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103571#comment-15103571
 ] 

Qian Zhang commented on MESOS-4392:
---

{quote}
An oversubscribed resource cannot be converted into a non-revocable resource, 
not even by preemption. In contrast, a non-oversubscribed, revocable resource 
can be converted into a non-revocable resource.
{quote}
So that means the oversubscribed resources (reported by the resource estimator) 
should always be used by frameworks as revocable resources, but the resources 
set aside to satisfy quota can temporarily be used by a framework of a 
non-quota'ed role as revocable resources; once a framework of the quota'ed role 
needs them, we should revoke them from the framework of the non-quota'ed role 
and offer them to the framework of the quota'ed role as non-revocable 
resources, right?

> Balance quota frameworks with non-quota, greedy frameworks.
> ---
>
> Key: MESOS-4392
> URL: https://issues.apache.org/jira/browse/MESOS-4392
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, master
>Reporter: Bernd Mathiske
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> Maximize resource utilization and minimize starvation risk for both quota 
> frameworks and non-quota, greedy frameworks when competing with each other.
> A greedy analytics batch system wants to use as much of the cluster as 
> possible to maximize computational throughput. When a competing web service 
> with fixed task size starts up, there must be sufficient resources to run it 
> immediately. The operator can reserve these resources by setting quota. 
> However, if these resources are kept idle until the service is in use, this 
> is wasteful from the analytics job's point of view. On the other hand, the 
> analytics job should hand back reserved resources to the service when needed 
> to avoid starvation of the latter.
> We can assume that often, the resources needed by the service will be of the 
> non-revocable variety. Here we need to introduce clearer distinctions between 
> oversubscribed and revocable resources that are not oversubscribed. An 
> oversubscribed resource cannot be converted into a non-revocable resource, 
> not even by preemption. In contrast, a non-oversubscribed, revocable resource 
> can be converted into a non-revocable resource.
> Another related topic is optimistic offers. The pertinent aspect in this 
> context is again whether resources are oversubscribed or not.





[jira] [Commented] (MESOS-4403) Check paths in DiskInfo.Source.Path exist during slave initialization.

2016-01-16 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103585#comment-15103585
 ] 

Qian Zhang commented on MESOS-4403:
---

[~jieyu], if we want to do the check during slave initialization, then that 
means DiskInfo.Source.Path should be a command-line option of the slave, right? 
So are we going to introduce a new slave command-line option for that?

> Check paths in DiskInfo.Source.Path exist during slave initialization.
> --
>
> Key: MESOS-4403
> URL: https://issues.apache.org/jira/browse/MESOS-4403
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>
> We have two options here. We can either check and fail if the path does not 
> exist, or we can create it if it does not exist, as we did for slave.work_dir.





[jira] [Commented] (MESOS-4403) Check paths in DiskInfo.Source.Path exist during slave initialization.

2016-01-16 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103592#comment-15103592
 ] 

Jie Yu commented on MESOS-4403:
---

Mesos supports passing {{--resources}} using a JSON file. The 
DiskInfo.Source.Path can be specified there.
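Such a {{--resources}} file might look like the sketch below. The field names follow the JSON form of the {{Resource}} protobuf as I understand it, so treat the exact layout as an illustration to verify against the Mesos documentation; the path itself is a placeholder.

```python
import json

# Sketch of a --resources JSON file carrying a disk resource backed by a
# DiskInfo.Source of type PATH. The layout mirrors the JSON form of the
# Resource protobuf; verify field names against the Mesos docs before use.
resources = [
    {
        "name": "disk",
        "type": "SCALAR",
        "scalar": {"value": 10240},
        "disk": {
            "source": {
                "type": "PATH",
                "path": {"root": "/mnt/data"},  # placeholder mount point
            }
        },
    }
]

text = json.dumps(resources, indent=2)
print(text)
# The agent might then be started with something like:
#   --resources=file:///path/to/resources.json
```

The slave could then validate (or create) each {{path.root}} it finds in this structure at startup.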

> Check paths in DiskInfo.Source.Path exist during slave initialization.
> --
>
> Key: MESOS-4403
> URL: https://issues.apache.org/jira/browse/MESOS-4403
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>
> We have two options here. We can either check and fail if the path does not 
> exist, or we can create it if it does not exist, as we did for slave.work_dir.





[jira] [Commented] (MESOS-4403) Check paths in DiskInfo.Source.Path exist during slave initialization.

2016-01-16 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103625#comment-15103625
 ] 

Qian Zhang commented on MESOS-4403:
---

Got it, thanks [~jieyu].

> Check paths in DiskInfo.Source.Path exist during slave initialization.
> --
>
> Key: MESOS-4403
> URL: https://issues.apache.org/jira/browse/MESOS-4403
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>
> We have two options here. We can either check and fail if the path does not 
> exist, or we can create it if it does not exist, as we did for slave.work_dir.





[jira] [Updated] (MESOS-4355) Implement isolator for Docker volume

2016-01-16 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4355:
--
Component/s: docker

> Implement isolator for Docker volume
> 
>
> Key: MESOS-4355
> URL: https://issues.apache.org/jira/browse/MESOS-4355
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker, isolation
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>
> In Docker, a user can create a volume with the Docker CLI, e.g., {{docker 
> volume create --name my-volume}}. We need to implement an isolator so that 
> containers launched by the MesosContainerizer can use such a volume.
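One piece such an isolator would need is the volume's host path, so it can bind-mount it into the container. Assuming the isolator shells out to {{docker volume inspect}}, which prints a JSON array whose entries carry a {{Mountpoint}} field, extracting that path might look like this; the inspect output below is canned and the helper name is made up:

```python
import json

# Illustrative helper: pull the host-side Mountpoint of a named Docker
# volume out of `docker volume inspect <name>` output. In a real isolator
# the JSON would come from running that command; here it is canned.

def volume_mountpoint(inspect_output, name):
    for entry in json.loads(inspect_output):
        if entry.get("Name") == name:
            return entry["Mountpoint"]
    raise KeyError("no such volume: %s" % name)

canned = json.dumps([{
    "Name": "my-volume",
    "Driver": "local",
    "Mountpoint": "/var/lib/docker/volumes/my-volume/_data",
}])

# The isolator would then bind-mount this path into the container's rootfs.
print(volume_mountpoint(canned, "my-volume"))
```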





[jira] [Updated] (MESOS-4354) Implement isolator for Docker network

2016-01-16 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4354:
--
Component/s: docker

> Implement isolator for Docker network
> -
>
> Key: MESOS-4354
> URL: https://issues.apache.org/jira/browse/MESOS-4354
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker, isolation
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>
> In Docker, a user can create a network with the Docker CLI, e.g., {{docker 
> network create my-network}}. We need to implement an isolator so that 
> containers launched by the MesosContainerizer can use such a network.


