[jira] [Commented] (MESOS-1607) Introduce optimistic offers.
[ https://issues.apache.org/jira/browse/MESOS-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966265#comment-14966265 ]

Bharath Ravi Kumar commented on MESOS-1607:
-------------------------------------------

Is there a particular milestone that this feature is being targeted for (tentatively)? It would be useful to know. Thanks.

> Introduce optimistic offers.
> ----------------------------
>
>                 Key: MESOS-1607
>                 URL: https://issues.apache.org/jira/browse/MESOS-1607
>             Project: Mesos
>          Issue Type: Epic
>          Components: allocation, framework, master
>            Reporter: Benjamin Hindman
>            Assignee: Artem Harutyunyan
>              Labels: mesosphere
>         Attachments: optimisitic-offers.pdf
>
> *Background*
>
> The current implementation of resource offers only enables a single framework scheduler to make scheduling decisions for some available resources at a time. In some circumstances this is good, i.e., when we don't want other framework schedulers to have access to some resources. In other circumstances, however, there are advantages to letting multiple framework schedulers attempt to make scheduling decisions for the _same_ allocation of resources in parallel.
>
> From a "concurrency control" perspective, the current implementation of resource offers is _pessimistic_: the resources contained within an offer are _locked_ until the framework scheduler they were offered to launches tasks with them or declines them. In addition to making pessimistic offers, we'd like to give out _optimistic_ offers, where the same resources are offered to multiple framework schedulers at the same time, and framework schedulers "compete" for those resources on a first-come-first-served basis (i.e., the first to launch a task "wins"). We've always reserved the right to rescind resource offers using the 'rescind' primitive in the API, and a framework scheduler should be prepared to launch a task and have that task go lost because another framework already started to use those resources.
>
> *Feature*
>
> We plan to take a step towards optimistic offers by introducing primitives that allow resources to be offered to multiple frameworks at once. At first, we will use these primitives to optimistically allocate resources that are reserved for a particular framework/role but have not been allocated by that framework/role.
>
> The work on optimistic offers will closely resemble the existing oversubscription feature. Optimistically offered resources are likely to be considered "revocable resources" (the concept that using resources not reserved for you means you might get those resources revoked). In effect, we may create something like a "spot" market for unused resources, driving up utilization by letting frameworks that are willing to use revocable resources run tasks.
>
> *Future Work*
>
> This ticket tracks the introduction of some aspects of optimistic offers. Taken to the limit, one could imagine always making optimistic resource offers. This bears a striking resemblance to the Google Omega model (an isomorphism, even). However, being able to configure which resources should be allocated optimistically and which pessimistically gives even more control to a datacenter/cluster operator who might want to, for example, never let multiple frameworks (roles) compete for some set of resources.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
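The pessimistic-versus-optimistic distinction described in the ticket can be sketched as a toy allocator. This is illustrative Python, not Mesos code, and all names are hypothetical: with an optimistic offer the same resources go to several schedulers at once, the first to launch wins, and the loser's launch fails as if the task were lost.

```python
# Toy model of pessimistic vs. optimistic offer semantics (not Mesos code).

class Allocator:
    def __init__(self, cpus):
        self.available = cpus      # unallocated cpus on one agent
        self.outstanding = set()   # frameworks currently holding the offer

    def offer_optimistically(self, frameworks):
        # Offer the SAME resources to every framework at once.
        self.outstanding = set(frameworks)
        return self.available

    def launch(self, framework, cpus):
        # First-come-first-served: a launch succeeds only while the
        # resources are still unclaimed.
        if framework not in self.outstanding or cpus > self.available:
            return "TASK_LOST"     # another framework won the race
        self.available -= cpus
        self.outstanding.discard(framework)
        return "TASK_RUNNING"

allocator = Allocator(cpus=4)
allocator.offer_optimistically(["framework-a", "framework-b"])
print(allocator.launch("framework-a", 4))  # first launch wins: TASK_RUNNING
print(allocator.launch("framework-b", 4))  # loses the race: TASK_LOST
```

A pessimistic offer is the degenerate case where `offer_optimistically` is called with a single framework, so no race is possible.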
[jira] [Commented] (MESOS-3768) slave crashes on master reboot and tasks got stopped
[ https://issues.apache.org/jira/browse/MESOS-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966175#comment-14966175 ]

Neil Conway commented on MESOS-3768:
------------------------------------

At first glance, this looks similar to [MESOS-2186].

> slave crashes on master reboot and tasks got stopped
> ----------------------------------------------------
>
>                 Key: MESOS-3768
>                 URL: https://issues.apache.org/jira/browse/MESOS-3768
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.24.0
>            Reporter: Johannes Ziemke
>
> Hi, in my 3-master-node cluster, I rebooted the leading master, which caused several slaves to crash. Besides that, about half of all tasks in the cluster got stopped in the process. After some time, the cluster became stable again.
>
> Slave log: https://gist.github.com/anonymous/f506c79ce63c5c934477
> Master log: https://gist.github.com/anonymous/12e8aa2529b19b226425

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (MESOS-3774) Migrate Future tests from process_tests.cpp to future_tests.cpp
[ https://issues.apache.org/jira/browse/MESOS-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neil Conway updated MESOS-3774:
-------------------------------
    Labels: mesosphere newbie testing  (was: )

> Migrate Future tests from process_tests.cpp to future_tests.cpp
> ---------------------------------------------------------------
>
>                 Key: MESOS-3774
>                 URL: https://issues.apache.org/jira/browse/MESOS-3774
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Gilbert Song
>            Priority: Minor
>              Labels: mesosphere, newbie, testing
>
> Currently we do not have many `Future` tests in /mesos/3rdparty/libprocess/src/tests/future_tests.cpp. It would be clearer to move all future-related tests from /mesos/3rdparty/libprocess/src/tests/process_tests.cpp to /mesos/3rdparty/libprocess/src/tests/future_tests.cpp.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container
[ https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966162#comment-14966162 ]

haosdent commented on MESOS-3738:
---------------------------------

I also updated your Dockerfile, [~rafaelcapucho], to apply the patch; see https://paste.ee/p/8u3eL .

> Mesos health check is invoked incorrectly when Mesos slave is within the docker container
> -----------------------------------------------------------------------------------------
>
>                 Key: MESOS-3738
>                 URL: https://issues.apache.org/jira/browse/MESOS-3738
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 0.25.0
>         Environment: Docker 1.8.0:
>                      Client:
>                        Version: 1.8.0
>                        API version: 1.20
>                        Go version: go1.4.2
>                        Git commit: 0d03096
>                        Built: Tue Aug 11 16:48:39 UTC 2015
>                        OS/Arch: linux/amd64
>                      Server:
>                        Version: 1.8.0
>                        API version: 1.20
>                        Go version: go1.4.2
>                        Git commit: 0d03096
>                        Built: Tue Aug 11 16:48:39 UTC 2015
>                        OS/Arch: linux/amd64
>                      Host: Ubuntu 14.04
>                      Container: Debian 8.1 + Java-7
>            Reporter: Yong Tang
>            Assignee: haosdent
>         Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, MESOS-3738-0_25_0.patch
>
> When the Mesos slave is within a container, the COMMAND health check from Marathon is invoked incorrectly: the sandbox directory (instead of the launcher/health-check directory) is used. This results in an error within the container.
>
> Command to invoke the Mesos slave container:
> {noformat}
> sudo docker run -d -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro \
>   -v /usr/lib/x86_64-linux-gnu/libapparmor.so.1:/usr/lib/x86_64-linux-gnu/libapparmor.so.1:ro \
>   -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos mesos \
>   mesos slave --master=zk://10.2.1.2:2181/mesos --containerizers=docker,mesos \
>   --executor_registration_timeout=5mins --docker_stop_timeout=10secs --launcher=posix
> {noformat}
>
> Marathon JSON file:
> {code}
> {
>   "id": "ubuntu",
>   "container": {
>     "type": "DOCKER",
>     "docker": { "image": "ubuntu", "network": "BRIDGE", "parameters": [] }
>   },
>   "args": [ "bash", "-c", "while true; do echo 1; sleep 5; done" ],
>   "uris": [],
>   "healthChecks": [
>     {
>       "protocol": "COMMAND",
>       "command": { "value": "echo Success" },
>       "gracePeriodSeconds": 3000,
>       "intervalSeconds": 5,
>       "timeoutSeconds": 5,
>       "maxConsecutiveFailures": 300
>     }
>   ],
>   "instances": 1
> }
> {code}
>
> {noformat}
> STDOUT:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stdout
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
> --stop_timeout="10secs"
> Registered docker executor on b01e2e75afcb
> Starting task ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> 1
> Launching health check process:
> /tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f/mesos-health-check
> --executor=(1)@10.2.1.7:40695
> --health_check_json={"command":{"shell":true,"value":"docker exec mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f sh -c \" echo Success \""},"consecutive_failures":300,"delay_seconds":0.0,"grace_period_seconds":3000.0,"interval_seconds":5.0,"timeout_seconds":5.0}
> --task_id=ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> Health check process launched at pid: 94
> 1
> 1
> 1
> 1
> 1
> STDERR:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stderr
> I1014 23:15:58.12795056 exec.cpp:134] Version: 0.25.0
> I1014 23:15:58.13062762 exec.cpp:208] Executor registered on slave e20f8959
> {noformat}
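As context for the mismatch visible in the logs above: inside the container, the host-side sandbox is mounted at `--mapped_directory` (`/mnt/mesos/sandbox`), so a host-side path like the `mesos-health-check` one cannot be executed as-is from within the container. The sketch below is only an illustration of the kind of path rewriting involved; the helper name is hypothetical and this is not the actual patch attached to the ticket.

```python
# Hypothetical sketch: rewrite a host sandbox path to its in-container
# equivalent under --mapped_directory. Not the actual Mesos fix.

def map_into_container(path, sandbox_directory, mapped_directory):
    """Return `path` with the host sandbox prefix swapped for the
    container-side mount point; leave non-sandbox paths untouched."""
    if path.startswith(sandbox_directory):
        return mapped_directory + path[len(sandbox_directory):]
    return path

# Shortened stand-in for the long sandbox path in the logs above.
host_sandbox = "/tmp/mesos/slaves/S0/frameworks/F0/executors/E0/runs/R0"
print(map_into_container(host_sandbox + "/mesos-health-check",
                         host_sandbox, "/mnt/mesos/sandbox"))
# -> /mnt/mesos/sandbox/mesos-health-check
```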
[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container
[ https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966158#comment-14966158 ]

haosdent commented on MESOS-3738:
---------------------------------

Thank you very much for your confirmation!

> Mesos health check is invoked incorrectly when Mesos slave is within the docker container
> -----------------------------------------------------------------------------------------
>
>                 Key: MESOS-3738
>                 URL: https://issues.apache.org/jira/browse/MESOS-3738
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 0.25.0
>            Reporter: Yong Tang
>            Assignee: haosdent
>         Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, MESOS-3738-0_25_0.patch
[jira] [Assigned] (MESOS-3769) Agent logs are misleading during agent shutdown
[ https://issues.apache.org/jira/browse/MESOS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guangya Liu reassigned MESOS-3769:
----------------------------------
    Assignee: Guangya Liu

> Agent logs are misleading during agent shutdown
> -----------------------------------------------
>
>                 Key: MESOS-3769
>                 URL: https://issues.apache.org/jira/browse/MESOS-3769
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Alexander Rukletsov
>            Assignee: Guangya Liu
>            Priority: Minor
>              Labels: newbie
>
> When analyzing the output of the {{MasterAllocatorTest.SlaveLost}} test, I spotted the following logs:
> {noformat}
> I1020 18:18:09.026553 237658112 status_update_manager.cpp:322] Received status update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b-
> I1020 18:18:09.026845 234438656 slave.cpp:3090] Forwarding the update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- to master@172.18.6.110:62507
> I1020 18:18:09.026973 234438656 slave.cpp:651] ; unregistering and shutting down
> I1020 18:18:09.027007 234438656 slave.cpp:2016] Asked to shut down framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- by @0.0.0.0:0
> I1020 18:18:09.027019 234438656 slave.cpp:2041] Shutting down framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b-
> {noformat}
> It looks like {{Slave::shutdown()}} uses wrong assumptions about possible execution paths.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MESOS-3769) Agent logs are misleading during agent shutdown
[ https://issues.apache.org/jira/browse/MESOS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966147#comment-14966147 ]

Guangya Liu commented on MESOS-3769:
------------------------------------

RR: https://reviews.apache.org/r/39507/

> Agent logs are misleading during agent shutdown
> -----------------------------------------------
>
>                 Key: MESOS-3769
>                 URL: https://issues.apache.org/jira/browse/MESOS-3769
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Alexander Rukletsov
>            Assignee: Guangya Liu
>            Priority: Minor
>              Labels: newbie
[jira] [Commented] (MESOS-3506) Build instructions for CentOS 6.6 should include `sudo yum update`
[ https://issues.apache.org/jira/browse/MESOS-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966121#comment-14966121 ]

haosdent commented on MESOS-3506:
---------------------------------

Is it possible to find out which packages affect building "mesos-0.25.0.jar" and update just those (maybe we only need to update Maven or the JDK)? When I execute the update command, I get:
{noformat}
Transaction Summary
Install      32 Package(s)
Upgrade     371 Package(s)
{noformat}
Sometimes users may want to keep unrelated packages at their old versions to avoid bugs in the latest versions.

> Build instructions for CentOS 6.6 should include `sudo yum update`
> ------------------------------------------------------------------
>
>                 Key: MESOS-3506
>                 URL: https://issues.apache.org/jira/browse/MESOS-3506
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.25.0
>            Reporter: Greg Mann
>            Assignee: Greg Mann
>              Labels: documentation, mesosphere
>
> Neglecting to run {{sudo yum update}} on CentOS 6.6 currently causes the build to break when building {{mesos-0.25.0.jar}}. The build instructions for this platform on the Getting Started page should be changed accordingly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container
[ https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966081#comment-14966081 ]

Jay Taylor commented on MESOS-3738:
-----------------------------------

Hi Rafael,

I just uploaded the debs I built on Friday (latest master as of Friday, 3:30pm Pacific time), which have the patches applied. You can grab one here:

scala.sh/mesos-0.26.0-g38b2f72-0.1.20151016221956.deb

Hope this helps!

Best,
Jay

> Mesos health check is invoked incorrectly when Mesos slave is within the docker container
> -----------------------------------------------------------------------------------------
>
>                 Key: MESOS-3738
>                 URL: https://issues.apache.org/jira/browse/MESOS-3738
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 0.25.0
>            Reporter: Yong Tang
>            Assignee: haosdent
>         Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, MESOS-3738-0_25_0.patch
[jira] [Commented] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
[ https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966036#comment-14966036 ]

Joseph Wu commented on MESOS-3771:
----------------------------------

Actually, I can't repro this behavior in a unit test ([attempted here|https://github.com/kaysoky/mesos/commit/d8869f0aa1fdcf38072b45a6238b191c67b7e0f7]). I've constructed an {{ExecutorInfo}} with a {{data}} field holding the same data you have above. Fetching the same {{ExecutorInfo}} from the {{/state}} endpoint also gives valid JSON. Am I doing something differently?

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
> -----------------------------------------------------------------------------------
>
>                 Key: MESOS-3771
>                 URL: https://issues.apache.org/jira/browse/MESOS-3771
>             Project: Mesos
>          Issue Type: Bug
>          Components: HTTP API
>    Affects Versions: 0.24.1
>            Reporter: Steven Schlansker
>            Priority: Critical
>
> Spark encodes some binary data into the ExecutorInfo.data field. This field is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF-8 data.
>
> If you have such a field, it seems that it is splatted out into JSON without any regard to proper character encoding:
> {code}
> 0006b0b0  2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e  |.spark.executor.|
> 0006b0c0  4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63  |MesosExecutorBac|
> 0006b0d0  6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac  |kend"},"data":".|
> 0006b0e0  ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c  |.\u\u0005ur\|
> 0006b0f0  75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61  |u\u000f[Lsca|
> 0006b100  6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
> {code}
> I suspect this is because the HTTP API emits the executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout seems to have no notion of a byte array. I'm guessing that some implicit conversion makes it get written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specificiation.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
> Thank you for any assistance here. Our cluster is currently entirely down -- the frameworks cannot handle parsing the invalid JSON produced (it is not even valid UTF-8).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
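The failure mode reported in this ticket (raw protobuf `bytes` splatted straight into JSON) and the usual remedy (base64-encoding binary fields before serialization) can be demonstrated outside of stout's C++. The Python sketch below is only an analogy, not the Mesos fix:

```python
# Demonstrates why raw binary data cannot go into JSON as-is, and how
# base64 encoding keeps the output valid and round-trippable.
import base64
import json

# A few bytes resembling a Java serialization header, as in the report.
data = b"\xac\xed\x00\x05ur\x00\x0f[Lscala.Tuple2;"

# Treating the bytes as text fails outright: they are not valid UTF-8,
# so any JSON containing them verbatim is not well-formed either.
try:
    json.dumps({"data": data.decode("utf-8")})
except UnicodeDecodeError as e:
    print("invalid as UTF-8:", e.reason)

# Base64-encoding the field first yields valid JSON that round-trips.
encoded = json.dumps({"data": base64.b64encode(data).decode("ascii")})
decoded = base64.b64decode(json.loads(encoded)["data"])
assert decoded == data
print(encoded)
```

The same idea is what protobuf's own JSON mapping specifies for `bytes` fields: encode as a base64 string rather than emitting the octets directly.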
[jira] [Created] (MESOS-3774) Migrate Future tests from process_tests.cpp to future_tests.cpp
Gilbert Song created MESOS-3774:
-----------------------------------

             Summary: Migrate Future tests from process_tests.cpp to future_tests.cpp
                 Key: MESOS-3774
                 URL: https://issues.apache.org/jira/browse/MESOS-3774
             Project: Mesos
          Issue Type: Improvement
            Reporter: Gilbert Song
            Priority: Minor

Currently we do not have many `Future` tests in /mesos/3rdparty/libprocess/src/tests/future_tests.cpp. It would be clearer to move all future-related tests from /mesos/3rdparty/libprocess/src/tests/process_tests.cpp to /mesos/3rdparty/libprocess/src/tests/future_tests.cpp.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container
[ https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965990#comment-14965990 ]

Rafael Capucho commented on MESOS-3738:
---------------------------------------

Will it be released as 0.25.1? How can I apply this patch, considering that I'm using a Dockerfile [1]? Thank you.

[1] - https://paste.ee/p/eryAc

> Mesos health check is invoked incorrectly when Mesos slave is within the docker container
> -----------------------------------------------------------------------------------------
>
>                 Key: MESOS-3738
>                 URL: https://issues.apache.org/jira/browse/MESOS-3738
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 0.25.0
>            Reporter: Yong Tang
>            Assignee: haosdent
>         Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, MESOS-3738-0_25_0.patch
[jira] [Updated] (MESOS-3694) Enable building mesos.apache.org locally in a Docker container.
[ https://issues.apache.org/jira/browse/MESOS-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3694: - Labels: docathon mesosphere (was: mesosphere) > Enable building mesos.apache.org locally in a Docker container. > --- > > Key: MESOS-3694 > URL: https://issues.apache.org/jira/browse/MESOS-3694 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Artem Harutyunyan > Labels: docathon, mesosphere > > We should make it easy for everyone to modify the website and be able to > generate it locally before pushing upstream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3762) Refactor SSLTest fixture such that MesosTest can use the same helpers.
[ https://issues.apache.org/jira/browse/MESOS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965883#comment-14965883 ] Joseph Wu edited comment on MESOS-3762 at 10/20/15 11:30 PM: - Reviews for: Step 1) https://reviews.apache.org/r/39498/ https://reviews.apache.org/r/39499/ Step 2 & 3) https://reviews.apache.org/r/39501/ was (Author: kaysoky): Reviews for step 1) https://reviews.apache.org/r/39498/ https://reviews.apache.org/r/39499/ > Refactor SSLTest fixture such that MesosTest can use the same helpers. > -- > > Key: MESOS-3762 > URL: https://issues.apache.org/jira/browse/MESOS-3762 > Project: Mesos > Issue Type: Task > Components: test >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: mesosphere > > In order to write tests that exercise SSL with other components of Mesos, > such as the HTTP scheduler library, we need to use the setup/teardown logic > found in the {{SSLTest}} fixture. > Currently, the test fixtures have separate inheritance structures like this: > {code} > SSLTest <- ::testing::Test > MesosTest <- TemporaryDirectoryTest <- ::testing::Test > {code} > where {{::testing::Test}} is a gtest class. > The plan is the following: > # Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}. This will > require moving the setup (generation of keys and certs) from > {{SetUpTestCase}} to {{SetUp}}. At the same time, *some* of the cleanup > logic in the SSLTest will not be needed. > # Move the logic of generating keys/certs into helpers, so that individual > tests can call them when needed, much like {{MesosTest}}. > # Write a child class of {{SSLTest}} which has the same functionality as the > existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} > or the {{RegistryClientTest}}. > # Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during > the refactor). If Mesos is not compiled with {{--enable-ssl}}, then > {{SSLTest}} could be {{#ifdef}}'d into any empty class. 
> The resulting structure should be like: > {code} > MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test > ChildOfSSLTest / > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
[ https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965945#comment-14965945 ] Steven Schlansker commented on MESOS-3771: -- Yeah, we could try to patch Spark. However, I'm sure someone else will make exactly the same mistake down the road -- it seems to work as long as you use only the protobuf API. It really seems wrong to assume that arbitrary bytes are valid UTF-8. Note that ASCII is a real misnomer here: the only things that matter are "arbitrary binary data" (the type of 'data') and "UTF-8" (the format that the rendered JSON *must* be in). I don't see how ASCII is relevant anywhere here. Maybe it's possible to escape the 0xACED sequence we see with \uXXXX escapes, but I'm not sure that works, as those escapes produce UTF-16 code units, not binary data... > Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII > handling > --- > > Key: MESOS-3771 > URL: https://issues.apache.org/jira/browse/MESOS-3771 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.24.1 >Reporter: Steven Schlansker >Priority: Critical > > Spark encodes some binary data into the ExecutorInfo.data field. This field > is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data. 
> If you have such a field, it seems that it is splatted out into JSON without > any regards to proper character encoding: > {code} > 0006b0b0 2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e |.spark.executor.| > 0006b0c0 4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63 |MesosExecutorBac| > 0006b0d0 6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac |kend"},"data":".| > 0006b0e0 ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c |.\u\u0005ur\| > 0006b0f0 75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61 |u\u000f[Lsca| > 0006b100 6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30 |la.Tuple2;..\u00| > {code} > I suspect this is because the HTTP api emits the executorInfo.data directly: > {code} > JSON::Object model(const ExecutorInfo& executorInfo) > { > JSON::Object object; > object.values["executor_id"] = executorInfo.executor_id().value(); > object.values["name"] = executorInfo.name(); > object.values["data"] = executorInfo.data(); > object.values["framework_id"] = executorInfo.framework_id().value(); > object.values["command"] = model(executorInfo.command()); > object.values["resources"] = model(executorInfo.resources()); > return object; > } > {code} > I think this may be because the custom JSON processing library in stout seems > to not have any idea of what a byte array is. I'm guessing that some > implicit conversion makes it get written as a String instead, but: > {code} > inline std::ostream& operator<<(std::ostream& out, const String& string) > { > // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII. > // See RFC4627 for the JSON string specificiation. > return out << picojson::value(string.value).serialize(); > } > {code} > Thank you for any assistance here. Our cluster is currently entirely down -- > the frameworks cannot handle parsing the invalid JSON produced (it is not > even valid utf-8) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3501) configure cannot find libevent headers in CentOS 6
[ https://issues.apache.org/jira/browse/MESOS-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965384#comment-14965384 ] Greg Mann edited comment on MESOS-3501 at 10/20/15 11:06 PM: - I like Neil's idea of updating the configure error message to note that libevent2 is required, I think this may be enough to guide the user in the right direction. I also have a ticket open to add the {{--enable-libevent}} flag to the "Configuration" docs (MESOS-3749), so we can link to the libevent documentation there as well. was (Author: greggomann): I like Neil's idea of updating the configure error message to note that libevent2 is required, I think this may be enough to guide the user in the right direction. I also have a ticket open to add the {{--enable-libevent}} flag to the "Configuration" docs, so we can link to the libevent documentation there as well. > configure cannot find libevent headers in CentOS 6 > -- > > Key: MESOS-3501 > URL: https://issues.apache.org/jira/browse/MESOS-3501 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 > Environment: CentOS 6.6, 6.7 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: build, configure, libevent, mesosphere > > If libevent is installed via {{sudo yum install libevent-headers}}, running > {{../configure --enable-libevent}} will fail to discover the libevent headers: > {code} > checking event2/event.h usability... no > checking event2/event.h presence... no > checking for event2/event.h... no > configure: error: cannot find libevent headers > --- > libevent is required for libprocess to build. > --- > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3501) configure cannot find libevent headers in CentOS 6
[ https://issues.apache.org/jira/browse/MESOS-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965936#comment-14965936 ] Greg Mann commented on MESOS-3501: -- I posted a patch with clarified error messages in {{configure.ac}}. Review here: https://reviews.apache.org/r/39496/ > configure cannot find libevent headers in CentOS 6 > -- > > Key: MESOS-3501 > URL: https://issues.apache.org/jira/browse/MESOS-3501 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 > Environment: CentOS 6.6, 6.7 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: build, configure, libevent, mesosphere > > If libevent is installed via {{sudo yum install libevent-headers}}, running > {{../configure --enable-libevent}} will fail to discover the libevent headers: > {code} > checking event2/event.h usability... no > checking event2/event.h presence... no > checking for event2/event.h... no > configure: error: cannot find libevent headers > --- > libevent is required for libprocess to build. > --- > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2275) Document header include rules in style guide
[ https://issues.apache.org/jira/browse/MESOS-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Schlicht updated MESOS-2275: Sprint: Mesosphere Sprint 21 > Document header include rules in style guide > > > Key: MESOS-2275 > URL: https://issues.apache.org/jira/browse/MESOS-2275 > Project: Mesos > Issue Type: Improvement >Reporter: Niklas Quarfot Nielsen >Assignee: Jan Schlicht >Priority: Trivial > Labels: beginner, docathon, mesosphere > > We have several ways of sorting, grouping and ordering headers includes in > Mesos. We should agree on a rule set and do a style scan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2275) Document header include rules in style guide
[ https://issues.apache.org/jira/browse/MESOS-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Schlicht updated MESOS-2275: Story Points: 3 (was: 1) > Document header include rules in style guide > > > Key: MESOS-2275 > URL: https://issues.apache.org/jira/browse/MESOS-2275 > Project: Mesos > Issue Type: Improvement >Reporter: Niklas Quarfot Nielsen >Assignee: Jan Schlicht >Priority: Trivial > Labels: beginner, docathon, mesosphere > > We have several ways of sorting, grouping and ordering headers includes in > Mesos. We should agree on a rule set and do a style scan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-2970) Support container image caching
[ https://issues.apache.org/jira/browse/MESOS-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song reassigned MESOS-2970: --- Assignee: Gilbert Song > Support container image caching > > > Key: MESOS-2970 > URL: https://issues.apache.org/jira/browse/MESOS-2970 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Timothy Chen >Assignee: Gilbert Song > Labels: mesosphere > > Each image provisioner needs to implement its own image storing and fetching, > and at some level needs to implement caching and concurrent downloads of the > same layer/image. > We already have the fetcher cache, and we should consider whether we can reuse it. > If not, we should still provide some caching primitives that all the provisioners > can reuse. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2315) Deprecate / Remove CommandInfo::ContainerInfo
[ https://issues.apache.org/jira/browse/MESOS-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965898#comment-14965898 ] Vaibhav Khanduja commented on MESOS-2315: - Apologies for not coming back to this quickly. Please correct me if I am wrong. Mesos 0.20.0 shipped the new ContainerInfo type. A framework making use of it should use TaskInfo with the following semantics: a) The new ContainerInfo message has a "type" field. The currently supported options are Docker and Mesos, and the message is designed to be extensible. b) CommandInfo launches a command in a container. c) CommandInfo combined with ContainerInfo runs the supplied command as a task with the CommandExecutor inside the specified container.
Example code with the old ContainerInfo:
{code}
TaskInfo task;
task.set_name("Task " + lexical_cast(taskId));
task.mutable_task_id()->set_value(lexical_cast(taskId));
task.mutable_slave_id()->MergeFrom(offer.slave_id());
task.mutable_command()->set_value("touch hello.txt");
{code}
Example code with the new ContainerInfo for a Docker container:
{code}
TaskInfo task;
task.set_name("Task " + lexical_cast(taskId));
task.mutable_task_id()->set_value(lexical_cast(taskId));
task.mutable_slave_id()->MergeFrom(offer.slave_id());
task.mutable_command()->set_value("touch hello.txt");

// Use Docker to run the task.
ContainerInfo containerInfo;
containerInfo.set_type(ContainerInfo::DOCKER);
ContainerInfo::DockerInfo dockerInfo;
dockerInfo.set_image("busybox");
containerInfo.mutable_docker()->CopyFrom(dockerInfo);
task.mutable_container()->CopyFrom(containerInfo);
{code}
Example code with the new ContainerInfo for a Mesos container:
{code}
TaskInfo task;
task.set_name("Task " + lexical_cast(taskId));
task.mutable_task_id()->set_value(lexical_cast(taskId));
task.mutable_slave_id()->MergeFrom(offer.slave_id());
task.mutable_command()->set_value("touch hello.txt");

// Use Mesos to run the task.
ContainerInfo containerInfo;
containerInfo.set_type(ContainerInfo::MESOS);
ContainerInfo::MesosInfo mesosInfo;
task.mutable_command()->set_shell(true);
task.mutable_container()->CopyFrom(containerInfo);
{code}
> Deprecate / Remove CommandInfo::ContainerInfo > - > > Key: MESOS-2315 > URL: https://issues.apache.org/jira/browse/MESOS-2315 > Project: Mesos > Issue Type: Task >Reporter: Ian Downes >Assignee: Vaibhav Khanduja >Priority: Minor > Labels: mesosphere, newbie > > IIUC this has been deprecated and all current code (except > examples/docker_no_executor_framework.cpp) uses the top-level ContainerInfo? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3762) Refactor SSLTest fixture such that MesosTest can use the same helpers.
[ https://issues.apache.org/jira/browse/MESOS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965883#comment-14965883 ] Joseph Wu commented on MESOS-3762: -- Reviews for step 1) https://reviews.apache.org/r/39498/ https://reviews.apache.org/r/39499/ > Refactor SSLTest fixture such that MesosTest can use the same helpers. > -- > > Key: MESOS-3762 > URL: https://issues.apache.org/jira/browse/MESOS-3762 > Project: Mesos > Issue Type: Task > Components: test >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: mesosphere > > In order to write tests that exercise SSL with other components of Mesos, > such as the HTTP scheduler library, we need to use the setup/teardown logic > found in the {{SSLTest}} fixture. > Currently, the test fixtures have separate inheritance structures like this: > {code} > SSLTest <- ::testing::Test > MesosTest <- TemporaryDirectoryTest <- ::testing::Test > {code} > where {{::testing::Test}} is a gtest class. > The plan is the following: > # Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}. This will > require moving the setup (generation of keys and certs) from > {{SetUpTestCase}} to {{SetUp}}. At the same time, *some* of the cleanup > logic in the SSLTest will not be needed. > # Move the logic of generating keys/certs into helpers, so that individual > tests can call them when needed, much like {{MesosTest}}. > # Write a child class of {{SSLTest}} which has the same functionality as the > existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} > or the {{RegistryClientTest}}. > # Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during > the refactor). If Mesos is not compiled with {{--enable-ssl}}, then > {{SSLTest}} could be {{#ifdef}}'d into any empty class. 
> The resulting structure should be like: > {code} > MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test > ChildOfSSLTest / > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3762) Refactor SSLTest fixture such that MesosTest can use the same helpers.
[ https://issues.apache.org/jira/browse/MESOS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965877#comment-14965877 ] Joseph Wu commented on MESOS-3762: -- Found and wrote a fix for an SSL-related test cleanup bug: https://reviews.apache.org/r/39495/ > Refactor SSLTest fixture such that MesosTest can use the same helpers. > -- > > Key: MESOS-3762 > URL: https://issues.apache.org/jira/browse/MESOS-3762 > Project: Mesos > Issue Type: Task > Components: test >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: mesosphere > > In order to write tests that exercise SSL with other components of Mesos, > such as the HTTP scheduler library, we need to use the setup/teardown logic > found in the {{SSLTest}} fixture. > Currently, the test fixtures have separate inheritance structures like this: > {code} > SSLTest <- ::testing::Test > MesosTest <- TemporaryDirectoryTest <- ::testing::Test > {code} > where {{::testing::Test}} is a gtest class. > The plan is the following: > # Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}. This will > require moving the setup (generation of keys and certs) from > {{SetUpTestCase}} to {{SetUp}}. At the same time, *some* of the cleanup > logic in the SSLTest will not be needed. > # Move the logic of generating keys/certs into helpers, so that individual > tests can call them when needed, much like {{MesosTest}}. > # Write a child class of {{SSLTest}} which has the same functionality as the > existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} > or the {{RegistryClientTest}}. > # Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during > the refactor). If Mesos is not compiled with {{--enable-ssl}}, then > {{SSLTest}} could be {{#ifdef}}'d into any empty class. 
> The resulting structure should be like: > {code} > MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test > ChildOfSSLTest / > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3762) Refactor SSLTest fixture such that MesosTest can use the same helpers.
[ https://issues.apache.org/jira/browse/MESOS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-3762: - Description: In order to write tests that exercise SSL with other components of Mesos, such as the HTTP scheduler library, we need to use the setup/teardown logic found in the {{SSLTest}} fixture. Currently, the test fixtures have separate inheritance structures like this: {code} SSLTest <- ::testing::Test MesosTest <- TemporaryDirectoryTest <- ::testing::Test {code} where {{::testing::Test}} is a gtest class. The plan is the following: # Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}. This will require moving the setup (generation of keys and certs) from {{SetUpTestCase}} to {{SetUp}}. At the same time, *some* of the cleanup logic in the SSLTest will not be needed. # Move the logic of generating keys/certs into helpers, so that individual tests can call them when needed, much like {{MesosTest}}. # Write a child class of {{SSLTest}} which has the same functionality as the existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} or the {{RegistryClientTest}}. # Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during the refactor). If Mesos is not compiled with {{--enable-ssl}}, then {{SSLTest}} could be {{#ifdef}}'d into any empty class. The resulting structure should be like: {code} MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test ChildOfSSLTest / {code} was: In order to write tests that exercise SSL with other components of Mesos, such as the HTTP scheduler library, we need to use the setup/teardown logic found in the {{SSLTest}} fixture. Currently, the test fixtures have separate inheritance structures like this: {code} SSLTest <- ::testing::Test MesosTest <- TemporaryDirectoryTest <- ::testing::Test {code} where {{::testing::Test}} is a gtest class. The plan is the following: 1) Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}. 
This will require moving the setup (generation of keys and certs) from {{SetUpTestCase}} to {{SetUp}}. At the same time, *some* of the cleanup logic in the SSLTest will not be needed. 2) Move the logic of generating keys/certs into helpers, so that individual tests can call them when needed, much like {{MesosTest}}. 3) Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during the refactor). If Mesos is not compiled with {{--enable-ssl}}, then {{SSLTest}} could be {{#ifdef}}'d into any empty class. 4) Write a child class of {{SSLTest}} which has the same functionality as the existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} or the {{RegistryClientTest}}. The resulting structure should be like: {code} MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test ChildOfSSLTest / {code} > Refactor SSLTest fixture such that MesosTest can use the same helpers. > -- > > Key: MESOS-3762 > URL: https://issues.apache.org/jira/browse/MESOS-3762 > Project: Mesos > Issue Type: Task > Components: test >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: mesosphere > > In order to write tests that exercise SSL with other components of Mesos, > such as the HTTP scheduler library, we need to use the setup/teardown logic > found in the {{SSLTest}} fixture. > Currently, the test fixtures have separate inheritance structures like this: > {code} > SSLTest <- ::testing::Test > MesosTest <- TemporaryDirectoryTest <- ::testing::Test > {code} > where {{::testing::Test}} is a gtest class. > The plan is the following: > # Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}. This will > require moving the setup (generation of keys and certs) from > {{SetUpTestCase}} to {{SetUp}}. At the same time, *some* of the cleanup > logic in the SSLTest will not be needed. > # Move the logic of generating keys/certs into helpers, so that individual > tests can call them when needed, much like {{MesosTest}}. 
> # Write a child class of {{SSLTest}} which has the same functionality as the > existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} > or the {{RegistryClientTest}}. > # Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during > the refactor). If Mesos is not compiled with {{--enable-ssl}}, then > {{SSLTest}} could be {{#ifdef}}'d into an empty class. > The resulting structure should be like: > {code} > MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test > ChildOfSSLTest / > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3113) Add resource usage section to containerizer documentation
[ https://issues.apache.org/jira/browse/MESOS-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3113: -- Shepherd: Till Toenshoff > Add resource usage section to containerizer documentation > - > > Key: MESOS-3113 > URL: https://issues.apache.org/jira/browse/MESOS-3113 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Niklas Quarfot Nielsen >Assignee: Gilbert Song > Labels: docathon, documentation, mesosphere > > Currently, the containerizer documentation doesn't touch upon the usage() API > and how to interpret the collected statistics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
[ https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965808#comment-14965808 ] Joseph Wu commented on MESOS-3771: -- ^ That's actually what would (sort of) fix your issue. There's an old TODO [here|https://github.com/apache/mesos/blob/master/src/master/http.cpp#L118-L119] to make the change. We do actually encode {{bytes}} in base64, but only when they are transformed into JSON from Protobuf. However, some of the endpoints (the ones which must be backwards compatible) appear to treat {{bytes}} as ASCII strings. If you have more control over your version of Spark, you could base64 encode from Spark: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/MesosExecutorBackend.scala#L47 > Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII > handling > --- > > Key: MESOS-3771 > URL: https://issues.apache.org/jira/browse/MESOS-3771 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.24.1 >Reporter: Steven Schlansker >Priority: Critical > > Spark encodes some binary data into the ExecutorInfo.data field. This field > is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data. 
> If you have such a field, it seems that it is splatted out into JSON without > any regards to proper character encoding: > {code} > 0006b0b0 2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e |.spark.executor.| > 0006b0c0 4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63 |MesosExecutorBac| > 0006b0d0 6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac |kend"},"data":".| > 0006b0e0 ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c |.\u\u0005ur\| > 0006b0f0 75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61 |u\u000f[Lsca| > 0006b100 6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30 |la.Tuple2;..\u00| > {code} > I suspect this is because the HTTP api emits the executorInfo.data directly: > {code} > JSON::Object model(const ExecutorInfo& executorInfo) > { > JSON::Object object; > object.values["executor_id"] = executorInfo.executor_id().value(); > object.values["name"] = executorInfo.name(); > object.values["data"] = executorInfo.data(); > object.values["framework_id"] = executorInfo.framework_id().value(); > object.values["command"] = model(executorInfo.command()); > object.values["resources"] = model(executorInfo.resources()); > return object; > } > {code} > I think this may be because the custom JSON processing library in stout seems > to not have any idea of what a byte array is. I'm guessing that some > implicit conversion makes it get written as a String instead, but: > {code} > inline std::ostream& operator<<(std::ostream& out, const String& string) > { > // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII. > // See RFC4627 for the JSON string specificiation. > return out << picojson::value(string.value).serialize(); > } > {code} > Thank you for any assistance here. Our cluster is currently entirely down -- > the frameworks cannot handle parsing the invalid JSON produced (it is not > even valid utf-8) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2275) Document header include rules in style guide
[ https://issues.apache.org/jira/browse/MESOS-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965800#comment-14965800 ] Jan Schlicht commented on MESOS-2275: - Judging from the test cases in the review, it should work with clang-format. It would sort each block locally and treat angle-bracket includes vs. quoted includes as separate blocks. > Document header include rules in style guide > > > Key: MESOS-2275 > URL: https://issues.apache.org/jira/browse/MESOS-2275 > Project: Mesos > Issue Type: Improvement >Reporter: Niklas Quarfot Nielsen >Assignee: Jan Schlicht >Priority: Trivial > Labels: beginner, docathon, mesosphere > > We have several ways of sorting, grouping and ordering headers includes in > Mesos. We should agree on a rule set and do a style scan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3773) RegistryClientTest.SimpleGetBlob is flaky
[ https://issues.apache.org/jira/browse/MESOS-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965788#comment-14965788 ] Gilbert Song commented on MESOS-3773: - Seems to be separate failures. We can keep both and link together. > RegistryClientTest.SimpleGetBlob is flaky > - > > Key: MESOS-3773 > URL: https://issues.apache.org/jira/browse/MESOS-3773 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Joseph Wu >Assignee: Jojy Varghese > Labels: mesosphere > > {{RegistryClientTest.SimpleGetBlob}} fails about 1/5 times. This was > encountered on OSX. > {code:title=Repro} > bin/mesos-tests.sh --gtest_filter="*RegistryClientTest.SimpleGetBlob*" > --gtest_repeat=10 --gtest_break_on_failure > {code} > {code:title=Example Failure} > [ RUN ] RegistryClientTest.SimpleGetBlob > ../../src/tests/containerizer/provisioner_docker_tests.cpp:946: Failure > Value of: blobResponse > Actual: "2015-10-20 20:58:59.579393024+00:00" > Expected: blob.get() > Which is: > "\x15\x3\x3\00(P~\xCA&\xC6<\x4\x16\xE\xB2\xFF\b1a\xB9Z{\xE0\x80\xDA`\xBCt\x5R\x81x6\xF8 > \x8B{\xA8\xA9\x4\xAB\xB6" "E\xE6\xDE\xCF\xD9*\xCC!\xC2\x15" "2015-10-20 > 20:58:59.579393024+00:00" > *** Aborted at 1445374739 (unix time) try "date -d @1445374739" if you are > using GNU date *** > PC: @0x103144ddc testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 49008 (TID 0x7fff73ca3300) stack trace: *** > @ 0x7fff8c58af1a _sigtramp > @ 0x7fff8386e187 malloc > @0x1031445b7 testing::internal::AssertHelper::operator=() > @0x1030d32e0 > mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() > @0x1030d3562 > mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() > @0x1031ac8f3 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @0x103192f87 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1031533f5 testing::Test::Run() > @0x10315493b testing::TestInfo::Run() > @0x1031555f7 
testing::TestCase::Run() > @0x103163df3 testing::internal::UnitTestImpl::RunAllTests() > @0x1031af8c3 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @0x103195397 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1031639f2 testing::UnitTest::Run() > @0x1025abd41 RUN_ALL_TESTS() > @0x1025a8089 main > @ 0x7fff86b155c9 start > {code} > {code:title=Less common failure} > [ RUN ] RegistryClientTest.SimpleGetBlob > ../../src/tests/containerizer/provisioner_docker_tests.cpp:926: Failure > (socket).failure(): Failed accept: connection error: > error::lib(0):func(0):reason(0) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3773) RegistryClientTest.SimpleGetBlob is flaky
[ https://issues.apache.org/jira/browse/MESOS-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965769#comment-14965769 ] Anand Mazumdar commented on MESOS-3773: --- Dup of https://issues.apache.org/jira/browse/MESOS-3726 ? > RegistryClientTest.SimpleGetBlob is flaky > - > > Key: MESOS-3773 > URL: https://issues.apache.org/jira/browse/MESOS-3773 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Joseph Wu >Assignee: Jojy Varghese > Labels: mesosphere > > {{RegistryClientTest.SimpleGetBlob}} fails about 1/5 times. This was > encountered on OSX. > {code:title=Repro} > bin/mesos-tests.sh --gtest_filter="*RegistryClientTest.SimpleGetBlob*" > --gtest_repeat=10 --gtest_break_on_failure > {code} > {code:title=Example Failure} > [ RUN ] RegistryClientTest.SimpleGetBlob > ../../src/tests/containerizer/provisioner_docker_tests.cpp:946: Failure > Value of: blobResponse > Actual: "2015-10-20 20:58:59.579393024+00:00" > Expected: blob.get() > Which is: > "\x15\x3\x3\00(P~\xCA&\xC6<\x4\x16\xE\xB2\xFF\b1a\xB9Z{\xE0\x80\xDA`\xBCt\x5R\x81x6\xF8 > \x8B{\xA8\xA9\x4\xAB\xB6" "E\xE6\xDE\xCF\xD9*\xCC!\xC2\x15" "2015-10-20 > 20:58:59.579393024+00:00" > *** Aborted at 1445374739 (unix time) try "date -d @1445374739" if you are > using GNU date *** > PC: @0x103144ddc testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 49008 (TID 0x7fff73ca3300) stack trace: *** > @ 0x7fff8c58af1a _sigtramp > @ 0x7fff8386e187 malloc > @0x1031445b7 testing::internal::AssertHelper::operator=() > @0x1030d32e0 > mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() > @0x1030d3562 > mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() > @0x1031ac8f3 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @0x103192f87 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1031533f5 testing::Test::Run() > @0x10315493b testing::TestInfo::Run() > @0x1031555f7 
testing::TestCase::Run() > @0x103163df3 testing::internal::UnitTestImpl::RunAllTests() > @0x1031af8c3 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @0x103195397 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1031639f2 testing::UnitTest::Run() > @0x1025abd41 RUN_ALL_TESTS() > @0x1025a8089 main > @ 0x7fff86b155c9 start > {code} > {code:title=Less common failure} > [ RUN ] RegistryClientTest.SimpleGetBlob > ../../src/tests/containerizer/provisioner_docker_tests.cpp:926: Failure > (socket).failure(): Failed accept: connection error: > error::lib(0):func(0):reason(0) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
[ https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965762#comment-14965762 ] Steven Schlansker commented on MESOS-3771: -- Similar, but potentially unrelated, issue: https://issues.apache.org/jira/browse/MESOS-3284 > Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII > handling > --- > > Key: MESOS-3771 > URL: https://issues.apache.org/jira/browse/MESOS-3771 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.24.1 >Reporter: Steven Schlansker >Priority: Critical > > Spark encodes some binary data into the ExecutorInfo.data field. This field > is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data. > If you have such a field, it seems that it is splatted out into JSON without > any regards to proper character encoding: > {code} > 0006b0b0 2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e |.spark.executor.| > 0006b0c0 4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63 |MesosExecutorBac| > 0006b0d0 6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac |kend"},"data":".| > 0006b0e0 ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c |.\u\u0005ur\| > 0006b0f0 75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61 |u\u000f[Lsca| > 0006b100 6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30 |la.Tuple2;..\u00| > {code} > I suspect this is because the HTTP api emits the executorInfo.data directly: > {code} > JSON::Object model(const ExecutorInfo& executorInfo) > { > JSON::Object object; > object.values["executor_id"] = executorInfo.executor_id().value(); > object.values["name"] = executorInfo.name(); > object.values["data"] = executorInfo.data(); > object.values["framework_id"] = executorInfo.framework_id().value(); > object.values["command"] = model(executorInfo.command()); > object.values["resources"] = model(executorInfo.resources()); > return object; > } > {code} > I think this may be because the custom JSON processing library in stout 
seems > to not have any idea of what a byte array is. I'm guessing that some > implicit conversion makes it get written as a String instead, but: > {code} > inline std::ostream& operator<<(std::ostream& out, const String& string) > { > // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII. > // See RFC4627 for the JSON string specificiation. > return out << picojson::value(string.value).serialize(); > } > {code} > Thank you for any assistance here. Our cluster is currently entirely down -- > the frameworks cannot handle parsing the invalid JSON produced (it is not > even valid utf-8) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
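To make the failure mode above concrete: any serializer that copies raw protobuf {{bytes}} into a JSON string without byte-level escaping can emit output that is not valid UTF-8. Below is a minimal illustrative escaper (a hypothetical sketch, not the stout/picojson code) that turns every byte outside printable ASCII into a {{\u00XX}} escape, so the serialized form stays ASCII-only and therefore valid JSON:

```cpp
#include <cstdio>
#include <string>

// Hypothetical sketch: escape quotes, backslashes, and every byte outside
// printable ASCII as \u00XX, so arbitrary bytes still yield valid JSON.
// Note this maps byte 0xAC to codepoint U+00AC rather than round-tripping
// the raw byte; base64-encoding the field is the more robust alternative.
std::string jsonEscape(const std::string& in) {
  std::string out = "\"";
  for (unsigned char c : in) {
    if (c == '"') {
      out += "\\\"";
    } else if (c == '\\') {
      out += "\\\\";
    } else if (c < 0x20 || c > 0x7e) {
      char buf[8];
      std::snprintf(buf, sizeof(buf), "\\u%04x", c);
      out += buf;
    } else {
      out += static_cast<char>(c);
    }
  }
  out += "\"";
  return out;
}
```

Applied to the first two bytes shown in the hex dump above ({{ac ed}}, the Java serialization header), this emits {{"\u00ac\u00ed"}} instead of raw non-UTF8 bytes.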
[jira] [Created] (MESOS-3773) RegistryClientTest.SimpleGetBlob is flaky
Joseph Wu created MESOS-3773: Summary: RegistryClientTest.SimpleGetBlob is flaky Key: MESOS-3773 URL: https://issues.apache.org/jira/browse/MESOS-3773 Project: Mesos Issue Type: Bug Components: test Reporter: Joseph Wu Assignee: Jojy Varghese {{RegistryClientTest.SimpleGetBlob}} fails about 1/5 times. This was encountered on OSX. {code:title=Repro} bin/mesos-tests.sh --gtest_filter="*RegistryClientTest.SimpleGetBlob*" --gtest_repeat=10 --gtest_break_on_failure {code} {code:title=Example Failure} [ RUN ] RegistryClientTest.SimpleGetBlob ../../src/tests/containerizer/provisioner_docker_tests.cpp:946: Failure Value of: blobResponse Actual: "2015-10-20 20:58:59.579393024+00:00" Expected: blob.get() Which is: "\x15\x3\x3\00(P~\xCA&\xC6<\x4\x16\xE\xB2\xFF\b1a\xB9Z{\xE0\x80\xDA`\xBCt\x5R\x81x6\xF8 \x8B{\xA8\xA9\x4\xAB\xB6" "E\xE6\xDE\xCF\xD9*\xCC!\xC2\x15" "2015-10-20 20:58:59.579393024+00:00" *** Aborted at 1445374739 (unix time) try "date -d @1445374739" if you are using GNU date *** PC: @0x103144ddc testing::UnitTest::AddTestPartResult() *** SIGSEGV (@0x0) received by PID 49008 (TID 0x7fff73ca3300) stack trace: *** @ 0x7fff8c58af1a _sigtramp @ 0x7fff8386e187 malloc @0x1031445b7 testing::internal::AssertHelper::operator=() @0x1030d32e0 mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() @0x1030d3562 mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() @0x1031ac8f3 testing::internal::HandleSehExceptionsInMethodIfSupported<>() @0x103192f87 testing::internal::HandleExceptionsInMethodIfSupported<>() @0x1031533f5 testing::Test::Run() @0x10315493b testing::TestInfo::Run() @0x1031555f7 testing::TestCase::Run() @0x103163df3 testing::internal::UnitTestImpl::RunAllTests() @0x1031af8c3 testing::internal::HandleSehExceptionsInMethodIfSupported<>() @0x103195397 testing::internal::HandleExceptionsInMethodIfSupported<>() @0x1031639f2 testing::UnitTest::Run() @0x1025abd41 RUN_ALL_TESTS() @0x1025a8089 main @ 0x7fff86b155c9 start {code} 
{code:title=Less common failure} [ RUN ] RegistryClientTest.SimpleGetBlob ../../src/tests/containerizer/provisioner_docker_tests.cpp:926: Failure (socket).failure(): Failed accept: connection error: error::lib(0):func(0):reason(0) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3694) Enable building mesos.apache.org locally in a Docker container.
[ https://issues.apache.org/jira/browse/MESOS-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3694: - Sprint: Mesosphere Sprint 21 > Enable building mesos.apache.org locally in a Docker container. > --- > > Key: MESOS-3694 > URL: https://issues.apache.org/jira/browse/MESOS-3694 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Artem Harutyunyan > Labels: mesosphere > > We should make it easy for everyone to modify the website and be able to > generate it locally before pushing to upstream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3694) Enable building mesos.apache.org locally in a Docker container.
[ https://issues.apache.org/jira/browse/MESOS-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3694: - Story Points: 3 (was: 2) > Enable building mesos.apache.org locally in a Docker container. > --- > > Key: MESOS-3694 > URL: https://issues.apache.org/jira/browse/MESOS-3694 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Artem Harutyunyan > Labels: mesosphere > > We should make it easy for everyone to modify the website and be able to > generate it locally before pushing to upstream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3749) Configuration docs are missing --enable-libevent and --enable-ssl
[ https://issues.apache.org/jira/browse/MESOS-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-3749: - Sprint: Mesosphere Sprint 21 > Configuration docs are missing --enable-libevent and --enable-ssl > - > > Key: MESOS-3749 > URL: https://issues.apache.org/jira/browse/MESOS-3749 > Project: Mesos > Issue Type: Documentation > Components: documentation >Affects Versions: 0.25.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: configuration, documentaion, installation, mesosphere > > The {{\-\-enable-libevent}} and {{\-\-enable-ssl}} config flags are currently > not documented in the "Configuration" docs with the rest of the flags. They > should be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3749) Configuration docs are missing --enable-libevent and --enable-ssl
[ https://issues.apache.org/jira/browse/MESOS-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965730#comment-14965730 ] Greg Mann commented on MESOS-3749: -- Review here: https://reviews.apache.org/r/39494/ > Configuration docs are missing --enable-libevent and --enable-ssl > - > > Key: MESOS-3749 > URL: https://issues.apache.org/jira/browse/MESOS-3749 > Project: Mesos > Issue Type: Documentation > Components: documentation >Affects Versions: 0.25.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: configuration, documentaion, installation, mesosphere > > The {{\-\-enable-libevent}} and {{\-\-enable-ssl}} config flags are currently > not documented in the "Configuration" docs with the rest of the flags. They > should be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3024) HTTP endpoint authN is enabled merely by specifying --credentials
[ https://issues.apache.org/jira/browse/MESOS-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965727#comment-14965727 ] Adam B commented on MESOS-3024: --- See also the work that [~arojas] is doing for HTTP Authentication in MESOS-2297. I think we can start by introducing a `--authenticate_webui` flag or instead use ACLs to determine when to do webui authn. > HTTP endpoint authN is enabled merely by specifying --credentials > - > > Key: MESOS-3024 > URL: https://issues.apache.org/jira/browse/MESOS-3024 > Project: Mesos > Issue Type: Bug > Components: master, security >Reporter: Adam B >Assignee: Marco Massenzio > Labels: authentication, http, mesosphere > > If I set `--credentials` on the master, framework and slave authentication > are allowed, but not required. On the other hand, http authentication is now > required for authenticated endpoints (currently only `/shutdown`). That means > that I cannot enable framework or slave authentication without also enabling > http endpoint authentication. This is undesirable. > Framework and slave authentication have separate flags (`\--authenticate` and > `\--authenticate_slaves`) to require authentication for each. It would be > great if there was also such a flag for framework authentication. Or maybe we > get rid of these flags altogether and rely on ACLs to determine which > unauthenticated principals are even allowed to authenticate for each > endpoint/action. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3506) Build instructions for CentOS 6.6 should include `sudo yum update`
[ https://issues.apache.org/jira/browse/MESOS-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-3506: - Sprint: Mesosphere Sprint 21 > Build instructions for CentOS 6.6 should include `sudo yum update` > -- > > Key: MESOS-3506 > URL: https://issues.apache.org/jira/browse/MESOS-3506 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: documentation, mesosphere > > Neglecting to run {{sudo yum update}} on CentOS 6.6 currently causes the > build to break when building {{mesos-0.25.0.jar}}. The build instructions for > this platform on the Getting Started page should be changed accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3506) Build instructions for CentOS 6.6 should include `sudo yum update`
[ https://issues.apache.org/jira/browse/MESOS-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965719#comment-14965719 ] Greg Mann commented on MESOS-3506: -- Review here: https://reviews.apache.org/r/39493/ > Build instructions for CentOS 6.6 should include `sudo yum update` > -- > > Key: MESOS-3506 > URL: https://issues.apache.org/jira/browse/MESOS-3506 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: documentation, mesosphere > > Neglecting to run {{sudo yum update}} on CentOS 6.6 currently causes the > build to break when building {{mesos-0.25.0.jar}}. The build instructions for > this platform on the Getting Started page should be changed accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3766) Can not kill task in Status STAGING
[ https://issues.apache.org/jira/browse/MESOS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965665#comment-14965665 ] Niklas Quarfot Nielsen commented on MESOS-3766: --- [~matth...@mesosphere.io] Also, can you grab the master and slave state endpoint data? > Can not kill task in Status STAGING > --- > > Key: MESOS-3766 > URL: https://issues.apache.org/jira/browse/MESOS-3766 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.25.0 > Environment: OSX >Reporter: Matthias Veit >Assignee: Niklas Quarfot Nielsen > > I have created a simple Marathon Application with instance count 100 (100 > tasks) with a simple sleep command. Before all tasks were running, I killed > all tasks. This operation was successful, except 2 tasks. These 2 tasks are > in state STAGING (according to the mesos UI). Marathon tries to kill those > tasks every 5 seconds (for over an hour now) - unsuccessfully. > I picked one task and grepped the slave log: > {noformat} > I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- with resour > I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80 > I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container > '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr > I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing > executor's forked pid 37096 to > 
'/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks > I1020 12:39:41.744439 313335808 slave.cpp:2379] Got registration for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000 > I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame > I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:28.118443 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task > 
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:13.297114 315482112 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:18.316463 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
[jira] [Commented] (MESOS-3772) Consistency of quoted strings in error messages
[ https://issues.apache.org/jira/browse/MESOS-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965663#comment-14965663 ] Neil Conway commented on MESOS-3772: [~cmaloney] -- hmm, looks useful. Thanks! [~benjaminhindman] -- [~jvanremoortere] you might have an opinion here? > Consistency of quoted strings in error messages > --- > > Key: MESOS-3772 > URL: https://issues.apache.org/jira/browse/MESOS-3772 > Project: Mesos > Issue Type: Bug >Reporter: Neil Conway > Labels: mesosphere, newbie > > Example log output: > {quote} > I1020 18:56:02.933956 1790 slave.cpp:1270] Got assigned task 13 for > framework 496620b9-4368-4a71-b741-68216f3d909f- > I1020 18:56:02.934185 1790 slave.cpp:1386] Launching task 13 for framework > 496620b9-4368-4a71-b741-68216f3d909f- > I1020 18:56:02.934408 1790 slave.cpp:1618] Queuing task '13' for executor > default of framework '496620b9-4368-4a71-b741-68216f3d909f- > I1020 18:56:02.935417 1790 slave.cpp:1760] Sending queued task '13' to > executor 'default' of framework 496620b9-4368-4a71-b741-68216f3d909f- > {quote} > Aside from the typo (unmatched quote) in the third line, these log messages > using quoting inconsistently: sometimes task, executor, and framework IDs are > quoted, other times they are not. > We should probably adopt a general rule, a la > http://www.postgresql.org/docs/9.4/static/error-style-guide.html . My > proposal: when interpolating a variable, only use quotes if it is possible > that the value might contain whitespace or punctuation (in the latter case, > the punctuation should probably be escaped). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3772) Consistency of quoted strings in error messages
[ https://issues.apache.org/jira/browse/MESOS-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965652#comment-14965652 ] Cody Maloney commented on MESOS-3772: - What about generally preferring [std::quoted|http://en.cppreference.com/w/cpp/io/manip/quoted]? That does the escaping of quotes inside the string for you, as well as adding single quotes so it is a predictable / reversible transformation. > Consistency of quoted strings in error messages > --- > > Key: MESOS-3772 > URL: https://issues.apache.org/jira/browse/MESOS-3772 > Project: Mesos > Issue Type: Bug >Reporter: Neil Conway > Labels: mesosphere, newbie > > Example log output: > {quote} > I1020 18:56:02.933956 1790 slave.cpp:1270] Got assigned task 13 for > framework 496620b9-4368-4a71-b741-68216f3d909f- > I1020 18:56:02.934185 1790 slave.cpp:1386] Launching task 13 for framework > 496620b9-4368-4a71-b741-68216f3d909f- > I1020 18:56:02.934408 1790 slave.cpp:1618] Queuing task '13' for executor > default of framework '496620b9-4368-4a71-b741-68216f3d909f- > I1020 18:56:02.935417 1790 slave.cpp:1760] Sending queued task '13' to > executor 'default' of framework 496620b9-4368-4a71-b741-68216f3d909f- > {quote} > Aside from the typo (unmatched quote) in the third line, these log messages > use quoting inconsistently: sometimes task, executor, and framework IDs are > quoted, other times they are not. > We should probably adopt a general rule, a la > http://www.postgresql.org/docs/9.4/static/error-style-guide.html . My > proposal: when interpolating a variable, only use quotes if it is possible > that the value might contain whitespace or punctuation (in the latter case, > the punctuation should probably be escaped). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
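A small sketch of the {{std::quoted}} idea from the comment above. Using a single-quote delimiter is an assumption to match the existing log style; the manipulator defaults to double quotes:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// std::quoted wraps the value in the chosen delimiter and backslash-escapes
// any embedded delimiter, so the logged value is predictable and reversible.
std::string quoteForLog(const std::string& value) {
  std::ostringstream out;
  out << std::quoted(value, '\'');
  return out.str();
}
```

For example, {{quoteForLog("13")}} produces {{'13'}}, and a value that itself contains a quote is escaped rather than producing the unmatched-quote output shown in the report.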
[jira] [Comment Edited] (MESOS-1563) Failed to configure on FreeBSD
[ https://issues.apache.org/jira/browse/MESOS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965648#comment-14965648 ] David Forsythe edited comment on MESOS-1563 at 10/20/15 7:52 PM: - [~idownes] is there any way https://reviews.apache.org/r/39345/ will land as is, or do I need to chop it up into multiple commits? was (Author: dforsyth): [~idownes] is there anyway https://reviews.apache.org/r/39345/ will land as is, or do I need to chop it up into multiple commits? > Failed to configure on FreeBSD > -- > > Key: MESOS-1563 > URL: https://issues.apache.org/jira/browse/MESOS-1563 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.19.0 > Environment: FreeBSD-10/stable >Reporter: Dmitry Sivachenko > > When trying to configure mesos on FreeBSD, I get the following error: > configure: Setting up build environment for x86_64 freebsd10.0 > configure: error: "Mesos is currently unsupported on your platform." > Why? Is there anything really Linux-specific inside? It's written in Java > after all. > And MacOS is supported, but it is rather close to FreeBSD. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1563) Failed to configure on FreeBSD
[ https://issues.apache.org/jira/browse/MESOS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965648#comment-14965648 ] David Forsythe commented on MESOS-1563: --- [~idownes] is there any way https://reviews.apache.org/r/39345/ will land as is, or do I need to chop it up into multiple commits? > Failed to configure on FreeBSD > -- > > Key: MESOS-1563 > URL: https://issues.apache.org/jira/browse/MESOS-1563 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.19.0 > Environment: FreeBSD-10/stable >Reporter: Dmitry Sivachenko > > When trying to configure mesos on FreeBSD, I get the following error: > configure: Setting up build environment for x86_64 freebsd10.0 > configure: error: "Mesos is currently unsupported on your platform." > Why? Is there anything really Linux-specific inside? It's written in Java > after all. > And MacOS is supported, but it is rather close to FreeBSD. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3024) HTTP endpoint authN is enabled merely by specifying --credentials
[ https://issues.apache.org/jira/browse/MESOS-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-3024: --- Shepherd: Adam B Sprint: Mesosphere Sprint 21 Story Points: 8 Target Version/s: 0.26.0 > HTTP endpoint authN is enabled merely by specifying --credentials > - > > Key: MESOS-3024 > URL: https://issues.apache.org/jira/browse/MESOS-3024 > Project: Mesos > Issue Type: Bug > Components: master, security >Reporter: Adam B >Assignee: Marco Massenzio > Labels: authentication, http, mesosphere > > If I set `--credentials` on the master, framework and slave authentication > are allowed, but not required. On the other hand, http authentication is now > required for authenticated endpoints (currently only `/shutdown`). That means > that I cannot enable framework or slave authentication without also enabling > http endpoint authentication. This is undesirable. > Framework and slave authentication have separate flags (`\--authenticate` and > `\--authenticate_slaves`) to require authentication for each. It would be > great if there was also such a flag for framework authentication. Or maybe we > get rid of these flags altogether and rely on ACLs to determine which > unauthenticated principals are even allowed to authenticate for each > endpoint/action. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3772) Consistency of quoted strings in error messages
Neil Conway created MESOS-3772: -- Summary: Consistency of quoted strings in error messages Key: MESOS-3772 URL: https://issues.apache.org/jira/browse/MESOS-3772 Project: Mesos Issue Type: Bug Reporter: Neil Conway Example log output: {quote} I1020 18:56:02.933956 1790 slave.cpp:1270] Got assigned task 13 for framework 496620b9-4368-4a71-b741-68216f3d909f- I1020 18:56:02.934185 1790 slave.cpp:1386] Launching task 13 for framework 496620b9-4368-4a71-b741-68216f3d909f- I1020 18:56:02.934408 1790 slave.cpp:1618] Queuing task '13' for executor default of framework '496620b9-4368-4a71-b741-68216f3d909f- I1020 18:56:02.935417 1790 slave.cpp:1760] Sending queued task '13' to executor 'default' of framework 496620b9-4368-4a71-b741-68216f3d909f- {quote} Aside from the typo (unmatched quote) in the third line, these log messages use quoting inconsistently: sometimes task, executor, and framework IDs are quoted, other times they are not. We should probably adopt a general rule, a la http://www.postgresql.org/docs/9.4/static/error-style-guide.html . My proposal: when interpolating a variable, only use quotes if it is possible that the value might contain whitespace or punctuation (in the latter case, the punctuation should probably be escaped). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3024) HTTP endpoint authN is enabled merely by specifying --credentials
[ https://issues.apache.org/jira/browse/MESOS-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio reassigned MESOS-3024: -- Assignee: Marco Massenzio > HTTP endpoint authN is enabled merely by specifying --credentials > - > > Key: MESOS-3024 > URL: https://issues.apache.org/jira/browse/MESOS-3024 > Project: Mesos > Issue Type: Bug > Components: master, security >Reporter: Adam B >Assignee: Marco Massenzio > Labels: authentication, http, mesosphere > > If I set `--credentials` on the master, framework and slave authentication > are allowed, but not required. On the other hand, http authentication is now > required for authenticated endpoints (currently only `/shutdown`). That means > that I cannot enable framework or slave authentication without also enabling > http endpoint authentication. This is undesirable. > Framework and slave authentication have separate flags (`\--authenticate` and > `\--authenticate_slaves`) to require authentication for each. It would be > great if there was also such a flag for framework authentication. Or maybe we > get rid of these flags altogether and rely on ACLs to determine which > unauthenticated principals are even allowed to authenticate for each > endpoint/action. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3769) Agent logs are misleading during agent shutdown
[ https://issues.apache.org/jira/browse/MESOS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3769: -- Labels: newbie (was: ) Looks like all code-paths in {{void Slave::shutdown(const UPID& from, const string& message)}} should check for {{message.empty()}} before trying to log it. > Agent logs are misleading during agent shutdown > --- > > Key: MESOS-3769 > URL: https://issues.apache.org/jira/browse/MESOS-3769 > Project: Mesos > Issue Type: Bug >Reporter: Alexander Rukletsov >Priority: Minor > Labels: newbie > > When analyzing output of the {{MasterAllocatorTest.SlaveLost}} test I spotted > the following logs: > {noformat} > I1020 18:18:09.026553 237658112 status_update_manager.cpp:322] Received > status update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for > task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- > I1020 18:18:09.026845 234438656 slave.cpp:3090] Forwarding the update > TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of > framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- to > master@172.18.6.110:62507 > I1020 18:18:09.026973 234438656 slave.cpp:651] ; unregistering and shutting > down > I1020 18:18:09.027007 234438656 slave.cpp:2016] Asked to shut down framework > 7aff439d-307c-486b-9c0d-c2a47ddbda5b- by @0.0.0.0:0 > I1020 18:18:09.027019 234438656 slave.cpp:2041] Shutting down framework > 7aff439d-307c-486b-9c0d-c2a47ddbda5b- > {noformat} > It looks like {{Slave::shutdown()}} uses wrong assumptions about possible > execution paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
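The guard suggested in the comment above could look roughly like this (a hypothetical sketch with an invented helper name, not the actual {{slave.cpp}} code):

```cpp
#include <string>

// Only mention the shutdown message when one was actually supplied,
// avoiding the stray "] ; unregistering and shutting down" line that
// appears in the quoted log when the message is empty.
std::string shutdownLogLine(const std::string& message) {
  std::string line = "Asked to shut down";
  if (!message.empty()) {
    line += " because '" + message + "'";
  }
  line += "; unregistering and shutting down";
  return line;
}
```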
[jira] [Updated] (MESOS-3769) Agent logs are misleading during agent shutdown
[ https://issues.apache.org/jira/browse/MESOS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3769: -- Shepherd: Till Toenshoff > Agent logs are misleading during agent shutdown > --- > > Key: MESOS-3769 > URL: https://issues.apache.org/jira/browse/MESOS-3769 > Project: Mesos > Issue Type: Bug >Reporter: Alexander Rukletsov >Priority: Minor > > When analyzing output of the {{MasterAllocatorTest.SlaveLost}} test I spotted > following logs: > {noformat} > I1020 18:18:09.026553 237658112 status_update_manager.cpp:322] Received > status update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for > task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- > I1020 18:18:09.026845 234438656 slave.cpp:3090] Forwarding the update > TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of > framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- to > master@172.18.6.110:62507 > I1020 18:18:09.026973 234438656 slave.cpp:651] ; unregistering and shutting > down > I1020 18:18:09.027007 234438656 slave.cpp:2016] Asked to shut down framework > 7aff439d-307c-486b-9c0d-c2a47ddbda5b- by @0.0.0.0:0 > I1020 18:18:09.027019 234438656 slave.cpp:2041] Shutting down framework > 7aff439d-307c-486b-9c0d-c2a47ddbda5b- > {noformat} > It looks like {{Slave::shutdown()}} uses wrong assumptions about possible > execution paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2935) Fetcher doesn't extract from .tar files
[ https://issues.apache.org/jira/browse/MESOS-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965546#comment-14965546 ] Bhuvan Arumugam commented on MESOS-2935: That's great, [~bernd-mesos]. Thank you! > Fetcher doesn't extract from .tar files > --- > > Key: MESOS-2935 > URL: https://issues.apache.org/jira/browse/MESOS-2935 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Sargun Dhillon >Assignee: Bhuvan Arumugam >Priority: Minor > Labels: newbie > > Compressed artifacts get decompressed with either "unzip -d" or "tar -C $DIR > -xf" > In addition, only the following file suffixes / extensions result in > decompression: > -tgz > -tar.gz > -tbz2 > -tar.bz2 > -tar.xz > -txz > -zip > OR > Alternatively, change fetcher to accept .tar as a valid suffix to trigger the > tarball code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
Steven Schlansker created MESOS-3771: Summary: Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling Key: MESOS-3771 URL: https://issues.apache.org/jira/browse/MESOS-3771 Project: Mesos Issue Type: Bug Components: HTTP API Affects Versions: 0.24.1 Reporter: Steven Schlansker Priority: Critical

Spark encodes some binary data into the ExecutorInfo.data field. This field is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data. If you have such a field, it seems that it is splatted out into JSON without any regard to proper character encoding:

{quote}
0006b0b0  2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e  |.spark.executor.|
0006b0c0  4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63  |MesosExecutorBac|
0006b0d0  6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac  |kend"},"data":".|
0006b0e0  ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
0006b0f0  75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
0006b100  6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
{quote}

I suspect this is because the HTTP API emits the executorInfo.data directly:

{code}
JSON::Object model(const ExecutorInfo& executorInfo)
{
  JSON::Object object;
  object.values["executor_id"] = executorInfo.executor_id().value();
  object.values["name"] = executorInfo.name();
  object.values["data"] = executorInfo.data();
  object.values["framework_id"] = executorInfo.framework_id().value();
  object.values["command"] = model(executorInfo.command());
  object.values["resources"] = model(executorInfo.resources());
  return object;
}
{code}

I think this may be because the custom JSON processing library in stout has no notion of a byte array; I'm guessing that some implicit conversion makes it get written as a String instead, but:

{code}
inline std::ostream& operator<<(std::ostream& out, const String& string)
{
  // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
  // See RFC 4627 for the JSON string specification.
  return out << picojson::value(string.value).serialize();
}
{code}

Thank you for any assistance here. Our cluster is currently entirely down -- the frameworks cannot handle parsing the invalid JSON produced (it is not even valid UTF-8).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
[ https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Schlansker updated MESOS-3771: - Description:

Spark encodes some binary data into the ExecutorInfo.data field. This field is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data. If you have such a field, it seems that it is splatted out into JSON without any regard to proper character encoding:

{code}
0006b0b0  2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e  |.spark.executor.|
0006b0c0  4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63  |MesosExecutorBac|
0006b0d0  6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac  |kend"},"data":".|
0006b0e0  ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
0006b0f0  75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
0006b100  6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
{code}

I suspect this is because the HTTP API emits the executorInfo.data directly:

{code}
JSON::Object model(const ExecutorInfo& executorInfo)
{
  JSON::Object object;
  object.values["executor_id"] = executorInfo.executor_id().value();
  object.values["name"] = executorInfo.name();
  object.values["data"] = executorInfo.data();
  object.values["framework_id"] = executorInfo.framework_id().value();
  object.values["command"] = model(executorInfo.command());
  object.values["resources"] = model(executorInfo.resources());
  return object;
}
{code}

I think this may be because the custom JSON processing library in stout has no notion of a byte array; I'm guessing that some implicit conversion makes it get written as a String instead, but:

{code}
inline std::ostream& operator<<(std::ostream& out, const String& string)
{
  // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
  // See RFC 4627 for the JSON string specification.
  return out << picojson::value(string.value).serialize();
}
{code}

Thank you for any assistance here. Our cluster is currently entirely down -- the frameworks cannot handle parsing the invalid JSON produced (it is not even valid UTF-8).

was:

Spark encodes some binary data into the ExecutorInfo.data field. This field is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data. If you have such a field, it seems that it is splatted out into JSON without any regard to proper character encoding:

{quote}
0006b0b0  2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e  |.spark.executor.|
0006b0c0  4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63  |MesosExecutorBac|
0006b0d0  6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac  |kend"},"data":".|
0006b0e0  ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
0006b0f0  75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
0006b100  6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
{quote}

I suspect this is because the HTTP API emits the executorInfo.data directly:

{code}
JSON::Object model(const ExecutorInfo& executorInfo)
{
  JSON::Object object;
  object.values["executor_id"] = executorInfo.executor_id().value();
  object.values["name"] = executorInfo.name();
  object.values["data"] = executorInfo.data();
  object.values["framework_id"] = executorInfo.framework_id().value();
  object.values["command"] = model(executorInfo.command());
  object.values["resources"] = model(executorInfo.resources());
  return object;
}
{code}

I think this may be because the custom JSON processing library in stout has no notion of a byte array; I'm guessing that some implicit conversion makes it get written as a String instead, but:

{code}
inline std::ostream& operator<<(std::ostream& out, const String& string)
{
  // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
  // See RFC 4627 for the JSON string specification.
  return out << picojson::value(string.value).serialize();
}
{code}

Thank you for any assistance here. Our cluster is currently entirely down -- the frameworks cannot handle parsing the invalid JSON produced (it is not even valid UTF-8).

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII
> handling
> ---
>
> Key: MESOS-3771
> URL: https://issues.apache.org/jira/browse/MESOS-3771
> Project: Mesos
> Issue Type: Bug
> Components: HTTP API
>Affects Versions: 0.24.1
>Reporter: Steven Schlansker
>Priority: Critical
>
> Spark encodes some binary data into the ExecutorInfo.data field. This field
> is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data.
> If you have such a field, it seems that it is splatted out into JSON without
> any regard to proper character encoding:
> {code}
> 0006b0b0 2e 73 70 61 7
[jira] [Commented] (MESOS-3766) Can not kill task in Status STAGING
[ https://issues.apache.org/jira/browse/MESOS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965474#comment-14965474 ] Niklas Quarfot Nielsen commented on MESOS-3766: --- [~matth...@mesosphere.io] acknowledged; will take a look. Can you share the full logs in the meantime? Any details that precede the stuck state would help. > Can not kill task in Status STAGING > --- > > Key: MESOS-3766 > URL: https://issues.apache.org/jira/browse/MESOS-3766 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.25.0 > Environment: OSX >Reporter: Matthias Veit >Assignee: Niklas Quarfot Nielsen > > I have created a simple Marathon Application with instance count 100 (100 > tasks) with a simple sleep command. Before all tasks were running, I killed > all tasks. This operation was successful except for 2 tasks. These 2 tasks are > in state STAGING (according to the Mesos UI). Marathon tries to kill those > tasks every 5 seconds (for over an hour now) - unsuccessfully.
> I picked one task and grepped the slave log: > {noformat} > I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- with resour > I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80 > I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container > '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr > I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing > executor's forked pid 37096 to > '/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks > I1020 12:39:41.744439 313335808 slave.cpp:2379] Got registration for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000 > I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame > I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:28.118443 313872384 
slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:13.297114 315482112 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:18.316463 316018688 slave.cpp:1789] Ask
[jira] [Assigned] (MESOS-3766) Can not kill task in Status STAGING
[ https://issues.apache.org/jira/browse/MESOS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen reassigned MESOS-3766: - Assignee: Niklas Quarfot Nielsen > Can not kill task in Status STAGING > --- > > Key: MESOS-3766 > URL: https://issues.apache.org/jira/browse/MESOS-3766 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.25.0 > Environment: OSX >Reporter: Matthias Veit >Assignee: Niklas Quarfot Nielsen > > I have created a simple Marathon Application with instance count 100 (100 > tasks) with a simple sleep command. Before all tasks were running, I killed > all tasks. This operation was successful except for 2 tasks. These 2 tasks are > in state STAGING (according to the Mesos UI). Marathon tries to kill those > tasks every 5 seconds (for over an hour now) - unsuccessfully. > I picked one task and grepped the slave log: > {noformat} > I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- with resour > I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80 > I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container > '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr > I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing > executor's forked pid 37096 to > '/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks > I1020 12:39:41.744439 313335808
slave.cpp:2379] Got registration for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000 > I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame > I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:28.118443 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 
12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:13.297114 315482112 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:18.316463 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:23.337116 313872384 slave.cpp:1789] Asked t
[jira] [Updated] (MESOS-3736) Support docker local store pull same image simultaneously
[ https://issues.apache.org/jira/browse/MESOS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-3736: Story Points: 3 Labels: mesosphere (was: ) > Support docker local store pull same image simultaneously > -- > > Key: MESOS-3736 > URL: https://issues.apache.org/jira/browse/MESOS-3736 > Project: Mesos > Issue Type: Improvement >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: mesosphere > > The current local store implements get() using the local puller. When multiple > requests pull the same docker image at the same time, the local puller untars > the image tarball once per request and copies each result into the same > directory, which wastes time and places a high demand on computation. The > local store/puller should do this work only for the first request; > simultaneous pulling requests should wait on the promised future and get the > result once the first pull finishes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3736) Support docker local store pull same image simultaneously
[ https://issues.apache.org/jira/browse/MESOS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965463#comment-14965463 ] Gilbert Song commented on MESOS-3736: - This note captures the current thinking after an in-person discussion with Jojy. Thanks to [~jojy]. Still considering two questions: 1. Handling simultaneous failure: if the first request is called and written into the hashmap, all the other requests will wait on the future of the first request. But because its return type is 'Future>', if that future's status is 'FAILED/DISCARDED', the other requests will wait forever. 2. The current hashmap uses 'stringify(image::name)' as the key, but it may not be unique because there is a chance that layer_ids can change. One solution is to use 'stringify(image)' as the key. > Support docker local store pull same image simultaneously > -- > > Key: MESOS-3736 > URL: https://issues.apache.org/jira/browse/MESOS-3736 > Project: Mesos > Issue Type: Improvement >Reporter: Gilbert Song >Assignee: Gilbert Song > > The current local store implements get() using the local puller. When multiple > requests pull the same docker image at the same time, the local puller untars > the image tarball once per request and copies each result into the same > directory, which wastes time and places a high demand on computation. The > local store/puller should do this work only for the first request; > simultaneous pulling requests should wait on the promised future and get the > result once the first pull finishes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3770) SlaveRecoveryTest/0.RecoverCompletedExecutor is flaky
Vinod Kone created MESOS-3770: - Summary: SlaveRecoveryTest/0.RecoverCompletedExecutor is flaky Key: MESOS-3770 URL: https://issues.apache.org/jira/browse/MESOS-3770 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.25.0 Reporter: Vinod Kone

Observed this in internal CI

{code}
DEBUG: [ RUN ] SlaveRecoveryTest/0.RecoverCompletedExecutor
DEBUG: Using temporary directory '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_rTtR9B'
DEBUG: I1020 08:56:36.634321 28115 leveldb.cpp:176] Opened db in 185.662339ms
DEBUG: I1020 08:56:36.701638 28115 leveldb.cpp:183] Compacted db in 67.257643ms
DEBUG: I1020 08:56:36.701705 28115 leveldb.cpp:198] Created db iterator in 8212ns
DEBUG: I1020 08:56:36.701719 28115 leveldb.cpp:204] Seeked to beginning of db in 1417ns
DEBUG: I1020 08:56:36.701730 28115 leveldb.cpp:273] Iterated through 0 keys in the db in 357ns
DEBUG: I1020 08:56:36.701756 28115 replica.cpp:746] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
DEBUG: I1020 08:56:36.702062 28132 recover.cpp:449] Starting replica recovery
DEBUG: I1020 08:56:36.702116 28132 recover.cpp:475] Replica is in EMPTY status
DEBUG: I1020 08:56:36.702952 28132 replica.cpp:642] Replica in EMPTY status received a broadcasted recover request from (7143)@172.16.132.117:37586
DEBUG: I1020 08:56:36.703795 28141 recover.cpp:195] Received a recover response from a replica in EMPTY status
DEBUG: I1020 08:56:36.704100 28138 recover.cpp:566] Updating replica status to STARTING
DEBUG: I1020 08:56:36.705229 28133 master.cpp:376] Master 0d54e2f1-43d7-4f74-8532-9c37ac40791b (smfc-ahy-19-sr2.corpdc.twitter.com) started on 172.16.132.117:37586
DEBUG: I1020 08:56:36.705289 28133 master.cpp:378] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_rTtR9B/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_rTtR9B/master" --zk_session_timeout="10secs"
DEBUG: I1020 08:56:36.705440 28133 master.cpp:423] Master only allowing authenticated frameworks to register
DEBUG: I1020 08:56:36.705446 28133 master.cpp:428] Master only allowing authenticated slaves to register
DEBUG: I1020 08:56:36.705451 28133 credentials.hpp:37] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_rTtR9B/credentials'
DEBUG: I1020 08:56:36.705587 28133 master.cpp:467] Using default 'crammd5' authenticator
DEBUG: I1020 08:56:36.705651 28133 master.cpp:504] Authorization enabled
DEBUG: I1020 08:56:36.706521 28134 master.cpp:1609] The newly elected leader is master@172.16.132.117:37586 with id 0d54e2f1-43d7-4f74-8532-9c37ac40791b
DEBUG: I1020 08:56:36.706539 28134 master.cpp:1622] Elected as the leading master!
DEBUG: I1020 08:56:36.706545 28134 master.cpp:1382] Recovering from registrar
DEBUG: I1020 08:56:36.706681 28146 registrar.cpp:309] Recovering registrar
DEBUG: I1020 08:56:36.768453 28138 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 64.300669ms
DEBUG: I1020 08:56:36.768492 28138 replica.cpp:323] Persisted replica status to STARTING
DEBUG: I1020 08:56:36.768568 28138 recover.cpp:475] Replica is in STARTING status
DEBUG: I1020 08:56:36.769737 28131 replica.cpp:642] Replica in STARTING status received a broadcasted recover request from (7144)@172.16.132.117:37586
DEBUG: I1020 08:56:36.769816 28131 recover.cpp:195] Received a recover response from a replica in STARTING status
DEBUG: I1020 08:56:36.770355 28141 recover.cpp:566] Updating replica status to VOTING
DEBUG: I1020 08:56:36.818709 28136 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 48.054479ms
DEBUG: I1020 08:56:36.818743 28136 replica.cpp:323] Persisted replica status to VOTING
DEBUG: I1020 08:56:36.818791 28136 recover.cpp:580] Successfully joined the Paxos group
DEBUG: I1020 08:56:36.818842 28136 recover.cpp:464] Recover process terminated
DEBUG: I1020 08:56:36.818954 28130 log.cpp:661] Attempting to start the writer
DEBUG: I1020 08:56:36.820060 28140 replica.cpp:478] Replica received implicit promise request from (7145)@172.16.132.117:37586 with proposal 1
DEBUG: I1020 08:56:36.8854
{code}
[jira] [Commented] (MESOS-3736) Support docker local store pull same image simultaneously
[ https://issues.apache.org/jira/browse/MESOS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965386#comment-14965386 ] Gilbert Song commented on MESOS-3736: - https://reviews.apache.org/r/39331/ > Support docker local store pull same image simultaneously > -- > > Key: MESOS-3736 > URL: https://issues.apache.org/jira/browse/MESOS-3736 > Project: Mesos > Issue Type: Improvement >Reporter: Gilbert Song >Assignee: Gilbert Song > > The current local store implements get() using the local puller. When multiple > requests pull the same docker image at the same time, the local puller untars > the image tarball once per request and copies each result into the same > directory, which wastes time and places a high demand on computation. The > local store/puller should do this work only for the first request; > simultaneous pulling requests should wait on the promised future and get the > result once the first pull finishes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3501) configure cannot find libevent headers in CentOS 6
[ https://issues.apache.org/jira/browse/MESOS-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965384#comment-14965384 ] Greg Mann commented on MESOS-3501: -- I like Neil's idea of updating the configure error message to note that libevent2 is required; I think this may be enough to guide the user in the right direction. I also have a ticket open to add the {{--enable-libevent}} flag to the "Configuration" docs, so we can link to the libevent documentation there as well. > configure cannot find libevent headers in CentOS 6 > -- > > Key: MESOS-3501 > URL: https://issues.apache.org/jira/browse/MESOS-3501 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 > Environment: CentOS 6.6, 6.7 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: build, configure, libevent, mesosphere > > If libevent is installed via {{sudo yum install libevent-headers}}, running > {{../configure --enable-libevent}} will fail to discover the libevent headers:
> {code}
> checking event2/event.h usability... no
> checking event2/event.h presence... no
> checking for event2/event.h... no
> configure: error: cannot find libevent headers
> ---
> libevent is required for libprocess to build.
> ---
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3113) Add resource usage section to containerizer documentation
[ https://issues.apache.org/jira/browse/MESOS-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965380#comment-14965380 ] Gilbert Song commented on MESOS-3113: - https://reviews.apache.org/r/39484/ > Add resource usage section to containerizer documentation > - > > Key: MESOS-3113 > URL: https://issues.apache.org/jira/browse/MESOS-3113 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Niklas Quarfot Nielsen >Assignee: Gilbert Song > Labels: docathon, documentaion, mesosphere > > Currently, the containerizer documentation doesn't touch upon the usage() API > and how to interpret the collected statistics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3113) Add resource usage section to containerizer documentation
[ https://issues.apache.org/jira/browse/MESOS-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-3113: Sprint: Mesosphere Sprint 21 Labels: docathon documentaion mesosphere (was: mesosphere) Component/s: documentation > Add resource usage section to containerizer documentation > - > > Key: MESOS-3113 > URL: https://issues.apache.org/jira/browse/MESOS-3113 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Niklas Quarfot Nielsen >Assignee: Gilbert Song > Labels: docathon, documentaion, mesosphere > > Currently, the containerizer documentation doesn't touch upon the usage() API > and how to interpret the collected statistics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3581) License headers show up all over doxygen documentation.
[ https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965341#comment-14965341 ] Joseph Wu commented on MESOS-3581: -- IMO, more importantly, we should actually update the Doxygen docs. They were last updated 13 months ago. (See linked issues) Also, we can easily get rid of the license headers by actually documenting the classes. For example, the [Watcher class|http://mesos.apache.org/api/latest/c++/classWatcher.html] has proper documentation *and* a license. > License headers show up all over doxygen documentation. > --- > > Key: MESOS-3581 > URL: https://issues.apache.org/jira/browse/MESOS-3581 > Project: Mesos > Issue Type: Documentation > Components: documentation >Affects Versions: 0.24.1 >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier >Priority: Minor > > Currently license headers are commented in something resembling Javadoc style, > {code} > /** > * Licensed ... > {code} > Since we use Javadoc-style comment blocks for doxygen documentation all > license headers appear in the generated documentation, potentially and likely > hiding the actual documentation. > Using {{/*}} to start the comment blocks would be enough to hide them from > doxygen, but would likely also result in a largish (though mostly > uninteresting) patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3769) Agent logs are misleading during agent shutdown
Alexander Rukletsov created MESOS-3769: -- Summary: Agent logs are misleading during agent shutdown Key: MESOS-3769 URL: https://issues.apache.org/jira/browse/MESOS-3769 Project: Mesos Issue Type: Bug Reporter: Alexander Rukletsov Priority: Minor

When analyzing the output of the {{MasterAllocatorTest.SlaveLost}} test, I spotted the following logs:

{noformat}
I1020 18:18:09.026553 237658112 status_update_manager.cpp:322] Received status update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b-
I1020 18:18:09.026845 234438656 slave.cpp:3090] Forwarding the update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- to master@172.18.6.110:62507
I1020 18:18:09.026973 234438656 slave.cpp:651] ; unregistering and shutting down
I1020 18:18:09.027007 234438656 slave.cpp:2016] Asked to shut down framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- by @0.0.0.0:0
I1020 18:18:09.027019 234438656 slave.cpp:2041] Shutting down framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b-
{noformat}

It looks like {{Slave::shutdown()}} makes wrong assumptions about possible execution paths.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3759) Document messages.proto
[ https://issues.apache.org/jira/browse/MESOS-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-3759: - Labels: docathon documentation mesosphere (was: documentation mesosphere) > Document messages.proto > --- > > Key: MESOS-3759 > URL: https://issues.apache.org/jira/browse/MESOS-3759 > Project: Mesos > Issue Type: Improvement > Components: documentation >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: docathon, documentation, mesosphere > > The messages we pass between Mesos components are largely undocumented. See > this > [TODO|https://github.com/apache/mesos/blob/19f14d06bac269b635657960d8ea8b2928b7830c/src/messages/messages.proto#L23]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3581) License headers show up all over doxygen documentation.
[ https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965324#comment-14965324 ] Greg Mann commented on MESOS-3581: -- I would advocate fixing the openers in all source files. While using Doxygen's {{INPUT_FILTER}} would work, this would add a fragile step to the build system unnecessarily, leaving one more thing to maintain in the future. While it isn't desirable to pollute the change history of many files, I think it's even less desirable to add a script that doesn't really need to be there. > License headers show up all over doxygen documentation. > --- > > Key: MESOS-3581 > URL: https://issues.apache.org/jira/browse/MESOS-3581 > Project: Mesos > Issue Type: Documentation > Components: documentation >Affects Versions: 0.24.1 >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier >Priority: Minor > > Currently license headers are commented in something resembling Javadoc style, > {code} > /** > * Licensed ... > {code} > Since we use Javadoc-style comment blocks for doxygen documentation all > license headers appear in the generated documentation, potentially and likely > hiding the actual documentation. > Using {{/*}} to start the comment blocks would be enough to hide them from > doxygen, but would likely also result in a largish (though mostly > uninteresting) patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965245#comment-14965245 ] Guangya Liu commented on MESOS-3765: I think that the final goal is "fine-grained" resource scheduling, and letting the master/allocator know the exact resource request is an easy way to implement this. ;-) Another point, on a framework hoarding resources: do you mean that the allocator can use a filter to hoard resources for a framework? The problem is that the current filter is host-level, and even after the filter expires the allocator still uses the allocation unit to allocate resource offers, which may still not satisfy the framework's request. With "granularity", we may need to set the filter at the allocation-unit level rather than the host level. > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in the > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators to explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3768) slave crashes on master reboot and tasks got stopped
Johannes Ziemke created MESOS-3768: -- Summary: slave crashes on master reboot and tasks got stopped Key: MESOS-3768 URL: https://issues.apache.org/jira/browse/MESOS-3768 Project: Mesos Issue Type: Bug Affects Versions: 0.24.0 Reporter: Johannes Ziemke Hi, in my 3-master-node cluster, I rebooted the leading master, which caused several slaves to crash. Besides that, about half of all tasks in the cluster got stopped in the process. After some time, the cluster became stable again. Slave log: https://gist.github.com/anonymous/f506c79ce63c5c934477 Master log: https://gist.github.com/anonymous/12e8aa2529b19b226425 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965213#comment-14965213 ] Alexander Rukletsov commented on MESOS-3765: {quote} There might be problems that the master/allocator do not know the exact resource request of the framework, so it seems difficult to let master/allocator satisfy the request of the framework {quote} But this is true for the status quo, right? Currently the allocator does not take frameworks' resource wishes, if any, into consideration. This ticket proposes to make the "allocation chunk" adjustable. IIUC, your proposal is to implement {{requestResources()}}, which, in my opinion, is a separate discussion. Also note that a framework may hoard resources, which means having multiple smaller chunks should not be a big problem. > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in the > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators to explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965189#comment-14965189 ] Guangya Liu edited comment on MESOS-3765 at 10/20/15 2:40 PM: -- There might be a problem that the master/allocator does not know the exact resource request of the framework, so it seems difficult for the master/allocator to satisfy the framework's request, and sometimes this may cause the framework to starve if the offer does not have enough resources. Mesos now supports {code}requestResource{code}; can we leverage this API? The framework could just send the exact resource request to the Mesos master, and the master could return an offer with exactly the requested resources to the framework. Comments? was (Author: gyliu): There might be problems that the master/allocator do not know the exact resource request of the framework, so it seems difficult to let master/allocator satisfy the request of the framework and sometimes this may cause the framework starve if the offer do not have enough resources. Mesos now support requestResource, can we leverage this API? The framework can just send the exact resource request to Mesos master and the master can return the offer with the exact request resource to framework, comments? > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in the > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators to explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965189#comment-14965189 ] Guangya Liu commented on MESOS-3765: There might be a problem that the master/allocator does not know the exact resource request of the framework, so it seems difficult for the master/allocator to satisfy the framework's request, and sometimes this may cause the framework to starve if the offer does not have enough resources. Mesos now supports requestResource; can we leverage this API? The framework could just send the exact resource request to the Mesos master, and the master could return an offer with exactly the requested resources to the framework. Comments? > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in the > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators to explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3032) Document containerizer launch
[ https://issues.apache.org/jira/browse/MESOS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jojy Varghese updated MESOS-3032: - Labels: docathon mesosphere (was: mesosphere) https://reviews.apache.org/r/39456/ > Document containerizer launch > -- > > Key: MESOS-3032 > URL: https://issues.apache.org/jira/browse/MESOS-3032 > Project: Mesos > Issue Type: Documentation > Components: containerization >Reporter: Jojy Varghese >Assignee: Jojy Varghese >Priority: Minor > Labels: docathon, mesosphere > > We currently don't have enough documentation for the containerizer component. > This task adds documentation for the containerizer launch sequence. > The main goals are: > - Have diagrams (state, sequence, class, etc.) depicting the containerizer > launch process. > - Make the documentation newbie-friendly. > - Usable for future design discussions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965176#comment-14965176 ] Alexander Rukletsov commented on MESOS-3765: Yes, that's what I have in mind. I would avoid calling that an offer; it's rather an allocation chunk. The allocator may still allocate multiple chunks to a single framework in one allocation cycle, which will end up in a single offer. An alternative is a percentage, but we should still stick to the original agent size, as taking a fraction of the remaining resources on an agent can yield a very small value. > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in the > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators to explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3338) Dynamic reservations are not counted as used resources in the master
[ https://issues.apache.org/jira/browse/MESOS-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965124#comment-14965124 ] Guangya Liu commented on MESOS-3338: [~alexr] Does this help: https://github.com/apache/mesos/blob/master/src/master/http.cpp#L290-L294 I think what you want to do is get the unused resources on every agent based on RR (https://reviews.apache.org/r/38110/diff/6?file=1098311#file1098311line129); does the following help?
{code}
Resources unusedOnAgent =
  slave->totalResources -
  Resources::sum(slave->usedResources) -
  (slave->totalResources.reserved() -
   Resources::sum(slave->usedResources.reserved()));
{code}
> Dynamic reservations are not counted as used resources in the master > > > Key: MESOS-3338 > URL: https://issues.apache.org/jira/browse/MESOS-3338 > Project: Mesos > Issue Type: Bug > Components: allocation, master >Reporter: Alexander Rukletsov >Assignee: Guangya Liu >Priority: Minor > Labels: mesosphere, persistent-volumes > > Dynamically reserved resources should be considered used or allocated and > hence reflected in Mesos bookkeeping structures and {{state.json}}. > I expanded the {{ReservationTest.ReserveThenUnreserve}} test with the > following section:
> {code}
> // Check that the Master counts the reservation as a used resource.
> {
>   Future<Response> response =
>     process::http::get(master.get(), "state.json");
>   AWAIT_READY(response);
>
>   Try<JSON::Object> parse = JSON::parse<JSON::Object>(response.get().body);
>   ASSERT_SOME(parse);
>
>   Result<JSON::Number> cpus =
>     parse.get().find<JSON::Number>("slaves[0].used_resources.cpus");
>   ASSERT_SOME_EQ(JSON::Number(1), cpus);
> }
> {code}
> and got
> {noformat}
> ../../../src/tests/reservation_tests.cpp:168: Failure
> Value of: (cpus).get()
>   Actual: 0
> Expected: JSON::Number(1)
> Which is: 1
> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965104#comment-14965104 ] Guangya Liu commented on MESOS-3765: [~alexr] Is the "granularity" a kind of allocation unit, such as "cpu:1;mem:256", so that the allocator will treat "cpu:1;mem:256" as one offer? > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in the > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators to explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3767) Add proper JavaDoc to mesos::modules::ModuleManager class
Alexander Rojas created MESOS-3767: -- Summary: Add proper JavaDoc to mesos::modules::ModuleManager class Key: MESOS-3767 URL: https://issues.apache.org/jira/browse/MESOS-3767 Project: Mesos Issue Type: Documentation Components: modules Reporter: Alexander Rojas While module developers do not directly interact with {{mesos::modules::ModuleManager}}, it does help them to understand how the underlying mechanism of Modules works. It therefore makes sense to fully document these parts of Mesos with proper Doxygen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3766) Can not kill task in Status STAGING
[ https://issues.apache.org/jira/browse/MESOS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965032#comment-14965032 ] Matthias Veit commented on MESOS-3766: -- [~bernd-mesos] Please assign to the correct person. > Can not kill task in Status STAGING > --- > > Key: MESOS-3766 > URL: https://issues.apache.org/jira/browse/MESOS-3766 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.25.0 > Environment: OSX >Reporter: Matthias Veit > > I have created a simple Marathon Application with instance count 100 (100 > tasks) with a simple sleep command. Before all tasks were running, I killed > all tasks. This operation was successful, except 2 tasks. These 2 tasks are > in state STAGING (according to the mesos UI). Marathon tries to kill those > tasks every 5 seconds (for over an hour now) - unsuccessfully. > I picked one task and grepped the slave log: > {noformat} > I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- with resour > I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80 > I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container > '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr > I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing > executor's forked pid 37096 to > '/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks > I1020 
12:39:41.744439 313335808 slave.cpp:2379] Got registration for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000 > I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame > I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:28.118443 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:13.297114 315482112 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:18.316463 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:23.337116 313872384 slave.cpp:1789] Asked
[jira] [Created] (MESOS-3766) Can not kill task in Status STAGING
Matthias Veit created MESOS-3766: Summary: Can not kill task in Status STAGING Key: MESOS-3766 URL: https://issues.apache.org/jira/browse/MESOS-3766 Project: Mesos Issue Type: Bug Components: general Affects Versions: 0.25.0 Environment: OSX Reporter: Matthias Veit I have created a simple Marathon Application with instance count 100 (100 tasks) with a simple sleep command. Before all tasks were running, I killed all tasks. This operation was successful, except 2 tasks. These 2 tasks are in state STAGING (according to the mesos UI). Marathon tries to kill those tasks every 5 seconds (for over an hour now) - unsuccessfully. I picked one task and grepped the slave log: {noformat} I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- with resour I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80 I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing executor's forked pid 37096 to '/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks I1020 12:39:41.744439 313335808 slave.cpp:2379] Got registration for executor 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000 I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor 
'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:28.118443 313872384 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:41:13.297114 315482112 
slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:41:18.316463 316018688 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:41:23.337116 313872384 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- . . . I1020 14:11:03.614157 316018688 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- {noformat} master log looks like this: {noformat} I1020 12:39:38.044208 351387648 master.hpp:176] Adding task app.dc98434b-7
[jira] [Created] (MESOS-3765) Make offer size adjustable (granularity)
Alexander Rukletsov created MESOS-3765: -- Summary: Make offer size adjustable (granularity) Key: MESOS-3765 URL: https://issues.apache.org/jira/browse/MESOS-3765 Project: Mesos Issue Type: Improvement Components: allocation Reporter: Alexander Rukletsov The built-in allocator performs "coarse-grained" allocation, meaning that it always allocates the entire remaining agent resources to a single framework. This may heavily impact allocation fairness in some cases, for example in the presence of numerous greedy frameworks and a small number of powerful agents. A possible solution would be to allow operators to explicitly specify granularity via allocator flags. While this can be tricky for non-standard resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3756) Generalized HTTP Authentication Modules
[ https://issues.apache.org/jira/browse/MESOS-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964944#comment-14964944 ] Alexander Rojas commented on MESOS-3756: h3. Reviews # [r/38950/|https://reviews.apache.org/r/38950/]: Http Authenticators can be loaded as modules from mesos. # [r/39043/|https://reviews.apache.org/r/39043/]: Added support for HTTP Authentication in Mesos. > Generalized HTTP Authentication Modules > --- > > Key: MESOS-3756 > URL: https://issues.apache.org/jira/browse/MESOS-3756 > Project: Mesos > Issue Type: Task > Components: modules >Reporter: Bernd Mathiske >Assignee: Alexander Rojas > > Libprocess is going to factor out an authentication interface: MESOS-3231 > Here we propose that Mesos can provide implementations for this interface as > Mesos modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3338) Dynamic reservations are not counted as used resources in the master
[ https://issues.apache.org/jira/browse/MESOS-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964941#comment-14964941 ] Alexander Rukletsov commented on MESOS-3338: [~qianzhang], could we start the conversation around new states? Maybe it is something a working group for optimistic offers can take over? > Dynamic reservations are not counted as used resources in the master > > > Key: MESOS-3338 > URL: https://issues.apache.org/jira/browse/MESOS-3338 > Project: Mesos > Issue Type: Bug > Components: allocation, master >Reporter: Alexander Rukletsov >Assignee: Guangya Liu >Priority: Minor > Labels: mesosphere, persistent-volumes > > Dynamically reserved resources should be considered used or allocated and > hence reflected in Mesos bookkeeping structures and {{state.json}}. > I expanded the {{ReservationTest.ReserveThenUnreserve}} test with the > following section:
> {code}
> // Check that the Master counts the reservation as a used resource.
> {
>   Future<Response> response =
>     process::http::get(master.get(), "state.json");
>   AWAIT_READY(response);
>
>   Try<JSON::Object> parse = JSON::parse<JSON::Object>(response.get().body);
>   ASSERT_SOME(parse);
>
>   Result<JSON::Number> cpus =
>     parse.get().find<JSON::Number>("slaves[0].used_resources.cpus");
>   ASSERT_SOME_EQ(JSON::Number(1), cpus);
> }
> {code}
> and got
> {noformat}
> ../../../src/tests/reservation_tests.cpp:168: Failure
> Value of: (cpus).get()
>   Actual: 0
> Expected: JSON::Number(1)
> Which is: 1
> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3338) Dynamic reservations are not counted as used resources in the master
[ https://issues.apache.org/jira/browse/MESOS-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964938#comment-14964938 ] Alexander Rukletsov commented on MESOS-3338: I do, it's quota : ) > Dynamic reservations are not counted as used resources in the master > > > Key: MESOS-3338 > URL: https://issues.apache.org/jira/browse/MESOS-3338 > Project: Mesos > Issue Type: Bug > Components: allocation, master >Reporter: Alexander Rukletsov >Assignee: Guangya Liu >Priority: Minor > Labels: mesosphere, persistent-volumes > > Dynamically reserved resources should be considered used or allocated and > hence reflected in Mesos bookkeeping structures and {{state.json}}. > I expanded the {{ReservationTest.ReserveThenUnreserve}} test with the > following section:
> {code}
> // Check that the Master counts the reservation as a used resource.
> {
>   Future<Response> response =
>     process::http::get(master.get(), "state.json");
>   AWAIT_READY(response);
>
>   Try<JSON::Object> parse = JSON::parse<JSON::Object>(response.get().body);
>   ASSERT_SOME(parse);
>
>   Result<JSON::Number> cpus =
>     parse.get().find<JSON::Number>("slaves[0].used_resources.cpus");
>   ASSERT_SOME_EQ(JSON::Number(1), cpus);
> }
> {code}
> and got
> {noformat}
> ../../../src/tests/reservation_tests.cpp:168: Failure
> Value of: (cpus).get()
>   Actual: 0
> Expected: JSON::Number(1)
> Which is: 1
> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3378) Document a test pattern for expediting event firing
[ https://issues.apache.org/jira/browse/MESOS-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964926#comment-14964926 ] Alexander Rukletsov edited comment on MESOS-3378 at 10/20/15 10:19 AM: --- {noformat} Commit: beb67a21c006f00a8bf86596087dcf70930b3a33 [beb67a2] Author: Alexander Rukletsov Date: 14 Oct 2015 17:58:09 CEST Committer: Bernd Mathiske {noformat} {noformat} Commit: 8e83b9a7c22be7303ded07a6037b31a59d80f5d6 [8e83b9a] Author: Alexander Rukletsov Date: 14 Oct 2015 18:32:38 CEST Committer: Bernd Mathiske {noformat} {noformat} Commit: 1f231bf4807d7dfa74eb155f841cbaf50901b60c [1f231bf] Author: Alexander Rukletsov Date: 14 Oct 2015 18:41:03 CEST Committer: Bernd Mathiske {noformat} was (Author: alexr): {noformat} Commit: beb67a21c006f00a8bf86596087dcf70930b3a33 [beb67a2] Author: Alexander Rukletsov Date: 14 Oct 2015 17:58:09 CEST Committer: Bernd Mathiske Commit: 8e83b9a7c22be7303ded07a6037b31a59d80f5d6 [8e83b9a] Author: Alexander Rukletsov Date: 14 Oct 2015 18:32:38 CEST Committer: Bernd Mathiske Commit: 1f231bf4807d7dfa74eb155f841cbaf50901b60c [1f231bf] Author: Alexander Rukletsov Date: 14 Oct 2015 18:41:03 CEST Committer: Bernd Mathiske {noformat} > Document a test pattern for expediting event firing > --- > > Key: MESOS-3378 > URL: https://issues.apache.org/jira/browse/MESOS-3378 > Project: Mesos > Issue Type: Documentation > Components: documentation, test >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov >Priority: Minor > Labels: mesosphere > > We use {{Clock::advance()}} extensively in tests to expedite event firing and > minimize overall {{make check}} time. Document this pattern for posterity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3378) Document a test pattern for expediting event firing
[ https://issues.apache.org/jira/browse/MESOS-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-3378: --- Target Version/s: 0.26.0 > Document a test pattern for expediting event firing > --- > > Key: MESOS-3378 > URL: https://issues.apache.org/jira/browse/MESOS-3378 > Project: Mesos > Issue Type: Documentation > Components: documentation, test >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov >Priority: Minor > Labels: mesosphere > > We use {{Clock::advance()}} extensively in tests to expedite event firing and > minimize overall {{make check}} time. Document this pattern for posterity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3378) Document a test pattern for expediting event firing
[ https://issues.apache.org/jira/browse/MESOS-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964926#comment-14964926 ] Alexander Rukletsov edited comment on MESOS-3378 at 10/20/15 10:19 AM: --- {noformat} Commit: beb67a21c006f00a8bf86596087dcf70930b3a33 [beb67a2] Author: Alexander Rukletsov Date: 14 Oct 2015 17:58:09 CEST Committer: Bernd Mathiske Commit: 8e83b9a7c22be7303ded07a6037b31a59d80f5d6 [8e83b9a] Author: Alexander Rukletsov Date: 14 Oct 2015 18:32:38 CEST Committer: Bernd Mathiske Commit: 1f231bf4807d7dfa74eb155f841cbaf50901b60c [1f231bf] Author: Alexander Rukletsov Date: 14 Oct 2015 18:41:03 CEST Committer: Bernd Mathiske {noformat} was (Author: alexr): Commit: beb67a21c006f00a8bf86596087dcf70930b3a33 [beb67a2] Author: Alexander Rukletsov Date: 14 Oct 2015 17:58:09 CEST Committer: Bernd Mathiske Commit: 8e83b9a7c22be7303ded07a6037b31a59d80f5d6 [8e83b9a] Author: Alexander Rukletsov Date: 14 Oct 2015 18:32:38 CEST Committer: Bernd Mathiske Commit: 1f231bf4807d7dfa74eb155f841cbaf50901b60c [1f231bf] Author: Alexander Rukletsov Date: 14 Oct 2015 18:41:03 CEST Committer: Bernd Mathiske > Document a test pattern for expediting event firing > --- > > Key: MESOS-3378 > URL: https://issues.apache.org/jira/browse/MESOS-3378 > Project: Mesos > Issue Type: Documentation > Components: documentation, test >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov >Priority: Minor > Labels: mesosphere > > We use {{Clock::advance()}} extensively in tests to expedite event firing and > minimize overall {{make check}} time. Document this pattern for posterity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2935) Fetcher doesn't extract from .tar files
[ https://issues.apache.org/jira/browse/MESOS-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964924#comment-14964924 ] Bernd Mathiske commented on MESOS-2935: --- Hi [~bhuvan], I can shepherd this if that's OK with you. > Fetcher doesn't extract from .tar files > --- > > Key: MESOS-2935 > URL: https://issues.apache.org/jira/browse/MESOS-2935 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Sargun Dhillon >Assignee: Bhuvan Arumugam >Priority: Minor > Labels: newbie > > Compressed artifacts get decompressed with either "unzip -d" or "tar -C $DIR > -xf" > In addition, only the following file suffixes / extensions result in > decompression: > -tgz > -tar.gz > -tbz2 > -tar.bz2 > -tar.xz > -txz > -zip > OR > Alternatively, change fetcher to accept .tar as a valid suffix to trigger the > tarball code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3072) Unify initialization of modularized components
[ https://issues.apache.org/jira/browse/MESOS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964922#comment-14964922 ] Alexander Rojas commented on MESOS-3072: h3. Reviews [r/38627/|https://reviews.apache.org/r/38627/]: Adds an overload of ModuleManager::create() allowing overriding parameters programmatically. > Unify initialization of modularized components > -- > > Key: MESOS-3072 > URL: https://issues.apache.org/jira/browse/MESOS-3072 > Project: Mesos > Issue Type: Improvement > Components: modules >Affects Versions: 0.22.0, 0.22.1, 0.23.0 >Reporter: Alexander Rojas >Assignee: Alexander Rojas > Labels: mesosphere > > h1.Introduction > As it stands right now, default implementations of modularized components are required to have a non-parametrized {{create()}} static method. This makes it possible to write tests which cover default implementations, and modules based on these default implementations, in a uniform way. > For example, with the interface {{Foo}}: > {code} > class Foo { > public: > virtual ~Foo() {} > virtual Future<int> hello() = 0; > protected: > Foo() {} > }; > {code} > With a default implementation: > {code} > class LocalFoo : public Foo { > public: > static Try<Foo*> create() { > return new LocalFoo(); > } > virtual Future<int> hello() { > return 1; > } > }; > {code} > This makes it possible to create typed tests which look as follows: > {code} > typedef ::testing::Types<LocalFoo, tests::Module<Foo>> FooTestTypes; > TYPED_TEST_CASE(FooTest, FooTestTypes); > TYPED_TEST(FooTest, ATest) > { > Try<Foo*> foo = TypeParam::create(); > ASSERT_SOME(foo); > AWAIT_CHECK_EQUAL(foo.get()->hello(), 1); > } > {code} > The test will be applied to each of the types in the template parameters of {{FooTestTypes}}. This makes it possible to test different implementations of an interface. In our code, it tests default implementations and a module which uses the same default implementation. 
> The class {{tests::Module}} needs a little explanation: it is a wrapper around {{ModuleManager}} which allows the tests to encode information about the requested module in the type itself instead of passing a string to the factory method. The wrapper around {{create()}}, the really important method, looks as follows: > {code} > template <typename T, ModuleID N> > static Try<T*> tests::Module<T, N>::create() > { > Try<std::string> moduleName = getModuleName(N); > if (moduleName.isError()) { > return Error(moduleName.error()); > } > return mesos::modules::ModuleManager::create<T>(moduleName.get()); > } > {code} > h1.The Problem > Consider the following implementation of {{Foo}}: > {code} > class ParameterFoo : public Foo { > public: > static Try<Foo*> create(int i) { > return new ParameterFoo(i); > } > ParameterFoo(int i) : i_(i) {} > virtual Future<int> hello() { > return i_; > } > private: > int i_; > }; > {code} > As can be seen, this implementation cannot be used as a default implementation since its create API does not match the one of {{test::Module<>}}: {{create()}} has a different signature for the two types. It is still a common situation to require initialization parameters for objects; however, this constraint (keeping both interfaces alike) forces default implementations of modularized components to have default constructors, therefore the tests are forcing the design of the interfaces. > Implementations which are supposed to be used as modules only, i.e. non-default implementations, are allowed to have constructor parameters, since the actual signature of their factory method is: > {code} > template <typename T> > T* Module<T>::create(const Parameters& params); > {code} > This factory method's function is to decode the parameters and call the appropriate constructor; {{params}} is just an array of key-value string pairs whose interpretation is left to the specific module. 
Sadly, this call is wrapped by {{ModuleManager}}, which only allows module parameters to be passed from the command line and does not offer a programmatic way to feed construction parameters to modules. > h1.The Ugly Workaround > With the requirement of a default constructor and a parameterless {{create()}} factory function, a common pattern (see [Authenticator|https://github.com/apache/mesos/blob/9d4ac11ed757aa5869da440dfe5343a61b07199a/include/mesos/authentication/authenticator.hpp]) has been introduced to feed construction parameters into default implementations. This leads to adding an {{initialize()}} call to the public interface, which will have {{Foo}} become: > {code} > class Foo { > public: > virtual ~Foo() {} > virtual Try<Nothing> initialize(Option<int> i) = 0; > virtual Future<int> hello() = 0; > protected: > Foo() {} > }; > {code} > {{ParameterFoo}} will thus look as follows: > {code} > class ParameterFoo { > public: > Try c
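The programmatic parameter passing that r/38627 introduces can be sketched as follows. The names here are illustrative, not the real {{ModuleManager}} API: the point is only that a key-value {{Parameters}} map is decoded and forwarded to the matching constructor, with no command-line involvement.

```cpp
#include <map>
#include <string>

// Illustrative stand-in for a parameterized module implementation.
struct ParameterFoo {
  explicit ParameterFoo(int i) : i_(i) {}
  int hello() const { return i_; }
  int i_;
};

// Sketch of a create(const Parameters&)-style factory: decode string
// key-value parameters and forward them to the constructor.
inline ParameterFoo* createFoo(
    const std::map<std::string, std::string>& params) {
  auto it = params.find("i");
  const int i = (it == params.end()) ? 0 : std::stoi(it->second);
  return new ParameterFoo(i);
}
```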
[jira] [Updated] (MESOS-2935) Fetcher doesn't extract from .tar files
[ https://issues.apache.org/jira/browse/MESOS-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-2935: -- Issue Type: Improvement (was: Bug) > Fetcher doesn't extract from .tar files > --- > > Key: MESOS-2935 > URL: https://issues.apache.org/jira/browse/MESOS-2935 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Sargun Dhillon >Assignee: Bhuvan Arumugam >Priority: Minor > Labels: newbie > > Compressed artifacts get decompressed with either "unzip -d" or "tar -C $DIR > -xf" > In addition, only the following file suffixes / extensions result in > decompression: > -tgz > -tar.gz > -tbz2 > -tar.bz2 > -tar.xz > -txz > -zip > OR > Alternatively, change fetcher to accept .tar as a valid suffix to trigger the > tarball code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2935) Fetcher doesn't extract from .tar files
[ https://issues.apache.org/jira/browse/MESOS-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-2935: -- Shepherd: Bernd Mathiske > Fetcher doesn't extract from .tar files > --- > > Key: MESOS-2935 > URL: https://issues.apache.org/jira/browse/MESOS-2935 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Sargun Dhillon >Assignee: Bhuvan Arumugam >Priority: Minor > Labels: newbie > > Compressed artifacts get decompressed with either "unzip -d" or "tar -C $DIR > -xf" > In addition, only the following file suffixes / extensions result in > decompression: > -tgz > -tar.gz > -tbz2 > -tar.bz2 > -tar.xz > -txz > -zip > OR > Alternatively, change fetcher to accept .tar as a valid suffix to trigger the > tarball code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3232) Implement HTTP Basic Authentication for Mesos endpoints
[ https://issues.apache.org/jira/browse/MESOS-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964918#comment-14964918 ] Alexander Rojas commented on MESOS-3232: h3. Reviews [r/38094/|https://reviews.apache.org/r/38094/]: Added implementation of Http Basic authentication scheme. > Implement HTTP Basic Authentication for Mesos endpoints > --- > > Key: MESOS-3232 > URL: https://issues.apache.org/jira/browse/MESOS-3232 > Project: Mesos > Issue Type: Improvement > Components: security >Reporter: Alexander Rojas >Assignee: Alexander Rojas > Labels: mesosphere, security > > Using the mechanisms implemented in MESOS-3231, implement HTTP Basic > Authentication as described in the > [RFC-2617|https://www.ietf.org/rfc/rfc2617.txt]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
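For reference, the Basic scheme of RFC 2617 is just {{base64(user ":" password)}} carried in the {{Authorization}} header. A minimal, self-contained sketch of building that header (not the Mesos implementation; the base64 encoder is included only to keep the example runnable):

```cpp
#include <string>

// Minimal base64 encoder, for illustration only.
inline std::string base64Encode(const std::string& in) {
  static const char* tbl =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  int val = 0, bits = -6;
  for (unsigned char c : in) {
    val = (val << 8) + c;
    bits += 8;
    while (bits >= 0) {
      out.push_back(tbl[(val >> bits) & 0x3F]);
      bits -= 6;
    }
  }
  if (bits > -6) out.push_back(tbl[((val << 8) >> (bits + 8)) & 0x3F]);
  while (out.size() % 4) out.push_back('=');
  return out;
}

// RFC 2617: credentials are base64("user:password"), prefixed by "Basic ".
inline std::string basicAuthHeader(const std::string& user,
                                   const std::string& password) {
  return "Basic " + base64Encode(user + ":" + password);
}
```

The test value below is the RFC's own example user "Aladdin" with password "open sesame".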
[jira] [Commented] (MESOS-3233) Allow developers to decide whether a HTTP endpoint should use authentication
[ https://issues.apache.org/jira/browse/MESOS-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964913#comment-14964913 ] Alexander Rojas commented on MESOS-3233: h3. Reviews [r/38000/|https://reviews.apache.org/r/38000/]: Added an API for libprocess users to interact with http::AuthenticatorManager. > Allow developers to decide whether a HTTP endpoint should use authentication > > > Key: MESOS-3233 > URL: https://issues.apache.org/jira/browse/MESOS-3233 > Project: Mesos > Issue Type: Improvement > Components: security >Reporter: Alexander Rojas >Assignee: Alexander Rojas > Labels: mesosphere, security > > Once HTTP Authentication is enabled, developers should be allowed to decide > which endpoints should require authentication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3231) Implement http::AuthenticatorManager and http::Authenticator
[ https://issues.apache.org/jira/browse/MESOS-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964906#comment-14964906 ] Alexander Rojas commented on MESOS-3231: h3. Reviews # [r/37998/|https://reviews.apache.org/r/37998/]: Made ProcessManager::handle() a void returning method. # [r/39472/|https://reviews.apache.org/r/39472/]: Added the helper container InheritanceTree where nodes inherit values from their ancestors. # [r/37999/|https://reviews.apache.org/r/37999/]: Implemented http::AuthenticatorManager. > Implement http::AuthenticatorManager and http::Authenticator > > > Key: MESOS-3231 > URL: https://issues.apache.org/jira/browse/MESOS-3231 > Project: Mesos > Issue Type: Improvement > Components: security >Reporter: Alexander Rojas >Assignee: Alexander Rojas > Labels: mesosphere, security > > As proposed in the document [Mesos HTTP Authentication > Design|https://docs.google.com/document/d/1kM3_f7DSqXcE2MuERrLTGp_XMC6ss2wmpkNYDCY5rOM], > a {{process::http::AuthenticatorManager}} and > {{process::http::Authenticator}} are needed. > The {{process::http::AuthenticatorManager}} takes care of the logic which is > common for all authenticators, while the {{process::http::Authenticator}} > implements specific authentication schemes (for more details, please head to > the design doc). > Tests will be needed too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3764) Port not re-offered after task dies
[ https://issues.apache.org/jira/browse/MESOS-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964889#comment-14964889 ] Serhey Novachenko commented on MESOS-3764: -- Ok, I figured this out. Not a Mesos issue. I've checked the Mesos master logs and indeed the resources were released: {noformat} I1019 20:07:58.113832 10500 master.cpp:5178] Updating the latest state of task task-23-74a4f9a9-8952-4068-909f-22c3a220b62a of framework 20151017-130649-2466976940-5050-10486-0012 to TASK_FAILED I1019 20:07:58.114332 10500 hierarchical.hpp:761] Recovered cpus(*):0.5; mem(*):1024; ports(*):[31150-31150] (total: ports(*):[4000-7000, 31000-32000]; cpus(*):4; mem(*):13599; disk(*):199666, allocated: cpus(*):3.3; mem(*):8576; ports(*):[31250-31250, 31350-31350, 4685-4685, 5940-5940]) on slave 20151017-130649-2466976940-5050-10486-S0 from framework 20151017-130649-2466976940-5050-10486-0012 I1019 20:07:58.165314 10502 master.cpp:5246] Removing task task-23-74a4f9a9-8952-4068-909f-22c3a220b62a with resources cpus(*):0.5; mem(*):1024; ports(*):[31150-31150] of framework 20151017-130649-2466976940-5050-10486-0012 on slave 20151017-130649-2466976940-5050-10486-S0 at slave(1)@18:5051 (ip) {noformat} Then I looked through the offers being received and noticed that this slave was offering exactly one port, with the others apparently occupied: cpus:1.00 mem:4096.00 ports:[6810..6810] It turned out there was another framework that did not decline offers, and thus my framework did not receive the necessary resources. After killing that other framework everything worked fine. 
Sorry, my bad > Port not re-offered after task dies > --- > > Key: MESOS-3764 > URL: https://issues.apache.org/jira/browse/MESOS-3764 > Project: Mesos > Issue Type: Bug > Components: scheduler driver >Affects Versions: 0.23.0 >Reporter: Serhey Novachenko > > I have a Mesos framework configured to accept a specific port for tasks > (31150 in my case) and I have amount of tasks == amount of slaves so > basically I have a task running on each slave on port 31150. > I have Mesos slaves configured to offer 4000..7000,31000..32000 and I was > successfully running all tasks until one of them threw an exception and died. > The framework got the TASK_FAILED status update and I expected the task to be > relaunched on the same machine and port but instead my framework says no > offer has port 31150 in it. Is there a case when Mesos does not re-offer the > port of dead task? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3705) HTTP Pipelining doesn't keep order of requests
[ https://issues.apache.org/jira/browse/MESOS-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964811#comment-14964811 ] Alexander Rojas commented on MESOS-3705: h3. Reviews [r/39276/|https://reviews.apache.org/r/39276/]: Fixed a bug in which under certain circumstances HTTP 1.1 Pipelining is not respected. > HTTP Pipelining doesn't keep order of requests > -- > > Key: MESOS-3705 > URL: https://issues.apache.org/jira/browse/MESOS-3705 > Project: Mesos > Issue Type: Bug > Components: libprocess >Affects Versions: 0.24.0 >Reporter: Alexander Rojas >Assignee: Alexander Rojas > Labels: http, libprocess, mesosphere > > [HTTP 1.1 Pipelining|https://en.wikipedia.org/wiki/HTTP_pipelining] describes a mechanism by which multiple HTTP requests can be performed over a single socket. The requirement here is that responses must be sent in the same order as the requests are made. > Libprocess has some mechanisms built in to deal with pipelining when multiple HTTP requests are made; it is still, however, possible to create a situation in which responses are scrambled with respect to the requests' arrival order. > Consider the situation in which there are two libprocess processes, {{processA}} and {{processB}}, each running in a different thread, {{thread2}} and {{thread3}} respectively. The [{{ProcessManager}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L374] runs in {{thread1}}. > {{processA}} is of type {{ProcessA}}, which looks roughly as follows: > {code} > class ProcessA : public ProcessBase > { > public: > ProcessA() {} > Future<http::Response> foo(const http::Request&) { > // … Do something … > return http::OK(); > } > protected: > virtual void initialize() { > route("/foo", None(), &ProcessA::foo); > } > }; > {code} > {{processB}} is of type {{ProcessB}}, which is just like {{ProcessA}} but routes {{"bar"}} instead of {{"foo"}}. 
> The situation in which the bug arises is the following: > # Two requests, one for {{"http://server_uri/(1)/foo"}} and one for {{"http://server_uri/(2)/bar"}}, are made over the same socket. > # The first request arrives at [{{ProcessManager::handle}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2202], which is still running in {{thread1}}. This one creates an {{HttpEvent}} and delivers it to the handler, in this case {{processA}}. > # [{{ProcessManager::deliver}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2361] enqueues the HTTP event into the {{processA}} queue. This happens in {{thread1}}. > # The second request arrives at [{{ProcessManager::handle}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2202], which is still running in {{thread1}}. Another {{HttpEvent}} is created and delivered to the handler, in this case {{processB}}. > # [{{ProcessManager::deliver}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2361] enqueues the HTTP event into the {{processB}} queue. This happens in {{thread1}}. > # {{Thread2}} is blocked, so {{processA}} cannot handle the first request; it is stuck in the queue. > # {{Thread3}} is idle, so it picks up the request to {{processB}} immediately. > # [{{ProcessBase::visit(HttpEvent)}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L3073] is called in {{thread3}}; this one in turn [dispatches|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L3106] the response's future to the {{HttpProxy}} associated with the socket where the request came from. 
> At the last point the bug is evident: the response for {{processB}} will be sent before the response for {{processA}}, even if {{processB}}'s handler takes a long time and {{ProcessA::foo()}} actually finishes first. The responses are not sent in the order the requests were made. > h1. Reproducer > The following is a test which successfully reproduces the issue: > {code} > class PipelineScramblerProcess : public Process<PipelineScramblerProcess> > { > public: > PipelineScramblerProcess() > : ProcessBase(ID::generate("PipelineScramblerProcess")) {} > void block(const Future<Nothing>& trigger) > { > trigger.await(); > } > Future<http::Response> get(const http::Request& request) > { > if (promise_) { > promise_->set(Nothing()); > } > return http::OK(self().id); > } > void setPromise(std::unique_ptr<Promise<Nothing>>& promise) > { > promise_ = std::move(
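The invariant the fix has to restore can be sketched independently of libprocess. {{ResponsePipeline}} below is a hypothetical illustration, not the real {{HttpProxy}}: responses that complete out of arrival order are buffered until everything ahead of them has been written to the socket.

```cpp
#include <map>
#include <string>
#include <vector>

// Sketch of per-socket pipelining order: requests get tickets in arrival
// order; a completed response is flushed only once every earlier ticket
// has also completed.
class ResponsePipeline {
public:
  int enqueueRequest() { return next_++; }  // ticket in arrival order

  // Record a completed response; return whatever is now flushable, in order.
  std::vector<std::string> complete(int ticket, const std::string& body) {
    done_[ticket] = body;
    std::vector<std::string> flushed;
    while (done_.count(head_)) {
      flushed.push_back(done_[head_]);
      done_.erase(head_++);
    }
    return flushed;
  }

private:
  int next_ = 0;
  int head_ = 0;
  std::map<int, std::string> done_;  // completed but not yet flushed
};
```

With this discipline, the fast {{processB}} response from the scenario above would be held back until the slow {{processA}} response is ready.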
[jira] [Commented] (MESOS-3764) Port not re-offered after task dies
[ https://issues.apache.org/jira/browse/MESOS-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964803#comment-14964803 ] Guangya Liu commented on MESOS-3764: If the task failed, then the Mesos master will call recoverResources to return the resources, which can then be re-offered. [~serejja], can you please check the Mesos master log to see whether the resources are returned after the task failed? > Port not re-offered after task dies > --- > > Key: MESOS-3764 > URL: https://issues.apache.org/jira/browse/MESOS-3764 > Project: Mesos > Issue Type: Bug > Components: scheduler driver >Affects Versions: 0.23.0 >Reporter: Serhey Novachenko > > I have a Mesos framework configured to accept a specific port for tasks > (31150 in my case) and I have amount of tasks == amount of slaves so > basically I have a task running on each slave on port 31150. > I have Mesos slaves configured to offer 4000..7000,31000..32000 and I was > successfully running all tasks until one of them threw an exception and died. > The framework got the TASK_FAILED status update and I expected the task to be > relaunched on the same machine and port but instead my framework says no > offer has port 31150 in it. Is there a case when Mesos does not re-offer the > port of dead task? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3764) Port not re-offered after task dies
Serhey Novachenko created MESOS-3764: Summary: Port not re-offered after task dies Key: MESOS-3764 URL: https://issues.apache.org/jira/browse/MESOS-3764 Project: Mesos Issue Type: Bug Components: scheduler driver Affects Versions: 0.23.0 Reporter: Serhey Novachenko I have a Mesos framework configured to accept a specific port for tasks (31150 in my case) and I have amount of tasks == amount of slaves so basically I have a task running on each slave on port 31150. I have Mesos slaves configured to offer 4000..7000,31000..32000 and I was successfully running all tasks until one of them threw an exception and died. The framework got the TASK_FAILED status update and I expected the task to be relaunched on the same machine and port but instead my framework says no offer has port 31150 in it. Is there a case when Mesos does not re-offer the port of dead task? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1971) Switch cgroups_limit_swap default to true
[ https://issues.apache.org/jira/browse/MESOS-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964693#comment-14964693 ] Anton Lindström commented on MESOS-1971: [~adam-mesos] Thanks! Sounds good! > Switch cgroups_limit_swap default to true > - > > Key: MESOS-1971 > URL: https://issues.apache.org/jira/browse/MESOS-1971 > Project: Mesos > Issue Type: Improvement >Reporter: Anton Lindström >Priority: Minor > > Switch cgroups_limit_swap to true per default, see MESOS-1662 for more > information. > Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)