[jira] [Assigned] (MESOS-3747) HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string

2015-10-15 Thread Liqiang Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liqiang Lin reassigned MESOS-3747:
--

Assignee: Liqiang Lin

> HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
> -
>
> Key: MESOS-3747
> URL: https://issues.apache.org/jira/browse/MESOS-3747
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Ben Whitehead
>Assignee: Liqiang Lin
>Priority: Blocker
>
> When using libmesos, a framework can set its user to {{""}} (empty string) to 
> inherit the user the agent process is running as; this behavior now results 
> in a {{TASK_FAILED}}.
> Full messages and relevant agent logs are below.
> The error returned to the framework tells me nothing about the user not 
> existing on the agent host; instead it tells me the container died due to OOM.
> {code:title=FrameworkInfo}
> call {
> type: SUBSCRIBE
> subscribe: {
> frameworkInfo: {
> user: "",
> name: "testing"
> }
> }
> }
> {code}
> {code:title=TaskInfo}
> call {
> framework_id { value: "20151015-125949-16777343-5050-20146-" },
> type: ACCEPT,
> accept { 
> offer_ids: [{ value: "20151015-125949-16777343-5050-20146-O0" }],
> operations { 
> type: LAUNCH, 
> launch { 
> task_infos [
> {
> name: "task-1",
> task_id: { value: "task-1" },
> agent_id: { value: 
> "20151015-125949-16777343-5050-20146-S0" },
> resources [
> { name: "cpus", type: SCALAR, scalar: { value: 
> 0.1 },  role: "*" },
> { name: "mem",  type: SCALAR, scalar: { value: 
> 64.0 }, role: "*" },
> { name: "disk", type: SCALAR, scalar: { value: 
> 0.0 },  role: "*" },
> ],
> command: { 
> environment { 
> variables [ 
> { name: "SLEEP_SECONDS" value: "15" } 
> ] 
> },
> value: "env | sort && sleep $SLEEP_SECONDS"
> }
>     }
> ]
>  }
>  }
>  }
> }
> {code}
> {code:title=Update Status}
> event: {
> type: UPDATE,
> update: { 
> status: { 
> task_id: { value: "task-1" }, 
> state: TASK_FAILED,
> message: "Container destroyed while preparing isolators",
> agent_id: { value: "20151015-125949-16777343-5050-20146-S0" }, 
> timestamp: 1.444939217401241E9,
> executor_id: { value: "task-1" },
> source: SOURCE_AGENT, 
> reason: REASON_MEMORY_LIMIT,
> uuid: "\237g()L\026EQ\222\301\261\265\\\221\224|" 
> } 
> }
> }
> {code}
> {code:title=agent logs}
> I1015 13:15:34.260592 19639 slave.cpp:1270] Got assigned task task-1 for 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-
> I1015 13:15:34.260921 19639 slave.cpp:1386] Launching task task-1 for 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-
> W1015 13:15:34.262243 19639 paths.cpp:423] Failed to chown executor directory 
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b':
>  Failed to get user information for '': Success
> I1015 13:15:34.262444 19639 slave.cpp:4852] Launching executor task-1 of 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6- with resources 
> cpus(*):0.1; mem(*):32 in work directory 
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b'
> I1015 13:15:34.262581 19639 slave.cpp:1604] Queuing task 'task-1' for 
> executor task-1 of framework 
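
The agent warning above, "Failed to get user information for '': Success", is 
itself misleading. A plausible explanation, assuming the lookup bottoms out in 
getpwnam(): that function reports an unknown user by returning NULL without 
setting errno, so formatting strerror(errno) afterwards prints "Success". A 
minimal standalone sketch:

{code:title=Why the warning ends in "Success" (illustrative)}
// Hedged sketch of the assumed cause, not the actual stout/os code:
// getpwnam() signals "no such user" by returning NULL *without* touching
// errno, so strerror(errno) formats errno == 0 as "Success".
#include <pwd.h>

#include <cerrno>
#include <cstdio>
#include <cstring>

int main()
{
  errno = 0;
  struct passwd* passwd = getpwnam("");  // Look up the empty user name.

  if (passwd == NULL && errno == 0) {
    // Prints: Failed to get user information for '': Success
    printf("Failed to get user information for '': %s\n", strerror(errno));
  }

  return 0;
}
{code}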

[jira] [Commented] (MESOS-3747) HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string

2015-10-15 Thread Liqiang Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960213#comment-14960213
 ] 

Liqiang Lin commented on MESOS-3747:


Both mesos::v1::scheduler::MesosProcess::_send(...) and 
Master::Http::scheduler(...) call the same validation, 
validation::scheduler::call::validate(call). If we add a non-empty check for 
FrameworkInfo.user there, the master should never need to return a 400 
BadRequest, since the scheduler's SUBSCRIBE request would not be sent out to 
the master in the first place. 

> HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
> -
>
> Key: MESOS-3747
> URL: https://issues.apache.org/jira/browse/MESOS-3747
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Ben Whitehead
>Priority: Blocker
>
> When using libmesos, a framework can set its user to {{""}} (empty string) to 
> inherit the user the agent process is running as; this behavior now results 
> in a {{TASK_FAILED}}.
> Full messages and relevant agent logs are below.
> The error returned to the framework tells me nothing about the user not 
> existing on the agent host; instead it tells me the container died due to OOM.
> {code:title=FrameworkInfo}
> call {
> type: SUBSCRIBE
> subscribe: {
> frameworkInfo: {
> user: "",
> name: "testing"
> }
> }
> }
> {code}
> {code:title=TaskInfo}
> call {
> framework_id { value: "20151015-125949-16777343-5050-20146-" },
> type: ACCEPT,
> accept { 
> offer_ids: [{ value: "20151015-125949-16777343-5050-20146-O0" }],
> operations { 
> type: LAUNCH, 
> launch { 
> task_infos [
> {
> name: "task-1",
> task_id: { value: "task-1" },
> agent_id: { value: 
> "20151015-125949-16777343-5050-20146-S0" },
> resources [
> { name: "cpus", type: SCALAR, scalar: { value: 
> 0.1 },  role: "*" },
> { name: "mem",  type: SCALAR, scalar: { value: 
> 64.0 }, role: "*" },
> { name: "disk", type: SCALAR, scalar: { value: 
> 0.0 },  role: "*" },
> ],
> command: { 
> environment { 
> variables [ 
> { name: "SLEEP_SECONDS" value: "15" } 
> ] 
> },
> value: "env | sort && sleep $SLEEP_SECONDS"
> }
> }
> ]
>  }
>  }
>  }
> }
> {code}
> {code:title=Update Status}
> event: {
> type: UPDATE,
> update: { 
> status: { 
> task_id: { value: "task-1" }, 
> state: TASK_FAILED,
> message: "Container destroyed while preparing isolators",
> agent_id: { value: "20151015-125949-16777343-5050-20146-S0" }, 
> timestamp: 1.444939217401241E9,
> executor_id: { value: "task-1" },
> source: SOURCE_AGENT, 
> reason: REASON_MEMORY_LIMIT,
> uuid: "\237g()L\026EQ\222\301\261\265\\\221\224|" 
> } 
> }
> }
> {code}
> {code:title=agent logs}
> I1015 13:15:34.260592 19639 slave.cpp:1270] Got assigned task task-1 for 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-
> I1015 13:15:34.260921 19639 slave.cpp:1386] Launching task task-1 for 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-
> W1015 13:15:34.262243 19639 paths.cpp:423] Failed to chown executor directory 
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b':
>  Failed to get user information for '': Success
> I1015 13:15:34.262444 19639 slave.cpp:4852] Launching executor task-1 of 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6- with resources 
> cpus(*):0.1; mem(*):32 in work directory 
>

[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container

2015-10-15 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960202#comment-14960202
 ] 

haosdent commented on MESOS-3738:
-

Thank you, let me contact [~tnachen] to describe this issue.

> Mesos health check is invoked incorrectly when Mesos slave is within the 
> docker container
> -
>
> Key: MESOS-3738
> URL: https://issues.apache.org/jira/browse/MESOS-3738
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0
> Environment: Docker 1.8.0:
> Client:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Server:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Host: Ubuntu 14.04
> Container: Debian 8.1 + Java-7
>Reporter: Yong Tang
>Assignee: haosdent
> Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, 
> MESOS-3738-0_25_0.patch
>
>
> When the Mesos slave runs within a docker container, the COMMAND health 
> check from Marathon is invoked incorrectly.
> In such a scenario, the sandbox directory (instead of the 
> launcher/health-check directory) is used. This results in an error with the 
> container.
> Command to invoke the Mesos slave container:
> sudo docker run -d -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro -v 
> /usr/lib/x86_64-linux-gnu/libapparmor.so.1:/usr/lib/x86_64-linux-gnu/libapparmor.so.1:ro
>  -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos mesos 
> mesos slave --master=zk://10.2.1.2:2181/mesos --containerizers=docker,mesos 
> --executor_registration_timeout=5mins --docker_stop_timeout=10secs 
> --launcher=posix
> Marathon JSON file:
> {
>   "id": "ubuntu",
>   "container":
>   {
> "type": "DOCKER",
> "docker":
> {
>   "image": "ubuntu",
>   "network": "BRIDGE",
>   "parameters": []
> }
>   },
>   "args": [ "bash", "-c", "while true; do echo 1; sleep 5; done" ],
>   "uris": [],
>   "healthChecks":
>   [
> {
>   "protocol": "COMMAND",
>   "command": { "value": "echo Success" },
>   "gracePeriodSeconds": 3000,
>   "intervalSeconds": 5,
>   "timeoutSeconds": 5,
>   "maxConsecutiveFailures": 300
> }
>   ],
>   "instances": 1
> }
> STDOUT:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stdout 
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> Registered docker executor on b01e2e75afcb
> Starting task ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> 1
> Launching health check process: 
> /tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f/mesos-health-check
>  --executor=(1)@10.2.1.7:40695 
> --health_check_json={"command":{"shell":true,"value":"docker exec 
> mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f
>  sh -c \" echo Success 
> \""},"consecutive_failures":300,"delay_seconds":0.0,"grace_period_seconds":3000.0,"interval_seconds":5.0,"timeout_seconds":5.0}
>  --task_id=ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> Health check process launched at pid: 94
> 1
> 1
> 1
> 1
> 1
> STDERR:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stderr
> I1014 23:15:58.127950 56 exec.cpp:134] Version: 0.25.0
> I1014 23:15:58.130627 62 exec.cpp:208] Executor registered on slave 
> e20f8959-cd9f-40ae-987d-809401309361-S0
> WARNING: Your kernel does not support swap limit capabiliti

[jira] [Updated] (MESOS-1582) Improve build time.

2015-10-15 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-1582:

Attachment: (was: MESOS-3738-0_23_1.patch)

> Improve build time.
> ---
>
> Key: MESOS-1582
> URL: https://issues.apache.org/jira/browse/MESOS-1582
> Project: Mesos
>  Issue Type: Epic
>  Components: build
>Reporter: Benjamin Hindman
>
> The build takes a ridiculously long time unless you have a large, parallel 
> machine. This is a combination of many factors, all of which we'd like to 
> discuss and track here.
> I'd also love to actually track build times so we can get an appreciation of 
> the improvements. Please leave a comment below with your build times!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1582) Improve build time.

2015-10-15 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-1582:

Attachment: (was: MESOS-3738-0_25_0.patch)

> Improve build time.
> ---
>
> Key: MESOS-1582
> URL: https://issues.apache.org/jira/browse/MESOS-1582
> Project: Mesos
>  Issue Type: Epic
>  Components: build
>Reporter: Benjamin Hindman
>
> The build takes a ridiculously long time unless you have a large, parallel 
> machine. This is a combination of many factors, all of which we'd like to 
> discuss and track here.
> I'd also love to actually track build times so we can get an appreciation of 
> the improvements. Please leave a comment below with your build times!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1582) Improve build time.

2015-10-15 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-1582:

Attachment: (was: MESOS-3738-0_24_1.patch)

> Improve build time.
> ---
>
> Key: MESOS-1582
> URL: https://issues.apache.org/jira/browse/MESOS-1582
> Project: Mesos
>  Issue Type: Epic
>  Components: build
>Reporter: Benjamin Hindman
>
> The build takes a ridiculously long time unless you have a large, parallel 
> machine. This is a combination of many factors, all of which we'd like to 
> discuss and track here.
> I'd also love to actually track build times so we can get an appreciation of 
> the improvements. Please leave a comment below with your build times!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container

2015-10-15 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-3738:

Attachment: MESOS-3738-0_25_0.patch
MESOS-3738-0_24_1.patch
MESOS-3738-0_23_1.patch

> Mesos health check is invoked incorrectly when Mesos slave is within the 
> docker container
> -
>
> Key: MESOS-3738
> URL: https://issues.apache.org/jira/browse/MESOS-3738
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0
> Environment: Docker 1.8.0:
> Client:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Server:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Host: Ubuntu 14.04
> Container: Debian 8.1 + Java-7
>Reporter: Yong Tang
>Assignee: haosdent
> Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, 
> MESOS-3738-0_25_0.patch
>
>
> When the Mesos slave runs within a docker container, the COMMAND health 
> check from Marathon is invoked incorrectly.
> In such a scenario, the sandbox directory (instead of the 
> launcher/health-check directory) is used. This results in an error with the 
> container.
> Command to invoke the Mesos slave container:
> sudo docker run -d -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro -v 
> /usr/lib/x86_64-linux-gnu/libapparmor.so.1:/usr/lib/x86_64-linux-gnu/libapparmor.so.1:ro
>  -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos mesos 
> mesos slave --master=zk://10.2.1.2:2181/mesos --containerizers=docker,mesos 
> --executor_registration_timeout=5mins --docker_stop_timeout=10secs 
> --launcher=posix
> Marathon JSON file:
> {
>   "id": "ubuntu",
>   "container":
>   {
> "type": "DOCKER",
> "docker":
> {
>   "image": "ubuntu",
>   "network": "BRIDGE",
>   "parameters": []
> }
>   },
>   "args": [ "bash", "-c", "while true; do echo 1; sleep 5; done" ],
>   "uris": [],
>   "healthChecks":
>   [
> {
>   "protocol": "COMMAND",
>   "command": { "value": "echo Success" },
>   "gracePeriodSeconds": 3000,
>   "intervalSeconds": 5,
>   "timeoutSeconds": 5,
>   "maxConsecutiveFailures": 300
> }
>   ],
>   "instances": 1
> }
> STDOUT:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stdout 
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> Registered docker executor on b01e2e75afcb
> Starting task ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> 1
> Launching health check process: 
> /tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f/mesos-health-check
>  --executor=(1)@10.2.1.7:40695 
> --health_check_json={"command":{"shell":true,"value":"docker exec 
> mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f
>  sh -c \" echo Success 
> \""},"consecutive_failures":300,"delay_seconds":0.0,"grace_period_seconds":3000.0,"interval_seconds":5.0,"timeout_seconds":5.0}
>  --task_id=ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> Health check process launched at pid: 94
> 1
> 1
> 1
> 1
> 1
> STDERR:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stderr
> I1014 23:15:58.127950 56 exec.cpp:134] Version: 0.25.0
> I1014 23:15:58.130627 62 exec.cpp:208] Executor registered on slave 
> e20f8959-cd9f-40ae-987d-809401309361-S0
> WARNING: Your kernel does not support swap limit capabiliti

[jira] [Updated] (MESOS-1582) Improve build time.

2015-10-15 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-1582:

Attachment: MESOS-3738-0_25_0.patch
MESOS-3738-0_24_1.patch
MESOS-3738-0_23_1.patch

> Improve build time.
> ---
>
> Key: MESOS-1582
> URL: https://issues.apache.org/jira/browse/MESOS-1582
> Project: Mesos
>  Issue Type: Epic
>  Components: build
>Reporter: Benjamin Hindman
> Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, 
> MESOS-3738-0_25_0.patch
>
>
> The build takes a ridiculously long time unless you have a large, parallel 
> machine. This is a combination of many factors, all of which we'd like to 
> discuss and track here.
> I'd also love to actually track build times so we can get an appreciation of 
> the improvements. Please leave a comment below with your build times!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container

2015-10-15 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960193#comment-14960193
 ] 

Anand Mazumdar commented on MESOS-3738:
---

[~haosd...@gmail.com] Can you find a shepherd for this issue? If you already 
have one, can you update the JIRA? Thanks.

> Mesos health check is invoked incorrectly when Mesos slave is within the 
> docker container
> -
>
> Key: MESOS-3738
> URL: https://issues.apache.org/jira/browse/MESOS-3738
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0
> Environment: Docker 1.8.0:
> Client:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Server:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Host: Ubuntu 14.04
> Container: Debian 8.1 + Java-7
>Reporter: Yong Tang
>Assignee: haosdent
>
> When the Mesos slave runs within a docker container, the COMMAND health 
> check from Marathon is invoked incorrectly.
> In such a scenario, the sandbox directory (instead of the 
> launcher/health-check directory) is used. This results in an error with the 
> container.
> Command to invoke the Mesos slave container:
> sudo docker run -d -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro -v 
> /usr/lib/x86_64-linux-gnu/libapparmor.so.1:/usr/lib/x86_64-linux-gnu/libapparmor.so.1:ro
>  -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos mesos 
> mesos slave --master=zk://10.2.1.2:2181/mesos --containerizers=docker,mesos 
> --executor_registration_timeout=5mins --docker_stop_timeout=10secs 
> --launcher=posix
> Marathon JSON file:
> {
>   "id": "ubuntu",
>   "container":
>   {
> "type": "DOCKER",
> "docker":
> {
>   "image": "ubuntu",
>   "network": "BRIDGE",
>   "parameters": []
> }
>   },
>   "args": [ "bash", "-c", "while true; do echo 1; sleep 5; done" ],
>   "uris": [],
>   "healthChecks":
>   [
> {
>   "protocol": "COMMAND",
>   "command": { "value": "echo Success" },
>   "gracePeriodSeconds": 3000,
>   "intervalSeconds": 5,
>   "timeoutSeconds": 5,
>   "maxConsecutiveFailures": 300
> }
>   ],
>   "instances": 1
> }
> STDOUT:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stdout 
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> Registered docker executor on b01e2e75afcb
> Starting task ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> 1
> Launching health check process: 
> /tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f/mesos-health-check
>  --executor=(1)@10.2.1.7:40695 
> --health_check_json={"command":{"shell":true,"value":"docker exec 
> mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f
>  sh -c \" echo Success 
> \""},"consecutive_failures":300,"delay_seconds":0.0,"grace_period_seconds":3000.0,"interval_seconds":5.0,"timeout_seconds":5.0}
>  --task_id=ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> Health check process launched at pid: 94
> 1
> 1
> 1
> 1
> 1
> STDERR:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stderr
> I1014 23:15:58.127950 56 exec.cpp:134] Version: 0.25.0
> I1014 23:15:58.130627 62 exec.cpp:208] Executor registered on slave 
> e20f8959-cd9f-40ae-987d-809401309361-S0
> WARNING: Your kernel does not support swap limit capabilities, memory limited 
> without s

[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container

2015-10-15 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960190#comment-14960190
 ] 

haosdent commented on MESOS-3738:
-

Thanks for the report. Because we launch the docker executor through 
subprocess() with argv[0] set to just "mesos-docker-executor", the directory 
we derive from argv[0] ends up being the sandbox dir. By contrast, we launch 
"mesos-executor" through its complete path, so the directory derived for it 
is the launcher_dir. 
If we build from source, the "mesos-docker-executor" test can still pass, 
because we wrap "mesos-docker-executor" in an automake script, so argv[0] 
becomes a correct path.

Patch: https://reviews.apache.org/r/39386/
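
A standalone illustration of the argv[0] difference described above (the 
executable names and the full path are placeholders; the real fix is in the 
review request):

{code:title=argv[0] vs. launcher directory (illustrative)}
// Hedged sketch: launched with a bare argv[0] such as
// "mesos-docker-executor", deriving a directory from argv[0] yields ".",
// which resolves to the current working directory, i.e. the sandbox.
// Launched via a complete path, the same derivation yields the launcher
// directory. The full path below is a placeholder.
#include <libgen.h>  // POSIX dirname(), which may modify its argument.

#include <cstdio>

int main()
{
  char bare[] = "mesos-docker-executor";
  printf("%s\n", dirname(bare));  // Prints "." -> the sandbox dir.

  char full[] = "/usr/local/libexec/mesos/mesos-executor";
  printf("%s\n", dirname(full));  // Prints "/usr/local/libexec/mesos".

  return 0;
}
{code}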

> Mesos health check is invoked incorrectly when Mesos slave is within the 
> docker container
> -
>
> Key: MESOS-3738
> URL: https://issues.apache.org/jira/browse/MESOS-3738
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0
> Environment: Docker 1.8.0:
> Client:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Server:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Host: Ubuntu 14.04
> Container: Debian 8.1 + Java-7
>Reporter: Yong Tang
>Assignee: haosdent
>
> When the Mesos slave runs within a docker container, the COMMAND health 
> check from Marathon is invoked incorrectly.
> In such a scenario, the sandbox directory (instead of the 
> launcher/health-check directory) is used. This results in an error with the 
> container.
> Command to invoke the Mesos slave container:
> sudo docker run -d -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro -v 
> /usr/lib/x86_64-linux-gnu/libapparmor.so.1:/usr/lib/x86_64-linux-gnu/libapparmor.so.1:ro
>  -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos mesos 
> mesos slave --master=zk://10.2.1.2:2181/mesos --containerizers=docker,mesos 
> --executor_registration_timeout=5mins --docker_stop_timeout=10secs 
> --launcher=posix
> Marathon JSON file:
> {
>   "id": "ubuntu",
>   "container":
>   {
> "type": "DOCKER",
> "docker":
> {
>   "image": "ubuntu",
>   "network": "BRIDGE",
>   "parameters": []
> }
>   },
>   "args": [ "bash", "-c", "while true; do echo 1; sleep 5; done" ],
>   "uris": [],
>   "healthChecks":
>   [
> {
>   "protocol": "COMMAND",
>   "command": { "value": "echo Success" },
>   "gracePeriodSeconds": 3000,
>   "intervalSeconds": 5,
>   "timeoutSeconds": 5,
>   "maxConsecutiveFailures": 300
> }
>   ],
>   "instances": 1
> }
> STDOUT:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stdout 
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> Registered docker executor on b01e2e75afcb
> Starting task ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> 1
> Launching health check process: 
> /tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f/mesos-health-check
>  --executor=(1)@10.2.1.7:40695 
> --health_check_json={"command":{"shell":true,"value":"docker exec 
> mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f
>  sh -c \" echo Success 
> \""},"consecutive_failures":300,"delay_seconds":0.0,"grace_period_seconds":3000.0,"interval_seconds":5.0,"timeout_seconds":5.0}
>  --task_id=ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> 

[jira] [Commented] (MESOS-3524) HTTP API v1 Protobuf Jar in maven central

2015-10-15 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959973#comment-14959973
 ] 

Anand Mazumdar commented on MESOS-3524:
---

For the time being, until we find a formal solution, we have pushed the 
v1 protobufs into the existing Mesos JAR as part of MESOS-3575.

> HTTP API v1 Protobuf Jar in maven central
> -
>
> Key: MESOS-3524
> URL: https://issues.apache.org/jira/browse/MESOS-3524
> Project: Mesos
>  Issue Type: Story
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.25.0
>Reporter: Ben Whitehead
>Priority: Critical
>  Labels: http, jar, java, maven, mesosphere, protobuf
>
> As a developer working on the JVM, I would like Mesos to provide a JAR 
> containing the new protobuf classes for the HTTP API.
> Please create a new jar that contains the protobuf message file and the Java 
> source and class files generated from it. This jar should be published to 
> Maven Central.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3748) HTTP scheduler library does not gracefully parse invalid resource identifiers

2015-10-15 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3748:
--
  Sprint: Mesosphere Sprint 21
Story Points: 1

> HTTP scheduler library does not gracefully parse invalid resource identifiers
> -
>
> Key: MESOS-3748
> URL: https://issues.apache.org/jira/browse/MESOS-3748
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API
>Affects Versions: 0.25.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> If you pass a nonsense string for "master" into a framework using the HTTP 
> scheduler library, the framework segfaults.
> For example, using the example frameworks:
> {code:title=Scheduler Driver}
> build/src/test-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in:
> {code}
> Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to 
> parse 'asdf://127.0.0.1:5050'
> {code}
> {code:title=HTTP Scheduler Library}
> export DEFAULT_PRINCIPAL=root
> build/src/event-call-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in
> {code}
> I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
> Segmentation fault: 11
> {code}
> {code:title=Stack Trace}
> * thread #2: tid = 0x28b6bb, 0x000100ad03ca 
> libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0)
>  + 42 at scheduler.cpp:213, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
>   * frame #0: 0x000100ad03ca 
> libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0)
>  + 42 at scheduler.cpp:213
> frame #1: 0x000100ad05f2 libmesos-0.26.0.dylib`virtual thunk to 
> mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0) + 34 
> at scheduler.cpp:210
> frame #2: 0x0001022b60f3 libmesos-0.26.0.dylib`::resume() + 931 at 
> process.cpp:2449
> frame #3: 0x0001022c131c libmesos-0.26.0.dylib`::operator()() + 268 
> at process.cpp:2174
> frame #4: 0x0001022c0fa2 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() [inlined] 
> __invoke<(lambda at ../../../3rdparty/libprocess/src/process.cpp:2158:35) &, 
> const std::__1::atomic &> + 27 at __functional_base:415
> frame #5: 0x0001022c0f87 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() [inlined] 
> __apply_functor<(lambda at 
> ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::tuple > >, 
> 0, std::__1::tuple<> > + 55 at functional:2060
> frame #6: 0x0001022c0f50 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() [inlined] 
> operator()<> + 41 at functional:2123
> frame #7: 0x0001022c0f27 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() [inlined] 
> __invoke ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > >> + 14 at 
> __functional_base:415
> frame #8: 0x0001022c0f19 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() [inlined] 
> __thread_execute ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > >> + 25 at 
> thread:337
> frame #9: 0x0001022c0f00 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() + 368 at 
> thread:347
> frame #10: 0x7fff964c705a libsystem_pthread.dylib`_pthread_body + 131
> frame #11: 0x7fff964c6fd7 libsystem_pthread.dylib`_pthread_start + 176
> frame #12: 0x7fff964c43ed libsystem_pthread.dylib`thread_start + 13
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3748) HTTP scheduler library does not gracefully parse invalid resource identifiers

2015-10-15 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3748:
--
Description: 
If you pass a nonsense string for "master" into a framework using the C++ HTTP 
scheduler library, the framework segfaults.

For example, using the example frameworks:

{code:title=Scheduler Driver}
build/src/test-framework --master="asdf://127.0.0.1:5050"
{code}
Results in:
{code}
Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to parse 
'asdf://127.0.0.1:5050'
{code}

{code:title=HTTP Scheduler Library}
export DEFAULT_PRINCIPAL=root
build/src/event-call-framework --master="asdf://127.0.0.1:5050"
{code}
Results in
{code}
I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
Segmentation fault: 11
{code}

{code:title=Stack Trace}
* thread #2: tid = 0x28b6bb, 0x000100ad03ca 
libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0)
 + 42 at scheduler.cpp:213, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000100ad03ca 
libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0)
 + 42 at scheduler.cpp:213
frame #1: 0x000100ad05f2 libmesos-0.26.0.dylib`virtual thunk to 
mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0) + 34 at 
scheduler.cpp:210
frame #2: 0x0001022b60f3 libmesos-0.26.0.dylib`::resume() + 931 at 
process.cpp:2449
frame #3: 0x0001022c131c libmesos-0.26.0.dylib`::operator()() + 268 at 
process.cpp:2174
frame #4: 0x0001022c0fa2 
libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] 
__invoke<(lambda at ../../../3rdparty/libprocess/src/process.cpp:2158:35) &, 
const std::__1::atomic &> + 27 at __functional_base:415
frame #5: 0x0001022c0f87 
libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] 
__apply_functor<(lambda at 
../../../3rdparty/libprocess/src/process.cpp:2158:35), 
std::__1::tuple > >, 
0, std::__1::tuple<> > + 55 at functional:2060
frame #6: 0x0001022c0f50 
libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] 
operator()<> + 41 at functional:2123
frame #7: 0x0001022c0f27 
libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] 
__invoke > >> + 14 at 
__functional_base:415
frame #8: 0x0001022c0f19 
libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] 
__thread_execute > >> + 25 at thread:337
frame #9: 0x0001022c0f00 
libmesos-0.26.0.dylib`::__thread_proxy > > > >() + 368 at 
thread:347
frame #10: 0x7fff964c705a libsystem_pthread.dylib`_pthread_body + 131
frame #11: 0x7fff964c6fd7 libsystem_pthread.dylib`_pthread_start + 176
frame #12: 0x7fff964c43ed libsystem_pthread.dylib`thread_start + 13
{code}

  was:
If you pass a nonsense string for "master" into a framework using the HTTP 
scheduler library, the framework segfaults.

For example, using the example frameworks:

{code:title=Scheduler Driver}
build/src/test-framework --master="asdf://127.0.0.1:5050"
{code}
Results in:
{code}
Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to parse 
'asdf://127.0.0.1:5050'
{code}

{code:title=HTTP Scheduler Library}
export DEFAULT_PRINCIPAL=root
build/src/event-call-framework --master="asdf://127.0.0.1:5050"
{code}
Results in
{code}
I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
Segmentation fault: 11
{code}

{code:title=Stack Trace}
* thread #2: tid = 0x28b6bb, 0x000100ad03ca 
libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0)
 + 42 at scheduler.cpp:213, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000100ad03ca 
libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0)
 + 42 at scheduler.cpp:213
frame #1: 0x000100ad05f2 libmesos-0.26.0.dylib`virtual thunk to 
mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0) + 34 at 
scheduler.cpp:210
frame #2: 0x0001022b60f3 libmesos-0.26.0.dylib`::resume() + 931 at 
process.cpp:2449
frame #3: 0x0001022c131c libmesos-0.26.0.dylib`::operator()() + 268 at 
process.cpp:2174
frame #4: 0x0001022c0fa2 
libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] 
__invoke<(lambda at ../../../3rdparty/libprocess/src/process.cpp:2158:35) &, 
const std::__1::atomic &> + 27 at __functional_base:415
frame #5: 0x0001022c0f87 
libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] 
__apply_functor<(lambda at 
../../../3rdparty/libprocess/src/process.cpp:2158:35), 
std::__1::tuple > >, 
0, std::__1::tuple<> > + 55 at functional:2060
frame #6: 0x0001022c0f50 
libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] 
operator()<> + 41 at functional:2123
frame #7: 0x0001022c0f27 
libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] 
__invoke > >> + 14 at 
__fu

[jira] [Commented] (MESOS-3748) HTTP scheduler library does not gracefully parse invalid resource identifiers

2015-10-15 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959889#comment-14959889
 ] 

Joseph Wu commented on MESOS-3748:
--

Turns out to be a rather trivial issue: https://reviews.apache.org/r/39365/
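
For context, the crash pattern above points at an unchecked detector-creation 
result; a hedged fragment of the kind of guard involved (the surrounding 
function, headers, and exact names are assumptions; see the review request 
for the actual change):

{code:title=Guarding detector creation (illustrative fragment)}
// Hedged fragment, not the actual fix: MasterDetector::create() returns a
// Try<> carrying an Error for an unparsable master string such as
// "asdf://127.0.0.1:5050". Checking it up front surfaces the parse error,
// as the driver-based test-framework does, instead of crashing later when
// the scheduler library's initialize() uses the detector.
Try<MasterDetector*> detector = MasterDetector::create(master);

if (detector.isError()) {
  EXIT(EXIT_FAILURE) << "Failed to create a master detector for '"
                     << master << "': " << detector.error();
}
{code}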

> HTTP scheduler library does not gracefully parse invalid resource identifiers
> -
>
> Key: MESOS-3748
> URL: https://issues.apache.org/jira/browse/MESOS-3748
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API
>Affects Versions: 0.25.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> If you pass a nonsense string for "master" into a framework using the HTTP 
> scheduler library, the framework segfaults.
> For example, using the example frameworks:
> {code:title=Scheduler Driver}
> build/src/test-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in:
> {code}
> Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to 
> parse 'asdf://127.0.0.1:5050'
> {code}
> {code:title=HTTP Scheduler Library}
> export DEFAULT_PRINCIPAL=root
> build/src/event-call-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in
> {code}
> I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
> Segmentation fault: 11
> {code}
> {code:title=Stack Trace}
> * thread #2: tid = 0x28b6bb, 0x000100ad03ca 
> libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0)
>  + 42 at scheduler.cpp:213, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
>   * frame #0: 0x000100ad03ca 
> libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0)
>  + 42 at scheduler.cpp:213
> frame #1: 0x000100ad05f2 libmesos-0.26.0.dylib`virtual thunk to 
> mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0) + 34 
> at scheduler.cpp:210
> frame #2: 0x0001022b60f3 libmesos-0.26.0.dylib`::resume() + 931 at 
> process.cpp:2449
> frame #3: 0x0001022c131c libmesos-0.26.0.dylib`::operator()() + 268 
> at process.cpp:2174
> frame #4: 0x0001022c0fa2 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() [inlined] 
> __invoke<(lambda at ../../../3rdparty/libprocess/src/process.cpp:2158:35) &, 
> const std::__1::atomic &> + 27 at __functional_base:415
> frame #5: 0x0001022c0f87 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() [inlined] 
> __apply_functor<(lambda at 
> ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::tuple > >, 
> 0, std::__1::tuple<> > + 55 at functional:2060
> frame #6: 0x0001022c0f50 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() [inlined] 
> operator()<> + 41 at functional:2123
> frame #7: 0x0001022c0f27 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() [inlined] 
> __invoke ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > >> + 14 at 
> __functional_base:415
> frame #8: 0x0001022c0f19 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() [inlined] 
> __thread_execute ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > >> + 25 at 
> thread:337
> frame #9: 0x0001022c0f00 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() + 368 at 
> thread:347
> frame #10: 0x7fff964c705a libsystem_pthread.dylib`_pthread_body + 131
> frame #11: 0x7fff964c6fd7 libsystem_pthread.dylib`_pthread_start + 176
> frame #12: 0x7fff964c43ed libsystem_pthread.dylib`thread_start + 13
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3748) HTTP scheduler library does not gracefully parse invalid resource identifiers

2015-10-15 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3748:
-
Description: 
If you pass a nonsense string for "master" into a framework using the HTTP 
scheduler library, the framework segfaults.

For example, using the example frameworks:

{code:title=Scheduler Driver}
build/src/test-framework --master="asdf://127.0.0.1:5050"
{code}
Results in:
{code}
Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to parse 
'asdf://127.0.0.1:5050'
{code}

{code:title=HTTP Scheduler Library}
export DEFAULT_PRINCIPAL=root
build/src/event-call-framework --master="asdf://127.0.0.1:5050"
{code}
Results in
{code}
I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
Segmentation fault: 11
{code}

{code:title=Stack Trace}
* thread #2: tid = 0x28b6bb, 0x000100ad03ca 
libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0)
 + 42 at scheduler.cpp:213, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000100ad03ca 
libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0)
 + 42 at scheduler.cpp:213
frame #1: 0x000100ad05f2 libmesos-0.26.0.dylib`virtual thunk to 
mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0) + 34 at 
scheduler.cpp:210
frame #2: 0x0001022b60f3 libmesos-0.26.0.dylib`::resume() + 931 at 
process.cpp:2449
frame #3: 0x0001022c131c libmesos-0.26.0.dylib`::operator()() + 268 at 
process.cpp:2174
frame #4: 0x0001022c0fa2 
libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] 
__invoke<(lambda at ../../../3rdparty/libprocess/src/process.cpp:2158:35) &, 
const std::__1::atomic &> + 27 at __functional_base:415
frame #5: 0x0001022c0f87 
libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] 
__apply_functor<(lambda at 
../../../3rdparty/libprocess/src/process.cpp:2158:35), 
std::__1::tuple > >, 
0, std::__1::tuple<> > + 55 at functional:2060
frame #6: 0x0001022c0f50 
libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] 
operator()<> + 41 at functional:2123
frame #7: 0x0001022c0f27 
libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] 
__invoke > >> + 14 at 
__functional_base:415
frame #8: 0x0001022c0f19 
libmesos-0.26.0.dylib`::__thread_proxy > > > >() [inlined] 
__thread_execute > >> + 25 at thread:337
frame #9: 0x0001022c0f00 
libmesos-0.26.0.dylib`::__thread_proxy > > > >() + 368 at 
thread:347
frame #10: 0x7fff964c705a libsystem_pthread.dylib`_pthread_body + 131
frame #11: 0x7fff964c6fd7 libsystem_pthread.dylib`_pthread_start + 176
frame #12: 0x7fff964c43ed libsystem_pthread.dylib`thread_start + 13
{code}

  was:
If you pass a nonsense string for "master" into a framework using the HTTP 
scheduler library, the framework segfaults.

For example, using the example frameworks:

{code}
build/src/test-framework --master="asdf://127.0.0.1:5050"
{code}
Results in:
{code}
Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to parse 
'asdf://127.0.0.1:5050'
{code}

Using the HTTP API:
{code}
export DEFAULT_PRINCIPAL=root
build/src/event-call-framework --master="asdf://127.0.0.1:5050"
{code}
Results in
{code}
I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
Segmentation fault: 11
{code}


> HTTP scheduler library does not gracefully parse invalid resource identifiers
> -
>
> Key: MESOS-3748
> URL: https://issues.apache.org/jira/browse/MESOS-3748
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API
>Affects Versions: 0.25.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> If you pass a nonsense string for "master" into a framework using the HTTP 
> scheduler library, the framework segfaults.
> For example, using the example frameworks:
> {code:title=Scheduler Driver}
> build/src/test-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in:
> {code}
> Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to 
> parse 'asdf://127.0.0.1:5050'
> {code}
> {code:title=HTTP Scheduler Library}
> export DEFAULT_PRINCIPAL=root
> build/src/event-call-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in
> {code}
> I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
> Segmentation fault: 11
> {code}
> {code:title=Stack Trace}
> * thread #2: tid = 0x28b6bb, 0x000100ad03ca 
> libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0)
>  + 42 at scheduler.cpp:213, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
>   * frame #0: 0x000100ad03ca 
> libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::init

[jira] [Updated] (MESOS-3748) HTTP scheduler library does not gracefully parse invalid resource identifiers

2015-10-15 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3748:
-
Description: 
If you pass a nonsense string for "master" into a framework using the HTTP 
scheduler library, the framework segfaults.

For example, using the example frameworks:

{code}
build/src/test-framework --master="asdf://127.0.0.1:5050"
{code}
Results in:
{code}
Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to parse 
'asdf://127.0.0.1:5050'
{code}

Using the HTTP API:
{code}
export DEFAULT_PRINCIPAL=root
build/src/event-call-framework --master="asdf://127.0.0.1:5050"
{code}
Results in
{code}
I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
Segmentation fault: 11
{code}

  was:
If you pass a nonsense string for "master" into a framework using the HTTP API, 
the framework segfaults.

For example, using the example frameworks:

{code}
build/src/test-framework --master="asdf://127.0.0.1:5050"
{code}
Results in:
{code}
Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to parse 
'asdf://127.0.0.1:5050'
{code}

Using the HTTP API:
{code}
export DEFAULT_PRINCIPAL=root
build/src/event-call-framework --master="asdf://127.0.0.1:5050"
{code}
Results in
{code}
I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
Segmentation fault: 11
{code}


> HTTP scheduler library does not gracefully parse invalid resource identifiers
> -
>
> Key: MESOS-3748
> URL: https://issues.apache.org/jira/browse/MESOS-3748
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API
>Affects Versions: 0.25.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> If you pass a nonsense string for "master" into a framework using the HTTP 
> scheduler library, the framework segfaults.
> For example, using the example frameworks:
> {code}
> build/src/test-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in:
> {code}
> Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to 
> parse 'asdf://127.0.0.1:5050'
> {code}
> Using the HTTP API:
> {code}
> export DEFAULT_PRINCIPAL=root
> build/src/event-call-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in
> {code}
> I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
> Segmentation fault: 11
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3748) HTTP scheduler library does not gracefully parse invalid resource identifiers

2015-10-15 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3748:
-
Summary: HTTP scheduler library does not gracefully parse invalid resource 
identifiers  (was: HTTP API does not gracefully parse invalid resource 
identifiers)

> HTTP scheduler library does not gracefully parse invalid resource identifiers
> -
>
> Key: MESOS-3748
> URL: https://issues.apache.org/jira/browse/MESOS-3748
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API
>Affects Versions: 0.25.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> If you pass a nonsense string for "master" into a framework using the HTTP 
> API, the framework segfaults.
> For example, using the example frameworks:
> {code}
> build/src/test-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in:
> {code}
> Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to 
> parse 'asdf://127.0.0.1:5050'
> {code}
> Using the HTTP API:
> {code}
> export DEFAULT_PRINCIPAL=root
> build/src/event-call-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in
> {code}
> I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
> Segmentation fault: 11
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3748) HTTP API does not gracefully parse invalid resource identifiers

2015-10-15 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3748:
-
Description: 
If you pass a nonsense string for "master" into a framework using the HTTP API, 
the framework segfaults.

For example, using the example frameworks:

{code}
build/src/test-framework --master="asdf://127.0.0.1:5050"
{code}
Results in:
{code}
Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to parse 
'asdf://127.0.0.1:5050'
{code}

Using the HTTP API:
{code}
export DEFAULT_PRINCIPAL=root
build/src/event-call-framework --master="asdf://127.0.0.1:5050"
{code}
Results in
{code}
I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
Segmentation fault: 11
{code}

  was:
If you pass a nonsense string for "master" into a framework using the HTTP API, 
the framework segfaults.

For example, using the example frameworks:

{code}
build/src/test-framework --master="asdf://127.0.0.1:5050"
{code}
Results in:
{code}
Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to parse 
'asdf://127.0.0.1:5050'
{code}

{code}
export DEFAULT_PRINCIPAL=root
build/src/event-call-framework --master="asdf://127.0.0.1:5050"
{code}
Results in
{code}
I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
Segmentation fault: 11
{code}


> HTTP API does not gracefully parse invalid resource identifiers
> ---
>
> Key: MESOS-3748
> URL: https://issues.apache.org/jira/browse/MESOS-3748
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API
>Affects Versions: 0.25.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> If you pass a nonsense string for "master" into a framework using the HTTP 
> API, the framework segfaults.
> For example, using the example frameworks:
> {code}
> build/src/test-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in:
> {code}
> Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to 
> parse 'asdf://127.0.0.1:5050'
> {code}
> Using the HTTP API:
> {code}
> export DEFAULT_PRINCIPAL=root
> build/src/event-call-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in
> {code}
> I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
> Segmentation fault: 11
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3748) HTTP API does not gracefully parse invalid resource identifiers

2015-10-15 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3748:


 Summary: HTTP API does not gracefully parse invalid resource 
identifiers
 Key: MESOS-3748
 URL: https://issues.apache.org/jira/browse/MESOS-3748
 Project: Mesos
  Issue Type: Bug
  Components: framework, HTTP API
Affects Versions: 0.25.0
Reporter: Joseph Wu
Assignee: Joseph Wu


If you pass a nonsense string for "master" into a framework using the HTTP API, 
the framework segfaults.

For example, using the example frameworks:

{code}
build/src/test-framework --master="asdf://127.0.0.1:5050"
{code}
Results in:
{code}
Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to parse 
'asdf://127.0.0.1:5050'
{code}

{code}
export DEFAULT_PRINCIPAL=root
build/src/event-call-framework --master="asdf://127.0.0.1:5050"
{code}
Results in
{code}
I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
Segmentation fault: 11
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3563) Revocable task CPU shows as zero in /state.json

2015-10-15 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-3563:
--
Story Points: 2

> Revocable task CPU shows as zero in /state.json
> ---
>
> Key: MESOS-3563
> URL: https://issues.apache.org/jira/browse/MESOS-3563
> Project: Mesos
>  Issue Type: Bug
>Reporter: Maxim Khutornenko
>Assignee: Vinod Kone
>
> The slave's state.json reports revocable task resources as zero:
> {noformat}
> resources: {
> cpus: 0,
> disk: 3071,
> mem: 1248,
> ports: "[31715-31715]"
> },
> {noformat}
> Also, there is no indication that a task uses revocable CPU. It would be 
> great to have this type of info.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3723) Authorize quota requests

2015-10-15 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-3723:

Labels: acl mesosphere security  (was: mesosphere)

> Authorize quota requests
> 
>
> Key: MESOS-3723
> URL: https://issues.apache.org/jira/browse/MESOS-3723
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: acl, mesosphere, security
>
> When quotas are requested, the requests should be authorized for the given roles.
> This ticket will authorize quota requests with ACLs. The existing 
> authorization support that has been implemented in MESOS-1342 will be 
> extended to add a `request_quotas` ACL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3722) Authenticate quota requests

2015-10-15 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-3722:

Labels: mesosphere security  (was: mesosphere)

> Authenticate quota requests
> ---
>
> Key: MESOS-3722
> URL: https://issues.apache.org/jira/browse/MESOS-3722
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: mesosphere, security
>
> Quota requests need to be authenticated.
> This ticket will authenticate quota requests using credentials provided by 
> the `Authorization` field of the HTTP request. This is similar to how 
> authentication is implemented in `Master::Http`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-191) Add support for multiple disk resources

2015-10-15 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959765#comment-14959765
 ] 

Jie Yu commented on MESOS-191:
--

Hey guys, thanks for the doc above and the initial design. My initial proposal 
can be found in this doc:
https://docs.google.com/document/d/1SOw4q5OkTpJpkuxEbkojS6rn20S37t0wRb-n2mFFpww/edit?usp=sharing

> Add support for multiple disk resources
> ---
>
> Key: MESOS-191
> URL: https://issues.apache.org/jira/browse/MESOS-191
> Project: Mesos
>  Issue Type: Story
>Reporter: Vinod Kone
>  Labels: mesosphere, persistent-volumes
>
> It would be nice to schedule mesos tasks with fine-grained disk scheduling. 
> The idea is that a slave with multiple spindles would specify spindle-specific 
> config. Mesos would then include this info in its resource offers to 
> frameworks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3746) Consider introducing a mechanism to provide feedback on offer operations

2015-10-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3746:
--
Labels: mesosphere persistent-volumes  (was: mesosphere)

> Consider introducing a mechanism to provide feedback on offer operations
> 
>
> Key: MESOS-3746
> URL: https://issues.apache.org/jira/browse/MESOS-3746
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Michael Park
>  Labels: mesosphere, persistent-volumes
>
> Currently, the master does not provide direct feedback to the framework 
> when an operation is dropped: 
> https://github.com/apache/mesos/blob/master/src/master/master.cpp#L1713-L1715
> A "subsequent offer" is used as the mechanism to determine whether an 
> operation succeeded or not, which is not sufficient if a framework mistakenly 
> sends invalid operations. There should be immediate feedback as to whether 
> the request was "accepted".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3746) Consider introducing a mechanism to provide feedback on offer operations

2015-10-15 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959723#comment-14959723
 ] 

Adam B commented on MESOS-3746:
---

Some operations (e.g. reservations) can be monitored by checking the master's 
state.json, but this is not fine-grained enough to be reliable.

> Consider introducing a mechanism to provide feedback on offer operations
> 
>
> Key: MESOS-3746
> URL: https://issues.apache.org/jira/browse/MESOS-3746
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Michael Park
>  Labels: mesosphere
>
> Currently, the master does not provide direct feedback to the framework 
> when an operation is dropped: 
> https://github.com/apache/mesos/blob/master/src/master/master.cpp#L1713-L1715
> A "subsequent offer" is used as the mechanism to determine whether an 
> operation succeeded or not, which is not sufficient if a framework mistakenly 
> sends invalid operations. There should be immediate feedback as to whether 
> the request was "accepted".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-1582) Improve build time.

2015-10-15 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959719#comment-14959719
 ] 

James Peach edited comment on MESOS-1582 at 10/15/15 10:05 PM:
---

Here's a quick hack to generate compilation timings:
{code}
diff --git a/src/Makefile.am b/src/Makefile.am
index 96ce73b..fd3389b 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -56,6 +56,11 @@ endif

 PROTOCFLAGS = -I$(top_srcdir)/include -I$(srcdir)

+TIME = /usr/bin/time
+ECHO = /bin/echo
+am__v_CXX_time = $(ECHO) -n "  CXX  " $@ && $(TIME) 
+am__v_lt_time = --silent
+
 # Initialize variables here so we can use += operator everywhere else.
 lib_LTLIBRARIES =
 noinst_LTLIBRARIES =
{code}

Invoke it like {{make V=time}}. (Automake's silent-rules machinery expands 
{{$(am__v_CXX_$(V))}} as the compile-line prefix, so {{V=time}} selects the 
timing wrapper defined above.)


was (Author: jamespeach):
Here's a quick hack to generate compilation timings:
{code}
diff --git a/src/Makefile.am b/src/Makefile.am
index 96ce73b..fd3389b 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -56,6 +56,11 @@ endif

 PROTOCFLAGS = -I$(top_srcdir)/include -I$(srcdir)

+TIME = /usr/bin/time
+ECHO = /bin/echo
+am__v_CXX_time = $(ECHO) -n "  CXX  " $@ && $(TIME) 
+am__v_lt_time = --silent
+
 # Initialize variables here so we can use += operator everywhere else.
 lib_LTLIBRARIES =
 noinst_LTLIBRARIES =
{code}

> Improve build time.
> ---
>
> Key: MESOS-1582
> URL: https://issues.apache.org/jira/browse/MESOS-1582
> Project: Mesos
>  Issue Type: Epic
>  Components: build
>Reporter: Benjamin Hindman
>
> The build takes a ridiculously long time unless you have a large, parallel 
> machine. This is a combination of many factors, all of which we'd like to 
> discuss and track here.
> I'd also love to actually track build times so we can get an appreciation of 
> the improvements. Please leave a comment below with your build times!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1582) Improve build time.

2015-10-15 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959719#comment-14959719
 ] 

James Peach commented on MESOS-1582:


Here's a quick hack to generate compilation timings:
{code}
diff --git a/src/Makefile.am b/src/Makefile.am
index 96ce73b..fd3389b 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -56,6 +56,11 @@ endif

 PROTOCFLAGS = -I$(top_srcdir)/include -I$(srcdir)

+TIME = /usr/bin/time
+ECHO = /bin/echo
+am__v_CXX_time = $(ECHO) -n "  CXX  " $@ && $(TIME) 
+am__v_lt_time = --silent
+
 # Initialize variables here so we can use += operator everywhere else.
 lib_LTLIBRARIES =
 noinst_LTLIBRARIES =
{code}

> Improve build time.
> ---
>
> Key: MESOS-1582
> URL: https://issues.apache.org/jira/browse/MESOS-1582
> Project: Mesos
>  Issue Type: Epic
>  Components: build
>Reporter: Benjamin Hindman
>
> The build takes a ridiculously long time unless you have a large, parallel 
> machine. This is a combination of many factors, all of which we'd like to 
> discuss and track here.
> I'd also love to actually track build times so we can get an appreciation of 
> the improvements. Please leave a comment below with your build times!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container

2015-10-15 Thread Yong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959702#comment-14959702
 ] 

Yong Tang commented on MESOS-3738:
--

A quick workaround for this issue is to pass
{code}
--executor_environment_variables=executor.json
{code}
where executor.json consists of MESOS_LAUNCHER_DIR:
{code}
{"MESOS_LAUNCHER_DIR": "/usr/libexec/mesos"}
{code}
This is just a workaround, though; a fix for this issue is still needed in the 
Mesos source.

> Mesos health check is invoked incorrectly when Mesos slave is within the 
> docker container
> -
>
> Key: MESOS-3738
> URL: https://issues.apache.org/jira/browse/MESOS-3738
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0
> Environment: Docker 1.8.0:
> Client:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Server:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Host: Ubuntu 14.04
> Container: Debian 8.1 + Java-7
>Reporter: Yong Tang
>Assignee: haosdent
>
> When Mesos slave is within the container, the COMMAND health check from 
> Marathon is invoked incorrectly.
> In such a scenario, the sandbox directory (instead of the 
> launcher/health-check directory) is used. This results in an error with the 
> container.
> Command to invoke the Mesos slave container:
> sudo docker run -d -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro -v 
> /usr/lib/x86_64-linux-gnu/libapparmor.so.1:/usr/lib/x86_64-linux-gnu/libapparmor.so.1:ro
>  -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos mesos 
> mesos slave --master=zk://10.2.1.2:2181/mesos --containerizers=docker,mesos 
> --executor_registration_timeout=5mins --docker_stop_timeout=10secs 
> --launcher=posix
> Marathon JSON file:
> {
>   "id": "ubuntu",
>   "container":
>   {
> "type": "DOCKER",
> "docker":
> {
>   "image": "ubuntu",
>   "network": "BRIDGE",
>   "parameters": []
> }
>   },
>   "args": [ "bash", "-c", "while true; do echo 1; sleep 5; done" ],
>   "uris": [],
>   "healthChecks":
>   [
> {
>   "protocol": "COMMAND",
>   "command": { "value": "echo Success" },
>   "gracePeriodSeconds": 3000,
>   "intervalSeconds": 5,
>   "timeoutSeconds": 5,
>   "maxConsecutiveFailures": 300
> }
>   ],
>   "instances": 1
> }
> STDOUT:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stdout 
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> Registered docker executor on b01e2e75afcb
> Starting task ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> 1
> Launching health check process: 
> /tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f/mesos-health-check
>  --executor=(1)@10.2.1.7:40695 
> --health_check_json={"command":{"shell":true,"value":"docker exec 
> mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f
>  sh -c \" echo Success 
> \""},"consecutive_failures":300,"delay_seconds":0.0,"grace_period_seconds":3000.0,"interval_seconds":5.0,"timeout_seconds":5.0}
>  --task_id=ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> Health check process launched at pid: 94
> 1
> 1
> 1
> 1
> 1
> STDERR:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stderr
> I1014 23:15:58.12795056 exec.cpp:134] Version: 0.25.0
> I1014 23:15:58.13062762 exec.cpp:208] Executor regi

[jira] [Commented] (MESOS-2647) Slave should validate tasks using oversubscribed resources

2015-10-15 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959697#comment-14959697
 ] 

Vinod Kone commented on MESOS-2647:
---

Available revocable resources can change at any time because it is typically 
based on the current utilization of the agent. This is different from regular 
resources, whose availability only changes when tasks are launched/terminated.

For example, if the agent offers 2 cpus as revocable and a framework launches a 
revocable task with 2 cpus, it might happen that just before the agent receives 
the new revocable task, utilization on the agent increases and the latest 
available revocable estimate is only 1 cpu.
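
In other words, the agent needs to re-check a revocable launch against its 
latest estimate at launch time. A rough sketch of the proposed check, with 
hypothetical names (the reason enum mentioned in the comment does not exist in 
mesos.proto yet):
{code}
#include <mesos/mesos.pb.h>
#include <mesos/resources.hpp>

// Sketch of an agent-side validation: a revocable task is only valid
// if the revocable portion of its resources still fits within the
// latest oversubscribed estimate. If not, the agent would send
// TASK_LOST with a new reason (e.g. REASON_RESOURCE_OVERSUBSCRIBED).
static bool fitsLatestEstimate(
    const mesos::TaskInfo& task,
    const mesos::Resources& latestOversubscribed)
{
  const mesos::Resources revocable =
    mesos::Resources(task.resources()).revocable();

  return latestOversubscribed.contains(revocable);
}
{code}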

> Slave should validate tasks using oversubscribed resources
> --
>
> Key: MESOS-2647
> URL: https://issues.apache.org/jira/browse/MESOS-2647
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Guangya Liu
>  Labels: twitter
>
> The latest oversubscribed resource estimate might render a revocable task 
> launch invalid. Slave should check this and send TASK_LOST with appropriate 
> REASON.
> We need to add a new REASON for this (REASON_RESOURCE_OVERSUBSCRIBED?).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3562) Anomalous bytes in stream from HTTP Api

2015-10-15 Thread Ben Whitehead (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Whitehead updated MESOS-3562:
-
Description: 
When connecting to the new HTTP Api and attempting to {{SUBSCRIBE}}, there are 
some anomalous bytes contained in the chunked stream that appear to be causing 
problems when I attempt to integrate.

Attached are two log files. app.log represents my application trying to connect 
to mesos using RxNetty. Netty has been configured to log all data it 
sends/receives over the wire; this can be seen in the byte blocks in the log.

The client is constructing a protobuf in Java for the subscribe call:
{code:java}
final Call subscribeCall = Call.newBuilder()
.setType(Call.Type.SUBSCRIBE)
.setSubscribe(
Call.Subscribe.newBuilder()
.setFrameworkInfo(
Protos.FrameworkInfo.newBuilder()
.setUser("bill")
.setName("testing")
.build()
)
)
.build();
{code}
 
The client sends the protobuf to mesos with the following request headers:
{code}
POST /api/v1/scheduler HTTP/1.1
Content-Type: application/x-protobuf
Accept: application/json
Content-Length: 35
Host: localhost:5050
User-Agent: RxNetty Client
{code}
The body is then serialized via protobuf and sent.

The response from the mesos master has the following headers:
{code}
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Date: Wed, 30 Sep 2015 21:07:16 GMT
Content-Type: application/json
{code}
followed by 
{code}
\r\n\r\n6c\r\n104\n{"subscribed":{"framework_id":{"value":"20150930-103028-16777343-5050-11742-0028"}},"type":"SUBSCRIBED"}
{code}
The {{\r\n\r\n}} is expected for standard HTTP bodies; however, {{6c\r\n}} 
doesn't appear to be attached to anything. {{104}} is the correct length of the 
Subscribe event's JSON.

What is this extra number and why is it there?

This is not the first time confusion has come up related to the wire format for 
the event stream from the new HTTP API; see 
[this|http://mail-archives.apache.org/mod_mbox/mesos-user/201508.mbox/%3c94d2c9e8-2fe8-4c11-b0d3-859dac654...@me.com%3E]
 message from the mailing list.

In the [Design 
Doc|https://docs.google.com/document/d/1pnIY_HckimKNvpqhKRhbc9eSItWNFT-priXh_urR-T0/edit#]
 there is a statement that says:
{quote}
All subsequent events that are relevant to this framework  generated by Mesos 
are streamed on this connection. Master encodes each Event in RecordIO format, 
i.e., string representation of length of the event followed by JSON or binary 
Protobuf  (possibly compressed) encoded event. 
{quote}

There is no specification I've been able to find online that actually explains 
this format. The only reference I can find to it is some sample Go code.
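
(For what it's worth, the {{6c\r\n}} above looks like plain HTTP chunked 
transfer-encoding framing rather than part of RecordIO: 0x6c is 108 bytes, 
i.e. the 4-byte {{104\n}} length prefix plus the 104-byte event.) A minimal 
decoder sketch for the RecordIO layer itself, assuming the HTTP client has 
already stripped the chunked framing:
{code}
#include <string>
#include <vector>

// Minimal RecordIO decoder sketch: each record is an ASCII base-10
// length, then '\n', then exactly that many bytes of payload.
static std::vector<std::string> decodeRecordIO(const std::string& stream)
{
  std::vector<std::string> records;
  size_t offset = 0;

  while (offset < stream.size()) {
    const size_t newline = stream.find('\n', offset);
    if (newline == std::string::npos) {
      break; // Incomplete length prefix; wait for more data.
    }

    const size_t length =
      std::stoul(stream.substr(offset, newline - offset));

    if (newline + 1 + length > stream.size()) {
      break; // Incomplete record; wait for more data.
    }

    records.push_back(stream.substr(newline + 1, length));
    offset = newline + 1 + length;
  }

  return records;
}
{code}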

The attached tcpdump.log contains a tcp dump between the mesos master and my 
client, collected using the following command: {{tcpdump -xx -n -i lo "dst port 
5050" or "src port 5050" 2>&1 | tee /tmp/tcpdump.log}}

  was:
When connecting to the new HTTP Api and attempting to {{SUBSCRIBE}}, there are 
some anomalous bytes contained in the chunked stream that appear to be causing 
problems when I attempt to integrate.

Attached are two log files. app.log represents my application trying to connect 
to mesos using RxNetty. Netty has been configured to log all data it 
sends/receives over the wire; this can be seen in the byte blocks in the log.

The client is constructing a protobuf in java for the subscribe call  
{code:java}
final Call subscribeCall = Call.newBuilder()
.setType(Call.Type.SUBSCRIBE)
.setSubscribe(
Call.Subscribe.newBuilder()
.setFrameworkInfo(
Protos.FrameworkInfo.newBuilder()
.setUser("bill")
.setName("testing_this_shit_out")
.build()
)
)
.build();
{code}
 
The client sends the protobuf to mesos with the following request headers:
{code}
POST /api/v1/scheduler HTTP/1.1
Content-Type: application/x-protobuf
Accept: application/json
Content-Length: 35
Host: localhost:5050
User-Agent: RxNetty Client
{code}
The body is then serialized via protobuf and sent.

The response from the mesos master has the following headers:
{code}
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Date: Wed, 30 Sep 2015 21:07:16 GMT
Content-Type: application/json
{code}
followed by 
{code}
\r\n\r\n6c\r\n104\n{"subscribed":{"framework_id":{"value":"20150930-103028-16777343-5050-11742-0028"}},"type":"SUBSCRIBED"}
{code}
The {{\r\n\r\n}} is expected for standard HTTP bodies; however, {{6c\r\n}} 
doesn't appear to be attached to anything. {{104}} is the correct length of the 
Subscribe event's JSON.

What is this extra number and why is it there?

T

[jira] [Commented] (MESOS-3747) HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string

2015-10-15 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959626#comment-14959626
 ] 

Anand Mazumdar commented on MESOS-3747:
---

As per my discussion with [~bmahler] on IRC: historically, the user was set to 
the current user by the driver in case the framework developer neglected to set 
it.
https://github.com/apache/mesos/blob/master/src/sched/sched.cpp#L1548

Since there is no driver when using the HTTP API, we might want to:

- Disallow this and return a {{400 BadRequest}} when validation finds 
{{FrameworkInfo.user==""}}.
- Clarify in {{mesos.proto}} that the user field is populated by the 
{{driver}} and has to be set explicitly by anyone using the {{Scheduler HTTP 
API}}: 
https://github.com/apache/mesos/blob/master/include/mesos/v1/mesos.proto#L212
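
A rough sketch of the first option (hypothetical validation helper, not the 
actual master code):
{code}
#include <mesos/v1/mesos.hpp>
#include <process/future.hpp>
#include <process/http.hpp>

// Sketch: with no driver to fill in the current user, an empty user in
// a SUBSCRIBE call is an error the framework should hear about
// immediately, before any task can fail with a misleading reason.
static process::Future<process::http::Response> validateSubscribe(
    const mesos::v1::FrameworkInfo& frameworkInfo)
{
  if (frameworkInfo.user().empty()) {
    return process::http::BadRequest(
        "'FrameworkInfo.user' must be set when using the HTTP API");
  }

  // ... continue with the normal subscribe path ...
  return process::http::OK();
}
{code}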

> HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
> -
>
> Key: MESOS-3747
> URL: https://issues.apache.org/jira/browse/MESOS-3747
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Ben Whitehead
>Priority: Blocker
>
> When using libmesos a framework can set its user to {{""}} (empty string) to 
> inherit the user the agent process is running as; this behavior now results 
> in a {{TASK_FAILED}}.
> Full messages and relevant agent logs below.
> The error returned to the framework tells me nothing about the user not 
> existing on the agent host; instead it tells me the container died due to OOM.
> {code:title=FrameworkInfo}
> call {
> type: SUBSCRIBE
> subscribe: {
> frameworkInfo: {
> user: "",
> name: "testing"
> }
> }
> }
> {code}
> {code:title=TaskInfo}
> call {
> framework_id { value: "20151015-125949-16777343-5050-20146-" },
> type: ACCEPT,
> accept { 
> offer_ids: [{ value: "20151015-125949-16777343-5050-20146-O0" }],
> operations { 
> type: LAUNCH, 
> launch { 
> task_infos [
> {
> name: "task-1",
> task_id: { value: "task-1" },
> agent_id: { value: 
> "20151015-125949-16777343-5050-20146-S0" },
> resources [
> { name: "cpus", type: SCALAR, scalar: { value: 
> 0.1 },  role: "*" },
> { name: "mem",  type: SCALAR, scalar: { value: 
> 64.0 }, role: "*" },
> { name: "disk", type: SCALAR, scalar: { value: 
> 0.0 },  role: "*" },
> ],
> command: { 
> environment { 
> variables [ 
> { name: "SLEEP_SECONDS" value: "15" } 
> ] 
> },
> value: "env | sort && sleep $SLEEP_SECONDS"
> }
>     }
> ]
>  }
>  }
>  }
> }
> {code}
> {code:title=Update Status}
> event: {
> type: UPDATE,
> update: { 
> status: { 
> task_id: { value: "task-1" }, 
> state: TASK_FAILED,
> message: "Container destroyed while preparing isolators",
> agent_id: { value: "20151015-125949-16777343-5050-20146-S0" }, 
> timestamp: 1.444939217401241E9,
> executor_id: { value: "task-1" },
> source: SOURCE_AGENT, 
> reason: REASON_MEMORY_LIMIT,
> uuid: "\237g()L\026EQ\222\301\261\265\\\221\224|" 
> } 
> }
> }
> {code}
> {code:title=agent logs}
> I1015 13:15:34.260592 19639 slave.cpp:1270] Got assigned task task-1 for 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-
> I1015 13:15:34.260921 19639 slave.cpp:1386] Launching task task-1 for 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-
> W1015 13:15:34.262243 19639 paths.cpp:423] Failed to chown executor directory 
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-/executors/task-1/run

[jira] [Commented] (MESOS-3747) HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string

2015-10-15 Thread Ben Whitehead (JIRA)
@127.0.0.1:57960
I1015 13:41:22.373843 24209 status_update_manager.cpp:826] Checkpointing UPDATE 
for status update TASK_STARTING (UUID: 37754127-182b-4e09-abb2-e30e804b104b) 
for task cassandra.ben.node.0.executor.server of framework 
0accd395-799a-4c64-85ae-997f43f48bf2-
I1015 13:41:22.448226 24209 status_update_manager.cpp:322] Received status 
update TASK_RUNNING (UUID: ef938358-d47c-4445-a7dd-c7775f74f115) for task 
cassandra.ben.node.0.executor.server of framework 
0accd395-799a-4c64-85ae-997f43f48bf2-
I1015 13:41:22.448226 24208 slave.cpp:3016] Forwarding the update TASK_STARTING 
(UUID: 37754127-182b-4e09-abb2-e30e804b104b) for task 
cassandra.ben.node.0.executor.server of framework 
0accd395-799a-4c64-85ae-997f43f48bf2- to master@127.0.0.1:5050
I1015 13:41:22.448326 24209 status_update_manager.cpp:826] Checkpointing UPDATE 
for status update TASK_RUNNING (UUID: ef938358-d47c-4445-a7dd-c7775f74f115) for 
task cassandra.ben.node.0.executor.server of framework 
0accd395-799a-4c64-85ae-997f43f48bf2-
I1015 13:41:22.448415 24208 slave.cpp:2946] Sending acknowledgement for status 
update TASK_STARTING (UUID: 37754127-182b-4e09-abb2-e30e804b104b) for task 
cassandra.ben.node.0.executor.server of framework 
0accd395-799a-4c64-85ae-997f43f48bf2- to executor(1)@127.0.0.1:57960
I1015 13:41:22.516647 24209 status_update_manager.cpp:394] Received status 
update acknowledgement (UUID: 37754127-182b-4e09-abb2-e30e804b104b) for task 
cassandra.ben.node.0.executor.server of framework 
0accd395-799a-4c64-85ae-997f43f48bf2-
I1015 13:41:22.516649 24211 slave.cpp:2946] Sending acknowledgement for status 
update TASK_RUNNING (UUID: ef938358-d47c-4445-a7dd-c7775f74f115) for task 
cassandra.ben.node.0.executor.server of framework 
0accd395-799a-4c64-85ae-997f43f48bf2- to executor(1)@127.0.0.1:57960
I1015 13:41:22.516744 24209 status_update_manager.cpp:826] Checkpointing ACK 
for status update TASK_STARTING (UUID: 37754127-182b-4e09-abb2-e30e804b104b) 
for task cassandra.ben.node.0.executor.server of framework 
0accd395-799a-4c64-85ae-997f43f48bf2-
I1015 13:41:22.573704 24210 slave.cpp:3084] Sending message for framework 
0accd395-799a-4c64-85ae-997f43f48bf2- to 
scheduler-008b3fea-c62f-4e42-bbcf-f1b54c548511@127.0.0.1:60517
I1015 13:41:22.589226 24213 slave.cpp:3016] Forwarding the update TASK_RUNNING 
(UUID: ef938358-d47c-4445-a7dd-c7775f74f115) for task 
cassandra.ben.node.0.executor.server of framework 
0accd395-799a-4c64-85ae-997f43f48bf2- to master@127.0.0.1:5050
I1015 13:41:22.768887 24211 status_update_manager.cpp:394] Received status 
update acknowledgement (UUID: ef938358-d47c-4445-a7dd-c7775f74f115) for task 
cassandra.ben.node.0.executor.server of framework 
0accd395-799a-4c64-85ae-997f43f48bf2-
I1015 13:41:22.768944 24211 status_update_manager.cpp:826] Checkpointing ACK 
for status update TASK_RUNNING (UUID: ef938358-d47c-4445-a7dd-c7775f74f115) for 
task cassandra.ben.node.0.executor.server of framework 
0accd395-799a-4c64-85ae-997f43f48bf2-
I1015 13:41:39.866338 24211 slave.cpp:3084] Sending message for framework 
0accd395-799a-4c64-85ae-997f43f48bf2- to 
scheduler-008b3fea-c62f-4e42-bbcf-f1b54c548511@127.0.0.1:60517
I1015 13:41:40.251452 24212 slave.cpp:3926] Current disk usage 22.64%. Max 
allowed age: 4.715127595139352days
I1015 13:41:52.565579 24210 slave.cpp:3084] Sending message for framework 
0accd395-799a-4c64-85ae-997f43f48bf2- to 
scheduler-008b3fea-c62f-4e42-bbcf-f1b54c548511@127.0.0.1:60517
{code}

> HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
> -
>
> Key: MESOS-3747
> URL: https://issues.apache.org/jira/browse/MESOS-3747
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Ben Whitehead
>Priority: Blocker
>
> When using libmesos a framework can set its user to {{""}} (empty string) to 
> inherit the user the agent process is running as; this behavior now results 
> in a {{TASK_FAILED}}.
> Full messages and relevant agent logs below.
> The error returned to the framework tells me nothing about the user not 
> existing on the agent host; instead it tells me the container died due to OOM.
> {code:title=FrameworkInfo}
> call {
> type: SUBSCRIBE
> subscribe: {
> frameworkInfo: {
> user: "",
> name: "testing"
> }
> }
> }
> {code}
> {code:title=TaskInfo}
> call {
> framework_id { value: "20151015-125949-16777343-5050-20146-" },
> type: ACCEPT,
> accept { 
> offer_ids: [{ value: "20151015-125949-16777343-5050-20146-O0" }],
> operations { 
> type: LAUNCH,

[jira] [Created] (MESOS-3747) HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string

2015-10-15 Thread Ben Whitehead (JIRA)
Ben Whitehead created MESOS-3747:


 Summary: HTTP Scheduler API no longer allows FrameworkInfo.user to 
be empty string
 Key: MESOS-3747
 URL: https://issues.apache.org/jira/browse/MESOS-3747
 Project: Mesos
  Issue Type: Bug
  Components: HTTP API
Affects Versions: 0.25.0, 0.24.1, 0.24.0
Reporter: Ben Whitehead
Priority: Blocker


When using libmesos a framework can set its user to {{""}} (empty string) to 
inherit the user the agent process is running as; this behavior now results 
in a {{TASK_FAILED}}.

Full messages and relevant agent logs below.

The error returned to the framework tells me nothing about the user not 
existing on the agent host; instead it tells me the container died due to OOM.


{code:title=FrameworkInfo}
call {
type: SUBSCRIBE
subscribe: {
frameworkInfo: {
user: "",
name: "testing"
}
}
}
{code}
{code:title=TaskInfo}
call {
framework_id { value: "20151015-125949-16777343-5050-20146-" },
type: ACCEPT,
accept { 
offer_ids: [{ value: "20151015-125949-16777343-5050-20146-O0" }],
operations { 
type: LAUNCH, 
launch { 
task_infos [
{
name: "task-1",
task_id: { value: "task-1" },
agent_id: { value: 
"20151015-125949-16777343-5050-20146-S0" },
resources [
{ name: "cpus", type: SCALAR, scalar: { value: 0.1 
},  role: "*" },
{ name: "mem",  type: SCALAR, scalar: { value: 64.0 
}, role: "*" },
{ name: "disk", type: SCALAR, scalar: { value: 0.0 
},  role: "*" },
],
command: { 
environment { 
variables [ 
{ name: "SLEEP_SECONDS" value: "15" } 
] 
},
value: "env | sort && sleep $SLEEP_SECONDS"
}
}
]
 }
 }
 }
}
{code}

{code:title=Update Status}
event: {
type: UPDATE,
update: { 
status: { 
    task_id: { value: "task-1" }, 
state: TASK_FAILED,
message: "Container destroyed while preparing isolators",
agent_id: { value: "20151015-125949-16777343-5050-20146-S0" }, 
timestamp: 1.444939217401241E9,
executor_id: { value: "task-1" },
source: SOURCE_AGENT, 
reason: REASON_MEMORY_LIMIT,
uuid: "\237g()L\026EQ\222\301\261\265\\\221\224|" 
} 
}
}
{code}

{code:title=agent logs}
I1015 13:15:34.260592 19639 slave.cpp:1270] Got assigned task task-1 for 
framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-
I1015 13:15:34.260921 19639 slave.cpp:1386] Launching task task-1 for framework 
e4de5b96-41cc-4713-af44-7cffbdd63ba6-
W1015 13:15:34.262243 19639 paths.cpp:423] Failed to chown executor directory 
'/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b':
 Failed to get user information for '': Success
I1015 13:15:34.262444 19639 slave.cpp:4852] Launching executor task-1 of 
framework e4de5b96-41cc-4713-af44-7cffbdd63ba6- with resources cpus(*):0.1; 
mem(*):32 in work directory 
'/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b'
I1015 13:15:34.262581 19639 slave.cpp:1604] Queuing task 'task-1' for executor 
task-1 of framework 'e4de5b96-41cc-4713-af44-7cffbdd63ba6-
I1015 13:15:34.262684 19638 docker.cpp:734] No container info found, skipping 
launch
I1015 13:15:34.263478 19638 containerizer.cpp:640] Starting container 
'3958ff84-8dd9-4c3c-995d-5aba5250541b' for executor 'task-1' of framework 
'e4de5b96-41cc-4713-af44-7cffbdd63ba6-'
E1015 13:15:34.264516 19641 slave.cpp:3342] Container 
'3958ff84-8dd9-4c3c-995d-5aba5250541b' for executor 'task-1' of framework 
'e4de5b96-41cc-4713-af44-7cffbdd63ba6-' failed to start: Failed to prepare 
isolator: Failed to get user information for '': Success
I1015 13:15:34.264681 19636 containerizer.cpp:1097] Destroying 

[jira] [Commented] (MESOS-3224) Create a Mesos Contributor Newbie Guide

2015-10-15 Thread Diana Arroyo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959504#comment-14959504
 ] 

Diana Arroyo commented on MESOS-3224:
-

Here is the document in markdown format: 
https://drive.google.com/file/d/0B9ZMdjw53LQocWo2YVNLdGF5a2M/view?usp=sharing.

> Create a Mesos Contributor Newbie Guide
> ---
>
> Key: MESOS-3224
> URL: https://issues.apache.org/jira/browse/MESOS-3224
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Timothy Chen
>Assignee: Diana Arroyo
>
> Currently the website doesn't have a helpful guide for community users to 
> know how to start learning to contribute to Mesos, understand the concepts 
> and lower the barrier to get involved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3688) Get Container Name information when launching a container task

2015-10-15 Thread Kapil Arya (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959463#comment-14959463
 ] 

Kapil Arya commented on MESOS-3688:
---

Yeah, I can take a cut at it. Will create a RR and put the link here.

> Get Container Name information when launching a container task
> --
>
> Key: MESOS-3688
> URL: https://issues.apache.org/jira/browse/MESOS-3688
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Affects Versions: 0.24.1
>Reporter: Raffaele Di Fazio
>  Labels: mesosphere
>
> We want to get the Docker Name (or Docker ID, or both) when launching a 
> container task with mesos. The container name is generated by mesos itself 
> (i.e. mesos-77e5fde6-83e7-4618-a2dd-d5b10f2b4d25, obtained with "docker ps") 
> and it would be nice to expose this information to frameworks so that this 
> information can be used, for example by Marathon to give this information to 
> users via a REST API. 
> To go a bit in depth with our use case: we have files created by the fluentd 
> log driver that are named with the Docker Name or Docker ID (full or short), 
> and we need a mapping for the users of the REST API; thus the first step is 
> to make this information available from mesos.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3746) Consider introducing a mechanism to provide feedback on offer operations

2015-10-15 Thread Michael Park (JIRA)
Michael Park created MESOS-3746:
---

 Summary: Consider introducing a mechanism to provide feedback on 
offer operations
 Key: MESOS-3746
 URL: https://issues.apache.org/jira/browse/MESOS-3746
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Michael Park


Currently, the master does not provide direct feedback to the framework when 
an operation is dropped: 
https://github.com/apache/mesos/blob/master/src/master/master.cpp#L1713-L1715

A "subsequent offer" is used as the mechanism to determine whether an operation 
succeeded or not, which is not sufficient if a framework mistakenly sends 
invalid operations. There should be immediate feedback as to whether the 
request was "accepted".
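
To illustrate the status quo: after an ACCEPT that carries, say, a RESERVE 
operation, the only signal is whether the reserved resources show up in a 
later offer from the same agent. A framework-side sketch of that inference 
(hypothetical helper):
{code}
#include <mesos/mesos.pb.h>
#include <mesos/resources.hpp>

// Sketch of the "subsequent offer" feedback loop: the framework infers
// that a RESERVE succeeded only by seeing the reserved resources come
// back in a later offer. A dropped or invalid operation looks exactly
// like an operation whose offer simply hasn't arrived yet.
static bool reservationVisible(
    const mesos::Offer& offer,
    const mesos::Resources& requestedReservation)
{
  return mesos::Resources(offer.resources()).contains(requestedReservation);
}
{code}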



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2079) IO.Write test is flaky on OS X 10.10.

2015-10-15 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959315#comment-14959315
 ] 

James Peach commented on MESOS-2079:


https://reviews.apache.org/r/39347/
https://reviews.apache.org/r/39348/
https://reviews.apache.org/r/39349/
https://reviews.apache.org/r/39350/
https://reviews.apache.org/r/39351/

> IO.Write test is flaky on OS X 10.10.
> -
>
> Key: MESOS-2079
> URL: https://issues.apache.org/jira/browse/MESOS-2079
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, technical debt, test
> Environment: OS X 10.10
> {noformat}
> $ clang++ --version
> Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
> Target: x86_64-apple-darwin14.0.0
> Thread model: posix
> {noformat}
>Reporter: Benjamin Mahler
>Assignee: James Peach
>  Labels: flaky
>
> [~benjaminhindman]: If I recall correctly, this is related to MESOS-1658. 
> Unfortunately, we don't have a stacktrace for SIGPIPE currently:
> {noformat}
> [ RUN  ] IO.Write
> make[5]: *** [check-local] Broken pipe: 13
> {noformat}
> Running in gdb, seems to always occur here:
> {code}
> Program received signal SIGPIPE, Broken pipe.
> [Switching to process 56827 thread 0x60b]
> 0x7fff9a011132 in __psynch_cvwait ()
> (gdb) where
> #0  0x7fff9a011132 in __psynch_cvwait ()
> #1  0x7fff903e7ea0 in _pthread_cond_wait ()
> #2  0x00010062f27c in Gate::arrive (this=0x101908a10, old=14780) at 
> gate.hpp:82
> #3  0x000100600888 in process::schedule (arg=0x0) at src/process.cpp:1373
> #4  0x7fff903e72fc in _pthread_body ()
> #5  0x7fff903e7279 in _pthread_start ()
> #6  0x7fff903e54b1 in thread_start ()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2522) Add reason field for framework errors

2015-10-15 Thread Matthias Veit (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959225#comment-14959225
 ] 

Matthias Veit commented on MESOS-2522:
--

That is exactly what I am looking for.
Current logic in Marathon is here: 
https://github.com/mesosphere/marathon/blob/master/src/main/scala/mesosphere/marathon/MarathonScheduler.scala#L118
Would love to fill that flag with the correct value.
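
For concreteness, if such a reason enum existed (it does not yet; the names 
below are hypothetical), a scheduler's error handler could branch on it 
instead of matching message strings:
{code}
// Hypothetical: no such enum exists in mesos.proto today.
enum FrameworkErrorReason
{
  REASON_UNKNOWN,
  // Framework exceeded its failover timeout and was marked completed.
  REASON_COMPLETED_FRAMEWORK_REREGISTRATION,
};

// Sketch of a scheduler error handler that can safely decide whether
// to discard its stashed FrameworkID.
static void handleFrameworkError(FrameworkErrorReason reason)
{
  if (reason == REASON_COMPLETED_FRAMEWORK_REREGISTRATION) {
    // Invalidate the persisted FrameworkID and re-register fresh.
  }
}
{code}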

> Add reason field for framework errors
> -
>
> Key: MESOS-2522
> URL: https://issues.apache.org/jira/browse/MESOS-2522
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.22.0
>Reporter: Connor Doyle
>Priority: Minor
>  Labels: mesosphere, newbie
>
> Currently, the only insight into framework errors is a message string.  
> Framework schedulers could probably be smarter about how to handle errors if 
> the cause is known.  Since there are only a handful of distinct cases that 
> could trigger an error, they could be captured by an enumeration.
> One specific use case for this feature follows. Frameworks that intend to 
> survive failover typically persist the FrameworkID somewhere. When a 
> framework has been marked completed by the master for exceeding its 
> configured failover timeout, then re-registration triggers a framework error. 
>  Probably, the scheduler wants to disambiguate this kind of framework error 
> from others in order to invalidate the stashed FrameworkID for the next 
> attempt at (re)registration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2522) Add reason field for framework errors

2015-10-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2522:
--
Labels: mesosphere newbie  (was: mesosphere)

> Add reason field for framework errors
> -
>
> Key: MESOS-2522
> URL: https://issues.apache.org/jira/browse/MESOS-2522
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.22.0
>Reporter: Connor Doyle
>Priority: Minor
>  Labels: mesosphere, newbie
>
> Currently, the only insight into framework errors is a message string.  
> Framework schedulers could probably be smarter about how to handle errors if 
> the cause is known.  Since there are only a handful of distinct cases that 
> could trigger an error, they could be captured by an enumeration.
> One specific use case for this feature follows. Frameworks that intend to 
> survive failover typically persist the FrameworkID somewhere. When a 
> framework has been marked completed by the master for exceeding its 
> configured failover timeout, then re-registration triggers a framework error. 
>  Probably, the scheduler wants to disambiguate this kind of framework error 
> from others in order to invalidate the stashed FrameworkID for the next 
> attempt at (re)registration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2522) Add reason field for framework errors

2015-10-15 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959216#comment-14959216
 ] 

Adam B commented on MESOS-2522:
---

Exactly. What if Mesos could tell you which error types require you to register 
with a new frameworkId and which don't?

> Add reason field for framework errors
> -
>
> Key: MESOS-2522
> URL: https://issues.apache.org/jira/browse/MESOS-2522
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.22.0
>Reporter: Connor Doyle
>Priority: Minor
>  Labels: mesosphere, newbie
>
> Currently, the only insight into framework errors is a message string.  
> Framework schedulers could probably be smarter about how to handle errors if 
> the cause is known.  Since there are only a handful of distinct cases that 
> could trigger an error, they could be captured by an enumeration.
> One specific use case for this feature follows. Frameworks that intend to 
> survive failover typically persist the FrameworkID somewhere. When a 
> framework has been marked completed by the master for exceeding its 
> configured failover timeout, then re-registration triggers a framework error. 
>  Probably, the scheduler wants to disambiguate this kind of framework error 
> from others in order to invalidate the stashed FrameworkID for the next 
> attempt at (re)registration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1582) Improve build time.

2015-10-15 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959198#comment-14959198
 ] 

haosdent commented on MESOS-1582:
-

It seems that building the test objects takes a lot of the time.

> Improve build time.
> ---
>
> Key: MESOS-1582
> URL: https://issues.apache.org/jira/browse/MESOS-1582
> Project: Mesos
>  Issue Type: Epic
>  Components: build
>Reporter: Benjamin Hindman
>
> The build takes a ridiculously long time unless you have a large, parallel 
> machine. This is a combination of many factors, all of which we'd like to 
> discuss and track here.
> I'd also love to actually track build times so we can get an appreciation of 
> the improvements. Please leave a comment below with your build times!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1582) Improve build time.

2015-10-15 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959176#comment-14959176
 ] 

James Peach commented on MESOS-1582:


Here are the build times w/ clang on OS X with {{-g -O0}}: 
http://fpaste.org/279627/92656414/

> Improve build time.
> ---
>
> Key: MESOS-1582
> URL: https://issues.apache.org/jira/browse/MESOS-1582
> Project: Mesos
>  Issue Type: Epic
>  Components: build
>Reporter: Benjamin Hindman
>
> The build takes a ridiculously long time unless you have a large, parallel 
> machine. This is a combination of many factors, all of which we'd like to 
> discuss and track here.
> I'd also love to actually track build times so we can get an appreciation of 
> the improvements. Please leave a comment below with your build times!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container

2015-10-15 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-3738:
---

Assignee: haosdent

> Mesos health check is invoked incorrectly when Mesos slave is within the 
> docker container
> -
>
> Key: MESOS-3738
> URL: https://issues.apache.org/jira/browse/MESOS-3738
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0
> Environment: Docker 1.8.0:
> Client:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Server:
>  Version:  1.8.0
>  API version:  1.20
>  Go version:   go1.4.2
>  Git commit:   0d03096
>  Built:Tue Aug 11 16:48:39 UTC 2015
>  OS/Arch:  linux/amd64
> Host: Ubuntu 14.04
> Container: Debian 8.1 + Java-7
>Reporter: Yong Tang
>Assignee: haosdent
>
> When Mesos slave is within the container, the COMMAND health check from 
> Marathon is invoked incorrectly.
> In such a scenario, the sandbox directory (instead of the 
> launcher/health-check directory) is used. This results in an error with the 
> container.
> Command to invoke the Mesos slave container:
> sudo docker run -d -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro -v 
> /usr/lib/x86_64-linux-gnu/libapparmor.so.1:/usr/lib/x86_64-linux-gnu/libapparmor.so.1:ro
>  -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos mesos 
> mesos slave --master=zk://10.2.1.2:2181/mesos --containerizers=docker,mesos 
> --executor_registration_timeout=5mins --docker_stop_timeout=10secs 
> --launcher=posix
> Marathon JSON file:
> {
>   "id": "ubuntu",
>   "container":
>   {
> "type": "DOCKER",
> "docker":
> {
>   "image": "ubuntu",
>   "network": "BRIDGE",
>   "parameters": []
> }
>   },
>   "args": [ "bash", "-c", "while true; do echo 1; sleep 5; done" ],
>   "uris": [],
>   "healthChecks":
>   [
> {
>   "protocol": "COMMAND",
>   "command": { "value": "echo Success" },
>   "gracePeriodSeconds": 3000,
>   "intervalSeconds": 5,
>   "timeoutSeconds": 5,
>   "maxConsecutiveFailures": 300
> }
>   ],
>   "instances": 1
> }
> STDOUT:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stdout 
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" 
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
>  --stop_timeout="10secs"
> Registered docker executor on b01e2e75afcb
> Starting task ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> 1
> Launching health check process: 
> /tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f/mesos-health-check
>  --executor=(1)@10.2.1.7:40695 
> --health_check_json={"command":{"shell":true,"value":"docker exec 
> mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f
>  sh -c \" echo Success 
> \""},"consecutive_failures":300,"delay_seconds":0.0,"grace_period_seconds":3000.0,"interval_seconds":5.0,"timeout_seconds":5.0}
>  --task_id=ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> Health check process launched at pid: 94
> 1
> 1
> 1
> 1
> 1
> STDERR:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stderr
> I1014 23:15:58.12795056 exec.cpp:134] Version: 0.25.0
> I1014 23:15:58.13062762 exec.cpp:208] Executor registered on slave 
> e20f8959-cd9f-40ae-987d-809401309361-S0
> WARNING: Your kernel does not support swap limit capabilities, memory limited 
> without swap.
> ABORT: 
> (/tmp/mesos-build/mesos-repo/3rdparty/libprocess/src/subprocess.cpp:177): 
> Failed to os::execvpe in childMain: No such file or directory*** Ab

[jira] [Commented] (MESOS-3688) Get Container Name information when launching a container task

2015-10-15 Thread Kapil Arya (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959106#comment-14959106
 ] 

Kapil Arya commented on MESOS-3688:
---

Probably because this is a fairly new proto (merged in 0.25.0). I can't see 
anything obviously wrong with it.

> Get Container Name information when launching a container task
> --
>
> Key: MESOS-3688
> URL: https://issues.apache.org/jira/browse/MESOS-3688
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Affects Versions: 0.24.1
>Reporter: Raffaele Di Fazio
>  Labels: mesosphere
>
> We want to get the Docker Name (or Docker ID, or both) when launching a 
> container task with mesos. The container name is generated by mesos itself 
> (i.e. mesos-77e5fde6-83e7-4618-a2dd-d5b10f2b4d25, obtained with "docker ps") 
> and it would be nice to expose this information to frameworks so that this 
> information can be used, for example by Marathon to give this information to 
> users via a REST API. 
> To go a bit in depth with our use case: we have files created by the fluentd 
> log driver that are named with the Docker Name or Docker ID (full or short), 
> and we need a mapping for the users of the REST API; thus the first step is 
> to make this information available from mesos.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3688) Get Container Name information when launching a container task

2015-10-15 Thread Raffaele Di Fazio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959023#comment-14959023
 ] 

Raffaele Di Fazio commented on MESOS-3688:
--

We started hacking on this issue; this is what we did:

1) Modified the protobuf:

{code}
diff --git a/include/mesos/mesos.proto b/include/mesos/mesos.proto
index f2ea4fc..8a14825 100644
--- a/include/mesos/mesos.proto
+++ b/include/mesos/mesos.proto
@@ -1476,6 +1476,7 @@ message ContainerInfo {
 message ContainerStatus {
   // This field can be reliably used to identify the container IP address.
   repeated NetworkInfo network_infos = 1;
+  optional ContainerID id = 2;
 }
{code}

2) Modified the code of slave.cpp:

{code}
diff --git a/src/slave/slave.cpp b/src/slave/slave.cpp
index 6526976..ea4b704 100644
--- a/src/slave/slave.cpp
+++ b/src/slave/slave.cpp
@@ -2859,6 +2859,12 @@ void Slave::statusUpdate(StatusUpdate update, const 
UPID& pid)
 networkInfo->set_ip_address(stringify(self().address.ip));
   }

+  ContainerID* containerID = new ContainerID();
+  containerID->set_value("PLACEHOLDER");
+  containerStatus->set_allocated_id(containerID);
+
+
   TaskStatus status = update.status();

   Executor* executor = framework->getExecutor(status.task_id());
{code}

We currently don't know where to retrieve the data to set in containerID. We 
are looking into the mesos source code, but we would appreciate any suggestions 
and/or explanation of some mesos internals.
We know this is just a hack and probably not a good enough solution, but the 
real fix should be something similar.
Can anyone help us make it better?
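
One possible direction for the placeholder, sketched under the assumption that 
the {{Executor}} struct in slave.cpp keeps the container id (it appears to, as 
{{executor->containerId}}); hypothetical and untested:
{code}
// Instead of a placeholder, look the executor up first and reuse the
// ContainerID the slave already tracks for it (sketch, not a patch):
Executor* executor = framework->getExecutor(status.task_id());
if (executor != NULL) {
  containerStatus->mutable_id()->CopyFrom(executor->containerId);
}
{code}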

> Get Container Name information when launching a container task
> --
>
> Key: MESOS-3688
> URL: https://issues.apache.org/jira/browse/MESOS-3688
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Affects Versions: 0.24.1
>Reporter: Raffaele Di Fazio
>  Labels: mesosphere
>
> We want to get the Docker Name (or Docker ID, or both) when launching a 
> container task with mesos. The container name is generated by mesos itself 
> (i.e. mesos-77e5fde6-83e7-4618-a2dd-d5b10f2b4d25, obtained with "docker ps") 
> and it would be nice to expose this information to frameworks so that this 
> information can be used, for example by Marathon to give this information to 
> users via a REST API. 
> To go a bit in depth with our use case: we have files created by the fluentd 
> log driver that are named with the Docker Name or Docker ID (full or short), 
> and we need a mapping for the users of the REST API; thus the first step is 
> to make this information available from mesos.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1971) Switch cgroups_limit_swap default to true

2015-10-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959006#comment-14959006
 ] 

Anton Lindström commented on MESOS-1971:


I'm sorry for not following up; this is a bit late, but should I write a patch 
to deprecate this flag and then change it in a future release?

The approach I had in mind was to add a note about the deprecation in the flag 
description and then also print it in the slave's log when "cgroups/mem" 
or "cgroups/cpu,cgroups/mem" (default) isolation is set. The question is 
whether we should always print it or do a better check of the isolation 
setting for more combinations of controllers.

[~adam-mesos] Would you be able to shepherd this?

Thanks!
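
The warning part of that approach could be as small as the following sketch 
(flag plumbing omitted; {{strings::contains}} from stout assumed):
{code}
#include <string>

#include <glog/logging.h>

#include <stout/strings.hpp>

// Sketch: warn at agent startup when the memory isolator is in use but
// swap is still unlimited, ahead of flipping the default to true.
static void warnIfSwapUnlimited(
    const std::string& isolation,
    bool cgroupsLimitSwap)
{
  if (strings::contains(isolation, "cgroups/mem") && !cgroupsLimitSwap) {
    LOG(WARNING) << "The current default of --cgroups_limit_swap=false "
                 << "is deprecated; it will switch to true in a future "
                 << "release";
  }
}
{code}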

> Switch cgroups_limit_swap default to true
> -
>
> Key: MESOS-1971
> URL: https://issues.apache.org/jira/browse/MESOS-1971
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anton Lindström
>Priority: Minor
>
> Switch cgroups_limit_swap to true by default; see MESOS-1662 for more 
> information.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3745) Slaves makes port unavailable even though no processes are using/blocking it.

2015-10-15 Thread Casey Sybrandy (JIRA)
Casey Sybrandy created MESOS-3745:
-

 Summary: Slaves makes port unavailable even though no processes 
are using/blocking it.
 Key: MESOS-3745
 URL: https://issues.apache.org/jira/browse/MESOS-3745
 Project: Mesos
  Issue Type: Bug
  Components: slave
Affects Versions: 0.24.1
Reporter: Casey Sybrandy


Hello,

I'll try to describe this as best I can.  I was trying to get a service running 
on some nodes and it would fail on about half of them.  The logs on the master 
would state that port 1234 was not part of the offer.  I looked at the nodes 
and could not find any processes using that port.  I manually ran the docker 
container I was trying to start on one of the nodes and it worked fine.  I 
ended up stopping the slave, removing its data, then starting it back up to 
resolve it.  Unfortunately, I don't know why this occurred.

Ideally, this shouldn't be happening; however, it's understandable that 
unexpected events could occur that may put the slave in a weird state. Perhaps 
a utility could be created that lists the resource usage information on that 
slave from the slave's point of view and provides the ability to free up 
resources without restarting the slave? Being a dev system, this was fine, but 
I'd hate to have to restart a slave on a production system if I can help it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3744) Master crashes when tearing down framework

2015-10-15 Thread Peter Kolloch (JIRA)
0 holes and 0 unlearned
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@712: Client 
> environment:zookeeper.version=zookeeper C client 3.4.5
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@716: Client 
> environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@723: Client 
> environment:os.name=Linux
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@724: Client 
> environment:os.arch=4.0.5
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@725: Client 
> environment:os.version=#2 SMP Fri Jul 10 01:01:50 UTC 2015
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@733: Client 
> environment:user.name=(null)
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@741: Client 
> environment:user.home=/root
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@753: Client 
> environment:user.dir=/
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@zookeeper_init@786: 
> Initiating client connection, host=127.0.0.1:2181 sessionTimeout=1 
> watcher=0x7f0532095480 sessionId=0 sessionPasswd=<null> 
> context=0x7f0504001130 flags=0
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.160876 18936 main.cpp:383] Starting Mesos master
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,161:18936(0x7f052bee5700):ZOO_INFO@log_env@712: Client 
> environment:zookeeper.version=zookeeper C client 3.4.5
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,161:18936(0x7f052bee5700):ZOO_INFO@log_env@716: Client 
> environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,161:18936(0x7f0528cd3700):ZOO_INFO@check_events@1703: 
> initiated connection to server [127.0.0.1:2181]
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.161655 18936 master.cpp:368] Master 
> 20151015-131338-3674472458-5050-18936 (10.0.4.219) started on 10.0.4.219:5050
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.161357 18942 log.cpp:238] Attempting to join replica to 
> ZooKeeper group
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,161:18936(0x7f052aee3700):ZOO_INFO@log_env@712: Client 
> environment:zookeeper.version=zookeeper C client 3.4.5
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,162:18936(0x7f052aee3700):ZOO_INFO@log_env@716: Client 
> environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.162201 18936 master.cpp:370] Flags at startup: 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="false" 
> --authenticators="crammd5" --cluster="peter-p70wxd2" --framework_sorter="drf" 
> --help="false" --hostname="10.0.4.219" --initialize_driver_logging="true" 
> --ip="10.0.4.219" --log_auto_initialize="true" --log_dir="/var/log/mesos" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --port="5050" --quiet="false" --quorum="1" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" 
> --registry_strict="false" --roles="slave_public" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="fal

[jira] [Updated] (MESOS-3744) Master crashes when tearing down framework

2015-10-15 Thread Peter Kolloch (JIRA)
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@725: Client 
environment:os.version=#2 SMP Fri Jul 10 01:01:50 UTC 2015
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@733: Client 
environment:user.name=(null)
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@741: Client 
environment:user.home=/root
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@753: Client 
environment:user.dir=/
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@zookeeper_init@786: 
Initiating client connection, host=127.0.0.1:2181 sessionTimeout=1 
watcher=0x7f0532095480 sessionId=0 sessionPasswd=<null> context=0x7f0504001130 
flags=0
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.160876 18936 main.cpp:383] Starting Mesos master
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,161:18936(0x7f052bee5700):ZOO_INFO@log_env@712: Client 
environment:zookeeper.version=zookeeper C client 3.4.5
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,161:18936(0x7f052bee5700):ZOO_INFO@log_env@716: Client 
environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,161:18936(0x7f0528cd3700):ZOO_INFO@check_events@1703: 
initiated connection to server [127.0.0.1:2181]
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.161655 18936 master.cpp:368] Master 
20151015-131338-3674472458-5050-18936 (10.0.4.219) started on 10.0.4.219:5050
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.161357 18942 log.cpp:238] Attempting to join replica to 
ZooKeeper group
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,161:18936(0x7f052aee3700):ZOO_INFO@log_env@712: Client 
environment:zookeeper.version=zookeeper C client 3.4.5
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,162:18936(0x7f052aee3700):ZOO_INFO@log_env@716: Client 
environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162201 18936 master.cpp:370] Flags at startup: 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="false" --authenticate_slaves="false" --authenticators="crammd5" 
--cluster="peter-p70wxd2" --framework_sorter="drf" --help="false" 
--hostname="10.0.4.219" --initialize_driver_logging="true" --ip="10.0.4.219" 
--log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" 
--logging_level="INFO" --max_slave_ping_timeouts="5" --port="5050" 
--quiet="false" --quorum="1" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="5secs" --registry_strict="false" 
--roles="slave_public" --root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/opt/mesosphere/packages/mesos--d43a8eb9946a5c1c5ec05fb21922a2fdf41775b2/share/mesos/webui"
 --weights="slave_public=1" --work_dir="/var/lib/mesos/master" 
--zk="zk://127.0.0.1:2181/mesos" --zk_session_timeout="10secs"
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162433 18936 master.cpp:417] Master allowing unauthenticated 
frameworks to register
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162454 18936 master.cpp:422] Master allowing unauthenticated 
slaves to register
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162480 18936 master.cpp:459] Using default 'crammd5' 
authenticator

  was:
The crash happened shortly after calling teardown. The teardown was initiated 
by using httpie with:

http -f -v POST "$MASTER_BASE_URL/teardown" "frameworkId=$FRAMEWORK"

Below you will find the master-fail.log over the relevant time interval. Here 
are the last log lines before the mesos master died:

[jira] [Updated] (MESOS-3744) Master crashes when tearing down framework

2015-10-15 Thread Peter Kolloch (JIRA)
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@725: Client 
environment:os.version=#2 SMP Fri Jul 10 01:01:50 UTC 2015
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@733: Client 
environment:user.name=(null)
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@741: Client 
environment:user.home=/root
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@753: Client 
environment:user.dir=/
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@zookeeper_init@786: 
Initiating client connection, host=127.0.0.1:2181 sessionTimeout=1 
watcher=0x7f0532095480 sessionId=0 sessionPasswd=<null> context=0x7f0504001130 
flags=0
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.160876 18936 main.cpp:383] Starting Mesos master
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,161:18936(0x7f052bee5700):ZOO_INFO@log_env@712: Client 
environment:zookeeper.version=zookeeper C client 3.4.5
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,161:18936(0x7f052bee5700):ZOO_INFO@log_env@716: Client 
environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,161:18936(0x7f0528cd3700):ZOO_INFO@check_events@1703: 
initiated connection to server [127.0.0.1:2181]
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.161655 18936 master.cpp:368] Master 
20151015-131338-3674472458-5050-18936 (10.0.4.219) started on 10.0.4.219:5050
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.161357 18942 log.cpp:238] Attempting to join replica to 
ZooKeeper group
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,161:18936(0x7f052aee3700):ZOO_INFO@log_env@712: Client 
environment:zookeeper.version=zookeeper C client 3.4.5
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,162:18936(0x7f052aee3700):ZOO_INFO@log_env@716: Client 
environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162201 18936 master.cpp:370] Flags at startup: 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="false" --authenticate_slaves="false" --authenticators="crammd5" 
--cluster="peter-p70wxd2" --framework_sorter="drf" --help="false" 
--hostname="10.0.4.219" --initialize_driver_logging="true" --ip="10.0.4.219" 
--log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" 
--logging_level="INFO" --max_slave_ping_timeouts="5" --port="5050" 
--quiet="false" --quorum="1" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="5secs" --registry_strict="false" 
--roles="slave_public" --root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/opt/mesosphere/packages/mesos--d43a8eb9946a5c1c5ec05fb21922a2fdf41775b2/share/mesos/webui"
 --weights="slave_public=1" --work_dir="/var/lib/mesos/master" 
--zk="zk://127.0.0.1:2181/mesos" --zk_session_timeout="10secs"
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162433 18936 master.cpp:417] Master allowing unauthenticated 
frameworks to register
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162454 18936 master.cpp:422] Master allowing unauthenticated 
slaves to register
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162480 18936 master.cpp:459] Using default 'crammd5' 
authenticator

  was:
Here is an excerpt from the startup of the affected mesos master, since it 
contains the software versions in use:

Oct 15 13:13:37 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:37.454946 18936 logging.cpp:172] INFO level logging started!
Oct 15 13:13:37 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:37.455173 18936 main.cpp:181] Build: 2015-09-28 19:50:01 by
Oct 15 13:

[jira] [Updated] (MESOS-3744) Master crashes when tearing down framework

2015-10-15 Thread Peter Kolloch (JIRA)
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,161:18936(0x7f052bee5700):ZOO_INFO@log_env@712: Client 
> environment:zookeeper.version=zookeeper C client 3.4.5
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,161:18936(0x7f052bee5700):ZOO_INFO@log_env@716: Client 
> environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,161:18936(0x7f0528cd3700):ZOO_INFO@check_events@1703: 
> initiated connection to server [127.0.0.1:2181]
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.161655 18936 master.cpp:368] Master 
> 20151015-131338-3674472458-5050-18936 (10.0.4.219) started on 10.0.4.219:5050
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.161357 18942 log.cpp:238] Attempting to join replica to 
> ZooKeeper group
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,161:18936(0x7f052aee3700):ZOO_INFO@log_env@712: Client 
> environment:zookeeper.version=zookeeper C client 3.4.5
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,162:18936(0x7f052aee3700):ZOO_INFO@log_env@716: Client 
> environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.162201 18936 master.cpp:370] Flags at startup: 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="false" 
> --authenticators="crammd5" --cluster="peter-p70wxd2" --framework_sorter="drf" 
> --help="false" --hostname="10.0.4.219" --initialize_driver_logging="true" 
> --ip="10.0.4.219" --log_auto_initialize="true" --log_dir="/var/log/mesos" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --port="5050" --quiet="false" --quorum="1" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" 
> --registry_strict="false" --roles="slave_public" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/opt/mesosphere/packages/mesos--d43a8eb9946a5c1c5ec05fb21922a2fdf41775b2/share/mesos/webui"
>  --weights="slave_public=1" --work_dir="/var/lib/mesos/master" 
> --zk="zk://127.0.0.1:2181/mesos" --zk_session_timeout="10secs"
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.162433 18936 master.cpp:417] Master allowing unauthenticated 
> frameworks to register
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.162454 18936 master.cpp:422] Master allowing unauthenticated 
> slaves to register
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.162480 18936 master.cpp:459] Using default 'crammd5' 
> authenticator





[jira] [Created] (MESOS-3744) Master crashes when tearing down framework

2015-10-15 Thread Peter Kolloch (JIRA)
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.161655 18936 master.cpp:368] Master 
20151015-131338-3674472458-5050-18936 (10.0.4.219) started on 10.0.4.219:5050
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.161357 18942 log.cpp:238] Attempting to join replica to 
ZooKeeper group
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,161:18936(0x7f052aee3700):ZOO_INFO@log_env@712: Client 
environment:zookeeper.version=zookeeper C client 3.4.5
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,162:18936(0x7f052aee3700):ZOO_INFO@log_env@716: Client 
environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162201 18936 master.cpp:370] Flags at startup: 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="false" --authenticate_slaves="false" --authenticators="crammd5" 
--cluster="peter-p70wxd2" --framework_sorter="drf" --help="false" 
--hostname="10.0.4.219" --initialize_driver_logging="true" --ip="10.0.4.219" 
--log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" 
--logging_level="INFO" --max_slave_ping_timeouts="5" --port="5050" 
--quiet="false" --quorum="1" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="5secs" --registry_strict="false" 
--roles="slave_public" --root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/opt/mesosphere/packages/mesos--d43a8eb9946a5c1c5ec05fb21922a2fdf41775b2/share/mesos/webui"
 --weights="slave_public=1" --work_dir="/var/lib/mesos/master" 
--zk="zk://127.0.0.1:2181/mesos" --zk_session_timeout="10secs"
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162433 18936 master.cpp:417] Master allowing unauthenticated 
frameworks to register
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162454 18936 master.cpp:422] Master allowing unauthenticated 
slaves to register
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162480 18936 master.cpp:459] Using default 'crammd5' 
authenticator





[jira] [Created] (MESOS-3743) Provide diagnostic output in agent log when fetching fails

2015-10-15 Thread Bernd Mathiske (JIRA)
Bernd Mathiske created MESOS-3743:
-

 Summary: Provide diagnostic output in agent log when fetching fails
 Key: MESOS-3743
 URL: https://issues.apache.org/jira/browse/MESOS-3743
 Project: Mesos
  Issue Type: Bug
  Components: fetcher
Reporter: Bernd Mathiske
Assignee: Bernd Mathiske
Priority: Minor
 Fix For: 0.26.0


When fetching fails, the fetcher has written log output to stderr in the task 
sandbox, but that output is not easy to get to. It may even be impossible to 
get to if one only has the agent log available and no further access to the 
sandbox, as is the case, for instance, when looking at output from a CI run.

The fetcher actor in the agent detects whether the external fetcher program 
claims to have succeeded. When the fetcher exits with an error code, we could 
grab its log from the stderr file in the sandbox and append it to the agent 
log.

This is similar to this patch: https://reviews.apache.org/r/37813/

The difference is that the output of the latter is triggered by test failures 
outside the fetcher, whereas what is proposed here triggers upon failures 
inside the fetcher; a sketch follows below.
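
As an illustration, assuming the fetcher's stderr is a plain file in the 
sandbox; the helper names and plain-iostream logging here are hypothetical 
stand-ins for the agent's own I/O and logging utilities:

{code:title=Sketch: appending fetcher stderr to the agent log}
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Read the fetcher's stderr file from the task sandbox. Returns an
// empty string if the file cannot be opened (hypothetical helper).
std::string readFetcherStderr(const std::string& sandboxDirectory)
{
  std::ifstream file(sandboxDirectory + "/stderr");
  std::ostringstream contents;
  contents << file.rdbuf();
  return contents.str();
}

// On a non-zero fetcher exit code, dump the captured log so it
// survives in the agent log even after the sandbox is gone.
void logFetcherFailure(int exitCode, const std::string& sandboxDirectory)
{
  if (exitCode != 0) {
    std::cerr << "Fetcher failed with exit code " << exitCode
              << "; fetcher stderr follows:" << std::endl
              << readFetcherStderr(sandboxDirectory) << std::endl;
  }
}
{code}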





[jira] [Commented] (MESOS-2522) Add reason field for framework errors

2015-10-15 Thread Matthias Veit (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958827#comment-14958827
 ] 

Matthias Veit commented on MESOS-2522:
--

[~adam-mesos] Currently Marathon removes the frameworkId whenever we get a 
driver error, which could be very wrong depending on the kind of error; see 
the sketch below.
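
For illustration only, the kind of enumeration and scheduler-side handling a 
reason field would enable; the reason names below are made up, not a proposed 
final list:

{code:title=Hypothetical reason enum and scheduler handling}
#include <iostream>

// Made-up reasons; the actual set would enumerate the master's
// distinct error cases.
enum class FrameworkErrorReason
{
  UNKNOWN,
  FAILOVER_TIMEOUT_EXCEEDED,  // framework marked completed by master
  AUTHORIZATION_FAILED,
};

// A scheduler could branch on the reason instead of guessing from
// the message string, e.g. only dropping its stashed FrameworkID
// when the framework was actually torn down.
void handleError(FrameworkErrorReason reason)
{
  switch (reason) {
    case FrameworkErrorReason::FAILOVER_TIMEOUT_EXCEEDED:
      std::cout << "Invalidate stored FrameworkID; re-register fresh\n";
      break;
    default:
      std::cout << "Keep FrameworkID; surface the error to the operator\n";
      break;
  }
}
{code}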

> Add reason field for framework errors
> -
>
> Key: MESOS-2522
> URL: https://issues.apache.org/jira/browse/MESOS-2522
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.22.0
>Reporter: Connor Doyle
>Priority: Minor
>  Labels: mesosphere
>
> Currently, the only insight into framework errors is a message string.  
> Framework schedulers could probably be smarter about how to handle errors if 
> the cause is known.  Since there are only a handful of distinct cases that 
> could trigger an error, they could be captured by an enumeration.
> One specific use case for this feature follows. Frameworks that intend to 
> survive failover typically persist the FrameworkID somewhere.  When a 
> framework has been marked completed by the master for exceeding its 
> configured failover timeout, re-registration triggers a framework error. 
> The scheduler probably wants to disambiguate this kind of framework error 
> from others in order to invalidate the stashed FrameworkID before the next 
> attempt at (re)registration.





[jira] [Commented] (MESOS-3573) Mesos does not kill orphaned docker containers

2015-10-15 Thread Ian Babrou (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958567#comment-14958567
 ] 

Ian Babrou commented on MESOS-3573:
---

Removal of /var/lib/mesos/meta/slaves/latest on attribute change also triggers 
hanging containers. Mesos executors are cleaned up nicely (at least they are 
killed), but docker containers are not.

A regular restart of the mesos-slave works as expected.

Any chance this could be addressed any time soon?

> Mesos does not kill orphaned docker containers
> --
>
> Key: MESOS-3573
> URL: https://issues.apache.org/jira/browse/MESOS-3573
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, slave
>Reporter: Ian Babrou
>Priority: Blocker
>  Labels: mesosphere
>
> After upgrade to 0.24.0 we noticed hanging containers appearing. Looks like 
> there were changes between 0.23.0 and 0.24.0 that broke cleanup.
> Here's how to trigger this bug:
> 1. Deploy app in docker container.
> 2. Kill corresponding mesos-docker-executor process
> 3. Observe hanging container
> Here are the logs after kill:
> {noformat}
> slave_1| I1002 12:12:59.362002  7791 docker.cpp:1576] Executor for 
> container 'f083aaa2-d5c3-43c1-b6ba-342de8829fa8' has exited
> slave_1| I1002 12:12:59.362284  7791 docker.cpp:1374] Destroying 
> container 'f083aaa2-d5c3-43c1-b6ba-342de8829fa8'
> slave_1| I1002 12:12:59.363404  7791 docker.cpp:1478] Running docker stop 
> on container 'f083aaa2-d5c3-43c1-b6ba-342de8829fa8'
> slave_1| I1002 12:12:59.363876  7791 slave.cpp:3399] Executor 
> 'sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c' of framework 
> 20150923-122130-2153451692-5050-1- terminated with signal Terminated
> slave_1| I1002 12:12:59.367570  7791 slave.cpp:2696] Handling status 
> update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task 
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1- from @0.0.0.0:0
> slave_1| I1002 12:12:59.367842  7791 slave.cpp:5094] Terminating task 
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c
> slave_1| W1002 12:12:59.368484  7791 docker.cpp:986] Ignoring updating 
> unknown container: f083aaa2-d5c3-43c1-b6ba-342de8829fa8
> slave_1| I1002 12:12:59.368671  7791 status_update_manager.cpp:322] 
> Received status update TASK_FAILED (UUID: 
> 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task 
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1-
> slave_1| I1002 12:12:59.368741  7791 status_update_manager.cpp:826] 
> Checkpointing UPDATE for status update TASK_FAILED (UUID: 
> 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task 
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1-
> slave_1| I1002 12:12:59.370636  7791 status_update_manager.cpp:376] 
> Forwarding update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) 
> for task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1- to the slave
> slave_1| I1002 12:12:59.371335  7791 slave.cpp:2975] Forwarding the 
> update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task 
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1- to master@172.16.91.128:5050
> slave_1| I1002 12:12:59.371908  7791 slave.cpp:2899] Status update 
> manager successfully handled status update TASK_FAILED (UUID: 
> 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task 
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1-
> master_1   | I1002 12:12:59.372047 11 master.cpp:4069] Status update 
> TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task 
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1- from slave 
> 20151002-120829-2153451692-5050-1-S0 at slave(1)@172.16.91.128:5051 
> (172.16.91.128)
> master_1   | I1002 12:12:59.372534 11 master.cpp:4108] Forwarding status 
> update TASK_FAILED (UUID: 4a1b2387-a469-4f01-bfcb-0d1cccbde550) for task 
> sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1-
> master_1   | I1002 12:12:59.373018 11 master.cpp:5576] Updating the latest 
> state of task sleepy.87eb6191-68fe-11e5-9444-8eb895523b9c of framework 
> 20150923-122130-2153451692-5050-1- to TASK_FAILED
> master_1   | I1002 12:12:59.373447 11 hierarchical.hpp:814] Recovered 
> cpus(*):0.1; mem(*):16; ports(*):[31685-31685] (total: cpus(*):4; 
> mem(*):1001; disk(*):52869; ports(*):[31000-32000], allocated: 
> cpus(*):8.32667e-17) on slave 20151002-120829-2153451692-5050-1-S0 from 
> framework 20150923-122130-2153451692-5050-1-
> {noformat}
> Anoth

[jira] [Created] (MESOS-3742) Site needs to get updated as it still lists MesosCon Europe as an upcoming event

2015-10-15 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-3742:
-

 Summary: Site needs to get updated as it still lists MesosCon 
Europe as an upcoming event
 Key: MESOS-3742
 URL: https://issues.apache.org/jira/browse/MESOS-3742
 Project: Mesos
  Issue Type: Bug
  Components: project website
Reporter: Till Toenshoff


The Apache website needs to be updated, as it still lists MesosCon Europe as 
an upcoming event.

Even the registration page 
(http://events.linuxfoundation.org/events/mesoscon-europe/attend/register) 
still seems to accept registrations, something we might want to get fixed 
upstream.





[jira] [Updated] (MESOS-2090) Introduce trackable JSON-based flags

2015-10-15 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2090:
--
Labels: mesosphere newbie  (was: mesosphere)

> Introduce trackable JSON-based flags
> 
>
> Key: MESOS-2090
> URL: https://issues.apache.org/jira/browse/MESOS-2090
> Project: Mesos
>  Issue Type: Task
>Reporter: Alexander Rukletsov
>  Labels: mesosphere, newbie
>
> Some flags represent configuration that may change over time, e.g. the 
> {{--whitelist}} master flag. Such flags mostly contain JSON-formatted data. 
> Provide a common mechanism to facilitate tracking changes and reloading the 
> flag value. A sketch of one possible approach follows below.
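
This sketch polls the file's modification time and re-reads the contents on 
change; the class and method names are hypothetical, and integration with 
stout's flag parsing and JSON validation is omitted:

{code:title=Hypothetical file-backed flag reload sketch}
#include <ctime>
#include <fstream>
#include <sstream>
#include <string>
#include <sys/stat.h>

// Tracks a JSON-bearing flag file and re-reads it when it changes.
class TrackedFlag
{
public:
  explicit TrackedFlag(const std::string& path) : path_(path) {}

  // Returns the current contents, re-reading the file only if its
  // mtime changed since the last read.
  const std::string& value()
  {
    struct stat s;
    if (::stat(path_.c_str(), &s) == 0 && s.st_mtime != mtime_) {
      std::ifstream file(path_);
      std::ostringstream contents;
      contents << file.rdbuf();
      cached_ = contents.str();
      mtime_ = s.st_mtime;
    }
    return cached_;
  }

private:
  std::string path_;
  std::string cached_;
  time_t mtime_ = 0;
};
{code}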





[jira] [Updated] (MESOS-3734) Incorrect sed syntax for Mac OSX

2015-10-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3734:

Assignee: Neil Conway

> Incorrect sed syntax for Mac OSX
> 
>
> Key: MESOS-3734
> URL: https://issues.apache.org/jira/browse/MESOS-3734
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Blocker
>  Labels: mesosphere
>
> The build currently fails on OSX:
> {noformat}
> ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/protoc 
> -I../../mesos/include/mesos/containerizer   \
>   -I../../mesos/include -I../../mesos/src 
> \
>   --python_out=python/interface/src/mesos/interface 
> ../../mesos/include/mesos/containerizer/containerizer.proto
> ../../mesos/install-sh -c -d python/interface/src/mesos/v1/interface
> sed -i 's/mesos\.mesos_pb2/mesos_pb2/' 
> python/interface/src/mesos/interface/containerizer_pb2.py
> sed: 1: "python/interface/src/me ...": extra characters at the end of p 
> command
> make[1]: *** [python/interface/src/mesos/interface/containerizer_pb2.py] 
> Error 1
> {noformat}
> This is because the sed command uses GNU syntax: on OSX, BSD sed expects an 
> explicit (possibly empty) backup suffix after {{-i}}, so you need 
> {code}sed -i ""{code} to instruct sed not to create a backup file.





[jira] [Commented] (MESOS-3554) Allocator changes trigger large re-compiles

2015-10-15 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958451#comment-14958451
 ] 

Joris Van Remoortere commented on MESOS-3554:
-

{code}
commit 8b87d4772c8653574b4892b8a28b8de1a3575fd3
Author: Joris Van Remoortere 
Date:   Thu Oct 15 08:56:48 2015 +0200

Hierarchical Allocator: Replaced Polymorphic factory with functions.

Review: https://reviews.apache.org/r/39312
{code}

> Allocator changes trigger large re-compiles
> ---
>
> Key: MESOS-3554
> URL: https://issues.apache.org/jira/browse/MESOS-3554
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>
> Due to the templatized nature of the allocator, even small changes trigger 
> large recompiles of the code-base. This makes iterating on changes expensive 
> for developers. The shape of the fix is sketched below.
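
Roughly, with simplified types (not the actual Mesos allocator interfaces): 
replacing a templated/polymorphic factory with a plain function type means 
callers depend only on the function signature, so allocator changes no longer 
ripple out:

{code:title=Simplified sketch: polymorphic factory vs. plain function}
#include <functional>
#include <memory>

struct Allocator { virtual ~Allocator() = default; };
struct HierarchicalDRFAllocator : Allocator {};

// Before (roughly): a templated factory, so every user of the header
// sees the template and recompiles when it changes.
template <typename T>
struct AllocatorFactory
{
  static Allocator* create() { return new T(); }
};

// After (roughly): a plain function type; callers depend only on this
// signature.
using AllocatorCreator = std::function<std::unique_ptr<Allocator>()>;

AllocatorCreator drfCreator = []() {
  return std::unique_ptr<Allocator>(new HierarchicalDRFAllocator());
};
{code}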





[jira] [Comment Edited] (MESOS-2255) SlaveRecoveryTest/0.MasterFailover is flaky

2015-10-15 Thread Yong Qiao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958448#comment-14958448
 ] 

Yong Qiao Wang edited comment on MESOS-2255 at 10/15/15 7:12 AM:
-

[~xujyan], I ran this test case SlaveRecoveryTest/0.MasterFailover again on OS 
X (10.10.4), and found that it works well:

{noformat:title=}
Yongs-MacBook-Pro:bin yqwyq$ ./mesos-tests.sh 
--gtest_filter=SlaveRecoveryTest/0.MasterFailover
..
..
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from SlaveRecoveryTest/0, where TypeParam = 
mesos::internal::slave::MesosContainerizer
[ RUN  ] SlaveRecoveryTest/0.MasterFailover
I1015 14:58:55.538914 1939460864 exec.cpp:136] Version: 0.26.0
..
..
[   OK ] SlaveRecoveryTest/0.MasterFailover (1397 ms)
[--] 1 test from SlaveRecoveryTest/0 (1397 ms total)

[--] Global test environment tear-down
[==] 1 test from 1 test case ran. (1406 ms total)
[  PASSED  ] 1 test.
{noformat}

Could you let me know which OS and version you ran this case on? I need to 
reproduce this problem. Thanks!


was (Author: jamesyongqiaowang):
[~xujyan], I ran the test case SlaveRecoveryTest/0.MasterFailover again on OS 
X(10.10.4), but I found it work well:

{noformat:title=}
Yongs-MacBook-Pro:bin yqwyq$ ./mesos-tests.sh 
--gtest_filter=SlaveRecoveryTest/0.MasterFailover
..
..
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from SlaveRecoveryTest/0, where TypeParam = 
mesos::internal::slave::MesosContainerizer
[ RUN  ] SlaveRecoveryTest/0.MasterFailover
I1015 14:58:55.538914 1939460864 exec.cpp:136] Version: 0.26.0
..
..
[   OK ] SlaveRecoveryTest/0.MasterFailover (1397 ms)
[--] 1 test from SlaveRecoveryTest/0 (1397 ms total)

[--] Global test environment tear-down
[==] 1 test from 1 test case ran. (1406 ms total)
[  PASSED  ] 1 test.
{noformat}

Could you let me know which OS/version you ran this case?

> SlaveRecoveryTest/0.MasterFailover is flaky
> ---
>
> Key: MESOS-2255
> URL: https://issues.apache.org/jira/browse/MESOS-2255
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Yan Xu
>Assignee: Yong Qiao Wang
>  Labels: flaky, twitter
>
> {noformat:title=}
> [ RUN  ] SlaveRecoveryTest/0.MasterFailover
> Using temporary directory '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0'
> I0123 07:45:49.818686 17634 leveldb.cpp:176] Opened db in 31.195549ms
> I0123 07:45:49.821962 17634 leveldb.cpp:183] Compacted db in 3.190936ms
> I0123 07:45:49.822049 17634 leveldb.cpp:198] Created db iterator in 47324ns
> I0123 07:45:49.822069 17634 leveldb.cpp:204] Seeked to beginning of db in 
> 2038ns
> I0123 07:45:49.822084 17634 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 484ns
> I0123 07:45:49.822160 17634 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0123 07:45:49.824241 17660 recover.cpp:449] Starting replica recovery
> I0123 07:45:49.825217 17660 recover.cpp:475] Replica is in EMPTY status
> I0123 07:45:49.827020 17660 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0123 07:45:49.827453 17659 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0123 07:45:49.828047 17659 recover.cpp:566] Updating replica status to 
> STARTING
> I0123 07:45:49.838543 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 10.24963ms
> I0123 07:45:49.838580 17659 replica.cpp:323] Persisted replica status to 
> STARTING
> I0123 07:45:49.848836 17659 recover.cpp:475] Replica is in STARTING status
> I0123 07:45:49.850039 17659 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0123 07:45:49.850286 17659 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0123 07:45:49.850754 17659 recover.cpp:566] Updating replica status to VOTING
> I0123 07:45:49.853698 17655 master.cpp:262] Master 
> 20150123-074549-16842879-44955-17634 (utopic) started on 127.0.1.1:44955
> I0123 07:45:49.853981 17655 master.cpp:308] Master only allowing 
> authenticated frameworks to register
> I0123 07:45:49.853997 17655 master.cpp:313] Master only allowing 
> authenticated slaves to register
> I0123 07:45:49.854038 17655 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0/credentials'
> I0123 07:45:49.854557 17655 master.cpp:357] Authorization enabled
> I0123 07:45:49.859633 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 8.742923ms
> I0123 07:45:49.859853 17659 re

[jira] [Comment Edited] (MESOS-2255) SlaveRecoveryTest/0.MasterFailover is flaky

2015-10-15 Thread Yong Qiao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958448#comment-14958448
 ] 

Yong Qiao Wang edited comment on MESOS-2255 at 10/15/15 7:11 AM:
-

[~xujyan], I ran the test case SlaveRecoveryTest/0.MasterFailover again on OS 
X (10.10.4), and found that it works well:

{noformat:title=}
Yongs-MacBook-Pro:bin yqwyq$ ./mesos-tests.sh 
--gtest_filter=SlaveRecoveryTest/0.MasterFailover
..
..
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from SlaveRecoveryTest/0, where TypeParam = 
mesos::internal::slave::MesosContainerizer
[ RUN  ] SlaveRecoveryTest/0.MasterFailover
I1015 14:58:55.538914 1939460864 exec.cpp:136] Version: 0.26.0
..
..
[   OK ] SlaveRecoveryTest/0.MasterFailover (1397 ms)
[--] 1 test from SlaveRecoveryTest/0 (1397 ms total)

[--] Global test environment tear-down
[==] 1 test from 1 test case ran. (1406 ms total)
[  PASSED  ] 1 test.
{noformat}

Could you let me know which OS and version you ran this case on?


was (Author: jamesyongqiaowang):
[~xujyan], I ran the test case SlaveRecoveryTest/0.MasterFailover again on OS 
X(10.10.4), but I found it work well:

{noformat:title=}
Yongs-MacBook-Pro:bin yqwyq$ ./mesos-tests.sh 
--gtest_filter=SlaveRecoveryTest/0.MasterFailover
..
..
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from SlaveRecoveryTest/0, where TypeParam = 
mesos::internal::slave::MesosContainerizer
[ RUN  ] SlaveRecoveryTest/0.MasterFailover
I1015 14:58:55.538914 1939460864 exec.cpp:136] Version: 0.26.0
..
..
[   OK ] SlaveRecoveryTest/0.MasterFailover (1397 ms)
[--] 1 test from SlaveRecoveryTest/0 (1397 ms total)

[--] Global test environment tear-down
[==] 1 test from 1 test case ran. (1406 ms total)
[  PASSED  ] 1 test.
{noformat}

Could you let me know which OS/version you ran this case?

> SlaveRecoveryTest/0.MasterFailover is flaky
> ---
>
> Key: MESOS-2255
> URL: https://issues.apache.org/jira/browse/MESOS-2255
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Yan Xu
>Assignee: Yong Qiao Wang
>  Labels: flaky, twitter
>
> {noformat:title=}
> [ RUN  ] SlaveRecoveryTest/0.MasterFailover
> Using temporary directory '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0'
> I0123 07:45:49.818686 17634 leveldb.cpp:176] Opened db in 31.195549ms
> I0123 07:45:49.821962 17634 leveldb.cpp:183] Compacted db in 3.190936ms
> I0123 07:45:49.822049 17634 leveldb.cpp:198] Created db iterator in 47324ns
> I0123 07:45:49.822069 17634 leveldb.cpp:204] Seeked to beginning of db in 
> 2038ns
> I0123 07:45:49.822084 17634 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 484ns
> I0123 07:45:49.822160 17634 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0123 07:45:49.824241 17660 recover.cpp:449] Starting replica recovery
> I0123 07:45:49.825217 17660 recover.cpp:475] Replica is in EMPTY status
> I0123 07:45:49.827020 17660 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0123 07:45:49.827453 17659 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0123 07:45:49.828047 17659 recover.cpp:566] Updating replica status to 
> STARTING
> I0123 07:45:49.838543 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 10.24963ms
> I0123 07:45:49.838580 17659 replica.cpp:323] Persisted replica status to 
> STARTING
> I0123 07:45:49.848836 17659 recover.cpp:475] Replica is in STARTING status
> I0123 07:45:49.850039 17659 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0123 07:45:49.850286 17659 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0123 07:45:49.850754 17659 recover.cpp:566] Updating replica status to VOTING
> I0123 07:45:49.853698 17655 master.cpp:262] Master 
> 20150123-074549-16842879-44955-17634 (utopic) started on 127.0.1.1:44955
> I0123 07:45:49.853981 17655 master.cpp:308] Master only allowing 
> authenticated frameworks to register
> I0123 07:45:49.853997 17655 master.cpp:313] Master only allowing 
> authenticated slaves to register
> I0123 07:45:49.854038 17655 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0/credentials'
> I0123 07:45:49.854557 17655 master.cpp:357] Authorization enabled
> I0123 07:45:49.859633 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 8.742923ms
> I0123 07:45:49.859853 17659 replica.cpp:323] Persisted replica statu

[jira] [Commented] (MESOS-2255) SlaveRecoveryTest/0.MasterFailover is flaky

2015-10-15 Thread Yong Qiao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958448#comment-14958448
 ] 

Yong Qiao Wang commented on MESOS-2255:
---

[~xujyan], I ran the test case SlaveRecoveryTest/0.MasterFailover again on OS 
X (10.10.4), and found that it works well:

{noformat:title=}
Yongs-MacBook-Pro:bin yqwyq$ ./mesos-tests.sh 
--gtest_filter=SlaveRecoveryTest/0.MasterFailover
..
..
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from SlaveRecoveryTest/0, where TypeParam = 
mesos::internal::slave::MesosContainerizer
[ RUN  ] SlaveRecoveryTest/0.MasterFailover
I1015 14:58:55.538914 1939460864 exec.cpp:136] Version: 0.26.0
..
..
[   OK ] SlaveRecoveryTest/0.MasterFailover (1397 ms)
[--] 1 test from SlaveRecoveryTest/0 (1397 ms total)

[--] Global test environment tear-down
[==] 1 test from 1 test case ran. (1406 ms total)
[  PASSED  ] 1 test.
{noformat}

Could you let me know which OS and version you ran this case on?

> SlaveRecoveryTest/0.MasterFailover is flaky
> ---
>
> Key: MESOS-2255
> URL: https://issues.apache.org/jira/browse/MESOS-2255
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Yan Xu
>Assignee: Yong Qiao Wang
>  Labels: flaky, twitter
>
> {noformat:title=}
> [ RUN  ] SlaveRecoveryTest/0.MasterFailover
> Using temporary directory '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0'
> I0123 07:45:49.818686 17634 leveldb.cpp:176] Opened db in 31.195549ms
> I0123 07:45:49.821962 17634 leveldb.cpp:183] Compacted db in 3.190936ms
> I0123 07:45:49.822049 17634 leveldb.cpp:198] Created db iterator in 47324ns
> I0123 07:45:49.822069 17634 leveldb.cpp:204] Seeked to beginning of db in 
> 2038ns
> I0123 07:45:49.822084 17634 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 484ns
> I0123 07:45:49.822160 17634 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0123 07:45:49.824241 17660 recover.cpp:449] Starting replica recovery
> I0123 07:45:49.825217 17660 recover.cpp:475] Replica is in EMPTY status
> I0123 07:45:49.827020 17660 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0123 07:45:49.827453 17659 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0123 07:45:49.828047 17659 recover.cpp:566] Updating replica status to 
> STARTING
> I0123 07:45:49.838543 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 10.24963ms
> I0123 07:45:49.838580 17659 replica.cpp:323] Persisted replica status to 
> STARTING
> I0123 07:45:49.848836 17659 recover.cpp:475] Replica is in STARTING status
> I0123 07:45:49.850039 17659 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0123 07:45:49.850286 17659 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0123 07:45:49.850754 17659 recover.cpp:566] Updating replica status to VOTING
> I0123 07:45:49.853698 17655 master.cpp:262] Master 
> 20150123-074549-16842879-44955-17634 (utopic) started on 127.0.1.1:44955
> I0123 07:45:49.853981 17655 master.cpp:308] Master only allowing 
> authenticated frameworks to register
> I0123 07:45:49.853997 17655 master.cpp:313] Master only allowing 
> authenticated slaves to register
> I0123 07:45:49.854038 17655 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0/credentials'
> I0123 07:45:49.854557 17655 master.cpp:357] Authorization enabled
> I0123 07:45:49.859633 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 8.742923ms
> I0123 07:45:49.859853 17659 replica.cpp:323] Persisted replica status to 
> VOTING
> I0123 07:45:49.860327 17658 recover.cpp:580] Successfully joined the Paxos 
> group
> I0123 07:45:49.860703 17654 recover.cpp:464] Recover process terminated
> I0123 07:45:49.859591 17655 master.cpp:1219] The newly elected leader is 
> master@127.0.1.1:44955 with id 20150123-074549-16842879-44955-17634
> I0123 07:45:49.864702 17655 master.cpp:1232] Elected as the leading master!
> I0123 07:45:49.864904 17655 master.cpp:1050] Recovering from registrar
> I0123 07:45:49.865406 17660 registrar.cpp:313] Recovering registrar
> I0123 07:45:49.866576 17660 log.cpp:660] Attempting to start the writer
> I0123 07:45:49.868638 17658 replica.cpp:477] Replica received implicit 
> promise request with proposal 1
> I0123 07:45:49.872521 17658 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 3.848859ms
> I0123 07:45:49.872555 17658 replica.cpp:345] Persisted promised to 1
> I0123 07:45:49.873769 17661 coordinator.cpp:230] Coordinator attempin

[jira] [Updated] (MESOS-3741) stout containers inherit from STL containers

2015-10-15 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3741:

Description: 
stout exposes a number of containers ({{hashmap}}, {{hashset}}, {{list}}, 
{{multihashmap}}, and {{set}}) which {{publicly}} inherit from their STL 
counterparts since MESOS-3217. This code being in stout means these custom 
containers live in the global namespace.

Classes publicly inherited from STL containers are not generally safe to use, 
as the STL containers lack {{virtual}} destructors: deleting such a container 
through a pointer-to-base never invokes the derived destructor (formally it is 
undefined behavior), so any resources it manages can leak. This is made worse 
by, e.g., putting the stout containers (which are often named like their STL 
counterparts) in the global namespace, which makes it easy to confuse the 
actual type being used, at least in messy user code containing {{using 
namespace std;}} (which is not allowed in mesos code, for good reasons like 
this).

It would seem better to (1) decide what minimal set of containers still needs 
to be provided now that C++11 can be used, (2) fix the inheritance for the 
stout containers (e.g. inherit {{privately}} or just compose), or at least (3) 
use a dedicated namespace for these custom containers. 

  was:
stout exposes a number of containers ({{hashmap}}, {{hashset}}, {{list}}, 
{{multihashmap}}, and {{set}}) which {{publicly}} inherit from their STL 
counterparts since MESOS-3217. This code being in stout means these custom 
containers live in the global namespace.

Classes inherited publicly from STL containers are not generally safe to use as 
the STL containers lack {{virtual}} destructors, so that deleting through a 
ptr-to-base will not invoke the base dtr and leak memory. It appears this is 
being made worse by e.g. putting the stout containers (which are often named 
like their STL counterparts) in the global namespace which makes it easy to 
confuse the actual type being used (at least in messy user code containing 
{{using namespace std;}} which is not allowed for good reasons like this in 
mesos code).

It would seem better to (1) decide what minimal set of containers still needs 
to be provided now that C++11 can be used, (2) fix the inheritance for the 
stout containers (e.g. inherit {{privately}}), or at least (3) use a dedicated 
namespace for these custom containers. 


> stout containers inherit from STL containers
> 
>
> Key: MESOS-3741
> URL: https://issues.apache.org/jira/browse/MESOS-3741
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Benjamin Bannier
>
> stout exposes a number of containers ({{hashmap}}, {{hashset}}, {{list}}, 
> {{multihashmap}}, and {{set}}) which {{publicly}} inherit from their STL 
> counterparts since MESOS-3217. This code being in stout means these custom 
> containers live in the global namespace.
> Classes publicly inherited from STL containers are not generally safe to 
> use, as the STL containers lack {{virtual}} destructors: deleting such a 
> container through a pointer-to-base never invokes the derived destructor 
> (formally it is undefined behavior), so any resources it manages can leak. 
> This is made worse by, e.g., putting the stout containers (which are often 
> named like their STL counterparts) in the global namespace, which makes it 
> easy to confuse the actual type being used, at least in messy user code 
> containing {{using namespace std;}} (which is not allowed in mesos code, 
> for good reasons like this).
> It would seem better to (1) decide what minimal set of containers still needs 
> to be provided now that C++11 can be used, (2) fix the inheritance for the 
> stout containers (e.g. inherit {{privately}} or just compose), or at least 
> (3) use a dedicated namespace for these custom containers. 
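
To make the destructor hazard concrete, here is a minimal stand-alone example 
(illustrative only, not stout's actual code):

{code:title=Why public inheritance from STL containers is unsafe}
#include <set>
#include <string>

// A stout-style container publicly inheriting from std::set.
struct hashset_like : std::set<std::string>
{
  ~hashset_like() { /* cleanup that will be skipped below */ }
};

int main()
{
  // Deleting through a pointer-to-base: std::set has no virtual
  // destructor, so ~hashset_like() is never called; this is
  // undefined behavior, not merely a leak.
  std::set<std::string>* s = new hashset_like();
  delete s;  // UB: derived destructor not invoked
  return 0;
}
{code}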





[jira] [Created] (MESOS-3741) stout containers inherit from STL containers

2015-10-15 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-3741:
---

 Summary: stout containers inherit from STL containers
 Key: MESOS-3741
 URL: https://issues.apache.org/jira/browse/MESOS-3741
 Project: Mesos
  Issue Type: Bug
  Components: stout
Reporter: Benjamin Bannier


stout exposes a number of containers ({{hashmap}}, {{hashset}}, {{list}}, 
{{multihashmap}}, and {{set}}) which {{publicly}} inherit from their STL 
counterparts since MESOS-3217. This code being in stout means these custom 
containers live in the global namespace.

Classes publicly inherited from STL containers are not generally safe to use, 
as the STL containers lack {{virtual}} destructors: deleting such a container 
through a pointer-to-base never invokes the derived destructor (formally it is 
undefined behavior), so any resources it manages can leak. This is made worse 
by, e.g., putting the stout containers (which are often named like their STL 
counterparts) in the global namespace, which makes it easy to confuse the 
actual type being used, at least in messy user code containing {{using 
namespace std;}} (which is not allowed in mesos code, for good reasons like 
this).

It would seem better to (1) decide what minimal set of containers still needs 
to be provided now that C++11 can be used, (2) fix the inheritance for the 
stout containers (e.g. inherit {{privately}}), or at least (3) use a dedicated 
namespace for these custom containers. 


