[jira] [Assigned] (MESOS-5028) Copy provisioner cannot replace directory with symlink

2017-04-07 Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song reassigned MESOS-5028:
---

Assignee: Chun-Hung Hsiao  (was: Gilbert Song)

> Copy provisioner cannot replace directory with symlink
> --
>
> Key: MESOS-5028
> URL: https://issues.apache.org/jira/browse/MESOS-5028
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Zhitao Li
>Assignee: Chun-Hung Hsiao
>
> I'm trying out the new image provisioner on our custom Docker 
> images, but one of the layers failed to copy, possibly due to a dangling 
> symlink.
> Error log with GLOG_v=1:
> {quote}
> I0324 05:42:48.926678 15067 copy.cpp:127] Copying layer path 
> '/tmp/mesos/store/docker/layers/5df0888641196b88dcc1b97d04c74839f02a73b8a194a79e134426d6a8fcb0f1/rootfs'
>  to rootfs 
> '/var/lib/mesos/provisioner/containers/5f05be6c-c970-4539-aa64-fd0eef2ec7ae/backends/copy/rootfses/507173f3-e316-48a3-a96e-5fdea9ffe9f6'
> E0324 05:42:49.028506 15062 slave.cpp:3773] Container 
> '5f05be6c-c970-4539-aa64-fd0eef2ec7ae' for executor 'test' of framework 
> 75932a89-1514-4011-bafe-beb6a208bb2d-0004 failed to start: Collect failed: 
> Collect failed: Failed to copy layer: cp: cannot overwrite directory 
> ‘/var/lib/mesos/provisioner/containers/5f05be6c-c970-4539-aa64-fd0eef2ec7ae/backends/copy/rootfses/507173f3-e316-48a3-a96e-5fdea9ffe9f6/etc/apt’
>  with non-directory
> {quote}
> The symlink at 
> _/tmp/mesos/store/docker/layers/5df0888641196b88dcc1b97d04c74839f02a73b8a194a79e134426d6a8fcb0f1/rootfs/etc/apt_
>  points to a non-existent absolute path (I cannot provide the exact path, but it 
> is the result of our mounting apt keys into the Docker container at build time).
> I believe what happened is that we executed a script at build time that 
> contains the equivalent of:
> {quote}
> rm -rf /etc/apt/* && ln -sf /build-mount-point/ /etc/apt
> {quote}
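The `cp` error arises because an earlier layer left a directory at `etc/apt` while the upper layer has a symlink there, and `cp` refuses to replace a directory with a non-directory. One way a layer-copy step could let the upper layer win is sketched here in C++17 with `std::filesystem`. This is only an illustration under assumptions: `replaceWithSymlink` is a hypothetical helper, not the copy backend's code or the eventual fix for this issue.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper (not Mesos code): make `dest` a symlink to `linkTarget`,
// replacing whatever an earlier layer left at `dest`, including a non-empty
// directory. The link target is stored verbatim and never resolved, so it may
// dangle (e.g. point at a build-time mount point that no longer exists).
void replaceWithSymlink(const fs::path& linkTarget, const fs::path& dest) {
  // symlink_status() does not follow links, so an existing symlink at `dest`,
  // even a dangling one, is reported as a symlink rather than "not found".
  const fs::file_status st = fs::symlink_status(dest);

  if (fs::is_directory(st)) {
    // The upper layer wins: drop the directory from the earlier layer.
    fs::remove_all(dest);
  } else if (fs::exists(st)) {
    // A regular file, symlink, or other non-directory entry.
    fs::remove(dest);
  }

  // Throws fs::filesystem_error on failure, e.g. if the parent is missing.
  fs::create_symlink(linkTarget, dest);
}
```

Using `fs::status()` instead of `fs::symlink_status()` would report a dangling symlink as missing, and the subsequent `create_symlink` would then fail because the name is already taken.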



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-7366) Incorrect agent gc could accidentally delete the entire persistent volume content

2017-04-07 Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7366:
--
Summary: Incorrect agent gc could accidentally delete the entire persistent 
volume content  (was: Incorrect agent gc could delete the entire persistent 
volume content)

> Incorrect agent gc could accidentally delete the entire persistent volume 
> content
> -
>
> Key: MESOS-7366
> URL: https://issues.apache.org/jira/browse/MESOS-7366
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.2, 1.1.1, 1.2.0
>Reporter: Zhitao Li
>Assignee: Jie Yu
>Priority: Blocker
>
> When 1) a persistent volume is mounted, 2) the umount gets stuck or otherwise 
> fails, and 3) executor directory gc is invoked, the agent seems to emit a log like:
> ```
>  Failed to delete directory  /runs//volume: Device or 
> resource busy
> ```
> After this, the persistent volume directory is empty.
> This could cause data loss on critical workloads, so we should fix this ASAP.
> The triggering environment is a custom executor without a rootfs image.
> Please let me know if you need more information.
> {noformat}
> I0407 15:18:22.752624 22758 paths.cpp:536] Trying to chown 
> '/var/lib/mesos/slaves/91ec544d-ac98-4958-bd7f-85d1f7822421-S3296/frameworks/5d030fd5-0fb6-4366-9dee-706261fa0749-0014/executors/node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7/runs/d5a56564-3e24-4c60-9919-746710b78377'
>  to user 'uber'
> I0407 15:18:22.763229 22758 slave.cpp:6179] Launching executor 
> 'node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7' of framework 
> 5d030fd5-0fb6-4366-9dee-706261fa0749-0014 with resources 
> cpus(cassandra-cstar-location-store, cassandra, {resource_id: 
> 29e2ac63-d605-4982-a463-fa311be94e0a}):0.1; 
> mem(cassandra-cstar-location-store, cassandra, {resource_id: 
> 2e1223f3-41a2-419f-85cc-cbc839c19c70}):768; 
> ports(cassandra-cstar-location-store, cassandra, {resource_id: 
> fdd6598f-f32b-4c90-a622-226684528139}):[31001-31001] in work directory 
> '/var/lib/mesos/slaves/91ec544d-ac98-4958-bd7f-85d1f7822421-S3296/frameworks/5d030fd5-0fb6-4366-9dee-706261fa0749-0014/executors/node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7/runs/d5a56564-3e24-4c60-9919-746710b78377'
> I0407 15:18:22.764103 22758 slave.cpp:1987] Queued task 
> 'node-29__c6fdf823-e31a-4b78-a34f-e47e749c07f4' for executor 
> 'node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7' of framework 
> 5d030fd5-0fb6-4366-9dee-706261fa0749-0014
> I0407 15:18:22.766253 22764 containerizer.cpp:943] Starting container 
> d5a56564-3e24-4c60-9919-746710b78377 for executor 
> 'node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7' of framework 
> 5d030fd5-0fb6-4366-9dee-706261fa0749-0014
> I0407 15:18:22.767514 22766 linux.cpp:730] Mounting 
> '/var/lib/mesos/volumes/roles/cassandra-cstar-location-store/d6290423-2ba4-4975-86f4-ffd84ad138ff'
>  to 
> '/var/lib/mesos/slaves/91ec544d-ac98-4958-bd7f-85d1f7822421-S3296/frameworks/5d030fd5-0fb6-4366-9dee-706261fa0749-0014/executors/node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7/runs/d5a56564-3e24-4c60-9919-746710b78377/volume'
>  for persistent volume disk(cassandra-cstar-location-store, cassandra, 
> {resource_id: 
> fefc15d6-0c6f-4eac-a3f8-c34d0335c5ec})[d6290423-2ba4-4975-86f4-ffd84ad138ff:volume]:6466445
>  of container d5a56564-3e24-4c60-9919-746710b78377
> I0407 15:18:22.894340 22768 containerizer.cpp:1494] Checkpointing container's 
> forked pid 6892 to 
> '/var/lib/mesos/meta/slaves/91ec544d-ac98-4958-bd7f-85d1f7822421-S3296/frameworks/5d030fd5-0fb6-4366-9dee-706261fa0749-0014/executors/node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7/runs/d5a56564-3e24-4c60-9919-746710b78377/pids/forked.pid'
> I0407 15:19:01.011916 22749 slave.cpp:3231] Got registration for executor 
> 'node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7' of framework 
> 5d030fd5-0fb6-4366-9dee-706261fa0749-0014 from executor(1)@10.14.6.132:36837
> I0407 15:19:01.031939 22770 slave.cpp:2191] Sending queued task 
> 'node-29__c6fdf823-e31a-4b78-a34f-e47e749c07f4' to executor 
> 'node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7' of framework 
> 5d030fd5-0fb6-4366-9dee-706261fa0749-0014 at executor(1)@10.14.6.132:36837
> I0407 15:26:14.012861 22749 linux.cpp:627] Removing mount 
> '/var/lib/mesos/slaves/91ec544d-ac98-4958-bd7f-85d1f7822421-S3296/frameworks/5d030fd5-0fb6-4366-9dee-706261fa0749-0014/executors/node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7/runs/d5a56564-3e24-4c60-9919-746710b78377/volume'
>  for persistent volume 
> disk(cassandra-cstar-location-store, cassandra, {resource_id: 
> fefc15d6-0c6f-4eac-a3f8-c34d0335c5ec})[d6290423-2ba4-4975-86f4-ffd84ad138ff:volume]:6466445
>  of container d5a56564-3e24-4c60-9919-746710b78377
> E0407 15:26:14.013828 22756 slave.cpp:3903] Failed to update resources for 
> container d5a56564-3e24-4c60-9919-746710b78377 of executor
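In the log above, gc deleted the sandbox's `volume` directory while the persistent volume was apparently still bind-mounted there, so the recursive delete removed the volume's contents before failing on the mount point itself with "Device or resource busy". One defensive check a sandbox gc could make, sketched here as an assumption and not as the Mesos fix, is to read `/proc/self/mountinfo` (format documented in proc(5)) and refuse to delete a directory if any mount point lies at or beneath it. Comparing `st_dev` of a directory with its parent may not be enough, because a bind mount of a directory from the same filesystem reports the same device number. The helper `mountsUnder` is hypothetical; it takes the file's contents as a string so it can be exercised without real mounts.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not Mesos code): given the contents of
// /proc/self/mountinfo, return every mount point equal to `dir` or beneath it.
// Assumes `dir` is absolute with no trailing slash, and does not decode the
// octal escapes (e.g. "\040" for a space) that mountinfo uses in paths.
std::vector<std::string> mountsUnder(const std::string& mountinfo,
                                     const std::string& dir) {
  std::vector<std::string> result;
  std::istringstream lines(mountinfo);
  std::string line;
  const std::string prefix = dir + "/";

  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string mountId, parentId, majorMinor, root, mountPoint;

    // Per proc(5), the fifth whitespace-separated field is the mount point.
    if (!(fields >> mountId >> parentId >> majorMinor >> root >> mountPoint)) {
      continue;  // Skip blank or malformed lines.
    }

    // Match `dir` itself or a path strictly below it, so that "/mnt" is not
    // treated as being under "/mn".
    if (mountPoint == dir || mountPoint.compare(0, prefix.size(), prefix) == 0) {
      result.push_back(mountPoint);
    }
  }

  return result;
}
```

A gc pass would then skip (and log) any sandbox for which `mountsUnder(contents, sandbox)` is non-empty instead of recursing into it. The check is still racy if a mount appears between the check and the delete.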

[jira] [Updated] (MESOS-7366) Agent sandbox gc could accidentally delete the entire persistent volume content

2017-04-07 Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7366:
--
Summary: Agent sandbox gc could accidentally delete the entire persistent 
volume content  (was: Incorrect agent gc could accidentally delete the entire 
persistent volume content)


[jira] [Updated] (MESOS-7366) Incorrect agent gc could delete the entire persistent volume content

2017-04-07 Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7366:
--
Summary: Incorrect agent gc could delete the entire persistent volume 
content  (was: Incorrect agent gc could empty up entire persistent volume 
content)


[jira] [Created] (MESOS-7373) Remove thread_local workaround on OSX

2017-04-07 Neil Conway (JIRA)
Neil Conway created MESOS-7373:
--

 Summary: Remove thread_local workaround on OSX
 Key: MESOS-7373
 URL: https://issues.apache.org/jira/browse/MESOS-7373
 Project: Mesos
  Issue Type: Bug
Reporter: Neil Conway


{{include/stout/thread_local.hpp}} in stout contains the following comment:

{noformat}
// A wrapper around the thread local storage attribute. The default
// clang on OSX does not support the c++11 standard `thread_local`
// intentionally until a higher performance implementation is
// released. See https://devforums.apple.com/message/1079348#1079348
// Until then, we use `__thread` on OSX instead.
// We required that THREAD_LOCAL is only used with POD types as this
// is a requirement of `__thread`.
{noformat}

As of Xcode 8, this workaround should no longer be necessary, because Apple's 
clang supports {{thread_local}} natively; see 
http://stackoverflow.com/a/29929949/5327044
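For context, the practical difference is that a C++11 `thread_local` variable may have non-trivial construction and destruction, so types such as `std::string` work, whereas the `__thread` extension does not allow that (hence the POD-only rule in the stout comment). A minimal sketch, assuming a toolchain with native `thread_local` support (per the issue, Xcode 8 or later on OS X); `threadName` and `nameSeenByWorker` are illustrative names, not stout code.

```cpp
#include <string>
#include <thread>

// A non-POD thread-local: each thread gets its own std::string, initialized
// to "main" separately in that thread and destroyed when the thread exits.
// Declaring this with `__thread` instead would be rejected by the compiler.
thread_local std::string threadName = "main";

// Spawns a worker that overwrites its own copy of `threadName` and reports
// what it sees; the calling thread's copy is not affected.
std::string nameSeenByWorker() {
  std::string seen;
  std::thread worker([&seen] {
    threadName = "worker";
    seen = threadName;
  });
  worker.join();
  return seen;
}
```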





[jira] [Updated] (MESOS-7366) Incorrect agent gc could empty up entire persistent volume content

2017-04-07 Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7366:
--
Fix Version/s: 1.2.1
   1.1.2
   1.0.3


[jira] [Commented] (MESOS-7366) Incorrect agent gc could empty up entire persistent volume content

2017-04-07 Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961500#comment-15961500
 ] 

Jie Yu commented on MESOS-7366:
---

One workaround (a hotfix) would be to use MNT_DETACH in the Linux filesystem 
isolator cleanup. Here is a simple experiment I did:
{code}
[jie@core-dev ~]$ sudo mount --bind /opt/vagrant /mnt
[jie@core-dev ~]$ cd /mnt
[jie@core-dev mnt]$ ls
bin  embedded
[jie@core-dev mnt]$ cat /proc/self/mountinfo | grep vagrant
168 62 253:0 /opt/vagrant /mnt rw,relatime shared:1 - xfs 
/dev/mapper/centos-root 
rw,seclabel,attr2,inode64,logbsize=128k,sunit=256,swidth=512,noquota
[jie@core-dev mnt]$ sudo umount /mnt
umount: /mnt: target is busy.
(In some cases useful info about processes that use
 the device is found by lsof(8) or fuser(1))
[jie@core-dev mnt]$ sudo umount --lazy /mnt
[jie@core-dev mnt]$ cat /proc/self/mountinfo | grep vagrant
[jie@core-dev mnt]$ cd /mnt
[jie@core-dev mnt]$ ls
[jie@core-dev mnt]$
{code}
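The `umount --lazy` step corresponds to the umount2(2) system call with the `MNT_DETACH` flag, which detaches the mount from the filesystem tree immediately and lets the kernel finish the cleanup once the mount is no longer busy. Afterwards, lookups of the path reach the underlying directory (the empty /mnt in the session), so a later recursive delete of the sandbox would no longer walk into the volume. A minimal sketch, assuming Linux with glibc's `<sys/mount.h>` and the privileges needed to unmount (CAP_SYS_ADMIN); `lazyUnmount` is a hypothetical name, not the filesystem isolator's code.

```cpp
#include <sys/mount.h>

#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper: lazily unmount `target`, like `umount --lazy`.
// Returns an empty string on success, or a description of the failure.
std::string lazyUnmount(const std::string& target) {
  if (::umount2(target.c_str(), MNT_DETACH) != 0) {
    const int error = errno;  // Save errno before any other call can change it.
    return "Failed to unmount '" + target + "': " + std::strerror(error);
  }
  return "";
}
```

In the session, the plain `umount` fails with "target is busy" because the shell's working directory is inside /mnt; the lazy variant succeeds, and a fresh `cd /mnt` then lands in the empty underlying directory.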

A correct long-term fix should handle containerizer destroy failures more 
carefully: we should not mark the executor as done if containerizer destroy 
failed. The current 
[code|https://github.com/apache/mesos/blob/1.2.x/src/slave/slave.cpp#L4734-L4746]
 still marks the executor as done even if containerizer destroy failed. This is 
technically incorrect because the isolators might not have released the 
resources associated with the executor yet.
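The control flow proposed above can be modeled in a few lines. This is a hypothetical sketch, not the linked slave.cpp code: `Executor`, `ExecutorState`, and `onDestroyCompleted` are illustrative names, and what the agent should do with an executor whose destroy failed (retry, alert an operator) is left open in the comment and not modeled here.

```cpp
#include <string>

// Hypothetical model of the proposed behavior: the executor is marked done,
// and its sandbox becomes eligible for gc, only after destroy succeeds.
enum class ExecutorState { Running, DestroyFailed, Terminated };

struct Executor {
  std::string id;
  ExecutorState state = ExecutorState::Running;
};

// Invoked when containerizer destroy completes. Returns true only if the
// executor may be marked done and its sandbox scheduled for gc.
bool onDestroyCompleted(Executor& executor, bool destroySucceeded) {
  if (!destroySucceeded) {
    // Isolators may still hold resources (e.g. the persistent volume bind
    // mount), so keep the executor instead of marking it done.
    executor.state = ExecutorState::DestroyFailed;
    return false;
  }
  executor.state = ExecutorState::Terminated;
  return true;
}
```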


[jira] [Updated] (MESOS-7366) Incorrect agent gc could empty up entire persistent volume content

2017-04-07 Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7366:
--
Target Version/s: 1.0.3, 1.1.2, 1.2.1
   Fix Version/s: (was: 1.2.1)
  (was: 1.1.2)
  (was: 1.0.3)


[jira] [Commented] (MESOS-7366) Incorrect agent gc could empty up entire persistent volume content

2017-04-07 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961473#comment-15961473
 ] 

Zhitao Li commented on MESOS-7366:
--

Some error logs: 
https://gist.github.com/zhitaoli/ada41a3b186fae8c7d3a79928716fb56

> Incorrect agent gc could empty up entire persistent volume content
> --
>
> Key: MESOS-7366
> URL: https://issues.apache.org/jira/browse/MESOS-7366
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.2, 1.1.1, 1.2.0
>Reporter: Zhitao Li
>Assignee: Jie Yu
>Priority: Blocker
>
> When (1) a persistent volume is mounted, (2) the unmount gets stuck or
> otherwise fails, and (3) executor directory GC is invoked, the agent appears
> to emit a log like:
> ```
>  Failed to delete directory  /runs//volume: Device or 
> resource busy
> ```
> After this, the persistent volume directory is empty.
> This could cause data loss on critical workloads, so we should fix this ASAP.
> The triggering environment is a custom executor without a rootfs image.
> Please let me know if you need more signal.
> {noformat}
> I0407 15:18:22.752624 22758 paths.cpp:536] Trying to chown 
> '/var/lib/mesos/slaves/91ec544d-ac98-4958-bd7f-85d1f7822421-S3296/frameworks/5d030fd5-0fb6-4366-9dee-706261fa0749-0014/executors/node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7/runs/d5a56564-3e24-4c60-9919-746710b78377'
>  to user 'uber'
> I0407 15:18:22.763229 22758 slave.cpp:6179] Launching executor 
> 'node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7' of framework 
> 5d030fd5-0fb6-4366-9dee-706261fa0749-0014 with resources 
> cpus(cassandra-cstar-location-store, cassandra, {resource_id: 
> 29e2ac63-d605-4982-a463-fa311be94e0a}):0.1; 
> mem(cassandra-cstar-location-store, cassandra, {resource_id: 
> 2e1223f3-41a2-419f-85cc-cbc839c19c70}):768; 
> ports(cassandra-cstar-location-store, cassandra, {resource_id: 
> fdd6598f-f32b-4c90-a622-226684528139}):[31001-31001] in work directory 
> '/var/lib/mesos/slaves/91ec544d-ac98-4958-bd7f-85d1f7822421-S3296/frameworks/5d030fd5-0fb6-4366-9dee-706261fa0749-0014/executors/node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7/runs/d5a56564-3e24-4c60-9919-746710b78377'
> I0407 15:18:22.764103 22758 slave.cpp:1987] Queued task 
> 'node-29__c6fdf823-e31a-4b78-a34f-e47e749c07f4' for executor 
> 'node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7' of framework 
> 5d030fd5-0fb6-4366-9dee-706261fa0749-0014
> I0407 15:18:22.766253 22764 containerizer.cpp:943] Starting container 
> d5a56564-3e24-4c60-9919-746710b78377 for executor 
> 'node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7' of framework 
> 5d030fd5-0fb6-4366-9dee-706261fa0749-0014
> I0407 15:18:22.767514 22766 linux.cpp:730] Mounting 
> '/var/lib/mesos/volumes/roles/cassandra-cstar-location-store/d6290423-2ba4-4975-86f4-ffd84ad138ff'
>  to 
> '/var/lib/mesos/slaves/91ec544d-ac98-4958-bd7f-85d1f7822421-S3296/frameworks/5d030fd5-0fb6-4366-9dee-706261fa0749-0014/executors/node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7/runs/d5a56564-3e24-4c60-9919-746710b78377/volume'
>  for persistent volume disk(cassandra-cstar-location-store, cassandra, 
> {resource_id: 
> fefc15d6-0c6f-4eac-a3f8-c34d0335c5ec})[d6290423-2ba4-4975-86f4-ffd84ad138ff:volume]:6466445
>  of container d5a56564-3e24-4c60-9919-746710b78377
> I0407 15:18:22.894340 22768 containerizer.cpp:1494] Checkpointing container's 
> forked pid 6892 to 
> '/var/lib/mesos/meta/slaves/91ec544d-ac98-4958-bd7f-85d1f7822421-S3296/frameworks/5d030fd5-0fb6-4366-9dee-706261fa0749-0014/executors/node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7/runs/d5a56564-3e24-4c60-9919-746710b78377/pids/forked.pid'
> I0407 15:19:01.011916 22749 slave.cpp:3231] Got registration for executor 
> 'node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7' of framework 
> 5d030fd5-0fb6-4366-9dee-706261fa0749-0014 from executor(1)@10.14.6.132:36837
> I0407 15:19:01.031939 22770 slave.cpp:2191] Sending queued task 
> 'node-29__c6fdf823-e31a-4b78-a34f-e47e749c07f4' to executor 
> 'node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7' of framework 
> 5d030fd5-0fb6-4366-9dee-706261fa0749-0014 at executor(1)@10.14.6.132:36837
> I0407 15:26:14.012861 22749 linux.cpp:627] Removing mount 
> '/var/lib/mesos/slaves/91ec544d-ac98-4958-bd7f-85d1f7822421-S3296/frameworks/5d030fd5-0fb6-4366-9dee-706261fa0749-0014/executors/node-29_executor__7eeb4a92-4849-4de5-a2d0-90f64705f5d7/runs/d5a56564-3e24-4c60-9919-746710b78377/volume' for persistent volume
> disk(cassandra-cstar-location-store, cassandra, {resource_id: 
> fefc15d6-0c6f-4eac-a3f8-c34d0335c5ec})[d6290423-2ba4-4975-86f4-ffd84ad138ff:volume]:6466445
>  of container d5a56564-3e24-4c60-9919-746710b78377
> E0407 15:26:14.013828 22756 slave.cpp:3903] Failed to update resources for 
> container 

[jira] [Updated] (MESOS-7366) Incorrect agent gc could empty up entire persistent volume content

2017-04-07 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7366:
--
Priority: Blocker  (was: Critical)

> Incorrect agent gc could empty up entire persistent volume content
> --
>
> Key: MESOS-7366
> URL: https://issues.apache.org/jira/browse/MESOS-7366
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.2, 1.1.1, 1.2.0
>Reporter: Zhitao Li
>Assignee: Jie Yu
>Priority: Blocker
>
> When (1) a persistent volume is mounted, (2) the unmount gets stuck or
> otherwise fails, and (3) executor directory GC is invoked, the agent appears
> to emit a log like:
> ```
>  Failed to delete directory  /runs//volume: Device or 
> resource busy
> ```
> After this, the persistent volume directory is empty.
> This could cause data loss on critical workloads, so we should fix this ASAP.
> The triggering environment is a custom executor without a rootfs image.
> Please let me know if you need more signal.
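The sequence described above suggests that GC recursed into the sandbox's `volume` directory while the persistent volume was still mounted there, so the deletion reached the volume's own contents. One defensive measure, sketched below as an illustration only (not Mesos code), is for GC to consult the mount table first and refuse to delete any directory that still has a mount at or beneath it. The helper names `mountPointsFromMountinfo` and `mountsUnder` are hypothetical, and the parser is simplified: it takes the fifth field of each `/proc/self/mountinfo` line and does not decode octal escapes.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not Mesos code): collect mount targets from
// /proc/self/mountinfo-style lines. The mount point is the fifth
// whitespace-separated field. Octal escapes such as "\040" (space) are
// NOT decoded in this sketch.
std::vector<std::string> mountPointsFromMountinfo(std::istream& in) {
  std::vector<std::string> targets;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string id, parent, device, root, target;
    if (fields >> id >> parent >> device >> root >> target) {
      targets.push_back(target);
    }
  }
  return targets;
}

// Hypothetical helper: return every mount target that is `dir` itself or is
// nested beneath it. A GC pass could refuse to delete `dir` (and retry the
// unmount later) while this list is non-empty.
std::vector<std::string> mountsUnder(const std::vector<std::string>& targets,
                                     std::string dir) {
  // Treat "/a/b/" and "/a/b" the same way.
  while (dir.size() > 1 && dir.back() == '/') {
    dir.pop_back();
  }
  std::vector<std::string> result;
  for (const std::string& target : targets) {
    const bool same = target == dir;
    const bool nested = dir == "/" ||
        (target.size() > dir.size() &&
         target.compare(0, dir.size(), dir) == 0 &&
         target[dir.size()] == '/');
    if (same || nested) {
      result.push_back(target);
    }
  }
  return result;
}
```

In this sketch, a non-empty result from `mountsUnder` means "skip this directory for now" rather than falling back to a recursive delete.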



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-7366) Incorrect agent gc could empty up entire persistent volume content

2017-04-07 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7366:
--
Affects Version/s: 1.0.2
   1.1.1
   1.2.0

> Incorrect agent gc could empty up entire persistent volume content
> --
>
> Key: MESOS-7366
> URL: https://issues.apache.org/jira/browse/MESOS-7366
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.2, 1.1.1, 1.2.0
>Reporter: Zhitao Li
>Assignee: Jie Yu
>Priority: Blocker
>
> When (1) a persistent volume is mounted, (2) the unmount gets stuck or
> otherwise fails, and (3) executor directory GC is invoked, the agent appears
> to emit a log like:
> ```
>  Failed to delete directory  /runs//volume: Device or 
> resource busy
> ```
> After this, the persistent volume directory is empty.
> This could cause data loss on critical workloads, so we should fix this ASAP.
> The triggering environment is a custom executor without a rootfs image.
> Please let me know if you need more signal.





[jira] [Created] (MESOS-7372) Improve agent re-registration robustness.

2017-04-07 Thread James Peach (JIRA)
James Peach created MESOS-7372:
--

 Summary: Improve agent re-registration robustness.
 Key: MESOS-7372
 URL: https://issues.apache.org/jira/browse/MESOS-7372
 Project: Mesos
  Issue Type: Bug
  Components: master
Reporter: James Peach
Assignee: James Peach


There's no message validation on {{Master::reregisterSlave}}, so it is possible 
for malicious senders to send invalid re-registrations that trigger {{CHECK}} 
failures.
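One way to close this gap is to validate the untrusted message up front and drop it with an error, so that invariants later guarded by `CHECK` can no longer be violated from the network. The sketch below illustrates the pattern only: `ReregisterRequest`, `TaskInfoLite`, and `validateReregistration` are hypothetical, simplified stand-ins rather than the real re-registration protobuf or Mesos validation code, and the checks shown are examples, not a complete list.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified view of a re-registration message.
struct TaskInfoLite {
  std::string taskId;
  std::string frameworkId;
};

struct ReregisterRequest {
  std::string agentId;
  std::vector<std::string> frameworkIds;  // frameworks the agent reports
  std::vector<TaskInfoLite> tasks;        // tasks the agent reports
};

// Returns a description of the first problem found, or std::nullopt if the
// message passes these (illustrative) checks. The caller would log and drop
// an invalid message instead of continuing into code that CHECKs them.
std::optional<std::string> validateReregistration(const ReregisterRequest& r) {
  if (r.agentId.empty()) {
    return std::string("Missing agent ID");
  }

  const std::set<std::string> frameworks(r.frameworkIds.begin(),
                                         r.frameworkIds.end());
  std::set<std::string> seenTasks;

  for (const TaskInfoLite& task : r.tasks) {
    if (!seenTasks.insert(task.taskId).second) {
      return "Duplicate task ID '" + task.taskId + "'";
    }
    if (frameworks.count(task.frameworkId) == 0) {
      return "Task '" + task.taskId + "' references unknown framework '" +
             task.frameworkId + "'";
    }
  }

  return std::nullopt;
}
```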





[jira] [Updated] (MESOS-5460) Add HDFS support in Windows builds.

2017-04-07 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-5460:
-
Priority: Minor  (was: Major)

Defer this for now.

Hopefully we can refactor the fetcher and get rid of the dependency on Hadoop 
for certain types of URLs.

> Add HDFS support in Windows builds.
> ---
>
> Key: MESOS-5460
> URL: https://issues.apache.org/jira/browse/MESOS-5460
> Project: Mesos
>  Issue Type: Task
>  Components: agent
>Reporter: Alex Clemmer
>Assignee: Joseph Wu
>Priority: Minor
>  Labels: mesos, mesosphere, windows
>
> Right now we have a bunch of #ifdefs throughout the codebase around the HDFS 
> code, because Windows doesn't support it. We should explore adding support 
> for (e.g.) fetching from HDFS.





[jira] [Commented] (MESOS-5882) `os::cloexec` does not exist on Windows

2017-04-07 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961334#comment-15961334
 ] 

Joseph Wu commented on MESOS-5882:
--

We use {{cloexec}} to change how subprocesses inherit file descriptors 
(inherited by default on POSIX, which we need to change in order to avoid leaking
FDs).  In contrast, FDs are *not* inherited by default on Windows, so we've 
left {{cloexec}} as a no-op on Windows.

At the moment, this is fine.  However, the introduction of the IO Switchboard 
(for exec-ing into containers) added a case where we explicitly want to inherit 
FDs.  This is done with a ChildHook that unsets the {{O_CLOEXEC}} option on a 
specific FD before creating a subprocess:
https://github.com/apache/mesos/blob/1.2.x/3rdparty/libprocess/include/process/subprocess_base.hpp#L212-L216
https://github.com/apache/mesos/blob/1.2.x/src/slave/containerizer/mesos/io/switchboard.cpp#L563-L565

This feature isn't supported in Windows, but we might want to support it at 
some point in the future.  That means we need to provide a way of selectively 
inheriting FDs on Windows.
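For reference, on POSIX the close-on-exec bit is the per-descriptor flag `FD_CLOEXEC`, read and written with `fcntl(F_GETFD)` and `fcntl(F_SETFD)` (`O_CLOEXEC` is the corresponding flag passed when opening a file). A minimal sketch of setting, clearing, and querying it follows; the function names are illustrative and are not stout's helpers.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Illustrative helpers (not stout's API). Each returns false if fcntl fails.

// Mark `fd` close-on-exec so that an exec'd child does not inherit it.
bool setCloexec(int fd) {
  const int flags = ::fcntl(fd, F_GETFD);
  if (flags == -1) {
    return false;
  }
  return ::fcntl(fd, F_SETFD, flags | FD_CLOEXEC) != -1;
}

// Clear close-on-exec so that `fd` survives exec. This is roughly what a
// child hook that deliberately shares a descriptor has to do.
bool unsetCloexec(int fd) {
  const int flags = ::fcntl(fd, F_GETFD);
  if (flags == -1) {
    return false;
  }
  return ::fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) != -1;
}

// Query the current state; an invalid descriptor also yields false.
bool isCloexec(int fd) {
  const int flags = ::fcntl(fd, F_GETFD);
  return flags != -1 && (flags & FD_CLOEXEC) != 0;
}
```

Descriptor flags belong to each process's own descriptor table, so clearing the flag in a hook that runs in the child after `fork()` and before `exec()` (assuming that is where the hook runs) affects only the child; the parent's copy stays close-on-exec.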

> `os::cloexec` does not exist on Windows
> ---
>
> Key: MESOS-5882
> URL: https://issues.apache.org/jira/browse/MESOS-5882
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Joseph Wu
>  Labels: mesosphere, stout
>
> `os::cloexec` does not work on Windows. It will never work at the OS level. 
> Because of this, there are likely many important and hard-to-detect bugs 
> hanging around the agent.
> This is extremely important to fix. Some possible solutions to investigate 
> (some of which are _extremely_ risky):
> * Abstract out file descriptors into a class, implement cloexec in that class 
> on Windows (since we can't rely on the OS to do it).
> * Refactor all the code that relies on `os::cloexec` to not rely on it.
> Of the two, the first seems less risky in the short term, because the cloexec 
> code only affects Windows. Depending on the semantics of the implementation 
> of the `FileDescriptor` class, it is possible that this is riskier to Windows 
> in the longer term, as the semantics of `cloexec` may differ subtly
> between Linux and Windows.
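The first option might look roughly like the following: the wrapper records the close-on-exec intent itself, and whatever launches a subprocess consults that record to decide which handles to pass on, instead of relying on the OS. `FileDescriptor` and `inheritableHandles` are hypothetical names, and the `long` handle is a stand-in for an `int` fd or a Windows `HANDLE` so that the sketch compiles anywhere.

```cpp
#include <vector>

// Hypothetical wrapper that tracks close-on-exec in user space rather than
// asking the OS to enforce it.
class FileDescriptor {
 public:
  explicit FileDescriptor(long handle, bool cloexec = true)
    : handle_(handle), cloexec_(cloexec) {}

  long handle() const { return handle_; }
  bool cloexec() const { return cloexec_; }

  // Emulated equivalents of setting/unsetting cloexec: bookkeeping only.
  void setCloexec() { cloexec_ = true; }
  void unsetCloexec() { cloexec_ = false; }

 private:
  long handle_;   // stand-in for an int fd or a Windows HANDLE
  bool cloexec_;  // true means "do not hand this to child processes"
};

// The subprocess launcher would pass exactly these handles to the child.
std::vector<long> inheritableHandles(const std::vector<FileDescriptor>& fds) {
  std::vector<long> result;
  for (const FileDescriptor& fd : fds) {
    if (!fd.cloexec()) {
      result.push_back(fd.handle());
    }
  }
  return result;
}
```

Defaulting to close-on-exec means a descriptor only leaks into a child after an explicit `unsetCloexec()`. On Windows, the resulting list could feed an explicit inheritance list such as `PROC_THREAD_ATTRIBUTE_HANDLE_LIST`; that wiring is not shown here.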





[jira] [Assigned] (MESOS-6735) `os::realpath` semantics differ between Windows and POSIX

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-6735:
---

Assignee: John Kordich  (was: Alex Clemmer)

> `os::realpath` semantics differ between Windows and POSIX
> -
>
> Key: MESOS-6735
> URL: https://issues.apache.org/jira/browse/MESOS-6735
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: John Kordich
>  Labels: stout
>
> `os::realpath` on Unix should error out if the path does not exist. On 
> Windows, the implementation is backed by `_fullpath`, which does not error 
> out if the path does not exist. In general, we'd like the semantics to be as 
> similar as possible across the two.
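C++17's `<filesystem>` happens to expose both behaviors side by side, which makes the difference concrete: `std::filesystem::canonical` fails when the path does not exist (like POSIX `realpath`), while `std::filesystem::weakly_canonical` resolves the existing prefix and appends the missing components (tolerant of missing paths, like `_fullpath`). The sketch only illustrates the two semantics; the function names are hypothetical and this is not stout's implementation.

```cpp
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Strict, realpath-like: resolve symlinks and fail if `path` does not exist.
std::optional<fs::path> strictRealpath(const fs::path& path) {
  std::error_code ec;
  fs::path resolved = fs::canonical(path, ec);
  if (ec) {
    return std::nullopt;
  }
  return resolved;
}

// Lenient, _fullpath-like: succeeds even if trailing components are missing.
std::optional<fs::path> lenientRealpath(const fs::path& path) {
  std::error_code ec;
  fs::path resolved = fs::weakly_canonical(path, ec);
  if (ec) {
    return std::nullopt;
  }
  return resolved;
}
```

Making Windows match POSIX would then amount to reporting an error whenever the strict form fails.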





[jira] [Assigned] (MESOS-6687) Use of symlinks requires us to run as Administrator on Windows

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-6687:
---

Assignee: Andrew Schwartzmeyer  (was: Alex Clemmer)

> Use of symlinks requires us to run as Administrator on Windows
> --
>
> Key: MESOS-6687
> URL: https://issues.apache.org/jira/browse/MESOS-6687
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: microsoft, windows-mvp
>
> Currently all the Agent binaries are required to run as Administrator on 
> Windows because only Administrator has the ability to create symlinks.
> It will soon be possible to set Windows into "developer mode" to resolve this 
> issue, but in the medium term, we need to investigate what it will take to 
> remove the dependency on symlinks, so that we are not dependent on this 
> setting.





[jira] [Created] (MESOS-7371) Investigate enabling NTFS long path support automatically

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)
Andrew Schwartzmeyer created MESOS-7371:
---

 Summary: Investigate enabling NTFS long path support automatically
 Key: MESOS-7371
 URL: https://issues.apache.org/jira/browse/MESOS-7371
 Project: Mesos
  Issue Type: Bug
  Components: stout
Reporter: Andrew Schwartzmeyer
Assignee: Andrew Schwartzmeyer


MSDN _seems_ to imply that it's possible to enable the NTFS long path support 
for each process, without the user having to do it manually. This should be 
investigated (perhaps it's done during installation).
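Separately from the opt-in this ticket asks about, the Unicode (`W`) Win32 file APIs also accept extended-length paths written with the `\\?\` prefix (`\\?\UNC\` for network shares), which lift the `MAX_PATH` limit without any system setting. A hypothetical helper that applies the prefix is sketched below. It assumes an absolute path that already uses backslashes, because the prefix turns off Win32 path normalization (no `/` conversion, no `.` or `..` processing).

```cpp
#include <string>

// Hypothetical helper: convert an absolute Windows path to extended-length
// form. Assumes `path` is absolute and already uses backslashes.
std::wstring toLongPath(const std::wstring& path) {
  const std::wstring prefix = L"\\\\?\\";  // the 4 characters \\?\ .

  if (path.rfind(prefix, 0) == 0) {
    return path;  // already extended-length
  }

  if (path.rfind(L"\\\\", 0) == 0) {
    // UNC path: \\server\share\x becomes \\?\UNC\server\share\x here.
    return prefix + L"UNC\\" + path.substr(2);
  }

  // Drive-letter path: C:\x becomes \\?\C:\x here.
  return prefix + path;
}
```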





[jira] [Created] (MESOS-7370) Fix create symlink code to use flag which enables non-admins to make symlinks

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)
Andrew Schwartzmeyer created MESOS-7370:
---

 Summary: Fix create symlink code to use flag which enables 
non-admins to make symlinks
 Key: MESOS-7370
 URL: https://issues.apache.org/jira/browse/MESOS-7370
 Project: Mesos
  Issue Type: Improvement
  Components: stout
 Environment: Windows 10
Reporter: Andrew Schwartzmeyer
Assignee: Andrew Schwartzmeyer


Specifically {{SYMBOLIC_LINK_FLAG_ALLOW_UNPRIVILEGED_CREATE}}.

bq. Specify this flag to allow creation of symbolic links when the process is 
not elevated
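A rough sketch of the change: pass the flag to `CreateSymbolicLinkW` and, because Windows builds that predate the flag may reject it as an invalid parameter, retry once without it. The `createSymlink` and `symlinkFlags` names and the retry policy are assumptions for illustration. The flag values are the Windows SDK's, defined locally only when the SDK does not provide them, so the flag-selection part also compiles on other platforms.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
using DWORD = unsigned long;  // stand-in so the flag logic compiles elsewhere
#endif

// Values from the Windows SDK; older SDKs may lack the second one.
#ifndef SYMBOLIC_LINK_FLAG_DIRECTORY
#define SYMBOLIC_LINK_FLAG_DIRECTORY 0x1
#endif
#ifndef SYMBOLIC_LINK_FLAG_ALLOW_UNPRIVILEGED_CREATE
#define SYMBOLIC_LINK_FLAG_ALLOW_UNPRIVILEGED_CREATE 0x2
#endif

// Hypothetical helper: compute the dwFlags argument for CreateSymbolicLinkW.
DWORD symlinkFlags(bool targetIsDirectory, bool allowUnprivileged) {
  DWORD flags = 0;
  if (targetIsDirectory) {
    flags |= SYMBOLIC_LINK_FLAG_DIRECTORY;
  }
  if (allowUnprivileged) {
    flags |= SYMBOLIC_LINK_FLAG_ALLOW_UNPRIVILEGED_CREATE;
  }
  return flags;
}

#ifdef _WIN32
// Hypothetical helper: try the unprivileged flag first; if this Windows build
// rejects it as an invalid parameter, retry once without it.
bool createSymlink(const std::wstring& link, const std::wstring& target,
                   bool targetIsDirectory) {
  if (::CreateSymbolicLinkW(link.c_str(), target.c_str(),
                            symlinkFlags(targetIsDirectory, true))) {
    return true;
  }
  if (::GetLastError() != ERROR_INVALID_PARAMETER) {
    return false;
  }
  return ::CreateSymbolicLinkW(link.c_str(), target.c_str(),
                               symlinkFlags(targetIsDirectory, false)) != 0;
}
#endif
```

Per the Windows documentation for this flag, it only takes effect when Developer Mode is enabled, so it complements rather than replaces the "developer mode" route mentioned in MESOS-6687.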





[jira] [Updated] (MESOS-7272) Unified containerizer does not support docker registry version < 2.3.

2017-04-07 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-7272:

Priority: Blocker  (was: Major)

> Unified containerizer does not support docker registry version < 2.3.
> -
>
> Key: MESOS-7272
> URL: https://issues.apache.org/jira/browse/MESOS-7272
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.0.2, 1.1.1, 1.2.0
>Reporter: depay
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: easyfix
>
> in file `src/uri/fetchers/docker.cpp`
> ```
> Option<string> contentType = response.headers.get("Content-Type");
> if (contentType.isSome() &&
>     !strings::startsWith(
>         contentType.get(),
>         "application/vnd.docker.distribution.manifest.v1")) {
>   return Failure(
>       "Unsupported manifest MIME type: " + contentType.get());
> }
> ```
> The Docker fetcher checks the content type strictly, but Docker registries
> older than 2.3 return manifests with content type `application/json`, which
> leads to failures like `E0321 13:27:27.572402 40370 slave.cpp:4650] Container
> 'xxx' for executor 'xxx' of framework xxx failed to start: Unsupported
> manifest MIME type: application/json; charset=utf-8`.
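One possible relaxation, sketched below as a standalone function rather than a patch to `docker.cpp`, is to keep accepting the v1 manifest media types and additionally accept `application/json`, ignoring parameters such as `; charset=utf-8`, which is what the older registries described above send. The function names and the exact accept list are assumptions.

```cpp
#include <cctype>
#include <string>

// Strip parameters and lowercase the media type, e.g.
// "Application/JSON; charset=utf-8" becomes "application/json".
// (Media types are case-insensitive.)
std::string mediaTypeEssence(const std::string& contentType) {
  std::string essence = contentType.substr(0, contentType.find(';'));
  while (!essence.empty() &&
         std::isspace(static_cast<unsigned char>(essence.back()))) {
    essence.pop_back();
  }
  for (char& c : essence) {
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  return essence;
}

// Hypothetical relaxed check: accept v1 manifest media types (including the
// signed "+prettyjws" variant) and the plain "application/json" returned by
// registries older than 2.3.
bool isAcceptableV1ManifestType(const std::string& contentType) {
  const std::string essence = mediaTypeEssence(contentType);
  const std::string v1Prefix =
      "application/vnd.docker.distribution.manifest.v1";
  return essence.rfind(v1Prefix, 0) == 0 || essence == "application/json";
}
```

As in the original code, a response with no `Content-Type` at all would be handled by the caller before this check.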





[jira] [Updated] (MESOS-7251) Support pulling images from AliCloud private registry.

2017-04-07 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-7251:

Priority: Blocker  (was: Major)

> Support pulling images from AliCloud private registry.
> --
>
> Key: MESOS-7251
> URL: https://issues.apache.org/jira/browse/MESOS-7251
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Kaiyu Shi
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: docker, fetcher, provisioner
>
> The curl-based image puller fails with 400 BAD REQUEST when I specify the image name as:
> registry.cn-hangzhou.aliyuncs.com/kaiyu/pytorch-cuda75
> However, the docker CLI pulls it successfully:
> bq. docker pull registry.cn-hangzhou.aliyuncs.com/kaiyu/pytorch-cuda75





[jira] [Updated] (MESOS-5172) Registry puller cannot fetch blobs correctly from http Redirect 3xx urls.

2017-04-07 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-5172:

Target Version/s: 1.2.1, 1.3.0

> Registry puller cannot fetch blobs correctly from http Redirect 3xx urls.
> -
>
> Key: MESOS-5172
> URL: https://issues.apache.org/jira/browse/MESOS-5172
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: containerizer, mesosphere
>
> When the registry puller pulls a private repository from a private
> registry (e.g., quay.io), errors may occur when fetching blobs, even though
> the repository manifest was fetched successfully. The error message is
> `Unexpected HTTP response '400 Bad Request' when trying to download the
> blob`. This may be caused by the blob-fetching logic or by an incorrectly
> formatted URI in the blob request.
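If the failure is indeed tied to redirects, the general shape of handling one is sketched below: on a redirect status, take the `Location` header as the next URL and decide whether the registry credentials should accompany it. Re-sending the `Authorization` header only when the redirect stays on the same host is a common policy; whether that is related to the `400 Bad Request` here is an assumption, not something this ticket establishes. `RedirectDecision`, `nextHop`, and `hostOf` are hypothetical names, and relative `Location` values are not handled.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Hypothetical result of handling one HTTP response.
struct RedirectDecision {
  std::string url;         // where to send the next request
  bool sendAuthorization;  // whether to re-attach the registry token
};

// Host part of an absolute URL such as "https://quay.io:443/v2/...".
// Deliberately minimal: no userinfo, IPv6 literal, or relative-URL handling.
std::string hostOf(const std::string& url) {
  const std::size_t scheme = url.find("://");
  if (scheme == std::string::npos) {
    return "";
  }
  const std::size_t start = scheme + 3;
  const std::size_t end = url.find_first_of("/:?#", start);
  return url.substr(
      start, end == std::string::npos ? std::string::npos : end - start);
}

// Hypothetical policy: follow redirect statuses that carry a Location header
// and keep credentials only when the redirect stays on the same host.
// Header names are assumed to be normalized to "Location" already.
std::optional<RedirectDecision> nextHop(
    int status,
    const std::map<std::string, std::string>& headers,
    const std::string& currentUrl) {
  const bool isRedirect = status == 301 || status == 302 || status == 303 ||
                          status == 307 || status == 308;
  if (!isRedirect) {
    return std::nullopt;
  }
  const auto location = headers.find("Location");
  if (location == headers.end()) {
    return std::nullopt;
  }
  const bool sameHost = hostOf(location->second) == hostOf(currentUrl);
  return RedirectDecision{location->second, sameHost};
}
```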





[jira] [Updated] (MESOS-5172) Registry puller cannot fetch blobs correctly from http Redirect 3xx urls.

2017-04-07 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-5172:

Priority: Blocker  (was: Major)

> Registry puller cannot fetch blobs correctly from http Redirect 3xx urls.
> -
>
> Key: MESOS-5172
> URL: https://issues.apache.org/jira/browse/MESOS-5172
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: containerizer, mesosphere
>
> When the registry puller pulls a private repository from a private
> registry (e.g., quay.io), errors may occur when fetching blobs, even though
> the repository manifest was fetched successfully. The error message is
> `Unexpected HTTP response '400 Bad Request' when trying to download the
> blob`. This may be caused by the blob-fetching logic or by an incorrectly
> formatted URI in the blob request.





[jira] [Commented] (MESOS-5881) Semantics of `os::symlink` differ across POSIX and Windows

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961293#comment-15961293
 ] 

Andrew Schwartzmeyer commented on MESOS-5881:
-

We need to confirm that you really cannot use any system API to create a broken 
symlink; I was fairly certain this is possible to do on Windows (I thought I 
fixed PowerShell to support it), but now I need to research it.

> Semantics of `os::symlink` differ across POSIX and Windows
> --
>
> Key: MESOS-5881
> URL: https://issues.apache.org/jira/browse/MESOS-5881
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Li Li
>  Labels: mesosphere, stout, windows
>
> This issue causes the following tests to fail on Windows:
> * RmdirTest.RemoveDirectoryWithNoTargetSymbolicLink
> * OsTest.Realpath
> On most POSIX implementations, it is possible to create a symlink with a 
> target that does not exist. On Windows, attempting to create a symlink 
> pointing to a target that does not exist will cause a runtime failure.
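The POSIX side of this difference is easy to demonstrate: `symlink(2)` does not check that the target exists, so the link is created, `lstat` sees the link itself, and `stat`, which follows it, fails with `ENOENT`. A minimal sketch, assuming the caller passes paths in a writable directory; `makeDanglingLink` and `DanglingLinkResult` are hypothetical names.

```cpp
#include <cerrno>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

struct DanglingLinkResult {
  bool created;      // symlink() succeeded
  bool linkVisible;  // lstat() sees the link and it is a symlink
  int followErrno;   // errno from stat(), which follows the link (0 if OK)
};

// Hypothetical helper: create `link` pointing at `target` (which need not
// exist), report what lstat()/stat() observe, then remove the link.
DanglingLinkResult makeDanglingLink(const std::string& target,
                                    const std::string& link) {
  DanglingLinkResult result{false, false, 0};
  result.created = ::symlink(target.c_str(), link.c_str()) == 0;
  if (!result.created) {
    return result;
  }
  struct stat info;
  result.linkVisible =
      ::lstat(link.c_str(), &info) == 0 && S_ISLNK(info.st_mode);
  result.followErrno = ::stat(link.c_str(), &info) == 0 ? 0 : errno;
  ::unlink(link.c_str());
  return result;
}
```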





[jira] [Assigned] (MESOS-5881) Semantics of `os::symlink` differ across POSIX and Windows

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-5881:
---

Assignee: Li Li  (was: Alex Clemmer)

> Semantics of `os::symlink` differ across POSIX and Windows
> --
>
> Key: MESOS-5881
> URL: https://issues.apache.org/jira/browse/MESOS-5881
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Li Li
>  Labels: mesosphere, stout, windows
>
> This issue causes the following tests to fail on Windows:
> * RmdirTest.RemoveDirectoryWithNoTargetSymbolicLink
> * OsTest.Realpath
> On most POSIX implementations, it is possible to create a symlink with a 
> target that does not exist. On Windows, attempting to create a symlink 
> pointing to a target that does not exist will cause a runtime failure.





[jira] [Updated] (MESOS-5881) Semantics of `os::symlink` differ across POSIX and Windows

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-5881:

Labels: mesosphere stout windows  (was: mesosphere stout)

> Semantics of `os::symlink` differ across POSIX and Windows
> --
>
> Key: MESOS-5881
> URL: https://issues.apache.org/jira/browse/MESOS-5881
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesosphere, stout, windows
>
> This issue causes the following tests to fail on Windows:
> * RmdirTest.RemoveDirectoryWithNoTargetSymbolicLink
> * OsTest.Realpath
> On most POSIX implementations, it is possible to create a symlink with a 
> target that does not exist. On Windows, attempting to create a symlink 
> pointing to a target that does not exist will cause a runtime failure.





[jira] [Updated] (MESOS-6678) `os::temp` might fail on Windows

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-6678:

Labels: stout windows  (was: stout)

> `os::temp` might fail on Windows
> 
>
> Key: MESOS-6678
> URL: https://issues.apache.org/jira/browse/MESOS-6678
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>Priority: Minor
>  Labels: stout, windows
>
> On Unix, `os::temp` is always `/tmp`. On Windows, we have to get this path 
> from the OS. Unfortunately, this operation might fail, and we need to handle 
> this.
> We should consider either:
> (1) Defaulting to something sane, or
> (2) Refactoring `os::temp` to return a `Try`
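Both options can be sketched with C++17's `std::filesystem::temp_directory_path`, which asks the OS for the temporary directory (typically `GetTempPath` on Windows, and `TMPDIR` or similar variables with a `/tmp` fallback on POSIX) and can report failure through a `std::error_code`. Here `std::optional` stands in for stout's `Try`, and `tempDirectory` / `tempDirectoryOr` are hypothetical names.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Option (2): surface the failure to the caller (std::optional stands in
// for Try here).
std::optional<std::string> tempDirectory() {
  std::error_code ec;
  const fs::path path = fs::temp_directory_path(ec);
  if (ec) {
    return std::nullopt;
  }
  return path.string();
}

// Option (1): fall back to a caller-supplied default when the query fails.
std::string tempDirectoryOr(const std::string& fallback) {
  return tempDirectory().value_or(fallback);
}
```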





[jira] [Assigned] (MESOS-6678) `os::temp` might fail on Windows

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-6678:
---

Assignee: Andrew Schwartzmeyer  (was: Alex Clemmer)

> `os::temp` might fail on Windows
> 
>
> Key: MESOS-6678
> URL: https://issues.apache.org/jira/browse/MESOS-6678
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>Priority: Minor
>  Labels: stout, windows
>
> On Unix, `os::temp` is always `/tmp`. On Windows, we have to get this path 
> from the OS. Unfortunately, this operation might fail, and we need to handle 
> this.
> We should consider either:
> (1) Defaulting to something sane, or
> (2) Refactoring `os::temp` to return a `Try`





[jira] [Updated] (MESOS-5460) Add HDFS support in Windows builds.

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-5460:

Issue Type: Task  (was: Bug)

> Add HDFS support in Windows builds.
> ---
>
> Key: MESOS-5460
> URL: https://issues.apache.org/jira/browse/MESOS-5460
> Project: Mesos
>  Issue Type: Task
>  Components: agent
>Reporter: Alex Clemmer
>Assignee: Joseph Wu
>  Labels: mesos, mesosphere, windows
>
> Right now we have a bunch of #ifdefs throughout the codebase around the HDFS 
> code, because Windows doesn't support it. We should explore adding support 
> for (e.g.) fetching from HDFS.





[jira] [Commented] (MESOS-5460) Add HDFS support in Windows builds.

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961284#comment-15961284
 ] 

Andrew Schwartzmeyer commented on MESOS-5460:
-

Joe, we're going to need more information on this. We need to follow up with
you, and internally, about possible HDFS support on Windows; we're just not sure.

> Add HDFS support in Windows builds.
> ---
>
> Key: MESOS-5460
> URL: https://issues.apache.org/jira/browse/MESOS-5460
> Project: Mesos
>  Issue Type: Task
>  Components: agent
>Reporter: Alex Clemmer
>Assignee: Joseph Wu
>  Labels: mesos, mesosphere, windows
>
> Right now we have a bunch of #ifdefs throughout the codebase around the HDFS 
> code, because Windows doesn't support it. We should explore adding support 
> for (e.g.) fetching from HDFS.





[jira] [Assigned] (MESOS-5460) Add HDFS support in Windows builds.

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-5460:
---

Assignee: Joseph Wu  (was: Alex Clemmer)

> Add HDFS support in Windows builds.
> ---
>
> Key: MESOS-5460
> URL: https://issues.apache.org/jira/browse/MESOS-5460
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Alex Clemmer
>Assignee: Joseph Wu
>  Labels: mesos, mesosphere, windows
>
> Right now we have a bunch of #ifdefs throughout the codebase around the HDFS 
> code, because Windows doesn't support it. We should explore adding support 
> for (e.g.) fetching from HDFS.





[jira] [Updated] (MESOS-5460) Add HDFS support in Windows builds.

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-5460:

Labels: mesos mesosphere windows  (was: mesos mesosphere)

> Add HDFS support in Windows builds.
> ---
>
> Key: MESOS-5460
> URL: https://issues.apache.org/jira/browse/MESOS-5460
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesos, mesosphere, windows
>
> Right now we have a bunch of #ifdefs throughout the codebase around the HDFS 
> code, because Windows doesn't support it. We should explore adding support 
> for (e.g.) fetching from HDFS.





[jira] [Updated] (MESOS-5461) Add Windows support for persistent volumes.

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-5461:

Issue Type: Task  (was: Improvement)

> Add Windows support for persistent volumes.
> ---
>
> Key: MESOS-5461
> URL: https://issues.apache.org/jira/browse/MESOS-5461
> Project: Mesos
>  Issue Type: Task
>  Components: isolation
>Reporter: Alex Clemmer
>Assignee: John Kordich
>  Labels: agent, mesos, mesosphere, windows
>
> Right now the persistent volumes code in the POSIX isolators is `#ifdef`'d 
> out for Windows compilations.
> This is because the Windows isolators take a dependency on the POSIX 
> isolators, but Windows doesn't support `os::chown`. Since this protects the 
> invariant that persistent volumes be owned by one task at a time, we 
> have disabled the whole feature for now and will need to revisit this later.





[jira] [Assigned] (MESOS-5461) Add Windows support for persistent volumes.

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-5461:
---

Assignee: John Kordich  (was: Alex Clemmer)

> Add Windows support for persistent volumes.
> ---
>
> Key: MESOS-5461
> URL: https://issues.apache.org/jira/browse/MESOS-5461
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: Alex Clemmer
>Assignee: John Kordich
>  Labels: agent, mesos, mesosphere, windows
>
> Right now the persistent volumes code in the POSIX isolators is `#ifdef`'d 
> out for Windows compilations.
> This is because the Windows isolators take a dependency on the POSIX 
> isolators, but Windows doesn't support `os::chown`. Since this protects the 
> invariant that persistent volumes be owned by one task at a time, we 
> have disabled the whole feature for now and will need to revisit this later.





[jira] [Updated] (MESOS-5461) Add Windows support for persistent volumes.

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-5461:

Issue Type: Improvement  (was: Bug)

> Add Windows support for persistent volumes.
> ---
>
> Key: MESOS-5461
> URL: https://issues.apache.org/jira/browse/MESOS-5461
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Alex Clemmer
>Assignee: John Kordich
>  Labels: agent, mesos, mesosphere, windows
>
> Right now the persistent volumes code in the POSIX isolators is `#ifdef`'d 
> out for Windows compilations.
> This is because the Windows isolators take a dependency on the POSIX 
> isolators, but Windows doesn't support `os::chown`. Since this protects the 
> invariant that persistent volumes be owned by one task at a time, we 
> have disabled the whole feature for now and will need to revisit this later.





[jira] [Updated] (MESOS-5461) Add Windows support for persistent volumes.

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-5461:

Labels: agent mesos mesosphere windows  (was: agent mesos mesosphere)

> Add Windows support for persistent volumes.
> ---
>
> Key: MESOS-5461
> URL: https://issues.apache.org/jira/browse/MESOS-5461
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: Alex Clemmer
>Assignee: John Kordich
>  Labels: agent, mesos, mesosphere, windows
>
> Right now the persistent volumes code in the POSIX isolators is `#ifdef`'d 
> out for Windows compilations.
> This is because the Windows isolators take a dependency on the POSIX 
> isolators, but Windows doesn't support `os::chown`. Since `os::chown` is what 
> protects the invariant that a persistent volume is owned by one task at a 
> time, we currently disable the whole feature on Windows. We will need to 
> revisit this at a later date.





[jira] [Assigned] (MESOS-6392) Remove `TEST*_TEMP_DISABLED_ON_WINDOWS`

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-6392:
---

Assignee: Li Li

> Remove `TEST*_TEMP_DISABLED_ON_WINDOWS`
> ---
>
> Key: MESOS-6392
> URL: https://issues.apache.org/jira/browse/MESOS-6392
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Li Li
>Priority: Minor
>  Labels: stout, windows
>
> As a temporary measure, we introduced the family of macros 
> `TEST*_TEMP_DISABLED_ON_WINDOWS`. This creates a `DISABLED_` test on Windows, 
> but enables it on every other platform.
> Eventually, permanently-disabled tests should be `#ifdef`'d out and these 
> macros should be removed.
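A self-contained sketch of the renaming trick these macros rely on: GoogleTest skips any test whose name starts with `DISABLED_`, so prefixing the name only on Windows disables the test there while leaving it enabled elsewhere. The `SKETCH_*` macro names are hypothetical, `_WIN32` is assumed as the platform check, and the resulting name is stringified rather than registered with GoogleTest so the example needs no test framework.

```cpp
#include <string>

// Token-paste helper; the extra level of indirection lets arguments expand first.
#define SKETCH_CONCAT_(a, b) a##b
#define SKETCH_CONCAT(a, b) SKETCH_CONCAT_(a, b)

// Hypothetical stand-in for the name handling inside
// TEST_TEMP_DISABLED_ON_WINDOWS: add the DISABLED_ prefix only on Windows.
#ifdef _WIN32
#define SKETCH_TEST_NAME(name) SKETCH_CONCAT(DISABLED_, name)
#else
#define SKETCH_TEST_NAME(name) name
#endif

// Stringify after expansion so the effective test name can be inspected.
#define SKETCH_STR_(x) #x
#define SKETCH_STR(x) SKETCH_STR_(x)

// The name a test declared as `PathExists` would end up with on this platform.
inline std::string effectiveName() {
  return SKETCH_STR(SKETCH_TEST_NAME(PathExists));
}
```

In a real test file the same idea would presumably wrap `TEST(Suite, Name)`; the follow-up proposed in the issue is to replace such macros with plain `#ifdef`'d-out tests once a test is known to be permanently unsupported.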





[jira] [Updated] (MESOS-6117) TCP health checks are not supported on Windows.

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-6117:

Labels: check health-check mesosphere windows  (was: check health-check 
mesosphere)

> TCP health checks are not supported on Windows.
> ---
>
> Key: MESOS-6117
> URL: https://issues.apache.org/jira/browse/MESOS-6117
> Project: Mesos
>  Issue Type: Task
>  Components: executor
>Reporter: Alexander Rukletsov
>Assignee: Andrew Schwartzmeyer
>  Labels: check, health-check, mesosphere, windows
>
> Currently, the TCP health check is only available on Linux. Windows support 
> should be added to maintain feature parity.
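For reference, a TCP health check boils down to "can a connection to host:port be established?". The sketch below shows that core on POSIX sockets; `tcpCheck` is a hypothetical name, not the Mesos checker, and it has no timeout, which a real check would need. A Windows port would presumably use the Winsock equivalents (`WSAStartup` before any socket call, `closesocket` instead of `close`).

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <string>

// Hypothetical sketch (not the Mesos checker): returns true if a TCP
// connection to ip:port succeeds. IPv4 only, blocking connect, no timeout.
bool tcpCheck(const std::string& ip, std::uint16_t port) {
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  if (::inet_pton(AF_INET, ip.c_str(), &addr.sin_addr) != 1) {
    return false;  // Not a valid dotted-quad IPv4 address.
  }

  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return false;
  }

  bool connected =
      ::connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;
  ::close(fd);
  return connected;
}
```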





[jira] [Updated] (MESOS-6392) Remove `TEST*_TEMP_DISABLED_ON_WINDOWS`

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-6392:

Issue Type: Task  (was: Bug)

> Remove `TEST*_TEMP_DISABLED_ON_WINDOWS`
> ---
>
> Key: MESOS-6392
> URL: https://issues.apache.org/jira/browse/MESOS-6392
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>Priority: Minor
>  Labels: stout, windows
>
> As a temporary measure, we introduced the family of macros 
> `TEST*_TEMP_DISABLED_ON_WINDOWS`. This creates a `DISABLED_` test on Windows, 
> but enables it on every other platform.
> Eventually, permanently-disabled tests should be `#ifdef`'d out and these 
> macros should be removed.





[jira] [Assigned] (MESOS-6392) Remove `TEST*_TEMP_DISABLED_ON_WINDOWS`

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-6392:
---

Assignee: (was: Alex Clemmer)

> Remove `TEST*_TEMP_DISABLED_ON_WINDOWS`
> ---
>
> Key: MESOS-6392
> URL: https://issues.apache.org/jira/browse/MESOS-6392
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Priority: Minor
>  Labels: stout, windows
>
> As a temporary measure, we introduced the family of macros 
> `TEST*_TEMP_DISABLED_ON_WINDOWS`. This creates a `DISABLED_` test on Windows, 
> but enables it on every other platform.
> Eventually, permanently-disabled tests should be `#ifdef`'d out and these 
> macros should be removed.





[jira] [Updated] (MESOS-6392) Remove `TEST*_TEMP_DISABLED_ON_WINDOWS`

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-6392:

Labels: stout windows  (was: stout)

> Remove `TEST*_TEMP_DISABLED_ON_WINDOWS`
> ---
>
> Key: MESOS-6392
> URL: https://issues.apache.org/jira/browse/MESOS-6392
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Alex Clemmer
>Priority: Minor
>  Labels: stout, windows
>
> As a temporary measure, we introduced the family of macros 
> `TEST*_TEMP_DISABLED_ON_WINDOWS`. This creates a `DISABLED_` test on Windows, 
> but enables it on every other platform.
> Eventually, permanently-disabled tests should be `#ifdef`'d out and these 
> macros should be removed.





[jira] [Commented] (MESOS-6117) TCP health checks are not supported on Windows.

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961281#comment-15961281
 ] 

Andrew Schwartzmeyer commented on MESOS-6117:
-

Slightly related: the final two tests needed for MESOS-6709 also require an 
alternative curl implementation on Windows.

> TCP health checks are not supported on Windows.
> ---
>
> Key: MESOS-6117
> URL: https://issues.apache.org/jira/browse/MESOS-6117
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Reporter: Alexander Rukletsov
>Assignee: Andrew Schwartzmeyer
>  Labels: check, health-check, mesosphere
>
> Currently, the TCP health check is only available on Linux. Windows support 
> should be added to maintain feature parity.





[jira] [Updated] (MESOS-6117) TCP health checks are not supported on Windows.

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-6117:

Issue Type: Task  (was: Bug)

> TCP health checks are not supported on Windows.
> ---
>
> Key: MESOS-6117
> URL: https://issues.apache.org/jira/browse/MESOS-6117
> Project: Mesos
>  Issue Type: Task
>  Components: executor
>Reporter: Alexander Rukletsov
>Assignee: Andrew Schwartzmeyer
>  Labels: check, health-check, mesosphere
>
> Currently, the TCP health check is only available on Linux. Windows support 
> should be added to maintain feature parity.





[jira] [Assigned] (MESOS-6117) TCP health checks are not supported on Windows.

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-6117:
---

Assignee: Andrew Schwartzmeyer

> TCP health checks are not supported on Windows.
> ---
>
> Key: MESOS-6117
> URL: https://issues.apache.org/jira/browse/MESOS-6117
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Reporter: Alexander Rukletsov
>Assignee: Andrew Schwartzmeyer
>  Labels: check, health-check, mesosphere
>
> Currently, the TCP health check is only available on Linux. Windows support 
> should be added to maintain feature parity.





[jira] [Assigned] (MESOS-5940) `setPaths` doesn’t work on Windows

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-5940:
---

Assignee: Andrew Schwartzmeyer  (was: Alex Clemmer)

> `setPaths` doesn’t work on Windows
> --
>
> Key: MESOS-5940
> URL: https://issues.apache.org/jira/browse/MESOS-5940
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>
>  `LD_LIBRARY_PATH` doesn’t exist on Windows, so `setPaths` can never succeed there.
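For context, the variable that controls where shared libraries are found differs by platform: the Linux dynamic loader reads `LD_LIBRARY_PATH`, macOS uses `DYLD_LIBRARY_PATH`, and Windows searches the directories in `PATH` for DLLs, with `;` rather than `:` as the list separator. The helpers below are a hypothetical sketch of how a `setPaths`-style function could pick the right variable and separator; they are not the actual Mesos test code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helpers, not the Mesos `setPaths` implementation.

// Name of the environment variable the platform's loader consults when
// searching for shared libraries.
inline std::string libraryPathVariable() {
#if defined(_WIN32)
  return "PATH";  // DLLs are located via the PATH search.
#elif defined(__APPLE__)
  return "DYLD_LIBRARY_PATH";
#else
  return "LD_LIBRARY_PATH";
#endif
}

// Joins directories with the platform's path-list separator.
inline std::string joinPathList(const std::vector<std::string>& dirs) {
#ifdef _WIN32
  const char separator = ';';
#else
  const char separator = ':';
#endif
  std::string result;
  for (std::size_t i = 0; i < dirs.size(); ++i) {
    if (i > 0) {
      result += separator;
    }
    result += dirs[i];
  }
  return result;
}
```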





[jira] [Updated] (MESOS-6709) Port `health_check_tests.cpp`

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-6709:

Issue Type: Task  (was: Bug)

> Port `health_check_tests.cpp`
> -
>
> Key: MESOS-6709
> URL: https://issues.apache.org/jira/browse/MESOS-6709
> Project: Mesos
>  Issue Type: Task
>  Components: agent
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: microsoft, windows, windows-mvp
>






[jira] [Updated] (MESOS-6714) Port `slave_tests.cpp`

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-6714:

Issue Type: Task  (was: Bug)

> Port `slave_tests.cpp`
> --
>
> Key: MESOS-6714
> URL: https://issues.apache.org/jira/browse/MESOS-6714
> Project: Mesos
>  Issue Type: Task
>  Components: agent
>Reporter: Alex Clemmer
>  Labels: microsoft, windows-mvp
>






[jira] [Assigned] (MESOS-6714) Port `slave_tests.cpp`

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-6714:
---

Assignee: (was: Alex Clemmer)

> Port `slave_tests.cpp`
> --
>
> Key: MESOS-6714
> URL: https://issues.apache.org/jira/browse/MESOS-6714
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Alex Clemmer
>  Labels: microsoft, windows-mvp
>






[jira] [Assigned] (MESOS-6709) Port `health_check_tests.cpp`

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-6709:
---

Assignee: Andrew Schwartzmeyer

> Port `health_check_tests.cpp`
> -
>
> Key: MESOS-6709
> URL: https://issues.apache.org/jira/browse/MESOS-6709
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: microsoft, windows, windows-mvp
>






[jira] [Assigned] (MESOS-6709) Port `health_check_tests.cpp`

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-6709:
---

Assignee: (was: Alex Clemmer)

> Port `health_check_tests.cpp`
> -
>
> Key: MESOS-6709
> URL: https://issues.apache.org/jira/browse/MESOS-6709
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Alex Clemmer
>  Labels: microsoft, windows, windows-mvp
>






[jira] [Updated] (MESOS-6709) Port `health_check_tests.cpp`

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-6709:

Labels: microsoft windows windows-mvp  (was: microsoft windows-mvp)

> Port `health_check_tests.cpp`
> -
>
> Key: MESOS-6709
> URL: https://issues.apache.org/jira/browse/MESOS-6709
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: microsoft, windows, windows-mvp
>






[jira] [Updated] (MESOS-7335) Blocked: Replace `wstring` with `u16string`

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-7335:

Summary: Blocked: Replace `wstring` with `u16string`  (was: Replace 
`wstring` with `u16string`)

> Blocked: Replace `wstring` with `u16string`
> ---
>
> Key: MESOS-7335
> URL: https://issues.apache.org/jira/browse/MESOS-7335
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api
> Environment: Windows 10
>Reporter: Andrew Schwartzmeyer
>Priority: Minor
>  Labels: microsoft, unicode, windows
>
> The most correct C++ type for UTF-16 data is {{std::u16string}}, but due to a 
> known bug in 
> [MSVC|https://connect.microsoft.com/VisualStudio/Feedback/Details/1403302], 
> the {{std::codecvt_utf8_utf16}} converter cannot be used with {{char16_t / 
> u16string}} types, so we must use {{wchar_t / wstring}} types for now.
> {quote}
> I am deeply sorry for the statement I made about the availability of this 
> fix. This is not fixed in Visual Studio 2017.
> This issue requires a binary breaking change to the VC++ libraries, so we 
> cannot ship this fix in an update to the libraries. Visual Studio 2017 
> shipped with v141 of the libraries, which is a minor update and binary 
> compatible with v140 (the version shipped with Visual Studio 2015).
> This bug will be fixed in the next major version of the Visual C++ Libraries.
> Thanks,
> Steve Wishnousky
> Software Engineer II - Visual C++ Libraries
> stw...@microsoft.com
> {quote}
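The workaround the description settles on can be sketched as below: convert UTF-8 to UTF-16 code units held in a `std::wstring` via `std::codecvt_utf8_utf16<wchar_t>`. On MSVC `wchar_t` is 16 bits, so this is genuine UTF-16; on Linux `wchar_t` is 32 bits, but this facet still produces UTF-16 code units, including surrogate pairs. The preferred form would use `char16_t` and `std::u16string` in place of `wchar_t` and `std::wstring`, which is what the MSVC bug blocks. The function names are hypothetical, and `std::wstring_convert` and `<codecvt>` are deprecated since C++17, so treat this as a stopgap.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helpers. The facet treats each wchar_t element as one UTF-16
// code unit, so characters outside the BMP become surrogate pairs.
// Both functions throw std::range_error on invalid input.
std::wstring utf8ToUtf16(const std::string& utf8) {
  std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> converter;
  return converter.from_bytes(utf8);
}

std::string utf16ToUtf8(const std::wstring& utf16) {
  std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> converter;
  return converter.to_bytes(utf16);
}
```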





[jira] [Assigned] (MESOS-7335) Blocked: Replace `wstring` with `u16string`

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-7335:
---

Assignee: Andrew Schwartzmeyer

> Blocked: Replace `wstring` with `u16string`
> ---
>
> Key: MESOS-7335
> URL: https://issues.apache.org/jira/browse/MESOS-7335
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api
> Environment: Windows 10
>Reporter: Andrew Schwartzmeyer
>Assignee: Andrew Schwartzmeyer
>Priority: Minor
>  Labels: microsoft, unicode, windows
>
> The most correct C++ type for UTF-16 data is {{std::u16string}}, but due to a 
> known bug in 
> [MSVC|https://connect.microsoft.com/VisualStudio/Feedback/Details/1403302], 
> the {{std::codecvt_utf8_utf16}} converter cannot be used with {{char16_t / 
> u16string}} types, so we must use {{wchar_t / wstring}} types for now.
> {quote}
> I am deeply sorry for the statement I made about the availability of this 
> fix. This is not fixed in Visual Studio 2017.
> This issue requires a binary breaking change to the VC++ libraries, so we 
> cannot ship this fix in an update to the libraries. Visual Studio 2017 
> shipped with v141 of the libraries, which is a minor update and binary 
> compatible with v140 (the version shipped with Visual Studio 2015).
> This bug will be fixed in the next major version of the Visual C++ Libraries.
> Thanks,
> Steve Wishnousky
> Software Engineer II - Visual C++ Libraries
> stw...@microsoft.com
> {quote}





[jira] [Updated] (MESOS-6807) Design mapping between `TaskInfo` and Job Objects

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-6807:

Issue Type: Improvement  (was: Bug)

> Design mapping between `TaskInfo` and Job Objects
> -
>
> Key: MESOS-6807
> URL: https://issues.apache.org/jira/browse/MESOS-6807
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
> Environment: Windows Server 2016
>Reporter: Andrew Schwartzmeyer
>Assignee: Andrew Schwartzmeyer
>  Labels: microsoft, windows
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> This issue tracks the work of correctly mapping Mesos's `TaskInfo` 
> APIs (that is, the resource usage limits of particular tasks scheduled on an agent) 
> to [Windows' Job 
> Objects|https://msdn.microsoft.com/en-us/library/windows/desktop/ms684161(v=vs.85).aspx],
>  which are akin to Linux's `cgroup`s.
> The initial time estimate covers the investigation, not the implementation.
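To make the mapping concrete, here is a hypothetical sketch of how two common `TaskInfo` resources might translate into Job Object limits: memory in MB maps onto a per-job memory cap in bytes (the `JobMemoryLimit` field of `JOBOBJECT_EXTENDED_LIMIT_INFORMATION`), and a fractional `cpus` value onto a hard CPU rate cap, which `JOBOBJECT_CPU_RATE_CONTROL_INFORMATION::CpuRate` expresses as a percentage of all processors times 100. The struct and function names are assumptions, and the code only computes the numbers; it does not call any Windows API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical container for the values a Windows isolator might write into
// JOBOBJECT_EXTENDED_LIMIT_INFORMATION::JobMemoryLimit and
// JOBOBJECT_CPU_RATE_CONTROL_INFORMATION::CpuRate.
struct JobLimits {
  std::uint64_t jobMemoryLimitBytes;
  std::uint32_t cpuRate;  // Cycles per 10,000, across all processors.
};

// Maps a task's `cpus` and `mem` (MB) resources onto Job Object limits.
// `totalCpus` is the number of logical processors on the agent (> 0).
JobLimits mapTaskResources(double cpus, double memMb, unsigned totalCpus) {
  JobLimits limits{};
  limits.jobMemoryLimitBytes =
      static_cast<std::uint64_t>(memMb * 1024.0 * 1024.0);

  // Share of the whole machine, scaled to 1..10000 (a rate of 0 is not allowed).
  double rate = std::round(cpus / totalCpus * 10000.0);
  rate = std::min(std::max(rate, 1.0), 10000.0);
  limits.cpuRate = static_cast<std::uint32_t>(rate);
  return limits;
}
```

One design question this surfaces: `cpus` in Mesos is a per-task share, while `CpuRate` is relative to the whole machine, so the agent's processor count has to be part of the mapping.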





[jira] [Updated] (MESOS-7157) LINK : fatal error LNK1181: cannot open input file 'glog.lib' when build mesos on Release configuration on Windows

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-7157:

Priority: Minor  (was: Major)

> LINK : fatal error LNK1181: cannot open input file 'glog.lib' when build 
> mesos on Release configuration on Windows
> --
>
> Key: MESOS-7157
> URL: https://issues.apache.org/jira/browse/MESOS-7157
> Project: Mesos
>  Issue Type: Bug
>  Components: build
> Environment: Windows10 + 64-bit + Visual studio 2015 
>Reporter: PhoebeHui
>Assignee: Andrew Schwartzmeyer
>Priority: Minor
>
> I tried to build Mesos with VS2015 in the Release configuration on Windows, but 
> it failed with link error LNK1181. It seems that the glog.lib path is hard-coded:
> if I set Configuration=Debug, it looks for glog.lib in
> D:/mesos/build_x64/3rdparty/glog-0.3.4/src/glog-0.3.4-build/Debug or
> D:/mesos/build_x64/3rdparty/glog-0.3.4/src/glog-0.3.4-build/Debug/Debug
> and it works.
> If I set Configuration=Release, it looks for glog.lib in 
> D:/mesos/build_x64/3rdparty/glog-0.3.4/src/glog-0.3.4-build/Debug or
> D:/mesos/build_x64/3rdparty/glog-0.3.4/src/glog-0.3.4-build/Debug/Release
> and it fails, since 
> D:/mesos/build_x64/3rdparty/glog-0.3.4/src/glog-0.3.4-build/Debug doesn't 
> exist; glog.lib is located in 
> D:/mesos/build_x64/3rdparty/glog-0.3.4/src/glog-0.3.4-build/Release
> Repro steps:
> Get source: 
> git clone -c core.autocrlf=true https://github.com/apache/mesos D:\mesos\src
> Build the project:
> cd d:\mesos\src
> .\bootstrap.bat 
> cd ..
> mkdir build_x64 && pushd build_x64
> cmake ..\src -G "Visual Studio 14 2015 Win64" -DENABLE_LIBEVENT=1 -DHAS_AUTHENTICATION=0 -DPATCHEXE_PATH="C:\Program Files (x86)\GnuWin32\bin"
> msbuild Mesos.sln /p:Configuration=Release /p:PreferredToolArchitecture=x64 /m
> The build fails with a link error:
> "F:\mesos\build_x64\Mesos.sln" (Rebuild target) (1) ->
>"D:\mesos\build_x64\3rdparty\stout\tests\stout-tests.vcxproj.metaproj" (Rebuild target) (37) ->
>"D:\mesos\build_x64\3rdparty\stout\tests\stout-tests.vcxproj" (Rebuild target) (52) ->
>(Link target) -> 
>  LINK : fatal error LNK1181: cannot open input file 'glog.lib' [D:\mesos\build_x64\3rdparty\stout\tests\stout-tests.vcxproj]





[jira] [Assigned] (MESOS-3176) Replicate *nix permission logic in Windows using the NTFS ACL API.

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-3176:
---

Assignee: Li Li  (was: Alex Clemmer)

> Replicate *nix permission logic in Windows using the NTFS ACL API.
> --
>
> Key: MESOS-3176
> URL: https://issues.apache.org/jira/browse/MESOS-3176
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, stout
>Reporter: Alex Clemmer
>Assignee: Li Li
>  Labels: containerizer, stout
>
> From a forthcoming comment in stout/windows/permissions.hpp:
> {code}
>   // TODO(hausdorff): (Tracked as MESOS-3176) On Windows, we currently don't
>   // support User, Group, or "Other" permissions -- everyone is in one big
>   // group; we also currently only support setting write permissions (i.e.,
>   // everyone can write, or no one can) -- so, on Windows agents, any user can
>   // read and execute a file.
>   //
>   // WHY: Currently we're using the DOS permissions model because it's easier.
>   // In the long term we want Windows agents to replicate the *nix model of file
>   // permissions by transitioning from the DOS model to the NTFS ACL API. This,
>   // however, is a significant work item in itself, and will not be done for
>   // the Windows MVP.
>   //
>   // The longer story is, the permissions model we currently use is the
>   // (extremely primitive) DOS model. The CliffsNotes version of the DOS
>   // permission model follows:
>   //
>   //   * There is one type of privilege: write privilege.
>   //   * All files can be read.
>   //   * Therefore, there is no native notion of "User", "Group", or "Other"
>   //     permissions.
>   //   * There is no concept whatsoever of execute permissions; if a file can
>   //     be read (and it definitely can), and if it's a binary, you have
>   //     execute permissions.
>   //   * All in all: the DOS model is arguably ok for situations where there is
>   //     a single user in a location that can be considered "secure."
>   //
>   // The practical impact of this is that most of the permissions-oriented APIs
>   // in Stout will _pretend_ to set appropriate permissions on Windows, but
>   // mostly set them to "global writable" instead.
>   //
>   // This is clearly not the ideal permissions scenario for Mesos. The other
>   // option is to use the NTFS Access Control List (ACL) API, and in the long
>   // term we will want to transition to that. The CliffsNotes version of the
>   // ACL permission model is as follows:
>   //
>   //   * An ACL is a list of security specifications (each specification is
>   //     known as an "Access Control Entry") that describes the access model of
>   //     an "object." An "object" can be a process, a file, an event, or
>   //     anything else that has a security descriptor.
>   //   * Privileges can be granted by a process with required privileges.
>   //   * The ACL model is very fine-grained: access can be granted to a user, a
>   //     group, or "other", and can be split up by read, write, and execute
>   //     permissions.
>   //
>   // BUT, and here is the kicker, the ACL model is dramatically more
>   // complicated, and not worth doing in the MVP. Our goal in the future is to
>   // find a more permanent solution; for now, we have a non-invasive Unix-based
>   // permission model, and that will work for now.
> {code}
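The information loss the comment describes can be shown with a small sketch: the DOS model has a single read-only attribute, so every POSIX mode collapses to "writable" or "read-only", and reading it back can only yield "everything allowed" or "everything except write". The mapping rule here (any write bit set means writable) and the function names are assumptions for illustration, not stout's actual logic.

```cpp
#include <cstdint>

// POSIX write bits (u+w, g+w, o+w), spelled out so the sketch compiles anywhere.
constexpr std::uint32_t kWriteBits = 0222;

// Hypothetical: whether a POSIX mode maps onto the DOS read-only attribute.
// User, group, and other cannot be told apart, so any write bit wins.
constexpr bool toDosReadOnly(std::uint32_t mode) {
  return (mode & kWriteBits) == 0;
}

// Hypothetical: the POSIX mode a caller could infer back from the attribute.
// Everyone can read and execute; write is all-or-nothing ("global writable").
constexpr std::uint32_t fromDosReadOnly(bool readOnly) {
  return readOnly ? 0555 : 0777;
}
```

For example, a file created with mode `0600` round-trips to `0777`, which is the "global writable" outcome the comment warns about.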





[jira] [Updated] (MESOS-3176) Replicate *nix permission logic in Windows using the NTFS ACL API.

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-3176:

Issue Type: Task  (was: Bug)

> Replicate *nix permission logic in Windows using the NTFS ACL API.
> --
>
> Key: MESOS-3176
> URL: https://issues.apache.org/jira/browse/MESOS-3176
> Project: Mesos
>  Issue Type: Task
>  Components: containerization, stout
>Reporter: Alex Clemmer
>Assignee: Li Li
>  Labels: containerizer, stout
>
> From a forthcoming comment in stout/windows/permissions.hpp:
> {code}
>   // TODO(hausdorff): (Tracked as MESOS-3176) On Windows, we currently don't
>   // support User, Group, or "Other" permissions -- everyone is in one big
>   // group; we also currently only support setting write permissions (i.e.,
>   // everyone can write, or no one can) -- so, on Windows agents, any user can
>   // read and execute a file.
>   //
>   // WHY: Currently we're using the DOS permissions model because it's easier.
>   // In the long term we want Windows agents to replicate the *nix model of file
>   // permissions by transitioning from the DOS model to the NTFS ACL API. This,
>   // however, is a significant work item in itself, and will not be done for
>   // the Windows MVP.
>   //
>   // The longer story is, the permissions model we currently use is the
>   // (extremely primitive) DOS model. The CliffsNotes version of the DOS
>   // permission model follows:
>   //
>   //   * There is one type of privilege: write privilege.
>   //   * All files can be read.
>   //   * Therefore, there is no native notion of "User", "Group", or "Other"
>   //     permissions.
>   //   * There is no concept whatsoever of execute permissions; if a file can
>   //     be read (and it definitely can), and if it's a binary, you have
>   //     execute permissions.
>   //   * All in all: the DOS model is arguably ok for situations where there is
>   //     a single user in a location that can be considered "secure."
>   //
>   // The practical impact of this is that most of the permissions-oriented APIs
>   // in Stout will _pretend_ to set appropriate permissions on Windows, but
>   // mostly set them to "global writable" instead.
>   //
>   // This is clearly not the ideal permissions scenario for Mesos. The other
>   // option is to use the NTFS Access Control List (ACL) API, and in the long
>   // term we will want to transition to that. The CliffsNotes version of the
>   // ACL permission model is as follows:
>   //
>   //   * An ACL is a list of security specifications (each specification is
>   //     known as an "Access Control Entry") that describes the access model of
>   //     an "object." An "object" can be a process, a file, an event, or
>   //     anything else that has a security descriptor.
>   //   * Privileges can be granted by a process with required privileges.
>   //   * The ACL model is very fine-grained: access can be granted to a user, a
>   //     group, or "other", and can be split up by read, write, and execute
>   //     permissions.
>   //
>   // BUT, and here is the kicker, the ACL model is dramatically more
>   // complicated, and not worth doing in the MVP. Our goal in the future is to
>   // find a more permanent solution; for now, we have a non-invasive Unix-based
>   // permission model, and that will work for now.
> {code}





[jira] [Commented] (MESOS-5882) `os::cloexec` does not exist on Windows

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961217#comment-15961217
 ] 

Andrew Schwartzmeyer commented on MESOS-5882:
-

Joe, we're assigning this to you simply to mean "follow up with Joe". We need a 
lot more context to understand this bug, which is likely going to come from you. 
Thanks!

> `os::cloexec` does not exist on Windows
> ---
>
> Key: MESOS-5882
> URL: https://issues.apache.org/jira/browse/MESOS-5882
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Joseph Wu
>  Labels: mesosphere, stout
>
> `os::cloexec` does not work on Windows. It will never work at the OS level. 
> Because of this, there are likely many important and hard-to-detect bugs 
> hanging around the agent.
> This is extremely important to fix. Some possible solutions to investigate 
> (some of which are _extremely_ risky):
> * Abstract out file descriptors into a class, implement cloexec in that class 
> on Windows (since we can't rely on the OS to do it).
> * Refactor all the code that relies on `os::cloexec` to not rely on it.
> Of the two, the first seems less risky in the short term, because the cloexec 
> code only affects Windows. Depending on the semantics of the implementation 
> of the `FileDescriptor` class, it is possible that this is riskier to Windows 
> in the longer term, as the semantics of `cloexec` may have subtle differences 
> between Linux and Windows.
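As a rough sketch of the first option, a descriptor wrapper can own the close-on-exec decision so callers never call `os::cloexec` directly. The class below is hypothetical (not stout's type) and implements only the POSIX branch, using `fcntl` with `FD_CLOEXEC`. On Windows the analogous step would presumably be clearing `HANDLE_FLAG_INHERIT` with `SetHandleInformation`, since a child process only inherits handles that are marked inheritable and only when it is created with handle inheritance enabled.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <utility>

// Hypothetical RAII wrapper (not stout's FileDescriptor): owns a POSIX
// descriptor, closes it on destruction, and manages close-on-exec.
class FileDescriptor {
 public:
  explicit FileDescriptor(int fd) : fd_(fd) {}
  FileDescriptor(const FileDescriptor&) = delete;
  FileDescriptor& operator=(const FileDescriptor&) = delete;
  FileDescriptor(FileDescriptor&& other) noexcept
    : fd_(std::exchange(other.fd_, -1)) {}
  ~FileDescriptor() {
    if (fd_ >= 0) {
      ::close(fd_);
    }
  }

  int get() const { return fd_; }

  // Marks the descriptor close-on-exec so exec'd children do not inherit it.
  bool setCloexec() {
    int flags = ::fcntl(fd_, F_GETFD);
    if (flags == -1) {
      return false;
    }
    return ::fcntl(fd_, F_SETFD, flags | FD_CLOEXEC) != -1;
  }

  bool isCloexec() const {
    int flags = ::fcntl(fd_, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
  }

 private:
  int fd_;
};
```

Centralizing the flag in one type is what would let a Windows implementation differ internally without touching every call site.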





[jira] [Assigned] (MESOS-5882) `os::cloexec` does not exist on Windows

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-5882:
---

Assignee: Joseph Wu

> `os::cloexec` does not exist on Windows
> ---
>
> Key: MESOS-5882
> URL: https://issues.apache.org/jira/browse/MESOS-5882
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Joseph Wu
>  Labels: mesosphere, stout
>
> `os::cloexec` does not work on Windows. It will never work at the OS level. 
> Because of this, there are likely many important and hard-to-detect bugs 
> hanging around the agent.
> This is extremely important to fix. Some possible solutions to investigate 
> (some of which are _extremely_ risky):
> * Abstract out file descriptors into a class, implement cloexec in that class 
> on Windows (since we can't rely on the OS to do it).
> * Refactor all the code that relies on `os::cloexec` to not rely on it.
> Of the two, the first seems less risky in the short term, because the cloexec 
> code only affects Windows. Depending on the semantics of the implementation 
> of the `FileDescriptor` class, it is possible that this is riskier to Windows 
> in the longer term, as the semantics of `cloexec` may have subtle differences 
> between Linux and Windows.





[jira] [Assigned] (MESOS-5882) `os::cloexec` does not exist on Windows

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-5882:
---

Assignee: (was: Alex Clemmer)

> `os::cloexec` does not exist on Windows
> ---
>
> Key: MESOS-5882
> URL: https://issues.apache.org/jira/browse/MESOS-5882
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>  Labels: mesosphere, stout
>
> `os::cloexec` does not work on Windows. It will never work at the OS level. 
> Because of this, there are likely many important and hard-to-detect bugs 
> hanging around the agent.
> This is extremely important to fix. Some possible solutions to investigate 
> (some of which are _extremely_ risky):
> * Abstract out file descriptors into a class, implement cloexec in that class 
> on Windows (since we can't rely on the OS to do it).
> * Refactor all the code that relies on `os::cloexec` to not rely on it.
> Of the two, the first seems less risky in the short term, because the cloexec 
> code only affects Windows. Depending on the semantics of the implementation 
> of the `FileDescriptor` class, it is possible that this is riskier to Windows 
> in the longer term, as the semantics of `cloexec` may have subtle differences 
> between Linux and Windows.





[jira] [Updated] (MESOS-3872) Investigate adding color to `support/post-reviews.py` on Windows

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-3872:

Priority: Minor  (was: Major)

> Investigate adding color to `support/post-reviews.py` on Windows
> 
>
> Key: MESOS-3872
> URL: https://issues.apache.org/jira/browse/MESOS-3872
> Project: Mesos
>  Issue Type: Bug
>  Components: general
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>Priority: Minor
>  Labels: mesosphere, windows
>
> From the comments:
> # TODO(hausdorff): We have disabled colors for the diffs on Windows, as 
> piping them through `subprocess` causes us to emit ANSI escape codes, which 
> the command prompt doesn't recognize. Presumably we are being routed through 
> some TTY that causes git to not emit the colors using `cmd`'s color codes API 
> (which is entirely different from ANSI). See [1] and MESOS-3872 for more 
> information.
> #
> # [1] 
> http://stackoverflow.com/questions/5921556/in-git-bash-on-windows-7-colors-display-as-code-when-running-cucumber-or-rspec





[jira] [Assigned] (MESOS-3386) Port remaining Stout and libprocess tests to Windows

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-3386:
---

Assignee: John Kordich  (was: Alex Clemmer)

> Port remaining Stout and libprocess tests to Windows
> 
>
> Key: MESOS-3386
> URL: https://issues.apache.org/jira/browse/MESOS-3386
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Alex Clemmer
>Assignee: John Kordich
>  Labels: build, mesosphere, microsoft, tests
>
> We will need to go through all the test files and investigate any test that's 
> marked `TEST_TEMP_DISABLED_ON_WINDOWS`.
> Additionally, here is a concise list of the Stout and libprocess test files 
> that don't compile as of 12/5/2016:
> {quote}
> Stout:
> path_tests.cpp
> protobuf_tests.cpp
> protobuf_tests.pb.cc
> svn_tests.cpp
> os/sendfile_tests.cpp
> os/signals_tests.cpp
> libprocess:
> io_tests.cpp
> reap_tests.cpp
> {quote}





[jira] [Assigned] (MESOS-7342) Port Docker tests

2017-04-07 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-7342:
---

Assignee: John Kordich

> Port Docker tests
> -
>
> Key: MESOS-7342
> URL: https://issues.apache.org/jira/browse/MESOS-7342
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
> Environment: Windows 10
>Reporter: Andrew Schwartzmeyer
>Assignee: John Kordich
>  Labels: microsoft, windows
>
> While one of Daniel Pravat's last acts was introducing the Docker 
> containerizer for Windows, we don't have tests for it. We need to port 
> `docker_tests.cpp` and `docker_containerizer_tests.cpp` to Windows.





[jira] [Updated] (MESOS-6791) Allow to specific the device whitelist entries in cgroup devices subsystem

2017-04-07 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6791:

Story Points: 1

> Allow to specific the device whitelist entries in cgroup devices subsystem
> --
>
> Key: MESOS-6791
> URL: https://issues.apache.org/jira/browse/MESOS-6791
> Project: Mesos
>  Issue Type: Task
>  Components: cgroups
>Reporter: haosdent
>Assignee: haosdent
>  Labels: cgroups
>






[jira] [Commented] (MESOS-6791) Allow to specific the device whitelist entries in cgroup devices subsystem

2017-04-07 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961114#comment-15961114
 ] 

haosdent commented on MESOS-6791:
-

Hi [~gilbert], I'd estimate 1-2 story points.



[jira] [Assigned] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-07 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-7210:
---

Assignee: Deshi Xiao  (was: haosdent)

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: Deshi Xiao
>Priority: Critical
>
> When running mesos-slave with the "docker_mesos_image" option, for example:
> {code}
> --master=zk://standalone:2181/mesos  --containerizers=docker,mesos  
> --executor_registration_timeout=5mins  --hostname=standalone  --ip=0.0.0.0  
> --docker_stop_timeout=5secs  --gc_delay=1days  
> --docker_socket=/var/run/docker.sock  --no-systemd_enable_support  
> --work_dir=/tmp/mesos  --docker_mesos_image=panteras/paas-in-a-box:0.4.0
> {code}
> from a container that was started with the "pid: host" option:
> {code}
>   net:host
>   privileged: true
>   pid:host
> {code}
> and an example Marathon job that uses MESOS_HTTP health checks:
> {code}
> {
>  "id": "python-example-stable",
>  "cmd": "python3 -m http.server 8080",
>  "mem": 16,
>  "cpus": 0.1,
>  "instances": 2,
>  "container": {
>"type": "DOCKER",
>"docker": {
>  "image": "python:alpine",
>  "network": "BRIDGE",
>  "portMappings": [
> { "containerPort": 8080, "hostPort": 0, "protocol": "tcp" }
>  ]
>}
>  },
>  "env": {
>"SERVICE_NAME" : "python"
>  },
>  "healthChecks": [
>{
>  "path": "/",
>  "portIndex": 0,
>  "protocol": "MESOS_HTTP",
>  "gracePeriodSeconds": 30,
>  "intervalSeconds": 10,
>  "timeoutSeconds": 30,
>  "maxConsecutiveFailures": 3
>}
>  ]
> }
> {code}
> I see errors like the following:
> {code}
> F0306 07:41:58.844293 35 health_checker.cpp:94] Failed to enter the net namespace of task (pid: '13527'): Pid 13527 does not exist
> *** Check failure stack trace: ***
> @ 0x7f51770b0c1d  google::LogMessage::Fail()
> @ 0x7f51770b29d0  google::LogMessage::SendToLog()
> @ 0x7f51770b0803  google::LogMessage::Flush()
> @ 0x7f51770b33f9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f517647ce46  _ZNSt17_Function_handlerIFivEZN5mesos8internal6health14cloneWithSetnsERKSt8functionIS0_E6OptionIiERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaISG_EEEUlvE_E9_M_invokeERKSt9_Any_data
> @ 0x7f517647bf2b  mesos::internal::health::cloneWithSetns()
> @ 0x7f517648374b  std::_Function_handler<>::_M_invoke()
> @ 0x7f5177068167  process::internal::cloneChild()
> @ 0x7f5177065c32  process::subprocess()
> @ 0x7f5176481a9d  mesos::internal::health::HealthCheckerProcess::_httpHealthCheck()
> @ 0x7f51764831f7  mesos::internal::health::HealthCheckerProcess::_healthCheck()
> @ 0x7f517701f38c  process::ProcessBase::visit()
> @ 0x7f517702c8b3  process::ProcessManager::resume()
> @ 0x7f517702fb77  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f51754ddc80  (unknown)
> @ 0x7f5174cf06ba  start_thread
> @ 0x7f5174a2682d  (unknown)
> I0306 07:41:59.077986 9 health_checker.cpp:199] Ignoring failure as health check still in grace period
> {code}
> It looks like the docker_mesos_image option causes the newly started Mesos job 
> to run without the "pid host" option that the parent container was started 
> with; instead, it gets its own PID namespace. So regardless of whether the 
> parent container was started with "pid host", the health check will never be 
> able to find the task's PID.
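A minimal Linux sketch of the failing step, assuming the health checker joins the task's network namespace through `/proc/<pid>/ns/net` (the log shows `cloneWithSetns` aborting with "Pid 13527 does not exist"). A PID is only meaningful inside the PID namespace that assigned it, so if the agent runs in a different PID namespace from the task, that PID is not visible under the agent's `/proc`. The helper names `netNamespacePath`, `pidVisible` and `samePidNamespace` are hypothetical and not part of Mesos.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical helper: the path a checker would open and pass to setns(2)
// to join the network namespace of process `pid`.
std::string netNamespacePath(pid_t pid)
{
  return "/proc/" + std::to_string(pid) + "/ns/net";
}

// Hypothetical helper: true if `pid` names a process visible from the
// caller's PID namespace. kill() with signal 0 only performs the existence
// and permission checks; EPERM means the process exists but may not be
// signalled, ESRCH means no such PID is visible here.
bool pidVisible(pid_t pid)
{
  if (pid <= 0) {
    return false;  // 0 and negative values address process groups.
  }
  return ::kill(pid, 0) == 0 || errno == EPERM;
}

// Read a /proc/<pid>/ns/* link, e.g. "pid:[4026531836]"; "" on failure.
std::string namespaceId(const std::string& link)
{
  char buffer[256];
  ssize_t length = ::readlink(link.c_str(), buffer, sizeof(buffer));
  if (length < 0) {
    return "";
  }
  return std::string(buffer, static_cast<size_t>(length));
}

// Hypothetical helper: two processes share a PID namespace exactly when
// their /proc/<pid>/ns/pid links name the same namespace.
bool samePidNamespace(pid_t pid)
{
  const std::string self = namespaceId("/proc/self/ns/pid");
  const std::string other =
    namespaceId("/proc/" + std::to_string(pid) + "/ns/pid");
  return !self.empty() && self == other;
}
```

A check along these lines could report the namespace mismatch as an ordinary error instead of a fatal check failure.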





[jira] [Updated] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-07 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-7210:

Shepherd: haosdent



[jira] [Created] (MESOS-7369) Enable isolation for Infiniband HCAs

2017-04-07 Thread Vishnu Mohan (JIRA)
Vishnu Mohan created MESOS-7369:
---

 Summary: Enable isolation for Infiniband HCAs
 Key: MESOS-7369
 URL: https://issues.apache.org/jira/browse/MESOS-7369
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Reporter: Vishnu Mohan


Build the equivalent of the GPU isolator (including the equivalent of the NVML 
driver/libraries overlay into the container filesystem, but for the mlx 
driver/libraries) to support InfiniBand HCAs.





[jira] [Updated] (MESOS-7364) Upgrade vendored GMock / GTest

2017-04-07 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-7364:

Sprint: Mesosphere Sprint 54

> Upgrade vendored GMock / GTest
> --
>
> Key: MESOS-7364
> URL: https://issues.apache.org/jira/browse/MESOS-7364
> Project: Mesos
>  Issue Type: Improvement
>  Components: build
>Reporter: Neil Conway
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> We currently vendor gmock 1.7.0. The latest upstream version of gmock is 
> 1.8.0, which fixes at least one annoying warning (MESOS-6539).





[jira] [Assigned] (MESOS-7193) Use of `GTEST_IS_THREADSAFE` in asserts is problematic.

2017-04-07 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht reassigned MESOS-7193:
---

Assignee: Jan Schlicht

> Use of `GTEST_IS_THREADSAFE` in asserts is problematic.
> ---
>
> Key: MESOS-7193
> URL: https://issues.apache.org/jira/browse/MESOS-7193
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, tests
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>
> Some test cases in libprocess use {{ASSERT_TRUE(GTEST_IS_THREADSAFE)}}. This 
> is a misuse of that macro; as [the documentation in GTest 
> says|https://github.com/google/googletest/blob/master/googletest/include/gtest/internal/gtest-port.h#L155-L163]:
> {noformat}
> Macros indicating which Google Test features are available (a macro
> is defined to 1 if the corresponding feature is supported;
> otherwise UNDEFINED -- it's never defined to 0.).  Google Test
> defines these macros automatically.  Code outside Google Test MUST
> NOT define them.
> {noformat}
> Currently, {{GTEST_IS_THREADSAFE}} works fine in the assert because it is 
> defined to be {{1}}. However, newer upstream versions of GTest use a more 
> complicated definition that can end up undefined, causing compilation 
> errors.
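Per the GTest convention quoted above, such feature macros are either defined to 1 or left undefined, so they are meant to be tested in the preprocessor rather than used directly as C++ expressions. A minimal sketch of that pattern, using a hypothetical macro `MY_FEATURE_IS_SUPPORTED` in place of `GTEST_IS_THREADSAFE`:

```cpp
#include <cassert>

// Hypothetical feature macro following the GTest convention: defined to 1
// when the feature is supported, otherwise left undefined (never 0).
// Uncomment the next line to simulate a platform that supports it.
// #define MY_FEATURE_IS_SUPPORTED 1

// Test the macro in the preprocessor and expose the result as an ordinary
// C++ constant. Writing MY_FEATURE_IS_SUPPORTED directly in a C++
// expression would fail to compile whenever the macro is undefined; the
// defined() check also keeps -Wundef quiet.
#if defined(MY_FEATURE_IS_SUPPORTED) && MY_FEATURE_IS_SUPPORTED
constexpr bool kFeatureSupported = true;
#else
constexpr bool kFeatureSupported = false;
#endif
```

A test could then use `kFeatureSupported` in ordinary code, for example inside `ASSERT_TRUE`, without depending on how the macro happens to be defined in a given GTest release.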





[jira] [Updated] (MESOS-7193) Use of `GTEST_IS_THREADSAFE` in asserts is problematic.

2017-04-07 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-7193:

Sprint: Mesosphere Sprint 54



[jira] [Assigned] (MESOS-7364) Upgrade vendored GMock / GTest

2017-04-07 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht reassigned MESOS-7364:
---

Assignee: Jan Schlicht



[jira] [Updated] (MESOS-7367) MasterAPITest.GetRoles is flaky on machines with non-C locale.

2017-04-07 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7367:
---
Labels: flaky-test mesosphere newbie++ test  (was: flaky-test mesosphere 
test)

> MasterAPITest.GetRoles is flaky on machines with non-C locale.
> --
>
> Key: MESOS-7367
> URL: https://issues.apache.org/jira/browse/MESOS-7367
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.0.2, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04 with non-C locale
>Reporter: Alexander Rukletsov
>  Labels: flaky-test, mesosphere, newbie++, test
>
> The {{MasterAPITest.GetRoles}} test sets a role weight to a real number using 
> {{.}} as the decimal mark. This, however, is not correct on machines with a 
> non-standard locale, because the weight parsing code relies on the locale: 
> [https://github.com/apache/mesos/blob/7f04cf886fc2ed59414bf0056a2f351959a2d1f8/src/master/master.cpp#L727-L750].
>  This leads to test failures: [https://pastebin.com/sQR2Tr2Q].
> There are several solutions here.
> h4. 1. Change the parsing code to be locale-agnostic.
> This seems to be the most robust solution. However, the {{--weights}} flag is 
> deprecated and will probably be removed soon, together with the parsing code. 
> h4. 2. Fix call sites in our tests to ensure the decimal mark is locale-dependent.
> This seems like a reasonable solution, but I'd argue we can do even better.
> h4. 3. Use a locale-agnostic format for doubles in tests.
> Instead of saying {{"2.5"}}, we can say {{"25e-1"}}, which is locale-agnostic.
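To illustrate why option 3 works: `strtod` (like `atof`) honours `LC_NUMERIC`, so under a comma-decimal locale such as `de_DE.UTF-8` it stops at the `.` in `"2.5"` and returns 2, while `"25e-1"` contains no decimal separator and parses as 2.5 under any locale. The sketch temporarily switches `LC_NUMERIC`; `parseUnderLocale` is a hypothetical helper, and `de_DE.UTF-8` is only used if it happens to be installed.

```cpp
#include <cassert>
#include <clocale>
#include <cstdlib>
#include <string>

// Hypothetical helper: parse `text` with strtod() while LC_NUMERIC is
// temporarily set to `localeName`, then restore the previous setting.
// Returns false if the locale is not installed. setlocale() changes
// process-wide state and is not thread-safe; this is for illustration only.
bool parseUnderLocale(const char* text, const char* localeName, double* out)
{
  // Copy the current name: later setlocale() calls may overwrite it.
  const char* current = std::setlocale(LC_NUMERIC, nullptr);
  const std::string previous = current != nullptr ? current : "C";

  if (std::setlocale(LC_NUMERIC, localeName) == nullptr) {
    return false;
  }

  // Under a comma-decimal locale, "2.5" stops at '.' and yields 2.0,
  // whereas "25e-1" has no decimal separator and yields 2.5.
  *out = std::strtod(text, nullptr);

  std::setlocale(LC_NUMERIC, previous.c_str());
  return true;
}
```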





[jira] [Commented] (MESOS-7367) MasterAPITest.GetRoles is flaky on machines with non-C locale.

2017-04-07 Thread Gastón Kleiman (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960757#comment-15960757
 ] 

Gastón Kleiman commented on MESOS-7367:
---

Commas are used as weight separators (e.g., 
{{--weights="role1=2,5,role2=1.5"}}), so I don't think using them as decimal 
separators should be valid.

I'd replace {{atof}} with a locale-independent solution that uses periods as 
decimal separators. We might be making the same mistake in other places, so we 
should make sure we're not using {{atof}} anywhere else.
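One possible locale-independent replacement for `atof`, sketched under the assumption that plain `.`-decimal input is what we want: parse through a `std::istringstream` imbued with `std::locale::classic()`, which always uses `.` regardless of the global locale, and reject trailing characters instead of silently returning a partial value. `parseDouble` is a hypothetical name, not an existing stout function.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parse a floating-point number using the classic "C"
// locale, so '.' is always the decimal separator regardless of the process's
// global locale. Unlike atof(), malformed input is reported, not mapped to 0.
bool parseDouble(const std::string& text, double* out)
{
  std::istringstream stream(text);
  stream.imbue(std::locale::classic());

  double value = 0.0;
  stream >> value;

  // Reject input that failed to parse or has trailing characters, such as
  // "2,5" (which would otherwise be read as 2).
  if (stream.fail() || stream.peek() != std::char_traits<char>::eof()) {
    return false;
  }

  *out = value;
  return true;
}
```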



[jira] [Commented] (MESOS-7367) MasterAPITest.GetRoles is flaky on machines with non-C locale.

2017-04-07 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960734#comment-15960734
 ] 

Alexander Rukletsov commented on MESOS-7367:


After thinking more about *option 2*, this is probably not an option, because 
the weights parsing code uses commas to separate entries.
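A small illustration of the conflict, using a simplified comma splitter as a stand-in for the real {{--weights}} parsing (`splitEntries` is hypothetical): with a comma as the decimal mark, `role1=2,5` is split into two entries, the second of which has no role name at all.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for --weights parsing: split the flag
// value into "role=weight" entries on every ','.
std::vector<std::string> splitEntries(const std::string& weights)
{
  std::vector<std::string> entries;
  std::istringstream stream(weights);
  std::string entry;
  while (std::getline(stream, entry, ',')) {
    entries.push_back(entry);
  }
  return entries;
}
```

For `"role1=2,5,role2=1.5"` this yields `"role1=2"`, `"5"` and `"role2=1.5"`, so a comma decimal mark cannot be distinguished from an entry separator.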



[jira] [Updated] (MESOS-7367) MasterAPITest.GetRoles is flaky on machines with non-C locale.

2017-04-07 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7367:
---
Description: 
The {{MasterAPITest.GetRoles}} test sets a role weight to a real number using 
{{.}} as the decimal mark. This, however, is not correct on machines with a 
non-standard locale, because the weight parsing code relies on the locale: 
[https://github.com/apache/mesos/blob/7f04cf886fc2ed59414bf0056a2f351959a2d1f8/src/master/master.cpp#L727-L750].
 This leads to test failures: [https://pastebin.com/sQR2Tr2Q].

There are several solutions here.

h4. 1. Change the parsing code to be locale-agnostic.
This seems to be the most robust solution. However, the {{--weights}} flag is 
deprecated and will probably be removed soon, together with the parsing code. 

h4. 2. Fix call sites in our tests to ensure the decimal mark is locale-dependent.
This seems like a reasonable solution, but I'd argue we can do even better.

h4. 3. Use a locale-agnostic format for doubles in tests.
Instead of saying {{"2.5"}}, we can say {{"25e-1"}}, which is locale-agnostic.

  was:
The {{MasterAPITest.GetRoles}} test sets a role weight to a real number using 
{{.}} as the decimal mark. This, however, is not correct on machines with a 
non-standard locale, because the weight parsing code relies on the locale: 
[https://github.com/apache/mesos/blob/7f04cf886fc2ed59414bf0056a2f351959a2d1f8/src/master/master.cpp#L727-L750].
 This leads to test failures: [https://pastebin.com/sQR2Tr2Q].

There are several solutions here.

h4. Change the parsing code to be locale-agnostic.
This seems to be the most robust solution. However, the {{--weights}} flag is 
deprecated and will probably be removed soon, together with the parsing code. 

h4. Fix call sites in our tests to ensure the decimal mark is locale-dependent.
This seems like a reasonable solution, but I'd argue we can do even better.

h4. Use a locale-agnostic format for doubles in tests.
Instead of saying {{"2.5"}}, we can say {{"25e-1"}}, which is locale-agnostic.




[jira] [Commented] (MESOS-7368) Documentation of framework role(s) in proto definition is confusing

2017-04-07 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960730#comment-15960730
 ] 

Benjamin Bannier commented on MESOS-7368:
-

[~guoger] [~bmahler]: I see you introduced {{deprecated}} and {{EXPERIMENTAL}} 
in {{eb674bb614}} and {{d06d05c76e}}, respectively. Could you coordinate on a 
resolution?

> Documentation of framework role(s) in proto definition is confusing
> ---
>
> Key: MESOS-7368
> URL: https://issues.apache.org/jira/browse/MESOS-7368
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Benjamin Bannier
>
> The documentation for role-related fields in {{FrameworkInfo}} is confusing: 
> the {{role}} field is marked as {{deprecated}} while the {{roles}} field is 
> marked as {{EXPERIMENTAL}} and has an additional {{NOTE}} advising against 
> using this field.
> This leaves users confused about which field they should use in their code. 
> Depending on how the proto definition is used, {{deprecated}} might even 
> produce (fatal) errors, while {{roles}} seems strongly discouraged.
> We should clean this up to make sure users always know which field they 
> should use.





[jira] [Created] (MESOS-7368) Documentation of framework role(s) in proto definition is confusing

2017-04-07 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-7368:
---

 Summary: Documentation of framework role(s) in proto definition is 
confusing
 Key: MESOS-7368
 URL: https://issues.apache.org/jira/browse/MESOS-7368
 Project: Mesos
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Benjamin Bannier


The documentation for role-related fields in {{FrameworkInfo}} is confusing: 
the {{role}} field is marked as {{deprecated}} while the {{roles}} field is 
marked as {{EXPERIMENTAL}} and has an additional {{NOTE}} advising against 
using this field.

This leaves users confused about which field they should use in their code. 
Depending on how the proto definition is used, {{deprecated}} might even produce 
(fatal) errors, while {{roles}} seems strongly discouraged.

We should clean this up to make sure users always know which field they should 
use.





[jira] [Created] (MESOS-7367) MasterAPITest.GetRoles is flaky on machines with non-C locale.

2017-04-07 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-7367:
--

 Summary: MasterAPITest.GetRoles is flaky on machines with non-C 
locale.
 Key: MESOS-7367
 URL: https://issues.apache.org/jira/browse/MESOS-7367
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 1.2.0, 1.1.1, 1.0.2
 Environment: Ubuntu 16.04 with non-C locale
Reporter: Alexander Rukletsov


The {{MasterAPITest.GetRoles}} test sets a role weight to a real number using 
{{.}} as the decimal mark. This, however, is not correct on machines with a 
non-standard locale, because the weight parsing code relies on the locale: 
[https://github.com/apache/mesos/blob/7f04cf886fc2ed59414bf0056a2f351959a2d1f8/src/master/master.cpp#L727-L750].
 This leads to test failures: [https://pastebin.com/sQR2Tr2Q].

There are several solutions here.

h4. Change the parsing code to be locale-agnostic.
This seems to be the most robust solution. However, the {{--weights}} flag is 
deprecated and will probably be removed soon, together with the parsing code. 

h4. Fix call sites in our tests to ensure the decimal mark is locale-dependent.
This seems like a reasonable solution, but I'd argue we can do even better.

h4. Use a locale-agnostic format for doubles in tests.
Instead of saying {{"2.5"}}, we can say {{"25e-1"}}, which is locale-agnostic.





[jira] [Updated] (MESOS-7355) Set MESOS_SANDBOX in debug containers.

2017-04-07 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7355:
---
  Sprint: Mesosphere Sprint 54
Story Points: 1  (was: 3)

> Set MESOS_SANDBOX in debug containers.
> --
>
> Key: MESOS-7355
> URL: https://issues.apache.org/jira/browse/MESOS-7355
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, containerization
>Reporter: Alexander Rukletsov
>  Labels: check, health-check, mesosphere
>
> Currently {{MESOS_SANDBOX}} is not set for debug containers, see 
> [https://github.com/apache/mesos/blob/7f04cf886fc2ed59414bf0056a2f351959a2d1f8/src/slave/containerizer/mesos/containerizer.cpp#L1392-L1407].
>  The most reasonable value seems to be the task's sandbox.





[jira] [Updated] (MESOS-6782) Inherit Environment from Parent containers image spec when launching DEBUG container

2017-04-07 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6782:
---
Sprint: Mesosphere Sprint 54

> Inherit Environment from Parent containers image spec when launching DEBUG 
> container
> 
>
> Key: MESOS-6782
> URL: https://issues.apache.org/jira/browse/MESOS-6782
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Alexander Rukletsov
>  Labels: debugging, mesosphere
>
> Right now, whenever we enter a DEBUG container, we get a fresh environment. 
> For a better user experience, we should have the DEBUG container inherit the 
> environment set up in its parent container's image spec (if there is one). 
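A minimal sketch of the inheritance described above, assuming (the issue does not specify this) that variables explicitly set for the DEBUG container should take precedence over those from the parent's image spec. `Environment` and `inheritEnvironment` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical representation of an environment: variable name -> value.
using Environment = std::map<std::string, std::string>;

// Hypothetical merge for a DEBUG container: start from the environment in
// the parent container's image spec, then apply the variables explicitly
// requested for the DEBUG container, which win on conflicts (an assumption,
// not something the issue specifies).
Environment inheritEnvironment(
    const Environment& parentImage,
    const Environment& debugContainer)
{
  Environment result = parentImage;
  for (const auto& variable : debugContainer) {
    result[variable.first] = variable.second;
  }
  return result;
}
```

Reversing the precedence would only require applying the two maps in the opposite order; which one should win is a design decision the issue leaves open.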





[jira] [Assigned] (MESOS-6782) Inherit Environment from Parent containers image spec when launching DEBUG container

2017-04-07 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov reassigned MESOS-6782:
--

Assignee: Alexander Rukletsov  (was: Jie Yu)
