[jira] [Commented] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.

2016-10-12 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570958#comment-15570958
 ] 

Jie Yu commented on MESOS-6118:
---

commit 89ffe695c5b885ed59374db90cd5fe0f87ba8239
Author: Jie Yu 
Date:   Wed Oct 12 22:49:45 2016 -0700

Removed two std::move in MountInfoTable::read.

Review: https://reviews.apache.org/r/51620/

> Agent would crash with docker container tasks due to host mount table read.
> ---
>
> Key: MESOS-6118
> URL: https://issues.apache.org/jira/browse/MESOS-6118
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 1.0.1
> Environment: Build: 2016-08-26 23:06:27 by centos
> Version: 1.0.1
> Git tag: 1.0.1
> Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3
> systemd version `219` detected
> Inializing systemd state
> Created systemd slice: `/run/systemd/system/mesos_executors.slice`
> Started systemd slice `mesos_executors.slice`
> Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
>  Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> Linux ip-10-254-192-40 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 
> UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Jamie Briant
>Assignee: Kevin Klues
>Priority: Blocker
>  Labels: linux, slave
> Attachments: crashlogfull.log, cycle2.log, cycle3.log, cycle5.log, 
> cycle6.log, slave-crash.log
>
>
> I have a framework which schedules thousands of short-running tasks (a few 
> seconds to a few minutes each) over a period of several minutes. In 1.0.1, the 
> slave process crashes every few minutes (with systemd restarting it).
> Crash is:
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: F0901 20:52:23.905678  1232 
> fs.cpp:140] Check failed: !visitedParents.contains(parentId)
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: *** Check failure stack trace: 
> ***
> Version 1.0.0 works without this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.

2016-10-12 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570934#comment-15570934
 ] 

Jie Yu commented on MESOS-6118:
---

commit ccc746a7d12cc524120a76aa49a0d69e7303608a
Author: Kevin Klues 
Date:   Wed Oct 12 22:33:56 2016 -0700

Added special case when sorting hierarchically in MountInfoTable::read.

It is legal to have entries in a `MountInfoTable` whose `entry.id` is
the same as `entry.parent`. This can happen (for example), if a system
boots from the network and then keeps the original `/` in RAM.
However, to avoid cycles when walking the mount hierarchy, we should
not treat these entries as children of their parent so we skip them.

This commit adds functionality to handle this case.

Review: https://reviews.apache.org/r/52596/

commit 70b227f7d5662c051d0e978e9e4bfec328854c57
Author: Kevin Klues 
Date:   Wed Oct 12 22:33:51 2016 -0700

Added more detailed error message when failing in MountInfoTable::read.

Review: https://reviews.apache.org/r/52597/
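
A minimal sketch of the special case described in commit ccc746a above. This is not the Mesos implementation: `Entry` and `sortHierarchically` are hypothetical names, the entry type is reduced to two fields, and mount ids are assumed to be unique. Entries whose parent is missing from the table, or whose parent equals their own id, are treated as roots and are never registered as their own child, so the walk cannot cycle on them.

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical, reduced stand-in for a mount table entry.
struct Entry {
  int id;
  int parent;
};

// Orders entries so that every parent precedes its children.
// Assumes entry ids are unique.
std::vector<Entry> sortHierarchically(const std::vector<Entry>& entries) {
  std::set<int> ids;
  for (const Entry& e : entries) ids.insert(e.id);

  std::map<int, std::vector<Entry>> children;  // parent id -> children
  std::vector<Entry> roots;
  for (const Entry& e : entries) {
    if (e.id == e.parent || ids.count(e.parent) == 0) {
      roots.push_back(e);  // Self-parented or parent unknown: a root.
    } else {
      children[e.parent].push_back(e);
    }
  }

  // Depth-first walk using an explicit stack; reversed pushes keep the
  // original relative order of siblings.
  std::vector<Entry> sorted;
  std::vector<Entry> stack(roots.rbegin(), roots.rend());
  while (!stack.empty()) {
    Entry e = stack.back();
    stack.pop_back();
    sorted.push_back(e);
    const std::vector<Entry>& kids = children[e.id];
    stack.insert(stack.end(), kids.rbegin(), kids.rend());
  }
  return sorted;
}
```

With the input `{2,1}, {1,1}, {3,2}, {4,1}` the self-parented entry 1 comes first, followed by 2, 3 and 4.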

> Agent would crash with docker container tasks due to host mount table read.
> ---
>
> Key: MESOS-6118
> URL: https://issues.apache.org/jira/browse/MESOS-6118
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 1.0.1
> Environment: Build: 2016-08-26 23:06:27 by centos
> Version: 1.0.1
> Git tag: 1.0.1
> Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3
> systemd version `219` detected
> Inializing systemd state
> Created systemd slice: `/run/systemd/system/mesos_executors.slice`
> Started systemd slice `mesos_executors.slice`
> Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
>  Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> Linux ip-10-254-192-40 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 
> UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Jamie Briant
>Assignee: Kevin Klues
>Priority: Blocker
>  Labels: linux, slave
> Attachments: crashlogfull.log, cycle2.log, cycle3.log, cycle5.log, 
> cycle6.log, slave-crash.log
>
>
> I have a framework which schedules thousands of short-running tasks (a few 
> seconds to a few minutes each) over a period of several minutes. In 1.0.1, the 
> slave process crashes every few minutes (with systemd restarting it).
> Crash is:
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: F0901 20:52:23.905678  1232 
> fs.cpp:140] Check failed: !visitedParents.contains(parentId)
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: *** Check failure stack trace: 
> ***
> Version 1.0.0 works without this issue.





[jira] [Updated] (MESOS-6379) Updated webui to material style

2016-10-12 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6379:

Attachment: material-webui.gif

> Updated webui to material style
> ---
>
> Key: MESOS-6379
> URL: https://issues.apache.org/jira/browse/MESOS-6379
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
> Attachments: material-webui.gif
>
>
> Refer to the [material style guideline | https://material.google.com/]. After 
> some simple hacks, I found it should not be too hard to update the current 
> webui to material style.





[jira] [Created] (MESOS-6379) Updated webui to material style

2016-10-12 Thread haosdent (JIRA)
haosdent created MESOS-6379:
---

 Summary: Updated webui to material style
 Key: MESOS-6379
 URL: https://issues.apache.org/jira/browse/MESOS-6379
 Project: Mesos
  Issue Type: Improvement
Reporter: haosdent


Refer to the [material style guideline | https://material.google.com/]. After 
some simple hacks, I found it should not be too hard to update the current 
webui to material style.





[jira] [Commented] (MESOS-6143) resolv.conf is not copied when using the Mesos containerizer with a Docker image

2016-10-12 Thread Avinash Sridharan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570610#comment-15570610
 ] 

Avinash Sridharan commented on MESOS-6143:
--

[~jpinkul], I tried Mesos 1.0.0 with an alpine image and I don't see this 
issue. I ran Mesos 1.0.0 on Debian 8 with the following configuration:

Mesos master:
```
sudo /usr/sbin/mesos-master --ip=172.31.12.173 --port=5050 
--log_dir=/var/log/mesos --work_dir=/var/lib/mesos
```

Mesos agent:
```
sudo /usr/sbin/mesos-slave   --ip=172.31.12.173 
 --master=172.31.12.173:5050   
--isolation=filesystem/linux,docker/runtime 
--work_dir=/var/lib/mesos   --image_providers=docker
```

mesos-execute:
```
mesos-execute --master=172.31.12.173:5050 --name=dns-test --docker_image=alpine 
--command="sleep 1"
```

Ran the `nsenter` command on the container and verified that the 
/etc/resolv.conf in the new mnt namespace is the same as that on the hostfs:
```
admin@ip-172-31-12-173:/var/lib/mesos/slaves/70a7875b-aecc-43f0-8aea-2a239d4e97da-S0/frameworks$ ps aux | grep mesos
admin   550  0.0  0.0  25540  2712 pts/0  S+   01:52  0:00 screen -S mesos
admin   551  0.0  0.0  26900  3856 ?      Ss   01:52  0:00 SCREEN -S mesos
root   1522  0.0  0.0  40540  3440 pts/1  S+   01:59  0:00 sudo /usr/sbin/mesos-master --ip=172.31.12.173 --port=5050 --log_dir=/var/log/mesos --work_dir=/var/lib/mesos
root   1523  0.0  0.2 904584 33708 pts/1  Sl+  01:59  0:01 /usr/sbin/mesos-master --ip=172.31.12.173 --port=5050 --log_dir=/var/log/mesos --work_dir=/var/lib/mesos
root   1538  0.0  0.0  40540  3428 pts/2  S+   02:01  0:00 sudo /usr/sbin/mesos-slave --ip=172.31.12.173 --master=172.31.12.173:5050 --isolation=filesystem/linux,docker/runtime --work_dir=/var/lib/mesos --image_providers=docker
root   1539  0.1  0.2 818592 35980 pts/2  Sl+  02:01  0:03 /usr/sbin/mesos-slave --ip=172.31.12.173 --master=172.31.12.173:5050 --isolation=filesystem/linux,docker/runtime --work_dir=/var/lib/mesos --image_providers=docker
admin  2045  0.0  0.1 817848 30744 pts/3  Sl+  02:21  0:00 mesos-execute --master=172.31.12.173:5050 --name=dns-test --docker_image=alpine --command=sleep 1
root   2058  0.2  0.1 816488 30068 ?      Ssl  02:21  0:01 mesos-executor --launcher_dir=/usr/libexec/mesos --sandbox_directory=/mnt/mesos/sandbox --user=admin --rootfs=/var/lib/mesos/provisioner/containers/dcbe7b8a-e430-4b7f-98eb-d7f62c0c0f87/backends/copy/rootfses/0d8eceac-721b-4a3c-a68e-34ecc30cd718
admin  2101  0.0  0.0  12728  2168 pts/4  S+   02:31  0:00 grep mesos
admin@ip-172-31-12-173:/var/lib/mesos/slaves/70a7875b-aecc-43f0-8aea-2a239d4e97da-S0/frameworks$ sudo nsenter -t 2058 -m cat /etc/alpine-release
3.4.3
admin@ip-172-31-12-173:/var/lib/mesos/slaves/70a7875b-aecc-43f0-8aea-2a239d4e97da-S0/frameworks$ ls /etc/
adduser.conf  cloud  deluser.conf  grub.d  initramfs-tools  ld.so.cache  lvm  mke2fs.conf  opt  python2.7  rcS.d  sgml  subuid-  udev
adjtime  cron.d  dhcp  gshadow  inputrc  ld.so.conf  machine-id  modprobe.d  os-release  python3  resolv.conf  shadow  sudoers  ufw
alternatives  cron.daily  dkms  gshadow-  insserv  ld.so.conf.d  magic  modules  pam.conf  python3.4  rmt  shadow-  sudoers.d  vim
apt  cron.hourly  dpkg  gss  insserv.conf  libaudit.conf  magic.mime  modules-load.d  pam.d  rc0.d  rpc  shells  sysconfig  wgetrc
bash.bashrc  cron.monthly  emacs  host.conf  insserv.conf.d  locale.alias  mailcap  motd  passwd  rc1.d  rsyslog.conf  skel  sysctl.conf  xdg
bash_completion.d  crontab  environment  hostname  iproute2  locale.gen  mailcap.order  mtab  passwd-  rc2.d  rsyslog.d  ssh  sysctl.d  xml
bindresvport.blacklist  cron.weekly  fstab  hosts  issue  localtime  manpath.config  nanorc  perl  rc3.d  screenrc  ssl  systemd
binfmt.d  dbus-1  gai.conf  hosts.allow  issue.net  logcheck  mesos  network  profile  rc4.d  securetty  staff-group-for-usr-local  terminfo
ca-certificates  debconf.conf  groff  hosts.deny  java  login.defs  mesos-master  networks  profile.d  rc5.d  security  subgid  timezone
ca-certificates.conf  debian_version  group


[jira] [Commented] (MESOS-6060) Add MOUNT or PATH disk type in logging resources.

2016-10-12 Thread Anindya Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570567#comment-15570567
 ] 

Anindya Sinha commented on MESOS-6060:
--

commit 5e1c6814915756ae71f6ae612826de7e494fc481
Author: Anindya Sinha 
Date:   Wed Sep 14 17:34:49 2016 -0700

Added MOUNT or PATH disk type info when logging resources.

i.e., Added DiskInfo::Source info when outputting Resource::DiskInfo.

Review: https://reviews.apache.org/r/51517/

> Add MOUNT or PATH disk type in logging resources.
> -
>
> Key: MESOS-6060
> URL: https://issues.apache.org/jira/browse/MESOS-6060
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>Priority: Minor
>  Labels: persistent-volumes, resource
> Fix For: 1.1.0
>
>
> While logging persistent volume disk resources, we should also log the source 
> (if present), i.e., whether the disk type is MOUNT or PATH, and the 
> corresponding root. This would be helpful when the agent has multiple disk 
> resources, since having this information in the log would help identify the 
> disk in various scenarios.
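
As an illustration of the kind of output being asked for, here is a small sketch. `DiskSource` is a hypothetical stand-in for the protobuf message, and the `TYPE:root` format is an assumption made for this example, not the format Mesos emits.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the disk source information.
struct DiskSource {
  enum class Type { PATH, MOUNT } type;
  std::string root;
};

// Prints the disk type followed by its root, e.g. "MOUNT:/mnt/disk1".
std::ostream& operator<<(std::ostream& out, const DiskSource& source) {
  return out << (source.type == DiskSource::Type::MOUNT ? "MOUNT" : "PATH")
             << ":" << source.root;
}
```

Streaming `DiskSource{DiskSource::Type::MOUNT, "/mnt/disk1"}` into a log line then makes it possible to tell two disk resources on the same agent apart.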





[jira] [Commented] (MESOS-6312) Add requirement in upgrade.md and getting-started.md for agent '--runtime_dir' in when running as non-root

2016-10-12 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570529#comment-15570529
 ] 

Kevin Klues commented on MESOS-6312:


https://reviews.apache.org/r/52787/
https://reviews.apache.org/r/52814

> Add requirement in upgrade.md and getting-started.md for agent 
> '--runtime_dir' in when running as non-root
> --
>
> Key: MESOS-6312
> URL: https://issues.apache.org/jira/browse/MESOS-6312
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Blocker
>
> We recently introduced a new agent flag for {{\-\-runtime_dir}}. Unlike the 
> {{\-\-work_dir}}, this directory is designed to hold the state of a running 
> agent between subsequent agent-restarts (but not across host reboots).
> By default, this flag is set to {{/var/run/mesos}} since this is a {{tmpfs}} 
> on Linux that gets automatically cleaned up on reboot. However, on most 
> systems {{/var/run/mesos}} is only writable by root, causing problems when 
> launching an agent as non-root without pointing {{--runtime_dir}} to a 
> different location.
> We need to call this out in the upgrade.md and getting-started.md docs so 
> that people know they may need to set this going forward.





[jira] [Commented] (MESOS-6336) SlaveTest.KillTaskGroupBetweenRunTaskParts is flaky

2016-10-12 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570475#comment-15570475
 ] 

Vinod Kone commented on MESOS-6336:
---

Looking at the test code, one potential issue I can see is that we are calling 
a method (`unmocked__run`) of a libprocess actor (MockSlave) directly from the 
test code instead of dispatching to it. As a result, two different threads 
could be accessing the `Slave` object simultaneously, causing a segfault.

Not sure why this is happening inside `Slave::finalize()` though because AFAICT 
`unmocked__run` should've been completely done (i.e., removed framework from 
the `frameworks` map) by the time `finalize()` gets called.
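
To illustrate the difference between calling an actor's method directly and dispatching to it, here is a deliberately tiny, single-threaded model. It is not libprocess: `Actor`, `dispatch`, `run` and the `frameworks` counter are hypothetical names for this sketch only. The point is that dispatched work is queued and executed one item at a time, so handlers never overlap, whereas a direct call touches the state from whatever thread the caller happens to be on.

```cpp
#include <functional>
#include <queue>
#include <utility>

// Hypothetical miniature actor: state is only modified by handlers that
// were enqueued in the mailbox and are executed serially.
class Actor {
public:
  // Enqueue work instead of touching the actor's state directly.
  void dispatch(std::function<void(Actor&)> f) { mailbox.push(std::move(f)); }

  // Drain the mailbox one handler at a time. In a real actor system this
  // would run on the actor's own execution context.
  void run() {
    while (!mailbox.empty()) {
      std::function<void(Actor&)> f = std::move(mailbox.front());
      mailbox.pop();
      f(*this);
    }
  }

  int frameworks = 0;  // Stand-in for state such as a `frameworks` map.

private:
  std::queue<std::function<void(Actor&)>> mailbox;
};
```

Nothing changes until `run()` drains the mailbox, which is what makes the ordering of state changes predictable.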

> SlaveTest.KillTaskGroupBetweenRunTaskParts is flaky
> ---
>
> Key: MESOS-6336
> URL: https://issues.apache.org/jira/browse/MESOS-6336
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Greg Mann
>  Labels: mesosphere
>
> The test {{SlaveTest.KillTaskGroupBetweenRunTaskParts}} sometimes segfaults 
> during the agent's {{finalize()}} method. This was observed on our internal 
> CI, on Fedora with libev, without SSL:
> {code}
> [ RUN  ] SlaveTest.KillTaskGroupBetweenRunTaskParts
> I1007 14:12:57.973811 28630 cluster.cpp:158] Creating default 'local' 
> authorizer
> I1007 14:12:57.982128 28630 leveldb.cpp:174] Opened db in 8.195028ms
> I1007 14:12:57.982599 28630 leveldb.cpp:181] Compacted db in 446238ns
> I1007 14:12:57.982616 28630 leveldb.cpp:196] Created db iterator in 3650ns
> I1007 14:12:57.982622 28630 leveldb.cpp:202] Seeked to beginning of db in 
> 451ns
> I1007 14:12:57.982627 28630 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 352ns
> I1007 14:12:57.982638 28630 replica.cpp:776] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1007 14:12:57.983024 28645 recover.cpp:451] Starting replica recovery
> I1007 14:12:57.983127 28651 recover.cpp:477] Replica is in EMPTY status
> I1007 14:12:57.983459 28644 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from __req_res__(6234)@172.30.2.161:38776
> I1007 14:12:57.983543 28651 recover.cpp:197] Received a recover response from 
> a replica in EMPTY status
> I1007 14:12:57.983680 28650 recover.cpp:568] Updating replica status to 
> STARTING
> I1007 14:12:57.983990 28648 master.cpp:380] Master 
> 76d4d55f-dcc6-4033-85d9-7ec97ef353cb 
> (ip-172-30-2-161.ec2.internal.mesosphere.io) started on 172.30.2.161:38776
> I1007 14:12:57.984007 28648 master.cpp:382] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/rVbcaO/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --quiet="false" --recovery_agent_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
> --registry_max_agent_count="102400" --registry_store_timeout="100secs" 
> --registry_strict="false" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/rVbcaO/master" --zk_session_timeout="10secs"
> I1007 14:12:57.984127 28648 master.cpp:432] Master only allowing 
> authenticated frameworks to register
> I1007 14:12:57.984134 28648 master.cpp:446] Master only allowing 
> authenticated agents to register
> I1007 14:12:57.984139 28648 master.cpp:459] Master only allowing 
> authenticated HTTP frameworks to register
> I1007 14:12:57.984143 28648 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/rVbcaO/credentials'
> I1007 14:12:57.988487 28648 master.cpp:504] Using default 'crammd5' 
> authenticator
> I1007 14:12:57.988530 28648 http.cpp:883] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I1007 14:12:57.988585 28648 http.cpp:883] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I1007 14:12:57.988648 28648 http.cpp:883] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I1007 14:12:57.988680 28648 master.cpp:584] Authorization enabled
> I1007 14:12:57.988757 28650 whitelist_watcher.cpp:77] No 

[jira] [Updated] (MESOS-6365) Agent secrets for executor authentication and fetching

2016-10-12 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6365:
-
Summary: Agent secrets for executor authentication and fetching  (was: Pass 
credentials to the agent for executor authentication and fetching)

> Agent secrets for executor authentication and fetching
> --
>
> Key: MESOS-6365
> URL: https://issues.apache.org/jira/browse/MESOS-6365
> Project: Mesos
>  Issue Type: Epic
>  Components: slave
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> Three features are currently driving the need for a mechanism to pass 
> secrets/credentials from the master to the agent:
> * HTTP executor authentication
> * Container image fetching
> * Artifact fetching
> We currently provide the ability to authenticate with a Docker registry, but 
> the credentials used for this may only be set once on the agent via a 
> command-line flag. Allowing operators to specify a Docker credential on a 
> per-task basis requires a secret-passing mechanism.
> We should design and implement a method for passing secrets that will work in 
> all three of these scenarios.





[jira] [Commented] (MESOS-4217) Mesos sandbox UI doesn't follow symlinks

2016-10-12 Thread Mohit Soni (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570209#comment-15570209
 ] 

Mohit Soni commented on MESOS-4217:
---

[~zmanji] and [~haosd...@gmail.com] Yeah, this was fixed sometime earlier this 
year, although this issue was never closed. I'm going to close it now.

> Mesos sandbox UI doesn't follow symlinks
> 
>
> Key: MESOS-4217
> URL: https://issues.apache.org/jira/browse/MESOS-4217
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Mohit Soni
>Priority: Minor
>
> The current Mesos sandbox UI doesn't follow symlinks. Right now this prevents 
> a user from browsing a persistent volume that is symlinked inside the sandbox 
> directory.





[jira] [Updated] (MESOS-6363) Default executor should not crash with a failed assertion if it notices a disconnection from the agent for non checkpointed frameworks.

2016-10-12 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6363:
--
Target Version/s:   (was: 1.1.0)

> Default executor should not crash with a failed assertion if it notices a 
> disconnection from the agent for non checkpointed frameworks.
> ---
>
> Key: MESOS-6363
> URL: https://issues.apache.org/jira/browse/MESOS-6363
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> If the executor library detects a disconnection for non-checkpointed 
> frameworks, it injects a {{SHUTDOWN}} event. For checkpointed frameworks, it 
> injects the {{SHUTDOWN}} event after the recovery timeout. In both these 
> cases, the default executor would die with a failed assertion in the 
> {{shutdown()}} handler:
> {code}
> CHECK_EQ(SUBSCRIBED, state);
> {code}
> The executor should terminate itself in both these cases with a successful 
> status code.
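
One way to picture the requested behaviour is the reduced model below. It is a sketch under assumptions, not the default executor's code: `Executor`, `State` and the fields shown are hypothetical, and everything a real shutdown would do (killing tasks, sending updates) is omitted.

```cpp
enum class State { DISCONNECTED, CONNECTED, SUBSCRIBED };

// Hypothetical, heavily reduced executor: only the connection state and
// the exit status are modelled.
struct Executor {
  State state = State::DISCONNECTED;
  bool exited = false;
  int status = -1;

  // SHUTDOWN handler. Instead of asserting `state == SUBSCRIBED`, every
  // state is accepted, because the event may have been injected after a
  // disconnection from the agent.
  void shutdown() {
    if (state == State::SUBSCRIBED) {
      // A real executor would kill its tasks and send terminal status
      // updates here (omitted in this sketch).
    }
    exited = true;
    status = 0;  // Successful status code in all cases.
  }
};
```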





[jira] [Updated] (MESOS-6035) Add non-recursive version of cgroups::get

2016-10-12 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6035:
--
Shepherd: Yan Xu

> Add non-recursive version of cgroups::get
> -
>
> Key: MESOS-6035
> URL: https://issues.apache.org/jira/browse/MESOS-6035
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>Priority: Minor
>
> In some cases, we only need to get the top-level cgroups instead of getting 
> all cgroups recursively. Adding a non-recursive version could help avoid 
> traversing unnecessary paths.
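
The difference between the two traversals can be sketched with the C++17 filesystem library. `listCgroups` is a hypothetical helper written for this note and does not mirror the signature of `cgroups::get`; it simply lists the directories below a hierarchy root, either at the top level only or recursively.

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: returns the directories under `root`, relative to
// it and sorted. With `recursive == false` only the top level is visited.
std::vector<std::string> listCgroups(const fs::path& root, bool recursive) {
  std::vector<std::string> result;

  auto add = [&](const fs::directory_entry& entry) {
    if (entry.is_directory()) {
      result.push_back(entry.path().lexically_relative(root).generic_string());
    }
  };

  if (recursive) {
    for (const auto& entry : fs::recursive_directory_iterator(root)) add(entry);
  } else {
    for (const auto& entry : fs::directory_iterator(root)) add(entry);
  }

  std::sort(result.begin(), result.end());
  return result;
}
```

For a tree containing `a`, `a/b` and `c`, the non-recursive call returns only `a` and `c` and never descends into `a`.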





[jira] [Updated] (MESOS-2449) Support group of tasks (Pod) constructs and API in Mesos.

2016-10-12 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2449:
--
Shepherd: Vinod Kone

> Support group of tasks (Pod) constructs and API in Mesos.
> -
>
> Key: MESOS-2449
> URL: https://issues.apache.org/jira/browse/MESOS-2449
> Project: Mesos
>  Issue Type: Epic
>Reporter: Timothy Chen
>  Labels: mesosphere
>
> There is a common need among different frameworks to start a group of tasks 
> that either depend on each other or are co-located with each other.
> Although a framework can schedule individual tasks within the same offer and 
> slave id, it doesn't have a way to describe dependencies, failure policies 
> (if one of the tasks fails), network setup, group container information, 
> etc.
> I want to create an epic to start the discussion around the requirements folks 
> have, and see where we can lead this.





[jira] [Updated] (MESOS-6283) Fix the Web UI allowing access to the task sandbox for nested containers.

2016-10-12 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6283:
--
Shepherd: Vinod Kone

> Fix the Web UI allowing access to the task sandbox for nested containers.
> -
>
> Key: MESOS-6283
> URL: https://issues.apache.org/jira/browse/MESOS-6283
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Reporter: Anand Mazumdar
>Assignee: haosdent
>Priority: Blocker
>  Labels: mesosphere
> Attachments: sandbox.gif
>
>
> Currently, the sandbox button for a child task is broken on the WebUI. It 
> does nothing and dies with an error that the executor for this task cannot be 
> found. We need to fix the WebUI to follow the symlink "tasks/taskId" and 
> display the task sandbox to the users.





[jira] [Updated] (MESOS-5262) slave::statusUpdate() should always ACK the status update sent by executor

2016-10-12 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-5262:
--
Target Version/s:   (was: 1.1.0)

> slave::statusUpdate() should always ACK the status update sent by executor
> --
>
> Key: MESOS-5262
> URL: https://issues.apache.org/jira/browse/MESOS-5262
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Reporter: Qian Zhang
>  Labels: mesosphere
>
> In {{slave::statusUpdate()}}, there are several scenarios where the agent 
> ignores a status update sent by the executor and does not send an ACK to the 
> executor, e.g.:
> When the agent receives a {{ShutdownFrameworkMessage}} from the master, 
> {{slave::shutdownFramework()}} is called. In this method, we set the state 
> of the framework to {{TERMINATING}} and send a {{ShutdownExecutorMessage}} to 
> the executor. When the executor receives that message, it usually kills the 
> tasks and sends a {{TASK_KILLED}} status update to the agent. The agent 
> receives that update and handles it in {{slave::statusUpdate()}}, but 
> according to the current logic of {{slave::statusUpdate()}}, it ignores the 
> status update if the state of the framework is {{TERMINATING}}, so the agent 
> does not send an ACK to the executor.
> Since the executor may rely on the ACK (e.g., the HTTP command executor relies 
> on the ACK of a terminal status update to terminate itself), we should enhance 
> this so that the executor always gets an ACK for the status updates it sends.
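
The proposed change can be reduced to a small decision function. This is only one possible shape, sketched under assumptions: `FrameworkState`, `Result` and `handleStatusUpdate` are hypothetical names, and the real `statusUpdate()` handles many more cases.

```cpp
enum class FrameworkState { RUNNING, TERMINATING };

struct Result {
  bool forwarded;     // Update is passed on for normal processing.
  bool acknowledged;  // ACK is sent back to the executor.
};

// Hypothetical decision function for an incoming status update.
Result handleStatusUpdate(FrameworkState state) {
  if (state == FrameworkState::TERMINATING) {
    // The update itself is still ignored, but the executor gets its ACK
    // so that it can terminate.
    return Result{false, true};
  }
  return Result{true, true};
}
```

In both branches `acknowledged` is true, which is the property the issue asks for.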



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6290) Support nested containers for logger in Mesos Containerizer.

2016-10-12 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-6290:

Shepherd: Joseph Wu

> Support nested containers for logger in Mesos Containerizer.
> 
>
> Key: MESOS-6290
> URL: https://issues.apache.org/jira/browse/MESOS-6290
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: containerizer, logger, mesosphere
> Fix For: 1.1.0
>
>
> Currently, there are two issues in the Mesos containerizer when using the 
> logger for nested containers:
> 1. An empty ExecutorInfo is passed to the logger when launching a nested 
> container, which could break logger modules that try to access a required 
> proto field (e.g., executorId).
> 2. The logger does not yet recover the nested containers in 
> MesosContainerizer::recover.





[jira] [Commented] (MESOS-6323) 'mesos-containerizer launch' should inherit agent environment variables.

2016-10-12 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569941#comment-15569941
 ] 

Joseph Wu commented on MESOS-6323:
--

{code}
commit 8c8ec608503394575a4f99fd725010b8920e5efa
Author: Joseph Wu 
Date:   Wed Oct 12 11:57:18 2016 -0700

Windows: Implemented os::execvpe with _spawnvpe.

Review: https://reviews.apache.org/r/52798
{code}

> 'mesos-containerizer launch' should inherit agent environment variables.
> 
>
> Key: MESOS-6323
> URL: https://issues.apache.org/jira/browse/MESOS-6323
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>Priority: Critical
> Fix For: 1.1.0
>
>
> If some dynamic libraries that the agent depends on are stored in a 
> non-standard location, the operator may start the agent using 
> LD_LIBRARY_PATH. When we actually fork/exec the 'mesos-containerizer launch' 
> helper, we need to make sure it inherits the agent's environment variables. 
> Otherwise, it might throw linking errors. This makes sense because it's a 
> Mesos controlled process.
> However, when the helper actually fork/execs the user container (or 
> executor), we need to make sure to strip the agent environment variables.
> The tricky case is the default executor and the command executor. These two 
> are controlled by Mesos as well, so we also want them to have the agent 
> environment variables. We need to somehow distinguish this from the custom 
> executor case.
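
The policy described above can be written down as a small function. This is a sketch under assumptions rather than the Mesos code: `Child`, `Environment` and `inheritedEnvironment` are hypothetical names, and only the inherited part of the environment is modelled.

```cpp
#include <map>
#include <string>

using Environment = std::map<std::string, std::string>;

// Kinds of child process mentioned in the issue description.
enum class Child {
  LAUNCH_HELPER,     // 'mesos-containerizer launch'
  DEFAULT_EXECUTOR,
  COMMAND_EXECUTOR,
  CUSTOM_EXECUTOR,
  USER_CONTAINER
};

// Returns the part of the agent's environment the child should inherit.
Environment inheritedEnvironment(Child child, const Environment& agentEnv) {
  switch (child) {
    case Child::LAUNCH_HELPER:
    case Child::DEFAULT_EXECUTOR:
    case Child::COMMAND_EXECUTOR:
      return agentEnv;  // Mesos-controlled: keep e.g. LD_LIBRARY_PATH.
    case Child::CUSTOM_EXECUTOR:
    case Child::USER_CONTAINER:
      return Environment();  // User-controlled: strip the agent's variables.
  }
  return Environment();
}
```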





[jira] [Commented] (MESOS-6332) Don't send TASK_LOST in the agent

2016-10-12 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569672#comment-15569672
 ] 

Neil Conway commented on MESOS-6332:


https://reviews.apache.org/r/52803/

> Don't send TASK_LOST in the agent
> -
>
> Key: MESOS-6332
> URL: https://issues.apache.org/jira/browse/MESOS-6332
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> The agent sends {{TASK_LOST}} to handle various error situations. For 
> partition-aware frameworks, we should not send {{TASK_LOST}} -- we should 
> send a more specific {{TaskState}}, depending on the exact circumstances.
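
A rough sketch of what "a more specific TaskState" could look like in code. The state names other than `TASK_LOST` (`TASK_DROPPED`, `TASK_GONE`), the `Failure` enum and the mapping between them are assumptions made for this example and are not taken from this thread.

```cpp
// Assumed subset of task states; only TASK_LOST appears in the issue text.
enum class TaskState { TASK_LOST, TASK_DROPPED, TASK_GONE };

// Hypothetical classification of the error situation on the agent.
enum class Failure { NEVER_LAUNCHED, NO_LONGER_RUNNING };

// Frameworks that are not partition-aware keep receiving TASK_LOST;
// partition-aware frameworks get a state that reflects what happened.
TaskState terminalState(bool partitionAware, Failure failure) {
  if (!partitionAware) {
    return TaskState::TASK_LOST;
  }
  switch (failure) {
    case Failure::NEVER_LAUNCHED:
      return TaskState::TASK_DROPPED;
    case Failure::NO_LONGER_RUNNING:
      return TaskState::TASK_GONE;
  }
  return TaskState::TASK_LOST;
}
```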





[jira] [Comment Edited] (MESOS-6378) Error downloading docker images using mesos-execute cli

2016-10-12 Thread Aniket Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569460#comment-15569460
 ] 

Aniket Bhat edited comment on MESOS-6378 at 10/12/16 6:27 PM:
--

More verbose logging for the issue at hand.
{code}
[root@host-62-214 mesos]# sudo mesos-execute --command=/bin/bash 
--docker_image=ubuntu:latest --master=172.22.62.215:5050 --name="yeah" 
--containerizer=mesos
I1012 18:35:53.960327 27742 scheduler.cpp:172] Version: 1.0.1
I1012 18:35:53.964630 27748 scheduler.cpp:461] New master detected at 
master@172.22.62.215:5050
Subscribed with ID 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-0041'
Submitted task 'yeah' to agent 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-S11'
Received status update TASK_FAILED for task 'yeah'
  message: 'Failed to launch container: Failed to decode HTTP responses: No 
response decoded
HTTP/1.1 200 Connection established

HTTP/1.1 401 Unauthorized
Content-Type: application/json; charset=utf-8
Docker-Distribution-Api-Version: registry/2.0
Www-Authenticate: Bearer 
realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/ubuntu:pull;
Date: Wed, 12 Oct 2016 18:25:40 GMT
Content-Length: 146
Strict-Transport-Security: max-age=31536000

{"errors":[{"code":"UNAUTHORIZED","message":"authentication 
required","detail":[{"Type":"repository","Name":"library/ubuntu","Action":"pull"}]}]}
; Container destroyed while provisioning images'
  source: SOURCE_AGENT
  reason: REASON_CONTAINER_LAUNCH_FAILED
{code}
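
For readers unfamiliar with the 401 response in the log above: the `Www-Authenticate` header tells the client where to obtain a bearer token (`realm`) and for which `service` and `scope`. The sketch below only shows how such a header could be split into its parameters. `parseBearerChallenge` is a hypothetical helper, not a Mesos function, and it assumes double-quoted values without escaped quotes.

```cpp
#include <map>
#include <string>

// Parses `Bearer key="value",key="value"` into a key/value map.
// Returns an empty map for any other authentication scheme.
std::map<std::string, std::string> parseBearerChallenge(
    const std::string& header) {
  std::map<std::string, std::string> params;
  const std::string scheme = "Bearer ";
  if (header.compare(0, scheme.size(), scheme) != 0) {
    return params;
  }

  std::size_t pos = scheme.size();
  while (pos < header.size()) {
    std::size_t eq = header.find("=\"", pos);
    if (eq == std::string::npos) break;
    std::size_t end = header.find('"', eq + 2);
    if (end == std::string::npos) break;  // Unterminated value: stop.

    params[header.substr(pos, eq - pos)] =
        header.substr(eq + 2, end - (eq + 2));

    // Skip the closing quote and any separators before the next key.
    pos = end + 1;
    while (pos < header.size() && (header[pos] == ',' || header[pos] == ' ')) {
      ++pos;
    }
  }
  return params;
}
```

Given a fully quoted version of the header from the log, the result contains `realm`, `service` and `scope`; a client would then request a token from the realm URL before retrying the pull.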


was (Author: abhat):
More verbose logging for the issue at hand.

> Error downloading docker images using mesos-execute cli
> ---
>
> Key: MESOS-6378
> URL: https://issues.apache.org/jira/browse/MESOS-6378
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, fetcher
>Affects Versions: 1.0.1
>Reporter: Aniket Bhat
> Attachments: mesos-slave.INFO, mesos-slave.log
>
>
> When using mesos-execute cli with mesos containerizer to spawn a docker 
> image, the curl for the image fails. 
> {code}
> [root@host-62-214 mesos]#  sudo mesos-execute --command=/bin/bash 
> --docker_image=library/ubuntu:latest --master=172.22.62.215:5050 
> --name="yeah" --containerizer=mesos
> I1012 16:30:15.426878 15457 scheduler.cpp:172] Version: 1.0.1
> I1012 16:30:15.430287 15462 scheduler.cpp:461] New master detected at 
> master@172.22.62.215:5050
> Subscribed with ID 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-0036'
> Submitted task 'yeah' to agent 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-S10'
> Received status update TASK_FAILED for task 'yeah'
>   message: 'Failed to launch container: Failed to perform 'curl': curl: (7) 
> Failed connect to registry-1.docker.io:443; Operation now in progress
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
> Mesos-slave args:
> {code}
> /usr/sbin/mesos-slave --master=zk://172.22.62.215:2181/mesos 
> --log_dir=/var/log/mesos --containerizers=mesos,docker 
> --executor_registration_timeout=5mins --image_providers=appc,docker 
> --isolation=filesystem/linux,docker/runtime --work_dir=/var/lib/mesos
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6378) Error downloading docker images using mesos-execute cli

2016-10-12 Thread Aniket Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Bhat updated MESOS-6378:
---
Attachment: mesos-slave.log

More verbose logging for the issue at hand.

> Error downloading docker images using mesos-execute cli
> ---
>
> Key: MESOS-6378
> URL: https://issues.apache.org/jira/browse/MESOS-6378
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, fetcher
>Affects Versions: 1.0.1
>Reporter: Aniket Bhat
> Attachments: mesos-slave.INFO, mesos-slave.log
>
>
> When using mesos-execute cli with mesos containerizer to spawn a docker 
> image, the curl for the image fails. 
> {code}
> [root@host-62-214 mesos]#  sudo mesos-execute --command=/bin/bash 
> --docker_image=library/ubuntu:latest --master=172.22.62.215:5050 
> --name="yeah" --containerizer=mesos
> I1012 16:30:15.426878 15457 scheduler.cpp:172] Version: 1.0.1
> I1012 16:30:15.430287 15462 scheduler.cpp:461] New master detected at 
> master@172.22.62.215:5050
> Subscribed with ID 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-0036'
> Submitted task 'yeah' to agent 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-S10'
> Received status update TASK_FAILED for task 'yeah'
>   message: 'Failed to launch container: Failed to perform 'curl': curl: (7) 
> Failed connect to registry-1.docker.io:443; Operation now in progress
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
> Mesos-slave args:
> {code}
> /usr/sbin/mesos-slave --master=zk://172.22.62.215:2181/mesos 
> --log_dir=/var/log/mesos --containerizers=mesos,docker 
> --executor_registration_timeout=5mins --image_providers=appc,docker 
> --isolation=filesystem/linux,docker/runtime --work_dir=/var/lib/mesos
> {code}





[jira] [Comment Edited] (MESOS-6237) Agent Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6

2016-10-12 Thread Benno Evers (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569389#comment-15569389
 ] 

Benno Evers edited comment on MESOS-6237 at 10/12/16 5:46 PM:
--

So, one place that definitely needs to be fixed is in master/http/http.cpp:

Try<string> hostname = info.has_hostname()
  ? info.hostname()
  : net::getHostname(net::IP(ntohl(info.ip())));

However, this shouldn't affect the agent display if I understand the code 
correctly.

Can I ask how you are getting a raw IP displayed in the mesos UI anyways? I 
found it hard to start an agent for testing purposes without mesos figuring out 
the hostname automatically, 



was (Author: bennoe):
Hm, one place that definitely needs to be fixed is in master/http/http.cpp:

Try<string> hostname = info.has_hostname()
  ? info.hostname()
  : net::getHostname(net::IP(ntohl(info.ip())));

However, this shouldn't affect the agent display if I understand the code 
correctly.

Can I ask how you are getting a raw IP displayed in the mesos UI anyways? I 
found it hard to start an agent for testing purposes without mesos figuring out 
the hostname automatically, 


> Agent Sandbox inaccessible when using IPv6 address in patch from 
> https://github.com/lava/mesos/tree/bennoe/ipv6
> ---
>
> Key: MESOS-6237
> URL: https://issues.apache.org/jira/browse/MESOS-6237
> Project: Mesos
>  Issue Type: Bug
>Reporter: Lukas Loesche
>Assignee: Benno Evers
>
> Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 
> 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9
> When using IPs instead of hostnames the Agent Sandbox is inaccessible in the 
> Web UI. The problem seems to be that there's no brackets around the IP so it 
> tries to access e.g. http://2001:41d0:1000:ab9:::5051 instead of 
> http://[2001:41d0:1000:ab9::]:5051
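
The bracket rule described in the report can be sketched as follows. This is an illustration only, not the Web UI code: the helper name format_authority is hypothetical, and the sketch relies on Python's standard ipaddress module to tell IPv6 literals apart from IPv4 addresses and hostnames.

```python
import ipaddress


def format_authority(host, port):
    """Return `host:port`, wrapping the host in brackets if it is an IPv6 literal."""
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not an IP literal at all, so treat it as a hostname.
        is_v6 = False
    if is_v6:
        return "[%s]:%d" % (host, port)
    return "%s:%d" % (host, port)


# The address from the report, plus an IPv4 address and a hostname for contrast.
for host in ("2001:41d0:1000:ab9::", "172.22.62.215", "agent.example.com"):
    print("http://" + format_authority(host, 5051))
```

Without the brackets, the last colon-separated group of the address is indistinguishable from the port, which matches the broken URL shown in the description.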





[jira] [Commented] (MESOS-6237) Agent Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6

2016-10-12 Thread Benno Evers (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569389#comment-15569389
 ] 

Benno Evers commented on MESOS-6237:


Hm, one place that definitely needs to be fixed is in master/http/http.cpp:

Try<string> hostname = info.has_hostname()
  ? info.hostname()
  : net::getHostname(net::IP(ntohl(info.ip())));

However, this shouldn't affect the agent display if I understand the code 
correctly.

Can I ask how you are getting a raw IP displayed in the mesos UI anyways? I 
found it hard to start an agent for testing purposes without mesos figuring out 
the hostname automatically, 


> Agent Sandbox inaccessible when using IPv6 address in patch from 
> https://github.com/lava/mesos/tree/bennoe/ipv6
> ---
>
> Key: MESOS-6237
> URL: https://issues.apache.org/jira/browse/MESOS-6237
> Project: Mesos
>  Issue Type: Bug
>Reporter: Lukas Loesche
>Assignee: Benno Evers
>
> Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 
> 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9
> When using IPs instead of hostnames the Agent Sandbox is inaccessible in the 
> Web UI. The problem seems to be that there's no brackets around the IP so it 
> tries to access e.g. http://2001:41d0:1000:ab9:::5051 instead of 
> http://[2001:41d0:1000:ab9::]:5051





[jira] [Comment Edited] (MESOS-6140) Add a parallel test runner

2016-10-12 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569352#comment-15569352
 ] 

Till Toenshoff edited comment on MESOS-6140 at 10/12/16 5:45 PM:
-

Can we please add some more documentation around this? I mean specifically for 
manually invoking it -- e.g. people trying to run the tests in parallel outside 
of make check. Hope that makes sense...



was (Author: tillt):
Can we please add some more documentation around this?


> Add a parallel test runner
> --
>
> Key: MESOS-6140
> URL: https://issues.apache.org/jira/browse/MESOS-6140
> Project: Mesos
>  Issue Type: Improvement
>  Components: tests
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>
> In order to allow parallelization of the test execution we should add a 
> parallel test executor to Mesos, and subsequently activate it in the build 
> setup.





[jira] [Commented] (MESOS-6238) SSL / libevent support broken in IPv6 patch from https://github.com/lava/mesos/tree/bennoe/ipv6

2016-10-12 Thread Benno Evers (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569370#comment-15569370
 ] 

Benno Evers commented on MESOS-6238:


Hm, the `url` seems pretty random. I can't remember putting it there for a 
specific reason, so I guess it's a merge artifact from a previous revision.

I pushed a new commit to GitHub (d2d122ab057c93e9136577db5030f9976eb623c3) 
which fixes this issue; at least for me, Mesos now builds with --enable-ssl on 
Ubuntu trusty and xenial.

> SSL / libevent support broken in IPv6 patch from 
> https://github.com/lava/mesos/tree/bennoe/ipv6
> ---
>
> Key: MESOS-6238
> URL: https://issues.apache.org/jira/browse/MESOS-6238
> Project: Mesos
>  Issue Type: Bug
>Reporter: Lukas Loesche
>Assignee: Benno Evers
>
> Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 
> 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9 
> make fails when configure options --enable-ssl --enable-libevent were given.
> Error message:
> {noformat}
> ...
> ...
> ../../../3rdparty/libprocess/src/process.cpp: In member function ‘void 
> process::SocketManager::link_connect(const process::Future&, 
> process::network::Socket, const process::UPID&)’:
> ../../../3rdparty/libprocess/src/process.cpp:1457:25: error: ‘url’ was not 
> declared in this scope
>Try ip = url.ip;
>  ^
> Makefile:997: recipe for target 'libprocess_la-process.lo' failed
> make[5]: *** [libprocess_la-process.lo] Error 1
> ...
> ...
> {noformat}





[jira] [Updated] (MESOS-6378) Error downloading docker images using mesos-execute cli

2016-10-12 Thread Aniket Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Bhat updated MESOS-6378:
---
Attachment: mesos-slave.INFO

Uploading the mesos-slave.INFO file with GLOG_v set to 1 for the 
mesos-slave/agent.

> Error downloading docker images using mesos-execute cli
> ---
>
> Key: MESOS-6378
> URL: https://issues.apache.org/jira/browse/MESOS-6378
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, fetcher
>Affects Versions: 1.0.1
>Reporter: Aniket Bhat
> Attachments: mesos-slave.INFO
>
>
> When using mesos-execute cli with mesos containerizer to spawn a docker 
> image, the curl for the image fails. 
> {code}
> [root@host-62-214 mesos]#  sudo mesos-execute --command=/bin/bash 
> --docker_image=library/ubuntu:latest --master=172.22.62.215:5050 
> --name="yeah" --containerizer=mesos
> I1012 16:30:15.426878 15457 scheduler.cpp:172] Version: 1.0.1
> I1012 16:30:15.430287 15462 scheduler.cpp:461] New master detected at 
> master@172.22.62.215:5050
> Subscribed with ID 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-0036'
> Submitted task 'yeah' to agent 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-S10'
> Received status update TASK_FAILED for task 'yeah'
>   message: 'Failed to launch container: Failed to perform 'curl': curl: (7) 
> Failed connect to registry-1.docker.io:443; Operation now in progress
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
> Mesos-slave args:
> {code}
> /usr/sbin/mesos-slave --master=zk://172.22.62.215:2181/mesos 
> --log_dir=/var/log/mesos --containerizers=mesos,docker 
> --executor_registration_timeout=5mins --image_providers=appc,docker 
> --isolation=filesystem/linux,docker/runtime --work_dir=/var/lib/mesos
> {code}





[jira] [Updated] (MESOS-6378) Error downloading docker images using mesos-execute cli

2016-10-12 Thread Aniket Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Bhat updated MESOS-6378:
---
Description: 
When using mesos-execute cli with mesos containerizer to spawn a docker image, 
the curl for the image fails. 

{code}
[root@host-62-214 mesos]#  sudo mesos-execute --command=/bin/bash 
--docker_image=library/ubuntu:latest --master=172.22.62.215:5050 --name="yeah" 
--containerizer=mesos
I1012 16:30:15.426878 15457 scheduler.cpp:172] Version: 1.0.1
I1012 16:30:15.430287 15462 scheduler.cpp:461] New master detected at 
master@172.22.62.215:5050
Subscribed with ID 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-0036'
Submitted task 'yeah' to agent 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-S10'
Received status update TASK_FAILED for task 'yeah'
  message: 'Failed to launch container: Failed to perform 'curl': curl: (7) 
Failed connect to registry-1.docker.io:443; Operation now in progress
; Container destroyed while provisioning images'
  source: SOURCE_AGENT
  reason: REASON_CONTAINER_LAUNCH_FAILED
{code}

Mesos-slave args:

{code}
/usr/sbin/mesos-slave --master=zk://172.22.62.215:2181/mesos 
--log_dir=/var/log/mesos --containerizers=mesos,docker 
--executor_registration_timeout=5mins --image_providers=appc,docker 
--isolation=filesystem/linux,docker/runtime --work_dir=/var/lib/mesos
{code}


  was:
When using mesos-execute cli with mesos containerizer to spawn a docker image, 
the curl for the image fails. 

{code}
[root@host-62-214 mesos]#  sudo mesos-execute --command=/bin/bash 
--docker_image=library/ubuntu:latest --master=172.22.62.215:5050 --name="yeah" 
--containerizer=mesos
I1012 16:30:15.426878 15457 scheduler.cpp:172] Version: 1.0.1
I1012 16:30:15.430287 15462 scheduler.cpp:461] New master detected at 
master@172.22.62.215:5050
Subscribed with ID 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-0036'
Submitted task 'yeah' to agent 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-S10'
Received status update TASK_FAILED for task 'yeah'
  message: 'Failed to launch container: Failed to perform 'curl': curl: (7) 
Failed connect to registry-1.docker.io:443; Operation now in progress
; Container destroyed while provisioning images'
  source: SOURCE_AGENT
  reason: REASON_CONTAINER_LAUNCH_FAILED
{code}



> Error downloading docker images using mesos-execute cli
> ---
>
> Key: MESOS-6378
> URL: https://issues.apache.org/jira/browse/MESOS-6378
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, fetcher
>Affects Versions: 1.0.1
>Reporter: Aniket Bhat
>
> When using mesos-execute cli with mesos containerizer to spawn a docker 
> image, the curl for the image fails. 
> {code}
> [root@host-62-214 mesos]#  sudo mesos-execute --command=/bin/bash 
> --docker_image=library/ubuntu:latest --master=172.22.62.215:5050 
> --name="yeah" --containerizer=mesos
> I1012 16:30:15.426878 15457 scheduler.cpp:172] Version: 1.0.1
> I1012 16:30:15.430287 15462 scheduler.cpp:461] New master detected at 
> master@172.22.62.215:5050
> Subscribed with ID 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-0036'
> Submitted task 'yeah' to agent 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-S10'
> Received status update TASK_FAILED for task 'yeah'
>   message: 'Failed to launch container: Failed to perform 'curl': curl: (7) 
> Failed connect to registry-1.docker.io:443; Operation now in progress
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
> Mesos-slave args:
> {code}
> /usr/sbin/mesos-slave --master=zk://172.22.62.215:2181/mesos 
> --log_dir=/var/log/mesos --containerizers=mesos,docker 
> --executor_registration_timeout=5mins --image_providers=appc,docker 
> --isolation=filesystem/linux,docker/runtime --work_dir=/var/lib/mesos
> {code}





[jira] [Updated] (MESOS-6378) Error downloading docker images using mesos-execute cli

2016-10-12 Thread Aniket Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Bhat updated MESOS-6378:
---
Description: 
When using mesos-execute cli with mesos containerizer to spawn a docker image, 
the curl for the image fails. 

{code}
[root@host-62-214 mesos]#  sudo mesos-execute --command=/bin/bash 
--docker_image=library/ubuntu:latest --master=172.22.62.215:5050 --name="yeah" 
--containerizer=mesos
I1012 16:30:15.426878 15457 scheduler.cpp:172] Version: 1.0.1
I1012 16:30:15.430287 15462 scheduler.cpp:461] New master detected at 
master@172.22.62.215:5050
Subscribed with ID 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-0036'
Submitted task 'yeah' to agent 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-S10'
Received status update TASK_FAILED for task 'yeah'
  message: 'Failed to launch container: Failed to perform 'curl': curl: (7) 
Failed connect to registry-1.docker.io:443; Operation now in progress
; Container destroyed while provisioning images'
  source: SOURCE_AGENT
  reason: REASON_CONTAINER_LAUNCH_FAILED
{code}


  was:
When using mesos-execute cli with mesos containerizer to spawn a docker image, 
the curl for the image fails. 

```
[root@host-62-214 mesos]#  sudo mesos-execute --command=/bin/bash 
--docker_image=library/ubuntu:latest --master=172.22.62.215:5050 --name="yeah" 
--containerizer=mesos
I1012 16:30:15.426878 15457 scheduler.cpp:172] Version: 1.0.1
I1012 16:30:15.430287 15462 scheduler.cpp:461] New master detected at 
master@172.22.62.215:5050
Subscribed with ID 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-0036'
Submitted task 'yeah' to agent 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-S10'
Received status update TASK_FAILED for task 'yeah'
  message: 'Failed to launch container: Failed to perform 'curl': curl: (7) 
Failed connect to registry-1.docker.io:443; Operation now in progress
; Container destroyed while provisioning images'
  source: SOURCE_AGENT
  reason: REASON_CONTAINER_LAUNCH_FAILED
```


> Error downloading docker images using mesos-execute cli
> ---
>
> Key: MESOS-6378
> URL: https://issues.apache.org/jira/browse/MESOS-6378
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, fetcher
>Affects Versions: 1.0.1
>Reporter: Aniket Bhat
>
> When using mesos-execute cli with mesos containerizer to spawn a docker 
> image, the curl for the image fails. 
> {code}
> [root@host-62-214 mesos]#  sudo mesos-execute --command=/bin/bash 
> --docker_image=library/ubuntu:latest --master=172.22.62.215:5050 
> --name="yeah" --containerizer=mesos
> I1012 16:30:15.426878 15457 scheduler.cpp:172] Version: 1.0.1
> I1012 16:30:15.430287 15462 scheduler.cpp:461] New master detected at 
> master@172.22.62.215:5050
> Subscribed with ID 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-0036'
> Submitted task 'yeah' to agent 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-S10'
> Received status update TASK_FAILED for task 'yeah'
>   message: 'Failed to launch container: Failed to perform 'curl': curl: (7) 
> Failed connect to registry-1.docker.io:443; Operation now in progress
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}





[jira] [Created] (MESOS-6378) Error downloading docker images using mesos-execute cli

2016-10-12 Thread Aniket Bhat (JIRA)
Aniket Bhat created MESOS-6378:
--

 Summary: Error downloading docker images using mesos-execute cli
 Key: MESOS-6378
 URL: https://issues.apache.org/jira/browse/MESOS-6378
 Project: Mesos
  Issue Type: Bug
  Components: docker, fetcher
Affects Versions: 1.0.1
Reporter: Aniket Bhat


When using mesos-execute cli with mesos containerizer to spawn a docker image, 
the curl for the image fails. 

```
[root@host-62-214 mesos]#  sudo mesos-execute --command=/bin/bash 
--docker_image=library/ubuntu:latest --master=172.22.62.215:5050 --name="yeah" 
--containerizer=mesos
I1012 16:30:15.426878 15457 scheduler.cpp:172] Version: 1.0.1
I1012 16:30:15.430287 15462 scheduler.cpp:461] New master detected at 
master@172.22.62.215:5050
Subscribed with ID 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-0036'
Submitted task 'yeah' to agent 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-S10'
Received status update TASK_FAILED for task 'yeah'
  message: 'Failed to launch container: Failed to perform 'curl': curl: (7) 
Failed connect to registry-1.docker.io:443; Operation now in progress
; Container destroyed while provisioning images'
  source: SOURCE_AGENT
  reason: REASON_CONTAINER_LAUNCH_FAILED
```





[jira] [Updated] (MESOS-6377) Complete all unit-tests required to strengthen test on CNI port-mapper plugin.

2016-10-12 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-6377:
-
Epic Name: Complete the CNI port-mapper plugin  (was: Unit-tests for CNI 
port-mapper plugin)

> Complete all unit-tests required to strengthen test on CNI port-mapper plugin.
> --
>
> Key: MESOS-6377
> URL: https://issues.apache.org/jira/browse/MESOS-6377
> Project: Mesos
>  Issue Type: Epic
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> This epic captures all the unit-test tickets that we need to complete to get 
> better test-coverage for the CNI port-mapper plugin.





[jira] [Updated] (MESOS-6022) unit-test for adding port-mapping using ptp plugin

2016-10-12 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-6022:
-
Summary: unit-test for adding port-mapping using ptp plugin  (was: 
unit-test for the port mapper plugin)

> unit-test for adding port-mapping using ptp plugin
> --
>
> Key: MESOS-6022
> URL: https://issues.apache.org/jira/browse/MESOS-6022
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Write unit-tests for the port mapper plugin.





[jira] [Created] (MESOS-6377) Complete all unit-tests required to strengthen test on CNI port-mapper plugin.

2016-10-12 Thread Avinash Sridharan (JIRA)
Avinash Sridharan created MESOS-6377:


 Summary: Complete all unit-tests required to strengthen test on 
CNI port-mapper plugin.
 Key: MESOS-6377
 URL: https://issues.apache.org/jira/browse/MESOS-6377
 Project: Mesos
  Issue Type: Epic
 Environment: Linux
Reporter: Avinash Sridharan
Assignee: Avinash Sridharan


This epic captures all the unit-test tickets that we need to complete to get 
better test-coverage for the CNI port-mapper plugin.





[jira] [Commented] (MESOS-6312) Add requirement in upgrade.md and getting-started.md for agent '--runtime_dir' in when running as non-root

2016-10-12 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569119#comment-15569119
 ] 

haosdent commented on MESOS-6312:
-

It would be something like 
{code}
diff --git a/bin/mesos-local-flags.sh.in b/bin/mesos-local-flags.sh.in
index 5b4553a..97828ea 100644
--- a/bin/mesos-local-flags.sh.in
+++ b/bin/mesos-local-flags.sh.in
@@ -17,6 +17,7 @@
 # limitations under the License.

 export MESOS_WORK_DIR=/tmp/mesos
+export MESOS_RUNTIME_DIR=/tmp/mesos

 . @abs_top_builddir@/bin/mesos-master-flags.sh
 . @abs_top_builddir@/bin/mesos-agent-flags.sh
diff --git a/src/local/flags.hpp b/src/local/flags.hpp
index c77eff1..bf815a9 100644
--- a/src/local/flags.hpp
+++ b/src/local/flags.hpp
@@ -51,6 +51,14 @@ public:
 "(Example: `/var/lib/mesos`)",
 path::join(os::temp(), "mesos", "local"));

+add(::runtime_dir,
+"runtime_dir",
+"Path of the agent runtime directory. This is where runtime data\n"
+"is stored by an agent that it needs to persist across crashes (but\n"
+"not across reboots). This directory will be cleared on reboot.\n"
+"(Example: `/var/run/mesos`)",
+path::join(os::temp(), "mesos", "local"));
+
{code}

> Add requirement in upgrade.md and getting-started.md for agent 
> '--runtime_dir' in when running as non-root
> --
>
> Key: MESOS-6312
> URL: https://issues.apache.org/jira/browse/MESOS-6312
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Blocker
>
> We recently introduced a new agent flag for {{\-\-runtime_dir}}. Unlike the 
> {{\-\-work_dir}}, this directory is designed to hold the state of a running 
> agent between subsequent agent-restarts (but not across host reboots).
> By default, this flag is set to {{/var/run/mesos}} since this is a {{tmpfs}} 
> on linux that gets automatically cleaned up on reboot. However, on most 
> systems {{/var/run/mesos}} is only writable by root, causing problems when 
> launching an agent as non-root and not pointing {{--runtime_dir}} to a 
> different location.
> We need to call this out in the upgrade.md and getting-started.md docs so 
> that people know they may need to set this going forward.
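
As an illustration of the check an operator could run before starting an agent as non-root, here is a minimal sketch. It is not part of Mesos: the helper name pick_runtime_dir and the fallback location under the system temp directory are assumptions made for this example; only the --runtime_dir flag and its /var/run/mesos default come from the description above.

```python
import os
import tempfile


def pick_runtime_dir(preferred="/var/run/mesos", fallback=None):
    """Return `preferred` if the current user could create or write it, else a fallback."""
    probe = os.path.abspath(preferred)
    # Walk up to the nearest existing ancestor: that is where mkdir would happen.
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:
            break
        probe = parent
    if os.access(probe, os.W_OK | os.X_OK):
        return preferred
    # Hypothetical fallback; any directory writable by the agent user would do.
    return fallback or os.path.join(tempfile.gettempdir(), "mesos", "runtime")


# Print the flag a non-root launch might need on this machine.
print("--runtime_dir=" + pick_runtime_dir())
```

Note that a fallback under the temp directory is only a convenience for local testing; unlike a tmpfs, it is not guaranteed to be cleared on reboot.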





[jira] [Updated] (MESOS-6014) Create a CNI plugin that provides port mapping functionality for various CNI plugins.

2016-10-12 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-6014:
-
Target Version/s: 1.1.0  (was: 1.2.0)

> Create a CNI plugin that provides port mapping functionality for various CNI 
> plugins.
> -
>
> Key: MESOS-6014
> URL: https://issues.apache.org/jira/browse/MESOS-6014
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Currently there is no CNI plugin that supports port mapping. Given that the 
> unified containerizer is starting to become the de-facto container run time, 
> having a CNI plugin that provides port mapping is a must-have. This is 
> primarily required to support BRIDGE networking mode, similar to the docker 
> bridge networking that users expect to have when using docker containers. 
> While the most obvious use case is that of using the port-mapper plugin with 
> the bridge plugin, the port-mapping functionality itself is generic and 
> should be usable with any CNI plugin that needs it.
> Keeping port-mapping as a CNI plugin gives operators the ability to use the 
> default port-mapper (CNI plugin) that Mesos provides, or use their own plugin.





[jira] [Assigned] (MESOS-6312) Add requirement in upgrade.md and getting-started.md for agent '--runtime_dir' in when running as non-root

2016-10-12 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues reassigned MESOS-6312:
--

Assignee: Kevin Klues

> Add requirement in upgrade.md and getting-started.md for agent 
> '--runtime_dir' in when running as non-root
> --
>
> Key: MESOS-6312
> URL: https://issues.apache.org/jira/browse/MESOS-6312
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Blocker
>
> We recently introduced a new agent flag for {{\-\-runtime_dir}}. Unlike the 
> {{\-\-work_dir}}, this directory is designed to hold the state of a running 
> agent between subsequent agent-restarts (but not across host reboots).
> By default, this flag is set to {{/var/run/mesos}} since this is a {{tmpfs}} 
> on linux that gets automatically cleaned up on reboot. However, on most 
> systems {{/var/run/mesos}} is only writable by root, causing problems when 
> launching an agent as non-root and not pointing {{--runtime_dir}} to a 
> different location.
> We need to call this out in the upgrade.md and getting-started.md docs so 
> that people know they may need to set this going forward.





[jira] [Commented] (MESOS-6312) Add requirement in upgrade.md and getting-started.md for agent '--runtime_dir' in when running as non-root

2016-10-12 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569031#comment-15569031
 ] 

Kevin Klues commented on MESOS-6312:


I just assigned it to myself and will do it today.

> Add requirement in upgrade.md and getting-started.md for agent 
> '--runtime_dir' in when running as non-root
> --
>
> Key: MESOS-6312
> URL: https://issues.apache.org/jira/browse/MESOS-6312
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Blocker
>
> We recently introduced a new agent flag for {{\-\-runtime_dir}}. Unlike the 
> {{\-\-work_dir}}, this directory is designed to hold the state of a running 
> agent between subsequent agent-restarts (but not across host reboots).
> By default, this flag is set to {{/var/run/mesos}} since this is a {{tmpfs}} 
> on linux that gets automatically cleaned up on reboot. However, on most 
> systems {{/var/run/mesos}} is only writable by root, causing problems when 
> launching an agent as non-root and not pointing {{--runtime_dir}} to a 
> different location.
> We need to call this out in the upgrade.md and getting-started.md docs so 
> that people know they may need to set this going forward.





[jira] [Created] (MESOS-6376) Add documentation for capabilities support of the mesos containerizer

2016-10-12 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-6376:
---

 Summary: Add documentation for capabilities support of the mesos 
containerizer
 Key: MESOS-6376
 URL: https://issues.apache.org/jira/browse/MESOS-6376
 Project: Mesos
  Issue Type: Task
  Components: containerization
Reporter: Benjamin Bannier
Assignee: Benjamin Bannier








[jira] [Commented] (MESOS-5384) Improve error message for missing resources file

2016-10-12 Thread John Yost (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568676#comment-15568676
 ] 

John Yost commented on MESOS-5384:
--

Hi Everyone,

From a user's standpoint, the key is better error handling. If there is a
place within the logic to determine if the json loading fails due to a file
path issue, that would be all that's needed.

Thanks for looking into this!

--John




> Improve error message for missing resources file
> 
>
> Key: MESOS-5384
> URL: https://issues.apache.org/jira/browse/MESOS-5384
> Project: Mesos
>  Issue Type: Bug
>  Components: general
>Affects Versions: 0.28.1
> Environment: Centos 7
>Reporter: John Yost
>Assignee: Kris Paprocki
>Priority: Minor
>  Labels: easyfix, newbie
>
> Attempting to specify resources file via 
> --resources=/etc/mesos-slave/small-slave-config.json threw the following 
> error:
> Failed to determine slave resources: Bad value for resources, missing or 
> extra ':' in /etc/mesos-slave/small-slave-config.json
> I confirmed I had valid JSON: 
> [
>   {
> "name": "cpus",
> "type": "SCALAR",
> "scalar": {
>   "value": 0.5
> }
>   },
>   {
> "name": "mem",
> "type": "SCALAR",
> "scalar": {
>   "value": 512
> }
>   }
> ]
> In actuality, I misread the docs on the file pattern. Once I changed to 
> resources=file:///etc/mesos-slave/small-slave-config.json the mesos slave 
> started up fine. Just need a missing file check and corresponding error 
> message to fix this.
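
The missing-file check requested here could look roughly like the sketch below. It only illustrates the idea and is not the Mesos flag parser: the function name diagnose_resources_flag and the wording of the hints are made up for this example, and the only facts taken from the report are that a bare path was rejected with a confusing message while the same path with a file:// prefix worked.

```python
import os


def diagnose_resources_flag(value):
    """Return a human-readable hint for a suspicious --resources value, else None."""
    prefix = "file://"
    if value.startswith(prefix):
        path = value[len(prefix):]
        if not os.path.isfile(path):
            return "resources file not found: %s" % path
        return None
    if value.startswith("/"):
        # A bare absolute path is almost certainly a forgotten file:// prefix.
        return "'%s' looks like a file path; did you mean 'file://%s'?" % (
            value, value)
    # Anything else is left to the regular resources parser.
    return None


print(diagnose_resources_flag("/etc/mesos-slave/small-slave-config.json"))
```

Running such a check before parsing would turn the "missing or extra ':'" message into one that points at the actual mistake.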





[jira] [Commented] (MESOS-6360) The handling of whiteout files in provisioner is not correct

2016-10-12 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568646#comment-15568646
 ] 

Qian Zhang commented on MESOS-6360:
---

Here is a specific issue that I found for the current implementation of 
whiteout handling. Suppose I build a Docker image from the following Dockerfile:
{code}
FROM cirros
RUN touch /opt/data
RUN rm -rf /opt/data
RUN echo yes > /opt/data
{code}

When I launch a container from that image via Mesos using the overlay 
provisioner backend, there is NO {{data}} file under {{/opt}} in the 
container. But when I launch a Docker container from the same image via Docker 
engine, the {{data}} file exists under {{/opt}} in the container 
and its content is {{yes}}. The Docker result is clearly the correct one.

The root cause is that we handle the whiteout file generated by line 3 of the 
above Dockerfile incorrectly: we always process the whiteout files after the 
whole rootfs for the container has been provisioned, which wrongly deletes a 
file that should not be deleted (here, the {{/opt/data}} re-created by line 4).
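
The ordering problem can be illustrated with a small Python simulation. This is only a sketch, not the provisioner code: layers are modeled as plain dicts, the `cirros` base layers are omitted, opaque-directory markers are ignored, and all function names are hypothetical.

```python
WHITEOUT_PREFIX = ".wh."

# One dict per Dockerfile step after FROM, mapping path -> content.
LAYERS = [
    {"/opt/data": ""},        # RUN touch /opt/data
    {"/opt/.wh.data": ""},    # RUN rm -rf /opt/data (aufs-style whiteout)
    {"/opt/data": "yes\n"},   # RUN echo yes > /opt/data
]


def whiteout_target(path):
    """Return the path hidden by an aufs-style whiteout entry, or None."""
    directory, _, name = path.rpartition("/")
    if name.startswith(WHITEOUT_PREFIX):
        return directory + "/" + name[len(WHITEOUT_PREFIX):]
    return None


def provision_then_whiteout(layers):
    """Merge every layer first, then apply all whiteouts (reported behavior)."""
    rootfs = {}
    for layer in layers:
        rootfs.update(layer)
    for path in list(rootfs):
        target = whiteout_target(path)
        if target is not None:
            rootfs.pop(target, None)  # also removes files from later layers
            rootfs.pop(path, None)
    return rootfs


def provision_per_layer(layers):
    """Apply each layer's whiteouts only to the layers below it."""
    rootfs = {}
    for layer in layers:
        for path in layer:
            target = whiteout_target(path)
            if target is not None:
                rootfs.pop(target, None)
        for path, content in layer.items():
            if whiteout_target(path) is None:
                rootfs[path] = content
    return rootfs
```

In this model `provision_then_whiteout(LAYERS)` ends up without `/opt/data`, while `provision_per_layer(LAYERS)` keeps it with the content written by the last step, which matches the Docker engine result described above.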

> The handling of whiteout files in provisioner is not correct
> 
>
> Key: MESOS-6360
> URL: https://issues.apache.org/jira/browse/MESOS-6360
> Project: Mesos
>  Issue Type: Bug
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>Priority: Blocker
>
> Currently, when a user launches a container from a Docker image via the 
> universal containerizer, we always handle the whiteout files in 
> {{ProvisionerProcess::__provision()}} regardless of which backend is used.
> However, this is not correct, because whiteout handling is backend 
> dependent: different backends need to handle whiteout files in different 
> ways, e.g.:
> * AUFS backend: It seems the AUFS whiteout ({{.wh.}} and 
> {{.wh..wh..opq}}) is the whiteout standard in Docker (see [this comment | 
> https://github.com/docker/docker/blob/v1.12.1/pkg/archive/archive.go#L259:L262]
>  for details), so that means after the Docker image is pulled, its whiteout 
> files in the store are already in aufs format, then we do not need to do 
> anything about whiteout file handling because the aufs mount done in 
> {{AufsBackendProcess::provision()}} will handle it automatically.
> * Overlay backend: Overlayfs has its own whiteout files (see [this doc | 
> https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt] for 
> details), so we need to convert the aufs whiteout files to overlayfs whiteout 
> files before we do the overlay mount in {{OverlayBackendProcess::provision}} 
> which will automatically handle the overlayfs whiteout files.
> * Copy backend: We need to manually handle the aufs whiteout files when we 
> copy each layer in {{CopyBackendProcess::_provision()}}.
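
For the overlay case, the conversion step could be planned along these lines. The Python sketch below only classifies the paths of one layer and says what would have to be done; it does not touch the filesystem. The action names and `plan_overlay_conversion` are hypothetical, and skipping other `.wh..wh.`-prefixed entries is an assumption about aufs metadata files rather than something stated in this ticket.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."
WHITEOUT_META_PREFIX = ".wh..wh."
OPAQUE_MARKER = ".wh..wh..opq"


def plan_overlay_conversion(layer_paths):
    """Return (action, path) steps for converting one layer's aufs whiteouts."""
    plan = []
    for path in layer_paths:
        directory, name = posixpath.split(path)
        if name == OPAQUE_MARKER:
            # overlayfs marks an opaque directory with the xattr
            # trusted.overlay.opaque=y on the directory itself.
            plan.append(("set_opaque_xattr", directory))
        elif name.startswith(WHITEOUT_META_PREFIX):
            # Assumption: other aufs metadata entries carry no file content.
            plan.append(("skip_meta", path))
        elif name.startswith(WHITEOUT_PREFIX):
            # overlayfs represents a whiteout as a character device with
            # device number 0/0, named after the file it hides.
            hidden = posixpath.join(directory, name[len(WHITEOUT_PREFIX):])
            plan.append(("make_whiteout_device", hidden))
        else:
            plan.append(("keep", path))
    return plan
```

For example, a layer containing `/opt/.wh.data` would yield a `make_whiteout_device` step for `/opt/data`. Actually creating the device node and setting the xattr requires elevated privileges and is left out of the sketch.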





[jira] [Updated] (MESOS-6290) Support nested containers for logger in Mesos Containerizer.

2016-10-12 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6290:
---
Priority: Blocker  (was: Major)

> Support nested containers for logger in Mesos Containerizer.
> 
>
> Key: MESOS-6290
> URL: https://issues.apache.org/jira/browse/MESOS-6290
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: containerizer, logger, mesosphere
>
> Currently, there are two issues with how the Mesos containerizer uses the 
> logger for nested containers:
> 1. An empty ExecutorInfo is passed to the logger when launching a nested 
> container, which could break logger modules that try 
> to access a required proto field (e.g., executorId).
> 2. The logger does not recover the nested containers yet in 
> MesosContainerizer::recover.





[jira] [Updated] (MESOS-4766) Improve allocator performance.

2016-10-12 Thread Guangya Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangya Liu updated MESOS-4766:
---
Target Version/s: 1.2.0

> Improve allocator performance.
> --
>
> Key: MESOS-4766
> URL: https://issues.apache.org/jira/browse/MESOS-4766
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>Priority: Critical
>
> This is an epic to track the various tickets around improving the performance 
> of the allocator, including the following:
> * Preventing unnecessary backup of the allocator.
> * Reducing the cost of allocations and allocator state updates.
> * Improving performance of the DRF sorter.
> * More benchmarking to simulate scenarios with performance issues.





[jira] [Commented] (MESOS-6312) Add requirement in upgrade.md and getting-started.md for agent '--runtime_dir' in when running as non-root

2016-10-12 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567935#comment-15567935
 ] 

Till Toenshoff commented on MESOS-6312:
---

[~klueska] sry - I messed up the ping ^^

> Add requirement in upgrade.md and getting-started.md for agent 
> '--runtime_dir' in when running as non-root
> --
>
> Key: MESOS-6312
> URL: https://issues.apache.org/jira/browse/MESOS-6312
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Priority: Blocker
>
> We recently introduced a new agent flag, {{--runtime_dir}}. Unlike the 
> {{--work_dir}}, this directory is designed to hold the state of a running 
> agent between subsequent agent restarts (but not across host reboots).
> By default, this flag is set to {{/var/run/mesos}} since this is a {{tmpfs}} 
> on Linux that gets automatically cleaned up on reboot. However, on most 
> systems {{/var/run/mesos}} is only writable by root, causing problems when 
> launching an agent as non-root without pointing {{--runtime_dir}} to a 
> different location.
> We need to call this out in the upgrade.md and getting-started.md docs so 
> that people know they may need to set this going forward.





[jira] [Commented] (MESOS-6359) Ensure all sources files have a proper license header

2016-10-12 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567922#comment-15567922
 ] 

Till Toenshoff commented on MESOS-6359:
---

Also see [src-headers|https://www.apache.org/legal/src-headers.html].

> Ensure all sources files have a proper license header
> -
>
> Key: MESOS-6359
> URL: https://issues.apache.org/jira/browse/MESOS-6359
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>
> It seems that while we are relatively diligent in adding Apache License 
> headers to C++ headers and source files, this license header is absent in 
> many of the support scripts. This seems contrary to the suggested Apache 
> procedure, https://www.apache.org/dev/apply-license.html.
> While we do have existing linter tooling to catch absent license headers in 
> {{support/mesos-style.py}}, we should make sure we apply the license 
> headers to as many file types as possible.
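
A check for additional file types could be sketched in Python as below. This is not how {{support/mesos-style.py}} works; the marker string is the opening of the standard ASF source header, while the suffix list, the 30-line window, and the function names are assumptions chosen for illustration.

```python
import os

LICENSE_MARKER = "Licensed to the Apache Software Foundation (ASF)"
CHECKED_SUFFIXES = (".py", ".sh")  # hypothetical set of script types


def has_license_header(path, max_lines=30):
    """Return True if the marker appears within the first max_lines lines."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for _, line in zip(range(max_lines), f):
            if LICENSE_MARKER in line:
                return True
    return False


def files_missing_license(root):
    """Return the sorted paths under root that lack the license marker."""
    missing = []
    for dirpath, _, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(CHECKED_SUFFIXES):
                path = os.path.join(dirpath, filename)
                if not has_license_header(path):
                    missing.append(path)
    return sorted(missing)
```

Running `files_missing_license("support")` from a checkout would list the scripts that still need a header, under the assumptions above.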





[jira] [Updated] (MESOS-6359) Ensure all sources files have a proper license header

2016-10-12 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-6359:
--
Shepherd: Till Toenshoff

> Ensure all sources files have a proper license header
> -
>
> Key: MESOS-6359
> URL: https://issues.apache.org/jira/browse/MESOS-6359
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>
> It seems that while we are relatively diligent in adding Apache License 
> headers to C++ headers and source files, this license header is absent in 
> many of the support scripts. This seems contrary to the suggested Apache 
> procedure, https://www.apache.org/dev/apply-license.html.
> While we do have existing linter tooling to catch absent license headers in 
> {{support/mesos-style.py}}, we should make sure we apply the license 
> headers to as many file types as possible.


