Re: [VOTE] Move Apache Mesos to Attic

2021-04-06 Thread Jie Yu
+1

It was great working with you all.

On Tue, Apr 6, 2021 at 7:13 AM Till Toenshoff  wrote:

>
>
> On 5. Apr 2021, at 19:58, Vinod Kone  wrote:
>
> Hi folks,
>
> Based on the recent conversations on our mailing list, it seems to me
> that the majority consensus among the existing PMC is to move the
> project to the attic and let the interested community members
> collaborate on a fork in Github.
>
> I would like to call a vote to dissolve the PMC and move the project to
> the attic.
>
> Please reply to this thread with your vote. Only binding votes from
> PMC/committers count towards the final tally, but everyone in the
> community is encouraged to vote. See the process here.
>
> Thanks,
>
>
>
>
> +1 for move to attic.
>


Re: Isolator network/port_mapping in mesos 1.7

2019-08-01 Thread Jie Yu
Marc, this is because it depends on a specific version of libnl3, which is
not available by default on some of the supported Linux distros.

- Jie

On Tue, Jul 30, 2019 at 7:28 AM Marc Roos  wrote:

>
> Any specific reason why this is not compiled in?
>
>
> -Original Message-
> From: Marc Roos
> Sent: Friday, 26 July 2019 12:52
> To: user
> Subject: Isolator network/port_mapping in mesos 1.7
>
>
> Network/port_mapping is not compiled in by default. Not sure if this
> should be or not, but if the cli is going to be added[0] (when ? ;))
>
> E0726 12:42:33.062646  8828 main.cpp:483] EXIT with status 1: Failed to
> create a containerizer: Could not create MesosContainerizer: Unknown or
> unsupported isolator 'network/port_mapping'
>
>
>
> [0]
> https://www.mail-archive.com/user@mesos.apache.org/msg10422.html
>
>
>
>
>


Re: Is chained cni networks supported in mesos 1.7

2019-07-24 Thread Jie Yu
No, not yet

On Wed, Jul 24, 2019 at 12:27 PM Marc Roos  wrote:

>
>
>
>
>
> This error message of course
> E0724 21:19:17.852210  1160 cni.cpp:330] Failed to parse CNI network
> configuration file '/etc/mesos-cni/93-chain.conflist': Protobuf parse
> failed: Missing required fields: type
>
>
> -Original Message-
> Subject: Is chained cni networks supported in mesos 1.7
>
>
> I am getting this error, while I do not have problems using it with
> cnitool.
>
>  cni.cpp:330] Failed to parse CNI network configuration file
> '/etc/mesos-cni/93-chain-routing-overwrite.conflist.bak': Protobuf parse
> failed: Missing required fields: type
>
> [@ mesos-cni]# cat 93-chain.conflist
> {
>   "name": "test-chain",
>   "plugins": [{
> "type": "bridge",
> "bridge": "test-chain0",
> "isGateway": false,
> "isDefaultGateway": false,
> "ipMasq": false,
> "ipam": {
> "type": "host-local",
> "subnet": "10.15.15.0/24"
> }
> },
> {
>   "type": "portmap",
>   "capabilities": {"portMappings": true},
>   "snat": false
> }]
> }
>
>
> [@ mesos-cni]# CNI_PATH="/usr/libexec/cni/" NETCONFPATH="/etc/mesos-cni" \
>     cnitool-0.5.2 add test-chain /var/run/netns/testing
> {
>     "ip4": {
>         "ip": "10.15.15.2/24",
>         "gateway": "10.15.15.1"
>     },
>     "dns": {}
> }
>
>
>
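An aside on why the parse fails: at this point Mesos's `network/cni` isolator only understands a single-plugin network config with a top-level `type` field, while a chained `.conflist` moves the plugins (and their `type` fields) into a `plugins` array, hence "Missing required fields: type". A quick way to tell the two formats apart before pointing the agent at a config directory; the `is_single_plugin_conf` helper is illustrative, not part of Mesos:

```python
import json

def is_single_plugin_conf(config_text):
    """Return True for a plain CNI .conf (top-level "type" present),
    False for a chained .conflist (plugins nested under "plugins")."""
    config = json.loads(config_text)
    return "type" in config

# Chained config, shaped like 93-chain.conflist above: Mesos rejects it.
conflist = '{"name": "test-chain", "plugins": [{"type": "bridge"}]}'

# Single-plugin config: Mesos can parse it.
conf = '{"name": "test-bridge", "type": "bridge", "bridge": "test0"}'

print(is_single_plugin_conf(conflist))  # False
print(is_single_plugin_conf(conf))      # True
```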


Re: How should I pass the cni_args / ip, injecting lables #5592 not working

2019-05-03 Thread Jie Yu
Mesos's port mapper CNI plugin uses a delegation model. You can take a look at
https://github.com/apache/mesos/tree/master/src/slave/containerizer/mesos/isolators/network/cni/plugins/port_mapper

You can check out the go sample CNI implementation if you prefer golang
https://github.com/containernetworking/plugins/tree/master/plugins/sample

- Jie

On Fri, May 3, 2019 at 3:18 PM Marc Roos  wrote:

>
> Hmm, yes was already thinking of changing the source, but a wrapper is
> indeed better. Anyone have already something like this lying around? So
> I do not need to start from scratch :)
>
>
>
>
> -Original Message-
> From: Jie Yu [mailto:yujie@gmail.com]
> Sent: Saturday, 4 May 2019 0:14
> To: user
> Subject: Re: How should I pass the cni_args / ip, injecting lables #5592
> not working
>
> Ah, one workaround I can think of is to write a wrapper CNI plugin that
> understands args."org.apache.mesos" and sets "cni.ips" properly for the
> macvtap plugin.
>
>
> This is a common pattern in CNI called delegation, which predates the
> CNI chaining proposal.
>
> - Jie
>
> On Fri, May 3, 2019 at 11:14 AM Marc Roos 
> wrote:
>
>
>
>
>  >  Yet the cni plugin is afaik not even looking at the
>  >  args.'org.apache.mesos'.network_info configuration, for the cni_args
>  >  it looks e.g. at args.cni.ips (or an environment variable)
>  >
>  > That depends on the CNI plugin you're using. CNI is a spec between the
>  > CO (container orchestration system, like K8s, Mesos, etc.) and the NP
>  > (network providers like Calico, Cisco, Juniper, etc.).
>  >
>  > The `args` field is the place where the CO can inject CO-specific
>  > information. Some CNI plugins might use that CO-specific information
>  > to perform some special operations. Although I don't like it, this is
>  > just how the spec has evolved. The use of CNI_ARGS has been deprecated
>  > <https://github.com/containernetworking/cni/blob/master/CONVENTIONS.md#cni_args>
>  > in favor of using the `args` field.
>  >
>  >  Or is it possible to reference in the cni network json config
>  >  files a key of args.'org.apache.mesos'.network_info?
>  >
>  > I don't really follow this. Can you state your use case?
>
> Basically I want to assign a static ip via mesos/marathon. The only way
> I can get this to work now is via the cni network configuration,
> something like this. But I do not want to start creating a cni network
> config file for every app.
>
>
> {
>   "name": "test-macvtap",
>   "type": "macvtap",
>   "master": "eth1",
>   "hostrouteif": "macvtap1",
>   "ipam": {
> "type": "host-local",
> "subnet": "192.168.122.0/24",
> "rangeStart": "192.168.122.171",
> "rangeEnd": "192.168.122.179",
> "routes": [ { "dst": "192.168.122.22/32", "gw": "0.0.0.0" },
> { "dst": "192.168.10.10/32", "gw": "0.0.0.0" },
> { "dst": "192.168.10.22/32", "gw": "0.0.0.0" }]
>   },
>   "dns": { "nameservers": ["192.168.10.10"] },
>   "args": {
> "cni": { "ips": ["192.168.122.177"] }
> }
>
>
>
>
>
>


Re: How should I pass the cni_args / ip, injecting lables #5592 not working

2019-05-03 Thread Jie Yu
Ah, one workaround I can think of is to write a wrapper CNI plugin that
understands args."org.apache.mesos" and sets "cni.ips" properly for the
macvtap plugin.

This is a common pattern in CNI called delegation, which predates the CNI
chaining proposal.

- Jie
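The wrapper idea can be sketched as a config-rewriting step: take the network config Mesos hands the plugin, pull the IP out of the injected args."org.apache.mesos".network_info labels, and copy it into args.cni.ips before delegating to the real plugin. This is a hypothetical sketch of only the rewriting step (no stdin/stdout handling or delegate exec); the `IP=` label convention is the one used elsewhere in this thread:

```python
import copy

def rewrite_config(config):
    """Copy an IP from the Mesos-injected NetworkInfo labels into
    args.cni.ips, which conventional CNI plugins understand."""
    config = copy.deepcopy(config)
    labels = (config.get("args", {})
                    .get("org.apache.mesos", {})
                    .get("network_info", {})
                    .get("labels", {})
                    .get("labels", []))
    for label in labels:
        value = label.get("value", "")
        if label.get("key") == "CNI_ARGS" and value.startswith("IP="):
            ip = value[len("IP="):]
            config.setdefault("args", {}).setdefault("cni", {})["ips"] = [ip]
    return config

# Shape matches the config Mesos injects (see the dumps in this thread).
injected = {
    "name": "test-macvtap-cniip",
    "type": "macvtap",
    "args": {"org.apache.mesos": {"network_info": {
        "labels": {"labels": [
            {"key": "CNI_ARGS", "value": "IP=192.168.122.172"}
        ]},
        "name": "test-macvtap-cniip",
    }}},
}

print(rewrite_config(injected)["args"]["cni"]["ips"])  # ['192.168.122.172']
```

The rewritten config would then be passed on stdin to the real macvtap plugin, per the usual CNI delegation flow.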

On Fri, May 3, 2019 at 11:14 AM Marc Roos  wrote:

>
>
>  >  Yet the cni plugin is afaik not even looking at the
>  >  args.'org.apache.mesos'.network_info configuration, for the cni_args
>  >  it looks e.g. at args.cni.ips (or an environment variable)
>  >
>  >
>  > That depends on the CNI plugin you're using. CNI is a spec between the
>  > CO (container orchestration system, like K8s, Mesos, etc.) and the NP
>  > (network providers like Calico, Cisco, Juniper, etc.).
>  >
>  > The `args` field is the place where the CO can inject CO-specific
>  > information. Some CNI plugins might use that CO-specific information
>  > to perform some special operations. Although I don't like it, this is
>  > just how the spec has evolved. The use of CNI_ARGS has been deprecated
>  > <https://github.com/containernetworking/cni/blob/master/CONVENTIONS.md#cni_args>
>  > in favor of using the `args` field.
>  >
>  >
>  >  Or is it possible to reference in the cni network json config
> files a key of args.'org.apache.mesos'.network_info?
>  >
>  >
>  >I don't really follow this. Can you state your use case?
>
> Basically I want to assign a static ip via mesos/marathon. The only way
> I can get this to work now is via the cni network configuration,
> something like this. But I do not want to start creating a cni network
> config file for every app.
>
>
> {
>   "name": "test-macvtap",
>   "type": "macvtap",
>   "master": "eth1",
>   "hostrouteif": "macvtap1",
>   "ipam": {
> "type": "host-local",
> "subnet": "192.168.122.0/24",
> "rangeStart": "192.168.122.171",
> "rangeEnd": "192.168.122.179",
> "routes": [ { "dst": "192.168.122.22/32", "gw": "0.0.0.0" },
> { "dst": "192.168.10.10/32", "gw": "0.0.0.0" },
> { "dst": "192.168.10.22/32", "gw": "0.0.0.0" }]
>   },
>   "dns": { "nameservers": ["192.168.10.10"] },
>   "args": {
> "cni": { "ips": ["192.168.122.177"] }
> }
>
>
>


Re: How should I pass the cni_args / ip, injecting lables #5592 not working

2019-05-03 Thread Jie Yu
>
> Yet the cni plugin is afaik not even looking at the
> args.'org.apache.mesos'.network_info configuration, for the cni_args it
> looks e.g. at args.cni.ips (or an environment variable)


That depends on the CNI plugin you're using. CNI is a spec between the CO
(container orchestration system, like K8s, Mesos, etc.) and the NP (network
providers like Calico, Cisco, Juniper, etc.).

The `args` field is the place where the CO can inject CO-specific
information. Some CNI plugins might use that CO-specific information to
perform some special operations. Although I don't like it, this is just how
the spec has evolved. The use of CNI_ARGS has been deprecated
<https://github.com/containernetworking/cni/blob/master/CONVENTIONS.md#cni_args>
in favor of using the `args` field.

> Or is it possible to reference in the cni network json config files a key
> of args.'org.apache.mesos'.network_info?


I don't really follow this. Can you state your use case?

- Jie
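The labels Marathon sets end up in NetworkInfo as a list of {key, value} pairs nested under labels.labels, visible in the config dumps quoted in this thread. A small helper can flatten that into an ordinary dict for lookup by key; a sketch, assuming the structure exactly as dumped:

```python
def flatten_labels(network_info):
    """Flatten NetworkInfo's nested labels.labels list of
    {key, value} pairs into a plain dict."""
    pairs = network_info.get("labels", {}).get("labels", [])
    return {pair["key"]: pair["value"] for pair in pairs}

# Shape taken from the args."org.apache.mesos".network_info dump below.
network_info = {
    "ip_addresses": [{"protocol": "IPv4"}],
    "labels": {"labels": [{"key": "CNI_ARGS", "value": "IP=192.168.122.172"}]},
    "name": "test-macvtap-cniip",
}

print(flatten_labels(network_info))  # {'CNI_ARGS': 'IP=192.168.122.172'}
```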

On Fri, May 3, 2019 at 10:31 AM Marc Roos  wrote:

>
> Hi Jie,
>
> Something like this is injected into the cni json configuration by mesos
>
> "args": {
>   "org.apache.mesos": {
> "network_info": {
>   "ip_addresses": [
> {
>   "protocol": "IPv4"
> }
>   ],
>   "name": "test-macvtap-cniip"
> }
>   }
> },
>
> You can add labels in Marathon with something like this
>
>   "ipAddress": {
> "networkName": "test-macvtap-cniip",
> "labels": {"CNI_ARGS": "192.168.122.172"}
>   },
>
> Then the injected will look something like
>
> "args": {
>   "org.apache.mesos": {
> "network_info": {
>   "ip_addresses": [
> {
>   "protocol": "IPv4"
> }
>   ],
>   "labels": {
> "labels": [
>   {
> "key": "CNI_ARGS",
> "value": "IP=192.168.122.172"
>   }
> ]
>   },
>   "name": "test-macvtap-cniip"
> }
>   }
> },
>
>
> Yet the cni plugin is afaik not even looking at the
> args.'org.apache.mesos'.network_info
>  configuration, for the cni_args it looks e.g. at args.cni.ips (or an
> environment variable)
>
> I have no idea what this org.apache.mesos is even useful for, unless you
> are customizing plugins. But I guess you would rather stick to the cni
> standards.
>
> Or is it possible to reference in the cni network json config files
>  a key of args.'org.apache.mesos'.network_info?
>
>
>
> -Original Message-
> From: Jie Yu
> Sent: Friday, 3 May 2019 18:59
> To: user
> Subject: Re: FW: How should I pass the cni_args / ip, injecting lables
> #5592 not working
>
> Marc,
>
> I think the CNI_ARGS that Mesos passed into CNI plugin is the
> NetworkInfo object.
> https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L3096
>
> I don't know if there's a way in Marathon to inject NetworkInfo.labels
> from the app definition. Any marathon folks can answer this?
>
> - Jie
>
> On Fri, May 3, 2019 at 6:41 AM Marc Roos 
> wrote:
>
>
>
> I read the jira issue [0] that labels would be injected into the cni
> json. But I can't get this to work. I have changed the source of a
> plugin so it would dump the configuration files and this is how it
> looks like, when you dump the ip configured in the cni network json [1]
> as expected. When adding the labels, you get a totally different json
> [2], so how should this ever work???
>
>
> This is not working
> ===
> {
>   "id": "/server",
>   "user": "nobody",
>   "cmd": "python -m SimpleHTTPServer 8080",
>   "cpus": 0.1,
>   "mem": 32,
>   "disk": 0,
>   "instances": 1,
>   "acceptedResourceRoles": ["*"],
>   "constraints": [["hostname","CLUSTER","m03.local"]],
>   "backoffSeconds": 10,
>   "ipAddress": { "networkName": "test-macvtap-cniip" },
>   "labels": { "CNI_ARGS": "IP=192.168.122.173" } }
>
> This is also 

Re: FW: How should I pass the cni_args / ip, injecting lables #5592 not working

2019-05-03 Thread Jie Yu
Marc,

I think the CNI_ARGS that Mesos passed into CNI plugin is the NetworkInfo
object.
https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L3096

I don't know if there's a way in Marathon to inject NetworkInfo.labels from
the app definition. Any marathon folks can answer this?

- Jie

On Fri, May 3, 2019 at 6:41 AM Marc Roos  wrote:

>
> I read the jira issue [0] that labels would be injected into the cni
> json. But I can't get this to work. I have changed the source of a
> plugin so it would dump the configuration files and this is how it looks
> like, when you dump the ip configured in the cni network json [1] as
> expected. When adding the labels, you get a totally different json [2],
> so how should this ever work???
>
>
> This is not working
> ===
> {
>   "id": "/server",
>   "user": "nobody",
>   "cmd": "python -m SimpleHTTPServer 8080",
>   "cpus": 0.1,
>   "mem": 32,
>   "disk": 0,
>   "instances": 1,
>   "acceptedResourceRoles": ["*"],
>   "constraints": [["hostname","CLUSTER","m03.local"]],
>   "backoffSeconds": 10,
>   "ipAddress": { "networkName": "test-macvtap-cniip" },
>   "labels": { "CNI_ARGS": "IP=192.168.122.173" } }
>
> This is also not working
> 
> {
>   "id": "/server",
>   "user": "nobody",
>   "cmd": "python -m SimpleHTTPServer 8080",
>   "cpus": 0.1,
>   "mem": 32,
>   "disk": 0,
>   "instances": 1,
>   "acceptedResourceRoles": ["*"],
>   "constraints": [["hostname","CLUSTER","m03.local"]],
>   "backoffSeconds": 10,
>   "networks": [ { "mode": "container", "name": "test-macvtap-cniip" } ],
>   "env": { "CNI_ARGS" : "'IP=192.168.122.173'" } }
>
> This does work:
> ===
> CNI_PATH="/usr/libexec/cni/" NETCONFPATH="/etc/mesos-cni"
> CNI_IFNAME="eth1" CNI_ARGS='IP=192.168.122.173' cnitool-0.5.2 add
> test-macvtap-cniip /var/run/netns/testing
>
>
>
> [1]
> ===
> Cni network config only
> [
>   {
> "args": {
>   "cni": {
> "ips": [
>   "192.168.122.177"
> ]
>   }
> },
> "cniVersion": "",
> "dns": {
>   "nameservers": [
> "192.168.10.10"
>   ]
> },
> "hostrouteif": "macvtap1",
> "ipam": {
>   "rangeEnd": "192.168.122.179",
>   "rangeStart": "192.168.122.171",
>   "routes": [
> {
>   "dst": "192.168.122.22/32",
>   "gw": "0.0.0.0"
> },
> {
>   "dst": "192.168.10.10/32",
>   "gw": "0.0.0.0"
> },
> {
>   "dst": "192.168.10.22/32",
>   "gw": "0.0.0.0"
> }
>   ],
>   "subnet": "192.168.122.0/24",
>   "type": "host-local"
> },
> "master": "eth1",
> "name": "test-macvtap",
> "type": "macvtap"
>   }
> ]
>
>
> [2]
> ===
> Dump from the marathon launched task with labels.
> [
>   {
> "args": {
>   "org.apache.mesos": {
> "network_info": {
>   "ip_addresses": [
> {
>   "protocol": "IPv4"
> }
>   ],
>   "labels": {
> "labels": [
>   {
> "key": "ips",
> "value": "192.168.122.172"
>   }
> ]
>   },
>   "name": "test-macvtap-cniip"
> }
>   }
> },
> "dns": {
>   "nameservers": [
> "192.168.10.10"
>   ]
> },
> "hostrouteif": "macvtap0",
> "ipam": {
>   "rangeEnd": "192.168.122.179",
>   "rangeStart": "192.168.122.171",
>   "routes": [
> {
>   "dst": "192.168.10.153/32",
>   "gw": "0.0.0.0"
> }
>   ],
>   "subnet": "192.168.122.0/24",
>   "type": "host-local"
> },
> "master": "eth1",
> "name": "test-macvtap-cniip",
> "type": "macvtap"
>   }
> ]
>
> [0] https://issues.apache.org/jira/browse/MESOS-5592
>
>
>
>


Re: Slack upgrade to Standard plan. Thanks Criteo

2019-04-23 Thread Jie Yu
Thanks Criteo friends!

On Tue, Apr 23, 2019 at 10:13 AM Vinod Kone  wrote:

> Hi folks,
>
> As you probably realized today, we got our Slack upgraded from "free" plan
> to "standard" plan, which allows us to have unlimited message history and
> better analytics among other things! This would be great for our community.
>
> This upgrade has been made possible due to a general contribution/donation
> from folks at Criteo . Criteo has been a long
> time user of Apache Mesos and luckily for us, they wanted to contribute
> back to the ecosystem. We will update the website with the thanks shortly.
>
> Hope you'll take advantage of the standard plan.
>
> Thanks,
> Vinod
>
>
>


Re: Container cannot write to volume? path created as nagios user???

2019-02-12 Thread Jie Yu
Marc, I cannot reproduce the issue using Mesos mini with 1.7.x head

docker run --rm --privileged -p 5050:5050 -p 5051:5051 -p 8080:8080
mesos/mesos-mini:1.7.x

Open http://localhost:8080/ in a browser and paste your marathon json there.

- Jie

On Tue, Feb 12, 2019 at 11:51 AM Marc Roos  wrote:

>
> Hi Jie,
> I am using root for now, if I get it working correctly I will run it as
> a different user. Marathon is running under marathon user.
>
> {
>   "id": "postgres",
>   "user": "root",
>   "cmd": null,
>   "cpus": 0.5,
>   "mem": 512,
>   "instances": 1,
>   "acceptedResourceRoles": ["*"],
>   "container": {
> "type": "MESOS",
> "volumes": [
>   {
> "containerPath": "data",
> "persistent": {
>   "type": "root",
>   "size": 100
>   },
> "mode": "RW"
>   }
> ],
>   "docker": {
>   "image": "postgres",
>   "credential": null,
>   "forcePullImage": false
> }
>   },
>   "env": {
> "POSTGRES_PASSWORD": "example",
> "PGDATA": "/data"
>   },
>   "args": [
>
>   ]
> }
>
>
>
> -Original Message-
> From: Jie Yu [mailto:yujie@gmail.com]
> Sent: 12 February 2019 20:48
> To: user
> Subject: Re: Container cannot write to volume? path created as nagios
> user???
>
> What user do you use to launch containers? We need more information to
> triage this.
>
> - Jie
>
> On Tue, Feb 12, 2019 at 11:44 AM Marc Roos 
> wrote:
>
>
>
>
> [@m03 foo]# ls -alrt
> total 0
> drwxr-xr-x 3 root   root 17 Dec 23  2017 ..
> drwx------ 2 nagios root  6 Feb 12 20:32 postgres#data#e703252b-2efc-11e9-b19a-5051143001a1
> drwxr-xr-x 3 root   root 64 Feb 12 20:32 .
>
> Preparing rootfs at
> /var/lib/mesos/provisioner/containers/1b105f51-f6c1-4aa2-b68f-5221b217220d/backends/copy/rootfses/dd25199c-7ab3-4645-8bc9-b5e69a913b37
> Changing root to
> /var/lib/mesos/provisioner/containers/1b105f51-f6c1-4aa2-b68f-5221b217220d/backends/copy/rootfses/dd25199c-7ab3-4645-8bc9-b5e69a913b37
> mkdir: cannot create directory data: Permission denied
> I0212 20:32:32.791131  3593 executor.cpp:994] Command exited with
> status
> 1 (pid: 3597)
>
>
>
> mesos-1.7.0-2.0.3.x86_64
>
>
>
>
>
>
>
>
>
>


Re: Container cannot write to volume? path created as nagios user???

2019-02-12 Thread Jie Yu
What user do you use to launch containers? We need more information to
triage this.

- Jie

On Tue, Feb 12, 2019 at 11:44 AM Marc Roos  wrote:

>
>
> [@m03 foo]# ls -alrt
> total 0
> drwxr-xr-x 3 root   root 17 Dec 23  2017 ..
> drwx------ 2 nagios root  6 Feb 12 20:32 postgres#data#e703252b-2efc-11e9-b19a-5051143001a1
> drwxr-xr-x 3 root   root 64 Feb 12 20:32 .
>
> Preparing rootfs at
> /var/lib/mesos/provisioner/containers/1b105f51-f6c1-4aa2-b68f-5221b217220d/backends/copy/rootfses/dd25199c-7ab3-4645-8bc9-b5e69a913b37
> Changing root to
> /var/lib/mesos/provisioner/containers/1b105f51-f6c1-4aa2-b68f-5221b217220d/backends/copy/rootfses/dd25199c-7ab3-4645-8bc9-b5e69a913b37
> mkdir: cannot create directory data: Permission denied
> I0212 20:32:32.791131  3593 executor.cpp:994] Command exited with status
> 1 (pid: 3597)
>
>
>
> mesos-1.7.0-2.0.3.x86_64
>
>
>
>
>
>
>


Re: [VOTE] Release Apache Mesos 1.5.2 (rc3)

2019-01-16 Thread Jie Yu
+1

make distcheck on macOS Mojave

On Tue, Jan 15, 2019 at 12:57 AM Gilbert Song  wrote:

>  Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.5.2.
>
> 1.5.2 includes the following:
>
> 
> *Announce major bug fixes here*
> https://jira.apache.org/jira/issues/?filter=12345443
>
> The CHANGELOG for the release is available at:
>
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.5.2-rc3
>
> 
>
> The candidate for Mesos 1.5.2 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.5.2-rc3/mesos-1.5.2.tar.gz
>
> The tag to be voted on is 1.5.2-rc3:
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.5.2-rc3
>
> The SHA512 checksum of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.5.2-rc3/mesos-1.5.2.tar.gz.sha512
>
> The signature of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.5.2-rc3/mesos-1.5.2.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1242
>
> Please vote on releasing this package as Apache Mesos 1.5.2!
>
> The vote is open until Fri Jan 18 00:52:44 PST 2019 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 1.5.2
> [ ] -1 Do not release this package because ...
>
> Thanks,
> Gilbert
>


Re: [VOTE] Release Apache Mesos 1.5.2 (rc2)

2019-01-09 Thread Jie Yu
What's the progress on this?

On Wed, Nov 28, 2018 at 10:31 PM Gilbert Song  wrote:

> Thanks, Chun!
>
> I will cut rc3 sometime early next week.
>
> On Tue, Nov 27, 2018 at 11:13 AM Chun-Hung Hsiao 
> wrote:
>
>> -1 for https://issues.apache.org/jira/browse/MESOS-8623
>>
>> I'm working on a fix.
>>
>> On Thu, Nov 22, 2018 at 1:40 PM Meng Zhu  wrote:
>>
>> > +1
>> > make check on Ubuntu 18.04
>> >
>> > On Wed, Oct 31, 2018 at 4:26 PM Gilbert Song 
>> > wrote:
>> >
>> > > Hi all,
>> > >
>> > > Please vote on releasing the following candidate as Apache Mesos
>> 1.5.2.
>> > >
>> > > 1.5.2 includes the following:
>> > >
>> > >
>> >
>> 
>> > > *Announce major bug fixes here*
>> > >   * [MESOS-3790] - ZooKeeper connection should retry on `EAI_NONAME`.
>> > >   * [MESOS-8128] - Make os::pipe file descriptors O_CLOEXEC.
>> > >   * [MESOS-8418] - mesos-agent high cpu usage because of numerous
>> > >     /proc/mounts reads.
>> > >   * [MESOS-8545] - AgentAPIStreamingTest.AttachInputToNestedContainerSession
>> > >     is flaky.
>> > >   * [MESOS-8568] - Command checks should always call
>> > >     `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`.
>> > >   * [MESOS-8620] - Containers stuck in FETCHING possibly due to
>> > >     unresponsive server.
>> > >   * [MESOS-8830] - Agent gc on old slave sandboxes could empty
>> > >     persistent volume data.
>> > >   * [MESOS-8871] - Agent may fail to recover if the agent dies before
>> > >     image store cache checkpointed.
>> > >   * [MESOS-8904] - Master crash when removing quota.
>> > >   * [MESOS-8906] - `UriDiskProfileAdaptor` fails to update profile
>> > >     selectors.
>> > >   * [MESOS-8907] - Docker image fetcher fails with HTTP/2.
>> > >   * [MESOS-8917] - Agent leaking file descriptors into forked
>> > >     processes.
>> > >   * [MESOS-8921] - Autotools don't work with newer OpenJDK versions.
>> > >   * [MESOS-8935] - Quota limit "chopping" can lead to cpu-only and
>> > >     memory-only offers.
>> > >   * [MESOS-8936] - Implement a Random Sorter for offer allocations.
>> > >   * [MESOS-8942] - Master streaming API does not send (health) check
>> > >     updates for tasks.
>> > >   * [MESOS-8945] - Master check failure due to CHECK_SOME(providerId).
>> > >   * [MESOS-8947] - Improve the container preparing logging in
>> > >     IOSwitchboard and volume/secret isolator.
>> > >   * [MESOS-8952] - process::await/collect n^2 performance issue.
>> > >   * [MESOS-8963] - Executor crash trying to print container ID.
>> > >   * [MESOS-8978] - Command executor calling setsid breaks the tty
>> > >     support.
>> > >   * [MESOS-8980] - mesos-slave can deadlock with docker pull.
>> > >   * [MESOS-8986] - `slave.available()` in the allocator is expensive
>> > >     and drags down allocation performance.
>> > >   * [MESOS-8987] - Master asks agent to shutdown upon auth errors.
>> > >   * [MESOS-9024] - Mesos master segfaults with stack overflow under
>> > >     load.
>> > >   * [MESOS-9049] - Agent GC could unmount a dangling persistent volume
>> > >     multiple times.
>> > >   * [MESOS-9116] - Launch nested container session fails due to
>> > >     incorrect detection of `mnt` namespace of command executor's task.
>> > >   * [MESOS-9125] - Port mapper CNI plugin might fail with "Resource
>> > >     temporarily unavailable".
>> > >   * [MESOS-9127] - Port mapper CNI plugin might deadlock iptables on
>> > >     the agent.
>> > >   * [MESOS-9131] - Health checks launching nested containers while a
>> > >     container is being destroyed lead to unkillable tasks.
>> > >   * [MESOS-9142] - CNI detach might fail due to missing network config
>> > >     file.
>> > >   * [MESOS-9144] - Master authentication handling leads to request
>> > >     amplification.
>> > >   * [MESOS-9145] - Master has a fragile burned-in 5s authentication
>> > >     timeout.
>> > >   * [MESOS-9146] - Agent has a fragile burn-in 5s authentication
>> > >     timeout.
>> > >   * [MESOS-9147] - Agent and scheduler driver authentication retry
>> > >     backoff time could overflow.
>> > >   * [MESOS-9151] - Container stuck at ISOLATING due to FD leak.
>> > >   * [MESOS-9170] - Zookeeper doesn't compile with newer gcc due to
>> > >     format error.
>> > >   * [MESOS-9196] - Removing rootfs mounts may fail with EBUSY.
>> > >   * [MESOS-9231] - `docker inspect` may return an unexpected result to
>> > >     Docker executor due to a race condition.
>> > >   * [MESOS-9267] - Mesos agent crashes when CNI network is not
>> > >     configured but used.
>> > >   * [MESOS-9279] - Docker Containerizer 'usage' call might be
>> > >     expensive if mount table is big.
>> > >   * [MESOS-9283] - Docker containerizer actor can get backlogged with
>> > >     large number of containers.
>> > >   * [MESOS-9305] - Create cgroup recursively to work around systemd
>> > >     deleting cgroups_root.
>> > >   * [MESOS-9308] - URI disk profile adaptor could deadlock.
>> > >   * [MESOS-9334] - Container stuck at ISOLATING state due to 

Re: Propose to create a Kubernetes framework for Mesos

2018-12-16 Thread Jie Yu
Thanks for the discussion so far! Looks like folks are pretty interested in
this, which is great!

I created a Slack channel in Apache Mesos slack (#virtual-kubelet). Please
join the channel for further discussions! (see this instruction
<http://mesos.apache.org/community/#slack> for joining Apache Mesos slack)

Given that folks interested in this are spread across the world, the working
group will be coordinated asynchronously in the Slack channel.

- Jie

On Mon, Dec 10, 2018 at 11:20 AM Cameron Chen  wrote:

> We now have both Mesos and Kubernetes (not running on Mesos) running in
> production. As Jie mentioned, with this proposal I mainly want to solve
> the static partition issue. I agree to explore the ideas in a WG.
>
>
> Cameron
>
> Jie Yu  于2018年12月6日周四 上午9:48写道:
>
> > I'd like to get some feedback on what Mesos users want. I can potentially
> > see two major use cases:
> >
> > (1) I just want k8s to run on Mesos, along with other Mesos frameworks,
> > sharing the same resources pool. I don't really care about nodeless.
> > Ideally, I'd like to run upstream k8s (including the kubelet). The
> > original k8s on mesos framework has been retired, and the new Mesosphere
> > MKE is not open source and only runs on Mesosphere DC/OS. I need an open
> > source solution here.
> > (2) I want nodeless because I believe it has a tighter integration with
> > Mesos, as compared to (1), and can solve the static partition issue. (1)
> > is more like a k8s installer, and you can do that without Mesos.
> >
> > *Can folks chime in here?*
> >
> > > However, I'm not sure if re-implementing the k8s scheduler as a Mesos
> > > framework is the right approach. I imagine the k8s scheduler is a
> > > significant piece of code which we would need to re-implement, and on
> > > top of that, as new API objects are added to the k8s API, we would
> > > need to keep pace with the k8s scheduler for parity. The approach we
> > > (in the community) took with Spark (and Jenkins to some extent) was to
> > > let the scheduling innovation happen in the Spark community: we just
> > > let Spark launch spark executors via Mesos and let Spark launch its
> > > tasks out of band of Mesos. We used to have a version of the Spark
> > > framework (fine-grained mode?) where spark tasks were launched via
> > > Mesos offers, but that was deprecated, partly because of
> > > maintainability. Will this k8s framework have a similar problem?
> > > Sounds like one of the problems with the existing k8s framework
> > > implementations is the pre-launching of kubelets; can we use the k8s
> > > autoscaler to solve that problem?
> >
> >
> > This is a good concern. It's around 17k lines of code in k8s scheduler.
> >
> > Jies-MacBook-Pro:scheduler jie$ pwd
> > /Users/jie/workspace/kubernetes/pkg/scheduler
> > Jies-MacBook-Pro:scheduler jie$ loc --exclude .*_test.go
> >
> >  Language    Files    Lines    Blank    Comment     Code
> >  Go             83    17429     2165       3798    11466
> >  Total          83    17429     2165       3798    11466
> >
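The same kind of count can be approximated in a few lines of Python, mirroring `loc --exclude .*_test.go` (a sketch; `loc` also splits out blank and comment lines, which this skips):

```python
import os
import re

def count_go_lines(root, exclude=r".*_test\.go$"):
    """Count total lines across .go files under root, skipping files
    whose names match the exclude pattern (e.g. Go test files)."""
    skip = re.compile(exclude)
    total = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".go") and not skip.match(name):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    total += sum(1 for _ in f)
    return total
```

Running it over `pkg/scheduler` of a k8s checkout would give the `Lines` column above, modulo version drift.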
> >
> > > Also, I think (I might be wrong) most k8s users are not directly
> > > creating pods via the API but rather using higher level abstractions
> > > like replica sets, stateful sets, daemon sets etc. How will that fit
> > > into this architecture? Will the framework need to re-implement those
> > > controllers as well?
> >
> >
> > This is not true. You can re-use most of the controllers. Those
> > controllers will create pods as you said, and the mesos framework will
> > be responsible for scheduling the pods they create.
> >
> > - Jie
> >
> > On Mon, Dec 3, 2018 at 9:56 AM Cecile, Adam 
> wrote:
> >
> > > On 12/3/18 5:40 PM, Michał Łowicki wrote:
> > >
> > >
> > >
> > > On Thu, Nov 29, 2018 at 1:22 AM Vinod Kone 
> wrote:
> > >
> > >> Cameron and Michal: I would love to understand your motivations and
> use
> > >> cases for a k8s Mesos framework in a bit more detail. Looks like you
> are
> >

Re: Propose to create a Kubernetes framework for Mesos

2018-12-05 Thread Jie Yu
> will rule in the ecosystem which didn't quite pan out.
>>
>> However, I'm not sure if re-implementing the k8s scheduler as a Mesos
>> framework is the right approach. I imagine the k8s scheduler is a
>> significant piece of code which we would need to re-implement, and on top
>> of that, as new API objects are added to the k8s API, we would need to
>> keep pace with the k8s scheduler for parity. The approach we (in the
>> community) took with Spark (and Jenkins to some extent) was to let the
>> scheduling innovation happen in the Spark community: we just let Spark
>> launch spark executors via Mesos and let Spark launch its tasks out of
>> band of Mesos. We used to have a version of the Spark framework
>> (fine-grained mode?) where spark tasks were launched via Mesos offers,
>> but that was deprecated, partly because of maintainability. Will this k8s
>> framework have a similar problem? Sounds like one of the problems with
>> the existing k8s framework implementations is the pre-launching of
>> kubelets; can we use the k8s autoscaler to solve that problem?
>>
>> Also, I think (I might be wrong) most k8s users are not directly creating
>> pods via the API but rather using higher level abstractions like replica
>> sets, stateful sets, daemon sets etc. How will that fit into this
>> architecture? Will the framework need to re-implement those controllers as
>> well?
>>
>> Is there an integration point in k8s ecosystem where we can reuse the
>> existing k8s schedulers and controllers but run the pods with mesos
>> container runtime?
>>
>> All, in all, I'm +1 to explore the ideas in a WG.
>>
>>
>> On Wed, Nov 28, 2018 at 2:05 PM Paulo Pires  wrote:
>>
>> > Hello all,
>> >
>> > As a Kubernetes fan, I am excited about this proposal.
>> > However, I would challenge this community to think more abstractly about
>> > the problem you want to address and any solution requirements before
>> > discussing implementation details, such as adopting VK.
>> >
>> > Don't take me wrong, VK is a great concept: a Kubernetes node that
>> > delegates container management to someone else.
>> > But allow me to clarify a few things about it:
>> >
>> > - VK simply provides a very limited subset of the kubelet functionality,
>> > namely the Kubernetes node registration and the observation of Pods that
>> > have been assigned to it. It doesn't do pod (intra- or inter-pod)
>> > networking nor delegate to CNI, doesn't do volume mounting, and so on.
>> > - Like the kubelet, VK doesn't implement scheduling. It also doesn't
>> > understand anything else than a Pod and its dependencies (e.g.
>> ConfigMap or
>> > Secret), meaning other primitives, such as DaemonSet, Deployment,
>> > StatefulSet, or extensions, such as CRDs are unknown to the VK.
>> > - While the kubelet manages containers through CRI API (Container
>> Runtime
>> > Interface), the VK does it through its own Provider API.
>> > - kubelet translates from Kubernetes primitives to CRI primitives, so
>> CRI
>> > implementations only need to understand CRI. However, the VK does no
>> > translation and passes Kubernetes primitives directly to a provider,
>> > requiring the VK provider to understand Kubernetes primitives.
>> > - kubelet talks to CRI implementations through a gRPC socket. VK talks
>> to
>> > providers in-process and is highly-opinionated about the fact a provider
>> > has no lifecycle (there's no _start_ or _stop_, as there would be for a
>> > framework). There are talks about having the Provider API over gRPC, but
>> > it's not trivial to decide[2].
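The Provider surface described above is small. As a rough sketch of its shape (Python is used purely for illustration; the real Virtual Kubelet Provider interface is defined in Go, and these method names only loosely mirror it):

```python
from dataclasses import dataclass, field

@dataclass
class Pod:
    """Stand-in for the Kubernetes Pod object the VK hands to a provider."""
    namespace: str
    name: str
    containers: list = field(default_factory=list)

class MesosProvider:
    """Hypothetical VK-style provider. The VK observes Pods scheduled onto
    its virtual node and delegates their lifecycle to calls like these; it
    does no scheduling, networking, or volume mounting itself."""

    def __init__(self):
        self._pods = {}  # (namespace, name) -> Pod

    def create_pod(self, pod):
        # A real Mesos provider would translate the pod into tasks and
        # launch them; here we only record it.
        self._pods[(pod.namespace, pod.name)] = pod

    def get_pod(self, namespace, name):
        return self._pods.get((namespace, name))

    def get_pods(self):
        return list(self._pods.values())

    def delete_pod(self, pod):
        # A real provider would kill the corresponding Mesos tasks.
        self._pods.pop((pod.namespace, pod.name), None)
```

The point of the sketch is how little a provider must implement: pod lifecycle only, which is exactly why the list above (no CNI, no volumes, no controllers) matters when judging the fit for Mesos.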
>> >
>> > Now, if you are still thinking about implementation details, and having
>> > some experience trying to create a VK provider for Mesos[1], I can tell
>> you
>> > the VK, as is today, is not a seamless fit.
>> > That said, I am willing to help you figure out the design and pick the
>> > right pieces to execute, if this is indeed something you want to do.
>> >
>> > 1 -
>> >
>> https://github.com/pires/virtual-kubelet/tree/mesos_integration/providers/mesos
>> > 2 - https://github.com/virtual-kubelet/virtual-kubelet/issues/160
>> >
>> > Cheers,
>> > Pires
>> >
>> > On Wed, Nov 28, 2018 at 5:38 AM Jie Yu  wrote:
>> >
>> >> + user list as well to hear more feedback from Mesos users.
>> >>
>> >> I am +1 on this proposal to create a Mesos framework that exposes k8s

Re: Propose to create a Kubernetes framework for Mesos

2018-11-27 Thread Jie Yu
+ user list as well to hear more feedback from Mesos users.

I am +1 on this proposal to create a Mesos framework that exposes the k8s
API, and provides a nodeless experience to users.

Creating a Mesos framework that provides the k8s API is not a new idea. For
instance, the following are two prior attempts:
1. https://github.com/kubernetes-retired/kube-mesos-framework
2. https://mesosphere.com/product/kubernetes-engine/

Both of the above solutions run unmodified kubelets for workloads
(i.e., pods). Some users might prefer it that way, and we should not preclude
that on Mesos. However, the reason this nodeless (aka virtual kubelet)
idea got me very excited is that it provides us an opportunity to create
a truly integrated solution to bridge k8s and Mesos.

K8s got popular for reasons. IMO, the following are the key ones:
(1) API machinery. This includes the API extension mechanism (CRD),
a simple-to-program client, versioning, authn/authz, etc.
(2) It exposes basic scheduling primitives and lets users/vendors focus on
orchestration (i.e., Operators). In contrast, a Mesos framework is
significantly harder to program due to the need to also do scheduling.
Although we have scheduling libraries like Fenzo, the whole community
suffers from fragmentation because there's no "default" solution.

*Why is this proposal more integrated than prior solutions?*

This is because prior solutions are more like installers for k8s. You either
need to pre-reserve resources for the kubelet, or fork the k8s scheduler to
bring up kubelets on demand. Complexity is definitely a concern since both
systems are involved. In contrast, this proposal is to run k8s workloads
(pods) directly on Mesos by translating the pod spec to tasks/executors in
Mesos. It's just another Mesos framework, but you can extend that
framework's behavior using the k8s API extension mechanism (CRD and
Operator)!
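The pod-spec-to-task translation mentioned above could look roughly like the following sketch. All field names are simplified stand-ins (plain dicts), not the real Kubernetes PodSpec or Mesos TaskInfo protobufs, and a real framework would group the resulting tasks under one executor as a task group:

```python
def pod_to_task_infos(pod_spec):
    """Map each container of a (simplified) pod spec onto a dict shaped
    loosely like a Mesos TaskInfo. Illustrative only."""
    tasks = []
    for container in pod_spec.get("containers", []):
        requests = container.get("resources", {}).get("requests", {})
        tasks.append({
            # One Mesos task per container, named after pod and container.
            "name": "{}.{}".format(pod_spec["name"], container["name"]),
            "command": container.get("command", []),
            "container": {"type": "MESOS", "image": container["image"]},
            "resources": [
                {"name": "cpus", "scalar": float(requests.get("cpu", 0.1))},
                {"name": "mem", "scalar": float(requests.get("memory", 32))},
            ],
        })
    return tasks
```

Extension points (CRDs, Operators) would then operate on the k8s side, while the framework only needs to keep this translation and the offer matching correct.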

*Compare to just using k8s?*

First of all, IMO, k8s is just an API spec. Any implementation that passes
the conformance tests provides the vanilla k8s experience. I understand that
by going nodeless, some of the concepts in k8s no longer apply (e.g.,
NodeAffinity, NodeSelector). I am actually less worried about this, for two
reasons: 1) Big stakeholders are behind nodeless, including Microsoft, AWS,
Alicloud, etc; 2) K8s API is evolving, and nodeless has real use cases
(e.g., in public clouds).

In fact, we can also choose to implement those k8s APIs that make the most
sense first, and maybe define our own APIs, leveraging the extensibility of
the k8s API machinery!

If we do want to compare to the upstream k8s implementation, I think the
main benefits are:
1) You can still run your existing custom Mesos frameworks as they are
today, but start to provide your users some k8s API experience.
2) Scalability. Mesos is inherently more scalable than k8s because it makes
different trade-offs. You can run multiple copies of the same framework
(similar to Marathon on Marathon) to reach large scale if the k8s framework
itself cannot scale beyond a certain limit.

*Why put this framework in the Mesos repo?*

Historically, the problem with the Mesos community has been fragmentation.
People create different solutions for the same set of problems. Having a
"blessed" solution in the Mesos repo has the following benefits:
1) License and ownership. It's under Apache already.
2) It attracts contributions. Less fragmentation.
3) The established high quality bar of the repository.

*What's my suggestion for next steps?*

I suggest we create a working group for this. Any other PMC members who like
this idea, please chime in here.

- Jie

On Fri, Nov 23, 2018 at 5:24 AM 张冬冬  wrote:

>
>
> Sent from my iPhone
>
> > On Nov 23, 2018, at 20:37, Alex Rukletsov  wrote:
> >
> > I'm in favour of the proposal, Cameron. Building a bridge between Mesos
> and
> > Kubernetes will be beneficial for both communities. Virtual kubelet
> effort
> > looks promising indeed and is definitely a worthwhile approach to build
> the
> > bridge.
> >
> > While we will need some sort of a scheduler when implementing a provider
> > for mesos, we don't need to implement and use a "default" one: a simple
> > mesos-go based scheduler will be fine for the start. We can of course
> > consider building a default scheduler, but this will significantly
> increase
> > the size of the project.
> >
> > An exercise we will have to do here is determine which parts of a
> > kubernetes task specification can be "converted" and hence launched on a
> > Mesos cluster. Once we have a working prototype we can start testing and
> > collecting data.
> >
> > Do you want to come up with a plan and maybe a more detailed proposal?
> >
> > Best,
> > Alex
>


Re: [VOTE] Release Apache Mesos 1.5.2 (rc2)

2018-11-21 Thread Jie Yu
+1

> On Oct 31, 2018, at 4:26 PM, Gilbert Song  wrote:
> 
> Hi all,
> 
> Please vote on releasing the following candidate as Apache Mesos 1.5.2.
> 
> 1.5.2 includes the following:
> 
> *Announce major bug fixes here*
>   * [MESOS-3790] - ZooKeeper connection should retry on `EAI_NONAME`.
>   * [MESOS-8128] - Make os::pipe file descriptors O_CLOEXEC.
>   * [MESOS-8418] - mesos-agent high cpu usage because of numerous 
> /proc/mounts reads.
>   * [MESOS-8545] - AgentAPIStreamingTest.AttachInputToNestedContainerSession 
> is flaky.
>   * [MESOS-8568] - Command checks should always call `WAIT_NESTED_CONTAINER` 
> before `REMOVE_NESTED_CONTAINER`.
>   * [MESOS-8620] - Containers stuck in FETCHING possibly due to unresponsive 
> server.
>   * [MESOS-8830] - Agent gc on old slave sandboxes could empty persistent 
> volume data.
>   * [MESOS-8871] - Agent may fail to recover if the agent dies before image 
> store cache checkpointed.
>   * [MESOS-8904] - Master crash when removing quota.
>   * [MESOS-8906] - `UriDiskProfileAdaptor` fails to update profile selectors.
>   * [MESOS-8907] - Docker image fetcher fails with HTTP/2.
>   * [MESOS-8917] - Agent leaking file descriptors into forked processes.
>   * [MESOS-8921] - Autotools don't work with newer OpenJDK versions.
>   * [MESOS-8935] - Quota limit "chopping" can lead to cpu-only and 
> memory-only offers.
>   * [MESOS-8936] - Implement a Random Sorter for offer allocations.
>   * [MESOS-8942] - Master streaming API does not send (health) check updates 
> for tasks.
>   * [MESOS-8945] - Master check failure due to CHECK_SOME(providerId).
>   * [MESOS-8947] - Improve the container preparing logging in IOSwitchboard 
> and volume/secret isolator.
>   * [MESOS-8952] - process::await/collect n^2 performance issue.
>   * [MESOS-8963] - Executor crash trying to print container ID.
>   * [MESOS-8978] - Command executor calling setsid breaks the tty support.
>   * [MESOS-8980] - mesos-slave can deadlock with docker pull.
>   * [MESOS-8986] - `slave.available()` in the allocator is expensive and 
> drags down allocation performance.
>   * [MESOS-8987] - Master asks agent to shutdown upon auth errors.
>   * [MESOS-9024] - Mesos master segfaults with stack overflow under load.
>   * [MESOS-9049] - Agent GC could unmount a dangling persistent volume 
> multiple times.
>   * [MESOS-9116] - Launch nested container session fails due to incorrect 
> detection of `mnt` namespace of command executor's task.
>   * [MESOS-9125] - Port mapper CNI plugin might fail with "Resource 
> temporarily unavailable".
>   * [MESOS-9127] - Port mapper CNI plugin might deadlock iptables on the 
> agent.
>   * [MESOS-9131] - Health checks launching nested containers while a 
> container is being destroyed lead to unkillable tasks.
>   * [MESOS-9142] - CNI detach might fail due to missing network config file.
>   * [MESOS-9144] - Master authentication handling leads to request 
> amplification.
>   * [MESOS-9145] - Master has a fragile burned-in 5s authentication timeout.
>   * [MESOS-9146] - Agent has a fragile burn-in 5s authentication timeout.
>   * [MESOS-9147] - Agent and scheduler driver authentication retry backoff 
> time could overflow.
>   * [MESOS-9151] - Container stuck at ISOLATING due to FD leak.
>   * [MESOS-9170] - Zookeeper doesn't compile with newer gcc due to format 
> error.
>   * [MESOS-9196] - Removing rootfs mounts may fail with EBUSY.
>   * [MESOS-9231] - `docker inspect` may return an unexpected result to Docker 
> executor due to a race condition.
>   * [MESOS-9267] - Mesos agent crashes when CNI network is not configured but 
> used.
>   * [MESOS-9279] - Docker Containerizer 'usage' call might be expensive if 
> mount table is big.
>   * [MESOS-9283] - Docker containerizer actor can get backlogged with large 
> number of containers.
>   * [MESOS-9305] - Create cgoup recursively to workaround systemd deleting 
> cgroups_root.
>   * [MESOS-9308] - URI disk profile adaptor could deadlock.
>   * [MESOS-9334] - Container stuck at ISOLATING state due to libevent poll 
> never returns.
> 
> The CHANGELOG for the release is available at:
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.5.2-rc2
> 
> 
> The candidate for Mesos 1.5.2 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.5.2-rc2/mesos-1.5.2.tar.gz
> 
> The tag to be voted on is 1.5.2-rc2:
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.5.2-rc2
> 
> The SHA512 checksum of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.5.2-rc2/mesos-1.5.2.tar.gz.sha512
> 
> The signature of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.5.2-rc2/mesos-1.5.2.tar.gz.asc
> 
> The PGP key used to sign the release is here:
> 

Re: mesos containerizer issue with v1.8.0: HTTP response Decoding failed

2018-11-08 Thread Jie Yu
Olivier, are you sure you're talking to dockerhub directly? Is there a
proxy in the middle?

I think it's also likely that it's related to your curl version.

I tested on today's master branch, using this marathon config (docker image
is centos). It works for me.

{
  "id": "/test",
  "instances": 1,
  "portDefinitions": [],
  "container": {
"type": "MESOS",
"volumes": [],
"docker": {
  "image": "centos"
}
  },
  "cpus": 0.1,
  "mem": 128,
  "requirePorts": false,
  "networks": [],
  "healthChecks": [],
  "fetch": [],
  "constraints": [],
  "cmd": "sleep 1"
}

- Jie
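For context, the failing responses quoted below use `Transfer-Encoding: chunked`, where the body arrives as hex-length-prefixed chunks (the `105c` in the logs is such a chunk length). Here is a minimal sketch of what decoding such a body involves; this is illustrative only, not the Mesos implementation, which relies on the http-parser library:

```python
def dechunk(body):
    """Decode an HTTP/1.1 chunked body: repeated records of the form
    hex-size, CRLF, chunk data, CRLF, terminated by a zero-size chunk.
    A real parser must also handle chunk extensions and trailers,
    which this sketch ignores."""
    out, i = b"", 0
    while True:
        crlf = body.index(b"\r\n", i)
        size = int(body[i:crlf], 16)  # chunk sizes are hexadecimal
        if size == 0:
            return out
        start = crlf + 2
        out += body[start:start + size]
        i = start + size + 2  # skip chunk data plus its trailing CRLF
```

A response that is off by even a couple of bytes between the declared chunk size and the actual data will make a strict parser report a decoding failure, which matches the symptom in this thread.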

On Thu, Nov 8, 2018 at 2:33 PM Cecile, Adam  wrote:

> The http-parser library is not updated anymore and does not support HTTP/2,
> so it's a dead end anyway.
>
> On 8 November 2018 at 10:17:20 GMT+01:00, Olivier Sallou <
> olivier.sal...@irisa.fr> wrote:
>>
>>
>>
>> - Original Message -
>>
>>> From: "Olivier Sallou" 
>>> To: user@mesos.apache.org
>>> Sent: Wednesday, 7 November 2018 12:40:24
>>> Subject: Re: mesos containerizer issue with v1.8.0: HTTP response Decoding 
>>> failed
>>>
>>
>> On 11/7/18 11:54 AM, Cecile, Adam wrote:
>>>
  Hi,

  You might be hitting the same bug as I did (no HTTP/2 support in code
  pulling images for Mesos).
  https://issues.apache.org/jira/browse/MESOS-9364

>>>
>>>  adding some logs as suggested in your issue, error code is different,
>>>  getting "invalid constant string" error from http_parser error:
>>>
>>>
>>>   XX(INVALID_CONSTANT, "invalid constant string")
>>>
>>
>>
>> With additional debugging, the problem appears to be related to the HTTP
>> parser's handling of the answer received from Docker Hub. If I remove the
>> check on parsed data length vs. body length, it works nicely (for
>> test/debug only), so it may be a problem with the response's HTTP
>> compliance...
>> Anyway, this issue prevents downloading docker images when using the
>> unified containerizer. As I am using code from the master branch (latest),
>> I can only hope it will be fixed before the next release.
>>
>>
>>
>>
>>>
>>>
  My report also includes some code you can add to the C++ code of the fetcher
  to retrieve the actual message coming from the http response parser library.

  Regards, Adam.

  On 11/7/18 11:42 AM, Adam Cecile wrote:

> On 11/7/18 10:48 AM, Olivier Sallou wrote:
>
>> On 11/7/18 10:38 AM, Olivier Sallou wrote:
>>
>>>  Hi,
>>>
>>>  I installed mesos from source. It works fine with docker containerizer.
>>>
>>>  Howerver it fails with  unified containerizer at container start.
>>>
>>>  It used to work on a previous (older release) install. In the meantime,
>>>  some system libs etc. have been upgraded.
>>>
>>>  In logs I have the following:
>>>
>>>
>>>  I1107 09:32:48.707176 31983 containerizer.cpp:1280] Starting container
>>>  28f07a61-676a-4876-aae4-73598de90aae
>>>  E1107 09:32:49.683372 31986 slave.cpp:6168] Container
>>>  '28f07a61-676a-4876-aae4-73598de90aae' for executor '1-0' of framework
>>>  80fc2079-ba14-454b-8276-79fae090f8b3- failed to start: Failed to
>>>  decode HTTP responses: Decoding failed
>>>  HTTP/1.1 200 OK
>>>  Content-Type: application/json
>>>  Date: Wed, 07 Nov 2018 08:32:46 GMT
>>>  Transfer-Encoding: chunked
>>>  Strict-Transport-Security: max-age=31536000
>>>
>>>  105c
>>>  {"token":"eyJhbGciOiJSUzI1NiIsInR5cC...
>>>
>>>
>>>  Logs do not show the destination of the http request (a pull on docker
>>>  hub ? a request to master ? ...)
>>>
>>  I could increase some slave logging, and the HTTP failure occurs when
>>  pulling the image:
>>
>>   I1107 10:45:56.689092 31987 registry_puller.cpp:286] Pulling image
>>  'library/centos:latest' from
>>  'docker-manifest://registry-1.docker.io:443library/centos?latest#https'
>>  to '/tmp/mesos/store/docker/staging/99WUh3'
>>   E1107 10:45:57.634601 31987 slave.cpp:6168] Container
>>  '48ea5811-3f97-41c1-b1a5-9a4416552545' for executor '6-0' of framework
>>  80fc2079-ba14-454b-8276-79fae090f8b3- failed to start: Failed to
>>  decode HTTP responses: Decoding failed
>>   HTTP/1.1 200 OK
>>   Content-Type: application/json
>>   Date: Wed, 07 Nov 2018 09:45:54 GMT
>>   Transfer-Encoding: chunked
>>   Strict-Transport-Security: max-age=31536000
>>
>>   
>>
>>
>>  so it seems there is an issue between the Mesos unified containerizer and
>>  Docker Hub. Could it be related to the libcurl version (libcurl4 on my
>>  system)? Is a specific setup needed?
>>
>>
>>
>>  Any idea what could be wrong, or how to get more debug info?
>>>
>>>
>>>  Thanks
>>>
>>>
>>>  Olivier
>>>
>>>
>>> --
>>> Olivier Sallou
>>> Univ Rennes, Inria, CNRS, IRISA
>>> Irisa, Campus de Beaulieu
>>> F-35042 RENNES - FRANCE
>>> Tel: 02.99.84.71.95

Re: [VOTE] Release Apache Mesos 1.5.2 (rc1)

2018-10-27 Thread Jie Yu
Gilbert, can we fix this and call another vote?

Thanks,
- Jie

On Wed, Oct 24, 2018 at 12:45 PM Greg Mann  wrote:

> Hmm I wonder if this is an issue on 1.5.1, or perhaps introduced by this
> commit? https://github.com/apache/mesos/commit/902aa34b79
>
> On Wed, Oct 24, 2018 at 12:30 PM Vinod Kone  wrote:
>
>> -1
>>
>> Tested on ASF CI. Looks like Clang builds are failing with a build error.
>> See example build output
>> <
>> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/55/BUILDTOOL=autotools,COMPILER=clang,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/console
>> >
>> below:
>>
>> libtool: compile:  clang++-3.5 -DPACKAGE_NAME=\"mesos\"
>> -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"1.5.2\"
>> "-DPACKAGE_STRING=\"mesos 1.5.2\"" -DPACKAGE_BUGREPORT=\"\"
>> -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"1.5.2\"
>> -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1
>> -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1
>> -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1
>> -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\"
>> -DHAVE_CXX11=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1
>> -DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1
>> -DMESOS_HAS_JAVA=1 -DHAVE_EVENT2_EVENT_H=1 -DHAVE_LIBEVENT=1
>> -DHAVE_EVENT2_THREAD_H=1 -DHAVE_LIBEVENT_PTHREADS=1 -DHAVE_LIBSASL2=1
>> -DHAVE_OPENSSL_SSL_H=1 -DHAVE_EVENT2_BUFFEREVENT_SSL_H=1
>> -DHAVE_LIBEVENT_OPENSSL=1 -DUSE_SSL_SOCKET=1 -DHAVE_SVN_VERSION_H=1
>> -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1
>> -DHAVE_ZLIB_H=1 -DHAVE_LIBZ=1 -DHAVE_PYTHON=\"2.7\"
>> -DMESOS_HAS_PYTHON=1 -I. -I../../src -Werror
>> -DLIBDIR=\"/mesos/mesos-1.5.2/_inst/lib\"
>> -DPKGLIBEXECDIR=\"/mesos/mesos-1.5.2/_inst/libexec/mesos\"
>> -DPKGDATADIR=\"/mesos/mesos-1.5.2/_inst/share/mesos\"
>> -DPKGMODULEDIR=\"/mesos/mesos-1.5.2/_inst/lib/mesos/modules\"
>> -I../../include -I../include -I../include/mesos -DPICOJSON_USE_INT64
>> -D__STDC_FORMAT_MACROS -isystem ../3rdparty/boost-1.53.0 -isystem
>> ../3rdparty/concurrentqueue-7b69a8f -I../3rdparty/elfio-3.2
>> -I../3rdparty/glog-0.3.3/src -I../3rdparty/leveldb-1.19/include
>> -I../../3rdparty/libprocess/include -I../3rdparty/nvml-352.79
>> -I../3rdparty/picojson-1.3.0 -I../3rdparty/protobuf-3.5.0/src
>> -I../../3rdparty/stout/include
>> -I../3rdparty/zookeeper-3.4.8/src/c/include
>> -I../3rdparty/zookeeper-3.4.8/src/c/generated -isystem
>> /usr/include/subversion-1 -isystem /usr/include/apr-1 -isystem
>> /usr/include/apr-1.0 -pthread -Wall -Wsign-compare -Wformat-security
>> -fstack-protector-strong -fPIC -g1 -O0 -std=c++11 -MT
>> slave/containerizer/libmesos_no_3rdparty_la-containerizer.lo -MD -MP
>> -MF slave/containerizer/.deps/libmesos_no_3rdparty_la-containerizer.Tpo
>> -c ../../src/slave/containerizer/containerizer.cpp  -fPIC -DPIC -o
>> slave/containerizer/.libs/libmesos_no_3rdparty_la-containerizer.o
>> In file included from ../../src/slave/http.cpp:30:
>> In file included from ../../include/mesos/authorizer/authorizer.hpp:25:
>> ../../3rdparty/libprocess/include/process/future.hpp:1089:3: error: no
>> matching member function for call to 'set'
>>   set(u);
>>   ^~~
>> ../../src/slave/http.cpp:3196:10: note: in instantiation of function
>> template specialization
>> 'process::Future::Future> process::Future > >' requested here
>>   return slave->containerizer->attach(containerId)
>>  ^
>> ../../3rdparty/libprocess/include/process/future.hpp:597:8: note:
>> candidate function not viable: no known conversion from 'const
>> process::Future >' to
>> 'const process::http::Response' for 1st argument
>>   bool set(const T& _t);
>>^
>> ../../3rdparty/libprocess/include/process/future.hpp:598:8: note:
>> candidate function not viable: no known conversion from 'const
>> process::Future >' to
>> 'process::http::Response' for 1st argument
>>   bool set(T&& _t);
>>^
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Oct 22, 2018 at 12:53 AM Gilbert Song  wrote:
>>
>> > Hi all,
>> >
>> > Please vote on releasing the following candidate as Apache Mesos 1.5.2.
>> >
>> > 1.5.2 includes the following:
>> >
>> >
>> 
>> >   * [MESOS-3790] - ZooKeeper connection should retry on `EAI_NONAME`.
>> >   * [MESOS-8128] - Make os::pipe file descriptors O_CLOEXEC.
>> >   * [MESOS-8418] - mesos-agent high cpu usage because of numerous
>> > /proc/mounts reads.
>> >   * [MESOS-8545] -
>> > AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
>> >   * [MESOS-8568] - Command checks should always call
>> > `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`.
>> >   * [MESOS-8620] - Containers stuck in FETCHING possibly due to
>> > unresponsive server.
>> >   * [MESOS-8830] - Agent gc on old slave sandboxes could empty
>> persistent
>> > volume 

[Containerization] Mesos CNI support

2018-08-23 Thread Jie Yu
Hi,

If you are not using Mesos CNI integration, please ignore this email.

Recently, we discovered a few bugs related to our CNI integration when
there are lots of containers (200+) on a single agent box. Please see the
following tickets for details.

MESOS-9125: Port mapper CNI plugin might fail with "Resource temporarily
unavailable".
MESOS-9127: Port mapper CNI plugin might deadlock iptables on the agent.
MESOS-9142: CNI detach might fail due to missing network config file.

We fixed all of them, and backported the fixes to the currently maintained
patch releases. Please refer to the corresponding tickets for the fix
versions.

Also, if you're using any CNI plugin that depends on the host-local IPAM,
for example the bridge plugin, please make sure the host-local IPAM's
`dataDir` points to a tmpfs which gets cleaned up when a reboot happens.
Otherwise, IP addresses will be leaked, and you will hit IP allocation
failures if your box reboots frequently.
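For example, here is a bridge network whose host-local IPAM keeps its allocation state under `/run` (a tmpfs on most modern Linux distributions) rather than a persistent on-disk directory. This is an illustrative sketch; the network name, bridge name, and subnet are placeholders for your own setup:

```json
{
  "cniVersion": "0.3.1",
  "name": "mesos-bridge",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.1.0.0/16",
    "dataDir": "/run/cni/networks"
  }
}
```

With `dataDir` on a tmpfs, stale allocations from before a reboot disappear with the filesystem, so the IPAM cannot run out of addresses after repeated unclean reboots.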

Please reach out if you have any question!

- Jie


Re: [VOTE] Move the project repos to gitbox

2018-07-17 Thread Jie Yu
+1

On Tue, Jul 17, 2018 at 9:38 AM, Andrew Schwartzmeyer <
and...@schwartzmeyer.com> wrote:

> +1
>
>
>
> On 07/17/2018 8:54 am, Zhitao Li wrote:
>
> +1
>
> On Tue, Jul 17, 2018 at 8:10 AM James Peach  wrote:
>
>>
>>
>> > On Jul 17, 2018, at 7:58 AM, Vinod Kone  wrote:
>> >
>> > Hi,
>> >
>> > As discussed in another thread and in the committers sync, there seem
>> to be heavy interest in moving our project repos ("mesos", "mesos-site")
>> from the "git-wip" git server to the new "gitbox" server to better avail
>> GitHub integrations.
>> >
>> > Please vote +1, 0, -1 regarding the move to gitbox. The vote will close
>> in 3 business days.
>>
>>
>> +1
>
>
>
> --
> Cheers,
>
> Zhitao Li
>
>


Re: Backport Policy

2018-07-16 Thread Jie Yu
Greg, I like your idea of adding a prescriptive "policy" for evaluating
whether a bug fix should be backported, while leaving the decision to the
committer (because they have the most context, and it avoids a bottleneck in
the process).

- Jie

On Mon, Jul 16, 2018 at 11:24 AM, Greg Mann  wrote:

> My impression is that we have two opposing schools of thought here:
>
>1. Backport as little as possible, to avoid unforeseen consequences
>2. Backport as much as proves practical, to eliminate bugs in
>supported versions
>
> Do other people agree with this assessment?
>
> If so, how can we find common ground? One possible solution would be to
> leave the decision on backporting up to the committer, without specifying a
> project-wide policy. This seems to be the status quo, and would lead to
> some variation across committers regarding what types of fixes are
> backported. We could also choose to delegate the decision to the release
> manager; I favor leaving the decision with the committer, to eliminate the
> burden on release managers.
>
> Here's a thought: rather than defining a prescriptive "policy" that we
> expect committers to abide by, we could enumerate in the documentation the
> competing concerns that we expect committers to consider when making
> decisions on backports. The committing docs could read something like:
>
> "When bug fixes are committed to master, the committer should evaluate the
> fix to determine whether or not it should be backported to supported
> versions. This is left to the committer, but they are expected to weigh the
> following concerns when making the decision:
>
>- Every backported change comes with a risk of unintended
>consequences. The change should be carefully evaluated to ensure that such
>side-effects are highly unlikely.
>- As the complexity of applying a backport increases due to merge
>conflicts, the likelihood of unintended consequences also increases. Bug
>fixes which require extensive rebasing should only be backported when the
>bug is critical enough to warrant the risk.
>- Users of supported versions benefit greatly from the resolution of
>bugs in point releases. Thus, whenever concerns #1 and #2 can be allayed
>for a given bug fix, it should be backported."
>
>
> Cheers,
> Greg
>
>
> On Mon, Jul 16, 2018 at 3:06 AM, Alex Rukletsov 
> wrote:
>
>> Back porting as little as possible is the ultimate goal for me. My
>> reasons are closely aligned with what Andrew wrote above.
>>
>> If we agree on this strategy, the next question is how to enforce it. My
>> intuition is that committers will lean towards back porting their patches
>> in arguable cases, because humans tend to overestimate the importance of
>> their personal work. Delegating the decision in such cases to a release
>> manager in my opinion will help us enforce the strategy of minimal number
>> backports. As a bonus, the release manager will have a much better
>> understanding of what's going on with the release, keyword: "more
>> ownership".
>>
>> On Sat, Jul 14, 2018 at 12:07 AM, Andrew Schwartzmeyer <
>> and...@schwartzmeyer.com> wrote:
>>
>>> I believe I fall somewhere between Alex and Ben.
>>>
>>> As for deciding what to backport or not, I lean toward Alex's view of
>>> backporting as little as possible (and agree with his criteria). My
>>> reasoning is that all changes can have unforeseen consequences, which I
>>> believe is something to be actively avoided in already released versions.
>>> The reason for backporting patches to fix regressions is the same as the
>>> reason to avoid backporting as much as possible: keep behavior consistent
>>> (and safe) within a release. With that as the goal of a branch in
>>> maintenance mode, it makes sense to fix regressions, and make exceptions to
>>> fix CVEs and other critical/blocking issues.
>>>
>>> As for who should decide what to backport, I lean toward Ben's view of
>>> the burden being on the committer. I don't think we should add more work
>>> for release managers, and I think the committer/shepherd obviously has the
>>> most understanding of the context around changes proposed for backport.
>>>
>>> Here's an example of a recent bugfix which I backported:
>>> https://reviews.apache.org/r/67587/ (for MESOS-3790)
>>>
>>> While normally I believe this change falls under "avoid due to
>>> unforeseen consequences," I made an exception as the bug was old, circa
>>> 2015, (indicating it had been an issue for others), and was causing
>>> recurring failures in testing. The fix itself was very small, meaning it
>>> was easier to evaluate for possible side effects, so I felt a little safer
>>> in that regard. The effect of not having the fix was a fatal and undesired
>>> crash, which furthermore left troublesome side effects on the system (you
>>> couldn't bring the agent back up). And lastly, a dependent project (DC/OS)
>>> wanted it in their next bump, which necessitated backporting to the release
>>> they were 

Re: Backport Policy

2018-07-13 Thread Jie Yu
I typically backport all bug fixes that apply cleanly and where the risk is
low. It's a judgement call, but much of the time you can easily tell the
risk is low.

I think my argument for why we want to do this is "why not". I want our
software to have fewer bugs!

Letting the release manager decide which patches to backport does not
scale. Some release managers might even become dormant after a while.

- Jie

On Fri, Jul 13, 2018 at 12:54 AM, Alex Rukletsov 
wrote:

> This is exactly where our views differ, Ben : )
>
> Ideally, I would like a release manager to have more ownership and less
> manual work. In my imagination, a release manager has more power and
> control about dates, features, backports and everything that is related to
> "their" branch. I would also like us to back port as little as possible, to
> simplify testing and releasing patch versions.
>
> On Fri, Jul 13, 2018 at 1:17 AM, Benjamin Mahler 
> wrote:
>
> > +user, as it would probably be good to hear from users as well.
> >
> > Please see the original proposal as well as Alex's proposal and let us
> know
> > your thoughts.
> >
> > To continue the discussion from where Alex left off:
> >
> > > Other bugs and significant improvements, e.g., performance, may be back
> > ported,
> > the release manager should ideally be the one who decides on this.
> >
> > I'm a little puzzled by this, why is the release manager involved? As we
> > already document, backports occur when the bug is fixed, so this happens
> in
> > the steady state of development, not at release time. The release manager
> > only comes in at the time of the release itself, at which point all
> > backports have already happened and the release manager handles the
> release
> > process. Only blocker level issues can stop the release and while the
> > release manager has a strong say, we should generally agree on what
> > consists of a release blocking issue.
> >
> > Just to clarify my workflow, I generally backport every bug fix I commit
> > that applies cleanly, right after I commit it to master (with the
> > exceptions I listed below).
> >
> > On Thu, Jul 12, 2018 at 8:39 AM, Alex Rukletsov 
> > wrote:
> >
> > > I would like to back port as little as possible. I suggest the
> following
> > > criteria:
> > >
> > > * By default, regressions are back ported to existing release
> branches. A
> > > bug is considered a regression if the functionality is present in the
> > > previous minor or patch version and is not affected by the bug there.
> > >
> > > * Critical and blocker issues, e.g., a CVE, can be back ported.
> > >
> > > * Other bugs and significant improvements, e.g., performance, may be
> back
> > > ported, the release manager should ideally be the one who decides on
> > this.
> > >
> > > On Thu, Jul 12, 2018 at 12:25 AM, Vinod Kone 
> > wrote:
> > >
> > > > Ben, thanks for the clarification. I'm in agreement with the points
> you
> > > > made.
> > > >
> > > > Once we have consensus, would you mind updating the doc?
> > > >
> > > > On Wed, Jul 11, 2018 at 5:15 PM Benjamin Mahler 
> > > > wrote:
> > > >
> > > > > I realized recently that we aren't all on the same page with
> > > backporting.
> > > > > We currently only document the following:
> > > > >
> > > > > "Typically the fix for an issue that is affecting supported
> releases
> > > > lands
> > > > > on the master branch and is then backported to the release
> > branch(es).
> > > In
> > > > > rare cases, the fix might directly go into a release branch without
> > > > landing
> > > > > on master (e.g., fix / issue is not applicable to master)." [1]
> > > > >
> > > > > This leaves room for interpretation about what lies outside of
> > > "typical".
> > > > > Here's the simplest way I can explain what I stick to, and I'd like
> > to
> > > > hear
> > > > > what others have in mind:
> > > > >
> > > > > * By default, bug fixes at any level should be backported to
> existing
> > > > > release branches if it affects those releases. Especially
> important:
> > > > > crashes, bugs in non-experimental features.
> > > > >
> > > > > * Exceptional cases that can omit backporting: difficult to
> backport
> > > > fixes
> > > > > (especially if the bugs are deemed of low priority), bugs in
> > > experimental
> > > > > features.
> > > > >
> > > > > * Exceptional non-bug cases that can be backported: performance
> > > > > improvements.
> > > > >
> > > > > I realize that there is a ton of subtlety here (even in terms of
> > which
> > > > > things are defined as bugs). But I hope we can lay down a policy
> that
> > > > gives
> > > > > everyone the right mindset for common cases and then discuss corner
> > > cases
> > > > > on-demand in the future.
> > > > >
> > > > > [1] http://mesos.apache.org/documentation/latest/versioning/
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Release Apache Mesos 1.6.1 (rc1)

2018-06-27 Thread Jie Yu
+1

Passed on our internal CI that has the following matrix. I looked into the
only failed test; it looks to be flaky due to a race in the test itself.



On Tue, Jun 26, 2018 at 7:02 PM, Greg Mann  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.6.1.
>
>
> 1.6.1 includes the following:
> 
> 
> *Announce major features here*
> *Announce major bug fixes here*
>
> The CHANGELOG for the release is available at:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_
> plain;f=CHANGELOG;hb=1.6.1-rc1
> 
> 
>
> The candidate for Mesos 1.6.1 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.6.1-rc1/mesos-1.6.1.tar.gz
>
> The tag to be voted on is 1.6.1-rc1:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.6.1-rc1
>
> The SHA512 checksum of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.6.1-rc1/
> mesos-1.6.1.tar.gz.sha512
>
> The signature of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.6.1-rc1/
> mesos-1.6.1.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1229
>
> Please vote on releasing this package as Apache Mesos 1.6.1!
>
> The vote is open until Fri Jun 29 18:46:28 PDT 2018 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 1.6.1
> [ ] -1 Do not release this package because ...
>
> Thanks,
> Greg
>


Re: Proposing change to the allocatable check in the allocator

2018-06-11 Thread Jie Yu
I would suggest we also consider the possibility of adding per framework
control on `min_allocatable_resources`.

If we want to consider supporting per-framework setting, we should probably
model this as a protobuf, rather than a free form JSON. The same protobuf
can be reused for both master flag, framework API, or even supporting
Resource Request in the future. Something like the following:

message ResourceQuantityPredicate {
  enum Type {
    SCALAR_GE = 1;
  }
  optional Type type = 1;
  optional Value.Scalar scalar = 2;
}

message ResourceRequirement {
  required string resource_name = 1;
  oneof predicates {
    ResourceQuantityPredicate quantity = 2;
  }
}

message ResourceRequirementList {
  // All requirements MUST be met.
  repeated ResourceRequirement requirements = 1;
}

// Resource request API.
message Request {
  repeated ResourceRequirementList accepted = 1;
}

// `allocatable()`
message MinimalAllocatableResources {
  repeated ResourceRequirementList accepted = 1;
}

On Mon, Jun 11, 2018 at 3:47 PM, Meng Zhu  wrote:

> Hi:
>
> The allocatable check in the allocator (shown below) was originally
> introduced to help alleviate the situation where a framework receives
> some resources, but no cpu/memory, and thus cannot launch a task.
>
>
> constexpr double MIN_CPUS = 0.01;
> constexpr Bytes MIN_MEM = Megabytes(32);
>
> bool HierarchicalAllocatorProcess::allocatable(
>     const Resources& resources)
> {
>   Option<double> cpus = resources.cpus();
>   Option<Bytes> mem = resources.mem();
>
>   return (cpus.isSome() && cpus.get() >= MIN_CPUS) ||
>          (mem.isSome() && mem.get() >= MIN_MEM);
> }
>
>
> Issues
>
> However, there has been a couple of issues surfacing lately surrounding
> the check.
>
> - MESOS-8935 Quota limit "chopping" can lead to cpu-only and
>   memory-only offers.
>
> We introduced fine-grained quota allocation (MESOS-7099) in Mesos 1.5.
> When we allocate resources to a role, we'll "chop" the available
> resources of the agent up to the quota limit for the role. However, this
> has the unintended consequence of creating cpu-only and memory-only
> offers, even though there might be other agents with both cpu and memory
> resources available in the cluster.
>
>
> - MESOS-8626 The 'allocatable' check in the allocator is problematic with
> multi-role frameworks.
>
> Consider roleA reserved cpu/memory on an agent and roleB reserved disk
> on the same agent. A framework under both roleA and roleB will not be
> able to get the reserved disk due to the allocatable check. With the
> introduction of resource providers, similar situations will become more
> common.
>
> Proposed change
>
> Instead of hardcoding a one-size-fits-all value in Mesos, we are
> proposing to add a new master flag min_allocatable_resources. It
> specifies one or more scalar resource quantities that define the minimum
> allocatable resources for the allocator. The allocator will only offer
> resources that are more than at least one of the specified resources.
> The default behavior *is backward compatible*, i.e. by default, the flag
> is set to “cpus:0.01|mem:32”.
>
> Usage
>
> The flag takes either simple text of resource set(s) delimited by a bar
> (|) or a JSON array of JSON-formatted resources. Note, the input should
> be “pure” scalar quantities, i.e. the specified resource(s) should only
> have name, type (set to scalar) and scalar fields set.
>
> Examples:
>
> - To eliminate cpu-only or memory-only offers due to quota chopping,
>   we could set the flag to “cpus:0.01;mem:32”.
>
> - To enable disk-only offers, we could set the flag to “disk:32”.
>
> - For both, we could set the flag to “cpus:0.01;mem:32|disk:32”.
>   Then the allocator will only offer resources that at least contain
>   “cpus:0.01;mem:32” OR resources that at least contain “disk:32”.
>
>
> Let me know what you think! Thanks!
>
>
> -Meng
>
>


[Design Proposal] Non-resource Volume Support using CSI

2018-06-08 Thread Jie Yu
Hi all,

I am working on the design doc for supporting non-resource volumes in Mesos
through CSI. Please see the background and design in the following doc.
Would love to get some feedback on this! Feel free to comment in the doc.

JIRA: MESOS-8984 
Design Doc


Thanks,
- Jie


setsid(2) dependency

2018-06-05 Thread Jie Yu
Hi,

Currently, if you're using Mesos container (aka UCR), we put each container
into a separate process session

(i.e., calling setsid(2) after forking the initial process).

For command tasks, we actually do that twice:
1) For the command executor itself
2) When launching the task in the command executor.

Because of that, we hit one issue that's related to setsid(2) and tty.
Please see more details about the problem in this ticket: MESOS-8978.

Anyone remember why we do setsid(2) for each container? Anyone rely on this
behavior?

I am asking because one simple way to solve this issue is to remove the
SETSID child hook when launching the task in the command executor.

Thanks!
- Jie


Re: [VOTE] Release Apache Mesos 1.5.1 (rc1)

2018-05-22 Thread Jie Yu
+1

On Tue, May 15, 2018 at 2:56 PM, Greg Mann  wrote:

> +1 (binding)
>
> I did `sudo make check` and verified that only expected flaky tests failed.
>
>
> Cheers,
> Greg
>
> On Fri, May 11, 2018 at 12:35 PM, Gilbert Song  wrote:
>
>> Hi all,
>>
>> Please vote on releasing the following candidate as Apache Mesos 1.5.1.
>>
>> 1.5.1 includes the following:
>> 
>> 
>> * [MESOS-1720] - Slave should send exited executor message when the
>> executor is never launched.
>> * [MESOS-7742] - Race conditions in IOSwitchboard: listening on unix
>> socket
>> and premature closing of the connection.
>> * [MESOS-8125] - Agent should properly handle recovering an executor when
>> its pid is reused.
>> * [MESOS-8411] - Killing a queued task can lead to the command executor
>> never terminating.
>> * [MESOS-8416] - CHECK failure if trying to recover nested containers but
>> the framework checkpointing is not enabled.
>> * [MESOS-8468] - `LAUNCH_GROUP` failure tears down the default executor.
>> * [MESOS-8488] - Docker bug can cause unkillable tasks.
>> * [MESOS-8510] - URI disk profile adaptor does not consider plugin type
>> for
>> a profile.
>> * [MESOS-8536] - Pending offer operations on resource provider resources
>> not properly accounted for in allocator.
>> * [MESOS-8550] - Bug in `Master::detected()` leads to coredump in
>> `MasterZooKeeperTest.MasterInfoAddress`.
>> * [MESOS-8552] - CGROUPS_ROOT_PidNamespaceForward and
>> CGROUPS_ROOT_PidNamespaceBackward tests fail.
>> * [MESOS-8565] - Persistent volumes are not visible in Mesos UI when
>> launching a pod using default executor.
>> * [MESOS-8569] - Allow newline characters when decoding base64 strings in
>> stout.
>> * [MESOS-8574] - Docker executor makes no progress when 'docker inspect'
>> hangs.
>> * [MESOS-8575] - Improve discard handling for 'Docker::stop' and
>> 'Docker::pull'.
>> * [MESOS-8576] - Improve discard handling of 'Docker::inspect()'.
>> * [MESOS-8577] - Destroy nested container if
>> `LAUNCH_NESTED_CONTAINER_SESSION` fails.
>> * [MESOS-8594] - Mesos master stack overflow in libprocess socket send
>> loop.
>> * [MESOS-8598] - Allow empty resource provider selector in
>> `UriDiskProfileAdaptor`.
>> * [MESOS-8601] - Master crashes during slave reregistration after
>> failover.
>> * [MESOS-8604] - Quota headroom tracking may be incorrect in the presence
>> of hierarchical reservation.
>> * [MESOS-8605] - Terminal task status update will not send if 'docker
>> inspect' is hung.
>> * [MESOS-8619] - Docker on Windows uses `USERPROFILE` instead of `HOME`
>> for
>> credentials.
>> * [MESOS-8624] - Valid tasks may be explicitly dropped by agent due to
>> race
>> conditions.
>> * [MESOS-8631] - Agent should be able to start a task with every CPU on a
>> Windows machine.
>> * [MESOS-8641] - Event stream could send heartbeat before subscribed.
>> * [MESOS-8646] - Agent should be able to resolve file names on open files.
>> * [MESOS-8651] - Potential memory leaks in the `volume/sandbox_path`
>> isolator.
>> * [MESOS-8741] - `Add` to sequence will not run if it races with sequence
>> destruction.
>> * [MESOS-8742] - Agent resource provider config API calls should be
>> idempotent.
>> * [MESOS-8786] - CgroupIsolatorProcess accesses subsystem processes
>> directly.
>> * [MESOS-8787] - RP-related API should be experimental.
>> * [MESOS-8876] - Normal exit of Docker container using rexray volume
>> results in TASK_FAILED.
>> * [MESOS-8881] - Enable epoll backend in libevent integration.
>> * [MESOS-8885] - Disable libevent debug mode.
>>
>> The CHANGELOG for the release is available at:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_p
>> lain;f=CHANGELOG;hb=1.5.1-rc1
>> 
>> 
>>
>> The candidate for Mesos 1.5.1 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.5.1-rc1/mesos-1.5.1.tar.gz
>>
>> The tag to be voted on is 1.5.1-rc1:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.5.1-rc1
>>
>> The SHA512 checksum of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.5.1-rc1/mesos
>> -1.5.1.tar.gz.sha512
>>
>> The signature of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.5.1-rc1/mesos
>> -1.5.1.tar.gz.asc
>>
>> The PGP key used to sign the release is here:
>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>
>> The JAR is in a staging repository here:
>> https://repository.apache.org/content/repositories/orgapachemesos-1224
>>
>> Please vote on releasing this package as Apache Mesos 1.5.1!
>>
>> The vote is open until Wed May 16 12:31:02 PDT 2018 and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Mesos 1.5.1
>> [ ] -1 Do not release this package because ...
>>
>> Thanks,
>> Gilbert
>>
>
>


Re: LIBPROCESS_IP

2018-05-18 Thread Jie Yu
Hendrik,

It looks to me that Mesos always passes LIBPROCESS_IP to the executor if
the agent's LIBPROCESS_IP environment variable is set:
https://github.com/apache/mesos/blob/1.4.x/src/slave/slave.cpp#L8106-L8115

Maybe that's the difference between your vanilla Mesos and DC/OS Mesos
configs?

- Jie

On Fri, May 18, 2018 at 2:29 AM, Hendrik Haddorp 
wrote:

> Hi,
>
> I had been using Mesos 1.4.1 and now tried out a DC/OS setup (1.9.4) and
> noticed that the env variable LIBPROCESS_IP was set. This broke my
> scheduler (running as a docker container on Marathon). Things worked fine
> again once I unset the variable in my startup script.
>
> I'm now wondering why LIBPROCESS_IP is being set. Is there maybe some
> config setting in Mesos that leads to that? MESOS-3553 and MESOS-3740 are
> for example about LIBPROCESS_IP being passed on but as I didn't get it
> passed in on Mesos 1.4.1 this is odd.
>
> thanks,
> Hendrik
>


Re: Update the *Minimum Linux Kernel version* supported on Mesos

2018-04-05 Thread Jie Yu
>
> User namespaces require >= 3.12 (November 2013). Can we make that the
> minimum?


No, we need to support CentOS7 which uses 3.10 (some variant)

- Jie

On Thu, Apr 5, 2018 at 8:56 AM, James Peach  wrote:

>
>
> > On Apr 5, 2018, at 5:00 AM, Andrei Budnik  wrote:
> >
> > Hi All,
> >
> > We would like to update minimum supported Linux kernel from 2.6.23 to
> > 2.6.28.
> > Linux kernel supports cgroups v1 starting from 2.6.24, but `freezer`
> cgroup
> > functionality was merged into 2.6.28, which supports nested containers.
>
> User namespaces require >= 3.12 (November 2013). Can we make that the
> minimum?
>
> J


Re: Release policy and 1.6 release schedule

2018-03-23 Thread Jie Yu
>
> 2) Backporting will be a burden if releases are too short. I think that in
> practice, backporting will not take too much longer. If there was a
> conflict back in the tree somewhere, then it's likely that after resolving
> that conflict once, the same diff can be used to backport the change to
> previous releases as well.


I think the burden of maintaining a release branch is not just backporting.
We need to run CI to make sure every maintained release branch is working,
and do testing for that. It's a burden if there are too many release
branches.

- Jie

On Fri, Mar 23, 2018 at 9:21 PM, Jie Yu <yujie@gmail.com> wrote:

> It's a burden for supporting multiple releases.
>
> 1.2 was released March, 2017 (1 year ago), and I know that some users are
> still on that version
> 1.3 was released June, 2017 (9 months ago), and we're still maintaining it
> (still backport patches
> <https://github.com/apache/mesos/commit/064f64552624e38d5dd92660eef6f6940128c106>
> several
> days ago, which some users asked)
> 1.4 was released Sept, 2017 (6 months ago).
> 1.5 was released Feb, 2018 (1 month ago).
>
> As you can see, users expect a release to be supported 6-9 months (e.g.,
> backports are still needed for 1.3 release, which is 9 months old). If we
> were to do monthly minor release, we'll probably need to maintain 6-9
> release branches? That's too much of an ask for committers and maintainers.
>
> I also agree with folks that there're benefits doing releases more
> frequently. Given the historical data, I'd suggest we do quarterly
> releases, and maintain three release branches.
>
> - Jie
>
>
> On Fri, Mar 23, 2018 at 10:03 AM, Greg Mann <g...@mesosphere.io> wrote:
>
>> The best motivation I can think of for a shorter release cycle is this: if
>> the release cadence is fast enough, then developers will be less likely to
>> rush a feature into a release. I think this would be a real benefit, since
>> rushing features in hurts stability. *However*, I'm not sure if every two
>> months is fast enough to bring this benefit. I would imagine that a
>> two-month wait is still long enough that people wouldn't want to wait an
>> entire release cycle to land their feature. Just off the top of my head, I
>> might guess that a release cadence of 1 month or shorter would be often
>> enough that it would always seem reasonable for a developer to wait until
>> the next release to land a feature. What do y'all think?
>>
>> Other motivating factors that have been raised are:
>> 1) Many users upgrade on a longer timescale than every ~2 months. I think
>> that this doesn't need to affect our decision regarding release timing -
>> since we guarantee compatibility of all releases with the same major
>> version number, there is no reason that a user needs to upgrade minor
>> releases one at a time. It's fine to go from 1.N to 1.(N+3), for example.
>> 2) Backporting will be a burden if releases are too short. I think that in
>> practice, backporting will not take too much longer. If there was a
>> conflict back in the tree somewhere, then it's likely that after resolving
>> that conflict once, the same diff can be used to backport the change to
>> previous releases as well.
>> 3) Adhering strictly to a time-based release schedule will help users plan
>> their deployments, since they'll be able to rely on features being
>> released
>> on-schedule. However, if we do strict time-based releases, then it will be
>> less certain that a particular feature will land in a particular release,
>> and users may have to wait a release cycle to get the feature.
>>
>> Personally, I find the idea of preventing features from being rushed into
>> a
>> release very compelling. From that perspective, I would love to see
>> releases every month. However, if we're not going to release that often,
>> then I think it does make sense to adjust our release schedule to
>> accommodate the features that community members want to land in a
>> particular release.
>>
>>
>> Jie, I'm curious why you suggest a *minimal* interval between releases.
>> Could you elaborate a bit on your motivations there?
>>
>> Cheers,
>> Greg
>>
>>
>> On Fri, Mar 16, 2018 at 2:01 PM, Jie Yu <yujie@gmail.com> wrote:
>>
>> > Thanks Greg for starting this thread!
>> >
>> >
>> >> My primary motivation here is to bring our documented policy in line
>> >> with our practice, whatever that may be
>> >
>> >
>> > +100
>> >
>> > Do people think that we should attempt to bring our release cad

Re: Release policy and 1.6 release schedule

2018-03-23 Thread Jie Yu
It's a burden for supporting multiple releases.

1.2 was released March, 2017 (1 year ago), and I know that some users are
still on that version
1.3 was released June, 2017 (9 months ago), and we're still maintaining it
(still backport patches
<https://github.com/apache/mesos/commit/064f64552624e38d5dd92660eef6f6940128c106>
several
days ago, which some users asked)
1.4 was released Sept, 2017 (6 months ago).
1.5 was released Feb, 2018 (1 month ago).

As you can see, users expect a release to be supported 6-9 months (e.g.,
backports are still needed for 1.3 release, which is 9 months old). If we
were to do monthly minor release, we'll probably need to maintain 6-9
release branches? That's too much of an ask for committers and maintainers.

I also agree with folks that there're benefits doing releases more
frequently. Given the historical data, I'd suggest we do quarterly
releases, and maintain three release branches.

- Jie

On Fri, Mar 23, 2018 at 10:03 AM, Greg Mann <g...@mesosphere.io> wrote:

> The best motivation I can think of for a shorter release cycle is this: if
> the release cadence is fast enough, then developers will be less likely to
> rush a feature into a release. I think this would be a real benefit, since
> rushing features in hurts stability. *However*, I'm not sure if every two
> months is fast enough to bring this benefit. I would imagine that a
> two-month wait is still long enough that people wouldn't want to wait an
> entire release cycle to land their feature. Just off the top of my head, I
> might guess that a release cadence of 1 month or shorter would be often
> enough that it would always seem reasonable for a developer to wait until
> the next release to land a feature. What do y'all think?
>
> Other motivating factors that have been raised are:
> 1) Many users upgrade on a longer timescale than every ~2 months. I think
> that this doesn't need to affect our decision regarding release timing -
> since we guarantee compatibility of all releases with the same major
> version number, there is no reason that a user needs to upgrade minor
> releases one at a time. It's fine to go from 1.N to 1.(N+3), for example.
> 2) Backporting will be a burden if releases are too short. I think that in
> practice, backporting will not take too much longer. If there was a
> conflict back in the tree somewhere, then it's likely that after resolving
> that conflict once, the same diff can be used to backport the change to
> previous releases as well.
> 3) Adhering strictly to a time-based release schedule will help users plan
> their deployments, since they'll be able to rely on features being released
> on-schedule. However, if we do strict time-based releases, then it will be
> less certain that a particular feature will land in a particular release,
> and users may have to wait a release cycle to get the feature.
>
> Personally, I find the idea of preventing features from being rushed into a
> release very compelling. From that perspective, I would love to see
> releases every month. However, if we're not going to release that often,
> then I think it does make sense to adjust our release schedule to
> accommodate the features that community members want to land in a
> particular release.
>
>
> Jie, I'm curious why you suggest a *minimal* interval between releases.
> Could you elaborate a bit on your motivations there?
>
> Cheers,
> Greg
>
>
> On Fri, Mar 16, 2018 at 2:01 PM, Jie Yu <yujie@gmail.com> wrote:
>
> > Thanks Greg for starting this thread!
> >
> >
> >> My primary motivation here is to bring our documented policy in line
> >> with our practice, whatever that may be
> >
> >
> > +100
> >
> > Do people think that we should attempt to bring our release cadence more
> >> in line with our current stated policy, or should the policy be changed
> >> to reflect our current practice?
> >
> >
> > I think a minor release every 2 months is probably too aggressive. I
> don't
> > have concrete data, but my feeling is that the frequency that folks
> upgrade
> > Mesos is low. I know that many users are still on 1.2.x.
> >
> > I'd actually suggest that we have a *minimal* interval between two
> > releases (e.g., 3 months), and provide some buffer for the release
> process.
> > (so we're expecting about 3 releases per year, this matches what we did
> > last year).
> >
> > And we use our dev sync to coordinate on a release after the minimal
> > release interval has elapsed (and elect a release manager).
> >
> > - Jie
> >
> > On Wed, Mar 14, 2018 at 9:51 AM, Zhitao Li <zhitaoli...@gmail.com>
> wrote:
> >
> >> An additional data point ...

Re: [Containerization WG] Call for Agenda, March 22nd, 2018

2018-03-21 Thread Jie Yu
Looks like we don't have any agenda item for tomorrow. So I *cancelled* the
meeting.

See you guys in two weeks, and if you want to discuss anything, please add
the agenda item to the notes. Thanks!

- Jie

On Wed, Mar 21, 2018 at 10:30 AM, Gilbert Song  wrote:

> Folks,
>
> We are planning for a WG meeting tomorrow at 9 am PST.
>
> Please add any agenda item or topic that you like to discuss with the
> Containerization WG to the following list:
> https://docs.google.com/document/d/1z55a7tLZFoRWVuUxz1FZwgxkHeugt
> c2nHR89skFXSpU/edit#heading=h.j7quoqe53vwr
>
> Thanks,
> Gilbert
>


Re: Mesos on OS X

2018-03-21 Thread Jie Yu
There's no isolation between containers on OSX. Process management is based
on the POSIX process tree (unlike cgroups on Linux), which has some limitations.

If you're fine with the above, then it should work.

- Jie

On Wed, Mar 21, 2018 at 9:23 PM, Benjamin Mahler  wrote:

> MacOS is a supported platform, you can see the supported versions here:
> http://mesos.apache.org/documentation/latest/building/
>
> The containerization maintainers could probably chime in to elaborate on
> the isolation caveats. For example, you won't have many of the resource
> isolators available and the launcher cannot prevent processes from
> "escaping" from the "container".
>
> On Wed, Mar 21, 2018 at 11:37 AM Ken Sipe  wrote:
>
>> I don’t have long running experience but I would expect it to work fine…
>> the thing to be aware of is that under OSX there are no cgroup constraints…
>>  you also may want to review the APPLE difference:
> >> https://github.com/apache/mesos/search?utf8=%E2%9C%93&q=__APPLE__&type=
> >>
>>
>> Ken
>>
>>
>> On Mar 21, 2018, at 1:25 PM, Sunil Shah  wrote:
>>
>> Hey all,
>>
>> We're contemplating setting up a small OS X Mesos cluster for running iOS
>> tests. I know Mesos technically builds on Macs, but has anyone ever had
>> experience with a long running cluster on OS X? Is it possible?
>> Recommended? Not recommended?
>>
>> Thanks,
>>
>> Sunil
>>
>>
>>


Re: Welcome Zhitao Li as Mesos Committer and PMC Member

2018-03-12 Thread Jie Yu
Congrats Zhitao!

On Mon, Mar 12, 2018 at 2:02 PM, Gilbert Song  wrote:

> Hi,
>
> I am excited to announce that the PMC has voted Zhitao Li as a new
> committer and member of PMC for the Apache Mesos project. Please join me to
> congratulate Zhitao!
>
> Zhitao has been an active contributor to Mesos for one and a half years.
> His main contributions include:
>
>- Designed and implemented Container Image Garbage Collection
>  (MESOS-4945);
>- Designed and implemented part of the HTTP Operator API (MESOS-6007);
>- Reported and fixed a lot of bugs.
>
> Zhitao spares no effort to improve the project quality and to propose
> ideas. Thank you Zhitao for all contributions!
>
> Here is his committer candidate checklist for your perusal:
> https://docs.google.com/document/d/1HGz7iBdo1Q9z9c8fNRgNNLnj0XQ_
> PhDhjXLAfOx139s/
>
> Congrats Zhitao!
>
> Cheers,
> Gilbert
>


Welcome Chun-Hung Hsiao as Mesos Committer and PMC Member

2018-03-10 Thread Jie Yu
Hi,

I am happy to announce that the PMC has voted Chun-Hung Hsiao as a new
committer and member of PMC for the Apache Mesos project. Please join me to
congratulate him!

Chun has been an active contributor for the past year. His main
contributions to the project include:
* Designed and implemented gRPC client support to libprocess (MESOS-7749)
* Designed and implemented Storage Local Resource Provider (MESOS-7235,
MESOS-8374)
* Implemented part of the CSI support (MESOS-7235, MESOS-8374)

Chun is friendly and humble, but also intelligent, insightful, and
opinionated. I am confident that he will be a great addition to our
committer pool. Thanks Chun for all your contributions to the project so
far!

His committer checklist can be found here:
https://docs.google.com/document/d/1FjroAvjGa5NdP29zM7-2eg6tLPAzQRMUmCorytdEI_U/edit?usp=sharing

- Jie


Re: 2 Mesos Slaves in same machine

2018-03-07 Thread Jie Yu
>
> Running 2 agents on the same server will send the same offer twice( same
> offer by each agent ) and there are chances that resources will be over
> utilized by accepting the same offer twice by the framework.


If you didn't specify `--resources` flags on the agent, then yes.

  May I know the settings( cgroup ) required to run 2 slaves in the same
> server and How does it behave?


Check out `--cgroups_root` flag in
https://github.com/apache/mesos/blob/master/docs/configuration/agent.md

This flag only applies to MesosContainerizer (can be used to launch Docker
containers).

- Jie


On Wed, Mar 7, 2018 at 10:06 PM, Baskar Sikkayan <baskar@gmail.com>
wrote:

> 1. Is my understanding correct?
>
> Running 2 agents on the same server will send the same offer twice( same
> offer by each agent ) and there are chances that resources will be over
> utilized by accepting the same offer twice by the framework.
>
> 2. May I know the settings( cgroup ) required to run 2 slaves in the same
> server and How does it behave?
>
> On Wed, Mar 7, 2018 at 10:03 PM, Jie Yu <yujie@gmail.com> wrote:
>
>> Running Docker containers won't work properly because restarting one
>> agent will cause Docker containers managed by the other agent to be deleted.
>>
>> On Wed, Mar 7, 2018 at 9:58 PM, Baskar Sikkayan <baskar@gmail.com>
>> wrote:
>>
>>> We dont have any isolation setting. Looks like 2 slaves are sending too
>>> many offers( same resource offers by 2 agents ) and hence servers gets
>>> overloaded by too many docker jobs and docker becomes unresponsive. Even
>>> "docker ps" not working in this case and docker meta files gets corrupted
>>> on server reboot. Not sure why docker becomes unresponsive. Is running 2
>>> slaves are the main reason for this issue?
>>>
>>> On Wed, Mar 7, 2018 at 9:52 PM, Jie Yu <yujie@gmail.com> wrote:
>>>
>>>> It depends on your isolation setting (mainly cgroup, or any node level
>>>> resources). In general, we don't recommend folks use multiple agents on a
>>>> node.
>>>>
>>>> It's possible to make it work by setting `cgroup_root` separately for
>>>> MesosContainerizer. For DockerContainerizer, currently, we hard code
>>>> `DOCKER_NAME_PREFIX`, making it not possible to use two agents on a node
>>>> properly.
>>>>
>>>> - Jie
>>>>
>>>> On Wed, Mar 7, 2018 at 9:48 PM, Baskar Sikkayan <baskar@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>   We are running 3 node mesos cluster and on each node running 1 mesos
>>>>> master and 2 mesos slaves. Is this a good practice?
>>>>>
>>>>> Will it be a problem running 2 slaves as it might offer too much and
>>>>> will get overloaded.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Baskar.S
>>>>>
>>>>
>>>>
>>>
>>
>


Re: 2 Mesos Slaves in same machine

2018-03-07 Thread Jie Yu
Running Docker containers won't work properly because restarting one agent
will cause Docker containers managed by the other agent to be deleted.

On Wed, Mar 7, 2018 at 9:58 PM, Baskar Sikkayan <baskar@gmail.com>
wrote:

> We dont have any isolation setting. Looks like 2 slaves are sending too
> many offers( same resource offers by 2 agents ) and hence servers gets
> overloaded by too many docker jobs and docker becomes unresponsive. Even
> "docker ps" not working in this case and docker meta files gets corrupted
> on server reboot. Not sure why docker becomes unresponsive. Is running 2
> slaves are the main reason for this issue?
>
> On Wed, Mar 7, 2018 at 9:52 PM, Jie Yu <yujie@gmail.com> wrote:
>
>> It depends on your isolation setting (mainly cgroup, or any node level
>> resources). In general, we don't recommend folks use multiple agents on a
>> node.
>>
>> It's possible to make it work by setting `cgroup_root` separately for
>> MesosContainerizer. For DockerContainerizer, currently, we hard code
>> `DOCKER_NAME_PREFIX`, making it not possible to use two agents on a node
>> properly.
>>
>> - Jie
>>
>> On Wed, Mar 7, 2018 at 9:48 PM, Baskar Sikkayan <baskar@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>>   We are running 3 node mesos cluster and on each node running 1 mesos
>>> master and 2 mesos slaves. Is this a good practice?
>>>
>>> Will it be a problem running 2 slaves as it might offer too much and
>>> will get overloaded.
>>>
>>>
>>> Thanks,
>>> Baskar.S
>>>
>>
>>
>


Re: 2 Mesos Slaves in same machine

2018-03-07 Thread Jie Yu
It depends on your isolation setting (mainly cgroup, or any node level
resources). In general, we don't recommend folks use multiple agents on a
node.

It's possible to make it work by setting `cgroup_root` separately for
MesosContainerizer. For DockerContainerizer, currently, we hard code
`DOCKER_NAME_PREFIX`, making it not possible to use two agents on a node
properly.
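The clash can be sketched in a few lines (illustrative container names and recovery rule, not the actual Mesos code): because the prefix is shared and hard-coded, a restarting agent reaps the other agent's containers as orphans.

```python
# Sketch: why two agents sharing one Docker name prefix clash on restart.
# The names and the recovery rule below are illustrative only.
DOCKER_NAME_PREFIX = "mesos-"  # hard-coded, so both agents use the same one

def orphans_to_remove(known_to_this_agent, running_on_host):
    """Containers a restarting agent would reap as orphans: anything carrying
    the shared prefix that it does not recognize as one of its own."""
    return [name for name in running_on_host
            if name.startswith(DOCKER_NAME_PREFIX)
            and name not in known_to_this_agent]

# Agent 1 restarts while agent 2's task is still running:
agent1_containers = {"mesos-agent1-task"}
host_containers = ["mesos-agent1-task", "mesos-agent2-task", "unrelated"]
# Agent 2's container is wrongly selected for removal:
reaped = orphans_to_remove(agent1_containers, host_containers)
```

With a per-agent prefix this list would be empty, which is why the shared `DOCKER_NAME_PREFIX` is the blocker mentioned above.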

- Jie

On Wed, Mar 7, 2018 at 9:48 PM, Baskar Sikkayan 
wrote:

> Hi,
>
>   We are running 3 node mesos cluster and on each node running 1 mesos
> master and 2 mesos slaves. Is this a good practice?
>
> Will it be a problem running 2 slaves as it might offer too much and will
> get overloaded.
>
>
> Thanks,
> Baskar.S
>


Re: Anyone using a custom Sorter?

2018-03-01 Thread Jie Yu
if your intention is to kill sorter interface, i am +100

On Wed, Feb 28, 2018 at 2:12 PM, Michael Park  wrote:

> I'm not even sure if anyone's using a custom Allocator, but
> is anyone using a custom Sorter? It doesn't seem like there's
> even a module for it so it wouldn't be dynamically loaded.
>
> Perhaps you have a fork with a custom Sorter?
>
> Please let me know,
>
> Thanks!
>
> MPark
>


Re: Docker image for fast e2e test with Mesos

2018-02-11 Thread Jie Yu
Thanks for the pointer. Yes, I am aware of https://www.minimesos.org, which
uses a Vagrant-like workflow (its last release was 11 months ago).

My goal is to have a single docker image that contains all the components,
so that running the entire stack will be just a single `docker run`.
Another goal I want to achieve is to test unreleased Mesos versions.

- Jie

On Sun, Feb 11, 2018 at 4:21 PM, Craig Wickesser <codecr...@gmail.com>
wrote:

> Might be worth checking out mini-mesos as well https://www.minimesos.org
>
> On Sun, Feb 11, 2018 at 7:05 PM Jie Yu <yujie@gmail.com> wrote:
>
>> Hi,
>>
>> When we were developing a framework with Mesos, we realized that it'll be
>> great to have a Docker image that allows framework developers to quickly
>> test with Mesos APIs (preferably new APIs that haven't been released yet).
>> The docker container will have both Mesos master and agent running,
>> allowing framework developers to easily write e2e integration tests with
>> Mesos.
>>
>> Therefore, I went ahead and added some scripts
>> <https://github.com/apache/mesos/tree/master/support/mesos-mini> in the
>> project repo to enable that. I temporarily called the docker image "
>> mesos-mini <https://hub.docker.com/r/mesos/mesos-mini/>" (better name
>> suggestion is welcome!) I also created a Jenkins
>> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Mini> job that
>> pushes nightly "mesos-mini" docker image with the head of Mesos project.
>>
>> Here is the simple instruction to use it:
>>
>> $ docker run --privileged -p 5050:5050 -p 5051:5051 -p 8080:8080
>> mesos/mesos-mini:2018-02-11
>>
>> Once the container is running, test master endpoint at localhost:5050
>> (e.g., the webui). The agent endpoint will be at localhost:5051. I
>> installed the latest marathon (1.5.5) in the docker image too, so marathon
>> endpoint is at localhost:8080
>>
>> Enjoy! Patches to add more example frameworks are very welcome!
>>
>> - Jie
>>
> --
>
> https://github.com/mindscratch
> https://www.google.com/+CraigWickesser
> https://twitter.com/mind_scratch
> https://twitter.com/craig_links
>
>


Docker image for fast e2e test with Mesos

2018-02-11 Thread Jie Yu
Hi,

When we were developing a framework with Mesos, we realized that it'll be
great to have a Docker image that allows framework developers to quickly
test with Mesos APIs (preferably new APIs that haven't been released yet).
The docker container will have both Mesos master and agent running,
allowing framework developers to easily write e2e integration tests with
Mesos.

Therefore, I went ahead and added some scripts
<https://github.com/apache/mesos/tree/master/support/mesos-mini> in the
project repo to enable that. I temporarily called the docker image
"mesos-mini <https://hub.docker.com/r/mesos/mesos-mini/>" (better name
suggestion is welcome!) I also created a Jenkins
<https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Mini> job that
pushes a nightly "mesos-mini" docker image built from the head of the Mesos
project.

Here is the simple instruction to use it:

$ docker run --privileged -p 5050:5050 -p 5051:5051 -p 8080:8080
mesos/mesos-mini:2018-02-11

Once the container is running, test the master endpoint at localhost:5050
(e.g., the web UI). The agent endpoint will be at localhost:5051. I also
installed the latest Marathon (1.5.5) in the Docker image, so the Marathon
endpoint is at localhost:8080.
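A minimal smoke test for the master endpoint above, using only the Python standard library (the `/master/state` endpoint and its `version` field are standard Mesos master API; `state_url` and `master_version` are made-up helper names):

```python
import json
import urllib.request

def state_url(host="localhost", port=5050):
    """URL of the Mesos master's /master/state endpoint."""
    return f"http://{host}:{port}/master/state"

def master_version(url, timeout=5):
    """Fetch the master state JSON and return the reported version string.
    Needs the mesos-mini container from the command above to be running."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["version"]

# Example (only works against a live master):
#   master_version(state_url())
```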

Enjoy! Patches to add more example frameworks are very welcome!

- Jie


Re: [VOTE] Release Apache Mesos 1.5.0 (rc2)

2018-02-02 Thread Jie Yu
+1

Verified in our internal CI that `sudo make check` passed in CentOS 6,
CentOS7, Debian 8, Ubuntu 14.04, Ubuntu 16.04 (both w/ or w/o SSL enabled).


On Thu, Feb 1, 2018 at 5:36 PM, Gilbert Song  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.5.0.
>
> 1.5.0 includes the following:
> 
> 
>   * Support Container Storage Interface (CSI).
>   * Agent reconfiguration policy.
>   * Auto GC docker images in Mesos Containerizer.
>   * Standalone containers.
>   * Support gRPC client.
>   * Non-leading VOTING replica catch-up.
>
>
> The CHANGELOG for the release is available at:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_p
> lain;f=CHANGELOG;hb=1.5.0-rc2
> 
> 
>
> The candidate for Mesos 1.5.0 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.5.0-rc2/mesos-1.5.0.tar.gz
>
> The tag to be voted on is 1.5.0-rc2:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.5.0-rc2
>
> The MD5 checksum of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.5.0-rc2/mesos
> -1.5.0.tar.gz.md5
>
> The signature of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.5.0-rc2/mesos
> -1.5.0.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1222
>
> Please vote on releasing this package as Apache Mesos 1.5.0!
>
> The vote is open until Tue Feb  6 17:35:16 PST 2018 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 1.5.0
> [ ] -1 Do not release this package because ...
>
> Thanks,
> Jie and Gilbert
>


Re: [Containerization WG] Sync tomorrow

2018-01-24 Thread Jie Yu
Instruction on how to join the sync can be found here:
https://docs.google.com/document/d/1z55a7tLZFoRWVuUxz1FZwgxkHeugtc2nHR89skFXSpU/edit#

- Jie

On Wed, Jan 24, 2018 at 3:09 PM, Jie Yu <yujie@gmail.com> wrote:

> Hi folks,
>
> In tomorrow's sync, we'll be discussing the plan for the next Mesos
> release in the containerization area. We'll be reviewing this planning
> spreadsheet and update it accordingly:
> https://docs.google.com/spreadsheets/d/1_e-YRvRDoX766ZdzoE1TtXx3eJUx7fBoZ
> kBx_1TwPjY/edit
>
> Please join us if you're interested!
>
> - Jie
>


[Containerization WG] Sync tomorrow

2018-01-24 Thread Jie Yu
Hi folks,

In tomorrow's sync, we'll be discussing the plan for the next Mesos release
in the containerization area. We'll be reviewing this planning spreadsheet
and update it accordingly:
https://docs.google.com/spreadsheets/d/1_e-YRvRDoX766ZdzoE1TtXx3eJUx7fBoZkBx_1TwPjY/edit

Please join us if you're interested!

- Jie


Re: Questions about Pods and the Mesos Containerizer

2018-01-24 Thread Jie Yu
I can help answer some of them:

Is it possible to do healthchecks per task in a pod?

I believe so, given that the health check is defined at the TaskInfo level,
but AlexR can confirm.

 Is it possible to allocate a separate IP address per container in a pod?

 Not right now, but possible. We need to change the CNI network isolator to
support that, but there might be caveats along the way.

Is there any plan to support the Docker containeriser with pods?

Probably not. If we were to do that, I'd prefer we refactor the Docker
containerizer to use containerd first, and then support pods there.

 Timeframe for debugging tools (equivalent of docker exec, etc)?

We'll have a containerization WG meeting tomorrow morning. I'll make sure
this is on the list. No timeframe yet, but this shouldn't take too long.

Is there any performance data about using the Mesos containeriser with
> container images versus using the Docker containeriser?
> how does the Mesos containerizer handle extremely large images?
> how does the Mesos containerizer handle dozens/hundreds of concurrent
> pulls?


I believe Uber folks might have some data on this (cc Zhitao)?

- Jie

On Wed, Jan 24, 2018 at 2:21 PM, David Morrison  wrote:

> Hi Mesos community!
>
> We’re in the process of designing a Mesos framework to launch multiple
> containers together on the same host and are considering a couple of
> approaches. The first is to use pods (with the TASK_GROUP primitive), and
> the second is write a custom executor that launches nested containers and
> use CNI to handle networking.
>
> With that in mind, we had the following questions:
>
> Questions about pods/task_groups:
>
>-
>
>Is it possible to do healthchecks per task in a pod?
>-
>
>Is it possible to allocate a separate IP address per container in a
>pod?
>-
>
>Is there any plan to support the Docker containeriser with pods?
>
>
> Questions about UCR/Mesos containerizer:
>
>-
>
>Timeframe for debugging tools (equivalent of docker exec, etc)?
>-
>
>Is there any performance data about using the Mesos containeriser with
>container images versus using the Docker containeriser?
>-
>
>   how does the Mesos containerizer handle extremely large images?
>   -
>
>   how does the Mesos containerizer handle dozens/hundreds of
>   concurrent pulls?
>
>
> If anyone has had any experience using the UCR and/or pods with the sort
> of workflow we’re considering, your input would be highly useful!
>
> Cheers,
>
> David Morrison
>
> Software Engineer @ Yelp
>
>


Re: [VOTE] Release Apache Mesos 1.5.0 (rc1)

2018-01-23 Thread Jie Yu
+1

Verified in our internal CI that `sudo make check` passed in CentOS 6,
CentOS7, Debian 8, Ubuntu 14.04, Ubuntu 16.04 (both w/ or w/o SSL enabled).

- Jie

On Mon, Jan 22, 2018 at 9:17 PM, Sam  wrote:

> +1
>
>
> Regards,
>
>
>
> On Jan 23, 2018, at 11:15 AM, Gilbert Song  wrote:
>
> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.5.0.
>
> 1.5.0 includes the following:
> 
> 
>   * Support Container Storage Interface (CSI).
>   * Agent reconfiguration policy.
>   * Auto GC docker images in Mesos Containerizer.
>   * Standalone containers.
>   * Support gRPC client.
>   * Non-leading VOTING replica catch-up.
>
> The CHANGELOG for the release is available at:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_
> plain;f=CHANGELOG;hb=1.5.0-rc1
> 
> 
>
> The candidate for Mesos 1.5.0 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.5.0-rc1/mesos-1.5.0.tar.gz
>
> The tag to be voted on is 1.5.0-rc1:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.5.0-rc1
>
> The MD5 checksum of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.5.0-rc1/
> mesos-1.5.0.tar.gz.md5
>
> The signature of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.5.0-rc1/
> mesos-1.5.0.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1221
>
> Please vote on releasing this package as Apache Mesos 1.5.0!
>
> The vote is open until Thu Jan 25 18:24:36 PST 2018 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 1.5.0
> [ ] -1 Do not release this package because ...
>
> Thanks,
> Jie and Gilbert
>
>


Re: Mesos replicated log fills disk with logging output

2018-01-08 Thread Jie Yu
Stephan,

I haven't seen that before. A quick Google search suggests that it might be
related to leveldb. The following thread might be related.
https://groups.google.com/d/msg/leveldb/lRrbv4Y0YgU/AtfRTfQXNoYJ

What is the filesystem you're using?

- Jie

On Mon, Jan 8, 2018 at 2:28 PM, Stephan Erb 
wrote:

> Hi everyone,
>
>
>
> a few days ago, we have bumped into an interesting issue that we had not
> seen before. Essentially, one of our toy clusters dissolved itself:
>
>
>
>- 3 masters, each running Mesos (1.2.1), Aurora (0.19.0), and
>ZooKeeper (3.4.5) for leader election
>- Master 1 and master 2 had 100% disk usage, because
>/var/lib/mesos/replicated_log/LOG had grown to about 170 GB
>- The replicated log of both Master 1 and 2 was corrupted. A process
>restart did not fix it.
>- The ZooKeeper on Master 2 was corrupted as well. Logs indicated this
>was caused by the full disk.
>- Master 3 was the leading Mesos master and healthy. Its disk usage
>was normal.
>
>
>
>
>
> The content of /var/lib/mesos/replicated_log/LOG was an endless stream of:
>
>
>
> 2018/01/04-12:30:56.776466 7f65aae877c0 Recovering log #1753
>
> 2018/01/04-12:30:56.776577 7f65aae877c0 Level-0 table #1756: started
>
> 2018/01/04-12:30:56.778885 7f65aae877c0 Level-0 table #1756: 7526 bytes OK
>
> 2018/01/04-12:30:56.782433 7f65aae877c0 Delete type=0 #1753
>
> 2018/01/04-12:30:56.782484 7f65aae877c0 Delete type=3 #1751
>
> 2018/01/04-12:30:56.782642 7f6597fff700 Level-0 table #1759: started
>
> 2018/01/04-12:30:56.782686 7f6597fff700 Level-0 table #1759: 0 bytes OK
>
> 2018/01/04-12:30:56.783242 7f6597fff700 Delete type=0 #1757
>
> 2018/01/04-12:30:56.783312 7f6597fff700 Compacting 4@0 + 1@1 files
>
> 2018/01/04-12:30:56.783499 7f6597fff700 compacted to: files[ 4 1 0 0 0 0 0
> ]
>
> 2018/01/04-12:30:56.783538 7f6597fff700 Delete type=2 #1760
>
> 2018/01/04-12:30:56.783563 7f6597fff700 Compaction error: IO error:
> /var/lib/mesos/replicated_log/001735.sst: No such file or directory
>
> 2018/01/04-12:30:56.783598 7f6597fff700 Manual compaction at level-0 from
> (begin) .. (end); will stop at '003060' @ 9423 : 1
>
> 2018/01/04-12:30:56.783607 7f6597fff700 Compacting 4@0 + 1@1 files
>
> 2018/01/04-12:30:56.783698 7f6597fff700 compacted to: files[ 4 1 0 0 0 0 0
> ]
>
> 2018/01/04-12:30:56.783728 7f6597fff700 Delete type=2 #1761
>
> 2018/01/04-12:30:56.783749 7f6597fff700 Compaction error: IO error:
> /var/lib/mesos/replicated_log/001735.sst: No such file or directory
>
> 2018/01/04-12:30:56.783770 7f6597fff700 Compacting 4@0 + 1@1 files
>
> 2018/01/04-12:30:56.783900 7f6597fff700 compacted to: files[ 4 1 0 0 0 0 0
> ]
>
> 2018/01/04-12:30:56.783929 7f6597fff700 Delete type=2 #1762
>
> 2018/01/04-12:30:56.783950 7f6597fff700 Compaction error: IO error:
> /var/lib/mesos/replicated_log/001735.sst: No such file or directory
>
> 2018/01/04-12:30:56.783970 7f6597fff700 Compacting 4@0 + 1@1 files
>
> 2018/01/04-12:30:56.784312 7f6597fff700 compacted to: files[ 4 1 0 0 0 0 0
> ]
>
> 2018/01/04-12:30:56.785547 7f6597fff700 Delete type=2 #1763
>
>
>
> Content of the associated folder:
>
>
>
> /var/lib/mesos/replicated_log.corrupted# ls -la
>
> total 964480
>
> drwxr-xr-x 2 mesos mesos  4096 Jan  5 10:12 .
>
> drwxr-xr-x 4 mesos mesos  4096 Jan  5 10:27 ..
>
> -rw-r--r-- 1 mesos mesos   724 Dec 14 16:22 001735.ldb
>
> -rw-r--r-- 1 mesos mesos  7393 Dec 14 16:45 001737.sst
>
> -rw-r--r-- 1 mesos mesos 22129 Jan  3 12:53 001742.sst
>
> -rw-r--r-- 1 mesos mesos 14967 Jan  3 13:00 001747.sst
>
> -rw-r--r-- 1 mesos mesos  7526 Jan  4 12:30 001756.sst
>
> -rw-r--r-- 1 mesos mesos 15113 Jan  5 10:08 001765.sst
>
> -rw-r--r-- 1 mesos mesos 65536 Jan  5 10:09 001767.log
>
> -rw-r--r-- 1 mesos mesos16 Jan  5 10:08 CURRENT
>
> -rw-r--r-- 1 mesos mesos 0 Aug 25  2015 LOCK
>
> -rw-r--r-- 1 mesos mesos 178303865220 Jan  5 10:12 LOG
>
> -rw-r--r-- 1 mesos mesos 463093282 Jan  5 10:08 LOG.old
>
> -rw-r--r-- 1 mesos mesos 65536 Jan  5 10:08 MANIFEST-001764
>
>
>
> Monitoring indicates that the disk usage started to grow shortly after a
> badly coordinated configuration deployment change:
>
>
>
>- Master 1 was leading and restarted after a few hours of uptime
>- Master 2 was now leading. After a few seconds (30s-60s or so) it got
>restarted as well
>- Master 3 was now leading (and continued to do so)
>
>
>
> I have to admit I am a bit surprised that the restart scenario could lead
> to the issues described above. Has anyone seen similar issues as well?
>
>
>
> Thanks and best regards,
>
> Stephan
>


Re: Container user '27' is not supported

2017-12-27 Thread Jie Yu
Just realized that this is already a warning (not a failure).

The code that emits this warning is here:
https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/isolators/docker/runtime.cpp#L106-L119

And `getContainerUser` is defined here:
https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/isolators/docker/runtime.cpp#L384-L395

Basically, if your Docker image defines a 'user' in its manifest (i.e., the
`USER` directive in your Dockerfile, see
https://docs.docker.com/engine/reference/builder/), Mesos will emit this
warning.

The warning tells you that the Mesos containerizer will ignore this field in
the Dockerfile when launching your Docker container. It will always launch
the container using the uid mapped from the specified CommandInfo.user or
FrameworkInfo.user on the agent host, irrespective of whether you defined a
'user' in your Dockerfile or not.
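The selection rule can be sketched as follows (illustrative Python, not the actual C++; `resolve_container_user` is a made-up name, and the root default matches the documented --switch_user behavior quoted earlier in this thread):

```python
import pwd

def resolve_container_user(command_user=None, framework_user=None):
    """Sketch of the agent-side rule: the image's USER directive is ignored;
    CommandInfo.user takes precedence over FrameworkInfo.user, and the chosen
    name is resolved to a uid on the agent host (root if neither is set)."""
    name = command_user or framework_user or "root"
    # Raises KeyError if the user does not exist on the agent host:
    return pwd.getpwnam(name).pw_uid
```

This is why the user must exist on the agent host with matching uid/gids: the lookup happens against the host's user database, not the image's.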

- Jie




On Wed, Dec 27, 2017 at 10:54 AM, Marc Roos <m.r...@f1-outsourcing.eu>
wrote:

>
>
> These are the only messages I get when I am launching the container.
>
> Dec 27 19:38:42 m02 mesos-slave[25084]: W1227 19:38:42.944775 25114
> runtime.cpp:111] Container user 'sflowrt' is not supported yet for
> container db4b85df-bf75-46a2-a080-88079d98b7a4
> Dec 27 19:38:42 m02 mesos-slave[25084]: W1227 19:38:42.944775 25114
> runtime.cpp:111] Container user 'sflowrt' is not supported yet for
> container db4b85df-bf75-46a2-a080-88079d98b7a4
>
> The reason why I am looking at these 'user' settings, is that a default
> mesos setup, is not running them.
>
>
> Marathon conf:
> {
>   "id": "sflow/vizceral",
>   "cmd": null,
>   "cpus": 0.2,
>   "mem": 256,
>   "instances": 1,
>   "acceptedResourceRoles": ["*"],
>   "constraints": [["hostname", "CLUSTER", "m02.local"]],
>   "container": {
> "type": "MESOS",
> "docker": {
>   "image": "sflow/vizceral",
>   "credential": null,
>   "forcePullImage": false
> }
>
>   }
> }
>
> marathon-1.5.2-1.noarch
> mesos-1.4.1-2.0.1.x86_64
>
>
> -Original Message-
> From: Jie Yu [mailto:yujie@gmail.com]
> Sent: Wednesday, December 27, 2017 17:57
> To: user
> Subject: Re: Container user '27' is not supported
>
> The 'user' specified in the image won't be honored. The current code
> will reject the container launch if the 'user' is specified in the image
> (although, i think we should print a warning if --switch_user flag is on
> because Mesos will always overwrite the user, similar to `docker run
> -u`, I'll send out patch shortly).
>
> Can you try to remove the user directive in your Dockerfile and try
> again?
>
> - Jie
>
> On Tue, Dec 26, 2017 at 6:21 AM, Marc Roos <m.r...@f1-outsourcing.eu>
> wrote:
>
>
>
> I added these changes to the mesos node:
>
> echo "true" > /etc/mesos-slave/switch_user (although I think this
> is the
> default)
> chmod u+s /usr/sbin/mesos-agent
> useradd sflowrt
>
> Modified the marathon conf to:
>
> {
>   "id": "sflow/vizceral",
>   "cmd": null,
>   "cpus": 0.2,
>   "mem": 256,
>   "user": "sflowrt",
>   "instances": 1,
>   "acceptedResourceRoles": ["*"],
>   "constraints": [["hostname", "CLUSTER", "m02.local"]],
>   "container": {
> "type": "MESOS",
> "docker": {
>   "image": "sflow/vizceral",
>   "credential": null,
>   "forcePullImage": false
> }
>
>   }
> }
>
> But still getting these:
>
> Dec 26 15:18:02 m02 mesos-slave[25084]: W1226 15:18:02.415927 25111
> runtime.cpp:111] Container user 'sflowrt' is not supported yet for
> container 4e8d2cf6-b772-4e51-8154-1b8b6244f98f
> Dec 26 15:18:02 m02 mesos-slave[25084]: W1226 15:18:02.415927 25111
> runtime.cpp:111] Container user 'sflowrt' is not supported yet for
> container 4e8d2cf6-b772-4e51-8154-1b8b6244f98f
>
>
>
>
>
>
>
>
>
>
>
>
>
> -Original Message-
> From: Tomek Janiszewski [mailto:jani...@gmail.com]
> Sent: Sunday, December 24, 2017 15:24
> To: user@mesos.apache.org

Re: Container user '27' is not supported

2017-12-27 Thread Jie Yu
The 'user' specified in the image won't be honored. The current code will
reject the container launch if the 'user' is specified in the image
(although, I think we should print a warning if the --switch_user flag is on,
because Mesos will always overwrite the user, similar to `docker run -u`;
I'll send out a patch shortly).

Can you try to remove the user directive in your Dockerfile and try again?

- Jie

On Tue, Dec 26, 2017 at 6:21 AM, Marc Roos  wrote:

>
> I added these changes to the mesos node:
>
> echo "true" > /etc/mesos-slave/switch_user (although I think this is the
> default)
> chmod u+s /usr/sbin/mesos-agent
> useradd sflowrt
>
> Modified the marathon conf to:
>
> {
>   "id": "sflow/vizceral",
>   "cmd": null,
>   "cpus": 0.2,
>   "mem": 256,
>   "user": "sflowrt",
>   "instances": 1,
>   "acceptedResourceRoles": ["*"],
>   "constraints": [["hostname", "CLUSTER", "m02.local"]],
>   "container": {
> "type": "MESOS",
> "docker": {
>   "image": "sflow/vizceral",
>   "credential": null,
>   "forcePullImage": false
> }
>
>   }
> }
>
> But still getting these:
>
> Dec 26 15:18:02 m02 mesos-slave[25084]: W1226 15:18:02.415927 25111
> runtime.cpp:111] Container user 'sflowrt' is not supported yet for
> container 4e8d2cf6-b772-4e51-8154-1b8b6244f98f
> Dec 26 15:18:02 m02 mesos-slave[25084]: W1226 15:18:02.415927 25111
> runtime.cpp:111] Container user 'sflowrt' is not supported yet for
> container 4e8d2cf6-b772-4e51-8154-1b8b6244f98f
>
>
>
>
>
>
>
>
>
>
>
>
> -Original Message-
> From: Tomek Janiszewski [mailto:jani...@gmail.com]
> Sent: zondag 24 december 2017 15:24
> To: user@mesos.apache.org
> Subject: Re: Container user '27' is not supported
>
> This might be the following limitations
>
> > If the --switch_user flag is set on the agent and the framework
> specifies a user (either CommandInfo.user or FrameworkInfo.user), we
> expect that user exists in the container image and its uid and gids
> matches that on the host. User namespace is not supported yet. If the
> user is not specified, root will be used by default. The operator or the
> framework can limit the capabilities of the container by using the
> linux/capabilities isolator.
>
>
>
> niedz., 24.12.2017, 14:20 użytkownik Marc Roos
>  napisał:
>
>
>
> I am seeing this in the logs:
>
> Container user '27' is not supported yet for container
> d823196a-4ec3-41e3-a4c0-6680ba5cc99
>
> I guess this means that the container requests to run under a
> specific
> user id, and this is not yet available in mesos?
>
> mesos-1.4.1-2.0.1.x86_64
>
>
>
>


Re: Cannot enable the isolator volume/host_path

2017-12-22 Thread Jie Yu
What's your Mesos version?

On Fri, Dec 22, 2017 at 4:09 PM, Marc Roos  wrote:

>
>
> Dec 23 01:04:33 m01 mesos-slave[10344]: E1223 01:04:33.213044 10344
> main.cpp:489] EXIT with status 1: Failed to create a containerizer:
> Could not create MesosContainerizer: Failed to create isolator
> 'volume/host_path': Unknown or unsupported isolator
>
> /etc/mesos-slave/isolation
> filesystem/linux,docker/runtime,volume/image
>
> http://mesos.apache.org/documentation/latest/container-volume/
>


Re: Mesos 1.5.0 Release

2017-12-22 Thread Jie Yu
Branched!

https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=shortlog;h=refs/heads/1.5.x

- Jie

On Fri, Dec 22, 2017 at 3:12 PM, Jie Yu <yujie@gmail.com> wrote:

> Hey folks,
>
> I did a cleanup on tickets that target for 1.5.0. Here are the remaining
> tickets:
> https://issues.apache.org/jira/issues/?jql=project%20%
> 3D%20MESOS%20AND%20status%20in%20(Open%2C%20%22In%
> 20Progress%22%2C%20Reviewable%2C%20Accepted)%20AND%20%
> 22Target%20Version%2Fs%22%20%3D%201.5.0
>
> Please take action to either re-target, or let me know if it has
> to go into 1.5.0.
>
> Given the current situation, I'll do the *branch off *today (1.5.x) and
> tag rc1 probably next week.
>
> *For Committers*: if you have patch that you want to land in 1.5.0,
> please also commit into the 1.5.x branch!
>
> Thanks,
> - Jie
>
> On Fri, Dec 22, 2017 at 3:14 AM, Alex Rukletsov <a...@mesosphere.com>
> wrote:
>
>> https://issues.apache.org/jira/browse/MESOS-8297 has just landed. Let's
>> include it in 1.5.0 as well.
>>
>> On Fri, Dec 22, 2017 at 4:35 AM, Jie Yu <yujie@gmail.com> wrote:
>>
>>> Yeah, I am doing a grooming right now.
>>>
>>> Sent from my iPhone
>>>
>>> > On Dec 21, 2017, at 7:25 PM, Benjamin Mahler <bmah...@apache.org>
>>> wrote:
>>> >
>>> > Meng is working on https://issues.apache.org/jira/browse/MESOS-8352
>>> and we
>>> > should land it tonight if not tomorrow. I can cherry pick if it's after
>>> > your cut, and worst case it can go in 1.5.1.
>>> >
>>> > Have you guys gone over the unresolved items targeted for 1.5.0? I see
>>> a
>>> > lot of stuff, might be good to start adjusting / removing their target
>>> > versions to give folks a chance to respond on the ticket?
>>> >
>>> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20M
>>> ESOS%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%
>>> 2C%20Reviewable%2C%20Accepted)%20AND%20%22Target%20Version%
>>> 2Fs%22%20%3D%201.5.0
>>> >
>>> > For example, https://issues.apache.org/jira/browse/MESOS-8337 looks
>>> pretty
>>> > bad to me (master crash).
>>> >
>>> >> On Thu, Dec 21, 2017 at 7:00 PM, Jie Yu <yujie@gmail.com> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> We're about to cut 1.5.0-rc1 tomorrow. If you have any thing that
>>> needs to
>>> >> go into 1.5.0 that hasn't landed, please let me or Gilbert know asap.
>>> >> Thanks!
>>> >>
>>> >> - Jie
>>> >>
>>> >>> On Fri, Dec 1, 2017 at 3:58 PM, Gilbert Song <gilb...@apache.org>
>>> wrote:
>>> >>>
>>> >>> Folks,
>>> >>>
>>> >>> It is time for Mesos 1.5.0 release. I am the release manager.
>>> >>>
>>> >>> We plan to cut the rc1 in next couple weeks. Please start to wrap up
>>> >>> patches if you are contributing or shepherding any issue. If you
>>> expect
>>> >>> any
>>> >>> particular JIRA for this new release, please set *Target Version* as
>>> "
>>> >>> *1.5.0"* and mark it as "*Blocker*" priority.
>>> >>>
>>> >>> The dashboard for Mesos 1.5.0 will be posted in this thread soon.
>>> >>>
>>> >>> Cheers,
>>> >>> Gilbert
>>> >>>
>>> >>
>>> >>
>>>
>>
>>
>


Re: Mesos 1.5.0 Release

2017-12-22 Thread Jie Yu
Hey folks,

I did a cleanup on tickets that target for 1.5.0. Here are the remaining
tickets:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20MESOS%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reviewable%2C%20Accepted)%20AND%20%22Target%20Version%2Fs%22%20%3D%201.5.0

Please take action to either re-target, or let me know if it has to go into
1.5.0.

Given the current situation, I'll do the *branch off* today (1.5.x) and tag
rc1 probably next week.

*For Committers*: if you have patch that you want to land in 1.5.0, please
also commit into the 1.5.x branch!

Thanks,
- Jie

On Fri, Dec 22, 2017 at 3:14 AM, Alex Rukletsov <a...@mesosphere.com> wrote:

> https://issues.apache.org/jira/browse/MESOS-8297 has just landed. Let's
> include it in 1.5.0 as well.
>
> On Fri, Dec 22, 2017 at 4:35 AM, Jie Yu <yujie@gmail.com> wrote:
>
>> Yeah, I am doing a grooming right now.
>>
>> Sent from my iPhone
>>
>> > On Dec 21, 2017, at 7:25 PM, Benjamin Mahler <bmah...@apache.org>
>> wrote:
>> >
>> > Meng is working on https://issues.apache.org/jira/browse/MESOS-8352
>> and we
>> > should land it tonight if not tomorrow. I can cherry pick if it's after
>> > your cut, and worst case it can go in 1.5.1.
>> >
>> > Have you guys gone over the unresolved items targeted for 1.5.0? I see a
>> > lot of stuff, might be good to start adjusting / removing their target
>> > versions to give folks a chance to respond on the ticket?
>> >
>> > https://issues.apache.org/jira/issues/?jql=project%20%3D%
>> 20MESOS%20AND%20status%20in%20(Open%2C%20%22In%20Progress%
>> 22%2C%20Reviewable%2C%20Accepted)%20AND%20%22Target%
>> 20Version%2Fs%22%20%3D%201.5.0
>> >
>> > For example, https://issues.apache.org/jira/browse/MESOS-8337 looks
>> pretty
>> > bad to me (master crash).
>> >
>> >> On Thu, Dec 21, 2017 at 7:00 PM, Jie Yu <yujie@gmail.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> We're about to cut 1.5.0-rc1 tomorrow. If you have any thing that
>> needs to
>> >> go into 1.5.0 that hasn't landed, please let me or Gilbert know asap.
>> >> Thanks!
>> >>
>> >> - Jie
>> >>
>> >>> On Fri, Dec 1, 2017 at 3:58 PM, Gilbert Song <gilb...@apache.org>
>> wrote:
>> >>>
>> >>> Folks,
>> >>>
>> >>> It is time for Mesos 1.5.0 release. I am the release manager.
>> >>>
>> >>> We plan to cut the rc1 in next couple weeks. Please start to wrap up
>> >>> patches if you are contributing or shepherding any issue. If you
>> expect
>> >>> any
>> >>> particular JIRA for this new release, please set *Target Version* as "
>> >>> *1.5.0"* and mark it as "*Blocker*" priority.
>> >>>
>> >>> The dashboard for Mesos 1.5.0 will be posted in this thread soon.
>> >>>
>> >>> Cheers,
>> >>> Gilbert
>> >>>
>> >>
>> >>
>>
>
>


Re: Mesos 1.5.0 Release

2017-12-21 Thread Jie Yu
Yeah, I am doing a grooming right now.

Sent from my iPhone

> On Dec 21, 2017, at 7:25 PM, Benjamin Mahler <bmah...@apache.org> wrote:
> 
> Meng is working on https://issues.apache.org/jira/browse/MESOS-8352 and we
> should land it tonight if not tomorrow. I can cherry pick if it's after
> your cut, and worst case it can go in 1.5.1.
> 
> Have you guys gone over the unresolved items targeted for 1.5.0? I see a
> lot of stuff, might be good to start adjusting / removing their target
> versions to give folks a chance to respond on the ticket?
> 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20MESOS%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reviewable%2C%20Accepted)%20AND%20%22Target%20Version%2Fs%22%20%3D%201.5.0
> 
> For example, https://issues.apache.org/jira/browse/MESOS-8337 looks pretty
> bad to me (master crash).
> 
>> On Thu, Dec 21, 2017 at 7:00 PM, Jie Yu <yujie@gmail.com> wrote:
>> 
>> Hi,
>> 
>> We're about to cut 1.5.0-rc1 tomorrow. If you have any thing that needs to
>> go into 1.5.0 that hasn't landed, please let me or Gilbert know asap.
>> Thanks!
>> 
>> - Jie
>> 
>>> On Fri, Dec 1, 2017 at 3:58 PM, Gilbert Song <gilb...@apache.org> wrote:
>>> 
>>> Folks,
>>> 
>>> It is time for Mesos 1.5.0 release. I am the release manager.
>>> 
>>> We plan to cut the rc1 in next couple weeks. Please start to wrap up
>>> patches if you are contributing or shepherding any issue. If you expect
>>> any
>>> particular JIRA for this new release, please set *Target Version* as "
>>> *1.5.0"* and mark it as "*Blocker*" priority.
>>> 
>>> The dashboard for Mesos 1.5.0 will be posted in this thread soon.
>>> 
>>> Cheers,
>>> Gilbert
>>> 
>> 
>> 


Re: Mesos 1.5.0 Release

2017-12-21 Thread Jie Yu
Hi,

We're about to cut 1.5.0-rc1 tomorrow. If you have anything that needs to
go into 1.5.0 that hasn't landed, please let me or Gilbert know asap.
Thanks!

- Jie

On Fri, Dec 1, 2017 at 3:58 PM, Gilbert Song  wrote:

> Folks,
>
> It is time for Mesos 1.5.0 release. I am the release manager.
>
> We plan to cut the rc1 in next couple weeks. Please start to wrap up
> patches if you are contributing or shepherding any issue. If you expect any
> particular JIRA for this new release, please set *Target Version* as "
> *1.5.0"* and mark it as "*Blocker*" priority.
>
> The dashboard for Mesos 1.5.0 will be posted in this thread soon.
>
> Cheers,
> Gilbert
>


Re: explain these replication logs?

2017-12-13 Thread Jie Yu
Nov  8 00:12:37 host2091 aurora-scheduler[80146]: I1108 00:12:37.005336 80278
replica.cpp:710] Persisted action APPEND at position 67516183
Nov  8 00:00:32 host1162 aurora-scheduler[14638]: I1108 00:00:32.736395 14772
coordinator.cpp:348] Coordinator attempting to write APPEND action at
position 67516183

Looks like this is for the same position. So your disk might have been slow
(or had some issues) during that time?

On Wed, Dec 13, 2017 at 9:27 AM, Mohit Jaggi  wrote:

> Folks,
> Can you help please?
>
> Mohit.
> -- Forwarded message --
> From: Bill Farner 
> Date: Wed, Dec 13, 2017 at 9:06 AM
> Subject: Re: explain these replication logs?
> To: u...@aurora.apache.org
>
>
> I'm unfamiliar.  The mesos dev list may be able to give more insight.  I'd
> be interested in your findings!
>
> On Tue, Dec 12, 2017 at 4:32 PM, Mohit Jaggi  wrote:
>
>> For the same position I see two bursts of writes, one around 00:12:36 and
>> another 12 min earlier. Any idea what this means?
>>
>> ~/a/a/aurora-outage ❯❯❯ grep 67516183 cpp-repl-logs
>> Nov  8 00:12:36 host1161 aurora-scheduler[112446]: I1108 00:12:36.979269
>> 112579 replica.cpp:390] Replica received explicit promise request from
>> __req_res__(7)@172.0.6.42.2:8083 for position 67516183 with proposal
>> 33898
>> Nov  8 00:12:36 host1161 aurora-scheduler[112446]: I1108 00:12:36.982532
>> 112579 replica.cpp:710] Persisted action NOP at position 67516183
>> Nov  8 00:12:36 host1161 aurora-scheduler[112446]: I1108 00:12:36.990187
>> 112580 replica.cpp:693] Replica received learned notice for position
>> 67516183 from @0.0.0.0:0
>> Nov  8 00:12:36 host1161 aurora-scheduler[112446]: I1108 00:12:36.994510
>> 112580 replica.cpp:710] Persisted action APPEND at position 67516183
>> Nov  8 00:12:36 host2091 aurora-scheduler[80146]: I1108 00:12:36.978763
>> 80281 replica.cpp:390] Replica received explicit promise request from
>> __req_res__(6)@172.0.6.42.2:8083 for position 67516183 with proposal
>> 33898
>> Nov  8 00:12:36 host2091 aurora-scheduler[80146]: I1108 00:12:36.989364
>> 80281 replica.cpp:710] Persisted action NOP at position 67516183
>> Nov  8 00:12:36 host2091 aurora-scheduler[80146]: I1108 00:12:36.989794
>> 80278 replica.cpp:693] Replica received learned notice for position
>> 67516183 from @0.0.0.0:0
>> Nov  8 00:12:37 host2091 aurora-scheduler[80146]: I1108 00:12:37.005336
>> 80278 replica.cpp:710] Persisted action APPEND at position 67516183
>> Nov  8 00:00:32 host1162 aurora-scheduler[14638]: I1108 00:00:32.736395
>> 14772 coordinator.cpp:348] Coordinator attempting to write APPEND action at
>> position 67516183
>> Nov  8 00:00:32 host1162 aurora-scheduler[14638]: I1108 00:00:32.736794
>> 14756 replica.cpp:539] Replica received write request for position 67516183
>> from __req_res__(4)@172.0.8.42.11:8083
>> Nov  8 00:00:32 host1162 aurora-scheduler[14638]: I1108 00:00:32.740519
>> 14756 replica.cpp:710] Persisted action APPEND at position 67516183
>> Nov  8 00:00:32 host1162 aurora-scheduler[14638]: I1108 00:00:32.749094
>> 14764 replica.cpp:693] Replica received learned notice for position
>> 67516183 from @0.0.0.0:0
>> Nov  8 00:00:32 host1162 aurora-scheduler[14638]: I1108 00:00:32.749300
>> 14764 replica.cpp:710] Persisted action APPEND at position 67516183
>> Nov  8 00:12:36 host1162 aurora-scheduler[46132]: I1108 00:12:36.992617
>> 46463 replica.cpp:390] Replica received explicit promise request from
>> __req_res__(8)@172.0.6.42.2:8083 for position 67516183 with proposal
>> 33898
>> Nov  8 00:12:36 host1162 aurora-scheduler[46132]: I1108 00:12:36.993018
>> 46463 replica.cpp:710] Persisted action APPEND at position 67516183
>> Nov  8 00:12:36 host1162 aurora-scheduler[46132]: I1108 00:12:36.993108
>> 46463 replica.cpp:693] Replica received learned notice for position
>> 67516183 from @0.0.0.0:0
>> Nov  8 00:12:36 host1162 aurora-scheduler[46132]: I1108 00:12:36.993345
>> 46463 replica.cpp:710] Persisted action APPEND at position 67516183
>> Nov  8 00:12:37 host1159 aurora-scheduler[37324]: I1108 00:12:36.978830
>> 37443 replica.cpp:390] Replica received explicit promise request from
>> __req_res__(10)@172.0.6.42.2:8083 for position 67516183 with proposal
>> 33898
>> Nov  8 00:12:37 host1159 aurora-scheduler[37324]: I1108 00:12:36.988788
>> 37443 replica.cpp:710] Persisted action APPEND at position 67516183
>> Nov  8 00:12:37 host1159 aurora-scheduler[37324]: I1108 00:12:36.989609
>> 37444 replica.cpp:693] Replica received learned notice for position
>> 67516183 from @0.0.0.0:0
>> Nov  8 00:12:37 host1159 aurora-scheduler[37324]: I1108 00:12:36.989812
>> 37444 replica.cpp:710] Persisted action APPEND at position 67516183
>> Nov  8 00:00:32 host1159 aurora-scheduler[37324]: I1108 00:00:32.737551
>> 37453 replica.cpp:539] Replica received write request for position 67516183
>> from __req_res__(6)@172.0.8.42.11:8083
>> Nov  8 00:00:32 host1159 aurora-scheduler[37324]: I1108 

Re: Dedicated ip to task

2017-12-11 Thread Jie Yu
+ Avinash

On Mon, Dec 11, 2017 at 2:21 PM, Marc Roos  wrote:

>
>
> In this https://youtu.be/0UMCoojACOs?t=1737 cni video of Avinash
> Sridharan, he has a haproxy setup with two webservers on different
> networks. But how does he know what these ip addresses will be, so he can
> configure them in the proxy?
>
>
>
>
>


Re: Welcome Andrew Schwartzmeyer as a new committer and PMC member!

2017-11-27 Thread Jie Yu
Congrats, Andy!

- Jie

On Mon, Nov 27, 2017 at 3:00 PM, Joseph Wu  wrote:

> Hi devs & users,
>
> I'm happy to announce that Andrew Schwartzmeyer has become a new committer
> and member of the PMC for the Apache Mesos project.  Please join me in
> congratulating him!
>
> Andrew has been an active contributor to Mesos for about a year.  He has
> been the primary contributor behind our efforts to change our default build
> system to CMake and to port Mesos onto Windows.
>
> Here is his committer candidate checklist for your perusal:
> https://docs.google.com/document/d/1MfJRYbxxoX2-A-g8NEeryUdU
> i7FvIoNcdUbDbGguH1c/
>
> Congrats Andy!
> ~Joseph
>


Re: Is it possible to configure a mesos agent to use multiple work directories?

2017-11-22 Thread Jie Yu
You can config multiple disks for persistent volumes. Please see this doc
for more details:
http://mesos.apache.org/documentation/latest/multiple-disk/
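
For illustration, each disk is exposed to the agent as a separate `disk`
resource with a PATH or MOUNT source, passed via the agent's `--resources`
flag. A rough sketch of such a resources JSON (the paths and sizes here are
made up):

```json
[
  {
    "name": "disk",
    "type": "SCALAR",
    "scalar": { "value": 10240 },
    "disk": { "source": { "type": "PATH", "path": { "root": "/mnt/data1" } } }
  },
  {
    "name": "disk",
    "type": "SCALAR",
    "scalar": { "value": 20480 },
    "disk": { "source": { "type": "MOUNT", "mount": { "root": "/mnt/data2" } } }
  }
]
```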

- Jie

On Wed, Nov 22, 2017 at 1:57 PM, Jeff Kubina  wrote:

> Thanks, that is what I thought.
>
> Why: To spread the I/O-workload of some frameworks across many disks.
>
> --
> Jeff Kubina
> 410-988-4436 <(410)%20988-4436>
>
>
> On Wed, Nov 22, 2017 at 2:21 PM, Vinod Kone  wrote:
>
>> No. Why do you need that?
>>
>> On Wed, Nov 22, 2017 at 10:42 AM, Jeff Kubina 
>> wrote:
>>
>>> Is it possible to configure a mesos agent to use multiple work
>>> directories (the work_dir parameter)?
>>>
>>>
>>
>


Re: Directory mounted in job not visible on host

2017-10-23 Thread Jie Yu
Tobias,

By default, Mesos marks all mounts in a Mesos container as slave mounts.
Therefore, mount propagation flows from host to container, but not from
container to host.

I am actually working on a patch chain to enable bidirectional mount
propagation:
https://issues.apache.org/jira/browse/MESOS-7306

In particular, see the proposed API in this patch:
https://reviews.apache.org/r/63213/

That'll help achieve your goal. Stay tuned; we'll land this in the next
Mesos release.
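
To sketch what this would look like once landed (based on the proposed API
in the review above, so the exact field names may still change): a framework
would request a host path volume with bidirectional propagation, roughly:

```json
{
  "container_path": "/mnt/shared",
  "mode": "RW",
  "source": {
    "type": "HOST_PATH",
    "host_path": {
      "path": "/mnt/shared",
      "mount_propagation": { "mode": "BIDIRECTIONAL" }
    }
  }
}
```

With BIDIRECTIONAL mode, a mount performed inside the container under that
path would also be visible on the host, which is what the original question
asks for.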

- Jie

On Mon, Oct 23, 2017 at 1:00 AM, Tobias Pfeiffer  wrote:

> Hi,
>
> in my Mesos job (Mesos containerizer) I am mounting a squashfs image file
> to some directory on the file system and can access the directory and its
> contents fine from within that job. However, on the Mesos host (i.e., not
> in the job itself) that directory does not appear in the output of the
> `mount` command and when inspecting the directory, it is empty. In
> particular, my Mesos job launches a Docker container and mounts that
> previously mounted directory as a volume (don't ask ...), but in the Docker
> container that volume is also empty.
>
> I am wondering if there is any way that I could make a mount operation
> performed by a job visible to the outside world?
>
> Thanks
> Tobias
>
>


Re: Adding the limited resource to TaskStatus messages

2017-10-09 Thread Jie Yu
+1

On Mon, Oct 9, 2017 at 10:56 AM, James Peach  wrote:

> Hi all,
>
> In https://reviews.apache.org/r/62644/, I am proposing to add an optional
> Resources field to the TaskStatus message named `limited_resources`.
>
> In the case that a task is killed because it violated a resource
> constraint (ie. the reason field is REASON_CONTAINER_LIMITATION,
> REASON_CONTAINER_LIMITATION_DISK or REASON_CONTAINER_LIMITATION_MEMORY),
> this field may be populated with the resource that triggered the
> limitation. This is intended to give better information to schedulers about
> task resource failures, in the expectation that it will help them bubble
> useful information up to the user or a monitoring system.
>
> diff --git a/include/mesos/v1/mesos.proto b/include/mesos/v1/mesos.proto
> index d742adbbf..559d09e37 100644
> --- a/include/mesos/v1/mesos.proto
> +++ b/include/mesos/v1/mesos.proto
> @@ -2252,6 +2252,13 @@ message TaskStatus {
>// status updates for tasks running on agents that are unreachable
>// (e.g., partitioned away from the master).
>optional TimeInfo unreachable_time = 14;
> +
> +  // If the reason field indicates a container resource limitation,
> +  // this field contains the resource whose limits were violated.
> +  //
> +  // NOTE: 'Resources' is used here because the resource may span
> +  // multiple roles (e.g. `"mem(*):1;mem(role):2"`).
> +  repeated Resource limited_resources = 16;
>  }
>
>
>
> cheers,
> James
>
>
>


Re: Containerizers & Isolation

2017-09-14 Thread Jie Yu
>
> - When someone is using the Docker Containerizer what kind of isolation
> can be used and how Mesos handles it?


It's the default isolation that the Docker engine provides. You can alter
some of those defaults using ContainerInfo.DockerInfo.
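
For illustration, a ContainerInfo of type DOCKER might look roughly like
this in a task's JSON (the image and parameters here are made up;
`parameters` entries are passed through to `docker run` as flags):

```json
{
  "type": "DOCKER",
  "docker": {
    "image": "nginx:latest",
    "network": "BRIDGE",
    "privileged": false,
    "parameters": [
      { "key": "cap-add", "value": "NET_ADMIN" }
    ]
  }
}
```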

- Custom isolators can be used both for Docker Containerizer and Mesos
> Containerizer or only in Mesos Containerizer?


Custom isolators only work for Mesos Containerizer currently.

- Is there any isolation between the executors under the same container
> when using Mesos Containerizer?


There should be only one executor per container.

 - A task is launched by command executor, default mesos executor, or
> custom executor. In each case task goes under the container of the
> executor? or a task and an executor are two different containers?


Tasks go under the container of the executor.

- Jie


On Thu, Sep 14, 2017 at 8:18 AM, Thodoris Zois  wrote:

> Hello list,
>
> I have 4 questions, and i would be glad if someone can help..
>
> - When someone is using the Docker Containerizer what kind of isolation
> can be used and how Mesos handles it?
>
> - Custom isolators can be used both for Docker Containerizer and Mesos
> Containerizer or only in Mesos Containerizer?
>
> - Is there any isolation between the executors under the same container
> when using Mesos Containerizer?
>
> - A task is launched by command executor, default mesos executor, or
> custom executor. In each case task goes under the container of the
> executor? or a task and an executor are two different containers?
>   Because i saw an image (attached) on the book Apache Mesos Essentials
> and it actually confused me..
>
> Thank you very much,
> Thodoris
>
>


Re: Welcome James Peach as a new committer and PMC member!

2017-09-06 Thread Jie Yu
Congrats James! Well deserved!

On Wed, Sep 6, 2017 at 2:08 PM, Yan Xu  wrote:

> Hi Mesos devs and users,
>
> Please welcome James Peach as a new Apache Mesos committer and PMC member.
>
> James has been an active contributor to Mesos for over two years now. He
> has made many great contributions to the project which include XFS disk
> isolator, improvement to Linux capabilities support and IPC namespace
> isolator. He's super active on the mailing lists and slack channels, always
> eager to help folks in the community and he has been helping with a lot of
> Mesos reviews as well.
>
> Here is his formal committer candidate checklist:
>
> https://docs.google.com/document/d/19G5zSxhrRBdS6GXn9KjCznjX
> 3cp0mUbck6Jy1Hgn3RY/edit?usp=sharing
> 
>
> Congrats James!
>
> Yan
>
>


Re: Deprecating `--disable-zlib` in libprocess

2017-08-08 Thread Jie Yu
+1 on removing this flag.

On Tue, Aug 8, 2017 at 11:32 AM, Benjamin Mahler  wrote:

> Sorry, I think this was me, feel free to remove it from libprocess now that
> it's required.
>
> On Tue, Aug 8, 2017 at 10:57 AM, Chun-Hung Hsiao 
> wrote:
>
> > Hi all,
> >
> > In libprocess, we have an optional `--disable-zlib` flag, but it's
> > currently not used
> > for conditional compilation and we always use zlib in libprocess,
> > and there's a requirement check in Mesos to make sure that zlib exists.
> > Should this option be removed then?
> > Or is there anyone working on a system without zlib?
> >
> > Thanks for your opinions!
> > Chun-Hung
> >
>


Re: Welcome Greg Mann as a new committer and PMC member!

2017-06-13 Thread Jie Yu
Congrats Greg!

On Tue, Jun 13, 2017 at 2:42 PM, Vinod Kone  wrote:

> Hi folks,
>
> Please welcome Greg Mann as the newest committer and PMC member of the
> Apache Mesos project.
>
> Greg has been an active contributor to the Mesos project for close to 2
> years now and has made many solid contributions. His biggest source code
> contribution to the project has been around adding authentication support
> for default executor. This was a major new feature that involved quite a
> few moving parts. Additionally, he also worked on improving the scheduler
> and executor APIs.
>
> Here is his more formal checklist for your perusal.
>
> https://docs.google.com/document/d/1S6U5OFVrl7ySmpJsfD4fJ3_
> R8JYRRc5spV0yKrpsGBw/edit
>
> Thanks,
> Vinod
>
>


Welcome Gilbert Song as a new committer and PMC member!

2017-05-24 Thread Jie Yu
Hi folks,

I'm happy to announce that the PMC has voted Gilbert Song as a new committer
and member of the PMC for the Apache Mesos project. Please join me in
congratulating him!

Gilbert has been working on the Mesos project for 1.5 years now. His main
contribution is his work on the unified containerizer and nested container
(aka pod) support. He has also helped a lot of folks in the community with
their patches, questions, etc., and he played an important role in
organizing MesosCon Asia last year and this year!

His formal committer checklist can be found here:
https://docs.google.com/document/d/1iSiqmtdX_0CU-YgpViA6r6PU_aMCVuxuNUZ458FR7Qw/edit?usp=sharing

Welcome, Gilbert!

- Jie


Re: [VOTE] Release Apache Mesos 1.1.2 (rc1)

2017-05-09 Thread Jie Yu
-1

I suggest we include this fix in 1.1.2
https://issues.apache.org/jira/browse/MESOS-7471

On Thu, May 4, 2017 at 12:07 PM, Alex Rukletsov  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.1.2.
>
> 1.1.2 includes the following:
> 
> 
> ** Bug
>   * [MESOS-2537] - AC_ARG_ENABLED checks are broken.
>   * [MESOS-5028] - Copy provisioner cannot replace directory with symlink.
>   * [MESOS-5172] - Registry puller cannot fetch blobs correctly from http
> Redirect 3xx urls.
>   * [MESOS-6327] - Large docker images causes container launch failures:
> Too many levels of symbolic links.
>   * [MESOS-7057] - Consider using the relink functionality of libprocess in
> the executor driver.
>   * [MESOS-7119] - Mesos master crash while accepting inverse offer.
>   * [MESOS-7152] - The agent may be flapping after the machine reboots due
> to provisioner recover.
>   * [MESOS-7197] - Requesting tiny amount of CPU crashes master.
>   * [MESOS-7210] - HTTP health check doesn't work when mesos runs with
> --docker_mesos_image.
>   * [MESOS-7237] - Enabling cgroups_limit_swap can lead to "invalid
> argument" error.
>   * [MESOS-7265] - Containerizer startup may cause sensitive data to leak
> into sandbox logs.
>   * [MESOS-7350] - Failed to pull image from Nexus Registry due to
> signature missing.
>   * [MESOS-7366] - Agent sandbox gc could accidentally delete the entire
> persistent volume content.
>   * [MESOS-7383] - Docker executor logs possibly sensitive parameters.
>   * [MESOS-7422] - Docker containerizer should not leak possibly sensitive
> data to agent log.
>
> The CHANGELOG for the release is available at:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_
> plain;f=CHANGELOG;hb=1.1.2-rc1
> 
> 
>
> The candidate for Mesos 1.1.2 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.1.2-rc1/mesos-1.1.2.tar.gz
>
> The tag to be voted on is 1.1.2-rc1:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.1.2-rc1
>
> The MD5 checksum of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.1.2-rc1/
> mesos-1.1.2.tar.gz.md5
>
> The signature of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.1.2-rc1/
> mesos-1.1.2.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is up in Maven in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1188
>
> Please vote on releasing this package as Apache Mesos 1.1.2!
>
> The vote is open until Tue May 9 12:12:12 CEST 2017 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 1.1.2
> [ ] -1 Do not release this package because ...
>
> Thanks,
> Alex & Till
>


Re: [VOTE] Release Apache Mesos 1.3.0 (rc1)

2017-05-09 Thread Jie Yu
-1

I want to get this fix into 1.3.0
https://issues.apache.org/jira/browse/MESOS-7471

- Jie

On Tue, May 9, 2017 at 1:14 AM, Yan Xu  wrote:

> We work around autotools and protobuf bugs and glibc is only harder for
> users and developers to upgrade. :)
>
> I agree that we can establish the minimum glibc version/linux distro
> releases etc we support but currently we don't and there are folks who use
> Mesos that depend on this version.
>
> We should follow http://mesos.apache.org/documentation/latest/versioning/
> and when answers are not in there, we should seek consensus to improve the
> process and update the doc and give folks adequate time to adjust to the
> new (or better defined) process. In the meantime, I think we shouldn't break
> the current users.
>
> ---
> Jiang Yan Xu  | @xujyan 
>
> On Mon, May 8, 2017 at 3:59 PM, Neil Conway  wrote:
>
>> Personally, I'm not convinced that we need to fix MESOS-7378. The
>> problem is essentially a bug in glibc that was fixed 6 years ago. (As
>> a point of reference, the oldest version of g++ we support was
>> released 2 years ago... :) )
>>
>> Neil
>>
>> On Mon, May 8, 2017 at 3:45 PM, Yan Xu  wrote:
>> > I am still hoping that we get
>> > https://issues.apache.org/jira/browse/MESOS-7378 fixed before shipping
>> > 1.3.0. :)
>> >
>> > ---
>> > Jiang Yan Xu  | @xujyan
>> >
>> > On Fri, May 5, 2017 at 6:31 PM, Michael Park  wrote:
>> >>
>> >> Hi all,
>> >>
>> >> Please vote on releasing the following candidate as Apache Mesos 1.3.0.
>> >>
>> >>
>> >> 1.3.0 includes the following:
>> >>
>> >> 
>> 
>> >>   - Multi-role framework support
>> >>   - Executor authentication support
>> >>   - Allow frameworks to modify their roles.
>> >>   - Hierarchical roles (*EXPERIMENTAL*)
>> >>
>> >> The CHANGELOG for the release is available at:
>> >>
>> >> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_p
>> lain;f=CHANGELOG;hb=1.3.0-rc1
>> >>
>> >> 
>> 
>> >>
>> >> The candidate for Mesos 1.3.0 release is available at:
>> >> https://dist.apache.org/repos/dist/dev/mesos/1.3.0-rc1/mesos
>> -1.3.0.tar.gz
>> >>
>> >> The tag to be voted on is 1.3.0-rc1:
>> >> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit
>> ;h=1.3.0-rc1
>> >>
>> >> The MD5 checksum of the tarball can be found at:
>> >>
>> >> https://dist.apache.org/repos/dist/dev/mesos/1.3.0-rc1/mesos
>> -1.3.0.tar.gz.md5
>> >>
>> >> The signature of the tarball can be found at:
>> >>
>> >> https://dist.apache.org/repos/dist/dev/mesos/1.3.0-rc1/mesos
>> -1.3.0.tar.gz.asc
>> >>
>> >> The PGP key used to sign the release is here:
>> >> https://dist.apache.org/repos/dist/release/mesos/KEYS
>> >>
>> >> The JAR is up in Maven in a staging repository here:
>> >> https://repository.apache.org/content/repositories/orgapachemesos-1190
>> >>
>> >> Please vote on releasing this package as Apache Mesos 1.3.0!
>> >>
>> >> The vote is open until Wed May 10 11:59:59 PDT 2017 and passes if a
>> >> majority of at least 3 +1 PMC votes are cast.
>> >>
>> >> [ ] +1 Release this package as Apache Mesos 1.3.0
>> >> [ ] -1 Do not release this package because ...
>> >>
>> >> Thanks,
>> >>
>> >> MPark & Neil
>> >
>> >
>>
>
>


Re: [Req]Starting Japan User Group

2017-05-09 Thread Jie Yu
Shingo,

This is great! Can you send a pull request to update this doc?
https://github.com/apache/mesos/blob/master/site/source/community/user-groups.html.md

- Jie

On Tue, May 9, 2017 at 9:50 AM, Kitayama, Shingo 
wrote:

>
>
> Hi, owners of Mesos User Groups.
>
>
>
> This is Shingo.Kitayama from Hewlett Packard Enterprise Japan Ltd.
>
> I’m in charge of open source solution delivery including Apache Mesos or
> other cloud orchestration tools.
>
>
>
> You may know that Hewlett Packard Enterprise announced the relationship
> with Mesosphere,
>
> after that, I am giving high priority to convey to Japanese engineers the
> appeal of Mesos.
>
>
>
> Additionally I and my team members are planning to held the Mesos Meetup
> in Tokyo and to extend its value continuously.
>
> In this meetup, we will foster the adoption of Mesos, exchange tips or
> issues, and make up collaborative solutions with participants.
>
> So could you approve to establish Japan User group at Tokyo? Of course we
> are going to form a commercial-free community group!
>
>
>
> Best regards,
>
> Shingo from Japan@Tokyo
>
>
>
> [image: PointNext] 
>
> Services to Accelerate your Digital Transformation
>
> by *Hewlett-Packard Enterprise Japan, Ltd*.
>
>
>
>  2-2-1 Ojima, Koto-ku, Tokyo 136-8711, Japan
>
> Technology Consulting Business Unit
>
> Cross-Industry Solutions Division
>
> Technology Architect Department (EG TSC CIS TA)
>
>
>
> *Shingo Kitayama* (21958016)
>
> TEL: 090-1269-8125
>  Mail Stop: HQ5-B22
>  Mailto:shingo.kitay...@hpe.com
>
>
>


Re: Mesos fetcher error when running as non-root user

2017-04-26 Thread Jie Yu
Craig,

Thanks for reporting! I believe this might be related to MESOS-7208
, which is fixed in the
1.2.x branch, so 1.2.1 should have this issue resolved. Is there a way to
test the 1.2.x branch to see if this problem still exists?

- Jie

On Wed, Apr 26, 2017 at 12:54 PM, De Groot (CTR), Craig <
craig.degroot@usgs.gov> wrote:

> Joseph,
>
> Below is the error log from the agent.  The user has permission to read
> the file (docker.tar.gz).  It just can't create the file because the copy
> runs as the specified user but the sandbox directory is owned by root.
> Curiously, both the stderr and stdout files are owned by the specified
> user.  I see similar errors (cp: cannot create regular file) in the stderr
> file in the sandbox.
>
> ---
>
> W0413 11:59:29.657481 43771 fetcher.cpp:896] Begin fetcher log (stderr in
> sandbox) for container 51a13bcd-9598-423d-b437-54324960f5f7 from running
> command: /usr/libexec/mesos/mesos-fetcher
> I0413 11:59:29.617892 43795 fetcher.cpp:531] Fetcher Info:
> {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/14820533-768a-4c32-8a99-
> 80cbca9958af-S4\/testuser","items":[{"action":"BYPASS_
> CACHE","uri":{"cache":false,"executable":false,"extract":
> true,"value":"\/usr\/local\/usgs\/bridge\/docker.tar.gz"}}
> ],"sandbox_directory":"\/usr\/local\/usgs\/mesos\/working\/
> slaves\/14820533-768a-4c32-8a99-80cbca9958af-S4\/
> frameworks\/f61717d6-3ee2-40a7-bbd0-dfdd36a11932-\/
> executors\/marathon-test.8eed2d73-206a-11e7-b25d-
> 02421f7b2f1a\/runs\/51a13bcd-9598-423d-b437-54324960f5f7","
> user":"testuser"}
> I0413 11:59:29.634415 43795 fetcher.cpp:442] Fetching URI
> '/usr/local/usgs/bridge/docker.tar.gz'
> I0413 11:59:29.634462 43795 fetcher.cpp:283] Fetching directly into the
> sandbox directory
> I0413 11:59:29.634488 43795 fetcher.cpp:220] Fetching URI
> '/usr/local/usgs/bridge/docker.tar.gz'
> cp: cannot create regular file ‘/usr/local/usgs/mesos/
> working/slaves/14820533-768a-4c32-8a99-80cbca9958af-S4/
> frameworks/f61717d6-3ee2-40a7-bbd0-dfdd36a11932-/
> executors/marathon-test.8eed2d73-206a-11e7-b25d-
> 02421f7b2f1a/runs/51a13bcd-9598-423d-b437-54324960f5f7/docker.tar.gz’:
> Permission denied
> Failed to fetch '/usr/local/usgs/bridge/docker.tar.gz': Failed to copy
> '/usr/local/usgs/bridge/docker.tar.gz': exited with status 1
>
> End fetcher log for container 51a13bcd-9598-423d-b437-54324960f5f7
> E0413 11:59:29.657582 43771 fetcher.cpp:558] Failed to run mesos-fetcher:
> Failed to fetch all URIs for container '51a13bcd-9598-423d-b437-54324960f5f7'
> with exit status: 256
> E0413 11:59:29.670099 43770 slave.cpp:4650] Container
> '51a13bcd-9598-423d-b437-54324960f5f7' for executor
> 'marathon-test.8eed2d73-206a-11e7-b25d-02421f7b2f1a' of framework
> f61717d6-3ee2-40a7-bbd0-dfdd36a11932- failed to start: Failed to
> fetch all URIs for container '51a13bcd-9598-423d-b437-54324960f5f7' with
> exit status: 256
>
>
>
> __
> Craig De Groot
> Systems Engineer
> Stinger Ghaffarian Technologies (SGT)
> Technical Support Services Contractor to the
> U.S. Geological Survey (USGS)
> Earth Resources Observation and Science (EROS) Center
> 47914 252nd Street
> Sioux Falls, SD 57198-0001
> Ph: 605-594-2507 <(605)%20594-2507>
>
>
> On Wed, Apr 26, 2017 at 1:58 PM, Joseph Wu  wrote:
>
>> There was a change in 1.2.0 which changed how the fetcher would chown the
>> sandbox:
>> https://issues.apache.org/jira/browse/MESOS-5218
>>
>> Prior to 1.2, when the fetcher ran, it would recursively chown the entire
>> sandbox to the given user.  This was incorrect behavior, since the Mesos
>> agent will create the sandbox under the same user (but might put some root
>> files in the non-root sandbox).
>>
>> Can you check your agent logs and paste the fetcher's error here?
>>
>> On Wed, Apr 26, 2017 at 9:06 AM, De Groot (CTR), Craig <
>> craig.degroot@usgs.gov> wrote:
>>
>>> We recently upgraded from Mesos 1.1.0 to 1.2.0 and are encountering
>>> errors with code that previously worked in 1.1.0.  I believe that this is a
>>> bug in the new version.  If not, I would like to know the correct procedure
>>> for using the sandbox as a user other than root.
>>>
>>> Here is the scenario:
>>> 1) Setup a job in Marathon which specifies a URI to our private
>>> docker.tar.gz
>>>   - See: this for an example ... https://mesosphere.github.
>>> io/marathon/docs/native-docker-private-registry.html
>>>   - This is a local file on each node
>>>
>>> 2) Specify a User (other than root) in the Marathon UI
>>>
>>> 3) Mesos will try to fetch the file and fails during the copy because
>>> the ownership of the sandbox directory are not changed to the specified
>>> user.
>>>   - Note that 1.1.0 correctly set the sandbox directory to the specified
>>> user
>>>   - This behavior is documented in the Mesos Docs here (see "specifying
>>> a user name"):  

Re: Mesos installation problems

2017-04-03 Thread Jie Yu
Zois, have you joined the Apache Mesos Slack? (Search for "Slack" at the
following link.)
http://mesos.apache.org/community/

You'll get a lot of help from there.

- Jie

On Mon, Apr 3, 2017 at 8:00 AM, Zois Theodoros  wrote:

> Hello,
>
> I am new to Mesos and I am having many problems installing Mesos locally
> on my laptop. Is there a forum or something where somebody can help me
> solve some problems?
>
> Thank you
>
>


Re: Mesos (and Marathon) port mapping

2017-03-31 Thread Jie Yu
Thomas,

> - it is the hostports which are used to multiplex traffic into container.
> My understanding is that, since each container is in its network
> namespace, it has its own full range of container ports and that you use a
> direct mapping (hostport n <-> same container port n), is that correct ?

Yes.

> - those ports which are divided into disjoint subsets are the ephemeral
> ports. The non-ephemeral ports are in a set shared between all containers,
> correct ?


No. Non-ephemeral ports are allocated by the framework (non-ephemeral ports
are modeled as Resources in Mesos), so containers must have disjoint sets
of non-ephemeral ports.

> - the use case you described is when you cannot afford one IP/container and
> when you are using the Mesos containerizer: does it mean that network
> mapping isolation makes no sense with the Docker containerizer, or can it be
> somehow composed with it?


If you're looking for private bridge + DNAT solution (like Docker
--net=bridge), you can follow the following docs if you want to use it with
Mesos containerizer. It's supported through a more standard interface
called CNI (https://github.com/containernetworking/cni)
https://github.com/apache/mesos/blob/master/docs/cni.md
https://github.com/apache/mesos/blob/master/docs/cni.md#a-port-mapper-plugin-for-cni-networks
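
As a sketch, a CNI network configuration using the port mapper plugin looks
roughly like the following, based on the doc linked above (the network,
bridge, and chain names and the subnet here are illustrative):

```json
{
  "name": "dnat-network",
  "type": "mesos-cni-port-mapper",
  "excludeDevices": ["mesos-cni0"],
  "chain": "MESOS-PORT-MAPPER",
  "delegate": {
    "type": "bridge",
    "bridge": "mesos-cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.0.0/16"
    }
  }
}
```

The port mapper plugin installs the DNAT rules (in the named iptables chain)
for the requested host-to-container port mappings, and delegates the actual
interface and IP setup to the standard bridge plugin.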

The IP-per-container limitation is not related to which containerizer you're
using. It's specific to the company's (Twitter's) environment at that time.
For instance, we could not change the service discovery mechanism, which
required every container's IP to be routable.

> I didn't quite understand why you cannot use NAT (in the same way docker in
> BRIDGE mode does) and assign as many IP addresses as you want in a
> private network...


See my response above. If you're looking for docker --net=bridge support,
follow the two links above.

- Jie

On Fri, Mar 31, 2017 at 3:39 AM, Thomas HUMMEL 
wrote:

> Thanks for your answer,
>
> I've watched your talk. Very interesting.
>
> Let me check if I get everything staight :
>
> - it is the hostports which are used to multiplex traffic into container.
> My understanding is that, since each container is in its network
> namespace, it has its own full range of container ports and that you use a
> direct mapping (hostport n <-> same container port n), is that correct ?
>
> - those ports which are divided into disjoint subsets are the ephemeral
> ports. The non-ephemeral ports are in a set shared between all containers,
> correct ?
>
> - the use case you described is when you cannot afford one ip/container
> and when you are using the Mesos containerizer: does it mean that network
> mapping isolation makes no sense with the Docker containerizer, or can it be
> somehow composed with it?
>
> I didn't quite understand why you cannot use NAT (in the same way docker
> in BRIDGE mode does) and assign as many IP addresses as you want in a
> private network...
>
> Thanks.
>
> --
>
> TH.
>
>
>
>


Re: Mesos (and Marathon) port mapping

2017-03-31 Thread Jie Yu
Tomek and Olivier,

The bridge network support (with port mapping) has been added to Mesos 1.2.
See this doc for more details on how to use it:
https://github.com/apache/mesos/blob/master/docs/cni.md#a-port-mapper-plugin-for-cni-networks

TL;DR: we developed a CNI port mapper plugin (DNAT) in the Mesos repo, which
uses CNI's delegation model. For the bridge CNI plugin, you can simply use
the default bridge plugin from the CNI repo (
https://github.com/containernetworking/cni). @avinash can explain more here.



On Fri, Mar 31, 2017 at 3:40 AM, Olivier Sallou <olivier.sal...@irisa.fr>
wrote:

>
>
> On 03/31/2017 10:23 AM, Tomek Janiszewski wrote:
>
> I have a question that is related to this topic. In "docker support and
> current limitations" section [1] there is a following statement:
> > Only host network is supported. We will add bridge network support soon
> using CNI support in Mesos (MESOS-4641
> <https://issues.apache.org/jira/browse/MESOS-4641>)
> Mentioned issue is resolved. Does this means bridge network is working for
> Mesos containerizer?
>
> [1]: https://github.com/apache/mesos/blob/master/docs/
> container-image.md#docker-support-and-current-limitations
>
> CNI support in unified containerizer (mesos) gives the possibility to
> assign an IP per container, so no port mapping (the ports you use will be
> used directly as the container has its own IP address). There is no "bridge"
> network as per Docker (mapping of container port 80 to host port 3 for
> example)
>
> Olivier
>
>
> pt., 31 mar 2017 o 02:04 użytkownik Jie Yu <yujie@gmail.com> napisał:
>
>> are you talking about the NAT feature of docker in BRIDGE mode?
>>
>>
>> Yes
>>
>>  - regarding the "port mapping isolator giving network namespace" : what
>> confuses me is that, given the previous answers, I thought that in that
>> case, the non-ephemeral port range was *shared* (as a resource) between
>> containers, which sounds to me like the opposite of the namespace concept (as
>> a slightly different example, 2 docker containers have their own private 80
>> port for instance).
>>
>>
>> The port mapping isolator is for the case where ip per container is not
>> possible (due to ipam restriction, etc), but the user still wants to have
>> network namespace per container (for isolation, getting statistics, etc.)
>>
>> Since all containers, even if they are in separate namespaces, share the
>> same IP, we have to use some other mechanism to tell which packet belongs
>> to which container. We use ports in that case. You can find more details
>> about port mapping isolator in this talk I gave in 2015 MesosCon:
>> https://www.youtube.com/watch?v=ZA96g1M4v8Y
>>
>> - Jie
>>
>> On Thu, Mar 30, 2017 at 2:13 AM, Thomas HUMMEL <thomas.hum...@pasteur.fr>
>> wrote:
>>
>>
>> On 03/29/2017 07:25 PM, Jie Yu wrote:
>>
>> Thomas,
>>
>> I think you are confused about the port mapping for NAT purpose, and the port
>> mapping isolator
>> <http://mesos.apache.org/documentation/latest/port-mapping-isolator/>.
>> Those are two very different things. The port mapping isolator (unfortunate
>> naming), as described in the doc, gives you a network namespace per container
>> without requiring an IP per container. No NAT is involved. I think for your
>> case, you should not use it, and it does not work for the DockerContainerizer.
>>
>> Thanks,
>>
>> I'm not sure to understand what you say :
>>
>> - are you talking about the NAT feature of docker in BRIDGE mode ?
>>
>> - regarding the "port mapping isolator giving network namespace" : what
>> confuses me is that, given the previous answers, I thought that in that
>> case, the non-ephemeral port range was *shared* (as a resource) between
>> containers, which sounds to me like the opposite of the namespace concept (as
>> a slightly different example, 2 docker containers have their own private 80
>> port for instance).
>>
>> What am I missing ?
>>
>> Thanks
>>
>> --
>> TH
>>
>>
>>
> --
> Olivier Sallou
> IRISA / University of Rennes 1
> Campus de Beaulieu, 35000 RENNES - FRANCE
> Tel: 02.99.84.71.95
>
> gpg key id: 4096R/326D8438  (keyring.debian.org)
> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
>
>
>


Re: Mesos (and Marathon) port mapping

2017-03-30 Thread Jie Yu
>
> are you talking about the NAT feature of docker in BRIDGE mode?


Yes

 - regarding the "port mapping isolator giving network namespace" : what
> confuses me is that, given the previous answers, I thought that in that
> case, the non-ephemeral port range was *shared* (as a resource) between
> containers, which sounds to me like the opposite of the namespace concept (as
> a slightly different example, 2 docker containers have their own private 80
> port for instance).


The port mapping isolator is for the case where ip per container is not
possible (due to ipam restriction, etc), but the user still wants to have
network namespace per container (for isolation, getting statistics, etc.)

Since all containers, even if they are in separate namespaces, share the
same IP, we have to use some other mechanism to tell which packet belongs
to which container. We use ports in that case. You can find more details
about port mapping isolator in this talk I gave in 2015 MesosCon:
https://www.youtube.com/watch?v=ZA96g1M4v8Y

- Jie

On Thu, Mar 30, 2017 at 2:13 AM, Thomas HUMMEL <thomas.hum...@pasteur.fr>
wrote:

>
> On 03/29/2017 07:25 PM, Jie Yu wrote:
>
> Thomas,
>
> I think you are confused about the port mapping for NAT purpose, and the port
> mapping isolator
> <http://mesos.apache.org/documentation/latest/port-mapping-isolator/>.
> Those are two very different things. The port mapping isolator (unfortunate
> naming), as described in the doc, gives you a network namespace per container
> without requiring an IP per container. No NAT is involved. I think for your
> case, you should not use it, and it does not work for the DockerContainerizer.
>
> Thanks,
>
> I'm not sure to understand what you say :
>
> - are you talking about the NAT feature of docker in BRIDGE mode ?
>
> - regarding the "port mapping isolator giving network namespace" : what
> confuses me is that, given the previous answers, I thought that in that
> case, the non-ephemeral port range was *shared* (as a resource) between
> containers, which sounds to me like the opposite of the namespace concept (as
> a slightly different example, 2 docker containers have their own private 80
> port for instance).
>
> What am I missing ?
>
> Thanks
>
> --
> TH
>
>


Re: mesos container cluster came across health check coredump log

2017-03-29 Thread Jie Yu
+ AlexR, haosdent

For posterity, the root cause of this problem is that when the agent is running
inside a docker container and the `--docker_mesos_image` flag is specified, the
pid namespace of the executor container (which initiates the health check)
is different from the root pid namespace. Therefore, getting the network
namespace handle using `/proc/<pid>/ns/net` does not work because the 'pid'
here is in the root pid namespace (reported by the docker daemon).

Alex and haosdent, I think we should fix this issue. As suggested above, we
can launch the executor container with --pid=host if `--docker_mesos_image`
is specified.

- Jie

On Wed, Mar 29, 2017 at 3:56 AM, tommy xiao  wrote:

> It was resolved by adding --pid=host. Thanks for the community's support; thanks
> a lot.
>
> 2017-03-29 9:52 GMT+08:00 tommy xiao :
>
>> My environment:
>>
>> Mesos 1.2, running containerized in Docker.
>>
>> I launched a sample nginx Docker container with a Mesos native health
>> check, then got a core dump in the sandbox.
>>
>> I have dug in further; here is more information for your reference:
>>
>> In the Mesos slave container, I can only see the task container's pid,
>> but I can't find the nginx process's pid.
>>
>> But in the host console, I can find the nginx pid. So how can I get the
>> pid inside the container?
>>
>>
>>
>>
>> 2017-03-28 13:49 GMT+08:00 tommy xiao :
>>
>>> https://issues.apache.org/jira/browse/MESOS-6184
>>>
>>> anyone give some hint?
>>>
>>> ```
>>>
>>> I0328 11:48:12.922181 48 exec.cpp:162] Version: 1.2.0
>>> I0328 11:48:12.929252 54 exec.cpp:237] Executor registered on agent
>>> a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4
>>> I0328 11:48:12.931640 54 docker.cpp:850] Running docker -H
>>> unix:///var/run/docker.sock run --cpu-shares 10 --memory 33554432
>>> --env-file /tmp/gvqGyb -v /data/mesos/slaves/a29dc3a5-3e
>>> 3f-4058-8ab4-dd7de2ae58d1-S4/frameworks/d7ef5d2b-f924-42d9-a
>>> 274-c020afba6bce-/executors/0-hc-xychu-datamanmesos-2f3b
>>> 47f9ffc048539c7b22baa6c32d8f/runs/458189b8-2ff4-4337-ad3a-67321e96f5cb:/mnt/mesos/sandbox
>>> --net bridge --label=USER_NAME=xychu --label=GROUP_NAME=groupautotest
>>> --label=APP_ID=hc --label=VCLUSTER=clusterautotest --label=USER=xychu
>>> --label=CLUSTER=datamanmesos --label=SLOT=0 --label=APP=hc -p 31000:80/tcp
>>> --name 
>>> mesos-a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4.458189b8-2ff4-4337-ad3a-67321e96f5cb
>>> nginx
>>> I0328 11:48:16.145714 53 health_checker.cpp:196] Ignoring failure as
>>> health check still in grace period
>>> W0328 11:48:26.289958 49 health_checker.cpp:202] Health check failed 1
>>> times consecutively: HTTP health check failed: curl returned terminated
>>> with signal Aborted (core dumped): ABORT: (../../../3rdparty/libprocess/
>>> include/process/posix/subprocess.hpp:190): Failed to execute
>>> Subprocess::ChildHook: Failed to enter the net namespace of pid 18596: Pid
>>> 18596 does not exist
>>>
>>> *** Aborted at 1490672906 (unix time) try "date -d @1490672906" if you are using GNU date ***
>>> PC: @ 0x7f26bfb485f7 __GI_raise
>>> *** SIGABRT (@0x4a) received by PID 74 (TID 0x7f26ba152700) from PID 74; stack trace: ***
>>> @ 0x7f26c0703100 (unknown)
>>> @ 0x7f26bfb485f7 __GI_raise
>>> @ 0x7f26bfb49ce8 __GI_abort
>>> @ 0x7f26c315778e _Abort()
>>> @ 0x7f26c31577cc _Abort()
>>> @ 0x7f26c237a4b6 process::internal::childMain()
>>> @ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
>>> @ 0x7f26c2379e53 process::internal::defaultClone()
>>> @ 0x7f26c237b951 process::internal::cloneChild()
>>> @ 0x7f26c237954f process::subprocess()
>>> @ 0x7f26c15a9fb1 mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
>>> @ 0x7f26c15ababd mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
>>> @ 0x7f26c2331389 process::ProcessManager::resume()
>>> @ 0x7f26c233a3f7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
>>> @ 0x7f26c04a1220 (unknown)
>>> @ 0x7f26c06fbdc5 start_thread
>>> @ 0x7f26bfc0928d __clone
>>> W0328 11:48:36.340055 55 health_checker.cpp:202] Health check failed 2 times consecutively: HTTP health check failed: curl returned terminated with signal Aborted (core dumped): ABORT: (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 18596: Pid 18596 does not exist
>>> *** Aborted at 1490672916 (unix time) try "date -d @1490672916" if you are using GNU date ***
>>> PC: @ 0x7f26bfb485f7 __GI_raise
>>> *** SIGABRT (@0x4b) received by PID 75 (TID 0x7f26b9951700) from PID 

Re: Mesos (and Marathon) port mapping

2017-03-29 Thread Jie Yu
Thomas,

I think you are confused about the port mapping for NAT purpose, and the port
mapping isolator
.
Those are two very different things. The port mapping isolator (unfortunate
naming), as described in the doc, gives you a network namespace per container
without requiring an IP per container. No NAT is involved. I think for your
case, you should not use it, and it does not work for the DockerContainerizer.

- Jie

On Wed, Mar 29, 2017 at 2:22 AM, Thomas HUMMEL 
wrote:

>
>
> On 03/28/2017 06:53 PM, Tomek Janiszewski wrote:
>
> 1. The mentioned port range is the Mesos agent's default resource setting,
> so if you don't explicitly define a port range it will be used.
> https://github.com/apache/mesos/blob/1.2.0/src/slave/constants.hpp#L86
>
> 2. With port mapping, two or more applications could attach to the same
> container port but will be exposed under different host ports.
>
>
> Thanks for your answer.
>
> 1. So it's not network/port_mapping isolator specific, right? Even without
> it, non-ephemeral ports would be considered part of the offer and would
> be chosen in this range by default?
>
> 2. So containers, even with network/port_mapping isolation, *share* the
> non-ephemeral port range, although the doc states "The agent assigns each
> container a non-overlapping range of the ports", which I first read as "each
> container gets its own port range", right?
>
> So I am a bit confused since what's described here
>
> http://mesos.apache.org/documentation/latest/port-mapping-isolator/
>
> in the "Configuring network ports" seems to be valid even without port
> mapping isolator.
>
> Am I getting this right this time ?
>
> Thanks.
>
> --
> Thomas HUMMEL
>
>


MesosCon Asia CFP deadline extended

2017-03-27 Thread Jie Yu
Chinese version below/ 中文详情请见下文

In order to allow for more submissions, we have extended the CFP deadline
for MesosCon Asia until April 5th 11:59pm PDT.
http://events.linuxfoundation.org/events/mesoscon-asia/program/cfp

None of the other submission dates have changed. Community voting on
submissions will still happen between April 10th and 16th. We will let you
know whether or not your talk has been accepted on April 22nd (as planned),
so you should have plenty of time to make travel arrangements.

MesosCon Asia will have simultaneous interpretation and we encourage
chinese talks, but request that you submit your abstract in English to make
community voting simpler.

Looking forward to many great submissions and talks,
Your MesosCon PMC

-

MesosCon Asia CFP截止日期延至4月5日

为了获得更多的演讲提案,我们将今年MesosCon Asia的CFP提交截止日期延至美西太平洋时间4月5日11:59pm。
http://events.linuxfoundation.org/events/mesoscon-asia/program/cfp

其余截止日期皆维持不变,演讲提案的社区公开投票仍将在4月10日至4月16日间进行。为了让您获得足够充裕的时间提前筹备旅行计划,我们将于4月22日通知您演讲提案是否获得通过。

MesosCon
Asia将提供同声传译服务,同时我们鼓励大家使用中文进行演讲,但您的演讲提案摘要(abstract)仍需使用英文撰写,以便于社区进行投票。

期待大家的提案及演讲,
MesosCon PMC


Re: [Design Doc] Improve Storage Support in Mesos using Resources Provider

2017-03-23 Thread Jie Yu
Yes, the idea is to make this general in the future. In fact, the whole
resource provider design keeps that in mind.

We could add a general "CONVERT" operation in the future with free-form
key-value pairs (as well as the source resources) as the parameters. And
it's up to the corresponding resource provider to interpret that.

- Jie

On Thu, Mar 23, 2017 at 3:50 AM, Sargun Dhillon <sar...@sargun.me> wrote:

> Is the intent to make this generic beyond disks? I can see the
> concepts apply beyond volumes, and blocks. Perhaps a generic
> Create{generation} -- where larger generation numbers descend from
> smaller ones?
>
> I can also see this valuable in networking. My use case is ENIs in
> AWS. I would like to have a ResourceProvider that can manipulate ENIs
> based on the invocation of the scheduler. Instead of "CREATE_BLOCK"
> it'd be CREATE_INTERFACE, with some given options about the ENI,
> giving us a raw interface. Subsequently, we would want to do a
> CREATE_IPVLAN, as a subinterface that we can assign an actual IP to.
> The IPVLAN interface is a descendant of the raw interface, just as
> volumes are descendants of block devices.
>
>
> On Sun, Mar 12, 2017 at 6:47 PM, Jie Yu <yujie@gmail.com> wrote:
> > Hi,
> >
> > Currently, Mesos supports both local persistent volumes as well as
> > external persistent volumes. However, neither of them is ideal.
> >
> > Local persistent volumes do not support offering physical or logical
> block
> > devices directly. Also, frameworks do not have choices to select
> > filesystems for their local persistent volumes. There are also some
> > usability problems with the local persistent volumes. Mesos does support
> > multiple local disks. However, it’s a big burden for operators to
> configure
> > each agent properly to be able to leverage this feature.
> >
> > External persistent volumes support in Mesos currently bypasses the
> > resource management part. In other words, using an external persistent
> > volume does not go through the usual offer cycle. Mesos doesn’t track
> > resources associated with the external volumes. This makes quota control,
> > reservation, and fair sharing almost impossible to implement. Also, the
> current
> > interface Mesos uses to interact with volume providers is the Docker
> Volume
> > Driver interface (DVDI), which is very specific to operations on a
> > particular agent.
> >
> > The main problem I see currently is that we don’t have a coherent story
> for
> > storage. Yes, we have some primitives in Mesos that can support some
> > stateful services, but this is far from ideal. Some of them are just
> > stop-gap solutions (e.g., the external volume support). This design tries
> to
> > tell a coherent story for supporting storage in Mesos.
> >
> > https://docs.google.com/document/d/125YWqg_
> 5BB5OY9a6M7LZcby5RSqBwo2PZzpVLuxYXh4/edit?usp=sharing
> >
> > Please feel free to reply this thread or comment on the doc if you have
> any
> > comments or suggestions! Thanks!
> >
> > - Jie
>


Container Storage Interface (CSI)

2017-03-22 Thread Jie Yu
Hi,

Container Storage Interface (CSI)
is a joint work between
container orchestrators (Mesos, Kubernetes, Docker and Cloud Foundry)
trying to standardize the interface between container orchestrator and
storage platforms.

The goal is to let storage platforms just write one plugin that works for
all container orchestrators.

The initial discussion is captured in this doc:
https://docs.google.com/document/d/1JMNVNP-ZHz8cGlnqckOnpJmHF-DNY7IYP-
Di7iuVhQI/edit?usp=sharing

Feel free to comment on that doc if you have any thoughts! This is the same
doc shared in the k8s and Docker community.

The relevant Mesos work to integrate CSI is captured in the Resource
Provider  design doc:
https://docs.google.com/document/d/125YWqg_5BB5OY9a6M7LZcby5RSqBwo2PZzpVL
uxYXh4/edit?usp=sharing

- Jie


Re: mesos network/port_mapping + spark traffic not flowing between containers

2017-03-19 Thread Jie Yu
Dominic,

This might be related to this:
https://issues.apache.org/jira/browse/MESOS-7130

- Jie

On Sun, Mar 19, 2017 at 10:10 AM, Dominic Grégoire <
dominic.grego...@gmail.com> wrote:

> Hello all,
>
> I’m testing with mesos 1.1.0 on aws linux to see if it applies to some of
> our processes and I ran into a problem with network/port_mapping, maybe
> this is a known issue?
>
> The agent is running with these flags:
> export MESOS_isolation=cgroups/cpu,cgroups/mem,network/port_mapping
> export MESOS_containerizers=mesos
> export MESOS_resources="ports:[31000-32000];ephemeral_ports:[32768-57344]"
> export MESOS_ephemeral_ports_per_container=1024
>
> Running spark 2.1.0 with 2 mesos containers on the same host, they can
> connect to each other’s block manager but can’t send traffic, it stays in
> their netns send-q.
>
> Spark is logging:
> 7/03/19 16:54:56 INFO TransportClientFactory: Successfully created
> connection to ip-10-32-20-34.ec2.internal/10.32.20.34:34294 after 12 ms
> (0 ms spent in bootstraps)
> 17/03/19 16:56:56 ERROR TransportChannelHandler: Connection to
> ip-10-32-20-34.ec2.internal/10.32.20.34:34294 has been quiet for 12
> ms while there are outstanding requests. Assuming connection is dead;
> please adjust spark.network.timeout if this is wrong.
>
> I can see connections established between containers but everything stays
> in the send Qs:
> [root@ip-10-32-20-34 sysctl.d]# ip netns
> 4602 (id: 1)
> 4600 (id: 0)
> [root@ip-10-32-20-34 sysctl.d]# ip netns exec 4600 netstat -an
> Connexions Internet actives (serveurs et établies)
> Proto Recv-Q Send-Q Local Address   Foreign Address
>  State
> tcp0  0 10.32.20.34:32861   0.0.0.0:*
>LISTEN
> tcp0  0 0.0.0.0:33003   0.0.0.0:*
>LISTEN
> tcp0  0 10.32.20.34:33003   10.32.20.34:57363
>ESTABLISHED
> tcp0  0 10.32.20.34:33566   10.32.20.34:34294
>ESTABLISHED
> tcp0  0 10.32.20.34:33658   10.32.18.185:40600
>   ESTABLISHED
> tcp0  0 10.32.20.34:32832   10.32.18.185:40196
>   ESTABLISHED
> tcp0  0 10.32.20.34:33406   10.32.20.34:5051
>   ESTABLISHED
> Sockets du domaine UNIX actives(serveurs et établies)
> Proto RefCpt Indicatrs   Type   Etat  I-Node Chemin
> unix  2  [ ] STREAM CONNECTE  21869
> unix  2  [ ] STREAM CONNECTE  20339
> [root@ip-10-32-20-34 sysctl.d]# ip netns exec 4602 netstat -an
> Connexions Internet actives (serveurs et établies)
> Proto Recv-Q Send-Q Local Address   Foreign Address
>  State
> tcp0  0 0.0.0.0:33836   0.0.0.0:*
>LISTEN
> tcp0  0 10.32.20.34:34294   0.0.0.0:*
>LISTEN
> tcp0  24229 10.32.20.34:34294   10.32.20.34:33566
>ESTABLISHED
> tcp0  0 10.32.20.34:33860   10.32.18.185:40196
>   ESTABLISHED
> tcp0  0 10.32.20.34:34680   10.32.18.185:40600
>   ESTABLISHED
> tcp0  0 10.32.20.34:34434   10.32.20.34:5051
>   ESTABLISHED
> tcp0  0 10.32.20.34:33836   10.32.20.34:58149
>ESTABLISHED
> Sockets du domaine UNIX actives(serveurs et établies)
> Proto RefCpt Indicatrs   Type   Etat  I-Node Chemin
> unix  2  [ ] STREAM CONNECTE  20359
> unix  2  [ ] STREAM CONNECTE  20373
> [root@ip-10-32-20-34 sysctl.d]#
>


[Design Doc] Improve Storage Support in Mesos using Resources Provider

2017-03-12 Thread Jie Yu
Hi,

Currently, Mesos supports both local persistent volumes as well as external
persistent volumes. However, neither of them is ideal.

Local persistent volumes do not support offering physical or logical block
devices directly. Also, frameworks do not have choices to select
filesystems for their local persistent volumes. There are also some
usability problems with the local persistent volumes. Mesos does support
multiple local disks. However, it’s a big burden for operators to configure
each agent properly to be able to leverage this feature.

External persistent volumes support in Mesos currently bypasses the
resource management part. In other words, using an external persistent
volume does not go through the usual offer cycle. Mesos doesn’t track
resources associated with the external volumes. This makes quota control,
reservation, fair sharing almost impossible to implement. Also, the current
interface Mesos uses to interact with volume providers is the Docker Volume
Driver interface (DVDI), which is very specific to operations on a
particular agent.

The main problem I see currently is that we don’t have a coherent story for
storage. Yes, we have some primitives in Mesos that can support some
stateful services, but this is far from ideal. Some of them are just the
stop gap solution (e.g., the external volume support). This design tries to
tell a coherent story for supporting storage in Mesos.

https://docs.google.com/document/d/125YWqg_5BB5OY9a6M7LZcby5RSqBwo2PZzpVLuxYXh4/edit?usp=sharing

Please feel free to reply this thread or comment on the doc if you have any
comments or suggestions! Thanks!

- Jie


Re: [VOTE] Release Apache Mesos 1.2.0 (rc2)

2017-03-06 Thread Jie Yu
-0

I want to fix MESOS-7208, which
affects all tasks that are launched as non-root using a container image.

But this is not a new regression because it exists in 1.1.0, thus I am a -0

- Jie

On Fri, Mar 3, 2017 at 4:08 PM, Vinod Kone  wrote:

> +1 (binding)
>
> Since the perf and flaky test that I reported earlier doesn't seem to be
> blockers.
>
> On Fri, Mar 3, 2017 at 4:01 PM, Adam Bordelon  wrote:
>
> > I haven't heard any -1's so I'm going to go ahead and vote myself, from a
> > DC/OS perspective:
> >
> > +1 (binding)
> >
> > I ran 1.2.0-rc2 through the DC/OS integration tests on top of the
> > 1.9.0-rc1, which covers many Mesos features and tests multiple
> frameworks.
> > See CI results of https://github.com/dcos/dcos/pull/1295
> >
> > This was then merged into DC/OS 1.9.0-rc2 which passed another suite of
> > integration tests. Available for testing at https://dcos.io/releases/1.9
> .
> > 0-rc2/
> >
> >
> > On Thu, Mar 2, 2017 at 12:02 AM, Adam Bordelon 
> wrote:
> >
> >> TL;DR: No consensus yet. Let's extend the vote for a day or two, until
> we
> >> have 3 +1s or a legit -1.
> >> During that time we can test further, and investigate any issues that
> >> have shown up.
> >>
> >> Here's a summary of what's been reported on the 1.2.0-rc2 vote thread:
> >>
> >> - There was a perf core dump on ASF CI, which is not necessarily a
> >> blocker:
> >> MESOS-7160  Parsing of perf version segfaults
> >>   Perhaps fixed by backporting MESOS-6982: PerfTest.Version fails on
> >> recent Arch Linux
> >>
> >> - There were a couple of (known/unsurprising) flaky tests:
> >> MESOS-7185  DockerRuntimeIsolatorTest.ROOT_INTERNET_CURL_
> DockerDefaultEntryptRegistryPuller
> >> is flaky
> >> MESOS-4570  DockerFetcherPluginTest.INTERNET_CURL_FetchImage seems
> flaky.
> >>
> >> - If we were to have an rc3, the following Critical bugs could be
> >> included:
> >> MESOS-7050  IOSwitchboard FDs leaked when containerizer launch fails --
> >> leads to deadlock
> >> MESOS-6982  PerfTest.Version fails on recent Arch Linux
> >>
> >> - Plus doc updates:
> >> MESOS-7188 Add documentation for Debug APIs to Operator API doc
> >> MESOS-7189 Add nested container launch/wait/kill APIs to agent API
> >> docs.
> >>
> >>
> >> On Wed, Mar 1, 2017 at 11:30 AM, Neil Conway 
> >> wrote:
> >>
> >>> The perf core dump might be addressed if we backport this change:
> >>>
> >>> https://reviews.apache.org/r/56611/
> >>>
> >>> Although my guess is that this isn't a severe problem: for some
> >>> as-yet-unknown reason, running `perf` on the host segfaulted, which
> >>> causes the test to fail.
> >>>
> >>> Neil
> >>>
> >>> On Wed, Mar 1, 2017 at 11:09 AM, Vinod Kone 
> >>> wrote:
> >>> > Tested on ASF CI.
> >>> >
> >>> > Saw 2 configurations fail. One was the perf core dump issue
> >>> > . Other is a known
> >>> (since
> >>> > 0..28.0) flaky test with Docker fetcher plugin
> >>> > .
> >>> >
> >>> > Withholding the vote until we know the severity of the perf core
> dump.
> >>> >
> >>> >
> >>> > *Revision*: b9d8202a7444d0d1e49476bfc9817eb4583beaff
> >>> >
> >>> >- refs/tags/1.1.1-rc2
> >>> >
> >>> > Configuration Matrix gcc clang
> >>> > centos:7 --verbose --enable-libevent --enable-ssl autotools
> >>> > [image: Success]
> >>> >  >>> ease/30/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--ver
> >>> bose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=
> >>> 1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%
> >>> 7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/>
> >>> > [image: Not run]
> >>> > cmake
> >>> > [image: Success]
> >>> >  >>> ease/30/BUILDTOOL=cmake,COMPILER=gcc,CONFIGURATION=--verbose
> >>> %20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%
> >>> 20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoo
> >>> p)&&(!ubuntu-us1)&&(!ubuntu-eu2)/>
> >>> > [image: Not run]
> >>> > --verbose autotools
> >>> > [image: Success]
> >>> >  >>> ease/30/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--ver
> >>> bose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,
> >>> label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/>
> >>> > [image: Not run]
> >>> > cmake
> >>> > [image: Success]
> >>> >  >>> ease/30/BUILDTOOL=cmake,COMPILER=gcc,CONFIGURATION=--verbose
> >>> ,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_
> >>> exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/>
> >>> > [image: Not run]
> >>> > ubuntu:14.04 --verbose --enable-libevent --enable-ssl autotools
> >>> > [image: Success]
> >>> > 

Re: Welcome Neil Conway as Mesos Committer and PMC member!

2017-01-22 Thread Jie Yu
Congrats! Neil! Well deserved!

On Sun, Jan 22, 2017 at 6:56 PM, haosdent  wrote:

> Congrats Neil !!
>
> On Sun, Jan 22, 2017 at 3:29 AM, Gabriel Hartmann 
> wrote:
>
>> Congrats Neil.
>>
>> On Sat, Jan 21, 2017 at 7:08 AM Deepak Vij (A) 
>> wrote:
>>
>>> Congrats Neil.
>>>
>>> Deepak Vij
>>>
>>> Sent from HUAWEI AnyOffice
>>> From:Vinod Kone
>>> To:dev,user
>>> Date:2017-01-20 23:04:30
>>> Subject:Welcome Neil Conway as Mesos Committer and PMC member!
>>>
>>> Hi folks,
>>>
>>> Please welcome Neil Conway as the newest committer and PMC member of the
>>> Apache Mesos project.
>>>
>>> Neil has been an active contributor to Mesos for more than a year now. As
>>> part of his work, he has contributed some major features (Partition aware
>>> frameworks, floating point operations for resources). Neil also took the
>>> initiative to improve the documentation of our project and shepherded
>>> several improvements over time. Doing that even without being a
>>> committer,
>>> shows that he takes ownership of the project seriously.
>>>
>>> Here is his more formal checklist for your perusal.
>>>
>>> https://docs.google.com/document/d/137MYwxEw9QCZRH09CXfn1544p1LuM
>>> uoj9LxS-sk2_F4/edit
>>> 
>>>
>>> Thanks,
>>> Vinod
>>>
>>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: Why task environment variables are set for executor/containerizer?

2017-01-18 Thread Jie Yu
Daniel,

Yes, this is tech debt in the old command executor. The new default
executor does not have this issue.

- Jie

On Wed, Jan 18, 2017 at 5:49 PM, Daniel Krawczyk  wrote:

> Hi,
>
> When I set an environment variable for my task it is as well applied both
> to mesos-containerizer and mesos-executor.
> This may lead to problems, for example when loading dynamic libraries with
> LD_PRELOAD.
>
> Is setting task variables to executor/containerizer a bug?
>
> Cheers,
> Daniel
>
>


Re: Why task environment variables are set for executor/containerizer?

2017-01-18 Thread Jie Yu
Can you create a ticket for that? I personally want to fix that for command
executor.

- Jie

On Wed, Jan 18, 2017 at 5:52 PM, Jie Yu <yujie@gmail.com> wrote:

> Daniel,
>
> Yes, this is tech debt in the old command executor. The new default
> executor does not have this issue.
>
> - Jie
>
> On Wed, Jan 18, 2017 at 5:49 PM, Daniel Krawczyk <dank...@gmail.com>
> wrote:
>
>> Hi,
>>
>> When I set an environment variable for my task it is as well applied both
>> to mesos-containerizer and mesos-executor.
>> This may lead to problems, for example when loading dynamic libraries
>> with LD_PRELOAD.
>>
>> Is setting task variables to executor/containerizer a bug?
>>
>> Cheers,
>> Daniel
>>
>>
>


Re: customized IP for health check

2017-01-18 Thread Jie Yu
Had a discussion with Vinod and AlexR on this.

I DO think it's common that frameworks do not know about the IP of the
container before launching it. For instance, any networking solution that
has a dynamic IPAM (e.g., calico, dc/os overlay, weave, etc.).

For those cases, it probably doesn't make sense to ask the frameworks (or
users) to specify the health check IP addresses for their containers.

One solution we discussed was to use the container ip by default, but
optionally allow people to specify `LOCALHOST` if they want to (this is
what k8s does for probes. What's different is that k8s performs health checks
from the kubelet, which runs outside the container network, while Mesos performs
health checks from the executor, which is in the same network as the
container).

However, the downside for this approach is that if a container joins
multiple networks, what should be the address that Mesos uses to do health
check by default?

Therefore, I personally prefer a more consistent semantics (i.e., always
using localhost). Although we can always add a backdoor for allowing people
to specify a custom address for health check, but this should be avoided if
possible.

- Jie

On Wed, Jan 18, 2017 at 10:44 AM, CmingXu <cming...@gmail.com> wrote:

> The framework user has to make sure the assigned IPs are unique and
> accessible within the VLAN.
>
> In some cases, framework users want their DB, cache, or proxy type of apps
> handled by my framework & Mesos; they might also want the apps treated as
> if they were deployed the old way, with a unique IP for each
> container.
>
> This kind of app is not the only type that my framework supports; the
> BRIDGE driver is supported too.
>
> On Wed, Jan 18, 2017 at 5:30 PM, Jie Yu <yujie@gmail.com> wrote:
>
>> It's also possible that the IP is not known by the task/framework upfront
>> (in fact, this is quite common depending on the underlying network driver).
>> What does your general framework do in this case?
>>
>> - Jie
>>
>> On Wed, Jan 18, 2017 at 10:26 AM, CmingXu <cming...@gmail.com> wrote:
>>
>>> I am not sure what kind of apps are going to be running on Mesos; what I
>>> am doing is a general-purpose framework, kind of like Marathon.
>>>
>>> On Wed, Jan 18, 2017 at 5:24 PM, Jie Yu <yujie@gmail.com> wrote:
>>>
>>>> and we don't know if the task is listening on all interfaces or not
>>>>
>>>>
>>>> OK, I think that's the reason. Although, I am wondering: the task is
>>>> already listening on an external IP, why not just listen on 0.0.0.0. Any
>>>> specific reason this is a concern? Or just because there is no way to
>>>> configure the listening address of the task?
>>>>
>>>> - Jie
>>>>
>>>> On Wed, Jan 18, 2017 at 10:17 AM, CmingXu <cming...@gmail.com> wrote:
>>>>
>>>>> To Alex:
>>>>> Yes, we know the IP upfront; the framework user needs to reserve a unique
>>>>> IP for each task, and we don't know if the task is listening on all
>>>>> interfaces or not, so letting the health check target that IP is the best option.
>>>>>
>>>>>
>>>>>
>>>>> To Jie Yu:
>>>>>
>>>>> by DEFAULT_DOMAIN I mean
>>>>>
>>>>> *static const string DEFAULT_DOMAIN = "127.0.0.1"*
>>>>>
>>>>> in source code src/health-check/health_checker.cpp
>>>>>
>>>>> On Wed, Jan 18, 2017 at 4:58 PM, Jie Yu <yujie@gmail.com> wrote:
>>>>>
>>>>>> So you want to use the IP addressed assigned by your macvlan driver
>>>>>> to do health check? If that's the case, I still don't understand why
>>>>>> entering the network namespace of the container and use localhost for
>>>>>> health check does not work (which is what Mesos is doing).
>>>>>>
>>>>>> I walked through the Mesos source code and obviously the TCP & HTTP
>>>>>>> doesn't meet my requirements as DEFAULT_DOMAIN is hard coded
>>>>>>
>>>>>>
>>>>>> What do you mean by DEFAULT_DOMAIN?
>>>>>>
>>>>>> - Jie
>>>>>>
>>>>>> On Wed, Jan 18, 2017 at 9:54 AM, CmingXu <cming...@gmail.com> wrote:
>>>>>>
>>>>>>> the network I am currently used is USER, and each task was assigned
>>>>>>> with a unique vLAN IP with the underlaying docker driver is Macvlan.
>>>>>>> I

Re: customized IP for health check

2017-01-18 Thread Jie Yu
>
> and we don't know if the task listening on all interfaces or not


OK, I think that's the reason. Although I am wondering: the task is
already listening on an external IP, so why not just listen on 0.0.0.0? Is
there a specific reason this is a concern, or is it just that there is no way
to configure the listening address of the task?

- Jie

On Wed, Jan 18, 2017 at 10:17 AM, CmingXu <cming...@gmail.com> wrote:

> To Alex:
> Yes, we know the IP upfront, framework user need reserve unique IP for
> each task, and we don't know if the task listening on all interfaces or
> not, so let the health check on the IP is the best option.
>
>
>
> To Jie Yu:
>
> by DEFAULT_DOMAIN I mean
>
> *static const string DEFAULT_DOMAIN = "127.0.0.1"*
>
> in source code src/health-check/health_checker.cpp
>
> On Wed, Jan 18, 2017 at 4:58 PM, Jie Yu <yujie@gmail.com> wrote:
>
>> So you want to use the IP addressed assigned by your macvlan driver to do
>> health check? If that's the case, I still don't understand why entering the
>> network namespace of the container and use localhost for health check does
>> not work (which is what Mesos is doing).
>>
>> I walked through the Mesos source code and obviously the TCP & HTTP
>>> doesn't meet my requirements as DEFAULT_DOMAIN is hard coded
>>
>>
>> What do you mean by DEFAULT_DOMAIN?
>>
>> - Jie
>>
>> On Wed, Jan 18, 2017 at 9:54 AM, CmingXu <cming...@gmail.com> wrote:
>>
>>> the network I am currently used is USER, and each task was assigned
>>> with a unique vLAN IP with the underlaying docker driver is Macvlan. I
>>> want my framework user have the ability to define there own
>>> HealthChecks with the IP assigned to a specific task.
>>>
>>> I walked through the Mesos source code and obviously the TCP & HTTP
>>> doesn't meet my requirements as DEFAULT_DOMAIN is hard coded, now the
>>> only option to be might be health check with COMMAND, but if TCP does
>>> support passing IP would be great help.
>>>
>>> Thanks
>>>
>>> On Wed, Jan 18, 2017 at 4:40 PM, Jie Yu <yujie@gmail.com> wrote:
>>> > Hi, can you elaborate a bit more on why you need to use an customized
>>> IP,
>>> > rather than using localhost for health check?
>>> >
>>> > - Jie
>>> >
>>> > On Wed, Jan 18, 2017 at 9:19 AM, CmingXu <cming...@gmail.com> wrote:
>>> >>
>>> >> Is there any plan we support customized IP when define a health check?
>>> >> If true, what's the ETA?
>>> >>
>>> >> thanks
>>> >
>>> >
>>>
>>
>>
>


Re: customized IP for health check

2017-01-18 Thread Jie Yu
So you want to use the IP address assigned by your macvlan driver to do the
health check? If that's the case, I still don't understand why entering the
network namespace of the container and using localhost for the health check
does not work (which is what Mesos does).

I walked through the Mesos source code and obviously the TCP & HTTP
> doesn't meet my requirements as DEFAULT_DOMAIN is hard coded


What do you mean by DEFAULT_DOMAIN?

- Jie
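For readers following along: the TCP health check discussed here is essentially a plain connect attempt against 127.0.0.1, performed after the checker enters the container's network namespace. A minimal sketch of such a probe (the function name and the throwaway demo listener are illustrative, not the actual Mesos code):

```python
import socket

def tcp_health_check(ip, port, timeout=1.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo: probe a throwaway listener on localhost.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
print(tcp_health_check("127.0.0.1", port))  # → True
server.close()
```

The same probe against a custom task IP is just a different first argument, which is what the thread is asking Mesos to allow.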

On Wed, Jan 18, 2017 at 9:54 AM, CmingXu <cming...@gmail.com> wrote:

> the network I am currently used is USER, and each task was assigned
> with a unique vLAN IP with the underlaying docker driver is Macvlan. I
> want my framework user have the ability to define there own
> HealthChecks with the IP assigned to a specific task.
>
> I walked through the Mesos source code and obviously the TCP & HTTP
> doesn't meet my requirements as DEFAULT_DOMAIN is hard coded, now the
> only option to be might be health check with COMMAND, but if TCP does
> support passing IP would be great help.
>
> Thanks
>
> On Wed, Jan 18, 2017 at 4:40 PM, Jie Yu <yujie@gmail.com> wrote:
> > Hi, can you elaborate a bit more on why you need to use an customized IP,
> > rather than using localhost for health check?
> >
> > - Jie
> >
> > On Wed, Jan 18, 2017 at 9:19 AM, CmingXu <cming...@gmail.com> wrote:
> >>
> >> Is there any plan we support customized IP when define a health check?
> >> If true, what's the ETA?
> >>
> >> thanks
> >
> >
>


Re: customized IP for health check

2017-01-18 Thread Jie Yu
Hi, can you elaborate a bit more on why you need to use a customized IP
rather than using localhost for the health check?

- Jie

On Wed, Jan 18, 2017 at 9:19 AM, CmingXu  wrote:

> Is there any plan we support customized IP when define a health check?
> If true, what's the ETA?
>
> thanks
>


Re: Providing end-user feedback on Docker image download progress

2017-01-09 Thread Jie Yu
Frank,

Thanks for reaching out! I think this is definitely something we've thought
about; we just don't have the cycles to get it prioritized.
https://issues.apache.org/jira/browse/MESOS-2256

The idea is around re-using STAGING state with more information about the
progress of the provisioning (and fetching). cc Vinod, BenM, BenH

We want to contribute this work back to the project and like to know

which of the above and other options are the most viable.


That's great!

- Jie

On Mon, Jan 9, 2017 at 3:07 AM, Frank Scholten 
wrote:

> Hi,
>
> Together with a client we are looking into ways to provide end-user
> feedback on Docker image downloads for Mesos tasks.
>
> The idea is that an end user who uses a cli tool that connects to a
> Mesos scheduler gets image dowload and provisioning progress on it's
> standard out stream every few seconds.
>
> Our client has a Mesos framework that runs large Docker images via the
> Mesos Containerizer and we like to know what our options are of adding
> this feature either to the framework, to a Mesos module, or to Mesos
> itself using some sort of API or a new task state next to STAGING like
> DOWNLOADING including detailed progress information.
>
> We want to contribute this work back to the project and like to know
> which of the above and other options are the most viable.
>
> Cheers,
>
> Frank
>


Re: Welcome Haosdent Huang as Mesos Committer and PMC member!

2016-12-19 Thread Jie Yu
Congrats! Well deserved!!

I always wonder how you find so much time!

- Jie

On Mon, Dec 19, 2016 at 5:19 PM, Jay Guo  wrote:

> Congratulations Haosdent!!!
>
> /J
>
> On Mon, Dec 19, 2016 at 4:40 PM, Chengwei Yang
>  wrote:
> > Congratulations! Well deserved.
> >
> > Haosdent helps me a lot!
> >
> > On Fri, Dec 16, 2016 at 01:59:19PM -0500, Vinod Kone wrote:
> >> Hi folks,
> >>
> >> Please join me in formally welcoming Haosdent Huang as Mesos Committer
> and
> >> PMC member.
> >>
> >> Haosdent has been an active contributor to the project for more than a
> year
> >> now. He has contributed a number of patches and features to the Mesos
> code
> >> base, most notably the unified cgroups isolator and health check
> >> improvements. The most impressive thing about him is that he always
> >> volunteers to help out people in the community, be it on slack/IRC or
> >> mailing lists. The fact that he does all this even though working on
> Mesos
> >> is not part of his day job is even more impressive.
> >>
> >> Here is his more formal checklist
> >>  hvy-H8ZGLXG6CF9VP2IY_UU5_0/edit?ts=57e0029d>
> >> for your perusal.
> >>
> >> Thanks,
> >> Vinod
> >>
> >> P.S: Sorry for the delay in sending the welcome email.
> >
> > --
> > Thanks,
> > Chengwei
>


Welcome Guangya Liu as Mesos Committer and PMC member!

2016-12-16 Thread Jie Yu
Hi folks,

Please join me in formally welcoming Guangya Liu as Mesos Committer and PMC
member.

Guangya has worked on the project for more than a year now and has been a
very active contributor. I think one of his most important contributions to
the community is that he helped grow the Mesos community in China. He
initiated the Xian-Mesos-User-Group and successfully organized two meetups
which attracted more than 100 people from Xi’an, China. He has written a
handful of blogs and articles in Chinese tech media which attracted a lot of
interest in Mesos, and has given several talks about Mesos at conferences in
China.

His major coding contribution to the project was the Docker volume driver
isolator. He has also been involved in allocator performance improvements,
GPU support for the Docker containerizer, the Mesos Tiers/Optimistic Offer
design, the scarce resources discussion, and many others.

His formal checklist is here:
https://docs.google.com/document/d/1tot79kyJCTTgJHBhzStFKrVkDK4pX
qfl-LHCLOovNtI/edit?usp=sharing

Thanks,
- Jie


Re: Multi-agent machine

2016-12-09 Thread Jie Yu
Charles,

It should be possible. Here are the global 'objects' that might conflict:
1) cgroups (you can use a different cgroup root)
2) work_dir and runtime_dir (you can set them to be different between agents)
3) network (e.g., iptables; if you use host networking this should not be a
problem, otherwise you might need to configure your network isolator
properly)

But we haven't tested this. Another potential issue is code that relies on
the hostname of the agent (MachineID in the maintenance primitives?)

- Jie
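For example, two agents could be kept from colliding by giving each one its own port, directories, and cgroup root. A sketch that just assembles the per-agent flag sets (the flag names match the agent CLI; the values are illustrative):

```python
def agent_flags(n):
    """Per-agent flags so multiple agents can share one host without clashing."""
    return {
        "--port": str(5051 + n),
        "--work_dir": "/var/lib/mesos/agent%d" % n,
        "--runtime_dir": "/var/run/mesos/agent%d" % n,
        "--cgroups_root": "mesos%d" % n,
    }

a, b = agent_flags(0), agent_flags(1)
# No flag value may be shared between the two agents.
assert not set(a.values()) & set(b.values())
print(a)
print(b)
```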

On Fri, Dec 9, 2016 at 12:11 PM, Charles Allen <
charles.al...@metamarkets.com> wrote:

> Is it possible to setup a machine such that multiple mesos agents are
> running on the same machine and registering with the same master?
>
> For example, with different cgroup roots or different default working
> directory.
>


Re: Path vs Mount disk resources

2016-11-20 Thread Jie Yu
>
> The documentation states "non performance-critical applications" and
> "should only be done in a testing or staging environment" about Path type
> resources. Is there any particular reason for that?


I think the documentation is not very precise. PATH disk is the default disk
resource, and has been used in production for years for tasks' logs and
other data in their sandboxes.

MOUNT disk was introduced mainly for data services which may require disk
performance isolation and want exclusive access to a disk.

Both of them are definitely ready for production.

- Jie
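To make the PATH/MOUNT distinction concrete, here is a small sketch that builds the kind of `--resources` JSON shown below in this thread — one PATH entry for the shared OS disk and one MOUNT entry for an exclusive SSD (the sizes, paths, and helper name are illustrative):

```python
import json

def disk_resource(gb, source_type, root):
    """Build one disk resource entry for the agent's --resources flag."""
    key = "path" if source_type == "PATH" else "mount"
    return {
        "name": "disk",
        "type": "SCALAR",
        "scalar": {"value": gb},
        "disk": {"source": {"type": source_type, key: {"root": root}}},
    }

resources = [
    disk_resource(800, "PATH", "/var/lib/mesos-data"),  # shared OS disk
    disk_resource(200, "MOUNT", "/mnt/data"),           # exclusive SSD
]
print(json.dumps({"resources": resources}, indent=2))
```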

On Mon, Nov 21, 2016 at 9:48 AM, Tobias Pfeiffer  wrote:

> Hi,
>
> I have a Mesos cluster where each node has both SSD and HDD disks, so I
> want to offer them separately as described in http://mesos.apache.org/
> documentation/latest/multiple-disk/
>
> I am not sure yet where the OS will be installed, but I guess on the disk
> that holds the OS I cannot use a Mount type resource, but would have to use
> a Path type resource? The documentation states "non performance-critical
> applications" and "should only be done in a testing or staging environment"
> about Path type resources. Is there any particular reason for that?
>
> I imagine if I have something like, say,
>
>   /  HDD, 1000 GB
>   /mnt/data  SSD, 200 GB
>
> would it make sense to configure the disk resources like
>
> {
>   "resources" : [
> {
>   "name" : "disk",
>   "type" : "SCALAR",
>   "scalar" : { "value" : 80 },
>   "disk" : {
> "source" : {
>   "type" : "PATH",
>   "path" : { "root" : "/var/lib/mesos-data" }
> }
>   }
> },
> {
>   "name" : "disk",
>   "type" : "SCALAR",
>   "scalar" : { "value" : 20 },
>   "disk" : {
> "source" : {
>   "type" : "MOUNT",
>   "mount" : { "root" : "/mnt/data" }
> }
>   }
> }
>   ]
> }
>
> or is that something that's not suitable for production?
>
> Thanks,
> Tobias
>
>


Re: Docker snadbox disk statistics

2016-11-03 Thread Jie Yu
Mainly because we don't have a good way to hook into the lifecycle of a
container that is managed by the Docker daemon.

- Jie
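For background, the `posix/disk` isolator obtains sandbox usage by periodically measuring the sandbox directory (effectively a `du`), which is one reason it only works for containers whose lifecycle Mesos itself manages. A sketch of that measurement (the function name is illustrative; it sums apparent file sizes, whereas `du` counts disk blocks):

```python
import os
import tempfile

def sandbox_usage_bytes(path):
    """Sum apparent file sizes under a sandbox directory, skipping symlinks."""
    total = 0
    for dirpath, _, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total

# Demo: a fake sandbox with a 1 KiB stdout file.
sandbox = tempfile.mkdtemp()
with open(os.path.join(sandbox, "stdout"), "wb") as f:
    f.write(b"x" * 1024)
print(sandbox_usage_bytes(sandbox))  # → 1024
```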

On Thu, Nov 3, 2016 at 9:48 AM, Tomek Janiszewski <jani...@gmail.com> wrote:

> Thanks. I was suspecting Mesos isolators do not work for Docker but
> hoping there is some trick to turn it on.
>
> Thanks
> Tomek
>
> czw., 3.11.2016 o 17:46 użytkownik Jie Yu <yujie@gmail.com> napisał:
>
>> The posix/disk isolator is for the MesosContainerizer only. If you're using
>> the docker containerizer, this won't work.
>>
>> Or you can use MesosContainerizer to launch you docker image, and then,
>> you'll get the sandbox size properly.
>> https://github.com/apache/mesos/blob/master/docs/container-image.md
>>
>>
>> - Jie
>>
>> On Thu, Nov 3, 2016 at 9:41 AM, Tomek Janiszewski <jani...@gmail.com>
>> wrote:
>>
>> Hi
>>
>> Is it possible to get information about Docker sandboxes size? When I
>> visit monitor/statistics there is no information about disk? I have enabled
>> posix/disk isolation and working on 0.28.2
>>
>> Thanks
>> Tomek
>>
>>
>>


Re: Mesos containerizer & isolation

2016-11-02 Thread Jie Yu
To add to haosdent's reply:

- I have a USER directive in my Dockerfile in order for the CMD to be
> executed as that user, but that does not seem to be supported (yet?) by the
> Docker image provider. Is there any method (except `sudo`/`setuser`) to
> achieve running as a user present in the image's /etc/fstab?


Currently, the USER directive in a Dockerfile is not honored. You can think
of it as Mesos doing `docker run -u` with the uid of the 'user' on the host
('user' here is what's specified in CommandInfo.user, or FrameworkInfo.user
if the former is not specified). The reason we need to do that is that we
want to make sure the processes in the container can access their sandbox
and persistent volumes, which are owned by 'user'.

This could potentially be solved by using user namespaces, as haosdent
pointed out.

- I may have to run untrusted code, so can I make sure that users cannot
break out of the chroot? What about UID namespacing, so that root in the
chroot does not become root on the host system when breaking out?

You can run your code using an unprivileged user (e.g., nobody). You just
need to set CommandInfo.user.

- Jie
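Conceptually, the launcher resolves that 'user' to a uid on the host and switches to it before exec'ing the task. A rough sketch of the two steps (function names are illustrative; the real code also handles supplementary groups and error cases, and `drop_privileges` only succeeds when run as root):

```python
import os
import pwd

def resolve_uid(user):
    """Look up the host uid for a user name (as given in CommandInfo.user)."""
    return pwd.getpwnam(user).pw_uid

def drop_privileges(user):
    """Switch to the given user before exec'ing the task; requires root."""
    os.setuid(resolve_uid(user))

print(resolve_uid("root"))  # → 0
```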

On Wed, Nov 2, 2016 at 7:14 PM, haosdent  wrote:

> >- Is it possible to hide host processes from the container?
> You may consider to use the namespaces/pid isolator, add `namespaces/pid`
> in the `--isolation` flag when launch Mesos Agent
> > -Is it possible to run processes that open network ports (possibly
> already open on the host system) and have them mapped to different ports on
> the host system, just as with Docker's `-p`?
> You need to use CNI port mapping. Refer to its document
> https://reviews.apache.org/r/53015/
> >  Is there any method (except `sudo`/`setuser`) to achieve running as a
> user present in the image's /etc/fstab?
> Mesos don't support user namespace now, need to use su to switch users
>
> On Thu, Nov 3, 2016 at 9:56 AM, Tobias Pfeiffer  wrote:
>
>> Actually, say I was in a fancy mood, could I actually *not* use the
>> Docker image provider and instead run `nvidia-docker run [more hand-crafted
>> parameters] myimage ` as an ordinary command within the Mesos
>> container, or would I have to dig very deep into Mesos to find the right
>> parameters to pass to nvidia-docker?
>>
>> Thanks
>> Tobias
>>
>> On Thu, Nov 3, 2016 at 10:18 AM, Tobias Pfeiffer 
>> wrote:
>>
>>> Hi,
>>>
>>> I asked this question also yesterday in the #mesos channel on IRC, but I
>>> guess due to timezone differences there were not many people awake and/or
>>> working, sorry for reposting. (Maybe someone answered after I left, but it
>>> seems that the IRC bot is only archiving channel joins/leaves? ->
>>> http://wilderness.apache.org/channels/?f=apache-syncope/2016-11-02)
>>>
>>> My question is about the Mesos containerizer. I want to run code using
>>> the Mesos GPU support and the docs state that this is currently only
>>> supported by the Mesos containerizer. So my understanding of using the
>>> Mesos containerizer with Docker images is that
>>> - the content of the Docker images is unpacked to the filesystem (using
>>> one of the provisioner backends, such as "copy" or "overlay")
>>> - the user's command is executed in a chroot in that directory.
>>> Is that correct?
>>>
>>> The first thing I noticed is (besides a much higher latency due to the
>>> image provisioning process) that `ps aux` and `hostname` expose details of
>>> the host system, so I was wondering about the level of isolation that I can
>>> achieve with the Mesos containerizer, as opposed to running in a Docker
>>> container. In particular:
>>> - Is it possible to hide host processes from the container?
>>> - Is it possible to run processes that open network ports (possibly
>>> already open on the host system) and have them mapped to different ports on
>>> the host system, just as with Docker's `-p`?
>>> - I have a USER directive in my Dockerfile in order for the CMD to be
>>> executed as that user, but that does not seem to be supported (yet?) by the
>>> Docker image provider. Is there any method (except `sudo`/`setuser`) to
>>> achieve running as a user present in the image's /etc/fstab?
>>> - I may have to run untrusted code, so can I make sure that users cannot
>>> break out of the chroot? What about UID namespacing, so that root in the
>>> chroot does not become root on the host system when breaking out?
>>>
>>> Thanks for your help
>>> Tobias
>>>
>>
>>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: Soft limit for docker tasks

2016-09-24 Thread Jie Yu
You can try to use Mesos containerizer to launch your Docker containers
https://github.com/apache/mesos/blob/master/docs/containerizer.md
https://github.com/apache/mesos/blob/master/docs/container-image.md

, and build a custom memory isolator to set the soft memory limit for your
task.

- Jie
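Under cgroups v1, such a custom isolator would ultimately write the task's limit into `memory.soft_limit_in_bytes` in the container's memory cgroup. A minimal sketch, using a temporary directory as a stand-in for the real cgroup hierarchy (paths and the function name are illustrative):

```python
import os
import tempfile

def set_soft_limit(cgroup_dir, limit_bytes):
    """Write the memory soft limit the way a cgroups-v1 isolator would."""
    with open(os.path.join(cgroup_dir, "memory.soft_limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))

# Demo: a temp dir stands in for /sys/fs/cgroup/memory/mesos/<container-id>.
cgroup = tempfile.mkdtemp()
set_soft_limit(cgroup, 2 * 1024 ** 3)  # 2 GiB soft limit
print(open(os.path.join(cgroup, "memory.soft_limit_in_bytes")).read())  # → 2147483648
```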

On Fri, Sep 23, 2016 at 3:38 PM, Jacopo Sabbatini 
wrote:

> Hi all,
>
> I’m exploring Mesos as a scheduler for batch jobs running in docker
> containers.
>
> When accepting offers and submitting tasks Mesos seems to run the
> container with the hard limit `—memory` and I was wondering if it possible
> to run them with soft memory limit instead, the `—memory-reservation`
> parameter.
>
> I’ll explain the use case. In my company we have data intensive jobs that
> could run for hours. These jobs could have memory spikes and we would like
> not to waste hours of computation just for a brief memory spike. Our
> current scheduler let jobs go over their allocated memory unless there is
> overall memory pressure in the machine, i.e. jobs don’t get killed unless
> they all go over their resource at the same time.
>
> What would be a viable solution to this problem in Mesos?
> Oversubscription? Custom executor?
>
> Thanks
>


Unified cgroups isolator

2016-09-13 Thread Jie Yu
Hi,

We just merged the unified cgroups isolator. A huge shout out to @haosdent
and @qianzhang for making this happen!
https://issues.apache.org/jira/browse/MESOS-4697

Just to give you some context: previously, it was a huge pain to add a new
cgroups subsystem to Mesos because it required creating a new isolator (a
lot of code duplication). Now all the subsystems are merged into one single
isolator, which makes adding a new subsystem very easy.

More importantly, the new cgroups isolator supports cgroups v2!

- Jie
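The pattern can be pictured as one isolator fanning out to per-subsystem handlers, so adding a subsystem means writing one small class instead of a whole new isolator. A toy sketch of that structure (class names are illustrative, not the actual Mesos classes):

```python
class Subsystem:
    """Base class: one handler per cgroups subsystem (memory, cpu, ...)."""
    name = None

    def prepare(self, container_id):
        raise NotImplementedError


class MemorySubsystem(Subsystem):
    name = "memory"

    def prepare(self, container_id):
        return "cgroups/memory prepared for " + container_id


class CpuSubsystem(Subsystem):
    name = "cpu"

    def prepare(self, container_id):
        return "cgroups/cpu prepared for " + container_id


class CgroupsIsolator:
    """Single isolator that fans out to whichever subsystems are enabled."""

    def __init__(self, subsystems):
        self.subsystems = {s.name: s for s in subsystems}

    def prepare(self, container_id):
        # Adding a new subsystem only requires adding one Subsystem class.
        return [s.prepare(container_id) for s in self.subsystems.values()]


isolator = CgroupsIsolator([MemorySubsystem(), CpuSubsystem()])
print(isolator.prepare("c1"))
```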


Re: Support for tasks groups aka pods in Mesos

2016-08-11 Thread Jie Yu
Aaron,

> an important feature of kubernetes for us is the ability to share mount and
> network namespaces between containers in the pod (so containers can share
> mounts etc)


Yes, this will be supported in the MVP; it's described in the design doc.

- Jie

On Thu, Aug 11, 2016 at 6:32 AM, Aaron Carey  wrote:

> Hi Vinod,
>
> I'm not sure where this should go in the design doc, but an important
> feature of kubernetes for us is the ability to share mount and network
> namespaces between containers in the pod (so containers can share mounts
> etc). This currently isn't very well supported in docker (I think
> Kubernetes uses its own containeriser to do this?).
>
> We'd be very keen to have this ability in Mesos too.
>
> Thanks,
> Aaron
>
>
> --
>
> Aaron Carey
> Production Engineer - Cloud Pipeline
> Industrial Light & Magic
> London
> 020 3751 9150
>
> --
> *From:* Vinod Kone [vinodk...@apache.org]
> *Sent:* 09 August 2016 02:42
> *To:* Vinod Kone
> *Cc:* dev; user
> *Subject:* Re: Support for tasks groups aka pods in Mesos
>
> Sorry, sent the wrong link earlier for design doc.
>
> Design doc: https://issues.apache.org/jira/browse/MESOS-6009
>>
>
> Direct link: https://docs.google.com/document/d/1FtcyQkDfGp-
> bPHTW4pUoqQCgVlPde936bo-IIENO_ho/edit#heading=h.ip4t59nlogfz
>


Re: [VOTE] Release Apache Mesos 1.0.0 (rc4)

2016-07-26 Thread Jie Yu
I prefer 2 if possible, since this is a UI fix which should be easy to
validate. I am worried about people trying 1.0.0 because of the blog posts
without noticing 1.0.1. If 2 does not work, I am OK with 1.


> On Jul 26, 2016, at 5:39 PM, Vinod Kone <vinodk...@apache.org> wrote:
> 
> We've the ASF press wire and other community blog posts lined up to be posted 
> tomorrow, so it will be really hard to tell all those folks to postpone it 
> this late. I've a couple options that I want to propose
> 
> 1) Fix the webui bug in 1.0.1 which we will cut as soon as we fix this bug.
> 
> 2) Try to fix the bug in the next couple hours, cut rc5, and vote it in 
> tonight without doing the typical 72 hour voting period.
> 
> 
> I'm personally leaning towards 1) given the timing and the nature of the bug. 
> What do others think? PMC?
> 
>> On Tue, Jul 26, 2016 at 4:08 PM, Yan Xu <xuj...@apple.com> wrote:
>> I don't mind if it's shepherd by folks with more front-end expertise. 
>> Actually my original suggested solution on 
>> https://issues.apache.org/jira/browse/MESOS-5911 seemed incorrect. Let's 
>> discuss the actual fix on the ticket, I feel that a short term fix shouldn't 
>> be more than a few lines to unblock the release.
>> 
>>> On Jul 26, 2016, at 3:26 PM, Jie Yu <yujie@gmail.com> wrote:
>>> 
>>> Yan, are you going to shepherd the fix for this one? If yes, when do you
>>> think it can be done?
>>> 
>>> - Jie
>>> 
>>>> On Tue, Jul 26, 2016 at 3:05 PM, Yan Xu <xuj...@apple.com> wrote:
>>>> 
>>>> -1
>>>> 
>>>> We tested it in our testing environment but webUI redirect didn't work. We
>>>> filed: https://issues.apache.org/jira/browse/MESOS-5911
>>>> 
>>>> Given that webUI is the portal for Mesos clusters I feel that we should at
>>>> least have a basic fix (more context in the JIRA) before release 1.0.
>>>> Thoughts?
>>>> 
>>>> On Jul 26, 2016, at 2:52 PM, Kapil Arya <ka...@mesosphere.io> wrote:
>>>> 
>>>> +1 (binding)
>>>> 
>>>> OpenSUSE Tumbleweed:
>>>>./configure --disable-java --disable-python && make check
>>>> 
>>>>> On Tue, Jul 26, 2016 at 4:44 PM, Zhitao Li <zhitaoli...@gmail.com> wrote:
>>>>> 
>>>>> Also tested:
>>>>> 
>>>>> make check passes on OS X
>>>>> 
>>>>> One thing I found when testing RC4 debian with Aurora integration test
>>>>> suite (on its master) is that scheduler previously expected GPU resource
>>>>> will not receive offers without new `GPU_RESOURCES` capability even it's
>>>>> the only scheduler.
>>>>> 
>>>>> Given that GPU support is not technically released until 1.0, I don't
>>>>> consider this is a blocker to me, but it might be surprising to people
>>>>> already testing GPU support.
>>>>> 
>>>>> On Tue, Jul 26, 2016 at 12:45 PM, Benjamin Mahler <bmah...@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> +1 (binding)
>>>>>> 
>>>>>> OS X 10.11.6
>>>>>> ./configure --disable-python --disable-java
>>>>>> make check
>>>>>> 
>>>>>>> On Tue, Jul 26, 2016 at 10:24 AM, Greg Mann <g...@mesosphere.io> wrote:
>>>>>>> 
>>>>>>> +1 (non-binding)
>>>>>>> 
>>>>>>> * Ran `sudo make distcheck` successfully on CentOS 7.1 with only one
>>>>>> test
>>>>>>> failure: ExamplesTest.PythonFramework fails for me the first time it's
>>>>>>> executed as part of the whole test suite, and then succeeds on
>>>>>> subsequent
>>>>>>> executions. I'm investigating further, and will file a ticket if
>>>>>> necessary.
>>>>>>> * Ran the upgrade testing script successfully from 0.28.2 -> 1.0.0-rc4
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Greg
>>>>>>> 
>>>>>>>> On Tue, Jul 26, 2016 at 1:58 AM, haosdent <haosd...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> +1
>>>>>>>> 
>>>>>>>> * make check in CentOS 7.2
>>>>>>>> * make check in Ubuntu 14.04
>>>>>>>> * test upg

Re: [VOTE] Release Apache Mesos 1.0.0 (rc4)

2016-07-26 Thread Jie Yu
+1

sudo make check on CentOS 6/7, Debian 8, Fedora 23, Ubuntu 12/14/15/16

- Jie

On Tue, Jul 26, 2016 at 1:58 AM, haosdent  wrote:

> +1
>
> * make check in CentOS 7.2
> * make check in Ubuntu 14.04
> * test upgrade from 0.28.2 to 1.0.0-rc4
>
>
> On Tue, Jul 26, 2016 at 8:33 AM, Kapil Arya  wrote:
>
>> One can find the deb/rpm packages here:
>> http://open.mesosphere.com/downloads/mesos-rc/#apache-mesos-1.0.0-rc4
>>
>> And here are the corresponding docker images based off of Ubuntu 14.04:
>> mesosphere/mesos:1.0.0-rc4
>> mesosphere/mesos-master:1.0.0-rc4
>> mesosphere/mesos-slave:1.0.0-rc4
>>
>> Kapil
>>
>> On Sat, Jul 23, 2016 at 1:40 AM, Vinod Kone  wrote:
>>
>> > Hi all,
>> >
>> >
>> > Please vote on releasing the following candidate as Apache Mesos 1.0.0.
>> >
>> > *The vote is open until Tue Jul 25 11:00:00 PDT 2016 and passes if a
>> > majority of at least 3 +1 PMC votes are cast.*
>>
>> >
>> > 1.0.0 includes the following:
>> >
>> >
>> >
>> 
>> >
>> >   * Scheduler and Executor v1 HTTP APIs are now considered stable.
>> >
>> >
>> >
>> >
>> >
>> >   * [MESOS-4791] - **Experimental** support for v1 Master and Agent
>> APIs.
>> > These
>> >
>> > APIs let operators and services (monitoring, load balancers) send
>> > HTTP
>> >
>> > requests to '/api/v1' endpoint on master or agent. See
>> >
>> >
>> > `docs/operator-http-api.md` for details.
>> >
>> >
>> >
>> >
>> >
>> >   * [MESOS-4828] - **Experimental** support for a new `disk/xfs'
>> isolator
>> >
>> >
>> > has been added to isolate disk resources more efficiently. Please
>> > refer to
>> >
>> > docs/mesos-containerizer.md for more details.
>> >
>> >
>> >
>> >
>> >
>> >   * [MESOS-4355] - **Experimental** support for Docker volume plugin. We
>> > added a
>> >
>> > new isolator 'docker/volume' which allows users to use external
>> > volumes in
>> >
>> > Mesos containerizer. Currently, the isolator interacts with the
>> > Docker
>> >
>> > volume plugins using a tool called 'dvdcli'. By speaking the Docker
>> > volume
>> >
>> > plugin API, most of the Docker volume plugins are supported.
>> >
>> >
>> >
>> >
>> >
>> >   * [MESOS-4641] - **Experimental** A new network isolator, the
>> >
>> >
>> > `network/cni` isolator, has been introduced in the
>> > `MesosContainerizer`. The
>> >
>> > `network/cni` isolator implements the Container Network Interface
>> > (CNI)
>> >
>> > specification proposed by CoreOS.  With CNI the `network/cni`
>> isolator
>> > is
>> >
>> > able to allocate a network namespace to Mesos containers and attach
>> > the
>> >
>> > container to different types of IP networks by invoking network
>> > drivers
>> >
>> > called CNI plugins.
>> >
>> >
>> >
>> >
>> >
>> >   * [MESOS-2948, MESOS-5403] - The authorizer interface has been
>> > refactored in
>> >
>> > order to decouple the ACLs definition language from the interface.
>> >
>> >
>> > It additionally includes the option of retrieving `ObjectApprover`.
>> > An
>> >
>> > `ObjectApprover` can be used to synchronously check authorizations
>> for
>> > a
>> >
>> > given object and is hence useful when authorizing a large number of
>> > objects
>> >
>> > and/or large objects (which need to be copied using request based
>> >
>> >
>> > authorization). NOTE: This is a **breaking change** for authorizer
>> > modules.
>> >
>> >
>> >
>> >
>> >   * [MESOS-5405] - The `subject` and `object` fields in
>> > authorization::Request
>> >
>> > have been changed from required to optional. If either of these
>> fields
>> > is
>> >
>> > not set, the request should only be authorized if any subject/object
>> > should
>> >
>> > be allowed.
>> >
>> >
>> > NOTE: This is a semantic change for authorizer modules.
>> >
>> >
>> >
>> >
>> >
>> >   * [MESOS-4931, MESOS-5709, MESOS-5704] - Authorization based HTTP
>> > endpoint
>> >
>> > filtering enables operators to restrict what part of the cluster
>> state
>> > a
>> >
>> > user is authorized to see.
>> >
>> >
>> > Consider for example the `/state` master endpoint: an operator can
>> > now
>> >
>> > authorize users to only see a subset of the running frameworks,
>> tasks,
>> > or
>> >
>> > executors. The following endpoints support HTTP endpoint filtering:
>> >
>> >
>> > '/state', '/state-summary', '/tasks', '/frameworks','/weights',
>> >
>> >
>> > and '/roles'. Additonally the following v1 API calls support
>> > filtering:
>> >
>> > 'GET_ROLES','GET_WEIGHTS','GET_FRAMEWORKS', 'GET_STATE', and
>> > 'GET_TASKS'.
>> >
>> >
>> >
>> >
>> >   * [MESOS-4909] - Tasks can now specify a kill policy. They 
