Re: use of bridge / port-mapper, can't access mapped port from remote server
On 03/12/2018 12:31 PM, Olivier Sallou wrote:
> Hi,
>
> I tried to setup CNI bridge + mesos port mapper with unified container,
> following doc
> http://mesos.apache.org/documentation/latest/cni/#a-port-mapper-plugin
>
> This partially works (example with container ip 192.0.0.2 and port
> mapping 22 => 31000)
>
> - my container starts and get a local assigned IP 192.0.0.2
>
> - I can access directly to the port of the container: ssh 192.0.0.2
>
> - I can access via the *local* gateway: ssh 192.0.0.1 -p 31000
>
>
> However, I cannot access the container via the IP of my server: ssh
> 131.x.y.z -p 31000
>
>
> In iptables rules, I do not see any mesos related chain. I see no
> specific CHAIN nor comment in iptables (iptables -L)

Additional info: using the -t nat option, I can see the iptables chain.

Chain MESOS-TEST-PORT-MAPPER (2 references)
target     prot opt source               destination
DNAT       tcp  --  anywhere             anywhere             tcp dpt:31000 /* container_id: 3a4e0070-7fe2-4807-a643-27ff9608e882 */ to:192.168.0.2:22

In fact I could make it work, using the *external* IP address of my server. One of the iptables rules set by Mesos prevents routing to localhost, which is why my previous tests failed.

>
>
> Is it an expected behavior (port mapping maps ports but only via local
> bridge gateway), or should mesos add routes to local mesos bridge to
> allow remote access to the mapped ports?
>
>
> I have iptables 1.6.0 and linux kernel 4.4.
>
>
>
> I used config from documentation
>
> bridge.conf
>
>
> {
>   "name": "cni-test",
>   "type": "bridge",
>   "bridge": "mesos-cni0",
>   "isGateway": true,
>   "ipMasq": true,
>   "ipam": {
>     "type": "host-local",
>     "subnet": "192.168.0.0/16",
>     "routes": [
>       { "dst": "0.0.0.0/0" }
>     ]
>   }
> }
>
>
> and portmapper.conf
>
> {
>   "name" : "port-mapper-test",
>   "type" : "mesos-cni-port-mapper",
>   "excludeDevices" : ["mesos-cni0"],
>   "chain": "MESOS-TEST-PORT-MAPPER",
>   "delegate": {
>     "type": "bridge",
>     "bridge": "mesos-cni0",
>     "isGateway": true,
>     "ipMasq": true,
>     "ipam": {
>       "type": "host-local",
>       "subnet": "192.168.0.0/16",
>       "routes": [
>         { "dst": "0.0.0.0/0" }
>       ]
>     }
>   }
> }
>
> Thanks
>
>
> Olivier
>

--
Olivier Sallou
Univ Rennes, Inria, CNRS, IRISA
Irisa, Campus de Beaulieu
F-35042 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
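Since the outcome in this thread depended on which address the mapped port was probed through (container IP, bridge gateway, localhost, or the external server IP), a small probe can make that comparison repeatable. The following is only a sketch: the helper names are invented for illustration, and the addresses to pass in are whatever applies to your own setup.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_mapped_port(addresses, port):
    """Probe the same mapped port through several addresses (hypothetical helper)."""
    return {address: port_is_open(address, port) for address in addresses}
```

Running `check_mapped_port([...], 31000)` once on the Mesos host and once from a remote machine should show whether the DNAT rule is reached from each side.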
use of bridge / port-mapper, can't access mapped port from remote server
Hi,

I tried to set up a CNI bridge plus the Mesos port mapper with the unified containerizer, following the doc at
http://mesos.apache.org/documentation/latest/cni/#a-port-mapper-plugin

This partially works (example with container IP 192.0.0.2 and port mapping 22 => 31000):

- my container starts and gets a locally assigned IP, 192.0.0.2
- I can access the container port directly: ssh 192.0.0.2
- I can access it via the *local* gateway: ssh 192.0.0.1 -p 31000

However, I cannot access the container via the IP of my server: ssh 131.x.y.z -p 31000

In the iptables rules, I do not see any Mesos-related chain. I see no specific chain or comment in iptables (iptables -L).

Is this the expected behavior (port mapping maps ports, but only via the local bridge gateway), or should Mesos add routes to the local Mesos bridge to allow remote access to the mapped ports?

I have iptables 1.6.0 and Linux kernel 4.4.

I used the config from the documentation.

bridge.conf:

{
  "name": "cni-test",
  "type": "bridge",
  "bridge": "mesos-cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}

and portmapper.conf:

{
  "name" : "port-mapper-test",
  "type" : "mesos-cni-port-mapper",
  "excludeDevices" : ["mesos-cni0"],
  "chain": "MESOS-TEST-PORT-MAPPER",
  "delegate": {
    "type": "bridge",
    "bridge": "mesos-cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.0.0/16",
      "routes": [
        { "dst": "0.0.0.0/0" }
      ]
    }
  }
}

Thanks

Olivier
Re: strange behaviour: Task status -> error-> finished
On 09/19/2017 11:22 AM, Benno Evers wrote:
> Hi Olivier,
>
>> Can we have "non terminal" errors, from mesos point of view, where task
>> should not be considered as over?
>
> Not really, what you're seeing certainly looks like a bug, terminal updates
> should be terminal. It'll probably be hard to debug it without more data ;)

Indeed...

>
> As a wild guess, since you seem to be using custom task id's, maybe you
> tried to start a task twice, and the TASK_ERROR was generated on the master
> in response to the duplicate task id or some other validation issue, and
> the TASK_FINISHED was generated on the slave when the first task finished?
> Although I'm not sure from the top of my head if there are checks in mesos
> that would catch this.

No, the task was not started twice (I got only one TASK_RUNNING event). When a task is resubmitted, its task id is modified. Thanks anyway.

>
> Best regards,
>
> On Tue, Sep 19, 2017 at 7:47 AM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
>
>> Hi
>> I found a strange behaviour on a cluster that I do not understand. I do
>> not have access to mesos logs (not in my cluster), but anyone faced this
>> before ?
>> My framework uses Docker containerizer. We faced a task that sent
>> TASK_ERROR to the framework (why not), but in reality the Docker executed
>> correctly on mesos slave, then we received a TASK_FINISHED.
>> So mesos detected an error with task but it detected anyway the end of the
>> task sending the finished event at the end.
>>
>> How mesos can detect an error but still watching the task and detect its
>> end ?
>>
>> Here are my framework logs:
>> 2017-09-17 01:06:35,447 DEBUG [godocker-scheduler][Thread-1] Task 17820-0
>> is in state TASK_RUNNING
>> 2017-09-17 01:06:46,286 DEBUG [godocker-scheduler][Thread-1] Task 17820-0
>> is in state TASK_ERROR
>> 2017-09-17 02:13:44,537 DEBUG [godocker-scheduler][Thread-1] Task 17820-0
>> is in state TASK_FINISHED
>>
>> Unfortunately I did not log the "reason" of the ERROR, so I do not know
>> what occurred, and cannot at this stage reproduce manually the use case.
>>
>> Can we have "non terminal" errors, from mesos point of view, where task
>> should not be considered as over?
>>
>> Thanks
>>
>> Olivier
>>
>

--
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
strange behaviour: Task status -> error-> finished
Hi,

I found a strange behaviour on a cluster that I do not understand. I do not have access to the Mesos logs (it is not my cluster), but has anyone faced this before?

My framework uses the Docker containerizer. A task sent TASK_ERROR to the framework (why not), but in reality the Docker container executed correctly on the Mesos slave, and we then received a TASK_FINISHED. So Mesos detected an error with the task, yet it still detected the end of the task and sent the finished event at the end.

How can Mesos detect an error but keep watching the task and detect its end?

Here are my framework logs:

2017-09-17 01:06:35,447 DEBUG [godocker-scheduler][Thread-1] Task 17820-0 is in state TASK_RUNNING
2017-09-17 01:06:46,286 DEBUG [godocker-scheduler][Thread-1] Task 17820-0 is in state TASK_ERROR
2017-09-17 02:13:44,537 DEBUG [godocker-scheduler][Thread-1] Task 17820-0 is in state TASK_FINISHED

Unfortunately I did not log the "reason" of the ERROR, so I do not know what occurred, and at this stage I cannot reproduce the use case manually.

Can we have "non terminal" errors, from the Mesos point of view, where the task should not be considered as over?

Thanks

Olivier
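Because the missing "reason" made this case hard to diagnose, a framework may want to log every field of a status update and flag any update that follows a terminal one. The sketch below assumes status updates arrive as dicts shaped like the HTTP API JSON (`task_id.value`, `state`, `reason`, `source`, `message`); the function names and the set of terminal states are assumptions for illustration, not GoDocker code.

```python
import logging

logger = logging.getLogger("scheduler-sketch")

# Assumption: states treated as terminal here; check the list for your Mesos version.
TERMINAL_STATES = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST", "TASK_ERROR"}


def describe_status(status):
    """Format a status dict, including reason, source and message when present."""
    return "task={} state={} reason={} source={} message={}".format(
        status.get("task_id", {}).get("value", "?"),
        status.get("state", "?"),
        status.get("reason", "n/a"),
        status.get("source", "n/a"),
        status.get("message", ""),
    )


def record_update(last_states, status):
    """Log the update; return False if a terminal state was already recorded."""
    task_id = status.get("task_id", {}).get("value", "?")
    previous = last_states.get(task_id)
    logger.debug(describe_status(status))
    if previous in TERMINAL_STATES:
        # Keep the first terminal state and report the unexpected follow-up.
        logger.warning("update after terminal state %s: %s", previous, describe_status(status))
        return False
    last_states[task_id] = status.get("state")
    return True
```

With the log sequence above, the TASK_FINISHED update would be reported as arriving after TASK_ERROR, together with the reason and source of both updates.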
Re: GPU Users -- Deprecation of GPU_RESOURCES capability
On 05/21/2017 03:45 AM, Kevin Klues wrote:
> Hello GPU users,
>
> We are currently considering deprecating the requirement that frameworks
> register with the GPU_RESOURCES capability in order to receive offers that
> contain GPUs. Going forward, we will recommend that users rely on Mesos's
> builtin `reservation` mechanism to achieve similar results.
>
> Before deprecating it, we wanted to get a sense from the community if
> anyone is currently relying on this capability and would like to see it
> persist. If not, we will begin deprecating it in the next Mesos release and
> completely remove it in Mesos 2.0.

Well, I am using it for the GoDocker framework, where jobs can specify whether or not to use some GPUs.

>
> As background, the original motivation for this capability was to keep
> “legacy” frameworks from inadvertently scheduling jobs that don’t require
> GPUs on GPU capable machines and thus starving out other frameworks that
> legitimately want to place GPU jobs on those machines. The assumption here
> was that most machines in a cluster won't have GPUs installed on them, so
> some mechanism was necessary to keep legacy frameworks from scheduling jobs
> on those machines. In essence, it provided an implicit reservation of GPU
> machines for "GPU aware" frameworks, bypassing the traditional
> `reservation` mechanism already built into Mesos.
>
> In such a setup, legacy frameworks would be free to schedule jobs on
> non-GPU machines, and "GPU aware" frameworks would be free to schedule GPU
> jobs on GPU machines and other types of jobs on other machines (or mix and
> match them however they please).
>
> However, the problem comes when *all* machines in a cluster contain GPUs
> (or even if most of the machines in a cluster contain them). When this is
> the case, we have the opposite problem we were trying to solve by
> introducing the GPU_RESOURCES capability in the first place. We end up
> starving out jobs from legacy frameworks that *don’t* require GPU resources
> because there are not enough machines available that don’t have GPUs on
> them to service those jobs. We've actually seen this problem manifest in
> the wild at least once.
>
> An alternative to completely deprecating the GPU_RESOURCES flag would be to
> add a new flag to the mesos master called `--filter-gpu-resources`. When
> set to `true`, this flag will cause the mesos master to continue to
> function as it does today. That is, it would filter offers containing GPU
> resources and only send them to frameworks that opt into the GPU_RESOURCES
> framework capability. When set to `false`, this flag would cause the master
> to *not* filter offers containing GPU resources, and indiscriminately send
> them to all frameworks whether they set the GPU_RESOURCES capability or not.
>
> , this flag would allow them to keep relying on it without disruption.
>
> We'd prefer to deprecate the capability completely, but would consider
> adding this flag if people are currently relying on the GPU_RESOURCES
> capability and would like to see it persist
>
> We welcome any feedback you have.
>
> Kevin + Ben
>

--
Olivier Sallou
Re: Re: protobuf to json not compatible
----- Benjamin Mahler <bmah...@apache.org> wrote:
> James, I'm curious, do you know specifically what the incompatibility is?
>
> Olivier, if you're dealing with protobuf already and trying to send it to
> mesos, there's no need to use JSON. Unless you have a requirement to do so?

I can manage JSON, this is fine. Sending protobuf means sending the whole accept message as protobuf, not only the task definition. But for this I need the mesos.native Python package, and I want to avoid that. So I will switch to full JSON.

Olivier

> There are some outstanding issues with our JSON<->Protobuf conversion,
> specifically we currently are inconsistent from proto3 when it comes to the
> int(32|64), fixed(32|64), uint(32|64) handling, for one (we don't allow
> strings on the input side (tomek is addressing that), and we don't use
> strings on the output side).
>
> On Fri, Mar 24, 2017 at 12:44 AM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
>
> > On 03/24/2017 04:02 AM, James Peach wrote:
> > >> On Mar 23, 2017, at 7:58 PM, James Peach <jor...@gmail.com> wrote:
> > >>
> > >>> On Mar 23, 2017, at 1:54 AM, Olivier Sallou <olivier.sal...@irisa.fr>
> > >>> wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> when transforming a protobuf message to json with MessageToJson, the
> > >>> json is not compatible with the json format expected by Mesos master.
> > >> This is because you generated the protobuf bindings with proto3
> > >> compiler. AFAICT they made an incompatible change to the JSON wire
> > >> format. This bites you when using the jsonpb Go package, for example.
> > >> I ended up post-processing the generated Go code to correct the field
> > >> names.
> > > Sorry I forgot to mention that the other workaround is to generate the
> > > protobuf bindings with the proto2 compiler.
> > Thanks
> > My first workaround is to generate json directly, not a big deal in my
> > case, but I wanted to understand.
> >
> > Olivier
> >
> > > J
> >
how to get executor command? unified containerizer fails with unknown flag 'command'
Hi,

While switching from the Python protobuf library to the HTTP API, I face an issue when starting a container with the unified containerizer. With the Docker containerizer everything is fine, but the unified containerizer fails (while it worked nicely with the Python lib, before my modifications).

The only executor log I have is:

"Failed to parse the flags: Failed to load unknown flag 'command'"

and the status update reason is "REASON_CONTAINER_LAUNCH_FAILED".

In the slave logs I only find the following:

I0324 13:42:39.487109 29096 linux_launcher.cpp:421] Launching container 61327ae0-5c9b-4d4c-a015-674b7112539a and cloning with namespaces CLONE_NEWNS
I0324 13:42:39.507308 29096 systemd.cpp:96] Assigned child process '8256' to 'mesos_executors.slice'
I0324 13:42:39.558537 29102 containerizer.cpp:2313] Container 61327ae0-5c9b-4d4c-a015-674b7112539a has exited

It may be related to my JSON task definition, but I do not see what Mesos is trying to execute, what this "command" flag is, or why it is present.

Is there a way to make Mesos emit additional logs that display the executor command?

In my task_infos, I define a container of type MESOS and a command with a value. It should be the same as with the Docker containerizer; the only difference is that the container has a mesos parameter instead of a docker parameter.

Thanks

Olivier
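To make the comparison in the last paragraph concrete, here is a rough sketch of the two task definitions as Python dicts ready for JSON serialization. Field names follow my reading of the v1 `mesos.proto` (`ContainerInfo.type`, `DockerInfo.image`, `MesosInfo.image`); the helper names, image and resource values are illustrative assumptions, and this does not claim to identify the cause of the "unknown flag" error.

```python
import json


def base_task(task_id, agent_id, command):
    """Common part of the task: ids, a shell command and a small CPU request."""
    return {
        "name": "task-" + task_id,
        "task_id": {"value": task_id},
        "agent_id": {"value": agent_id},
        "command": {"shell": True, "value": command},
        "resources": [
            {"name": "cpus", "type": "SCALAR", "scalar": {"value": 1.0}},
        ],
    }


def docker_task(task_id, agent_id, command, image):
    """Task for the Docker containerizer: the container carries a 'docker' section."""
    task = base_task(task_id, agent_id, command)
    task["container"] = {"type": "DOCKER", "docker": {"image": image}}
    return task


def mesos_task(task_id, agent_id, command, image):
    """Task for the unified containerizer: the container carries a 'mesos' section."""
    task = base_task(task_id, agent_id, command)
    task["container"] = {
        "type": "MESOS",
        "mesos": {"image": {"type": "DOCKER", "docker": {"name": image}}},
    }
    return task
```

Dumping both with `json.dumps(..., indent=2)` and diffing them is a quick way to confirm that only the container section differs before sending the accept call.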
Re: protobuf to json not compatible
On 03/24/2017 04:02 AM, James Peach wrote:
>> On Mar 23, 2017, at 7:58 PM, James Peach <jor...@gmail.com> wrote:
>>
>>> On Mar 23, 2017, at 1:54 AM, Olivier Sallou <olivier.sal...@irisa.fr> wrote:
>>>
>>> Hi,
>>>
>>> when transforming a protobuf message to json with MessageToJson, the
>>> json is not compatible with the json format expected by Mesos master.
>> This is because you generated the protobuf bindings with proto3 compiler.
>> AFAICT they made an incompatible change to the JSON wire format. This bites
>> you when using the jsonpb Go package, for example. I ended up
>> post-processing the generated Go code to correct the field names.
> Sorry I forgot to mention that the other workaround is to generate the
> protobuf bindings with the proto2 compiler.

Thanks. My first workaround is to generate the JSON directly, which is not a big deal in my case, but I wanted to understand.

Olivier

>
> J
protobuf to json not compatible
Hi,

When transforming a protobuf message to JSON with MessageToJson, the resulting JSON is not compatible with the JSON format expected by the Mesos master.

For example, for volumes it generates

volumes: [ {'hostPath': '', 'containerPath': '...', ... } ]

but the HTTP API expects "source" and "container_path".

Is this the expected behavior? It prevents "creating" a task in protobuf format and sending it to the HTTP API through a protobuf-to-JSON conversion.

Thanks

Olivier
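One possible workaround, in the spirit of post-processing the output, is to rewrite lowerCamelCase keys back to snake_case before sending the JSON. The sketch below uses only the standard library and is an assumption-laden illustration: the helper names are invented, it only changes key casing (it does not rename or restructure fields), and it would also rewrite keys of any free-form maps in the message.

```python
import re

# An uppercase letter that follows a lowercase letter or a digit marks a word boundary.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])([A-Z])")


def to_snake_case(name):
    """Convert a lowerCamelCase key such as 'containerPath' to 'container_path'."""
    return _CAMEL_BOUNDARY.sub(lambda match: "_" + match.group(1).lower(), name)


def snake_case_keys(value):
    """Recursively rewrite dict keys; lists are walked, scalar values are left untouched."""
    if isinstance(value, dict):
        return {to_snake_case(key): snake_case_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [snake_case_keys(item) for item in value]
    return value
```

For instance, `snake_case_keys(json.loads(MessageToJson(task)))` would turn `containerPath` into `container_path` while leaving enum strings such as `"RW"` unchanged.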
Re: cannot find mesos.native python lib (mesos 1.1.0)
On 03/13/2017 05:41 PM, Olivier Sallou wrote:
> Hi,
>
> I installed Mesos 1.1.0 via deb repo, but when executing python "import
> mesos.native", I have a no module named native.
>
> I tried to compile from source install egg files directly, but I still
> have the issue.

Installing the eggs in a virtualenv works, so this really is an issue related to the system install... but that should not be the case, at least with the deb files.

>
> I can however see the module in python path:
>
>
> root:~/mesos-1.1.0/build/src/python/dist# find /usr/lib/python2.7 | grep native
> /usr/lib/python2.7/dist-packages/pygments/styles/native.py
> /usr/lib/python2.7/dist-packages/pygments/styles/native.pyc
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0-py2.7-nspkg.pth
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/namespace_packages.txt
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/RECORD
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/DESCRIPTION.rst
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/top_level.txt
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/metadata.json
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/WHEEL
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/METADATA
> /usr/lib/python2.7/site-packages/mesos/native
> /usr/lib/python2.7/site-packages/mesos/native/__init__.pyc
> /usr/lib/python2.7/site-packages/mesos/native/__init__.py
>
> If I try to uninstall package (pip uninstall mesos.native), I have error:
> "Not uninstalling mesos.native at /usr/lib/python2.7/site-packages,
> owned by OS"
>
> so it is seen by the system, but not by python... :-(
>
> my PYTHONPATH is /usr/lib/python2.7/site-packages/
>
> any idea on how to fix this ?
>
> Thanks
> Olivier
>

--
Olivier Sallou
cannot find mesos.native python lib (mesos 1.1.0)
Hi,

I installed Mesos 1.1.0 via the deb repo, but when executing "import mesos.native" in Python, I get "no module named native".

I tried to compile from source and install the egg files directly, but I still have the issue.

I can however see the module in the Python path:

root:~/mesos-1.1.0/build/src/python/dist# find /usr/lib/python2.7 | grep native
/usr/lib/python2.7/dist-packages/pygments/styles/native.py
/usr/lib/python2.7/dist-packages/pygments/styles/native.pyc
/usr/lib/python2.7/site-packages/mesos.native-1.1.0-py2.7-nspkg.pth
/usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info
/usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/namespace_packages.txt
/usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/RECORD
/usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/DESCRIPTION.rst
/usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/top_level.txt
/usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/metadata.json
/usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/WHEEL
/usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/METADATA
/usr/lib/python2.7/site-packages/mesos/native
/usr/lib/python2.7/site-packages/mesos/native/__init__.pyc
/usr/lib/python2.7/site-packages/mesos/native/__init__.py

If I try to uninstall the package (pip uninstall mesos.native), I get the error:

"Not uninstalling mesos.native at /usr/lib/python2.7/site-packages, owned by OS"

So it is seen by the system, but not by Python... :-(

My PYTHONPATH is /usr/lib/python2.7/site-packages/

Any idea on how to fix this?

Thanks

Olivier
Re: Docker containerizer: override USER
----- Original message -----
> From: "Gilbert Song" <gilb...@mesosphere.io>
> To: "dev" <dev@mesos.apache.org>
> Sent: Thursday, September 1, 2016 19:21:06
> Subject: Re: Docker containerizer: override USER
>
> We considered support --user option in docker containerizer. Unfortunately,
> it would potentially break some previous users in behavior. So we did not
> merge it. Please see this JIRA for detail:
>
> https://issues.apache.org/jira/browse/MESOS-5754
>
> However, you can still use DockerInfo::Parameter to specify your --user as a
> workaround.

That's what I did, but I expected a more *integrated* solution. Thanks anyway.

Olivier

>
> Gilbert
>
> On Thu, Sep 1, 2016 at 9:15 AM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
>
> > ----- Original message -----
> > > From: "Qian Zhang" <zhq527...@gmail.com>
> > > To: dev@mesos.apache.org
> > > Sent: Thursday, September 1, 2016 15:57:39
> > > Subject: Re: Docker containerizer: override USER
> > >
> > > Hi Olivier,
> > >
> > > Can you try TaskInfo.CommandInfo.user?
> >
> > I will try but mesos.proto specifies:
> >
> > // Enables executor and tasks to run as a specific user. If the user
> > // field is present both in FrameworkInfo and here, the CommandInfo
> > // user value takes precedence.
> >
> > FrameworkInfo.user is specified in my case and set to the expected user
> > XX. So it does not seem that the container is executed with the --user XX
> > flag.
> >
> > Olivier
> >
> > > Thanks,
> > > Qian Zhang
> > >
> > > On Thu, Sep 1, 2016 at 4:39 PM, Olivier Sallou <olivier.sal...@irisa.fr>
> > > wrote:
> > >
> > > > Hi,
> > > > If Docker image specified a USER in Dockerfile, docker will use this
> > > > user when executing command in container.
> > > > In Docker commands, it can be overridden with -u XX .
> > > >
> > > > I do not find however in mesos.proto a way to do so. There is the
> > > > "arguments" of DockerInfo that I could use to append this to the
> > > > executor command line, but I think it is not advised as it may not be
> > > > supported in future.
> > > >
> > > > Did I miss something ?
> > > >
> > > > Thanks
> > > >
> > > > Olivier
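The DockerInfo parameter workaround mentioned in this reply can be sketched as a container definition built in Python for a JSON task. This assumes that each `parameters` entry is a key/value pair handed to `docker run` as `--key=value`; the helper name and the example values are hypothetical.

```python
def docker_container_with_user(image, user):
    """Build a DOCKER container section that asks docker to run as `user`.

    Assumption: the 'user' parameter ends up as `--user=<user>` on the
    docker command line issued by the agent.
    """
    return {
        "type": "DOCKER",
        "docker": {
            "image": image,
            "parameters": [
                {"key": "user", "value": user},
            ],
        },
    }
```

The resulting dict would be placed under the `container` key of the task definition, for example `task["container"] = docker_container_with_user("centos:latest", "XX")`.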
Re: Docker containerizer: override USER
----- Original message -----
> From: "Qian Zhang" <zhq527...@gmail.com>
> To: dev@mesos.apache.org
> Sent: Thursday, September 1, 2016 15:57:39
> Subject: Re: Docker containerizer: override USER
>
> Hi Olivier,
>
> Can you try TaskInfo.CommandInfo.user?

I will try, but mesos.proto specifies:

  // Enables executor and tasks to run as a specific user. If the user
  // field is present both in FrameworkInfo and here, the CommandInfo
  // user value takes precedence.

FrameworkInfo.user is specified in my case and set to the expected user XX. So it does not seem that the container is executed with the --user XX flag.

Olivier

>
>
> Thanks,
> Qian Zhang
>
> On Thu, Sep 1, 2016 at 4:39 PM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
>
> > Hi,
> > If Docker image specified a USER in Dockerfile, docker will use this user
> > when executing command in container.
> > In Docker commands, it can be overridden with -u XX .
> >
> > I do not find however in mesos.proto a way to do so. There is the
> > "arguments" of DockerInfo that I could use to append this to the executor
> > command line, but I think it is not advised as it may not be supported in
> > future.
> >
> > Did I miss something ?
> >
> > Thanks
> >
> > Olivier
> >
>
Docker containerizer: override USER
Hi,

If a Docker image specifies a USER in its Dockerfile, Docker uses this user when executing commands in the container. With the Docker command line, it can be overridden with -u XX.

However, I cannot find a way to do so in mesos.proto. There is the "arguments" field of DockerInfo that I could use to append this to the executor command line, but I think this is not advised, as it may not be supported in the future.

Did I miss something?

Thanks

Olivier
Re: Maintenance API question
----- Original message -----
> From: "Joseph Wu" <jos...@mesosphere.io>
> To: "dev" <dev@mesos.apache.org>
> Sent: Wednesday, August 31, 2016 17:16:57
> Subject: Re: Maintenance API question
>
> Most likely, the hostname and IP you've put into the "machine_ids"
> does not *exactly match* the hostname and IP the agent is identifying
> itself as.

In this case the master should reject the request, according to the documentation. Here it is accepted (200 OK in the response, and it appears in maintenance/schedule and maintenance/status).

> If in doubt, you can check the master's /slaves endpoint. Or, you can
> manually set the hostname and IP when starting the agent.

I took the information from the master UI and it is the same. Maybe the issue is the fact that I am on a single machine, so hostname and IP are the same for master and slave.

>
> On Wed, Aug 31, 2016 at 3:16 AM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
>
> > Hi,
> > I am trying to use the /maintenance API for mesos slave maintenance/drain.
> >
> > I follow doc at http://mesos.apache.org/documentation/latest/maintenance/
> >
> > I use mesos 1.0.1 on a single machine (for dev).
> >
> > When scheduling a node using
> >
> > {
> >   "windows" : [
> >     {
> >       "machine_ids" : [
> >         { "hostname" : "tifenn.irisa.fr", "ip" : "127.0.0.1" }
> >       ],
> >       "unavailability" : {
> >         "start" : { "nanoseconds" : 14726373400 },
> >         "duration" : { "nanoseconds" : 36000 }
> >       }
> >     }
> >   ]
> > }
> >
> > The start date is set in the recent past (setting to future did not
> > change).
> >
> > I see in /maintenance/status
> >
> > {"draining_machines":[{"id":{"hostname":"tifenn.irisa.fr","ip":"127.0.0.1"}}]}
> >
> > However, the offers I receive do not contain the unavailability parameter.
> > I do not know if it is expected, but start/duration do not appear in
> > maintenance/status result.
> > I see in master logs: HTTP POST for /master/maintenance/schedule from
> > 127.0.0.1:34858 with User-Agent='curl/7.43.0'
> >
> > I tried anyway to switch the node to maintenance (/maintenance/down) but I
> > continue to receive offers for this slave. In status, I see my slave in
> > machines_down:
> >
> > {"down_machines":[{"hostname":"tifenn.irisa.fr","ip":"127.0.0.1"}]}
> >
> > I can see on master logs:
> >
> > I0831 12:12:37.568898 6428 http.cpp:381] HTTP POST for
> > /master/machine/down from 127.0.0.1:34970 with User-Agent='curl/7.43.0'
> >
> > Sending 1 offers to framework a559cd9e-3e58-4377-9e1a-c8f3d28d2318-
> > (Go-Docker Mesos) at scheduler-41e42d1f-b8f8-473a-
> > b460-6fab3a150915@127.0.1.1:43060
> >
> > Should something be set to enable maintenance in mesos ?
> >
> > Thanks
> >
> > Olivier
> >
>
Maintenance API question
Hi,

I am trying to use the /maintenance API for Mesos slave maintenance/drain.

I follow the doc at http://mesos.apache.org/documentation/latest/maintenance/

I use Mesos 1.0.1 on a single machine (for dev).

I schedule a node using:

{
  "windows" : [
    {
      "machine_ids" : [
        { "hostname" : "tifenn.irisa.fr", "ip" : "127.0.0.1" }
      ],
      "unavailability" : {
        "start" : { "nanoseconds" : 14726373400 },
        "duration" : { "nanoseconds" : 36000 }
      }
    }
  ]
}

The start date is set in the recent past (setting it in the future did not change anything).

I see in /maintenance/status:

{"draining_machines":[{"id":{"hostname":"tifenn.irisa.fr","ip":"127.0.0.1"}}]}

However, the offers I receive do not contain the unavailability parameter. I do not know if it is expected, but start/duration do not appear in the maintenance/status result.

I see in the master logs:

HTTP POST for /master/maintenance/schedule from 127.0.0.1:34858 with User-Agent='curl/7.43.0'

I tried anyway to switch the node to maintenance (/maintenance/down), but I continue to receive offers for this slave. In status, I see my slave in down_machines:

{"down_machines":[{"hostname":"tifenn.irisa.fr","ip":"127.0.0.1"}]}

I can see in the master logs:

I0831 12:12:37.568898 6428 http.cpp:381] HTTP POST for /master/machine/down from 127.0.0.1:34970 with User-Agent='curl/7.43.0'

Sending 1 offers to framework a559cd9e-3e58-4377-9e1a-c8f3d28d2318- (Go-Docker Mesos) at scheduler-41e42d1f-b8f8-473a-b460-6fab3a150915@127.0.1.1:43060

Should something be set to enable maintenance in Mesos?

Thanks

Olivier
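Since the schedule expresses both `start` and `duration` in nanoseconds, it can be less error-prone to generate the payload than to type the values by hand. The sketch below builds the same structure as the JSON above from values in seconds; the function name and its arguments are assumptions made for illustration.

```python
import json
import time

NANOSECONDS_PER_SECOND = 1000000000


def build_schedule(hostname, ip, start_epoch_seconds, duration_seconds):
    """Return a maintenance schedule with one window for one machine."""
    return {
        "windows": [
            {
                "machine_ids": [{"hostname": hostname, "ip": ip}],
                "unavailability": {
                    # Seconds since the epoch, converted to nanoseconds.
                    "start": {"nanoseconds": int(start_epoch_seconds) * NANOSECONDS_PER_SECOND},
                    "duration": {"nanoseconds": int(duration_seconds) * NANOSECONDS_PER_SECOND},
                },
            }
        ]
    }


if __name__ == "__main__":
    # One-hour window starting five minutes from now, printed as the JSON body to POST.
    print(json.dumps(build_schedule("tifenn.irisa.fr", "127.0.0.1", time.time() + 300, 3600), indent=2))
```

The printed document can then be posted to the master's maintenance schedule endpoint, for instance with curl as in the logs above.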
Re: Fail to get CNI with unified containerizer, job remains stuck on staging
On 08/24/2016 04:04 PM, Avinash Sridharan wrote: > Oliver, you can't have the agent running on 127.0.0.1. The agent needs to > be running in a routeabl IP address (choose an IP from one of the > interfaces). > > Reason being that if agent is on local host the executor running in its own > network namespace will try to make a connection in its own network > namespace and fail. Thanks! modying ip address to reachable IP works. > On Wed, Aug 24, 2016 at 5:15 AM Olivier Sallou <olivier.sal...@irisa.fr> > wrote: > >> I have the same behavior with Calico. Task get IP from CNI plugin, but >> task remains in STAGING and same logs. >> >> mesos-execute --containerizer=mesos \ >>> --name=cni \ >>> --master=127.0.0.1:5050 \ >>> --networks=calico-net-1 \ >>> --command="ifconfig" >> I0824 14:12:03.202328 24912 scheduler.cpp:172] Version: 1.0.0 >> I0824 14:12:03.203009 24911 scheduler.cpp:461] New master detected at >> master@127.0.0.1:5050 >> Subscribed with ID 'b1925e13-76db-4225-a3dc-39ce65c79b3c-0017' >> Submitted task 'cni' to agent 'b1925e13-76db-4225-a3dc-39ce65c79b3c-S0' >> >> REMAINS STAGING! >> >> >> I0824 14:12:03.857158 24806 cni.cpp:1109] Got assigned IPv4 address >> '192.168.0.0/32' from CNI network 'calico-net-1' for container >> bdbb275a-ec5f-4a50-aca0-5e694ae57324 >> I0824 14:12:03.857348 24805 cni.cpp:838] Unable to find DNS nameservers >> for container bdbb275a-ec5f-4a50-aca0-5e694ae57324. Using host >> '/etc/resolv.conf' >> >> No more logs >> >> >> Olivier >> >> On 08/24/2016 08:23 AM, Olivier Sallou wrote: >>> On 08/23/2016 06:13 PM, Jie Yu wrote: >>>> The DNS related logging means that the weave plugin does not return DNS >>>> information, the agent uses the host resolv.conf for the container. So I >>>> think is irrelevant to your problem. >>>> >>>> Mesos requires that executor can talk to agent. Can you see if there is >> a >>>> route from 10.32.0.1 to the agent IP? >>> How can I check this as task does not start ? 
I have exposed weave >>> network on host: >>> >>> osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo ./weave expose >>> 10.32.0.2 >>> osallou@tifenn~/Development/NOSAVE/go-docker/weave $ ping 10.32.0.2 >>> PING 10.32.0.2 (10.32.0.2) 56(84) bytes of data. >>> 64 bytes from 10.32.0.2: icmp_seq=1 ttl=64 time=0.032 ms >>> 64 bytes from 10.32.0.2: icmp_seq=2 ttl=64 time=0.029 ms >>> 64 bytes from 10.32.0.2: icmp_seq=3 ttl=64 time=0.029 ms >>> 64 bytes from 10.32.0.2: icmp_seq=4 ttl=64 time=0.031 ms >>> >>> And why is it blocking? >>> >>> I am on a single host environement, so agent is on 127.0.0.1. >>> >>> Olivier >>>> On Tue, Aug 23, 2016 at 9:05 AM, Olivier Sallou < >> olivier.sal...@irisa.fr> >>>> wrote: >>>> >>>>> HI, >>>>> >>>>> I have setup Mesos 1.0.0-2 to use CNI with Weave (1.6.1) >>>>> >>>>> Weave works nicely with the Docker containerizer. >>>>> >>>>> When I try to launch a task via my framework with unified >> containerizer, >>>>> the job remains waiting forever (no RUNNING message). I can see however >>>>> that weave cni allocated an IP address to Mesos. >>>>> >>>>> I tried with a simple mesos-execute test. >>>>> >>>>> Example with a mesos-execute with no CNI, everything is OK >>>>> >>>>> >>>>> osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo >> mesos-execute >>>>> --command="sleep 2" -docker_image=centos:latest --master= >> 127.0.0.1:5050 >>>>> --name=test0 I0823 17:56:50.067520 28815 scheduler.cpp:172] Version: >> 1.0.0 >>>>> I0823 17:56:50.068260 28822 scheduler.cpp:461] New master detected at >>>>> master@127.0.0.1:5050 >>>>> Subscribed with ID 'b1925e13-76db-4225-a3dc-39ce65c79b3c-0005' >>>>> Submitted task 'test0' to agent >> 'b1925e13-76db-4225-a3dc-39ce65c79b3c-S0' >>>>> Received status update TASK_RUNNING for task 'test0' >>>>> source: SOURCE_EXECUTOR >>>>> Received status update TASK_FINISHED for task 'test0' >>>>> message: 'Command exited with status 0' >>>>> >>>>> >>>>> Sample example specifying t
Re: Fail to get CNI with unified containerizer, job remains stuck on staging
I see the same behavior with Calico. The task gets an IP from the CNI plugin, but it remains in STAGING, with the same logs.

mesos-execute --containerizer=mesos \
    --name=cni \
    --master=127.0.0.1:5050 \
    --networks=calico-net-1 \
    --command="ifconfig"
I0824 14:12:03.202328 24912 scheduler.cpp:172] Version: 1.0.0
I0824 14:12:03.203009 24911 scheduler.cpp:461] New master detected at master@127.0.0.1:5050
Subscribed with ID 'b1925e13-76db-4225-a3dc-39ce65c79b3c-0017'
Submitted task 'cni' to agent 'b1925e13-76db-4225-a3dc-39ce65c79b3c-S0'

REMAINS STAGING!

I0824 14:12:03.857158 24806 cni.cpp:1109] Got assigned IPv4 address '192.168.0.0/32' from CNI network 'calico-net-1' for container bdbb275a-ec5f-4a50-aca0-5e694ae57324
I0824 14:12:03.857348 24805 cni.cpp:838] Unable to find DNS nameservers for container bdbb275a-ec5f-4a50-aca0-5e694ae57324. Using host '/etc/resolv.conf'

No more logs.

Olivier

--
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
Re: Fail to get CNI with unified containerizer, job remains stuck on staging
On 08/24/2016 08:23 AM, Olivier Sallou wrote:
> And why is it blocking?
>
> I am on a single host environment, so agent is on 127.0.0.1.

By the way, running a Docker container that uses the weave CNI plugin works fine: it gets its IP and the container runs nicely.

--
Olivier Sallou
Re: Fail to get CNI with unified containerizer, job remains stuck on staging
On 08/23/2016 06:13 PM, Jie Yu wrote:
> The DNS-related logging means that the weave plugin does not return DNS
> information, so the agent uses the host resolv.conf for the container. So I
> think it is irrelevant to your problem.
>
> Mesos requires that the executor can talk to the agent. Can you see if there
> is a route from 10.32.0.1 to the agent IP?

How can I check this, as the task does not start? I have exposed the weave network on the host:

osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo ./weave expose
10.32.0.2
osallou@tifenn~/Development/NOSAVE/go-docker/weave $ ping 10.32.0.2
PING 10.32.0.2 (10.32.0.2) 56(84) bytes of data.
64 bytes from 10.32.0.2: icmp_seq=1 ttl=64 time=0.032 ms
64 bytes from 10.32.0.2: icmp_seq=2 ttl=64 time=0.029 ms
64 bytes from 10.32.0.2: icmp_seq=3 ttl=64 time=0.029 ms
64 bytes from 10.32.0.2: icmp_seq=4 ttl=64 time=0.031 ms

And why is it blocking?

I am on a single-host environment, so the agent is on 127.0.0.1.

Olivier
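One way to answer the "can the executor reach the agent" question is a plain TCP connection test, run from inside the container's network namespace. The sketch below is only an illustration: the function name `can_connect` is hypothetical, and it assumes you have some way to run Python inside the namespace that the agent bind-mounted (the `/run/mesos/isolators/network/cni/<container id>/ns` path from the logs), for example with `nsenter`.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_connect(agent_ip, 5051)` from inside the namespace (5051 being the default agent port) would show whether the executor has a path to the agent. With the agent bound to 127.0.0.1, such a call made inside the container namespace is expected to fail, which matches the explanation given later in this thread.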
Re: mesos per task monitoring/metrics
On 08/23/2016 09:37 PM, Benjamin Mahler wrote:
> +jie
>
> Hi Olivier,
>
> Could you tell us what you're trying to do at a high level?
>
> I'm not familiar with cAdvisor, are you trying to generate a link to the
> cAdvisor page for a particular container?

On the web interface of my app, I want to show the real-time CPU/memory usage of the job. To do so, I indeed link to the cAdvisor job page. The cAdvisor API URL contains the container id, so I need to know the container id.

I can send a request to the slave to get it, but this is not really efficient; it would be best to get the container id in the TaskStatus message. In mesos.proto there is a ContainerStatus in the TaskStatus, but it carries network/cgroup related info, not the container id. It would be nice to get it there.

When we use the Docker containerizer, we have the container id info in the TaskStatus data parameter.

Olivier

> Ben
Fail to get CNI with unified containerizer, job remains stuck on staging
Hi,

I have set up Mesos 1.0.0-2 to use CNI with Weave (1.6.1).

Weave works nicely with the Docker containerizer.

When I try to launch a task via my framework with the unified containerizer, the job remains waiting forever (no RUNNING message). I can see, however, that the weave CNI plugin allocated an IP address to the Mesos container.

I tried with a simple mesos-execute test.

Example of a mesos-execute with no CNI, where everything is OK:

osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo mesos-execute --command="sleep 2" -docker_image=centos:latest --master=127.0.0.1:5050 --name=test0
I0823 17:56:50.067520 28815 scheduler.cpp:172] Version: 1.0.0
I0823 17:56:50.068260 28822 scheduler.cpp:461] New master detected at master@127.0.0.1:5050
Subscribed with ID 'b1925e13-76db-4225-a3dc-39ce65c79b3c-0005'
Submitted task 'test0' to agent 'b1925e13-76db-4225-a3dc-39ce65c79b3c-S0'
Received status update TASK_RUNNING for task 'test0'
  source: SOURCE_EXECUTOR
Received status update TASK_FINISHED for task 'test0'
  message: 'Command exited with status 0'

Same example, specifying the weave network:

osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo mesos-execute --command="sleep 2" -docker_image=centos:latest --master=127.0.0.1:5050 --name=test0 --networks=weave
I0823 17:57:15.845304 28856 scheduler.cpp:172] Version: 1.0.0
I0823 17:57:15.846248 28857 scheduler.cpp:461] New master detected at master@127.0.0.1:5050
Subscribed with ID 'b1925e13-76db-4225-a3dc-39ce65c79b3c-0006'
Submitted task 'test0' to agent 'b1925e13-76db-4225-a3dc-39ce65c79b3c-S0'
==> REMAINS WAITING HERE, job is in STAGING in Mesos UI

mesos-slave logs:

I0823 17:57:15.873872 26522 cni.cpp:716] Bind mounted '/proc/28869/ns/net' to '/run/mesos/isolators/network/cni/4f91a5df-2e9a-4cfc-93f5-aa197646db09/ns' for container 4f91a5df-2e9a-4cfc-93f5-aa197646db09
I0823 17:57:16.257063 26519 cni.cpp:1109] Got assigned IPv4 address '10.32.0.1/12' from CNI network 'weave' for container 4f91a5df-2e9a-4cfc-93f5-aa197646db09
I0823 17:57:16.257258 26525 cni.cpp:838] Unable to find DNS nameservers for container 4f91a5df-2e9a-4cfc-93f5-aa197646db09. Using host '/etc/resolv.conf'

There are no other logs until I kill the job.
We can see that the Mesos container got an IP, but it seems to block on DNS.

Thanks for any hints
Re: mesos per task monitoring/metrics
On 08/23/2016 04:12 PM, haosdent wrote:
> Hi, @Olivier You could get the containerId from the state endpoint of Mesos
> Agent. http://mesos.apache.org/documentation/latest/endpoints/slave/state/

Yes, I saw that, but I expected to get it from the TaskStatus message on the RUNNING state change.

With the Docker containerizer, we could get the container id in the data parameter.

Querying the slave for each task to get its container id is a little tricky and "expensive".

Olivier
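For reference, the lookup suggested above (container id from the agent state endpoint) could look roughly like the sketch below. This is an illustration under assumptions, not the documented schema: the `frameworks` / `executors` / `container` / `tasks` field names and the `SAMPLE_STATE` document are assumed and trimmed by hand, and the helper names are hypothetical. Only the `/mesos/<container id>` cAdvisor path comes from this thread.

```python
from typing import Optional

# Hypothetical, heavily trimmed agent state; field names are assumptions.
SAMPLE_STATE = {
    "frameworks": [
        {
            "executors": [
                {
                    "id": "274",
                    "container": "966e0b09-f38e-497c-afb8-0133d8fb48b1",
                    "tasks": [{"id": "274", "state": "TASK_RUNNING"}],
                }
            ]
        }
    ]
}


def find_container_id(state: dict, task_id: str) -> Optional[str]:
    """Return the container id of the executor running `task_id`, if any."""
    for framework in state.get("frameworks", []):
        for executor in framework.get("executors", []):
            for task in executor.get("tasks", []):
                if task.get("id") == task_id:
                    return executor.get("container")
    return None


def cadvisor_cgroup_path(container_id: str) -> str:
    """Build the cgroup path under which cAdvisor lists the container."""
    return "/mesos/" + container_id


container_id = find_container_id(SAMPLE_STATE, "274")
if container_id is not None:
    print(cadvisor_cgroup_path(container_id))
```

In a framework, the state document would be fetched once per agent and cached, rather than once per task, which addresses part of the cost concern raised in the reply.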
Re: How to select Containerizer? Mesos containerizer or Docker?
On 08/23/2016 11:36 AM, Yu Wei wrote:
> Hi,
>
> Which containerizer should be used? Mesos, Docker or other?
>
> Are there any principles to help make the decision?

With the unified containerizer, Mesos is pushing for the Mesos containerizer. However, it does not support port mapping for the moment. So if you need port mapping, you should go with the Docker one.

Olivier

> Thanks,
>
> Jared
> Software developer
> Interested in open source software, big data, Linux
Re: mesos per task monitoring/metrics
One more question though. Using cgroups isolation, I can see the Mesos container in cAdvisor under, for example:

/mesos/966e0b09-f38e-497c-afb8-0133d8fb48b1

but where can I get the container id 966e0b09-f38e-497c-afb8-0133d8fb48b1 from TaskStatus?

In the Mesos UI, I can see the job details under a path like:

var / lib / mesos / slaves / b1925e13-76db-4225-a3dc-39ce65c79b3c-S0 / frameworks / b1925e13-76db-4225-a3dc-39ce65c79b3c- / executors / 274 / runs / 966e0b09-f38e-497c-afb8-0133d8fb48b1

I can find all of these components except the last one.

Thanks

Olivier
Re: mesos per task monitoring/metrics
OK,

enabling cgroups isolation in the slave config activates the detailed stats.

On 08/23/2016 09:23 AM, Olivier Sallou wrote:
> I only see reserved metrics.
>
> Is there any specific config to get "live" monitoring.

--
Olivier Sallou
mesos per task monitoring/metrics
Hi,

when switching from the Docker containerizer to the unified containerizer, I lost the ability to monitor task metrics (used CPU, used memory, ...) from cAdvisor.

I tried to get stats from /monitor/statistics.json, but I do not get any "live" metrics:

[{"executor_id":"271","executor_name":"Command Executor (Task: 271) (Command: sh -c '\/mnt\/go-dock...')","framework_id":"b1925e13-76db-4225-a3dc-39ce65c79b3c-","source":"271","statistics":{"cpus_limit":1.1,"mem_limit_bytes":2130706432,"timestamp":1471936602.26916}}]

I only see reserved metrics.

Is there any specific config to get "live" monitoring?

Thanks

Olivier
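The symptom described above can be detected programmatically. The sketch below is a rough illustration: it parses an abridged copy of the output quoted in the message and reports which statistics are neither limits nor the timestamp. The helper name `usage_keys` is hypothetical, and `mem_rss_bytes` in the comment is only an example of the kind of usage field that is absent here.

```python
import json

# Abridged copy of the /monitor/statistics.json output quoted above.
RAW = (
    '[{"executor_id":"271","source":"271","statistics":'
    '{"cpus_limit":1.1,"mem_limit_bytes":2130706432,'
    '"timestamp":1471936602.26916}}]'
)


def usage_keys(entry: dict) -> list:
    """Return statistics keys that are not limits or the timestamp."""
    return sorted(
        key
        for key in entry["statistics"]
        if key != "timestamp"
        and not key.endswith("_limit")
        and "_limit_" not in key
    )


for entry in json.loads(RAW):
    live = usage_keys(entry)
    # An empty list means only reserved values are reported, i.e. no
    # usage fields such as mem_rss_bytes are present.
    print(entry["executor_id"], live)
```

For the output quoted in the message, the list is empty, which is consistent with "I only see reserved metrics".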
Re: error when using dockerinfo user network
Answering myself ;-)

I use a virtualenv, and the mesos package now installs the Python libs system-wide. My virtualenv still contained the old Mesos Python libs.

----- Original message -----
> From: "Olivier Sallou" <olivier.sal...@irisa.fr>
> To: dev@mesos.apache.org
> Sent: Friday, 19 August 2016 09:10:52
> Subject: error when using dockerinfo user network
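A stale copy of a library inside a virtualenv can be spotted by asking Python where it would load the module from. The following is a small sketch using only the standard library; the helper names are hypothetical, and `mesos` as the top-level module name of the Python binding is an assumption.

```python
import importlib.util
import sys
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv."""
    return sys.prefix != sys.base_prefix


print("virtualenv active:", in_virtualenv())
# "mesos" is assumed to be the binding's top-level module name.
print("mesos loaded from:", module_origin("mesos"))
```

If the printed path points into the virtualenv's site-packages while the newer package was installed system-wide, the old generated protobuf classes are the ones being imported, which would explain an enum value known to the new mesos.proto being rejected.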
error when using dockerinfo user network
Hi,

I just upgraded to Mesos 1.0 (package 1.0.0-2.0.89.ubuntu1510).

With the Docker containerizer, I tried to set up the use of a user-defined network (via docker cni plugin), using the Python binding.

I face an error:

"Unknown enum value: 4" when setting the DockerInfo network value to 4.

I was previously setting it to value 2 (bridge), and that works.

I can see in mesos.proto:

enum Network {
  HOST = 1;
  BRIDGE = 2;
  NONE = 3;
  USER = 4;
}

so value 4 should be OK.

Any hint?

Thanks

Olivier
Re: cni / public port questions
----- Original message -----
> From: "Jie Yu" <yujie@gmail.com>
> To: "dev" <dev@mesos.apache.org>
> Cc: "Qian AZ Zhang" <zhang...@cn.ibm.com>, "Avinash Sridharan" <avin...@mesosphere.io>
> Sent: Thursday, 28 July 2016 18:41:33
> Subject: Re: cni / public port questions
>
> you can still use bridge with CNI (you'll need to use the built-in bridge
> plugin of CNI).
>
> Port mapping is still under development. Expecting this coming soon.

Yes, I had seen that feature in JIRA, but was wondering if there were other solutions in the meantime. As my containers need to expose some ports publicly, port mapping is needed for bridge mode.

So either I keep my existing Docker containerizer with the Docker bridge, or I switch to the unified containerizer with CNI and port management (more complex to set up and more complex for the framework to manage).

I would have liked not to force my framework users to use a CNI tool when switching my code to the unified containerizer. This would complicate code upgrades (it impacts the Mesos install, even for a simple bridge CNI).
This means that frameworks willing to switch to the unified containerizer need to keep supporting the Docker containerizer for existing installations (we can't force a Mesos admin to switch to CNI just for one framework).

Thanks

Olivier

> - Jie
>
> On Thu, Jul 28, 2016 at 2:44 AM, haosdent <haosd...@gmail.com> wrote:
>
> > Hi, @Olivier. The port forwarding of mesos is still under implementing. You
> > could subscribe https://issues.apache.org/jira/browse/MESOS-4823 to track
> > the progress.
Re: [Mesos 2.0] Let's talk about the future
----- Original message -----
> From: "Jay JN Guo"
> To: "user", "mesos"
> Sent: Friday, 29 July 2016 09:13:20
> Subject: [Mesos 2.0] Let's talk about the future

> Hi,
> As we are all excited about release 1.0.0, it's never too early to talk about
> next big thing: Mesos 2.0.0. What major things should be done next?
> I believe there are still many features you desire in Mesos and some of them
> are already under development. I'd like to collect your minds and align the
> vision in this mail thread. For example, here are items on Mesos long term
> roadmap:
> Pluggable Fetcher
> Oversubscription for reservation: Optimistic offers
> Resource Revocation
> Pod support
> Quota chunks
> Multiple-role support for frameworks
> User namespace support

What features do you expect from this? Is it running a task/container as a different user on a per-container basis (root in the container but seen as user X on the host)? (As expected in Docker in the future; it seems it also needs Linux kernel updates.)

> Event bus
> First class resources (Cpu topology info, GPU topology info, disk speed, etc)

There was a fairly recent proposal about location awareness (rack, etc.) which also looks interesting.

> Deprecate Docker containerizer (in favor of Unified containerizer w/ Docker
> support)

While this is long term (let's give people time to switch to unified ;-) ), deprecation of the Docker containerizer should come with support for port mapping over a bridge, equivalent to what Docker's bridge network mode currently offers. I know there is a JIRA ticket tracking this feature, but without it, I think you cannot drop the Docker containerizer.
CNI plugins on Mesos are important (IP per container), but should not be mandatory (they are more complex to install and set up than pure Mesos).
Indeed, CNI integration with Mesos or other frameworks is not complete (you do not fully manage the ports of Calico etc. via Mesos; basically you only ask for an IP for your container, and all port rules are managed directly via the tool), and the current Docker bridge/user mode with Mesos is far easier to set up and use.

Olivier

> I would appreciate it if you could either share your ideas or vote on these
> items, and we will discuss it in next community sync.
> We may not have an unshakeable conclusion as container technology is evolving
> at an ever faster pace, but the whole community, especially newbies like
> myself, would profoundly benefit from a clear plan and priority for next 3-6
> months.
> Cheers,
> /Jay
cni / public port questions
Hi,

I am looking at using the unified containerizer. As it only supports host mode, it needs CNI. However, it is not really clear to me how "public" ports are handled.

If I have a container that needs to expose a port (let's say port 123), can I expose it via the Mesos API only?

When I use CNI, as I understand it, I allocate an IP per container. If the IP is routable in the network, are all ports reachable (from any host / other container)? Or should they be explicitly opened?

To put it simply, can I launch a container that exposes only port 123 to the public (any host), with its other ports reachable only by containers in the same "private network":

- container 1 exposes public port 123 and private port 456 (accessible by container 2 only)
- container 2 connects to container 1 on port 456.

For the moment, I am using the Docker containerizer in bridge mode, so exposing a port was simply a matter of mapping ports. Private networks are managed by Docker user networks.

Thanks

Olivier
Re: failed to start mesos-slave
On 06/10/2016 11:43 AM, Neil Conway wrote:
> Hi Olivier,
>
> You might be running into
> https://issues.apache.org/jira/browse/MESOS-2986 . Note that Mesos
> 0.22 is quite old and is no longer supported.

Certainly, but upgrading Mesos in production is not a daily task. Upgrading to 0.22.2-0.2.62 seems to fix the issue.

Thanks

>
> Neil
>
>
> On Fri, Jun 10, 2016 at 11:37 AM, Olivier Sallou
> <olivier.sal...@irisa.fr> wrote:
>> Hi,
>> I upgraded docker on one of my mesos slaves (v0.22)
>>
>> Now it fails to start with error:
>>
>> Failed to create a containerizer: Could not create DockerContainerizer:
>> Insufficient version of Docker! Please upgrade to >= 1.0.0
>>
>> Though docker is 1.11:
>>
>> docker -v
>> Docker version 1.11.2, build b9f10c9
>>
>> Any idea ?
>>
>> Thanks
>>
>> Olivier
>>
>> --
>> Olivier Sallou
>> IRISA / University of Rennes 1
>> Campus de Beaulieu, 35000 RENNES - FRANCE
>> Tel: 02.99.84.71.95
>>
>> gpg key id: 4096R/326D8438 (keyring.debian.org)
>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
>>

--
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
failed to start mesos-slave
Hi,
I upgraded Docker on one of my Mesos slaves (v0.22).

Now it fails to start with the error:

Failed to create a containerizer: Could not create DockerContainerizer: Insufficient version of Docker! Please upgrade to >= 1.0.0

Though Docker is 1.11:

docker -v
Docker version 1.11.2, build b9f10c9

Any idea?

Thanks

Olivier

--
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
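The error is surprising because 1.11.2 is numerically newer than 1.0.0, so the version string was plausibly not parsed the way the check expected. As an illustration only (this is not the check Mesos performs, and the helper names `parse_docker_version` and `is_supported` are hypothetical), a numeric comparison on the `docker -v` output shown above could be sketched like this:

```python
import re


def parse_docker_version(output):
    """Extract (major, minor, patch) as integers from a `docker -v` style line."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no version found in: %r" % output)
    return tuple(int(part) for part in match.groups())


def is_supported(output, minimum=(1, 0, 0)):
    """Compare numerically, component by component, against the minimum."""
    return parse_docker_version(output) >= minimum


version_line = "Docker version 1.11.2, build b9f10c9"
parsed = parse_docker_version(version_line)
supported = is_supported(version_line)
```

Comparing integer tuples avoids the string-comparison trap where "1.11.2" would sort before "1.2.0".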
Re: how to debug HTTP API
On 06/07/2016 06:29 PM, Vinod Kone wrote:
> Olivier, on a side note, it's great to see that you are playing with the
> new HTTP API in python! I briefly looked at your linked code and it looks
> like you are mixing the business logic of your application and the Mesos
> API interaction in the same file. It would be great if (at some point) you
> can extract the Mesos API interaction into a python library that can be
> used by other frameworks. See other libraries (C++
> <https://github.com/apache/mesos/blob/master/include/mesos/v1/scheduler.hpp>,
> Java <https://github.com/mesosphere/mesos-rxjava>, Go
> <https://github.com/mesos/mesos-go>) for inspiration.

I will try to do this later on. For the moment I am focusing on reproducing my code with the HTTP API, and the code is not yet clean. It would indeed be better to separate the Mesos side from the business logic.

Olivier

>
> On Tue, Jun 7, 2016 at 11:46 AM, Anand Mazumdar <an...@mesosphere.io> wrote:
>
>> Olivier,
>>
>> You are missing the “task_infos” key in your “ACCEPT” call. The master
>> treats “Accept” operations with no launch tasks as declining offers
>> implicitly. I would file a followup JIRA to ensure this is logged on the
>> master (if not so).
>>
>> An example correct JSON:
>> https://gist.github.com/hatred/7325d8a4afde607ecc0f376ab62d60eb <
>> https://gist.github.com/hatred/7325d8a4afde607ecc0f376ab62d60eb>
>>
>> -anand
>>
>>> On Jun 7, 2016, at 8:38 AM, Olivier Sallou <olivier.sal...@irisa.fr>
>> wrote:
>>>
>>>
>>> On 06/07/2016 04:53 PM, Guangya Liu wrote:
>>>> So how many agent nodes are there in your cluster? If you continue
>>>> receiving offer but without getting UPDATE message, then it may be
>> caused
>>>> by that your task definition and the framework continually decline
>> offer.
>>> I have only one node (master/slave), for development. It worked fine
>>> with the python API.
>>> we see on master that it received the ACCEPT, and no DECLINE.
However, >>> as I receive no UPDATE, I suppose that mesos "drops" the ACCEPT (wrong >>> task definition maybe), and sends new offers several seconds after I >>> sent the ACCEPT. >>>> Can you please share your framework code here for the logic of "Event:: >>>> OFFERS"? >>> Code is available here: >>> >>> >> https://bitbucket.org/osallou/go-docker/src/b1948063fb7f68fbc77f5de6b473d832a7dd36af/plugins/mesos.py?at=master=file-view-default >> < >> https://bitbucket.org/osallou/go-docker/src/b1948063fb7f68fbc77f5de6b473d832a7dd36af/plugins/mesos.py?at=master=file-view-default >>> >>> in method run of MesosThread, line 613 >>> >>> Code is a little complex, as it is a port of existing code using mesos >>> python lib. >>> >>> Code related to HTTP is in development, so there may be further errors, >>> but registration is fine as well as offer messages. >>> >>> I have added locally a debug print to show any message received by mesos >>> (in case I would have received an other message indicating an error), >>> but I received no other than offer and heartbeats. >>> >>> If Mesos see the ACCEPT message as it appears in logs, that it should >>> either reject it (with a different status code than 202) or send an >>> UPDATE error message if there is an error with my task definition. >>> >>> Olivier >>>> Thanks, >>>> >>>> Guangya >>>> >>>> On Tue, Jun 7, 2016 at 8:29 PM, Olivier Sallou <olivier.sal...@irisa.fr >>>> wrote: >>>> >>>>> On 06/07/2016 01:59 PM, Guangya Liu wrote: >>>>>> I can see that your framework is now holding the offer, how did you >>>>> launch >>>>>> task? 
>>>>> I execute an HTTP POST request in Python with json content-type: >>>>> >>>>> {'type': 'ACCEPT', >>>>> 'framework_id': {'value': >> u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020'}, >>>>> 'accept': { >>>>>'operations': [ >>>>>{'type': 'LAUNCH', >>>>>'launch': {'container': { >>>>>'docker': {'image': u'centos:latest', >>>>> 'force_pull_image': True, 'port_mappings': [], 'network': 2}, >>>>>'type': 1, >>>>>'volumes': [ >>>>>{'host_path': u'/a/b', 'container_path': >>>>> u'
Re: how to debug HTTP API
On 06/07/2016 05:46 PM, Anand Mazumdar wrote:
> Olivier,
>
> You are missing the “task_infos” key in your “ACCEPT” call. The master treats
> “Accept” operations with no launch tasks as declining offers implicitly. I
> would file a followup JIRA to ensure this is logged on the master (if not so).
>
> An example correct JSON:
> https://gist.github.com/hatred/7325d8a4afde607ecc0f376ab62d60eb
> <https://gist.github.com/hatred/7325d8a4afde607ecc0f376ab62d60eb>

Thanks, the example is really useful. I suspected the task "structure" was the issue, but with no error logged on the master it was difficult to understand what was wrong. It would indeed be good to get a master log entry for this case.

Thanks!

Olivier

>
> -anand
>
>> On Jun 7, 2016, at 8:38 AM, Olivier Sallou <olivier.sal...@irisa.fr> wrote:
>>
>>
>>
>> On 06/07/2016 04:53 PM, Guangya Liu wrote:
>>> So how many agent nodes are there in your cluster? If you continue
>>> receiving offer but without getting UPDATE message, then it may be caused
>>> by that your task definition and the framework continually decline offer.
>> I have only one node (master/slave), for development. It worked fine
>> with the python API.
>> we see on master that it received the ACCEPT, and no DECLINE. However,
>> as I receive no UPDATE, I suppose that mesos "drops" the ACCEPT (wrong
>> task definition maybe), and sends new offers several seconds after I
>> sent the ACCEPT.
>>> Can you please share your framework code here for the logic of "Event::
>>> OFFERS"?
>> Code is available here:
>>
>> https://bitbucket.org/osallou/go-docker/src/b1948063fb7f68fbc77f5de6b473d832a7dd36af/plugins/mesos.py?at=master=file-view-default
>>
>> <https://bitbucket.org/osallou/go-docker/src/b1948063fb7f68fbc77f5de6b473d832a7dd36af/plugins/mesos.py?at=master=file-view-default>
>>
>> in method run of MesosThread, line 613
>>
>> Code is a little complex, as it is a port of existing code using mesos
>> python lib.
>> >> Code related to HTTP is in development, so there may be further errors, >> but registration is fine as well as offer messages. >> >> I have added locally a debug print to show any message received by mesos >> (in case I would have received an other message indicating an error), >> but I received no other than offer and heartbeats. >> >> If Mesos see the ACCEPT message as it appears in logs, that it should >> either reject it (with a different status code than 202) or send an >> UPDATE error message if there is an error with my task definition. >> >> Olivier >>> Thanks, >>> >>> Guangya >>> >>> On Tue, Jun 7, 2016 at 8:29 PM, Olivier Sallou <olivier.sal...@irisa.fr> >>> wrote: >>> >>>> On 06/07/2016 01:59 PM, Guangya Liu wrote: >>>>> I can see that your framework is now holding the offer, how did you >>>> launch >>>>> task? >>>> I execute an HTTP POST request in Python with json content-type: >>>> >>>> {'type': 'ACCEPT', >>>> 'framework_id': {'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020'}, >>>> 'accept': { >>>>'operations': [ >>>>{'type': 'LAUNCH', >>>>'launch': {'container': { >>>>'docker': {'image': u'centos:latest', >>>> 'force_pull_image': True, 'port_mappings': [], 'network': 2}, >>>>'type': 1, >>>>'volumes': [ >>>>{'host_path': u'/a/b', 'container_path': >>>> u'/mnt/home', 'mode': 1}, >>>>{'host_path': u'/a/b/c', 'container_path': >>>> u'/mnt/go-docker', 'mode': 1}, >>>>{'host_path': u'/b/c/d', 'container_path': >>>> u'/mnt/god-data', 'mode': 2} >>>>] >>>>}, >>>>'name': u'testr', >>>>'task_id': {'value': '128'}, >>>>'command': {'uris': [{'value': >>>> u'/home/osallou/docker.tar.gz'}], 'value': u'/mnt/go-docker/wrapper.sh'}, >>>>'slave_id': {'value': >>>> u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0'}, >>>>'resources': [ >>>>{'scalar': {'value': 1}, 'type': 0, 'name': 'cpus'}, >>>>{'scalar': {'value': 2000}, 'type': 0, 'name': 'mem'} >>>>] >>>>} # end launch >>>>} # e
Re: how to debug HTTP API
On 06/07/2016 04:53 PM, Guangya Liu wrote:
> So how many agent nodes are there in your cluster? If you continue
> receiving offer but without getting UPDATE message, then it may be caused
> by that your task definition and the framework continually decline offer.

I have only one node (master/slave), for development. It worked fine with the Python API.
We see on the master that it received the ACCEPT, and no DECLINE. However, as I receive no UPDATE, I suppose that Mesos "drops" the ACCEPT (wrong task definition maybe) and sends new offers several seconds after I sent the ACCEPT.

>
> Can you please share your framework code here for the logic of "Event::
> OFFERS"?

The code is available here:

https://bitbucket.org/osallou/go-docker/src/b1948063fb7f68fbc77f5de6b473d832a7dd36af/plugins/mesos.py?at=master=file-view-default

in the run method of MesosThread, line 613.

The code is a little complex, as it is a port of existing code that used the Mesos Python library.

The HTTP-related code is still in development, so there may be further errors, but registration works, as do offer messages.

I have added a local debug print to show any message received from Mesos (in case I had received another message indicating an error), but I received nothing other than offers and heartbeats.

If Mesos sees the ACCEPT message, as it appears in the logs, then it should either reject it (with a status code other than 202) or send an UPDATE error message if there is an error in my task definition.

Olivier

>
> Thanks,
>
> Guangya
>
> On Tue, Jun 7, 2016 at 8:29 PM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
>
>>
>> On 06/07/2016 01:59 PM, Guangya Liu wrote:
>>> I can see that your framework is now holding the offer, how did you
>> launch
>>> task?
>> I execute an HTTP POST request in Python with json content-type: >> >> {'type': 'ACCEPT', >> 'framework_id': {'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020'}, >> 'accept': { >> 'operations': [ >> {'type': 'LAUNCH', >> 'launch': {'container': { >> 'docker': {'image': u'centos:latest', >> 'force_pull_image': True, 'port_mappings': [], 'network': 2}, >> 'type': 1, >> 'volumes': [ >> {'host_path': u'/a/b', 'container_path': >> u'/mnt/home', 'mode': 1}, >> {'host_path': u'/a/b/c', 'container_path': >> u'/mnt/go-docker', 'mode': 1}, >> {'host_path': u'/b/c/d', 'container_path': >> u'/mnt/god-data', 'mode': 2} >> ] >> }, >> 'name': u'testr', >> 'task_id': {'value': '128'}, >> 'command': {'uris': [{'value': >> u'/home/osallou/docker.tar.gz'}], 'value': u'/mnt/go-docker/wrapper.sh'}, >> 'slave_id': {'value': >> u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0'}, >> 'resources': [ >> {'scalar': {'value': 1}, 'type': 0, 'name': 'cpus'}, >> {'scalar': {'value': 2000}, 'type': 0, 'name': 'mem'} >> ] >> } # end launch >> } # end operation >> ], >> 'offer_ids': [{'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28'}] >> } >> } >> >> We can see that Mesos received the ACCEPT: >> >> I0607 11:45:15.873584 14896 master.cpp:3104] Processing ACCEPT call for >> offers: [ e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28 ] on slave >> e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0 at slave(1)@127.0.1.1:5051 >> (tifenn.irisa.fr) for framework >> >> >> and I continue to receive new offers, so "connection" is OK. I should >> receive an UPDATE message even if there is an error, but I receive none >> (I track/log all messages received, whatever the type). 
>> >> Olivier >> >>> Perhaps you can take a look at >>> https://github.com/apache/mesos/blob/master/src/cli/execute.cpp#L311 >> which >>> is an example framework using HTTP API >>> >>> Thanks, >>> >>> Guangya >>> >>> On Tue, Jun 7, 2016 at 7:19 PM, Olivier Sallou <olivier.sal...@irisa.fr> >>> wrote: >>> >>>> On 06/07/2016 12:25 PM, Guangya Liu wrote: >>>>> Olivier, >>>>> >>>>> For such case, seems there is sth wrong with your framework? can you >>>> please >>>>> run the following two commands and check the output? >>>> I don't think it is a framework issue, I receive offers, heartbeats >> etc... >>>> It is only at task creation step, when I have no rejection nor up
Re: how to debug HTTP API
On 06/07/2016 01:59 PM, Guangya Liu wrote:
> I can see that your framework is now holding the offer, how did you launch
> task?

I execute an HTTP POST request in Python with json content-type:

{'type': 'ACCEPT',
 'framework_id': {'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020'},
 'accept': {
     'operations': [
         {'type': 'LAUNCH',
          'launch': {'container': {
                         'docker': {'image': u'centos:latest', 'force_pull_image': True, 'port_mappings': [], 'network': 2},
                         'type': 1,
                         'volumes': [
                             {'host_path': u'/a/b', 'container_path': u'/mnt/home', 'mode': 1},
                             {'host_path': u'/a/b/c', 'container_path': u'/mnt/go-docker', 'mode': 1},
                             {'host_path': u'/b/c/d', 'container_path': u'/mnt/god-data', 'mode': 2}
                         ]
                     },
                     'name': u'testr',
                     'task_id': {'value': '128'},
                     'command': {'uris': [{'value': u'/home/osallou/docker.tar.gz'}], 'value': u'/mnt/go-docker/wrapper.sh'},
                     'slave_id': {'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0'},
                     'resources': [
                         {'scalar': {'value': 1}, 'type': 0, 'name': 'cpus'},
                         {'scalar': {'value': 2000}, 'type': 0, 'name': 'mem'}
                     ]
          } # end launch
         } # end operation
     ],
     'offer_ids': [{'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28'}]
 }
}

We can see that Mesos received the ACCEPT:

I0607 11:45:15.873584 14896 master.cpp:3104] Processing ACCEPT call for offers: [ e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28 ] on slave e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0 at slave(1)@127.0.1.1:5051 (tifenn.irisa.fr) for framework

and I continue to receive new offers, so "connection" is OK. I should receive an UPDATE message even if there is an error, but I receive none (I track/log all messages received, whatever the type).
Olivier > Perhaps you can take a look at > https://github.com/apache/mesos/blob/master/src/cli/execute.cpp#L311 which > is an example framework using HTTP API > > Thanks, > > Guangya > > On Tue, Jun 7, 2016 at 7:19 PM, Olivier Sallou <olivier.sal...@irisa.fr> > wrote: > >> >> On 06/07/2016 12:25 PM, Guangya Liu wrote: >>> Olivier, >>> >>> For such case, seems there is sth wrong with your framework? can you >> please >>> run the following two commands and check the output? >> I don't think it is a framework issue, I receive offers, heartbeats etc... >> It is only at task creation step, when I have no rejection nor update >> message. >> >> It could be (certainly) an issue with the json task message I sent in >> the ACCEPT, but as there is no error, I have no way to understand what's >> wrong with it. >>> curl "http://:5050/master/frameworks" 2>/dev/null|python >> -m >>> json.tool >> { >> "completed_frameworks": [], >> "frameworks": [ >> { >> "active": true, >> "capabilities": [], >> "checkpoint": false, >> "completed_tasks": [], >> "executors": [], >> "failover_timeout": 0.0, >> "hostname": "", >> "id": "e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0021", >> "name": "GoDocker HTTP Framework", >> "offered_resources": { >> "cpus": 4.0, >> "disk": 459470.0, >> "mem": 14898.0, >> "ports": "[31000-32000]" >> }, >> "offers": [ >> { >> "framework_id": >> "e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0021", >> "id": "1f1486e3-43ee-44c5-b073-82a901add956-O0", >> "resources": { >> "cpus": 4.0, >> "disk": 459470.0, >> "mem": 14898.0, >> "ports": "[31000-32000]" >> }, >> "slave_id": "e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0" >> } >> ], >> "registered_time": 1465298174.2483, >> "resources": { >> "cpus": 4.0, >> "disk": 459470.0, >> "mem
Re: how to debug HTTP API
": "drf", "version": "false", "webui_dir": "/usr/share/mesos/webui", "work_dir": "/var/lib/mesos", "zk": "zk://localhost:2181/mesos", "zk_session_timeout": "10secs" }, "frameworks": [ { "active": true, "capabilities": [], "checkpoint": false, "completed_tasks": [], "executors": [], "failover_timeout": 0.0, "hostname": "", "id": "e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0021", "name": "GoDocker HTTP Framework", "offered_resources": { "cpus": 0.0, "disk": 0.0, "mem": 0.0 }, "offers": [], "registered_time": 1465298174.2483, "resources": { "cpus": 0.0, "disk": 0.0, "mem": 0.0 }, "role": "*", "tasks": [], "unregistered_time": 0.0, "used_resources": { "cpus": 0.0, "disk": 0.0, "mem": 0.0 }, "user": "godocker_http_test", "webui_url": "" } ], "git_sha": "555db235a34afbb9fb49940376cc33a66f1f85f0", "git_tag": "0.28.1", "hostname": "tifenn.irisa.fr", "id": "1f1486e3-43ee-44c5-b073-82a901add956", "leader": "master@127.0.1.1:5050", "log_dir": "/var/log/mesos", "orphan_tasks": [], "pid": "master@127.0.1.1:5050", "slaves": [ { "active": true, "attributes": { "hostname": "127.0.0.1" }, "hostname": "tifenn.irisa.fr", "id": "e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0", "offered_resources": { "cpus": 0.0, "disk": 0.0, "mem": 0.0 }, "pid": "slave(1)@127.0.1.1:5051", "registered_time": 1465298164.37517, "reregistered_time": 1465298164.37526, "reserved_resources": {}, "resources": { "cpus": 4.0, "disk": 459470.0, "mem": 14898.0, "ports": "[31000-32000]" }, "unreserved_resources": { "cpus": 4.0, "disk": 459470.0, "mem": 14898.0, "ports": "[31000-32000]" }, "used_resources": { "cpus": 0.0, "disk": 0.0, "mem": 0.0 }, "version": "0.28.1" } ], "start_time": 1465298159.26321, "unregistered_frameworks": [], "version": "0.28.1" } > > Thanks, > > Guangya > > On Tue, Jun 7, 2016 at 6:04 PM, Olivier Sallou <olivier.sal...@irisa.fr> > wrote: > >> Hi, >> I am trying to switch from Python to HTTP API. I use mesos 0.28.1 >> >> I could create framework to register, receive offers etc... 
but I have >> an issue accepting offers. >> >> I send my ACCEPT message but I do not receive any UPDATE message, only >> new offers and hearbeat messages. >> >> On mesos master logs I see: >> >> I0607 11:45:15.873184 14896 http.cpp:312] HTTP POST for >> /master/api/v1/scheduler from 127.0.0.1:38298 with >> User-Agent='python-requests/2.9.1' >> I0607 11:45:15.873584 14896 master.cpp:3104] Processing ACCEPT call for >> offers: [ e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28 ] on slave >> e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0 at slave(1)@127.0.1.1:5051 >> (tifenn.irisa.fr) for framework >> e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020 (GoDocker HTTP Framework) >> >> There is a "Processing ACCEPT" and no error, but my task is not ran on >> mesos. >> No error on slave either. >> >> Response code to my ACCEPT is 202 as expected. >> >> Here is my HTTP json message: >> >> {'type': 'ACCEPT', >> 'framework_id': {'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020'}, >> 'accept': { >> 'operations': [ >> {'type': 'LAUNCH', >> 'launch': {'container': { >> 'docker': {'image': u'centos:latest', >> 'force_pull_image': True, 'port_mappings': [], 'network': 2}, >> 'type': 1, >> 'volumes': [ >> {'host_path': u'/a/b', 'container_path': >> u'/mnt/home', 'mode': 1}, >> {'host_path': u'/a/b/c', 'container_path': >> u'/mnt/go-docker', 'mode': 1}, >> {'host_path': u'/b/c/d', 'container_path': >> u'/mnt/god-data', 'mode': 2} >> ] >> }, >> 'name': u'testr', >> 'task_id': {'value': '128'}, >> 'command': {'uris': [{'value': >> u'/home/osallou/docker.tar.gz'}], 'value': u'/mnt/go-docker/wrapper.sh'}, >> 'slave_id': {'value': >> u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0'}, >> 'resources': [ >> {'scalar': {'value': 1}, 'type': 0, 'name': 'cpus'}, >> {'scalar': {'value': 2000}, 'type': 0, 'name': 'mem'} >> ] >> } # end launch >> } # end operation >> ], >> 'offer_ids': [{'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28'}] >> } >> } >> >> There could be an issue with my task definition, but as no error 
is >> raised and I receive no UPDATE error message. >> >> Any hint on how to debug this? >> >> Thanks >> >> >> -- >> Olivier Sallou >> IRISA / University of Rennes 1 >> Campus de Beaulieu, 35000 RENNES - FRANCE >> Tel: 02.99.84.71.95 >> >> gpg key id: 4096R/326D8438 (keyring.debian.org) >> Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438 >> >> -- Olivier Sallou IRISA / University of Rennes 1 Campus de Beaulieu, 35000 RENNES - FRANCE Tel: 02.99.84.71.95 gpg key id: 4096R/326D8438 (keyring.debian.org) Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
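When comparing such state dumps by hand gets tedious, the fields that matter for this problem (tasks, completed tasks, outstanding offers) can be pulled out with a few lines. The following is a minimal sketch, assuming the JSON has already been fetched (for example with the curl command quoted earlier in this thread) and parsed into a dict. `summarize_frameworks` and `sample_state` are hypothetical names, and the sample mirrors only a subset of the fields shown in the output above.

```python
def summarize_frameworks(state):
    """Return one small summary dict per framework in a master state document."""
    summaries = []
    for fw in state.get("frameworks", []):
        summaries.append({
            "id": fw.get("id"),
            "name": fw.get("name"),
            "active": fw.get("active", False),
            "tasks": len(fw.get("tasks", [])),
            "completed_tasks": len(fw.get("completed_tasks", [])),
            "outstanding_offers": len(fw.get("offers", [])),
        })
    return summaries


# Subset of the state output quoted in this thread.
sample_state = {
    "frameworks": [
        {
            "active": True,
            "completed_tasks": [],
            "id": "e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0021",
            "name": "GoDocker HTTP Framework",
            "offers": [],
            "tasks": [],
        }
    ]
}

summary = summarize_frameworks(sample_state)
for entry in summary:
    print(entry)
```

A framework that keeps receiving offers while both task counters stay at zero matches the symptom described here: the ACCEPT is processed, but no task is ever launched.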
how to debug HTTP API
Hi,
I am trying to switch from the Python API to the HTTP API. I use Mesos 0.28.1.

I could create a framework that registers, receives offers, etc., but I have an issue accepting offers.

I send my ACCEPT message but I do not receive any UPDATE message, only new offers and heartbeat messages.

In the Mesos master logs I see:

I0607 11:45:15.873184 14896 http.cpp:312] HTTP POST for /master/api/v1/scheduler from 127.0.0.1:38298 with User-Agent='python-requests/2.9.1'
I0607 11:45:15.873584 14896 master.cpp:3104] Processing ACCEPT call for offers: [ e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28 ] on slave e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0 at slave(1)@127.0.1.1:5051 (tifenn.irisa.fr) for framework e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020 (GoDocker HTTP Framework)

There is a "Processing ACCEPT" and no error, but my task is not run on Mesos.
No error on the slave either.

The response code to my ACCEPT is 202, as expected.

Here is my HTTP JSON message:

{'type': 'ACCEPT',
 'framework_id': {'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020'},
 'accept': {
     'operations': [
         {'type': 'LAUNCH',
          'launch': {'container': {
                         'docker': {'image': u'centos:latest', 'force_pull_image': True, 'port_mappings': [], 'network': 2},
                         'type': 1,
                         'volumes': [
                             {'host_path': u'/a/b', 'container_path': u'/mnt/home', 'mode': 1},
                             {'host_path': u'/a/b/c', 'container_path': u'/mnt/go-docker', 'mode': 1},
                             {'host_path': u'/b/c/d', 'container_path': u'/mnt/god-data', 'mode': 2}
                         ]
                     },
                     'name': u'testr',
                     'task_id': {'value': '128'},
                     'command': {'uris': [{'value': u'/home/osallou/docker.tar.gz'}], 'value': u'/mnt/go-docker/wrapper.sh'},
                     'slave_id': {'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0'},
                     'resources': [
                         {'scalar': {'value': 1}, 'type': 0, 'name': 'cpus'},
                         {'scalar': {'value': 2000}, 'type': 0, 'name': 'mem'}
                     ]
          } # end launch
         } # end operation
     ],
     'offer_ids': [{'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28'}]
 }
}

There could be an issue with my task definition, but no error is raised and I receive no UPDATE error message.

Any hint on how to debug this?

Thanks

--
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
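The replies in this thread trace the silent failure to the shape of the LAUNCH operation: the task description has to be wrapped in a `task_infos` list, otherwise the master treats the ACCEPT as an implicit decline. A minimal sketch of a corrected payload follows. It assumes the v1 scheduler API JSON conventions (`agent_id` in place of `slave_id`, and enum names such as `SCALAR`, `DOCKER` and `BRIDGE` written as strings); the helper name `build_accept` is hypothetical, volumes and URIs are left out for brevity, and the gist linked in the replies remains the reference example.

```python
import json


def build_accept(framework_id, offer_id, agent_id, task_id, name, command, image):
    """Build an ACCEPT call whose LAUNCH operation carries a task_infos list."""
    task_info = {
        "name": name,
        "task_id": {"value": task_id},
        # Assumption: the v1 API names this field agent_id, not slave_id.
        "agent_id": {"value": agent_id},
        "command": {"value": command},
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                "force_pull_image": True,
            },
        },
        "resources": [
            {"name": "cpus", "type": "SCALAR", "scalar": {"value": 1}},
            {"name": "mem", "type": "SCALAR", "scalar": {"value": 2000}},
        ],
    }
    return {
        "type": "ACCEPT",
        "framework_id": {"value": framework_id},
        "accept": {
            "offer_ids": [{"value": offer_id}],
            "operations": [
                # The task goes inside launch.task_infos, not directly in launch.
                {"type": "LAUNCH", "launch": {"task_infos": [task_info]}}
            ],
        },
    }


call = build_accept(
    framework_id="e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020",
    offer_id="e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28",
    agent_id="e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0",
    task_id="128",
    name="testr",
    command="/mnt/go-docker/wrapper.sh",
    image="centos:latest",
)
body = json.dumps(call)
```

The resulting `body` would be sent as the JSON content of the POST to `/master/api/v1/scheduler`, the endpoint that appears in the master log above.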
Re: volume / mount point error with Unified Containerizer
----- Original Message -----
> From: "Guangya Liu" <gyliu...@gmail.com>
> To: "dev" <dev@mesos.apache.org>, "Jie Yu" <j...@mesosphere.io>, "Gilbert
> Song" <gilb...@mesosphere.io>
> Sent: Monday, 23 May 2016 17:34:41
> Subject: Re: volume / mount point error with Unified Containerizer
>
> It is a bit strange to me, I also did some test and review code for
> relative path, and found that relative path works well.
>
> In 0.28.1, if deploy a docker container with MesosContainerizer, then if
> using absolute path as container_path, the mesos agent will update the
> container_path to a relative path by adding a prefix ./rootfs to the
> container_path, e.g. /file/path = > ./rootfs/file/path.
>
> If deploy a docker container with MesosContainerizer with relative path as
> container_path, then the mesos agent will not update the container_path.
>
> So the final mount point for the container should be either
>
> 1) /tmp/mesos/slaves/agent_id/frameworks/framework_id/
> executors/51/runs/container_id/.rootfs/file/path
> 2) /tmp/mesos/slaves/agent_id/frameworks/framework_id/executors/51/runs/
> container_id/file/path
>
> The only difference is adding ./rootfs as a prefix or not, the test result
> is that 1) does not work and 2) works well. And even the mount for 1)
> failed, but I can see the mount point path does exist.
>

@Guangya I confirm that using a relative path works fine: I get the volumes in the Mesos path (but that does not help for my implementation).
If I use the Docker containerizer, absolute paths are fine; this is what I use in my code for the moment, and I am investigating a switch to the unified containerizer.

> @Yu Jie and @Gilbert, any comments for this?
>
> @Oilivier,
>
> In order not to block your test, can you please use mesos after 0.28.1? You
> can use either 0.28.2 or above version.

Well, as this is not an urgent matter, I will wait for 0.29 and test against that release (it also brings other features I am waiting for).
> > Thanks, > > Guangya > > > On Mon, May 23, 2016 at 10:30 PM, Guangya Liu <gyliu...@gmail.com> wrote: > > > Thanks Olivier, I can reproduce this issue now and still checking what is > > wrong. > > > > What I did is as following: > > 1) Check out code with tag of 0.28.1 > > 2) update mesos-execute to add a host path volume > > diff --git a/src/cli/execute.cpp b/src/cli/execute.cpp > > index 81a0388..0ff913c 100644 > > --- a/src/cli/execute.cpp > > +++ b/src/cli/execute.cpp > > @@ -72,6 +72,8 @@ using mesos::v1::TaskID; > > using mesos::v1::TaskInfo; > > using mesos::v1::TaskState; > > using mesos::v1::TaskStatus; > > +using mesos::v1::Volume; > > +using mesos::v1::Parameters; > > > > using mesos::v1::scheduler::Call; > > using mesos::v1::scheduler::Event; > > @@ -572,6 +574,12 @@ private: > > } > >} > > > > + Volume* volume1 = containerInfo.add_volumes(); > > + volume1->set_container_path("/tmp/abcd"); > > + volume1->set_mode(Volume::RW); > > + volume1->set_host_path("/root/convoy"); > > + cout << "Add Voume 1" << endl; > > + > >return containerInfo; > > } else if (containerizer == "docker") { > >// 'docker' containerizer only supports 'docker' images. > > 3) launch a task with docker image, task failed. 
> > > > 4) Check sandbox: > > + /root/src/mesos/m1/mesos/build/src/mesos-containerizer mount > > --help=false --operation=make-rslave --path=/ > > + grep -E /tmp/mesos/.+ /proc/self/mountinfo > > + grep -v 3239aafc-78d8-4f70-81e5-f32fb379 > > + cut+ -d -f5 > > xargs --no-run-if-empty umount -l > > + mount -n --rbind > > /tmp/mesos/provisioner/containers/3239aafc-78d8-4f70-81e5-f32fb379/backends/copy/rootfses/5e8bf3fa-53b1-4bd5-bb3d-525ddc7900b6 > > /tmp/mesos/slaves/a4294ed5-10e8-47db-a3b9-a43a4f951374-S0/frameworks/a4294ed5-10e8-47db-a3b9-a43a4f951374-/executors/test/runs/3239aafc-78d8-4f70-81e5-f32fb379/.rootfs > > + mount -n --rbind /root/convoy > > /tmp/mesos/slaves/a4294ed5-10e8-47db-a3b9-a43a4f951374-S0/frameworks/a4294ed5-10e8-47db-a3b9-a43a4f951374-/executors/test/runs/3239aafc-78d8-4f70-81e5-f32fb379/.rootfs/tmp/abcd > > mount: mount point > > /tmp/mesos/slaves/a4294ed5-10e8-47db-a3b9-a43a4f951374-S0/frameworks/a4294ed5-10e8-47db-a3b9-a43a4f951374-/executors/test/runs/3239aafc-78d8-4f70-81e5-f32fb379/.rootfs/tmp/abcd > > does not exist > > Failed to execute a preparation shell command > > > > Will check more for this
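The path rule described in this thread can be written down as a small function, which makes it easier to predict where a given `container_path` ends up on the host. This is a sketch of the behaviour as reported for 0.28.1 with a Docker image (so the container has a rootfs), not the agent's actual code; the function name `mount_point` and the shortened sandbox path are hypothetical, and the `.rootfs` directory name is taken from the mount commands in the logs.

```python
import posixpath


def mount_point(sandbox, container_path):
    """Return the host-side mount target for a volume, per the thread's description."""
    if posixpath.isabs(container_path):
        # Absolute path: re-rooted under the provisioned rootfs in the sandbox.
        return posixpath.join(sandbox, ".rootfs", container_path.lstrip("/"))
    # Relative path: used as-is, directly under the sandbox.
    return posixpath.join(sandbox, container_path)


sandbox = "/tmp/mesos/slaves/agent_id/frameworks/framework_id/executors/51/runs/container_id"
absolute_target = mount_point(sandbox, "/tmp/abcd")
relative_target = mount_point(sandbox, "file/path")
```

In the failing case above, the absolute form is the one whose target was reported as "does not exist" at mount time.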
Re: volume / mount point error with Unified Containerizer
On 05/23/2016 09:33 AM, Olivier Sallou wrote: > > On 05/20/2016 03:26 PM, Guangya Liu wrote: >> Since you are using docker image which means that your container will have >> rootfs, so it is not required to have the absolute path exist, the linux >> file system isolator will help create the path automatically >> https://github.com/apache/mesos/blob/0.28.x/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp#L390-L402 >> >> Can you please share your framework? How did you set the volume part in >> your framework? > @Guangya > > I use Python API. > > Here is related code: > > > # Define container volumes > for v in job['container']['volumes']: > volume = container.volumes.add() > volume.container_path = v['mount'] > volume.host_path = v['path'] > if v['acl'] == 'rw': > volume.mode = 1 # mesos_pb2.Volume.Mode.RW > else: > volume.mode = 2 # mesos_pb2.Volume.Mode.RO > > => In my test case, I add 2 volumes from a host shared directory, > mounted in container as /mnt/go-docker and /mnt/god-data. > > ... 
> # Define docker image and network > docker = mesos_pb2.ContainerInfo.MesosInfo() > docker.image.type = 2 # Docker > docker.image.docker.name ='centos:latest' > # Request an IP from a network module > network_info = container.network_infos.add() > network_info_name = 'sampletest' > # Get an IP V4 address > ip_address = network_info.ip_addresses.add() > ip_address.protocol = 1 > # The network group to join > group = network_info.groups.append(network_info_name) > port_list = [22] > if port_list: > for port in port_list: > job['container']['port_mapping'].append({'host': > port, 'container': port}) > container.mesos.MergeFrom(docker) > > It results in error message: > > + mount -n --rbind > /tmp/mesos/provisioner/containers/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/backends/copy/rootfses/f9f66bb2-308d-4555-ba77-49ec61cbeb4f > /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs > + mount -n --rbind > /home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task > /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data > mount: mount point > /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data > does not exist > Failed to execute a preparation shell command > > > We can see the .rootfs, but mnt/god-data under .rootfs fails. Local > directory exists, it does not pre-exists in container. What is strange > is , if I look in mesos task dir, .rootfs/mnt/go-data, directory is present. 
> > Or, is the error message (.rootfs/mnt/god-data does not exist) simply a > warning, and it creates it, and final error (Failed to execute a > preparation shell command) not related (and unclear...) Additional info: command to execute in container is located in one of mounted volume. > > Olivier >> Thanks, >> >> Guangya >> >> On Fri, May 20, 2016 at 4:54 AM, Olivier Sallou <olivier.sal...@irisa.fr> >> wrote: >> >>> - Mail original - >>>> De: "Gilbert Song" <gilb...@mesosphere.io> >>>> À: "dev" <dev@mesos.apache.org> >>>> Envoyé: Jeudi 19 Mai 2016 01:57:16 >>>> Objet: Re: volume / mount point error with Unified Containerizer >>>> >>>> @Olivier, >>>> In mesos 0.28.1, you are supposed to be able bind mount a volume from >>>> the host into the mesos container. Did you specify a docker image (we >>>> determine >>>> the mount point differently depending whether the container has a >>> rootfs)? >>> >>> Yes I specified an image, a Docker image URI. >>> >>>> How >>>> do you specify your 'container_path' (the mount point in the container)? >>> If >>>> it is an >>>> absolute path, we require that dir to be pre-existed. If it is a relative >>>> path, we will >>>> mkdir for it. >>> It is an absolute path,
Re: volume / mount point error with Unified Containerizer
On 05/20/2016 03:26 PM, Guangya Liu wrote: > Since you are using docker image which means that your container will have > rootfs, so it is not required to have the absolute path exist, the linux > file system isolator will help create the path automatically > https://github.com/apache/mesos/blob/0.28.x/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp#L390-L402 > > Can you please share your framework? How did you set the volume part in > your framework? @Guangya I use Python API. Here is related code: # Define container volumes for v in job['container']['volumes']: volume = container.volumes.add() volume.container_path = v['mount'] volume.host_path = v['path'] if v['acl'] == 'rw': volume.mode = 1 # mesos_pb2.Volume.Mode.RW else: volume.mode = 2 # mesos_pb2.Volume.Mode.RO => In my test case, I add 2 volumes from a host shared directory, mounted in container as /mnt/go-docker and /mnt/god-data. ... # Define docker image and network docker = mesos_pb2.ContainerInfo.MesosInfo() docker.image.type = 2 # Docker docker.image.docker.name ='centos:latest' # Request an IP from a network module network_info = container.network_infos.add() network_info_name = 'sampletest' # Get an IP V4 address ip_address = network_info.ip_addresses.add() ip_address.protocol = 1 # The network group to join group = network_info.groups.append(network_info_name) port_list = [22] if port_list: for port in port_list: job['container']['port_mapping'].append({'host': port, 'container': port}) container.mesos.MergeFrom(docker) It results in error message: + mount -n --rbind /tmp/mesos/provisioner/containers/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/backends/copy/rootfses/f9f66bb2-308d-4555-ba77-49ec61cbeb4f /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs + mount -n --rbind /home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task 
/tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data mount: mount point /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data does not exist Failed to execute a preparation shell command We can see the .rootfs, but mnt/god-data under .rootfs fails. Local directory exists, it does not pre-exists in container. What is strange is , if I look in mesos task dir, .rootfs/mnt/go-data, directory is present. Or, is the error message (.rootfs/mnt/god-data does not exist) simply a warning, and it creates it, and final error (Failed to execute a preparation shell command) not related (and unclear...) Olivier > > Thanks, > > Guangya > > On Fri, May 20, 2016 at 4:54 AM, Olivier Sallou <olivier.sal...@irisa.fr> > wrote: > >> >> - Mail original - >>> De: "Gilbert Song" <gilb...@mesosphere.io> >>> À: "dev" <dev@mesos.apache.org> >>> Envoyé: Jeudi 19 Mai 2016 01:57:16 >>> Objet: Re: volume / mount point error with Unified Containerizer >>> >>> @Olivier, >>> In mesos 0.28.1, you are supposed to be able bind mount a volume from >>> the host into the mesos container. Did you specify a docker image (we >>> determine >>> the mount point differently depending whether the container has a >> rootfs)? >> >> Yes I specified an image, a Docker image URI. >> >>> How >>> do you specify your 'container_path' (the mount point in the container)? >> If >>> it is an >>> absolute path, we require that dir to be pre-existed. If it is a relative >>> path, we will >>> mkdir for it. >> It is an absolute path, but it does not exists in image (this is the >> issue). Images are custom Docker images (images containing tools for batch >> computing), and I want, for example, to mount some shared resources (user >> home dir, common data, etc.) in the image. 
Of course those directories do >> not pre-exists in container images as they are specific to the environment. >> Requiring existence of the directory in the image is not issue as it >> prevents using any existing image from a repo. >> >> When using Docker containerizer it works fine, I
Re: volume / mount point error with Unified Containerizer
- Original message -
> From: "Gilbert Song" <gilb...@mesosphere.io>
> To: "dev" <dev@mesos.apache.org>
> Sent: Thursday, May 19, 2016 01:57:16
> Subject: Re: volume / mount point error with Unified Containerizer
>
> @Olivier,
> In mesos 0.28.1, you are supposed to be able bind mount a volume from
> the host into the mesos container. Did you specify a docker image (we
> determine the mount point differently depending whether the container
> has a rootfs)?

Yes, I specified an image, a Docker image URI.

> How do you specify your 'container_path' (the mount point in the
> container)? If it is an absolute path, we require that dir to be
> pre-existed. If it is a relative path, we will mkdir for it.

It is an absolute path, but it does not exist in the image (this is the issue). The images are custom Docker images (images containing tools for batch computing), and I want, for example, to mount some shared resources (user home dir, common data, etc.) in the container. Of course those directories do not pre-exist in the container images, as they are specific to the environment. Requiring the directory to exist in the image is an issue, as it prevents using any existing image from a repo.

When using the Docker containerizer it works fine: I can mount any external storage in the container.

Olivier

>
> @Joshua,
> Thank for posting your workaround on mesos. As I mentioned above, in 0.28.1
> or older, we only mkdir for container_path which is relative path (not
> starting with "/").
> Because if no rootfs specified for a mesos container, the container shares
> the host root filesystem. Obviously we don't want any random files to be
> created implicitly on your host fs.
> From mesos 0.29 (release by the end of this month), we will mkdir the mount
> point in the container except for the command task case that specify an
> absolute container_path without a rootfs.
Because we simplify the mounting logic, and > sandbox bind mount will only be done in container mount namespace instead of > host mount namespace (what we did before). Please keep tuned. > > Cheers, > Gilbert > > On Wed, May 18, 2016 at 8:14 AM, Joshua Cohen <jco...@apache.org> wrote: > > > Hi Olivier, > > > > I touched on this issue as part of > > https://issues.apache.org/jira/browse/MESOS-5229. It would be nice if > > Mesos > > automatically created container mount points if they don't already exist. > > In the meantime, as a workaround for this, I've updated my filesystem > > images to include the path (e.g. in Dockerfile, add `RUN mkdir -p > > /some/mount/point`). Not the best solution, but the only thing I've seen > > that works at the moment. > > > > Cheers, > > > > Joshua > > > > On Wed, May 18, 2016 at 7:36 AM, Guangya Liu <gyliu...@gmail.com> wrote: > > > > > It's pretty simple for you from scratch with source code > > > > > > > > https://github.com/apache/mesos/blob/master/docs/getting-started.md#building-mesos > > > ;-) > > > > > > Thanks, > > > > > > Guangya > > > > > > On Wed, May 18, 2016 at 8:30 PM, Olivier Sallou <olivier.sal...@irisa.fr > > > > > > wrote: > > > > > > > > > > > > > > > On 05/18/2016 02:31 PM, Guangya Liu wrote: > > > > > Just saw that you are working with 0.28.1, the "docker volume driver" > > > > code > > > > > was not in 0.28.1, can you please have a try with mesos master branch > > > if > > > > > you are only doing some test? > > > > this is indeed test only for the moment. But I will have to > > > > recompile/install mesos :-( (I used packages for install). > > > > > > > > I will try when possible, but thanks for the hint. 
> > > > > > > > > > Thanks, > > > > > > > > > > Guangya > > > > > > > > > > On Wed, May 18, 2016 at 8:28 PM, Guangya Liu <gyliu...@gmail.com> > > > wrote: > > > > > > > > > >> Hi Olivier, > > > > >> > > > > >> I think that you need to enable "docker volume isolator" if you want > > > use > > > > >> external storage with unified container I was writing a document > > here > > > > >> https://reviews.apache.org/r/47511/, perhaps you can have a try > > > > according > > > > >> to the document and post some comments there if you find any issues. > > > > >> >
Re: volume / mount point error with Unified Containerizer
On 05/18/2016 02:31 PM, Guangya Liu wrote: > Just saw that you are working with 0.28.1, the "docker volume driver" code > was not in 0.28.1, can you please have a try with mesos master branch if > you are only doing some test? this is indeed test only for the moment. But I will have to recompile/install mesos :-( (I used packages for install). I will try when possible, but thanks for the hint. > > Thanks, > > Guangya > > On Wed, May 18, 2016 at 8:28 PM, Guangya Liu <gyliu...@gmail.com> wrote: > >> Hi Olivier, >> >> I think that you need to enable "docker volume isolator" if you want use >> external storage with unified container I was writing a document here >> https://reviews.apache.org/r/47511/, perhaps you can have a try according >> to the document and post some comments there if you find any issues. >> >> Also you can patch mesos-execute here https://reviews.apache.org/r/46762/ to >> have a try with mesos-execute. >> >> Thanks, >> >> Guangya >> >> On Wed, May 18, 2016 at 7:17 PM, Olivier Sallou <olivier.sal...@irisa.fr> >> wrote: >> >>> Answering (partially) to myself. >>> >>> I seems issue is container_path does not exists inside container. On >>> Docker, path is created and mounted. With pure mesos, container_path >>> must exists. >>> >>> mesos.proto says: "If the path is an absolute path, that path must >>> already exist." >>> >>> This is an issue however, using Docker images, the path I want to mount >>> does not exists, and it cannot be modified "on the fly". >>> >>> Is there a workaround for this ? >>> >>> >>> On 05/18/2016 12:24 PM, Olivier Sallou wrote: >>>> Hi, >>>> I am trying unified containerizer on a single server (master/slave) on >>>> mesos 0.28.1, to switch from docker containerizer to mesos+docker image >>>> container. 
>>>> >>>> I have setup slave config as suggested in documentation: >>>> >>>> containerizers=docker,mesos >>>> image_providers=docker \ >>>> isolation=filesystem/linux,docker/runtime >>>> >>>> However, when I execute my task with a volume I have an error: >>>> >>>> >>>> + mount -n --rbind >>>> >>> /tmp/mesos/provisioner/containers/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/backends/copy/rootfses/f9f66bb2-308d-4555-ba77-49ec61cbeb4f >>> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs >>>> + mount -n --rbind >>>> >>> /home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task >>> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data >>>> mount: mount point >>>> >>> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data >>>> does not exist >>>> Failed to execute a preparation shell command >>>> >>>> Then, my task switches to FAILED. >>>> >>>> I define a local volume to bind mount in my "container" >>>> >>> /home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task >>>> => /mnt/god-data >>>> My directory exists on local server. >>>> In mesos UI, I can see the .rootfs directory along stdout and stderr >>>> files, and inside .rootfs, I can see /mnt/god-data (empty). >>>> >>>> Running the same using Docker containerizer instead of mesos >>>> containerizer (with a Docker image) works fine. >>>> >>>> It seems it fails to mount my local directory in the container. Any idea >>>> of what is going wrong or how to debug this? 
>>>> >>>> >>>> Thanks >>>> >>> -- >>> Olivier Sallou >>> IRISA / University of Rennes 1 >>> Campus de Beaulieu, 35000 RENNES - FRANCE >>> Tel: 02.99.84.71.95 >>> >>> gpg key id: 4096R/326D8438 (keyring.debian.org) >>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438 >>> >>> -- Olivier Sallou IRISA / University of Rennes 1 Campus de Beaulieu, 35000 RENNES - FRANCE Tel: 02.99.84.71.95 gpg key id: 4096R/326D8438 (keyring.debian.org) Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
Re: volume / mount point error with Unified Containerizer
Answering (partially) to myself.

It seems the issue is that container_path does not exist inside the container. With Docker, the path is created and mounted. With pure Mesos, container_path must already exist.

mesos.proto says: "If the path is an absolute path, that path must already exist."

This is an issue, however: when using Docker images, the path I want to mount does not exist in the image, and the image cannot be modified "on the fly".

Is there a workaround for this?

On 05/18/2016 12:24 PM, Olivier Sallou wrote:
> Hi,
> I am trying unified containerizer on a single server (master/slave) on
> mesos 0.28.1, to switch from docker containerizer to mesos+docker image
> container.
>
> I have setup slave config as suggested in documentation:
>
> containerizers=docker,mesos
> image_providers=docker \
> isolation=filesystem/linux,docker/runtime
>
> However, when I execute my task with a volume I have an error:
>
> + mount -n --rbind
> /tmp/mesos/provisioner/containers/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/backends/copy/rootfses/f9f66bb2-308d-4555-ba77-49ec61cbeb4f
> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs
> + mount -n --rbind
> /home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task
> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data
> mount: mount point
> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data
> does not exist
> Failed to execute a preparation shell command
>
> Then, my task switches to FAILED.
> > I define a local volume to bind mount in my "container" > /home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task > => /mnt/god-data > My directory exists on local server. > In mesos UI, I can see the .rootfs directory along stdout and stderr > files, and inside .rootfs, I can see /mnt/god-data (empty). > > Running the same using Docker containerizer instead of mesos > containerizer (with a Docker image) works fine. > > It seems it fails to mount my local directory in the container. Any idea > of what is going wrong or how to debug this? > > > Thanks > -- Olivier Sallou IRISA / University of Rennes 1 Campus de Beaulieu, 35000 RENNES - FRANCE Tel: 02.99.84.71.95 gpg key id: 4096R/326D8438 (keyring.debian.org) Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
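The rule quoted from mesos.proto in this message (an absolute container_path must already exist) can be checked before a task is submitted. The following is only a sketch and not Mesos code: the function name `missing_mount_points` and its arguments are made up for illustration, and it assumes an unpacked copy of the image's root filesystem is available in a local directory.

```python
import os
import tempfile


def missing_mount_points(rootfs, container_paths):
    """Return the absolute container paths that do not exist under rootfs.

    Relative paths are skipped: according to the thread, Mesos 0.28.x
    creates those itself, while absolute paths must already exist.
    (Hypothetical helper, not part of any Mesos API.)
    """
    missing = []
    for path in container_paths:
        if not os.path.isabs(path):
            continue
        # Re-root the absolute container path under the unpacked image.
        candidate = os.path.join(rootfs, path.lstrip("/"))
        if not os.path.isdir(candidate):
            missing.append(path)
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as rootfs:
        # Simulate an image that only ships /mnt/go-docker.
        os.makedirs(os.path.join(rootfs, "mnt", "go-docker"))
        print(missing_mount_points(rootfs, ["/mnt/go-docker", "/mnt/god-data"]))
```

Any path reported here would have to be created in the image beforehand (for example with a `RUN mkdir -p` step in the Dockerfile, as suggested elsewhere in this thread).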
volume / mount point error with Unified Containerizer
Hi, I am trying unified containerizer on a single server (master/slave) on mesos 0.28.1, to switch from docker containerizer to mesos+docker image container. I have setup slave config as suggested in documentation: containerizers=docker,mesos image_providers=docker \ isolation=filesystem/linux,docker/runtime However, when I execute my task with a volume I have an error: + mount -n --rbind /tmp/mesos/provisioner/containers/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/backends/copy/rootfses/f9f66bb2-308d-4555-ba77-49ec61cbeb4f /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs + mount -n --rbind /home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data mount: mount point /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data does not exist Failed to execute a preparation shell command Then, my task switches to FAILED. I define a local volume to bind mount in my "container" /home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task => /mnt/god-data My directory exists on local server. In mesos UI, I can see the .rootfs directory along stdout and stderr files, and inside .rootfs, I can see /mnt/god-data (empty). Running the same using Docker containerizer instead of mesos containerizer (with a Docker image) works fine. It seems it fails to mount my local directory in the container. Any idea of what is going wrong or how to debug this? 
Thanks -- Olivier Sallou IRISA / University of Rennes 1 Campus de Beaulieu, 35000 RENNES - FRANCE Tel: 02.99.84.71.95 gpg key id: 4096R/326D8438 (keyring.debian.org) Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
Mesos admin REST API
Hi,

Is there any operator/admin way to kill a task, via an admin API?

I faced an issue where Mesos does not send any offers to my framework after a task failure (the task remains in staging, or Mesos can't contact an old framework). As a result, my framework cannot send new kills, etc.

As a Mesos admin, I'd like to send a kill request (or other kinds of requests), bypassing the framework.

Thanks

Olivier

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
work on Mesos Containerizer to support docker containers
Hi,

I have seen there is some work on the Mesos Containerizer to support Docker containers instead of using the Docker Containerizer, which would help support Docker networking etc. (with Calico, for example).

Is there any doc on this available somewhere? Where is the code of the Mesos Containerizer? (I found the Docker one but can't find the default Mesos one.)

Thanks

Olivier

-- 
gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
Re: work on Mesos Containerizer to support docker containers
On 01/19/2016 03:55 PM, Jan Schlicht wrote: > Hi Olivier, > > status for the "Unified Containerizer" project is tracked under this epic: > https://issues.apache.org/jira/browse/MESOS-2840 > There's a design document linked in the epic, unfortunately I'm not able to > access it. perfect, thanks > > Cheers, > Jan > > On Tue, Jan 19, 2016 at 3:06 PM, Qian Zhang <zhq527...@gmail.com> wrote: > >> Hi Olivier, >> >> Here is the doc of MesosContainerizer: >> https://github.com/apache/mesos/blob/master/docs/mesos-containerizer.md >> >> And you may also find the following docs helpful: >> https://github.com/apache/mesos/blob/master/docs/containerizer.md >> https://github.com/apache/mesos/blob/master/docs/containerizer-internals.md >> >> And the code of MesosContainerizer is under: >> src/slave/containerizer/mesos/ >> >> >> Regards, >> Qian >> >> >> On Tue, Jan 19, 2016 at 9:14 PM, Olivier Sallou <olivier.sal...@irisa.fr> >> wrote: >> >>> Hi, >>> I have seen there are some work on Mesos Containerizer to support docker >>> containers instead of using Docker Containerizer, which would help >>> support Docker network etc... with Calico for example. >>> Is there any doc on this available somewhere ? Where is code of the >>> Mesos Containerizer? (I found Docker one but can't find default Mesos >> one). >>> Thanks >>> >>> Olivier >>> >>> -- >>> >>> gpg key id: 4096R/326D8438 (keyring.debian.org) >>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438 >>> >>> > > -- Olivier Sallou IRISA / University of Rennes 1 Campus de Beaulieu, 35000 RENNES - FRANCE Tel: 02.99.84.71.95 gpg key id: 4096R/326D8438 (keyring.debian.org) Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
ContainerId in TaskStatus message: can't find update in mesos.proto
Hi,

Mesos 0.23 added ContainerId in the TaskStatus message as per:
https://issues.apache.org/jira/browse/MESOS-2191

However, I do not see any related modification in mesos.proto [0].

Am I missing something? And does the Python client include the modification?

[0] https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto

Thanks

Olivier

-- 
gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
Re: ContainerId in TaskStatus message: can't find update in mesos.proto
On 12/21/2015 03:08 PM, Shuai Lin wrote: > From what I read in the ticket, What's done is "adding the output of > `docker output` to the `data` field of TaskStatus message when a task is in > TASK_RUNNING state', so the related protobuf field is TaskStatus.data, not > a specific 'containerid' field. See > https://github.com/apache/mesos/blob/09a2fb3/src/docker/executor.cpp#L166 Seems so indeed, label of task is misleading. Thanks anyway > On Mon, Dec 21, 2015 at 5:37 PM, Olivier Sallou <olivier.sal...@irisa.fr> > wrote: > >> Hi, >> mesos .023 added ContainerId in TaskStatus message as per: >> https://issues.apache.org/jira/browse/MESOS-2191 >> >> However, I do not see any related modification in mesos.proto [0] >> >> Am I missing something? As such is python client including the >> modification? >> >> [0] https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto >> >> Thanks >> >> Olivier >> >> -- >> >> gpg key id: 4096R/326D8438 (keyring.debian.org) >> Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438 >> >> -- Olivier Sallou IRISA / University of Rennes 1 Campus de Beaulieu, 35000 RENNES - FRANCE Tel: 02.99.84.71.95 gpg key id: 4096R/326D8438 (keyring.debian.org) Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
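As a rough illustration of reading the container id from TaskStatus.data as described in this reply: the sketch below assumes, without confirmation from the thread, that the payload is the JSON array printed by `docker inspect`, whose first entry carries an "Id" key. The function name is made up, and the payload format should be verified against your Mesos version.

```python
import json


def container_id_from_status_data(data):
    """Extract a Docker container id from a TaskStatus.data payload.

    Assumption (not confirmed in the thread): data is a JSON array in the
    style of `docker inspect` output, with an "Id" key in its first entry.
    Returns None when the payload does not look like that.
    """
    if not data:
        return None
    if isinstance(data, bytes):
        data = data.decode("utf-8", errors="replace")
    try:
        parsed = json.loads(data)
    except ValueError:
        # Not JSON at all: the data field is free-form bytes.
        return None
    if isinstance(parsed, list) and parsed and isinstance(parsed[0], dict):
        return parsed[0].get("Id")
    return None
```

In a scheduler's status-update callback this would be called with `update.data` once the task reports TASK_RUNNING.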
Docker network support
Hi, what is the current/planned feature support for Docker network ? Docker network creates an overlay network to link multiple containers on multiple hosts. Is it supported/planned in mesos ? I do not find any such info for the moment in mesos.proto Thanks Olivier -- gpg key id: 4096R/326D8438 (keyring.debian.org) Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
how to get docker container id?
Hi,

How can we get the container id when executing a TaskInfo with a Docker ContainerInfo?

Mesos executes a Docker container with the name mesos-xxx, but how can we get this identifier? I set a unique id as the Task Id in my TaskInfo, but it is not used as the Docker identifier.

I need it to query cAdvisor, running on my nodes.

Thanks

Olivier

-- 
gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
Re: how to get docker container id?
On 06/12/2015 12:02 PM, Adam Bordelon wrote:
> You can query the slave's state.json to get the container ID. See the
> previous thread:
> http://search-hadoop.com/m/0Vlr6OtCiO1p8ypc2/mesos+accessing+programmatticallysubj=Re+Accessing+stdout+stderr+of+a+task+programmattically+

Thanks, I could get it, but it would be nice to get the information in the update message rather than having to query the nodes (which return information for all tasks).

Olivier

> On Fri, Jun 12, 2015 at 2:35 AM, Olivier Sallou olivier.sal...@irisa.fr wrote:
>> Hi,
>> how can we get the container id when executing a TaskInfo with a Docker
>> ContainerInfo ?
>> Mesos execute a Docker container with name mesos-xxx but how can we get
>> this identifier ? I set in my TaskInfo a unique id in Task Id, but it is
>> not used as Docker identifier.
>> I need it to query cAdvisor, running on my nodes.
>>
>> Thanks
>>
>> Olivier
>>
>> --
>> gpg key id: 4096R/326D8438 (keyring.debian.org)
>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
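For reference, the state.json lookup suggested in this exchange could look roughly like the sketch below. It assumes the slave state document has the layout frameworks -> executors -> tasks, with the container id stored under the executor's "container" key; that layout, the URL form in `fetch_state`, and both function names are assumptions to check against your Mesos version.

```python
import json
import urllib.request


def fetch_state(slave_url):
    """Download and parse state.json from a slave.

    slave_url is assumed to look like "http://127.0.0.1:5051".
    """
    with urllib.request.urlopen(slave_url + "/state.json") as response:
        return json.loads(response.read().decode("utf-8"))


def find_container_id(state, task_id):
    """Return the container id of the executor running task_id, or None.

    Assumed layout: state["frameworks"][i]["executors"][j] holds a
    "container" id and a "tasks" list whose entries have an "id".
    """
    for framework in state.get("frameworks", []):
        for executor in framework.get("executors", []):
            for task in executor.get("tasks", []):
                if task.get("id") == task_id:
                    return executor.get("container")
    return None
```

The Docker container name is then expected to be the `mesos-` prefix followed by this id, matching the mesos-xxx naming mentioned in the question.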
Docker port_mapping issue
Hi,

I can successfully run a task in a Docker container in my Mesos install using the base executor. However, I cannot get a task running when I add a port mapping (though the port is available).

I use mesos 0.22, with python 2.7.

If I print the sent task I have:

    name: task 0
    task_id { value: 0 }
    slave_id { value: 20150526-114150-16777343-5050-2035-S0 }
    resources { name: cpus type: SCALAR scalar { value: 1 } }
    resources { name: mem type: SCALAR scalar { value: 128 } }
    command { value: echo \hello world # $MESOS_SANDBOX #\ }
    container {
      type: DOCKER
      docker {
        image: centos
        network: BRIDGE
        port_mappings { host_port: 31000 container_port: 22 }
        force_pull_image: true
      }
    }

And it ends with the error:

    Task 0 is in state TASK_FAILED
    Abnormal executor termination

The slave shows:

    I0529 13:50:49.813928 18426 docker.cpp:626] Starting container 'd9b5be3e-9f00-4242-aa91-d6a6f3a5175a' for task '0' (and executor '0') of framework '20150529-103634-16777343-5050-18179-0020'
    E0529 13:50:54.362663 18420 slave.cpp:3112] Container 'd9b5be3e-9f00-4242-aa91-d6a6f3a5175a' for executor '0' of framework '20150529-103634-16777343-5050-18179-0020' failed to start: Port mappings require port resources

However, the offer does present port resources:

    resources {
      name: ports
      type: RANGES
      ranges { range { begin: 31000 end: 32000 } }
      role: *
    }

At slave startup I also see:

    I0529 14:05:37.481212 22455 slave.cpp:322] Slave resources: cpus(*):8; mem(*):6900; disk(*):215925; ports(*):[31000-32000]

Any idea of what is going wrong?

Thanks

Olivier

-- 
gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
Re: Docker port_mapping issue
On 05/29/2015 02:07 PM, Olivier Sallou wrote:
> Hi,
> I can run task with success in a Docker container in my mesos install
> using base executor. However, I cannot get a task running when I add
> port mapping (though port is available).

OK, it appears that in addition to the Docker port_mapping, we need to add a ports resource declaration to the task too, with something like:

    ports = task.resources.add()
    ports.name = "ports"
    ports.type = mesos_pb2.Value.RANGES
    port_range = ports.ranges.range.add()
    port_range.begin = 31000
    port_range.end = 31000

So we more or less need to declare the port twice in the task: once as a task resource and once in the Docker port mapping.

> I use mesos 0.22, with python 2.7.
>
> If I print the sent task I have:
>
>     name: task 0
>     task_id { value: 0 }
>     slave_id { value: 20150526-114150-16777343-5050-2035-S0 }
>     resources { name: cpus type: SCALAR scalar { value: 1 } }
>     resources { name: mem type: SCALAR scalar { value: 128 } }
>     command { value: echo \hello world # $MESOS_SANDBOX #\ }
>     container {
>       type: DOCKER
>       docker {
>         image: centos
>         network: BRIDGE
>         port_mappings { host_port: 31000 container_port: 22 }
>         force_pull_image: true
>       }
>     }
>
> And it ends with error:
>
>     Task 0 is in state TASK_FAILED
>     Abnormal executor termination
>
> Slave shows:
>
>     I0529 13:50:49.813928 18426 docker.cpp:626] Starting container 'd9b5be3e-9f00-4242-aa91-d6a6f3a5175a' for task '0' (and executor '0') of framework '20150529-103634-16777343-5050-18179-0020'
>     E0529 13:50:54.362663 18420 slave.cpp:3112] Container 'd9b5be3e-9f00-4242-aa91-d6a6f3a5175a' for executor '0' of framework '20150529-103634-16777343-5050-18179-0020' failed to start: Port mappings require port resources
>
> However the offer present port resources:
>
>     resources {
>       name: ports
>       type: RANGES
>       ranges { range { begin: 31000 end: 32000 } }
>       role: *
>     }
>
> At slave startup I also see:
>
>     I0529 14:05:37.481212 22455 slave.cpp:322] Slave resources: cpus(*):8; mem(*):6900; disk(*):215925; ports(*):[31000-32000]
>
> Any idea of what is going wrong?
>
> Thanks
>
> Olivier

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
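To avoid writing the port twice by hand, the ports resource can be derived from the port mappings. This is only a sketch under assumptions: `host_port_ranges` and `add_port_resources` are made-up helper names, port mappings are represented as plain dicts with a "host_port" key, and `add_port_resources` expects a TaskInfo-like protobuf object plus the RANGES enum value (mesos_pb2.Value.RANGES in the snippet from this thread).

```python
def host_port_ranges(port_mappings):
    """Collapse the host ports of the mappings into sorted (begin, end) ranges."""
    ports = sorted({mapping["host_port"] for mapping in port_mappings})
    ranges = []
    for port in ports:
        if ranges and port == ranges[-1][1] + 1:
            # Extend the current range with the adjacent port.
            ranges[-1] = (ranges[-1][0], port)
        else:
            ranges.append((port, port))
    return ranges


def add_port_resources(task, port_mappings, ranges_type):
    """Add a "ports" resource covering every mapped host port to the task.

    task is assumed to be a mesos_pb2.TaskInfo; ranges_type is assumed to be
    mesos_pb2.Value.RANGES, passed in so this sketch has no Mesos import.
    """
    resource = task.resources.add()
    resource.name = "ports"
    resource.type = ranges_type
    for begin, end in host_port_ranges(port_mappings):
        port_range = resource.ranges.range.add()
        port_range.begin = begin
        port_range.end = end
    return resource
```

With the task from this thread, the single mapping with host port 31000 yields one range with begin and end both set to 31000, the same declaration as the hand-written snippet.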
Re: Docker containers not removed
On 05/26/2015 03:44 PM, Olivier Sallou wrote:
> Hi,
> I could make a test script to submit tasks in Docker containers. My tasks
> end in FINISHED state, and everything goes fine. The point is the
> container is not removed (can be seen with a docker ps -a), though
> documentation states:
>
> 6. On container exit or containerizer destroy, stop and remove the
> docker container.
>
> even after a few minutes, they are still present.
> am i missing something?

I just found this in the configuration options: --docker_remove_delay=VALUE, the default being 6 hrs. I will wait and check!

> Thanks
>
> Olivier

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
Docker containers not removed
Hi,

I wrote a test script to submit tasks in Docker containers. My tasks end in the FINISHED state, and everything goes fine. The point is that the container is not removed (it can be seen with docker ps -a), though the documentation states:

6. On container exit or containerizer destroy, stop and remove the docker container.

Even after a few minutes, the containers are still present. Am I missing something?

Thanks

Olivier

-- 
gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
Re: use of docker containerizer
On 10/22/2014 11:02 AM, Adam Bordelon wrote:
> Olivier,
> You should only need to create the /etc/mesos-slave/containerizers OR
> specify --containerizers on the mesos-slave command-line. Either should
> work.
> - Is dockerd installed and running on the slave?

Yes, docker is running on the slave.

> - You could be running into MESOS-1873
> https://issues.apache.org/jira/browse/MESOS-1873. Try setting your
> value: ls -l /etc instead of using the arguments fields with shell: false

I tried that too; my Task contains:

    command {
      shell: true
      arguments: ls
      arguments: -l
      arguments: /etc
    }
    container {
      type: DOCKER
      docker { image: dockerimages/centos-core }
    }

but I get the same error:

    Container '3f98b4ee-3417-407f-8717-b60a1ab6f359' for executor '0' of framework '20141022-112627-16777343-5050-6219-' failed to start: None of the enabled containerizers (mesos) could create a container for the provided TaskInfo/ExecutorInfo message.

I find it strange to see "enabled containerizers (mesos)" instead of something like "enabled containerizers (docker,mesos)".

> On Tue, Oct 21, 2014 at 2:58 AM, Olivier Sallou olivier.sal...@irisa.fr wrote:
>> Hi,
>> I try to use the default docker containizer but I can't get it work... :-(
>> My Task is correctly executed when using default executor with
>> CommandInfo. If I add a ContainerInfo it fails.
>>
>> I launch my slave with options: --containerizers=docker,mesos
>> (this is a source install, not system wide installed)
>>
>> I see in slave logs:
>>
>>     E1021 11:50:26.392259 12748 slave.cpp:2656] Container '4460417c-1f78-4d99-ab07-8524c73ab35c' for executor '0' of framework '20141021-113729-16777343-5050-12670-0003' failed to start: None of the enabled containerizers (mesos) could create a container for the provided TaskInfo/ExecutorInfo message.
>>
>> and Tasks ends in FAILED state.
>>
>> My Task looks like:
>>
>>     ...
>>     command {
>>       value: ls
>>       shell: false
>>       arguments: -l
>>       arguments: /etc
>>     }
>>     container {
>>       type: DOCKER
>>       docker { image: docker:///dockerimages/centos-core }
>>     }
>>
>> It acts as if the docker option on slave is not taken into account.
>> I found on Internet that a file need to be created:
>>
>>     echo 'docker,mesos' > /etc/mesos-slave/containerizers
>>
>> to activate the flag, but I do not know where this file should be
>> created when mesos is not system-wide installed.
>>
>> Thanks
>>
>> Olivier
>>
>> --
>> gpg key id: 4096R/326D8438 (keyring.debian.org)
>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
use of docker containerizer
Hi,

I am trying to use the default docker containerizer but I can't get it to work... :-(

My Task is correctly executed when using the default executor with CommandInfo. If I add a ContainerInfo, it fails.

I launch my slave with the option --containerizers=docker,mesos (this is a source install, not installed system-wide).

I see in the slave logs:

  E1021 11:50:26.392259 12748 slave.cpp:2656] Container '4460417c-1f78-4d99-ab07-8524c73ab35c' for executor '0' of framework '20141021-113729-16777343-5050-12670-0003' failed to start: None of the enabled containerizers (mesos) could create a container for the provided TaskInfo/ExecutorInfo message.

and the Task ends in FAILED state.

My Task looks like:

  ...
  command {
    value: ls
    shell: false
    arguments: -l
    arguments: /etc
  }
  container {
    type: DOCKER
    docker {
      image: docker:///dockerimages/centos-core
    }
  }

It acts as if the docker option on the slave is not taken into account. I found on the Internet that a file needs to be created to activate the flag:

  echo 'docker,mesos' > /etc/mesos-slave/containerizers

but I do not know where this file should be created when mesos is not installed system-wide.

Thanks

Olivier

--
gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
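
For a source install there may be no need for an /etc file at all. The sketch below only builds (and prints) a launch command for a slave started from the build directory. It is a guess at a workable setup and was not confirmed in this thread: the build path and master address are placeholders, and the MESOS_CONTAINERIZERS environment variable relies on Mesos reading flags from MESOS_-prefixed variables. Use either the flag or the variable.

```python
import os


def slave_launch_spec(build_dir, master, containerizers=("docker", "mesos")):
    """Build the argv and environment for a slave started from a source build.

    build_dir and master are placeholders for the local setup.
    """
    value = ",".join(containerizers)
    argv = [
        os.path.join(build_dir, "bin", "mesos-slave.sh"),
        "--master=" + master,
        "--containerizers=" + value,
    ]
    # Flags must start with two ASCII hyphens; a dash pasted from a web page
    # or a mail client may be a different character and would not be parsed.
    for flag in argv[1:]:
        if not flag.startswith("--") or not flag.isascii():
            raise ValueError("suspicious flag: %r" % flag)
    # Alternative to the command-line flag: the same setting passed as an
    # environment variable (assumption: MESOS_-prefixed variables are read).
    env = dict(os.environ, MESOS_CONTAINERIZERS=value)
    return argv, env


# Hypothetical build directory and master address, for illustration only.
argv, env = slave_launch_spec("/path/to/mesos/build", "127.0.0.1:5050")
print(" ".join(argv))
print("MESOS_CONTAINERIZERS=" + env["MESOS_CONTAINERIZERS"])
```

If the slave still reports "enabled containerizers (mesos)" after this, the flag is most likely not reaching the process that actually starts.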
Re: how to debug task lost in custom scheduler?
On 10/17/2014 07:31 PM, Vinod Kone wrote:
> Can you grep for TASK_LOST in master and slave logs and paste the output here?

I do not see any TASK_LOST in any master or slave log, which is one of the reasons I do not understand what is happening. I only found console logs; I do not see any log file. For information, mesos is not installed system-wide but locally from source, and I run it from the build directory.

--
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95
gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
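
When the master and slave only log to the console, one option is to start them with the --log_dir flag (or redirect their output to files) and then search those files. Below is a minimal sketch of such a search in Python. The directory layout and the sample log line in the demo are made up for illustration; they are not taken from this thread.

```python
import os
import tempfile


def find_in_logs(log_dir, needle="TASK_LOST"):
    """Return (file name, line number, line) for every line containing needle."""
    hits = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    hits.append((name, number, line.rstrip("\n")))
    return hits


# Demo on a temporary directory with an invented log line.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "mesos-master.INFO"), "w") as handle:
        handle.write("first line\n")
        handle.write("example line mentioning TASK_LOST\n")
    demo_hits = find_in_logs(demo_dir)
    for name, number, line in demo_hits:
        print("%s:%d: %s" % (name, number, line))
```

An empty result from both the master and the slave logs is itself useful information: it suggests the status was produced before the task reached the slave.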
Re: how to debug task lost in custom scheduler?
On 10/18/2014 12:55 PM, Alex Rukletsov wrote:
> Hi Oliver,
> you can get a TASK_LOST if import directives in your executor fail. Do you
> have mesos python eggs installed or available through PYTHONPATH? Could you
> please also paste the output of stderr and stdout of the lost task (you can
> access them via mesos webUI → sandbox)?

I do not see the task at all in the webUI. The Python eggs are available from PYTHONPATH; my eggs are in MESOS_BUILD_DIR. If I run my executor directly, I get no Python error, only a MISSING SLAVE ID (which is expected, as mesos sets this environment variable at runtime).

I know the task is lost because, in the statusUpdate method of my scheduler, I print the task status (value = 5). The message is empty.

There is nothing in the webUI and nothing in the console logs. Since my executor is never run, it means that mesos (master or slave) is giving me this error status, but I have no additional information about the reason.

I have used and adapted the examples given with the sources (src/examples/python).

Olivier

--
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95
gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
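
The numeric status printed in statusUpdate can be turned into a readable name. The sketch below uses a hand-written table that is assumed to match the TaskState enum in mesos.proto at the time of this thread (the value 5 = TASK_LOST is the only pair confirmed here); the helper name describe_status is made up for illustration.

```python
# Assumed mapping of TaskState values; check it against the mesos.proto of the
# installed version. With the real Python bindings, the generated module can
# do the same lookup: mesos_pb2.TaskState.Name(update.state).
TASK_STATES = {
    0: "TASK_STARTING",
    1: "TASK_RUNNING",
    2: "TASK_FINISHED",
    3: "TASK_FAILED",
    4: "TASK_KILLED",
    5: "TASK_LOST",
    6: "TASK_STAGING",
}


def describe_status(state, message=""):
    """Format a status update as 'NAME: message' for scheduler-side logging."""
    name = TASK_STATES.get(state, "UNKNOWN(%d)" % state)
    detail = message if message else "<no message>"
    return "%s: %s" % (name, detail)


print(describe_status(5))
```

Printing the message field next to the name makes it obvious when a status arrives without any explanation, as was the case here.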
Re: how to debug task lost in custom scheduler?
On 10/20/2014 08:11 AM, Olivier Sallou wrote:
> I have used and adapted the examples given with sources
> (src/examples/python).

Taking the Python code in src/examples/python as an example, I made a little progress. Although there is no additional error log, I found an issue with setting the command parameter. If I comment out the command parameter, my executor is run (it fails, but that's fine for the moment).

In my task, I was setting:

  task.command.value = something to execute on node

Setting command creates a silent error. My TaskInfo looked like:

  ...
  executor {
    executor_id {
      value: default
    }
    command {
      value: ../test-executor
    }
    name: Test Executor (Python)
    source: python_test
  }
  command {
    value: ls -l
  }

So I wonder:
1) why is the error silent on the master side?
2) how do I set the command to execute in the TaskInfo object?

Olivier

--
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95
gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
Re: how to debug task lost in custom scheduler?
On 10/20/2014 05:20 PM, Alex Rukletsov wrote:
> It looks like you are trying to set both command and executor. This is not
> allowed, since setting a command implies using the CommandExecutor, aka
> mesos-executor. If your task is a command, do not specify the executor in
> your TaskInfo: mesos will do it for you. See
> https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto line 579.
>
> Btw, you should observe something like "Task id should have either
> CommandInfo or ExecutorInfo set but not both" in your logs.

OK, thanks, I got it to work (at least I see my job). There is a lack of per-language API documentation. :-(

Thanks for your help

Olivier

--
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95
gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
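
The rule Alex describes (a task carries either a command or an executor, never both) can be checked in the scheduler before the task is launched, so the mistake does not surface only as a TASK_LOST. The sketch below uses plain dicts as stand-ins for TaskInfo rather than the real protobuf classes; check_task and its second message are made up for illustration, and passing the work through a "data" field is only one possible choice for a custom executor.

```python
def check_task(task):
    """Return an error string if the task breaks the command/executor rule."""
    has_command = "command" in task
    has_executor = "executor" in task
    if has_command and has_executor:
        return "task should have either command or executor set, not both"
    if not has_command and not has_executor:
        return "task has neither command nor executor set"
    return None


# The shape that produced the silent TASK_LOST in this thread.
broken = {
    "executor": {"executor_id": "default", "command": "../test-executor"},
    "command": {"value": "ls -l"},
}

# Option 1: a plain command task; mesos supplies the command executor.
command_only = {"command": {"value": "ls -l"}}

# Option 2: a custom executor; what to run is handed over separately
# (here through a data field, which is an assumption of this sketch).
executor_only = {
    "executor": {"executor_id": "default", "command": "../test-executor"},
    "data": b"ls -l",
}

for name, task in [("broken", broken), ("command_only", command_only),
                   ("executor_only", executor_only)]:
    print(name, "->", check_task(task) or "ok")
```

With the real bindings the same test would be written with task.HasField("command") and task.HasField("executor") on the TaskInfo message.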
how to debug task lost in custom scheduler?
Hi,

I have installed mesos in a single-host master/slave configuration (for development and testing). Mesos works fine with the frameworks I tested (aurora...). I am trying to create my own scheduler/executor in Python, based on the example given with the sources, but I cannot get my task executed. The executor is not run (I added debug logs to a file to check, and no file is created), but I see no error in the master logs (console) or the slave logs.

In the master I can see:

  I1017 16:50:30.601210 25794 master.cpp:3559] Sending 1 offers to framework 20141017-141022-16777343-5050-25774-0047
  I1017 16:50:30.608912 25789 master.cpp:2169] Processing reply for offers: [ 20141017-141022-16777343-5050-25774-97 ] on slave 20141017-141022-16777343-5050-25774-0 at slave(1)@127.0.0.1:5051 (localhost) for framework 20141017-141022-16777343-5050-25774-0047
  I1017 16:50:30.609207 25789 hierarchical_allocator_process.hpp:563] Recovered cpus(*):8; mem(*):6900; disk(*):215925; ports(*):[31000-32000] (total allocatable: cpus(*):8; mem(*):6900; disk(*):215925; ports(*):[31000-32000]) on slave 20141017-141022-16777343-5050-25774-0 from framework 20141017-141022-16777343-5050-25774-0047

My reply to the offer is received, but in my scheduler I receive a status update of TASK_LOST. I do not see how to debug this: I see no information about why my task is lost (there is enough cpu/mem; I ask for 2 cpus and 2024 mem), and it seems that it is rejected at the master level.

Any hint on how to analyse this?

Thanks

--
gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438