from:"Olivier Sallou"

Re: use of bridge / port-mapper, can't access mapped port from remote server

2018-03-12 Thread Olivier Sallou



On 03/12/2018 12:31 PM, Olivier Sallou wrote:
> Hi,
>
> I tried to set up the CNI bridge + Mesos port mapper with the unified
> containerizer, following the doc
> http://mesos.apache.org/documentation/latest/cni/#a-port-mapper-plugin
>
> This partially works (example with container ip 192.0.0.2 and port
> mapping 22 => 31000)
>
>  - my container starts and gets a locally assigned IP 192.0.0.2
>
>  - I can access the container's port directly: ssh 192.0.0.2
>
>  - I can access it via the *local* gateway: ssh 192.0.0.1 -p 31000
>
>
> However, I cannot access the container via the IP of my server: ssh
> 131.x.y.z -p 31000
>
>
> In iptables rules, I do not see any mesos related chain. I see no
> specific CHAIN nor comment in iptables (iptables -L)
Additional info: using the -t nat option, I can see the iptables chain.

Chain MESOS-TEST-PORT-MAPPER (2 references)
target prot opt source   destination
DNAT   tcp  --  anywhere anywhere tcp dpt:31000 /* container_id: 3a4e0070-7fe2-4807-a643-27ff9608e882 */ to:192.168.0.2:22


In fact, I could make it work using the *external* IP address of my
server. One of the iptables rules set by Mesos prevents routing to localhost,
which is why my previous tests failed.
>
>
> Is it an expected behavior (port mapping maps ports but only via local
> bridge gateway), or should mesos add routes to local mesos bridge to
> allow remote access to the mapped ports?
>
>
> I have iptables 1.6.0 and linux kernel 4.4.
>
>
>
> I used config from documentation
>
> bridge.conf
>
>
> {
>   "name": "cni-test",
>   "type": "bridge",
>   "bridge": "mesos-cni0",
>   "isGateway": true,
>   "ipMasq": true,
>   "ipam": {
>     "type": "host-local",
>     "subnet": "192.168.0.0/16",
>     "routes": [
>       { "dst": "0.0.0.0/0" }
>     ]
>   }
> }
>
>
> and portmapper.conf
>
> {
>   "name": "port-mapper-test",
>   "type": "mesos-cni-port-mapper",
>   "excludeDevices": ["mesos-cni0"],
>   "chain": "MESOS-TEST-PORT-MAPPER",
>   "delegate": {
>     "type": "bridge",
>     "bridge": "mesos-cni0",
>     "isGateway": true,
>     "ipMasq": true,
>     "ipam": {
>       "type": "host-local",
>       "subnet": "192.168.0.0/16",
>       "routes": [
>         { "dst": "0.0.0.0/0" }
>       ]
>     }
>   }
> }
>
> Thanks
>
>
> Olivier
>

-- 
Olivier Sallou
Univ Rennes, Inria, CNRS, IRISA
Irisa, Campus de Beaulieu
F-35042 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

use of bridge / port-mapper, can't access mapped port from remote server

2018-03-12 Thread Olivier Sallou

Hi,

I tried to set up the CNI bridge + Mesos port mapper with the unified
containerizer, following the doc
http://mesos.apache.org/documentation/latest/cni/#a-port-mapper-plugin

This partially works (example with container ip 192.0.0.2 and port
mapping 22 => 31000)

 - my container starts and gets a locally assigned IP 192.0.0.2

 - I can access the container's port directly: ssh 192.0.0.2

 - I can access it via the *local* gateway: ssh 192.0.0.1 -p 31000


However, I cannot access the container via the IP of my server: ssh
131.x.y.z -p 31000
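
To compare the two access paths described above without going through ssh, a small TCP probe can be run once from the Mesos host and once from a remote machine. This is only a sketch: the function name `port_is_reachable` is hypothetical, and it checks nothing more than whether a TCP connection can be opened.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

For instance, `port_is_reachable("192.0.0.1", 31000)` probes the bridge gateway mentioned above; calling it with the server's external address probes the path that fails.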


In iptables rules, I do not see any mesos related chain. I see no
specific CHAIN nor comment in iptables (iptables -L)


Is it an expected behavior (port mapping maps ports but only via local
bridge gateway), or should mesos add routes to local mesos bridge to
allow remote access to the mapped ports?


I have iptables 1.6.0 and linux kernel 4.4.



I used config from documentation

bridge.conf


{
  "name": "cni-test",
  "type": "bridge",
  "bridge": "mesos-cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}


and portmapper.conf

{
  "name": "port-mapper-test",
  "type": "mesos-cni-port-mapper",
  "excludeDevices": ["mesos-cni0"],
  "chain": "MESOS-TEST-PORT-MAPPER",
  "delegate": {
    "type": "bridge",
    "bridge": "mesos-cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.0.0/16",
      "routes": [
        { "dst": "0.0.0.0/0" }
      ]
    }
  }
}

Thanks


Olivier

-- 

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: strange behaviour: Task status -> error-> finished

2017-09-19 Thread Olivier Sallou



On 09/19/2017 11:22 AM, Benno Evers wrote:
> Hi Olivier,
>
>> Can we have "non terminal" errors, from mesos point of view, where task
> should not be considered as over?
>
> Not really, what you're seeing certainly looks like a bug, terminal updates
> should be terminal. It'll probably be hard to debug it without more data ;)
indeed...
>
> As a wild guess, since you seem to be using custom task id's, maybe you
> tried to start a task twice, and the TASK_ERROR was generated on the master
> in response to the duplicate task id or some other validation issue, and
> the TASK_FINISHED was generated on the slave when the first task finished?
> Although I'm not sure off the top of my head if there are checks in mesos
> that would catch this.
nope, task was not started twice (got only one TASK_RUNNING event). When
resubmitted, task id is modified.
Thanks anyway.
>
> Best regards,
>
> On Tue, Sep 19, 2017 at 7:47 AM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
>
>> Hi
>> I found a strange behaviour on a cluster that I do not understand. I do
>> not have access to mesos logs (not in my cluster), but anyone faced this
>> before ?
>> My framework uses Docker containerizer. We faced a task that sent
>> TASK_ERROR to the framework (why not), but in reality the Docker executed
>> correctly on mesos slave, then we received a TASK_FINISHED.
>> So mesos detected an error with task but it detected anyway the end of the
>> task sending the finished event at the end.
>>
>> How can Mesos detect an error but still keep watching the task and detect its
>> end?
>>
>> Here are my framework logs:
>> 2017-09-17 01:06:35,447 DEBUG [godocker-scheduler][Thread-1] Task 17820-0
>> is in state TASK_RUNNING
>> 2017-09-17 01:06:46,286 DEBUG [godocker-scheduler][Thread-1] Task 17820-0
>> is in state TASK_ERROR
>> 2017-09-17 02:13:44,537 DEBUG [godocker-scheduler][Thread-1] Task 17820-0
>> is in state TASK_FINISHED
>>
>> Unfortunately I did not log the "reason" of the ERROR, so I do not know
>> what occurred, and cannot at this stage manually reproduce the use case.
>>
>> Can we have "non terminal" errors, from mesos point of view, where task
>> should not be considered as over?
>>
>> Thanks
>>
>> Olivier
>>
>
>

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

strange behaviour: Task status -> error-> finished

2017-09-18 Thread Olivier Sallou

Hi 
I found a strange behaviour on a cluster that I do not understand. I do not
have access to the Mesos logs (not in my cluster), but has anyone faced this before?
My framework uses the Docker containerizer. We had a task that sent TASK_ERROR to
the framework (why not), but in reality the Docker container executed correctly on
the Mesos slave, and then we received a TASK_FINISHED.
So Mesos detected an error with the task, but it still detected the end of the task
and sent the finished event at the end.

How can Mesos detect an error but still keep watching the task and detect its end?

Here are my framework logs: 
2017-09-17 01:06:35,447 DEBUG [godocker-scheduler][Thread-1] Task 17820-0 is in 
state TASK_RUNNING 
2017-09-17 01:06:46,286 DEBUG [godocker-scheduler][Thread-1] Task 17820-0 is in 
state TASK_ERROR 
2017-09-17 02:13:44,537 DEBUG [godocker-scheduler][Thread-1] Task 17820-0 is in 
state TASK_FINISHED 

Unfortunately I did not log the "reason" of the ERROR, so I do not know what
occurred, and cannot at this stage manually reproduce the use case.

Can we have "non terminal" errors, from mesos point of view, where task should 
not be considered as over? 
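
A framework can guard against this kind of sequence by remembering the first terminal state it sees for each task, and by logging the reason and message of every update. The sketch below is not taken from GoDocker: the class name `TaskStateTracker` is hypothetical, and the set of terminal states is an assumption based on the Mesos 1.x task states and should be checked against the mesos.proto in use.

```python
import logging

# Assumption: states treated as terminal; adjust to the Mesos version in use.
TERMINAL_STATES = {
    "TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST", "TASK_ERROR",
}

log = logging.getLogger("scheduler-sketch")


class TaskStateTracker(object):
    """Remember the first terminal state seen for each task."""

    def __init__(self):
        self._terminal = {}  # task_id -> first terminal state received

    def record(self, task_id, state, reason=None, message=None):
        """Record an update; return False if it follows a terminal update."""
        # Always log reason and message so an unexpected error can be diagnosed.
        log.debug("Task %s is in state %s (reason=%s, message=%s)",
                  task_id, state, reason, message)
        previous = self._terminal.get(task_id)
        if previous is not None:
            log.warning("Task %s got %s after terminal state %s",
                        task_id, state, previous)
            return False
        if state in TERMINAL_STATES:
            self._terminal[task_id] = state
        return True
```

With the updates from the log above, the TASK_RUNNING and TASK_ERROR calls return True, and the later TASK_FINISHED call returns False and emits a warning.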

Thanks 

Olivier

Re: GPU Users -- Deprecation of GPU_RESOURCES capability

2017-05-22 Thread Olivier Sallou



On 05/21/2017 03:45 AM, Kevin Klues wrote:
> Hello GPU users,
>
> We are currently considering deprecating the requirement that frameworks
> register with the GPU_RESOURCES capability in order to receive offers that
> contain GPUs. Going forward, we will recommend that users rely on Mesos's
> builtin `reservation` mechanism to achieve similar results.
>
> Before deprecating it, we wanted to get a sense from the community if
> anyone is currently relying on this capability and would like to see it
> persist. If not, we will begin deprecating it in the next Mesos release and
> completely remove it in Mesos 2.0.
Well, I am using it for the GoDocker framework, where jobs can specify whether to
use (or not) some GPUs.
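
For reference, opting in to the capability amounts to one entry in the framework's FrameworkInfo. The following is a rough sketch of a SUBSCRIBE call body for the v1 scheduler HTTP API, not GoDocker's actual code; the helper name `subscribe_call` and its arguments are made up for illustration.

```python
def subscribe_call(user, name, want_gpus):
    """Build the JSON body of a v1 scheduler SUBSCRIBE call as a dict."""
    framework_info = {"user": user, "name": name}
    if want_gpus:
        # Opt in to offers containing GPUs via the GPU_RESOURCES capability.
        framework_info["capabilities"] = [{"type": "GPU_RESOURCES"}]
    return {
        "type": "SUBSCRIBE",
        "subscribe": {"framework_info": framework_info},
    }
```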
>
> As background, the original motivation for this capability was to keep
> “legacy” frameworks from inadvertently scheduling jobs that don’t require
> GPUs on GPU capable machines and thus starving out other frameworks that
> legitimately want to place GPU jobs on those machines. The assumption here
> was that most machines in a cluster won't have GPUs installed on them, so
> some mechanism was necessary to keep legacy frameworks from scheduling jobs
> on those machines. In essence, it provided an implicit reservation of GPU
> machines for "GPU aware" frameworks, bypassing the traditional
> `reservation` mechanism already built into Mesos.
>
> In such a setup, legacy frameworks would be free to schedule jobs on
> non-GPU machines, and "GPU aware" frameworks would be free to schedule GPU
> jobs on GPU machines and other types of jobs on other machines (or mix and
> match them however they please).
>
> However, the problem comes when *all* machines in a cluster contain GPUs
> (or even if most of the machines in a cluster contain them). When this is
> the case, we have the opposite problem we were trying to solve by
> introducing the GPU_RESOURCES capability in the first place. We end up
> starving out jobs from legacy frameworks that *don’t* require GPU resources
> because there are not enough machines available that don’t have GPUs on
> them to service those jobs. We've actually seen this problem manifest in
> the wild at least once.
>
> An alternative to completely deprecating the GPU_RESOURCES flag would be to
> add a new flag to the mesos master called `--filter-gpu-resources`. When
> set to `true`, this flag will cause the mesos master to continue to
> function as it does today. That is, it would filter offers containing GPU
> resources and only send them to frameworks that opt into the GPU_RESOURCES
> framework capability. When set to `false`, this flag would cause the master
> to *not* filter offers containing GPU resources, and indiscriminately send
> them to all frameworks whether they set the GPU_RESOURCES capability or not.
>
> , this flag would allow them to keep relying on it without disruption.
>
> We'd prefer to deprecate the capability completely, but would consider
> adding this flag if people are currently relying on the GPU_RESOURCES
> capability and would like to see it persist
>
> We welcome any feedback you have.
>
> Kevin + Ben
>

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re : Re: protbuf to json not compatible

2017-03-25 Thread Olivier Sallou


- Benjamin Mahler <bmah...@apache.org> wrote:
> James, I'm curious, do you know specifically what the incompatibility is?
> 
> Olivier, if you're dealing with protobuf already and trying to send it to
> mesos, there's no need to use JSON. Unless you have a requirement to do so?

I can manage json, this is fine.
Sending protobuf means sending the whole accept message as protobuf, not only the
task definition. But for this I need the mesos.native Python package, and I want to
avoid that. So I will switch to full JSON.

Olivier



> There are some outstanding issues with our JSON<->Protobuf conversion,
> specifically we currently are inconsistent from proto3 when it comes to the
> int(32|64), fixed(32|64), uint(32|64) handling, for one (we don't allow
> strings on the input side (tomek is addressing that), and we don't use
> strings on the output side).
> 
> On Fri, Mar 24, 2017 at 12:44 AM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
> 
> >
> >
> > On 03/24/2017 04:02 AM, James Peach wrote:
> > >> On Mar 23, 2017, at 7:58 PM, James Peach <jor...@gmail.com> wrote:
> > >>
> > >>> On Mar 23, 2017, at 1:54 AM, Olivier Sallou <olivier.sal...@irisa.fr>
> > wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> when transforming a protobuf message to json with MessageToJson, the
> > >>> json is not compatible with the json format expected by Mesos master.
> > >> This is because you generated the protobuf bindings with proto3
> > compiler. AFAICT they made an incompatible change to the JSON wire format.
> > This bites you when using the jsonpb Go package, for example. I ended up
> > post-processing the generated Go code to correct the field names.
> > > Sorry I forgot to mention that the other workaround is to generate the
> > protobuf bindings with the proto2 compiler.
> > Thanks
> > My first workaround is to generate json directly, not a big deal in my
> > case, but I wanted to understand.
> >
> > Olivier
> > >
> > > J
> >
> > --
> > Olivier Sallou
> > IRISA / University of Rennes 1
> > Campus de Beaulieu, 35000 RENNES - FRANCE
> > Tel: 02.99.84.71.95
> >
> > gpg key id: 4096R/326D8438  (keyring.debian.org)
> > Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
> >
> >

how to get executor command? unified containerizer fails with unknown flag 'command'

2017-03-24 Thread Olivier Sallou

Hi,

while switching from python protobuf library to HTTP API, I face an
issue when starting the container with unified containerizer.

With the Docker containerizer, everything is fine, but the unified
containerizer fails (while it worked nicely with the Python lib, before my
modifications).

The only executor log I have is:

"Failed to parse the flags: Failed to load unknown flag 'command'"

and update status reason "REASON_CONTAINER_LAUNCH_FAILED"

In slave logs I only find the following:

I0324 13:42:39.487109 29096 linux_launcher.cpp:421] Launching
container 61327ae0-5c9b-4d4c-a015-674b7112539a and cloning with
namespaces CLONE_NEWNS
I0324 13:42:39.507308 29096 systemd.cpp:96] Assigned child process
'8256' to 'mesos_executors.slice'
I0324 13:42:39.558537 29102 containerizer.cpp:2313] Container
61327ae0-5c9b-4d4c-a015-674b7112539a has exited


It may be related to my json task definition, but I do not see what
mesos is trying to execute, and what this "command" flag is and why it
is present.

Is there a way for Mesos to add additional logs to display the executor
command ?


In my task_infos, I define a container of type MESOS and a command with
a value. It should be the same as for the Docker containerizer; the only
difference is that the container has a mesos parameter instead of a docker parameter.
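
To make the difference between the two container definitions concrete, here is a sketch of how such a task could be built as a JSON-ready dict. It is an illustration under assumptions, not the task definition that triggered the error: the helper names, the single cpus resource and the use of a Docker image for the MESOS container are all hypothetical choices, and the sketch does not explain the 'command' flag failure.

```python
def container_info(containerizer, image):
    """Return a ContainerInfo dict for the DOCKER or MESOS containerizer."""
    if containerizer == "DOCKER":
        return {"type": "DOCKER", "docker": {"image": image}}
    if containerizer == "MESOS":
        # The unified containerizer takes the image inside a "mesos" block.
        return {
            "type": "MESOS",
            "mesos": {"image": {"type": "DOCKER", "docker": {"name": image}}},
        }
    raise ValueError("unknown containerizer: %s" % containerizer)


def task_info(task_id, agent_id, command, containerizer, image):
    """Build a minimal TaskInfo dict; only the container block differs."""
    return {
        "name": "task-" + task_id,
        "task_id": {"value": task_id},
        "agent_id": {"value": agent_id},
        "resources": [
            {"name": "cpus", "type": "SCALAR", "scalar": {"value": 1.0}},
        ],
        "command": {"value": command},
        "container": container_info(containerizer, image),
    }
```

Dumping both variants with `json.dumps` and diffing them is a quick way to confirm that only the container block changes between the two containerizers.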

Thanks

Olivier

-- 


gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: protbuf to json not compatible

2017-03-24 Thread Olivier Sallou



On 03/24/2017 04:02 AM, James Peach wrote:
>> On Mar 23, 2017, at 7:58 PM, James Peach <jor...@gmail.com> wrote:
>>
>>> On Mar 23, 2017, at 1:54 AM, Olivier Sallou <olivier.sal...@irisa.fr> wrote:
>>>
>>> Hi,
>>>
>>> when transforming a protobuf message to json with MessageToJson, the
>>> json is not compatible with the json format expected by Mesos master.
>> This is because you generated the protobuf bindings with proto3 compiler. 
>> AFAICT they made an incompatible change to the JSON wire format. This bites 
>> you when using the jsonpb Go package, for example. I ended up 
>> post-processing the generated Go code to correct the field names.
> Sorry I forgot to mention that the other workaround is to generate the 
> protobuf bindings with the proto2 compiler.
Thanks
My first workaround is to generate json directly, not a big deal in my
case, but I wanted to understand.

Olivier
>
> J

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

protbuf to json not compatible

2017-03-23 Thread Olivier Sallou

Hi,

when transforming a protobuf message to json with MessageToJson, the
json is not compatible with the json format expected by Mesos master.

For example, for volumes it generates


volumes: [
  {'hostPath': '',
   'containerPath': '...',
   ...
  }
]


but HTTP API expects "source" and "container_path"

Is this expected behavior? It prevents "creating" a task in
protobuf format and sending it to the HTTP API with a protobuf-to-JSON
conversion.
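
One possible workaround is to post-process the generated JSON and turn the lowerCamelCase keys back into the snake_case names used in mesos.proto. The sketch below assumes that key naming is the only incompatibility, which may not hold (see the integer handling discussed elsewhere in this thread); the function names are hypothetical. Recent releases of the Python protobuf package also appear to accept a `preserving_proto_field_name` argument in `MessageToJson`, which is worth checking against the installed version.

```python
import re

# Zero-width match before every capital letter except at the start.
_CAMEL_BOUNDARY = re.compile(r"(?<!^)(?=[A-Z])")


def to_snake_case(name):
    """Convert 'containerPath' to 'container_path'."""
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def snake_case_keys(value):
    """Recursively rename dict keys; values and list order are untouched."""
    if isinstance(value, dict):
        return {to_snake_case(k): snake_case_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [snake_case_keys(v) for v in value]
    return value
```

Applied to the volumes example above, 'hostPath' and 'containerPath' become 'host_path' and 'container_path'.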

Thanks

Olivier

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: cannot find mesos.native python lib (mesos 1.1.0

2017-03-14 Thread Olivier Sallou



On 03/13/2017 05:41 PM, Olivier Sallou wrote:
> Hi,
>
> I installed Mesos 1.1.0 via the deb repo, but when executing Python "import
> mesos.native", I get a "no module named native" error.
>
> I tried compiling from source and installing the egg files directly, but I still
> have the issue.
Installing the eggs in a virtualenv works, so this is really an issue related to
the system install... but that should not be the case, at least with the deb files.
>
> I can however see the module in python path:
>
>
> root:~/mesos-1.1.0/build/src/python/dist# find /usr/lib/python2.7 | grep
> native
> /usr/lib/python2.7/dist-packages/pygments/styles/native.py
> /usr/lib/python2.7/dist-packages/pygments/styles/native.pyc
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0-py2.7-nspkg.pth
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/namespace_packages.txt
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/RECORD
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/DESCRIPTION.rst
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/top_level.txt
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/metadata.json
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/WHEEL
> /usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/METADATA
> /usr/lib/python2.7/site-packages/mesos/native
> /usr/lib/python2.7/site-packages/mesos/native/__init__.pyc
> /usr/lib/python2.7/site-packages/mesos/native/__init__.py
>
> If I try to uninstall package (pip uninstall mesos.native), I have error:
> "Not uninstalling mesos.native at /usr/lib/python2.7/site-packages,
> owned by OS"
>
> so it is seen by the system, but not by python... :-(
>
> my PYTHONPATH is /usr/lib/python2.7/site-packages/
>
> any idea on how to fix this ?
>
> Thanks
> Olivier
>

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

cannot find mesos.native python lib (mesos 1.1.0

2017-03-13 Thread Olivier Sallou

Hi,

I installed Mesos 1.1.0 via the deb repo, but when executing Python "import
mesos.native", I get a "no module named native" error.

I tried compiling from source and installing the egg files directly, but I still
have the issue.


I can however see the module in python path:


root:~/mesos-1.1.0/build/src/python/dist# find /usr/lib/python2.7 | grep
native
/usr/lib/python2.7/dist-packages/pygments/styles/native.py
/usr/lib/python2.7/dist-packages/pygments/styles/native.pyc
/usr/lib/python2.7/site-packages/mesos.native-1.1.0-py2.7-nspkg.pth
/usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info
/usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/namespace_packages.txt
/usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/RECORD
/usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/DESCRIPTION.rst
/usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/top_level.txt
/usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/metadata.json
/usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/WHEEL
/usr/lib/python2.7/site-packages/mesos.native-1.1.0.dist-info/METADATA
/usr/lib/python2.7/site-packages/mesos/native
/usr/lib/python2.7/site-packages/mesos/native/__init__.pyc
/usr/lib/python2.7/site-packages/mesos/native/__init__.py

If I try to uninstall the package (pip uninstall mesos.native), I get the error:
"Not uninstalling mesos.native at /usr/lib/python2.7/site-packages,
owned by OS"

so it is seen by the system, but not by python... :-(

my PYTHONPATH is /usr/lib/python2.7/site-packages/

any idea on how to fix this ?
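
A quick way to investigate is to list every entry of sys.path that contains a `mesos` directory, whether it has an `__init__.py`, and which subpackages it holds. One possibility, not confirmed in this thread, is that `mesos` is a namespace package relying on the `-nspkg.pth` file, and `.pth` files are only processed in site directories, not in directories added through PYTHONPATH. The helper name `find_package_dirs` below is hypothetical.

```python
import os
import sys


def find_package_dirs(package, search_path=None):
    """Return (path_entry, has_init, subpackages) for each entry holding `package`."""
    results = []
    for entry in (sys.path if search_path is None else search_path):
        # An empty sys.path entry means the current directory.
        pkg_dir = os.path.join(entry or ".", package)
        if not os.path.isdir(pkg_dir):
            continue
        has_init = os.path.isfile(os.path.join(pkg_dir, "__init__.py"))
        subpackages = sorted(
            name for name in os.listdir(pkg_dir)
            if os.path.isdir(os.path.join(pkg_dir, name))
        )
        results.append((entry, has_init, subpackages))
    return results
```

Running `find_package_dirs("mesos")` with the system interpreter and again inside the working virtualenv shows whether the two environments resolve `mesos` from different places.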

Thanks
Olivier

-- 

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: Docker containerizer: override USER

2016-09-02 Thread Olivier Sallou



- Original message -
> From: "Gilbert Song" <gilb...@mesosphere.io>
> To: "dev" <dev@mesos.apache.org>
> Sent: Thursday, 1 September 2016 19:21:06
> Subject: Re: Docker containerizer: override USER
> 
> We considered supporting the --user option in the Docker containerizer.
> Unfortunately, it would potentially break behavior for some existing users,
> so we did not merge it. Please see this JIRA for details:
> 
> https://issues.apache.org/jira/browse/MESOS-5754
> 
> However, you can still use DockerInfo::Parameter to specify your --user as a
> workaround.

That's what I did, but I expected a more *integrated* solution.
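
For readers looking for the workaround itself, here is a sketch of a Docker ContainerInfo carrying the user as a DockerInfo parameter. The helper name `docker_container_with_user` is hypothetical, and how the parameter reaches the docker command line should be checked against the Mesos version in use.

```python
def docker_container_with_user(image, user):
    """Return a ContainerInfo dict that asks Docker to run as `user`."""
    return {
        "type": "DOCKER",
        "docker": {
            "image": image,
            # Each parameter is a key/value pair passed on to docker run.
            "parameters": [{"key": "user", "value": user}],
        },
    }
```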

Thanks  anyway

Olivier
> 
> Gilbert
> 
> On Thu, Sep 1, 2016 at 9:15 AM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
> 
> >
> >
> > - Original message -
> > > From: "Qian Zhang" <zhq527...@gmail.com>
> > > To: dev@mesos.apache.org
> > > Sent: Thursday, 1 September 2016 15:57:39
> > > Subject: Re: Docker containerizer: override USER
> > >
> > > Hi Olivier,
> > >
> > > Can you try TaskInfo.CommandInfo.user?
> >
> > I will try but mesos.proto specifies:
> >
> >   // Enables executor and tasks to run as a specific user. If the user
> >   // field is present both in FrameworkInfo and here, the CommandInfo
> >   // user value takes precedence.
> >
> > FrameworkInfo.user is specified in my case and set to the expected user
> > XX. So it does not seem that the container is executed with the --user XX
> > flag.
> >
> > Olivier
> >
> >
> > >
> > >
> > > Thanks,
> > > Qian Zhang
> > >
> > > On Thu, Sep 1, 2016 at 4:39 PM, Olivier Sallou <olivier.sal...@irisa.fr>
> > > wrote:
> > >
> > > > Hi,
> > > > If Docker image specified a USER in Dockerfile, docker will use this
> > user
> > > > when executing command in container.
> > > > In Docker commands, it can be overridden with -u XX .
> > > >
> > > > I do not find however in mesos.proto a way to do so. There is the
> > > > "arguments" of DockerInfo that I could use to append this to the
> > executor
> > > > command line, but I think it is not advised as it may not be supported
> > in
> > > > future.
> > > >
> > > > Did I miss something ?
> > > >
> > > > Thanks
> > > >
> > > > Olivier
> > > >
> > >
> >
>

Re: Docker containerizer: override USER

2016-09-01 Thread Olivier Sallou



- Original message -
> From: "Qian Zhang" <zhq527...@gmail.com>
> To: dev@mesos.apache.org
> Sent: Thursday, 1 September 2016 15:57:39
> Subject: Re: Docker containerizer: override USER
> 
> Hi Olivier,
> 
> Can you try TaskInfo.CommandInfo.user?

I will try but mesos.proto specifies:

  // Enables executor and tasks to run as a specific user. If the user
  // field is present both in FrameworkInfo and here, the CommandInfo
  // user value takes precedence.

FrameworkInfo.user is specified in my case and set to the expected user XX. So 
it does not seem that the container is executed with the --user XX flag.

Olivier


> 
> 
> Thanks,
> Qian Zhang
> 
> On Thu, Sep 1, 2016 at 4:39 PM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
> 
> > Hi,
> > If Docker image specified a USER in Dockerfile, docker will use this user
> > when executing command in container.
> > In Docker commands, it can be overridden with -u XX .
> >
> > I do not find however in mesos.proto a way to do so. There is the
> > "arguments" of DockerInfo that I could use to append this to the executor
> > command line, but I think it is not advised as it may not be supported in
> > future.
> >
> > Did I miss something ?
> >
> > Thanks
> >
> > Olivier
> >
>

Docker containerizer: override USER

2016-09-01 Thread Olivier Sallou

Hi, 
If a Docker image specifies a USER in its Dockerfile, Docker will use this user when
executing commands in the container.
In Docker commands, it can be overridden with -u XX.

However, I cannot find a way to do so in mesos.proto. There is the "arguments"
of DockerInfo that I could use to append this to the executor command line, but
I think it is not advised as it may not be supported in the future.

Did I miss something ? 

Thanks 

Olivier

Re: Maintenance API question

2016-08-31 Thread Olivier Sallou



- Original message -
> From: "Joseph Wu" <jos...@mesosphere.io>
> To: "dev" <dev@mesos.apache.org>
> Sent: Wednesday, 31 August 2016 17:16:57
> Subject: Re: Maintenance API question
> 
> Most likely, the hostname and IP you've put into the "machine_ids"
> do not *exactly match* the hostname and IP the agent is identifying
> itself as.

In this case, the master should reject the request according to the documentation.
Here it is accepted (200 OK in response, and it appears in maintenance/schedule
and maintenance/status).

  If in
> doubt, you can check the master's /slaves endpoint.  Or, you can manually
> set the hostname and IP when starting the agent.


I took the information from the master UI and it is the same.

Maybe the issue is the fact that I am on a single machine, so the hostname and IP
are the same for master and slave.

> 
> On Wed, Aug 31, 2016 at 3:16 AM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
> 
> > Hi,
> > I am trying to use the /maintenance API for mesos slave maintenance/drain.
> >
> > I follow doc at http://mesos.apache.org/documentation/latest/maintenance/
> >
> > I use mesos 1.0.1 on a single machine (for dev).
> >
> > When scheduling a node using
> >
> >
> >
> > {
> > "windows" : [
> > {
> > "machine_ids" : [
> > { "hostname" : "tifenn.irisa.fr", "ip" : "127.0.0.1" }
> > ],
> > "unavailability" : {
> > "start" : { "nanoseconds" : 14726373400 },
> > "duration" : { "nanoseconds" : 36000 }
> > }
> > }
> > ]
> > }
> >
> >
> >
> >
> > The start date is set in the recent past (setting to future did not
> > change).
> >
> >
> > I see in /maintenance/status
> >
> > {"draining_machines":[{"id":{"hostname":"tifenn.irisa.fr","
> > ip":"127.0.0.1"}}]}
> >
> > However, the offers I receive do not contain the unavailability parameter.
> > I do not know if it is expected, but start/duration do not appear in
> > maintenance/status result.
> > I see in master logs: HTTP POST for /master/maintenance/schedule from
> > 127.0.0.1:34858 with User-Agent='curl/7.43.0'
> >
> >
> > I tried anyway to switch the node to maintenance (/maintenance/down) but I
> > continue to receive offers for this slave. In status, I see my slave in
> > machines_down:
> >
> > {"down_machines":[{"hostname":"tifenn.irisa.fr","ip":"127.0.0.1"}]}
> >
> > I can see on master logs:
> >
> >
> >
> > I0831 12:12:37.568898 6428 http.cpp:381] HTTP POST for
> > /master/machine/down from 127.0.0.1:34970 with User-Agent='curl/7.43.0'
> >
> > 
> >
> > Sending 1 offers to framework a559cd9e-3e58-4377-9e1a-c8f3d28d2318-
> > (Go-Docker Mesos) at scheduler-41e42d1f-b8f8-473a-
> > b460-6fab3a150915@127.0.1.1:43060
> >
> >
> >
> >
> > Should something be set to enable maintenance in mesos ?
> >
> >
> >
> >
> > Thanks
> >
> >
> >
> >
> > Olivier
> >
>

Maintenance API question

2016-08-31 Thread Olivier Sallou

Hi, 
I am trying to use the /maintenance API for mesos slave maintenance/drain. 

I follow doc at http://mesos.apache.org/documentation/latest/maintenance/ 

I use mesos 1.0.1 on a single machine (for dev). 

When scheduling a node using 



{ 
"windows" : [ 
{ 
"machine_ids" : [ 
{ "hostname" : "tifenn.irisa.fr", "ip" : "127.0.0.1" } 
], 
"unavailability" : { 
"start" : { "nanoseconds" : 14726373400 }, 
"duration" : { "nanoseconds" : 36000 } 
} 
} 
] 
} 




The start date is set in the recent past (setting to future did not change). 
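
Since both "start" and "duration" are expressed in nanoseconds, it is easy to be off by several orders of magnitude when writing the values by hand. A small helper that converts from seconds can reduce that risk. This is a sketch only: the function name `maintenance_schedule` is hypothetical, and it merely builds the request body without sending it.

```python
NANOS_PER_SECOND = 10 ** 9


def maintenance_schedule(hostname, ip, start_epoch_seconds, duration_seconds):
    """Build a one-window maintenance schedule body from values in seconds."""
    return {
        "windows": [
            {
                "machine_ids": [{"hostname": hostname, "ip": ip}],
                "unavailability": {
                    # Integer arithmetic avoids float rounding on large values.
                    "start": {
                        "nanoseconds": int(start_epoch_seconds) * NANOS_PER_SECOND
                    },
                    "duration": {
                        "nanoseconds": int(duration_seconds) * NANOS_PER_SECOND
                    },
                },
            }
        ]
    }
```

A window starting one hour from now and lasting one hour would use `int(time.time()) + 3600` as the start and `3600` as the duration; `json.dumps` of the result gives the body to POST.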


I see in /maintenance/status 

{"draining_machines":[{"id":{"hostname":"tifenn.irisa.fr","ip":"127.0.0.1"}}]} 

However, the offers I receive do not contain the unavailability parameter. I do 
not know if it is expected, but start/duration do not appear in 
maintenance/status result. 
I see in master logs: HTTP POST for /master/maintenance/schedule from 
127.0.0.1:34858 with User-Agent='curl/7.43.0' 


I tried anyway to switch the node to maintenance (/maintenance/down) but I 
continue to receive offers for this slave. In status, I see my slave in 
machines_down: 

{"down_machines":[{"hostname":"tifenn.irisa.fr","ip":"127.0.0.1"}]} 

I can see on master logs: 



I0831 12:12:37.568898 6428 http.cpp:381] HTTP POST for /master/machine/down 
from 127.0.0.1:34970 with User-Agent='curl/7.43.0' 

 

Sending 1 offers to framework a559cd9e-3e58-4377-9e1a-c8f3d28d2318- 
(Go-Docker Mesos) at 
scheduler-41e42d1f-b8f8-473a-b460-6fab3a150915@127.0.1.1:43060 




Should something be set to enable maintenance in mesos ? 




Thanks 




Olivier

Re: Fail to get CNI with unified containerizer, job remains stuck on staging

2016-08-24 Thread Olivier Sallou



On 08/24/2016 04:04 PM, Avinash Sridharan wrote:
> Olivier, you can't have the agent running on 127.0.0.1. The agent needs to
> be running on a routable IP address (choose an IP from one of the
> interfaces).
>
> Reason being that if agent is on local host the executor running in its own
> network namespace will try to make a connection in its own network
> namespace and fail.

Thanks! Modifying the IP address to a reachable IP works.
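
A deployment script could catch this misconfiguration before starting the agent. The sketch below uses only the Python standard library; the function name `is_usable_agent_ip` is hypothetical, and it only rejects loopback and unspecified addresses rather than verifying that a route from the container network actually exists.

```python
import ipaddress


def is_usable_agent_ip(ip):
    """Return False for addresses an executor in its own namespace cannot reach."""
    addr = ipaddress.ip_address(ip)
    # Loopback resolves inside the container's own network namespace,
    # and 0.0.0.0 / :: are not connectable addresses.
    return not (addr.is_loopback or addr.is_unspecified)
```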


> On Wed, Aug 24, 2016 at 5:15 AM Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
>
>> I have the same behavior with Calico. Task get IP from CNI plugin, but
>> task remains in STAGING and same logs.
>>
>> mesos-execute --containerizer=mesos \
>>>   --name=cni \
>>>   --master=127.0.0.1:5050 \
>>>   --networks=calico-net-1 \
>>>   --command="ifconfig"
>> I0824 14:12:03.202328 24912 scheduler.cpp:172] Version: 1.0.0
>> I0824 14:12:03.203009 24911 scheduler.cpp:461] New master detected at
>> master@127.0.0.1:5050
>> Subscribed with ID 'b1925e13-76db-4225-a3dc-39ce65c79b3c-0017'
>> Submitted task 'cni' to agent 'b1925e13-76db-4225-a3dc-39ce65c79b3c-S0'
>>
>> REMAINS STAGING!
>>
>>
>> I0824 14:12:03.857158 24806 cni.cpp:1109] Got assigned IPv4 address
>> '192.168.0.0/32' from CNI network 'calico-net-1' for container
>> bdbb275a-ec5f-4a50-aca0-5e694ae57324
>> I0824 14:12:03.857348 24805 cni.cpp:838] Unable to find DNS nameservers
>> for container bdbb275a-ec5f-4a50-aca0-5e694ae57324. Using host
>> '/etc/resolv.conf'
>>
>> No more logs
>>
>>
>> Olivier
>>
>> On 08/24/2016 08:23 AM, Olivier Sallou wrote:
>>> On 08/23/2016 06:13 PM, Jie Yu wrote:
>>>> The DNS related logging means that the weave plugin does not return DNS
>>>> information, the agent uses the host resolv.conf for the container. So I
>>>> think it is irrelevant to your problem.
>>>>
>>>> Mesos requires that executor can talk to agent. Can you see if there is
>> a
>>>> route from 10.32.0.1 to the agent IP?
>>> How can I check this as task does not start ? I have exposed weave
>>> network on host:
>>>
>>> osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo ./weave expose
>>> 10.32.0.2
>>> osallou@tifenn~/Development/NOSAVE/go-docker/weave $ ping 10.32.0.2
>>> PING 10.32.0.2 (10.32.0.2) 56(84) bytes of data.
>>> 64 bytes from 10.32.0.2: icmp_seq=1 ttl=64 time=0.032 ms
>>> 64 bytes from 10.32.0.2: icmp_seq=2 ttl=64 time=0.029 ms
>>> 64 bytes from 10.32.0.2: icmp_seq=3 ttl=64 time=0.029 ms
>>> 64 bytes from 10.32.0.2: icmp_seq=4 ttl=64 time=0.031 ms
>>>
>>>  And why is it blocking?
>>>
>>> I am on a single host environement, so agent is on 127.0.0.1.
>>>
>>> Olivier
>>>> On Tue, Aug 23, 2016 at 9:05 AM, Olivier Sallou <
>> olivier.sal...@irisa.fr>
>>>> wrote:
>>>>
>>>>> HI,
>>>>>
>>>>> I have setup Mesos 1.0.0-2 to use CNI with Weave (1.6.1)
>>>>>
>>>>> Weave works nicely with the Docker containerizer.
>>>>>
>>>>> When I try to launch a task via my framework with unified
>> containerizer,
>>>>> the job remains waiting forever (no RUNNING message). I can see however
>>>>> that weave cni allocated an IP address to Mesos.
>>>>>
>>>>> I tried with a simple mesos-execute test.
>>>>>
>>>>> Example with a mesos-execute with no CNI, everything is OK
>>>>>
>>>>>
>>>>> osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo
>> mesos-execute
>>>>> --command="sleep 2" -docker_image=centos:latest --master=
>> 127.0.0.1:5050
>>>>> --name=test0  I0823 17:56:50.067520 28815 scheduler.cpp:172] Version:
>> 1.0.0
>>>>> I0823 17:56:50.068260 28822 scheduler.cpp:461] New master detected at
>>>>> master@127.0.0.1:5050
>>>>> Subscribed with ID 'b1925e13-76db-4225-a3dc-39ce65c79b3c-0005'
>>>>> Submitted task 'test0' to agent
>> 'b1925e13-76db-4225-a3dc-39ce65c79b3c-S0'
>>>>> Received status update TASK_RUNNING for task 'test0'
>>>>>   source: SOURCE_EXECUTOR
>>>>> Received status update TASK_FINISHED for task 'test0'
>>>>>   message: 'Command exited with status 0'
>>>>>
>>>>>
>>>>> Sample example specifying t

Re: Fail to get CNI with unified containerizer, job remains stuck on staging

2016-08-24 Thread Olivier Sallou

I have the same behavior with Calico. The task gets an IP from the CNI plugin,
but it remains in STAGING, with the same logs.

mesos-execute --containerizer=mesos \
>   --name=cni \
>   --master=127.0.0.1:5050 \
>   --networks=calico-net-1 \
>   --command="ifconfig"
I0824 14:12:03.202328 24912 scheduler.cpp:172] Version: 1.0.0
I0824 14:12:03.203009 24911 scheduler.cpp:461] New master detected at
master@127.0.0.1:5050
Subscribed with ID 'b1925e13-76db-4225-a3dc-39ce65c79b3c-0017'
Submitted task 'cni' to agent 'b1925e13-76db-4225-a3dc-39ce65c79b3c-S0'

REMAINS STAGING!


I0824 14:12:03.857158 24806 cni.cpp:1109] Got assigned IPv4 address
'192.168.0.0/32' from CNI network 'calico-net-1' for container
bdbb275a-ec5f-4a50-aca0-5e694ae57324
I0824 14:12:03.857348 24805 cni.cpp:838] Unable to find DNS nameservers
for container bdbb275a-ec5f-4a50-aca0-5e694ae57324. Using host
'/etc/resolv.conf'

No more logs


Olivier

On 08/24/2016 08:23 AM, Olivier Sallou wrote:
>
> On 08/23/2016 06:13 PM, Jie Yu wrote:
>> The DNS related logging means that the weave plugin does not return DNS
>> information, the agent uses the host resolv.conf for the container. So I
>> think is irrelevant to your problem.
>>
>> Mesos requires that executor can talk to agent. Can you see if there is a
>> route from 10.32.0.1 to the agent IP?
> How can I check this as task does not start ? I have exposed weave
> network on host:
>
> osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo ./weave expose
> 10.32.0.2
> osallou@tifenn~/Development/NOSAVE/go-docker/weave $ ping 10.32.0.2
> PING 10.32.0.2 (10.32.0.2) 56(84) bytes of data.
> 64 bytes from 10.32.0.2: icmp_seq=1 ttl=64 time=0.032 ms
> 64 bytes from 10.32.0.2: icmp_seq=2 ttl=64 time=0.029 ms
> 64 bytes from 10.32.0.2: icmp_seq=3 ttl=64 time=0.029 ms
> 64 bytes from 10.32.0.2: icmp_seq=4 ttl=64 time=0.031 ms
>
>  And why is it blocking?
>
> I am on a single host environement, so agent is on 127.0.0.1.
>
> Olivier
>> On Tue, Aug 23, 2016 at 9:05 AM, Olivier Sallou <olivier.sal...@irisa.fr>
>> wrote:
>>
>>> HI,
>>>
>>> I have setup Mesos 1.0.0-2 to use CNI with Weave (1.6.1)
>>>
>>> Weave works nicely with the Docker containerizer.
>>>
>>> When I try to launch a task via my framework with unified containerizer,
>>> the job remains waiting forever (no RUNNING message). I can see however
>>> that weave cni allocated an IP address to Mesos.
>>>
>>> I tried with a simple mesos-execute test.
>>>
>>> Example with a mesos-execute with no CNI, everything is OK
>>>
>>>
>>> osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo mesos-execute
>>> --command="sleep 2" -docker_image=centos:latest --master=127.0.0.1:5050
>>> --name=test0  I0823 17:56:50.067520 28815 scheduler.cpp:172] Version: 1.0.0
>>> I0823 17:56:50.068260 28822 scheduler.cpp:461] New master detected at
>>> master@127.0.0.1:5050
>>> Subscribed with ID 'b1925e13-76db-4225-a3dc-39ce65c79b3c-0005'
>>> Submitted task 'test0' to agent 'b1925e13-76db-4225-a3dc-39ce65c79b3c-S0'
>>> Received status update TASK_RUNNING for task 'test0'
>>>   source: SOURCE_EXECUTOR
>>> Received status update TASK_FINISHED for task 'test0'
>>>   message: 'Command exited with status 0'
>>>
>>>
>>> Sample example specifying the weave network
>>>
>>>
>>> osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo mesos-execute
>>> --command="sleep 2" -docker_image=centos:latest --master=127.0.0.1:5050
>>> --name=test0   --networks=weave
>>> I0823 17:57:15.845304 28856 scheduler.cpp:172] Version: 1.0.0
>>> I0823 17:57:15.846248 28857 scheduler.cpp:461] New master detected at
>>> master@127.0.0.1:5050
>>> Subscribed with ID 'b1925e13-76db-4225-a3dc-39ce65c79b3c-0006'
>>> Submitted task 'test0' to agent 'b1925e13-76db-4225-a3dc-39ce65c79b3c-S0'
>>> ==> REMAINS WAITING HERE, job is in STAGING in Mesos UI
>>>
>>> mesos-slave logs:
>>>
>>> I0823 17:57:15.873872 26522 cni.cpp:716] Bind mounted
>>> '/proc/28869/ns/net' to
>>> '/run/mesos/isolators/network/cni/4f91a5df-2e9a-4cfc-93f5-aa197646db09/ns'
>>> for container 4f91a5df-2e9a-4cfc-93f5-aa197646db09
>>> I0823 17:57:16.257063 26519 cni.cpp:1109] Got assigned IPv4 address
>>> '10.32.0.1/12' from CNI network 'weave' for container
>>> 4f91a5df-2e9a-4cfc-93f5-aa197646db09
>>> I0823 17:57:16.257258 26525 cni.cpp:838] Unable to find DNS nameservers
>>> for container 4f91a5df-2e9a-4cfc-93f5-aa197646db09. Using host
>>> '/etc/resolv.conf'
>>>
>>> There are no other logs until I kill the job.
>>> We can see that Mesos container got an IP but it seems to block on DNS,
>>>
>>> Thanks for hints
>>>
>>> --
>>>
>>> gpg key id: 4096R/326D8438  (keyring.debian.org)
>>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
>>>
>>>
>>>

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: Fail to get CNI with unified containerizer, job remains stuck on staging

2016-08-24 Thread Olivier Sallou



On 08/24/2016 08:23 AM, Olivier Sallou wrote:
>
> On 08/23/2016 06:13 PM, Jie Yu wrote:
>> The DNS related logging means that the weave plugin does not return DNS
>> information, the agent uses the host resolv.conf for the container. So I
>> think is irrelevant to your problem.
>>
>> Mesos requires that executor can talk to agent. Can you see if there is a
>> route from 10.32.0.1 to the agent IP?
> How can I check this as task does not start ? I have exposed weave
> network on host:
>
> osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo ./weave expose
> 10.32.0.2
> osallou@tifenn~/Development/NOSAVE/go-docker/weave $ ping 10.32.0.2
> PING 10.32.0.2 (10.32.0.2) 56(84) bytes of data.
> 64 bytes from 10.32.0.2: icmp_seq=1 ttl=64 time=0.032 ms
> 64 bytes from 10.32.0.2: icmp_seq=2 ttl=64 time=0.029 ms
> 64 bytes from 10.32.0.2: icmp_seq=3 ttl=64 time=0.029 ms
> 64 bytes from 10.32.0.2: icmp_seq=4 ttl=64 time=0.031 ms
>
>  And why is it blocking?
>
> I am on a single host environement, so agent is on 127.0.0.1.
By the way, running a Docker container that uses the weave CNI plugin works
fine: it gets its IP and the container runs nicely.
>
> Olivier
>> On Tue, Aug 23, 2016 at 9:05 AM, Olivier Sallou <olivier.sal...@irisa.fr>
>> wrote:
>>
>>> HI,
>>>
>>> I have setup Mesos 1.0.0-2 to use CNI with Weave (1.6.1)
>>>
>>> Weave works nicely with the Docker containerizer.
>>>
>>> When I try to launch a task via my framework with unified containerizer,
>>> the job remains waiting forever (no RUNNING message). I can see however
>>> that weave cni allocated an IP address to Mesos.
>>>
>>> I tried with a simple mesos-execute test.
>>>
>>> Example with a mesos-execute with no CNI, everything is OK
>>>
>>>
>>> osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo mesos-execute
>>> --command="sleep 2" -docker_image=centos:latest --master=127.0.0.1:5050
>>> --name=test0  I0823 17:56:50.067520 28815 scheduler.cpp:172] Version: 1.0.0
>>> I0823 17:56:50.068260 28822 scheduler.cpp:461] New master detected at
>>> master@127.0.0.1:5050
>>> Subscribed with ID 'b1925e13-76db-4225-a3dc-39ce65c79b3c-0005'
>>> Submitted task 'test0' to agent 'b1925e13-76db-4225-a3dc-39ce65c79b3c-S0'
>>> Received status update TASK_RUNNING for task 'test0'
>>>   source: SOURCE_EXECUTOR
>>> Received status update TASK_FINISHED for task 'test0'
>>>   message: 'Command exited with status 0'
>>>
>>>
>>> Sample example specifying the weave network
>>>
>>>
>>> osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo mesos-execute
>>> --command="sleep 2" -docker_image=centos:latest --master=127.0.0.1:5050
>>> --name=test0   --networks=weave
>>> I0823 17:57:15.845304 28856 scheduler.cpp:172] Version: 1.0.0
>>> I0823 17:57:15.846248 28857 scheduler.cpp:461] New master detected at
>>> master@127.0.0.1:5050
>>> Subscribed with ID 'b1925e13-76db-4225-a3dc-39ce65c79b3c-0006'
>>> Submitted task 'test0' to agent 'b1925e13-76db-4225-a3dc-39ce65c79b3c-S0'
>>> ==> REMAINS WAITING HERE, job is in STAGING in Mesos UI
>>>
>>> mesos-slave logs:
>>>
>>> I0823 17:57:15.873872 26522 cni.cpp:716] Bind mounted
>>> '/proc/28869/ns/net' to
>>> '/run/mesos/isolators/network/cni/4f91a5df-2e9a-4cfc-93f5-aa197646db09/ns'
>>> for container 4f91a5df-2e9a-4cfc-93f5-aa197646db09
>>> I0823 17:57:16.257063 26519 cni.cpp:1109] Got assigned IPv4 address
>>> '10.32.0.1/12' from CNI network 'weave' for container
>>> 4f91a5df-2e9a-4cfc-93f5-aa197646db09
>>> I0823 17:57:16.257258 26525 cni.cpp:838] Unable to find DNS nameservers
>>> for container 4f91a5df-2e9a-4cfc-93f5-aa197646db09. Using host
>>> '/etc/resolv.conf'
>>>
>>> There are no other logs until I kill the job.
>>> We can see that Mesos container got an IP but it seems to block on DNS,
>>>
>>> Thanks for hints
>>>
>>> --
>>>
>>> gpg key id: 4096R/326D8438  (keyring.debian.org)
>>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
>>>
>>>
>>>


Re: Fail to get CNI with unified containerizer, job remains stuck on staging

2016-08-24 Thread Olivier Sallou



On 08/23/2016 06:13 PM, Jie Yu wrote:
> The DNS related logging means that the weave plugin does not return DNS
> information, the agent uses the host resolv.conf for the container. So I
> think is irrelevant to your problem.
>
> Mesos requires that executor can talk to agent. Can you see if there is a
> route from 10.32.0.1 to the agent IP?
How can I check this, as the task does not start? I have exposed the weave
network on the host:

osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo ./weave expose
10.32.0.2
osallou@tifenn~/Development/NOSAVE/go-docker/weave $ ping 10.32.0.2
PING 10.32.0.2 (10.32.0.2) 56(84) bytes of data.
64 bytes from 10.32.0.2: icmp_seq=1 ttl=64 time=0.032 ms
64 bytes from 10.32.0.2: icmp_seq=2 ttl=64 time=0.029 ms
64 bytes from 10.32.0.2: icmp_seq=3 ttl=64 time=0.029 ms
64 bytes from 10.32.0.2: icmp_seq=4 ttl=64 time=0.031 ms

 And why is it blocking?

I am on a single-host environment, so the agent is on 127.0.0.1.

Olivier
>
> On Tue, Aug 23, 2016 at 9:05 AM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
>
>> HI,
>>
>> I have setup Mesos 1.0.0-2 to use CNI with Weave (1.6.1)
>>
>> Weave works nicely with the Docker containerizer.
>>
>> When I try to launch a task via my framework with unified containerizer,
>> the job remains waiting forever (no RUNNING message). I can see however
>> that weave cni allocated an IP address to Mesos.
>>
>> I tried with a simple mesos-execute test.
>>
>> Example with a mesos-execute with no CNI, everything is OK
>>
>>
>> osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo mesos-execute
>> --command="sleep 2" -docker_image=centos:latest --master=127.0.0.1:5050
>> --name=test0  I0823 17:56:50.067520 28815 scheduler.cpp:172] Version: 1.0.0
>> I0823 17:56:50.068260 28822 scheduler.cpp:461] New master detected at
>> master@127.0.0.1:5050
>> Subscribed with ID 'b1925e13-76db-4225-a3dc-39ce65c79b3c-0005'
>> Submitted task 'test0' to agent 'b1925e13-76db-4225-a3dc-39ce65c79b3c-S0'
>> Received status update TASK_RUNNING for task 'test0'
>>   source: SOURCE_EXECUTOR
>> Received status update TASK_FINISHED for task 'test0'
>>   message: 'Command exited with status 0'
>>
>>
>> Sample example specifying the weave network
>>
>>
>> osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo mesos-execute
>> --command="sleep 2" -docker_image=centos:latest --master=127.0.0.1:5050
>> --name=test0   --networks=weave
>> I0823 17:57:15.845304 28856 scheduler.cpp:172] Version: 1.0.0
>> I0823 17:57:15.846248 28857 scheduler.cpp:461] New master detected at
>> master@127.0.0.1:5050
>> Subscribed with ID 'b1925e13-76db-4225-a3dc-39ce65c79b3c-0006'
>> Submitted task 'test0' to agent 'b1925e13-76db-4225-a3dc-39ce65c79b3c-S0'
>> ==> REMAINS WAITING HERE, job is in STAGING in Mesos UI
>>
>> mesos-slave logs:
>>
>> I0823 17:57:15.873872 26522 cni.cpp:716] Bind mounted
>> '/proc/28869/ns/net' to
>> '/run/mesos/isolators/network/cni/4f91a5df-2e9a-4cfc-93f5-aa197646db09/ns'
>> for container 4f91a5df-2e9a-4cfc-93f5-aa197646db09
>> I0823 17:57:16.257063 26519 cni.cpp:1109] Got assigned IPv4 address
>> '10.32.0.1/12' from CNI network 'weave' for container
>> 4f91a5df-2e9a-4cfc-93f5-aa197646db09
>> I0823 17:57:16.257258 26525 cni.cpp:838] Unable to find DNS nameservers
>> for container 4f91a5df-2e9a-4cfc-93f5-aa197646db09. Using host
>> '/etc/resolv.conf'
>>
>> There are no other logs until I kill the job.
>> We can see that Mesos container got an IP but it seems to block on DNS,
>>
>> Thanks for hints
>>
>> --
>>
>> gpg key id: 4096R/326D8438  (keyring.debian.org)
>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
>>
>>
>>


Re: mesos per task monitoring/metrics

2016-08-24 Thread Olivier Sallou



On 08/23/2016 09:37 PM, Benjamin Mahler wrote:
> +jie
>
> Hi Olivier,
>
> Could you tell us what you're trying to do at a high level?
>
> I'm not familiar with cAdvisor, are you trying to generate a link to the
> cAdvisor page for a particular container?
On the web interface of my app, I want to show the real-time CPU/memory
usage of the job. To do so, I indeed "link" to the cAdvisor job page.
The cAdvisor API is a URL that contains the container ID, so I need to know
the container ID.

I can send a request to the slave to get it, but this is not really
efficient; it would be best to get the container ID in the TaskStatus message.

In mesos.proto there is a ContainerStatus in the TaskStatus, but it
carries network/cgroup-related info, not the container ID. It would be
nice to get it there.

When we use the Docker containerizer, we have the container ID info in
the TaskStatus data parameter.
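
For reference, a rough sketch of the slave-side lookup mentioned above. It
assumes the agent state JSON has a `frameworks[].executors[]` layout in
which each executor carries an `id` and a `container` field; that layout is
an assumption to verify against your Mesos version. The function name and
the framework ID in the sample are hypothetical:

```python
from typing import Optional


def find_container_id(state: dict, framework_id: str,
                      executor_id: str) -> Optional[str]:
    """Look up a container ID in an already-decoded agent state document.

    Assumes (unverified) the layout frameworks[].executors[] with
    "id" and "container" keys on each executor.
    """
    for framework in state.get("frameworks", []):
        if framework.get("id") != framework_id:
            continue
        for executor in framework.get("executors", []):
            if executor.get("id") == executor_id:
                return executor.get("container")
    return None


if __name__ == "__main__":
    # Hand-written sample, not real agent output.
    sample_state = {
        "frameworks": [{
            "id": "example-framework",
            "executors": [{
                "id": "274",
                "container": "966e0b09-f38e-497c-afb8-0133d8fb48b1",
            }],
        }]
    }
    print(find_container_id(sample_state, "example-framework", "274"))
```

This still costs one request to the slave per lookup, which is the
overhead I would like to avoid by having the ID in TaskStatus.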

Olivier
>
> Ben
>
> On Tue, Aug 23, 2016 at 7:53 AM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
>
>>
>> On 08/23/2016 04:12 PM, haosdent wrote:
>>> Hi, @Olivier You could get the containerId from the state endpoint of
>> Mesos
>>> Agent. http://mesos.apache.org/documentation/latest/
>> endpoints/slave/state/
>> Yes, I saw that, but I expected to get it from the TaskStatus message on
>> RUNNING state change.
>>
>> With the Docker containerizer, we could get the container id in the data
>> parameter.
>>
>> Triggering the slave on each task to get its container id is a little
>> tricky and "expensive".
>>
>> Olivier
>>> On Tue, Aug 23, 2016 at 3:50 PM, Olivier Sallou <olivier.sal...@irisa.fr
>>>
>>> wrote:
>>>
>>>> One more question though. Using cgroups isolation, I can see mesos
>>>> container in cAdvisor under, for example:
>>>>
>>>> /mesos/966e0b09-f38e-497c-afb8-0133d8fb48b1
>>>>
>>>>
>>>> but where can I get the container Id
>>>> 966e0b09-f38e-497c-afb8-0133d8fb48b1 from TaskStatus ?
>>>>
>>>>
>>>> I can see in Mesos UI the job details for a URL like:
>>>>
>>>>
>>>>  var / lib / mesos / slaves / b1925e13-76db-4225-a3dc-39ce65c79b3c-S0 /
>>>> frameworks / b1925e13-76db-4225-a3dc-39ce65c79b3c- / executors  /
>>>> 274 / runs / 966e0b09-f38e-497c-afb8-0133d8fb48b1
>>>>
>>>>
>>>> I can know/find all parameters but this last one.
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>> Olivier
>>>>
>>>>
>>>> On 08/23/2016 09:30 AM, Olivier Sallou wrote:
>>>>> ok,
>>>>>
>>>>> activating isolation with cgroups ni slave config activates detailled
>>>> stats.
>>>>> On 08/23/2016 09:23 AM, Olivier Sallou wrote:
>>>>>> Hi,
>>>>>>
>>>>>> when switching to docker containerizer to unified containerizer, I
>> lost
>>>>>> the capacity to monitor task metrics (used cpu, used mem, ...) from
>>>>>> cAdvisor.
>>>>>>
>>>>>> I tried to get stats from /monitor/statistics.json but I do not have
>> any
>>>>>> "live" metrics:
>>>>>>
>>>>>> [{"executor_id":"271","executor_name":"Command Executor (Task: 271)
>>>>>> (Command: sh -c
>>>>>> '\/mnt\/go-dock...')","framework_id":"b1925e13-76db-
>>>> 4225-a3dc-39ce65c79b3c-","source":"271","statistics":{"
>>>> cpus_limit":1.1,"mem_limit_bytes":2130706432,"timestamp":
>>>> 1471936602.26916}}]
>>>>>> I only see reserved metrics.
>>>>>>
>>>>>>
>>>>>> Is there any specific config to get "live" monitoring.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Olivier
>>>>>>
>>>> --
>>>>
>>>> gpg key id: 4096R/326D8438  (keyring.debian.org)
>>>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
>>>>
>>>>
>> --
>> Olivier Sallou
>> IRISA / University of Rennes 1
>> Campus de Beaulieu, 35000 RENNES - FRANCE
>> Tel: 02.99.84.71.95
>>
>> gpg key id: 4096R/326D8438  (keyring.debian.org)
>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
>>
>>


Fail to get CNI with unified containerizer, job remains stuck on staging

2016-08-23 Thread Olivier Sallou

HI,

I have setup Mesos 1.0.0-2 to use CNI with Weave (1.6.1)

Weave works nicely with the Docker containerizer.

When I try to launch a task via my framework with unified containerizer,
the job remains waiting forever (no RUNNING message). I can see however
that weave cni allocated an IP address to Mesos.

I tried with a simple mesos-execute test.

Example: mesos-execute with no CNI, everything is OK


osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo mesos-execute
--command="sleep 2" -docker_image=centos:latest --master=127.0.0.1:5050
--name=test0
I0823 17:56:50.067520 28815 scheduler.cpp:172] Version: 1.0.0
I0823 17:56:50.068260 28822 scheduler.cpp:461] New master detected at
master@127.0.0.1:5050
Subscribed with ID 'b1925e13-76db-4225-a3dc-39ce65c79b3c-0005'
Submitted task 'test0' to agent 'b1925e13-76db-4225-a3dc-39ce65c79b3c-S0'
Received status update TASK_RUNNING for task 'test0'
  source: SOURCE_EXECUTOR
Received status update TASK_FINISHED for task 'test0'
  message: 'Command exited with status 0'


Example specifying the weave network


osallou@tifenn~/Development/NOSAVE/go-docker/weave $ sudo mesos-execute
--command="sleep 2" -docker_image=centos:latest --master=127.0.0.1:5050
--name=test0   --networks=weave
I0823 17:57:15.845304 28856 scheduler.cpp:172] Version: 1.0.0
I0823 17:57:15.846248 28857 scheduler.cpp:461] New master detected at
master@127.0.0.1:5050
Subscribed with ID 'b1925e13-76db-4225-a3dc-39ce65c79b3c-0006'
Submitted task 'test0' to agent 'b1925e13-76db-4225-a3dc-39ce65c79b3c-S0'
==> REMAINS WAITING HERE, job is in STAGING in Mesos UI

mesos-slave logs:

I0823 17:57:15.873872 26522 cni.cpp:716] Bind mounted
'/proc/28869/ns/net' to
'/run/mesos/isolators/network/cni/4f91a5df-2e9a-4cfc-93f5-aa197646db09/ns'
for container 4f91a5df-2e9a-4cfc-93f5-aa197646db09
I0823 17:57:16.257063 26519 cni.cpp:1109] Got assigned IPv4 address
'10.32.0.1/12' from CNI network 'weave' for container
4f91a5df-2e9a-4cfc-93f5-aa197646db09
I0823 17:57:16.257258 26525 cni.cpp:838] Unable to find DNS nameservers
for container 4f91a5df-2e9a-4cfc-93f5-aa197646db09. Using host
'/etc/resolv.conf'

There are no other logs until I kill the job.
We can see that the Mesos container got an IP, but it seems to block on DNS.

Thanks for any hints

-- 

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: mesos per task monitoring/metrics

2016-08-23 Thread Olivier Sallou



On 08/23/2016 04:12 PM, haosdent wrote:
> Hi, @Olivier You could get the containerId from the state endpoint of Mesos
> Agent. http://mesos.apache.org/documentation/latest/endpoints/slave/state/
Yes, I saw that, but I expected to get it from the TaskStatus message on
RUNNING state change.

With the Docker containerizer, we could get the container id in the data
parameter.

Querying the slave for each task to get its container ID is a little
tricky and "expensive".

Olivier
>
> On Tue, Aug 23, 2016 at 3:50 PM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
>
>> One more question though. Using cgroups isolation, I can see mesos
>> container in cAdvisor under, for example:
>>
>> /mesos/966e0b09-f38e-497c-afb8-0133d8fb48b1
>>
>>
>> but where can I get the container Id
>> 966e0b09-f38e-497c-afb8-0133d8fb48b1 from TaskStatus ?
>>
>>
>> I can see in Mesos UI the job details for a URL like:
>>
>>
>>  var / lib / mesos / slaves / b1925e13-76db-4225-a3dc-39ce65c79b3c-S0 /
>> frameworks / b1925e13-76db-4225-a3dc-39ce65c79b3c- / executors  /
>> 274 / runs / 966e0b09-f38e-497c-afb8-0133d8fb48b1
>>
>>
>> I can know/find all parameters but this last one.
>>
>>
>> Thanks
>>
>>
>> Olivier
>>
>>
>> On 08/23/2016 09:30 AM, Olivier Sallou wrote:
>>> ok,
>>>
>>> activating isolation with cgroups ni slave config activates detailled
>> stats.
>>>
>>> On 08/23/2016 09:23 AM, Olivier Sallou wrote:
>>>> Hi,
>>>>
>>>> when switching to docker containerizer to unified containerizer, I lost
>>>> the capacity to monitor task metrics (used cpu, used mem, ...) from
>>>> cAdvisor.
>>>>
>>>> I tried to get stats from /monitor/statistics.json but I do not have any
>>>> "live" metrics:
>>>>
>>>> [{"executor_id":"271","executor_name":"Command Executor (Task: 271)
>>>> (Command: sh -c
>>>> '\/mnt\/go-dock...')","framework_id":"b1925e13-76db-
>> 4225-a3dc-39ce65c79b3c-","source":"271","statistics":{"
>> cpus_limit":1.1,"mem_limit_bytes":2130706432,"timestamp":
>> 1471936602.26916}}]
>>>> I only see reserved metrics.
>>>>
>>>>
>>>> Is there any specific config to get "live" monitoring.
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Olivier
>>>>
>> --
>>
>> gpg key id: 4096R/326D8438  (keyring.debian.org)
>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
>>
>>
>


Re: How to select Containerizer? Mesos containerizer or Docker?

2016-08-23 Thread Olivier Sallou



On 08/23/2016 11:36 AM, Yu Wei wrote:
> Hi,
>
>
> Which containerizer should be used? Mesos, Docker or other?
>
> Is there any principles to help making decision?
With the unified containerizer, Mesos is pushing the Mesos containerizer.
However, it does not support port mapping for the moment. So if you need
port mapping, you should use the Docker one.

Olivier
>
>
> Thanks,
>
>
> Jared, (??)
> Software developer
> Interested in open source software, big data, Linux
>

-- 


gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: mesos per task monitoring/metrics

2016-08-23 Thread Olivier Sallou

One more question though. Using cgroups isolation, I can see mesos
container in cAdvisor under, for example:

/mesos/966e0b09-f38e-497c-afb8-0133d8fb48b1


but how can I get the container ID
966e0b09-f38e-497c-afb8-0133d8fb48b1 from TaskStatus?


I can see in Mesos UI the job details for a URL like:


 var / lib / mesos / slaves / b1925e13-76db-4225-a3dc-39ce65c79b3c-S0 /
frameworks / b1925e13-76db-4225-a3dc-39ce65c79b3c- / executors  /
274 / runs / 966e0b09-f38e-497c-afb8-0133d8fb48b1


I can find all of these parameters except the last one.


Thanks


Olivier


On 08/23/2016 09:30 AM, Olivier Sallou wrote:
> ok,
>
> activating isolation with cgroups ni slave config activates detailled stats.
>
>
> On 08/23/2016 09:23 AM, Olivier Sallou wrote:
>> Hi,
>>
>> when switching to docker containerizer to unified containerizer, I lost
>> the capacity to monitor task metrics (used cpu, used mem, ...) from
>> cAdvisor.
>>
>> I tried to get stats from /monitor/statistics.json but I do not have any
>> "live" metrics:
>>
>> [{"executor_id":"271","executor_name":"Command Executor (Task: 271)
>> (Command: sh -c
>> '\/mnt\/go-dock...')","framework_id":"b1925e13-76db-4225-a3dc-39ce65c79b3c-","source":"271","statistics":{"cpus_limit":1.1,"mem_limit_bytes":2130706432,"timestamp":1471936602.26916}}]
>>
>> I only see reserved metrics.
>>
>>
>> Is there any specific config to get "live" monitoring.
>>
>>
>> Thanks
>>
>> Olivier
>>

-- 

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: mesos per task monitoring/metrics

2016-08-23 Thread Olivier Sallou

ok,

activating isolation with cgroups in the slave config enables detailed stats.
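
A quick way to tell the two cases apart, as a sketch. It assumes that
"live" entries contain usage fields such as `cpus_user_time_secs`,
`cpus_system_time_secs` or `mem_rss_bytes`; that key list is my assumption
and may need adjusting. The helper name is hypothetical:

```python
import json

# Assumed names of usage fields that only appear when isolation reports
# real consumption; adjust to what your agent actually returns.
USAGE_KEYS = ("cpus_user_time_secs", "cpus_system_time_secs", "mem_rss_bytes")


def has_live_metrics(entry: dict) -> bool:
    """Return True if a statistics entry carries any usage field."""
    stats = entry.get("statistics", {})
    return any(key in stats for key in USAGE_KEYS)


if __name__ == "__main__":
    # Shortened version of the limits-only output from the first message.
    limits_only = json.loads(
        '[{"executor_id": "271", "source": "271", "statistics": '
        '{"cpus_limit": 1.1, "mem_limit_bytes": 2130706432, '
        '"timestamp": 1471936602.26916}}]'
    )
    print([has_live_metrics(entry) for entry in limits_only])
```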


On 08/23/2016 09:23 AM, Olivier Sallou wrote:
> Hi,
>
> when switching to docker containerizer to unified containerizer, I lost
> the capacity to monitor task metrics (used cpu, used mem, ...) from
> cAdvisor.
>
> I tried to get stats from /monitor/statistics.json but I do not have any
> "live" metrics:
>
> [{"executor_id":"271","executor_name":"Command Executor (Task: 271)
> (Command: sh -c
> '\/mnt\/go-dock...')","framework_id":"b1925e13-76db-4225-a3dc-39ce65c79b3c-","source":"271","statistics":{"cpus_limit":1.1,"mem_limit_bytes":2130706432,"timestamp":1471936602.26916}}]
>
> I only see reserved metrics.
>
>
> Is there any specific config to get "live" monitoring.
>
>
> Thanks
>
> Olivier
>


mesos per task monitoring/metrics

2016-08-23 Thread Olivier Sallou

Hi,

when switching from the Docker containerizer to the unified containerizer, I lost
the ability to monitor task metrics (used cpu, used mem, ...) from
cAdvisor.

I tried to get stats from /monitor/statistics.json but I do not have any
"live" metrics:

[{"executor_id":"271","executor_name":"Command Executor (Task: 271)
(Command: sh -c
'\/mnt\/go-dock...')","framework_id":"b1925e13-76db-4225-a3dc-39ce65c79b3c-","source":"271","statistics":{"cpus_limit":1.1,"mem_limit_bytes":2130706432,"timestamp":1471936602.26916}}]

I only see reserved metrics.


Is there any specific config to get "live" monitoring?


Thanks

Olivier

-- 

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: error when using dockerinfo user network

2016-08-19 Thread Olivier Sallou

Answering my own question ;-)

I use a virtualenv, and the mesos package now installs the Python libs
system-wide. My virtualenv still had the old Mesos Python libs.
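
In case it helps someone else, a small sketch for checking which copy of a
module the interpreter would actually load. It is written for Python 3; the
module name `mesos` is an assumption about how the bindings are packaged,
and the helper names are hypothetical:

```python
import importlib.util
import sys


def module_origin(name: str):
    """Return where a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def in_virtualenv() -> bool:
    """True when the interpreter prefix differs from the base install."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    # "mesos" is an assumed module name; replace it with the one you import.
    print("mesos loaded from:", module_origin("mesos"))
```

If the reported path is inside the virtualenv while the package installed
its libs system-wide, the stale copy is the one being imported.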

- Original Message -
> From: "Olivier Sallou" <olivier.sal...@irisa.fr>
> To: dev@mesos.apache.org
> Sent: Friday 19 August 2016 09:10:52
> Subject: error when using dockerinfo user network
> 
> HI,
> 
> I just upgraded to mesos 1.0 (package 1.0.0-2.0.89.ubuntu1510).
> 
> I tried to setup with Docker containerizer the use of a user defined
> network (via docker cni plugin), using python binding.
> 
> 
> I face an error:
> 
> "Unknown enum value: 4"  when setting DockerInfo network value to 4.
> 
> 
> I was previously setting to value 2 (bridge) and it works.
> 
> I can see in mesos.proto:
> 
> enum Network {
>   HOST = 1;
>   BRIDGE = 2;
>   NONE = 3;
>   USER = 4;
> }
> 
> 
> so value 4 should be ok.
> 
> Any hint?
> 
> Thanks
> 
> Olivier
> 
> --
> Olivier Sallou
> IRISA / University of Rennes 1
> Campus de Beaulieu, 35000 RENNES - FRANCE
> Tel: 02.99.84.71.95
> 
> gpg key id: 4096R/326D8438  (keyring.debian.org)
> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
> 
> 
>

error when using dockerinfo user network

2016-08-19 Thread Olivier Sallou

HI,

I just upgraded to mesos 1.0 (package 1.0.0-2.0.89.ubuntu1510).

I tried to set up, with the Docker containerizer, the use of a user-defined
network (via docker cni plugin), using the Python bindings.


I face an error:

"Unknown enum value: 4"  when setting DockerInfo network value to 4.


I was previously setting it to value 2 (bridge), and that works.

I can see in mesos.proto:

enum Network {
  HOST = 1;
  BRIDGE = 2;
  NONE = 3;
  USER = 4;
}


so value 4 should be ok.

Any hint?

Thanks

Olivier


Re: cni / public port questions

2016-07-29 Thread Olivier Sallou



- Original Message -
> From: "Jie Yu" <yujie@gmail.com>
> To: "dev" <dev@mesos.apache.org>
> Cc: "Qian AZ Zhang" <zhang...@cn.ibm.com>, "Avinash Sridharan"
> <avin...@mesosphere.io>
> Sent: Thursday 28 July 2016 18:41:33
> Subject: Re: cni / public port questions
> 
> you can still use bridge with CNI (you'll need to use the built-in bridge
> plugin of CNI).
> 
> Port mapping is still under development. Expecting this coming soon.

Yes, I had seen that feature in JIRA, but I was wondering whether there were
other solutions in the meantime. As my containers need to expose some ports
publicly, port mapping is needed for bridge mode. So either I keep my existing
Docker containerizer with the Docker bridge, or I switch to the unified
containerizer with CNI and port management (more complex to set up and more
complex for the framework to manage).

I would have liked not to force my framework users to use a CNI tool when
switching my code to the unified containerizer. This would complicate code
upgrades (it impacts the Mesos install, even for a simple bridge CNI).

This means that frameworks willing to switch to the unified containerizer need
to continue to support the Docker containerizer for existing installations (we
can't force a Mesos admin to switch to CNI just for a framework).

Thanks

Olivier

> 
> - Jie
> 
> On Thu, Jul 28, 2016 at 2:44 AM, haosdent <haosd...@gmail.com> wrote:
> 
> > Hi, @Olivier. The port forwarding of mesos is still under implementing. You
> > could subscribe https://issues.apache.org/jira/browse/MESOS-4823 to track
> > the progress.
> >
> > On Thu, Jul 28, 2016 at 4:42 PM, Olivier Sallou <olivier.sal...@irisa.fr>
> > wrote:
> >
> > > Hi,
> > > I am looking at using unified containerizer. As it only support host
> > mode,
> > > it needs cni.
> > > However, it is not really clear for me regarding "public" ports.
> > >
> > > If I have a container that needs to expose a port (let's say port 123),
> > > can I expose it via the Mesos API only?
> > >
> > > When I use cni, as I understood, I allocate an IP per container. If IP is
> > > routable in network, are all ports reachable (from any host / other
> > > container) ? Or should it be explicitly opened ?
> > >
> > > To be simple, can I launch a container that would expose to public (any
> > > host) only port 123 and other ports reachable only but containers in same
> > > "private network" :
> > >
> > > - container 1 expose public port 123 and private port 456 (accessible by
> > > container 2 only)
> > > - container 2 connects to container 1 port 456.
> > >
> > > For the moment, I am using the Docker containerizer with bridge mode, so
> > > exposing port was simply a matter of mapping ports. Private networks are
> > > managed by user networks of Docker.
> > >
> > >
> > > Thanks
> > >
> > > Olivier
> > >
> > >
> >
> >
> > --
> > Best Regards,
> > Haosdent Huang
> >
>

Re: [Mesos 2.0] Let's talk about the future

2016-07-29 Thread Olivier Sallou

- Original Message -

> From: "Jay JN Guo" 
> To: "user" , "mesos" 
> Sent: Friday, 29 July 2016 09:13:20
> Subject: [Mesos 2.0] Let's talk about the future

> Hi,

> As we are all excited about release 1.0.0, it's never too early to talk about
> next big thing: Mesos 2.0.0. What major things should be done next?

> I believe there are still many features you desire in Mesos and some of them
> are already under development. I'd like to collect your minds and align the
> vision in this mail thread. For example, here are items on Mesos long term
> roadmap:

> Pluggable Fetcher
> Oversubscription for reservation: Optimistic offers
> Resource Revocation
> Pod support
> Quota chunks
> Multiple-role support for frameworks
> User namespace support
What features do you expect from this? Is it running a task/container as a 
different user on a per-container basis (root in the container but seen as 
user X on the host)? (This is expected in Docker in the future; it seems it 
also needs Linux kernel updates.) 

> Event bus
> First class resources (Cpu topology info, GPU topology info, disk speed, etc)
There was a fairly recent proposal about location awareness (rack etc.), which 
also looks interesting. 

> Deprecate Docker containerizer (in favor of Unified containerizer w/ Docker
> support)
While this is long term (let's give people time to switch to unified ;-) ), 
deprecation of the Docker containerizer should come with support for port 
mapping over a bridge, equivalent to what Docker's bridge network mode 
currently offers. I know this feature is tracked in JIRA, but without it, 
I think that you cannot drop the Docker containerizer. CNI plugins on Mesos are 
important (IP per container), but should not be mandatory (more complex to 
install/set up than pure Mesos). Indeed, CNI integration is not complete with 
Mesos or other frameworks (you do not fully manage the ports of Calico etc. via 
Mesos; basically you only ask for an IP for your container, and all port rules 
are managed directly via the tool), and the current Docker bridge/user mode 
with Mesos is far easier to set up and use. 

Olivier 

> I would appreciate it if you could either share your ideas or vote on these
> items, and we will discuss it in next community sync.

> We may not have an unshakeable conclusion as container technology is evolving
> at an ever faster pace, but the whole community, especially newbies like
> myself, would profoundly benefit from a clear plan and priority for next 3-6
> months.

> Cheers,
> /Jay

cni / public port questions

2016-07-28 Thread Olivier Sallou

Hi, 
I am looking at using the unified containerizer. As it only supports host mode, 
it needs CNI. 
However, it is not really clear to me how "public" ports are handled. 

If I have a container that needs to expose a port (let's say port 123), can I 
expose it via the Mesos API only? 

When I use CNI, as I understand it, I allocate an IP per container. If the IP 
is routable in the network, are all ports reachable (from any host / other 
container)? Or should they be explicitly opened? 

Put simply, can I launch a container that would expose only port 123 to the 
public (any host), with other ports reachable only by containers in the same 
"private network": 

- container 1 exposes public port 123 and private port 456 (accessible by 
container 2 only) 
- container 2 connects to container 1 port 456. 

For the moment, I am using the Docker containerizer with bridge mode, so 
exposing a port was simply a matter of mapping ports. Private networks are 
managed by Docker user networks. 


Thanks 

Olivier

Re: failed to start mesos-slave

2016-06-10 Thread Olivier Sallou



On 06/10/2016 11:43 AM, Neil Conway wrote:
> Hi Olivier,
>
> You might be running into
> https://issues.apache.org/jira/browse/MESOS-2986 . Note that Mesos
> 0.22 is quite old and is no longer supported.
Certainly, but upgrading Mesos in production is not a daily task.

Upgrading to 0.22.2-0.2.62 seems to fix the issue. Thanks.
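
The error above says the detected version is below 1.0.0 even though `docker -v` reports 1.11.2, which suggests the version string was not interpreted as expected. A minimal sketch of parsing that output and comparing it numerically follows. `parse_docker_version` is a hypothetical helper written for this illustration; it is not how Mesos itself performs the check.

```python
import re


def parse_docker_version(output):
    """Extract (major, minor, patch) from `docker -v` style output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no version found in: %r" % output)
    return tuple(int(part) for part in match.groups())


version = parse_docker_version("Docker version 1.11.2, build b9f10c9")
# Tuples of integers compare element by element, so 11 is greater than 2;
# comparing the strings "1.11.2" and "1.2.0" would sort the wrong way round.
print(version >= (1, 0, 0))
```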

>
> Neil
>
>
> On Fri, Jun 10, 2016 at 11:37 AM, Olivier Sallou
> <olivier.sal...@irisa.fr> wrote:
>> Hi,
>> I upgraded docker on one of my mesos slaves (v0.22)
>>
>> Now it fails to start with error:
>>
>> Failed to create a containerizer: Could not create DockerContainerizer:
>> Insufficient version of Docker! Please upgrade to >= 1.0.0
>>
>> Though docker is 1.11:
>>
>> docker -v
>> Docker version 1.11.2, build b9f10c9
>>
>> Any idea ?
>>
>> Thanks
>>
>> Olivier
>>
>> --
>> Olivier Sallou
>> IRISA / University of Rennes 1
>> Campus de Beaulieu, 35000 RENNES - FRANCE
>> Tel: 02.99.84.71.95
>>
>> gpg key id: 4096R/326D8438  (keyring.debian.org)
>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
>>

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

failed to start mesos-slave

2016-06-10 Thread Olivier Sallou

Hi,
I upgraded docker on one of my mesos slaves (v0.22)

Now it fails to start with error:

Failed to create a containerizer: Could not create DockerContainerizer:
Insufficient version of Docker! Please upgrade to >= 1.0.0

Though docker is 1.11:

docker -v
Docker version 1.11.2, build b9f10c9

Any idea ?

Thanks

Olivier

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: how to debug HTTP API

2016-06-08 Thread Olivier Sallou



On 06/07/2016 06:29 PM, Vinod Kone wrote:
> Olivier, on a side note, it's great to see that you are playing with the
> new HTTP API in python! I briefly looked at your linked code and it looks
> like you are mixing the business logic of your application and the Mesos
> API interaction in the same file. It would be great if (at some point) you
> can extract the Mesos API interaction into a python library that can be
> used by other frameworks. See other libraries (C++
> <https://github.com/apache/mesos/blob/master/include/mesos/v1/scheduler.hpp>,
> Java <https://github.com/mesosphere/mesos-rxjava>, Go
> <https://github.com/mesos/mesos-go>) for inspiration.
I will try to do this later on. For the moment I am focusing on reproducing my
code with the HTTP API; the code is not yet clean.
It would indeed be better to separate the Mesos side from the business logic.
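
One possible shape for such a split, as a rough sketch only: a small class that knows how to build and send scheduler calls, kept apart from the application code. `SchedulerClient`, its methods, the master address and the framework id are hypothetical names for this illustration; the endpoint path is the one that appears in the master log in this thread.

```python
import json
import urllib.request


class SchedulerClient:
    """Builds and sends calls to the scheduler endpoint (illustrative only)."""

    def __init__(self, master, framework_id=None):
        self.url = "http://%s/master/api/v1/scheduler" % master
        self.framework_id = framework_id

    def build_call(self, call_type, payload):
        """Return the JSON-serialisable body for a call such as ACCEPT."""
        call = {"type": call_type, call_type.lower(): payload}
        if self.framework_id is not None:
            call["framework_id"] = {"value": self.framework_id}
        return call

    def send(self, call):
        """POST a call as JSON and return the HTTP status code."""
        request = urllib.request.Request(
            self.url,
            data=json.dumps(call).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return response.status


# Building a body needs no running master; only send() touches the network.
client = SchedulerClient("localhost:5050", framework_id="example-framework-id")
body = client.build_call("ACCEPT", {"offer_ids": [], "operations": []})
print(json.dumps(body, sort_keys=True))
```

Keeping call construction separate from sending also makes the message bodies easy to inspect or unit test without a cluster.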

Olivier
>
> On Tue, Jun 7, 2016 at 11:46 AM, Anand Mazumdar <an...@mesosphere.io> wrote:
>
>> Olivier,
>>
>> You are missing the “task_infos” key in your “ACCEPT” call. The master
>> treats “Accept” operations with no launch tasks as declining offers
>> implicitly. I would file a followup JIRA to ensure this is logged on the
>> master (if not so).
>>
>> An example correct JSON:
>> https://gist.github.com/hatred/7325d8a4afde607ecc0f376ab62d60eb <
>> https://gist.github.com/hatred/7325d8a4afde607ecc0f376ab62d60eb>
>>
>> -anand
>>
>>> On Jun 7, 2016, at 8:38 AM, Olivier Sallou <olivier.sal...@irisa.fr>
>> wrote:
>>>
>>>
>>> On 06/07/2016 04:53 PM, Guangya Liu wrote:
>>>> So how many agent nodes are there in your cluster? If you continue
>>>> receiving offer but without getting UPDATE message, then it may be
>> caused
>>>> by that your task definition and the framework continually decline
>> offer.
>>> I have only one node (master/slave), for development. It worked fine
>>> with the python API.
>>> we see on master that it received the ACCEPT, and no DECLINE. However,
>>> as I receive no UPDATE, I suppose that mesos "drops" the ACCEPT (wrong
>>> task definition maybe), and sends new offers several seconds after I
>>> sent the ACCEPT.
>>>> Can you please share your framework code here for the logic of "Event::
>>>> OFFERS"?
>>> Code is available here:
>>>
>>>
>> https://bitbucket.org/osallou/go-docker/src/b1948063fb7f68fbc77f5de6b473d832a7dd36af/plugins/mesos.py?at=master=file-view-default
>> <
>> https://bitbucket.org/osallou/go-docker/src/b1948063fb7f68fbc77f5de6b473d832a7dd36af/plugins/mesos.py?at=master=file-view-default
>>>
>>> in method run of MesosThread, line 613
>>>
>>> Code is a little complex, as it is a port of existing code using mesos
>>> python lib.
>>>
>>> Code related to HTTP is in development, so there may be further errors,
>>> but registration is fine as well as offer messages.
>>>
>>> I have added locally a debug print to show any message received by mesos
>>> (in case I would have received an other message indicating an error),
>>> but I received no other than offer and heartbeats.
>>>
>>> If Mesos see the ACCEPT message as it appears in logs, that it should
>>> either reject it (with a different status code than 202) or send an
>>> UPDATE error message if there is an error with my task definition.
>>>
>>> Olivier
>>>> Thanks,
>>>>
>>>> Guangya
>>>>
>>>> On Tue, Jun 7, 2016 at 8:29 PM, Olivier Sallou <olivier.sal...@irisa.fr
>>>> wrote:
>>>>
>>>>> On 06/07/2016 01:59 PM, Guangya Liu wrote:
>>>>>> I can see that your framework is now holding the offer, how did you
>>>>> launch
>>>>>> task?
>>>>> I execute an HTTP POST request in Python with json content-type:
>>>>>
>>>>> {'type': 'ACCEPT',
>>>>> 'framework_id': {'value':
>> u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020'},
>>>>> 'accept': {
>>>>>'operations': [
>>>>>{'type': 'LAUNCH',
>>>>>'launch': {'container': {
>>>>>'docker': {'image': u'centos:latest',
>>>>> 'force_pull_image': True, 'port_mappings': [], 'network': 2},
>>>>>'type': 1,
>>>>>'volumes': [
>>>>>{'host_path': u'/a/b', 'container_path':
>>>>> u'

Re: how to debug HTTP API

2016-06-08 Thread Olivier Sallou



On 06/07/2016 05:46 PM, Anand Mazumdar wrote:
> Olivier,
>
> You are missing the “task_infos” key in your “ACCEPT” call. The master treats 
> “Accept” operations with no launch tasks as declining offers implicitly. I 
> would file a followup JIRA to ensure this is logged on the master (if not so).
>
> An example correct JSON: 
> https://gist.github.com/hatred/7325d8a4afde607ecc0f376ab62d60eb 
> <https://gist.github.com/hatred/7325d8a4afde607ecc0f376ab62d60eb>
thanks,
example is really useful.

I suspected the task "structure" was the issue, but with no error in the
master log it was difficult to understand what was wrong. It would indeed
be good to get a master log entry about this.
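
To make the point from Anand's reply concrete, here is a sketch of where the task definition is expected to sit: in a `task_infos` list inside `launch`, rather than directly in `launch`. `wrap_launch` and `launch_has_tasks` are hypothetical helpers for this illustration, and the task fields are abbreviated from the message earlier in the thread; the linked gist remains the reference for a complete, valid body.

```python
def wrap_launch(tasks):
    """Build a LAUNCH operation with the tasks under 'task_infos'."""
    return {"type": "LAUNCH", "launch": {"task_infos": list(tasks)}}


def launch_has_tasks(operation):
    """Return True if a LAUNCH operation carries at least one task."""
    return bool(operation.get("launch", {}).get("task_infos"))


# Abbreviated task definition (fields taken from the original message).
task = {
    "name": "testr",
    "task_id": {"value": "128"},
    "command": {"value": "/mnt/go-docker/wrapper.sh"},
}

# Shape used in the original message: task fields directly under 'launch'.
original = {"type": "LAUNCH", "launch": dict(task)}
fixed = wrap_launch([task])

print(launch_has_tasks(original))  # False: there is no 'task_infos' key
print(launch_has_tasks(fixed))     # True
```

A client-side check of this kind would flag the empty launch before the call is sent, which helps because the master answers 202 in both cases.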

Thanks!

Olivier
>
> -anand
>
>> On Jun 7, 2016, at 8:38 AM, Olivier Sallou <olivier.sal...@irisa.fr> wrote:
>>
>>
>>
>> On 06/07/2016 04:53 PM, Guangya Liu wrote:
>>> So how many agent nodes are there in your cluster? If you continue
>>> receiving offer but without getting UPDATE message, then it may be caused
>>> by that your task definition and the framework continually decline offer.
>> I have only one node (master/slave), for development. It worked fine
>> with the python API.
>> we see on master that it received the ACCEPT, and no DECLINE. However,
>> as I receive no UPDATE, I suppose that mesos "drops" the ACCEPT (wrong
>> task definition maybe), and sends new offers several seconds after I
>> sent the ACCEPT.
>>> Can you please share your framework code here for the logic of "Event::
>>> OFFERS"?
>> Code is available here:
>>
>> https://bitbucket.org/osallou/go-docker/src/b1948063fb7f68fbc77f5de6b473d832a7dd36af/plugins/mesos.py?at=master=file-view-default
>>  
>> <https://bitbucket.org/osallou/go-docker/src/b1948063fb7f68fbc77f5de6b473d832a7dd36af/plugins/mesos.py?at=master=file-view-default>
>>
>> in method run of MesosThread, line 613
>>
>> Code is a little complex, as it is a port of existing code using mesos
>> python lib.
>>
>> Code related to HTTP is in development, so there may be further errors,
>> but registration is fine as well as offer messages.
>>
>> I have added locally a debug print to show any message received by mesos
>> (in case I would have received an other message indicating an error),
>> but I received no other than offer and heartbeats.
>>
>> If Mesos see the ACCEPT message as it appears in logs, that it should
>> either reject it (with a different status code than 202) or send an
>> UPDATE error message if there is an error with my task definition.
>>
>> Olivier
>>> Thanks,
>>>
>>> Guangya
>>>
>>> On Tue, Jun 7, 2016 at 8:29 PM, Olivier Sallou <olivier.sal...@irisa.fr>
>>> wrote:
>>>
>>>> On 06/07/2016 01:59 PM, Guangya Liu wrote:
>>>>> I can see that your framework is now holding the offer, how did you
>>>> launch
>>>>> task?
>>>> I execute an HTTP POST request in Python with json content-type:
>>>>
>>>> {'type': 'ACCEPT',
>>>> 'framework_id': {'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020'},
>>>> 'accept': {
>>>>'operations': [
>>>>{'type': 'LAUNCH',
>>>>'launch': {'container': {
>>>>'docker': {'image': u'centos:latest',
>>>> 'force_pull_image': True, 'port_mappings': [], 'network': 2},
>>>>'type': 1,
>>>>'volumes': [
>>>>{'host_path': u'/a/b', 'container_path':
>>>> u'/mnt/home', 'mode': 1},
>>>>{'host_path': u'/a/b/c', 'container_path':
>>>> u'/mnt/go-docker', 'mode': 1},
>>>>{'host_path': u'/b/c/d', 'container_path':
>>>> u'/mnt/god-data', 'mode': 2}
>>>>]
>>>>},
>>>>'name': u'testr',
>>>>'task_id': {'value': '128'},
>>>>'command': {'uris': [{'value':
>>>> u'/home/osallou/docker.tar.gz'}], 'value': u'/mnt/go-docker/wrapper.sh'},
>>>>'slave_id': {'value':
>>>> u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0'},
>>>>'resources': [
>>>>{'scalar': {'value': 1}, 'type': 0, 'name': 'cpus'},
>>>>{'scalar': {'value': 2000}, 'type': 0, 'name': 'mem'}
>>>>]
>>>>} # end launch
>>>>} # e

Re: how to debug HTTP API

2016-06-07 Thread Olivier Sallou



On 06/07/2016 04:53 PM, Guangya Liu wrote:
> So how many agent nodes are there in your cluster? If you continue
> receiving offer but without getting UPDATE message, then it may be caused
> by that your task definition and the framework continually decline offer.
I have only one node (master/slave), for development. It worked fine
with the python API.
We can see on the master that it received the ACCEPT, and no DECLINE. However,
as I receive no UPDATE, I suppose that Mesos "drops" the ACCEPT (wrong
task definition maybe), and sends new offers several seconds after I
sent the ACCEPT.
>
> Can you please share your framework code here for the logic of "Event::
> OFFERS"?
Code is available here:

https://bitbucket.org/osallou/go-docker/src/b1948063fb7f68fbc77f5de6b473d832a7dd36af/plugins/mesos.py?at=master=file-view-default

in method run of MesosThread, line 613

Code is a little complex, as it is a port of existing code using mesos
python lib.

Code related to HTTP is in development, so there may be further errors,
but registration is fine as well as offer messages.

I have added a local debug print to show any message received from Mesos
(in case I received another message indicating an error),
but I received nothing other than offers and heartbeats.
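
A debug print of this kind can be sketched as below. It assumes the event stream uses the RecordIO framing described in the Mesos scheduler HTTP API documentation (each record is its length in bytes, a newline, then the JSON body). `iter_records`, `frame` and the sample data are made up for this illustration.

```python
import json


def iter_records(buffer):
    """Yield decoded JSON events from bytes framed as '<length>\\n<body>'."""
    position = 0
    while position < len(buffer):
        newline = buffer.index(b"\n", position)
        length = int(buffer[position:newline])
        start = newline + 1
        yield json.loads(buffer[start:start + length].decode("utf-8"))
        position = start + length


def frame(event):
    """Encode one event in the same framing (used here to build sample data)."""
    body = json.dumps(event).encode("utf-8")
    return str(len(body)).encode("ascii") + b"\n" + body


sample = frame({"type": "HEARTBEAT"}) + frame({"type": "OFFERS", "offers": {"offers": []}})
for event in iter_records(sample):
    # Print every event type, not only the ones the framework handles.
    print("received event:", event["type"])
```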

If Mesos sees the ACCEPT message, as it appears to in the logs, then it should
either reject it (with a status code other than 202) or send an
UPDATE error message if there is an error with my task definition.

Olivier
>
> Thanks,
>
> Guangya
>
> On Tue, Jun 7, 2016 at 8:29 PM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
>
>>
>> On 06/07/2016 01:59 PM, Guangya Liu wrote:
>>> I can see that your framework is now holding the offer, how did you
>> launch
>>> task?
>> I execute an HTTP POST request in Python with json content-type:
>>
>>  {'type': 'ACCEPT',
>> 'framework_id': {'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020'},
>> 'accept': {
>> 'operations': [
>> {'type': 'LAUNCH',
>> 'launch': {'container': {
>> 'docker': {'image': u'centos:latest',
>> 'force_pull_image': True, 'port_mappings': [], 'network': 2},
>> 'type': 1,
>> 'volumes': [
>> {'host_path': u'/a/b', 'container_path':
>> u'/mnt/home', 'mode': 1},
>> {'host_path': u'/a/b/c', 'container_path':
>> u'/mnt/go-docker', 'mode': 1},
>> {'host_path': u'/b/c/d', 'container_path':
>> u'/mnt/god-data', 'mode': 2}
>> ]
>> },
>> 'name': u'testr',
>> 'task_id': {'value': '128'},
>> 'command': {'uris': [{'value':
>> u'/home/osallou/docker.tar.gz'}], 'value': u'/mnt/go-docker/wrapper.sh'},
>> 'slave_id': {'value':
>> u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0'},
>> 'resources': [
>> {'scalar': {'value': 1}, 'type': 0, 'name': 'cpus'},
>> {'scalar': {'value': 2000}, 'type': 0, 'name': 'mem'}
>> ]
>> } # end launch
>> } # end operation
>> ],
>> 'offer_ids': [{'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28'}]
>> }
>> }
>>
>> We can see that Mesos received the ACCEPT:
>>
>> I0607 11:45:15.873584 14896 master.cpp:3104] Processing ACCEPT call for
>> offers: [ e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28 ] on slave
>> e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0 at slave(1)@127.0.1.1:5051
>> (tifenn.irisa.fr) for framework
>>
>>
>> and I continue to receive new offers, so "connection" is OK. I should
>> receive an UPDATE message even if there is an error, but I receive none
>> (I track/log all messages received, whatever the type).
>>
>> Olivier
>>
>>>  Perhaps you can take a look at
>>> https://github.com/apache/mesos/blob/master/src/cli/execute.cpp#L311
>> which
>>> is an example framework using HTTP API
>>>
>>> Thanks,
>>>
>>> Guangya
>>>
>>> On Tue, Jun 7, 2016 at 7:19 PM, Olivier Sallou <olivier.sal...@irisa.fr>
>>> wrote:
>>>
>>>> On 06/07/2016 12:25 PM, Guangya Liu wrote:
>>>>> Olivier,
>>>>>
>>>>> For such case, seems there is sth wrong with your framework? can you
>>>> please
>>>>> run the following two commands and check the output?
>>>> I don't think it is a framework issue, I receive offers, heartbeats
>> etc...
>>>> It is only at task creation step, when I have no rejection nor up

Re: how to debug HTTP API

2016-06-07 Thread Olivier Sallou



On 06/07/2016 01:59 PM, Guangya Liu wrote:
> I can see that your framework is now holding the offer, how did you launch
> task?

I execute an HTTP POST request in Python with json content-type:

 {'type': 'ACCEPT',
'framework_id': {'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020'},
'accept': {
'operations': [
{'type': 'LAUNCH',
'launch': {'container': {
'docker': {'image': u'centos:latest',
'force_pull_image': True, 'port_mappings': [], 'network': 2},
'type': 1,
'volumes': [
{'host_path': u'/a/b', 'container_path':
u'/mnt/home', 'mode': 1},
{'host_path': u'/a/b/c', 'container_path':
u'/mnt/go-docker', 'mode': 1},
{'host_path': u'/b/c/d', 'container_path':
u'/mnt/god-data', 'mode': 2}
]
},
'name': u'testr',
'task_id': {'value': '128'},
'command': {'uris': [{'value':
u'/home/osallou/docker.tar.gz'}], 'value': u'/mnt/go-docker/wrapper.sh'},
'slave_id': {'value':
u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0'},
'resources': [
{'scalar': {'value': 1}, 'type': 0, 'name': 'cpus'},
{'scalar': {'value': 2000}, 'type': 0, 'name': 'mem'}
]
} # end launch
} # end operation
],
'offer_ids': [{'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28'}]
}
}

We can see that Mesos received the ACCEPT:

I0607 11:45:15.873584 14896 master.cpp:3104] Processing ACCEPT call for
offers: [ e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28 ] on slave
e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0 at slave(1)@127.0.1.1:5051
(tifenn.irisa.fr) for framework


and I continue to receive new offers, so "connection" is OK. I should
receive an UPDATE message even if there is an error, but I receive none
(I track/log all messages received, whatever the type).

Olivier

>  Perhaps you can take a look at
> https://github.com/apache/mesos/blob/master/src/cli/execute.cpp#L311 which
> is an example framework using HTTP API
>
> Thanks,
>
> Guangya
>
> On Tue, Jun 7, 2016 at 7:19 PM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
>
>>
>> On 06/07/2016 12:25 PM, Guangya Liu wrote:
>>> Olivier,
>>>
>>> For such case, seems there is sth wrong with your framework? can you
>> please
>>> run the following two commands and check the output?
>> I don't think it is a framework issue, I receive offers, heartbeats etc...
>> It is only at task creation step, when I have no rejection nor update
>> message.
>>
>> It could be (certainly) an issue with the json task message I sent in
>> the ACCEPT, but as there is no error, I have no way to understand what's
>> wrong with it.
>>> curl "http://:5050/master/frameworks" 2>/dev/null|python
>> -m
>>> json.tool
>> {
>> "completed_frameworks": [],
>> "frameworks": [
>> {
>> "active": true,
>> "capabilities": [],
>> "checkpoint": false,
>> "completed_tasks": [],
>> "executors": [],
>> "failover_timeout": 0.0,
>> "hostname": "",
>> "id": "e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0021",
>> "name": "GoDocker HTTP Framework",
>> "offered_resources": {
>> "cpus": 4.0,
>> "disk": 459470.0,
>> "mem": 14898.0,
>> "ports": "[31000-32000]"
>> },
>> "offers": [
>> {
>> "framework_id":
>> "e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0021",
>> "id": "1f1486e3-43ee-44c5-b073-82a901add956-O0",
>> "resources": {
>> "cpus": 4.0,
>> "disk": 459470.0,
>> "mem": 14898.0,
>> "ports": "[31000-32000]"
>> },
>> "slave_id": "e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0"
>> }
>> ],
>> "registered_time": 1465298174.2483,
>> "resources": {
>> "cpus": 4.0,
>> "disk": 459470.0,
>> "mem

Re: how to debug HTTP API

2016-06-07 Thread Olivier Sallou

": "drf",
"version": "false",
"webui_dir": "/usr/share/mesos/webui",
"work_dir": "/var/lib/mesos",
"zk": "zk://localhost:2181/mesos",
"zk_session_timeout": "10secs"
},
"frameworks": [
{
"active": true,
"capabilities": [],
"checkpoint": false,
"completed_tasks": [],
"executors": [],
"failover_timeout": 0.0,
"hostname": "",
"id": "e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0021",
"name": "GoDocker HTTP Framework",
"offered_resources": {
"cpus": 0.0,
"disk": 0.0,
"mem": 0.0
},
"offers": [],
"registered_time": 1465298174.2483,
"resources": {
"cpus": 0.0,
"disk": 0.0,
"mem": 0.0
},
"role": "*",
"tasks": [],
"unregistered_time": 0.0,
"used_resources": {
"cpus": 0.0,
"disk": 0.0,
    "mem": 0.0
},
"user": "godocker_http_test",
"webui_url": ""
}
],
"git_sha": "555db235a34afbb9fb49940376cc33a66f1f85f0",
"git_tag": "0.28.1",
"hostname": "tifenn.irisa.fr",
"id": "1f1486e3-43ee-44c5-b073-82a901add956",
"leader": "master@127.0.1.1:5050",
"log_dir": "/var/log/mesos",
"orphan_tasks": [],
"pid": "master@127.0.1.1:5050",
"slaves": [
{
"active": true,
"attributes": {
"hostname": "127.0.0.1"
},
"hostname": "tifenn.irisa.fr",
"id": "e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0",
"offered_resources": {
"cpus": 0.0,
"disk": 0.0,
"mem": 0.0
},
"pid": "slave(1)@127.0.1.1:5051",
"registered_time": 1465298164.37517,
"reregistered_time": 1465298164.37526,
"reserved_resources": {},
"resources": {
"cpus": 4.0,
"disk": 459470.0,
"mem": 14898.0,
"ports": "[31000-32000]"
},
"unreserved_resources": {
"cpus": 4.0,
"disk": 459470.0,
"mem": 14898.0,
"ports": "[31000-32000]"
},
"used_resources": {
"cpus": 0.0,
"disk": 0.0,
"mem": 0.0
},
"version": "0.28.1"
}
],
"start_time": 1465298159.26321,
"unregistered_frameworks": [],
"version": "0.28.1"
}


>
> Thanks,
>
> Guangya
>
> On Tue, Jun 7, 2016 at 6:04 PM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
>
>> Hi,
>> I am trying to switch from Python to HTTP API. I use mesos 0.28.1
>>
>> I could create framework to register, receive offers etc...  but I have
>> an issue accepting offers.
>>
>> I send my ACCEPT message but I do not receive any UPDATE message, only
>> new offers and hearbeat messages.
>>
>> On mesos master logs I see:
>>
>> I0607 11:45:15.873184 14896 http.cpp:312] HTTP POST for
>> /master/api/v1/scheduler from 127.0.0.1:38298 with
>> User-Agent='python-requests/2.9.1'
>> I0607 11:45:15.873584 14896 master.cpp:3104] Processing ACCEPT call for
>> offers: [ e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28 ] on slave
>> e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0 at slave(1)@127.0.1.1:5051
>> (tifenn.irisa.fr) for framework
>> e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020 (GoDocker HTTP Framework)
>>
>> There is a "Processing ACCEPT" and no error, but my task is not ran on
>> mesos.
>> No error on slave either.
>>
>> Response code to my ACCEPT is 202 as expected.
>>
>> Here is my HTTP json message:
>>
>> {'type': 'ACCEPT',
>> 'framework_id': {'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020'},
>> 'accept': {
>> 'operations': [
>> {'type': 'LAUNCH',
>> 'launch': {'container': {
>> 'docker': {'image': u'centos:latest',
>> 'force_pull_image': True, 'port_mappings': [], 'network': 2},
>> 'type': 1,
>> 'volumes': [
>> {'host_path': u'/a/b', 'container_path':
>> u'/mnt/home', 'mode': 1},
>> {'host_path': u'/a/b/c', 'container_path':
>> u'/mnt/go-docker', 'mode': 1},
>> {'host_path': u'/b/c/d', 'container_path':
>> u'/mnt/god-data', 'mode': 2}
>> ]
>> },
>> 'name': u'testr',
>> 'task_id': {'value': '128'},
>> 'command': {'uris': [{'value':
>> u'/home/osallou/docker.tar.gz'}], 'value': u'/mnt/go-docker/wrapper.sh'},
>> 'slave_id': {'value':
>> u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0'},
>> 'resources': [
>> {'scalar': {'value': 1}, 'type': 0, 'name': 'cpus'},
>> {'scalar': {'value': 2000}, 'type': 0, 'name': 'mem'}
>> ]
>> } # end launch
>> } # end operation
>> ],
>> 'offer_ids': [{'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28'}]
>> }
>> }
>>
>> There could be an issue with my task definition, but as no error is
>> raised and I receive no UPDATE error message.
>>
>> Any hint on how to debug this?
>>
>> Thanks
>>
>>
>> --
>> Olivier Sallou
>> IRISA / University of Rennes 1
>> Campus de Beaulieu, 35000 RENNES - FRANCE
>> Tel: 02.99.84.71.95
>>
>> gpg key id: 4096R/326D8438  (keyring.debian.org)
>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
>>
>>

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

how to debug HTTP API

2016-06-07 Thread Olivier Sallou

Hi,
I am trying to switch from Python to HTTP API. I use mesos 0.28.1

I could create a framework that registers, receives offers, etc., but I have
an issue accepting offers.

I send my ACCEPT message but I do not receive any UPDATE message, only
new offers and heartbeat messages.

On mesos master logs I see:

I0607 11:45:15.873184 14896 http.cpp:312] HTTP POST for
/master/api/v1/scheduler from 127.0.0.1:38298 with
User-Agent='python-requests/2.9.1'
I0607 11:45:15.873584 14896 master.cpp:3104] Processing ACCEPT call for
offers: [ e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28 ] on slave
e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0 at slave(1)@127.0.1.1:5051
(tifenn.irisa.fr) for framework
e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020 (GoDocker HTTP Framework)

There is a "Processing ACCEPT" and no error, but my task is not run on
Mesos.
There is no error on the slave either.

Response code to my ACCEPT is 202 as expected.

Here is my HTTP json message:

{'type': 'ACCEPT',
 'framework_id': {'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-0020'},
 'accept': {
     'operations': [
         {'type': 'LAUNCH',
          'launch': {
              'container': {
                  'docker': {'image': u'centos:latest',
                             'force_pull_image': True,
                             'port_mappings': [],
                             'network': 2},
                  'type': 1,
                  'volumes': [
                      {'host_path': u'/a/b', 'container_path': u'/mnt/home', 'mode': 1},
                      {'host_path': u'/a/b/c', 'container_path': u'/mnt/go-docker', 'mode': 1},
                      {'host_path': u'/b/c/d', 'container_path': u'/mnt/god-data', 'mode': 2}
                  ]
              },
              'name': u'testr',
              'task_id': {'value': '128'},
              'command': {'uris': [{'value': u'/home/osallou/docker.tar.gz'}],
                          'value': u'/mnt/go-docker/wrapper.sh'},
              'slave_id': {'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-S0'},
              'resources': [
                  {'scalar': {'value': 1}, 'type': 0, 'name': 'cpus'},
                  {'scalar': {'value': 2000}, 'type': 0, 'name': 'mem'}
              ]
          }  # end launch
         }  # end operation
     ],
     'offer_ids': [{'value': u'e303a1f0-4e7c-4c32-aafc-8707ea2b2718-O28'}]
 }
}

There could be an issue with my task definition, but no error is
raised and I receive no UPDATE error message, so I cannot tell.

Any hint on how to debug this?

Thanks


-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: volume / mount point error with Unified Containerizer

2016-05-23 Thread Olivier Sallou



- Original Message -
> From: "Guangya Liu" <gyliu...@gmail.com>
> To: "dev" <dev@mesos.apache.org>, "Jie Yu" <j...@mesosphere.io>, "Gilbert 
> Song" <gilb...@mesosphere.io>
> Sent: Monday, 23 May 2016 17:34:41
> Subject: Re: volume / mount point error with Unified Containerizer
> 
> It is a bit strange to me, I also did some test and review code for
> relative path, and found that relative path works well.
> 
> In 0.28.1, if you deploy a Docker container with MesosContainerizer and
> use an absolute path as container_path, the mesos agent will update the
> container_path to a relative path by adding a prefix ./rootfs to the
> container_path, e.g. /file/path => ./rootfs/file/path.
> 
> If you deploy a Docker container with MesosContainerizer with a relative
> path as container_path, then the mesos agent will not update the container_path.
> 
> So the final mount point for the container should be either
> 
> 1) /tmp/mesos/slaves/agent_id/frameworks/framework_id/
> executors/51/runs/container_id/.rootfs/file/path
> 2) /tmp/mesos/slaves/agent_id/frameworks/framework_id/executors/51/runs/
> container_id/file/path
> 
> The only difference is adding ./rootfs as a prefix or not, the test result
> is that 1) does not work and 2) works well. And even the mount for 1)
> failed, but I can see the mount point path does exist.
> 

@Guangya
I confirm that using a relative path works fine: I get the volumes under the 
Mesos path (but it does not help for my implementation).
If I use the Docker containerizer, absolute paths are fine. This is what I use 
for the moment in my code, and I am investigating a switch to the unified 
containerizer.
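
The path handling Guangya describes can be sketched as follows. This only restates the behaviour reported in this thread for 0.28.1 and is not taken from the Mesos source. `mount_target` is a hypothetical helper and the sandbox path uses the same placeholders as the quoted message; the `.rootfs` directory name follows the mount log quoted further down in this message.

```python
import posixpath


def mount_target(sandbox, container_path):
    """Return where a volume would be mounted, per the behaviour described."""
    if posixpath.isabs(container_path):
        # Absolute paths are re-rooted under the provisioned root filesystem.
        return posixpath.join(sandbox, ".rootfs", container_path.lstrip("/"))
    # Relative paths are left unchanged and resolve inside the sandbox.
    return posixpath.join(sandbox, container_path)


sandbox = "/tmp/mesos/slaves/agent_id/frameworks/framework_id/executors/51/runs/container_id"
print(mount_target(sandbox, "/file/path"))  # case 1: reported as failing
print(mount_target(sandbox, "file/path"))   # case 2: reported as working
```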


> @Yu Jie and @Gilbert, any comments for this?
> 
> @Oilivier,
> 
> In order not to block your test, can you please use mesos after 0.28.1? You
> can use either 0.28.2 or above version.

Well, as this is not an urgent matter, I will wait for 0.29 and test against 
that release (along with other features I am waiting for).

> 
> Thanks,
> 
> Guangya
> 
> 
> On Mon, May 23, 2016 at 10:30 PM, Guangya Liu <gyliu...@gmail.com> wrote:
> 
> > Thanks Olivier, I can reproduce this issue now and still checking what is
> > wrong.
> >
> > What I did is as following:
> > 1)  Check out code with tag of 0.28.1
> > 2) update mesos-execute to add a host path volume
> > diff --git a/src/cli/execute.cpp b/src/cli/execute.cpp
> > index 81a0388..0ff913c 100644
> > --- a/src/cli/execute.cpp
> > +++ b/src/cli/execute.cpp
> > @@ -72,6 +72,8 @@ using mesos::v1::TaskID;
> >  using mesos::v1::TaskInfo;
> >  using mesos::v1::TaskState;
> >  using mesos::v1::TaskStatus;
> > +using mesos::v1::Volume;
> > +using mesos::v1::Parameters;
> >
> >  using mesos::v1::scheduler::Call;
> >  using mesos::v1::scheduler::Event;
> > @@ -572,6 +574,12 @@ private:
> >  }
> >}
> >
> > +  Volume* volume1 = containerInfo.add_volumes();
> > +  volume1->set_container_path("/tmp/abcd");
> > +  volume1->set_mode(Volume::RW);
> > +  volume1->set_host_path("/root/convoy");
> > +   cout << "Add Voume 1" << endl;
> > +
> >return containerInfo;
> >  } else if (containerizer == "docker") {
> >// 'docker' containerizer only supports 'docker' images.
> > 3) launch a task with docker image, task failed.
> >
> > 4) Check sandbox:
> > + /root/src/mesos/m1/mesos/build/src/mesos-containerizer mount
> > --help=false --operation=make-rslave --path=/
> > + grep -E /tmp/mesos/.+ /proc/self/mountinfo
> > + grep -v 3239aafc-78d8-4f70-81e5-f32fb379
> > + cut+  -d  -f5
> > xargs --no-run-if-empty umount -l
> > + mount -n --rbind
> > /tmp/mesos/provisioner/containers/3239aafc-78d8-4f70-81e5-f32fb379/backends/copy/rootfses/5e8bf3fa-53b1-4bd5-bb3d-525ddc7900b6
> > /tmp/mesos/slaves/a4294ed5-10e8-47db-a3b9-a43a4f951374-S0/frameworks/a4294ed5-10e8-47db-a3b9-a43a4f951374-/executors/test/runs/3239aafc-78d8-4f70-81e5-f32fb379/.rootfs
> > + mount -n --rbind /root/convoy
> > /tmp/mesos/slaves/a4294ed5-10e8-47db-a3b9-a43a4f951374-S0/frameworks/a4294ed5-10e8-47db-a3b9-a43a4f951374-/executors/test/runs/3239aafc-78d8-4f70-81e5-f32fb379/.rootfs/tmp/abcd
> > mount: mount point
> > /tmp/mesos/slaves/a4294ed5-10e8-47db-a3b9-a43a4f951374-S0/frameworks/a4294ed5-10e8-47db-a3b9-a43a4f951374-/executors/test/runs/3239aafc-78d8-4f70-81e5-f32fb379/.rootfs/tmp/abcd
> > does not exist
> > Failed to execute a preparation shell command
> >
> > Will check more for this

Re: volume / mount point error with Unified Containerizer

2016-05-23 Thread Olivier Sallou



On 05/23/2016 09:33 AM, Olivier Sallou wrote:
>
> On 05/20/2016 03:26 PM, Guangya Liu wrote:
>> Since you are using docker image which means that your container will have
>> rootfs, so it is not required to have the absolute path exist, the linux
>> file system isolator will help create the path automatically
>> https://github.com/apache/mesos/blob/0.28.x/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp#L390-L402
>>
>> Can you please share your framework? How did you set the volume part in
>> your framework?
> @Guangya
>
> I use Python API.
>
> Here is related code:
>
> 
> # Define container volumes
> for v in job['container']['volumes']:
>     volume = container.volumes.add()
>     volume.container_path = v['mount']
>     volume.host_path = v['path']
>     if v['acl'] == 'rw':
>         volume.mode = 1 # mesos_pb2.Volume.Mode.RW
>     else:
>         volume.mode = 2 # mesos_pb2.Volume.Mode.RO
>
> => In my test case, I add 2 volumes from a host shared directory,
> mounted in container as /mnt/go-docker and /mnt/god-data.
>
> ...
> # Define docker image and network
> docker = mesos_pb2.ContainerInfo.MesosInfo()
> docker.image.type = 2 # Docker
> docker.image.docker.name = 'centos:latest'
> # Request an IP from a network module
> network_info = container.network_infos.add()
> network_info_name = 'sampletest'
> # Get an IP V4 address
> ip_address = network_info.ip_addresses.add()
> ip_address.protocol = 1
> # The network group to join
> group = network_info.groups.append(network_info_name)
> port_list = [22]
> if port_list:
>     for port in port_list:
>         job['container']['port_mapping'].append({'host': port,
>                                                  'container': port})
> container.mesos.MergeFrom(docker)
>
> It results in error message:
>
> + mount -n --rbind
> /tmp/mesos/provisioner/containers/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/backends/copy/rootfses/f9f66bb2-308d-4555-ba77-49ec61cbeb4f
> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs
> + mount -n --rbind
> /home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task
> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data
> mount: mount point
> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data
> does not exist
> Failed to execute a preparation shell command
>
>
> We can see the .rootfs directory, but mounting mnt/god-data under .rootfs
> fails. The local directory exists; it does not pre-exist in the container.
> What is strange is that if I look in the mesos task dir, the
> .rootfs/mnt/god-data directory is present.
>
> Or is the error message (.rootfs/mnt/god-data does not exist) simply a
> warning, so that the directory gets created, and the final error (Failed to
> execute a preparation shell command) is unrelated (and unclear...)?
Additional info: the command to execute in the container is located in one of
the mounted volumes.
>
> Olivier
>> Thanks,
>>
>> Guangya
>>
>> On Fri, May 20, 2016 at 4:54 AM, Olivier Sallou <olivier.sal...@irisa.fr>
>> wrote:
>>
>>> - Original Message -
>>>> From: "Gilbert Song" <gilb...@mesosphere.io>
>>>> To: "dev" <dev@mesos.apache.org>
>>>> Sent: Thursday, May 19, 2016 01:57:16
>>>> Subject: Re: volume / mount point error with Unified Containerizer
>>>>
>>>> @Olivier,
>>>> In mesos 0.28.1, you are supposed to be able bind mount a volume from
>>>> the host into the mesos container. Did you specify a docker image (we
>>>> determine
>>>> the mount point differently depending whether the container has a
>>> rootfs)?
>>>
>>> Yes I specified an image, a Docker image URI.
>>>
>>>> How
>>>> do you specify your 'container_path' (the mount point in the container)?
>>> If
>>>> it is an
>>>> absolute path, we require that dir to be pre-existed. If it is a relative
>>>> path, we will
>>>> mkdir for it.
>>> It is an absolute path,

Re: volume / mount point error with Unified Containerizer

2016-05-23 Thread Olivier Sallou



On 05/20/2016 03:26 PM, Guangya Liu wrote:
> Since you are using docker image which means that your container will have
> rootfs, so it is not required to have the absolute path exist, the linux
> file system isolator will help create the path automatically
> https://github.com/apache/mesos/blob/0.28.x/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp#L390-L402
>
> Can you please share your framework? How did you set the volume part in
> your framework?
@Guangya

I use Python API.

Here is related code:


# Define container volumes
for v in job['container']['volumes']:
    volume = container.volumes.add()
    volume.container_path = v['mount']
    volume.host_path = v['path']
    if v['acl'] == 'rw':
        volume.mode = 1 # mesos_pb2.Volume.Mode.RW
    else:
        volume.mode = 2 # mesos_pb2.Volume.Mode.RO

=> In my test case, I add 2 volumes from a host shared directory,
mounted in container as /mnt/go-docker and /mnt/god-data.

...
# Define docker image and network
docker = mesos_pb2.ContainerInfo.MesosInfo()
docker.image.type = 2 # Docker
docker.image.docker.name = 'centos:latest'
# Request an IP from a network module
network_info = container.network_infos.add()
network_info_name = 'sampletest'
# Get an IP V4 address
ip_address = network_info.ip_addresses.add()
ip_address.protocol = 1
# The network group to join
group = network_info.groups.append(network_info_name)
port_list = [22]
if port_list:
    for port in port_list:
        job['container']['port_mapping'].append({'host': port,
                                                 'container': port})
container.mesos.MergeFrom(docker)

It results in error message:

+ mount -n --rbind
/tmp/mesos/provisioner/containers/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/backends/copy/rootfses/f9f66bb2-308d-4555-ba77-49ec61cbeb4f
/tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs
+ mount -n --rbind
/home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task
/tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data
mount: mount point
/tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data
does not exist
Failed to execute a preparation shell command


We can see the .rootfs directory, but mounting mnt/god-data under .rootfs fails.
The local directory exists; it does not pre-exist in the container. What is
strange is that if I look in the mesos task dir, the .rootfs/mnt/god-data
directory is present.

Or is the error message (.rootfs/mnt/god-data does not exist) simply a
warning, so that the directory gets created, and the final error (Failed to
execute a preparation shell command) is unrelated (and unclear...)?

Olivier
>
> Thanks,
>
> Guangya
>
> On Fri, May 20, 2016 at 4:54 AM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
>
>>
>> - Original Message -
>>> From: "Gilbert Song" <gilb...@mesosphere.io>
>>> To: "dev" <dev@mesos.apache.org>
>>> Sent: Thursday, May 19, 2016 01:57:16
>>> Subject: Re: volume / mount point error with Unified Containerizer
>>>
>>> @Olivier,
>>> In mesos 0.28.1, you are supposed to be able bind mount a volume from
>>> the host into the mesos container. Did you specify a docker image (we
>>> determine
>>> the mount point differently depending whether the container has a
>> rootfs)?
>>
>> Yes I specified an image, a Docker image URI.
>>
>>> How
>>> do you specify your 'container_path' (the mount point in the container)?
>> If
>>> it is an
>>> absolute path, we require that dir to be pre-existed. If it is a relative
>>> path, we will
>>> mkdir for it.
>> It is an absolute path, but it does not exist in the image (this is the
>> issue). Images are custom Docker images (images containing tools for batch
>> computing), and I want, for example, to mount some shared resources (user
>> home dir, common data, etc.) in the image. Of course those directories do
>> not pre-exist in container images as they are specific to the environment.
>> Requiring the directory to exist in the image is an issue, as it prevents
>> using any existing image from a repo.
>>
>> When using Docker containerizer it works fine, I

Re: volume / mount point error with Unified Containerizer

2016-05-19 Thread Olivier Sallou



- Original Message -
> From: "Gilbert Song" <gilb...@mesosphere.io>
> To: "dev" <dev@mesos.apache.org>
> Sent: Thursday, May 19, 2016 01:57:16
> Subject: Re: volume / mount point error with Unified Containerizer
> 
> @Olivier,
> In mesos 0.28.1, you are supposed to be able bind mount a volume from
> the host into the mesos container. Did you specify a docker image (we
> determine
> the mount point differently depending whether the container has a rootfs)?

Yes I specified an image, a Docker image URI.

> How
> do you specify your 'container_path' (the mount point in the container)? If
> it is an
> absolute path, we require that dir to be pre-existed. If it is a relative
> path, we will
> mkdir for it.

It is an absolute path, but it does not exist in the image (this is the issue).
Images are custom Docker images (images containing tools for batch computing),
and I want, for example, to mount some shared resources (user home dir, common
data, etc.) in the image. Of course those directories do not pre-exist in
container images as they are specific to the environment. Requiring the
directory to exist in the image is an issue, as it prevents using any existing
image from a repo.

When using Docker containerizer it works fine, I can mount any external storage 
in the container.

Olivier


> 
> @Joshua,
> Thank for posting your workaround on mesos. As I mentioned above, in 0.28.1
> or
> older, we only mkdir for container_path which is relative path (not
> starting with "/").
> Because if no rootfs specified for a mesos container, the container shares
> the host
> root filesystem. Obviously we don't want any random files to be created
> implicitly
> on your host fs.
> From mesos 0.29 (release by the end of this month), we will mkdir the mount
> point in the container except for the command task case that specify an
> absolute
> container_path without a rootfs. Because we simplify the mounting logic, and
> sandbox bind mount will only be done in container mount namespace instead of
> host mount namespace (what we did before). Please keep tuned.
> 
> Cheers,
> Gilbert
> 
> On Wed, May 18, 2016 at 8:14 AM, Joshua Cohen <jco...@apache.org> wrote:
> 
> > Hi Olivier,
> >
> > I touched on this issue as part of
> > https://issues.apache.org/jira/browse/MESOS-5229. It would be nice if
> > Mesos
> > automatically created container mount points if they don't already exist.
> > In the meantime, as a workaround for this, I've updated my filesystem
> > images to include the path (e.g. in Dockerfile, add `RUN mkdir -p
> > /some/mount/point`). Not the best solution, but the only thing I've seen
> > that works at the moment.
> >
> > Cheers,
> >
> > Joshua
> >
> > On Wed, May 18, 2016 at 7:36 AM, Guangya Liu <gyliu...@gmail.com> wrote:
> >
> > > It's pretty simple for you from scratch with source code
> > >
> > >
> > https://github.com/apache/mesos/blob/master/docs/getting-started.md#building-mesos
> > > ;-)
> > >
> > > Thanks,
> > >
> > > Guangya
> > >
> > > On Wed, May 18, 2016 at 8:30 PM, Olivier Sallou <olivier.sal...@irisa.fr
> > >
> > > wrote:
> > >
> > > >
> > > >
> > > > On 05/18/2016 02:31 PM, Guangya Liu wrote:
> > > > > Just saw that you are working with 0.28.1, the "docker volume driver"
> > > > code
> > > > > was not in 0.28.1, can you please have a try with mesos master branch
> > > if
> > > > > you are only doing some test?
> > > > this is indeed test only for the moment. But I will have to
> > > > recompile/install mesos  :-(  (I used packages for install).
> > > >
> > > > I will try when possible, but thanks for the hint.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Guangya
> > > > >
> > > > > On Wed, May 18, 2016 at 8:28 PM, Guangya Liu <gyliu...@gmail.com>
> > > wrote:
> > > > >
> > > > >> Hi Olivier,
> > > > >>
> > > > >> I think that you need to enable "docker volume isolator" if you want
> > > use
> > > > >> external storage with unified container I was writing a document
> > here
> > > > >> https://reviews.apache.org/r/47511/, perhaps you can have a try
> > > > according
> > > > >> to the document and post some comments there if you find any issues.
> > > > >>
>

Re: volume / mount point error with Unified Containerizer

2016-05-18 Thread Olivier Sallou



On 05/18/2016 02:31 PM, Guangya Liu wrote:
> Just saw that you are working with 0.28.1, the "docker volume driver" code
> was not in 0.28.1, can you please have a try with mesos master branch if
> you are only doing some test?
This is indeed test-only for the moment, but I will have to
recompile/install mesos :-( (I installed from packages).

I will try when possible, but thanks for the hint.
>
> Thanks,
>
> Guangya
>
> On Wed, May 18, 2016 at 8:28 PM, Guangya Liu <gyliu...@gmail.com> wrote:
>
>> Hi Olivier,
>>
>> I think that you need to enable "docker volume isolator" if you want use
>> external storage with unified container I was writing a document here
>> https://reviews.apache.org/r/47511/, perhaps you can have a try according
>> to the document and post some comments there if you find any issues.
>>
>> Also you can patch mesos-execute here https://reviews.apache.org/r/46762/ to
>> have a try with mesos-execute.
>>
>> Thanks,
>>
>> Guangya
>>
>> On Wed, May 18, 2016 at 7:17 PM, Olivier Sallou <olivier.sal...@irisa.fr>
>> wrote:
>>
>>> Answering (partially) to myself.
>>>
>>> I seems issue is container_path does not exists inside container. On
>>> Docker, path is created and mounted. With pure mesos, container_path
>>> must exists.
>>>
>>> mesos.proto says: "If the path is an absolute path, that path must
>>> already exist."
>>>
>>> This is an issue however, using Docker images, the path I want to mount
>>> does not exists, and it cannot be modified "on the fly".
>>>
>>> Is there a workaround for this ?
>>>
>>>
>>> On 05/18/2016 12:24 PM, Olivier Sallou wrote:
>>>> Hi,
>>>> I am trying unified containerizer on a single server (master/slave) on
>>>> mesos 0.28.1, to switch from docker containerizer to mesos+docker image
>>>> container.
>>>>
>>>> I have setup slave config as suggested in documentation:
>>>>
>>>> containerizers=docker,mesos
>>>> image_providers=docker \
>>>> isolation=filesystem/linux,docker/runtime
>>>>
>>>> However, when I execute my task with a volume I have an error:
>>>>
>>>> 
>>>> + mount -n --rbind
>>>>
>>> /tmp/mesos/provisioner/containers/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/backends/copy/rootfses/f9f66bb2-308d-4555-ba77-49ec61cbeb4f
>>> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs
>>>> + mount -n --rbind
>>>>
>>> /home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task
>>> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data
>>>> mount: mount point
>>>>
>>> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data
>>>> does not exist
>>>> Failed to execute a preparation shell command
>>>>
>>>> Then, my task switches to FAILED.
>>>>
>>>> I define a local volume to bind mount in my "container"
>>>>
>>> /home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task
>>>> => /mnt/god-data
>>>> My directory exists on local server.
>>>> In mesos UI, I can see the .rootfs directory along stdout and stderr
>>>> files, and inside .rootfs, I can see /mnt/god-data (empty).
>>>>
>>>> Running the same using Docker containerizer instead of mesos
>>>> containerizer (with a Docker image) works fine.
>>>>
>>>> It seems it fails to mount my local directory in the container. Any idea
>>>> of what is going wrong or how to debug this?
>>>>
>>>>
>>>> Thanks
>>>>
>>> --
>>> Olivier Sallou
>>> IRISA / University of Rennes 1
>>> Campus de Beaulieu, 35000 RENNES - FRANCE
>>> Tel: 02.99.84.71.95
>>>
>>> gpg key id: 4096R/326D8438  (keyring.debian.org)
>>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
>>>
>>>

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: volume / mount point error with Unified Containerizer

2016-05-18 Thread Olivier Sallou

Answering (partially) my own question.

It seems the issue is that container_path does not exist inside the container.
With Docker, the path is created and mounted. With pure mesos, container_path
must already exist.

mesos.proto says: "If the path is an absolute path, that path must
already exist."

This is an issue, however: with Docker images, the path I want to mount
does not exist, and the image cannot be modified "on the fly".

Is there a workaround for this?
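
One workaround mentioned elsewhere in this thread is to pre-create the mount
points in the image (e.g. `RUN mkdir -p /some/mount/point` in the Dockerfile).
As a rough sketch, assuming volumes are described by dicts with a 'mount' key
as in the framework code shown in this thread, the Dockerfile lines could be
generated like this (the helper name is made up for illustration):

```python
import shlex


def dockerfile_mkdir_lines(volumes):
    """Build Dockerfile RUN lines that pre-create absolute mount points."""
    # Only absolute container paths need to exist in the image beforehand;
    # the set removes duplicates and sorting keeps the output stable.
    paths = sorted({v['mount'] for v in volumes if v['mount'].startswith('/')})
    return ['RUN mkdir -p ' + shlex.quote(p) for p in paths]


if __name__ == '__main__':
    volumes = [{'mount': '/mnt/go-docker'}, {'mount': '/mnt/god-data'}]
    print('\n'.join(dockerfile_mkdir_lines(volumes)))
```

This only helps for images that can be rebuilt; it does not solve the case of
using an unmodified image from a repo.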


On 05/18/2016 12:24 PM, Olivier Sallou wrote:
> Hi,
> I am trying unified containerizer on a single server (master/slave) on
> mesos 0.28.1, to switch from docker containerizer to mesos+docker image
> container.
>
> I have setup slave config as suggested in documentation:
>
> containerizers=docker,mesos
> image_providers=docker \
> isolation=filesystem/linux,docker/runtime
>
> However, when I execute my task with a volume I have an error:
>
> 
> + mount -n --rbind
> /tmp/mesos/provisioner/containers/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/backends/copy/rootfses/f9f66bb2-308d-4555-ba77-49ec61cbeb4f
> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs
> + mount -n --rbind
> /home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task
> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data
> mount: mount point
> /tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data
> does not exist
> Failed to execute a preparation shell command
>
> Then, my task switches to FAILED.
>
> I define a local volume to bind mount in my "container"
> /home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task
> => /mnt/god-data
> My directory exists on local server.
> In mesos UI, I can see the .rootfs directory along stdout and stderr
> files, and inside .rootfs, I can see /mnt/god-data (empty).
>
> Running the same using Docker containerizer instead of mesos
> containerizer (with a Docker image) works fine.
>
> It seems it fails to mount my local directory in the container. Any idea
> of what is going wrong or how to debug this?
>
>
> Thanks
>

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

volume / mount point error with Unified Containerizer

2016-05-18 Thread Olivier Sallou

Hi,
I am trying unified containerizer on a single server (master/slave) on
mesos 0.28.1, to switch from docker containerizer to mesos+docker image
container.

I have setup slave config as suggested in documentation:

containerizers=docker,mesos
image_providers=docker \
isolation=filesystem/linux,docker/runtime

However, when I execute my task with a volume I have an error:


+ mount -n --rbind
/tmp/mesos/provisioner/containers/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/backends/copy/rootfses/f9f66bb2-308d-4555-ba77-49ec61cbeb4f
/tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs
+ mount -n --rbind
/home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task
/tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data
mount: mount point
/tmp/mesos/slaves/2a296daf-7419-4659-ade1-763c792cd522-S0/frameworks/aef1b0e3-ea2d-4770-baac-96d673ab88f9-/executors/51/runs/2d7ea311-5e8b-440f-a3ca-a40e1b946b8e/.rootfs/mnt/god-data
does not exist
Failed to execute a preparation shell command

Then, my task switches to FAILED.

I define a local volume to bind mount in my "container"
/home/osallou/Development/NOSAVE/go-docker/godshared/tasks/pairtree_root/us/er/_o/sa/ll/ou/task
=> /mnt/god-data
My directory exists on local server.
In mesos UI, I can see the .rootfs directory along stdout and stderr
files, and inside .rootfs, I can see /mnt/god-data (empty).

Running the same using Docker containerizer instead of mesos
containerizer (with a Docker image) works fine.

It seems it fails to mount my local directory in the container. Any idea
of what is going wrong or how to debug this?


Thanks

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Mesos admin REST API

2016-05-18 Thread Olivier Sallou

Hi,
Is there any operator/admin way to kill a task via an admin API?

I faced an issue where mesos does not send any offers to my framework after
a task failure (the task remains in staging, or mesos can't contact an old
framework). As a result, my framework cannot send new kills, etc.

I'd like, as a mesos admin, to send a kill request (or other kinds of
requests), bypassing the framework.

Thanks

Olivier

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

work on Mesos Containerizer to support docker containers

2016-01-19 Thread Olivier Sallou

Hi,
I have seen there is some work on the Mesos Containerizer to support docker
containers instead of using the Docker Containerizer, which would help
support Docker networking etc., with Calico for example.
Is there any doc on this available somewhere? Where is the code of the
Mesos Containerizer? (I found the Docker one but can't find the default Mesos one.)

Thanks

Olivier

-- 

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: work on Mesos Containerizer to support docker containers

2016-01-19 Thread Olivier Sallou



On 01/19/2016 03:55 PM, Jan Schlicht wrote:
> Hi Olivier,
>
> status for the "Unified Containerizer" project is tracked under this epic:
> https://issues.apache.org/jira/browse/MESOS-2840
> There's a design document linked in the epic, unfortunately I'm not able to
> access it.
perfect, thanks
>
> Cheers,
> Jan
>
> On Tue, Jan 19, 2016 at 3:06 PM, Qian Zhang <zhq527...@gmail.com> wrote:
>
>> Hi Olivier,
>>
>> Here is the doc of MesosContainerizer:
>> https://github.com/apache/mesos/blob/master/docs/mesos-containerizer.md
>>
>> And you may also find the following docs helpful:
>> https://github.com/apache/mesos/blob/master/docs/containerizer.md
>> https://github.com/apache/mesos/blob/master/docs/containerizer-internals.md
>>
>> And the code of MesosContainerizer is under:
>> src/slave/containerizer/mesos/
>>
>>
>> Regards,
>> Qian
>>
>>
>> On Tue, Jan 19, 2016 at 9:14 PM, Olivier Sallou <olivier.sal...@irisa.fr>
>> wrote:
>>
>>> Hi,
>>> I have seen there are some work on Mesos Containerizer to support docker
>>> containers instead of using Docker Containerizer, which would help
>>> support Docker network etc... with Calico for example.
>>> Is there any doc on this available somewhere ? Where is code of the
>>> Mesos Containerizer? (I found Docker one but can't find default Mesos
>> one).
>>> Thanks
>>>
>>> Olivier
>>>
>>> --
>>>
>>> gpg key id: 4096R/326D8438  (keyring.debian.org)
>>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
>>>
>>>
>
>

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

ContainerId in TaskStatus message: can't find update in mesos.proto

2015-12-21 Thread Olivier Sallou

Hi,
mesos 0.23 added ContainerId in the TaskStatus message as per:
https://issues.apache.org/jira/browse/MESOS-2191

However, I do not see any related modification in mesos.proto [0]

Am I missing something? And if it is there, does the python client include it?

[0] https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto

Thanks

Olivier

-- 

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: ContainerId in TaskStatus message: can't find update in mesos.proto

2015-12-21 Thread Olivier Sallou



On 12/21/2015 03:08 PM, Shuai Lin wrote:
> From what I read in the ticket, What's done is "adding the output of
> `docker output` to the `data` field of TaskStatus message when a task is in
> TASK_RUNNING state', so the related protobuf field is TaskStatus.data, not
> a specific 'containerid' field. See
> https://github.com/apache/mesos/blob/09a2fb3/src/docker/executor.cpp#L166
It seems so indeed; the title of the ticket is misleading. Thanks anyway.
> On Mon, Dec 21, 2015 at 5:37 PM, Olivier Sallou <olivier.sal...@irisa.fr>
> wrote:
>
>> Hi,
>> mesos .023 added ContainerId in TaskStatus message as per:
>> https://issues.apache.org/jira/browse/MESOS-2191
>>
>> However, I do not see any related modification in mesos.proto [0]
>>
>> Am I missing something? As such is python client including the
>> modification?
>>
>> [0] https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto
>>
>> Thanks
>>
>> Olivier
>>
>> --
>>
>> gpg key id: 4096R/326D8438  (keyring.debian.org)
>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
>>
>>

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Docker network support

2015-12-08 Thread Olivier Sallou

Hi,
what is the current/planned feature support for Docker network ?

Docker network creates an overlay network to link multiple containers on
multiple hosts. Is it supported/planned in mesos ? I do not find any
such info for the moment in mesos.proto

Thanks

Olivier

-- 


gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

how to get docker container id?

2015-06-12 Thread Olivier Sallou

Hi,
how can we get the container id when executing a TaskInfo with a Docker
ContainerInfo?

Mesos executes a Docker container named mesos-xxx, but how can we get
this identifier?

I set a unique id in the Task Id of my TaskInfo, but it is not used as the
Docker identifier.

I need it to query cAdvisor, running on my nodes.

Thanks

Olivier

-- 


gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: how to get docker container id?

2015-06-12 Thread Olivier Sallou



On 06/12/2015 12:02 PM, Adam Bordelon wrote:
> You can query the slave's state.json to get the container ID.
> See the previous thread:
> http://search-hadoop.com/m/0Vlr6OtCiO1p8ypc2/mesos+accessing+programmatticallysubj=Re+Accessing+stdout+stderr+of+a+task+programmattically+
Thanks, I could get it, but it would be nice to get this information in the
status update message rather than having to query the nodes (which return
information for all tasks).
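
For reference, a minimal Python sketch of pulling the container ID out of an
already-downloaded state.json. The field names used here (frameworks,
executors, tasks, id, container) are an assumption about the slave's JSON
layout and should be checked against the output of your own slave; the
function name is hypothetical.

```python
import json


def find_container_id(state, task_id):
    """Return the container ID of the executor running task_id, or None."""
    for framework in state.get('frameworks', []):
        for executor in framework.get('executors', []):
            for task in executor.get('tasks', []):
                if task.get('id') == task_id:
                    return executor.get('container')
    return None


if __name__ == '__main__':
    # Tiny hand-written sample, not real slave output.
    sample = json.loads(
        '{"frameworks": [{"executors": '
        '[{"container": "abc-123", "tasks": [{"id": "0"}]}]}]}'
    )
    print(find_container_id(sample, '0'))
```

The lookup walks every framework and executor, which is exactly the cost
mentioned above: the slave returns information for all tasks, not just one.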

Olivier

 On Fri, Jun 12, 2015 at 2:35 AM, Olivier Sallou olivier.sal...@irisa.fr
 wrote:

 Hi,
 how can we get the container id when executing a TaskInfo with a  Docker
 ContainerInfo ?

 Mesos execute a Docker container with name mesos-xxx but how can we get
 this identifier ?

 I set in my TaskInfo a unique id in Task Id, but itis not used as Docker
 identifier.

 I need it to query cAdvisor, running on my nodes.

 Thanks

 Olivier

 --


 gpg key id: 4096R/326D8438  (keyring.debian.org)
 Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438



-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Docker port_mapping issue

2015-05-29 Thread Olivier Sallou

Hi,
I can successfully run a task in a Docker container on my mesos install
using the base executor.

However, I cannot get a task running when I add a port mapping (though the
port is available).

I use mesos 0.22, with python 2.7.


If I print the sent task I have:

name: task 0
task_id {
  value: 0
}
slave_id {
  value: 20150526-114150-16777343-5050-2035-S0
}
resources {
  name: cpus
  type: SCALAR
  scalar {
    value: 1
  }
}
resources {
  name: mem
  type: SCALAR
  scalar {
    value: 128
  }
}
command {
  value: echo \hello world # $MESOS_SANDBOX #\
}
container {
  type: DOCKER
  docker {
    image: centos
    network: BRIDGE
    port_mappings {
      host_port: 31000
      container_port: 22
    }
    force_pull_image: true
  }
}

And it ends with error:

Task 0 is in state TASK_FAILED
Abnormal executor termination


Slave shows:

I0529 13:50:49.813928 18426 docker.cpp:626] Starting container
'd9b5be3e-9f00-4242-aa91-d6a6f3a5175a' for task '0' (and executor '0')
of framework '20150529-103634-16777343-5050-18179-0020'
E0529 13:50:54.362663 18420 slave.cpp:3112] Container
'd9b5be3e-9f00-4242-aa91-d6a6f3a5175a' for executor '0' of framework
'20150529-103634-16777343-5050-18179-0020' failed to start: Port
mappings require port resources

However, the offer does include port resources:

resources {
  name: "ports"
  type: RANGES
  ranges {
    range {
      begin: 31000
      end: 32000
    }
  }
  role: "*"
}

At slave startup I also see:
I0529 14:05:37.481212 22455 slave.cpp:322] Slave resources: cpus(*):8;
mem(*):6900; disk(*):215925; ports(*):[31000-32000]


Any idea of what is going wrong?


Thanks

Olivier

-- 

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: Docker port_mapping issue

2015-05-29 Thread Olivier Sallou



On 05/29/2015 02:07 PM, Olivier Sallou wrote:
 Hi,
 I can run task with success in a Docker container in my mesos install
 using base executor.

 However, I cannot get a task running when I add port mapping (though
 port is available).
OK, it appears that in addition to the Docker port_mappings, we also need to
add a ports resource declaration to the task, with something like:

# Declare the host port as a task resource (mesos_pb2 comes from the Mesos Python bindings)
ports = task.resources.add()
ports.name = "ports"
ports.type = mesos_pb2.Value.RANGES
port_range = ports.ranges.range.add()
port_range.begin = 31000
port_range.end = 31000

So the port has to be declared twice in the task: once as a task resource
and once in the Docker port mapping.
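
Since the port must appear both in the task resources and in the Docker port mapping, a small check before launching the task can catch the mismatch that produces "Port mappings require port resources". This is only a sketch in plain Python, working on tuples rather than on the protobuf objects; the function name is hypothetical.

```python
def uncovered_host_ports(port_mappings, port_ranges):
    """Return host ports that no declared `ports` range covers.

    port_mappings: list of (host_port, container_port) tuples
    port_ranges:   list of (begin, end) tuples, both ends inclusive
    """
    missing = []
    for host_port, _container_port in port_mappings:
        covered = any(begin <= host_port <= end for begin, end in port_ranges)
        if not covered:
            missing.append(host_port)
    return missing


# The task from this thread: Docker maps 31000 -> 22, no ports resource yet.
before = uncovered_host_ports([(31000, 22)], [])
# Same task after adding a ports resource with the range 31000-31000.
after = uncovered_host_ports([(31000, 22)], [(31000, 31000)])
print(before, after)
```

An empty list means every mapped host port is backed by a ports resource in the task.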


 I use mesos 0.22, with python 2.7.


 If I print the sent task I have:

 name: task 0
 task_id {
   value: 0
 }
 slave_id {
   value: 20150526-114150-16777343-5050-2035-S0
 }
 resources {
   name: cpus
   type: SCALAR
   scalar {
 value: 1
   }
 }
 resources {
   name: mem
   type: SCALAR
   scalar {
 value: 128
   }
 }
 command {
   value: echo \hello world # $MESOS_SANDBOX #\
 }
 container {
   type: DOCKER
   docker {
 image: centos
 network: BRIDGE
 port_mappings {
   host_port: 31000
   container_port: 22
 }
 force_pull_image: true
   }
 }

 And it ends with error:

 Task 0 is in state TASK_FAILED
 Abnormal executor termination


 Slave shows:

 I0529 13:50:49.813928 18426 docker.cpp:626] Starting container
 'd9b5be3e-9f00-4242-aa91-d6a6f3a5175a' for task '0' (and executor '0')
 of framework '20150529-103634-16777343-5050-18179-0020'
 E0529 13:50:54.362663 18420 slave.cpp:3112] Container
 'd9b5be3e-9f00-4242-aa91-d6a6f3a5175a' for executor '0' of framework
 '20150529-103634-16777343-5050-18179-0020' failed to start: Port
 mappings require port resources

 However the offer present port resources:

 resources {
   name: ports
   type: RANGES
   ranges {
 range {
   begin: 31000
   end: 32000
 }
   }
   role: *
 }

 At slave startup I also see:
 I0529 14:05:37.481212 22455 slave.cpp:322] Slave resources: cpus(*):8;
 mem(*):6900; disk(*):215925; ports(*):[31000-32000]


 Any idea of what is going wrong?


 Thanks

 Olivier


-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: Docker containers not removed

2015-05-26 Thread Olivier Sallou



On 05/26/2015 03:44 PM, Olivier Sallou wrote:
 Hi,
 I wrote a test script to submit tasks in Docker containers.

 My tasks end in the FINISHED state, and everything goes fine.

 The problem is that the container is not removed (it can be seen with docker ps
 -a), though the documentation states:

  6. On container exit or containerizer destroy, stop and remove the
 docker container.

 Even after a few minutes, the containers are still present.

 Am I missing something?
I just found this slave option in the configuration:

--docker_remove_delay=VALUE


The default is 6 hours. I will wait and check!

 Thanks

 Olivier



-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Docker containers not removed

2015-05-26 Thread Olivier Sallou

Hi,
I wrote a test script to submit tasks in Docker containers.

My tasks end in the FINISHED state, and everything goes fine.

The problem is that the container is not removed (it can be seen with docker ps
-a), though the documentation states:

 6. On container exit or containerizer destroy, stop and remove the
docker container.

Even after a few minutes, the containers are still present.

Am I missing something?

Thanks

Olivier


-- 


gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: use of docker containerizer

2014-10-22 Thread Olivier Sallou


On 10/22/2014 11:02 AM, Adam Bordelon wrote:
 Olivier,

 You should only need to create the /etc/mesos-slave/containerizers OR
 specify --containerizers on the mesos-slave command-line. Either should
 work.
 - Is dockerd installed and running on the slave?
yes docker is running on slave
 - You could be running into MESOS-1873
 https://issues.apache.org/jira/browse/MESOS-1873. Try setting your value:
 ls -l /etc instead of using the arguments fields with shell: false
I tried that too; my Task contains:

command {
  shell: true
  arguments: ls
  arguments: -l
  arguments: /etc
}
container {
  type: DOCKER
  docker {
image: dockerimages/centos-core
  }
}

but I have the same error:

Container '3f98b4ee-3417-407f-8717-b60a1ab6f359' for executor '0' of
framework '20141022-112627-16777343-5050-6219-' failed to start:
None of the enabled containerizers (mesos) could create a container for
the provided TaskInfo/ExecutorInfo message.

I find it strange that the message says "enabled containerizers (mesos)"
rather than something like "enabled containerizers (docker,mesos)".
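
To check this on a slave without reading the console by eye, the list in parentheses can be pulled out of the error text. The sketch below relies only on the wording of the message quoted in this thread; the function name is hypothetical, and other Mesos versions may phrase the error differently.

```python
import re

# Matches the wording of the slave error quoted in this thread.
ENABLED_RE = re.compile(r"None of the enabled containerizers \(([^)]*)\)")


def enabled_containerizers(log_text):
    """Return the containerizer names listed in the slave's error, or None."""
    # Log lines wrap in the console, so collapse whitespace before matching.
    flat = " ".join(log_text.split())
    match = ENABLED_RE.search(flat)
    if match is None:
        return None
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


log = """Container '3f98b4ee-3417-407f-8717-b60a1ab6f359' for executor '0' of
framework '20141022-112627-16777343-5050-6219-' failed to start:
None of the enabled containerizers (mesos) could create a container for
the provided TaskInfo/ExecutorInfo message."""

names = enabled_containerizers(log)
if names is not None and "docker" not in names:
    print("docker containerizer is not enabled on this slave:", names)
```

If "docker" is missing from the list, the slave did not pick up the containerizers setting, whatever was passed on the command line.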

 On Tue, Oct 21, 2014 at 2:58 AM, Olivier Sallou olivier.sal...@irisa.fr
 wrote:

 Hi,
 I am trying to use the default Docker containerizer but I can't get it to work... :-(

 My Task is correctly executed when using default executor with CommandInfo.

 If I add a ContainerInfo it fails.


 I launch my slave with options: --containerizers=docker,mesos
 (this is a source install, not system wide installed)

 I see in slave logs:

 E1021 11:50:26.392259 12748 slave.cpp:2656] Container
 '4460417c-1f78-4d99-ab07-8524c73ab35c' for executor '0' of framework
 '20141021-113729-16777343-5050-12670-0003' failed to start: None of the
 enabled containerizers (mesos) could create a container for the provided
 TaskInfo/ExecutorInfo message.

 and Tasks ends in FAILED state.

 My Task looks like:

 ...
 command {
   value: ls
   shell: false
   arguments: -l
   arguments: /etc
 }
 container {
   type: DOCKER
   docker {
 image: docker:///dockerimages/centos-core
   }
 }


 It acts as if the docker option on slave is not taken into account.

 I found on Internet that a file need to be created:

 echo 'docker,mesos' > /etc/mesos-slave/containerizers

 to activate the flag, but I do not know where this file should be
 created when mesos is not system-wide installed.

 Thanks

 Olivier

 --


 gpg key id: 4096R/326D8438  (keyring.debian.org)
 Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438



-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

use of docker containerizer

2014-10-21 Thread Olivier Sallou

Hi,
I am trying to use the default Docker containerizer but I can't get it to work... :-(

My Task is correctly executed when using default executor with CommandInfo.

If I add a ContainerInfo it fails.


I launch my slave with options: --containerizers=docker,mesos
(this is a source install, not system wide installed)

I see in slave logs:

E1021 11:50:26.392259 12748 slave.cpp:2656] Container
'4460417c-1f78-4d99-ab07-8524c73ab35c' for executor '0' of framework
'20141021-113729-16777343-5050-12670-0003' failed to start: None of the
enabled containerizers (mesos) could create a container for the provided
TaskInfo/ExecutorInfo message.

and the task ends in the FAILED state.

My Task looks like:

...
command {
  value: ls
  shell: false
  arguments: -l
  arguments: /etc
}
container {
  type: DOCKER
  docker {
image: docker:///dockerimages/centos-core
  }
}


It behaves as if the docker option on the slave is not taken into account.

I found on the Internet that a file needs to be created:

echo 'docker,mesos' > /etc/mesos-slave/containerizers

to activate the flag, but I do not know where this file should be
created when Mesos is not installed system-wide.

Thanks

Olivier

-- 


gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: how to debug task lost in custom scheduler?

2014-10-20 Thread Olivier Sallou


On 10/17/2014 07:31 PM, Vinod Kone wrote:
 Can you grep for TASK_LOST in master and slave logs and paste the output
 here?
I do not see any TASK_LOST in the master or slave logs, which is one of the
reasons I do not understand what is happening.

I only have the console logs; I do not see any log files.

For information, Mesos is not installed system-wide but built locally from
source, and I run it from the build directory.

 On Fri, Oct 17, 2014 at 8:24 AM, Olivier Sallou olivier.sal...@irisa.fr
 wrote:

 Hi,
 I have installed mesos on a single host master/slave config (for
 devpt/test).

 Mesos works fine with frameworks I tested (aurora...).

 I try to create my own scheduler/executor in python, based on example
 given with sources, but I cannot get my task executed.

 Executor is not executed (I have added debug logs in a file to check,
 and no file is created), but I see no error in master logs (console) nor
 slave logs.

 In master I can see:

 I1017 16:50:30.601210 25794 master.cpp:3559] Sending 1 offers to
 framework 20141017-141022-16777343-5050-25774-0047
 I1017 16:50:30.608912 25789 master.cpp:2169] Processing reply for
 offers: [ 20141017-141022-16777343-5050-25774-97 ] on slave
 20141017-141022-16777343-5050-25774-0 at slave(1)@127.0.0.1:5051
 (localhost) for framework 20141017-141022-16777343-5050-25774-0047
 I1017 16:50:30.609207 25789 hierarchical_allocator_process.hpp:563]
 Recovered cpus(*):8; mem(*):6900; disk(*):215925; ports(*):[31000-32000]
 (total allocatable: cpus(*):8; mem(*):6900; disk(*):215925;
 ports(*):[31000-32000]) on slave 20141017-141022-16777343-5050-25774-0
 from framework 20141017-141022-16777343-5050-25774-0047

 My reply to the offer is received, but in my scheduler I receive an
 update status of TASK_LOST.

 I do not see how to debug this, I see no information why my task is lost
 (there is enough cpu/mem, I ask 2 cpu, and 2024 mem), and it seems that
 it is rejected at master level.

 Any hint on how to analyse this?

 Thanks

 --
 gpg key id: 4096R/326D8438  (keyring.debian.org)
 Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438




-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: how to debug task lost in custom scheduler?

2014-10-20 Thread Olivier Sallou


On 10/18/2014 12:55 PM, Alex Rukletsov wrote:
 Hi Oliver,

 you can get a TASK_LOST if import directives in your executor fail. Do you
 have mesos python eggs installed or available through PYTHONPATH? Could you
 please also paste the output of stderr and stdout of the lost task (you can
 access them via mesos webUI → sandbox)?
I do not see the task at all in the web UI. The Python eggs are available
through PYTHONPATH; they are in MESOS_BUILD_DIR.
If I run my executor directly, I get no Python error, only a
MISSING SLAVE ID error (which is expected, as Mesos sets this environment
variable at runtime).

I know the task is lost because, in the statusUpdate method of my
scheduler, I print the task status (value = 5). The message is empty.
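
For readers wondering how the number 5 maps to TASK_LOST: the status carries the numeric value of the TaskState enum. The table below is a sketch of those values as I understand mesos.proto of this era (later releases add more states), so treat it as an assumption and verify against your own mesos.proto. With the real bindings, the protobuf enum wrapper (mesos_pb2.TaskState.Name(value)) should give the same name without a hand-written table.

```python
# Assumed numeric values of the TaskState enum; verify against mesos.proto.
TASK_STATE_NAMES = {
    0: "TASK_STARTING",
    1: "TASK_RUNNING",
    2: "TASK_FINISHED",
    3: "TASK_FAILED",
    4: "TASK_KILLED",
    5: "TASK_LOST",
    6: "TASK_STAGING",
}


def describe_update(state, message=""):
    """Format a status update as 'NAME: message' for logging."""
    name = TASK_STATE_NAMES.get(state, "UNKNOWN(%d)" % state)
    return "%s: %s" % (name, message or "<no message>")


# The update seen in this thread: state 5 with an empty message.
print(describe_update(5, ""))
```

Printing the name and the message together in statusUpdate makes an empty message, as in this case, obvious at a glance.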

There is nothing in the web UI and nothing in the console logs, since my
executor is never run. So Mesos (master or slave) gives me this error
status, but I have no additional information about the reason.

I used and adapted the examples given with the sources
(src/examples/python).

Olivier

 On Fri, Oct 17, 2014 at 7:31 PM, Vinod Kone vinodk...@gmail.com wrote:

 Can you grep for TASK_LOST in master and slave logs and paste the output
 here?

 On Fri, Oct 17, 2014 at 8:24 AM, Olivier Sallou olivier.sal...@irisa.fr
 wrote:

 Hi,
 I have installed mesos on a single host master/slave config (for
 devpt/test).

 Mesos works fine with frameworks I tested (aurora...).

 I try to create my own scheduler/executor in python, based on example
 given with sources, but I cannot get my task executed.

 Executor is not executed (I have added debug logs in a file to check,
 and no file is created), but I see no error in master logs (console) nor
 slave logs.

 In master I can see:

 I1017 16:50:30.601210 25794 master.cpp:3559] Sending 1 offers to
 framework 20141017-141022-16777343-5050-25774-0047
 I1017 16:50:30.608912 25789 master.cpp:2169] Processing reply for
 offers: [ 20141017-141022-16777343-5050-25774-97 ] on slave
 20141017-141022-16777343-5050-25774-0 at slave(1)@127.0.0.1:5051
 (localhost) for framework 20141017-141022-16777343-5050-25774-0047
 I1017 16:50:30.609207 25789 hierarchical_allocator_process.hpp:563]
 Recovered cpus(*):8; mem(*):6900; disk(*):215925; ports(*):[31000-32000]
 (total allocatable: cpus(*):8; mem(*):6900; disk(*):215925;
 ports(*):[31000-32000]) on slave 20141017-141022-16777343-5050-25774-0
 from framework 20141017-141022-16777343-5050-25774-0047

 My reply to the offer is received, but in my scheduler I receive an
 update status of TASK_LOST.

 I do not see how to debug this, I see no information why my task is lost
 (there is enough cpu/mem, I ask 2 cpu, and 2024 mem), and it seems that
 it is rejected at master level.

 Any hint on how to analyse this?

 Thanks

 --
 gpg key id: 4096R/326D8438  (keyring.debian.org)
 Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438




-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: how to debug task lost in custom scheduler?

2014-10-20 Thread Olivier Sallou


On 10/20/2014 08:11 AM, Olivier Sallou wrote:
 On 10/18/2014 12:55 PM, Alex Rukletsov wrote:
 Hi Oliver,

 you can get a TASK_LOST if import directives in your executor fail. Do you
 have mesos python eggs installed or available through PYTHONPATH? Could you
 please also paste the output of stderr and stdout of the lost task (you can
 access them via mesos webUI → sandbox)?
 I do not see the task at all on webUI. Python eggs are available from
 PYTHONPATH. My eggs are in MESOS_BUILD_DIR.
 If I execute directly my executor, I have no python error, only a
 MISSING SLAVE ID (but this is correct as mesos adds this env at runtime).

 I see that task is lost because, in my scheduler, in the statusUpdate
 method, I print the task status (value = 5). Message is empty.

 nothing in webUI, nothing in console logs as my executor is not
 executed, it means that mesos (master or slave) give me this error
 status, but I have no additional info about the reason.

 I have used and adapted the examples given with sources
 (src/examples/python).
Using the Python code in src/examples/python as a starting point, I made
some progress.

Although there is still no additional error log, I found that the problem
comes from setting the command parameter.

If I comment out the command parameter, my executor is run (it fails,
but that's fine for the moment).

In my task, I was setting: task.command.value = "something to execute on
node"

Setting the command causes a silent error.

My TaskInfo was like:
.
executor {
  executor_id {
    value: "default"
  }
  command {
    value: "../test-executor"
  }
  name: "Test Executor (Python)"
  source: "python_test"
}
command {
  value: "ls -l"
}

So I wonder:

1) why the error is silent on master side

2) how do I set the command to execute in the TaskInfo object ?

 Olivier
 On Fri, Oct 17, 2014 at 7:31 PM, Vinod Kone vinodk...@gmail.com wrote:

 Can you grep for TASK_LOST in master and slave logs and paste the output
 here?

 On Fri, Oct 17, 2014 at 8:24 AM, Olivier Sallou olivier.sal...@irisa.fr
 wrote:

 Hi,
 I have installed mesos on a single host master/slave config (for
 devpt/test).

 Mesos works fine with frameworks I tested (aurora...).

 I try to create my own scheduler/executor in python, based on example
 given with sources, but I cannot get my task executed.

 Executor is not executed (I have added debug logs in a file to check,
 and no file is created), but I see no error in master logs (console) nor
 slave logs.

 In master I can see:

 I1017 16:50:30.601210 25794 master.cpp:3559] Sending 1 offers to
 framework 20141017-141022-16777343-5050-25774-0047
 I1017 16:50:30.608912 25789 master.cpp:2169] Processing reply for
 offers: [ 20141017-141022-16777343-5050-25774-97 ] on slave
 20141017-141022-16777343-5050-25774-0 at slave(1)@127.0.0.1:5051
 (localhost) for framework 20141017-141022-16777343-5050-25774-0047
 I1017 16:50:30.609207 25789 hierarchical_allocator_process.hpp:563]
 Recovered cpus(*):8; mem(*):6900; disk(*):215925; ports(*):[31000-32000]
 (total allocatable: cpus(*):8; mem(*):6900; disk(*):215925;
 ports(*):[31000-32000]) on slave 20141017-141022-16777343-5050-25774-0
 from framework 20141017-141022-16777343-5050-25774-0047

 My reply to the offer is received, but in my scheduler I receive an
 update status of TASK_LOST.

 I do not see how to debug this, I see no information why my task is lost
 (there is enough cpu/mem, I ask 2 cpu, and 2024 mem), and it seems that
 it is rejected at master level.

 Any hint on how to analyse this?

 Thanks

 --
 gpg key id: 4096R/326D8438  (keyring.debian.org)
 Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438




-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

Re: how to debug task lost in custom scheduler?

2014-10-20 Thread Olivier Sallou


On 10/20/2014 05:20 PM, Alex Rukletsov wrote:
 It looks like you try to set both command and executor. This is not
 allowed, since setting a command implies using the CommandExecutor aka
 mesos-executor. If you task is a command, do not specify the executor in
 your TaskInfo: mesos will do it for you. See
 https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto line
 579.

 Btw, you should observe something like Task id should have either
 CommandInfo or ExecutorInfo set but not both in your logs.
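
The rule above (a task carries either a command or an executor, never both) can be checked in the scheduler before launching, which avoids the silent TASK_LOST. This is a minimal sketch: it only assumes that the task object offers HasField(), as protobuf messages do for optional message fields. The function name and the FakeTask stand-in are hypothetical and exist only so the example runs without the Mesos bindings.

```python
def check_command_or_executor(task):
    """Return an error string if the task sets both or neither field, else None.

    `task` is expected to behave like a protobuf TaskInfo, i.e. to offer
    HasField(name) for its optional `command` and `executor` fields.
    """
    has_command = task.HasField("command")
    has_executor = task.HasField("executor")
    if has_command and has_executor:
        return "task sets both command and executor; keep only one"
    if not has_command and not has_executor:
        return "task sets neither command nor executor"
    return None


class FakeTask(object):
    """Hypothetical stand-in for TaskInfo so the sketch runs without Mesos."""

    def __init__(self, *fields):
        self._fields = set(fields)

    def HasField(self, name):
        return name in self._fields


# The TaskInfo from this thread had both fields set.
print(check_command_or_executor(FakeTask("command", "executor")))
```

A command-only task lets Mesos supply its own command executor, while an executor-only task runs the custom executor; both pass the check.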
OK, thanks, I got it to work (at least I can see my job).

Per-language API documentation is lacking.  :-(

Thanks for your help

Olivier

 On Mon, Oct 20, 2014 at 5:13 PM, Olivier Sallou olivier.sal...@irisa.fr
 wrote:

 On 10/20/2014 08:11 AM, Olivier Sallou wrote:
 On 10/18/2014 12:55 PM, Alex Rukletsov wrote:
 Hi Oliver,

 you can get a TASK_LOST if import directives in your executor fail. Do
 you
 have mesos python eggs installed or available through PYTHONPATH? Could
 you
 please also paste the output of stderr and stdout of the lost task (you
 can
 access them via mesos webUI → sandbox)?
 I do not see the task at all on webUI. Python eggs are available from
 PYTHONPATH. My eggs are in MESOS_BUILD_DIR.
 If I execute directly my executor, I have no python error, only a
 MISSING SLAVE ID (but this is correct as mesos adds this env at runtime).

 I see that task is lost because, in my scheduler, in the statusUpdate
 method, I print the task status (value = 5). Message is empty.

 nothing in webUI, nothing in console logs as my executor is not
 executed, it means that mesos (master or slave) give me this error
 status, but I have no additional info about the reason.

 I have used and adapted the examples given with sources
 (src/examples/python).
 Taking as example the python code in src/examples/python, I could
 progress a little.

 Though there is no additional error log, I found an issue with setting
 the command parameter.

 If I comment the command parameter, my executor is executed (it fails
 but that's fine for the moment).

 In my task, I was setting: task.command.value = something to execute on
 node

 Setting command creates a silent error.

 My TaskInfo was like:
 .
 executor {
   executor_id {
 value: default
   }
   command {
 value: ../test-executor
   }
   name: Test Executor (Python)
   source: python_test
 }
 command {
   value: ls -l
 }

 So I wonder:

 1) why the error is silent on master side

 2) how do I set the command to execute in the TaskInfo object ?
 Olivier
 On Fri, Oct 17, 2014 at 7:31 PM, Vinod Kone vinodk...@gmail.com
 wrote:
 Can you grep for TASK_LOST in master and slave logs and paste the
 output
 here?

 On Fri, Oct 17, 2014 at 8:24 AM, Olivier Sallou 
 olivier.sal...@irisa.fr
 wrote:

 Hi,
 I have installed mesos on a single host master/slave config (for
 devpt/test).

 Mesos works fine with frameworks I tested (aurora...).

 I try to create my own scheduler/executor in python, based on example
 given with sources, but I cannot get my task executed.

 Executor is not executed (I have added debug logs in a file to check,
 and no file is created), but I see no error in master logs (console)
 nor
 slave logs.

 In master I can see:

 I1017 16:50:30.601210 25794 master.cpp:3559] Sending 1 offers to
 framework 20141017-141022-16777343-5050-25774-0047
 I1017 16:50:30.608912 25789 master.cpp:2169] Processing reply for
 offers: [ 20141017-141022-16777343-5050-25774-97 ] on slave
 20141017-141022-16777343-5050-25774-0 at slave(1)@127.0.0.1:5051
 (localhost) for framework 20141017-141022-16777343-5050-25774-0047
 I1017 16:50:30.609207 25789 hierarchical_allocator_process.hpp:563]
 Recovered cpus(*):8; mem(*):6900; disk(*):215925;
 ports(*):[31000-32000]
 (total allocatable: cpus(*):8; mem(*):6900; disk(*):215925;
 ports(*):[31000-32000]) on slave 20141017-141022-16777343-5050-25774-0
 from framework 20141017-141022-16777343-5050-25774-0047

 My reply to the offer is received, but in my scheduler I receive an
 update status of TASK_LOST.

 I do not see how to debug this, I see no information why my task is
 lost
 (there is enough cpu/mem, I ask 2 cpu, and 2024 mem), and it seems
 that
 it is rejected at master level.

 Any hint on how to analyse this?

 Thanks

 --
 gpg key id: 4096R/326D8438  (keyring.debian.org)
 Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438



 --
 Olivier Sallou
 IRISA / University of Rennes 1
 Campus de Beaulieu, 35000 RENNES - FRANCE
 Tel: 02.99.84.71.95

 gpg key id: 4096R/326D8438  (keyring.debian.org)
 Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438



-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438

how to debug task lost in custom scheduler?

2014-10-17 Thread Olivier Sallou

Hi,
I have installed Mesos on a single host in a master/slave configuration (for
development/testing).

Mesos works fine with the frameworks I tested (Aurora...).

I am trying to create my own scheduler/executor in Python, based on the
example given with the sources, but I cannot get my task executed.

The executor is never run (I added debug logging to a file to check,
and no file is created), but I see no error in the master logs (console) or
the slave logs.

In master I can see:

I1017 16:50:30.601210 25794 master.cpp:3559] Sending 1 offers to
framework 20141017-141022-16777343-5050-25774-0047
I1017 16:50:30.608912 25789 master.cpp:2169] Processing reply for
offers: [ 20141017-141022-16777343-5050-25774-97 ] on slave
20141017-141022-16777343-5050-25774-0 at slave(1)@127.0.0.1:5051
(localhost) for framework 20141017-141022-16777343-5050-25774-0047
I1017 16:50:30.609207 25789 hierarchical_allocator_process.hpp:563]
Recovered cpus(*):8; mem(*):6900; disk(*):215925; ports(*):[31000-32000]
(total allocatable: cpus(*):8; mem(*):6900; disk(*):215925;
ports(*):[31000-32000]) on slave 20141017-141022-16777343-5050-25774-0
from framework 20141017-141022-16777343-5050-25774-0047

My reply to the offer is received, but in my scheduler I receive a
status update of TASK_LOST.

I do not see how to debug this: I have no information on why my task is lost
(there is enough cpu/mem; I ask for 2 cpus and 2024 mem), and it seems that
it is rejected at the master level.
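
To rule out a resource shortage mechanically, the resource string printed in the master log can be parsed and compared with the request. The sketch below assumes the "name(role):value" format seen in the log lines of this thread; the function names are hypothetical, and only scalar resources are compared.

```python
import re


def parse_resources(text):
    """Parse 'cpus(*):8; mem(*):6900; ports(*):[31000-32000]' into a dict."""
    resources = {}
    for name, _role, value in re.findall(r"(\w+)\(([^)]*)\):\s*([^;]+)", text):
        value = value.strip()
        if value.startswith("["):
            # Range resources such as ports become (begin, end) tuples.
            ranges = []
            for part in value.strip("[]").split(","):
                begin, end = part.split("-")
                ranges.append((int(begin), int(end)))
            resources[name] = ranges
        else:
            resources[name] = float(value)
    return resources


def fits(requested, offered):
    """True if every requested scalar is within the offered amount."""
    return all(offered.get(name, 0) >= amount
               for name, amount in requested.items())


offered = parse_resources(
    "cpus(*):8; mem(*):6900; disk(*):215925; ports(*):[31000-32000]")
print(fits({"cpus": 2, "mem": 2024}, offered))
```

With the figures from this thread the request fits within the offer, which supports the conclusion that the task is not lost for lack of cpu or memory.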

Any hint on how to analyse this?

Thanks

-- 
gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
