Yes, there should be configuration options for this in the Mesos configuration - see the documentation. I am leaving now, so I won't be able to respond till Sunday.
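For reference, a minimal sketch of one agent-side option that is often used for this (this assumes the --switch_user flag of mesos-slave; please verify it against the configuration documentation for your Mesos version). With --switch_user=false the slave does not su to the account that submitted the task, so everything runs as the user the slave daemon itself was started with:

# Sketch only: run every task as the slave daemon's own user instead of
# switching to the user who invoked mesos-execute.
./bin/mesos-slave.sh --master=192.168.0.102:5050 --switch_user=false

The other direction is to keep the default behaviour and make sure the account that runs mesos-execute exists on every slave node (a sketch for that is at the end of the thread).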
2015-10-03 11:18 GMT+02:00 Pradeep Kiruvale <[email protected]>:

> I have different login names for different systems. I have a client system,
> from where I launch the tasks. But these tasks are not getting any
> resources, so they are not getting scheduled.
>
> I mean to say my cluster arrangement is 1 client, 1 Master, 3 slaves. All
> are different physical systems.
>
> Is there any way to run the tasks under one unified user?
>
> Regards,
> Pradeep
>
> On 3 October 2015 at 10:43, Ondrej Smola <[email protected]> wrote:
>
>> A Mesos framework receives offers, and based on those offers it decides
>> where to run tasks.
>>
>> mesos-execute is a little framework that executes your task (hackbench) -
>> see here: https://github.com/apache/mesos/blob/master/src/cli/execute.cpp
>>
>> At https://github.com/apache/mesos/blob/master/src/cli/execute.cpp#L320 you
>> can see that it uses the user that runs the mesos-execute command.
>>
>> The error you see should come from here (the su command):
>>
>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/posix/os.hpp#L520
>>
>> Under which user do you run mesos-execute and the Mesos daemons?
>>
>> 2015-10-02 15:26 GMT+02:00 Pradeep Kiruvale <[email protected]>:
>>
>>> Hi Ondrej,
>>>
>>> Thanks for your reply.
>>>
>>> I did solve that issue; yes, you are right, there was an issue with the
>>> slave IP address setting.
>>>
>>> Now I am facing an issue with scheduling the tasks. When I try to
>>> schedule a task using
>>>
>>> /src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
>>> --command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
>>> --resources="cpus(*):3;mem(*):2560"
>>>
>>> the tasks always get scheduled on the same node. The resources from the
>>> other nodes are not getting used to schedule the tasks.
>>>
>>> I just start the Mesos slaves like below:
>>>
>>> ./bin/mesos-slave.sh --master=192.168.0.102:5050/mesos --hostname=slave1
>>>
>>> If I submit the task using the above (mesos-execute) command from one of
>>> the slaves, it runs on that system.
>>>
>>> But when I submit the task from a different system, it uses just that
>>> system and queues the tasks; it does not run them on the other slaves.
>>> Sometimes I see the message "Failed to getgid: unknown user" (a sketch
>>> addressing this is at the end of the thread).
>>>
>>> Do I need to start some process to push the tasks onto all the slaves
>>> equally? Am I missing something here?
>>>
>>> Regards,
>>> Pradeep
>>>
>>> On 2 October 2015 at 15:07, Ondrej Smola <[email protected]> wrote:
>>>
>>>> Hi Pradeep,
>>>>
>>>> The problem is with the IP your slave advertises - Mesos by default
>>>> resolves your hostname. There are several solutions (let's say your
>>>> node IP is 192.168.56.128):
>>>>
>>>> 1) export LIBPROCESS_IP=192.168.56.128
>>>> 2) set the Mesos options - ip, hostname
>>>>
>>>> One way to do this is to create the files
>>>>
>>>> echo "192.168.56.128" > /etc/mesos-slave/ip
>>>> echo "abc.mesos.com" > /etc/mesos-slave/hostname
>>>>
>>>> For more configuration options see
>>>> http://mesos.apache.org/documentation/latest/configuration
>>>>
>>>> 2015-10-02 10:06 GMT+02:00 Pradeep Kiruvale <[email protected]>:
>>>>
>>>>> Hi Guangya,
>>>>>
>>>>> Thanks for the reply. I found one interesting log message.
>>>>>
>>>>> 7410 master.cpp:5977] Removed slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S52 (192.168.0.178): a new slave registered at the same address
>>>>>
>>>>> Mostly because of this issue, the slave nodes keep getting registered
>>>>> and de-registered, each one making room for the next node. I can even
>>>>> see this in the UI: for some time one node is added, and after some
>>>>> time it is replaced with the new slave node. (A per-slave command-line
>>>>> sketch for giving each node its own advertised IP appears after the
>>>>> quoted thread below.)
>>>>>
>>>>> The above log is followed by the log messages below.
>>>>>
>>>>> I1002 10:01:12.753865  7416 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 104089ns
>>>>> I1002 10:01:12.753885  7416 replica.cpp:679] Persisted action at 384
>>>>> E1002 10:01:12.753891  7417 process.cpp:1912] Failed to shutdown socket with fd 15: Transport endpoint is not connected
>>>>> I1002 10:01:12.753988  7413 master.cpp:3930] Registered slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051 (192.168.0.116) with cpus(*):8; mem(*):14930; disk(*):218578; ports(*):[31000-32000]
>>>>> I1002 10:01:12.754065  7413 master.cpp:1080] Slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051 (192.168.0.116) disconnected
>>>>> I1002 10:01:12.754072  7416 hierarchical.hpp:675] Added slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 (192.168.0.116) with cpus(*):8; mem(*):14930; disk(*):218578; ports(*):[31000-32000] (allocated: )
>>>>> I1002 10:01:12.754084  7413 master.cpp:2534] Disconnecting slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051 (192.168.0.116)
>>>>> E1002 10:01:12.754118  7417 process.cpp:1912] Failed to shutdown socket with fd 16: Transport endpoint is not connected
>>>>> I1002 10:01:12.754132  7413 master.cpp:2553] Deactivating slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051 (192.168.0.116)
>>>>> I1002 10:01:12.754237  7416 hierarchical.hpp:768] Slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 deactivated
>>>>> I1002 10:01:12.754240  7413 replica.cpp:658] Replica received learned notice for position 384
>>>>> I1002 10:01:12.754360  7413 leveldb.cpp:343] Persisting action (20 bytes) to leveldb took 95171ns
>>>>> I1002 10:01:12.754395  7413 leveldb.cpp:401] Deleting ~2 keys from leveldb took 20333ns
>>>>> I1002 10:01:12.754406  7413 replica.cpp:679] Persisted action at 384
>>>>>
>>>>> Thanks,
>>>>> Pradeep
>>>>>
>>>>> On 2 October 2015 at 02:35, Guangya Liu <[email protected]> wrote:
>>>>>
>>>>>> Hi Pradeep,
>>>>>>
>>>>>> Please check some of my questions in line.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Guangya
>>>>>>
>>>>>> On Fri, Oct 2, 2015 at 12:55 AM, Pradeep Kiruvale <[email protected]> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I am new to Mesos. I have set up a Mesos cluster with 1 Master and
>>>>>>> 3 Slaves.
>>>>>>>
>>>>>>> One slave runs on the Master node itself and the other slaves run
>>>>>>> on different nodes. Here, node means a physical box.
>>>>>>>
>>>>>>> I tried running the tasks by configuring a one-node cluster. I tested
>>>>>>> the task scheduling using mesos-execute; it works fine.
>>>>>>>
>>>>>>> When I configure a three-node cluster (1 master and 3 slaves) and try
>>>>>>> to see the resources on the master (in the GUI), only the Master node
>>>>>>> resources are visible.
>>>>>>> The other nodes' resources are not visible. Sometimes they are
>>>>>>> visible, but in a deactivated state.
>>>>>>
>>>>>> Can you please append some logs from mesos-slave and mesos-master?
>>>>>> There should be some logs in either the master or the slave telling
>>>>>> you what is wrong.
>>>>>>
>>>>>>> Please let me know what could be the reason. All the nodes are in
>>>>>>> the same network.
>>>>>>>
>>>>>>> When I try to schedule a task using
>>>>>>>
>>>>>>> /src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
>>>>>>> --command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
>>>>>>> --resources="cpus(*):3;mem(*):2560"
>>>>>>>
>>>>>>> the tasks always get scheduled on the same node. The resources from
>>>>>>> the other nodes are not getting used to schedule the tasks.
>>>>>>
>>>>>> Based on your previous question, there is only one node in your
>>>>>> cluster; that's why the other nodes are not available. We first need
>>>>>> to identify what is wrong with the other three nodes.
>>>>>>
>>>>>>> Is it required to register the frameworks from every slave node on
>>>>>>> the Master?
>>>>>>
>>>>>> It is not required.
>>>>>>
>>>>>>> I have configured this cluster using the GitHub code.
>>>>>>>
>>>>>>> Thanks & Regards,
>>>>>>> Pradeep
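Tying the log excerpt above to Ondrej's earlier advice: every slave in the log registers as slave(1)@127.0.1.1:5051, so from the master's point of view they all advertise the same address and each new registration evicts the previous one ("a new slave registered at the same address"). A minimal sketch of starting each slave with its own advertised IP and hostname (flag names follow the "set mesos options - ip, hostname" suggestion in the thread; the addresses come from the log lines, and "slave2" is just a placeholder - adjust per node):

# Sketch: give every slave a distinct advertised address so registrations
# do not collide at 127.0.1.1.
# On the node with IP 192.168.0.116:
./bin/mesos-slave.sh --master=192.168.0.102:5050 --ip=192.168.0.116 --hostname=slave1
# On the node with IP 192.168.0.178:
./bin/mesos-slave.sh --master=192.168.0.102:5050 --ip=192.168.0.178 --hostname=slave2

The export LIBPROCESS_IP=... and /etc/mesos-slave/ip approaches quoted earlier in the thread are alternative ways to set the same value.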

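Regarding the "Failed to getgid: unknown user" message quoted above: mesos-execute submits the task under the account of whoever runs it, and the slave then tries to su to that account, so the error appears when that account does not exist on the slave that received the task. A sketch of one fix, assuming the submitting login is called "pradeep" (a placeholder - substitute the real login), is to create the same account on every slave node:

# Sketch: create the submitting user on each slave so the slave can switch
# to it when launching the task ("pradeep" is a placeholder name).
sudo useradd -m pradeep

Alternatively, keep a single shared account on all machines, or disable the user switch with --switch_user=false as sketched under the first reply above.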
