Yes, there should be configuration options for this in the Mesos configuration - see the documentation. I am leaving now, so I won't be able to respond till Sunday.
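For reference, a minimal sketch of one agent-side option that is often used for this (this assumes the --switch_user flag of mesos-slave; please verify it against the configuration documentation for your Mesos version). With --switch_user=false the slave does not su to the account that submitted the task, so everything runs as the user the slave daemon itself was started with:

# Sketch only: run every task as the slave daemon's own user instead of
# switching to the user who invoked mesos-execute.
./bin/mesos-slave.sh --master=192.168.0.102:5050 --switch_user=false

The other direction is to keep the default behaviour and make sure the account that runs mesos-execute exists on every slave node (a sketch for that is at the end of the thread).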
2015-10-03 11:18 GMT+02:00 Pradeep Kiruvale <[email protected]>:

> I have different login names for different systems. I have a client system,
> from where I launch the tasks. But these tasks are not getting any
> resources, so they are not getting scheduled.
>
> I mean to say my cluster arrangement is 1 client, 1 Master, 3 slaves. All
> are different physical systems.
>
> Is there any way to run the tasks under one unified user?
>
> Regards,
> Pradeep
>
> On 3 October 2015 at 10:43, Ondrej Smola <[email protected]> wrote:
>
>> A Mesos framework receives offers, and based on those offers it decides
>> where to run tasks.
>>
>> mesos-execute is a little framework that executes your task (hackbench) -
>> see here: https://github.com/apache/mesos/blob/master/src/cli/execute.cpp
>>
>> At https://github.com/apache/mesos/blob/master/src/cli/execute.cpp#L320 you
>> can see that it uses the user that runs the mesos-execute command.
>>
>> The error you see should come from here (the su command):
>>
>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/posix/os.hpp#L520
>>
>> Under which user do you run mesos-execute and the Mesos daemons?
>>
>> 2015-10-02 15:26 GMT+02:00 Pradeep Kiruvale <[email protected]>:
>>
>>> Hi Ondrej,
>>>
>>> Thanks for your reply.
>>>
>>> I did solve that issue; yes, you are right, there was an issue with the
>>> slave IP address setting.
>>>
>>> Now I am facing an issue with scheduling the tasks. When I try to
>>> schedule a task using
>>>
>>> /src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
>>> --command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
>>> --resources="cpus(*):3;mem(*):2560"
>>>
>>> the tasks always get scheduled on the same node. The resources from the
>>> other nodes are not getting used to schedule the tasks.
>>>
>>> I just start the Mesos slaves like below:
>>>
>>> ./bin/mesos-slave.sh --master=192.168.0.102:5050/mesos --hostname=slave1
>>>
>>> If I submit the task using the above (mesos-execute) command from one of
>>> the slaves, it runs on that system.
>>>
>>> But when I submit the task from a different system, it uses just that
>>> system and queues the tasks; it does not run them on the other slaves.
>>> Sometimes I see the message "Failed to getgid: unknown user" (a sketch
>>> addressing this is at the end of the thread).
>>>
>>> Do I need to start some process to push the tasks onto all the slaves
>>> equally? Am I missing something here?
>>>
>>> Regards,
>>> Pradeep
>>>
>>> On 2 October 2015 at 15:07, Ondrej Smola <[email protected]> wrote:
>>>
>>>> Hi Pradeep,
>>>>
>>>> The problem is with the IP your slave advertises - Mesos by default
>>>> resolves your hostname. There are several solutions (let's say your
>>>> node IP is 192.168.56.128):
>>>>
>>>> 1) export LIBPROCESS_IP=192.168.56.128
>>>> 2) set the Mesos options - ip, hostname
>>>>
>>>> One way to do this is to create the files
>>>>
>>>> echo "192.168.56.128" > /etc/mesos-slave/ip
>>>> echo "abc.mesos.com" > /etc/mesos-slave/hostname
>>>>
>>>> For more configuration options see
>>>> http://mesos.apache.org/documentation/latest/configuration
>>>>
>>>> 2015-10-02 10:06 GMT+02:00 Pradeep Kiruvale <[email protected]>:
>>>>
>>>>> Hi Guangya,
>>>>>
>>>>> Thanks for the reply. I found one interesting log message.
>>>>>
>>>>> 7410 master.cpp:5977] Removed slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S52 (192.168.0.178): a new slave registered at the same address
>>>>>
>>>>> Mostly because of this issue, the slave nodes keep getting registered
>>>>> and de-registered, each one making room for the next node. I can even
>>>>> see this in the UI: for some time one node is added, and after some
>>>>> time it is replaced with the new slave node. (A per-slave command-line
>>>>> sketch for giving each node its own advertised IP appears after the
>>>>> quoted thread below.)
>>>>>
>>>>> The above log is followed by the log messages below.
>>>>>
>>>>> I1002 10:01:12.753865  7416 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 104089ns
>>>>> I1002 10:01:12.753885  7416 replica.cpp:679] Persisted action at 384
>>>>> E1002 10:01:12.753891  7417 process.cpp:1912] Failed to shutdown socket with fd 15: Transport endpoint is not connected
>>>>> I1002 10:01:12.753988  7413 master.cpp:3930] Registered slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051 (192.168.0.116) with cpus(*):8; mem(*):14930; disk(*):218578; ports(*):[31000-32000]
>>>>> I1002 10:01:12.754065  7413 master.cpp:1080] Slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051 (192.168.0.116) disconnected
>>>>> I1002 10:01:12.754072  7416 hierarchical.hpp:675] Added slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 (192.168.0.116) with cpus(*):8; mem(*):14930; disk(*):218578; ports(*):[31000-32000] (allocated: )
>>>>> I1002 10:01:12.754084  7413 master.cpp:2534] Disconnecting slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051 (192.168.0.116)
>>>>> E1002 10:01:12.754118  7417 process.cpp:1912] Failed to shutdown socket with fd 16: Transport endpoint is not connected
>>>>> I1002 10:01:12.754132  7413 master.cpp:2553] Deactivating slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051 (192.168.0.116)
>>>>> I1002 10:01:12.754237  7416 hierarchical.hpp:768] Slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 deactivated
>>>>> I1002 10:01:12.754240  7413 replica.cpp:658] Replica received learned notice for position 384
>>>>> I1002 10:01:12.754360  7413 leveldb.cpp:343] Persisting action (20 bytes) to leveldb took 95171ns
>>>>> I1002 10:01:12.754395  7413 leveldb.cpp:401] Deleting ~2 keys from leveldb took 20333ns
>>>>> I1002 10:01:12.754406  7413 replica.cpp:679] Persisted action at 384
>>>>>
>>>>> Thanks,
>>>>> Pradeep
>>>>>
>>>>> On 2 October 2015 at 02:35, Guangya Liu <[email protected]> wrote:
>>>>>
>>>>>> Hi Pradeep,
>>>>>>
>>>>>> Please check some of my questions in line.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Guangya
>>>>>>
>>>>>> On Fri, Oct 2, 2015 at 12:55 AM, Pradeep Kiruvale <[email protected]> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I am new to Mesos. I have set up a Mesos cluster with 1 Master and
>>>>>>> 3 Slaves.
>>>>>>>
>>>>>>> One slave runs on the Master node itself and the other slaves run
>>>>>>> on different nodes. Here, node means a physical box.
>>>>>>>
>>>>>>> I tried running the tasks by configuring a one-node cluster. I tested
>>>>>>> the task scheduling using mesos-execute; it works fine.
>>>>>>>
>>>>>>> When I configure a three-node cluster (1 master and 3 slaves) and try
>>>>>>> to see the resources on the master (in the GUI), only the Master node
>>>>>>> resources are visible.
>>>>>>> The other nodes' resources are not visible. Sometimes they are
>>>>>>> visible, but in a deactivated state.
>>>>>>
>>>>>> Can you please append some logs from mesos-slave and mesos-master?
>>>>>> There should be some logs in either the master or the slave telling
>>>>>> you what is wrong.
>>>>>>
>>>>>>> Please let me know what could be the reason. All the nodes are in
>>>>>>> the same network.
>>>>>>>
>>>>>>> When I try to schedule a task using
>>>>>>>
>>>>>>> /src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
>>>>>>> --command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
>>>>>>> --resources="cpus(*):3;mem(*):2560"
>>>>>>>
>>>>>>> the tasks always get scheduled on the same node. The resources from
>>>>>>> the other nodes are not getting used to schedule the tasks.
>>>>>>
>>>>>> Based on your previous question, there is only one node in your
>>>>>> cluster; that's why the other nodes are not available. We first need
>>>>>> to identify what is wrong with the other three nodes.
>>>>>>
>>>>>>> Is it required to register the frameworks from every slave node on
>>>>>>> the Master?
>>>>>>
>>>>>> It is not required.
>>>>>>
>>>>>>> I have configured this cluster using the GitHub code.
>>>>>>>
>>>>>>> Thanks & Regards,
>>>>>>> Pradeep
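Tying the log excerpt above to Ondrej's earlier advice: every slave in the log registers as slave(1)@127.0.1.1:5051, so from the master's point of view they all advertise the same address and each new registration evicts the previous one ("a new slave registered at the same address"). A minimal sketch of starting each slave with its own advertised IP and hostname (flag names follow the "set mesos options - ip, hostname" suggestion in the thread; the addresses come from the log lines, and "slave2" is just a placeholder - adjust per node):

# Sketch: give every slave a distinct advertised address so registrations
# do not collide at 127.0.1.1.
# On the node with IP 192.168.0.116:
./bin/mesos-slave.sh --master=192.168.0.102:5050 --ip=192.168.0.116 --hostname=slave1
# On the node with IP 192.168.0.178:
./bin/mesos-slave.sh --master=192.168.0.102:5050 --ip=192.168.0.178 --hostname=slave2

The export LIBPROCESS_IP=... and /etc/mesos-slave/ip approaches quoted earlier in the thread are alternative ways to set the same value.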

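Regarding the "Failed to getgid: unknown user" message quoted above: mesos-execute submits the task under the account of whoever runs it, and the slave then tries to su to that account, so the error appears when that account does not exist on the slave that received the task. A sketch of one fix, assuming the submitting login is called "pradeep" (a placeholder - substitute the real login), is to create the same account on every slave node:

# Sketch: create the submitting user on each slave so the slave can switch
# to it when launching the task ("pradeep" is a placeholder name).
sudo useradd -m pradeep

Alternatively, keep a single shared account on all machines, or disable the user switch with --switch_user=false as sketched under the first reply above.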
