Hi Guangya,

I am running the framework from another physical node that is part of the same network, but I am still getting the messages below and the framework is not getting registered. (What I plan to try next, based on the suggestions in this thread, is summarized at the bottom of this mail, after the quoted messages.)
Any idea what is the reason? I1007 11:24:58.781914 32392 master.cpp:4815] Framework failover timeout, removing framework 89b179d8-9fb7-4a61-ad03-a9a5525482ff-0019 (Balloon Framework (C++)) at [email protected]:54203 I1007 11:24:58.781968 32392 master.cpp:5571] Removing framework 89b179d8-9fb7-4a61-ad03-a9a5525482ff-0019 (Balloon Framework (C++)) at [email protected]:54203 I1007 11:24:58.782352 32392 hierarchical.hpp:552] Removed framework 89b179d8-9fb7-4a61-ad03-a9a5525482ff-0019 E1007 11:24:58.782577 32399 process.cpp:1912] Failed to shutdown socket with fd 13: Transport endpoint is not connected I1007 11:24:59.699587 32396 master.cpp:2179] Received SUBSCRIBE call for framework 'Balloon Framework (C++)' at [email protected]:54203 I1007 11:24:59.699717 32396 master.cpp:2250] Subscribing framework Balloon Framework (C++) with checkpointing disabled and capabilities [ ] I1007 11:24:59.700251 32393 hierarchical.hpp:515] Added framework 89b179d8-9fb7-4a61-ad03-a9a5525482ff-0020 E1007 11:24:59.700253 32399 process.cpp:1912] Failed to shutdown socket with fd 13: Transport endpoint is not connected Regards, Pradeep On 5 October 2015 at 13:51, Guangya Liu <[email protected]> wrote: > Hi Pradeep, > > I think that the problem might be caused by that you are running the lxc > container on master node and not sure if there are any port conflict or > what else wrong. > > For my case, I was running the client in a new node but not on master > node, perhaps you can have a try to put your client on a new node but not > on master node. > > Thanks, > > Guangya > > > On Mon, Oct 5, 2015 at 7:30 PM, Pradeep Kiruvale < > [email protected]> wrote: > >> Hi Guangya, >> >> Hmm!...That is strange in my case! >> >> If I run from the mesos-execute on one of the slave/master node then the >> tasks get their resources and they get scheduled well. >> But if I start the mesos-execute on another node which is neither >> slave/master then I have this issue. >> >> I am using an lxc container on master as a client to launch the tasks. >> This is also in the same network as master/slaves. >> And I just launch the task as you did. But the tasks are not getting >> scheduled. >> >> >> On master the logs are same as I sent you before >> >> Deactivating framework 77539063-89ce-4efa-a20b-ca788abbd912-0066 >> >> On both of the slaves I can see the below logs >> >> I1005 13:23:32.547987 4831 slave.cpp:1980] Asked to shut down framework >> 77539063-89ce-4efa-a20b-ca788abbd912-0060 by [email protected]:5050 >> W1005 13:23:32.548135 4831 slave.cpp:1995] Cannot shut down unknown >> framework 77539063-89ce-4efa-a20b-ca788abbd912-0060 >> I1005 13:23:33.697707 4833 slave.cpp:3926] Current disk usage 3.60%. 
Max >> allowed age: 6.047984349521910days >> I1005 13:23:34.098599 4829 slave.cpp:1980] Asked to shut down framework >> 77539063-89ce-4efa-a20b-ca788abbd912-0061 by [email protected]:5050 >> W1005 13:23:34.098740 4829 slave.cpp:1995] Cannot shut down unknown >> framework 77539063-89ce-4efa-a20b-ca788abbd912-0061 >> I1005 13:23:35.274569 4831 slave.cpp:1980] Asked to shut down framework >> 77539063-89ce-4efa-a20b-ca788abbd912-0062 by [email protected]:5050 >> W1005 13:23:35.274683 4831 slave.cpp:1995] Cannot shut down unknown >> framework 77539063-89ce-4efa-a20b-ca788abbd912-0062 >> I1005 13:23:36.193964 4829 slave.cpp:1980] Asked to shut down framework >> 77539063-89ce-4efa-a20b-ca788abbd912-0063 by [email protected]:5050 >> W1005 13:23:36.194090 4829 slave.cpp:1995] Cannot shut down unknown >> framework 77539063-89ce-4efa-a20b-ca788abbd912-0063 >> I1005 13:24:01.914788 4827 slave.cpp:1980] Asked to shut down framework >> 77539063-89ce-4efa-a20b-ca788abbd912-0064 by [email protected]:5050 >> W1005 13:24:01.914937 4827 slave.cpp:1995] Cannot shut down unknown >> framework 77539063-89ce-4efa-a20b-ca788abbd912-0064 >> I1005 13:24:03.469974 4833 slave.cpp:1980] Asked to shut down framework >> 77539063-89ce-4efa-a20b-ca788abbd912-0065 by [email protected]:5050 >> W1005 13:24:03.470118 4833 slave.cpp:1995] Cannot shut down unknown >> framework 77539063-89ce-4efa-a20b-ca788abbd912-0065 >> I1005 13:24:04.642654 4826 slave.cpp:1980] Asked to shut down framework >> 77539063-89ce-4efa-a20b-ca788abbd912-0066 by [email protected]:5050 >> W1005 13:24:04.642812 4826 slave.cpp:1995] Cannot shut down unknown >> framework 77539063-89ce-4efa-a20b-ca788abbd912-0066 >> >> >> >> On 5 October 2015 at 13:09, Guangya Liu <[email protected]> wrote: >> >>> Hi Pradeep, >>> >>> From your log, seems that the master process is exiting and this caused >>> the framework fail over to another mesos master. Can you please show more >>> detail for your issue reproduced steps? >>> >>> I did some test by running mesos-execute on a client host which does not >>> have any mesos service and the task can schedule well. >>> >>> root@mesos008:~/src/mesos/m1/mesos/build# ./src/mesos-execute --master= >>> 192.168.0.107:5050 --name="cluster-test" --command="/bin/sleep 10" >>> --resources="cpus(*):1;mem(*):256" >>> I1005 18:59:47.974123 1233 sched.cpp:164] Version: 0.26.0 >>> I1005 18:59:47.990890 1248 sched.cpp:262] New master detected at >>> [email protected]:5050 >>> I1005 18:59:47.993074 1248 sched.cpp:272] No credentials provided. >>> Attempting to register without authentication >>> I1005 18:59:48.001194 1249 sched.cpp:641] Framework registered with >>> 04b9af5e-e9b6-4c59-8734-eba407163922-0002 >>> Framework registered with 04b9af5e-e9b6-4c59-8734-eba407163922-0002 >>> task cluster-test submitted to slave >>> c0e5fdde-595e-4768-9d04-25901d4523b6-S0 >>> Received status update TASK_RUNNING for task cluster-test >>> Received status update TASK_FINISHED for task cluster-test >>> I1005 18:59:58.431144 1249 sched.cpp:1771] Asked to stop the driver >>> I1005 18:59:58.431591 1249 sched.cpp:1040] Stopping framework >>> '04b9af5e-e9b6-4c59-8734-eba407163922-0002' >>> root@mesos008:~/src/mesos/m1/mesos/build# ps -ef | grep mesos >>> root 1259 1159 0 19:06 pts/0 00:00:00 grep --color=auto mesos >>> >>> Thanks, >>> >>> Guangya >>> >>> >>> On Mon, Oct 5, 2015 at 6:50 PM, Pradeep Kiruvale < >>> [email protected]> wrote: >>> >>>> Hi Guangya, >>>> >>>> I am facing one more issue. 
If I try to schedule the tasks from some >>>> external client system running the same cli mesos-execute. >>>> The tasks are not getting launched. The tasks reach the Master and it >>>> just drops the requests, below are the logs related to that >>>> >>>> I1005 11:33:35.025594 21369 master.cpp:2250] Subscribing framework >>>> with checkpointing disabled and capabilities [ ] >>>> E1005 11:33:35.026100 21373 process.cpp:1912] Failed to shutdown socket >>>> with fd 14: Transport endpoint is not connected >>>> I1005 11:33:35.026129 21372 hierarchical.hpp:515] Added framework >>>> 77539063-89ce-4efa-a20b-ca788abbd912-0055 >>>> I1005 11:33:35.026298 21369 master.cpp:1119] Framework >>>> 77539063-89ce-4efa-a20b-ca788abbd912-0055 () at >>>> [email protected]:47259 >>>> disconnected >>>> I1005 11:33:35.026329 21369 master.cpp:2475] Disconnecting framework >>>> 77539063-89ce-4efa-a20b-ca788abbd912-0055 () at >>>> [email protected]:47259 >>>> I1005 11:33:35.026340 21369 master.cpp:2499] Deactivating framework >>>> 77539063-89ce-4efa-a20b-ca788abbd912-0055 () at >>>> [email protected]:47259 >>>> E1005 11:33:35.026345 21373 process.cpp:1912] Failed to shutdown socket >>>> with fd 14: Transport endpoint is not connected >>>> I1005 11:33:35.026376 21369 master.cpp:1143] Giving framework >>>> 77539063-89ce-4efa-a20b-ca788abbd912-0055 () at >>>> [email protected]:47259 0ns to >>>> failover >>>> I1005 11:33:35.026743 21372 hierarchical.hpp:599] Deactivated framework >>>> 77539063-89ce-4efa-a20b-ca788abbd912-0055 >>>> W1005 11:33:35.026757 21368 master.cpp:4828] Master returning resources >>>> offered to framework 77539063-89ce-4efa-a20b-ca788abbd912-0055 because the >>>> framework has terminated or is inactive >>>> I1005 11:33:35.027014 21371 hierarchical.hpp:1103] Recovered cpus(*):8; >>>> mem(*):14868; disk(*):218835; ports(*):[31000-32000] (total: cpus(*):8; >>>> mem(*):14868; disk(*):218835; ports(*):[31000-32000], allocated: ) on slave >>>> 77539063-89ce-4efa-a20b-ca788abbd912-S2 from framework >>>> 77539063-89ce-4efa-a20b-ca788abbd912-0055 >>>> I1005 11:33:35.027159 21371 hierarchical.hpp:1103] Recovered cpus(*):8; >>>> mem(*):14930; disk(*):218578; ports(*):[31000-32000] (total: cpus(*):8; >>>> mem(*):14930; disk(*):218578; ports(*):[31000-32000], allocated: ) on slave >>>> 77539063-89ce-4efa-a20b-ca788abbd912-S1 from framework >>>> 77539063-89ce-4efa-a20b-ca788abbd912-0055 >>>> I1005 11:33:35.027668 21366 master.cpp:4815] Framework failover >>>> timeout, removing framework 77539063-89ce-4efa-a20b-ca788abbd912-0055 () at >>>> [email protected]:47259 >>>> I1005 11:33:35.027715 21366 master.cpp:5571] Removing framework >>>> 77539063-89ce-4efa-a20b-ca788abbd912-0055 () at >>>> [email protected]:47259 >>>> >>>> >>>> Can you please tell me what is the reason? The client is in the same >>>> network as well. But it does not run any master or slave processes. >>>> >>>> Thanks & Regards, >>>> Pradeeep >>>> >>>> On 5 October 2015 at 12:13, Guangya Liu <[email protected]> wrote: >>>> >>>>> Hi Pradeep, >>>>> >>>>> Glad it finally works! Not sure if you are using systemd.slice or not, >>>>> are you running to this issue: >>>>> https://issues.apache.org/jira/browse/MESOS-1195 >>>>> >>>>> Hope Jie Yu can give you some help on this ;-) >>>>> >>>>> Thanks, >>>>> >>>>> Guangya >>>>> >>>>> On Mon, Oct 5, 2015 at 5:25 PM, Pradeep Kiruvale < >>>>> [email protected]> wrote: >>>>> >>>>>> Hi Guangya, >>>>>> >>>>>> >>>>>> Thanks for sharing the information. >>>>>> >>>>>> Now I could launch the tasks. 
The problem was with the permission. If >>>>>> I start all the slaves and Master as root it works fine. >>>>>> Else I have problem with launching the tasks. >>>>>> >>>>>> But on one of the slave I could not launch the slave as root, I am >>>>>> facing the following issue. >>>>>> >>>>>> Failed to create a containerizer: Could not create >>>>>> MesosContainerizer: Failed to create launcher: Failed to create Linux >>>>>> launcher: Failed to mount cgroups hierarchy at '/sys/fs/cgroup/freezer': >>>>>> 'freezer' is already attached to another hierarchy >>>>>> >>>>>> I took that out from the cluster for now. The tasks are getting >>>>>> scheduled on the other two slave nodes. >>>>>> >>>>>> Thanks for your timely help >>>>>> >>>>>> -Pradeep >>>>>> >>>>>> On 5 October 2015 at 10:54, Guangya Liu <[email protected]> wrote: >>>>>> >>>>>>> Hi Pradeep, >>>>>>> >>>>>>> My steps was pretty simple just as >>>>>>> https://github.com/apache/mesos/blob/master/docs/getting-started.md#examples >>>>>>> >>>>>>> On Master node: root@mesos1:~/src/mesos/m1/mesos/build# GLOG_v=1 >>>>>>> ./bin/mesos-master.sh --ip=192.168.0.107 --work_dir=/var/lib/mesos >>>>>>> On 3 Slave node: root@mesos007:~/src/mesos/m1/mesos/build# GLOG_v=1 >>>>>>> ./bin/mesos-slave.sh --master=192.168.0.107:5050 >>>>>>> >>>>>>> Then schedule a task on any of the node, here I was using slave node >>>>>>> mesos007, you can see that the two tasks was launched on different host. >>>>>>> >>>>>>> root@mesos007:~/src/mesos/m1/mesos/build# ./src/mesos-execute >>>>>>> --master=192.168.0.107:5050 --name="cluster-test" >>>>>>> --command="/bin/sleep 100" --resources="cpus(*):1;mem(*):256" >>>>>>> I1005 16:49:11.013432 2971 sched.cpp:164] Version: 0.26.0 >>>>>>> I1005 16:49:11.027802 2992 sched.cpp:262] New master detected at >>>>>>> [email protected]:5050 >>>>>>> I1005 16:49:11.029579 2992 sched.cpp:272] No credentials provided. >>>>>>> Attempting to register without authentication >>>>>>> I1005 16:49:11.038182 2985 sched.cpp:641] Framework registered with >>>>>>> c0e5fdde-595e-4768-9d04-25901d4523b6-0002 >>>>>>> Framework registered with c0e5fdde-595e-4768-9d04-25901d4523b6-0002 >>>>>>> task cluster-test submitted to slave >>>>>>> c0e5fdde-595e-4768-9d04-25901d4523b6-S0 <<<<<<<<<<<<<<<<<< >>>>>>> Received status update TASK_RUNNING for task cluster-test >>>>>>> ^C >>>>>>> root@mesos007:~/src/mesos/m1/mesos/build# ./src/mesos-execute >>>>>>> --master=192.168.0.107:5050 --name="cluster-test" >>>>>>> --command="/bin/sleep 100" --resources="cpus(*):1;mem(*):256" >>>>>>> I1005 16:50:18.346984 3036 sched.cpp:164] Version: 0.26.0 >>>>>>> I1005 16:50:18.366114 3055 sched.cpp:262] New master detected at >>>>>>> [email protected]:5050 >>>>>>> I1005 16:50:18.368010 3055 sched.cpp:272] No credentials provided. >>>>>>> Attempting to register without authentication >>>>>>> I1005 16:50:18.376338 3056 sched.cpp:641] Framework registered with >>>>>>> c0e5fdde-595e-4768-9d04-25901d4523b6-0003 >>>>>>> Framework registered with c0e5fdde-595e-4768-9d04-25901d4523b6-0003 >>>>>>> task cluster-test submitted to slave >>>>>>> c0e5fdde-595e-4768-9d04-25901d4523b6-S1 <<<<<<<<<<<<<<<<<<<< >>>>>>> Received status update TASK_RUNNING for task cluster-test >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Guangya >>>>>>> >>>>>>> On Mon, Oct 5, 2015 at 4:21 PM, Pradeep Kiruvale < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> Hi Guangya, >>>>>>>> >>>>>>>> Thanks for your reply. >>>>>>>> >>>>>>>> I just want to know how did you launch the tasks. >>>>>>>> >>>>>>>> 1. 
What processes you have started on Master? >>>>>>>> 2. What are the processes you have started on Slaves? >>>>>>>> >>>>>>>> I am missing something here, otherwise all my slave have enough >>>>>>>> memory and cpus to launch the tasks I mentioned. >>>>>>>> What I am missing is some configuration steps. >>>>>>>> >>>>>>>> Thanks & Regards, >>>>>>>> Pradeep >>>>>>>> >>>>>>>> >>>>>>>> On 3 October 2015 at 13:14, Guangya Liu <[email protected]> wrote: >>>>>>>> >>>>>>>>> Hi Pradeep, >>>>>>>>> >>>>>>>>> I did some test with your case and found that the task can run >>>>>>>>> randomly on the three slave hosts, every time may have different >>>>>>>>> result. >>>>>>>>> The logic is here: >>>>>>>>> https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.hpp#L1263-#L1266 >>>>>>>>> The allocator will help random shuffle the slaves every time when >>>>>>>>> allocate resources for offers. >>>>>>>>> >>>>>>>>> I see that every of your task need the minimum resources as " >>>>>>>>> resources="cpus(*):3;mem(*):2560", can you help check if all of >>>>>>>>> your slaves have enough resources? If you want your task run on other >>>>>>>>> slaves, then those slaves need to have at least 3 cpus and 2550M >>>>>>>>> memory. >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> >>>>>>>>> On Fri, Oct 2, 2015 at 9:26 PM, Pradeep Kiruvale < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> Hi Ondrej, >>>>>>>>>> >>>>>>>>>> Thanks for your reply >>>>>>>>>> >>>>>>>>>> I did solve that issue, yes you are right there was an issue with >>>>>>>>>> slave IP address setting. >>>>>>>>>> >>>>>>>>>> Now I am facing issue with the scheduling the tasks. When I try >>>>>>>>>> to schedule a task using >>>>>>>>>> >>>>>>>>>> /src/mesos-execute --master=192.168.0.102:5050 >>>>>>>>>> --name="cluster-test" --command="/usr/bin/hackbench -s 4096 -l >>>>>>>>>> 10845760 -g >>>>>>>>>> 2 -f 2 -P" --resources="cpus(*):3;mem(*):2560" >>>>>>>>>> >>>>>>>>>> The tasks always get scheduled on the same node. The resources >>>>>>>>>> from the other nodes are not getting used to schedule the tasks. >>>>>>>>>> >>>>>>>>>> I just start the mesos slaves like below >>>>>>>>>> >>>>>>>>>> ./bin/mesos-slave.sh --master=192.168.0.102:5050/mesos >>>>>>>>>> --hostname=slave1 >>>>>>>>>> >>>>>>>>>> If I submit the task using the above (mesos-execute) command from >>>>>>>>>> same as one of the slave it runs on that system. >>>>>>>>>> >>>>>>>>>> But when I submit the task from some different system. It uses >>>>>>>>>> just that system and queues the tasks not runs on the other slaves. >>>>>>>>>> Some times I see the message "Failed to getgid: unknown user" >>>>>>>>>> >>>>>>>>>> Do I need to start some process to push the task on all the >>>>>>>>>> slaves equally? Am I missing something here? 
>>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Pradeep >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 2 October 2015 at 15:07, Ondrej Smola <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Pradeep, >>>>>>>>>>> >>>>>>>>>>> the problem is with IP your slave advertise - mesos by default >>>>>>>>>>> resolves your hostname - there are several solutions (let say your >>>>>>>>>>> node ip >>>>>>>>>>> is 192.168.56.128) >>>>>>>>>>> >>>>>>>>>>> 1) export LIBPROCESS_IP=192.168.56.128 >>>>>>>>>>> 2) set mesos options - ip, hostname >>>>>>>>>>> >>>>>>>>>>> one way to do this is to create files >>>>>>>>>>> >>>>>>>>>>> echo "192.168.56.128" > /etc/mesos-slave/ip >>>>>>>>>>> echo "abc.mesos.com" > /etc/mesos-slave/hostname >>>>>>>>>>> >>>>>>>>>>> for more configuration options see >>>>>>>>>>> http://mesos.apache.org/documentation/latest/configuration >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 2015-10-02 10:06 GMT+02:00 Pradeep Kiruvale < >>>>>>>>>>> [email protected]>: >>>>>>>>>>> >>>>>>>>>>>> Hi Guangya, >>>>>>>>>>>> >>>>>>>>>>>> Thanks for reply. I found one interesting log message. >>>>>>>>>>>> >>>>>>>>>>>> 7410 master.cpp:5977] Removed slave >>>>>>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S52 (192.168.0.178): a new >>>>>>>>>>>> slave >>>>>>>>>>>> registered at the same address >>>>>>>>>>>> >>>>>>>>>>>> Mostly because of this issue, the systems/slave nodes are >>>>>>>>>>>> getting registered and de-registered to make a room for the next >>>>>>>>>>>> node. I >>>>>>>>>>>> can even see this on >>>>>>>>>>>> the UI interface, for some time one node got added and after >>>>>>>>>>>> some time that will be replaced with the new slave node. >>>>>>>>>>>> >>>>>>>>>>>> The above log is followed by the below log messages. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I1002 10:01:12.753865 7416 leveldb.cpp:343] Persisting action >>>>>>>>>>>> (18 bytes) to leveldb took 104089ns >>>>>>>>>>>> I1002 10:01:12.753885 7416 replica.cpp:679] Persisted action >>>>>>>>>>>> at 384 >>>>>>>>>>>> E1002 10:01:12.753891 7417 process.cpp:1912] Failed to >>>>>>>>>>>> shutdown socket with fd 15: Transport endpoint is not connected >>>>>>>>>>>> I1002 10:01:12.753988 7413 master.cpp:3930] Registered slave >>>>>>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@ >>>>>>>>>>>> 127.0.1.1:5051 (192.168.0.116) with cpus(*):8; mem(*):14930; >>>>>>>>>>>> disk(*):218578; ports(*):[31000-32000] >>>>>>>>>>>> I1002 10:01:12.754065 7413 master.cpp:1080] Slave >>>>>>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@ >>>>>>>>>>>> 127.0.1.1:5051 (192.168.0.116) disconnected >>>>>>>>>>>> I1002 10:01:12.754072 7416 hierarchical.hpp:675] Added slave >>>>>>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 (192.168.0.116) with >>>>>>>>>>>> cpus(*):8; >>>>>>>>>>>> mem(*):14930; disk(*):218578; ports(*):[31000-32000] (allocated: ) >>>>>>>>>>>> I1002 10:01:12.754084 7413 master.cpp:2534] Disconnecting >>>>>>>>>>>> slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@ >>>>>>>>>>>> 127.0.1.1:5051 (192.168.0.116) >>>>>>>>>>>> E1002 10:01:12.754118 7417 process.cpp:1912] Failed to >>>>>>>>>>>> shutdown socket with fd 16: Transport endpoint is not connected >>>>>>>>>>>> I1002 10:01:12.754132 7413 master.cpp:2553] Deactivating slave >>>>>>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@ >>>>>>>>>>>> 127.0.1.1:5051 (192.168.0.116) >>>>>>>>>>>> I1002 10:01:12.754237 7416 hierarchical.hpp:768] Slave >>>>>>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 deactivated >>>>>>>>>>>> I1002 10:01:12.754240 
7413 replica.cpp:658] Replica received >>>>>>>>>>>> learned notice for position 384 >>>>>>>>>>>> I1002 10:01:12.754360 7413 leveldb.cpp:343] Persisting action >>>>>>>>>>>> (20 bytes) to leveldb took 95171ns >>>>>>>>>>>> I1002 10:01:12.754395 7413 leveldb.cpp:401] Deleting ~2 keys >>>>>>>>>>>> from leveldb took 20333ns >>>>>>>>>>>> I1002 10:01:12.754406 7413 replica.cpp:679] Persisted action >>>>>>>>>>>> at 384 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Pradeep >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 2 October 2015 at 02:35, Guangya Liu <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Pradeep, >>>>>>>>>>>>> >>>>>>>>>>>>> Please check some of my questions in line. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Guangya >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Oct 2, 2015 at 12:55 AM, Pradeep Kiruvale < >>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am new to Mesos. I have set up a Mesos cluster with 1 >>>>>>>>>>>>>> Master and 3 Slaves. >>>>>>>>>>>>>> >>>>>>>>>>>>>> One slave runs on the Master Node itself and Other slaves run >>>>>>>>>>>>>> on different nodes. Here node means the physical boxes. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I tried running the tasks by configuring one Node cluster. >>>>>>>>>>>>>> Tested the task scheduling using mesos-execute, works fine. >>>>>>>>>>>>>> >>>>>>>>>>>>>> When I configure three Node cluster (1master and 3 slaves) >>>>>>>>>>>>>> and try to see the resources on the master (in GUI) only the >>>>>>>>>>>>>> Master node >>>>>>>>>>>>>> resources are visible. >>>>>>>>>>>>>> The other nodes resources are not visible. Some times >>>>>>>>>>>>>> visible but in a de-actived state. >>>>>>>>>>>>>> >>>>>>>>>>>>> Can you please append some logs from mesos-slave and >>>>>>>>>>>>> mesos-master? There should be some logs in either master or slave >>>>>>>>>>>>> telling >>>>>>>>>>>>> you what is wrong. >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> *Please let me know what could be the reason. All the nodes >>>>>>>>>>>>>> are in the same network. * >>>>>>>>>>>>>> >>>>>>>>>>>>>> When I try to schedule a task using >>>>>>>>>>>>>> >>>>>>>>>>>>>> /src/mesos-execute --master=192.168.0.102:5050 >>>>>>>>>>>>>> --name="cluster-test" --command="/usr/bin/hackbench -s 4096 -l >>>>>>>>>>>>>> 10845760 -g >>>>>>>>>>>>>> 2 -f 2 -P" --resources="cpus(*):3;mem(*):2560" >>>>>>>>>>>>>> >>>>>>>>>>>>>> The tasks always get scheduled on the same node. The >>>>>>>>>>>>>> resources from the other nodes are not getting used to schedule >>>>>>>>>>>>>> the tasks. >>>>>>>>>>>>>> >>>>>>>>>>>>> Based on your previous question, there is only one node in >>>>>>>>>>>>> your cluster, that's why other nodes are not available. We need >>>>>>>>>>>>> first >>>>>>>>>>>>> identify what is wrong with other three nodes first. >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I*s it required to register the frameworks from every slave >>>>>>>>>>>>>> node on the Master?* >>>>>>>>>>>>>> >>>>>>>>>>>>> It is not required. 
>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> *I have configured this cluster using the git-hub code.* >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks & Regards, >>>>>>>>>>>>>> Pradeep >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
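
Following Ondrej's earlier LIBPROCESS_IP suggestion for the slaves, the next thing I plan to try is exporting it on the external client as well before launching the framework, since the scheduler also has to advertise an address the master can connect back to. A minimal sketch of what I intend to run, assuming the client's routable address is 192.168.0.115 (taken from the log above) and the master is at 192.168.0.102:5050:

# On the external client node: advertise an address the master can reach back to.
# My guess is that without this, libprocess picks a non-routable address resolved
# from the hostname, which would explain the repeated "Transport endpoint is not
# connected" errors and the immediate framework removal.
export LIBPROCESS_IP=192.168.0.115

# Re-run the simple test framework against the master to verify registration:
./src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test" \
    --command="/bin/sleep 10" --resources="cpus(*):1;mem(*):256"

If mesos-execute registers cleanly from the client, I will start the Balloon Framework the same way, with LIBPROCESS_IP exported first.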
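
For the freezer error on the third slave ("'freezer' is already attached to another hierarchy"), the MESOS-1195 issue Guangya linked (which he connected to systemd.slice) suggests that something on that node may have already mounted the freezer cgroup elsewhere. What I plan to check there is only a guess at the usual cause, not a verified recipe:

# See where the freezer subsystem is currently mounted on that node:
grep freezer /proc/mounts

# If it is mounted under a different root, either unmount that stale hierarchy or
# point the slave at the root that already exists (the path below is a placeholder;
# use whatever the grep above reports):
./bin/mesos-slave.sh --master=192.168.0.102:5050 --cgroups_hierarchy=/cgroup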
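
Also, to confirm that all three slaves stay registered with their real addresses after setting ip/hostname as Ondrej described (instead of the 127.0.1.1 address that showed up in the earlier logs), I will watch the master's state endpoint from the client. A small check along these lines, assuming the state.json endpoint name used by this Mesos version:

# List the addresses the registered slaves advertise to the master:
curl -s http://192.168.0.102:5050/master/state.json | python -m json.tool | grep -E '"hostname"|"pid"'

Each slave should show its own 192.168.0.x address there; if one still advertises 127.0.1.1, that node's /etc/mesos-slave/ip (or LIBPROCESS_IP) still needs fixing.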

