Phew!! Finally. I had multiple versions of Java (6 and 7) installed; after fixing that and cleaning up the meta directory, I was able to get it working. Thank you very much, Adam!
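
For anyone who finds this thread with the same error, here is a minimal sketch of the meta cleanup step in Python. It only empties a directory while keeping the directory itself. The function name `clear_directory_contents` is made up for illustration; the path `/var/run/mesos/meta` is the one Adam names in his reply. Stopping and restarting the slave is not shown, since that depends on how the slave is launched on your machines.

```python
import shutil
from pathlib import Path


def clear_directory_contents(directory):
    """Delete everything inside `directory` but keep the directory itself.

    Returns the sorted names of the entries that were removed.
    """
    root = Path(directory)
    removed = []
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            # Real subdirectory: remove it together with its contents.
            shutil.rmtree(entry)
        else:
            # Regular file or symlink: remove just the entry.
            entry.unlink()
        removed.append(entry.name)
    return sorted(removed)


# Hypothetical usage, with the slave stopped and sufficient permissions:
#   clear_directory_contents("/var/run/mesos/meta")
```

Run it only while the slave is stopped, then start the slave again so that it registers with a new slaveID.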
On Thu, Jan 8, 2015 at 12:22 PM, Adam Bordelon <[email protected]> wrote:
> Looks like that slave was unavailable for a while, so the master removed
> its slaveID as 'shutdown'.
> If you restart the slave, it should reset & register as a new slaveID.
> But if you want to be extra sure, wipe the contents of
> `/var/run/mesos/meta` and then restart the slave.
>
> On Thu, Jan 8, 2015 at 12:10 PM, Srinivas Murthy <[email protected]> wrote:
>
>> Duh, so much for my diligence :-)
>>
>> On Thu, Jan 8, 2015 at 12:09 PM, Srinivas Murthy <[email protected]> wrote:
>>
>>> Running on machine:xxxx
>>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>>> I0107 12:07:44.205533 26720 logging.cpp:172] INFO level logging started!
>>> I0107 12:07:44.205888 26720 main.cpp:142] Build: 2014-12-23 10:33:15 by root
>>> I0107 12:07:44.205914 26720 main.cpp:144] Version: 0.21.0
>>> I0107 12:07:44.206110 26720 containerizer.cpp:100] Using isolation: posix/cpu,posix/mem
>>> I0107 12:07:44.207408 26720 main.cpp:165] Starting Mesos slave
>>> I0107 12:07:44.210654 26720 slave.cpp:169] Slave started on 1)@ 10.122.21.21:5051
>>> I0107 12:07:44.211130 26720 slave.cpp:289] Slave resources: cpus(*):4; mem(*):6816; disk(*):19825; ports(*):[31000-32000 ]
>>> I0107 12:07:44.211328 26720 slave.cpp:318] Slave hostname: xxxx
>>> I0107 12:07:44.211362 26720 slave.cpp:319] Slave checkpoint: true
>>> I0107 12:07:44.218703 26728 state.cpp:33] Recovering state from '/var/run/mesos/meta'
>>> I0107 12:07:44.219048 26725 group.cpp:313] Group process (group(1)@ 10.122.21.21:5051) connected to ZooKeeper
>>> I0107 12:07:44.219140 26725 group.cpp:790] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
>>> I0107 12:07:44.219173 26725 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
>>> I0107 12:07:44.221113 26723 status_update_manager.cpp:197] Recovering status update manager
>>> I0107 12:07:44.221750 26721 containerizer.cpp:281] Recovering containerizer
>>> I0107 12:07:44.222080 26725 detector.cpp:138] Detected a new leader: (id='8')
>>> I0107 12:07:44.222859 26725 group.cpp:659] Trying to get '/mesos/info_0000000008' in ZooKeeper
>>> I0107 12:07:44.223629 26721 slave.cpp:3466] Finished recovery
>>> I0107 12:07:44.226488 26726 detector.cpp:433] A new leading master (UPID= [email protected]:5050) is detected
>>> I0107 12:07:44.226738 26726 slave.cpp:602] New master detected at master@mymaster:5050
>>> I0107 12:07:44.226922 26726 slave.cpp:627] No credentials provided. Attempting to register without authentication
>>> I0107 12:07:44.227015 26726 slave.cpp:638] Detecting new master
>>> I0107 12:07:44.227149 26726 status_update_manager.cpp:171] Pausing sending status updates
>>> I0107 12:07:44.991296 26721 slave.cpp:526] Slave asked to shut down by master@mymaster:5050 because 'Slave attempted to re-register after removal'
>>> I0107 12:07:44.991412 26721 slave.cpp:484] Slave terminating
>>>
>>> I have masked some IP addresses from these log entries
>>>
>>> On Thu, Jan 8, 2015 at 11:53 AM, Adam Bordelon <[email protected]> wrote:
>>>
>>>> There should be a WARNING log line in the mesos slave log (typically
>>>> /var/log/mesos/mesos-slave.INFO) that says "Shutting down executor ...
>>>> because ..." probably right after the line that says "Got registration for
>>>> executor ..."
>>>> Can you post a gist of the relevant slave log lines?
>>>>
>>>> On Thu, Jan 8, 2015 at 11:39 AM, Srinivas Murthy <[email protected]> wrote:
>>>>
>>>>> It's a custom executor. I can see each of the nodes has
>>>>> /tmp/mesos/...executors/..runs/../latest with stderr and stdout, along
>>>>> with the jar file.
>>>>> My stdout is blank, while the stderr has "Executor asked to shutdown"
>>>>> as its last line, after the URI is accessed and the resource jar is
>>>>> fetched.
>>>>>
>>>>> On Thu, Jan 8, 2015 at 11:29 AM, Adam Bordelon <[email protected]> wrote:
>>>>>
>>>>>> Is your "adhoc framework" using the default Mesos executor, or does
>>>>>> it use a custom executor?
>>>>>> You can check the task/executor's sandbox from the Mesos web UI, to
>>>>>> see if the custom executor or other URIs were properly downloaded, and to
>>>>>> view the stdout/stderr of the executor/task.
>>>>>>
>>>>>> On Thu, Jan 8, 2015 at 10:14 AM, Srinivas Murthy <[email protected]> wrote:
>>>>>>
>>>>>>> I am running a cluster with one master node and three slaves.
>>>>>>> Just got hold of a tutorial code from Git that runs an adhoc
>>>>>>> framework written in Java, nothing fancy.
>>>>>>> All I am getting is "Executor asked to shutdown" and the code exits
>>>>>>> gracefully, no exceptions. I am trying to put some logging statements in
>>>>>>> all the callback functions, but it looks like the Executors are invoked but
>>>>>>> never run.
>>>>>>> Any clues on how to debug this?
>>>>>>> I am running Mesos 0.21 and JDK 1.7.55.
>>>>>>>
>>>>>>> Regards
>>>>>>> Srinivas
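
To spot the shutdown reason quickly in a slave log like the one quoted above, something along these lines may help. It is a rough sketch, not part of Mesos: the regular expression simply mirrors the "Log line format" header printed at the top of the log, and the helper names `parse_log_line` and `find_shutdown_reasons` are made up for illustration. It expects unwrapped lines as they appear in the log file on disk, not lines that a mail client has re-wrapped.

```python
import re

# Mirrors the header printed in the log:
#   [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
LOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] "
    r"(?P<msg>.*)$"
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def find_shutdown_reasons(lines):
    """Collect the reason text from every 'asked to shut down' message."""
    reasons = []
    for line in lines:
        record = parse_log_line(line)
        if record and "asked to shut down" in record["msg"]:
            # The reason is the quoted text after "because", when present.
            quoted = re.search(r"because '(.*)'", record["msg"])
            reasons.append(quoted.group(1) if quoted else record["msg"])
    return reasons
```

Feeding it the lines of the slave log (Adam mentions /var/log/mesos/mesos-slave.INFO as the typical location) should return the quoted reason from any "Slave asked to shut down" entry, such as the re-registration message in the log above.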

