Re: Executor asked to shutdown

Srinivas Murthy Fri, 09 Jan 2015 09:32:59 -0800

Phew!! Finally. I had multiple versions of Java 6 and 7 installed; after
fixing that and doing the meta cleanup, I was able to get it working. Thank
you very much, Adam!
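
For anyone finding this thread later, here is a minimal sketch of the meta
cleanup step in Python. It assumes the slave's checkpoint directory is
`/var/run/mesos/meta`, as shown in the log below, and that the slave process
is stopped before anything is deleted. The helper name `wipe_meta_dir` and its
`dry_run` flag are made up for illustration and are not part of Mesos.
Restarting the slave is left out because the command depends on how Mesos was
installed.

```python
import shutil
from pathlib import Path


def wipe_meta_dir(meta_dir, dry_run=True):
    """Remove everything inside meta_dir but keep the directory itself.

    Hypothetical helper, not a Mesos tool. Returns the list of paths that
    were (or, with dry_run=True, would be) removed.
    """
    root = Path(meta_dir)
    removed = []
    if not root.is_dir():
        # Nothing to clean up if the directory does not exist.
        return removed
    for entry in sorted(root.iterdir()):
        removed.append(str(entry))
        if dry_run:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # real subdirectory: remove recursively
        else:
            entry.unlink()         # plain file or symlink
    return removed
```

Calling `wipe_meta_dir("/var/run/mesos/meta")` only lists what would go;
passing `dry_run=False` actually deletes the contents, so it is worth checking
the dry-run output first.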


On Thu, Jan 8, 2015 at 12:22 PM, Adam Bordelon <[email protected]> wrote:

> Looks like that slave was unavailable for a while, so the master removed
> its slaveID as 'shutdown'.
> If you restart the slave, it should reset & register as a new slaveID.
> But if you want to be extra sure, wipe the contents of
> `/var/run/mesos/meta` and then restart the slave.
>
> On Thu, Jan 8, 2015 at 12:10 PM, Srinivas Murthy <[email protected]>
> wrote:
>
>> Duh, so much for my diligence :-)
>>
>> On Thu, Jan 8, 2015 at 12:09 PM, Srinivas Murthy <[email protected]>
>> wrote:
>>
>>> Running on machine:xxxx
>>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>>> I0107 12:07:44.205533 26720 logging.cpp:172] INFO level logging started!
>>> I0107 12:07:44.205888 26720 main.cpp:142] Build: 2014-12-23 10:33:15 by
>>> root
>>> I0107 12:07:44.205914 26720 main.cpp:144] Version: 0.21.0
>>> I0107 12:07:44.206110 26720 containerizer.cpp:100] Using isolation:
>>> posix/cpu,posix/mem
>>> I0107 12:07:44.207408 26720 main.cpp:165] Starting Mesos slave
>>> I0107 12:07:44.210654 26720 slave.cpp:169] Slave started on 1)@
>>> 10.122.21.21:5051
>>> I0107 12:07:44.211130 26720 slave.cpp:289] Slave resources: cpus(*):4;
>>> mem(*):6816; disk(*):19825; ports(*):[31000-32000]
>>> I0107 12:07:44.211328 26720 slave.cpp:318] Slave hostname: xxxx
>>> I0107 12:07:44.211362 26720 slave.cpp:319] Slave checkpoint: true
>>> I0107 12:07:44.218703 26728 state.cpp:33] Recovering state from
>>> '/var/run/mesos/meta'
>>> I0107 12:07:44.219048 26725 group.cpp:313] Group process (group(1)@
>>> 10.122.21.21:5051) connected to ZooKeeper
>>> I0107 12:07:44.219140 26725 group.cpp:790] Syncing group operations:
>>> queue size (joins, cancels, datas) = (0, 0, 0)
>>> I0107 12:07:44.219173 26725 group.cpp:385] Trying to create path
>>> '/mesos' in ZooKeeper
>>> I0107 12:07:44.221113 26723 status_update_manager.cpp:197] Recovering
>>> status update manager
>>> I0107 12:07:44.221750 26721 containerizer.cpp:281] Recovering
>>> containerizer
>>> I0107 12:07:44.222080 26725 detector.cpp:138] Detected a new leader:
>>> (id='8')
>>> I0107 12:07:44.222859 26725 group.cpp:659] Trying to get
>>> '/mesos/info_0000000008' in ZooKeeper
>>> I0107 12:07:44.223629 26721 slave.cpp:3466] Finished recovery
>>> I0107 12:07:44.226488 26726 detector.cpp:433] A new leading master (UPID=
>>> [email protected]:5050) is detected
>>> I0107 12:07:44.226738 26726 slave.cpp:602] New master detected at
>>> master@mymaster:5050
>>> I0107 12:07:44.226922 26726 slave.cpp:627] No credentials provided.
>>> Attempting to register without authentication
>>> I0107 12:07:44.227015 26726 slave.cpp:638] Detecting new master
>>> I0107 12:07:44.227149 26726 status_update_manager.cpp:171] Pausing
>>> sending status updates
>>> I0107 12:07:44.991296 26721 slave.cpp:526] Slave asked to shut down by
>>> master@mymaster:5050 because 'Slave attempted to re-register after
>>> removal'
>>> I0107 12:07:44.991412 26721 slave.cpp:484] Slave terminating
>>>
>>> I have masked some IP addresses from these log entries
>>>
>>> On Thu, Jan 8, 2015 at 11:53 AM, Adam Bordelon <[email protected]>
>>> wrote:
>>>
>>>> There should be a WARNING log line in the mesos slave log (typically
>>>> /var/log/mesos/mesos-slave.INFO) that says "Shutting down executor ...
>>>> because ..." probably right after the line that says "Got registration for
>>>> executor ..."
>>>> Can you post a gist of the relevant slave log lines?
>>>>
>>>> On Thu, Jan 8, 2015 at 11:39 AM, Srinivas Murthy <[email protected]
>>>> > wrote:
>>>>
>>>>> It's a custom executor. I can see that each of the nodes has
>>>>> /tmp/mesos/...executors/..runs/../latest with stderr and stdout, along
>>>>> with the jar file.
>>>>> My stdout is blank, while the stderr has "Executor asked to shutdown"
>>>>> as its last line, after the URI is accessed and the resource jar is
>>>>> fetched.
>>>>>
>>>>>
>>>>> On Thu, Jan 8, 2015 at 11:29 AM, Adam Bordelon <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Is your "adhoc framework" using the default Mesos executor, or does
>>>>>> it use a custom executor?
>>>>>> You can check the task/executor's sandbox from the Mesos web UI, to
>>>>>> see if the custom executor or other URIs were properly downloaded, and to
>>>>>> view the stdout/stderr of the executor/task.
>>>>>>
>>>>>> On Thu, Jan 8, 2015 at 10:14 AM, Srinivas Murthy <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> I am running a cluster with one master node and three slaves.
>>>>>>> I just got hold of some tutorial code from Git that runs an adhoc
>>>>>>> framework written in Java, nothing fancy.
>>>>>>> All I am getting is "Executor asked to shutdown", and the code exits
>>>>>>> gracefully with no exceptions. I am trying to put some logging
>>>>>>> statements in all the callback functions, but it looks like the
>>>>>>> executors are invoked but never run.
>>>>>>> Any clues on how to debug this?
>>>>>>> I am running Mesos 0.21 and JDK 1.7.55.
>>>>>>>
>>>>>>> Regards
>>>>>>> Srinivas
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
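
The slave log quoted above states its own layout in the header line,
`[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg`. Here is a minimal
Python sketch for pulling the shutdown reason out of such a log. It assumes
each entry has been joined back onto a single line (the mail client wrapped
them in this thread). The names `parse_line` and `find_shutdown_reasons` are
illustrative only, and the sample entries are copied from the log above.

```python
import re

# Layout from the log header: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
LOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) "
    r"(?P<source>[^\s\]]+:\d+)\] "
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def find_shutdown_reasons(lines):
    """Collect the entries in which the slave reports being asked to shut down."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry and "asked to shut down" in entry["msg"]:
            hits.append(entry)
    return hits


# Entries taken from the log in this thread, unwrapped to one line each.
sample = [
    "I0107 12:07:44.227149 26726 status_update_manager.cpp:171] "
    "Pausing sending status updates",
    "I0107 12:07:44.991296 26721 slave.cpp:526] Slave asked to shut down by "
    "master@mymaster:5050 because 'Slave attempted to re-register after removal'",
    "I0107 12:07:44.991412 26721 slave.cpp:484] Slave terminating",
]

for entry in find_shutdown_reasons(sample):
    print(entry["time"], entry["source"], entry["msg"])
```

The same function can be fed the lines of a slave log file, for example
the `/var/log/mesos/mesos-slave.INFO` path mentioned earlier in the thread,
provided the file really is at that location on your machines.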
