I've captured the debug log on the slave side (attached).
It seems that MesosExecutorBackend received launchTask while its executor was still null:

15/11/24 22:31:26 ERROR MesosExecutorBackend: Received launchTask but executor was null

Anyway, I'll increase the memory size of each slave.

Best Regards,
Mitsutoshi Kiuchi


2015-11-24 15:18 GMT+09:00 木内満歳 <[email protected]>:

> Hi, Tim
>
> I've reproduced the problem and captured debug logs (attached).
> I can't tell what is going on, but it seems that the slave is
> repeatedly sending ACCEPT messages to the master.
>
> Any comments would be appreciated.
>
> Best Regards,
> Mitsutoshi Kiuchi
>
>
> 2015-11-24 5:28 GMT+09:00 Tim Chen <[email protected]>:
>
>> Hi Mitsutoshi,
>>
>> Can you enable TRACE logging in Spark (by modifying your log4j.properties file)?
>>
>> It should show more information on why offers are being rejected. Most
>> of the time it's because your cluster doesn't have enough resources to
>> launch your Spark job. You can either increase your slaves' resources
>> or lower your job's CPU/memory requirements through configuration.
>>
>> Tim
>>
>> On Mon, Nov 23, 2015 at 6:30 AM, 木内満歳 <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Some of my Spark tasks on Mesos 0.25 occasionally won't start.
>>> Could you give me some advice on how to get more detail about this?
>>>
>>> Here is the slave log for a task that fails to start:
>>>
>>> Nov 23 08:54:26 mesos-s2 mesos-slave[18499]: I1123 08:54:26.677291 18516 slave.cpp:2379] Got registration for executor '235498ca-6603-4cfe-bfc7-94005bb235fb-S5' of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1442 from executor(1)@10.130.91.16:60295
>>> Nov 23 08:54:26 mesos-s2 mesos-slave[18499]: I1123 08:54:26.679875 18516 slave.cpp:1760] Sending queued task '0' to executor '235498ca-6603-4cfe-bfc7-94005bb235fb-S5' of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1442
>>> (no further log entries for this task)
>>>
>>> When a task starts successfully, the slave log looks like this:
>>>
>>> Nov 23 08:44:39 al-mesos-s3 mesos-slave[8644]: I1123 08:44:39.637285 8658 slave.cpp:2379] Got registration for executor '235498ca-6603-4cfe-bfc7-94005bb235fb-S6' of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1437 from executor(1)@10.130.98.65:52273
>>> Nov 23 08:44:39 al-mesos-s3 mesos-slave[8644]: I1123 08:44:39.639233 8658 slave.cpp:1760] Sending queued task '6' to executor '235498ca-6603-4cfe-bfc7-94005bb235fb-S6' of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1437
>>> Nov 23 08:44:42 al-mesos-s3 mesos-slave[8644]: I1123 08:44:42.608182 8658 slave.cpp:2717] Handling status update TASK_RUNNING (UUID: ff5a2278-0753-4541-bd33-a55f3a09fb69) for task 6 of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1437 from executor(1)@10.130.98.65:52273
>>> Nov 23 08:44:42 al-mesos-s3 mesos-slave[8644]: I1123 08:44:42.612318 8658 status_update_manager.cpp:322] Received status update TASK_RUNNING (UUID: ff5a2278-0753-4541-bd33-a55f3a09fb69) for task 6 of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1437
>>>
>>> Any advice is welcome.
>>>
>>> Best Regards,
>>> Mitsutoshi Kiuchi
>>>
>>>
>>
>

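The two slave logs quoted in this thread differ in one visible way: after "Sending queued task", the successful task gets a "Handling status update TASK_RUNNING" entry and the stuck one gets nothing. The following is a minimal Python sketch that flags such tasks in a longer slave log. It assumes each log entry sits on a single line (as in the raw log, before the mail client wrapped it) and that the message wording matches the Mesos 0.25 lines quoted above. `find_stuck_tasks` is a hypothetical helper, not part of Mesos or Spark.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Matches the slave's "Sending queued task" entry (slave.cpp:1760 in the logs above).
SENT_RE = re.compile(
    r"Sending queued task '(?P<task>[^']+)' to executor '[^']+' "
    r"of framework (?P<framework>\S+)"
)

# Matches the slave's "Handling status update TASK_RUNNING" entry (slave.cpp:2717).
RUNNING_RE = re.compile(
    r"Handling status update TASK_RUNNING \(UUID: [^)]+\) "
    r"for task (?P<task>\S+) of framework (?P<framework>\S+)"
)


def find_stuck_tasks(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (framework, task) pairs that were sent to an executor but
    never reported TASK_RUNNING, in the order they were first sent."""
    sent: List[Tuple[str, str]] = []
    running = set()
    for line in lines:
        m = SENT_RE.search(line)
        if m:
            key = (m.group("framework"), m.group("task"))
            if key not in sent:
                sent.append(key)
            continue
        m = RUNNING_RE.search(line)
        if m:
            running.add((m.group("framework"), m.group("task")))
    return [key for key in sent if key not in running]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_stuck_tasks.py <slave-log-file>
    with open(sys.argv[1]) as log:
        for framework, task in find_stuck_tasks(log):
            print(f"task {task} of framework {framework}: no TASK_RUNNING")
```

The check only looks for TASK_RUNNING, so a task that ended in another state (for example TASK_FAILED or TASK_LOST) without ever running would also be listed. It also can't say why the executor never reported back; for that, the executor's stderr (such as the "executor was null" error) is still needed.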
Attachment: stderr.jobSuccessfullyStart
Description: Binary data

Attachment: stderr.jobNeverStart
Description: Binary data
