Hi Tim,

I've reproduced the problem and captured debug logs (attached). I can't tell exactly what is going on, but it looks like the slave is repeatedly sending ACCEPT messages to the master.

I'd appreciate your comments.

Best Regards,
Mitsutoshi Kiuchi

2015-11-24 5:28 GMT+09:00 Tim Chen <[email protected]>:

> Hi Mitsutoshi,
>
> Can you enable TRACE logging in Spark (by modifying your log4j.properties file)?
>
> It should give more information on why offers are being rejected, but most
> of the time it's because the cluster doesn't have enough resources to satisfy
> launching your Spark job. You can either increase your slave(s) resources
> or lower the CPU/memory requirements for your job through configuration.
>
> Tim
>
> On Mon, Nov 23, 2015 at 6:30 AM, 木内満歳 <[email protected]> wrote:
>
>> Hi,
>>
>> Some Spark tasks on Mesos 0.25 occasionally fail to start.
>> Could you advise how I can get more detail about this?
>>
>> Here is the slave log for a task that failed to start:
>>
>> Nov 23 08:54:26 mesos-s2 mesos-slave[18499]: I1123 08:54:26.677291 18516 slave.cpp:2379] Got registration for executor '235498ca-6603-4cfe-bfc7-94005bb235fb-S5' of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1442 from executor(1)@10.130.91.16:60295
>> Nov 23 08:54:26 mesos-s2 mesos-slave[18499]: I1123 08:54:26.679875 18516 slave.cpp:1760] Sending queued task '0' to executor '235498ca-6603-4cfe-bfc7-94005bb235fb-S5' of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1442
>> (no further log entries for this task)
>>
>> When a task starts successfully, the slave log looks like this:
>>
>> Nov 23 08:44:39 al-mesos-s3 mesos-slave[8644]: I1123 08:44:39.637285 8658 slave.cpp:2379] Got registration for executor '235498ca-6603-4cfe-bfc7-94005bb235fb-S6' of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1437 from executor(1)@10.130.98.65:52273
>> Nov 23 08:44:39 al-mesos-s3 mesos-slave[8644]: I1123 08:44:39.639233 8658 slave.cpp:1760] Sending queued task '6' to executor '235498ca-6603-4cfe-bfc7-94005bb235fb-S6' of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1437
>> Nov 23 08:44:42 al-mesos-s3 mesos-slave[8644]: I1123 08:44:42.608182 8658 slave.cpp:2717] Handling status update TASK_RUNNING (UUID: ff5a2278-0753-4541-bd33-a55f3a09fb69) for task 6 of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1437 from executor(1)@10.130.98.65:52273
>> Nov 23 08:44:42 al-mesos-s3 mesos-slave[8644]: I1123 08:44:42.612318 8658 status_update_manager.cpp:322] Received status update TASK_RUNNING (UUID: ff5a2278-0753-4541-bd33-a55f3a09fb69) for task 6 of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1437
>>
>> Any advice is welcome.
>>
>> Best Regards,
>> Mitsutoshi Kiuchi

log.driverStdErr.gz
Description: GNU Zip compressed data
log.mesosMaster.gz
Description: GNU Zip compressed data
log.mesosSlave.gz
Description: GNU Zip compressed data

