Ah, I see. I am experiencing something similar in fine-grained mode: one of the tasks stays in staging and fails the whole job. It never happens in coarse-grained mode.

Kind regards,
Radek Gruchalski
[email protected]
de.linkedin.com/in/radgruchalski/

Confidentiality:
This communication is intended for the above-named person and may be confidential and/or legally privileged.
If it has come to you in error you must take no action based on it, nor must you copy or show it to anyone; please delete/destroy and inform the sender immediately.

On Tuesday, 24 November 2015 at 10:07, 木内満歳 wrote:
> Hi Rad,
>
> I've tried both. I've experienced the same symptom in both cases.
>
> Thanks,
> Mitsutoshi Kiuchi
>
> 2015-11-24 17:57 GMT+09:00 Rad Gruchalski <[email protected]>:
> > Mitsutoshi,
> >
> > Is this in fine-grained mode?
> >
> > Kind regards,
> > Radek Gruchalski
> >
> > On Tuesday, 24 November 2015 at 07:18, 木内満歳 wrote:
> > > Hi, Tim
> > >
> > > I've reproduced the problem and taken debug logs (attached).
> > > I cannot understand what is going on, but it seems that the slave is
> > > repeatedly sending ACCEPT messages to the master.
> > >
> > > Please let me have your comments.
> > >
> > > Best Regards,
> > > Mitsutoshi Kiuchi
> > >
> > > 2015-11-24 5:28 GMT+09:00 Tim Chen <[email protected]>:
> > > > Hi Mitsutoshi,
> > > >
> > > > Can you enable TRACE logging on Spark (modify your log4j.properties file)?
> > > >
> > > > It should have more information on why offers are being rejected, but
> > > > most of the time it's due to not enough resources in your cluster to
> > > > satisfy launching your Spark job. You can either increase your slave(s)
> > > > resources or lower your cpu/memory requirement for your job through
> > > > configuration.
> > > >
> > > > Tim
> > > >
> > > > On Mon, Nov 23, 2015 at 6:30 AM, 木内満歳 <[email protected]> wrote:
> > > > > Hi,
> > > > >
> > > > > I'm finding that some Spark tasks on Mesos 0.25 occasionally
> > > > > won't start.
> > > > > Please advise how I can see more detail about this.
> > > > >
> > > > > Here is the slave log for a bad task:
> > > > >
> > > > > Nov 23 08:54:26 mesos-s2 mesos-slave[18499]: I1123 08:54:26.677291 18516 slave.cpp:2379] Got registration for executor '235498ca-6603-4cfe-bfc7-94005bb235fb-S5' of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1442 from executor(1)@10.130.91.16:60295
> > > > > Nov 23 08:54:26 mesos-s2 mesos-slave[18499]: I1123 08:54:26.679875 18516 slave.cpp:1760] Sending queued task '0' to executor '235498ca-6603-4cfe-bfc7-94005bb235fb-S5' of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1442
> > > > > (no more log about this task)
> > > > >
> > > > > When a task runs successfully, the slave log looks like this:
> > > > >
> > > > > Nov 23 08:44:39 al-mesos-s3 mesos-slave[8644]: I1123 08:44:39.637285 8658 slave.cpp:2379] Got registration for executor '235498ca-6603-4cfe-bfc7-94005bb235fb-S6' of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1437 from executor(1)@10.130.98.65:52273
> > > > > Nov 23 08:44:39 al-mesos-s3 mesos-slave[8644]: I1123 08:44:39.639233 8658 slave.cpp:1760] Sending queued task '6' to executor '235498ca-6603-4cfe-bfc7-94005bb235fb-S6' of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1437
> > > > > Nov 23 08:44:42 al-mesos-s3 mesos-slave[8644]: I1123 08:44:42.608182 8658 slave.cpp:2717] Handling status update TASK_RUNNING (UUID: ff5a2278-0753-4541-bd33-a55f3a09fb69) for task 6 of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1437 from executor(1)@10.130.98.65:52273
> > > > > Nov 23 08:44:42 al-mesos-s3 mesos-slave[8644]: I1123 08:44:42.612318 8658 status_update_manager.cpp:322] Received status update TASK_RUNNING (UUID: ff5a2278-0753-4541-bd33-a55f3a09fb69) for task 6 of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1437
> > > > >
> > > > > Any advice is welcome.
> > > > >
> > > > > Best Regards,
> > > > > Mitsutoshi Kiuchi
> > >
> > > Attachments:
> > > - log.driverStdErr.gz
> > > - log.mesosMaster.gz
> > > - log.mesosSlave.gz
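
For anyone chasing the same symptom, the difference between the two slave log excerpts quoted above (a "Sending queued task" entry with no later "Handling status update" entry) can be checked mechanically. What follows is only a rough sketch in Python: the names `find_stuck_tasks` and `SAMPLE_LOG` are made up for illustration, and the regular expressions are based solely on the log lines quoted in this thread, so other Mesos versions may word these messages differently.

```python
import re

# Patterns derived only from the slave log lines quoted in this thread
# (an assumption: other Mesos versions may phrase these messages differently).
QUEUED = re.compile(
    r"Sending queued task '([^']+)' to executor '[^']+' of framework (\S+)"
)
STATUS = re.compile(
    r"Handling status update (\w+) \(UUID: [^)]+\) for task (\S+) of framework (\S+)"
)


def find_stuck_tasks(lines):
    """Return sorted (framework_id, task_id) pairs that were sent to an
    executor but never produced a status update in the given log lines."""
    queued = set()
    updated = set()
    for line in lines:
        match = QUEUED.search(line)
        if match:
            queued.add((match.group(2), match.group(1)))
            continue
        match = STATUS.search(line)
        if match:
            updated.add((match.group(3), match.group(2)))
    return sorted(queued - updated)


# Hypothetical sample built from the entries quoted in this thread.
SAMPLE_LOG = [
    "Nov 23 08:54:26 mesos-s2 mesos-slave[18499]: I1123 08:54:26.679875 18516 "
    "slave.cpp:1760] Sending queued task '0' to executor "
    "'235498ca-6603-4cfe-bfc7-94005bb235fb-S5' of framework "
    "235498ca-6603-4cfe-bfc7-94005bb235fb-1442",
    "Nov 23 08:44:39 al-mesos-s3 mesos-slave[8644]: I1123 08:44:39.639233 8658 "
    "slave.cpp:1760] Sending queued task '6' to executor "
    "'235498ca-6603-4cfe-bfc7-94005bb235fb-S6' of framework "
    "235498ca-6603-4cfe-bfc7-94005bb235fb-1437",
    "Nov 23 08:44:42 al-mesos-s3 mesos-slave[8644]: I1123 08:44:42.608182 8658 "
    "slave.cpp:2717] Handling status update TASK_RUNNING (UUID: "
    "ff5a2278-0753-4541-bd33-a55f3a09fb69) for task 6 of framework "
    "235498ca-6603-4cfe-bfc7-94005bb235fb-1437 from executor(1)@10.130.98.65:52273",
]

if __name__ == "__main__":
    for framework_id, task_id in find_stuck_tasks(SAMPLE_LOG):
        print(f"task {task_id} of framework {framework_id} never reported a status")
```

To use it on real data, pass the lines of the combined slave logs (for example the contents of log.mesosSlave.gz after decompressing) instead of `SAMPLE_LOG`. It only narrows down which tasks to look at; it does not explain why they stay in staging.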

