Hi, I set up a Mesos cluster that runs the Apache Aurora framework, and I registered 100 cron jobs that run every minute on a pool of 5 slave machines. After the jobs have been scheduled around 100 times, they get stuck in the "PENDING" state. Which logs should I inspect, and what could the problem be? I have restarted the Aurora scheduler several times. Each time, the cron jobs start being scheduled again, but once they reach around 100 runs, all of them stop. By the way, the job executable is a very simple program that just writes some numbers to a file, so I don't think this is a resource problem: the slave machine pool has 40 GB of memory and 40 CPUs.
Thanks a lot!
--
Regards, Zi-Liang
