> > I suspect they are getting enqueued
Just to be sure - the offers do eventually get through though? The most likely culprit is contention for the storage write lock, observable via spikes in stat log_storage_write_lock_wait_ns_total. I see that a lot of getJobUpdateDetails() and getTasksWithoutConfigs() > calls are being made at that time This sounds like API activity. This shouldn't interfere with offer processing directly, but could potentially slow down the scheduler as a whole. I also notice a lot of "Timeout reached for task..." around the same time. > Can this happen if task is in PENDING state and does not reach ASSIGNED due > to lack of offers? This is unusual. Pending tasks are not timed out; this applies to tasks in states where the scheduler is waiting for something else to act and it does not hear back (via a status update). I suggest digging into the cause of delayed offer processing first, i suspect it might be related to the task timeouts as well. version close to 0.18.0 Is the ambiguity is due to custom patches? Can you at least indicate the last git SHA off aurora/master? Digging much deeper in diagnosing this may prove tricky without knowing what code is in play. On Thu, Nov 9, 2017 at 9:49 PM, Mohit Jaggi <[email protected]> wrote: > I also notice a lot of "Timeout reached for task..." around the same time. > Can this happen if task is in PENDING state and does not reach ASSIGNED due > to lack of offers? > > On Thu, Nov 9, 2017 at 4:33 PM, Mohit Jaggi <[email protected]> wrote: > >> Folks, >> I have noticed some weird behavior in Aurora (version close to 0.18.0). >> Sometimes, it shows no offers in the UI offers page. But if I tail the logs >> I can see offers are coming in. I suspect they are getting enqueued for >> processing by "executor" but stay there for a long time and are not >> processed either due to locking or thread starvation. >> >> I see that a lot of getJobUpdateDetails() and getTasksWithoutConfigs() >> calls are being made at that time. Could these calls starve the >> OfferManager(e.g. by contending for some lock)? What should I be looking >> for to debug this condition further? >> >> Mohit. >> > >
