#991: Allow tasks to execute on the host set in "host" field
--------------------+-----------------------
Reporter:  adeiana  |       Owner:
    Type:  defect   |      Status:  assigned
Priority:  major    |   Component:  BibSched
 Version:           |  Resolution:
Keywords:           |
--------------------+-----------------------

Comment (by skaplun):

 Replying to [comment:11 adeiana]:
 > I see multiple reasons, independent of our code, that could make a task slow to start.
 > In particular, an unresponsive AFS can leave the processes in D state for several seconds.
 > Or, if there is a problem during startup, the task never finishes starting and enters some deadlock.

 Yep!

 > We have the exact same problem with RUNNING tasks.
 > If they are killed by the OOM killer, they remain in RUNNING status. This has already happened to me too.
 >
 > It depends on whether we need to differentiate a starting task from a running task.

 This is already the case: right before executing a task, bibsched changes
 its status from WAITING to SCHEDULED. It is then the responsibility of the
 task itself to change it from SCHEDULED to RUNNING (to prove that it is in
 a healthy state).
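
 The handshake described above can be sketched as follows. This is a
 minimal illustration under stated assumptions, not bibsched's code: a
 plain dict stands in for bibsched's task table, and `set_status`,
 `bibsched_launch` and `task_startup` are hypothetical names.

```python
"""Minimal sketch of the WAITING -> SCHEDULED -> RUNNING handshake.

Hypothetical: the `tasks` dict stands in for bibsched's task table, and
the function names are illustrative, not Invenio's actual API.
"""

# Which actor is allowed to perform each status transition.
ALLOWED = {
    ("WAITING", "SCHEDULED"): "bibsched",  # bibsched, right before executing the task
    ("SCHEDULED", "RUNNING"): "task",      # the task itself, proving it is healthy
}

tasks = {}  # task_id -> status (hypothetical in-memory stand-in)


def set_status(task_id, new_status, actor):
    """Change a task's status, refusing transitions this actor may not make."""
    old_status = tasks[task_id]
    if ALLOWED.get((old_status, new_status)) != actor:
        raise ValueError("%s may not move task %d from %s to %s"
                         % (actor, task_id, old_status, new_status))
    tasks[task_id] = new_status


def bibsched_launch(task_id):
    # bibsched marks the task SCHEDULED; spawning the process would follow.
    set_status(task_id, "SCHEDULED", "bibsched")


def task_startup(task_id):
    # First thing the task does once it is up: claim RUNNING.
    set_status(task_id, "RUNNING", "task")
```

 In this model, a task that is still SCHEDULED long after bibsched
 launched it never reached its own startup code, which is how a stalled
 start (e.g. AFS leaving the process in D state) shows up in the queue.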

 > If needed, we can add a STARTING status; if not, we mark the task as RUNNING.
 > Tasks in STARTING status would behave like RUNNING ones, so the queue would end up in the same deadlock we currently have.

 > In a separate ticket, we can handle pinging tasks regularly to check that they are not dead.

 Ouch, I thought we were already doing this in bibsched (and this was
 probably the case in old versions, but somehow the code is no longer there).

 > Viable solution?

 Regularly pinging the running tasks, as in the above case, is definitely a
 viable solution. Special care must be taken, however: if the task is in
 status RUNNING and we fail to ping it, we have to check that it has not
 meanwhile changed its status to DONE or something else, as the task might
 end at the very moment we are pinging it :-)
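
 A minimal sketch of such a ping, assuming POSIX and that bibsched knows
 each task's PID. `os.kill(pid, 0)` only checks that the process exists.
 To handle the race above, a failed ping only overwrites the status if it
 is still RUNNING (a compare-and-set). The dicts, the helper names and the
 "ERROR" status written for a dead task are hypothetical; against the real
 database the compare-and-set would be a conditional UPDATE ... WHERE
 status='RUNNING'.

```python
import errno
import os
import threading

_lock = threading.Lock()


def is_alive(pid):
    """Return True if a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except OSError as exc:
        # ESRCH: no such process; EPERM: it exists but belongs to another user.
        return exc.errno == errno.EPERM
    return True


def compare_and_set(tasks, task_id, expected, new):
    """Set tasks[task_id] = new only if it still equals `expected`.

    Hypothetical stand-in for a conditional SQL UPDATE on the task table.
    """
    with _lock:
        if tasks.get(task_id) != expected:
            return False
        tasks[task_id] = new
        return True


def check_running_task(task_id, tasks, pids):
    """Ping a RUNNING task and mark it as dead if its process is gone."""
    status = tasks.get(task_id)
    if status != "RUNNING":
        return status  # nothing to ping
    if is_alive(pids[task_id]):
        return "RUNNING"
    # Ping failed: only overwrite the status if it is *still* RUNNING,
    # because the task may have finished and set DONE meanwhile.
    compare_and_set(tasks, task_id, "RUNNING", "ERROR")  # "ERROR" is assumed
    return tasks[task_id]
```

 The conditional update, rather than a read followed by a separate write,
 is what prevents a task that just set itself to DONE from being
 overwritten by the checker.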

-- 
Ticket URL: <http://invenio-software.org/ticket/991#comment:12>
Invenio <http://invenio-software.org>