[task #7321] bibsched scheduler should not block inside given hours

noreply [Tibor Simko] Mon, 21 Jun 2010 22:31:25 +0200

This is an automated notification sent by LCG Savannah.
It relates to:
                task #7321, project CDS Invenio


==============================================================================
 LATEST MODIFICATIONS of task #7321:
==============================================================================

Update of task #7321 (project cdsware):

                  Status:                    None => Duplicate              
        Percent Complete:                      0% => 100%                   
             Open/Closed:                    Open => Closed                 

    _______________________________________________________

Follow-up Comment #3:

Moved to http://invenio-software.org/ticket/152

==============================================================================
 OVERVIEW of task #7321:
==============================================================================

URL:
  <http://savannah.cern.ch/task/?7321>

                 Summary: bibsched scheduler should not block inside given hours
                 Project: CDS Invenio
            Submitted by: skaplun
            Submitted on: 2008-07-05 09:22
         Should Start On: 2008-07-05 00:00
   Should be Finished on: 2008-07-05 00:00
                Category: BibSched
                Priority: 5 - Normal
                  Status: Duplicate
                 Privacy: Private
        Percent Complete: 100%
             Assigned to: pcaderno
             Open/Closed: Closed
         Discussion Lock: Any
                  Effort: 0.00

    _______________________________________________________


The bibsched scheduler blocks on purpose, warning the admin, when a task
terminates with an error.
This is not desirable outside administrator working hours (e.g. weekends),
because blocking the whole queue is usually worse than not recovering
correctly from the error.
A way to tell bibsched during which hours it should keep going must be added.

    _______________________________________________________

Follow-up Comments:


-------------------------------------------------------
List-Post: project-invenio-devel@cern.ch
Date: 2010-06-21 20:31              By: Tibor Simko <simko>
Moved to http://invenio-software.org/ticket/152

-------------------------------------------------------
List-Post: project-invenio-devel@cern.ch
Date: 2010-03-24 17:24              By: Tibor Simko <simko>
More info for this task: say we have bibupload tasks 1, 2, ... k
waiting in the queue, and say that the bibupload task 2 concerns
records R1, R2, ..., Rn, and say that it fails on record Ri.  Then
ideally we should not stop the processing queue for all subsequent
bibupload tasks greater than 2, but we should mark the internal queue
state to say that records Ri..Rn may be vulnerable, and we should
process bibupload jobs 3...k anyway: if a job touches only records
other than Ri...Rn, say records Rw...Rz, then this job should be
allowed; otherwise the job should be held waiting for manual resolution.
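
The queue check described above could be sketched roughly as follows.
This is only an illustration under simplifying assumptions: records are
identified by a single plain ID, and the function names and the job
representation are hypothetical, not actual bibsched code.

```python
def vulnerable_records(records, failed_position):
    """Return records Ri..Rn: the one that failed and all that follow it.

    records is the ordered list of record IDs of the failed job and
    failed_position is the index of Ri in that list.
    """
    return set(records[failed_position:])


def partition_queue(waiting_jobs, vulnerable):
    """Split waiting jobs into runnable and held job IDs.

    waiting_jobs is a list of (job_id, set_of_record_ids) in queue order.
    A job is held if it touches at least one vulnerable record.
    """
    runnable, held = [], []
    for job_id, touched in waiting_jobs:
        if vulnerable.isdisjoint(touched):
            runnable.append(job_id)
        else:
            held.append(job_id)
    return runnable, held


# Task 2 concerned R1..R4 and failed on R3, so R3 and R4 are vulnerable.
vulnerable = vulnerable_records(["R1", "R2", "R3", "R4"], 2)
waiting = [(3, {"R4", "R9"}), (4, {"R7", "R8"})]
runnable, held = partition_queue(waiting, vulnerable)
print(runnable, held)
```

The sketch does not add the records of held jobs to the vulnerable set;
whether it should is left open here.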

The queue checking described above is similar to the freshness check
that BibEdit does, so parts of the checks may be shared.  But beware
of complications, because a record is not recognized solely by its
record ID (001), but also by external sysno (970), internal OAI ID
(024), external OAI ID (035), some of them with provenance checking.
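
To illustrate the complication: a minimal sketch of matching records on
any of the identifying tags listed above, rather than on 001 alone.  The
record representation (a dict mapping a tag to a list of values) and the
function names are assumptions made for this example, and provenance
checking is deliberately left out.

```python
# Tags mentioned above as identifying a record.
IDENTIFYING_TAGS = ("001", "970", "024", "035")


def identity_keys(record):
    """Return the set of (tag, value) pairs that may identify a record.

    record is a simplified dict mapping a MARC tag to a list of values.
    """
    return {
        (tag, value)
        for tag in IDENTIFYING_TAGS
        for value in record.get(tag, [])
    }


def may_be_same_record(a, b):
    """True if the two records share at least one identifying pair."""
    return not identity_keys(a).isdisjoint(identity_keys(b))


a = {"001": ["10"], "970": ["SYS-1"]}
b = {"970": ["SYS-1"]}   # no record ID, but same external sysno
c = {"001": ["11"]}
print(may_be_same_record(a, b), may_be_same_record(a, c))
```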

So, this task may require a new bibsched state, and may get complex.

The primary use case is to ease problematic situations like the task
queue being stopped in the middle of the night for the ATLAS
publications because there was an error in a Photo record upload.

P.S. Another option to ease this use case situation would be to make
bibsched/bibupload/bibedit work in a more diff-like manner.  But we
might have incoming bibupload replace jobs due to webuploads anyway,
so such queue checking would indeed be good to have.

-------------------------------------------------------
List-Post: project-invenio-devel@cern.ch
Date: 2008-07-07 08:05              By: Tibor Simko <simko>
This option would be dangerous without a notion of task dependencies. Imagine
two jobs, bibupload1 and bibupload2, with bibupload2 depending on bibupload1. 
If 1 fails and 2 went ahead anyway, bad things would happen.  It is much safer
to stop the queue if a bibupload fails, as it usually requires human
intervention to resolve the problem.

On the other hand, if a bibindex/bibrank/etc job fails, then we may still be
accepting further bibuploads without too much risk.

Hence it is better to think here not in terms of stopping the queue, but in
terms of allowing/forbidding certain tasks after a failure.  E.g. if a
bibupload task fails, no further bibupload tasks should be permitted until
the first failed task is solved and acknowledged.  E.g. if a bibindex task
fails, no further bibindex task should be permitted, but other jobs such as
bibuploads are okay.  Etc.
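
As a rough sketch of the rule above, assuming the scheduler keeps a set of
task types that have a failed, not yet acknowledged task (the names below
are hypothetical and not taken from bibsched):

```python
def may_run(task_type, failed_types):
    """Return True if a task of the given type may start.

    failed_types is the set of task types with a failed task that has
    not yet been solved and acknowledged.
    """
    return task_type not in failed_types


def acknowledge(task_type, failed_types):
    """Return a new set with task_type cleared after manual resolution."""
    return failed_types - {task_type}


failed = {"bibindex"}
print(may_run("bibupload", failed))   # other task types keep running
print(may_run("bibindex", failed))    # same type waits for the admin
failed = acknowledge("bibindex", failed)
print(may_run("bibindex", failed))
```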

(... and I don't think that working hours or not should make a difference
here.)





    _______________________________________________________

Carbon-Copy List:

CC Address                          | Comment
------------------------------------+-----------------------------
6446                                | -UPD-
1576                                | -COM-
2195                                | -SUB-




==============================================================================

This item URL is:
  <http://savannah.cern.ch/task/?7321>

_______________________________________________
  Message sent via/by LCG Savannah
  http://savannah.cern.ch/
