Edward Ned Harvey wrote:
> By default, it will distribute jobs every 15 sec, but it can be configured
> down to 1 sec. So if you have jobs of 2 sec, the efficiency might not be great.
>
> You can have job dependencies, but I'm not sure how much knowledge it has
> that's relevant to your situation. If you submit a job, let's say jobid is
> 500, and you submit another job, let's say 501, you can make job 501 depend
> on 500. There isn't any conditional detection of "pass/fail" on 500 ... just
> a scheduling delay to ensure 501 runs after 500.

These are some typical differences between 'load sharing' and 'traditional batch' facilities. Some products work at bridging the gap (some more successfully than others), but most products are firmly aimed at one or the other: 'give me jobs and spread them across as many machines as I can get my hands on' (load sharing) or 'run job x based on conditions y on host z' (traditional batch).
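As an aside, in SGE that kind of "runs-after" dependency is expressed with qsub's -hold_jid option. A sketch, assuming SGE is installed; the job script names here are hypothetical:

```shell
# Submit the first job; -terse makes qsub print just the job ID.
JOB1=$(qsub -terse first_step.sh)

# Submit the second job, held until the first one completes.
# Note: -hold_jid is purely a scheduling delay -- it does NOT
# check whether $JOB1 actually succeeded, as described above.
qsub -hold_jid "$JOB1" second_step.sh
```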
The comment above about 'seconds' and 'distribute' is important. A typical load sharing system works hard to spread work across all of the machines that it 'owns'. A typical use is software builds. As for the second comment: builds have few dependencies, and those are usually resolved in a 'make' or similar process.

Batch processes often have a different nature. They are often scheduled to occur at a specific time rather than 'ASAP', and their delivery may be tied to SLAs (Service Level Agreements), where the output of a job must arrive on the C[E|F|I]O's desk at 8 am, under penalty of dismemberment if it's late! Batch jobs often have complex dependency trees with 'corrective' processes that come into play if a particular step or job fails. Batch processing scripts/controls may have a rich language to allow for fine-grained control; load sharers usually do not.

Batch processing evolved from the mainframe era, where Computer Operators (yes, the job title was capitalised :-) put decks of cards into readers, often boxes at a time, and the scheduler (e.g., HASP or JES2 on IBM mainframes) queued up and held all of the jobs. The operators had a worksheet (or workbook) that told them what jobs to release and when, and what to do if a job failed. (This may have involved running a corrective job or paging someone.) Modern batch systems replace the operator with code, to one extent or another. [And yah, when I started in computing this is how it worked. :-]

If you need load sharing, it would seem that systems like SGE and LSF are what you are looking for. If you need batch processing, Autosys, BMC, Orsyp, Tidal, and Tivoli are the places to look for answers. These are the commercial (or near-commercial) solutions; I haven't worked with the Open Source alternatives in this field. That's common for batch processing, since most companies want some company 'on the hook' if their batch processing system (usually inextricably linked to their bread-and-butter) fails.
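The 'corrective process' idea above can be sketched in plain shell. The step and corrective-job names here are hypothetical; a real batch product would express this in its own dependency language, with paging and escalation instead of echo:

```shell
#!/bin/sh
# Sketch of a batch flow with a corrective job: run a step, and if it
# fails, run a cleanup job and retry once before giving up.

# Stand-in for a real batch step. Second argument forces a failure so
# the corrective path is visible.
run_step() {
    step_name=$1
    should_fail=$2
    if [ "$should_fail" = "fail" ]; then
        echo "step $step_name: FAILED"
        return 1
    fi
    echo "step $step_name: OK"
    return 0
}

# Main flow: step_a must succeed before step_b is allowed to run.
if ! run_step step_a fail; then
    # Corrective job: in a real system this might restore a file,
    # rerun an extract, or page an operator before the retry.
    echo "corrective job for step_a: cleaning up"
    run_step step_a ok || exit 1
fi
run_step step_b ok
```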
For load sharing, I'd certainly look over open systems like SGE.

- Richard

_______________________________________________
Tech mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/
