Narayan Desai wrote:
> On Thu, 16 Jul 2009 12:16:14 -0400 Doug Hughes wrote:
>
>   Doug> Narayan Desai wrote:
>   Doug> > On Thu, 16 Jul 2009 11:15:48 -0400 Edward Ned Harvey wrote:
>   Doug> >
>   Doug> >   Ned> > I am interested in soliciting experiences deploying, using and
>   Doug> >   Ned> > maintaining the Condor batch processing system, especially
>   Doug> >   Ned> > under Linux / Debian. Our use would predominantly be many
>   Doug> >   Ned> > small jobs, rather than a few large jobs, with runtimes
>   Doug> >   Ned> > measured in a few hours.  Probably only a handful of nodes,
>   Doug> >   Ned> > on the order of half a dozen, in total.[1]
>   Doug> >
>   Doug> >
>   Doug> >   Ned> I don't know anything about condor, or torque.  The obvious
>   Doug> >   Ned> choice to me would be SGE.  I wonder what advantage there is to
>   Doug> >   Ned> using something other than SGE?
>   Doug> >
>   Doug> > Well, the area where condor is pretty much the undisputed king is in
>   Doug> > the scavenger arena. The basic idea is that you could deploy condor
>   Doug> > on top of your regular desktops and jobs would be deployed to use
>   Doug> > wasted cycles (during idle periods or on a set schedule, etc).  -nld
>   Doug> >
>   Doug> >   
>   Doug> Doesn't it also excel at the whole state/migration thing? E.G. you can
>   Doug> take a node out for maintenance and migrate a running job off to
>   Doug> another node by saving the memory state and performing the migration
>   Doug> and then resuming the job. (May only work for some job configurations)
>
> So I hear. I don't have any direct experience with the
> checkpointing/migration stuff. I gather they are starting to use VMs for
> this sort of thing as well as library-based checkpointing. 
>  -nld
>   
This depends on the purpose of the batch jobs. If you're looking for 
simple load sharing/cloud computing, we've used LSF in our engineering 
environment for a long time. It has the option of consuming unused 
desktop cycles, but we found this to be unreliable and problematic - not 
because LSF was bad, but because individuals had messed around with 
their desktops in such a way as to mangle any jobs distributed to them. 
For compiles specifically, even distcc is an excellent way to spread the 
work across a bunch of machines (I even use it at home for this).

If the batch jobs are for the purpose of performing functions on 
particular machines, then you're not looking for a load distribution 
facility, you're looking for more traditional batch execution. The 
commercial players in this field are companies like Autosys, BMC, Orsyp, 
Tidal(*), and such. These products schedule (with very complex calendars 
and conditions, when necessary) jobs on particular machines (and some of 
them can load balance as well).

I work in a group whose main purpose is to provide automation, 
especially for the batch processing environment at $WORK. You're welcome 
to ping me - here on the list or privately - if you would like more help.

- Richard

[ In the interests of full disclosure, $WORK recently acquired (*) - but 
I'm not a sales person - I don't even play one on TV! ]
_______________________________________________
Tech mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/
