Thanks, Dave, and yes, I accidentally sent it as non-ASCII. I hate it when that
happens.

I want to tackle single-node jobs first, so I'll try DMTCP.

What SGE scripts do you recommend? I found this one, but I'm not sure whether
there are better ones:

https://github.com/opoplawski/gridengine_dmtcp/blob/master/dmtcp_starter

Joseph

On 10/2/2013 4:02 PM, Dave Love wrote:
> Joseph Farran <[email protected]> writes:
>
> [Please don't post content-type: text/html.]
>
>> Hi all.
>>
>> We have Grid Engine 8.1.4 running on a cluster with CentOS 6.4, using kernel
>> 2.6.32-358.18.1. We are just getting started on setting up job checkpointing.
>>
>> We got BLCR compiled and are currently testing it. Before we go much
>> further and set up Grid Engine with it, I'd like to know if others are using
>> DMTCP.
>>
>> Which one is the better choice, DMTCP or BLCR? Or are folks using both
>> depending on the need?
>
> If you're running MPI, and want checkpointing integrated with it, you
> probably don't have a choice, certainly not if you don't use TCP.  For
> single node jobs, DMTCP presumably wins by virtue of being in user space,
> but if you already have BLCR for parallel jobs, it's probably simplest
> to just use that.  With BLCR you probably want DKMS packages to keep up
> with kernel updates.


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
