Figured it out. The checkpoint_blcr.a and checkpoint_blcr.so libraries were not created in the /usr/local/lib/slurm directory.
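For anyone who hits the same error, here is a minimal sketch of a check for the missing plugin files. The file names and the default directory are the ones from my machine; the function name `missing_blcr_plugins` is just something I made up for illustration, and the directory will differ if SLURM was configured with another prefix.

```python
from pathlib import Path

# File names as mentioned in this message. The default directory is an
# assumption that matches my install; adjust it for a different prefix.
EXPECTED = ("checkpoint_blcr.so", "checkpoint_blcr.a")


def missing_blcr_plugins(plugin_dir="/usr/local/lib/slurm", expected=EXPECTED):
    """Return the expected plugin files that are absent from plugin_dir."""
    base = Path(plugin_dir)
    return [name for name in expected if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_blcr_plugins()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("checkpoint/blcr plugin files are present")
```

If the script reports missing files, the checkpoint/blcr plugin was probably not built when SLURM was installed.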
This happened because I installed BLCR **AFTER** I installed SLURM, so the SLURM installation never detected BLCR and the checkpoint/blcr plugin was not built. Once I reinstalled SLURM with BLCR already in place, the BLCR installation was detected and the slurmctld daemon ran correctly.

On Thu, Jul 3, 2014 at 5:50 PM, Arjun J Rao <[email protected]> wrote:

> I have installed BLCR and it is checkpointing serial jobs fine on
> individual nodes. I've done checkpointing using BLCR and SLURM 2.6.6-2
> earlier on this machine.
>
> But now, with both SLURM 14.03.4-2 and SLURM 2.6.6-2, I get the following
> cryptic error :
>
> slurmctld : error : Couldn't find the specified plugin name for
> checkpoint/blcr looking at all files
> slurmctld : error : Cannot find checkpoint plugin for checkpoint/blcr
> slurmctld : error : Cannot create checkpoint context for checkpoint/blcr
> slurmctld : fatal : failed to initialize checkpoint plugin
>
> My cluster is based on Scientific Linux 6.2 with linux kernel 2.6.32-220.
> What could be going wrong ?
>
> Have attached the slurm.conf file as well.
>
