Hi there, I'm shamefully hijacking this thread, but since you've both mentioned that you install BLCR, I'm wondering: do you guys actively use it? Are you successfully checkpointing and restarting arbitrary applications with it? Do you use BLCR in a preemption context?
We (Stanford Research Computing) are investigating that area, and I'm interested in real-life experiences with either BLCR or CRIU (although CRIU may not integrate with Slurm as easily), or with any framework that allows system-level checkpointing of applications. Thanks!

-- Kilian
