On 03/17/2016 04:01, 温圣召 wrote:
> The preempted job1 show a PD reason of BeginTime
> my job invocation at the info of them as follow:
> [root@szwg]# sbatch --gres=gpu:4 -N 1 --partition=low mybatch.sh

You ask for _4_ GPUs and 1 node. Your config says each node has Gres=gpu:2.

> Submitted batch job 103
>
>
> [root@szwg]# squeue
> JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
>   103       low mybatch.  root  R  0:10     1 cp01-sys-hic-gpu-00.cp01.baidu.com
>
>
> [root@szwg]# sbatch --gres=gpu:4 -N 1 --partition=hig mybatch.sh
> Submitted batch job 104
>
>
> [root@szwg]# squeue
> JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
>   103       low mybatch.  root PD  0:00     1 (BeginTime)
>   104       hig mybatch.  root  R  0:45     1 cp01-sys-hic-gpu-00.cp01.baidu.com

We are using neither preemption nor gres, so I may be wrong, but I think "BeginTime" is misleading. As far as I understand, there aren't enough free GPUs (none) in your partition with the idle node, so the requeue can't happen as long as 104 is running.

Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321
