On 03/17/2016 04:01, 温圣召 wrote:
> The preempted job1 shows a PD reason of BeginTime.
> My job invocations and their output are as follows:
> [root@szwg]#  sbatch --gres=gpu:4 -N 1 --partition=low  mybatch.sh

You are requesting _4_ GPUs on 1 node.
Your config says each node has Gres=gpu:2.
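
To make the mismatch concrete, here is a minimal sketch of the comparison I have in mind. The helper name gres_fits is made up for illustration and is not part of Slurm, and it assumes both values carry an explicit count (e.g. gpu:4, not a bare "gpu"). The real values would come from your sbatch line and from something like `scontrol show node <nodename>` or `sinfo -o "%N %G"`.

```shell
# Hypothetical helper (not a Slurm command): does a per-node --gres request
# fit into the Gres value configured for one node?
# Assumes both arguments end in an explicit count, e.g. "gpu:4" or "gpu:tesla:2".
gres_fits() {
  requested="${1##*:}"    # "gpu:4" -> "4"
  configured="${2##*:}"   # "gpu:2" -> "2"
  [ "$requested" -le "$configured" ]
}

# Values taken from this thread: the job asks for gpu:4,
# the node is said to be configured with Gres=gpu:2.
if gres_fits "gpu:4" "gpu:2"; then
  echo "request fits on one node"
else
  echo "request exceeds the per-node Gres"
fi
```

If the node really only offers 2 GPUs, I would expect a request of --gres=gpu:2 with -N 1 to be the largest one that can be satisfied, but please check this against your actual node configuration.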

> Submitted batch job 103
> 
> 
> [root@szwg]# squeue
>              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>                103       low mybatch.     root  R       0:10      1 cp01-sys-hic-gpu-00.cp01.baidu.com
> 
> 
> [root@szwg]#  sbatch --gres=gpu:4 -N 1 --partition=hig  mybatch.sh
> Submitted batch job 104
> 
> 
> [root@szwg]# squeue
>              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>                103       low mybatch.     root PD       0:00      1 (BeginTime)
>                104       hig mybatch.     root  R       0:45      1 cp01-sys-hic-gpu-00.cp01.baidu.com

We use neither preemption nor gres here, so I may be wrong, but I
think "BeginTime" is misleading.
As far as I understand, there are not enough free GPUs (none, in fact)
in your partition with the idle node, so the requeue of job 103 can't
happen as long as 104 is running.

Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321
