[slurm-users] slurm problem with GrpTres

2020-05-04 Thread Alberto Morillas, Angelines
Hello, I have a problem with GrpTRES. I set the limits with the command sacctmgr --immediate modify user where user= set GrpTres=cpu=144,node=4. However, when the user submits serial jobs, for example 5 jobs, the user can only run 4, and the rest stay pending (PD) with reason=AssocGrpNodeLimit. I
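
A minimal Python model of why the fifth serial job pends. It assumes the behaviour described in Slurm's resource-limits documentation, as we understand it: a GrpTRES node limit counts each job's nodes separately, so two single-core jobs sharing one physical node still count as two nodes. The Job class and the schedule() helper are hypothetical and only model the limit check, not the real scheduler.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    cpus: int
    nodes: int  # nodes allocated to this job (1 for a serial job)


def schedule(jobs, grp_cpu, grp_node):
    """Return (running IDs, [(pending ID, reason)]) under an association's GrpTRES.

    Assumption: node usage is summed per job, so jobs that share one
    physical node each still add their own node count.
    """
    used_cpu = used_node = 0
    running, pending = [], []
    for job in jobs:
        if used_cpu + job.cpus > grp_cpu:
            pending.append((job.job_id, "AssocGrpCpuLimit"))
        elif used_node + job.nodes > grp_node:
            pending.append((job.job_id, "AssocGrpNodeLimit"))
        else:
            used_cpu += job.cpus
            used_node += job.nodes
            running.append(job.job_id)
    return running, pending


if __name__ == "__main__":
    # Five 1-CPU serial jobs against GrpTres=cpu=144,node=4
    jobs = [Job(i, cpus=1, nodes=1) for i in range(1, 6)]
    running, pending = schedule(jobs, grp_cpu=144, grp_node=4)
    print("running:", running)
    print("pending:", pending)
```

Under this model only jobs 1 to 4 start and job 5 waits with AssocGrpNodeLimit, even though only 4 of the 144 allowed CPUs are in use; raising node= or clearing it (GrpTres=node=-1) would let the CPU limit govern instead.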

[slurm-users] problems with number of jobs with GrpTres

2020-05-19 Thread Alberto Morillas, Angelines
Hello, I have a problem with GrpTRES. I set the limits with the command sacctmgr --immediate modify user where user= set GrpTres=cpu=144,node=4. However, when the user submits serial jobs, for example 5 jobs, the user can only run 4, and the rest stay pending (PD) with reason=AssocGrpNodeLimit. I

[slurm-users] problems with OpenMPI 4.0.3

2020-05-29 Thread Alberto Morillas, Angelines
Good morning, we have a cluster with two kinds of InfiniBand cards, ConnectX-4 and ConnectX-6. Open MPI 3.1.3 works fine, but when we started with the ConnectX-6 cards we began using Open MPI 4.0.3 (which supports ConnectX-6), and the programs that have several parts, first a call to a sequential

Re: [slurm-users] [EXTERNAL] problems with OpenMPI 4.0.3

2020-06-01 Thread Alberto Morillas, Angelines
Hello Angelines, do you know how the Open MPI 4.0.3 package was configured and built? That information would be useful to help diagnose the problem. Thanks, Howard From: slurm-users on behalf of "Alb

Re: [slurm-users] [EXTERNAL] problems with OpenMPI 4.0.3

2020-06-01 Thread Alberto Morillas, Angelines
band in the Open MPI 4.0.x release stream. Howard On 6/1/20, 10:29 AM, "slurm-users on behalf of Alberto Morillas, Angelines" wrote: Hello Howard, I installed it with Spack: openmpi@4.0.3 -cuda +cxx_exceptions fabrics=verbs -java -legacylauncher
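
The Spack spec quoted above uses Spack's variant syntax: +name enables a boolean variant, - or ~ disables it, and key=value sets a multi-valued variant such as fabrics. A minimal Python sketch that splits such a spec into its parts, which can help when comparing how two Open MPI builds were configured. It handles only this simple space-separated form (no compilers, dependencies or flags); parse_spec is a hypothetical helper, and the spec string is copied as it appears in the truncated snippet.

```python
def parse_spec(spec: str) -> dict:
    """Split a simple space-separated Spack spec into name, version and variants.

    Handles only: name@version, +variant, -variant / ~variant, key=value[,value].
    """
    tokens = spec.split()
    name, _, version = tokens[0].partition("@")
    variants = {}
    for tok in tokens[1:]:
        if tok.startswith("+"):
            variants[tok[1:]] = True           # boolean variant enabled
        elif tok.startswith(("-", "~")):
            variants[tok[1:]] = False          # boolean variant disabled
        elif "=" in tok:
            key, _, value = tok.partition("=")
            variants[key] = value.split(",")   # multi-valued variant
        else:
            raise ValueError(f"unsupported token: {tok!r}")
    return {"name": name, "version": version or None, "variants": variants}


if __name__ == "__main__":
    spec = "openmpi@4.0.3 -cuda +cxx_exceptions fabrics=verbs -java -legacylauncher"
    parsed = parse_spec(spec)
    print(parsed["name"], parsed["version"])
    for key, value in parsed["variants"].items():
        print(f"  {key}: {value}")
```

In this build the only fabric enabled is verbs, which appears to be the detail the (truncated) reply about InfiniBand in the Open MPI 4.0.x stream is addressing.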

[slurm-users] fail job

2020-06-30 Thread Alberto Morillas, Angelines
Hi, we have Slurm version 18.08.6. One of my nodes is in the drain state with Reason=Kill task failed [root@2020-06-27T02:25:29]. On the node I can see this in slurmd.log: [2020-06-27T01:24:26.242] task_p_slurmd_batch_request: 963771 [2020-06-27T01:24:26.242] task/affinity: job 963771 CPU input mask for node
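
A node drained with this reason stays out of service until an administrator resumes it. A minimal Python sketch, assuming the Reason field format shown above (text followed by [user@timestamp]), that splits the reason into its parts and builds the standard scontrol update NodeName=... State=RESUME command. The helper names and the node name are placeholders, and the command is only printed, not executed.

```python
import re
import shlex
from datetime import datetime
from typing import List

# Matches e.g. "Kill task failed [root@2020-06-27T02:25:29]"
REASON_RE = re.compile(r"^(?P<text>.*?)\s*\[(?P<user>[^@\]]+)@(?P<when>[^\]]+)\]$")


def parse_reason(reason: str):
    """Return (text, user, datetime) from a node Reason string."""
    m = REASON_RE.match(reason.strip())
    if m is None:
        raise ValueError(f"unrecognised reason: {reason!r}")
    when = datetime.strptime(m["when"], "%Y-%m-%dT%H:%M:%S")
    return m["text"], m["user"], when


def resume_command(node: str) -> List[str]:
    """argv that returns a drained node to service (not executed here)."""
    return ["scontrol", "update", f"NodeName={node}", "State=RESUME"]


if __name__ == "__main__":
    text, user, when = parse_reason("Kill task failed [root@2020-06-27T02:25:29]")
    print(text, user, when.isoformat())
    print(shlex.join(resume_command("node001")))  # "node001" is a placeholder
```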

Re: [slurm-users] fail job

2020-06-30 Thread Alberto Morillas, Angelines
...
[2020-06-30T11:46:52.740] error: select_nodes: calling _get_req_features() for JobId=964556 with not NULL job resources
[2020-06-30T11:46:52.740] error: select_nodes: calling _get_req_features() for JobId=964574 with not NULL job resources
[2020-06-30T11:46:52.741] error: select_nodes: ca
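
The errors quoted above repeat once per job. A small Python sketch, assuming log lines in exactly the format quoted, that collects the affected JobIds and counts how often each appears, which makes it easier to see whether the errors concern a few jobs or many. The function name is hypothetical, and the log file location depends on the site's configuration.

```python
import re
from collections import Counter

# Assumes the log line format quoted above.
PATTERN = re.compile(
    r"^\[(?P<ts>[^\]]+)\] error: select_nodes: calling _get_req_features\(\) "
    r"for JobId=(?P<job>\d+) with not NULL job resources"
)


def affected_jobs(lines):
    """Count occurrences of the _get_req_features() error per JobId."""
    counts = Counter()
    for line in lines:
        m = PATTERN.match(line)
        if m:
            counts[int(m["job"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[2020-06-30T11:46:52.740] error: select_nodes: calling _get_req_features() "
        "for JobId=964556 with not NULL job resources",
        "[2020-06-30T11:46:52.740] error: select_nodes: calling _get_req_features() "
        "for JobId=964574 with not NULL job resources",
    ]
    for job, n in sorted(affected_jobs(sample).items()):
        print(job, n)
```

To scan a real log, pass an open file object, e.g. with open(path) as f: affected_jobs(f), where path is your controller log.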

[slurm-users] Get original script of a job

2021-03-05 Thread Alberto Morillas, Angelines
Hi, I would like to know whether it is possible to get the script that was used to submit a job. I know that with scontrol I can get the path and the name of the script used to submit a job, but users often change their scripts, and sometimes something went wrong after that, so I
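
For a job that slurmctld still holds, scontrol write batch_script <job_id> [filename] writes the script the job was actually submitted with, independent of later edits to the file on disk; giving - as the filename sends it to stdout. A minimal Python sketch that wraps this command; the helper names and job ID are placeholders. Scripts of jobs already purged from the controller can only be recovered if they were stored in the accounting database, which depends on the Slurm version and configuration (newer releases offer sacct --batch-script together with AccountingStoreFlags=job_script).

```python
import subprocess
from typing import List


def batch_script_cmd(job_id: int, dest: str = "-") -> List[str]:
    """argv for `scontrol write batch_script`; dest "-" means stdout."""
    return ["scontrol", "write", "batch_script", str(job_id), dest]


def fetch_batch_script(job_id: int) -> str:
    """Return the stored batch script of a job still held by slurmctld.

    Requires the Slurm client tools on PATH and permission to see the job;
    raises subprocess.CalledProcessError if scontrol fails.
    """
    result = subprocess.run(
        batch_script_cmd(job_id), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    print(" ".join(batch_script_cmd(12345)))  # 12345 is a placeholder job ID
```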

Re: [slurm-users] slurm-users Digest, Vol 41, Issue 13

2021-03-05 Thread Alberto Morillas, Angelines
On 05-03-2021 11:29, Alberto Morillas, Angelines wrote: > I would like to know if it will be possible to get the script that was > used to send a job. > > I know that when I send a job with scontrol I can get the path and the > name of the script used to send this job, but

Re: [slurm-users] Get original script of a job

2021-03-07 Thread Alberto Morillas, Angelines
ng, please edit your Subject line so it is more specific than "Re: Contents of slurm-users digest..." Today's Topics: 1. Re: slurm-users Digest, Vol 41, Issue 13 (Alberto Morillas, Angelines) 2. Re: Get original script of a job (Ward Poelmans)

[slurm-users] xalloc

2021-03-09 Thread Alberto Morillas, Angelines
Hi, I need your help. I have users who need an interactive shell on a compute node, with the possibility of running programs with a graphical user interface directly on that node. While looking for information I found the xalloc command, but it must be a wrapper because it isn't installed i
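
Slurm has built-in X11 forwarding (enabled with PrologFlags=X11 in slurm.conf), which is often used instead of an xalloc wrapper: from a login session that already has X forwarding, srun --x11 --pty <shell> opens an interactive shell on a compute node in which GUI programs can be started. A minimal Python sketch that builds that command; the helper name, resource options and defaults are assumptions to adapt, and whether --x11 works depends on the site's configuration.

```python
import os
import shlex
from typing import List, Optional


def x11_shell_cmd(ntasks: int = 1, cpus_per_task: int = 1,
                  time_limit: str = "01:00:00",
                  shell: Optional[str] = None) -> List[str]:
    """argv for an interactive compute-node shell with X11 forwarding.

    Assumes Slurm's built-in X11 support is enabled (PrologFlags=X11)
    and that the user's own session already has a working DISPLAY.
    """
    shell = shell or os.environ.get("SHELL", "/bin/bash")
    return [
        "srun",
        "--x11",                          # forward X11 from the compute node
        f"--ntasks={ntasks}",
        f"--cpus-per-task={cpus_per_task}",
        f"--time={time_limit}",
        "--pty",                          # allocate a pseudo-terminal
        shell,
    ]


if __name__ == "__main__":
    cmd = x11_shell_cmd()
    print(shlex.join(cmd))
    # To actually start the session, replace this process with srun:
    # os.execvp(cmd[0], cmd)
```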

[slurm-users] Checkpoint

2021-11-29 Thread Alberto Morillas, Angelines
Hi! I need your help. How could I use checkpointing (DMTCP) with Slurm? Thanks in advance, Angelines
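
One common pattern, sketched here under assumptions, is a batch script that starts the program under dmtcp_launch with a checkpoint interval on the first run and calls dmtcp_restart on the saved images when the job is resubmitted. A minimal Python sketch that generates such a script; the program, interval, job name, time limit and the image naming ckpt_*.dmtcp in the working directory are assumptions to check against the DMTCP documentation for your installed version, and coordinator handling is left to DMTCP's defaults.

```python
import shlex
from typing import List

# Assumption: DMTCP writes images named ckpt_*.dmtcp into the working directory.
TEMPLATE = """#!/bin/bash
#SBATCH --job-name={name}
#SBATCH --time={time}

# Restart from existing DMTCP images if present, otherwise start fresh.
if ls ckpt_*.dmtcp >/dev/null 2>&1; then
    dmtcp_restart --interval {interval} ckpt_*.dmtcp
else
    dmtcp_launch --interval {interval} {program}
fi
"""


def dmtcp_batch_script(program: List[str], interval_s: int = 3600,
                       name: str = "dmtcp-job", time: str = "24:00:00") -> str:
    """Return a hypothetical sbatch script that checkpoints every interval_s seconds."""
    if interval_s <= 0:
        raise ValueError("checkpoint interval must be positive")
    return TEMPLATE.format(name=name, time=time, interval=interval_s,
                           program=shlex.join(program))


if __name__ == "__main__":
    # "./my_app" and its argument are placeholders.
    print(dmtcp_batch_script(["./my_app", "input.dat"], interval_s=1800))
```

Resubmitting the same script, for example after the job reaches its time limit, would then continue from the most recent images rather than from the beginning.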