Re: [slurm-users] seff Not Calculating [FIXED?]

2020-11-18 Thread Diego Zuccato
On 18/11/20 15:15, Jason Simms wrote:
> Use of uninitialized value $hash{"2"} in division (/) at /bin/seff line 108, line 602.
> Use of uninitialized value $hash{"2"} in division (/) at /bin/seff line 108, line 602.
It seems some setups report data in a different format, hence the
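If the divisor seff reads really is missing on some setups, one way to see which accounting field comes back empty is to dump the raw fields for the affected job; the job ID and field list below are placeholders, since the exact columns seff parses differ between versions:

    # Dump the raw accounting fields and look for an empty column
    # (the job ID is a placeholder; the field list is only illustrative).
    sacct -P -j <jobid> \
          --format=JobID,State,AllocCPUS,TotalCPU,Elapsed,MaxRSS,ReqMem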

Re: [slurm-users] Users can't scancel

2020-11-18 Thread mercan
These log lines about the prolog script look very suspicious to me:
[2020-11-18T10:19:35.388] debug:  [job 110] attempting to run prolog [/cm/local/apps/cmd/scripts/prolog]
then
[2020-11-18T10:21:10.121] debug:  Waiting for job 110's prolog to complete
[2020-11-18T10:21:10.121] debug: 
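A quick way to check whether that prolog really is the bottleneck is to run it by hand on the affected compute node and capture its exit status; the exported job ID below is only a stand-in value so the script has something to read:

    # Trace the prolog by hand on the compute node; SLURM_JOB_ID=110 is a
    # stand-in value for testing only.
    sudo SLURM_JOB_ID=110 bash -x /cm/local/apps/cmd/scripts/prolog
    echo "prolog exit status: $?"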

Re: [slurm-users] Users can't scancel

2020-11-18 Thread William Markuske
The epilog script does have exit 0 set at the end. Epilogs exit cleanly when run. With the log level set to debug5 I get the following results for any scancel call.
Submit host slurmctld.log:
[2020-11-18T10:19:34.944] _slurm_rpc_submit_batch_job: JobId=110 InitPrio=110503 usec=191
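For reference, the controller's log level can also be raised temporarily without a restart while reproducing the scancel; this is a general technique, not something taken from the thread:

    # Raise slurmctld logging while reproducing the problem, then lower it again.
    scontrol setdebug debug5
    # ... reproduce the scancel and watch slurmctld.log / slurmd.log ...
    scontrol setdebug info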

Re: [slurm-users] Users can't scancel

2020-11-18 Thread mercan
Hi; Check the epilog return value, which comes from the return value of the last command in the epilog script. Also, you can add an "exit 0" line as the last line of the epilog script to ensure a zero return value for testing purposes. Ahmet M.
On 18.11.2020 20:00, William Markuske wrote:
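A minimal epilog along those lines might look like the sketch below; the logger call and the cleanup placeholder are only illustrative, the point is the explicit exit 0 at the end:

    #!/bin/bash
    # Minimal epilog sketch: whatever the cleanup commands return, finish with
    # an explicit "exit 0" so slurmctld never treats the epilog as failed.
    logger -t epilog "cleaning up job ${SLURM_JOB_ID}"
    # ... site-specific cleanup goes here ...
    exit 0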

Re: [slurm-users] seff Not Calculating [FIXED?]

2020-11-18 Thread Jason Simms
Dear Peter, Thanks for your response. Yes, I am running ProctrackType=proctrack/cgroup. The behavior that I was seeing with the default seff, and that Diego saw as well, was simply that seff was not reporting much of any information for a given job. I'm glad it's working for you, but it doesn't
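To confirm what the running controller actually has configured, one can ask scontrol rather than trusting slurm.conf on disk; the grep pattern below is just illustrative:

    # Show the process-tracking and accounting-gather plugins in effect.
    scontrol show config | grep -i -E 'ProctrackType|JobAcctGatherType'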

Re: [slurm-users] seff Not Calculating [FIXED?]

2020-11-18 Thread Peter Kjellström
On Wed, 18 Nov 2020 09:15:59 -0500 Jason Simms wrote:
> Dear Diego,
>
> A while back, I attempted to make some edits locally to see whether I could produce "better" results. Here is a comparison of the output of your latest version, and then mine:
I'm not sure what bug or behavior you're

[slurm-users] Users can't scancel

2020-11-18 Thread William Markuske
Hello, I am having an odd problem where users are unable to kill their jobs with scancel. Users can submit jobs just fine, and when a task completes the job closes correctly. However, if a user attempts to cancel a job via scancel, the SIGKILL signals are sent to the step but don't
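A few generic debugging steps that might help narrow this down (the job ID is a placeholder; these commands are not taken from the thread itself):

    # Cancel verbosely so the client shows where the request is sent.
    scancel -v <jobid>
    # Explicitly signal the batch script and every step, not just the steps.
    scancel --full --signal=KILL <jobid>
    # See what state the job ends up in.
    scontrol show job <jobid>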

Re: [slurm-users] missing info from sacct

2020-11-18 Thread Andy Riebs
Hi Navin, I can't help with the sreport problem, but I did recognize the situation with the gap in job numbers (the use of federation), and jumped in for that one. Since this list is completely populated by volunteers, there is no one "assigned" to topic areas, but people jump in where they

Re: [slurm-users] missing info from sacct

2020-11-18 Thread navin srivastava
Thank you, Andy. But when I try to get the utilization for the months, it says it is 100%. When I try to find it using utilization by user, it gives me a very different value, which I am unable to understand.
deda1x1466:~ # sreport cluster AccountUtilizationByUser start=10/02/20
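One cross-check would be to bound the window with an explicit end date and ask sreport for percentages directly; the dates and cluster name below are examples only:

    # Percent utilization for one cluster over an explicit window.
    sreport -t percent cluster AccountUtilizationByUser \
            Clusters=hpc1 start=10/02/20 end=11/01/20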

Re: [slurm-users] missing info from sacct

2020-11-18 Thread Andy Riebs
I see from your subsequent post that you're using a pair of clusters with a single database, so yes, you are using federation. The high-order bits of the Job ID identify the cluster that ran the job, so you will typically have a huge gap between ranges of Job IDs. Andy
On 11/18/2020 9:15
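Given that, a "missing" job can often be found simply by asking the accounting database across all clusters instead of just the local one; the job ID below is a placeholder:

    # Query every cluster registered in the database for this job.
    sacct -M all -j <jobid> --format=Cluster,JobID,JobName,State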

Re: [slurm-users] seff Not Calculating [FIXED?]

2020-11-18 Thread Jason Simms
Dear Diego, A while back, I attempted to make some edits locally to see whether I could produce "better" results. Here is a comparison of the output of your latest version, and then mine:
[root@hpc bin]# seff 24567
Use of uninitialized value $hash{"2"} in division (/) at /bin/seff line 108,

[slurm-users] missing info from sacct

2020-11-18 Thread navin srivastava
While running sacct we found that some job IDs are not listed.
5535566   SYNTHLIBT+  stdg_defq  stdg_acc  1  COMPLETED  0:0
5535567   SYNTHLIBT+  stdg_defq  stdg_acc  1  COMPLETED  0:0
11016496  jupyter-s+  stdg_defq  stdg_acc  1  RUNNING    0:0
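A common reason for jobs not showing up is that, by default, sacct only reports jobs that started since 00:00 of the current day on the local cluster, so widening the window and querying all clusters is a reasonable first check; the dates below are examples:

    # Widen the time window and query all clusters in the database.
    sacct -a -M all -S 2020-10-01 -E 2020-11-18 \
          --format=JobID,JobName,Partition,Account,AllocCPUS,State,ExitCode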

Re: [slurm-users] slurm-users Digest, Vol 37, Issue 33

2020-11-18 Thread vero chaul
normal
> While generating the report, I am able to generate it for the local cluster (hpc1) without any issue and it looks good, but from the second cluster the data always shows me 100%