[slurm-dev] Re: Slurm with Torque

2017-04-17 Thread Gilles Gouaillardet
Mahmood, fwiw, Slurm provides Torque-compatible commands (qsub, qstat, pbsnodes) that can help your users transition from Torque to Slurm: your users can submit their Torque scripts on your Slurm cluster (qsub script.pbs) until they move to native Slurm scripts (sbatch script.slurm). Cheers Gilles On
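The qsub route above needs no conversion at all. For users who later want to move their scripts to sbatch, a rough first pass can be automated. The Python sketch below is hypothetical (it is not a Slurm tool) and covers only a few common directives: -N to --job-name, -q to --partition, -o/-e to --output/--error, and -l walltime= to --time. Anything else is flagged for manual review rather than guessed.

```python
import re

# Hypothetical mapping of a few common #PBS flags to sbatch long options.
# Only a small subset is handled; unknown directives are flagged, not guessed.
PBS_TO_SBATCH = {
    "-N": "--job-name",
    "-q": "--partition",
    "-o": "--output",
    "-e": "--error",
}


def convert_line(line: str) -> str:
    """Translate one '#PBS' directive line into an '#SBATCH' line."""
    if not line.startswith("#PBS"):
        return line  # shebang, comments and commands pass through unchanged
    parts = line.split(None, 2)  # e.g. ['#PBS', '-N', 'myjob']
    if len(parts) == 3:
        flag, value = parts[1], parts[2].strip()
        if flag in PBS_TO_SBATCH:
            return f"#SBATCH {PBS_TO_SBATCH[flag]}={value}"
        if flag == "-l":
            m = re.fullmatch(r"walltime=(\S+)", value)
            if m:
                return f"#SBATCH --time={m.group(1)}"
    # Unrecognized directive (e.g. nodes=2:ppn=4): keep it visible for review.
    return "# TODO (unconverted): " + line


def convert_script(text: str) -> str:
    """Convert every line of a Torque job script."""
    return "\n".join(convert_line(l) for l in text.splitlines())
```

Resource requests such as nodes=2:ppn=4 have no one-to-one sbatch spelling, which is why the sketch leaves them as TODO comments instead of inventing a translation.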

[slurm-dev] Re: Slurm leaving nodes in COMPLETING state

2017-04-17 Thread Sander Kuusemets
Alright, I found the problem. sdiag says that my agent queue size is HUGE.
[root@rocket ~]# sdiag
*** sdiag output at Mon Apr 17 11:54:02 2017 Data since Mon Apr 17 08:57:20 2017 ***
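For anyone watching for the same symptom, a small Python sketch can pull the agent queue size out of sdiag and flag it. It assumes sdiag prints a line of the form "Agent queue size: N"; the alert threshold is a hypothetical value to tune per site.

```python
import re
import subprocess
import sys
from typing import Optional

# Hypothetical alert threshold; pick a value that suits your site.
AGENT_QUEUE_WARN = 100


def agent_queue_size(sdiag_output: str) -> Optional[int]:
    """Return the 'Agent queue size' value from sdiag text, or None if absent."""
    m = re.search(r"^\s*Agent queue size:\s*(\d+)", sdiag_output, re.MULTILINE)
    return int(m.group(1)) if m else None


def main() -> int:
    # Run sdiag and capture its plain-text report.
    out = subprocess.run(["sdiag"], capture_output=True, text=True, check=True).stdout
    size = agent_queue_size(out)
    if size is None:
        print("Agent queue size not found in sdiag output", file=sys.stderr)
        return 2
    print(f"Agent queue size: {size}")
    # Non-zero exit lets a cron job or monitoring check raise an alert.
    return 1 if size > AGENT_QUEUE_WARN else 0


if __name__ == "__main__":
    sys.exit(main())
```

The parsing is kept separate from running sdiag so it can be checked against saved output without a live controller.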

[slurm-dev] job stats in e-mail

2017-04-17 Thread Vladimir Daric
Hello, I would like to automatically include some job statistics in the e-mail sent when the #SBATCH --mail-user and #SBATCH --mail-type options are set for a job. With our Slurm cluster configuration, when those options are used an empty e-mail is sent, and all the useful information is in the mail subject. Thanks in

[slurm-dev] Re: job stats in e-mail

2017-04-17 Thread Sander Kuusemets
You can write your own script to send the mail. In slurm.conf: MailProg=/etc/slurm/MailWrapper.sh This script is then called instead of the default mailer whenever Slurm sends e-mail, so you can add any kind of information to the message before sending it. Best regards, -- Sander Kuusemets University of Tartu, High Performance
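A MailProg wrapper along those lines could look like the Python sketch below. It assumes Slurm invokes MailProg like mail(1), i.e. with arguments "-s SUBJECT RECIPIENT", that the subject contains "Job_id=NNN", and that a mailx-compatible /usr/bin/mail exists; all function names are hypothetical. Note that accounting data may not be final at the instant the mail fires.

```python
#!/usr/bin/env python3
"""Hypothetical MailProg wrapper: append sacct stats to Slurm job e-mails."""
import re
import subprocess
import sys
from typing import List, Optional, Tuple

# sacct fields to include in the body; adjust to taste.
SACCT_FIELDS = "JobID,JobName,State,ExitCode,Elapsed,MaxRSS,ReqMem"


def parse_args(argv: List[str]) -> Tuple[str, str]:
    """Extract (subject, recipient) from 'mail -s SUBJECT RECIPIENT' style args."""
    if len(argv) >= 3 and argv[0] == "-s":
        return argv[1], argv[2]
    raise ValueError(f"unexpected arguments: {argv!r}")


def job_id_from_subject(subject: str) -> Optional[str]:
    """Pull the numeric job ID out of a subject like 'SLURM Job_id=123 Name=x Ended'."""
    m = re.search(r"Job_id=(\d+)", subject, re.IGNORECASE)
    return m.group(1) if m else None


def sacct_command(job_id: str) -> List[str]:
    # -P: pipe-delimited output; -j: restrict the report to one job.
    return ["sacct", "-P", "-j", job_id, f"--format={SACCT_FIELDS}"]


def main() -> int:
    subject, recipient = parse_args(sys.argv[1:])
    body = subject + "\n"
    job_id = job_id_from_subject(subject)
    if job_id is not None:
        stats = subprocess.run(sacct_command(job_id), capture_output=True, text=True)
        body += "\n" + stats.stdout
    # Hand the message to the system mailer, body on stdin.
    subprocess.run(["/usr/bin/mail", "-s", subject, recipient],
                   input=body, text=True, check=True)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

If the subject format at your site differs, only job_id_from_subject needs to change.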

[slurm-dev] SLURM terminating jobs before they finish

2017-04-17 Thread Batsirai Mabvakure
Hi, SLURM had been running fine, but recently my jobs have been terminating before they finish. I have tried increasing memory using --mem, but the jobs still stop halfway with an error in the slurm.out file. I then tried rerunning a job that had run to completion a week ago, and it also

[slurm-dev] SLURM terminating jobs before they finish

2017-04-17 Thread Batsirai Mabvakure
Hi, Slurm had been running fine, but recently my jobs have been terminated before they finish running. At first I thought it was the memory, so I allocated --mem=1, then moved to --mem=2, but the jobs still run halfway and stop without an error in the slurm.out file. I then tried a job
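One detail worth checking here: per the sbatch documentation, a --mem value without a suffix is in megabytes, and K, M, G and T suffixes select other units. So --mem=1 and --mem=2 request 1 MB and 2 MB; if gigabytes were intended, --mem=2G would be the spelling. A small hypothetical Python helper makes the conversion explicit.

```python
import re

# Multipliers relative to megabytes, following the sbatch --mem suffixes.
_UNITS_MB = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}


def mem_to_mb(value: str) -> float:
    """Convert an sbatch --mem value (e.g. '1', '2G') to megabytes.

    A bare number is treated as megabytes, sbatch's default unit.
    """
    m = re.fullmatch(r"(\d+)([KMGT]?)", value.strip().upper())
    if not m:
        raise ValueError(f"unrecognized --mem value: {value!r}")
    number, unit = int(m.group(1)), m.group(2) or "M"
    return number * _UNITS_MB[unit]
```

For example, mem_to_mb("2") gives 2 while mem_to_mb("2G") gives 2048, a thousandfold difference.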

[slurm-dev] Re: Error messages: find_node_record: lookup failure when setting FQDN for compute nodes

2017-04-17 Thread Jianwen Wei
Thank you, Ryan. I read through the "NodeName" section in https://slurm.schedmd.com/slurm.conf.html and found no mention of setting a host list like "node[001-512].yourdomain.com". As cited below, SLURM seems to support "node[001-512]"
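To make the question concrete, here is what the pattern would expand to if the bracket range were applied before the domain suffix. This Python sketch is a hypothetical illustration of hostlist expansion with zero padding; it does not show whether slurm.conf accepts a suffix after the brackets, which is the open question in this thread.

```python
import re
from typing import List


def expand_hostlist(expr: str) -> List[str]:
    """Expand one bracket group such as 'node[001-512].example.com'.

    The group may hold comma-separated numbers or ranges; zero padding
    follows the width of each range's lower bound.
    """
    m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if not m:
        return [expr]  # no brackets: a plain hostname
    prefix, body, suffix = m.groups()
    hosts = []
    for part in body.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            width = len(lo)
            for n in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{n:0{width}d}{suffix}")
        else:
            hosts.append(f"{prefix}{part}{suffix}")
    return hosts
```

On a Slurm host, scontrol show hostnames with the same expression runs Slurm's own expansion and is the authoritative check.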

[slurm-dev] Re: SLURM terminating jobs before they finish

2017-04-17 Thread Benjamin Redling
Hi Batsirai,
On 17.04.2017 at 14:54, Batsirai Mabvakure wrote:
> SLURM has been running okay until recently my jobs are terminating before
> they finish.
> I have tried increasing memory using --mem, but still the jobs stop halfway with an error in the slurm.out file.
> I then tried running

[slurm-dev] Re: Power user sstat rights

2017-04-17 Thread Christopher Benjamin Coffey
Hello all, In my attempt to create another “root” user, I’ve found that it is not possible to give another user the ability to run “sstat jobid” on every job on the cluster. This must be a bug. Can anyone confirm this? Thanks! Best, Chris -- Christopher Coffey High-Performance Computing