[slurm-dev] Re: node getting again and again to drain or down state

2015-03-10 Thread suprita.bothra
What is the solution for this "Not responding" reason?
sinfo -R
REASON               USER      TIMESTAMP           NODELIST
Not responding       root      2015-03-10T15:43:59 democlient1
Regards Suprita -Original Message- From: Uwe Sauter [mailto:uwe.sauter...@gmail.com] Sent: Tuesday,

[slurm-dev] Re: node getting again and again to drain or down state

2015-03-10 Thread suprita.bothra
I had 1 core on each node. Changed the conf file and restarted slurm. -Original Message- From: Uwe Sauter [mailto:uwe.sauter...@gmail.com] Sent: Tuesday, March 10, 2015 3:34 PM To: slurm-dev Subject: [slurm-dev] Re: node getting again and again to drain or down state In your slurm.conf:

[slurm-dev] Re: node getting again and again to drain or down state

2015-03-10 Thread Mehdi Denou
Set the slurmd log level to debug and take a look at the log. On 10/03/2015 11:37, suprita.bot...@wipro.com wrote: What is the solution for this not responding reason? sinfo -R REASON USER TIMESTAMP NODELIST Not responding root 2015-03-10T15:43:59

[slurm-dev] Re: Separate slurm-realdev list

2015-03-10 Thread Mehdi Denou
Hi, As discussed during the last SUG: we should keep the dev list for dev topics and re-activate the user mailing list... My 2 cents. On 10/03/2015 10:24, Marcin Stolarek wrote: Separate slurm-realdev list Hi guys, One of the ideas that came on the last slurm user group was to create a

[slurm-dev] Re: node getting again and again to drain or down state

2015-03-10 Thread suprita.bothra
The output of sinfo -R is as follows:
REASON               USER      TIMESTAMP           NODELIST
Not responding       root      2015-03-10T14:21:11 democlient1
Low socket*core*thre root      2015-03-10T14:37:51 demomaster1
And I am attaching the configuration file too. Kindly see to it. -Original

[slurm-dev] Re: node getting again and again to drain or down state

2015-03-10 Thread Uwe Sauter
In your slurm.conf: Procs=2
From the output: Low socket*core*thre
How many cores / CPUs / sockets do your nodes have?
On 10.03.2015 at 10:48, suprita.bot...@wipro.com wrote: The o/p of sinfo-R is as follows: REASON USER TIMESTAMP NODELIST Not responding
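
The question in the reply above comes down to comparing the configured CPU count with what the hardware provides. A minimal sketch of that comparison follows; the function name and the assumption of one thread per core are illustrative and not taken from the thread, and this is not Slurm's own check.

```python
def cpu_count_is_low(configured_cpus: int, sockets: int,
                     cores_per_socket: int, threads_per_core: int) -> bool:
    """Return True when the hardware offers fewer logical CPUs than configured."""
    detected = sockets * cores_per_socket * threads_per_core
    return detected < configured_cpus


# Procs=2 is the configured value quoted in the thread; the poster reports one
# core per node. One socket and one thread per core are assumptions.
print(cpu_count_is_low(2, sockets=1, cores_per_socket=1, threads_per_core=1))
```

With Procs=2 configured on a single-core node the check reports a shortfall, which is consistent with the "Low socket*core*thre" reason shown by sinfo -R.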

[slurm-dev] Re: SlurmDBD Archiving

2015-03-10 Thread Paul Edmon
Oh and for the record we are running 14.11.4 -Paul Edmon- On 03/10/2015 09:26 AM, Paul Edmon wrote: So when I tried to do an archive dump I got the following error. What does this mean? [root@holy-slurm01 slurm]# sacctmgr -i archive dump sacctmgr: error: slurmdbd: Getting response to

[slurm-dev] Re: SlurmDBD Archiving

2015-03-10 Thread Paul Edmon
So when I tried to do an archive dump I got the following error. What does this mean?
[root@holy-slurm01 slurm]# sacctmgr -i archive dump
sacctmgr: error: slurmdbd: Getting response to message type 1459
sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
Problem dumping archive:

[slurm-dev] Re: SlurmDBD Archiving

2015-03-10 Thread Danny Auble
The fatal error you received means your query lasted more than 15 minutes; MySQL deemed it hung and aborted it. You can increase innodb_lock_wait_timeout in your my.cnf and try again, but that generally isn't a good idea. You can safely try again as many times as you would like and

[slurm-dev] Re: SlurmDBD Archiving

2015-03-10 Thread Paul Edmon
Ok, good to know. We've never purged this database and it has well over 34 million jobs in it. I was hoping to do so in a controlled way, but I guess I will have to wait for the 1st. Out of curiosity, is there a way to change which day of the month it does the archive on? Could that be added

[slurm-dev] Re: possible bug in srun --unbuffered option

2015-03-10 Thread Manu Thambi
After a bit of back and forth with David, here is what I found: - slurmd/srun used to buffer stdout internally in old versions (I know for sure this happened in v2.6.2). This buffering can be turned off using --unbuffered. This is distinct from any buffering done on the user task. -

[slurm-dev] How to debug a job that won't start

2015-03-10 Thread Uwe Sauter
Hi, I have an account "production" configured with limitations GrpNodes=18, MaxNodes=18, MaxWall=7-00:00:00, an associated user with limitations MaxNodes=18, MaxWall=7-00:00:00 and a QoS with limitations Priority=10, GraceTime=00:00:00, PreemptMode=cluster, Flags=DenyOnLimit, UsageFactor=1.0,

[slurm-dev] change SLURM behavior when job exceeds memory limit

2015-03-10 Thread Slurm User
Hi, My job is getting killed by SLURM when it exceeds the memory I initially requested through srun. Is there a way to ask SLURM NOT to kill my job (other than setting a higher memory limit through srun)?

[slurm-dev] confused by some values in `scontrol show job`

2015-03-10 Thread Jonathon A Anderson
I’m writing some internal Slurm documentation for our users, and I can’t find *official* (or any other) documentation for some of the output:
* SecsPreSuspend
* ReqB:S:C:T
* NtasksPerN:B:S:C
* Socks/Node (I think I get it, but it’s not present in the scontrol manpage)
* CoreSpec
Could someone

[slurm-dev] node getting again and again to drain or down state

2015-03-10 Thread suprita.bothra
Hi, Please help me if anyone can. I am running the command scontrol update NodeName=xyz state=idle. After running this command my node goes to the idle state, but after some time it goes back to the drain or down state. I have checked my iptables and ip6tables status as well; both are turned off. What might be the

[slurm-dev] Re: node getting again and again to drain or down state

2015-03-10 Thread Uwe Sauter
Check that the node resources in your slurm.conf match your actual hardware, e.g. that the memory configured in slurm.conf is equal to or less than the memory actually installed in the node. On 10.03.2015 at 10:05, suprita.bot...@wipro.com wrote: Hi Please help me if anyone can. I am running
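
The memory check suggested above can be sketched as follows. This is an illustrative helper, not part of Slurm: the node line and the installed-memory figure are hypothetical, and the only Slurm fact relied on is that RealMemory in slurm.conf is given in megabytes.

```python
def parse_node_line(line: str) -> dict[str, str]:
    """Split a slurm.conf-style node line into its Key=Value pairs."""
    return dict(token.split("=", 1) for token in line.split() if "=" in token)


def memory_overstated(node_line: str, installed_mb: int) -> bool:
    """Return True when the configured RealMemory exceeds the installed memory."""
    configured = parse_node_line(node_line).get("RealMemory")
    if configured is None:
        return False  # no explicit RealMemory on this line, nothing to compare
    return int(configured) > installed_mb


# Hypothetical node definition and measured memory, for illustration only.
line = "NodeName=xyz Procs=1 RealMemory=4096"
print(memory_overstated(line, installed_mb=3950))
```

Running slurmd -C on the compute node prints the hardware values slurmd itself detects, which is a convenient source for the installed figure.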

[slurm-dev] Re: node getting again and again to drain or down state

2015-03-10 Thread Mehdi Denou
What is the output of sinfo -R for this node? On 10/03/2015 10:08, Uwe Sauter wrote: Check that your node resources in slurm.conf represent your actual configuration, e.g. that the amount of memory in your node is configured as equal or less in slurm.conf. On 10.03.2015 at 10:05

[slurm-dev] Separate slurm-realdev list

2015-03-10 Thread Marcin Stolarek
Hi guys, One of the ideas that came up at the last Slurm user group meeting was to create a separate list for more advanced topics. Any news on this? cheers, marcin