What is the solution for this "Not responding" reason?
sinfo -R
REASON USER TIMESTAMP NODELIST
Not responding root 2015-03-10T15:43:59 democlient1
Regards
Suprita
-Original Message-
From: Uwe Sauter [mailto:uwe.sauter...@gmail.com]
Sent: Tuesday,
I had 1 core on each node.
I changed the conf file and restarted slurm.
-Original Message-
From: Uwe Sauter [mailto:uwe.sauter...@gmail.com]
Sent: Tuesday, March 10, 2015 3:34 PM
To: slurm-dev
Subject: [slurm-dev] Re: node getting again and again to drain or down state
In your slurm.conf:
Set the slurmd log level to debug and take a look at the log.
On 10/03/2015 11:37, suprita.bot...@wipro.com wrote:
What is the solution for this "Not responding" reason?
sinfo -R
REASON USER TIMESTAMP NODELIST
Not responding root 2015-03-10T15:43:59
Hi,
As discussed during the last SUG: we should keep the dev list for dev
topics and re-activate the user mailing list...
My 2 cents.
On 10/03/2015 10:24, Marcin Stolarek wrote:
Separate slurm-realdev list
Hi guys,
One of the ideas that came on the last slurm user group was to create
a
The output of sinfo -R is as follows:
REASON USER TIMESTAMP NODELIST
Not responding root 2015-03-10T14:21:11 democlient1
Low socket*core*thre root 2015-03-10T14:37:51 demomaster1
I am attaching the configuration file too.
Kindly take a look.
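For anyone who wants to script against the sinfo -R output quoted above, here is a minimal Python sketch that splits each row into its four columns. It assumes the plain whitespace-separated layout shown in this thread, where only the REASON column may contain spaces; the function name parse_sinfo_reasons is hypothetical and not part of Slurm.

```python
def parse_sinfo_reasons(text):
    """Split `sinfo -R` style text into one dict per node row.

    Assumption: USER, TIMESTAMP and NODELIST never contain spaces,
    so everything to the left of those three fields is the reason.
    """
    rows = []
    for line in text.strip().splitlines():
        if line.startswith("REASON"):
            continue  # skip the header row
        parts = line.rsplit(None, 3)  # split from the right, at most 3 times
        if len(parts) != 4:
            continue  # incomplete or truncated row
        reason, user, timestamp, nodelist = parts
        rows.append({
            "reason": reason,
            "user": user,
            "timestamp": timestamp,
            "nodelist": nodelist,
        })
    return rows


# Sample text copied from the output quoted in this thread.
sample = """REASON USER TIMESTAMP NODELIST
Not responding root 2015-03-10T14:21:11 democlient1
Low socket*core*thre root 2015-03-10T14:37:51 demomaster1"""

for row in parse_sinfo_reasons(sample):
    print(row["nodelist"], "->", row["reason"])
```

Splitting from the right keeps multi-word reasons such as "Not responding" intact. The reason text is kept exactly as sinfo printed it, including the cut-off "Low socket*core*thre".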
-Original
In your slurm.conf: Procs=2
From the output: Low socket*core*thre
How many cores / CPUs / sockets do your nodes have?
On 10.03.2015 at 10:48, suprita.bot...@wipro.com wrote:
The output of sinfo -R is as follows:
REASON USER TIMESTAMP NODELIST
Not responding
Oh and for the record we are running 14.11.4
-Paul Edmon-
On 03/10/2015 09:26 AM, Paul Edmon wrote:
So when I tried to do an archive dump I got the following error. What
does this mean?
[root@holy-slurm01 slurm]# sacctmgr -i archive dump
sacctmgr: error: slurmdbd: Getting response to
So when I tried to do an archive dump I got the following error. What
does this mean?
[root@holy-slurm01 slurm]# sacctmgr -i archive dump
sacctmgr: error: slurmdbd: Getting response to message type 1459
sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
Problem dumping archive:
The fatal you received means your query lasted more than 15 minutes, mysql
deemed it hung and aborted. You can increase the timeout for
innodb_lock_wait_timeout in your my.cnf and try again, but that generally isn't
a good idea. You can safely try again as many times as you would like and
Ok, good to know. We've never purged this database and it has well over
34 million jobs in it. I was hoping to do so in a controlled way, but
guess I will have to wait for the 1st.
Out of curiosity, is there a way to change which day of the month it
does the archive on? Could that be added
After a bit of back and forth with David, here is what I found:
- slurmd/srun used to buffer stdout internally in old versions (I know
for sure this happened in v2.6.2).
This buffering can be turned off using --unbuffered.
This is distinct from any buffering done on the user task.
-
Hi,
I have an account production configured with limitations GrpNodes=18,
MaxNodes=18, MaxWall=7-00:00:00, an associated user with
limitations MaxNodes=18, MaxWall=7-00:00:00 and a QoS with limitations
Priority=10, GraceTime=00:00:00, PreemptMode=cluster,
Flags=DenyOnLimit, UsageFactor=1.0,
Hi
My job is getting killed by SLURM when it exceeds the memory I've initially
requested through srun.
Is there a way to request SLURM to NOT kill my job? (Other than setting a
higher memory limit through srun?)
I’m writing some internal slurm documentation for our users, and I can’t find
*official* (or any other) documentation for some of the output:
* SecsPreSuspend
* ReqB:S:C:T
* NtasksPerN:B:S:C
* Socks/Node (I think I get it; but it’s not present in scontrol manpage)
* CoreSpec
Could someone
Hi
Please help me if anyone can.
I am running the command
scontrol update NodeName=xyz state=idle
After running this command my node goes to the idle state, but after some time it
goes back to the drain or down state.
I have also checked my iptables and ip6tables status; both are turned off.
What might be the
Check that your node resources in slurm.conf represent your actual
configuration, e.g. that the amount of memory in your node is
configured as equal or less in slurm.conf.
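As a rough illustration of that check, here is a Python sketch that compares a configured node line against a line describing the real hardware and reports every value that is configured higher than what the node has. The parameter names are standard slurm.conf node parameters, and the "actual" line is assumed to be in the key=value form that slurmd -C prints. The function names and all numbers are hypothetical, and this is not Slurm's own validation logic.

```python
# Node parameters that should not be configured above the real hardware.
CHECKED_KEYS = ("CPUs", "Sockets", "CoresPerSocket", "ThreadsPerCore", "RealMemory")


def parse_node_line(line):
    """Turn 'NodeName=x CPUs=2 RealMemory=1024' into a dict of strings."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without '='
            fields[key] = value
    return fields


def find_overcommitted(configured, actual):
    """List the parameters whose configured value exceeds the actual one."""
    problems = []
    for key in CHECKED_KEYS:
        if key in configured and key in actual:
            if int(configured[key]) > int(actual[key]):
                problems.append(
                    f"{key}: configured {configured[key]} > actual {actual[key]}"
                )
    return problems


# Hypothetical values for illustration only, not taken from a real node.
configured = parse_node_line("NodeName=democlient1 CPUs=2 RealMemory=2048")
actual = parse_node_line("NodeName=democlient1 CPUs=1 RealMemory=1024")

for problem in find_overcommitted(configured, actual):
    print(problem)
```

Any line this prints points at a value to lower in slurm.conf before setting the node back to idle.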
On 10.03.2015 at 10:05, suprita.bot...@wipro.com wrote:
Hi
Please help me if anyone can.
I am running
What is the output of sinfo -R for this node?
On 10/03/2015 10:08, Uwe Sauter wrote:
Check that your node resources in slurm.conf represent your actual
configuration, e.g. that the amount of memory in your node is
configured as equal or less in slurm.conf.
On 10.03.2015 at 10:05
Hi guys,
One of the ideas that came on the last slurm user group was to create a
separate list for more advanced topics.
Any news on this?
cheers,
marcin