Hello Véronique,

slurmd uses statvfs() or statfs() (the choice is made at build time) to get the TmpFS size (with a default of /tmp if TmpFS is null).
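For illustration, a minimal sketch of that mechanism (not the actual Slurm code; the use of the total filesystem size and the MB rounding are assumptions, and a given build may call statfs() instead):

#include <stdio.h>
#include <sys/statvfs.h>

int main(void)
{
    /* TmpFS from slurm.conf, or "/tmp" if TmpFS is not set */
    const char *tmp_fs = "/local/scratch";
    struct statvfs st;

    if (statvfs(tmp_fs, &st) != 0) {
        perror("statvfs");
        return 1;
    }

    /* Total size, in MB, of whatever filesystem currently backs tmp_fs */
    unsigned long long total_mb =
        (unsigned long long)st.f_blocks * st.f_frsize / (1024 * 1024);
    printf("%s: TmpDisk=%lluMB\n", tmp_fs, total_mb);
    return 0;
}

The point to keep in mind is that statvfs()/statfs() report whatever filesystem currently backs the path: if the daemon starts before /local/scratch is mounted, it measures the parent filesystem instead. (A sketch of the systemd mount dependency that Uwe suggests in the quoted message below is appended after the thread.)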
I think Uwe is right, it could be a filesystem mount timing problem. You can test it easily:
- starting from a correct state, stop slurmd
- unmount /local/scratch
- start slurmd
- check slurmd.log

Regarding slurmd -C, I checked the source (17.2.6): the size of /tmp is returned (hardcoded); I am not sure whether it should be considered a bug.

Regards,
Pierre-Marie Le Biot

-----Original Message-----
From: Uwe Sauter [mailto:uwe.sauter...@gmail.com]
Sent: Wednesday, October 11, 2017 4:18 PM
To: slurm-dev <slurm-dev@schedmd.com>
Subject: [slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk

What distribution are you using? If it is using systemd then it is possible that slurmd gets started before /local/scratch is mounted. You'd need to add a dependency to the slurmd service so it waits until /local/scratch is mounted before the service is started.

On 11.10.2017 at 15:38, Véronique LEGRAND wrote:
> Hello Pierre-Marie,
>
> I stopped the slurmd daemon on tars-XXX then restarted it in the foreground with:
>
> sudo /my/path/to/slurmd -vvvvvvvv -D -d /opt/slurm/sbin/slurmstepd
>
> and got:
>
> slurmd: Gres Name=disk Type=(null) Count=204000
>
> and also:
>
> slurmd: debug3: CPUs=12 Boards=1 Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=204699 Uptime=162294 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
>
> in the output. So, the value this time was correct.
>
> In slurmctld.log, I have:
>
> 2017-10-11T12:24:01+02:00 tars-master slurmctld[120352]: Node tars-113 now responding
> 2017-10-11T12:24:01+02:00 tars-master slurmctld[120352]: node tars-113 returned to service
>
> I waited 2 hours and didn't get any "error: Node tars-XXX has low tmp_disk size (129186 < 204000)".
> So, I stopped it and started it again in the usual way:
>
> sudo /etc/init.d/slurm start (at 2:38 pm)
>
> I got no error message in slurmctld.log and no erroneous value in slurmd.log.
>
> At 2:49 pm, I rebooted the machine and here is what I got in slurmd.log:
>
> -sh-4.1$ sudo cat slurmd.log
> 2017-10-11T14:50:30.742049+02:00 tars-113 slurmd[18621]: Message aggregation enabled: WindowMsgs=24, WindowTime=200
> 2017-10-11T14:50:30.797696+02:00 tars-113 slurmd[18621]: CPU frequency setting not configured for this node
> 2017-10-11T14:50:30.797706+02:00 tars-113 slurmd[18621]: Resource spec: Reserved system memory limit not configured for this node
> 2017-10-11T14:50:30.986903+02:00 tars-113 slurmd[18621]: cgroup namespace 'freezer' is now mounted
> 2017-10-11T14:50:31.023900+02:00 tars-113 slurmd[18621]: cgroup namespace 'cpuset' is now mounted
> 2017-10-11T14:50:31.066430+02:00 tars-113 slurmd[18633]: slurmd version 16.05.9 started
> 2017-10-11T14:50:31.123213+02:00 tars-113 slurmd[18633]: slurmd started on Wed, 11 Oct 2017 14:50:31 +0200
> 2017-10-11T14:50:31.123493+02:00 tars-113 slurmd[18633]: CPUs=12 Boards=1 Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=129186 Uptime=74 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
>
> The erroneous value was back again!
>
> So, I did it again:
>
> sudo /etc/init.d/slurm stop
> sudo /etc/init.d/slurm start
>
> and the following lines were added to the log:
>
> 2017-10-11T14:51:29.707556+02:00 tars-113 slurmd[18633]: Slurmd shutdown completing
> 2017-10-11T14:51:51.496552+02:00 tars-113 slurmd[19047]: Message aggregation enabled: WindowMsgs=24, WindowTime=200
> 2017-10-11T14:51:51.555792+02:00 tars-113 slurmd[19047]: CPU frequency setting not configured for this node
> 2017-10-11T14:51:51.555803+02:00 tars-113 slurmd[19047]: Resource spec: Reserved system memory limit not configured for this node
> 2017-10-11T14:51:51.567003+02:00 tars-113 slurmd[19049]: slurmd version 16.05.9 started
> 2017-10-11T14:51:51.569174+02:00 tars-113 slurmd[19049]: slurmd started on Wed, 11 Oct 2017 14:51:51 +0200
> 2017-10-11T14:51:51.569533+02:00 tars-113 slurmd[19049]: CPUs=12 Boards=1 Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=204699 Uptime=155 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
>
> The value for TmpDisk was correct again.
>
> So, my question is: where does slurmd read the value for the size of /local/scratch? Does it use "df" or another command? It seems that on startup, slurmd reads a value that is not yet set correctly…
>
> Thank you in advance for any help.
>
> Regards,
>
> Véronique
>
> --
> Véronique Legrand
> IT engineer – scientific calculation & software development
> https://research.pasteur.fr/en/member/veronique-legrand/
> Cluster and computing group
> IT department
> Institut Pasteur Paris
> Tel : 95 03
>
> From: "Le Biot, Pierre-Marie" <pierre-marie.leb...@hpe.com>
> Reply-To: slurm-dev <slurm-dev@schedmd.com>
> Date: Tuesday, 10 October 2017 at 17:00
> To: slurm-dev <slurm-dev@schedmd.com>
> Subject: [slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk
>
> Véronique,
>
> So that's the culprit:
>
> 2017-10-09T17:09:57.957336+02:00 tars-XXX slurmd[18640]: CPUs=12 Boards=1 Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=129186 Uptime=74 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
>
> For a reason you have to determine, when slurmd starts on tars-XXX it finds that the size of /local/scratch (assuming that this is the value of TmpFS in slurm.conf for this node) is 129186MB and sends this value to slurmctld, which compares it with the value recorded in slurm.conf, that is 204000 for that node.
>
> By the way, 129186MB is very close to the size of /dev/shm…
>
> About the value returned by slurmd -C (500), it could be that /tmp is hardcoded somewhere.
>
> Regards,
> Pierre-Marie Le Biot
>
> From: Véronique LEGRAND [mailto:veronique.legr...@pasteur.fr]
> Sent: Tuesday, October 10, 2017 4:33 PM
> To: slurm-dev <slurm-dev@schedmd.com>
> Subject: [slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk
>
> Pierre-Marie,
>
> Here is what I have in slurmd.log on tars-XXX:
>
> -sh-4.1$ sudo cat slurmd.log
> 2017-10-09T17:09:57.538636+02:00 tars-XXX slurmd[18597]: Message aggregation enabled: WindowMsgs=24, WindowTime=200
> 2017-10-09T17:09:57.647486+02:00 tars-XXX slurmd[18597]: CPU frequency setting not configured for this node
> 2017-10-09T17:09:57.647499+02:00 tars-XXX slurmd[18597]: Resource spec: Reserved system memory limit not configured for this node
> 2017-10-09T17:09:57.808352+02:00 tars-XXX slurmd[18597]: cgroup namespace 'freezer' is now mounted
> 2017-10-09T17:09:57.844400+02:00 tars-XXX slurmd[18597]: cgroup namespace 'cpuset' is now mounted
> 2017-10-09T17:09:57.902418+02:00 tars-XXX slurmd[18640]: slurmd version 16.05.9 started
> 2017-10-09T17:09:57.957030+02:00 tars-XXX slurmd[18640]: slurmd started on Mon, 09 Oct 2017 17:09:57 +0200
> 2017-10-09T17:09:57.957336+02:00 tars-XXX slurmd[18640]: CPUs=12 Boards=1 Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=129186 Uptime=74 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
>
> --
> Véronique Legrand
> IT engineer – scientific calculation & software development
> https://research.pasteur.fr/en/member/veronique-legrand/
> Cluster and computing group
> IT department
> Institut Pasteur Paris
> Tel : 95 03
>
> From: "Le Biot, Pierre-Marie" <pierre-marie.leb...@hpe.com>
> Reply-To: slurm-dev <slurm-dev@schedmd.com>
> Date: Tuesday, 10 October 2017 at 15:20
> To: slurm-dev <slurm-dev@schedmd.com>
> Subject: [slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk
>
> Véronique,
>
> This is not what I expected; I was thinking slurmd -C would return TmpDisk=204000, or more probably 129186 as seen in the slurmctld log.
>
> I suppose that you already checked the slurmd logs on tars-XXX?
>
> Regards,
> Pierre-Marie Le Biot
>
> From: Véronique LEGRAND [mailto:veronique.legr...@pasteur.fr]
> Sent: Tuesday, October 10, 2017 2:09 PM
> To: slurm-dev <slurm-dev@schedmd.com>
> Subject: [slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk
>
> Hello Pierre-Marie,
>
> First, thank you for your hint. I just tried:
>
> slurmd -C
> NodeName=tars-XXX CPUs=12 Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=258373 TmpDisk=500
> UpTime=0-20:50:54
>
> The value for TmpDisk is erroneous. I do not know what can be the cause of this, since the operating system's df command gives the right values.
>
> -sh-4.1$ df -hl
> Filesystem  Size  Used Avail Use% Mounted on
> slash_root  3.5G  1.6G  1.9G  47% /
> tmpfs       127G     0  127G   0% /dev/shm
> tmpfs       500M   84K  500M   1% /tmp
> /dev/sda1   200G   33M  200G   1% /local/scratch
>
> Could slurmd be mixing up tmpfs and /local/scratch?
>
> I tried the same thing on another similar node (tars-XXX-1) and I got:
>
> -sh-4.1$ df -hl
> Filesystem  Size  Used Avail Use% Mounted on
> slash_root  3.5G  1.7G  1.8G  49% /
> tmpfs       127G     0  127G   0% /dev/shm
> tmpfs       500M  5.7M  495M   2% /tmp
> /dev/sda1   200G   33M  200G   1% /local/scratch
>
> and
>
> slurmd -C
> NodeName=tars-XXX-1 CPUs=12 Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=258373 TmpDisk=500
> UpTime=101-21:34:14
>
> So, slurmd -C gives exactly the same answer, but this node doesn't go into DRAIN state; it works perfectly.
>
> Thank you again for your help.
>
> Regards,
>
> Véronique
>
> --
> Véronique Legrand
> IT engineer – scientific calculation & software development
> https://research.pasteur.fr/en/member/veronique-legrand/
> Cluster and computing group
> IT department
> Institut Pasteur Paris
> Tel : 95 03
>
> From: "Le Biot, Pierre-Marie" <pierre-marie.leb...@hpe.com>
> Reply-To: slurm-dev <slurm-dev@schedmd.com>
> Date: Tuesday, 10 October 2017 at 13:53
> To: slurm-dev <slurm-dev@schedmd.com>
> Subject: [slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk
>
> Hi Véronique,
>
> Did you check the result of slurmd -C on tars-XXX?
>
> Regards,
> Pierre-Marie Le Biot
>
> From: Véronique LEGRAND [mailto:veronique.legr...@pasteur.fr]
> Sent: Tuesday, October 10, 2017 12:02 PM
> To: slurm-dev <slurm-dev@schedmd.com>
> Subject: [slurm-dev] Node always going to DRAIN state with reason=Low TmpDisk
>
> Hello,
>
> I have a problem with one node in our cluster. It is exactly like all the other nodes (200 GB of temporary storage).
>
> Here is what I have in slurm.conf:
>
> # COMPUTES
> TmpFS=/local/scratch
>
> # NODES
> GresTypes=disk,gpu
> ReturnToService=2
> NodeName=DEFAULT State=UNKNOWN Gres=disk:204000,gpu:0 TmpDisk=204000
> NodeName=tars-[XXX-YYY] Sockets=2 CoresPerSocket=6 RealMemory=254373 Feature=ram256,cpu,fast,normal,long,specific,admin Weight=20
>
> The node that has the trouble is tars-XXX.
>
> Here is what I have in gres.conf:
>
> # Local disk space in MB (/local/scratch)
> NodeName=tars-[ZZZ-UUU] Name=disk Count=204000
>
> XXX is in the range [ZZZ,UUU].
>
> If I ssh to tars-XXX, here is what I get:
>
> -sh-4.1$ df -hl
> Filesystem  Size  Used Avail Use% Mounted on
> slash_root  3.5G  1.6G  1.9G  47% /
> tmpfs       127G     0  127G   0% /dev/shm
> tmpfs       500M   84K  500M   1% /tmp
> /dev/sda1   200G   33M  200G   1% /local/scratch
>
> /local/scratch is the directory for temporary storage.
>
> The problem is that when I do:
>
> scontrol show node tars-XXX
>
> I get:
>
> NodeName=tars-XXX Arch=x86_64 CoresPerSocket=6
>    CPUAlloc=0 CPUErr=0 CPUTot=12 CPULoad=0.00
>    AvailableFeatures=ram256,cpu,fast,normal,long,specific,admin
>    ActiveFeatures=ram256,cpu,fast,normal,long,specific,admin
>    Gres=disk:204000,gpu:0
>    NodeAddr=tars-113 NodeHostName=tars-113 Version=16.05
>    OS=Linux RealMemory=254373 AllocMem=0 FreeMem=255087 Sockets=2 Boards=1
>    State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=204000 Weight=20 Owner=N/A MCS_label=N/A
>    BootTime=2017-10-09T17:08:43 SlurmdStartTime=2017-10-09T17:09:57
>    CapWatts=n/a
>    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>    Reason=Low TmpDisk [slurm@2017-10-10T11:25:04]
>
> And in the slurmctld logs, I get the error messages:
>
> 2017-10-10T08:35:57+02:00 tars-master slurmctld[120352]: error: Node tars-XXX has low tmp_disk size (129186 < 204000)
> 2017-10-10T08:35:57+02:00 tars-master slurmctld[120352]: error: _slurm_rpc_node_registration node=tars-XXX: Invalid argument
>
> I tried to reboot tars-XXX yesterday but the problem is still here. I also tried:
>
> scontrol update NodeName=ClusterNode0 State=Resume
>
> but the state went back to DRAIN after a while…
>
> Does anyone have an idea of what could cause the problem? My configuration files seem correct and there really are 200G free in /local/scratch on tars-XXX…
>
> I thank you in advance for any help.
>
> Regards,
>
> Véronique
>
> --
> Véronique Legrand
> IT engineer – scientific calculation & software development
> https://research.pasteur.fr/en/member/veronique-legrand/
> Cluster and computing group
> IT department
> Institut Pasteur Paris
> Tel : 95 03
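A minimal sketch of the mount dependency Uwe suggests above, assuming the node runs systemd and slurmd is managed by a unit named slurmd.service (the unit name and drop-in file name are assumptions; the nodes in this thread appear to use /etc/init.d, which would need an equivalent start-ordering fix instead):

# /etc/systemd/system/slurmd.service.d/wait-for-scratch.conf
[Unit]
# Do not start slurmd until /local/scratch is mounted, so the size
# probe sees the real scratch filesystem instead of the root fs.
RequiresMountsFor=/local/scratch

After creating the drop-in, run systemctl daemon-reload and restart slurmd so the dependency takes effect.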