What distribution are you using? If it uses systemd, it is possible that slurmd 
gets started before /local/scratch is mounted. You would need to add a 
dependency to the slurmd service so that it waits until /local/scratch is 
mounted before the service starts.
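
If the node is indeed running systemd, a drop-in along these lines should do it
(a minimal sketch; the file name is just an example, and the unit may be called
slurm rather than slurmd depending on how it was packaged):

  # /etc/systemd/system/slurmd.service.d/wait-for-scratch.conf   (example path)
  [Unit]
  # Do not start slurmd until /local/scratch is mounted; this pulls in the
  # corresponding mount unit and orders slurmd after it.
  RequiresMountsFor=/local/scratch

followed by "systemctl daemon-reload" and a restart of slurmd. If the node is
still on SysV init (as the /etc/init.d/slurm commands below suggest), the
equivalent is making sure the init script runs only after local filesystems are
mounted.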


On 11.10.2017 at 15:38, Véronique LEGRAND wrote:
> Hello Pierre-Marie,
> 
>  
> 
> I stopped the slurmd daemon on tars-XXX then restarted it in the foreground 
> with:
> 
>  
> 
> sudo /my/path/to/slurmd -vvvvvvvv -D -d /opt/slurm/sbin/slurmstepd
> 
>  
> 
> and got:
> 
> slurmd: Gres Name=disk Type=(null) Count=204000
> 
> and also:
> 
> slurmd: debug3: CPUs=12 Boards=1 Sockets=2 Cores=6 Threads=1 Memory=258373 
> TmpDisk=204699 Uptime=162294 CPUSpecList=(null)
> FeaturesAvail=(null) FeaturesActive=(null)
> 
> in the output.
> 
> So, the value this time was correct.
> 
>  
> 
> In slurmctld.log, I have:
> 
> 2017-10-11T12:24:01+02:00 tars-master slurmctld[120352]: Node *tars-113* now 
> responding
> 
> 2017-10-11T12:24:01+02:00 tars-master slurmctld[120352]: node *tars-113* 
> returned to service
> 
>  
> 
> I waited 2 hours and did not get any “error: Node tars-XXX has low tmp_disk 
> size (129186 < 204000)” message.
> 
> So, I stopped it and started it again in the usual way:
> 
>  
> 
> sudo /etc/init.d/slurm start    (at 2:38 pm)
> 
>  
> 
> I got no error message in slurmctld.log and no erroneous value in slurmd.log.
> 
>  
> 
> At 2:49 pm, I rebooted the machine and here is what I got in slurmd.log:
> 
>  
> 
> -sh-4.1$ sudo cat slurmd.log
> 
> 2017-10-11T14:50:30.742049+02:00 tars-113 slurmd[18621]: Message aggregation 
> enabled: WindowMsgs=24, WindowTime=200
> 
> 2017-10-11T14:50:30.797696+02:00 tars-113 slurmd[18621]: CPU frequency 
> setting not configured for this node
> 
> 2017-10-11T14:50:30.797706+02:00 tars-113 slurmd[18621]: Resource spec: 
> Reserved system memory limit not configured for this node
> 
> 2017-10-11T14:50:30.986903+02:00 tars-113 slurmd[18621]: cgroup namespace 
> 'freezer' is now mounted
> 
> 2017-10-11T14:50:31.023900+02:00 tars-113 slurmd[18621]: cgroup namespace 
> 'cpuset' is now mounted
> 
> 2017-10-11T14:50:31.066430+02:00 tars-113 slurmd[18633]: slurmd version 
> 16.05.9 started
> 
> 2017-10-11T14:50:31.123213+02:00 tars-113 slurmd[18633]: slurmd started on 
> Wed, 11 Oct 2017 14:50:31 +0200
> 
> 2017-10-11T14:50:31.123493+02:00 tars-113 slurmd[18633]: CPUs=12 Boards=1 
> Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=129186
> Uptime=74 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
> 
>  
> 
> The erroneous value was back again!
> 
>  
> 
> So, I did it again:
> 
> sudo /etc/init.d/slurm stop
> 
> sudo /etc/init.d/slurm start
> 
>  
> 
> and the following lines were added to the log:
> 
> 2017-10-11T14:51:29.707556+02:00 tars-113 slurmd[18633]: Slurmd shutdown 
> completing
> 
> 2017-10-11T14:51:51.496552+02:00 tars-113 slurmd[19047]: Message aggregation 
> enabled: WindowMsgs=24, WindowTime=200
> 
> 2017-10-11T14:51:51.555792+02:00 tars-113 slurmd[19047]: CPU frequency 
> setting not configured for this node
> 
> 2017-10-11T14:51:51.555803+02:00 tars-113 slurmd[19047]: Resource spec: 
> Reserved system memory limit not configured for this node
> 
> 2017-10-11T14:51:51.567003+02:00 tars-113 slurmd[19049]: slurmd version 
> 16.05.9 started
> 
> 2017-10-11T14:51:51.569174+02:00 tars-113 slurmd[19049]: slurmd started on 
> Wed, 11 Oct 2017 14:51:51 +0200
> 
> 2017-10-11T14:51:51.569533+02:00 tars-113 slurmd[19049]: CPUs=12 Boards=1 
> Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=204699
> Uptime=155 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
> 
>  
> 
> The value for TmpDisk was correct again.
> 
>  
> 
> So, my question is: where does slurmd read the value for the size of 
> /local/scratch? Does it use “df” or another command? It seems
> that on startup, slurmd reads a value that is not yet set correctly…
> 
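
To answer the question above: as far as I can tell, slurmd does not shell out
to df; when it gathers the node information for registration it queries the
filesystem behind the TmpFS path directly (a statvfs()-type call) and converts
the result to MB. Roughly, and only as an illustration of the idea rather than
Slurm's actual source, the computation looks like this:

  /* sketch: deriving a TmpDisk value in MB from the TmpFS path */
  #include <stdio.h>
  #include <sys/statvfs.h>

  int main(void)
  {
      const char *tmpfs = "/local/scratch";   /* TmpFS from slurm.conf */
      struct statvfs st;

      if (statvfs(tmpfs, &st) != 0) {
          perror("statvfs");
          return 1;
      }

      /* total size of the filesystem currently backing the path, in MB */
      unsigned long long total_mb =
          (unsigned long long)st.f_blocks * st.f_frsize / (1024 * 1024);
      printf("TmpDisk=%llu\n", total_mb);
      return 0;
  }

The key point is that the value reflects whatever filesystem backs that
directory at the moment slurmd starts, so if /local/scratch is not mounted yet
the reported size stays wrong until slurmd is restarted.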
>  
> 
> Thank you in advance for any help.
> 
>  
> 
> Regards,
> 
>  
> 
> Véronique
> 
>  
> 
>  
> 
>  
> 
> --
> 
> Véronique Legrand
> 
> IT engineer – scientific calculation & software development
> 
> https://research.pasteur.fr/en/member/veronique-legrand/
> 
> Cluster and computing group
> 
> IT department
> 
> Institut Pasteur Paris
> 
> Tel : 95 03
> 
>  
> 
>  
> 
> *From: *"Le Biot, Pierre-Marie" <pierre-marie.leb...@hpe.com>
> *Reply-To: *slurm-dev <slurm-dev@schedmd.com>
> *Date: *Tuesday, 10 October 2017 at 17:00
> *To: *slurm-dev <slurm-dev@schedmd.com>
> *Subject: *[slurm-dev] RE: Node always going to DRAIN state with reason=Low 
> TmpDisk
> 
>  
> 
> Véronique,
> 
>  
> 
> So that’s the culprit:
> 
> 2017-10-09T17:09:57.957336+02:00 tars-XXX slurmd[18640]: CPUs=12 Boards=1 
> Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=129186
> Uptime=74 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
> 
>  
> 
> For a reason you have to determine, when slurmd starts on tars-XXX it finds 
> that the size of /local/scratch (assuming that this is
> the value of TmpFS in slurm.conf for this node) is 129186 MB and sends this 
> value to slurmctld, which compares it with the value
> recorded in slurm.conf, which is 204000 for that node.
> 
> By the way, 129186 MB is very close to the size of /dev/shm…
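
Conceptually (just an illustration, not slurmctld's actual code), the check
behind the "low tmp_disk size" error amounts to:

  #include <stdio.h>

  /* hypothetical sketch of the validation slurmctld applies at node
   * registration: reject the node if the reported TmpDisk is below the
   * value configured in slurm.conf */
  static int tmp_disk_ok(unsigned int reported_mb, unsigned int configured_mb)
  {
      if (reported_mb < configured_mb) {
          fprintf(stderr, "error: low tmp_disk size (%u < %u)\n",
                  reported_mb, configured_mb);
          return 0;   /* node ends up DRAIN with Reason="Low TmpDisk" */
      }
      return 1;
  }

  int main(void)
  {
      /* values from this thread: slurmd reported 129186, slurm.conf says 204000 */
      return tmp_disk_ok(129186, 204000) ? 0 : 1;
  }

So the fix has to be on the reporting side (what slurmd sees at start-up), not
in slurm.conf, whose values are correct.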
> 
>  
> 
> About the value returned by slurmd -C (500), it could be that /tmp is 
> hardcoded somewhere.
> 
>  
> 
> Regards,
> 
> Pierre-Marie Le Biot
> 
>  
> 
> *From:*Véronique LEGRAND [mailto:veronique.legr...@pasteur.fr]
> *Sent:* Tuesday, October 10, 2017 4:33 PM
> *To:* slurm-dev <slurm-dev@schedmd.com>
> *Subject:* [slurm-dev] RE: Node always going to DRAIN state with reason=Low 
> TmpDisk
> 
>  
> 
> Pierre-Marie,
> 
>  
> 
> Here is what I have in slurmd.log on tars-XXX
> 
>  
> 
> -sh-4.1$ sudo cat slurmd.log
> 
> 2017-10-09T17:09:57.538636+02:00 tars-XXX slurmd[18597]: Message aggregation 
> enabled: WindowMsgs=24, WindowTime=200
> 
> 2017-10-09T17:09:57.647486+02:00 tars-XXX slurmd[18597]: CPU frequency 
> setting not configured for this node
> 
> 2017-10-09T17:09:57.647499+02:00 tars-XXX slurmd[18597]: Resource spec: 
> Reserved system memory limit not configured for this node
> 
> 2017-10-09T17:09:57.808352+02:00 tars-XXX slurmd[18597]: cgroup namespace 
> 'freezer' is now mounted
> 
> 2017-10-09T17:09:57.844400+02:00 tars-XXX slurmd[18597]: cgroup namespace 
> 'cpuset' is now mounted
> 
> 2017-10-09T17:09:57.902418+02:00 tars-XXX slurmd[18640]: slurmd version 
> 16.05.9 started
> 
> 2017-10-09T17:09:57.957030+02:00 tars-XXX slurmd[18640]: slurmd started on 
> Mon, 09 Oct 2017 17:09:57 +0200
> 
> 2017-10-09T17:09:57.957336+02:00 tars-XXX slurmd[18640]: CPUs=12 Boards=1 
> Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=129186
> Uptime=74 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
> 
>  
> 
>  
> 
> --
> 
> Véronique Legrand
> 
> IT engineer – scientific calculation & software development
> 
> https://research.pasteur.fr/en/member/veronique-legrand/
> 
> Cluster and computing group
> 
> IT department
> 
> Institut Pasteur Paris
> 
> Tel : 95 03
> 
>  
> 
>  
> 
> *From: *"Le Biot, Pierre-Marie" <pierre-marie.leb...@hpe.com 
> <mailto:pierre-marie.leb...@hpe.com>>
> *Reply-To: *slurm-dev <slurm-dev@schedmd.com <mailto:slurm-dev@schedmd.com>>
> *Date: *Tuesday, 10 October 2017 at 15:20
> *To: *slurm-dev <slurm-dev@schedmd.com <mailto:slurm-dev@schedmd.com>>
> *Subject: *[slurm-dev] RE: Node always going to DRAIN state with reason=Low 
> TmpDisk
> 
>  
> 
> Véronique,
> 
>  
> 
> This is not what I expected; I was thinking slurmd -C would return 
> TmpDisk=204000, or more probably 129186 as seen in the slurmctld log.
> 
>  
> 
> I suppose that you already checked the slurmd logs on tars-XXX?
> 
>  
> 
> Regards,
> 
> Pierre-Marie Le Biot
> 
>  
> 
> *From:*Véronique LEGRAND [mailto:veronique.legr...@pasteur.fr]
> *Sent:* Tuesday, October 10, 2017 2:09 PM
> *To:* slurm-dev <slurm-dev@schedmd.com <mailto:slurm-dev@schedmd.com>>
> *Subject:* [slurm-dev] RE: Node always going to DRAIN state with reason=Low 
> TmpDisk
> 
>  
> 
> Hello Pierre-Marie,
> 
>  
> 
> First, thank you for your hint.
> 
> I just tried.
> 
>  
> 
>>slurmd -C
> 
> NodeName=tars-XXX CPUs=12 Boards=1 SocketsPerBoard=2 CoresPerSocket=6 
> ThreadsPerCore=1 RealMemory=258373 TmpDisk=500
> 
> UpTime=0-20:50:54
> 
>  
> 
> The value for TmpDisk is erroneous. I do not know what the cause of this can 
> be, since the operating system’s df command gives the
> right values.
> 
>  
> 
> -sh-4.1$ df -hl
> 
> Filesystem      Size  Used Avail Use% Mounted on
> 
> slash_root      3.5G  1.6G  1.9G  47% /
> 
> tmpfs           127G     0  127G   0% /dev/shm
> 
> tmpfs           500M   84K  500M   1% /tmp
> 
> /dev/sda1       200G   33M  200G   1% /local/scratch
> 
>  
> 
>  
> 
> Could slurmd be confusing tmpfs with /local/scratch?
> 
>  
> 
> I tried the same thing on another similar node (tars-XXX-1)
> 
>  
> 
> I got:
> 
>  
> 
> -sh-4.1$ df -hl
> 
> Filesystem      Size  Used Avail Use% Mounted on
> 
> slash_root      3.5G  1.7G  1.8G  49% /
> 
> tmpfs           127G     0  127G   0% /dev/shm
> 
> tmpfs           500M  5.7M  495M   2% /tmp
> 
> /dev/sda1       200G   33M  200G   1% /local/scratch
> 
>  
> 
> and
> 
>  
> 
> slurmd -C
> 
> NodeName=tars-XXX-1 CPUs=12 Boards=1 SocketsPerBoard=2 CoresPerSocket=6 
> ThreadsPerCore=1 RealMemory=258373 TmpDisk=500
> 
> UpTime=101-21:34:14
> 
>  
> 
>  
> 
> So, slurmd -C gives exactly the same answer, but this node doesn’t go into 
> DRAIN state; it works perfectly.
> 
>  
> 
> Thank you again for your help.
> 
>  
> 
> Regards,
> 
>  
> 
> Véronique
> 
>  
> 
>  
> 
>  
> 
> --
> 
> Véronique Legrand
> 
> IT engineer – scientific calculation & software development
> 
> https://research.pasteur.fr/en/member/veronique-legrand/
> 
> Cluster and computing group
> 
> IT department
> 
> Institut Pasteur Paris
> 
> Tel : 95 03
> 
>  
> 
>  
> 
> *From: *"Le Biot, Pierre-Marie" <pierre-marie.leb...@hpe.com 
> <mailto:pierre-marie.leb...@hpe.com>>
> *Reply-To: *slurm-dev <slurm-dev@schedmd.com <mailto:slurm-dev@schedmd.com>>
> *Date: *Tuesday, 10 October 2017 at 13:53
> *To: *slurm-dev <slurm-dev@schedmd.com <mailto:slurm-dev@schedmd.com>>
> *Subject: *[slurm-dev] RE: Node always going to DRAIN state with reason=Low 
> TmpDisk
> 
>  
> 
> Hi Véronique,
> 
>  
> 
> Did you check the result of slurmd -C on tars-XXX ?
> 
>  
> 
> Regards,
> 
> Pierre-Marie Le Biot
> 
>  
> 
> *From:*Véronique LEGRAND [mailto:veronique.legr...@pasteur.fr]
> *Sent:* Tuesday, October 10, 2017 12:02 PM
> *To:* slurm-dev <slurm-dev@schedmd.com <mailto:slurm-dev@schedmd.com>>
> *Subject:* [slurm-dev] Node always going to DRAIN state with reason=Low 
> TmpDisk
> 
>  
> 
> Hello,
> 
>  
> 
> I have a problem with 1 node in our cluster. It is exactly like all the other 
> nodes (200 GB of temporary storage).
> 
>  
> 
> Here is what I have in slurm.conf:
> 
>  
> 
> # COMPUTES
> 
> TmpFS=/local/scratch
> 
>  
> 
> # NODES
> 
> GresTypes=disk,gpu
> 
> ReturnToService=2
> 
> NodeName=DEFAULT State=UNKNOWN Gres=disk:204000,gpu:0 TmpDisk=204000
> 
> NodeName=tars-[XXX-YYY] Sockets=2 CoresPerSocket=6 RealMemory=254373 
> Feature=ram256,cpu,fast,normal,long,specific,admin Weight=20
> 
>  
> 
> The node that has the trouble is tars-XXX.
> 
>  
> 
> Here is what I have in gres.conf:
> 
>  
> 
> # Local disk space in MB (/local/scratch)
> 
> NodeName=tars-[ZZZ-UUU] Name=disk Count=204000
> 
>  
> 
> XXX is in range: [ZZZ,UUU].
> 
>  
> 
> If I ssh to tars-XXX, here is what I get:
> 
>  
> 
> -sh-4.1$ df -hl
> 
> Filesystem      Size  Used Avail Use% Mounted on
> 
> slash_root      3.5G  1.6G  1.9G  47% /
> 
> tmpfs           127G     0  127G   0% /dev/shm
> 
> tmpfs           500M   84K  500M   1% /tmp
> 
> /dev/sda1       200G   33M  200G   1% /local/scratch
> 
>  
> 
> /local/scratch is the directory for temporary storage.
> 
>  
> 
> The problem is that when I do
> 
> scontrol show node tars-XXX,
> 
>  
> 
> I get:
> 
>  
> 
> NodeName=tars-XXX Arch=x86_64 CoresPerSocket=6
> 
>    CPUAlloc=0 CPUErr=0 CPUTot=12 CPULoad=0.00
> 
>    AvailableFeatures=ram256,cpu,fast,normal,long,specific,admin
> 
>    ActiveFeatures=ram256,cpu,fast,normal,long,specific,admin
> 
>    Gres=disk:204000,gpu:0
> 
>    NodeAddr=tars-113 NodeHostName=tars-113 Version=16.05
> 
>    OS=Linux RealMemory=254373 AllocMem=0 FreeMem=255087 Sockets=2 Boards=1
> 
>    State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=204000 Weight=20 Owner=N/A 
> MCS_label=N/A
> 
>    BootTime=2017-10-09T17:08:43 SlurmdStartTime=2017-10-09T17:09:57
> 
>    CapWatts=n/a
> 
>    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
> 
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
> 
>    Reason=Low TmpDisk [slurm@2017-10-10T11:25:04]
> 
>  
> 
>  
> 
> And in the slurmctld logs, I get the error message:
> 
> 2017-10-10T08:35:57+02:00 tars-master slurmctld[120352]: error: Node tars-XXX 
> has low tmp_disk size (129186 < 204000)
> 
> 2017-10-10T08:35:57+02:00 tars-master slurmctld[120352]: error: 
> _slurm_rpc_node_registration node=tars-XXX: Invalid argument
> 
>  
> 
> I tried rebooting tars-XXX yesterday, but the problem is still there.
> 
> I also tried:
> 
> scontrol update  NodeName=ClusterNode0 State=Resume
> 
> but the state went back to DRAIN after a while…
> 
>  
> 
> Does anyone have an idea of what could cause the problem? My configuration 
> files seem correct and there really are 200G free in
> /local/scratch on tars-XXX…
> 
>  
> 
> I thank you in advance for any help.
> 
>  
> 
> Regards,
> 
>  
> 
>  
> 
> Véronique
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
> --
> 
> Véronique Legrand
> 
> IT engineer – scientific calculation & software development
> 
> https://research.pasteur.fr/en/member/veronique-legrand/
> 
> Cluster and computing group
> 
> IT department
> 
> Institut Pasteur Paris
> 
> Tel : 95 03
> 
>  
> 
