Thank you very much to you and to Uwe.

We will do the tests very soon (luckily this happens on only one machine).

As for slurmd -C, that explains why it reports 500 (the size of /tmp) for the TmpFS size.

Regards,

Véronique

--
Véronique Legrand
IT engineer – scientific calculation & software development
https://research.pasteur.fr/en/member/veronique-legrand/
Cluster and computing group
IT department
Institut Pasteur Paris
Tel : 95 03
 

On 12/10/2017, 11:21, "Le Biot, Pierre-Marie" <pierre-marie.leb...@hpe.com> wrote:

    Hello Véronique,
    
    slurmd uses statvfs or statfs (choice made at build time) to get the TmpFS
    size (with a default of /tmp if TmpFS is null).
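
    For reference, a rough shell equivalent of that statvfs computation (a
    sketch only; the mount point and the MB conversion are assumptions, not
    slurmd's exact code):

        stat -f -c '%S %b' /local/scratch | awk '{ printf "%d MB\n", $1 * $2 / 1048576 }'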
    
    I think Uwe is right, it could be a filesystem mount timing problem.
    
    You can test it easily (a possible command sequence is sketched below):
        - starting from a correct state, stop slurmd
        - unmount /local/scratch
        - start slurmd
        - check slurmd.log
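
    A possible command sequence for that test (the init script path matches the
    one used elsewhere in this thread; the slurmd.log location is an assumption,
    see SlurmdLogFile in slurm.conf):

        sudo /etc/init.d/slurm stop            # or: systemctl stop slurmd
        sudo umount /local/scratch
        sudo /etc/init.d/slurm start
        grep -i tmpdisk /var/log/slurmd.log    # log path is an assumption; look for the TmpDisk= line slurmd prints at startup
        # afterwards, stop slurmd, remount /local/scratch and start slurmd again
        # to return to the correct state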
    
    Regarding slurmd -C, I checked the source (17.02.6): the size of /tmp is
    returned (hardcoded), not sure if it should be considered a bug.
    
    Regards,
    Pierre-Marie Le Biot
    
    -----Original Message-----
    From: Uwe Sauter [mailto:uwe.sauter...@gmail.com] 
    Sent: Wednesday, October 11, 2017 4:18 PM
    To: slurm-dev <slurm-dev@schedmd.com>
    Subject: [slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk
    
    
    What distribution are you using? If it is using systemd then it is possible
    that slurmd gets started before /local/scratch is mounted. You'd need to add
    a dependency to the slurmd service so it waits until /local/scratch is
    mounted before the service is started.
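
    If the node does use systemd, one way to express that dependency is a
    drop-in for the slurmd unit (a sketch only; the unit name slurmd.service and
    the drop-in file name are assumptions, and this may not apply here since the
    thread also shows /etc/init.d/slurm being used):

        sudo mkdir -p /etc/systemd/system/slurmd.service.d
        printf '[Unit]\nRequiresMountsFor=/local/scratch\n' | \
            sudo tee /etc/systemd/system/slurmd.service.d/wait-for-scratch.conf
        sudo systemctl daemon-reload
        sudo systemctl restart slurmd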
    
    
    On 11.10.2017 at 15:38, Véronique LEGRAND wrote:
    > Hello Pierre-Marie,
    > 
    >  
    > 
    > I stopped the slurmd daemon on tars-XXX, then restarted it in the
    > foreground with:
    > 
    >  
    > 
    > sudo /my/path/to/slurmd -vvvvvvvv -D -d /opt/slurm/sbin/slurmstepd
    > 
    >  
    > 
    > and got:
    > 
    > slurmd: Gres Name=disk Type=(null) Count=204000
    > 
    > and also:
    > 
    > slurmd: debug3: CPUs=12 Boards=1 Sockets=2 Cores=6 Threads=1 
    > Memory=258373 TmpDisk=204699 Uptime=162294 CPUSpecList=(null)
    > FeaturesAvail=(null) FeaturesActive=(null)
    > 
    > in the output.
    > 
    > So, the value this time was correct.
    > 
    >  
    > 
    > In slurmctld.log, I have:
    > 
    > 2017-10-11T12:24:01+02:00 tars-master slurmctld[120352]: Node 
    > *tars-113* now responding
    > 
    > 2017-10-11T12:24:01+02:00 tars-master slurmctld[120352]: node 
    > *tars-113* returned to service
    > 
    >  
    > 
    > I waited 2 hours and didn’t get any : error: Node tars-XXX has low 
    > tmp_disk size (129186 < 204000)
    > 
    > So, I stopped it and started it again in the usual way:
    > 
    >  
    > 
    > sudo /etc/init.d/slurm start at 2:38 pm
    > 
    >  
    > 
    > I got no error message in slurmctld.log and no erroneous value in
    > slurmd.log.
    > 
    >  
    > 
    > At 2:49 pm, I rebooted the machine and here is what I got in slurmd.log:
    > 
    >  
    > 
    > -sh-4.1$ sudo cat slurmd.log
    > 
    > 2017-10-11T14:50:30.742049+02:00 tars-113 slurmd[18621]: Message 
    > aggregation enabled: WindowMsgs=24, WindowTime=200
    > 
    > 2017-10-11T14:50:30.797696+02:00 tars-113 slurmd[18621]: CPU frequency 
    > setting not configured for this node
    > 
    > 2017-10-11T14:50:30.797706+02:00 tars-113 slurmd[18621]: Resource 
    > spec: Reserved system memory limit not configured for this node
    > 
    > 2017-10-11T14:50:30.986903+02:00 tars-113 slurmd[18621]: cgroup 
    > namespace 'freezer' is now mounted
    > 
    > 2017-10-11T14:50:31.023900+02:00 tars-113 slurmd[18621]: cgroup 
    > namespace 'cpuset' is now mounted
    > 
    > 2017-10-11T14:50:31.066430+02:00 tars-113 slurmd[18633]: slurmd 
    > version 16.05.9 started
    > 
    > 2017-10-11T14:50:31.123213+02:00 tars-113 slurmd[18633]: slurmd 
    > started on Wed, 11 Oct 2017 14:50:31 +0200
    > 
    > 2017-10-11T14:50:31.123493+02:00 tars-113 slurmd[18633]: CPUs=12 
    > Boards=1 Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=129186
    > Uptime=74 CPUSpecList=(null) FeaturesAvail=(null) 
    > FeaturesActive=(null)
    > 
    >  
    > 
    > Erroneous value was back again!
    > 
    >  
    > 
    > So, I did it again:
    > 
    > sudo /etc/init.d/slurm stop
    > 
    > sudo /etc/init.d/slurm start
    > 
    >  
    > 
    > and the following lines were added to the log:
    > 
    > 2017-10-11T14:51:29.707556+02:00 tars-113 slurmd[18633]: Slurmd 
    > shutdown completing
    > 
    > 2017-10-11T14:51:51.496552+02:00 tars-113 slurmd[19047]: Message 
    > aggregation enabled: WindowMsgs=24, WindowTime=200
    > 
    > 2017-10-11T14:51:51.555792+02:00 tars-113 slurmd[19047]: CPU frequency 
    > setting not configured for this node
    > 
    > 2017-10-11T14:51:51.555803+02:00 tars-113 slurmd[19047]: Resource 
    > spec: Reserved system memory limit not configured for this node
    > 
    > 2017-10-11T14:51:51.567003+02:00 tars-113 slurmd[19049]: slurmd 
    > version 16.05.9 started
    > 
    > 2017-10-11T14:51:51.569174+02:00 tars-113 slurmd[19049]: slurmd 
    > started on Wed, 11 Oct 2017 14:51:51 +0200
    > 
    > 2017-10-11T14:51:51.569533+02:00 tars-113 slurmd[19049]: CPUs=12 
    > Boards=1 Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=204699
    > Uptime=155 CPUSpecList=(null) FeaturesAvail=(null) 
    > FeaturesActive=(null)
    > 
    >  
    > 
    > The value for TmpDisk was correct again.
    > 
    >  
    > 
    > So, my question is: where does slurmd read the value for the size of
    > /local/scratch? Does it use “df” or another command? It seems that on
    > startup, slurmd reads a value that is not yet set correctly…
    > 
    >  
    > 
    > Thank you in advance for any help.
    > 
    >  
    > 
    > Regards,
    > 
    >  
    > 
    > Véronique
    > 
    >  
    > 
    >  
    > 
    >  
    > 
    > --
    > 
    > Véronique Legrand
    > 
    > IT engineer – scientific calculation & software development
    > 
    > https://research.pasteur.fr/en/member/veronique-legrand/
    > 
    > Cluster and computing group
    > 
    > IT department
    > 
    > Institut Pasteur Paris
    > 
    > Tel : 95 03
    > 
    >  
    > 
    >  
    > 
    > *From: *"Le Biot, Pierre-Marie" <pierre-marie.leb...@hpe.com>
    > *Reply-To: *slurm-dev <slurm-dev@schedmd.com>
    > *Date: *Tuesday, 10 October 2017 at 17:00
    > *To: *slurm-dev <slurm-dev@schedmd.com>
    > *Subject: *[slurm-dev] RE: Node always going to DRAIN state with 
    > reason=Low TmpDisk
    > 
    >  
    > 
    > Véronique,
    > 
    >  
    > 
    > So that’s the culprit:
    > 
    > 2017-10-09T17:09:57.957336+02:00 tars-XXX slurmd[18640]: CPUs=12 
    > Boards=1 Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=129186
    > Uptime=74 CPUSpecList=(null) FeaturesAvail=(null) 
    > FeaturesActive=(null)
    > 
    >  
    > 
    > For a reason you have to determine, when slurmd starts on tars-XXX it
    > finds that the size of /local/scratch (assuming that this is the value
    > of TmpFS in slurm.conf for this node) is 129186MB and sends this value
    > to slurmctld, which compares it with the value recorded in slurm.conf,
    > that is 204000 for that node.
    > 
    > By the way, 129186MB is very close to the size of /dev/shm…
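    > 
    > As a quick worked check (tmpfs defaults to half of RAM): 258373 MB / 2 ≈ 129186 MB,
    > which matches the reported TmpDisk exactly; running "df -BM /dev/shm" on the node
    > should show roughly the same figure.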
    > 
    >  
    > 
    > About the value returned by slurmd -C (500), it could be that /tmp is
    > hardcoded somewhere.
    > 
    >  
    > 
    > Regards,
    > 
    > Pierre-Marie Le Biot
    > 
    >  
    > 
    > *From:*Véronique LEGRAND [mailto:veronique.legr...@pasteur.fr]
    > *Sent:* Tuesday, October 10, 2017 4:33 PM
    > *To:* slurm-dev <slurm-dev@schedmd.com>
    > *Subject:* [slurm-dev] RE: Node always going to DRAIN state with 
    > reason=Low TmpDisk
    > 
    >  
    > 
    > Pierre-Marie,
    > 
    >  
    > 
    > Here is what I have in slurmd.log on tars-XXX
    > 
    >  
    > 
    > -sh-4.1$ sudo cat slurmd.log
    > 
    > 2017-10-09T17:09:57.538636+02:00 tars-XXX slurmd[18597]: Message 
    > aggregation enabled: WindowMsgs=24, WindowTime=200
    > 
    > 2017-10-09T17:09:57.647486+02:00 tars-XXX slurmd[18597]: CPU frequency 
    > setting not configured for this node
    > 
    > 2017-10-09T17:09:57.647499+02:00 tars-XXX slurmd[18597]: Resource 
    > spec: Reserved system memory limit not configured for this node
    > 
    > 2017-10-09T17:09:57.808352+02:00 tars-XXX slurmd[18597]: cgroup 
    > namespace 'freezer' is now mounted
    > 
    > 2017-10-09T17:09:57.844400+02:00 tars-XXX slurmd[18597]: cgroup 
    > namespace 'cpuset' is now mounted
    > 
    > 2017-10-09T17:09:57.902418+02:00 tars-XXX slurmd[18640]: slurmd 
    > version 16.05.9 started
    > 
    > 2017-10-09T17:09:57.957030+02:00 tars-XXX slurmd[18640]: slurmd 
    > started on Mon, 09 Oct 2017 17:09:57 +0200
    > 
    > 2017-10-09T17:09:57.957336+02:00 tars-XXX slurmd[18640]: CPUs=12 
    > Boards=1 Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=129186
    > Uptime=74 CPUSpecList=(null) FeaturesAvail=(null) 
    > FeaturesActive=(null)
    > 
    >  
    > 
    >  
    > 
    > --
    > 
    > Véronique Legrand
    > 
    > IT engineer – scientific calculation & software development
    > 
    > https://research.pasteur.fr/en/member/veronique-legrand/
    > 
    > Cluster and computing group
    > 
    > IT department
    > 
    > Institut Pasteur Paris
    > 
    > Tel : 95 03
    > 
    >  
    > 
    >  
    > 
    > *From: *"Le Biot, Pierre-Marie" <pierre-marie.leb...@hpe.com 
    > <mailto:pierre-marie.leb...@hpe.com>>
    > *Reply-To: *slurm-dev <slurm-dev@schedmd.com 
    > <mailto:slurm-dev@schedmd.com>>
    > *Date: *Tuesday, 10 October 2017 at 15:20
    > *To: *slurm-dev <slurm-dev@schedmd.com <mailto:slurm-dev@schedmd.com>>
    > *Subject: *[slurm-dev] RE: Node always going to DRAIN state with 
    > reason=Low TmpDisk
    > 
    >  
    > 
    > Véronique,
    > 
    >  
    > 
    > This not what I expected, I was thinking slurmd -C would return 
TmpDisk=204000 or more probably 129186 as seen in slurmctld log.
    > 
    >  
    > 
    > I suppose that you already checked slurmd logs on tars-XXX ?
    > 
    >  
    > 
    > Regards,
    > 
    > Pierre-Marie Le Biot
    > 
    >  
    > 
    > *From:*Véronique LEGRAND [mailto:veronique.legr...@pasteur.fr]
    > *Sent:* Tuesday, October 10, 2017 2:09 PM
    > *To:* slurm-dev <slurm-dev@schedmd.com <mailto:slurm-dev@schedmd.com>>
    > *Subject:* [slurm-dev] RE: Node always going to DRAIN state with 
    > reason=Low TmpDisk
    > 
    >  
    > 
    > Hello Pierre-Marie,
    > 
    >  
    > 
    > First, thank you for your hint.
    > 
    > I just tried.
    > 
    >  
    > 
    >>slurmd -C
    > 
    > NodeName=tars-XXX CPUs=12 Boards=1 SocketsPerBoard=2 CoresPerSocket=6 
    > ThreadsPerCore=1 RealMemory=258373 TmpDisk=500
    > 
    > UpTime=0-20:50:54
    > 
    >  
    > 
    > The value for TmpDisk is erroneous. I do not know what can be the
    > cause of this, since the operating system's df command gives the right
    > values.
    > 
    >  
    > 
    > -sh-4.1$ df -hl
    > 
    > Filesystem      Size  Used Avail Use% Mounted on
    > 
    > slash_root      3.5G  1.6G  1.9G  47% /
    > 
    > tmpfs           127G     0  127G   0% /dev/shm
    > 
    > tmpfs           500M   84K  500M   1% /tmp
    > 
    > /dev/sda1       200G   33M  200G   1% /local/scratch
    > 
    >  
    > 
    >  
    > 
    > Could slurmd be mixing up tmpfs with /local/scratch?
    > 
    >  
    > 
    > I tried the same thing on another similar node (tars-XXX-1)
    > 
    >  
    > 
    > I got:
    > 
    >  
    > 
    > -sh-4.1$ df -hl
    > 
    > Filesystem      Size  Used Avail Use% Mounted on
    > 
    > slash_root      3.5G  1.7G  1.8G  49% /
    > 
    > tmpfs           127G     0  127G   0% /dev/shm
    > 
    > tmpfs           500M  5.7M  495M   2% /tmp
    > 
    > /dev/sda1       200G   33M  200G   1% /local/scratch
    > 
    >  
    > 
    > and
    > 
    >  
    > 
    > slurmd -C
    > 
    > NodeName=tars-XXX-1 CPUs=12 Boards=1 SocketsPerBoard=2 
    > CoresPerSocket=6 ThreadsPerCore=1 RealMemory=258373 TmpDisk=500
    > 
    > UpTime=101-21:34:14
    > 
    >  
    > 
    >  
    > 
    > So, slurmd -C gives exactly the same answer, but this node doesn't go
    > into DRAIN state; it works perfectly.
    > 
    >  
    > 
    > Thank you again for your help.
    > 
    >  
    > 
    > Regards,
    > 
    >  
    > 
    > Véronique
    > 
    >  
    > 
    >  
    > 
    >  
    > 
    > --
    > 
    > Véronique Legrand
    > 
    > IT engineer – scientific calculation & software development
    > 
    > https://research.pasteur.fr/en/member/veronique-legrand/
    > 
    > Cluster and computing group
    > 
    > IT department
    > 
    > Institut Pasteur Paris
    > 
    > Tel : 95 03
    > 
    >  
    > 
    >  
    > 
    > *From: *"Le Biot, Pierre-Marie" <pierre-marie.leb...@hpe.com 
    > <mailto:pierre-marie.leb...@hpe.com>>
    > *Reply-To: *slurm-dev <slurm-dev@schedmd.com 
    > <mailto:slurm-dev@schedmd.com>>
    > *Date: *Tuesday, 10 October 2017 at 13:53
    > *To: *slurm-dev <slurm-dev@schedmd.com <mailto:slurm-dev@schedmd.com>>
    > *Subject: *[slurm-dev] RE: Node always going to DRAIN state with 
    > reason=Low TmpDisk
    > 
    >  
    > 
    > Hi Véronique,
    > 
    >  
    > 
    > Did you check the result of slurmd -C on tars-XXX ?
    > 
    >  
    > 
    > Regards,
    > 
    > Pierre-Marie Le Biot
    > 
    >  
    > 
    > *From:*Véronique LEGRAND [mailto:veronique.legr...@pasteur.fr]
    > *Sent:* Tuesday, October 10, 2017 12:02 PM
    > *To:* slurm-dev <slurm-dev@schedmd.com <mailto:slurm-dev@schedmd.com>>
    > *Subject:* [slurm-dev] Node always going to DRAIN state with 
    > reason=Low TmpDisk
    > 
    >  
    > 
    > Hello,
    > 
    >  
    > 
    > I have a problem with 1 node in our cluster. It is exactly like all the
    > other nodes (200 GB of temporary storage).
    > 
    >  
    > 
    > Here is what I have in slurm.conf:
    > 
    >  
    > 
    > # COMPUTES
    > 
    > TmpFS=/local/scratch
    > 
    >  
    > 
    > # NODES
    > 
    > GresTypes=disk,gpu
    > 
    > ReturnToService=2
    > 
    > NodeName=DEFAULT State=UNKNOWN Gres=disk:204000,gpu:0 TmpDisk=204000
    > 
    > NodeName=tars-[XXX-YYY] Sockets=2 CoresPerSocket=6 RealMemory=254373 
    > Feature=ram256,cpu,fast,normal,long,specific,admin Weight=20
    > 
    >  
    > 
    > The node that has the trouble is tars-XXX.
    > 
    >  
    > 
    > Here is what I have in gres.conf:
    > 
    >  
    > 
    > # Local disk space in MB (/local/scratch)
    > 
    > NodeName=tars-[ZZZ-UUU] Name=disk Count=204000
    > 
    >  
    > 
    > XXX is in range: [ZZZ,UUU].
    > 
    >  
    > 
    > If I ssh to tars-XXX, here is what I get:
    > 
    >  
    > 
    > -sh-4.1$ df -hl
    > 
    > Filesystem      Size  Used Avail Use% Mounted on
    > 
    > slash_root      3.5G  1.6G  1.9G  47% /
    > 
    > tmpfs           127G     0  127G   0% /dev/shm
    > 
    > tmpfs           500M   84K  500M   1% /tmp
    > 
    > /dev/sda1       200G   33M  200G   1% /local/scratch
    > 
    >  
    > 
    > /local/scratch is the directory for temporary storage.
    > 
    >  
    > 
    > The problem is that when I do
    > 
    > scontrol show node tars-XXX,
    > 
    >  
    > 
    > I get:
    > 
    >  
    > 
    > NodeName=tars-XXX Arch=x86_64 CoresPerSocket=6
    > 
    >    CPUAlloc=0 CPUErr=0 CPUTot=12 CPULoad=0.00
    > 
    >    AvailableFeatures=ram256,cpu,fast,normal,long,specific,admin
    > 
    >    ActiveFeatures=ram256,cpu,fast,normal,long,specific,admin
    > 
    >    Gres=disk:204000,gpu:0
    > 
    >    NodeAddr=tars-113 NodeHostName=tars-113 Version=16.05
    > 
    >    OS=Linux RealMemory=254373 AllocMem=0 FreeMem=255087 Sockets=2 
    > Boards=1
    > 
    >    State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=204000 Weight=20 
    > Owner=N/A MCS_label=N/A
    > 
    >    BootTime=2017-10-09T17:08:43 SlurmdStartTime=2017-10-09T17:09:57
    > 
    >    CapWatts=n/a
    > 
    >    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
    > 
    >    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
    > 
    >    Reason=Low TmpDisk [slurm@2017-10-10T11:25:04]
    > 
    >  
    > 
    >  
    > 
    > And in the slurmctld logs, I get the error message:
    > 
    > 2017-10-10T08:35:57+02:00 tars-master slurmctld[120352]: error: Node 
    > tars-XXX has low tmp_disk size (129186 < 204000)
    > 
    > 2017-10-10T08:35:57+02:00 tars-master slurmctld[120352]: error: 
    > _slurm_rpc_node_registration node=tars-XXX: Invalid argument
    > 
    >  
    > 
    > I tried to reboot tars-XXX yesterday but the problem is still here.
    > 
    > I also tried:
    > 
    > scontrol update  NodeName=ClusterNode0 State=Resume
    > 
    > but state went back to DRAIN after a while…
    > 
    >  
    > 
    > Does anyone have an idea of what could cause the problem? My 
    > configuration files seem correct and there really are 200G free in 
    > /local/scratch on tars-XXX…
    > 
    >  
    > 
    > I thank you in advance for any help.
    > 
    >  
    > 
    > Regards,
    > 
    >  
    > 
    >  
    > 
    > Véronique
    > 
    >  
    > 
    >  
    > 
    >  
    > 
    >  
    > 
    >  
    > 
    >  
    > 
    >  
    > 
    > --
    > 
    > Véronique Legrand
    > 
    > IT engineer – scientific calculation & software development
    > 
    > https://research.pasteur.fr/en/member/veronique-legrand/
    > 
    > Cluster and computing group
    > 
    > IT department
    > 
    > Institut Pasteur Paris
    > 
    > Tel : 95 03
    > 
    >  
    > 
    
