We have the same problem with 14.11.2. Version 14.11.1 does not have that
problem at all.

Fred


----- Original Message -----
From: Lennart Karlsson <[email protected]>
To: slurm-dev <[email protected]>
Cc: 
Sent: Wednesday, January 7, 2015 1:15 AM
Subject: [slurm-dev] Re: squeue SEGV error in 14.11.2


On 01/06/2015 05:01 PM, Andy Riebs wrote:
> We are seeing occasional SEGVs from squeue in 14.11.2 that we
> hadn't seen previously. As near as we can tell, it might happen
> when the reason for not scheduling jobs is longer than 32
> characters, due to the particular request, such as
>     JobState=PENDING
>     Reason=ReqNodeNotAvail(Unavailable:noden[0692,0777,1788,1836])
>
> Does this ring a bell?
>
> Andy
> --
> Andy Riebs
> Hewlett-Packard Company
> High Performance Computing
> +1 404 648 9024
> My opinions are not necessarily those of HP

Hi,

Today I tried to upgrade from version 14.03.7 to version 14.11.2 and seem
to hit the same problem. A plain "squeue" command without parameters gives:

# squeue
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
*** buffer overflow detected ***: squeue terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x2b5d8eb1e697]
/lib64/libc.so.6(+0x100580)[0x2b5d8eb1c580]
/lib64/libc.so.6(+0xffc7b)[0x2b5d8eb1bc7b]
/lib64/libc.so.6(__snprintf_chk+0x7a)[0x2b5d8eb1bb4a]
squeue(_print_job_reason_list+0x9d)[0x428c3d]
squeue[0x427735]
squeue(print_job_from_format+0x128)[0x428558]
squeue(slurm_list_for_each+0x4e)[0x4436fe]
squeue(print_jobs_array+0x566)[0x429856]
squeue[0x425142]
squeue(main+0x1f8)[0x4256a8]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x2b5d8ea3ad5d]
squeue[0x424f79]
======= Memory map: ========
00400000-00549000 r-xp 00000000 fd:00 1321798 /usr/bin/squeue
00748000-0074f000 rw-p 00148000 fd:00 1321798 /usr/bin/squeue
0074f000-00753000 rw-p 00000000 00:00 0
017f2000-019de000 rw-p 00000000 00:00 0 [heap]
2b5d8e3d8000-2b5d8e3f8000 r-xp 00000000 fd:00 1179685 /lib64/ld-2.12.so
2b5d8e3f8000-2b5d8e3f9000 rw-p 00000000 00:00 0
2b5d8e5f7000-2b5d8e5f8000 r--p 0001f000 fd:00 1179685 /lib64/ld-2.12.so
2b5d8e5f8000-2b5d8e5f9000 rw-p 00020000 fd:00 1179685 /lib64/ld-2.12.so
2b5d8e5f9000-2b5d8e5fa000 rw-p 00000000 00:00 0
2b5d8e5fa000-2b5d8e5fc000 r-xp 00000000 fd:00 1179731 /lib64/libdl-2.12.so
2b5d8e5fc000-2b5d8e7fc000 ---p 00002000 fd:00 1179731 /lib64/libdl-2.12.so
2b5d8e7fc000-2b5d8e7fd000 r--p 00002000 fd:00 1179731 /lib64/libdl-2.12.so
2b5d8e7fd000-2b5d8e7fe000 rw-p 00003000 fd:00 1179731 /lib64/libdl-2.12.so
2b5d8e7fe000-2b5d8e815000 r-xp 00000000 fd:00 1179698 /lib64/libpthread-2.12.so
2b5d8e815000-2b5d8ea15000 ---p 00017000 fd:00 1179698 /lib64/libpthread-2.12.so
2b5d8ea15000-2b5d8ea16000 r--p 00017000 fd:00 1179698 /lib64/libpthread-2.12.so
2b5d8ea16000-2b5d8ea17000 rw-p 00018000 fd:00 1179698 /lib64/libpthread-2.12.so
2b5d8ea17000-2b5d8ea1c000 rw-p 00000000 00:00 0
2b5d8ea1c000-2b5d8eba6000 r-xp 00000000 fd:00 1179674 /lib64/libc-2.12.so
2b5d8eba6000-2b5d8eda6000 ---p 0018a000 fd:00 1179674 /lib64/libc-2.12.so
2b5d8eda6000-2b5d8edaa000 r--p 0018a000 fd:00 1179674 /lib64/libc-2.12.so
2b5d8edaa000-2b5d8edab000 rw-p 0018e000 fd:00 1179674 /lib64/libc-2.12.so
2b5d8edab000-2b5d8edb2000 rw-p 00000000 00:00 0
2b5d8edb2000-2b5d8edbe000 r-xp 00000000 fd:00 1181056 /lib64/libnss_files-2.12.so
2b5d8edbe000-2b5d8efbe000 ---p 0000c000 fd:00 1181056 /lib64/libnss_files-2.12.so
2b5d8efbe000-2b5d8efbf000 r--p 0000c000 fd:00 1181056 /lib64/libnss_files-2.12.so
2b5d8efbf000-2b5d8efc0000 rw-p 0000d000 fd:00 1181056 /lib64/libnss_files-2.12.so
2b5d8efc0000-2b5d8f083000 r-xp 00000000 fd:00 1183698 /lib64/libnss_db-2.2.3.so
2b5d8f083000-2b5d8f283000 ---p 000c3000 fd:00 1183698 /lib64/libnss_db-2.2.3.so
2b5d8f283000-2b5d8f285000 rw-p 000c3000 fd:00 1183698 /lib64/libnss_db-2.2.3.so
2b5d8f285000-2b5d8f28a000 r-xp 00000000 fd:00 1179688 /lib64/libnss_dns-2.12.so
2b5d8f28a000-2b5d8f489000 ---p 00005000 fd:00 1179688 /lib64/libnss_dns-2.12.so
2b5d8f489000-2b5d8f48a000 r--p 00004000 fd:00 1179688 /lib64/libnss_dns-2.12.so
2b5d8f48a000-2b5d8f48b000 rw-p 00005000 fd:00 1179688 /lib64/libnss_dns-2.12.so
2b5d8f48b000-2b5d8f4a1000 r-xp 00000000 fd:00 1181062 /lib64/libresolv-2.12.so
2b5d8f4a1000-2b5d8f6a1000 ---p 00016000 fd:00 1181062 /lib64/libresolv-2.12.so
2b5d8f6a1000-2b5d8f6a2000 r--p 00016000 fd:00 1181062 /lib64/libresolv-2.12.so
2b5d8f6a2000-2b5d8f6a3000 rw-p 00017000 fd:00 1181062 /lib64/libresolv-2.12.so
2b5d8f6a3000-2b5d8f6a5000 rw-p 00000000 00:00 0
2b5d8f6a5000-2b5d8f6a8000 r-xp 00000000 fd:00 1329386 /usr/lib64/slurm/auth_munge.so
2b5d8f6a8000-2b5d8f8a7000 ---p 00003000 fd:00 1329386 /usr/lib64/slurm/auth_munge.so
2b5d8f8a7000-2b5d8f8a8000 rw-p 00002000 fd:00 1329386 /usr/lib64/slurm/auth_munge.so
2b5d8f8a8000-2b5d8f8b0000 r-xp 00000000 fd:00 1321711 /usr/lib64/libmunge.so.2.0.0
2b5d8f8b0000-2b5d8fab0000 ---p 00008000 fd:00 1321711 /usr/lib64/libmunge.so.2.0.0
2b5d8fab0000-2b5d8fab1000 rw-p 00008000 fd:00 1321711 /usr/lib64/libmunge.so.2.0.0
2b5d8fab1000-2b5d8fab2000 rw-p 00000000 00:00 0
2b5d8fbd4000-2b5d8fcdc000 rw-p 00000000 00:00 0
2b5d8fcdc000-2b5d8fce7000 r-xp 00000000 fd:00 1329261 /usr/lib64/slurm/select_cray.so
2b5d8fce7000-2b5d8fee7000 ---p 0000b000 fd:00 1329261 /usr/lib64/slurm/select_cray.so
2b5d8fee7000-2b5d8fee8000 rw-p 0000b000 fd:00 1329261 /usr/lib64/slurm/select_cray.so
2b5d8fee8000-2b5d8fef1000 r-xp 00000000 fd:00 1321772 /usr/lib64/slurm/select_alps.so
2b5d8fef1000-2b5d900f0000 ---p 00009000 fd:00 1321772 /usr/lib64/slurm/select_alps.so
2b5d900f0000-2b5d900f1000 rw-p 00008000 fd:00 1321772 /usr/lib64/slurm/select_alps.so
2b5d900f1000-2b5d900f2000 rw-p 00000000 00:00 0
2b5d900f2000-2b5d90101000 r-xp 00000000 fd:00 1321773 /usr/lib64/slurm/select_bluegene.so
2b5d90101000-2b5d90301000 ---p 0000f000 fd:00 1321773 /usr/lib64/slurm/select_bluegene.so
2b5d90301000-2b5d90302000 rw-p 0000f000 fd:00 1321773 /usr/lib64/slurm/select_bluegene.so
2b5d90319000-2b5d90401000 r-xp 00000000 fd:00 1312091 /usr/lib64/libstdc++.so.6.0.13
2b5d90401000-2b5d90601000 ---p 000e8000 fd:00 1312091 /usr/lib64/libstdc++.so.6.0.13
2b5d90601000-2b5d90608000 r--p 000e8000 fd:00 1312091 /usr/lib64/libstdc++.so.6.0.13
2b5d90608000-2b5d9060a000 rw-p 000ef000 fd:00 1312091 /usr/lib64/libstdc++.so.6.0.13
2b5d9060a000-2b5d9061f000 rw-p 00000000 00:00 0
2b5d9061f000-2b5d906a2000 r-xp 00000000 fd:00 1181043 /lib64/libm-2.12.so
2b5d906a2000-2b5d908a1000 ---p 00083000 fd:00 1181043 /lib64/libm-2.12.so
2b5d908a1000-2b5d908a2000 r--p 00082000 fd:00 1181043 /lib64/libm-2.12.so
            4208962      node q_timing      lka PD       0:00      1
Aborted (core dumped)
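
A note on reading the backtrace: the abort is raised from __snprintf_chk,
glibc's fortified snprintf, which terminates the program when the size passed
by the caller is larger than the destination buffer the compiler can see. The
sketch below is not the Slurm source. The 32-byte buffer size is an assumption
taken from Andy's guess above, and format_reason and REASON_BUF_LEN are
hypothetical names. It only illustrates how passing the real buffer size makes
a long reason truncate instead of overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed size, based only on the "32 characters" guess in this thread. */
#define REASON_BUF_LEN 32

/*
 * Hypothetical helper, not taken from squeue: format "(reason)" into a
 * fixed-size buffer. snprintf never writes more than dst_len bytes,
 * including the terminating NUL, so an over-long reason is truncated.
 * Returns 1 if the output was truncated (or on error), 0 otherwise.
 */
static int format_reason(char *dst, size_t dst_len, const char *reason)
{
    int n;

    assert(dst != NULL && reason != NULL && dst_len > 0);
    n = snprintf(dst, dst_len, "(%s)", reason);
    return n < 0 || (size_t)n >= dst_len;
}
```

With a REASON_BUF_LEN-byte buffer, a short reason such as "Priority" fits
and the function returns 0, while a long ReqNodeNotAvail(...) reason returns 1
and leaves a truncated, NUL-terminated string in the buffer.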



And yes, we do have long reasons with the new Slurm version. The examples below
come from a "scontrol show job|grep Reason=" command:
    JobState=PENDING Reason=ReqNodeNotAvail(Unavailable:m[13,19,55,79,125,128,134,138,140,167,198,200]) Dependency=(null)
    JobState=PENDING Reason=ReqNodeNotAvail(Unavailable:m[13,19,55,79,125,128,134,138,140,167,198,200]) Dependency=(null)
    JobState=PENDING Reason=ReqNodeNotAvail(Unavailable:m[13,19,55,79,125,128,134,138,140,167,198,200]) Dependency=(null)
    JobState=PENDING Reason=ReqNodeNotAvail(Unavailable:m[13,19,55,79,125,128,134,138,140,167,198,200]) Dependency=(null)
    JobState=PENDING Reason=ReqNodeNotAvail(Unavailable:m[13,19,55,79,125,128,134,138,140,167,198,200]) Dependency=(null)
    JobState=PENDING Reason=ReqNodeNotAvail(Unavailable:m[13,19,55,79,125,128,134,138,140,167,198,200]) Dependency=(null)
    JobState=PENDING Reason=ReqNodeNotAvail(Unavailable:m[13,19,55,79,125,128,134,138,140,167,198,200]) Dependency=(null)
    JobState=PENDING Reason=ReqNodeNotAvail(Unavailable:m[13,19,55,79,125,128,134,138,140,167,198,200]) Dependency=(null)
    JobState=PENDING Reason=ReqNodeNotAvail(Unavailable:m[13,19,55,79,125,128,134,138,140,167,198,200]) Dependency=(null)
    JobState=PENDING Reason=ReqNodeNotAvail(Unavailable:m[13,19,55,79,125,128,134,138,140,167,198,200]) Dependency=(null)
    JobState=PENDING Reason=ReqNodeNotAvail(Unavailable:m[13,19,55,79,125,128,134,138,140,167,198,200]) Dependency=(null)
    JobState=PENDING Reason=ReqNodeNotAvail(Unavailable:m[13,19,55,79,125,128,134,138,140,167,198,200]) Dependency=(null)
    JobState=PENDING Reason=ReqNodeNotAvail(Unavailable:m[13,19,55,79,125,128,134,138,140,167,198,200]) Dependency=(null)
    JobState=PENDING Reason=ReqNodeNotAvail(Unavailable:m[13,19,55,79,125,128,134,138,140,167,198,200]) Dependency=(null)
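
To check whether a given site is exposed to reasons this long, the length of
the Reason= value in each scontrol line can be measured. The following is a
rough sketch only: reason_length and SUSPECT_LIMIT are hypothetical names, the
32-character limit is the unconfirmed guess from earlier in this thread, and
the code assumes the value contains no whitespace, as in the lines above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Unconfirmed threshold, taken from the guess earlier in this thread. */
#define SUSPECT_LIMIT 32

/*
 * Hypothetical helper: return the length of the value that follows the
 * first "Reason=" in a line of "scontrol show job" output, measured up to
 * the next whitespace or end of string. Returns -1 if the key is absent.
 */
static long reason_length(const char *line)
{
    static const char key[] = "Reason=";
    const char *p;

    assert(line != NULL);
    p = strstr(line, key);
    if (p == NULL)
        return -1;
    p += sizeof(key) - 1;                 /* skip past "Reason=" */
    return (long)strcspn(p, " \t\r\n");   /* stop at first whitespace */
}
```

A line whose reason_length() exceeds SUSPECT_LIMIT is a candidate for
triggering the crash, if the 32-character theory holds.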

Too bad, I probably have to wait for a future version. Let me see if I can keep 
the new slurmdbd version...

Best regards,
-- Lennart Karlsson, UPPMAX, Uppsala University, Sweden
    http://uppmax.uu.se
