[slurm-dev] Re: slurmctld causes slurmdbd to seg fault

Doug Meyer Sat, 21 Oct 2017 06:46:12 -0700

Hi,

I would love to say I have had success, but I am in midstride with the same
issue.  The simple path looks to be an upgrade to v16, but until now I have
been pleased with v15.08.


Here is a post describing the fix:
https://groups.google.com/forum/#!searchin/slurm-devel/dropbox/slurm-devel/WF4a36l0Y9g/U_XcpgKGBQAJ

Here is a link to a thread that references lost.pl, which will help you see
the scope of the problem:

https://groups.google.com/forum/#!searchin/slurm-devel/lost.pl/slurm-devel/TQcerLLEKAU/6QtpxZ2PBgAJ

Good luck,
Doug

On Tue, Oct 17, 2017 at 8:58 AM, Douglas Jacobsen <dmjacob...@lbl.gov>
wrote:

> You probably have a core file in the directory that slurmdbd logs to. A
> backtrace from gdb would be most telling.
>
> On Oct 17, 2017 08:17, "Loris Bennett" <loris.benn...@fu-berlin.de> wrote:
>
>>
>> Hi,
>>
>> We have been having some trouble with NFS mounts via Infiniband getting
>> dropped by nodes.  We ended up switching our main admin server, which
>> provides NFS and Slurm, from one machine to another.
>>
>> Now, however, if slurmdbd is started, as soon as slurmctld starts,
>> slurmdbd seg faults.  In the slurmdbd.log we have
>>
>>   slurmdbd: error: We have more allocated time than is possible
>> (7724741 > 7012800) for cluster soroban(1948) from 2017-10-17T16:00:00 -
>> 2017-10-17T17:00:00 tres 1
>>   slurmdbd: error: We have more time than is possible
>> (7012800+36720+0)(7049520) > 7012800 for cluster soroban(1948) from
>> 2017-10-17T16:00:00 - 2017-10-17T17:00:00 tres 1
>>   slurmdbd: Warning: Note very large processing time from hourly_rollup
>> for soroban: usec=46390426 began=17:08:17.777
>>   Segmentation fault (core dumped)
>>
>> and the corresponding output of strace is
>>
>>   fstat(3, {st_mode=S_IFREG|0600, st_size=871270, ...}) = 0
>>   write(3, "[2017-10-17T17:09:04.168] Warnin"..., 132) = 132
>>   +++ killed by SIGSEGV (core dumped) +++
>>
>> We're running 17.02.7.  Any ideas?
>>
>> Cheers,
>>
>> Loris
>>
>> --
>> Dr. Loris Bennett (Mr.)
>> ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
>>
>
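
The figures in the quoted slurmdbd log can be sanity-checked with a little
arithmetic. The sketch below assumes that the number in parentheses after the
cluster name (1948) is the CPU count, and that "tres 1" refers to CPUs; both
are readings of the log line, not something the thread confirms. Under those
assumptions, the "possible" time for a one-hour rollup window is simply the
CPU count times 3600 seconds.

```python
# Sanity check of the numbers in the quoted slurmdbd log lines.
# Assumption: 1948 (from "soroban(1948)") is the cluster's CPU count,
# and the rollup window is the one hour shown in the log.

SECONDS_PER_HOUR = 3600

cpu_count = 1948                      # assumed meaning of "(1948)"
capacity = cpu_count * SECONDS_PER_HOUR

allocated = 7724741                   # first error: allocated CPU-seconds
excess = allocated - capacity         # how far the allocation overshoots

# Second error: the three terms printed as (7012800+36720+0)
total_reported = 7012800 + 36720 + 0

print("capacity (CPU-seconds per hour):", capacity)
print("allocated minus capacity:", excess)
print("sum of the three reported terms:", total_reported)
print("over capacity:", allocated > capacity and total_reported > capacity)
```

The computed capacity matches the 7012800 limit in both error lines, which
suggests the rollup is counting more CPU time in that hour than the cluster
could have delivered, consistent with stale or duplicated job records of the
kind the lost.pl thread discusses.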
