On 14.11.2012 at 15:40, Loong, Andreas wrote:

>>> My fault for not being more clear. It starts up just fine and everything
>>> I'd expect works just as our old qmaster did. After approx 2 minutes it
>>> segfaults without anything new or even odd added to the messages file.
>>
>> It sounds most likely to be due to a load report arriving, but I've no
>> idea what might be wrong. It's rather odd with a clean install on Red
>> Hat. Is the installation from the RPMs I made, or a separate build?
>
> They're from the RPMs provided on the SoGE site (arc.liv.. ) - speaking
> of RPMs - have you tested the debug package of the qmaster? I can not
> get it to run at all. Look at this:
>
> file sge_qmaster*
> sge_qmaster: ELF 64-bit LSB executable, AMD x86-64, version 1
> (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs),
> stripped
> sge_qmaster.debug: ELF 64-bit LSB executable, AMD x86-64, version 1
> (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not
> stripped
>
> ldd sge_qmaster.debug
> statically linked

`ldd` prints this under several conditions. Note that it's just a script, and it is dangerous if executed as root:

http://www.catonmat.net/blog/ldd-arbitrary-code-execution/

> ./sge_qmaster.debug
> -bash: ./sge_qmaster.debug: bad ELF interpreter: No such file or
> directory

Better:

$ readelf -l sge_qmaster.debug

What do you see in the "Requesting program interpreter:" line?

-- Reuti

> What's going on here? 'file' says it uses shared libs, ldd says it
> doesn't, and it won't run?
>
>>>>> As soon as I change back from pure files to "files dns" it takes
>>>>> 2-3 minutes and the qmaster segfaults again.
>>>>
>>>> Do you mean qmaster runs for that long, or the init script waits that
>>>> long for it? What do you get with and without dns in NSS and flushing
>>>> the nscd cache from
>>>>
>>>> utilbin/lx-amd64/gethostbyname -all srvname
>>>
>>> I tried as many options I could think of to get differing results, but
>>> the output never changed.
>>
>> There must be something different between them if it affects the
>> qmaster, but I don't have any good ideas what to try. There have been
>> changes made for problems with NSS-style lookup, but I'm surprised if
>> any of that has caused trouble, and I don't remember any being
>> specifically host-related.
>
> This doesn't have to be NSS related though, does it? I'm guessing here,
> but if I configure it to use only files, wouldn't this affect what/how
> much it does? What I'm getting is that perhaps we'll still run into the
> problem, but it's triggered much less often.
>
>>> Next, I'll try the GDB approach.
>>
>> If you can send me a backtrace from that, I hope it will indicate
>> what's wrong.
>
> Got a stacktrace from another user running SuSE, I've forwarded it and
> got ticket #1441 in trac.
>
> Wbr
> Andreas

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
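
For reference, the `readelf -l` check suggested above can be approximated without running the binary or `ldd` at all. The following is only a minimal sketch, not anything shipped with Grid Engine: the function name `read_elf_interpreter` is made up for illustration, and it assumes a well-formed ELF header. It reads the PT_INTERP program header (what `readelf -l` reports as the requested program interpreter) and lets you check whether that path exists on the host.

```python
import os
import struct
import sys

PT_INTERP = 3  # program header type that holds the interpreter path


def read_elf_interpreter(path):
    """Return the PT_INTERP path of an ELF file, or None if it has none.

    Hypothetical helper for illustration; it only parses the file and
    never executes it.
    """
    with open(path, "rb") as f:
        ident = f.read(16)
        if len(ident) < 16 or ident[:4] != b"\x7fELF":
            raise ValueError("not an ELF file: %s" % path)
        is_64 = ident[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
        endian = "<" if ident[5] == 1 else ">"   # EI_DATA: 1 = little endian

        # Fields after e_ident, up to and including e_phnum.
        fmt = endian + ("HHIQQQIHHH" if is_64 else "HHIIIIIHHH")
        fields = struct.unpack(fmt, f.read(struct.calcsize(fmt)))
        e_phoff, e_phentsize, e_phnum = fields[4], fields[8], fields[9]

        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            if is_64:
                # p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
                p_type, _, p_offset, _, _, p_filesz = struct.unpack(
                    endian + "IIQQQQ", f.read(40))
            else:
                # p_type, p_offset, p_vaddr, p_paddr, p_filesz
                p_type, p_offset, _, _, p_filesz = struct.unpack(
                    endian + "IIIII", f.read(20))
            if p_type == PT_INTERP:
                f.seek(p_offset)
                return f.read(p_filesz).rstrip(b"\0").decode(errors="replace")
    return None


if __name__ == "__main__":
    for name in sys.argv[1:]:
        interp = read_elf_interpreter(name)
        if interp is None:
            print("%s: no PT_INTERP segment" % name)
        else:
            state = "exists" if os.path.exists(interp) else "MISSING"
            print("%s: interpreter %r (%s)" % (name, interp, state))
```

Run it as, e.g., `python check_interp.py sge_qmaster sge_qmaster.debug` (the script name is arbitrary). An interpreter path that is empty, garbled or missing on disk would be consistent with the "bad ELF interpreter" message from bash.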
