Hello,
We experienced a problem on RHEL/CentOS 6 machines with qlogin/qrsh via the builtin starter. The job is scheduled and started correctly, but the shell at the end never starts and the job ends with a commlib error:

  $ qlogin -verbose -q queue@host
  Your job 998 ("QLOGIN") has been submitted
  waiting for interactive job to be scheduled ...
  Your interactive job 998 has been successfully scheduled.
  Establishing builtin session to host exechost.f.q.d.n ...
  error: commlib error: got read error (closing "exechost.f.q.d.n/shepherd_ijs/2")

Tracing the execd on the destination machine showed that the execle() call for the shell failed with EFAULT:

  write(4, "07/09/2013 08:30:44 [50449:30912]: execle(/bin/bash, -bash, NULL, env)\n", 71) = 71
  execve("/bin/bash", ["-bash"], ["SHELL=/bin/bash", "HOME=/home/username", "TERM=xterm", "LOGNAME=username", "PATH=/bin:/usr/bin", 0x7fffffffffff]) = -1 EFAULT

Note the stray pointer 0x7fffffffffff after the PATH entry. After some digging, it looks like the environment array that the function start_qlogin_job() generates is no longer terminated with a NULL pointer (as it was in the SGE 6.2u5 source), so execve() reads past the last valid entry.

The attached trivial patch fixed our problems.

Regards,
Thomas Mainka

-- 
Thomas Mainka                          science+computing ag
System Administration                  Hagellocher Weg 73
mail: t.mai...@science-computing.de    72070 Tuebingen, Germany
tel.: +49 7071 9457 472                www.science-computing.de

*** sge-8.1.3.orig/source/daemons/shepherd/builtin_starter.c	Sat Feb 23 21:44:10 2013
--- sge-8.1.3/source/daemons/shepherd/builtin_starter.c	Tue Jul  9 08:30:11 2013
***************
*** 1943,1948 ****
--- 1943,1949 ----
        /* This used to be set explicitly for a long list of targets, and
           default to /usr/bin, but there seems no reason to exclude /bin.  */
        my_env[i++] = strcat(path, "/bin:/usr/bin");
+       my_env[i] = NULL;
  
        sge_free(&buffer);
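
For readers who have not run into this failure mode: the exec family expects the environment to be an array of "NAME=value" strings whose last element is a NULL pointer, and the kernel walks the array until it finds that terminator. The following is a minimal sketch of the pattern, not the actual SGE code. The names build_env(), run_shell() and MAX_ENV are hypothetical and exist only for this illustration, and /bin/sh is assumed to be present.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ENV 8 /* hypothetical capacity of the environment array */

/*
 * Fill env[] with "NAME=value" strings and terminate it with NULL.
 * Returns the number of entries, not counting the terminator.
 * Leaving out the final "env[i] = NULL" would leave whatever happened
 * to be in that slot, which is how a stray pointer can reach execve().
 */
static size_t build_env(char *env[], size_t cap)
{
    size_t i = 0;

    assert(cap >= 3); /* two entries plus the NULL terminator */
    env[i++] = "SHELL=/bin/sh";
    env[i++] = "PATH=/bin:/usr/bin";
    env[i] = NULL;    /* the terminator the exec family relies on */
    return i;
}

/*
 * Run "command" through /bin/sh in a child process using the given
 * environment. Returns the child's exit status, or -1 on failure.
 */
static int run_shell(const char *command, char *const env[])
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* execle(): argument list ends with NULL, followed by envp */
        execle("/bin/sh", "sh", "-c", command, (char *)NULL, env);
        _exit(127); /* only reached if execle() itself failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would declare "char *env[MAX_ENV];", call build_env(env, MAX_ENV), and pass the array to run_shell(). Because a stack array is not zero-initialised, the explicit terminator is what makes the array safe to hand to execle().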