Hello,

We ran into a problem with qlogin/qrsh via the builtin starter on RHEL/CentOS 6
machines. The job is scheduled and started fine, but the shell at the end
never starts and the job ends with a commlib error:

  $ qlogin -verbose -q queue@host
  Your job 998 ("QLOGIN") has been submitted
  waiting for interactive job to be scheduled ...
  Your interactive job 998 has been successfully scheduled.
  Establishing builtin session to host exechost.f.q.d.n ...
  error: commlib error: got read error (closing "exechost.f.q.d.n/shepherd_ijs/2")

Tracing the execd on the destination machine showed that the execle() call
for the shell failed with EFAULT:

  write(4, "07/09/2013 08:30:44 [50449:30912]: execle(/bin/bash, -bash, NULL,
        env)\n", 71) = 71
  execve("/bin/bash", ["-bash"], ["SHELL=/bin/bash", "HOME=/home/username",
         "TERM=xterm", "LOGNAME=username", "PATH=/bin:/usr/bin",
         0x7fffffffffff]) = -1 EFAULT

After some digging, it looks like the environment array that the function
start_qlogin_job() generates is no longer terminated with a NULL pointer
(as it was in the SGE 6.2u5 source). That would explain the stray pointer
(0x7fffffffffff) at the end of the array in the trace above.

The attached trivial patch fixed our problems.
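
For anyone who wants to see the issue in isolation: execle() and execve()
find the end of the environment array only by its NULL pointer, so an array
without one makes the kernel read whatever follows it in memory. Below is a
minimal sketch of that contract. It is not the SGE code; build_login_env(),
env_length(), ENV_VARS, ENV_SLOTS and ENV_VALUE_LEN are hypothetical names
made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define ENV_VARS      5               /* SHELL, HOME, TERM, LOGNAME, PATH */
#define ENV_SLOTS     (ENV_VARS + 1)  /* one extra slot for the NULL terminator */
#define ENV_VALUE_LEN 64

/* Hypothetical helper (not from the SGE source): fill env[] with
   "NAME=value" strings and terminate it with NULL, as execle()/execve()
   require. Returns the number of variables stored, excluding the NULL. */
size_t build_login_env(char *env[ENV_SLOTS],
                       char storage[ENV_VARS][ENV_VALUE_LEN],
                       const char *shell, const char *home,
                       const char *term, const char *user)
{
    size_t i = 0;

    snprintf(storage[i], ENV_VALUE_LEN, "SHELL=%s", shell);
    env[i] = storage[i]; i++;
    snprintf(storage[i], ENV_VALUE_LEN, "HOME=%s", home);
    env[i] = storage[i]; i++;
    snprintf(storage[i], ENV_VALUE_LEN, "TERM=%s", term);
    env[i] = storage[i]; i++;
    snprintf(storage[i], ENV_VALUE_LEN, "LOGNAME=%s", user);
    env[i] = storage[i]; i++;
    snprintf(storage[i], ENV_VALUE_LEN, "PATH=%s", "/bin:/usr/bin");
    env[i] = storage[i]; i++;

    /* The step the patch restores: without this, the slot after PATH
       holds whatever happened to be in memory. */
    assert(i < ENV_SLOTS);
    env[i] = NULL;

    return i;
}

/* Hypothetical helper: count entries up to the NULL terminator, looking at
   no more than `slots` entries. Returns `slots` if no terminator is found. */
size_t env_length(char *const env[], size_t slots)
{
    size_t n = 0;

    while (n < slots && env[n] != NULL)
        n++;
    return n;
}
```

An array built this way could then be handed to
execle("/bin/bash", "-bash", (char *)NULL, env). If env_length() returns
ENV_SLOTS for an array, no terminator was found within its bounds.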


Regards,
Thomas Mainka

-- 
Thomas Mainka                       science+computing ag
System Administration               Hagellocher Weg 73
mail: t.mai...@science-computing.de 72070 Tuebingen, Germany
tel.: +49 7071 9457 472             www.science-computing.de
*** sge-8.1.3.orig/source/daemons/shepherd/builtin_starter.c	Sat Feb 23 21:44:10 2013
--- sge-8.1.3/source/daemons/shepherd/builtin_starter.c	Tue Jul  9 08:30:11 2013
***************
*** 1943,1948 ****
--- 1943,1949 ----
     /* This used to be set explicitly for a long list of targets, and
        default to /usr/bin, but there seems no reason to exclude /bin.  */
     my_env[i++] = strcat(path, "/bin:/usr/bin");
+    my_env[i] = NULL;
  
     sge_free(&buffer);
  