We're running SGE 6.2u5 (Sun) under Linux (CentOS 5.6 x86_64) and we're having
a very odd problem when specifying an h_vmem value.
We use environment modules[1] to load default settings and allow the selection
of different packages. We get errors very early in the environment modules
initialization, but only when h_vmem is specified.
The actual h_vmem value and the job are immaterial. For example, if the
job is a shell script that consists of the command "date", it runs cleanly
when h_vmem is not set, but reports errors when h_vmem is set to 1G (or 15G).
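
For reference, the test case looks roughly like the sketch below. The file
name date_job.sh and the -cwd option are only illustrative assumptions, not
necessarily what we run; the submission step is skipped if qsub is not on
the PATH.

```shell
#!/bin/sh
# Minimal reproduction sketch. "date_job.sh" is a hypothetical file name.
cat > date_job.sh <<'EOF'
#!/bin/sh
date
EOF
chmod +x date_job.sh

# Submit the same script twice: once without h_vmem, once with it.
if command -v qsub >/dev/null 2>&1; then
    qsub -cwd date_job.sh                # no h_vmem: clean run
    qsub -cwd -l h_vmem=1G date_job.sh   # h_vmem set: errors on stderr
else
    echo "qsub not found; run this on an SGE submit host" >&2
fi
```

Comparing the two stderr files (jobname.eJOBID by default) shows the
difference.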
SGE correctly dispatches the job to a compute node, and the job runs, but
the environment isn't initialized correctly. Something as trivial as
"date" still produces its output, but scientific processing jobs that
depend on the environment modules initialization fail.
The first part of STDERR is very odd; it is an error message like:
    id: cannot find name for group ID 40193
where the group ID number varies but is always above 40000. That falls within
the group ID range (gid_range) assigned in the qmaster.
None of the shell initialization scripts use the GID, and they all succeed
when "h_vmem" is not specified.
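
One check that could be run inside a job to see which of the process's
group IDs have no name entry is sketched below. The function name
unnamed_gids is made up for illustration, and the sketch assumes getent is
available on the compute nodes.

```shell
#!/bin/sh
# Print every numeric group ID of the current process that has no
# corresponding name in the group database.
# "unnamed_gids" is a hypothetical helper name.
unnamed_gids() {
    # id -G prints numeric IDs only, so it does not trigger the name lookup.
    for gid in $(id -G); do
        if ! getent group "$gid" >/dev/null 2>&1; then
            echo "$gid"
        fi
    done
}

unnamed_gids
```

Running this with and without h_vmem would show whether the unnamed GID is
present in both cases or only when h_vmem is set.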
Any suggestions for the next step in troubleshooting this issue?
Thanks,
Mark
[1] http://modules.sourceforge.net/