On 15.07.2011 at 22:32, [email protected] wrote:

> We're running SGE 6.2u5 (Sun) under Linux (CentOS5.6 x86_64) and we're having
> a very odd problem when specifying an h_vmem value.
>
> We use environment modules[1] to load default settings and allow the selection
> of different packages. Only when h_vmem is specified, we get errors very early
> in the environment modules initialization.
>
> The actual h_vmem value and the job are immaterial. For example, if the
> job is a shell script that consists of the command "date", it succeeds
> when h_vmem is not set, but fails when h_vmem is set to 1G (or 15G).
>
> SGE correctly dispatches the job to a compute node, and the job runs--but
> the environment isn't initialized correctly. For something as trivial as
> "date", the job works, but for scientific processing that depends on
> the environment modules initialization, jobs fail.
>
> The first part of the STDERR is very odd; it is an error message like:
>
> id: cannot find name for group ID 40193

SGE assigns an additional group ID to each job to track usage. It looks like the definition of a particular modules setting tries to resolve all group IDs it finds attached. Is this error only there for failing jobs, or for all jobs? (I would assume the latter.)

==

If h_vmem is set, some applications need h_stack set too. Often a value of 128M or 256M enables the application to run again.

-- Reuti

> where the group ID number varies, always above 40000. That's the group_id
> range assigned in the qmaster.
>
> None of the shell initialization scripts use the GID, and they all succeed
> when "h_vmem" is not specified.
>
>
> Any suggestions of the next step in troubleshooting this issue?
>
> Thanks,
>
> Mark
>
> [1] http://modules.sourceforge.net/
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
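
One rough way to check the point above about the additional group ID is to list, from inside a job, which of the process's group IDs have no entry in the group database. The sketch below is only an illustration: the function name `unresolved_group_ids` and its `lookup` parameter are made up for this example, and it assumes the job's additional group ID shows up among the supplementary groups returned by `os.getgroups()`.

```python
import grp
import os


def unresolved_group_ids(gids=None, lookup=grp.getgrgid):
    """Return the group IDs in `gids` that have no name in the group database.

    `unresolved_group_ids` and `lookup` are hypothetical names for this sketch.
    """
    if gids is None:
        # Supplementary groups of the current process. Assumption: inside an
        # SGE job this list includes the additional group ID used for tracking.
        gids = os.getgroups()
    missing = []
    for gid in gids:
        try:
            lookup(gid)
        except KeyError:
            # No group entry for this GID, the case `id` complains about.
            missing.append(gid)
    return sorted(missing)


if __name__ == "__main__":
    for gid in unresolved_group_ids():
        print(f"no group name for GID {gid}")
```

If this is run once in a job submitted with h_vmem and once in a job without it, and the same unnamed GID from the qmaster range appears in both, the `id` message is probably harmless and the actual failure lies elsewhere, for example in the stack limit.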
