On 07.11.2012 at 15:46, Petter Gustad wrote:

>> From: Reuti <[email protected]>
>> Subject: Re: [gridengine users] Configure gridengine on CentOS 6.3
>> Date: Tue, 30 Oct 2012 11:27:49 +0100
>>
>>> Just use the version you have already in the shared /usr/sge or your
>>> particular mountpoint.
>>
>> I should probably try this first, at least to verify that it's
>> working. But later I would like to migrate to the CentOS on all my
>> exechosts and leave the installation to somebody else.
>
> I did this and it worked out fine on the first machine I migrated.
> However, on the next set of machines I run into the problem where the
> submitted job will cause the queue to go into the error state.
>
> I observe that:
>
> 1) It will not be submitted
> 2) The queue will be marked with the 'E' state
> 3) I get an e-mail saying
>    Shepherd pe_hostfile:
>    node 1 queue@node UNDEFINED
> 4) The node will log the following in the spool/node/messages file:
>    11/07/2012 15:33:07| main|node|E|shepherd of job 48548.1 exited with exit
>    status = 11
> 5) qstat -j jobnumber returns
>
> error reason 1: 11/07/2012 15:33:06 [555:29681]: unable to
> find job file "/work/gridengine/spool/node/job_scr

This looks like an unusual path for the spool directory. The name of the node should be included.

$ qconf -sconf

(at the top something like: execd_spool_dir /var/spool/sge; the directory for the particular node will be created automatically when the execd starts up)

$ qconf -sconfl

(get all exechost definitions [if any are present at all]), then for the particular node:

$ qconf -sconf node42

and check the path to the execd_spool_dir.

-- Reuti

> scheduling info: (Collecting of scheduler job information is
> turned off)
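
The checks above could be scripted roughly as follows. This is only a sketch: the function names `spool_dir_from_conf` and `report_spool_dirs` are made up for illustration, and it assumes that `qconf -sconf` prints one `key value` pair per line and that `qconf -sconfl` prints one host name per line.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Grid Engine.

# Read `qconf -sconf`-style output on stdin and print the value of
# execd_spool_dir (assumes "key <whitespace> value", one setting per line).
spool_dir_from_conf() {
    awk '$1 == "execd_spool_dir" { print $2; exit }'
}

# Print the global execd_spool_dir, then the value from each local
# (per-host) configuration. An empty value means the host does not
# override the global setting.
report_spool_dirs() {
    printf 'global: %s\n' "$(qconf -sconf | spool_dir_from_conf)"
    for host in $(qconf -sconfl 2>/dev/null); do
        printf '%s: %s\n' "$host" "$(qconf -sconf "$host" | spool_dir_from_conf)"
    done
}

# Only query the cluster when qconf is actually available.
if command -v qconf >/dev/null 2>&1; then
    report_spool_dirs
fi
```

A host whose line shows a path different from the global one is the first place to look when the shepherd cannot find the job file.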
