On 21.11.2012 at 17:28, François-Michel L'Heureux wrote:

> Hi!
>
> Thanks for the reply.
>
> No, the job did not run. My launch command sets the verbose flag and -now
> no. The first thing I get is
> waiting for interactive job to be scheduled ...
Yep, with -now n it will wait until resources are available. The default behavior would be to fail more or less instantly.

> Which is good. Then nothing happens. Later, when I kill the jobs, I see a mix
> of messages popping up in my logs:
> Your "qrsh" request could not be scheduled, try again later.
> and
> error: commlib error: got select error (No route to host)
> and

Is there a route to the host?

> error: commlib error: got select error (Connection timed out)
>
> It's strange that these are only received after the kill.
>
> From my terminal experience, qrsh can behave in a weird manner. When I get an
> error message, the qrsh job is queued (and shown in qstat), but I lose my
> handle on it.
>
> Regarding the dynamic cluster, my IPs are static for the lifetime of a node.
> Nodes can be added and removed. Their IPs won't change in the middle of
> a run. But say that node3 is added with an IP, then removed, then added back:
> the IP will not be the same. Might that be the cause?

For SGE it would be a different node then, with a different name. What's the reason for adding and removing nodes?

-- Reuti

> Thanks
> Mich
>
>
> On Wed, Nov 21, 2012 at 10:55 AM, Reuti <[email protected]> wrote:
> Hi,
>
> On 21.11.2012 at 16:10, François-Michel L'Heureux wrote:
>
> > I have an issue where some jobs I call with the qrsh command never appear
> > in the queue. If I run the command "ps -ef | grep qrsh" I can see them.
> > My setup
>
> Ok, but did it ever start on any node?
>
> > is as follows:
> >
> > • I just have one process calling the grid engine via qrsh. This
> > process resides on the master node.
> > • I don't use NFS, I use sshfs instead.
> > • I run over a dynamic cluster, which means that at any time nodes can
> > be added or removed.
> > Does anyone have an idea what could cause the issue? I can counter it by
> > looking at the process list when the queue is empty and
> > killing/rescheduling those running a qrsh command, but I would rather
> > prevent it.
>
> What do you mean by "dynamic cluster"? SGE needs fixed addresses per node.
>
> -- Reuti
>
> > Thanks
> > Mich
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users
