Hi!

Thanks for the reply.

No, the job did not run. My launch command sets the verbose flag and -now no.
The first thing I get is
waiting for interactive job to be scheduled ...

Which is good. Then nothing happens. Later, when I kill the jobs, I see a
mix of the following popping up in my logs:
Your "qrsh" request could not be scheduled, try again later.
and
error: commlib error: got select error (No route to host)
and
error: commlib error: got select error (Connection timed out)

It's strange that this is only received after the kill.

From my experience at the terminal, qrsh can behave in a weird manner. When
I get an error message, the qrsh job is queued (and shown in qstat), but I
lose my handle on it.
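
The process-list check mentioned in the quoted message below ("ps -ef | grep qrsh") could be sketched roughly as follows. This is only an illustration: the function name find_qrsh_pids is hypothetical, and it assumes the usual Linux "ps -ef" layout with eight columns (UID PID PPID C STIME TTY TIME CMD), which may differ on other systems.

```python
def find_qrsh_pids(ps_output):
    """Return the PIDs of processes whose executable is qrsh.

    Assumes `ps -ef` style text: a header line, then eight
    whitespace-separated columns, the last being the full command.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # malformed or truncated line
        command = fields[7].split()
        # Compare the executable's base name, so that "grep qrsh"
        # itself is not reported.
        if command and command[0].rsplit("/", 1)[-1] == "qrsh":
            pids.append(int(fields[1]))
    return pids
```

The text passed in would typically come from something like subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout. Deciding which of the returned PIDs are actually stuck (for example, by comparing against qstat) is left out of this sketch.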

Regarding the dynamic cluster, my IPs are static for the duration of a
node's life. Nodes can be added and removed. Their IPs won't change in the
middle of a run. But say that node3 is added with an IP, then removed, then
added back: the IP will not be the same. Could that be the cause?

Thanks
Mich


On Wed, Nov 21, 2012 at 10:55 AM, Reuti <[email protected]> wrote:

> Hi,
>
> On 21.11.2012 at 16:10, François-Michel L'Heureux wrote:
>
> > I have an issue where some jobs I call with the qrsh command never
> appear in the queue. If I run the command "ps -ef | grep qrsh" I can see
> them. My setup
>
> Ok, but did it ever start on any node?
>
>
> > is as follows:
> >
> >       • I just have one process calling the grid engine via qrsh. This
> process resides on the master node.
> >       • I don't use nfs, I use sshfs instead.
> >       • I run over a dynamic cluster, which means that at any time nodes
> can be added or removed.
> > Does anyone have an idea of what can cause the issue? I can counter it
> by looking at the process list when the queue is empty and
> killing/rescheduling those running a qrsh command, but I would rather
> prevent it.
>
> What do you mean by "dynamic cluster"? SGE needs fixed addresses per node.
>
> -- Reuti
>
>
> > Thanks
> > Mich
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users
>
>