I do however observe a sporadic issue with qlogin/qrsh ...

$ qlogin -l h=gpu-v100
Your job 1247919 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 1247919 has been successfully scheduled.

Your interactive job 1247919 has been successfully scheduled.

^Cerror: error while waiting for builtin IJS connection: "got select timeout"

Are all nodes connected in the same way to the machine you issued the `qlogin` on? Is the firewall setting identical on all computing nodes?

-- Reuti

... this goes on forever until I type Ctrl-C, as I otherwise never get a prompt, while qstat says I have a running job ...

$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
1247919 0.25000 QLOGIN     sylvain      r     04/05/2018 11:12:10 gp...@gpu-v100.bic.mni.mcgill.     1

qrsh behaves the same except that it is silent.

If I try a different node, I get success, which suggests it is perhaps a configuration issue, but I just can't get my head around it. Any ideas on how to troubleshoot this? Or perhaps someone on this list has already faced this issue?

PS - just to be sure I disabled all quota restrictions

There is no firewall on the nodes - they're on a private subnet.
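
To double-check connectivity in both directions, something like the following could be run. This is only a rough sketch and not part of SGE: the function names (resolves, port_open, report) are made up for illustration, it assumes bash (for /dev/tcp), getent and the coreutils timeout command, and the port you pass is whatever you want to probe. As far as I understand the builtin IJS, the execution host connects back to the qlogin client on the submit host, so the direction from the failing node to the submit host is probably the interesting one.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not SGE tools): check name resolution and TCP reachability.

# Succeeds if the host name resolves through the system resolver (NSS).
resolves() {
    getent hosts "$1" > /dev/null
}

# Succeeds if a TCP connection to host $1, port $2 can be opened
# within $3 seconds (default 5). Uses bash's /dev/tcp pseudo-device.
port_open() {
    timeout "${3:-5}" bash -c "exec 3<>/dev/tcp/$1/$2" 2> /dev/null
}

# Prints a one-line verdict for host $1, port $2; returns non-zero on failure.
report() {
    if ! resolves "$1"; then
        echo "$1: does not resolve"
        return 1
    fi
    if port_open "$1" "$2"; then
        echo "$1:$2 reachable"
    else
        echo "$1:$2 NOT reachable"
        return 1
    fi
}
```

Running report with the submit host's name and a port of your choice on a working node and on the failing node, and comparing the two results, should show whether the nodes really are connected in the same way.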

I do otherwise observe some network lag for that particular system (and another for which I have the same issue). I'll find the cause for this and try again once the lag has been resolved.

This lag is perhaps causing a timeout for the qlogin connection ?

I found and fixed the network lag issue, which was related to these two systems being KVM guests on
an Ubuntu 16.04 server.


Yet the qlogin/qrsh problem persists ... any ideas?


Sylvain Milot (sylv...@bic.mni.mcgill.ca)
Brain Imaging Centre
Montreal Neurological Institute
3801 University Street, Webster 2B, Room 206
Montreal, Qc., Canada, H3A 2B4
Phone  : (514) 398-4965, Fax: 398-8948
Mobile : (514) 712-1768
Office : Room NW119 (North Wing)
SGE-discuss mailing list
