Re: [gridengine users] Some generic questions: binding, parallel, over-subscription

2013-11-01 Thread Reuti
Oh dear, while dealing with the checkpoint/h_rt issue yesterday I found this without a reply. On 14.12.2012 at 17:43, Dave Love wrote: Reuti re...@staff.uni-marburg.de writes: I'd have said they're not the same generally. The reserved granted resources are a subset of the requested ones. Even an

Re: [gridengine users] Jobs stuck in qw with bizarre (to me) error message

2013-11-01 Thread Reuti
On 01.11.2013 at 14:39, Sylvain Foisy Ph. D. wrote: Hi Reuti, Everything seems to be working fine now. My $SGE_ROOT is located on a SAN volume, connected to my cluster via NFS. Could network saturation issues cause this type of behaviour? Yes. It's best to have all spool

[gridengine users] Debugging *really* long scheduling runs

2013-11-01 Thread Joshua Baker-LePain
I'm currently running Grid Engine 2011.11p1 on CentOS-6. I'm using classic spooling to a local disk, local $SGE_ROOT (except for $SGE_ROOT/$SGE_CELL/common), and local spooling directories on the nodes (of which there are more than 600). I'm occasionally seeing *really* long scheduling runs

Re: [gridengine users] Queue limit s_rt / h_rt and CheckPoint

2013-11-01 Thread Joseph Farran
Hi Reuti. Yes, after going through the logs, the subsequent restarts are messed up. I've played with it more and there is no easy way to do this inside the job submission script, so I will have to resort (as you indicated) to using an outside script that runs periodically and does a qsub -sj job /

Re: [gridengine users] Queue limit s_rt / h_rt and CheckPoint

2013-11-01 Thread Reuti
Hi, On 01.11.2013 at 19:18, Joseph Farran wrote: Yes, after going through the logs, the subsequent restarts are messed up. I've played with it more and there is no easy way to do this inside the job submission script, Inside the submission script it's possible - I thought you were

Re: [gridengine users] Queue limit s_rt / h_rt and CheckPoint

2013-11-01 Thread Reuti
On 01.11.2013 at 19:29, Reuti wrote: Hi, On 01.11.2013 at 19:18, Joseph Farran wrote: Yes, after going through the logs, the subsequent restarts are messed up. I've played with it more and there is no easy way to do this inside the job submission script, Inside the submission