An update on this thread.

It looks like Univa has identified the issue and a fix for the error.

I noticed these comments in the Son of Grid Engine v8.0.0c release notes:
  * Increase default MAX_DYN_EC qmaster param [U]
  * Fix qsub -sync y error message and enforce MAX_DYN_EC correctly [U]

I've found some related commits in the GitHub repo:
  * https://github.com/gridengine/gridengine/commit/b449607972614e4608272d8c0fc6f109d35fbecc
  * https://github.com/gridengine/gridengine/commit/a47c32f965111554ec076db1526a6ad62c5bdae5

Anyway, I think a solution might be to increase the MAX_DYN_EC qmaster 
parameter to 1000.  I'm going to give that a try on our cluster.
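
A minimal sketch of that change, assuming MAX_DYN_EC is set through the 
qmaster_params line of the global cluster configuration as described in 
sge_conf(5).  The value 1000 is the one proposed above; whether a given 
release enforces it correctly is not something this sketch verifies.

```
# Show the current setting (prints nothing if qmaster_params is unset):
#   qconf -sconf | grep qmaster_params
#
# Open the global configuration in $EDITOR (requires manager rights):
#   qconf -mconf
#
# Then set the line below, keeping any entries already on it:
qmaster_params               MAX_DYN_EC=1000
```

Running the qconf -sconf command again afterwards should show whether the 
new value was saved.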

Thanks,
Brad Dobbie

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Dobbie, Brad
Sent: Monday, October 10, 2011 9:12 AM
To: [email protected]
Cc: [email protected]
Subject: Re: [gridengine users] error with qsub -sync y

We occasionally get this at our site, and it is pretty catastrophic.  Most 
users have switched to qrsh to avoid the bug.  I agree that it happens in 
batches, for periods of 5-10 minutes.  More posts describing the problem have 
been showing up on Google since I first ran into it about a year ago.

I tried adding the "-t 1" option to define a range_list, but that did not help.

Most recently, I tried stopping the qmaster and starting it again.  That seemed 
to fix the problem, but that's not a very good solution.  We probably won't see 
the issue again for months.  It seems to depend on network load or NFS load, 
but I have no data to back up that claim.
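
A less disruptive stopgap than restarting the qmaster might be to retry the 
submission while the error persists.  The sketch below is only an 
illustration: retry_on_range_list is a hypothetical helper, not part of Grid 
Engine, and it assumes the error is raised before the job is actually 
submitted (as the "Unable to initialize environment" wording suggests), so 
that a retry does not submit the job twice.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): run a command and retry it
# when its stderr mentions "range_list".
# Usage: retry_on_range_list MAX_TRIES DELAY_SECONDS command [args...]
retry_on_range_list() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    errfile=$(mktemp) || return 1
    while :; do
        # Capture stderr so it can be inspected, then pass it through.
        if "$@" 2>"$errfile"; then
            status=0
        else
            status=$?
        fi
        cat "$errfile" >&2
        # Stop unless this attempt hit the transient error.
        if ! grep -q 'range_list' "$errfile"; then
            break
        fi
        # Give up once the attempt limit is reached.
        if [ "$try" -ge "$max_tries" ]; then
            break
        fi
        try=$((try + 1))
        sleep "$delay"
    done
    rm -f "$errfile"
    return "$status"
}
```

For example, "retry_on_range_list 5 60 qsub -sync y job.sh" would try up to 
five times, one minute apart (job.sh is a placeholder script name).  The 
function returns the exit status of the last attempt.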

Our cluster uses RHEL5.4 and SGE 6.2u5.

Thanks,
Brad

On Oct 8, 2011, at 1:25 PM, Daniel Povey wrote:

> I have been getting occasional errors when using qsub -sync y.  It prints out 
> the error message:
> 
> Unable to initialize environment because of error: range_list containes no 
> elements
> Exiting.
> 
> This is not reproducible, but seems to occur in batches.  This is with GE 
> 6.2R5.
> Looking for this online gives little information that is useful-- it seems to 
> be a bug in qsub.
> Is anyone familiar?  What is the best way to debug this?  I don't have root 
> on the machines concerned.
> 
> Dan


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users

