An update on this thread. It looks like Univa has identified the issue and a fix for the error.
I noticed these comments in the Son Of Gridengine v8.0.0c Release Notes:

* Increase default MAX_DYN_EC qmaster param [U]
* Fix qsub -sync y error message and enforce MAX_DYN_EC correctly [U]

I've found some related commits in the GitHub repo:

* https://github.com/gridengine/gridengine/commit/b449607972614e4608272d8c0fc6f109d35fbecc
* https://github.com/gridengine/gridengine/commit/a47c32f965111554ec076db1526a6ad62c5bdae5

Anyway, I think a solution might be to increase the MAX_DYN_EC parameter to 1000. I'm going to give that a try in our cluster.

Thanks,
Brad Dobbie

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Dobbie, Brad
Sent: Monday, October 10, 2011 9:12 AM
To: [email protected]
Cc: [email protected]
Subject: Re: [gridengine users] error with qsub -sync y

We occasionally get this at our site, and it is pretty catastrophic. Most users have switched to qrsh to avoid the bug. I agree that it happens in batches, for periods of 5-10 minutes. More posts describing the problem have started to show up on Google since I first ran into it about a year ago.

I tried adding the "-t 1" option to define a range_list, but that did not help. Most recently, I tried stopping the qmaster and starting it again. That seemed to fix the problem, but it's not a very good solution. We probably won't see the issue again for months.

It seems to depend on network load or NFS load, but I have no data to back up that claim. Our cluster uses RHEL5.4 and SGE 6.2u5.

Thanks,
Brad

On Oct 8, 2011, at 1:25 PM, Daniel Povey wrote:

> I have been getting occasional errors when using qsub -sync y. It prints out
> the error message:
>
> Unable to initialize environment because of error: range_list containes no
> elements
> Exiting.
>
> This is not reproducible, but seems to occur in batches. This is with GE
> 6.2R5.
> Looking for this online gives little information that is useful-- it seems to
> be a bug in qsub.
> Is anyone familiar? What is the best way to debug this? I don't have root
> on the machines concerned.
>
> Dan

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
