On Tue, Jan 23, 2018 at 06:22:28PM -0600, Calvin Dodge wrote:
> The docs we've found say that gid_range must be greater than the
> number of jobs expected to run currently on one host.
>
> Our recent experience suggests that it has to be greater than the
> total number of jobs in the queue. If it's not, then a few jobs get
> mysteriously killed (typically about 1 in 30-40).
>
> Has anyone else had that experience? We did fix this by expanding the
> range (it was the default of 20000-20100, which we changed to
> 20200-21000), but would like to know if there's a "best practice"
> regarding the range of values.

Queued jobs shouldn't make a difference. There may be some sort of race in which Grid Engine holds on to the gid while it runs the epilog (I'm not sure; I haven't checked exactly when the group is deallocated). Making the range twice the number of jobs running concurrently on a host should cover this, unless your epilog is getting stuck for some reason.

William
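
As a rough sizing sketch of the "twice the number of jobs" rule of thumb above: the helper below is hypothetical (not part of Grid Engine), and it assumes the range is inclusive at both ends and that the chosen GIDs are otherwise unused on the hosts. The starting GID and the factor of 2 are only illustrative defaults.

```python
def suggest_gid_range(max_jobs_per_host, first_gid=20000, safety_factor=2):
    """Return a 'first-last' string sized for the rule of thumb.

    Hypothetical helper: reserves safety_factor GIDs per concurrently
    running job on one host, assuming an inclusive range.
    """
    if max_jobs_per_host < 1:
        raise ValueError("max_jobs_per_host must be at least 1")
    if safety_factor < 1:
        raise ValueError("safety_factor must be at least 1")
    count = max_jobs_per_host * safety_factor
    # Inclusive range: 'count' GIDs starting at first_gid.
    last_gid = first_gid + count - 1
    return f"{first_gid}-{last_gid}"


if __name__ == "__main__":
    # Example: a host that runs at most 50 jobs at once.
    print(suggest_gid_range(50))
```

The resulting string would then be set as gid_range in the cluster configuration; check the result against the GIDs already in use at your site before applying it.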
