On Wed, Jun 22, 2016 at 08:39:35AM +0000, sudha.penme...@wipro.com wrote:
> Hi,
>
> We have added the below qmaster params in the SGE configuration
>
> qmaster_params gdi_timeout=240 gdi_retries=-1 cl_ping=true
>
> Could you let me know the difference between gdi_timeout and gdi_retries. Why
> is there gdi_retries parameter? Why can't we use gdi_timeout alone to retry
> permanently like allowing an option -1 for gdi-timeout. I don't get the
> specific purpose of having extra parameter gdi_retries.
>

The difference is described in the manual page: gdi_timeout specifies how long
to wait between retries, and gdi_retries specifies how many times to retry. The
timeout setting prevents you from bombarding a slow server with repeated
requests, while the retries setting ensures that things will progress even if
the odd request gets lost for some reason. If a single magic value in
gdi_timeout meant "retry forever", there would be no way left to specify how
long to wait between retries.
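
To illustrate why the two settings are separate, here is a minimal sketch in
Python. It is not how grid engine implements GDI; call_with_retries, flaky and
the attempt callable are hypothetical names made up for this example. It
assumes that retries=0 means a single attempt with no retry and that retries=-1
means retry forever, and it treats the timeout as the wait applied to each
attempt.

```python
import itertools


def call_with_retries(attempt, timeout, retries):
    """Call attempt(timeout) until it succeeds or the retries run out.

    retries = 0  -> one attempt, no retry (assumed default behaviour)
    retries = N  -> one attempt plus up to N retries
    retries = -1 -> retry forever
    """
    # An endless counter stands in for "forever"; otherwise allow 1 + N tries.
    tries = itertools.count() if retries < 0 else range(retries + 1)
    last_error = None
    for _ in tries:
        try:
            # The timeout bounds how long this single attempt may wait.
            return attempt(timeout)
        except TimeoutError as err:
            last_error = err
    raise last_error


def flaky(fail_times):
    """Build a fake request that times out fail_times times, then succeeds."""
    calls = {"n": 0}

    def attempt(timeout):
        calls["n"] += 1
        if calls["n"] <= fail_times:
            raise TimeoutError(f"no reply within {timeout}s")
        return calls["n"]  # number of the attempt that finally succeeded

    return attempt


# Two lost requests are absorbed when retrying forever with a 240s timeout.
print(call_with_retries(flaky(2), timeout=240, retries=-1))
```

The wait per attempt and the number of attempts are independent arguments
here, so "retry forever" does not use up the value that says how long to wait.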

> Because when we have NFS latency issue we receive the error "failed receiving
> gdi request" but yet the job is submitted which is causing confusion.
>

My practice has been to keep the file system holding the grid-engine config
local to the qmaster and export it to the rest of the cluster via NFS, precisely
because the speed with which the qmaster accesses these file systems matters a
lot more than it does for other nodes. This does mean our current setup lacks a
shadow master, but one of my colleagues is currently setting up a pair of
servers with DRBD so we can support failover in the event of hardware failure.

William
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users