I agree with Hung-sheng.

If you are seeing NFS load issues, then you should switch to Local Spool Dirs.

Local spooling is very easy to set up; just follow the guide 
at: http://gridscheduler.sourceforge.net/howto/nfsreduce.html
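
As a rough sketch of what the change amounts to: I am assuming here that the 
spool location is controlled by the execd_spool_dir parameter in each host's 
configuration, and that /var/spool/sge is a local-disk directory that already 
exists on every node and is writable by the SGE admin user. The host names, 
the path and the helper function are made up for illustration; check the guide 
for the exact procedure on your version.

```python
# Hypothetical helper: host names and the spool path are illustrative
# assumptions, not values taken from any real cluster.
def local_spool_config(hosts, spool_dir="/var/spool/sge"):
    """Return {host: config_text} pointing execd_spool_dir at a local disk."""
    if not spool_dir.startswith("/"):
        raise ValueError("spool_dir must be an absolute path")
    return {host: f"execd_spool_dir {spool_dir}\n" for host in hosts}


if __name__ == "__main__":
    for host, text in local_spool_config(["node01", "node02"]).items():
        # qconf -mconf <host> opens that host's configuration in an editor.
        print(f"# paste into the editor opened by: qconf -mconf {host}")
        print(text, end="")
```

The execution daemon on each node would presumably need a restart before it 
starts using the new directory, so this is best done while the node is drained.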

While NFSv4 performs better than earlier versions, you need pNFS to 
handle really heavy load.

 -Ron




----- Original Message -----
From: Hung-sheng Tsao <[email protected]>
To: Chris Jewell <[email protected]>
Cc: "[email protected] Users" <[email protected]>
Sent: Sunday, May 6, 2012 2:22 PM
Subject: Re: [gridengine users] NFS spool dirs -- crash under heavy scheduling 
load

IMHO
Do local spool


Sent from my iPhone

On May 6, 2012, at 2:14 PM, Chris Jewell <[email protected]> wrote:

> Hi All,
> 
> Apologies for cross-posting -- not sure which list is the most active these 
> days…?
> 
> I'm currently having a real issue with our shared SGE_ROOT directory, which 
> also contains spool directories.  It is XFS-formatted on the server, which 
> also hosts the sgemaster daemon, and shared via NFSv4.
> 
> The cluster has 108 processors, spread over 11 execution nodes, wired up with 
> 1GE.  Under heavy fast scheduling (i.e. *large* task arrays of very short jobs) 
> we are experiencing server crashes: spinning rpciod and nfsd processes both 
> on clients and on the server cause very high loadavg, alarm states, sgeexecd 
> to go into uninterruptible sleep states, machines falling over etc etc.
> 
> I would have thought that the NFSv4 shared directory would cope with this 
> load, since the cluster is not massive.  However, we have our scheduling 
> delay set to 0, so I'm wondering if this is causing the issue.  I'd like to 
> check your collective experience on this one, before changing the cluster 
> config to use local spool dirs.
> 
> Many thanks,
> 
> Chris
> --
> Dr Chris Jewell
> Department of Statistics
> University of Warwick
> Coventry
> CV4 7AL
> UK
> Tel: +44 (0)24 7615 0778
> 
> 
> 
> 
> 
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
