Hi all. A probably-not-so-uncommon question today, with (I suspect) a simple answer forthcoming.
I've currently got a situation where one of the storage arrays I'm using to share "big" NFS to my compute nodes is under significant 10GbE I/O strain; the array can't handle the concurrency I'm currently throwing at it. That got me contemplating having the queues "transfer" each job's working set from the shared storage to node-local /scratch. Each node has some decently speedy 15K SAS spindles inside it, so I thought it'd be nice to see whether we could reduce latency and contention on the 10GbE-connected array a little this way.

We found this: https://www.nbcr.net/pub/wiki/index.php?title=Reduce_I/O_bottleneck_by_using_compute_node_local_scratch_disks but I'm sure there is a lot more to it. I know of a configuration item I've seen called the "transfer" queue, but I've got a feeling it's got nothing to do with this, and is more a mechanism for programmatically forwarding jobs to other SGE queues et al. Looking for guidance on how we might programmatically enforce, at job start time, that jobs stage their working sets to node-local /scratch to increase efficiency (perhaps?). Thanks, all! --JC
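For what it's worth, one common pattern for this is a stage-in / stage-out job script: copy the working set from NFS to the job's node-local scratch directory in one sequential bulk transfer, run entirely against local disk, then copy results back. SGE normally points $TMPDIR at a per-job directory on local disk (controlled by the queue's tmpdir attribute, e.g. set to /scratch via qconf -mq). The sketch below uses hypothetical paths and a placeholder computation, and fakes the "NFS" source under a temp directory so it also runs outside SGE:

```shell
#!/bin/bash
# Sketch of a stage-in / stage-out SGE job script (hypothetical paths).
#$ -N stage_demo
#$ -cwd
set -e

# For this demo, fake the "NFS" source tree locally; on a real cluster
# SRC would be a project directory on the 10GbE-attached array.
SRC=$(mktemp -d)
echo "input data" > "$SRC/dataset.txt"

# SGE sets $TMPDIR to per-job node-local scratch; fall back to mktemp
# so the sketch is runnable on its own.
WORK=${TMPDIR:-$(mktemp -d)}/job_work
mkdir -p "$WORK/input" "$WORK/output"

# Stage in: one sequential bulk copy instead of many small random NFS
# reads while the job runs (rsync -a is the usual choice on a cluster).
cp -a "$SRC/." "$WORK/input/"

# Compute against local disk only (placeholder for the real application).
tr a-z A-Z < "$WORK/input/dataset.txt" > "$WORK/output/result.txt"

# Stage out: one sequential bulk copy of results back to shared storage.
mkdir -p "$SRC/results"
cp -a "$WORK/output/." "$SRC/results/"
cat "$SRC/results/result.txt"   # -> INPUT DATA

# Clean local scratch so the next job on this node starts fresh.
rm -rf "$WORK"
```

The stage-in/stage-out steps could also live in a queue prolog/epilog rather than each job script, which is one way to "enforce" the behavior cluster-wide instead of trusting users to do it.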
_______________________________________________ users mailing list [email protected] https://gridengine.org/mailman/listinfo/users
