With 600 nodes, what are your interconnect switches: 1GbE, 10GbE, or IB?
What is your NFS storage, and how does the NAS server connect to the switch?
At a minimum, you should tar and compress the 5 files before copying them
back to the NFS server.
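
A minimal sketch of that bundling step, assuming Python is available on the compute nodes. The function name and the file names in the usage comment are hypothetical. It packs the job's output files into one gzip-compressed tar archive, so each node writes a single file to the NFS share instead of five.

```python
import tarfile
from pathlib import Path


def bundle_outputs(files, archive_path):
    """Pack the given files into one gzip-compressed tar archive.

    Returns the path of the archive that was written.
    """
    archive_path = Path(archive_path)
    with tarfile.open(archive_path, "w:gz") as tar:
        for f in files:
            f = Path(f)
            # Store only the file name, not the node-local directory.
            tar.add(f, arcname=f.name)
    return archive_path


# Hypothetical usage on a compute node (paths are illustrative only):
#   outputs = ["/tmp/job/out1.dat", "/tmp/job/out2.dat"]
#   bundle_outputs(outputs, "/tmp/job/results.tar.gz")
```

Writing the archive to node-local disk first and then copying that one file keeps the number of NFS create and close operations per job at one.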
If possible, set up a two-stage copy. Each rack may need its own file server:
first copy to that rack's NAS server (this assumes you have a 48-port GigE
switch per rack), then later copy from there back to the central NAS server.
These NAS servers should have 10GbE or IB connections.
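
A rough sketch of the two-stage idea, assuming both the rack-level server and the central NAS are reachable as mounted directories. All function names and paths here are hypothetical; a real setup might use rsync or scp between servers instead.

```python
import shutil
from pathlib import Path


def stage_to_rack(archive, rack_dir):
    """Stage 1: copy one node's archive to the rack-local file server."""
    rack_dir = Path(rack_dir)
    rack_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the path of the newly created file.
    return Path(shutil.copy2(archive, rack_dir))


def drain_to_central(rack_dir, central_dir):
    """Stage 2: move every staged file from the rack server to the central NAS."""
    central_dir = Path(central_dir)
    central_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for item in sorted(Path(rack_dir).iterdir()):
        if item.is_file():
            dest = central_dir / item.name
            shutil.move(str(item), str(dest))
            moved.append(dest)
    return moved


# Hypothetical usage (mount points are illustrative only):
#   stage_to_rack("/tmp/job/results.tar.gz", "/mnt/rack07/staging")
#   drain_to_central("/mnt/rack07/staging", "/mnt/central/results")
```

Stage 1 runs on each compute node when its job ends. Stage 2 runs later on the rack server, so the central NAS sees one writer per rack rather than one per node.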
My 2c

Hung-Sheng Tsao ( LaoTsao) Ph.D

On Mar 9, 2011, at 3:36 AM, HAUTREUX Matthieu <[email protected]> wrote:

> Hi,
> 
> You should first try to identify the root cause of the slowness. You said 
> that you are writing (5 * 600) files simultaneously. Some simple questions 
> to start with:
> - What is the total size of the data written to the NFS server by the jobs?
> - What network link connects you to the NFS server?
> - What is the speed of the storage backend behind the NFS server?
> - What type of NFS server are you using?
> - Have you done any tuning to make it work better in parallel?
> 
> Different file systems exist for different purposes; perhaps NFS is 
> not the best FS for what you are doing right now.
> 
> HTH
> Matthieu
> 
> Paul Thirumalai wrote:
>> Hi All
>> I have a job which I launch on a remote node using Slurm. This job generates 
>> 5 files which I want to move back to the server node.
>> All the nodes mount an NFS share. However, when I have more than 600 
>> nodes copying files to the NFS share, it causes a lot of slowness.
>> 
>> Is there a way I could use Slurm to transfer the files back to the server 
>> node?
>> 
>> Thanks in advance.
>> 
>> -Paul
> 
