Hi,

You should first try to identify the root cause of the slowness. You said that you are writing (5 * 600) files simultaneously. Some simple questions to start with:
- What is the total size of the data written to the NFS server by the jobs?
- What network link connects the nodes to the NFS server?
- How fast is the storage backend behind the NFS server?
- What type of NFS server are you using?
- Have you done any tuning to make it work better in parallel?
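
To show why the first two questions matter, here is a rough back-of-the-envelope sketch in Python. The file size (100 MB) and the link speed (1 Gbit/s) are made-up assumptions, not figures from your setup, and the function name is only illustrative. It gives a lower bound on the time needed to push everything through a single server link, ignoring NFS overhead and disk speed.

```python
def transfer_seconds(nodes, files_per_node, file_size_mb, link_gbit_per_s):
    """Lower bound on the time to push all files through one server link."""
    # Total volume written by all jobs, in MB.
    total_mb = nodes * files_per_node * file_size_mb
    # Convert the link speed from Gbit/s to MB/s (decimal units).
    link_mb_per_s = link_gbit_per_s * 1000 / 8
    return total_mb / link_mb_per_s


# Hypothetical numbers: 600 nodes, 5 files each, 100 MB per file, 1 Gbit/s link.
print(transfer_seconds(600, 5, 100, 1))  # 2400.0 seconds, i.e. 40 minutes
```

If the estimate with your real numbers is already close to what you observe, the link itself is likely the bottleneck and no Slurm-side trick will help much.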

Different file systems exist for different purposes; NFS may not be the best file system for what you are doing right now.

HTH
Matthieu

Paul Thirumalai wrote:
Hi All
I have a job which I launch on a remote node using Slurm. This job generates 5 files which I want to move back to the server node. All the nodes have an NFS share mounted. However, when I have > 600 nodes copying files to the NFS share, it causes a lot of slowness.

Is there a way I could use Slurm to transfer the files back to the server node?

Thanks in advance.

-Paul
