Nadav, PS: how low is your NFS performance compared with local files? I bet if you looked at the NFS networking parameters you would get a good performance boost.
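One rough way to put a number on that question is to time a sequential read of the same file from the NFS mount and from local disk. The sketch below is only an illustration: the function names and the idea of passing two paths on the command line are my own assumptions, not something from this thread, and the page cache will flatter whichever copy was read recently, so use a file larger than RAM or a freshly written copy.

```python
import sys
import time


def read_throughput_mb_s(path, block_size=1024 * 1024):
    """Read `path` sequentially and return the observed throughput in MB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    if total == 0:
        raise ValueError(f"{path} is empty, nothing to measure")
    return (total / 1e6) / max(elapsed, 1e-9)


def compare(nfs_path, local_path):
    """Measure both copies and report how many times faster the local one is."""
    nfs = read_throughput_mb_s(nfs_path)
    local = read_throughput_mb_s(local_path)
    return {"nfs_mb_s": nfs, "local_mb_s": local, "local_over_nfs": local / nfs}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python nfs_vs_local.py /nfs/share/data.bin /scratch/data.bin
    for key, value in compare(sys.argv[1], sys.argv[2]).items():
        print(f"{key}: {value:.1f}")
```

If `local_over_nfs` comes out well above 1 on a 10Gbps link, that points at the mount options and network tuning rather than at NFS as such.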
May we ask what kind of network links the compute servers?

On 25 September 2017 at 15:40, John Hearns <hear...@googlemail.com> wrote:

> Nadav,
> I will pick up on the points regarding distributing the files.
> The default approach is to use NFS shares. Yes, I do appreciate your
> points regarding the performance of NFS.
> However, if you have 10Gbps Ethernet then you should look at tuning the
> network parameters for 10Gbps.
> Also, NFS over RDMA is said to work, so if you have the network and
> interface cards for it, this is an option.
> Also just ask your systems guy to look at the parameters for NFS anyway -
> large rsize and wsize settings, mounts with noatime, and async mounts.
> It is really surprising how much performance you gain just by setting
> them.
>
> The second thing to discuss is 'data staging' - i.e. automatic transfer
> of files at the start of the job to a local storage area, then transfer
> back at the end of the job.
> The local storage area on the node could be a partition on the local
> hard drive, an SSD or a RAMdisk area.
> I had quite an extensive thread on this topic on this list about six
> months ago. Surprisingly, to me, only Cray systems seem to be actively
> supported here.
>
> Thirdly we come onto parallel filesystems. These are quite mature now,
> and can be easily deployed.
> I am familiar with Panasas (proprietary hardware), Lustre, GPFS
> (Spectrum Scale), BeeGFS and Gluster.
> (I'll count Gluster as a parallel filesystem.)
> These gain their performance by scaling out over several storage
> targets, and can scale hugely.
> You can start with one storage server though.
>
> My advice to you:
>
> a) Start by getting your existing NFS working better. Look at those
>    network tuning parameters, offloading on your NICs and the mount
>    options.
>    Heck, ask yourself - for the deep learning models I want to run, what
>    is the ratio of computation time to data moving/reading time?
>    If the ratio is huge then you're OK.
>    If the ratio is coming closer to 1:1 then you need to start
>    optimising.
>
> b) Look at setting up a single BeeGFS server.
>    I admit to rather taking a shine to GPFS recently, and I find it a
>    joy to use. However, I should imagine that you want to accomplish
>    this without licensed software?
>
> On 25 September 2017 at 12:09, Diego Zuccato <diego.zucc...@unibo.it>
> wrote:
>
>> On 24/09/2017 12:10, Marcin Stolarek wrote:
>>
>> > So do I, however, I'm using sssd with AD provider joined into AD domain.
>> > It's tricky and requires good sssd understanding, but it works... in
>> > general.
>> We are using PBIS-open to join the nodes. Quite easy to set up, just
>> "sometimes" (randomly, but usually after many months) some machines lose
>> the join.
>> I couldn't make sssd work with our AD (I'm not an AD admin, I can only
>> join machines, and there's no special bind-account).
>>
>> --
>> Diego Zuccato
>> Servizi Informatici
>> Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
>> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
>> tel.: +39 051 20 95786
>> mail: diego.zucc...@unibo.it