Nadav,
  I will pick up on the points regarding distributing the files.
The default is to use NFS shares. Yes, I do appreciate your points
regarding performance of NFS.
However if you have 10Gbps Ethernet then you should look at tuning the
network parameters for 10Gbps.
Also NFS over RDMA is said to work, so if you have the network/interface
cards for this, it is an option.
Also just ask your systems guy to look at the parameters for NFS anyway -
large rsize and wsize settings,
mounts with noatime and async mounts. It is really surprising how much
performance you gain just by setting them.
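
As a quick way to see what a node is actually using today, here is a
minimal Python sketch that lists the options of the NFS mounts found in
/proc/mounts. The function name nfs_mount_options is my own invention, and
it assumes a Linux node where /proc/mounts has the usual
"device mountpoint fstype options dump pass" layout.

```python
from pathlib import Path


def nfs_mount_options(mounts_text):
    """Return {mountpoint: {option: value}} for NFS entries.

    mounts_text is text in /proc/mounts format. Options without a value
    (e.g. noatime) are stored as True.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Only look at NFS client mounts (fstype "nfs" or "nfs4").
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")
            opts[key] = value if value else True
        result[fields[1]] = opts
    return result


if __name__ == "__main__":
    mounts = Path("/proc/mounts")
    if mounts.exists():
        for mountpoint, opts in nfs_mount_options(mounts.read_text()).items():
            print(mountpoint,
                  "rsize:", opts.get("rsize"),
                  "wsize:", opts.get("wsize"),
                  "noatime:", opts.get("noatime", False))
```

If rsize and wsize come back small, or noatime is missing, that is the
first thing to raise with whoever manages the mounts.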


The second thing to discuss is 'data staging' - i.e. automatic transfer of
files at the start of the job to a local storage area,
then transfer back at the end of the job.
The local storage area on the node could be a partition on the local hard
drive, an SSD or a RAMdisk area.
I had quite an extensive thread on this topic on this list about six months
ago. Surprisingly, to me, only Cray systems
seem to be actively supported here.
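
If there is no built-in support on your system, staging can be done by
hand inside the job itself. Below is a rough Python sketch of the idea:
copy the input to node-local scratch, run the work there, copy the results
back, and clean up. The name run_staged and the scratch_root parameter are
hypothetical, this is not a scheduler feature, and it needs Python 3.8 or
later for dirs_exist_ok.

```python
import shutil
import tempfile
from pathlib import Path


def run_staged(input_dir, output_dir, job, scratch_root=None):
    """Stage input_dir to local scratch, run job, stage results back.

    job is any callable taking (local_input_dir, local_output_dir).
    scratch_root is where the scratch area is created (local disk, SSD
    or RAMdisk mount); None means the system temporary directory.
    """
    scratch = Path(tempfile.mkdtemp(prefix="stage_", dir=scratch_root))
    try:
        local_in = scratch / "in"
        local_out = scratch / "out"
        shutil.copytree(input_dir, local_in)       # stage in
        local_out.mkdir()
        job(local_in, local_out)                   # work on local storage
        # Stage out; merges into output_dir if it already exists.
        shutil.copytree(local_out, output_dir, dirs_exist_ok=True)
    finally:
        # Always free the local scratch space, even if the job failed.
        shutil.rmtree(scratch, ignore_errors=True)
```

In a real job you would point scratch_root at whatever local area the node
provides and call run_staged from the job script.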

Thirdly we come onto parallel filesystems. These are quite mature now, and
can be easily deployed.
I am familiar with Panasas (proprietary hardware), Lustre, GPFS (Spectrum
Scale), BeeGFS and Gluster.
(I'll count Gluster as a parallel filesystem).
These gain their performance by scaling out over several storage targets,
and can scale hugely.
You can start with one storage server though.

My advice to you:
a) start by getting your existing NFS working better. Look at those network
tuning parameters, offloading on your NICs
    and the mount options.
    Heck, ask yourself - for the deep learning models I want to run, what
is the ratio of computation time to data moving/reading time?
    If the ratio is huge then you're OK. If the ratio is coming closer to
1:1 then you need to start optimising.

b) Look at setting up a single BeeGFS server.
    I admit to rather taking a shine to GPFS recently, and I find it a joy
to use. However I should imagine that you are wanting to
    accomplish this without licensed software?
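
On the question in (a) of how data moving/reading time compares with
computation time: a crude way to get a number is to time the two phases
separately over a few batches. This is only a sketch. load_batch and
compute_batch stand in for whatever your own code does, and the function
name time_io_and_compute is made up.

```python
import time


def time_io_and_compute(load_batch, compute_batch, n_batches):
    """Return (io_seconds, compute_seconds) summed over n_batches.

    load_batch(i) reads batch i from storage and returns it;
    compute_batch(batch) does the computation on it.
    """
    io_seconds = 0.0
    compute_seconds = 0.0
    for i in range(n_batches):
        t0 = time.perf_counter()
        batch = load_batch(i)
        t1 = time.perf_counter()
        compute_batch(batch)
        t2 = time.perf_counter()
        io_seconds += t1 - t0
        compute_seconds += t2 - t1
    return io_seconds, compute_seconds
```

Two caveats: a second run over the same files may be served from the page
cache and look much faster than the storage really is, and if the compute
step runs asynchronously on a GPU it has to be synchronised before the
timer is read, otherwise the compute time will be under-reported.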

On 25 September 2017 at 12:09, Diego Zuccato <diego.zucc...@unibo.it> wrote:

>
> Il 24/09/2017 12:10, Marcin Stolarek ha scritto:
>
> > So do I, however, I'm using sssd with AD provider joined into AD domain.
> > It's tricky and requires good sssd understanding, but it works... in
> > general.
> We are using PBIS-open to join the nodes. Quite easy to setup, just
> "sometimes" (randomly, but usually after many months) some machines lose
> the join.
> I couldn't make sssd work with our AD (I'm not an AD admin, I can only
> join machines, and there's no special bind-account).
>
> --
> Diego Zuccato
> Servizi Informatici
> Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
> mail: diego.zucc...@unibo.it
>
