[slurm-dev] Re: defaults, passwd and data

John Hearns Mon, 25 Sep 2017 06:44:10 -0700

Nadav, PS: how low is your NFS performance versus local files?
I bet if you looked at the NFS networking parameters you would get a good
performance boost.


May we ask what network links the compute servers?
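
To put a number on "NFS versus local files", here is a rough sketch that times a sequential read of the same file from two locations. The function names and the 1 MiB block size are my own choices for illustration. Bear in mind the page cache will flatter repeat runs, so use a file larger than RAM or a freshly written one.

```python
import time


def read_throughput_mb_s(path, block_size=1024 * 1024):
    """Read `path` sequentially and return the observed throughput in MB/s."""
    total_bytes = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as handle:
        while True:
            chunk = handle.read(block_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return (total_bytes / 1e6) / elapsed


def compare(paths):
    """Return {path: MB/s} for each path, e.g. one NFS copy and one local copy."""
    return {path: read_throughput_mb_s(path) for path in paths}
```

Calling compare() with one copy of a dataset file on the NFS share and one copy on local disk gives a first, crude comparison.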

On 25 September 2017 at 15:40, John Hearns <hear...@googlemail.com> wrote:

> Nadav,
>   I will pick up on the points regarding distributing the files.
> The default approach is to use NFS shares. Yes, I do appreciate your points
> regarding the performance of NFS.
> However, if you have 10Gbps Ethernet then you should look at tuning the
> network parameters for 10Gbps.
> Also, NFS over RDMA is said to work, so if you have the network and interface
> cards for this, it is an option.
> Also just ask your systems guy to look at the parameters for NFS anyway -
> large rsize, wsize settings,
> mounts with noatime and async mounts. It is really surprising how much
> performance gain you get just by doing them.
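
A quick way to see what the compute nodes are actually mounting with is to parse /proc/mounts. This is a minimal sketch: the function names and the 1 MiB threshold are assumptions of mine, not recommended values, and it only reports rsize/wsize and lets you look up noatime.

```python
def parse_nfs_mounts(mounts_text):
    """Return {mount_point: {option: value_or_True}} for NFS entries.

    `mounts_text` is the content of /proc/mounts (device, mount point,
    filesystem type, comma-separated options, then two numeric fields).
    """
    mounts = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")
            options[key] = value if value else True
        mounts[fields[1]] = options
    return mounts


def small_transfer_sizes(mounts, min_size=1048576):
    """List (mount_point, option, value) where rsize or wsize is below min_size."""
    findings = []
    for mount_point, options in mounts.items():
        for key in ("rsize", "wsize"):
            value = options.get(key)
            if isinstance(value, str) and value.isdigit() and int(value) < min_size:
                findings.append((mount_point, key, int(value)))
    return findings
```

On a node, feed it the text of /proc/mounts and check whether "noatime" appears in the options of each NFS mount.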
>
>
> The second thing to discuss is 'data staging' - i.e. automatic transfer of
> files at the start of the job to a local storage area,
> then transfer back at the end of the job.
> The local storage area on the node could be a partition on the local hard
> drive, an SSD or a RAMdisk area.
> I had quite an extensive thread on this topic on this list about six
> months ago. Surprisingly, to me, only Cray systems
> seem to be actively supported here.
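
If there is no scheduler-level staging available, the same idea can be done by hand inside the job. The sketch below is not a Slurm feature, just an illustration: the name staged() and the in/out directory layout are hypothetical, and scratch_root would be whatever local partition, SSD or RAMdisk the node offers.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def staged(inputs, results_dir, scratch_root=None):
    """Copy inputs to node-local scratch, run the job body, copy outputs back.

    Yields (stage_in, stage_out): read inputs from the first directory and
    write results into the second.
    """
    scratch = Path(tempfile.mkdtemp(prefix="stage_", dir=scratch_root))
    try:
        stage_in = scratch / "in"
        stage_out = scratch / "out"
        stage_in.mkdir()
        stage_out.mkdir()
        # Stage in: shared filesystem -> local storage.
        for src in inputs:
            shutil.copy2(src, stage_in / Path(src).name)
        yield stage_in, stage_out
        # Stage out: only reached if the job body did not raise.
        results = Path(results_dir)
        results.mkdir(parents=True, exist_ok=True)
        for item in stage_out.iterdir():
            if item.is_file():
                shutil.copy2(item, results / item.name)
    finally:
        # Always clean up the local scratch area.
        shutil.rmtree(scratch, ignore_errors=True)
```

The job body then goes inside a "with staged(...) as (stage_in, stage_out):" block and only ever touches the local directories.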
>
> Thirdly we come onto parallel filesystems. These are quite mature now, and
> can be easily deployed.
> I am familiar with Panasas (proprietary hardware), Lustre, GPFS (Spectrum
> Scale), BeeGFS and Gluster.
> (I'll count Gluster as a parallel filesystem).
> These gain their performance by scaling out over several storage targets,
> and can scale hugely.
> You can start with one storage server though.
>
> My advice to you
> a) start by getting your existing NFS working better. Look at those
> network tuning parameters, offloading on your NICs
>     and the mount options.
>     Heck, ask yourself - for the deep learning models I want to run, what
> is the ratio of computation time to data moving/reading time?
>     If the ratio is huge then you're OK. If the ratio is coming closer to
> 1:1 then you need to start optimising.
>
> b) Look at setting up a single BeeGFS server.
>     I admit to rather taking a shine to GPFS recently, and I find it a joy
> to use. However I should imagine that you are wanting to
>     accomplish this without licensed software?
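
For the question in point a) about how much time goes into moving data versus computing, a small timer wrapped around the two phases of a training loop is enough for a first estimate. The class and phase names below are made up for illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per named phase of a job."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def ratio(self, numerator, denominator):
        """Return total time in `numerator` divided by total time in `denominator`."""
        denom = self.totals[denominator]
        if denom <= 0:
            return float("inf")
        return self.totals[numerator] / denom
```

Wrap the data loading in "with timer.phase('io'):" and the model step in "with timer.phase('compute'):", then compare the two totals at the end of the run.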
>
> On 25 September 2017 at 12:09, Diego Zuccato <diego.zucc...@unibo.it>
> wrote:
>
>>
>> On 24/09/2017 12:10, Marcin Stolarek wrote:
>>
>> > So do I, however, I'm using sssd with AD provider joined into AD domain.
>> > It's tricky and requires good sssd understanding, but it works... in
>> > general.
>> We are using PBIS-open to join the nodes. Quite easy to set up, just
>> "sometimes" (randomly, but usually after many months) some machines lose
>> the join.
>> I couldn't make sssd work with our AD (I'm not an AD admin, I can only
>> join machines, and there's no special bind-account).
>>
>> --
>> Diego Zuccato
>> Servizi Informatici
>> Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
>> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
>> tel.: +39 051 20 95786
>> mail: diego.zucc...@unibo.it
>>
>
>
