Re: [lustre-discuss] Do I need Lustre?

2018-05-01 Thread E.S. Rosenberg
We never had a budget sufficient to buy a large SSD-only storage so can't
say anything about that.
Our NFS filers tend to crumple when users start running real jobs
against them, and this affects both the jobs and any other
processes/users who are trying to use the filers.
Of course it depends on the load type: a single IO process can get
really good performance, but once you have multiple threads doing IO to
different files the NFS servers start to thrash.
This can also happen with an underspecced parallel filesystem,
but so far we have had far fewer issues with Lustre.
HTH,
Eli


On Tue, May 1, 2018 at 1:10 AM, Dilger, Andreas <andreas.dil...@intel.com>
wrote:

> On Apr 30, 2018, at 07:11, Thackeray, Neil L <ne...@illinois.edu> wrote:
> >
> > Sorry, I left out file size. We don't foresee growing tremendously. The
> plan is for researchers to upload their data, get the results, and copy it
> down to a mounted file system. This is going to be used by multiple
> researchers, and we will be charging for compute time. We really don't want
> this cluster to be used for storing data outside of the time needed for
> their computations. We may just start with 100TB of SSD storage.
>
> One of the major benefits of Lustre is that it can be used directly for
> large-scale computing.  Having users copy data to/from Lustre is fairly
> inefficient (though surprisingly copying files to/from a direct Lustre
> mount can be faster than FTP or SCP or other network copy tools).
>
> You'd be better off to increase the size of your Lustre filesystem, enough
> that users can store "projects" there for some time while they compute,
> rather than needing to move the data on/off the filesystem a lot.
>
> While using an all-SSD filesystem is appealing, you might find better
> performance with some kind of hybrid storage, like ZFS + L2ARC + Metadata
> Allocation Class (this feature is in development, target 2018-09, depending
> on your timeframe).
>
> You definitely want your MDT(s) to be SSDs, especially if you use the new
> Data-on-MDT feature to store small files there.  The OSTs can be HDDs to
> give you a lot more capacity for the same price.
>
> Cheers, Andreas
>
> > -Original Message-
> > From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> On
> Behalf Of Philippe Weill
> > Sent: Saturday, April 28, 2018 1:14 AM
> > To: lustre-discuss@lists.lustre.org
> > Subject: Re: [lustre-discuss] Do I need Lustre?
> >
> >
> >
> > On 27/04/2018 at 19:07, Thackeray, Neil L wrote:
> >> I’m new to the cluster realm, so I’m hoping for some good advice. We
> >> are starting up a new cluster, and I’ve noticed that lustre seems to be
> used widely in datacenters. The thing is I’m not sure the scale of our
> cluster will need it.
> >>
> >> We are planning a small cluster, starting with 6 -8 nodes with 2 GPUs
> >> per node. They will be used for Deep Learning, MRI data processing,
> >> and Matlab among other things. With the size of the cluster we figure
> >> that 10Gb networking will be sufficient. We aren’t going to allow
> persistent storage on the cluster. Users will just upload and download
> data. I’m mostly concerned about I/O speeds. I don’t know if NFS would be
> fast enough to handle the data.
> >>
> >> We are hoping that the cluster will grow over time. We are already
> talking about buying more nodes next fiscal year.
> >>
> >> Thanks.
> >>
> >
> > Hello,
> >
> > You didn't say anything about the filesystem size you need, or whether
> > you expect to grow fast. We also run a small cluster (20 nodes), but for
> > climate data modeling results and satellite atmospheric data analysis we
> > are growing by at least 300 TB per year (2 PB now), and it is easier for
> > us to grow with Lustre.
> >
> >
> > --
> > Weill Philippe -  Administrateur Systeme et Reseaux
> > CNRS/UPMC/IPSL   LATMOS (UMR 8190)
> > ___
> > lustre-discuss mailing list
> > lustre-discuss@lists.lustre.org
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel Corporation


Re: [lustre-discuss] Do I need Lustre?

2018-04-30 Thread Dilger, Andreas
On Apr 30, 2018, at 07:11, Thackeray, Neil L <ne...@illinois.edu> wrote:
> 
> Sorry, I left out file size. We don't foresee growing tremendously. The plan 
> is for researchers to upload their data, get the results, and copy it down to 
> a mounted file system. This is going to be used by multiple researchers, and 
> we will be charging for compute time. We really don't want this cluster to be 
> used for storing data outside of the time needed for their computations. We 
> may just start with 100TB of SSD storage.

One of the major benefits of Lustre is that it can be used directly for 
large-scale computing.  Having users copy data to/from Lustre is fairly 
inefficient (though surprisingly copying files to/from a direct Lustre mount 
can be faster than FTP or SCP or other network copy tools).

You'd be better off to increase the size of your Lustre filesystem, enough that 
users can store "projects" there for some time while they compute, rather than 
needing to move the data on/off the filesystem a lot.
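
To put a rough number on the cost of moving data on and off the filesystem, here is a back-of-the-envelope sketch. The 10Gb link and the 100TB figure come from this thread; the 80% link efficiency and the function name are assumptions made only for illustration, and real transfers will also be limited by disks and protocol overhead.

```python
def transfer_hours(size_tb: float, link_gbit: float = 10.0,
                   efficiency: float = 0.8) -> float:
    """Estimate hours needed to move size_tb terabytes over one network link.

    efficiency is an assumed fraction of line rate actually achieved;
    it is a guess for illustration, not a measured value.
    """
    size_bits = size_tb * 1e12 * 8                     # decimal terabytes to bits
    usable_bits_per_s = link_gbit * 1e9 * efficiency   # usable link rate
    return size_bits / usable_bits_per_s / 3600


if __name__ == "__main__":
    for size in (1, 10, 100):
        print(f"{size:>4} TB over 10GbE: {transfer_hours(size):6.1f} h")
```

Even at full line rate, a single 10Gb link needs hours to move tens of terabytes, which is why keeping active projects on the filesystem for the duration of the computation can be attractive.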

While using an all-SSD filesystem is appealing, you might find better 
performance with some kind of hybrid storage, like ZFS + L2ARC + Metadata 
Allocation Class (this feature is in development, target 2018-09, depending on 
your timeframe).  

You definitely want your MDT(s) to be SSDs, especially if you use the new 
Data-on-MDT feature to store small files there.  The OSTs can be HDDs to give 
you a lot more capacity for the same price.

Cheers, Andreas

> -Original Message-
> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> On Behalf Of 
> Philippe Weill
> Sent: Saturday, April 28, 2018 1:14 AM
> To: lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] Do I need Lustre?
> 
> 
> 
> On 27/04/2018 at 19:07, Thackeray, Neil L wrote:
>> I’m new to the cluster realm, so I’m hoping for some good advice. We 
>> are starting up a new cluster, and I’ve noticed that lustre seems to be used 
>> widely in datacenters. The thing is I’m not sure the scale of our cluster 
>> will need it.
>> 
>> We are planning a small cluster, starting with 6 -8 nodes with 2 GPUs 
>> per node. They will be used for Deep Learning, MRI data processing, 
>> and Matlab among other things. With the size of the cluster we figure 
>> that 10Gb networking will be sufficient. We aren’t going to allow persistent 
>> storage on the cluster. Users will just upload and download data. I’m mostly 
>> concerned about I/O speeds. I don’t know if NFS would be fast enough to 
>> handle the data.
>> 
>> We are hoping that the cluster will grow over time. We are already talking 
>> about buying more nodes next fiscal year.
>> 
>> Thanks.
>> 
> 
> Hello,
> 
> You didn't say anything about the filesystem size you need, or whether you
> expect to grow fast. We also run a small cluster (20 nodes), but for climate
> data modeling results and satellite atmospheric data analysis we are growing
> by at least 300 TB per year (2 PB now), and it is easier for us to grow with
> Lustre.
> 
> 
> --
> Weill Philippe -  Administrateur Systeme et Reseaux
> CNRS/UPMC/IPSL   LATMOS (UMR 8190)

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation


Re: [lustre-discuss] Do I need Lustre?

2018-04-30 Thread Colin Faber
The type of storage you decide on should be driven by the workloads that
you plan to run against it, as well as by the back-end storage systems.
All-SSD file systems can be useful for small / random IO; however, in many
cases the controllers will not be able to take advantage of the performance
provided by the SSD array.

As a result, you end up paying a whole lot more for a whole lot less space
and getting the same overall performance.

On Mon, Apr 30, 2018, 7:11 AM Thackeray, Neil L <ne...@illinois.edu> wrote:

> Sorry, I left out file size. We don't foresee growing tremendously. The
> plan is for researchers to upload their data, get the results, and copy it
> down to a mounted file system. This is going to be used by multiple
> researchers, and we will be charging for compute time. We really don't want
> this cluster to be used for storing data outside of the time needed for
> their computations. We may just start with 100TB of SSD storage.
>
> -Original Message-
> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> On Behalf
> Of Philippe Weill
> Sent: Saturday, April 28, 2018 1:14 AM
> To: lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] Do I need Lustre?
>
>
>
> On 27/04/2018 at 19:07, Thackeray, Neil L wrote:
> > I’m new to the cluster realm, so I’m hoping for some good advice. We
> > are starting up a new cluster, and I’ve noticed that lustre seems to be
> used widely in datacenters. The thing is I’m not sure the scale of our
> cluster will need it.
> >
> > We are planning a small cluster, starting with 6 -8 nodes with 2 GPUs
> > per node. They will be used for Deep Learning, MRI data processing,
> > and Matlab among other things. With the size of the cluster we figure
> > that 10Gb networking will be sufficient. We aren’t going to allow
> persistent storage on the cluster. Users will just upload and download
> data. I’m mostly concerned about I/O speeds. I don’t know if NFS would be
> fast enough to handle the data.
> >
> > We are hoping that the cluster will grow over time. We are already
> talking about buying more nodes next fiscal year.
> >
> > Thanks.
> >
>
> Hello,
>
> You didn't say anything about the filesystem size you need, or whether you
> expect to grow fast. We also run a small cluster (20 nodes), but for climate
> data modeling results and satellite atmospheric data analysis we are growing
> by at least 300 TB per year (2 PB now), and it is easier for us to grow with
> Lustre.
>
>
> --
> Weill Philippe -  Administrateur Systeme et Reseaux
> CNRS/UPMC/IPSL   LATMOS (UMR 8190)


Re: [lustre-discuss] Do I need Lustre?

2018-04-30 Thread Thackeray, Neil L
Sorry, I left out file size. We don't foresee growing tremendously. The plan is 
for researchers to upload their data, get the results, and copy it down to a 
mounted file system. This is going to be used by multiple researchers, and we 
will be charging for compute time. We really don't want this cluster to be used 
for storing data outside of the time needed for their computations. We may just 
start with 100TB of SSD storage.

-Original Message-
From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> On Behalf Of 
Philippe Weill
Sent: Saturday, April 28, 2018 1:14 AM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Do I need Lustre?



On 27/04/2018 at 19:07, Thackeray, Neil L wrote:
> I’m new to the cluster realm, so I’m hoping for some good advice. We 
> are starting up a new cluster, and I’ve noticed that lustre seems to be used 
> widely in datacenters. The thing is I’m not sure the scale of our cluster 
> will need it.
> 
> We are planning a small cluster, starting with 6 -8 nodes with 2 GPUs 
> per node. They will be used for Deep Learning, MRI data processing, 
> and Matlab among other things. With the size of the cluster we figure 
> that 10Gb networking will be sufficient. We aren’t going to allow persistent 
> storage on the cluster. Users will just upload and download data. I’m mostly 
> concerned about I/O speeds. I don’t know if NFS would be fast enough to 
> handle the data.
> 
> We are hoping that the cluster will grow over time. We are already talking 
> about buying more nodes next fiscal year.
> 
> Thanks.
> 

Hello,

You didn't say anything about the filesystem size you need, or whether you
expect to grow fast. We also run a small cluster (20 nodes), but for climate
data modeling results and satellite atmospheric data analysis we are growing
by at least 300 TB per year (2 PB now), and it is easier for us to grow with
Lustre.


--
Weill Philippe -  Administrateur Systeme et Reseaux
CNRS/UPMC/IPSL   LATMOS (UMR 8190)


Re: [lustre-discuss] Do I need Lustre?

2018-04-28 Thread Philippe Weill



On 27/04/2018 at 19:07, Thackeray, Neil L wrote:
I’m new to the cluster realm, so I’m hoping for some good advice. We are starting up a new cluster, and I’ve noticed that lustre 
seems to be used widely in datacenters. The thing is I’m not sure the scale of our cluster will need it.


We are planning a small cluster, starting with 6 -8 nodes with 2 GPUs per node. They will be used for Deep Learning, MRI data 
processing, and Matlab among other things. With the size of the cluster we figure that 10Gb networking will be sufficient. We aren’t 
going to allow persistent storage on the cluster. Users will just upload and download data. I’m mostly concerned about I/O speeds. I 
don’t know if NFS would be fast enough to handle the data.


We are hoping that the cluster will grow over time. We are already talking 
about buying more nodes next fiscal year.

Thanks.



Hello,

You didn't say anything about the filesystem size you need, or whether you
expect to grow fast. We also run a small cluster (20 nodes), but for climate
data modeling results and satellite atmospheric data analysis we are growing
by at least 300 TB per year (2 PB now), and it is easier for us to grow with
Lustre.


--
Weill Philippe -  Administrateur Systeme et Reseaux
CNRS/UPMC/IPSL   LATMOS (UMR 8190)


Re: [lustre-discuss] Do I need Lustre?

2018-04-27 Thread Patrick Farrell
One factor is probably budget - Lustre is probably a higher budget option, in 
terms of hardware and time investment.  I would guess at the 6-8 node range you 
probably don't need its speed, though you might need at least one other trick 
it has:

One thing Lustre gives that NFS does not is the ability for multiple nodes to 
write to the same file in parallel while maintaining consistency.  It's a 
clustered/parallel file system, not just a network file system.  Some codes 
require this if you want to run them across multiple nodes.
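
As a rough illustration of the shared-file access pattern described above, the sketch below has several writers fill disjoint regions of one file. Threads on a single machine stand in for separate nodes, and the chunk size and function names are hypothetical. It only shows the pattern; it does not demonstrate Lustre's cross-node consistency, which a local filesystem cannot exercise.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # 1 MiB per writer; an arbitrary size chosen for illustration


def write_region(path: str, rank: int, chunk: int = CHUNK) -> int:
    """Write this writer's region at offset rank * chunk; return bytes written."""
    payload = bytes([rank % 256]) * chunk
    fd = os.open(path, os.O_WRONLY)
    try:
        written = 0
        while written < chunk:
            # pwrite takes an explicit offset, so writers never share a file position
            written += os.pwrite(fd, payload[written:], rank * chunk + written)
        return written
    finally:
        os.close(fd)


def parallel_write(path: str, writers: int, chunk: int = CHUNK) -> None:
    """Create the file at its final size, then let each writer fill its own region."""
    with open(path, "wb") as f:
        f.truncate(writers * chunk)
    with ThreadPoolExecutor(max_workers=writers) as pool:
        list(pool.map(lambda rank: write_region(path, rank, chunk), range(writers)))


def verify(path: str, writers: int, chunk: int = CHUNK) -> bool:
    """Check that every region holds exactly the bytes its writer produced."""
    with open(path, "rb") as f:
        for rank in range(writers):
            if f.read(chunk) != bytes([rank % 256]) * chunk:
                return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shared = os.path.join(tmp, "shared.bin")
        parallel_write(shared, writers=4)
        print("regions intact:", verify(shared, writers=4))
```

On a real cluster each writer would be a process on a different node, typically coordinated through MPI-IO or a similar library, and the filesystem has to keep those writes consistent.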

You might start by setting up whatever seems "easy" to you, probably an NFS 
share of a storage appliance you've already got, and then see what happens.  If 
users are happy and you don't seem to be spending a lot of time doing I/O, then 
you're probably OK.  If not, Lustre is more work, but you do get something for 
your labors. :)


From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 
Brett Lee <brettlee.lus...@gmail.com>
Sent: Friday, April 27, 2018 8:11:21 PM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Do I need Lustre?

Hi Neil,

One of the considerations in using Lustre should be the I/O patterns of your 
applications.  Lustre excels with large, sequential reads and writes.

Another is cost, including hardware, software, support, and coming up to 
speed with Lustre.  These components interact.  For example, having 
professional support helps with coming up to speed on Lustre. :)

Hey Michael!


On Fri, Apr 27, 2018, 12:22 PM Hebenstreit, Michael 
<michael.hebenstr...@intel.com<mailto:michael.hebenstr...@intel.com>> wrote:

You can do a simple test. Run a small sample of your application directly out of 
/dev/shm (the ram-disk). Then run it from the NFS file server. If you measure 
significant speedups your application is I/O sensitive and a Lustre configured 
with OPA or other InfiniBand solution will help.



From: lustre-discuss 
[mailto:lustre-discuss-boun...@lists.lustre.org<mailto:lustre-discuss-boun...@lists.lustre.org>]
 On Behalf Of Thackeray, Neil L
Sent: Friday, April 27, 2018 11:08 AM
To: lustre-discuss@lists.lustre.org<mailto:lustre-discuss@lists.lustre.org>
Subject: [lustre-discuss] Do I need Lustre?



I’m new to the cluster realm, so I’m hoping for some good advice. We are 
starting up a new cluster, and I’ve noticed that lustre seems to be used widely 
in datacenters. The thing is I’m not sure the scale of our cluster will need it.



We are planning a small cluster, starting with 6 -8 nodes with 2 GPUs per node. 
They will be used for Deep Learning, MRI data processing, and Matlab among 
other things. With the size of the cluster we figure that 10Gb networking will 
be sufficient. We aren’t going to allow persistent storage on the cluster. 
Users will just upload and download data. I’m mostly concerned about I/O 
speeds. I don’t know if NFS would be fast enough to handle the data.



We are hoping that the cluster will grow over time. We are already talking 
about buying more nodes next fiscal year.



Thanks.



Re: [lustre-discuss] Do I need Lustre?

2018-04-27 Thread Brett Lee
Hi Neil,

One of the considerations in using Lustre should be the I/O patterns of
your applications.  Lustre excels with large, sequential reads and writes.

Another is cost, including hardware, software, support, and coming
up to speed with Lustre.  These components interact.  For example, having
professional support helps with coming up to speed on Lustre. :)

Hey Michael!


On Fri, Apr 27, 2018, 12:22 PM Hebenstreit, Michael <
michael.hebenstr...@intel.com> wrote:

> You can do a simple test. Run a small sample of your application directly
> out of /dev/shm (the ram-disk). Then run it from the NFS file server. If
> you measure significant speedups your application is I/O sensitive and a
> Lustre configured with OPA or other InfiniBand solution will help.
>
>
>
> *From:* lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] *On
> Behalf Of *Thackeray, Neil L
> *Sent:* Friday, April 27, 2018 11:08 AM
> *To:* lustre-discuss@lists.lustre.org
> *Subject:* [lustre-discuss] Do I need Lustre?
>
>
>
> I’m new to the cluster realm, so I’m hoping for some good advice. We are
> starting up a new cluster, and I’ve noticed that lustre seems to be used
> widely in datacenters. The thing is I’m not sure the scale of our cluster
> will need it.
>
>
>
> We are planning a small cluster, starting with 6 -8 nodes with 2 GPUs per
> node. They will be used for Deep Learning, MRI data processing, and Matlab
> among other things. With the size of the cluster we figure that 10Gb
> networking will be sufficient. We aren’t going to allow persistent storage
> on the cluster. Users will just upload and download data. I’m mostly
> concerned about I/O speeds. I don’t know if NFS would be fast enough to
> handle the data.
>
>
>
> We are hoping that the cluster will grow over time. We are already talking
> about buying more nodes next fiscal year.
>
>
>
> Thanks.


Re: [lustre-discuss] Do I need Lustre?

2018-04-27 Thread Hebenstreit, Michael
You can do a simple test. Run a small sample of your application directly out of 
/dev/shm (the ram-disk). Then run it from the NFS file server. If you measure 
significant speedups your application is I/O sensitive and a Lustre configured 
with OPA or other InfiniBand solution will help.
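
The test described above could be scripted roughly as follows. This is a minimal sketch: `io_workload` is a made-up stand-in for "a small sample of your application", and `/mnt/nfs/scratch` is a hypothetical mount point that you would replace with your own NFS path. Note that the read-back will mostly be served from the client page cache, so a real application sample gives a more honest picture.

```python
import os
import sys
import time


def io_workload(directory: str, files: int = 8, size: int = 4 << 20) -> None:
    """Stand-in workload: write, sync, read back and delete a few files."""
    block = os.urandom(size)
    for i in range(files):
        path = os.path.join(directory, f"sample_{i}.bin")
        with open(path, "wb") as f:
            f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push the data to the backing store
        with open(path, "rb") as f:
            if f.read() != block:
                raise IOError(f"read-back mismatch in {path}")
        os.remove(path)


def time_in(directory: str, workload=io_workload) -> float:
    """Return wall-clock seconds taken by workload(directory)."""
    start = time.perf_counter()
    workload(directory)
    return time.perf_counter() - start


def slowdown(fast_dir: str, slow_dir: str, workload=io_workload) -> float:
    """Ratio of run time in slow_dir to run time in fast_dir."""
    return time_in(slow_dir, workload) / time_in(fast_dir, workload)


if __name__ == "__main__":
    ram_dir = "/dev/shm"
    # Hypothetical NFS path; pass your own as the first argument.
    nfs_dir = sys.argv[1] if len(sys.argv) > 1 else "/mnt/nfs/scratch"
    if os.path.isdir(ram_dir) and os.path.isdir(nfs_dir):
        print(f"NFS run took {slowdown(ram_dir, nfs_dir):.1f}x the RAM-disk run")
    else:
        print("skipping: both directories must exist")
```

A ratio close to 1 suggests the job is not I/O bound; a large ratio suggests faster storage would pay off.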

From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf 
Of Thackeray, Neil L
Sent: Friday, April 27, 2018 11:08 AM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Do I need Lustre?

I'm new to the cluster realm, so I'm hoping for some good advice. We are 
starting up a new cluster, and I've noticed that lustre seems to be used widely 
in datacenters. The thing is I'm not sure the scale of our cluster will need it.

We are planning a small cluster, starting with 6 -8 nodes with 2 GPUs per node. 
They will be used for Deep Learning, MRI data processing, and Matlab among 
other things. With the size of the cluster we figure that 10Gb networking will 
be sufficient. We aren't going to allow persistent storage on the cluster. 
Users will just upload and download data. I'm mostly concerned about I/O 
speeds. I don't know if NFS would be fast enough to handle the data.

We are hoping that the cluster will grow over time. We are already talking 
about buying more nodes next fiscal year.

Thanks.