Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Amjad Syed
Jeff,
We intend to use 10 clients that will mount the file system.

Amjad

On Tue, Oct 31, 2017 at 3:02 AM, Jeff Johnson <
jeff.john...@aeoncomputing.com> wrote:

> Amjad,
>
> You might ask your vendor to propose a single MDT comprised of (8 * 500GB)
> 2.5" disk drives or better, SSDs. With some bio applications you would
> benefit from spreading the MDT I/O across more drives.
>
> How many clients do you expect to mount the file system? A standard filer
> (or ZFS/NFS server) will perform well compared to Lustre until you
> bottleneck somewhere in the server hardware (net, disk, cpu, etc). With
> Lustre you can simply add one or more OSS/OSTs to the file system, and the
> performance potential increases with the number of additional OSS/OST servers.
>
> High availability is nice to have, but it isn't necessary unless your
> environment cannot tolerate any interruption or downtime. If your vendor
> proposes quality hardware, such outages are infrequent.
>
> --Jeff
>
> On Mon, Oct 30, 2017 at 12:04 PM, Amjad Syed  wrote:
>
>> The vendor has proposed a single MDT  ( 4 * 1.2 TB) in RAID 10
>> configuration.
>> The OST will be RAID 6  and proposed are 2 OST.
>>
>>
>> On Mon, Oct 30, 2017 at 7:55 PM, Ben Evans  wrote:
>>
>>> How many OSTs are behind that OSS?  How many MDTs behind the MDS?
>>>
>>> From: lustre-discuss  on
>>> behalf of Brian Andrus 
>>> Date: Monday, October 30, 2017 at 12:24 PM
>>> To: "lustre-discuss@lists.lustre.org" 
>>> Subject: Re: [lustre-discuss] 1 MDS and 1 OSS
>>>
>>> Hmm. That is an odd one from a quick thought...
>>>
>>> However, IF you are planning on growing and adding OSSes/OSTs, this is
>>> not a bad way to get started and used to how everything works. It is
>>> basically a single stripe storage.
>>>
>>> If you are not planning on growing, I would lean towards gluster on 2
>>> boxes. I do that often, actually. A single MDS/OSS has zero redundancy
>>> unless something is being done at the hardware level, and that would help
>>> with availability.
>>> NFS is quite viable too, but you would be splitting the available
>>> storage on 2 boxes.
>>>
>>> Brian Andrus
>>>
>>>
>>>
>>> On 10/30/2017 12:47 AM, Amjad Syed wrote:
>>>
>>> Hello
>>> We are in the process of procuring one small Lustre filesystem giving us
>>> 120 TB of storage using Lustre 2.X.
>>> The vendor has proposed only 1 MDS and 1 OSS as a solution.
>>> The query we have is: is this configuration enough, or do we need more
>>> OSS?
>>> The MDS and OSS servers are identical with regard to RAM (64 GB) and
>>> HDD (300 GB).
>>>
>>> Thanks
>>> Majid
>>>
>>>
>>> ___
>>> lustre-discuss mailing list
>>> lustre-discuss@lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>
>>>
>>>
>>> ___
>>> lustre-discuss mailing list
>>> lustre-discuss@lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>
>>>
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>>
>
>
> --
> --
> Jeff Johnson
> Co-Founder
> Aeon Computing
>
> jeff.john...@aeoncomputing.com
> www.aeoncomputing.com
> t: 858-412-3810 x1001   f: 858-412-3845
> m: 619-204-9061
>
> 4170 Morena Boulevard, Suite D - San Diego, CA 92117
> 
>
> High-Performance Computing / Lustre Filesystems / Scale-out Storage
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Jeff Johnson
Amjad,

You might ask your vendor to propose a single MDT comprised of (8 * 500GB)
2.5" disk drives or better, SSDs. With some bio applications you would
benefit from spreading the MDT I/O across more drives.

How many clients do you expect to mount the file system? A standard filer
(or ZFS/NFS server) will perform well compared to Lustre until you
bottleneck somewhere in the server hardware (net, disk, cpu, etc). With
Lustre you can simply add one or more OSS/OSTs to the file system, and the
performance potential increases with the number of additional OSS/OST servers.
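For illustration, growing the file system later might look roughly like the
sketch below; the file system name, MGS NID, device and mount point are
made-up examples, not something from this thread.

# On the new OSS node: format the device as the next OST and mount it.
# Once the MGS registers the new OST, new files can be striped across it.
mkfs.lustre --ost --fsname=lustrefs --index=2 --mgsnode=mds01@o2ib /dev/sdb
mkdir -p /mnt/lustre/ost2
mount -t lustre /dev/sdb /mnt/lustre/ost2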

High availability is nice to have, but it isn't necessary unless your
environment cannot tolerate any interruption or downtime. If your vendor
proposes quality hardware, such outages are infrequent.

--Jeff

On Mon, Oct 30, 2017 at 12:04 PM, Amjad Syed  wrote:

> The vendor has proposed a single MDT  ( 4 * 1.2 TB) in RAID 10
> configuration.
> The OST will be RAID 6  and proposed are 2 OST.
>
>
> On Mon, Oct 30, 2017 at 7:55 PM, Ben Evans  wrote:
>
>> How many OSTs are behind that OSS?  How many MDTs behind the MDS?
>>
>> From: lustre-discuss  on behalf
>> of Brian Andrus 
>> Date: Monday, October 30, 2017 at 12:24 PM
>> To: "lustre-discuss@lists.lustre.org" 
>> Subject: Re: [lustre-discuss] 1 MDS and 1 OSS
>>
>> Hmm. That is an odd one from a quick thought...
>>
>> However, IF you are planning on growing and adding OSSes/OSTs, this is
>> not a bad way to get started and used to how everything works. It is
>> basically a single stripe storage.
>>
>> If you are not planning on growing, I would lean towards gluster on 2
>> boxes. I do that often, actually. A single MDS/OSS has zero redundancy
>> unless something is being done at the hardware level, and that would help
>> with availability.
>> NFS is quite viable too, but you would be splitting the available storage
>> on 2 boxes.
>>
>> Brian Andrus
>>
>>
>>
>> On 10/30/2017 12:47 AM, Amjad Syed wrote:
>>
>> Hello
>> We are in the process of procuring one small Lustre filesystem giving us
>> 120 TB of storage using Lustre 2.X.
>> The vendor has proposed only 1 MDS and 1 OSS as a solution.
>> The query we have is: is this configuration enough, or do we need more
>> OSS?
>> The MDS and OSS servers are identical with regard to RAM (64 GB) and
>> HDD (300 GB).
>>
>> Thanks
>> Majid
>>
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>>
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Dilger, Andreas
On Oct 31, 2017, at 07:35, Andrew Elwell  wrote:
> 
> 
> 
> On 31 Oct. 2017 07:20, "Dilger, Andreas"  wrote:
>> 
>> Having a larger MDT isn't bad if you plan future expansion.  That said, you 
>> would get better performance over FDR if you used SSDs for the MDT rather 
>> than HDDs (if you aren't already planning this), and for a single OSS you 
>> probably don't need the extra MDT capacity.  With both ldiskfs+LVM and ZFS 
>> you can also expand the MDT size in the future if you need more capacity.
> 
> Can someone with wiki editing rights summarise the advantages of different 
> hardware combinations? For example I remember Daniel @ NCI had some nice 
> comments about which components (MDS v OSS) benefited from faster cores over 
> thread count and where more RAM was important.
> 
> I feel this would be useful for people building small test systems and 
> comparing vendor responses for large tenders.

Everyone has wiki editing rights; you just need to register...

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation







___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Andrew Elwell
On 31 Oct. 2017 07:20, "Dilger, Andreas"  wrote:


Having a larger MDT isn't bad if you plan future expansion.  That said, you
would get better performance over FDR if you used SSDs for the MDT rather
than HDDs (if you aren't already planning this), and for a single OSS you
probably don't need the extra MDT capacity.  With both ldiskfs+LVM and ZFS
you can also expand the MDT size in the future if you need more capacity.


Can someone with wiki editing rights summarise the advantages of different
hardware combinations? For example I remember Daniel @ NCI had some nice
comments about which components (MDS v OSS) benefited from faster cores
over thread count and where more RAM was important.

I feel this would be useful for people building small test systems and
comparing vendor responses for large tenders.

Many thanks,
Andrew
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Dilger, Andreas
On Oct 31, 2017, at 05:46, Mohr Jr, Richard Frank (Rick Mohr)  
wrote:
> 
>> On Oct 30, 2017, at 4:46 PM, Brian Andrus  wrote:
>> 
>> Someone please correct me if I am wrong, but that seems a bit large of an 
>> MDT. Of course drives these days are pretty good sized, so the extra is 
>> probably very inexpensive.
> 
> That probably depends on what the primary usage will be.  If the applications 
> create lots of small files (like some biomed programs), then a larger MDT 
> would result in more inodes allowing more Lustre files to be created.

With mirroring the MDT ends up as ~2.4TB (about 1.2B files for ldiskfs, 600M
files for ZFS), which gives a minimum average file size of 120TB/1.2B = 100KB
on the OSTs (200KB for ZFS).  That said, by default you won't be able to create
so many files on the OSTs unless you reduce the inode ratio for ldiskfs at
format time, or use ZFS (which doesn't have a fixed inode count, but uses twice
as much space per inode on the MDT).
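As a hedged illustration of that format-time knob (the device, file system
name, MGS NID and the 64KB bytes-per-inode value are only examples, not a
recommendation):

# ldiskfs OST formatted with a smaller bytes-per-inode ratio, so the OST
# gets more inodes (here roughly one inode per 64KB of OST capacity).
mkfs.lustre --ost --fsname=lustrefs --index=0 --mgsnode=mds01@o2ib \
    --mkfsoptions="-i 65536" /dev/sdc
# On a client, compare free inodes per target afterwards:
lfs df -i /mnt/lustre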

Having a larger MDT isn't bad if you plan future expansion.  That said, you 
would get better performance over FDR if you used SSDs for the MDT rather than 
HDDs (if you aren't already planning this), and for a single OSS you probably 
don't need the extra MDT capacity.  With both ldiskfs+LVM and ZFS you can also 
expand the MDT size in the future if you need more capacity.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation







___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre routing help needed

2017-10-30 Thread Dilger, Andreas
The 2.10 release added support for multi-rail LNet, which may be causing
problems here. I would suggest installing an older LNet version on your
routers to match your clients/servers.

You may need to build your own RPMs for your new kernel, but can use 
--disable-server for configure to simplify things.

Cheers, Andreas

On Oct 31, 2017, at 04:45, Kevin M. Hildebrand wrote:

Thanks, I completely missed that.  Indeed the ko2iblnd parameters were 
different between the servers and the router.  I've updated the parameters on 
the router to match those on the server, and things haven't gotten any better.  
(The problem appears to be on the Ethernet side anyway, so you've probably 
helped me fix a problem I didn't know I had...)
I don't see much discussion about configuring lnet parameters for Ethernet 
networks, I assume that's using ksocklnd.  On that side, it appears that all of 
the ksocklnd parameters match between the router and clients.  Interesting that 
peer_timeout is 180, which is almost exactly when my client gets marked down on 
the router.

Server (and now router) ko2iblnd parameters:
peer_credits 8
peer_credits_hiw 4
credits 256
concurrent_sends 8
ntx 512
map_on_demand 0
fmr_pool_size 512
fmr_flush_trigger 384
fmr_cache 1

Client and router ksocklnd:
peer_timeout 180
peer_credits 8
keepalive 30
sock_timeout 50
credits 256
rx_buffer_size 0
tx_buffer_size 0
keepalive_idle 30
round_robin 1
sock_timeout 50

Thanks,
Kevin


On Mon, Oct 30, 2017 at 4:16 PM, Mohr Jr, Richard Frank (Rick Mohr) wrote:

> On Oct 30, 2017, at 8:47 AM, Kevin M. Hildebrand wrote:
>
> All of the hosts (client, server, router) have the following in ko2iblnd.conf:
>
> alias ko2iblnd-opa ko2iblnd
> options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 
> concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 
> fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
>
> install ko2iblnd /usr/sbin/ko2iblnd-probe

Those parameters will only get applied to omnipath interfaces (which you don’t 
have), so everything you have should just be running with default parameters.  
Since your lnet routers have a different version of lustre than your 
servers/clients, it might be possible that the default values for the ko2iblnd 
parameters are different between the two versions.  You can always check this 
by looking at the values in the files under /sys/module/ko2iblnd/parameters.  
It might be worthwhile to compare those values on the lnet routers to the 
values on the servers to see if maybe there is a difference that could affect 
the behavior.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Mohr Jr, Richard Frank (Rick Mohr)

> On Oct 30, 2017, at 4:46 PM, Brian Andrus  wrote:
> 
> Someone please correct me if I am wrong, but that seems a bit large of an 
> MDT. Of course drives these days are pretty good sized, so the extra is 
> probably very inexpensive.

That probably depends on what the primary usage will be.  If the applications 
create lots of small files (like some biomed programs), then a larger MDT would 
result in more inodes allowing more Lustre files to be created.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Brian Andrus
Someone please correct me if I am wrong, but that seems a bit large of 
an MDT. Of course drives these days are pretty good sized, so the extra 
is probably very inexpensive.


Also, isn't it better to have 1 OST per OSS for parallelism rather than 
adding OSTs to an OSS? I've been doing most of my OSTs as ZFS and 
letting that handle parallel writes across drives within an OSS, which 
has performed well.
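A rough sketch of that layout, with made-up pool, device and NID names (one
raidz2 OST built from six disks on a single OSS):

# mkfs.lustre creates the zpool and the OST dataset in one step.
mkfs.lustre --ost --backfstype=zfs --fsname=lustrefs --index=0 \
    --mgsnode=mds01@o2ib ostpool/ost0 raidz2 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
mkdir -p /mnt/lustre/ost0
mount -t lustre ostpool/ost0 /mnt/lustre/ost0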


Brian Andrus


On 10/30/2017 12:04 PM, Amjad Syed wrote:
The vendor has proposed a single MDT  ( 4 * 1.2 TB) in RAID 10 
configuration.

The OST will be RAID 6  and proposed are 2 OST.


On Mon, Oct 30, 2017 at 7:55 PM, Ben Evans wrote:


How many OSTs are behind that OSS?  How many MDTs behind the MDS?

From: lustre-discuss on behalf of Brian Andrus
Date: Monday, October 30, 2017 at 12:24 PM
To: "lustre-discuss@lists.lustre.org"
Subject: Re: [lustre-discuss] 1 MDS and 1 OSS

Hmm. That is an odd one from a quick thought...

However, IF you are planning on growing and adding OSSes/OSTs,
this is not a bad way to get started and used to how everything
works. It is basically a single stripe storage.

If you are not planning on growing, I would lean towards gluster
on 2 boxes. I do that often, actually. A single MDS/OSS has zero
redundancy unless something is being done at the hardware level,
and that would help with availability.
NFS is quite viable too, but you would be splitting the available
storage on 2 boxes.

Brian Andrus



On 10/30/2017 12:47 AM, Amjad Syed wrote:

Hello
We are in the process of procuring one small Lustre filesystem giving
us 120 TB of storage using Lustre 2.X.
The vendor has proposed only 1 MDS and 1 OSS as a solution.
The query we have is: is this configuration enough, or do we
need more OSS?
The MDS and OSS servers are identical with regard to RAM (64 GB)
and HDD (300 GB).

Thanks
Majid


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org

http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org




___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org

http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org





___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre routing help needed

2017-10-30 Thread Kevin M. Hildebrand
Thanks, I completely missed that.  Indeed the ko2iblnd parameters were
different between the servers and the router.  I've updated the parameters
on the router to match those on the server, and things haven't gotten any
better.  (The problem appears to be on the Ethernet side anyway, so you've
probably helped me fix a problem I didn't know I had...)
I don't see much discussion about configuring lnet parameters for Ethernet
networks, I assume that's using ksocklnd.  On that side, it appears that
all of the ksocklnd parameters match between the router and clients.
Interesting that peer_timeout is 180, which is almost exactly when my
client gets marked down on the router.

Server (and now router) ko2iblnd parameters:
peer_credits 8
peer_credits_hiw 4
credits 256
concurrent_sends 8
ntx 512
map_on_demand 0
fmr_pool_size 512
fmr_flush_trigger 384
fmr_cache 1

Client and router ksocklnd:
peer_timeout 180
peer_credits 8
keepalive 30
sock_timeout 50
credits 256
rx_buffer_size 0
tx_buffer_size 0
keepalive_idle 30
round_robin 1
sock_timeout 50

Thanks,
Kevin


On Mon, Oct 30, 2017 at 4:16 PM, Mohr Jr, Richard Frank (Rick Mohr) <
rm...@utk.edu> wrote:

>
> > On Oct 30, 2017, at 8:47 AM, Kevin M. Hildebrand  wrote:
> >
> > All of the hosts (client, server, router) have the following in
> ko2iblnd.conf:
> >
> > alias ko2iblnd-opa ko2iblnd
> > options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024
> concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048
> fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
> >
> > install ko2iblnd /usr/sbin/ko2iblnd-probe
>
> Those parameters will only get applied to omnipath interfaces (which you
> don’t have), so everything you have should just be running with default
> parameters.  Since your lnet routers have a different version of lustre
> than your servers/clients, it might be possible that the default values for
> the ko2iblnd parameters are different between the two versions.  You can
> always check this by looking at the values in the files under
> /sys/module/ko2iblnd/parameters.  It might be worthwhile to compare those
> values on the lnet routers to the values on the servers to see if maybe
> there is a difference that could affect the behavior.
>
> --
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu
>
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre routing help needed

2017-10-30 Thread Mohr Jr, Richard Frank (Rick Mohr)

> On Oct 30, 2017, at 8:47 AM, Kevin M. Hildebrand  wrote:
> 
> All of the hosts (client, server, router) have the following in ko2iblnd.conf:
> 
> alias ko2iblnd-opa ko2iblnd
> options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 
> concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 
> fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
> 
> install ko2iblnd /usr/sbin/ko2iblnd-probe

Those parameters will only get applied to omnipath interfaces (which you don’t 
have), so everything you have should just be running with default parameters.  
Since your lnet routers have a different version of lustre than your 
servers/clients, it might be possible that the default values for the ko2iblnd 
parameters are different between the two versions.  You can always check this 
by looking at the values in the files under /sys/module/ko2iblnd/parameters.  
It might be worthwhile to compare those values on the lnet routers to the 
values on the servers to see if maybe there is a difference that could affect 
the behavior.
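One quick way to do that comparison (plain shell; it assumes the output files
can be gathered on one admin node for diffing):

# Run on each LNet router and server:
for p in /sys/module/ko2iblnd/parameters/*; do
    printf '%s=%s\n' "$(basename "$p")" "$(cat "$p")"
done > /tmp/ko2iblnd.$(hostname -s)
# Then, after collecting the files in one place:
# diff /tmp/ko2iblnd.router1 /tmp/ko2iblnd.oss1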

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Ben Evans
It's not going to matter.  There aren't enough physical drives to push the 
Infiniband link, unless they're all SSDs.

From: Simon Guilbault
Date: Monday, October 30, 2017 at 3:13 PM
To: Amjad Syed
Cc: Ben Evans, "lustre-discuss@lists.lustre.org"
Subject: Re: [lustre-discuss] 1 MDS and 1 OSS

Hi,

If everything is connected with SAS JBODs and controllers, you could probably
run 1 OST on each server and get better performance that way. With both servers
reaching the same SAS drives, you could also have failover in case one server
does not work.

You can forget about failover if you are using SATA drives.

On Mon, Oct 30, 2017 at 3:04 PM, Amjad Syed wrote:
The vendor has proposed a single MDT  ( 4 * 1.2 TB) in RAID 10 configuration.
The OST will be RAID 6  and proposed are 2 OST.


On Mon, Oct 30, 2017 at 7:55 PM, Ben Evans wrote:
How many OSTs are behind that OSS?  How many MDTs behind the MDS?

From: lustre-discuss on behalf of Brian Andrus
Date: Monday, October 30, 2017 at 12:24 PM
To: "lustre-discuss@lists.lustre.org"
Subject: Re: [lustre-discuss] 1 MDS and 1 OSS


Hmm. That is an odd one from a quick thought...

However, IF you are planning on growing and adding OSSes/OSTs, this is not a 
bad way to get started and used to how everything works. It is basically a 
single stripe storage.

If you are not planning on growing, I would lean towards gluster on 2 boxes. I
do that often, actually. A single MDS/OSS has zero redundancy unless something
is being done at the hardware level, and that would help with availability.
NFS is quite viable too, but you would be splitting the available storage on 2 
boxes.

Brian Andrus


On 10/30/2017 12:47 AM, Amjad Syed wrote:
Hello
We are in the process of procuring one small Lustre filesystem giving us 120 TB
of storage using Lustre 2.X.
The vendor has proposed only 1 MDS and 1 OSS as a solution.
The query we have is: is this configuration enough, or do we need more OSS?
The MDS and OSS servers are identical with regard to RAM (64 GB) and HDD
(300 GB).

Thanks
Majid



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Simon Guilbault
Hi,

If everything is connected with SAS JBODs and controllers, you could
probably run 1 OST on each server and get better performance that way. With
both servers reaching the same SAS drives, you could also have failover in
case one server does not work.

You can forget about failover if you are using SATA drives.
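If the targets really are on shared SAS, the failover pairing is declared when
the target is formatted; a minimal sketch with hypothetical NIDs, device and
mount point names:

# Every NID that may serve the target is listed; whichever node mounts the
# LUN becomes the active server for it.
mkfs.lustre --ost --fsname=lustrefs --index=0 \
    --servicenode=oss1@o2ib --servicenode=oss2@o2ib /dev/mapper/ost0
mount -t lustre /dev/mapper/ost0 /mnt/lustre/ost0   # normally on oss1
# After a failure, mount the same LUN on oss2 instead.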

On Mon, Oct 30, 2017 at 3:04 PM, Amjad Syed  wrote:

> The vendor has proposed a single MDT  ( 4 * 1.2 TB) in RAID 10
> configuration.
> The OST will be RAID 6  and proposed are 2 OST.
>
>
> On Mon, Oct 30, 2017 at 7:55 PM, Ben Evans  wrote:
>
>> How many OSTs are behind that OSS?  How many MDTs behind the MDS?
>>
>> From: lustre-discuss  on behalf
>> of Brian Andrus 
>> Date: Monday, October 30, 2017 at 12:24 PM
>> To: "lustre-discuss@lists.lustre.org" 
>> Subject: Re: [lustre-discuss] 1 MDS and 1 OSS
>>
>> Hmm. That is an odd one from a quick thought...
>>
>> However, IF you are planning on growing and adding OSSes/OSTs, this is
>> not a bad way to get started and used to how everything works. It is
>> basically a single stripe storage.
>>
>> If you are not planning on growing, I would lean towards gluster on 2
>> boxes. I do that often, actually. A single MDS/OSS has zero redundancy
>> unless something is being done at the hardware level, and that would help
>> with availability.
>> NFS is quite viable too, but you would be splitting the available storage
>> on 2 boxes.
>>
>> Brian Andrus
>>
>>
>>
>> On 10/30/2017 12:47 AM, Amjad Syed wrote:
>>
>> Hello
>> We are in the process of procuring one small Lustre filesystem giving us
>> 120 TB of storage using Lustre 2.X.
>> The vendor has proposed only 1 MDS and 1 OSS as a solution.
>> The query we have is: is this configuration enough, or do we need more
>> OSS?
>> The MDS and OSS servers are identical with regard to RAM (64 GB) and
>> HDD (300 GB).
>>
>> Thanks
>> Majid
>>
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>>
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Amjad Syed
Andreas,
Thank you for your email.
The interconnect proposed by the vendor is InfiniBand FDR (56 Gb/s). Each
MDS and OSS will have only one FDR card.
This Lustre will be used to run Life Sciences/Bioinformatics/genomics
applications.

Will a single OSS handle the FDR interconnect?

On 30 Oct 2017 4:56 p.m., "Dilger, Andreas" 
wrote:

> First, to answer Amjad's question - the number of OSS nodes you have depends
> on the capacity and performance you need.  For 120TB of total storage
> (assume 30x4TB drives, or 20x6TB drives) a single OSS is definitely
> capable of handling this many drives.  I'd also assume you are using 10Gb
> Ethernet (~= 1GB/s), which a single OSS should be able to saturate (at
> either 40MB/s or 60MB/s per data drive with RAID-6 8+2 LUNs).  If you want
> more capacity or bandwidth, you can add more OSS nodes now or in the future.
>
> As Ravi mentioned, with a single OSS and MDS, you will need to reboot the
> single server in case of failures instead of having automatic failover, but
> for some systems this is fine.
>
> Finally, as for whether Lustre on a single MDS+OSS is better than running
> NFS on a single server, that depends mostly on the application workload.
> NFS is easier to administer than Lustre, and will provide better small file
> performance than Lustre.  NFS also has the benefit that it works with every
> client available.
>
> Interestingly, there are some workloads that users have reported to us
> where a single Lustre OSS will perform better than NFS, because Lustre does
> proper data locking/caching, while NFS has only close-to-open consistency
> semantics, and cannot cache data on the client for a long time.  Any
> workloads where there are multiple writers/readers to the same file will
> just not function properly with NFS.  Lustre will handle a large number of
> clients better than NFS.  For streaming IO loads, Lustre is better able to
> saturate the network (though for slower networks this doesn't really make
> much difference).  Lustre can drive faster networks (e.g. IB) much better
> with LNet than NFS with IPoIB.
>
> And of course, if you think your performance/capacity needs will increase
> in the future, then Lustre can easily scale to virtually any size and
> performance you need, while NFS will not.
>
> In general I wouldn't necessarily recommend Lustre for a single MDS+OSS
> installation, but this depends on your workload and future plans.
>
> Cheers, Andreas
>
> On Oct 30, 2017, at 15:59, E.S. Rosenberg 
> wrote:
> >
> > Maybe someone can answer this in the context of this question, is there
> any performance gain over classic filers when you are using only a single
> OSS?
> >
> > On Mon, Oct 30, 2017 at 9:56 AM, Ravi Konila 
> wrote:
> > Hi Majid
> >
> > It is better to go for HA for both OSS and MDS. You would need 2 MDS
> > servers and 2 OSS servers (identical configuration).
> > Also use latest Lustre 2.10.1 release.
> >
> > Regards
> > Ravi Konila
> >
> >
> >> From: Amjad Syed
> >> Sent: Monday, October 30, 2017 1:17 PM
> >> To: lustre-discuss@lists.lustre.org
> >> Subject: [lustre-discuss] 1 MDS and 1 OSS
> >>
> >> Hello
> >> We are in process in procuring one small Lustre filesystem giving us
> 120 TB  of storage using Lustre 2.X.
> >> The vendor has proposed only 1 MDS and 1 OSS as a solution.
> >> The query we have is that is this configuration enough , or we need
> more OSS?
> >> The MDS and OSS server are identical  with regards to RAM (64 GB) and
> HDD (300GB)
> >>
> >> Thanks
> >> Majid
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel Corporation
>
>
>
>
>
>
>
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre routing help needed

2017-10-30 Thread Kevin M. Hildebrand
I received a reply from Alejandro suggesting that I check
live_router_check_interval, dead_router_check_interval and
router_ping_timeout.
I had those set to the defaults, which I assume are 60, 60, and 50 seconds
respectively.  I did just try setting those values explicitly, and I'm not
seeing any better behavior.
From watching /proc/sys/lnet/routers on the client, I see that the client
is indeed sending router pings every 60 seconds.  On the router itself,
watching /proc/sys/lnet/peers immediately after doing 'lctl net down; lctl
net up', I see the 'last' column for my test client count from 0 up to
around 180, at which point the client is marked 'down'.  (For the other
peers, all of which are servers, the values count from 0 to around 180 and
then reset to 0, remaining 'up')
Is the 'last' column reflecting the last time the router has received a
'ping' from that peer?  If so, why do the numbers count to 180 instead of
60, which is the frequency they're being sent?
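For anyone following along, the state described above can be watched roughly
like this (NIDs taken from earlier in the thread):

watch -n 10 cat /proc/sys/lnet/routers   # on the client: router state
watch -n 10 cat /proc/sys/lnet/peers     # on the router: per-peer 'last' ages
lctl ping 10.10.104.201@tcp2             # from the router: is the client NID reachable?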

Thanks,
Kevin

On Mon, Oct 30, 2017 at 8:47 AM, Kevin M. Hildebrand  wrote:

> Hello, I'm trying to set up some new Lustre routers between a set of
> Infiniband connected Lustre servers and a few hosts connected to an
> external 100G Ethernet network.   The problem I'm having is that the
> routers work just fine for a minute or two, and then shortly thereafter
> they're marked as 'down' and all traffic stops.  If I unload/reload the
> lustre modules on the router, it'll work again for a short time and then
> stop again.  The router shows errors like:
> [236528.801275] LNetError: 54389:0:(lib-move.c:2120:lnet_parse_get())
> 10.10.104.2@tcp2: Unable to send REPLY for GET from
> 12345-10.10.104.201@tcp2: -113
>
> My Lustre router has a Mellanox ConnectX-3 interface connecting to the
> Lustre servers, and a Mellanox ConnectX-5 100G interface connecting to a
> 100G switch to which my test client is connected.  On the Infiniband side,
> I've got lnet configured as o2ib1, and on the Ethernet side, as tcp2.
>
> Clients and servers are all running Lustre 2.8.  The Lustre router at the
> moment is running Lustre 2.10.1, because of software dependencies to
> support the 100G card.
>
> I've verified that I have stable network connectivity on both the IB and
> Ethernet sides.
>
> At the moment, I have very simple lnet configurations, using the built in
> defaults.  lnet.conf on the server:
> options lnet ip2nets="o2ib1(ib0) 192.168.[64-95].*; tcp1
> 10.103.[128-159].*" routes="tcp0 192.168.64.[78-79]@o2ib1; tcp2
> 192.168.64.[78-79]@o2ib1"
>
> On the lustre router:
> options lnet networks="o2ib1(ib0),tcp2(p1p1.104)" "forwarding=enabled"
>
> And on the client:
> options lnet networks="tcp2(p4p1.104)" routes="o2ib1 10.10.104.[2-3]@tcp2"
>
> All of the hosts (client, server, router) have the following in
> ko2iblnd.conf:
>
> alias ko2iblnd-opa ko2iblnd
> options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024
> concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048
> fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
>
> install ko2iblnd /usr/sbin/ko2iblnd-probe
>
>
> Does anyone see anything I've missed, or have any thoughts on where I
> should look next?
>
> Thanks,
> Kevin
>
> --
> Kevin Hildebrand
> University of Maryland, College Park
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Ben Evans
How many OSTs are behind that OSS?  How many MDTs behind the MDS?

From: lustre-discuss on behalf of Brian Andrus
Date: Monday, October 30, 2017 at 12:24 PM
To: "lustre-discuss@lists.lustre.org"
Subject: Re: [lustre-discuss] 1 MDS and 1 OSS


Hmm. That is an odd one from a quick thought...

However, IF you are planning on growing and adding OSSes/OSTs, this is not a 
bad way to get started and used to how everything works. It is basically a 
single stripe storage.

If you are not planning on growing, I would lean towards gluster on 2 boxes. I
do that often, actually. A single MDS/OSS has zero redundancy unless something
is being done at the hardware level, and that would help with availability.
NFS is quite viable too, but you would be splitting the available storage on 2 
boxes.

Brian Andrus


On 10/30/2017 12:47 AM, Amjad Syed wrote:
Hello
We are in the process of procuring one small Lustre filesystem giving us 120 TB
of storage using Lustre 2.X.
The vendor has proposed only 1 MDS and 1 OSS as a solution.
The query we have is: is this configuration enough, or do we need more OSS?
The MDS and OSS servers are identical with regard to RAM (64 GB) and HDD
(300 GB).

Thanks
Majid



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Brian Andrus

Hmm. That is an odd one from a quick thought...

However, IF you are planning on growing and adding OSSes/OSTs, this is
not a bad way to get started and get used to how everything works. It is
basically single-stripe storage.


If you are not planning on growing, I would lean towards gluster on 2
boxes. I do that often, actually. A single MDS/OSS has zero redundancy
unless something is being done at the hardware level, and that would help
with availability.
NFS is quite viable too, but you would be splitting the available 
storage on 2 boxes.


Brian Andrus



On 10/30/2017 12:47 AM, Amjad Syed wrote:

Hello
We are in the process of procuring one small Lustre filesystem giving us
120 TB of storage using Lustre 2.X.

The vendor has proposed only 1 MDS and 1 OSS as a solution.
The query we have is: is this configuration enough, or do we need
more OSS?
The MDS and OSS servers are identical with regard to RAM (64 GB) and
HDD (300 GB).


Thanks
Majid


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Dilger, Andreas
First, to answer Amjad's question - the number of OSS nodes you have depends
on the capacity and performance you need.  For 120TB of total storage (assume
30x4TB drives, or 20x6TB drives) a single OSS is definitely capable of
handling this many drives.  I'd also assume you are using 10Gb Ethernet (~=
1GB/s), which a single OSS should be able to saturate (at either 40MB/s or
60MB/s per data drive with RAID-6 8+2 LUNs).  If you want more capacity or
bandwidth, you can add more OSS nodes now or in the future.
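A rough back-of-the-envelope check of those numbers, using the assumptions
stated above (30 drives as three RAID-6 8+2 LUNs, i.e. 24 data drives):

echo "$((24 * 40)) MB/s at 40 MB/s per data drive"   # ~960 MB/s, about one 10GbE link
echo "$((24 * 60)) MB/s at 60 MB/s per data drive"   # ~1440 MB/s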

As Ravi mentioned, with a single OSS and MDS, you will need to reboot the 
single server in case of failures instead of having automatic failover, but for 
some systems this is fine.

Finally, as for whether Lustre on a single MDS+OSS is better than running NFS 
on a single server, that depends mostly on the application workload.  NFS is 
easier to administer than Lustre, and will provide better small file 
performance than Lustre.  NFS also has the benefit that it works with every 
client available.

Interestingly, there are some workloads that users have reported to us where a 
single Lustre OSS will perform better than NFS, because Lustre does proper data 
locking/caching, while NFS has only close-to-open consistency semantics, and 
cannot cache data on the client for a long time.  Any workloads where there are 
multiple writers/readers to the same file will just not function properly with 
NFS.  Lustre will handle a large number of clients better than NFS.  For 
streaming IO loads, Lustre is better able to saturate the network (though for 
slower networks this doesn't really make much difference).  Lustre can drive 
faster networks (e.g. IB) much better with LNet than NFS with IPoIB.

And of course, if you think your performance/capacity needs will increase in 
the future, then Lustre can easily scale to virtually any size and performance 
you need, while NFS will not.

In general I wouldn't necessarily recommend Lustre for a single MDS+OSS 
installation, but this depends on your workload and future plans.

Cheers, Andreas

On Oct 30, 2017, at 15:59, E.S. Rosenberg  wrote:
> 
> Maybe someone can answer this in the context of this question, is there any 
> performance gain over classic filers when you are using only a single OSS?
> 
> On Mon, Oct 30, 2017 at 9:56 AM, Ravi Konila  wrote:
> Hi Majid
>  
> It is better to go for HA for both OSS and MDS. You would need 2 MDS servers
> and 2 OSS servers (identical configuration).
> Also use latest Lustre 2.10.1 release.
>  
> Regards
> Ravi Konila
>  
>  
>> From: Amjad Syed
>> Sent: Monday, October 30, 2017 1:17 PM
>> To: lustre-discuss@lists.lustre.org
>> Subject: [lustre-discuss] 1 MDS and 1 OSS
>>  
>> Hello
>> We are in the process of procuring one small Lustre filesystem giving us 120 TB
>> of storage using Lustre 2.X.
>> The vendor has proposed only 1 MDS and 1 OSS as a solution.
>> The query we have is: is this configuration enough, or do we need more OSS?
>> The MDS and OSS servers are identical with regard to RAM (64 GB) and HDD
>> (300 GB).
>>  
>> Thanks
>> Majid

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation







___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre routing help needed

2017-10-30 Thread LOPEZ, ALEXANDRE
Hi Kevin,

Just wild-guessing here. Have you tried playing with the 
live_router_check_interval, dead_router_check_interval and router_ping_timeout 
LNet parameters?

HTH,
Alejandro

From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf 
Of Kevin M. Hildebrand
Sent: Monday, October 30, 2017 1:47 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Lustre routing help needed

Hello, I'm trying to set up some new Lustre routers between a set of Infiniband 
connected Lustre servers and a few hosts connected to an external 100G Ethernet 
network.   The problem I'm having is that the routers work just fine for a 
minute or two, and then shortly thereafter they're marked as 'down' and all 
traffic stops.  If I unload/reload the lustre modules on the router, it'll work 
again for a short time and then stop again.  The router shows errors like:
[236528.801275] LNetError: 54389:0:(lib-move.c:2120:lnet_parse_get()) 
10.10.104.2@tcp2: Unable to send REPLY for GET from 
12345-10.10.104.201@tcp2: -113
My Lustre router has a Mellanox ConnectX-3 interface connecting to the Lustre
servers, and a Mellanox ConnectX-5 100G interface connecting to a 100G switch
to which my test client is connected.  On the Infiniband side, I've got lnet
configured as o2ib1, and on the Ethernet side, as tcp2.

Clients and servers are all running Lustre 2.8.  The Lustre router at the 
moment is running Lustre 2.10.1, because of software dependencies to support 
the 100G card.

I've verified that I have stable network connectivity on both the IB and 
Ethernet sides.

At the moment, I have very simple lnet configurations, using the built in 
defaults.  lnet.conf on the server:
options lnet ip2nets="o2ib1(ib0) 192.168.[64-95].*; tcp1 10.103.[128-159].*" 
routes="tcp0 192.168.64.[78-79]@o2ib1; tcp2 192.168.64.[78-79]@o2ib1"

On the lustre router:
options lnet networks="o2ib1(ib0),tcp2(p1p1.104)" "forwarding=enabled"

And on the client:
options lnet networks="tcp2(p4p1.104)" routes="o2ib1 10.10.104.[2-3]@tcp2"

All of the hosts (client, server, router) have the following in ko2iblnd.conf:

alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 
concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 
fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4

install ko2iblnd /usr/sbin/ko2iblnd-probe


Does anyone see anything I've missed, or have any thoughts on where I should 
look next?

Thanks,
Kevin

--
Kevin Hildebrand
University of Maryland, College Park
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] ldiskfsprogs

2017-10-30 Thread Parag Khuraswar
Hi,

The problem got resolved.
But I am not able to see the ib NID in the 'lctl list_nids' output.
My lnet.conf file entry is 'options lnet networks=o2ib(ib0)'. This file is not
executable.

 Can you help ?

Regards,
Parag



-Original Message-
From: Martin Hecht [mailto:he...@hlrs.de] 
Sent: Monday, October 30, 2017 6:26 PM
To: Parag Khuraswar; Lustre discussion
Subject: Re: [lustre-discuss] ldiskfsprogs

Hi Parag,

please reply to the list or keep it in cc at least

On 10/30/2017 01:21 PM, Parag Khuraswar wrote:
> Hi Martin,
>
> The problem got resolved.
> But I am not able to see the ib NID in the 'lctl list_nids' output.
> My lnet.conf file entry is 'options lnet networks=o2ib(ib0)'. This file is
> not executable.
>
> Can you help ?
>
> Regards,
> Parag
your lnet is probably not configured correctly. Things to check:
- is the ib0 device there (i.e. make sure the infiniband layer works
correctly)?
- does the ib0 have an IP address? (lustre normally doesn't use IP over
IB, but it uses the IP addresses for identifying the hosts)
- verify that you can ping the ip (with normal network ping to ensure
that the connection is working)
- Is the lnet module loaded?
- if not can you load it manually with modprobe lnet?
- what is written to dmesg / syslog when it fails?
- when the module is loaded, try lctl network up

Martin


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] ldiskfsprogs

2017-10-30 Thread Martin Hecht
Hi Parag,

please reply to the list or keep it in cc at least

On 10/30/2017 01:21 PM, Parag Khuraswar wrote:
> Hi Martin,
>
> The problem got resolved.
> But I am not able to see the ib NID in the 'lctl list_nids' output.
> My lnet.conf file entry is 'options lnet networks=o2ib(ib0)'. This file is
> not executable.
>
> Can you help ?
>
> Regards,
> Parag
your lnet is probably not configured correctly. Things to check:
- is the ib0 device there (i.e. make sure the infiniband layer works
correctly)?
- does the ib0 have an IP address? (lustre normally doesn't use IP over
IB, but it uses the IP addresses for identifying the hosts)
- verify that you can ping the ip (with normal network ping to ensure
that the connection is working)
- Is the lnet module loaded?
- if not can you load it manually with modprobe lnet?
- what is written to dmesg / syslog when it fails?
- when the module is loaded, try lctl network up
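Put together as commands, the checklist above amounts to something like this
(the peer IP is only a placeholder):

ip addr show ib0            # is the IB interface present and does it have an IP?
ping -c 3 192.168.0.2       # placeholder peer IP: basic IP connectivity over ib0
modprobe lnet               # load LNet if it is not loaded yet
lctl network up             # bring up the networks from the module options
lctl list_nids              # the @o2ib NID should now be listed
dmesg | tail -n 20          # check here if any of the steps fail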

Martin



smime.p7s
Description: S/MIME Cryptographic Signature
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre routing help needed

2017-10-30 Thread Kevin M. Hildebrand
Hello, I'm trying to set up some new Lustre routers between a set of
Infiniband connected Lustre servers and a few hosts connected to an
external 100G Ethernet network.   The problem I'm having is that the
routers work just fine for a minute or two, and then shortly thereafter
they're marked as 'down' and all traffic stops.  If I unload/reload the
lustre modules on the router, it'll work again for a short time and then
stop again.  The router shows errors like:
[236528.801275] LNetError: 54389:0:(lib-move.c:2120:lnet_parse_get())
10.10.104.2@tcp2: Unable to send REPLY for GET from 12345-10.10.104.201@tcp2:
-113

My Lustre router has a Mellanox ConnectX-3 interface connecting to the
Lustre servers, and a Mellanox ConnectX-5 100G interface connecting to a
100G switch to which my test client is connected.  On the Infiniband side,
I've got lnet configured as o2ib1, and on the Ethernet side, as tcp2.

Clients and servers are all running Lustre 2.8.  The Lustre router at the
moment is running Lustre 2.10.1, because of software dependencies to
support the 100G card.

I've verified that I have stable network connectivity on both the IB and
Ethernet sides.

At the moment, I have very simple lnet configurations, using the built in
defaults.  lnet.conf on the server:
options lnet ip2nets="o2ib1(ib0) 192.168.[64-95].*; tcp1
10.103.[128-159].*" routes="tcp0 192.168.64.[78-79]@o2ib1; tcp2
192.168.64.[78-79]@o2ib1"

On the lustre router:
options lnet networks="o2ib1(ib0),tcp2(p1p1.104)" "forwarding=enabled"

And on the client:
options lnet networks="tcp2(p4p1.104)" routes="o2ib1 10.10.104.[2-3]@tcp2"

All of the hosts (client, server, router) have the following in
ko2iblnd.conf:

alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024
concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048
fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4

install ko2iblnd /usr/sbin/ko2iblnd-probe
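One hedged way to confirm that the routes are installed and the router is
considered alive, using the NIDs above:

# On the client:
cat /proc/sys/lnet/routes        # configured routes and router state
lctl ping 10.10.104.2@tcp2       # the router's tcp2 NID
lctl ping 192.168.64.78@o2ib1    # the router's o2ib1 NID, reached via the tcp2 route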


Does anyone see anything I've missed, or have any thoughts on where I
should look next?

Thanks,
Kevin

--
Kevin Hildebrand
University of Maryland, College Park
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] ldiskfsprogs

2017-10-30 Thread Martin Hecht
Hi,

On 10/30/2017 09:56 AM, Parag Khuraswar wrote:
> Hi,
>
> I am installing lustre cloned from github.  
Hmm... there are a few lustre related repositories on github.
I would prefer the upstream Lustre git repository managed by Intel
git://git.hpdd.intel.com unless you are interested in specific features
that are not (yet) available from there.

> After building the rpms I am trying
> to install the lustre rpms.
>
> I am getting the error below:
>
> Requires: ldiskfsprogs >= 1.42.7.wc1
>
> But this package was not built during compilation.
ldiskfsprogs used to be called e2fsprogs. However, in my experience it
is a bit more of a challenge to build these from source than the
main lustre packages. Anyhow, in Intel's git Lustre repository
git://git.hpdd.intel.com there is also a branch tools/e2fsprogs.git - or
you can use pre-built rpms for your OS from

https://downloads.hpdd.intel.com/public/e2fsprogs/latest/
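As a sketch, assuming the downloaded e2fsprogs packages provide the
'ldiskfsprogs' capability that the lustre RPM asks for:

rpm -q e2fsprogs e2fsprogs-libs                 # what is installed now
yum localinstall ./e2fsprogs-*.rpm ./e2fsprogs-libs-*.rpm \
    ./libcom_err-*.rpm ./libss-*.rpm            # install the Lustre-patched build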

best regards,
Martin




smime.p7s
Description: S/MIME Cryptographic Signature
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] ldiskfsprogs

2017-10-30 Thread Parag Khuraswar
Hi,

 

I am installing lustre cloned from github.  After building the rpms I am trying
to install the lustre rpms.

I am getting the error below:

Requires: ldiskfsprogs >= 1.42.7.wc1

But this package was not built during compilation.

 

 

Regards,

Parag

 

 

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread E.S. Rosenberg
Maybe someone can answer this in the context of this question: is there any
performance gain over classic filers when you are using only a single OSS?

On Mon, Oct 30, 2017 at 9:56 AM, Ravi Konila  wrote:

> Hi Majid
>
> It is better to go for HA for both OSS and MDS. You would need 2 MDS
> servers and 2 OSS servers (identical configuration).
> Also use latest Lustre 2.10.1 release.
>
> Regards
> *Ravi Konila*
>
>
> *From:* Amjad Syed
> *Sent:* Monday, October 30, 2017 1:17 PM
> *To:* lustre-discuss@lists.lustre.org
> *Subject:* [lustre-discuss] 1 MDS and 1 OSS
>
> Hello
> We are in the process of procuring one small Lustre filesystem giving us
> 120 TB of storage using Lustre 2.X.
> The vendor has proposed only 1 MDS and 1 OSS as a solution.
> The query we have is: is this configuration enough, or do we need more
> OSS?
> The MDS and OSS servers are identical with regard to RAM (64 GB) and HDD
> (300 GB).
>
> Thanks
> Majid
>
> --
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Ravi Konila
Hi Majid

It is better to go for HA for both OSS and MDS. You would need 2 MDS servers and
2 OSS servers (identical configuration).
Also use latest Lustre 2.10.1 release.

Regards
Ravi Konila


From: Amjad Syed 
Sent: Monday, October 30, 2017 1:17 PM
To: lustre-discuss@lists.lustre.org 
Subject: [lustre-discuss] 1 MDS and 1 OSS

Hello 
We are in the process of procuring one small Lustre filesystem giving us 120 TB
of storage using Lustre 2.X.
The vendor has proposed only 1 MDS and 1 OSS as a solution.
The query we have is: is this configuration enough, or do we need more OSS?
The MDS and OSS servers are identical with regard to RAM (64 GB) and HDD
(300 GB).

Thanks
Majid



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Amjad Syed
Hello
We are in the process of procuring one small Lustre filesystem giving us
120 TB of storage using Lustre 2.X.
The vendor has proposed only 1 MDS and 1 OSS as a solution.
The query we have is: is this configuration enough, or do we need more
OSS?
The MDS and OSS servers are identical with regard to RAM (64 GB) and HDD
(300 GB).

Thanks
Majid
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org