Re: [lustre-discuss] Experience with resizing MDT

2018-09-21 Thread Andreas Dilger
On Sep 20, 2018, at 16:38, Mohr Jr, Richard Frank (Rick Mohr)  
wrote:
> 
> 
>> On Sep 19, 2018, at 8:09 PM, Colin Faber  wrote:
>> 
>> Why wouldn't you use DNE?
> 
> I am considering it as an option, but there appear to be some potential 
> drawbacks.
> 
> If I use DNE1, then I have to manually create directories on specific MDTs.  
> I will need to monitor MDT usage and make adjustments as necessary (which is 
> not the end of the world, but still involves some additional work).  This 
> might be fine when I am creating new top-level directories for new 
> users/projects, but any existing directories created before we add a new MDT 
> will still only use MDT0.  Since the bulk of our user/project directories 
> will be created early on, we still have the potential issue of running out of 
> inodes on MDT0.

Note that it is possible to create remote directories at any point in the 
filesystem.  If you set mdt.*.enable_remote_dir=1 then you can create 
directories that point back and forth across MDTs.  If you also set
mdt.*.enable_remote_dir_gid=-1 then all users can create remote directories.
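
As a minimal sketch (placeholder mount point and MDT index; the set_param lines are 
run on the MDS nodes):

  lctl set_param mdt.*.enable_remote_dir=1       # allow remote directories anywhere in the tree
  lctl set_param mdt.*.enable_remote_dir_gid=-1  # let all users create them
  lfs mkdir -i 1 /mnt/lustre/projects/newproj    # place this new directory on MDT0001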

> Based on that, I think DNE2 would be the better alternative, but it still has 
> similar limitations.  The directories created initially will still be only 
> striped over a single MDT.  When another MDT is added, I would need to 
> recursively adjust all the existing directories to have a stripe count of 2 
> (or risk having MDT0 run out of inodes).  Based on my understanding of how 
> the striped directories work, all the files in a striped directory are about 
> evenly split across all the MDTs that the directory is striped across (which 
> doesn’t work very well if MDT0 is mostly full and MDT1 is mostly empty).  
> Most likely we would want to have every directory striped across all MDTs, 
> but there is a note in the lustre manual explicitly mentioning that it’s not 
> a good idea to do this.

Yes.  Since creating remote and, in particular, striped directories has non-zero 
overhead from distributed transactions, and accessing them costs extra RPCs on an 
ongoing basis, it is better to limit remote and striped directories to the ones 
that actually need them.

We're working on automating the use of DNE remote/striped directories.  In 2.12 
it is possible to use "lfs mkdir -i -1" and "lfs mkdir -c N" to automatically 
select one or more "good" MDT(s) (where "good" == least full right now), or 
"lfs mkdir -i m,n,p,q" to select a disjoint list of MDTs.

> So that is why I was thinking that resizing the MDT might be the simplest 
> approach.   Of course, I might be misunderstanding something about DNE2, and 
> if that is the case, someone can correct me.  Or if there are options I am 
> not considering, I would welcome those too.

Yes, if you are not pushing the limits of the MDT size, then resizing the MDT is 
a reasonable approach.  It also avoids issues with MDT space imbalance, which is 
not handled ideally today, though we are working to improve that.
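
As a rough sketch only: for an LVM-backed ldiskfs MDT, an offline grow might look 
something like the following (device names, sizes, and mount points are 
placeholders; take a backup, use Lustre's e2fsprogs, and check the manual before 
doing this on a production MDT):

  umount /mnt/lustre-mdt0                 # stop the MDT (requires downtime)
  lvextend -L +500G /dev/vg_mdt/mdt0      # grow the underlying logical volume
  e2fsck -f /dev/vg_mdt/mdt0              # force a full check before resizing
  resize2fs /dev/vg_mdt/mdt0              # grow ldiskfs into the new space
  mount -t lustre /dev/vg_mdt/mdt0 /mnt/lustre-mdt0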

Cheers, Andreas
---
Andreas Dilger
CTO Whamcloud








Re: [lustre-discuss] Second read or write performance

2018-09-21 Thread fırat yılmaz
Hi Patrick,

Thank you for clarifying flock capabilities.
So many things can cause the difference between the two test results I saw in
the dashboard; now I know that flock has no effect on it.

Best Regards.





On Sat, 22 Sep 2018 at 04:14, Patrick Farrell  wrote:

> Firat,
>
> I strongly suspect that careful remeasurement of flock on/off will show
> that removing the flock option had no effect at all.  It simply doesn’t DO
> anything like that - it controls a single flag that says, if you use flock
> operations, they work one way, or if it is not set, they work another way.
> It does nothing else, and has no impact on any part of file system
> operation except when flocks are used, and dd does not use flocks. It is
> simply impossible for the setting of the flock option to affect dd or
> performance level or variation, unless something using flocks is running at
> the same time.  (And even then, it would be affecting it indirectly)
>
> I’m pushing back strongly because I’ve repeatedly seen people on the
> mailing speculate about turning flock off as a way to increase performance,
> and it simply isn’t.
>
> - Patrick
>
>
> --
> *From:* fırat yılmaz 
> *Sent:* Friday, September 21, 2018 7:50:51 PM
> *To:* Patrick Farrell
> *Cc:* adil...@whamcloud.com; lustre-discuss@lists.lustre.org
> *Subject:* Re: [lustre-discuss] Second read or write performance
>
> The problem solved by adding lustre fine tuning parameter  to oss servers
>
> lctl set_param obdfilter.lı-lustrefolder-OST*.brw_size=16
>
> The flock is required by the application running in the filesystem so
> flock option is enabled
>
> removing flock decrased the divergence of the flactuations and about %5
> performance gain from iml dashboard
>
> Best Regards.
>
> On Sat, Sep 22, 2018 at 12:56 AM Patrick Farrell  wrote:
>
> Just 300 GiB, actually.  But that's still rather large and could skew
> things depending on OST size.
>
> - Patrick
>
> On 9/21/18, 4:43 PM, "lustre-discuss on behalf of Andreas Dilger" <
> lustre-discuss-boun...@lists.lustre.org on behalf of adil...@whamcloud.com>
> wrote:
>
> On Sep 21, 2018, at 00:43, fırat yılmaz 
> wrote:
> >
> > Hi Andreas,
> > Tests are made with dd, The test folder is created by the related
> application company, i will check that when i have connection. OST's has
> %85-86 free space  and filesystem mounted with flock option, i will ask for
> it to remove and test again.
>
> The "flock" option shouldn't make any difference, unless the
> application is actually doing userspace file locking in the code.
> Definitely "dd" will not be using it.
>
> What does "lfs getstripe" on the first and second file as well as the
> parent directory show, and "lfs df" for the filesystem?
>
> > Read test dd if=/vol1/test_read/dd.test.`hostname` of=/dev/null
> bs=1M count=30
> >
> > Write test dd if=/dev/zero of=/vol1/test_read/dd.test.2.`hostname`
> bs=1M count=30
>
> This is creating a single file of 300TB in size, so that is definitely
> going to skew the space allocation.
>
> Cheers, Andreas
>
> >
> > On Thu, Sep 20, 2018 at 10:57 PM Andreas Dilger <
> adil...@whamcloud.com> wrote:
> > On Sep 20, 2018, at 03:07, fırat yılmaz 
> wrote:
> > >
> > > Hi all,
> > >
> > > OS=Redhat 7.4
> > > Lustre Version: Intel® Manager for Lustre* software 4.0.3.0
> > > İnterconnect: Mellanox OFED, ConnectX-5
> > > 72 OST over 6 OSS with HA
> > > 1mdt and 1 mgt on 2 MDS with HA
> > >
> > > Lustre servers fine tuning parameters:
> > > lctl set_param timeout=600
> > > lctl set_param ldlm_timeout=200
> > > lctl set_param at_min=250
> > > lctl set_param at_max=600
> > > lctl set_param obdfilter.*.read_cache_enable=1
> > > lctl set_param obdfilter.*.writethrough_cache_enable=1
> > > lctl set_param obdfilter.lfs3test-OST*.brw_size=16
> > >
> > > Lustre clients fine tuning parameters:
> > > lctl set_param osc.*.checksums=0
> > > lctl set_param timeout=600
> > > lctl set_param at_min=250
> > > lctl set_param at_max=600
> > > lctl set_param ldlm.namespaces.*.lru_size=2000
> > > lctl set_param osc.*OST*.max_rpcs_in_flight=256
> > > lctl set_param osc.*OST*.max_dirty_mb=1024
> > > lctl set_param osc.*.max_pages_per_rpc=1024
> > > lctl set_param llite.*.max_read_ahead_mb=1024
> > > lctl set_param llite.*.max_read_ahead_per_file_mb=1024
> > >
> > > Mountpoint stripe count:72 stripesize:1M
> > >
> > > I have a 2Pb lustre filesystem, In the benchmark tests i get the
> optimum values for read and write, but when i start a concurrent I/O
> operation, second job throughput stays around 100-200Mb/s. I have tried
> lovering the stripe count to 36 but since the concurrent operations will
> not occur in a way that keeps OST volume inbalance, i think that its not a
> good way to move on, secondly i saw some discussion about turning off flock
which ended up unpromising.

Re: [lustre-discuss] Second read or write performance

2018-09-21 Thread Patrick Farrell
Firat,

I strongly suspect that careful remeasurement of flock on/off will show that 
removing the flock option had no effect at all.  It simply doesn’t DO anything 
like that - it controls a single flag that says, if you use flock operations, 
they work one way, or if it is not set, they work another way.  It does nothing 
else, and has no impact on any part of file system operation except when flocks 
are used, and dd does not use flocks. It is simply impossible for the setting 
of the flock option to affect dd or performance level or variation, unless 
something using flocks is running at the same time.  (And even then, it would 
be affecting it indirectly)
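
For reference, the option in question is just the client mount option; a sketch 
with a placeholder MGS NID, fsname, and mount point (only applications that 
actually call flock() see any difference between these modes):

  mount -t lustre -o flock      mgsnode@tcp:/lustre /mnt/lustre  # coherent flock() across all clients
  mount -t lustre -o localflock mgsnode@tcp:/lustre /mnt/lustre  # flock() coherent only within one client
  mount -t lustre -o noflock    mgsnode@tcp:/lustre /mnt/lustre  # flock() calls are rejected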

I’m pushing back strongly because I’ve repeatedly seen people on the mailing list 
speculate about turning flock off as a way to increase performance, and it 
simply isn’t.

- Patrick



From: fırat yılmaz 
Sent: Friday, September 21, 2018 7:50:51 PM
To: Patrick Farrell
Cc: adil...@whamcloud.com; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Second read or write performance

The problem solved by adding lustre fine tuning parameter  to oss servers
lctl set_param obdfilter.lı-lustrefolder-OST*.brw_size=16

The flock is required by the application running in the filesystem so flock 
option is enabled

removing flock decrased the divergence of the flactuations and about %5 
performance gain from iml dashboard

Best Regards.

On Sat, Sep 22, 2018 at 12:56 AM Patrick Farrell 
mailto:p...@cray.com>> wrote:
Just 300 GiB, actually.  But that's still rather large and could skew things 
depending on OST size.

- Patrick

On 9/21/18, 4:43 PM, "lustre-discuss on behalf of Andreas Dilger" 
mailto:lustre-discuss-boun...@lists.lustre.org>
 on behalf of adil...@whamcloud.com> wrote:

On Sep 21, 2018, at 00:43, fırat yılmaz 
mailto:firatyilm...@gmail.com>> wrote:
>
> Hi Andreas,
> Tests are made with dd, The test folder is created by the related 
application company, i will check that when i have connection. OST's has  
%85-86 free space  and filesystem mounted with flock option, i will ask for it 
to remove and test again.

The "flock" option shouldn't make any difference, unless the application is 
actually doing userspace file locking in the code.  Definitely "dd" will not be 
using it.

What does "lfs getstripe" on the first and second file as well as the 
parent directory show, and "lfs df" for the filesystem?

> Read test dd if=/vol1/test_read/dd.test.`hostname` of=/dev/null bs=1M 
count=30
>
> Write test dd if=/dev/zero of=/vol1/test_read/dd.test.2.`hostname` bs=1M 
count=30

This is creating a single file of 300TB in size, so that is definitely 
going to skew the space allocation.

Cheers, Andreas

>
> On Thu, Sep 20, 2018 at 10:57 PM Andreas Dilger 
mailto:adil...@whamcloud.com>> wrote:
> On Sep 20, 2018, at 03:07, fırat yılmaz 
mailto:firatyilm...@gmail.com>> wrote:
> >
> > Hi all,
> >
> > OS=Redhat 7.4
> > Lustre Version: Intel® Manager for Lustre* software 4.0.3.0
> > İnterconnect: Mellanox OFED, ConnectX-5
> > 72 OST over 6 OSS with HA
> > 1mdt and 1 mgt on 2 MDS with HA
> >
> > Lustre servers fine tuning parameters:
> > lctl set_param timeout=600
> > lctl set_param ldlm_timeout=200
> > lctl set_param at_min=250
> > lctl set_param at_max=600
> > lctl set_param obdfilter.*.read_cache_enable=1
> > lctl set_param obdfilter.*.writethrough_cache_enable=1
> > lctl set_param obdfilter.lfs3test-OST*.brw_size=16
> >
> > Lustre clients fine tuning parameters:
> > lctl set_param osc.*.checksums=0
> > lctl set_param timeout=600
> > lctl set_param at_min=250
> > lctl set_param at_max=600
> > lctl set_param ldlm.namespaces.*.lru_size=2000
> > lctl set_param osc.*OST*.max_rpcs_in_flight=256
> > lctl set_param osc.*OST*.max_dirty_mb=1024
> > lctl set_param osc.*.max_pages_per_rpc=1024
> > lctl set_param llite.*.max_read_ahead_mb=1024
> > lctl set_param llite.*.max_read_ahead_per_file_mb=1024
> >
> > Mountpoint stripe count:72 stripesize:1M
> >
> > I have a 2Pb lustre filesystem, In the benchmark tests i get the 
optimum values for read and write, but when i start a concurrent I/O operation, 
second job throughput stays around 100-200Mb/s. I have tried lovering the 
stripe count to 36 but since the concurrent operations will not occur in a way 
that keeps OST volume inbalance, i think that its not a good way to move on, 
secondly i saw some discussion about turning off flock which ended up 
unpromising.
> >
> > As i check the stripe behaviour,
> > first operation starts to use first 36 OST
> > when a second job starts during a first job, it uses second 36 OST
> >
> > But when second job starts after 1st job it uses first 36 OST's which 
causes OST unbalance.
> >
> > > Is there a round robin setup that each 36 OST pair used in a round robin way?

Re: [lustre-discuss] Second read or write performance

2018-09-21 Thread fırat yılmaz
The problem was solved by adding the following Lustre tuning parameter on the OSS servers:
lctl set_param obdfilter.lı-lustrefolder-OST*.brw_size=16
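
As general tuning context: brw_size=16 on the OSS allows 16 MB bulk RPCs, but 
clients usually also need a matching RPC size to actually use them; a sketch, 
assuming 4 KiB pages:

  lctl set_param obdfilter.*.brw_size=16        # OSS side: allow 16 MB bulk RPCs
  lctl set_param osc.*.max_pages_per_rpc=4096   # client side: 4096 x 4 KiB = 16 MiB per RPC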

flock is required by the application running on the filesystem, so the flock
option is enabled.

Removing flock decreased the divergence of the fluctuations and gave about a 5%
performance gain according to the IML dashboard.

Best Regards.

On Sat, Sep 22, 2018 at 12:56 AM Patrick Farrell  wrote:

> Just 300 GiB, actually.  But that's still rather large and could skew
> things depending on OST size.
>
> - Patrick
>
> On 9/21/18, 4:43 PM, "lustre-discuss on behalf of Andreas Dilger" <
> lustre-discuss-boun...@lists.lustre.org on behalf of adil...@whamcloud.com>
> wrote:
>
> On Sep 21, 2018, at 00:43, fırat yılmaz 
> wrote:
> >
> > Hi Andreas,
> > Tests are made with dd, The test folder is created by the related
> application company, i will check that when i have connection. OST's has
> %85-86 free space  and filesystem mounted with flock option, i will ask for
> it to remove and test again.
>
> The "flock" option shouldn't make any difference, unless the
> application is actually doing userspace file locking in the code.
> Definitely "dd" will not be using it.
>
> What does "lfs getstripe" on the first and second file as well as the
> parent directory show, and "lfs df" for the filesystem?
>
> > Read test dd if=/vol1/test_read/dd.test.`hostname` of=/dev/null
> bs=1M count=30
> >
> > Write test dd if=/dev/zero of=/vol1/test_read/dd.test.2.`hostname`
> bs=1M count=30
>
> This is creating a single file of 300TB in size, so that is definitely
> going to skew the space allocation.
>
> Cheers, Andreas
>
> >
> > On Thu, Sep 20, 2018 at 10:57 PM Andreas Dilger <
> adil...@whamcloud.com> wrote:
> > On Sep 20, 2018, at 03:07, fırat yılmaz 
> wrote:
> > >
> > > Hi all,
> > >
> > > OS=Redhat 7.4
> > > Lustre Version: Intel® Manager for Lustre* software 4.0.3.0
> > > İnterconnect: Mellanox OFED, ConnectX-5
> > > 72 OST over 6 OSS with HA
> > > 1mdt and 1 mgt on 2 MDS with HA
> > >
> > > Lustre servers fine tuning parameters:
> > > lctl set_param timeout=600
> > > lctl set_param ldlm_timeout=200
> > > lctl set_param at_min=250
> > > lctl set_param at_max=600
> > > lctl set_param obdfilter.*.read_cache_enable=1
> > > lctl set_param obdfilter.*.writethrough_cache_enable=1
> > > lctl set_param obdfilter.lfs3test-OST*.brw_size=16
> > >
> > > Lustre clients fine tuning parameters:
> > > lctl set_param osc.*.checksums=0
> > > lctl set_param timeout=600
> > > lctl set_param at_min=250
> > > lctl set_param at_max=600
> > > lctl set_param ldlm.namespaces.*.lru_size=2000
> > > lctl set_param osc.*OST*.max_rpcs_in_flight=256
> > > lctl set_param osc.*OST*.max_dirty_mb=1024
> > > lctl set_param osc.*.max_pages_per_rpc=1024
> > > lctl set_param llite.*.max_read_ahead_mb=1024
> > > lctl set_param llite.*.max_read_ahead_per_file_mb=1024
> > >
> > > Mountpoint stripe count:72 stripesize:1M
> > >
> > > I have a 2Pb lustre filesystem, In the benchmark tests i get the
> optimum values for read and write, but when i start a concurrent I/O
> operation, second job throughput stays around 100-200Mb/s. I have tried
> lovering the stripe count to 36 but since the concurrent operations will
> not occur in a way that keeps OST volume inbalance, i think that its not a
> good way to move on, secondly i saw some discussion about turning off flock
> which ended up unpromising.
> > >
> > > As i check the stripe behaviour,
> > > first operation starts to use first 36 OST
> > > when a second job starts during a first job, it uses second 36 OST
> > >
> > > But when second job starts after 1st job it uses first 36 OST's
> which causes OST unbalance.
> > >
> > > Is there a round robin setup that each 36 OST pair used in a round
> robin way?
> > >
> > > And any kind of suggestions are appreciated.
> >
> > Can you please describe what command you are using for testing.
> Lustre is already using round-robin OST allocation by default, so the
> second job should use the next set of 36 OSTs, unless the file layout has
> been specified e.g. to start on OST or the space usage of the OSTs is
> very imbalanced (more than 17% of the remaining free space).
> >
> > Cheers, Andreas
> > ---
> > Andreas Dilger
> > Principal Lustre Architect
> > Whamcloud
> >
> >
> >
> >
> >
> >
> >
>
> Cheers, Andreas
> ---
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
>
>
>
>
>
>
>
>
>
>


Re: [lustre-discuss] Second read or write performance

2018-09-21 Thread Patrick Farrell
Just 300 GiB, actually.  But that's still rather large and could skew things 
depending on OST size.

- Patrick

On 9/21/18, 4:43 PM, "lustre-discuss on behalf of Andreas Dilger" 
 
wrote:

On Sep 21, 2018, at 00:43, fırat yılmaz  wrote:
> 
> Hi Andreas,
> Tests are made with dd, The test folder is created by the related 
application company, i will check that when i have connection. OST's has  
%85-86 free space  and filesystem mounted with flock option, i will ask for it 
to remove and test again.

The "flock" option shouldn't make any difference, unless the application is 
actually doing userspace file locking in the code.  Definitely "dd" will not be 
using it.

What does "lfs getstripe" on the first and second file as well as the 
parent directory show, and "lfs df" for the filesystem?

> Read test dd if=/vol1/test_read/dd.test.`hostname` of=/dev/null bs=1M 
count=30
> 
> Write test dd if=/dev/zero of=/vol1/test_read/dd.test.2.`hostname` bs=1M 
count=30

This is creating a single file of 300TB in size, so that is definitely 
going to skew the space allocation.

Cheers, Andreas

> 
> On Thu, Sep 20, 2018 at 10:57 PM Andreas Dilger  
wrote:
> On Sep 20, 2018, at 03:07, fırat yılmaz  wrote:
> >
> > Hi all,
> >
> > OS=Redhat 7.4
> > Lustre Version: Intel® Manager for Lustre* software 4.0.3.0
> > İnterconnect: Mellanox OFED, ConnectX-5
> > 72 OST over 6 OSS with HA
> > 1mdt and 1 mgt on 2 MDS with HA
> >
> > Lustre servers fine tuning parameters:
> > lctl set_param timeout=600
> > lctl set_param ldlm_timeout=200
> > lctl set_param at_min=250
> > lctl set_param at_max=600
> > lctl set_param obdfilter.*.read_cache_enable=1
> > lctl set_param obdfilter.*.writethrough_cache_enable=1
> > lctl set_param obdfilter.lfs3test-OST*.brw_size=16
> >
> > Lustre clients fine tuning parameters:
> > lctl set_param osc.*.checksums=0
> > lctl set_param timeout=600
> > lctl set_param at_min=250
> > lctl set_param at_max=600
> > lctl set_param ldlm.namespaces.*.lru_size=2000
> > lctl set_param osc.*OST*.max_rpcs_in_flight=256
> > lctl set_param osc.*OST*.max_dirty_mb=1024
> > lctl set_param osc.*.max_pages_per_rpc=1024
> > lctl set_param llite.*.max_read_ahead_mb=1024
> > lctl set_param llite.*.max_read_ahead_per_file_mb=1024
> >
> > Mountpoint stripe count:72 stripesize:1M
> >
> > I have a 2Pb lustre filesystem, In the benchmark tests i get the 
optimum values for read and write, but when i start a concurrent I/O operation, 
second job throughput stays around 100-200Mb/s. I have tried lovering the 
stripe count to 36 but since the concurrent operations will not occur in a way 
that keeps OST volume inbalance, i think that its not a good way to move on, 
secondly i saw some discussion about turning off flock which ended up 
unpromising.
> >
> > As i check the stripe behaviour,
> > first operation starts to use first 36 OST
> > when a second job starts during a first job, it uses second 36 OST
> >
> > But when second job starts after 1st job it uses first 36 OST's which 
causes OST unbalance.
> >
> > Is there a round robin setup that each 36 OST pair used in a round 
robin way?
> >
> > And any kind of suggestions are appreciated.
> 
> Can you please describe what command you are using for testing.  Lustre 
is already using round-robin OST allocation by default, so the second job 
should use the next set of 36 OSTs, unless the file layout has been specified 
e.g. to start on OST or the space usage of the OSTs is very imbalanced 
(more than 17% of the remaining free space).
> 
> Cheers, Andreas
> ---
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
> 
> 
> 
> 
> 
> 
> 

Cheers, Andreas
---
Andreas Dilger
Principal Lustre Architect
Whamcloud











Re: [lustre-discuss] Second read or write performance

2018-09-21 Thread Andreas Dilger
On Sep 21, 2018, at 00:43, fırat yılmaz  wrote:
> 
> Hi Andreas,
> Tests are made with dd, The test folder is created by the related application 
> company, i will check that when i have connection. OST's has  %85-86 free 
> space  and filesystem mounted with flock option, i will ask for it to remove 
> and test again.

The "flock" option shouldn't make any difference, unless the application is 
actually doing userspace file locking in the code.  Definitely "dd" will not be 
using it.

What does "lfs getstripe" on the first and second file as well as the parent 
directory show, and "lfs df" for the filesystem?
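
For example (using the test paths from the quoted commands; exact options are 
only a suggestion):

  lfs getstripe /vol1/test_read/dd.test.$(hostname)      # layout of the read-test file
  lfs getstripe /vol1/test_read/dd.test.2.$(hostname)    # layout of the write-test file
  lfs getstripe -d /vol1/test_read                       # default layout of the parent directory
  lfs df -h /vol1                                        # per-OST space usage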

> Read test dd if=/vol1/test_read/dd.test.`hostname` of=/dev/null bs=1M 
> count=30
> 
> Write test dd if=/dev/zero of=/vol1/test_read/dd.test.2.`hostname` bs=1M 
> count=30

This is creating a single file of 300TB in size, so that is definitely going to 
skew the space allocation.

Cheers, Andreas

> 
> On Thu, Sep 20, 2018 at 10:57 PM Andreas Dilger  wrote:
> On Sep 20, 2018, at 03:07, fırat yılmaz  wrote:
> >
> > Hi all,
> >
> > OS=Redhat 7.4
> > Lustre Version: Intel® Manager for Lustre* software 4.0.3.0
> > İnterconnect: Mellanox OFED, ConnectX-5
> > 72 OST over 6 OSS with HA
> > 1mdt and 1 mgt on 2 MDS with HA
> >
> > Lustre servers fine tuning parameters:
> > lctl set_param timeout=600
> > lctl set_param ldlm_timeout=200
> > lctl set_param at_min=250
> > lctl set_param at_max=600
> > lctl set_param obdfilter.*.read_cache_enable=1
> > lctl set_param obdfilter.*.writethrough_cache_enable=1
> > lctl set_param obdfilter.lfs3test-OST*.brw_size=16
> >
> > Lustre clients fine tuning parameters:
> > lctl set_param osc.*.checksums=0
> > lctl set_param timeout=600
> > lctl set_param at_min=250
> > lctl set_param at_max=600
> > lctl set_param ldlm.namespaces.*.lru_size=2000
> > lctl set_param osc.*OST*.max_rpcs_in_flight=256
> > lctl set_param osc.*OST*.max_dirty_mb=1024
> > lctl set_param osc.*.max_pages_per_rpc=1024
> > lctl set_param llite.*.max_read_ahead_mb=1024
> > lctl set_param llite.*.max_read_ahead_per_file_mb=1024
> >
> > Mountpoint stripe count:72 stripesize:1M
> >
> > I have a 2Pb lustre filesystem, In the benchmark tests i get the optimum 
> > values for read and write, but when i start a concurrent I/O operation, 
> > second job throughput stays around 100-200Mb/s. I have tried lovering the 
> > stripe count to 36 but since the concurrent operations will not occur in a 
> > way that keeps OST volume inbalance, i think that its not a good way to 
> > move on, secondly i saw some discussion about turning off flock which ended 
> > up unpromising.
> >
> > As i check the stripe behaviour,
> > first operation starts to use first 36 OST
> > when a second job starts during a first job, it uses second 36 OST
> >
> > But when second job starts after 1st job it uses first 36 OST's which 
> > causes OST unbalance.
> >
> > Is there a round robin setup that each 36 OST pair used in a round robin 
> > way?
> >
> > And any kind of suggestions are appreciated.
> 
> Can you please describe what command you are using for testing.  Lustre is 
> already using round-robin OST allocation by default, so the second job should 
> use the next set of 36 OSTs, unless the file layout has been specified e.g. 
> to start on OST or the space usage of the OSTs is very imbalanced (more 
> than 17% of the remaining free space).
> 
> Cheers, Andreas
> ---
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
> 
> 
> 
> 
> 
> 
> 

Cheers, Andreas
---
Andreas Dilger
Principal Lustre Architect
Whamcloud











[lustre-discuss] Lustre File System Capability Transition Webinar

2018-09-21 Thread OpenSFS Administration
OpenSFS and DDN will be hosting an online webinar on October 2nd at 11:00 AM 
EDT to discuss DDN's recent acquisition of Intel’s Lustre File System 
Capability and its transition to their new Whamcloud division. Connection 
details are available below. This webinar will discuss the transfer in detail 
and provide an opportunity for community participation. The webinar will be 
recorded and made available for later online viewing. If you will be unable to 
attend live or have any questions, please email ad...@opensfs.org with your 
questions and feedback.

To join via web:

https://bluejeans.com/8655742173

To join via phone:

1) Dial:

+1.408.740.7256 (US (San Jose))
+1.888.240.2560 (US Toll Free)
+1.408.317.9253 (US (Primary, San Jose))

(see all numbers - http://bluejeans.com/numbers)

2) Enter Conference ID: 8655742173

To join via Room System:

Video Conferencing System: bjn.vc -or- 199.48.152.152

Meeting ID: 8655742173

Best regards,

OpenSFS Administration



[lustre-discuss] Lustre on RHEL 7 with Kernel 4

2018-09-21 Thread Riccardo Veraldi

Hello,

I am running Lustre 2.10.5 on RHEL 7.5 using kernel 4.4.157 from elrepo.

Everything seems to be working fine. I'm asking whether anyone else is running 
a 4.x kernel on CentOS with Lustre, and whether this configuration is 
unsupported or not recommended for some reason.

I had a hard time with the latest RHEL 7.5 kernel because InfiniBand is not 
working right:

https://bugs.centos.org/view.php?id=15193

So, just asking whether anyone has deployed a Lustre server on a 4.x kernel and 
is happy with it, or whether I should avoid doing that.

Thanks,

Rick




[lustre-discuss] Robinhood 3.1.4 is available

2018-09-21 Thread LEIBOVICI Thomas

Hi all,

A new version of robinhood is available: robinhood 3.1.4.

It can be downloaded from here: 
https://sourceforge.net/projects/robinhood/files/robinhood/3.1.4/


This version mainly brings changelog performance improvements (measured 
up to 200k records/sec per MDT) and new web interface features.


You can find more details, and a recap of the improvements since v3.1, in the 
release notes below.


Regards,
Thomas

Release notes
=
3.1.4:
--
* Web GUI:
    - new access control criteria: IP address, hostname
    - tasks: scheduled requests, and possibility to keep result history
    - custom graphs
* Performance improvements:
    - Improved rules-to-SQL conversion engine. Enables faster policy runs.
    - Auto-tune hash table sizes (changelog reader and entry processor).
    - Pipeline tuning to reduce CPU usage and improve ingestion rate.
    - Default linking to the jemalloc memory allocator for better performance.
      Note: it is advised to start MariaDB with this allocator too
      to achieve maximal performance.
    - Fix major performance bug in changelog reader (appeared in v3.1.3).
* Configuration files: support %include directives in sub-blocks.
* New parameter to 'common.copy': "mkdir=yes" creates target directories
  prior to the copy operation.
* New action 'common.move' (move an entry from one path/name to another).
* Command copytool 'lhsmtool_cmd': add man page.

3.1.3:
--
* Policies: add matching modes "auto_update_attrs" and "auto_update_all"
* Faster changelog reader shutdown when the process is terminated by a signal
* Changelog optimization: drop CREATE/UNLINK and MKDIR/RMDIR changelog pairs
* rbh-find:
    - add "-links" criterion
    - "-ost" accepts OST sets
    - add "-iname" option for case-insensitive name matching
    - support "-not" for links, size and dates
* FS scan optimization (use openat() to walk through the filesystem instead
  of full paths).
* Fix rules-to-SQL conversion

3.1.2:
--
* Implement the rbh-rebind command to assign an archived entry to a newly
  created fid (e.g. used for undelete operations).
* Make lhsm undelete more resilient to error cases
* lhsm: add tunable to allow custom/smaller UUID
* REST API:
    - add nagios plugin
    - add graph preview in console plugin
    - add new filters
