Re: [lustre-discuss] Rocky 9.2/lustre 2.15.3 client questions

2023-06-23 Thread Andreas Dilger via lustre-discuss
Applying the LU-16626 patch locally should fix the issue, and has no risk since 
it is only fixing a build issue that affects an obscure diagnostic tool.

That said, I've cherry-picked that patch back to b2_15, so it should be 
included in 2.15.4.

https://review.whamcloud.com/51426
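
If you're building from a git checkout, the change can be pulled straight from 
Gerrit. A rough sketch (the tag name and patchset number are assumptions; check 
the review page for the current patchset):

# clone the Lustre source and check out the 2.15.3 release tag (tag name assumed)
git clone git://git.whamcloud.com/fs/lustre-release.git
cd lustre-release
git checkout v2_15_3        # or b2_15, the branch the patch has now landed on
# fetch the LU-16626 change; refs/changes/26/51426/<patchset> is the usual Gerrit
# convention, patchset 1 is assumed here
git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/26/51426/1
git cherry-pick FETCH_HEAD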

Cheers, Andreas

On Jun 23, 2023, at 05:04, Mountford, Christopher J. (Dr.) via lustre-discuss wrote:

Hi,

I'm building the lustre client/kernel modules for our new HPC cluster and have 
a couple of questions:

1) Are there any known issues running lustre 2.15.3 clients and lustre 2.12.9 
servers? I haven't seen anything showstopping on the mailing list or in JIRA 
but wondered if anyone had run into problems.

2) Is it possible to get the dkms kernel rpm to work with Rocky/RHEL 9.2? If I 
try to install the lustre-client-dkms rpm I get the following error:

error: Failed dependencies:
   /usr/bin/python2 is needed by lustre-client-dkms-2.15.3-1.el9.noarch

- Not surprising, as I understand that python2 is not available for Rocky/RHEL 9.

I see there is a patch for 2.16 (from LU-16626). This isn't a major problem, as I 
can build kmod-lustre-client RPMs for our kernel/OFED, but I would prefer to use 
DKMS if possible.

Kind Regards,
Christopher.


Dr. Christopher Mountford,
System Specialist,
RCS,
Digital Services.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] [EXTERNAL] AWS's FSX-Lustre - Robinhood & MDT Registration

2023-06-23 Thread Paulson McIntyre
> From what I recall, AWS FSx Lustre is a managed service that provides your
> clients a mounted Lustre filesystem and zero control over the backend servers
Makes total sense. I'd do the same thing in their shoes.

Sounds like I'd be better off just running Lustre myself on EC2 or whatever
provider I want to use.

Thanks!

-Paulson

On Fri, Jun 23, 2023 at 1:47 PM Coffman, Chris  wrote:

> From what I recall, AWS FSx Lustre is a managed service that provides your
> clients a mounted Lustre filesystem and zero control over the backend servers. I
> believe the mdt option you're attempting to enable requires you to be on the
> MDS node, which you are not able to access.
>
> If this is functionality you need, documenting the need with AWS is likely the
> first step. They may find value in providing Robinhood functionality for you
> and others.
>
> -Chris
>
> *From: *lustre-discuss  on
> behalf of Paulson McIntyre 
> *Date: *Friday, June 23, 2023 at 1:38 PM
> *To: *lustre-discuss@lists.lustre.org 
> *Subject: *[EXTERNAL] [lustre-discuss] AWS's FSX-Lustre - Robinhood & MDT
> Registration
>
> I'm using AWS's FSx for Lustre and am having issues enabling HSM and the
> changelog settings:
>
>
>
> # lctl set_param mdt.64rczbev-MDT.hsm_control=enabled
> error: set_param: param_path 'mdt/64rczbev-MDT/hsm_control': No such file or directory
>
>
>
> As best I can tell, the MDTs don't exist or aren't visible. I'm trying
> to figure out whether it's a PEBKAC issue or whether AWS blocks this.
>
>
>
> # lctl list_param \*.64rczbev-MDT\*
> mdc.64rczbev-MDT-mdc-888007eef000
>
>
>
> Normally I'd ask AWS's support but it's a personal project and that's $$$.
>
>
>
> -Paulson
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] [EXTERNAL] AWS's FSX-Lustre - Robinhood & MDT Registration

2023-06-23 Thread Coffman, Chris via lustre-discuss
From what I recall, AWS FSx Lustre is a managed service that provides your 
clients a mounted Lustre filesystem and zero control over the backend servers. I 
believe the mdt option you're attempting to enable requires you to be on the MDS 
node, which you are not able to access.
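
For illustration, everything that Robinhood/HSM registration needs is run on the 
MDS itself. A rough sketch of those commands (the full target name is assumed to 
be 64rczbev-MDT0000; the index digits look truncated in the archive):

# on the MDS node (which FSx does not expose) one would normally run:
lctl set_param mdt.64rczbev-MDT0000.hsm_control=enabled   # start the HSM coordinator
lctl --device 64rczbev-MDT0000 changelog_register         # register a changelog consumer for Robinhood
# a client only has the MDC side of the target, which is why nothing shows up under mdt.*:
lctl list_param mdc.*64rczbev*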

If this is functionality you need, documenting the need with AWS is likely the 
first step. They may find value in providing Robinhood functionality for you and 
others.

-Chris

From: lustre-discuss  on behalf of 
Paulson McIntyre 
Date: Friday, June 23, 2023 at 1:38 PM
To: lustre-discuss@lists.lustre.org 
Subject: [EXTERNAL] [lustre-discuss] AWS's FSX-Lustre - Robinhood & MDT 
Registration
I'm using AWS's FSx for Lustre and am having issues enabling HSM and the 
changelog settings:

# lctl set_param mdt.64rczbev-MDT.hsm_control=enabled
error: set_param: param_path 'mdt/64rczbev-MDT/hsm_control': No such file or directory

As best I can tell, the MDTs don't exist or aren't visible. I'm trying to 
figure out whether it's a PEBKAC issue or whether AWS blocks this.

# lctl list_param \*.64rczbev-MDT\*
mdc.64rczbev-MDT-mdc-888007eef000

Normally I'd ask AWS's support but it's a personal project and that's $$$.

-Paulson
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] AWS's FSX-Lustre - Robinhood & MDT Registration

2023-06-23 Thread Paulson McIntyre
I'm using AWS's FSx for Lustre and am having issues enabling HSM and the
changelog settings:

# lctl set_param mdt.64rczbev-MDT.hsm_control=enabled
error: set_param: param_path 'mdt/64rczbev-MDT/hsm_control': No such file or directory

As best I can tell, the MDTs don't exist or aren't visible. I'm trying to
figure out whether it's a PEBKAC issue or whether AWS blocks this.

# lctl list_param \*.64rczbev-MDT\*
mdc.64rczbev-MDT-mdc-888007eef000

Normally I'd ask AWS's support but it's a personal project and that's $$$.

-Paulson
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] No space left on device MDT DoM but not full nor run out of inodes

2023-06-23 Thread Jon Marshall via lustre-discuss
Hi Andreas,

Thanks for getting back to me - we are in the process of expanding the storage 
on this filesystem so I think I'll be pushing for an upgrade instead!

Cheers
Jon

From: Andreas Dilger 
Sent: 22 June 2023 20:00
To: Jon Marshall 
Cc: lustre-discuss@lists.lustre.org 
Subject: Re: [lustre-discuss] No space left on device MDT DoM but not full nor 
run out of inodes

There is a bug in the grant accounting that leaks grant space under certain 
operations (possibly O_DIRECT). It is resolved by unmounting and remounting the 
clients, and/or by upgrading. There was a thread about it on lustre-discuss a 
couple of years ago.
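
As a rough sketch of that workaround (mount point taken from your df output, the 
MGS NID is a placeholder):

# client-side view of the write grant currently held from each OST
# (DoM/MDT grants are tracked via the mdc devices; parameter names there vary by version)
lctl get_param osc.*.cur_grant_bytes
# clearing the leaked grant: drop and re-establish the client connection, e.g. a remount
umount /mnt/scratchc
mount -t lustre <mgsnode>@tcp:/scratchc /mnt/scratchc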

Cheers, Andreas

On Jun 20, 2023, at 09:32, Jon Marshall via lustre-discuss 
<lustre-discuss@lists.lustre.org> wrote:

Sorry, typo in the version number - the version we are actually running is 
2.12.6

From: Jon Marshall
Sent: 20 June 2023 16:18
To: lustre-discuss@lists.lustre.org
Subject: No space left on device MDT DoM but not full nor run out of inodes

Hi,

We've been running lustre 2.15.1 in production for over a year and recently 
decided to enable PFL with DoM on our filesystem. Things have been fine up 
until last week, when users started reporting issues copying files, 
specifically "No space left on device". The MDT is running ldiskfs as the 
backend.

I've searched through the mailing list and found a couple of people reporting 
similar problems, which prompted me to check the inode allocation, which is 
currently:

UUID                      Inodes       IUsed       IFree IUse% Mounted on
scratchc-MDT_UUID      624492544    71144384   553348160   12% /mnt/scratchc[MDT:0]
scratchc-OST_UUID       57712579    24489934    33222645   43% /mnt/scratchc[OST:0]
scratchc-OST0001_UUID   57114064    24505876    32608188   43% /mnt/scratchc[OST:1]

filesystem_summary:    136975217    71144384    65830833   52% /mnt/scratchc

So, nowhere near full - the disk usage is a little higher:

UUID                       bytes        Used   Available Use% Mounted on
scratchc-MDT_UUID         882.1G      451.9G      355.8G  56% /mnt/scratchc[MDT:0]
scratchc-OST_UUID          53.6T       22.7T       31.0T  43% /mnt/scratchc[OST:0]
scratchc-OST0001_UUID      53.6T       23.0T       30.6T  43% /mnt/scratchc[OST:1]

filesystem_summary:       107.3T       45.7T       61.6T  43% /mnt/scratchc

But not full either! The errors are accompanied in the logs by:

LustreError: 15450:0:(tgt_grant.c:463:tgt_grant_space_left()) scratchc-MDT: cli ba0195c7-1ab4-4f7c-9e28-8689478f5c17/9e331e231c00 left 82586337280 < tot_grant 82586681321 unstable 0 pending 0 dirty 1044480
LustreError: 15450:0:(tgt_grant.c:463:tgt_grant_space_left()) Skipped 33050 previous similar messages

For reference the DoM striping we're using is:

  lcm_layout_gen:    0
  lcm_mirror_count:  1
  lcm_entry_count:   3
    lcme_id:             N/A
    lcme_mirror_id:      N/A
    lcme_flags:          0
    lcme_extent.e_start: 0
    lcme_extent.e_end:   1048576
      stripe_count:  0   stripe_size:   1048576   pattern:   mdt     stripe_offset: -1

    lcme_id:             N/A
    lcme_mirror_id:      N/A
    lcme_flags:          0
    lcme_extent.e_start: 1048576
    lcme_extent.e_end:   1073741824
      stripe_count:  1   stripe_size:   1048576   pattern:   raid0   stripe_offset: -1

    lcme_id:             N/A
    lcme_mirror_id:      N/A
    lcme_flags:          0
    lcme_extent.e_start: 1073741824
    lcme_extent.e_end:   EOF
      stripe_count:  -1   stripe_size:   1048576   pattern:   raid0  stripe_offset: -1

So the first 1MB of each file is placed on the MDT.
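
For reference, a directory default layout like that could be created with 
something along these lines (the path is a placeholder, not the command actually 
used here):

# 0-1MiB on the MDT (DoM), 1MiB-1GiB on a single OST, everything beyond 1GiB striped across all OSTs
lfs setstripe -E 1M -L mdt -E 1G -c 1 -S 1M -E -1 -c -1 -S 1M /mnt/scratchc/somedir
lfs getstripe -d /mnt/scratchc/somedir   # verify the resulting component layout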

My question is obviously what is causing these errors? I'm not massively 
familiar with Lustre internals, so any pointers on where to look would be 
greatly appreciated!

Cheers
Jon

Jon Marshall
High Performance Computing Specialist
IT and Scientific Computing Team
Cancer Research UK Cambridge Institute
Li Ka Shing Centre | Robinson Way | Cambridge | CB2 0RE


Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Rocky 9.2/lustre 2.15.3 client questions

2023-06-23 Thread Mountford, Christopher J. (Dr.) via lustre-discuss
Hi,

I'm building the lustre client/kernel modules for our new HPC cluster and have 
a couple of questions:

1) Are there any known issues running lustre 2.15.3 clients and lustre 2.12.9 
servers? I haven't seen anything showstopping on the mailing list or in JIRA 
but wondered if anyone had run into problems.

2) Is it possible to get the dkms kernel rpm to work with Rocky/RHEL 9.2? If I 
try to install the lustre-client-dkms rpm I get the following error:

error: Failed dependencies:
/usr/bin/python2 is needed by lustre-client-dkms-2.15.3-1.el9.noarch

- Not surprising, as I understand that python2 is not available for Rocky/RHEL 9.

I see there is a patch for 2.16 (from LU-16626). This isn't a major problem, as I 
can build kmod-lustre-client RPMs for our kernel/OFED, but I would prefer to use 
DKMS if possible.
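
In case it helps anyone else hitting this, the stray dependency can be confirmed 
directly against the package (filename assumed to match whatever your build or 
repository provides):

# show the python2 requirement carried by the dkms package
rpm -qp --requires lustre-client-dkms-2.15.3-1.el9.noarch.rpm | grep -i python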

Kind Regards,
Christopher.


Dr. Christopher Mountford,
System Specialist,
RCS,
Digital Services.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org