[lustre-discuss] Unresponsiveness of OSS and Directory Listing Hang-up

2023-05-18 Thread Jane Liu via lustre-discuss
Hi, We have recently upgraded our Lustre servers to RHEL 8.7 with Lustre 2.15.2. Despite running smoothly for several weeks, we have encountered an issue that is the same as the one reported here: https://jira.whamcloud.com/browse/LU-10697. Although the Lustre version …

Re: [lustre-discuss] mlx5 errors on oss

2023-05-18 Thread Nehring, Shane R [LAS] via lustre-discuss
We probably will go that way ultimately. I was somewhat concerned about compatibility on nodes that have adapters for both fabrics (probably not even a problem). On Thu, 2023-05-18 at 16:03 +, Andreas Dilger wrote: > I can't comment on the specific network issue, but in general it is far better …

Re: [lustre-discuss] mlx5 errors on oss

2023-05-18 Thread Nehring, Shane R [LAS] via lustre-discuss
That was helpful, thank you. In our case it looks like it was a client that was suffering from iommu issues; we were seeing identical AMD-Vi errors on that client. Once the client was rebooted, the errors stopped on the servers. I've set iommu=off since we don't actually need it enabled.

Re: [lustre-discuss] mlx5 errors on oss

2023-05-18 Thread Kumar, Amit via lustre-discuss
I had a similar issue; it was apparently not a Lustre issue for us. In addition to the entries you see below, we also saw "AMD-Vi: Event ... IO_PAGE_FAULT" in the logs. Setting iommu=pt helped us. Hope that helps. Thank you, Amit -----Original Message----- From: lustre-discuss On Behalf Of …
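The iommu=pt fix mentioned here is a kernel boot parameter. A minimal sketch of checking for the symptom and applying the parameter on an EL8 system (grubby is the stock EL8 tool; whether to use ALL or a specific kernel is a local choice):

```shell
# Look for the AMD-Vi page faults described in this thread
dmesg | grep -i "IO_PAGE_FAULT"

# Add iommu=pt (IOMMU passthrough) to the command line of all
# installed kernels, then verify before rebooting
sudo grubby --update-kernel=ALL --args="iommu=pt"
sudo grubby --info=DEFAULT | grep -i args

# A reboot is required for the new command line to take effect
```

The same mechanism applies for the iommu=off variant mentioned later in the thread; pt keeps the IOMMU available for passthrough use while avoiding DMA remapping for host devices.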

Re: [lustre-discuss] mlx5 errors on oss

2023-05-18 Thread Andreas Dilger via lustre-discuss
I can't comment on the specific network issue, but in general it is far better to use the MOFED drivers than the in-kernel ones. Cheers, Andreas > On May 18, 2023, at 09:08, Nehring, Shane R [LAS] via lustre-discuss > wrote: > > Hello all, > > We recently added infiniband to our cluster and …

[lustre-discuss] mlx5 errors on oss

2023-05-18 Thread Nehring, Shane R [LAS] via lustre-discuss
Hello all, We recently added InfiniBand to our cluster and are in the process of testing it with Lustre. We're running the distro-provided drivers for the Mellanox cards with the latest firmware. Overnight we started seeing the following errors on a few OSSes: infiniband mlx5_0: dump_cqe:272:(pid 4…

Re: [lustre-discuss] MDT expansion

2023-05-18 Thread Peter Grandi via lustre-discuss
> * Add a new MDT for DOM associated with a specific directory. Just to point out that this is a rather safe and perhaps preferable choice: one MDT without DOM, one with DOM, if setting up a new MDT/MDS is possible. I guess that it would be possible to have two MDTs in the same MDS, but I haven't …
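The scheme above (a dedicated DoM-enabled MDT tied to a specific directory) can be sketched with standard `lfs` commands; the mount point, MDT index, and component sizes below are assumptions for illustration, not from the thread:

```shell
# Create a directory whose inodes live on MDT index 1
# (the hypothetical DoM-dedicated MDT)
lfs mkdir -i 1 /mnt/lustre/dom_dir

# Give it a default composite layout: the first 64 KiB of each
# file is stored on the MDT (DoM), the remainder on OSTs
lfs setstripe -E 64K -L mdt -E -1 /mnt/lustre/dom_dir

# New files under dom_dir inherit this layout
lfs getstripe /mnt/lustre/dom_dir
```

Because layouts are inherited per directory, this keeps DoM traffic confined to the one MDT while the other MDT serves ordinary metadata.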

Re: [lustre-discuss] ZFS zpool/filesystem operations while mounted with '-t lustre'

2023-05-18 Thread Peter Grandi via lustre-discuss
You only need to stop Lustre if you are planning to decommission the old hardware and switch over entirely to the new hardware. In that case, stopping Lustre is needed to ensure that no new content is created on the MDT during the final sync. Thanks, that was my guess, as that presentation …

Re: [lustre-discuss] [EXTERNAL] Re: ZFS zpool/filesystem operations while mounted with '-t lustre'

2023-05-18 Thread Mohr, Rick via lustre-discuss
On 5/18/23, 10:15 AM, "lustre-discuss on behalf of Peter Grandi via lustre-discuss" <lustre-discuss-boun...@lists.lustre.org on behalf of lustre-discuss@lists.lustre.org> wrote: I was indeed reading that but I was a bit hesitant because the …

Re: [lustre-discuss] ZFS zpool/filesystem operations while mounted with '-t lustre'

2023-05-18 Thread Peter Grandi via lustre-discuss
You might want to take a look at this: https://www.opensfs.org/wp-content/uploads/2017/06/Wed06-CroweTom-lug17-ost_data_migration_using_ZFS.pdf I was indeed reading that, but I was a bit hesitant because the "zpool"/"zfs" operations are bracketed by 'service lustre stop ...'/'service lustre start …'

Re: [lustre-discuss] [EXTERNAL] ZFS zpool/filesystem operations while mounted with '-t lustre'

2023-05-18 Thread Mohr, Rick via lustre-discuss
Peter, You might want to take a look at this: https://www.opensfs.org/wp-content/uploads/2017/06/Wed06-CroweTom-lug17-ost_data_migration_using_ZFS.pdf It's a few years old, but it shows how IU used zfs send/receive to copy data from OSTs. I worked with someone several years ago to do basically …
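The send/receive migration in the LUG17 slides linked above follows the usual ZFS pattern; a minimal sketch, with pool and dataset names invented for illustration:

```shell
# Bulk copy while the filesystem is still in service:
# snapshot the source OST dataset and stream it to the new pool
zfs snapshot oldpool/ost0@migrate1
zfs send oldpool/ost0@migrate1 | zfs receive -u newpool/ost0

# Final pass after stopping Lustre, so no new writes land on the
# old OST: send only the changes since the first snapshot
zfs snapshot oldpool/ost0@migrate2
zfs send -i @migrate1 oldpool/ost0@migrate2 | zfs receive -u -F newpool/ost0
```

The `-u` on receive keeps the copy unmounted, and `-F` rolls the target back to the common snapshot so the incremental stream applies cleanly; the incremental pass is what makes the Lustre downtime short.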

Re: [lustre-discuss] MDT expansion

2023-05-18 Thread Peter Grandi via lustre-discuss
> I want to use Data-on-MDT (DoM). Before DoM I need to expand the metadata pool size. The backend file system is ZFS. > What is the best strategy for expansion of the MDT pool? Somewhat approximately, Lustre (like most other shared filesystem types) does different layouts per *directory*. So you actually …
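Since DoM stores file data on the MDT itself, it is worth checking MDT capacity before enabling it; a short sketch (the mount point is an assumption):

```shell
# Per-target space usage; with DoM, the MDT lines consume
# capacity for file data as well as metadata
lfs df -h /mnt/lustre

# Inode usage on each target, which bounds file count on the MDTs
lfs df -i /mnt/lustre
```

Comparing the two views shows whether the pressure on the MDT pool is bytes or inodes, which changes what kind of expansion helps.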

[lustre-discuss] ZFS zpool/filesystem operations while mounted with '-t lustre'

2023-05-18 Thread Peter Grandi via lustre-discuss
I have a Lustre 2.15.2 instance "temp01" on ZFS 2.1.5 (on EL8), and I just want to back up the MDT of the instance (I am mirroring the data on two separate "pools" of servers). The "zpool" is called "temp01_mdt_000" and so is the filesystem, so the '/etc/fstab' mount line is (I have set legacy …
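ZFS snapshots are atomic and can be taken while a dataset is mounted, including under '-t lustre', so a backup of the MDT dataset named in this message can be sketched as follows (the target host and pool are invented for illustration, and whether a snapshot of a live MDT is crash-consistent enough for your purposes is a separate question):

```shell
# Snapshot the live MDT dataset; the snapshot itself is atomic
zfs snapshot temp01_mdt_000@backup-2023-05-18

# Stream it to a second server; -u keeps the replica unmounted
zfs send temp01_mdt_000@backup-2023-05-18 | \
    ssh backuphost zfs receive -u backuppool/temp01_mdt_000
```

Later snapshots can be sent incrementally with `zfs send -i`, which keeps the mirroring traffic small.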

[lustre-discuss] MDT expansion

2023-05-18 Thread Taner KARAGÖL via lustre-discuss
UNCLASSIFIED Hi folks; I want to use Data-on-MDT (DoM). Before DoM I need to expand the metadata pool size. The backend file system is ZFS. What is the best strategy for expanding the MDT pool? ZFS supports online expansion. Do you recommend online expansion, or do you recommend stopping Lustre beforehand …
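For reference, the two usual online expansion routes for a ZFS pool look like this; the pool and device names are placeholders, not from the message:

```shell
# Option 1: add another mirror vdev to the MDT pool while it is
# online; capacity grows immediately
zpool add mdtpool mirror /dev/disk/by-id/diskA /dev/disk/by-id/diskB

# Option 2: replace each disk with a larger one; once every
# member of a vdev has been replaced, the pool grows in place
zpool set autoexpand=on mdtpool
zpool replace mdtpool old-disk new-larger-disk
zpool status mdtpool   # wait for resilver to finish between replacements
```

Both are supported with the pool imported and the MDT mounted; the open question raised here, whether Lustre should nonetheless be quiesced for safety, is a policy choice rather than a ZFS requirement.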