Re: [lustre-discuss] [EXTERNAL] MDTs will only mount read only

2023-06-21 Thread Mike Mosley via lustre-discuss
Rick, Thanks, we are going to try some of these suggestions later this evening or tomorrow. We are currently backing up the MDT (as described in the Lustre manual). I will post further once we get there. Thanks for the suggestions. Mike On Wed, Jun 21, 2023 at 4:32 PM Mohr, Rick wrote:

Re: [lustre-discuss] [EXTERNAL] MDTs will only mount read only

2023-06-21 Thread Mohr, Rick via lustre-discuss
Mike, On the off chance that the recovery process is causing the issue, you could try mounting the MDT with the "abort_recov" option and see if the behavior changes. --Rick On 6/21/23, 2:33 PM, "lustre-discuss on behalf of Jeff Johnson" <lustre-discuss-boun...@lists.lustre.org> on
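Rick's suggestion amounts to passing `abort_recov` as a mount option when mounting the MDT. The sketch below assumes a standard `mount -t lustre` invocation; the function name `mount_mdt_abort_recov` is hypothetical, and the device and mount point are placeholders for your own. Aborting recovery means clients do not get to replay transactions that were not yet committed, so this is a diagnostic step rather than a routine mount option.

```shell
# Hypothetical helper: mount a Lustre MDT with client recovery aborted.
# $1: block device holding the MDT (placeholder)
# $2: existing mount point directory (placeholder)
mount_mdt_abort_recov() {
    if [ "$#" -ne 2 ]; then
        echo "usage: mount_mdt_abort_recov <mdt_device> <mount_point>" >&2
        return 2
    fi
    # abort_recov skips the recovery window on this mount
    mount -t lustre -o abort_recov "$1" "$2"
}
```

For example (hypothetical paths): `mount_mdt_abort_recov /dev/mapper/mdt0 /mnt/mdt0`, then check the kernel log (e.g. `dmesg`) to see whether the MDT still comes up read-only.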

Re: [lustre-discuss] [EXTERNAL] MDTs will only mount read only

2023-06-21 Thread Jeff Johnson
Maybe someone else on the list can add clarity, but I don't believe a recovery process on mount would keep the MDS read-only or trigger that trace. Something else may be going on. I would start from the ground up. Bring your servers up with nothing mounted. Ensure lnet is loaded and configured properly.
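The "ground up" check starts with LNet on each server before any target is mounted. A minimal sketch, assuming LNet networks are already defined through module parameters (for example an `options lnet networks=...` line under /etc/modprobe.d/, which is not shown in the thread): load the module, bring LNet up, and list the local NIDs. The function name is hypothetical.

```shell
# Hypothetical helper: load and start LNet, then print this node's NIDs.
# Assumes LNet networks are configured via module parameters.
lnet_up_and_list() {
    modprobe lnet || return 1     # load the LNet kernel module
    lctl network up || return 1   # start LNet using the configured module parameters
    lctl list_nids                # print local NIDs; an IB server should show one ending in @o2ib
}
```

Run it on the MDS and on each OSS; if no `@o2ib` NID is listed, the LNet configuration (or the IB interface underneath it) is the first thing to look at.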

Re: [lustre-discuss] [EXTERNAL] MDTs will only mount read only

2023-06-21 Thread Mike Mosley via lustre-discuss
Jeff, At this point we have the OSS shut down. We were coming back from a full outage, so we are trying to get the MDS up before starting to bring up the OSS. Mike On Wed, Jun 21, 2023 at 2:15 PM Jeff Johnson wrote:

Re: [lustre-discuss] [EXTERNAL] MDTs will only mount read only

2023-06-21 Thread Jeff Johnson
Mike, Have you made sure the o2ib interfaces on all of your Lustre servers (MDS & OSS) are functioning properly? Are you able to `lctl ping x.x.x.x@o2ib` successfully between MDS and OSS nodes? --Jeff On Wed, Jun 21, 2023 at 10:08 AM Mike Mosley via lustre-discuss
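The `lctl ping` check can be run against several servers in one pass. A minimal sketch; `ping_peers` is a hypothetical helper, and the NIDs you pass it are whatever `lctl list_nids` reports on each server.

```shell
# Hypothetical helper: run `lctl ping` against each peer NID and report the result.
# Returns 0 if every peer answered, 1 if any peer failed.
ping_peers() {
    local failed=0 nid
    for nid in "$@"; do
        if lctl ping "$nid" >/dev/null 2>&1; then
            echo "OK   $nid"
        else
            echo "FAIL $nid"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, from the MDS: `ping_peers 172.16.100.4@o2ib`, using the OSS IB address mentioned elsewhere in this thread. Running it in both directions (MDS to OSS and OSS to MDS) helps separate a one-sided configuration problem from a fabric problem.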

Re: [lustre-discuss] [EXTERNAL] MDTs will only mount read only

2023-06-21 Thread Mike Mosley via lustre-discuss
Rick, 172.16.100.4 is the IB address of one of the OSS servers. I believe the MGT and MDT0 are the same target. My understanding is that we have a single instance of the MGT, which is on the first MDT server, i.e. it was created via a command similar to: # mkfs.lustre --fsname=scratch --index=0

Re: [lustre-discuss] [EXTERNAL] MDTs will only mount read only

2023-06-21 Thread Mike Mosley via lustre-discuss
Hi Rick, The MGS and MDS are combined. The output I posted is from the primary. Thanks, Mike On Wed, Jun 21, 2023 at 12:27 PM Mohr, Rick wrote:

Re: [lustre-discuss] [EXTERNAL] MDTs will only mount read only

2023-06-21 Thread Mohr, Rick via lustre-discuss
Mike, It looks like the MDS is having a problem contacting the MGS. I'm guessing the MGS is a separate host? I would start by looking for possible network problems that might explain the LNet timeouts. You can try using "lctl ping" to test the LNet connection between nodes,

Re: [lustre-discuss] [EXTERNAL] I/O error on lctl ping although ibping successful

2023-06-21 Thread Youssef Eldakar via lustre-discuss
Thanks, Rick, for that suggestion. TCP ping between a problematic host and the MDS indeed does not go through. I'm not exactly sure what to investigate next, but that gives me somewhere to start. - Youssef On Tue, Jun 20, 2023 at 7:00 PM Mohr, Rick via lustre-discuss

[lustre-discuss] MDTs will only mount read only

2023-06-21 Thread Mike Mosley via lustre-discuss
Greetings, We have experienced an issue that is causing both of our MDS servers to be able to mount the MDT device only in read-only mode. Below are some of the error messages we are seeing in the log files. We lost our Lustre expert a while back and we are not sure how to

Re: [lustre-discuss] I/O error on lctl ping although ibping successful

2023-06-21 Thread John Hearns via lustre-discuss
Have you run ibdiagnet? You should also run ibqueryerrors. On Tue, 20 Jun 2023, 17:11 Youssef Eldakar via lustre-discuss, <lustre-discuss@lists.lustre.org> wrote: > In a cluster having ~100 Lustre clients (compute nodes) connected together > with the MDS and OSS over Intel True Scale InfiniBand
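Both InfiniBand checks can be captured in one pass so their output can be compared or posted to the list. A minimal sketch; the function name and output directory are hypothetical, both tools are run with their default options, and their exact output depends on the installed tool versions.

```shell
# Hypothetical helper: run ibqueryerrors and ibdiagnet, saving each tool's output.
# $1: output directory (hypothetical default: ./ib-fabric-check)
# Returns 1 if either tool exited non-zero, else 0.
ib_fabric_check() {
    local outdir="${1:-./ib-fabric-check}"
    local rc=0
    mkdir -p "$outdir" || return 1
    # Report ports whose error counters exceed thresholds
    ibqueryerrors >"$outdir/ibqueryerrors.txt" 2>&1 || rc=1
    # Fabric discovery and diagnostics; ibdiagnet may also write its own report files
    ibdiagnet >"$outdir/ibdiagnet.txt" 2>&1 || rc=1
    return "$rc"
}
```

Running it from the problematic host and again from the MDS gives two views of the same fabric to compare.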