Re: [lustre-discuss] Error messages (ex: not available for connect from 0@lo) on server boot with Lustre 2.15.3 and 2.15.4-RC1

2023-12-07 Thread Audet, Martin via lustre-discuss
Thanks Andreas and Aurélien for your answers. They make us confident that we
are on the right track for our cluster update!


Also, I noticed that 2.15.4-RC1 was released two weeks ago; can we expect
2.15.4 to be ready by the end of the year?


Regards,


Martin


From: Andreas Dilger 
Sent: December 7, 2023 6:02 AM
To: Aurelien Degremont
Cc: Audet, Martin; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Error messages (ex: not available for connect 
from 0@lo) on server boot with Lustre 2.15.3 and 2.15.4-RC1


Aurelien,
there have been a number of questions about this message.

> Lustre: lustrevm-OST0001: deleting orphan objects from 0x0:227 to 0x0:513

This is not marked LustreError, so it is just an advisory message.

This can sometimes be useful for debugging issues related to MDT->OST
connections. It is already printed at the D_INFO level, the lowest printk
level available. Would rewording the message make it clearer that this is a
normal situation when the MDT and OST are establishing connections?

Cheers, Andreas

On Dec 5, 2023, at 02:13, Aurelien Degremont  wrote:
>
> > Now, what are the messages about "deleting orphaned objects"? Are they
> > normal too?
>
> Yeah, this is kind of normal, and I'm even thinking we should lower the 
> message verbosity...
> Andreas, do you agree that this could become a simple CDEBUG(D_HA, ...)
> instead of LCONSOLE(D_INFO, ...)?
>
>
> Aurélien
>
> Audet, Martin wrote on Monday, December 4, 2023 at 8:26 PM:
>> Hello Andreas,
>>
>> Thanks for your response. Happy to learn that the "errors" I was reporting 
>> aren't really errors.
>>
>> I now understand that the 3 messages about LDISKFS were only normal messages
>> resulting from mounting the file systems (I was fooled by vim showing these
>> messages in red, like important error messages, but this is simply a false
>> positive from its syntax highlighting rules, probably triggered by the
>> "errors=" string, which is only a mount option...).
>>
>> Now, what are the messages about "deleting orphaned objects"? Are they
>> normal too? We always boot the client VMs after the server is ready, and we
>> shut down the clients cleanly well before the vlfs Lustre server is (also
>> cleanly) shut down. Is it a sign of corruption? How can this happen if
>> shutdowns are clean?
>>
>> Thanks (and sorry for the beginner questions),
>>
>> Martin
>>
>> Andreas Dilger  wrote on December 4, 2023 5:25 AM:
>>> It wasn't clear from your mail which message(s) you are concerned about.
>>> These look like normal mount messages to me.
>>>
>>> The "error" is pretty normal; it just means there were multiple services
>>> starting at once and one wasn't yet ready for the other.
>>>
>>>   LustreError: 137-5: lustrevm-MDT_UUID: not available for connect
>>>   from 0@lo (no target). If you are running an HA pair check that the
>>>   target is mounted on the other server.
>>>
>>> It probably makes sense to quiet this message right at mount time to avoid 
>>> this.
>>>
>>> Cheers, Andreas
>>>
 On Dec 1, 2023, at 10:24, Audet, Martin via lustre-discuss 
  wrote:

 
 Hello Lustre community,

 Has anyone ever seen messages like these in "/var/log/messages" on a
 Lustre server?

 Dec  1 11:26:30 vlfs kernel: Lustre: Lustre: Build Version: 2.15.4_RC1
 Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdd): mounted filesystem with 
 ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
 Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdc): mounted filesystem with 
 ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
 Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdb): mounted filesystem with 
 ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
 Dec  1 11:26:36 vlfs kernel: LustreError: 137-5: lustrevm-MDT_UUID: 
 not available for connect from 0@lo (no target). If you are running an HA 
 pair check that the target is mounted on the other server.
 Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: Imperative Recovery 
 not enabled, recovery window 300-900
 Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: deleting orphan 
 objects from 0x0:227 to 0x0:513

 This happens on every boot of a Lustre server named vlfs (an AlmaLinux 8.9
 VM hosted on VMware) playing the role of both MGS and OSS (it hosts an
 MDT and two OSTs using "virtual" disks). We chose LDISKFS and not ZFS. Note
 that this happens at every boot, well before the clients (AlmaLinux 9.3 or
 8.9 VMs) connect, and even when the clients are powered off. The network
 connecting the clients and the server is a "virtual" 10GbE network (of
 course there is no virtual IB). Also, we had the same messages previously

Re: [lustre-discuss] Error messages (ex: not available for connect from 0@lo) on server boot with Lustre 2.15.3 and 2.15.4-RC1

2023-12-07 Thread Andreas Dilger via lustre-discuss
Aurelien,
there have been a number of questions about this message.

> Lustre: lustrevm-OST0001: deleting orphan objects from 0x0:227 to 0x0:513

This is not marked LustreError, so it is just an advisory message.

This can sometimes be useful for debugging issues related to MDT->OST
connections. It is already printed at the D_INFO level, the lowest printk
level available. Would rewording the message make it clearer that this is a
normal situation when the MDT and OST are establishing connections?
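
Purely as an illustration of the rewording idea (nothing below comes from the
Lustre source; D_INFO and LCONSOLE are reduced to simplified, non-variadic
stand-ins so the snippet compiles on its own, and the alternative wording is
only an example):

    /* Toy sketch only -- not the actual Lustre code.  D_INFO and LCONSOLE()
     * are simplified stand-ins (the real macros in libcfs are printf-style
     * and variadic); they are defined here only so the example compiles on
     * its own. */
    #include <stdio.h>

    #define D_INFO 0x1
    #define LCONSOLE(mask, msg) printf("Lustre: %s\n", (msg))

    int main(void)
    {
            /* Current wording, as seen in the boot log quoted in this thread. */
            LCONSOLE(D_INFO, "lustrevm-OST0001: deleting orphan objects "
                             "from 0x0:227 to 0x0:513");

            /* One possible rewording, keeping the same D_INFO level but making
             * it explicit that this is part of normal connection setup. */
            LCONSOLE(D_INFO, "lustrevm-OST0001: reclaiming unused object IDs "
                             "0x0:227 to 0x0:513 after MDT reconnection "
                             "(normal at connect time)");
            return 0;
    }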

Cheers, Andreas

On Dec 5, 2023, at 02:13, Aurelien Degremont  wrote:
> 
> > Now, what are the messages about "deleting orphaned objects"? Are they
> > normal too?
> 
> Yeah, this is kind of normal, and I'm even thinking we should lower the 
> message verbosity...
> Andreas, do you agree that this could become a simple CDEBUG(D_HA, ...)
> instead of LCONSOLE(D_INFO, ...)?
> 
> 
> Aurélien
> 
> Audet, Martin wrote on Monday, December 4, 2023 at 8:26 PM:
>> Hello Andreas,
>> 
>> Thanks for your response. Happy to learn that the "errors" I was reporting 
>> aren't really errors.
>> 
>> I now understand that the 3 messages about LDISKFS were only normal messages
>> resulting from mounting the file systems (I was fooled by vim showing these
>> messages in red, like important error messages, but this is simply a false
>> positive from its syntax highlighting rules, probably triggered by the
>> "errors=" string, which is only a mount option...).
>> 
>> Now, what are the messages about "deleting orphaned objects"? Are they
>> normal too? We always boot the client VMs after the server is ready, and we
>> shut down the clients cleanly well before the vlfs Lustre server is (also
>> cleanly) shut down. Is it a sign of corruption? How can this happen if
>> shutdowns are clean?
>> 
>> Thanks (and sorry for the beginner questions),
>> 
>> Martin
>> 
>> Andreas Dilger  wrote on December 4, 2023 5:25 AM:
>>> It wasn't clear from your mail which message(s) you are concerned about.
>>> These look like normal mount messages to me.
>>> 
>>> The "error" is pretty normal; it just means there were multiple services
>>> starting at once and one wasn't yet ready for the other.
>>>
>>>   LustreError: 137-5: lustrevm-MDT_UUID: not available for connect
>>>   from 0@lo (no target). If you are running an HA pair check that the
>>>   target is mounted on the other server.
>>> 
>>> It probably makes sense to quiet this message right at mount time to avoid 
>>> this. 
>>> 
>>> Cheers, Andreas
>>> 
 On Dec 1, 2023, at 10:24, Audet, Martin via lustre-discuss 
  wrote:
 
 
 Hello Lustre community,
 
 Has anyone ever seen messages like these in "/var/log/messages" on a
 Lustre server?
 
 Dec  1 11:26:30 vlfs kernel: Lustre: Lustre: Build Version: 2.15.4_RC1
 Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdd): mounted filesystem with 
 ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
 Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdc): mounted filesystem with 
 ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
 Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdb): mounted filesystem with 
 ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
 Dec  1 11:26:36 vlfs kernel: LustreError: 137-5: lustrevm-MDT_UUID: 
 not available for connect from 0@lo (no target). If you are running an HA 
 pair check that the target is mounted on the other server.
 Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: Imperative Recovery 
 not enabled, recovery window 300-900
 Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: deleting orphan 
 objects from 0x0:227 to 0x0:513
 
 This happens on every boot of a Lustre server named vlfs (an AlmaLinux 8.9
 VM hosted on VMware) playing the role of both MGS and OSS (it hosts an
 MDT and two OSTs using "virtual" disks). We chose LDISKFS and not ZFS. Note
 that this happens at every boot, well before the clients (AlmaLinux 9.3 or
 8.9 VMs) connect, and even when the clients are powered off. The network
 connecting the clients and the server is a "virtual" 10GbE network (of
 course there is no virtual IB). Also, we had the same messages previously
 with Lustre 2.15.3 using an AlmaLinux 8.8 server and AlmaLinux 8.8 / 9.2
 clients (also using VMs). Note also that we compile the Lustre RPMs
 ourselves from the sources in the git repository. We also chose to use a
 patched kernel. Our build procedure for RPMs seems to work well because
 our real cluster runs fine on CentOS 7.9 with Lustre 2.12.9 and IB (MOFED)
 networking.

 So, has anyone seen these messages?

 Are they problematic? If so, how do we avoid them?

 We would like to make sure our small test system using VMs works well
 before we upgrade our real cluster.

 Thanks in advance!
 

Re: [lustre-discuss] Error messages (ex: not available for connect from 0@lo) on server boot with Lustre 2.15.3 and 2.15.4-RC1

2023-12-05 Thread Aurelien Degremont via lustre-discuss
> Now, what are the messages about "deleting orphaned objects"? Are they normal
> too?

Yeah, this is kind of normal, and I'm even thinking we should lower the message 
verbosity...
Andreas, do you agree that this could become a simple CDEBUG(D_HA, ...) instead
of LCONSOLE(D_INFO, ...)?
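
For readers unfamiliar with the macros under discussion: roughly speaking,
LCONSOLE(D_INFO, ...) echoes an advisory to the kernel console (and hence
/var/log/messages), while CDEBUG(D_HA, ...) records the message only in
Lustre's internal debug buffer unless the D_HA class is enabled for console
output. Below is a minimal, self-contained toy model of that difference, using
simplified stand-in macros rather than the real libcfs definitions:

    /* Toy model of the difference being discussed -- these are simplified
     * stand-ins, not the real libcfs macros (which are variadic and also
     * write to Lustre's internal debug buffer). */
    #include <stdio.h>

    #define D_INFO 0x1   /* advisory class, echoed to the console */
    #define D_HA   0x2   /* recovery/HA debug class */

    /* Debug classes echoed to the console in this toy model. */
    static unsigned int console_mask = D_INFO;

    /* Toy LCONSOLE: the message always reaches the console log. */
    #define LCONSOLE(mask, msg) printf("console: %s\n", (msg))

    /* Toy CDEBUG: reaches the console only if its class is in console_mask;
     * otherwise it would stay in the internal debug buffer (omitted here). */
    #define CDEBUG(mask, msg)                                  \
            do {                                               \
                    if ((mask) & console_mask)                 \
                            printf("console: %s\n", (msg));    \
            } while (0)

    int main(void)
    {
            /* Today: every administrator sees this advisory at boot. */
            LCONSOLE(D_INFO, "lustrevm-OST0001: deleting orphan objects "
                             "from 0x0:227 to 0x0:513");

            /* The proposal: demoted to D_HA, silent on the console by default. */
            CDEBUG(D_HA, "lustrevm-OST0001: deleting orphan objects "
                         "from 0x0:227 to 0x0:513");
            return 0;
    }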


Aurélien

From: lustre-discuss on behalf of
Audet, Martin via lustre-discuss
Sent: December 4, 2023 8:26 PM
To: Andreas Dilger
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Error messages (ex: not available for connect from
0@lo) on server boot with Lustre 2.15.3 and 2.15.4-RC1



Hello Andreas,


Thanks for your response. Happy to learn that the "errors" I was reporting 
aren't really errors.


I now understand that the 3 messages about LDISKFS were only normal messages
resulting from mounting the file systems (I was fooled by vim showing these
messages in red, like important error messages, but this is simply a false
positive from its syntax highlighting rules, probably triggered by the
"errors=" string, which is only a mount option...).

Now, what are the messages about "deleting orphaned objects"? Are they normal
too? We always boot the client VMs after the server is ready, and we shut down
the clients cleanly well before the vlfs Lustre server is (also cleanly) shut
down. Is it a sign of corruption? How can this happen if shutdowns are clean?

Thanks (and sorry for the beginner questions),

Martin


From: Andreas Dilger 
Sent: December 4, 2023 5:25 AM
To: Audet, Martin
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Error messages (ex: not available for connect 
from 0@lo) on server boot with Lustre 2.15.3 and 2.15.4-RC1



It wasn't clear from your mail which message(s) you are concerned about. These
look like normal mount messages to me.

The "error" is pretty normal; it just means there were multiple services
starting at once and one wasn't yet ready for the other.

  LustreError: 137-5: lustrevm-MDT_UUID: not available for connect
  from 0@lo (no target). If you are running an HA pair check that the
  target is mounted on the other server.

It probably makes sense to quiet this message right at mount time to avoid this.

Cheers, Andreas

On Dec 1, 2023, at 10:24, Audet, Martin via lustre-discuss 
 wrote:



Hello Lustre community,


Has anyone ever seen messages like these in "/var/log/messages" on a
Lustre server?


Dec  1 11:26:30 vlfs kernel: Lustre: Lustre: Build Version: 2.15.4_RC1
Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdd): mounted filesystem with ordered 
data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdc): mounted filesystem with ordered 
data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdb): mounted filesystem with ordered 
data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Dec  1 11:26:36 vlfs kernel: LustreError: 137-5: lustrevm-MDT_UUID: not 
available for connect from 0@lo (no target). If you are running an HA pair 
check that the target is mounted on the other server.
Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: Imperative Recovery not 
enabled, recovery window 300-900
Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: deleting orphan objects 
from 0x0:227 to 0x0:513


This happens on every boot of a Lustre server named vlfs (an AlmaLinux 8.9 VM
hosted on VMware) playing the role of both MGS and OSS (it hosts an MDT and two
OSTs using "virtual" disks). We chose LDISKFS and not ZFS. Note that this
happens at every boot, well before the clients (AlmaLinux 9.3 or 8.9 VMs)
connect, and even when the clients are powered off. The network connecting the
clients and the server is a "virtual" 10GbE network (of course there is no
virtual IB). Also, we had the same messages previously with Lustre 2.15.3 using
an AlmaLinux 8.8 server and AlmaLinux 8.8 / 9.2 clients (also using VMs). Note
also that we compile the Lustre RPMs ourselves from the sources in the git
repository. We also chose to use a patched kernel. Our build procedure for RPMs
seems to work well because our real cluster runs fine on CentOS 7.9 with Lustre
2.12.9 and IB (MOFED) networking.

So, has anyone seen these messages?

Are they problematic? If so, how do we avoid them?

We would like to make sure our small test system using VMs works well before we
upgrade our real cluster.

Thanks in advance!

Martin Audet


Re: [lustre-discuss] Error messages (ex: not available for connect from 0@lo) on server boot with Lustre 2.15.3 and 2.15.4-RC1

2023-12-04 Thread Audet, Martin via lustre-discuss
Hello Andreas,


Thanks for your response. Happy to learn that the "errors" I was reporting 
aren't really errors.


I now understand that the 3 messages about LDISKFS were only normal messages
resulting from mounting the file systems (I was fooled by vim showing these
messages in red, like important error messages, but this is simply a false
positive from its syntax highlighting rules, probably triggered by the
"errors=" string, which is only a mount option...).

Now, what are the messages about "deleting orphaned objects"? Are they normal
too? We always boot the client VMs after the server is ready, and we shut down
the clients cleanly well before the vlfs Lustre server is (also cleanly) shut
down. Is it a sign of corruption? How can this happen if shutdowns are clean?

Thanks (and sorry for the beginner questions),

Martin


From: Andreas Dilger 
Sent: December 4, 2023 5:25 AM
To: Audet, Martin
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Error messages (ex: not available for connect 
from 0@lo) on server boot with Lustre 2.15.3 and 2.15.4-RC1



It wasn't clear from your mail which message(s) you are concerned about. These
look like normal mount messages to me.

The "error" is pretty normal; it just means there were multiple services
starting at once and one wasn't yet ready for the other.

  LustreError: 137-5: lustrevm-MDT_UUID: not available for connect
  from 0@lo (no target). If you are running an HA pair check that the
  target is mounted on the other server.

It probably makes sense to quiet this message right at mount time to avoid this.

Cheers, Andreas

On Dec 1, 2023, at 10:24, Audet, Martin via lustre-discuss 
 wrote:



Hello Lustre community,


Has anyone ever seen messages like these in "/var/log/messages" on a
Lustre server?


Dec  1 11:26:30 vlfs kernel: Lustre: Lustre: Build Version: 2.15.4_RC1
Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdd): mounted filesystem with ordered 
data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdc): mounted filesystem with ordered 
data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdb): mounted filesystem with ordered 
data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Dec  1 11:26:36 vlfs kernel: LustreError: 137-5: lustrevm-MDT_UUID: not 
available for connect from 0@lo (no target). If you are running an HA pair 
check that the target is mounted on the other server.
Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: Imperative Recovery not 
enabled, recovery window 300-900
Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: deleting orphan objects 
from 0x0:227 to 0x0:513


This happens on every boot of a Lustre server named vlfs (an AlmaLinux 8.9 VM
hosted on VMware) playing the role of both MGS and OSS (it hosts an MDT and two
OSTs using "virtual" disks). We chose LDISKFS and not ZFS. Note that this
happens at every boot, well before the clients (AlmaLinux 9.3 or 8.9 VMs)
connect, and even when the clients are powered off. The network connecting the
clients and the server is a "virtual" 10GbE network (of course there is no
virtual IB). Also, we had the same messages previously with Lustre 2.15.3 using
an AlmaLinux 8.8 server and AlmaLinux 8.8 / 9.2 clients (also using VMs). Note
also that we compile the Lustre RPMs ourselves from the sources in the git
repository. We also chose to use a patched kernel. Our build procedure for RPMs
seems to work well because our real cluster runs fine on CentOS 7.9 with Lustre
2.12.9 and IB (MOFED) networking.

So, has anyone seen these messages?

Are they problematic? If so, how do we avoid them?

We would like to make sure our small test system using VMs works well before we
upgrade our real cluster.

Thanks in advance!

Martin Audet



Re: [lustre-discuss] Error messages (ex: not available for connect from 0@lo) on server boot with Lustre 2.15.3 and 2.15.4-RC1

2023-12-04 Thread Backer via lustre-discuss
I do not want to hijack this thread, but I am checking here before I start
another new thread. I am getting similar messages randomly. The IP involved
here is a single client IP. I get messages from multiple OSSes about multiple
OSTs at the same time, and then they stop. These types of messages appear
occasionally on multiple OSSes, and they are all related to one client at a
time. I am wondering if this is a single-client issue, as this FS has hundreds
of clients and only one client is reported at a time. Unfortunately, there is
no easy way for me to figure out whether the specified client had an access
issue around the time frame mentioned in the log (I have no access to the
clients).

Dec  4 18:05:27 oss010 kernel: LustreError: 137-5: fs-OST00b0_UUID: not
available for connect from @tcp1 (no target). If you are running
an HA pair check that the target is mounted on the other server.

On Mon, 4 Dec 2023 at 05:27, Andreas Dilger via lustre-discuss <
lustre-discuss@lists.lustre.org> wrote:

> It wasn't clear from your mail which message(s) you are concerned about.
> These look like normal mount messages to me.
>
> The "error" is pretty normal; it just means there were multiple services
> starting at once and one wasn't yet ready for the other.
>
>   LustreError: 137-5: lustrevm-MDT_UUID: not available for connect
>   from 0@lo (no target). If you are running an HA pair check that the
>   target is mounted on the other server.
>
> It probably makes sense to quiet this message right at mount time to avoid
> this.
>
> Cheers, Andreas
>
> On Dec 1, 2023, at 10:24, Audet, Martin via lustre-discuss <
> lustre-discuss@lists.lustre.org> wrote:
>
> 
>
> Hello Lustre community,
>
>
> Has anyone ever seen messages like these in "/var/log/messages" on a
> Lustre server?
>
> Dec  1 11:26:30 vlfs kernel: Lustre: Lustre: Build Version: 2.15.4_RC1
> Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdd): mounted filesystem with
> ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
> Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdc): mounted filesystem with
> ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
> Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdb): mounted filesystem with
> ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
> Dec  1 11:26:36 vlfs kernel: LustreError: 137-5: lustrevm-MDT_UUID:
> not available for connect from 0@lo (no target). If you are running an HA
> pair check that the target is mounted on the other server.
> Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: Imperative Recovery
> not enabled, recovery window 300-900
> Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: deleting orphan
> objects from 0x0:227 to 0x0:513
>
> This happens on every boot of a Lustre server named vlfs (an AlmaLinux 8.9
> VM hosted on VMware) playing the role of both MGS and OSS (it hosts an
> MDT and two OSTs using "virtual" disks). We chose LDISKFS and not ZFS. Note
> that this happens at every boot, well before the clients (AlmaLinux 9.3 or
> 8.9 VMs) connect, and even when the clients are powered off. The network
> connecting the clients and the server is a "virtual" 10GbE network (of
> course there is no virtual IB). Also, we had the same messages previously
> with Lustre 2.15.3 using an AlmaLinux 8.8 server and AlmaLinux 8.8 / 9.2
> clients (also using VMs). Note also that we compile the Lustre RPMs
> ourselves from the sources in the git repository. We also chose to use a
> patched kernel. Our build procedure for RPMs seems to work well because
> our real cluster runs fine on CentOS 7.9 with Lustre 2.12.9 and IB (MOFED)
> networking.
>
> So, has anyone seen these messages?
>
> Are they problematic? If so, how do we avoid them?
>
> We would like to make sure our small test system using VMs works well
> before we upgrade our real cluster.
>
> Thanks in advance!
>
> Martin Audet
>


Re: [lustre-discuss] Error messages (ex: not available for connect from 0@lo) on server boot with Lustre 2.15.3 and 2.15.4-RC1

2023-12-04 Thread Andreas Dilger via lustre-discuss
It wasn't clear from your mail which message(s) you are concerned about. These
look like normal mount messages to me.

The "error" is pretty normal; it just means there were multiple services
starting at once and one wasn't yet ready for the other.

  LustreError: 137-5: lustrevm-MDT_UUID: not available for connect
  from 0@lo (no target). If you are running an HA pair check that the
  target is mounted on the other server.

It probably makes sense to quiet this message right at mount time to avoid this.
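
As a sketch of that idea only (every name below is invented for illustration
and none of it is Lustre code or its API), quieting the message at mount time
could mean suppressing the console report during a short grace window after the
local targets start mounting, while still rejecting the connection attempt:

    /* Hypothetical illustration -- invented names, not Lustre APIs. */
    #include <stdio.h>
    #include <time.h>

    #define MOUNT_GRACE_SECONDS 30

    static time_t services_started_at;

    /* Imagined hook: called once when the local targets begin mounting. */
    static void example_note_mount_start(void)
    {
            services_started_at = time(NULL);
    }

    /* Imagined reporting path for a connect request that found no target:
     * stay quiet during the post-mount grace window, complain afterwards. */
    static void example_report_no_target(const char *target, const char *peer)
    {
            if (difftime(time(NULL), services_started_at) < MOUNT_GRACE_SECONDS)
                    return;  /* still starting up: reject quietly */

            fprintf(stderr, "LustreError: 137-5: %s: not available for connect "
                    "from %s (no target). If you are running an HA pair check "
                    "that the target is mounted on the other server.\n",
                    target, peer);
    }

    int main(void)
    {
            example_note_mount_start();
            /* During startup this prints nothing; after the grace window the
             * same call would emit the familiar 137-5 console error. */
            example_report_no_target("lustrevm-MDT_UUID", "0@lo");
            return 0;
    }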

Cheers, Andreas

On Dec 1, 2023, at 10:24, Audet, Martin via lustre-discuss 
 wrote:



Hello Lustre community,


Has anyone ever seen messages like these in "/var/log/messages" on a
Lustre server?


Dec  1 11:26:30 vlfs kernel: Lustre: Lustre: Build Version: 2.15.4_RC1
Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdd): mounted filesystem with ordered 
data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdc): mounted filesystem with ordered 
data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdb): mounted filesystem with ordered 
data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Dec  1 11:26:36 vlfs kernel: LustreError: 137-5: lustrevm-MDT_UUID: not 
available for connect from 0@lo (no target). If you are running an HA pair 
check that the target is mounted on the other server.
Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: Imperative Recovery not 
enabled, recovery window 300-900
Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: deleting orphan objects 
from 0x0:227 to 0x0:513


This happens on every boot of a Lustre server named vlfs (an AlmaLinux 8.9 VM
hosted on VMware) playing the role of both MGS and OSS (it hosts an MDT and two
OSTs using "virtual" disks). We chose LDISKFS and not ZFS. Note that this
happens at every boot, well before the clients (AlmaLinux 9.3 or 8.9 VMs)
connect, and even when the clients are powered off. The network connecting the
clients and the server is a "virtual" 10GbE network (of course there is no
virtual IB). Also, we had the same messages previously with Lustre 2.15.3 using
an AlmaLinux 8.8 server and AlmaLinux 8.8 / 9.2 clients (also using VMs). Note
also that we compile the Lustre RPMs ourselves from the sources in the git
repository. We also chose to use a patched kernel. Our build procedure for RPMs
seems to work well because our real cluster runs fine on CentOS 7.9 with Lustre
2.12.9 and IB (MOFED) networking.

So, has anyone seen these messages?

Are they problematic? If so, how do we avoid them?

We would like to make sure our small test system using VMs works well before we
upgrade our real cluster.

Thanks in advance!

Martin Audet
