Re: [lustre-discuss] [EXTERNAL] Re: Lustre Client Lockup Under Buffered I/O (2.14/2.15)

2022-01-20 Thread Ellis Wilson via lustre-discuss
Thanks for facilitating a login for me, Peter.  The bug, with all the logs and info
I could think to include, has been opened here:

https://jira.whamcloud.com/browse/LU-15468

I'm going to keep digging on my end, but if anybody has any other bright ideas 
or experiments they'd like me to try, don't hesitate to say so here or in the 
bug.

From: Peter Jones 
Sent: Thursday, January 20, 2022 9:28 AM
To: Ellis Wilson ; Raj ; 
Patrick Farrell 
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] [EXTERNAL] Re: Lustre Client Lockup Under 
Buffered I/O (2.14/2.15)

Ellis

JIRA accounts can be requested from 
i...@whamcloud.com

Peter

From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of
Ellis Wilson via lustre-discuss <lustre-discuss@lists.lustre.org>
Reply-To: Ellis Wilson <elliswil...@microsoft.com>
Date: Thursday, January 20, 2022 at 6:20 AM
To: Raj <rajgau...@gmail.com>, Patrick Farrell <pfarr...@ddn.com>
Cc: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] [EXTERNAL] Re: Lustre Client Lockup Under
Buffered I/O (2.14/2.15)

Thanks Raj - I've checked all of the nodes in the cluster, and they all have
peer_credits set to 8 and credits set to 256.  AFAIK that's quite low - 8
concurrent sends to any given peer at a time. Since I only have two OSSes,
that's only 16 concurrent sends from this client at any given moment.  I don't
know whether at this level that corresponds to the maximum RPC size of 1MB or
to the max BRW size of 4MB I currently have set, but in either case these are
small MB values.
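
For anyone comparing, the effective LNet credit settings on client and servers
can usually be dumped with lnetctl (assuming the standard LNet userspace tools
are installed; reading the LND module parameter is an alternative if credits
were set via modprobe options). As a side note, max_pages_per_rpc=1024 at 4KiB
pages is where the 4MB BRW size mentioned above comes from.

~# lnetctl net show -v | grep -E 'credits'
~# cat /sys/module/ksocklnd/parameters/peer_credits   # or ko2iblnd on InfiniBand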

I've reached out to Andreas and Patrick to try to get a JIRA account to open a 
bug, but have not heard back yet.  If somebody on-list is more appropriate to 
assist with this, please ping me.  I collected quite a bit of logs/traces 
yesterday and have sysrq stacks to share when I can get access to the whamcloud 
JIRA.

Best,

ellis

From: Raj <rajgau...@gmail.com>
Sent: Thursday, January 20, 2022 8:14 AM
To: Patrick Farrell <pfarr...@ddn.com>
Cc: Andreas Dilger <adil...@whamcloud.com>; Ellis Wilson
<elliswil...@microsoft.com>; lustre-discuss@lists.lustre.org
Subject: [EXTERNAL] Re: [lustre-discuss] Lustre Client Lockup Under Buffered 
I/O (2.14/2.15)

Ellis, I would also check the peer_credits between the server and the client.
They should be the same.

On Wed, Jan 19, 2022 at 9:27 AM Patrick Farrell via lustre-discuss
<lustre-discuss@lists.lustre.org> wrote:
Ellis,

As you may have guessed, that function set just looks like a node which is
doing buffered I/O and thrashing for memory.  No particular insight is
available from the counts of those functions.

Would you consider opening a bug report in the Whamcloud JIRA?  You should have
enough for a good report; here are a few things that would be helpful as well:

It sounds like you can hang the node on demand.  If you could collect stack 
traces with:

echo t > /proc/sysrq-trigger
after creating the hang, that would be useful.  (It will print to dmesg.)
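
A minimal capture sequence for that, assuming dmesg and gzip are available on
the client (the file name is just an example; with many tasks the kernel ring
buffer may need to be enlarged, e.g. via log_buf_len= on the kernel command
line, so stacks aren't dropped):

~# echo t > /proc/sysrq-trigger
~# dmesg -T | gzip > /tmp/sysrq-stacks-$(hostname).txt.gz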

You've also collected debug logs - Could you include, say, the last 100 MiB of 
that log set?  That should be reasonable to attach if compressed.
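
For the Lustre debug log itself, a common way to dump and compress the kernel
debug buffer is lctl debug_kernel (a sketch only; the output path is arbitrary,
and raising debug_mb beforehand keeps more history at the cost of RAM):

~# lctl set_param debug_mb=1024
~# lctl dk /tmp/lustre-debug.$(hostname).log
~# gzip /tmp/lustre-debug.$(hostname).log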

Regards,
Patrick

From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of
Ellis Wilson via lustre-discuss <lustre-discuss@lists.lustre.org>
Sent: Wednesday, January 19, 2022 8:32 AM
To: Andreas Dilger <adil...@whamcloud.com>
Cc: lustre-discuss@lists.lustre.org <lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] Lustre Client Lockup Under Buffered I/O 
(2.14/2.15)


Hi Andreas,



Apologies in advance for the top-post.  I'm required to use Outlook for work, 
and it doesn't handle in-line or bottom-posting well.



Client-side defaults prior to any tuning of mine (this is a very minimal 
1-client, 1-MDS/MGS, 2-OSS cluster):

~# lctl get_param llite.*.max_cached_mb
llite.lustrefs-8d52a9c52800.max_cached_mb=
users: 5
max_cached_mb: 7748
used_mb: 0
unused_mb: 7748
reclaim_count: 0
~# lctl get_param osc.*.max_dirty_mb
osc.lustrefs-OST-osc-8d52a9c52800.max_dirty_mb=1938
osc.lustrefs-OST0001-osc-8d52a9c52800.max_dirty_mb=1938
~# lctl get_param osc.*.max_rpcs_in_flight
osc.lustrefs-OST-osc-8d52a9c52800.max_rpcs_in_flight=8
osc.lustrefs-OST0001-osc-8d52a9c52800.max_rpcs_in_flight=8
~# lctl get_param osc.*.max_pages_per_rpc
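
As an aside for anyone reproducing this tuning: plain lctl set_param only lasts
until the client reboots or remounts. If the reduced values discussed later in
the thread turn out to help, they can usually be made persistent from the MGS
with lctl set_param -P (Lustre 2.5+; not every parameter is settable this way).
A sketch, reusing the fsname lustrefs from the output above:

mgs# lctl set_param -P llite.lustrefs*.max_cached_mb=1024
mgs# lctl set_param -P osc.lustrefs-OST*.max_dirty_mb=512
mgs# lctl set_param -P osc.lustrefs-OST*.max_rpcs_in_flight=2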


Re: [lustre-discuss] Lustre Client Lockup Under Buffered I/O (2.14/2.15)

2022-01-20 Thread Raj via lustre-discuss
Ellis, I would also check the peer_credits between the server and the client.
They should be the same.

On Wed, Jan 19, 2022 at 9:27 AM Patrick Farrell via lustre-discuss <
lustre-discuss@lists.lustre.org> wrote:

> Ellis,
>
> As you may have guessed, that function set just looks like a node which is
> doing buffered I/O and thrashing for memory.  No particular insight is
> available from the counts of those functions.
>
> Would you consider opening a bug report in the Whamcloud JIRA?  You should
> have enough for a good report; here are a few things that would be helpful
> as well:
>
> It sounds like you can hang the node on demand.  If you could collect
> stack traces with:
>
> echo t > /proc/sysrq-trigger
>
> after creating the hang, that would be useful.  (It will print to dmesg.)
>
> You've also collected debug logs - Could you include, say, the last 100
> MiB of that log set?  That should be reasonable to attach if compressed.
>
> Regards,
> Patrick
>
> --
> *From:* lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on
> behalf of Ellis Wilson via lustre-discuss <lustre-discuss@lists.lustre.org>
> *Sent:* Wednesday, January 19, 2022 8:32 AM
> *To:* Andreas Dilger <adil...@whamcloud.com>
> *Cc:* lustre-discuss@lists.lustre.org <lustre-discuss@lists.lustre.org>
> *Subject:* Re: [lustre-discuss] Lustre Client Lockup Under Buffered I/O
> (2.14/2.15)
>
>
> Hi Andreas,
>
>
>
> Apologies in advance for the top-post.  I’m required to use Outlook for
> work, and it doesn’t handle in-line or bottom-posting well.
>
>
>
> Client-side defaults prior to any tuning of mine (this is a very minimal
> 1-client, 1-MDS/MGS, 2-OSS cluster):
>
>
> ~# lctl get_param llite.*.max_cached_mb
> llite.lustrefs-8d52a9c52800.max_cached_mb=
> users: 5
> max_cached_mb: 7748
> used_mb: 0
> unused_mb: 7748
> reclaim_count: 0
> ~# lctl get_param osc.*.max_dirty_mb
> osc.lustrefs-OST-osc-8d52a9c52800.max_dirty_mb=1938
> osc.lustrefs-OST0001-osc-8d52a9c52800.max_dirty_mb=1938
> ~# lctl get_param osc.*.max_rpcs_in_flight
> osc.lustrefs-OST-osc-8d52a9c52800.max_rpcs_in_flight=8
> osc.lustrefs-OST0001-osc-8d52a9c52800.max_rpcs_in_flight=8
> ~# lctl get_param osc.*.max_pages_per_rpc
> osc.lustrefs-OST-osc-8d52a9c52800.max_pages_per_rpc=1024
> osc.lustrefs-OST0001-osc-8d52a9c52800.max_pages_per_rpc=1024
>
>
>
> Thus far I’ve reduced the following to what I felt were really
> conservative values for a 16GB RAM machine:
>
>
>
> ~# lctl set_param llite.*.max_cached_mb=1024
> llite.lustrefs-8d52a9c52800.max_cached_mb=1024
> ~# lctl set_param osc.*.max_dirty_mb=512
> osc.lustrefs-OST-osc-8d52a9c52800.max_dirty_mb=512
> osc.lustrefs-OST0001-osc-8d52a9c52800.max_dirty_mb=512
> ~# lctl set_param osc.*.max_pages_per_rpc=128
> osc.lustrefs-OST-osc-8d52a9c52800.max_pages_per_rpc=128
> osc.lustrefs-OST0001-osc-8d52a9c52800.max_pages_per_rpc=128
> ~# lctl set_param osc.*.max_rpcs_in_flight=2
> osc.lustrefs-OST-osc-8d52a9c52800.max_rpcs_in_flight=2
> osc.lustrefs-OST0001-osc-8d52a9c52800.max_rpcs_in_flight=2
>
>
>
> This slows down how fast I get to basically OOM from <10 seconds to more
> like 25 seconds, but the trend is identical.
>
>
>
> As an example of what I’m seeing on the client, you can see below we start
> with most free, and then iozone rapidly (within ~10 seconds) causes all
> memory to be marked used, and that stabilizes at about 140MB free until at
> some point it stalls for 20 or more seconds and then some has been synced
> out:
>
>
> ~# dstat --mem
> --memory-usage-
> used  free  buff  cach
> 1029M 13.9G 2756k  215M
> 1028M 13.9G 2756k  215M
> 1028M 13.9G 2756k  215M
> 1088M 13.9G 2756k  215M
> 2550M 11.5G 2764k 1238M
> 3989M 10.1G 2764k 1236M
> 5404M 8881M 2764k 1239M
> 6831M 7453M 2772k 1240M
> 8254M 6033M 2772k 1237M
> 9672M 4613M 2772k 1239M
> 10.6G 3462M 2772k 1240M
> 12.1G 1902M 2772k 1240M
> 13.4G  582M 2772k 1240M
> 13.9G  139M 2488k 1161M
> 13.9G  139M 1528k 1174M
> 13.9G  140M  896k 1175M
> 13.9G  139M  676k 1176M
> 13.9G  142M  528k 1177M
> 13.9G  140M  484k 1188M
> 13.9G  139M  492k 1188M
> 13.9G  139M  488k 1188M
> 13.9G  141M  488k 1186M
> 13.9G  141M  480k 1187M
> 13.9G  139M  492k 1188M
> 13.9G  141M  600k 1188M
> 13.9G  139M  580k 1187M
> 13.9G  140M  536k 1186M
> 13.9G  141M  668k 1186M
> 13.9G  139M  580k 1188M
> 13.9G  140M  568k 1187M
> 12.7G 1299M 2064k 1197M missed 20 ticks <-- client is totally unresponsive during this time
> 11.0G 2972M 5404k 1238M^C
>
>
>
> Additionally, I’ve messed with sysctl settings.  Defaults:
>
> vm.dirty_background_bytes = 0
> vm.dirty_background_ratio = 10
> vm.dirty_bytes = 0
> vm.dirty_expire_centisecs = 3000
> vm.dirty_ratio = 20
> vm.dirty_writeback_centisecs = 500
>
>
>
> Revised to conservative values:
>
> vm.dirty_background_bytes = 1073741824
> vm.dirty_background_ratio = 0
> vm.dirty_bytes = 2147483648
> vm.dirty_expire_centisecs = 200
> vm.dirty_ratio = 0
>
> 
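
For reference, a sketch of applying the revised vm.dirty_* values at runtime
and persisting them across reboots (standard sysctl usage; the drop-in file
name is arbitrary). Note that writing dirty_background_bytes/dirty_bytes
automatically zeroes the corresponding *_ratio knobs, which is why the ratios
show as 0 above:

~# sysctl -w vm.dirty_background_bytes=1073741824 vm.dirty_bytes=2147483648
~# sysctl -w vm.dirty_expire_centisecs=200
~# cat > /etc/sysctl.d/90-dirty-writeback.conf <<'EOF'
vm.dirty_background_bytes = 1073741824
vm.dirty_bytes = 2147483648
vm.dirty_expire_centisecs = 200
EOF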

[lustre-discuss] Lustre 2.12.6 client crashes

2022-01-20 Thread Christopher Mountford via lustre-discuss
Date: Thu, 20 Jan 2022 12:07:40 +
From: Christopher Mountford 
To: lustre-discuss@lists.lustre.org
Subject: Client crashes
User-Agent: NeoMutt/20170306 (1.8.0)

Hi All,

We've started getting some fairly regular client panics on our Lustre 2.12.7
filesystem. Looking at the stack trace, I think we are hitting this bug:
https://jira.whamcloud.com/browse/LU-12752

I note that a fix is in 2.15.0; is this likely to be patched in a 2.12 release?

We're still trying to isolate the job that is causing the crash, but once we 
have we should be able to reproduce this reliably.
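
One way to check whether that fix has already landed on a 2.12 maintenance
branch is to search the lustre-release git history for the ticket number
(assuming a clone of the Whamcloud tree and the usual b2_12 branch naming):

$ git clone git://git.whamcloud.com/fs/lustre-release.git
$ cd lustre-release
$ git log --oneline origin/b2_12 | grep -i LU-12752

The exact version running on a client can be confirmed with lctl get_param version.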

Kind Regards,
Christopher.

Log entry:

Jan 20 10:23:39 lmem006 kernel: LustreError: 
4661:0:(osc_cache.c:2519:osc_teardown_async_page()) extent 937e2756e4d0@{[0 
-> 255/255], [2|0|-|cache|wi|92fdd1dd8b40], 
[1703936|1|+|-|932384f1e880|256|  (null)]} trunc at 42.
+Jan 20 10:23:39 lmem006 kernel: LustreError: 
4661:0:(osc_cache.c:2519:osc_teardown_async_page()) ### extent: 
937e2756e4d0 ns: alice3-OST001f-osc-938e6a743000 lock: 
932384f1e880/0x6024b6d908313ce7 lrc: 2/0,0 mode: PW/PW res:
+[0x7c400:0x5c888a:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 
65536->172031) flags: 0x8000200 nid: local remote: 0x345e4fe1c451a182 
expref: -99 pid: 955 timeout: 0 lvb_type: 1
+Jan 20 10:23:39 lmem006 kernel: LustreError: 
4661:0:(osc_page.c:192:osc_page_delete()) page@933651225e00[2 
93228480b2f0 4 1   (null)]
Jan 20 10:23:39 lmem006 kernel: LustreError: 
4661:0:(osc_page.c:192:osc_page_delete()) vvp-page@933651225e50(0:0) 
vm@eaeada357d80 6f0879 3:0 933651225e00 42 lru
Jan 20 10:23:39 lmem006 kernel: LustreError: 
4661:0:(osc_page.c:192:osc_page_delete()) lov-page@933651225e90, comp 
index: 1, gen: 6
Jan 20 10:23:39 lmem006 kernel: LustreError: 
4661:0:(osc_page.c:192:osc_page_delete()) osc-page@933651225ec8 42: 1< 
0x845fed 2 0 + - > 2< 172032 0 4096 0x0 0x420 |   (null) 
938e52a7d738 92fdd1dd8b40 > 3< 0 0 0 > 4< 0 0 8 1703936 - | - - + - >
+5< - - + - | 0 - | 1 - ->
+Jan 20 10:23:39 lmem006 kernel: LustreError: 
4661:0:(osc_page.c:192:osc_page_delete()) end page@933651225e00
Jan 20 10:23:39 lmem006 kernel: LustreError: 
4661:0:(osc_page.c:192:osc_page_delete()) Trying to teardown failed: -16
Jan 20 10:23:39 lmem006 kernel: LustreError: 
4661:0:(osc_page.c:193:osc_page_delete()) ASSERTION( 0 ) failed:
Jan 20 10:23:40 lmem006 kernel: LustreError: 
4661:0:(osc_page.c:193:osc_page_delete()) LBUG
Jan 20 10:23:40 lmem006 kernel: Pid: 4661, comm: diamond 
3.10.0-1160.49.1.el7.x86_64 #1 SMP Tue Nov 30 15:51:32 UTC 2021
Jan 20 10:23:40 lmem006 kernel: Call Trace:
Jan 20 10:23:40 lmem006 kernel: [] 
libcfs_call_trace+0x8c/0xc0 [libcfs]
Jan 20 10:23:40 lmem006 kernel: [] lbug_with_loc+0x4c/0xa0 
[libcfs]
Jan 20 10:23:40 lmem006 kernel: [] 
osc_page_delete+0x48f/0x500 [osc]
Jan 20 10:23:40 lmem006 kernel: [] cl_page_delete0+0x80/0x220 
[obdclass]
Jan 20 10:23:40 lmem006 kernel: [] cl_page_delete+0x33/0x110 
[obdclass]
Jan 20 10:23:40 lmem006 kernel: [] 
ll_invalidatepage+0x7f/0x170 [lustre]
Jan 20 10:23:40 lmem006 kernel: [] 
do_invalidatepage_range+0x7d/0x90
Jan 20 10:23:40 lmem006 kernel: [] 
truncate_inode_page+0x77/0x80
Jan 20 10:23:40 lmem006 kernel: [] 
truncate_inode_pages_range+0x1ea/0x750
Jan 20 10:23:40 lmem006 kernel: [] 
truncate_inode_pages_final+0x4f/0x60
Jan 20 10:23:40 lmem006 kernel: [] ll_delete_inode+0x4f/0x230 
[lustre]
Jan 20 10:23:40 lmem006 kernel: [] evict+0xb4/0x180
Jan 20 10:23:40 lmem006 kernel: [] iput+0xfc/0x190
Jan 20 10:23:40 lmem006 kernel: [] __dentry_kill+0x158/0x1d0
Jan 20 10:23:40 lmem006 kernel: [] dput+0xb5/0x1a0
Jan 20 10:23:40 lmem006 kernel: [] __fput+0x18d/0x230
Jan 20 10:23:40 lmem006 kernel: [] fput+0xe/0x10
Jan 20 10:23:40 lmem006 kernel: [] task_work_run+0xbb/0xe0
Jan 20 10:23:40 lmem006 kernel: [] do_notify_resume+0xa5/0xc0
Jan 20 10:23:40 lmem006 kernel: [] int_signal+0x12/0x17
Jan 20 10:23:40 lmem006 kernel: [] 0x
Jan 20 10:23:40 lmem006 kernel: Kernel panic - not syncing: LBUG


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org