[Kernel-packages] [Bug 1972898] Re: Kernel Bug: 22.04, EXT4, samba (smbd) on MDADM raid6: Copying large volume of files.

2023-03-27 Thread Michael D Labriola
I just got bit by this, too.  I was transferring ~40 TB via rsync to a
new NAS setup, got about 12 TB in, and it just stopped.  Eventually I
realized the mountpoint was unresponsive.  Rebooting and kicking rsync
back off resulted in the same scenario almost immediately.  The machine
in question is running 5.15.27.  I'm going to upgrade the kernel and see
if the problem persists.  Does anyone know if this is fixed or being
worked on upstream already?
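
For anyone who wants to notice the hang before a transfer sits idle for hours, here is a minimal sketch of a mountpoint probe. The function name `mount_responds` and the 5 second default are illustrative assumptions, not something from this report. It deliberately avoids waiting on the child after the timeout, because a process stuck in uninterruptible sleep on the frozen filesystem cannot be reaped and waiting for it would hang the prober too.

```python
import subprocess
import time


def mount_responds(path, timeout=5.0):
    """Return True if `stat` on path exits successfully within timeout seconds.

    Hypothetical helper: name and defaults are assumptions for illustration.
    """
    proc = subprocess.Popen(
        ["stat", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return proc.returncode == 0
        time.sleep(0.05)
    # The child may be stuck in uninterruptible sleep. Signal it, but do not
    # wait for it, since waiting could block this process as well.
    proc.kill()
    return False
```

A prompt answer does not prove the filesystem is healthy (the result may come from cache), so this is only a coarse early warning, for example run from cron against the rsync target.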

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1972898

Title:
  Kernel Bug: 22.04, EXT4, samba (smbd) on MDADM raid6: Copying large
  volume of files.

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  60-drive MDADM RAID 6, ext4, Ubuntu 22.04.  Issue reproduced on both
  Supermicro SSG-6048R and HP ProLiant DL380 servers.

  System was stable on Ubuntu 20.04.  Unstable following upgrade to
  22.04 (kernel version 5.15).

  To reproduce the kernel error, copy thousands of files (~1 TB of
  data) to a Samba share from any Windows computer.  After some time
  (seconds to minutes), a kernel error is thrown, the smbd process
  becomes unresponsive and cannot be killed, the file transfer stops,
  the mounted drive freezes (directory operations including ls, mv, cp
  on the mount are not possible) and the system needs to be
  hard-rebooted.  Quite an unhappy outcome :)

  I then moved the 60 drives to an external enclosure, and connected it
  to a new computer (HP ProLiant DL380). After assembling the RAID
  array, with a fresh install of Ubuntu 22.04 on the new system, the
  kernel error was reproduced. I cannot reproduce the error copying via
  NFS or copying files on the drive itself. Single files or small
  transfers proceed without error. Filesystem passes fsck.

  Happy to assist in troubleshooting in any way.

  Kernel error message from both systems follows.

  **New System (HP ProLiant DL380) Kernel Error**
  May 10 01:32:49 nas3 kernel: [ 1463.900175] [ cut here ]
  May 10 01:32:49 nas3 kernel: [ 1463.900179] kernel BUG at fs/ext4/xattr.c:2071!
  May 10 01:32:49 nas3 kernel: [ 1463.900214] invalid opcode:  [#1] SMP PTI
  May 10 01:32:49 nas3 kernel: [ 1463.900233] CPU: 0 PID: 5989 Comm: smbd Not tainted 5.15.0-27-generic #28-Ubuntu
  May 10 01:32:49 nas3 kernel: [ 1463.900939] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 04/25/2017
  May 10 01:32:49 nas3 kernel: [ 1463.901560] RIP: 0010:ext4_xattr_block_set+0xbba/0xbd0
  May 10 01:32:49 nas3 kernel: [ 1463.902190] Code: c7 45 8c f4 ff ff ff eb b4 48 8b 7d 90 48 c7 c1 7f 12 61 a8 ba 2d 08 00 00 48 c7 c6 d0 3c 25 a8 e8 9b 6f ff ff e9 a5 fe ff ff <0>
  May 10 01:32:49 nas3 kernel: [ 1463.903445] RSP: 0018:a59e0b51f9c0 EFLAGS: 00010206
  May 10 01:32:49 nas3 kernel: [ 1463.904080] RAX: 0003 RBX: 97aa0490b680 RCX: a860a8e7
  May 10 01:32:49 nas3 kernel: [ 1463.904727] RDX: 0261 RSI:  RDI: 0003cca0
  May 10 01:32:49 nas3 kernel: [ 1463.905384] RBP: a59e0b51fa70 R08: 97aa21824138 R09: 
  May 10 01:32:49 nas3 kernel: [ 1463.906051] R10: 97aa0f6e87e0 R11: 97aae9073ff0 R12: 
  May 10 01:32:49 nas3 kernel: [ 1463.906738] R13: 97ada77feac0 R14: 0003165b R15: 
  May 10 01:32:49 nas3 kernel: [ 1463.907411] FS:  7f06ceb61a40() GS:97b93f80() knlGS:
  May 10 01:32:49 nas3 kernel: [ 1463.908049] CS:  0010 DS:  ES:  CR0: 80050033
  May 10 01:32:49 nas3 kernel: [ 1463.908697] CR2: 55c076d0d4f8 CR3: 00029e6fe003 CR4: 001706f0
  May 10 01:32:49 nas3 kernel: [ 1463.909349] Call Trace:
  May 10 01:32:49 nas3 kernel: [ 1463.909989]  
  May 10 01:32:49 nas3 kernel: [ 1463.910624]  ? jbd2_journal_get_write_access+0x43/0x90
  May 10 01:32:49 nas3 kernel: [ 1463.911360]  ext4_xattr_set_handle+0x487/0x620
  May 10 01:32:49 nas3 kernel: [ 1463.912032]  __ext4_set_acl+0xc1/0x130
  May 10 01:32:49 nas3 kernel: [ 1463.912689]  ext4_init_acl+0xe8/0x160
  May 10 01:32:49 nas3 kernel: [ 1463.913327]  __ext4_new_inode+0xf60/0x14e0
  May 10 01:32:49 nas3 kernel: [ 1463.913962]  ? path_parentat+0x4c/0x90
  May 10 01:32:49 nas3 kernel: [ 1463.914595]  ext4_mkdir+0x157/0x330
  May 10 01:32:49 nas3 kernel: [ 1463.915265]  vfs_mkdir+0x142/0x200
  May 10 01:32:49 nas3 kernel: [ 1463.915883]  do_mkdirat+0x120/0x140
  May 10 01:32:49 nas3 kernel: [ 1463.916501]  __x64_sys_mkdirat+0x51/0x70
  May 10 01:32:49 nas3 kernel: [ 1463.917115]  do_syscall_64+0x5c/0xc0
  May 10 01:32:49 nas3 kernel: [ 1463.917733]  ? exit_to_user_mode_prepare+0x37/0xb0
  May 10 01:32:49 nas3 kernel: [ 1463.918365]  ? syscall_exit_to_user_mode+0x27/0x50
  May 10 01:32:49 nas3 kernel: [ 1463.919035]  ? __x64_sys_newfstatat+0x1c/0x20
  May 10 01:32:49 nas3 kernel: [ 1463.919665]  ? do_syscall_64+0x69/0xc0
  May 10 01:32:49 nas3 kernel: [ 1463.920300]  ? __x64_sys_newfstatat+0x1c/0x20
  May 10 01:32:49 nas3 kernel: [ 1463.920929]  ? do_syscall_64+0x69/0xc0
  May 10 01:32:49 nas3 kernel: [ 1463.921534]  ? __x64_sys_newfstatat+0x1c/0x20
  May 10 01:32:49 nas3 kernel: [ 1463.922121]  ? do_syscall_64+0x69/0xc0
  May 10 01:32:49 nas3 kernel: [ 1463.922703]  ? syscall_exit_to_user_mode+0x27/0x50
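
When comparing reports like the one above across machines, it can help to reduce each oops to the BUG location plus the call chain. The following is a minimal sketch, assuming one log record per line (not wrapped by a mail client); `summarize_oops` and both regular expressions are illustrative names, not part of any kernel tooling. Frames prefixed with `?` are ones the kernel's unwinder could not confirm as part of the call chain, so they are flagged as unreliable.

```python
import re

# "kernel BUG at <file>:<line>!" as printed by the kernel's BUG() handler.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
# A stack frame such as "]  ? func+0x43/0x90"; "? " marks an unreliable frame.
FRAME_RE = re.compile(
    r"\]\s+(?P<guess>\? )?(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_oops(lines):
    """Return ((file, line) or None, [(function, is_reliable), ...])."""
    bug = None
    frames = []
    in_trace = False
    for line in lines:
        match = BUG_RE.search(line)
        if match:
            bug = (match.group("file"), int(match.group("line")))
            continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            match = FRAME_RE.search(line)
            if match:
                frames.append((match.group("func"), match.group("guess") is None))
    return bug, frames


SAMPLE = """\
May 10 01:32:49 nas3 kernel: [ 1463.900179] kernel BUG at fs/ext4/xattr.c:2071!
May 10 01:32:49 nas3 kernel: [ 1463.909349] Call Trace:
May 10 01:32:49 nas3 kernel: [ 1463.910624]  ? jbd2_journal_get_write_access+0x43/0x90
May 10 01:32:49 nas3 kernel: [ 1463.911360]  ext4_xattr_set_handle+0x487/0x620
May 10 01:32:49 nas3 kernel: [ 1463.912032]  __ext4_set_acl+0xc1/0x130
May 10 01:32:49 nas3 kernel: [ 1463.914595]  ext4_mkdir+0x157/0x330
""".splitlines()

bug, frames = summarize_oops(SAMPLE)
reliable = [func for func, ok in frames if ok]
```

On the sample, the reliable frames read mkdir, then ACL setup, then the xattr code, which matches the path visible in the full trace.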

[Kernel-packages] [Bug 1972898] Re: Kernel Bug: 22.04, EXT4, samba (smbd) on MDADM raid6: Copying large volume of files.

2022-09-19 Thread MH
Hi list,

I get this bug, too. I assume it is related not to samba or Fedora but 
to the kernel. Something seems to have changed in the kernel's ext4 
driver starting with roughly 5.13.

I copy a large number of files and hardlinks onto my filesystem and 
suddenly it stops working with the kernel bug (see below). The 
filesystem still seems to be mounted, and lsof shows that one process is 
working on it, but as soon as I try to access the FS my terminal session 
hangs.

I'm using openSUSE Leap 15.4 (5.14.21-150400.24.21-default #1 SMP 
PREEMPT_DYNAMIC Wed Sep 7 06:51:18 UTC 2022 (974d0aa) x86_64 x86_64 
x86_64 GNU/Linux). The problem did not appear in openSUSE Leap 15.3 
with its 5.3 kernel. The bug seems to have been introduced with kernel 
5.13, as others reported similar problems with this or newer kernel 
versions.
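
To sort a fleet of machines by that guess, a small version comparison is enough. This is only a sketch: the 5.13 threshold is the unconfirmed estimate from this message, not a verified regression point, and `kernel_version`, `in_suspect_range` and `SUSPECT_FROM` are made-up names for illustration.

```python
import re

# Assumption taken from this thread: problems reported from roughly 5.13 on.
SUSPECT_FROM = (5, 13, 0)


def kernel_version(release):
    """Parse the leading 'major.minor[.patch]' of a `uname -r` string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def in_suspect_range(release):
    """True if the release is at or after the suspected starting version."""
    return kernel_version(release) >= SUSPECT_FROM
```

For example, `in_suspect_range("5.14.21-150400.24.21-default")` and `in_suspect_range("5.15.0-27-generic")` are both true, while a 5.3 or 5.4 release string is not.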


Here is some more information about my system:
ext4 FS (LVM on a hardware HP RAID, HP ProLiant ML350 Gen9):
> # dumpe2fs -h /dev/mapper/backup-backup
> dumpe2fs 1.46.4 (18-Aug-2021)
> Filesystem volume name:   
> Last mounted on:          /backup
> Filesystem UUID:          1337b52d-11ee-4914-af6c-26085295f39a
> Filesystem magic number:  0xEF53
> Filesystem revision #:    1 (dynamic)
> Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
> Filesystem flags:         signed_directory_hash
> Default mount options:    user_xattr acl
> Filesystem state:         clean
> Errors behavior:          Continue
> Filesystem OS type:       Linux
> Inode count:              610463744
> Block count:              4883703808
> Reserved block count:     4883703
> Free blocks:              1201445461
> Free inodes:              455325946
> First block:              0
> Block size:               4096
> Fragment size:            4096
> Group descriptor size:    64
> Blocks per group:         32768
> Fragments per group:      32768
> Inodes per group:         4096
> Inode blocks per group:   256
> RAID stride:              64
> RAID stripe width:        448
> Flex block group size:    16
> Filesystem created:       Fri Apr  7 12:47:45 2017
> Last mount time:          Wed Sep 14 15:32:42 2022
> Last write time:          Wed Sep 14 15:32:42 2022
> Mount count:              37
> Maximum mount count:      -1
> Last checked:             Wed Jan 27 12:56:16 2021
> Check interval:           0 ()
> Lifetime writes:          88 TB
> Reserved blocks uid:      0 (user root)
> Reserved blocks gid:      0 (group root)
> First inode:              11
> Inode size:               256
> Required extra isize:     28
> Desired extra isize:      28
> Journal inode:            8
> Default directory hash:   half_md4
> Directory Hash Seed:      998d3208-bcb9-4a93-8f50-9e686d7ffe52
> Journal backup:           inode blocks
> Journal features:         journal_incompat_revoke journal_64bit
> Total journal size:       128M
> Total journal blocks:     32768
> Max transaction length:   32768
> Fast commit length:       0
> Journal sequence:         0x00152ac7
> Journal start:            3046
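
Since the Ubuntu trace in this bug reaches the xattr code through `ext4_init_acl`, the settings most worth comparing between affected systems are the `ext_attr` feature and the `acl` default mount option. Below is a minimal sketch for pulling them out of `dumpe2fs -h` output; it assumes the quote prefix has been stripped and each field sits on one line, and the helper names are illustrative only. Whether these settings actually matter for the bug is not established here.

```python
def parse_dumpe2fs_header(text):
    """Turn 'Key:   value' lines from `dumpe2fs -h` into a dict of strings."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def uses_xattr_and_acl(info):
    """True if ext_attr is enabled and acl is a default mount option."""
    features = info.get("Filesystem features", "").split()
    mount_opts = info.get("Default mount options", "").split()
    return "ext_attr" in features and "acl" in mount_opts


SAMPLE = """\
Filesystem features:      has_journal ext_attr resize_inode dir_index
Default mount options:    user_xattr acl
Filesystem created:       Fri Apr  7 12:47:45 2017
Inode size:               256
"""

info = parse_dumpe2fs_header(SAMPLE)
```

With the sample above, `uses_xattr_and_acl(info)` is true and `info["Inode size"]` is the string `"256"`.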


Kernel messages from /var/log/messages:
> 2022-09-17T00:25:23.574959+02:00 HOSTNAME kernel: [205391.151933][ T2173] [ cut here ]
> 2022-09-17T00:25:23.574991+02:00 HOSTNAME kernel: [205391.151939][ T2173] kernel BUG at ../fs/ext4/xattr.c:2071!
> 2022-09-17T00:25:23.574995+02:00 HOSTNAME kernel: [205391.151951][ T2173] invalid opcode:  [#1] PREEMPT SMP PTI
> 2022-09-17T00:25:23.574997+02:00 HOSTNAME kernel: [205391.151963][ T2173] CPU: 6 PID: 2173 Comm: rsync Tainted: GN 5.14.21-150400.24.18-default #1 SLE15-SP4 695ab7a8fc20f5ddb345280570966cd1eb06d469
> 2022-09-17T00:25:23.575006+02:00 HOSTNAME kernel: [205391.151976][ T2173] Hardware name: HP ProLiant ML350 Gen9/ProLiant ML350 Gen9, BIOS P92 10/21/2019
> 2022-09-17T00:25:23.575010+02:00 HOSTNAME kernel: [205391.151982][ T2173] RIP: 0010:ext4_xattr_block_set+0xd9b/0xea0 [ext4]
> 2022-09-17T00:25:23.575013+02:00 HOSTNAME kernel: [205391.152088][ T2173] Code: 7c 24 30 41 b9 01 00 00 00 41 b8 01 00 00 00 4c 89 e9 31 d2 e8 f6 d6 fc ff e9 ed fc ff ff e8 fc 2a 11 fb 0f 0b e9 d9 f6 ff ff <0f> 0b 48 8b 7c 24 08 e8 d9 9f a6 fa e9 d1 fe ff ff c7 44 24 4c f4
> 2022-09-17T00:25:23.575015+02:00 HOSTNAME kernel: [205391.152099][ T2173] RSP: 0018:9705406dba20 EFLAGS: 00010212
> 2022-09-17T00:25:23.575018+02:00 HOSTNAME kernel: [205391.152107][ T2173] RAX:  RBX: 0fdc0200 RCX: 
> 2022-09-17T00:25:23.575020+02:00 HOSTNAME kernel: [205391.152113][ T2173] RDX: fff7e006 RSI: c0cf112c RDI: 2c02c000d410
> 2022-09-17T00:25:23.575023+02:00 HOSTNAME kernel: [205391.152119][ T2173] RBP: 9705406dbbd8 R08:  R09: 
> 2022-09-17T00:25:23.575025+02:00 HOSTNAME kernel: [205391.152125][ T2173] R10:  R11: 8ae4ffdc4800 R12: 

[Kernel-packages] [Bug 1972898] Re: Kernel Bug: 22.04, EXT4, samba (smbd) on MDADM raid6: Copying large volume of files.

2022-05-10 Thread Mathew Moore
I cannot run "apport-collect 1972898": the command needs browser
authorization, and this machine has no GUI.

** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed


[Kernel-packages] [Bug 1972898] Re: Kernel Bug: 22.04, EXT4, samba (smbd) on MDADM raid6: Copying large volume of files.

2022-05-10 Thread Mathew Moore
Please let me know how else I can get the information.


[Kernel-packages] [Bug 1972898] Re: Kernel Bug: 22.04, EXT4, samba (smbd) on MDADM raid6: Copying large volume of files.

2022-05-10 Thread Brian Murray
** Package changed: ubuntu => linux (Ubuntu)
