Re: 4.0 kernel XFS filesystem crash when running AIM7's disk workload
On 04/21/2015 05:59 PM, Dave Chinner wrote:
> On Tue, Apr 21, 2015 at 04:52:37PM -0400, Waiman Long wrote:
>> On 04/17/2015 07:45 PM, Dave Chinner wrote:
>>> On Fri, Apr 17, 2015 at 01:38:49PM -0400, Waiman Long wrote:
>>>> Hi Dave,
>>>>
>>>> When I was running the AIM7's disk workload on an 8-socket
>>>> Westmere-EX server with 4.0 kernel, the kernel crashed. A set of
>>>> small ramdisks were created (ramdisk_size=271072). Those ramdisks
>>>> were formatted with XFS filesystem before the test began. The
>>>> kernel log was:
>>>>
>>>> XFS (ram12): Mounting V4 Filesystem
>>>> XFS (ram12): Log size 1424 blocks too small, minimum size is 1596 blocks
>>>> XFS (ram12): Log size out of supported range. Continuing onwards,
>>>> but if log hangs are experienced then please report this message
>>>> in the bug report.
>>> First thing you need to do is upgrade xfsprogs so that this message
>>> goes away, or use "mkfs.xfs -l size=10m" so that the log is larger
>>> than the minimum.
>>>
>>>> XFS (ram15): Ending clean mount
>>>> BUG: unable to handle kernel NULL pointer dereference at (null)
>>>> IP: [812abd6d] __memcpy+0xd/0x110
>>>> PGD 29f7655f067 PUD 29f75a80067 PMD 0
>>>> Oops: [#1] SMP
>>>> Modules linked in: xfs exportfs libcrc32c ebtable_nat ebtables
>>>> xt_CHECKSUM iptable_mangle bridge stp llc autofs4 ipt_REJECT
>>>> nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter
>>>> ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
>>>> nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6
>>>> vhost_net macvtap macvlan vhost tun kvm_intel kvm ipmi_si
>>>> ipmi_msghandler tpm_infineon iTCO_wdt iTCO_vendor_support wmi
>>>> acpi_cpufreq microcode pcspkr serio_raw qlcnic be2net vxlan
>>>> udp_tunnel ip6_udp_tunnel ses enclosure igb dca ptp pps_core lpc_ich
>>>> mfd_core hpilo hpwdt sg i7core_edac edac_core netxen_nic ext4(E)
>>>> jbd2(E) mbcache(E) sr_mod(E) cdrom(E) sd_mod(E) lpfc(E) qla2xxx(E)
>>>> scsi_transport_fc(E) pata_acpi(E) ata_generic(E) ata_piix(E) hpsa(E)
>>>> radeon(E) ttm(E) drm_kms_helper(E) drm(E) i2c_algo_bit(E)
>>>> i2c_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
>>> Why do you have a mix of signed and unsigned modules loaded?
>>
>> I did the test on a RHEL 6.6 system. The 4.0 kernel is unsigned, but
>> there are some additional RHEL modules loaded at boot up time.
>
> Wait, what? Do you have rhel 6.6 modules loaded into a 4.0 kernel?
> If so, I'd suggest you fix things so that doesn't happen before
> running any more tests...

No, I didn't. I thought the system startup scripts might have loaded
some additional kernel modules, but I didn't check whether that is
really the case. Anyway, this is not the issue that was causing the
problem that I saw.

>>>> 823 case XFS_DINODE_FMT_LOCAL:
>>>> 824 if ((iip->ili_fields & dataflag[whichfork]) &&
>>>> 0x23c0 <+336>: movslq %ecx,%rcx
>>>> 0x23c3 <+339>: movswl 0x0(%rcx,%rcx,1),%eax
>>>> 0x23cb <+347>: test %eax,0x90(%rdx)
>>>> 0x23d1 <+353>: je 0x2350 <xfs_iflush_fork+224>
>>>>
>>>> 825 (ifp->if_bytes > 0)) {
>>>> 0x23d7 <+359>: mov (%r10),%edx
>>>> 0x23da <+362>: test %edx,%edx
>>>> 0x23dc <+364>: jle 0x2350 <xfs_iflush_fork+224>
>>> So the contents of rdx say that the inode fork size is 6 bytes in
>>> local format. The call location also indicates that it is the
>>> attribute fork that is being flushed. The minimum size of the attr
>>> fork is 3 bytes - an empty header. However, the next valid size has
>>> a second header that adds 4 bytes to the size, plus the bytes in
>>> the attr name and value. Hence a size of 6 bytes is invalid, and
>>> probably indicates that there is some form of memory corruption
>>> going on here.
>>>
>>> IIRC, we haven't touched this code for a while - can you test 3.19
>>> and see if it has the same problem? If it doesn't have the problem,
>>> and given you can reliably reproduce the crash, can you run a
>>> bisect to find the cause?
>>
>> I have done the bisection and the following commit in 3.13 is the
>> one that caused the problem, I think:
>>
>> f7be2d7f594cbc7a00902b5427332a1ad519a528
>> xfs: push down inactive transaction mgmt for truncate
>>
>> I looked at the patch, and it didn't seem quite right,
>
> In what way?
>
>> but I don't know much about the XFS internals to be sure. Maybe you
>> can take a look at that.
>
> Doesn't actually seem very likely - that's mostly just a factoring
> patch, and it is called on every inode that is reclaimed from
> memory, so it's not like that code path doesn't get well tested.
>
> So, I'm confused - I thought you were reporting a recent regression.
> Are you actually reporting a regression between a RHEL 6.6 kernel
> and the current mainline kernel? Is this the first time you've run
> this test on XFS on a kernel more recent than RHEL6.6?
>
> Details, please; they are important.
>
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>
> Cheers,
>
> Dave.

I have just sent out a patch to fix this problem. Please let me know
if there is any problem with the patch.

Cheers,
Longman
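To make the size argument in the quoted analysis concrete, here is a minimal Python sketch. It relies only on the two figures given in the thread (a 3-byte empty header, and a further 4-byte header per attribute entry before the name and value bytes). The constant and function names are hypothetical and are not taken from the XFS sources, and the on-disk layout has not been checked against the kernel headers.

```python
# Figures as stated in the thread; treat them as assumptions, not as a
# description of the real XFS on-disk structures.
EMPTY_HEADER_BYTES = 3   # "an empty header"
ENTRY_HEADER_BYTES = 4   # "a second header that adds 4 bytes"


def smallest_nonempty_size(name_len=0, value_len=0):
    """Size of a local-format attr fork holding exactly one entry."""
    return EMPTY_HEADER_BYTES + ENTRY_HEADER_BYTES + name_len + value_len


def is_plausible_attr_fork_size(size):
    """True if `size` is the empty fork or large enough for one entry."""
    return size == EMPTY_HEADER_BYTES or size >= smallest_nonempty_size()


for candidate in (3, 6, 7):
    print(candidate, is_plausible_attr_fork_size(candidate))
```

Under these assumptions, sizes 4 through 6 cannot describe any local-format attribute fork, which is consistent with the reading that an if_bytes value of 6 suggests corruption rather than a legitimate inode state.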
Re: 4.0 kernel XFS filesystem crash when running AIM7's disk workload
On Tue, Apr 21, 2015 at 04:52:37PM -0400, Waiman Long wrote:
> On 04/17/2015 07:45 PM, Dave Chinner wrote:
> >On Fri, Apr 17, 2015 at 01:38:49PM -0400, Waiman Long wrote:
> >>Hi Dave,
> >>
> >>When I was running the AIM7's disk workload on an 8-socket
> >>Westmere-EX server with 4.0 kernel, the kernel crashed. A set of small
> >>ramdisks were created (ramdisk_size=271072). Those ramdisks were
> >>formatted with XFS filesystem before the test began. The kernel log
> >>was:
> >>
> >>XFS (ram12): Mounting V4 Filesystem
> >>XFS (ram12): Log size 1424 blocks too small, minimum size is 1596 blocks
> >>XFS (ram12): Log size out of supported range. Continuing onwards,
> >>but if log hangs are
> >>experienced then please report this message in the bug report.
> >First thing you need to do is upgrade xfsprogs so that this message
> >goes away, or use "mkfs.xfs -l size=10m" so that the log is larger
> >than the minimum.
> >
> >>XFS (ram15): Ending clean mount
> >>BUG: unable to handle kernel NULL pointer dereference at (null)
> >>IP: [812abd6d] __memcpy+0xd/0x110
> >>PGD 29f7655f067 PUD 29f75a80067 PMD 0
> >>Oops: [#1] SMP
> >>Modules linked in: xfs exportfs libcrc32c ebtable_nat ebtables
> >>xt_CHECKSUM iptable_mangle bridge stp llc autofs4 ipt_REJECT
> >>nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter
> >>ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
> >>nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6
> >>vhost_net macvtap macvlan vhost tun kvm_intel kvm ipmi_si
> >>ipmi_msghandler tpm_infineon iTCO_wdt iTCO_vendor_support wmi
> >>acpi_cpufreq microcode pcspkr serio_raw qlcnic be2net vxlan
> >>udp_tunnel ip6_udp_tunnel ses enclosure igb dca ptp pps_core lpc_ich
> >>mfd_core hpilo hpwdt sg i7core_edac edac_core netxen_nic ext4(E)
> >>jbd2(E) mbcache(E) sr_mod(E) cdrom(E) sd_mod(E) lpfc(E) qla2xxx(E)
> >>scsi_transport_fc(E) pata_acpi(E) ata_generic(E) ata_piix(E) hpsa(E)
> >>radeon(E) ttm(E) drm_kms_helper(E) drm(E) i2c_algo_bit(E)
> >>i2c_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> >Why do you have a mix of signed and unsigned modules loaded?
>
> I did the test on a RHEL 6.6 system. The 4.0 kernel is unsigned, but
> there are some additional RHEL modules loaded at boot up time.

Wait, what? Do you have rhel 6.6 modules loaded into a 4.0 kernel?
If so, I'd suggest you fix things so that doesn't happen before
running any more tests...

> >>CPU: 69 PID: 116603 Comm: xfsaild/ram5 Tainted: GE 4.0.0 #2
> >>Hardware name: HP ProLiant DL980 G7, BIOS P66 07/30/2012
> >>task: 8b9f7eeb4f80 ti: 8b9f7f1ac000 task.ti: 8b9f7f1ac000
> >>RIP: 0010:[812abd6d] [812abd6d] __memcpy+0xd/0x110
> >>RSP: 0018:8b9f7f1afc10 EFLAGS: 00010206
> >>RAX: 88102476a3cc RBX: 889ff2ab5000 RCX: 0005
> >>RDX: 0006 RSI: RDI: 88102476a3cc
> >edx = 6 bytes.
> >
> >>RBP: 8b9f7f1afc18 R08: 0001 R09: 88102476a3cc
> >>R10: 8a1f6c03ea80 R11: R12: 8b1ff1269400
> >>R13: 8b1f64837c98 R14: 881038701200 R15: 88102476a300
> >>FS: () GS:8b1fffa4() knlGS:
> >>CS: 0010 DS: ES: CR0: 8005003b
> >>CR2: CR3: 029f7655e000 CR4: 06e0
> >>Stack:
> >> a0ca8c41 8b9f7f1afc68 a0cc4803 8b9f7f1afc68
> >> a0cd2777 8b9f7f1afc68 8b1ff1269400 8a9f59022800
> >> 8b1f7c932718 0003 8a9f590228e4 8b9f7f1afce8
> >>Call Trace:
> >> [a0ca8c41] ? xfs_iflush_fork+0x181/0x240 [xfs]
> >> [a0cc4803] xfs_iflush_int+0x1f3/0x320 [xfs]
> >> [a0cd2777] ? kmem_alloc+0x87/0x100 [xfs]
> >> [a0cc60a5] xfs_iflush_cluster+0x295/0x380 [xfs]
> >> [a0cc8ff4] xfs_iflush+0xf4/0x1f0 [xfs]
> >> [a0cda22a] xfs_inode_item_push+0xea/0x130 [xfs]
> >> [a0ce140d] xfsaild_push+0x10d/0x500 [xfs]
> >> [810b7c20] ? lock_timer_base+0x70/0x70
> >> [a0ce1898] xfsaild+0x98/0x130 [xfs]
> >> [a0ce1800] ? xfsaild_push+0x500/0x500 [xfs]
> >> [a0ce1800] ? xfsaild_push+0x500/0x500 [xfs]
> >> [a0ce1800] ? xfsaild_push+0x500/0x500 [xfs]
> >> [81074b50] ? kthread_freezable_should_stop+0x70/0x70
> >> [815c5748] ret_from_fork+0x58/0x90
> >> [81074b50] ? kthread_freezable_should_stop+0x70/0x70
> >>Code: 0f b6 c0 5b c9 c3 0f 1f 84 00 00 00 00 00 e8 2b f9 ff ff 80 7b
> >>25 00 74 c8 eb d3 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07
> >>f3 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c
> >>RIP [812abd6d] __memcpy+0xd/0x110
> >> RSP 8b9f7f1afc10
> >>CR2:
> >>---[ end trace fb8a4add69562a76 ]---
> >>
> >>The xfs_iflush_fork+0x181/0x240 (385) IP address is at:
> >>
> >(rearrange slightly to make more sense)
> >
> >>823 case XFS_DINODE_FMT_LOCAL:
> >>824 if ((iip->ili_fields & dataflag[whichfork]) &&
> >>0x23c0 <+336>: movslq %ecx,%rcx
> >>0x23c3 <+339>: movswl 0x0(%rcx,%rcx,1),%eax
> >>0x23cb <+347>: test %eax,0x90(%rdx)
> >>0x23d1 <+353>: je 0x2350 <xfs_iflush_fork+224>
> >>
> >>825
Re: 4.0 kernel XFS filesystem crash when running AIM7's disk workload
On 04/17/2015 07:45 PM, Dave Chinner wrote:
>On Fri, Apr 17, 2015 at 01:38:49PM -0400, Waiman Long wrote:
>>Hi Dave,
>>
>>When I was running the AIM7's disk workload on an 8-socket
>>Westmere-EX server with 4.0 kernel, the kernel crashed. A set of small
>>ramdisks were created (ramdisk_size=271072). Those ramdisks were
>>formatted with XFS filesystem before the test began. The kernel log
>>was:
>>
>>XFS (ram12): Mounting V4 Filesystem
>>XFS (ram12): Log size 1424 blocks too small, minimum size is 1596 blocks
>>XFS (ram12): Log size out of supported range. Continuing onwards,
>>but if log hangs are
>>experienced then please report this message in the bug report.
>First thing you need to do is upgrade xfsprogs so that this message
>goes away, or use "mkfs.xfs -l size=10m" so that the log is larger
>than the minimum.
>
>>XFS (ram15): Ending clean mount
>>BUG: unable to handle kernel NULL pointer dereference at (null)
>>IP: [812abd6d] __memcpy+0xd/0x110
>>PGD 29f7655f067 PUD 29f75a80067 PMD 0
>>Oops: [#1] SMP
>>Modules linked in: xfs exportfs libcrc32c ebtable_nat ebtables
>>xt_CHECKSUM iptable_mangle bridge stp llc autofs4 ipt_REJECT
>>nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter
>>ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
>>nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6
>>vhost_net macvtap macvlan vhost tun kvm_intel kvm ipmi_si
>>ipmi_msghandler tpm_infineon iTCO_wdt iTCO_vendor_support wmi
>>acpi_cpufreq microcode pcspkr serio_raw qlcnic be2net vxlan
>>udp_tunnel ip6_udp_tunnel ses enclosure igb dca ptp pps_core lpc_ich
>>mfd_core hpilo hpwdt sg i7core_edac edac_core netxen_nic ext4(E)
>>jbd2(E) mbcache(E) sr_mod(E) cdrom(E) sd_mod(E) lpfc(E) qla2xxx(E)
>>scsi_transport_fc(E) pata_acpi(E) ata_generic(E) ata_piix(E) hpsa(E)
>>radeon(E) ttm(E) drm_kms_helper(E) drm(E) i2c_algo_bit(E)
>>i2c_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
>Why do you have a mix of signed and unsigned modules loaded?

I did the test on a RHEL 6.6 system. The 4.0 kernel is unsigned, but
there are some additional RHEL modules loaded at boot up time.

>>CPU: 69 PID: 116603 Comm: xfsaild/ram5 Tainted: GE 4.0.0 #2
>>Hardware name: HP ProLiant DL980 G7, BIOS P66 07/30/2012
>>task: 8b9f7eeb4f80 ti: 8b9f7f1ac000 task.ti: 8b9f7f1ac000
>>RIP: 0010:[812abd6d] [812abd6d] __memcpy+0xd/0x110
>>RSP: 0018:8b9f7f1afc10 EFLAGS: 00010206
>>RAX: 88102476a3cc RBX: 889ff2ab5000 RCX: 0005
>>RDX: 0006 RSI: RDI: 88102476a3cc
>edx = 6 bytes.
>
>>RBP: 8b9f7f1afc18 R08: 0001 R09: 88102476a3cc
>>R10: 8a1f6c03ea80 R11: R12: 8b1ff1269400
>>R13: 8b1f64837c98 R14: 881038701200 R15: 88102476a300
>>FS: () GS:8b1fffa4() knlGS:
>>CS: 0010 DS: ES: CR0: 8005003b
>>CR2: CR3: 029f7655e000 CR4: 06e0
>>Stack:
>> a0ca8c41 8b9f7f1afc68 a0cc4803 8b9f7f1afc68
>> a0cd2777 8b9f7f1afc68 8b1ff1269400 8a9f59022800
>> 8b1f7c932718 0003 8a9f590228e4 8b9f7f1afce8
>>Call Trace:
>> [a0ca8c41] ? xfs_iflush_fork+0x181/0x240 [xfs]
>> [a0cc4803] xfs_iflush_int+0x1f3/0x320 [xfs]
>> [a0cd2777] ? kmem_alloc+0x87/0x100 [xfs]
>> [a0cc60a5] xfs_iflush_cluster+0x295/0x380 [xfs]
>> [a0cc8ff4] xfs_iflush+0xf4/0x1f0 [xfs]
>> [a0cda22a] xfs_inode_item_push+0xea/0x130 [xfs]
>> [a0ce140d] xfsaild_push+0x10d/0x500 [xfs]
>> [810b7c20] ? lock_timer_base+0x70/0x70
>> [a0ce1898] xfsaild+0x98/0x130 [xfs]
>> [a0ce1800] ? xfsaild_push+0x500/0x500 [xfs]
>> [a0ce1800] ? xfsaild_push+0x500/0x500 [xfs]
>> [a0ce1800] ? xfsaild_push+0x500/0x500 [xfs]
>> [81074b50] ? kthread_freezable_should_stop+0x70/0x70
>> [815c5748] ret_from_fork+0x58/0x90
>> [81074b50] ? kthread_freezable_should_stop+0x70/0x70
>>Code: 0f b6 c0 5b c9 c3 0f 1f 84 00 00 00 00 00 e8 2b f9 ff ff 80 7b
>>25 00 74 c8 eb d3 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07
>>f3 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c
>>RIP [812abd6d] __memcpy+0xd/0x110
>> RSP 8b9f7f1afc10
>>CR2:
>>---[ end trace fb8a4add69562a76 ]---
>>
>>The xfs_iflush_fork+0x181/0x240 (385) IP address is at:
>>
>(rearrange slightly to make more sense)
>
>>823 case XFS_DINODE_FMT_LOCAL:
>>824 if ((iip->ili_fields & dataflag[whichfork]) &&
>>0x23c0 <+336>: movslq %ecx,%rcx
>>0x23c3 <+339>: movswl 0x0(%rcx,%rcx,1),%eax
>>0x23cb <+347>: test %eax,0x90(%rdx)
>>0x23d1 <+353>: je 0x2350 <xfs_iflush_fork+224>
>>
>>825 (ifp->if_bytes > 0)) {
>>0x23d7 <+359>: mov (%r10),%edx
>>0x23da <+362>: test %edx,%edx
>>0x23dc <+364>: jle 0x2350 <xfs_iflush_fork+224>
>So the contents of rdx say that the inode fork size is 6 bytes in
>local format. The call location also indicates that it is the
>attribute fork that is being flushed. The minimum size of the attr
>fork is 3 bytes - an empty header. However, the next valid size has a
>second header that adds 4 bytes to the size, plus the bytes in the
>attr name and value. Hence a size of 6 bytes is invalid, and probably
Re: 4.0 kernel XFS filesystem crash when running AIM7's disk workload
On Fri, Apr 17, 2015 at 01:38:49PM -0400, Waiman Long wrote:
> Hi Dave,
>
> When I was running the AIM7's disk workload on an 8-socket
> Westmere-EX server with 4.0 kernel, the kernel crashed. A set of small
> ramdisks were created (ramdisk_size=271072). Those ramdisks were
> formatted with XFS filesystem before the test began. The kernel log
> was:
>
> XFS (ram12): Mounting V4 Filesystem
> XFS (ram12): Log size 1424 blocks too small, minimum size is 1596 blocks
> XFS (ram12): Log size out of supported range. Continuing onwards,
> but if log hangs are
> experienced then please report this message in the bug report.

First thing you need to do is upgrade xfsprogs so that this message
goes away, or use "mkfs.xfs -l size=10m" so that the log is larger
than the minimum.

> XFS (ram15): Ending clean mount
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [812abd6d] __memcpy+0xd/0x110
> PGD 29f7655f067 PUD 29f75a80067 PMD 0
> Oops: [#1] SMP
> Modules linked in: xfs exportfs libcrc32c ebtable_nat ebtables
> xt_CHECKSUM iptable_mangle bridge stp llc autofs4 ipt_REJECT
> nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter
> ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
> nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6
> vhost_net macvtap macvlan vhost tun kvm_intel kvm ipmi_si
> ipmi_msghandler tpm_infineon iTCO_wdt iTCO_vendor_support wmi
> acpi_cpufreq microcode pcspkr serio_raw qlcnic be2net vxlan
> udp_tunnel ip6_udp_tunnel ses enclosure igb dca ptp pps_core lpc_ich
> mfd_core hpilo hpwdt sg i7core_edac edac_core netxen_nic ext4(E)
> jbd2(E) mbcache(E) sr_mod(E) cdrom(E) sd_mod(E) lpfc(E) qla2xxx(E)
> scsi_transport_fc(E) pata_acpi(E) ata_generic(E) ata_piix(E) hpsa(E)
> radeon(E) ttm(E) drm_kms_helper(E) drm(E) i2c_algo_bit(E)
> i2c_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)

Why do you have a mix of signed and unsigned modules loaded?

> CPU: 69 PID: 116603 Comm: xfsaild/ram5 Tainted: GE 4.0.0 #2
> Hardware name: HP ProLiant DL980 G7, BIOS P66 07/30/2012
> task: 8b9f7eeb4f80 ti: 8b9f7f1ac000 task.ti: 8b9f7f1ac000
> RIP: 0010:[812abd6d] [812abd6d] __memcpy+0xd/0x110
> RSP: 0018:8b9f7f1afc10 EFLAGS: 00010206
> RAX: 88102476a3cc RBX: 889ff2ab5000 RCX: 0005
> RDX: 0006 RSI: RDI: 88102476a3cc

edx = 6 bytes.

> RBP: 8b9f7f1afc18 R08: 0001 R09: 88102476a3cc
> R10: 8a1f6c03ea80 R11: R12: 8b1ff1269400
> R13: 8b1f64837c98 R14: 881038701200 R15: 88102476a300
> FS: () GS:8b1fffa4() knlGS:
> CS: 0010 DS: ES: CR0: 8005003b
> CR2: CR3: 029f7655e000 CR4: 06e0
> Stack:
> a0ca8c41 8b9f7f1afc68 a0cc4803 8b9f7f1afc68
> a0cd2777 8b9f7f1afc68 8b1ff1269400 8a9f59022800
> 8b1f7c932718 0003 8a9f590228e4 8b9f7f1afce8
> Call Trace:
> [a0ca8c41] ? xfs_iflush_fork+0x181/0x240 [xfs]
> [a0cc4803] xfs_iflush_int+0x1f3/0x320 [xfs]
> [a0cd2777] ? kmem_alloc+0x87/0x100 [xfs]
> [a0cc60a5] xfs_iflush_cluster+0x295/0x380 [xfs]
> [a0cc8ff4] xfs_iflush+0xf4/0x1f0 [xfs]
> [a0cda22a] xfs_inode_item_push+0xea/0x130 [xfs]
> [a0ce140d] xfsaild_push+0x10d/0x500 [xfs]
> [810b7c20] ? lock_timer_base+0x70/0x70
> [a0ce1898] xfsaild+0x98/0x130 [xfs]
> [a0ce1800] ? xfsaild_push+0x500/0x500 [xfs]
> [a0ce1800] ? xfsaild_push+0x500/0x500 [xfs]
> [a0ce1800] ? xfsaild_push+0x500/0x500 [xfs]
> [81074b50] ? kthread_freezable_should_stop+0x70/0x70
> [815c5748] ret_from_fork+0x58/0x90
> [81074b50] ? kthread_freezable_should_stop+0x70/0x70
> Code: 0f b6 c0 5b c9 c3 0f 1f 84 00 00 00 00 00 e8 2b f9 ff ff 80 7b
> 25 00 74 c8 eb d3 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07
> f3 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c
> RIP [812abd6d] __memcpy+0xd/0x110
> RSP 8b9f7f1afc10
> CR2:
> ---[ end trace fb8a4add69562a76 ]---
>
> The xfs_iflush_fork+0x181/0x240 (385) IP address is at:
>

(rearrange slightly to make more sense)

> 823 case XFS_DINODE_FMT_LOCAL:
> 824 if ((iip->ili_fields & dataflag[whichfork]) &&
>    0x23c0 <+336>: movslq %ecx,%rcx
>    0x23c3 <+339>: movswl 0x0(%rcx,%rcx,1),%eax
>    0x23cb <+347>: test %eax,0x90(%rdx)
>    0x23d1 <+353>: je 0x2350 <xfs_iflush_fork+224>
>
> 825 (ifp->if_bytes > 0)) {
>    0x23d7 <+359>: mov (%r10),%edx
>    0x23da <+362>: test %edx,%edx
>    0x23dc <+364>: jle 0x2350 <xfs_iflush_fork+224>

So the contents of rdx say that the inode fork size is 6 bytes in
local format. The call location also indicates that it is the
attribute fork that is being flushed. The minimum size of the attr
fork is 3 bytes - an empty header. However, the next valid size has a
second header that adds 4 bytes to the size, plus the bytes in the
attr name and value. Hence a size of 6 bytes is invalid, and probably
indicates that there is some form of
4.0 kernel XFS filesystem crash when running AIM7's disk workload
Hi Dave,

When I was running AIM7's disk workload on an 8-socket Westmere-EX server with the 4.0 kernel, the kernel crashed. A set of small ramdisks was created (ramdisk_size=271072). Those ramdisks were formatted with the XFS filesystem before the test began. The kernel log was:

XFS (ram12): Mounting V4 Filesystem
XFS (ram12): Log size 1424 blocks too small, minimum size is 1596 blocks
XFS (ram12): Log size out of supported range. Continuing onwards, but if log hangs are experienced then please report this message in the bug report.
XFS (ram12): Ending clean mount
XFS (ram13): Mounting V4 Filesystem
XFS (ram13): Log size 1424 blocks too small, minimum size is 1596 blocks
XFS (ram13): Log size out of supported range. Continuing onwards, but if log hangs are experienced then please report this message in the bug report.
XFS (ram13): Ending clean mount
XFS (ram14): Mounting V4 Filesystem
XFS (ram14): Log size 1424 blocks too small, minimum size is 1596 blocks
XFS (ram14): Log size out of supported range. Continuing onwards, but if log hangs are experienced then please report this message in the bug report.
XFS (ram14): Ending clean mount
XFS (ram15): Mounting V4 Filesystem
XFS (ram15): Log size 1424 blocks too small, minimum size is 1596 blocks
XFS (ram15): Log size out of supported range. Continuing onwards, but if log hangs are experienced then please report this message in the bug report.
XFS (ram15): Ending clean mount
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [812abd6d] __memcpy+0xd/0x110
PGD 29f7655f067 PUD 29f75a80067 PMD 0
Oops: [#1] SMP
Modules linked in: xfs exportfs libcrc32c ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge stp llc autofs4 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan vhost tun kvm_intel kvm ipmi_si ipmi_msghandler tpm_infineon iTCO_wdt iTCO_vendor_support wmi acpi_cpufreq microcode pcspkr serio_raw qlcnic be2net vxlan udp_tunnel ip6_udp_tunnel ses enclosure igb dca ptp pps_core lpc_ich mfd_core hpilo hpwdt sg i7core_edac edac_core netxen_nic ext4(E) jbd2(E) mbcache(E) sr_mod(E) cdrom(E) sd_mod(E) lpfc(E) qla2xxx(E) scsi_transport_fc(E) pata_acpi(E) ata_generic(E) ata_piix(E) hpsa(E) radeon(E) ttm(E) drm_kms_helper(E) drm(E) i2c_algo_bit(E) i2c_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
CPU: 69 PID: 116603 Comm: xfsaild/ram5 Tainted: GE 4.0.0 #2
Hardware name: HP ProLiant DL980 G7, BIOS P66 07/30/2012
task: 8b9f7eeb4f80 ti: 8b9f7f1ac000 task.ti: 8b9f7f1ac000
RIP: 0010:[812abd6d] [812abd6d] __memcpy+0xd/0x110
RSP: 0018:8b9f7f1afc10 EFLAGS: 00010206
RAX: 88102476a3cc RBX: 889ff2ab5000 RCX: 0005
RDX: 0006 RSI: RDI: 88102476a3cc
RBP: 8b9f7f1afc18 R08: 0001 R09: 88102476a3cc
R10: 8a1f6c03ea80 R11: R12: 8b1ff1269400
R13: 8b1f64837c98 R14: 881038701200 R15: 88102476a300
FS: () GS:8b1fffa4() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: CR3: 029f7655e000 CR4: 06e0
Stack:
 a0ca8c41 8b9f7f1afc68 a0cc4803 8b9f7f1afc68
 a0cd2777 8b9f7f1afc68 8b1ff1269400 8a9f59022800
 8b1f7c932718 0003 8a9f590228e4 8b9f7f1afce8
Call Trace:
 [a0ca8c41] ? xfs_iflush_fork+0x181/0x240 [xfs]
 [a0cc4803] xfs_iflush_int+0x1f3/0x320 [xfs]
 [a0cd2777] ? kmem_alloc+0x87/0x100 [xfs]
 [a0cc60a5] xfs_iflush_cluster+0x295/0x380 [xfs]
 [a0cc8ff4] xfs_iflush+0xf4/0x1f0 [xfs]
 [a0cda22a] xfs_inode_item_push+0xea/0x130 [xfs]
 [a0ce140d] xfsaild_push+0x10d/0x500 [xfs]
 [810b7c20] ? lock_timer_base+0x70/0x70
 [a0ce1898] xfsaild+0x98/0x130 [xfs]
 [a0ce1800] ? xfsaild_push+0x500/0x500 [xfs]
 [a0ce1800] ? xfsaild_push+0x500/0x500 [xfs]
 [a0ce1800] ? xfsaild_push+0x500/0x500 [xfs]
 [81074b50] ? kthread_freezable_should_stop+0x70/0x70
 [815c5748] ret_from_fork+0x58/0x90
 [81074b50] ? kthread_freezable_should_stop+0x70/0x70
Code: 0f b6 c0 5b c9 c3 0f 1f 84 00 00 00 00 00 e8 2b f9 ff ff 80 7b 25 00 74 c8 eb d3 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c
RIP [812abd6d] __memcpy+0xd/0x110
 RSP 8b9f7f1afc10
CR2:
---[ end trace fb8a4add69562a76 ]---

The xfs_iflush_fork+0x181/0x240 (385) IP address is at:

823 case XFS_DINODE_FMT_LOCAL:
824 if ((iip->ili_fields & dataflag[whichfork]) &&
0x23c0 <+336>: movslq %ecx,%rcx
0x23c3 <+339>: movswl 0x0(%rcx,%rcx,1),%eax
0x23cb <+347>: test %eax,0x90(%rdx)
0x23d1 <+353>: je 0x2350 <xfs_iflush_fork+224>
0x23da <+362>: test %edx,%edx
0x23dc <+364>: jle 0x2350 <xfs_iflush_fork+224>

825 (ifp->if_bytes > 0)) {
0x23d7 <+359>: mov (%r10),%edx

826 ASSERT(ifp->if_u1.if_data !=
Re: 4.0 kernel XFS filesystem crash when running AIM7's disk workload
On Fri, Apr 17, 2015 at 01:38:49PM -0400, Waiman Long wrote:
> Hi Dave,
>
> When I was running AIM7's disk workload on an 8-socket Westmere-EX server with the 4.0 kernel, the kernel crashed. A set of small ramdisks was created (ramdisk_size=271072). Those ramdisks were formatted with the XFS filesystem before the test began. The kernel log was:
>
> XFS (ram12): Mounting V4 Filesystem
> XFS (ram12): Log size 1424 blocks too small, minimum size is 1596 blocks
> XFS (ram12): Log size out of supported range. Continuing onwards, but if log hangs are experienced then please report this message in the bug report.

First thing you need to do is upgrade xfsprogs so that this message goes away, or use "mkfs.xfs -l size=10m" so that the log is larger than the minimum.

> XFS (ram15): Ending clean mount
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [812abd6d] __memcpy+0xd/0x110
> PGD 29f7655f067 PUD 29f75a80067 PMD 0
> Oops: [#1] SMP
> Modules linked in: xfs exportfs libcrc32c ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge stp llc autofs4 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan vhost tun kvm_intel kvm ipmi_si ipmi_msghandler tpm_infineon iTCO_wdt iTCO_vendor_support wmi acpi_cpufreq microcode pcspkr serio_raw qlcnic be2net vxlan udp_tunnel ip6_udp_tunnel ses enclosure igb dca ptp pps_core lpc_ich mfd_core hpilo hpwdt sg i7core_edac edac_core netxen_nic ext4(E) jbd2(E) mbcache(E) sr_mod(E) cdrom(E) sd_mod(E) lpfc(E) qla2xxx(E) scsi_transport_fc(E) pata_acpi(E) ata_generic(E) ata_piix(E) hpsa(E) radeon(E) ttm(E) drm_kms_helper(E) drm(E) i2c_algo_bit(E) i2c_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)

Why do you have a mix of signed and unsigned modules loaded?

> CPU: 69 PID: 116603 Comm: xfsaild/ram5 Tainted: GE 4.0.0 #2
> Hardware name: HP ProLiant DL980 G7, BIOS P66 07/30/2012
> task: 8b9f7eeb4f80 ti: 8b9f7f1ac000 task.ti: 8b9f7f1ac000
> RIP: 0010:[812abd6d] [812abd6d] __memcpy+0xd/0x110
> RSP: 0018:8b9f7f1afc10 EFLAGS: 00010206
> RAX: 88102476a3cc RBX: 889ff2ab5000 RCX: 0005
> RDX: 0006 RSI: RDI: 88102476a3cc

edx = 6 bytes.

> RBP: 8b9f7f1afc18 R08: 0001 R09: 88102476a3cc
> R10: 8a1f6c03ea80 R11: R12: 8b1ff1269400
> R13: 8b1f64837c98 R14: 881038701200 R15: 88102476a300
> FS: () GS:8b1fffa4() knlGS:
> CS: 0010 DS: ES: CR0: 8005003b
> CR2: CR3: 029f7655e000 CR4: 06e0
> Stack:
>  a0ca8c41 8b9f7f1afc68 a0cc4803 8b9f7f1afc68
>  a0cd2777 8b9f7f1afc68 8b1ff1269400 8a9f59022800
>  8b1f7c932718 0003 8a9f590228e4 8b9f7f1afce8
> Call Trace:
>  [a0ca8c41] ? xfs_iflush_fork+0x181/0x240 [xfs]
>  [a0cc4803] xfs_iflush_int+0x1f3/0x320 [xfs]
>  [a0cd2777] ? kmem_alloc+0x87/0x100 [xfs]
>  [a0cc60a5] xfs_iflush_cluster+0x295/0x380 [xfs]
>  [a0cc8ff4] xfs_iflush+0xf4/0x1f0 [xfs]
>  [a0cda22a] xfs_inode_item_push+0xea/0x130 [xfs]
>  [a0ce140d] xfsaild_push+0x10d/0x500 [xfs]
>  [810b7c20] ? lock_timer_base+0x70/0x70
>  [a0ce1898] xfsaild+0x98/0x130 [xfs]
>  [a0ce1800] ? xfsaild_push+0x500/0x500 [xfs]
>  [a0ce1800] ? xfsaild_push+0x500/0x500 [xfs]
>  [a0ce1800] ? xfsaild_push+0x500/0x500 [xfs]
>  [81074b50] ? kthread_freezable_should_stop+0x70/0x70
>  [815c5748] ret_from_fork+0x58/0x90
>  [81074b50] ? kthread_freezable_should_stop+0x70/0x70
> Code: 0f b6 c0 5b c9 c3 0f 1f 84 00 00 00 00 00 e8 2b f9 ff ff 80 7b 25 00 74 c8 eb d3 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c
> RIP [812abd6d] __memcpy+0xd/0x110
>  RSP 8b9f7f1afc10
> CR2:
> ---[ end trace fb8a4add69562a76 ]---
>
> The xfs_iflush_fork+0x181/0x240 (385) IP address is at:
>

(rearrange slightly to make more sense)

> 823 case XFS_DINODE_FMT_LOCAL:
> 824 if ((iip->ili_fields & dataflag[whichfork]) &&
> 0x23c0 <+336>: movslq %ecx,%rcx
> 0x23c3 <+339>: movswl 0x0(%rcx,%rcx,1),%eax
> 0x23cb <+347>: test %eax,0x90(%rdx)
> 0x23d1 <+353>: je 0x2350 <xfs_iflush_fork+224>
>
> 825 (ifp->if_bytes > 0)) {
> 0x23d7 <+359>: mov (%r10),%edx
> 0x23da <+362>: test %edx,%edx
> 0x23dc <+364>: jle 0x2350 <xfs_iflush_fork+224>

So the contents of rdx say that the inode fork size is 6 bytes in local format. The call location also indicates that it is the attribute fork that is being flushed. The minimum size of