Re: skd: disable discard support
On Wed, Feb 12 2014 at  5:18pm -0500,
Mike Snitzer wrote:

> The skd driver has never handled discards reliably.
>
> The kernel will BUG as a result of issuing discards to the skd device.
> Disable the skd driver's discard support until it is proven reliable.
>
> The device-mapper-test-suite test that exposed this bug just issues a
> discard that covers a portion of the skd device that was previously
> written through a dm-thin device.  The discard spans the entire 1GB thin
> device (logical sector 0 through 2097152).
>
> dmtest run --profile stec --suite thin-provisioning -n /discard_fully_provisioned_device/

I retested after applying these linux-block.git commits on top of 3.14-rc1:

5cb8850c9c4a block: Explicitly handle discard/write same segments
8423ae3d7a3c block: Fix cloning of discard/write same bios

And got this:

request botched: dev skd0: type=1, flags=12248081
  sector 8390784, nr/cnr 0/128
  bio 88033169cba0, biotail 88032e42bb60, buffer (null), len 0
------------[ cut here ]------------
kernel BUG at block/blk-core.c:2693!
invalid opcode: [#1] SMP
Modules linked in: dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge autofs4 target_core_iblock target_core_file target_core_pscsi target_core_mod configfs bnx2fc fcoe libfcoe 8021q libfc garp stp llc scsi_transport_fc scsi_tgt sunrpc cpufreq_ondemand ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm_intel kvm iTCO_wdt iTCO_vendor_support microcode i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core i7core_edac edac_core ixgbe dca ptp pps_core mdio ses enclosure sg acpi_cpufreq dm_mod ext4 jbd2 mbcache sr_mod cdrom pata_acpi ata_generic ata_piix skd sd_mod crc_t10dif crct10dif_common megaraid_sas
CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W    3.14.0-rc1.snitm+ #5
Hardware name: FUJITSU PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.10.2619.N1 05/24/2011
task: 88033299e150 ti: 8803329a4000 task.ti: 8803329a4000
RIP: 0010:[81252f1a]  [81252f1a] __blk_end_request_all+0x2a/0x40
RSP: 0018:88033fc43cf8  EFLAGS: 00010002
RAX: 0001 RBX: 88032e636ac8 RCX: 0006
RDX: 0001 RSI: 88033169cba0 RDI: 88032ec755c0
RBP: 88033fc43cf8 R08: 0002 R09:
R10: 06f3 R11: 0001 R12:
R13: 88033195faa8 R14: 8800ba396000 R15: 0001
FS:  () GS:88033fc4() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 003bfea13000 CR3: 00032fbdc000 CR4: 07e0
Stack:
 88033fc43d58
Re: skd: disable discard support
On Wed, Feb 12 2014 at  5:19pm -0500,
Mike Snitzer wrote:

> On Wed, Feb 12 2014 at  5:18pm -0500,
> Mike Snitzer wrote:
>
> > The skd driver has never handled discards reliably.
> >
> > The kernel will BUG as a result of issuing discards to the skd device.
> > Disable the skd driver's discard support until it is proven reliable.
>
> Here is the first BUG I recently saw:

And a 2nd:

Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 10
CPU: 10 PID: 0 Comm: swapper/10 Tainted: G        W  O 3.14.0-rc1.snitm+ #4
Hardware name: FUJITSU PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.10.2619.N1 05/24/2011
 88033fd47bb8 8153f180 fffa 817d8778 88033fd47c38 8153ef0d 0010 88033fd47c48 88033fd47be8
Call Trace:
 <NMI>  [8153f180] dump_stack+0x49/0x61
 [8153ef0d] panic+0xbb/0x1d5
 [810e8761] watchdog_overflow_callback+0xb1/0xc0
 [8111e9b8] __perf_event_overflow+0x98/0x220
 [8111f2a4] perf_event_overflow+0x14/0x20
 [8102012e] intel_pmu_handle_irq+0x1de/0x3c0
 [8115f931] ? unmap_kernel_range_noflush+0x11/0x20
 [8131a5c5] ? ghes_copy_tofrom_phys+0xe5/0x200
 [81544e84] perf_event_nmi_handler+0x34/0x60
 [8154464a] nmi_handle+0x8a/0x170
 [81544848] default_do_nmi+0x68/0x210
 [81544a80] do_nmi+0x90/0xe0
 [81543ca7] end_repeat_nmi+0x1e/0x2e
 [a06ef7a0] ? skd_timer_tick_not_online+0x330/0x330 [skd]
 [815432a1] ? _raw_spin_lock_irqsave+0x21/0x30
 [815432a1] ? _raw_spin_lock_irqsave+0x21/0x30
 [815432a1] ? _raw_spin_lock_irqsave+0x21/0x30
 <<EOE>>  <IRQ>  [a06ef7d9] skd_timer_tick+0x39/0x1e0 [skd]
 [81069480] ? __queue_work+0x360/0x360
 [a06ef7a0] ? skd_timer_tick_not_online+0x330/0x330 [skd]
 [8105a318] call_timer_fn+0x48/0x120
 [8105aef5] run_timer_softirq+0x225/0x290
 [a06ef7a0] ? skd_timer_tick_not_online+0x330/0x330 [skd]
 [8105365c] __do_softirq+0xfc/0x2b0
 [810bc09f] ? tick_do_update_jiffies64+0x9f/0xd0
 [8105390d] irq_exit+0xbd/0xd0
 [8154dbea] smp_apic_timer_interrupt+0x4a/0x5a
 [8154c8ca] apic_timer_interrupt+0x6a/0x70
 <EOI>  [8144d710] ? cpuidle_enter_state+0xa0/0xd0
 [8144d6cb] ? cpuidle_enter_state+0x5b/0xd0
 [8144d887] cpuidle_idle_call+0xc7/0x160
 [8100cf5e] arch_cpu_idle+0xe/0x30
 [810a696a] cpu_idle_loop+0x9a/0x240
 [810b9e64] ? clockevents_register_device+0xc4/0x130
 [810a6b33] cpu_startup_entry+0x23/0x30
 [81032d5a] start_secondary+0x7a/0x80
Shutting down cpus with NMI
Kernel Offset: 0x0 from 0x8100 (relocation range: 0x8000-0x9fff)
------------[ cut here ]------------
WARNING: CPU: 10 PID: 0 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5f/0x70()
Modules linked in: skd(O) dm_thin_pool(O) dm_bio_prison(O) dm_persistent_data(O) dm_bufio(O) libcrc32c ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge autofs4 target_core_iblock target_core_file target_core_pscsi target_core_mod configfs bnx2fc fcoe libfcoe 8021q libfc garp stp scsi_transport_fc llc scsi_tgt sunrpc cpufreq_ondemand ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vhost_net macvtap macvlan vhost tun kvm_intel kvm iTCO_wdt iTCO_vendor_support microcode i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core i7core_edac edac_core ixgbe dca ptp pps_core mdio ses enclosure sg acpi_cpufreq ext4 jbd2 mbcache sr_mod cdrom pata_acpi ata_generic ata_piix sd_mod crc_t10dif crct10dif_common dm_mirror dm_region_hash dm_log dm_mod megaraid_sas [last unloaded: skd]
CPU: 10 PID: 0 Comm: swapper/10 Tainted: G        W  O 3.14.0-rc1.snitm+ #4
Hardware name: FUJITSU PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.10.2619.N1 05/24/2011
 007c 88033fd478c0 8153f180 007c 88033fd47900 8104e9bc 88033fd52c40 88033fc52c40 0002 88033fd52c40 8803329be250
Call Trace:
 <NMI>  [8153f180] dump_stack+0x49/0x61
 [8104e9bc] warn_slowpath_common+0x8c/0xc0
 [8104ea0a] warn_slowpath_null+0x1a/0x20
 [8103141f] native_smp_send_reschedule+0x5f/0x70
 [81087e3e] trigger_load_balance+0x15e/0x200
 [8107ccf7] scheduler_tick+0xa7/0xe0
 [8105a031] update_process_times+0x61/0x80
 [8131863c] ? apei_exec_write_register_value+0x1c/0x20
 [810bbfb9] tick_sched_handle+0x39/0x80
 [810bc1e4] tick_sched_timer+0x54/0x90
 [810743be] __run_hrtimer+0x7e/0x1c0
 [] ? tick_nohz_handler+0xc0/0xc0
 [] hrtimer_interrupt+0x10e/0x260
 [] local_apic_timer_interrupt+0x3b/0x60
 [] smp_apic_timer_interrupt+0x45/0x5a
 [] apic_timer_interrupt+0x6a/0x70
 [] ? panic+0x192/0x1d5
 [] ? panic+0xf0/0x1d5
 [] watchdog_overflow_callback+0xb1/0xc0
 [] __perf_event_overflow+0x98/0x220
 [] perf_event_overflow+0x14/0x20
 [] intel_pmu_handle_irq+0x1de/0x3c0
 [] ? unmap_kernel_range_noflush+0x11/0x20
 [] ? ghes_copy_tofrom_phys+0xe5/0x200
 [] perf_event_nmi_handler+0x34/0x60
 [] nmi_handle+0x8a/0x170
 [] default_do_nmi+0x68/0x210
 [] do_nmi+0x90/0xe0
 [] end_repeat_nmi+0x1e/0x2e
 [] ? skd_timer_tick_not_online+0x330/0x330 [skd]
 [] ? _raw_spin_lock_irqsave+0x21/0x30
 [] ? _raw_spin_lock_irqsave+0x21/0x30
 [] ?
Re: skd: disable discard support
On Wed, Feb 12 2014 at  5:18pm -0500,
Mike Snitzer wrote:

> The skd driver has never handled discards reliably.
>
> The kernel will BUG as a result of issuing discards to the skd device.
> Disable the skd driver's discard support until it is proven reliable.

Here is the first BUG I recently saw:

------------[ cut here ]------------
Uhhuh. NMI received for unknown reason 21 on CPU 0.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
kernel BUG at include/linux/scatterlist.h:65!
invalid opcode: [#1] SMP
Modules linked in: dm_thin_pool(O) dm_bio_prison(O) dm_persistent_data(O) dm_bufio(O) dm_mod(O) libcrc32c ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge autofs4 target_core_iblock target_core_file target_core_pscsi target_core_mod configfs bnx2fc fcoe libfcoe libfc 8021q scsi_transport_fc garp scsi_tgt stp llc sunrpc cpufreq_ondemand ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vhost_net macvtap macvlan vhost tun kvm_intel kvm iTCO_wdt iTCO_vendor_support microcode i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core i7core_edac edac_core ixgbe dca ptp pps_core mdio ses enclosure sg acpi_cpufreq ext4 jbd2 mbcache sr_mod cdrom pata_acpi ata_generic ata_piix skd sd_mod crc_t10dif crct10dif_common megaraid_sas [last unloaded: dm_mod]
CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W  O 3.14.0-rc1.snitm+ #4
Hardware name: FUJITSU PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.10.2619.N1 05/24/2011
task: 8803329aef20 ti: 8803329b task.ti: 8803329b
RIP: 0010:[8125a481]  [8125a481] blk_rq_map_sg+0x241/0x3f0
RSP: 0018:88033fca3c38  EFLAGS: 00010002
RAX: ea000b2984f0 RBX: 0001 RCX: 8803286f6020
RDX: ea000b2984f0 RSI:  RDI: 8803286f6000
RBP: 88033fca3cc8 R08: 8803290233c0 R09: 53538ec752528dc6
R10: 88032826f8e0 R11: 90c9 R12:
R13:  R14: 0001 R15:
FS:  () GS:88033fca() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 003bfd2f5170 CR3: 01a0b000 CR4: 07e0
Stack:
 0002 88033fca3cf8 8112f67e 88033ffd7d80 88032826f8e0 880290c9 53538ec752528dc6 8802b66bbb28 8803286f6000 8803290233c0 000101fd8b08
Call Trace:
 <IRQ>  [8112f67e] ? __alloc_pages_nodemask+0x12e/0x250
 [a0071146] skd_preop_sg_list+0x46/0x270 [skd]
 [811703f2] ? alloc_pages_current+0xb2/0x170
 [a0072997] skd_request_fn+0x287/0x900 [skd]
 [a007551e] ? skd_isr_completion_posted+0x1ee/0x5d0 [skd]
 [a0076233] skd_isr+0x1a3/0x280 [skd]
 [810a73ed] handle_irq_event_percpu+0x6d/0x200
 [810a75c2] handle_irq_event+0x42/0x70
 [810aad19] handle_edge_irq+0x69/0x120
 [81005aec] handle_irq+0x5c/0x150
 [815471f2] ? __atomic_notifier_call_chain+0x12/0x20
 [81547216] ? atomic_notifier_call_chain+0x16/0x20
 [8154da1e] do_IRQ+0x5e/0x110
 [8154376a] common_interrupt+0x6a/0x6a
 <EOI>  [8144d6c3] ? cpuidle_enter_state+0x53/0xd0
 [8144d6bf] ? cpuidle_enter_state+0x4f/0xd0
 [8144d887] cpuidle_idle_call+0xc7/0x160
 [8100cf5e] arch_cpu_idle+0xe/0x30
 [810a696a] cpu_idle_loop+0x9a/0x240
 [810b9e64] ? clockevents_register_device+0xc4/0x130
 [810a6b33] cpu_startup_entry+0x23/0x30
 [81032d5a] start_secondary+0x7a/0x80
Code: 41 5f c9 c3 66 0f 1f 44 00 00 44 29 f3 44 89 f2 44 89 de 4c 89 c8 eb 93 66 90 48 8b 4d b8 41 f6 c1 03 48 8b 01 0f 84 03 ff ff ff <0f> 0b eb fe 0f 1f 00 48 8b 45 c0 4c 8b 85 78 ff ff ff 48 8b b0
RIP  [8125a481] blk_rq_map_sg+0x241/0x3f0
 RSP 88033fca3c38
---[ end trace 61da6cb864bf7eb8 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Shutting down cpus with NMI
Kernel Offset: 0x0 from 0x8100 (relocation range: 0x8000-0x9fff)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
[PATCH] skd: disable discard support
The skd driver has never handled discards reliably.

The kernel will BUG as a result of issuing discards to the skd device.
Disable the skd driver's discard support until it is proven reliable.

The device-mapper-test-suite test that exposed this bug just issues a
discard that covers a portion of the skd device that was previously
written through a dm-thin device.  The discard spans the entire 1GB thin
device (logical sector 0 through 2097152).

dmtest run --profile stec --suite thin-provisioning -n /discard_fully_provisioned_device/

associated device-mapper-test-suite ruby test code follows:

  def test_discard_fully_provisioned_device
    with_standard_pool(@size) do |pool|
      with_new_thins(pool, @volume_size, 0, 1) do |thin, thin2|
        wipe_device(thin)
        wipe_device(thin2)
        assert_used_blocks(pool, 2 * @blocks_per_dev)
        thin.discard(0, @volume_size)
        assert_used_blocks(pool, @blocks_per_dev)
      end
    end
  ...

Signed-off-by: Mike Snitzer <snit...@redhat.com>
---
 drivers/block/skd_main.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/block/skd_main.c b/drivers/block/skd_main.c
index eb6e1e0..5dadecc 100644
--- a/drivers/block/skd_main.c
+++ b/drivers/block/skd_main.c
@@ -4441,12 +4441,15 @@ static int skd_cons_disk(struct skd_device *skdev)
 	/* set sysfs ptimal_io_size to 8K */
 	blk_queue_io_opt(q, 8192);
 
+#if 0
+	/* FIXME: Disable discard support until it no longer BUGs */
 	/* DISCARD Flag initialization. */
 	q->limits.discard_granularity = 8192;
 	q->limits.discard_alignment = 0;
 	q->limits.max_discard_sectors = UINT_MAX >> 9;
 	q->limits.discard_zeroes_data = 1;
 	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
+#endif
 	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
 
 	spin_lock_irqsave(&skdev->lock, flags);
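For anyone checking the numbers quoted in the patch, here is a small userspace sketch of the sector arithmetic. It is not driver code. It assumes 512-byte logical sectors (the shift of 9 used in the hunk), a 32-bit unsigned int, and that "1GB" means 2^30 bytes; the helper names are made up for illustration and do not exist in skd_main.c.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumption: 512-byte logical sectors, matching the ">> 9" in the patch. */
#define SKETCH_SECTOR_SHIFT 9

/* Convert a byte count into a count of 512-byte sectors. */
uint64_t bytes_to_sectors(uint64_t bytes)
{
	return bytes >> SKETCH_SECTOR_SHIFT;
}

/* Sector limit implied by "UINT_MAX >> 9" (hypothetical helper name). */
unsigned int sketch_max_discard_sectors(void)
{
	return UINT_MAX >> SKETCH_SECTOR_SHIFT;
}

/* Number of granularity-sized units in a byte range (hypothetical helper). */
uint64_t granules_in(uint64_t bytes, uint32_t granularity)
{
	assert(granularity != 0);
	return bytes / granularity;
}
```

With these assumptions, bytes_to_sectors(1ULL << 30) gives the 2097152 sectors mentioned in the commit message, which is well below the UINT_MAX >> 9 limit, so the test's discard would not be split on account of max_discard_sectors alone.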