Re: skd: disable discard support

2014-02-12 Thread Mike Snitzer
On Wed, Feb 12 2014 at  5:18pm -0500,
Mike Snitzer  wrote:

> The skd driver has never handled discards reliably.
> 
> The kernel will BUG as a result of issuing discards to the skd device.
> Disable the skd driver's discard support until it is proven reliable.
> 
> The device-mapper-test-suite test that exposed this bug just issues a
> discard that covers a portion of the skd device that was previously
> written through a dm-thin device.  The discard spans the entire 1GB thin
> device (logical sector 0 through 2097152).
> 
> dmtest run --profile stec --suite thin-provisioning -n /discard_fully_provisioned_device/

I retested after applying these linux-block.git commits on top of
3.14-rc1:

5cb8850c9c4a block: Explicitly handle discard/write same segments
8423ae3d7a3c block: Fix cloning of discard/write same bios

And got this:

request botched: dev skd0: type=1, flags=12248081
  sector 8390784, nr/cnr 0/128
  bio 88033169cba0, biotail 88032e42bb60, buffer   (null), len 0
------------[ cut here ]------------
kernel BUG at block/blk-core.c:2693!
invalid opcode:  [#1] SMP
Modules linked in: dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio
libcrc32c ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge autofs4
target_core_iblock target_core_file target_core_pscsi target_core_mod configfs
bnx2fc fcoe libfcoe 8021q libfc garp stp llc scsi_transport_fc scsi_tgt sunrpc
cpufreq_ondemand ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter
ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack
ip6table_filter ip6_tables bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi
cxgb3 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror
dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm_intel kvm
iTCO_wdt iTCO_vendor_support microcode i2c_i801 lpc_ich mfd_core igb
i2c_algo_bit i2c_core i7core_edac edac_core ixgbe dca ptp pps_core mdio ses
enclosure sg acpi_cpufreq dm_mod ext4 jbd2 mbcache sr_mod cdrom pata_acpi
ata_generic ata_piix skd sd_mod crc_t10dif crct10dif_common megaraid_sas
CPU: 2 PID: 0 Comm: swapper/2 Tainted: GW 3.14.0-rc1.snitm+ #5
Hardware name: FUJITSU  PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.10.2619.N1   05/24/2011
task: 88033299e150 ti: 8803329a4000 task.ti: 8803329a4000
RIP: 0010:[]  [] __blk_end_request_all+0x2a/0x40
RSP: 0018:88033fc43cf8  EFLAGS: 00010002
RAX: 0001 RBX: 88032e636ac8 RCX: 0006
RDX: 0001 RSI: 88033169cba0 RDI: 88032ec755c0
RBP: 88033fc43cf8 R08: 0002 R09:
R10: 06f3 R11: 0001 R12:
R13: 88033195faa8 R14: 8800ba396000 R15: 0001
FS:  () GS:88033fc4() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 003bfea13000 CR3: 00032fbdc000 CR4: 07e0
Stack:
 88033fc43d58

Re: skd: disable discard support

2014-02-12 Thread Mike Snitzer
On Wed, Feb 12 2014 at  5:19pm -0500,
Mike Snitzer  wrote:

> On Wed, Feb 12 2014 at  5:18pm -0500,
> Mike Snitzer  wrote:
> 
> > The skd driver has never handled discards reliably.
> > 
> > The kernel will BUG as a result of issuing discards to the skd device.
> > Disable the skd driver's discard support until it is proven reliable.
> 
> Here is the first BUG I recently saw:

And a 2nd:

Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 10
CPU: 10 PID: 0 Comm: swapper/10 Tainted: GW  O 3.14.0-rc1.snitm+ #4
Hardware name: FUJITSU  PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.10.2619.N1   05/24/2011
  88033fd47bb8 8153f180 fffa
 817d8778 88033fd47c38 8153ef0d 0010
 88033fd47c48 88033fd47be8  
Call Trace:
 <NMI>  [] dump_stack+0x49/0x61
 [] panic+0xbb/0x1d5
 [] watchdog_overflow_callback+0xb1/0xc0
 [] __perf_event_overflow+0x98/0x220
 [] perf_event_overflow+0x14/0x20
 [] intel_pmu_handle_irq+0x1de/0x3c0
 [] ? unmap_kernel_range_noflush+0x11/0x20
 [] ? ghes_copy_tofrom_phys+0xe5/0x200
 [] perf_event_nmi_handler+0x34/0x60
 [] nmi_handle+0x8a/0x170
 [] default_do_nmi+0x68/0x210
 [] do_nmi+0x90/0xe0
 [] end_repeat_nmi+0x1e/0x2e
 [] ? skd_timer_tick_not_online+0x330/0x330 [skd]
 [] ? _raw_spin_lock_irqsave+0x21/0x30
 [] ? _raw_spin_lock_irqsave+0x21/0x30
 [] ? _raw_spin_lock_irqsave+0x21/0x30
 <<EOE>>  <IRQ>  [] skd_timer_tick+0x39/0x1e0 [skd]
 [] ? __queue_work+0x360/0x360
 [] ? skd_timer_tick_not_online+0x330/0x330 [skd]
 [] call_timer_fn+0x48/0x120
 [] run_timer_softirq+0x225/0x290
 [] ? skd_timer_tick_not_online+0x330/0x330 [skd]
 [] __do_softirq+0xfc/0x2b0
 [] ? tick_do_update_jiffies64+0x9f/0xd0
 [] irq_exit+0xbd/0xd0
 [] smp_apic_timer_interrupt+0x4a/0x5a
 [] apic_timer_interrupt+0x6a/0x70
 <EOI>  [] ? cpuidle_enter_state+0xa0/0xd0
 [] ? cpuidle_enter_state+0x5b/0xd0
 [] cpuidle_idle_call+0xc7/0x160
 [] arch_cpu_idle+0xe/0x30
 [] cpu_idle_loop+0x9a/0x240
 [] ? clockevents_register_device+0xc4/0x130
 [] cpu_startup_entry+0x23/0x30
 [] start_secondary+0x7a/0x80
Shutting down cpus with NMI
Kernel Offset: 0x0 from 0x8100 (relocation range: 0x8000-0x9fff)
------------[ cut here ]------------
WARNING: CPU: 10 PID: 0 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5f/0x70()
Modules linked in: skd(O) dm_thin_pool(O) dm_bio_prison(O) 
dm_persistent_data(O) dm_bufio(O) libcrc32c ebtable_nat ebtables xt_CHECKSUM 
iptable_mangle bridge autofs4 target_core_iblock target_core_file 
target_core_pscsi target_core_mod configfs bnx2fc fcoe libfcoe 8021q libfc garp 
stp scsi_transport_fc llc scsi_tgt sunrpc cpufreq_ondemand ipt_REJECT 
nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT 
nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter 
ip6_tables bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 iscsi_tcp 
libiscsi_tcp libiscsi scsi_transport_iscsi vhost_net macvtap macvlan vhost tun 
kvm_intel kvm iTCO_wdt iTCO_vendor_support microcode i2c_i801 lpc_ich mfd_core 
igb i2c_algo_bit i2c_core i7core_edac edac_core ixgbe dca ptp pps_core mdio ses 
enclosure sg acpi_cpufreq ext4 jbd2 mbcache sr_mod cdrom pata_acpi ata_generic 
ata_piix sd_mod crc_t10dif crct10dif_common dm_mirror dm_region_hash dm_log 
dm_mod megaraid_sas [last unloaded: skd]
CPU: 10 PID: 0 Comm: swapper/10 Tainted: GW  O 3.14.0-rc1.snitm+ #4
Hardware name: FUJITSU  PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.10.2619.N1   05/24/2011
 007c 88033fd478c0 8153f180 007c
  88033fd47900 8104e9bc 88033fd52c40
 88033fc52c40 0002 88033fd52c40 8803329be250
Call Trace:
 <NMI>  [] dump_stack+0x49/0x61
 [] warn_slowpath_common+0x8c/0xc0
 [] warn_slowpath_null+0x1a/0x20
 [] native_smp_send_reschedule+0x5f/0x70
 [] trigger_load_balance+0x15e/0x200
 [] scheduler_tick+0xa7/0xe0
 [] update_process_times+0x61/0x80
 [] ? apei_exec_write_register_value+0x1c/0x20
 [] tick_sched_handle+0x39/0x80
 [] tick_sched_timer+0x54/0x90
 [] __run_hrtimer+0x7e/0x1c0
 [] ? tick_nohz_handler+0xc0/0xc0
 [] hrtimer_interrupt+0x10e/0x260
 [] local_apic_timer_interrupt+0x3b/0x60
 [] smp_apic_timer_interrupt+0x45/0x5a
 [] apic_timer_interrupt+0x6a/0x70
 [] ? panic+0x192/0x1d5
 [] ? panic+0xf0/0x1d5
 [] watchdog_overflow_callback+0xb1/0xc0
 [] __perf_event_overflow+0x98/0x220
 [] perf_event_overflow+0x14/0x20
 [] intel_pmu_handle_irq+0x1de/0x3c0
 [] ? unmap_kernel_range_noflush+0x11/0x20
 [] ? ghes_copy_tofrom_phys+0xe5/0x200
 [] perf_event_nmi_handler+0x34/0x60
 [] nmi_handle+0x8a/0x170
 [] default_do_nmi+0x68/0x210
 [] do_nmi+0x90/0xe0
 [] end_repeat_nmi+0x1e/0x2e
 [] ? skd_timer_tick_not_online+0x330/0x330 [skd]
 [] ? _raw_spin_lock_irqsave+0x21/0x30
 [] ? _raw_spin_lock_irqsave+0x21/0x30
 [] ? 

Re: skd: disable discard support

2014-02-12 Thread Mike Snitzer
On Wed, Feb 12 2014 at  5:18pm -0500,
Mike Snitzer  wrote:

> The skd driver has never handled discards reliably.
> 
> The kernel will BUG as a result of issuing discards to the skd device.
> Disable the skd driver's discard support until it is proven reliable.

Here is the first BUG I recently saw:

------------[ cut here ]------------
Uhhuh. NMI received for unknown reason 21 on CPU 0.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
kernel BUG at include/linux/scatterlist.h:65!
invalid opcode:  [#1] SMP
Modules linked in: dm_thin_pool(O) dm_bio_prison(O) dm_persistent_data(O) 
dm_bufio(O) dm_mod(O) libcrc32c ebtable_nat ebtables xt_CHECKSUM iptable_mangle 
bridge autofs4 target_core_iblock target_core_file target_core_pscsi 
target_core_mod configfs bnx2fc fcoe libfcoe libfc 8021q scsi_transport_fc garp 
scsi_tgt stp llc sunrpc cpufreq_ondemand ipt_REJECT nf_conntrack_ipv4 
nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 
nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables bnx2i cnic uio 
ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi vhost_net macvtap macvlan vhost tun kvm_intel kvm iTCO_wdt 
iTCO_vendor_support microcode i2c_i801 lpc_ich mfd_core igb i2c_algo_bit 
i2c_core i7core_edac edac_core ixgbe dca ptp pps_core mdio ses enclosure sg 
acpi_cpufreq ext4 jbd2 mbcache sr_mod cdrom pata_acpi ata_generic ata_piix skd 
sd_mod crc_t10dif crct10dif_common megaraid_sas [last unloaded: dm_mod]
CPU: 5 PID: 0 Comm: swapper/5 Tainted: GW  O 3.14.0-rc1.snitm+ #4
Hardware name: FUJITSU  PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.10.2619.N1   05/24/2011
task: 8803329aef20 ti: 8803329b task.ti: 8803329b
RIP: 0010:[]  [] blk_rq_map_sg+0x241/0x3f0
RSP: 0018:88033fca3c38  EFLAGS: 00010002
RAX: ea000b2984f0 RBX: 0001 RCX: 8803286f6020
RDX: ea000b2984f0 RSI:  RDI: 8803286f6000
RBP: 88033fca3cc8 R08: 8803290233c0 R09: 53538ec752528dc6
R10: 88032826f8e0 R11: 90c9 R12: 
R13:  R14: 0001 R15: 
FS:  () GS:88033fca() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 003bfd2f5170 CR3: 01a0b000 CR4: 07e0
Stack:
 0002  88033fca3cf8 8112f67e
 88033ffd7d80 88032826f8e0 880290c9 53538ec752528dc6
 8802b66bbb28 8803286f6000 8803290233c0 000101fd8b08
Call Trace:
 <IRQ>
 [] ? __alloc_pages_nodemask+0x12e/0x250
 [] skd_preop_sg_list+0x46/0x270 [skd]
 [] ? alloc_pages_current+0xb2/0x170
 [] skd_request_fn+0x287/0x900 [skd]
 [] ? skd_isr_completion_posted+0x1ee/0x5d0 [skd]
 [] skd_isr+0x1a3/0x280 [skd]
 [] handle_irq_event_percpu+0x6d/0x200
 [] handle_irq_event+0x42/0x70
 [] handle_edge_irq+0x69/0x120
 [] handle_irq+0x5c/0x150
 [] ? __atomic_notifier_call_chain+0x12/0x20
 [] ? atomic_notifier_call_chain+0x16/0x20
 [] do_IRQ+0x5e/0x110
 [] common_interrupt+0x6a/0x6a
 <EOI>
 [] ? cpuidle_enter_state+0x53/0xd0
 [] ? cpuidle_enter_state+0x4f/0xd0
 [] cpuidle_idle_call+0xc7/0x160
 [] arch_cpu_idle+0xe/0x30
 [] cpu_idle_loop+0x9a/0x240
 [] ? clockevents_register_device+0xc4/0x130
 [] cpu_startup_entry+0x23/0x30
 [] start_secondary+0x7a/0x80
Code: 41 5f c9 c3 66 0f 1f 44 00 00 44 29 f3 44 89 f2 44 89 de 4c 89 c8 eb 93 
66 90 48 8b 4d b8 41 f6 c1 03 48 8b 01 0f 84 03 ff ff ff <0f> 0b eb fe 0f 1f 00 
48 8b 45 c0 4c 8b 85 78 ff ff ff 48 8b b0
RIP  [] blk_rq_map_sg+0x241/0x3f0
 RSP 
---[ end trace 61da6cb864bf7eb8 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Shutting down cpus with NMI
Kernel Offset: 0x0 from 0x8100 (relocation range: 0x8000-0x9fff)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] skd: disable discard support

2014-02-12 Thread Mike Snitzer
The skd driver has never handled discards reliably.

The kernel will BUG as a result of issuing discards to the skd device.
Disable the skd driver's discard support until it is proven reliable.

The device-mapper-test-suite test that exposed this bug just issues a
discard that covers a portion of the skd device that was previously
written through a dm-thin device.  The discard spans the entire 1GB thin
device (logical sector 0 through 2097152).

dmtest run --profile stec --suite thin-provisioning -n /discard_fully_provisioned_device/

The associated device-mapper-test-suite Ruby test code follows:

  def test_discard_fully_provisioned_device
    with_standard_pool(@size) do |pool|
      with_new_thins(pool, @volume_size, 0, 1) do |thin, thin2|
        wipe_device(thin)
        wipe_device(thin2)
        assert_used_blocks(pool, 2 * @blocks_per_dev)
        thin.discard(0, @volume_size)
        assert_used_blocks(pool, @blocks_per_dev)
      end
    end
  ...

Signed-off-by: Mike Snitzer <snit...@redhat.com>
---
 drivers/block/skd_main.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/block/skd_main.c b/drivers/block/skd_main.c
index eb6e1e0..5dadecc 100644
--- a/drivers/block/skd_main.c
+++ b/drivers/block/skd_main.c
@@ -4441,12 +4441,15 @@ static int skd_cons_disk(struct skd_device *skdev)
 	/* set sysfs ptimal_io_size to 8K */
 	blk_queue_io_opt(q, 8192);
 
+#if 0
+	/* FIXME: Disable discard support until it no longer BUGs */
 	/* DISCARD Flag initialization. */
 	q->limits.discard_granularity = 8192;
 	q->limits.discard_alignment = 0;
 	q->limits.max_discard_sectors = UINT_MAX >> 9;
 	q->limits.discard_zeroes_data = 1;
 	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
+#endif
 	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
 
 	spin_lock_irqsave(&skdev->lock, flags);
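
For readers checking the numbers in this patch, here is a small Ruby sketch of the sector arithmetic. It is not part of the patch or of device-mapper-test-suite. It assumes 512-byte logical sectors and a 32-bit UINT_MAX; the helper names bytes_to_sectors and requests_needed are hypothetical, and the split calculation is plain ceiling division, not a model of how the block layer actually splits discards.

```ruby
# Assumptions (not stated in the patch): 512-byte sectors, 32-bit unsigned int.
SECTOR_SIZE = 512
UINT_MAX    = 0xFFFF_FFFF

# Values copied from the limits that the hunk above wraps in "#if 0".
DISCARD_GRANULARITY_BYTES = 8192
MAX_DISCARD_SECTORS       = UINT_MAX >> 9

# Hypothetical helper: convert a byte count to 512-byte sectors.
def bytes_to_sectors(bytes)
  bytes / SECTOR_SIZE
end

# Hypothetical helper: ceiling division of a discard length by a per-request
# limit. This is only arithmetic; the kernel's real splitting logic differs.
def requests_needed(sectors, max_sectors = MAX_DISCARD_SECTORS)
  (sectors + max_sectors - 1) / max_sectors
end

thin_sectors        = bytes_to_sectors(1 << 30)   # the 1GB thin device
granularity_sectors = bytes_to_sectors(DISCARD_GRANULARITY_BYTES)

puts "thin device sectors:  #{thin_sectors}"
puts "max_discard_sectors:  #{MAX_DISCARD_SECTORS}"
puts "granularity sectors:  #{granularity_sectors}"
puts "granularity aligned?: #{(thin_sectors % granularity_sectors).zero?}"
puts "requests needed:      #{requests_needed(thin_sectors)}"
```

Under these assumptions the 1GB thin device is 2097152 sectors long (numbered 0 through 2097151), which is a multiple of the 16-sector discard granularity and well under the max_discard_sectors limit the driver advertised.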

