[PATCH AUTOSEL for 4.14 107/161] blk-mq: fix discard merge with scheduler attached

2018-04-08 Thread Sasha Levin
From: Jens Axboe 

[ Upstream commit 445251d0f4d329aa061f323546cd6388a3bb7ab5 ]

I ran into an issue on my laptop that triggered a bug on the
discard path:

WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3/0x430
 Modules linked in: rfcomm fuse ctr ccm bnep arc4 binfmt_misc snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat snd_hda_codec_conexant fat snd_hda_codec_generic iwlmvm snd_hda_intel snd_hda_codec snd_hwdep mac80211 snd_hda_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq x86_pkg_temp_thermal intel_powerclamp kvm_intel uvcvideo iwlwifi btusb snd_seq_device videobuf2_vmalloc btintel videobuf2_memops kvm snd_timer videobuf2_v4l2 bluetooth irqbypass videobuf2_core aesni_intel aes_x86_64 crypto_simd cryptd snd glue_helper videodev cfg80211 ecdh_generic soundcore hid_generic usbhid hid i915 psmouse e1000e ptp pps_core xhci_pci xhci_hcd intel_gtt
 CPU: 2 PID: 207 Comm: jbd2/nvme0n1p7- Tainted: G U   4.15.0+ #176
 Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET59W (1.33 ) 12/19/2017
 RIP: 0010:nvme_setup_cmd+0x3d3/0x430
 RSP: 0018:880423e9f838 EFLAGS: 00010217
 RAX:  RBX: 880423e9f8c8 RCX: 0001
 RDX: 88022b200010 RSI: 0002 RDI: 327f
 RBP: 880421251400 R08: 88022b20 R09: 0009
 R10:  R11:  R12: 
 R13: 88042341e280 R14:  R15: 880421251440
 FS:  () GS:88044150() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 55b684795030 CR3: 02e09006 CR4: 001606e0
 DR0:  DR1:  DR2: 
 DR3:  DR6: fffe0ff0 DR7: 0400
 Call Trace:
  nvme_queue_rq+0x40/0xa00
  ? __sbitmap_queue_get+0x24/0x90
  ? blk_mq_get_tag+0xa3/0x250
  ? wait_woken+0x80/0x80
  ? blk_mq_get_driver_tag+0x97/0xf0
  blk_mq_dispatch_rq_list+0x7b/0x4a0
  ? deadline_remove_request+0x49/0xb0
  blk_mq_do_dispatch_sched+0x4f/0xc0
  blk_mq_sched_dispatch_requests+0x106/0x170
  __blk_mq_run_hw_queue+0x53/0xa0
  __blk_mq_delay_run_hw_queue+0x83/0xa0
  blk_mq_run_hw_queue+0x6c/0xd0
  blk_mq_sched_insert_request+0x96/0x140
  __blk_mq_try_issue_directly+0x3d/0x190
  blk_mq_try_issue_directly+0x30/0x70
  blk_mq_make_request+0x1a4/0x6a0
  generic_make_request+0xfd/0x2f0
  ? submit_bio+0x5c/0x110
  submit_bio+0x5c/0x110
  ? __blkdev_issue_discard+0x152/0x200
  submit_bio_wait+0x43/0x60
  ext4_process_freed_data+0x1cd/0x440
  ? account_page_dirtied+0xe2/0x1a0
  ext4_journal_commit_callback+0x4a/0xc0
  jbd2_journal_commit_transaction+0x17e2/0x19e0
  ? kjournald2+0xb0/0x250
  kjournald2+0xb0/0x250
  ? wait_woken+0x80/0x80
  ? commit_timeout+0x10/0x10
  kthread+0x111/0x130
  ? kthread_create_worker_on_cpu+0x50/0x50
  ? do_group_exit+0x3a/0xa0
  ret_from_fork+0x1f/0x30
 Code: 73 89 c1 83 ce 10 c1 e1 10 09 ca 83 f8 04 0f 87 0f ff ff ff 8b 4d 20 48 8b 7d 00 c1 e9 09 48 01 8c c7 00 08 00 00 e9 f8 fe ff ff <0f> ff 4c 89 c7 41 bc 0a 00 00 00 e8 0d 78 d6 ff e9 a1 fc ff ff
 ---[ end trace 50d361cc444506c8 ]---
 print_req_error: I/O error, dev nvme0n1, sector 847167488

Decoding the assembly, the request claims to have 0xffff segments,
while nvme counts two. This turns out to be because we don't check
for a data carrying request on the mq scheduler path, and since
blk_phys_contig_segment() returns true for a non-data request,
we decrement the initial segment count of 0 and end up with
0xffff in the unsigned short.
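
As a rough user-space illustration of that wraparound (not kernel code;
struct model_request and model_merge_segments() are hypothetical names
made up for this sketch, and a 16-bit unsigned short is assumed):

```c
#include <stdio.h>

/* Hypothetical stand-in for a request; only the segment counter is modelled. */
struct model_request {
	unsigned short nr_phys_segments;
};

/*
 * Model a merge that considers the two requests physically contiguous
 * and therefore drops one segment from the combined count. With both
 * counters at 0, the int total becomes -1 and is then truncated when
 * stored back into the unsigned short.
 */
static unsigned short model_merge_segments(struct model_request *req,
					   const struct model_request *next)
{
	int total = req->nr_phys_segments + next->nr_phys_segments;

	total--;	/* "contiguous" case: one segment fewer */
	req->nr_phys_segments = (unsigned short)total;
	return req->nr_phys_segments;
}
```

Calling model_merge_segments() on two requests that both start at 0
segments leaves 0xffff in the counter, which matches the bogus value
described above.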

There are a few issues here:

1) We should initialize the segment count for a discard to 1.
2) The discard merging is currently using the data limits for
   segments and sectors.

Fix this up by having attempt_merge() correctly identify the
request, and by initializing the segment count correctly
for discards.
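
The second issue, checking a discard merge against the data limits, can
be sketched in user space as follows. This is only an illustration under
stated assumptions: the model_* types and functions are hypothetical and
are not the helpers this patch adds.

```c
#include <stdio.h>

/* Hypothetical operation types; only the data/discard split matters here. */
enum model_op { MODEL_OP_WRITE, MODEL_OP_DISCARD };

/* Hypothetical per-queue limits: one for data, one for discard ranges. */
struct model_limits {
	unsigned short max_segments;
	unsigned short max_discard_segments;
};

/* Pick the limit that matches the kind of request being merged. */
static unsigned short model_segment_limit(const struct model_limits *lim,
					  enum model_op op)
{
	return op == MODEL_OP_DISCARD ? lim->max_discard_segments
				      : lim->max_segments;
}

/* Return 1 if the combined segment count stays within the matching limit. */
static int model_can_merge(const struct model_limits *lim, enum model_op op,
			   unsigned short segs_a, unsigned short segs_b)
{
	unsigned int total = (unsigned int)segs_a + segs_b;

	return total <= model_segment_limit(lim, op);
}
```

With a data limit of 128 and a discard limit of 1, two single-segment
writes may merge while two single-range discards may not, a distinction
that is lost if the data limit is applied to both.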

This can only be triggered with mq-deadline on discard capable
devices right now, which isn't a common configuration.

Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin 
---
 block/blk-core.c  |  2 ++
 block/blk-merge.c | 29 ++---
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index c01f4907dbbc..1feeb1a8aad9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3065,6 +3065,8 @@ void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
 {
 	if (bio_has_data(bio))
 		rq->nr_phys_segments = bio_phys_segments(q, bio);
+	else if (bio_op(bio) == REQ_OP_DISCARD)
+		rq->nr_phys_segments = 1;
 
 	rq->__data_len = bio->bi_iter.bi_size;
 	rq->bio = rq->biotail = bio;
diff --git a/block/blk-merge.c b/block/blk-merge.c
index f5dedd57dff6..8d60a5bbcef9 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -551,6 +551,24 @@ static bool req_no_special_merge(struct request *req)
 	return !q->mq_ops && req->special;
 }
 
+static bool