Re: [PATCH] loop: fix no-unmap write-zeroes request behavior

2019-10-11 Thread Christoph Hellwig
On Thu, Oct 10, 2019 at 10:02:39AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong 
> 
> Currently, if the loop device receives a WRITE_ZEROES request, it asks
> the underlying filesystem to punch out the range.  This behavior is
> correct if unmapping is allowed.  However, a NOUNMAP request means that
> the caller forbids us from freeing the storage backing the range, so
> punching out the range is incorrect behavior.

It doesn't really forbid it, as most protocols don't have a way to forbid
deallocation.  It requests not to.

Otherwise this looks fine, although I would have implemented it slightly
differently:

>   case REQ_OP_FLUSH:
>   return lo_req_flush(lo, rq);
>   case REQ_OP_DISCARD:
> - case REQ_OP_WRITE_ZEROES:
>   return lo_discard(lo, rq, pos);
> + case REQ_OP_WRITE_ZEROES:
> + return lo_zeroout(lo, rq, pos);

This could just become:

	case REQ_OP_WRITE_ZEROES:
		if (rq->cmd_flags & REQ_NOUNMAP)
			return lo_zeroout(lo, rq, pos);
		/*FALLTHRU*/
	case REQ_OP_DISCARD:
		return lo_discard(lo, rq, pos);


Re: io_uring NULL pointer dereference on Linux v5.4-rc1

2019-10-11 Thread Stefan Hajnoczi
On Wed, Oct 09, 2019 at 02:36:01PM -0600, Jens Axboe wrote:
> On 10/9/19 11:46 AM, Stefan Hajnoczi wrote:
> > On Wed, Oct 09, 2019 at 05:27:44AM -0600, Jens Axboe wrote:
> >> On 10/9/19 3:23 AM, Stefan Hajnoczi wrote:
> >>> I hit this NULL pointer dereference when running qemu-iotests 052 (raw)
> >>> on both ext4 and XFS on dm-thin/luks.  The kernel is Linux v5.4-rc1 but
> >>> I haven't found any obvious fixes in Jens' tree, so it's likely that
> >>> this bug is still present:
> >>>
> >>> BUG: kernel NULL pointer dereference, address: 0102
> >>> #PF: supervisor read access in kernel mode
> >>> #PF: error_code(0x) - not-present page
> >>> PGD 0 P4D 0
> >>> Oops:  [#1] SMP PTI
> >>> CPU: 2 PID: 6656 Comm: qemu-io Not tainted 5.4.0-rc1 #1
> >>> Hardware name: LENOVO 20BTS1N70V/20BTS1N70V, BIOS N14ET37W (1.15 ) 
> >>> 09/06/2016
> >>> RIP: 0010:__queue_work+0x1f/0x3b0
> >>> Code: eb df 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 49 89 f7 41 
> >>> 56 41 89 fe 41 55 41 89 fd 41 54 55 48 89 d5 53 48 83 ec 10  86 02 01 
> >>> 00 00 01 0f 85 bc 02 00 00 49 bc eb 83 b5 80 46 86 c8
> >>> RSP: 0018:bef4884bbd58 EFLAGS: 00010082
> >>> RAX: 0246 RBX: 0246 RCX: 
> >>> RDX: 9903901f4460 RSI:  RDI: 0040
> >>> RBP: 9903901f4460 R08: 9903901fb040 R09: 990398614700
> >>> R10: 0030 R11:  R12: 
> >>> R13: 0040 R14: 0040 R15: 
> >>> FS:  7f7d2a4e4a80() GS:9903a5a8() 
> >>> knlGS:
> >>> CS:  0010 DS:  ES:  CR0: 80050033
> >>> CR2: 0102 CR3: 000203da8004 CR4: 003606e0
> >>> DR0:  DR1:  DR2: 
> >>> DR3:  DR6: fffe0ff0 DR7: 0400
> >>> Call Trace:
> >>>? __io_queue_sqe+0xa1/0x200
> >>>queue_work_on+0x36/0x40
> >>>__io_queue_sqe+0x16e/0x200
> >>>io_ring_submit+0xd2/0x230
> >>>? percpu_ref_resurrect+0x46/0x70
> >>>? __io_uring_register+0x207/0xa30
> >>>? __schedule+0x286/0x700
> >>>__x64_sys_io_uring_enter+0x1a3/0x280
> >>>? __x64_sys_io_uring_register+0x64/0xb0
> >>>do_syscall_64+0x5b/0x180
> >>>entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>> RIP: 0033:0x7f7d3439f1fd
> >>> Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 
> >>> f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 
> >>> ff ff 73 01 c3 48 8b 0d 5b 8c 0c 00 f7 d8 64 89 01 48
> >>> RSP: 002b:7f7d2918d408 EFLAGS: 0216 ORIG_RAX: 01aa
> >>> RAX: ffda RBX: 7f7d2918d4f0 RCX: 7f7d3439f1fd
> >>> RDX:  RSI: 0001 RDI: 000a
> >>> RBP:  R08:  R09: 0008
> >>> R10:  R11: 0216 R12: 5616e3c32ab8
> >>> R13: 5616e3c32b78 R14: 5616e3c32ab0 R15: 0001
> >>> Modules linked in: fuse ccm xt_CHECKSUM xt_MASQUERADE tun bridge stp llc 
> >>> nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter 
> >>> ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack 
> >>> ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security 
> >>> iptable_nat nf_nat iptable_mangle iptable_raw iptable_security 
> >>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink 
> >>> ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter 
> >>> ip_tables sunrpc vfat fat intel_rapl_msr rmi_smbus iwlmvm rmi_core 
> >>> intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp mac80211 
> >>> snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm_intel 
> >>> snd_hda_intel kvm snd_intel_nhlt snd_hda_codec snd_usb_audio irqbypass 
> >>> uvcvideo snd_hda_core snd_usbmidi_lib snd_rawmidi iTCO_wdt snd_hwdep 
> >>> libarc4 intel_cstate cdc_ether intel_uncore videobuf2_vmalloc iwlwifi 
> >>> mei_wdt mei_hdcp iTCO_vendor_support snd_seq videobuf2_memops usbnet 
> >>> videobuf2_v4l2 snd_seq_device
> >>>intel_rapl_perf pcspkr videobuf2_common joydev wmi_bmof snd_pcm 
> >>> cfg80211 r8152 videodev intel_pch_thermal i2c_i801 mii mc thinkpad_acpi 
> >>> snd_timer mei_me ledtrig_audio snd lpc_ich mei soundcore rfkill 
> >>> binfmt_misc xfs dm_thin_pool dm_persistent_data dm_bio_prison libcrc32c 
> >>> dm_crypt i915 i2c_algo_bit drm_kms_helper drm crct10dif_pclmul 
> >>> crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw wmi video
> >>> CR2: 0102
> >>> ---[ end trace 2ac747acabe218da ]---
> >>> RIP: 0010:__queue_work+0x1f/0x3b0
> >>> Code: eb df 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 49 89 f7 41 
> >>> 56 41 89 fe 41 55 41 89 fd 41 54 55 48 89 d5 53 48 83 ec 10  86 02 01 
> >>> 00 00 01 0f 85 bc 02 00 00 49 bc eb 83 b5 80 46 86 c8
> >>> RSP: 0018:bef4884bbd58 EFLAGS: 00010082
> >>> RAX: 0246 RBX: 0246 RCX

Re: [PATCH V3 0/5] blk-mq: improvement on handling IO during CPU hotplug

2019-10-11 Thread John Garry

On 10/10/2019 12:21, John Garry wrote:




As discussed before, the tags of hisilicon V3 are HBA-wide. If you switch
to real hw queues, each hw queue has to own its independent tags.
However, that isn't supported by V3 hardware.


I am generating the tag internally in the driver now, so the hostwide
tags restriction should not be an issue.

And, to be clear, I am not paying too much attention to performance, but
rather just hotplugging while running IO.

An update on testing:
I did some scripted overnight testing. The script essentially loops like
this:
- online all CPUS
- run fio bound to a limited bunch of CPUs to cover a hctx mask for 1
minute
- offline those CPUs
- wait 1 minute (> SCSI or NVMe timeout)
- and repeat
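
The loop above could be scripted along these lines — a minimal sketch, not the actual test script: the CPU list, the fio flags, and the device path are illustrative assumptions, and dry_run=True only prints the sysfs writes it would do, since the real run needs root:

```python
import subprocess
import time

CPUS = [1, 2, 3]  # CPUs covering one hctx mask (an assumption)

def set_cpu_online(cpu, online, dry_run=True):
    """Write 0/1 to the cpu's sysfs online file, or just report the plan."""
    path = "/sys/devices/system/cpu/cpu%d/online" % cpu
    if dry_run:
        return "echo %d > %s" % (int(online), path)
    with open(path, "w") as f:
        f.write("1" if online else "0")
    return path

def one_iteration(dry_run=True):
    plan = [set_cpu_online(c, True, dry_run) for c in CPUS]    # online all CPUs
    if not dry_run:
        # run fio bound to those CPUs for one minute (flags are illustrative)
        subprocess.run(["taskset", "-c", ",".join(map(str, CPUS)),
                        "fio", "--name=hotplug", "--filename=/dev/nvme0n1",
                        "--rw=randread", "--direct=1", "--runtime=60",
                        "--time_based"], check=False)
    plan += [set_cpu_online(c, False, dry_run) for c in CPUS]  # offline them
    if not dry_run:
        time.sleep(60)  # wait longer than the SCSI/NVMe request timeout
    return plan

if __name__ == "__main__":
    for step in one_iteration(dry_run=True):
        print(step)
```

Looping one_iteration(dry_run=False) overnight reproduces the schedule described above.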

SCSI is actually quite stable, but NVMe isn't. For NVMe I am finding
some fio processes never dying with IOPS @ 0. I don't see any NVMe
timeout reported. Did you do any NVMe testing of this sort?



Yeah, so for NVMe, I see some sort of regression, like this:
Jobs: 1 (f=1): [_R] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 
1158037877d:17h:18m:22s]


I have tested against vanilla 5.4 rc1 without problem.

If you can advise some debug to add, then I'd appreciate it. If not, 
I'll try to add some debug to the new paths introduced in this series to 
see if anything I hit coincides with the error state, so I'll at least 
have a hint...


Thanks,
John






See previous discussion:

https://marc.info/?t=15592886301&r=1&w=2


Thanks,
Ming





Re: [PATCH V3 0/5] blk-mq: improvement on handling IO during CPU hotplug

2019-10-11 Thread Ming Lei
On Fri, Oct 11, 2019 at 4:54 PM John Garry  wrote:
>
> On 10/10/2019 12:21, John Garry wrote:
> >
> >>
> >> As discussed before, the tags of hisilicon V3 are HBA-wide. If you switch
> >> to real hw queues, each hw queue has to own its independent tags.
> >> However, that isn't supported by V3 hardware.
> >
> > I am generating the tag internally in the driver now, so the hostwide
> > tags restriction should not be an issue.
> >
> > And, to be clear, I am not paying too much attention to performance, but
> > rather just hotplugging while running IO.
> >
> > An update on testing:
> > I did some scripted overnight testing. The script essentially loops like
> > this:
> > - online all CPUS
> > - run fio bound to a limited bunch of CPUs to cover a hctx mask for 1
> > minute
> > - offline those CPUs
> > - wait 1 minute (> SCSI or NVMe timeout)
> > - and repeat
> >
> > SCSI is actually quite stable, but NVMe isn't. For NVMe I am finding
> > some fio processes never dying with IOPS @ 0. I don't see any NVMe
> > timeout reported. Did you do any NVMe testing of this sort?
> >
>
> Yeah, so for NVMe, I see some sort of regression, like this:
> Jobs: 1 (f=1): [_R] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
> 1158037877d:17h:18m:22s]

I can reproduce this issue, and it looks like there are requests stuck in
->dispatch. I am a bit busy this week, so please feel free to investigate
it; debugfs can help you a lot. I may have time next week to look into
this issue.

Thanks,
Ming Lei


Re: io_uring NULL pointer dereference on Linux v5.4-rc1

2019-10-11 Thread Jens Axboe
On 10/11/19 2:46 AM, Stefan Hajnoczi wrote:
> On Wed, Oct 09, 2019 at 02:36:01PM -0600, Jens Axboe wrote:
>> On 10/9/19 11:46 AM, Stefan Hajnoczi wrote:
>>> On Wed, Oct 09, 2019 at 05:27:44AM -0600, Jens Axboe wrote:
 On 10/9/19 3:23 AM, Stefan Hajnoczi wrote:

Re: [PATCH V3 0/5] blk-mq: improvement on handling IO during CPU hotplug

2019-10-11 Thread John Garry

On 11/10/2019 12:55, Ming Lei wrote:

On Fri, Oct 11, 2019 at 4:54 PM John Garry  wrote:


On 10/10/2019 12:21, John Garry wrote:




As discussed before, the tags of hisilicon V3 are HBA-wide. If you switch
to real hw queues, each hw queue has to own its independent tags.
However, that isn't supported by V3 hardware.


I am generating the tag internally in the driver now, so the hostwide
tags restriction should not be an issue.

And, to be clear, I am not paying too much attention to performance, but
rather just hotplugging while running IO.

An update on testing:
I did some scripted overnight testing. The script essentially loops like
this:
- online all CPUS
- run fio bound to a limited bunch of CPUs to cover a hctx mask for 1
minute
- offline those CPUs
- wait 1 minute (> SCSI or NVMe timeout)
- and repeat

SCSI is actually quite stable, but NVMe isn't. For NVMe I am finding
some fio processes never dying with IOPS @ 0. I don't see any NVMe
timeout reported. Did you do any NVMe testing of this sort?



Yeah, so for NVMe, I see some sort of regression, like this:
Jobs: 1 (f=1): [_R] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
1158037877d:17h:18m:22s]


I can reproduce this issue, and it looks like there are requests stuck in ->dispatch.


OK, that may match what I see:
- the problem occurring coincides with this call path, with 
BLK_MQ_S_INTERNAL_STOPPED set:


blk_mq_request_bypass_insert
(__)blk_mq_try_issue_list_directly
blk_mq_sched_insert_requests
blk_mq_flush_plug_list
blk_flush_plug_list
blk_finish_plug
blkdev_direct_IO
generic_file_read_iter
blkdev_read_iter
aio_read
io_submit_one

blk_mq_request_bypass_insert() adds to the dispatch list, and looking at 
debugfs, could this be that request still sitting there:

root@(none)$ more /sys/kernel/debug/block/nvme0n1/hctx18/dispatch
ac28511d {.op=READ, .cmd_flags=, .rq_flags=IO_STAT, .state=idle, 
.tag=56, .internal_tag=-1}


So could there be some race here?
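
A quick way to spot such stranded requests is to sweep every hctx's dispatch file under debugfs. A sketch (the nvme0n1 path is an assumption, and reading debugfs needs root plus CONFIG_BLK_DEBUG_FS):

```python
import glob
import os

def dump_dispatch(block_dir="/sys/kernel/debug/block/nvme0n1"):
    """Return {dispatch-file: entries} for every hctx whose ->dispatch
    list is non-empty."""
    stuck = {}
    for path in sorted(glob.glob(os.path.join(block_dir, "hctx*", "dispatch"))):
        with open(path) as f:
            entries = [line.rstrip("\n") for line in f if line.strip()]
        if entries:
            stuck[path] = entries  # requests still sitting in ->dispatch
    return stuck

if __name__ == "__main__":
    for path, entries in dump_dispatch().items():
        print(path)
        for entry in entries:
            print("  " + entry)
```

Running it right after the fio job hangs shows whether a request is parked on some hctx's dispatch list, as in the `more .../hctx18/dispatch` output above.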


I am a bit busy this week, so please feel free to investigate it; debugfs
can help you a lot. I may have time next week to look into this issue.



OK, appreciated

John


Thanks,
Ming Lei







Re: io_uring NULL pointer dereference on Linux v5.4-rc1

2019-10-11 Thread Stefan Hajnoczi
On Fri, Oct 11, 2019 at 06:08:34AM -0600, Jens Axboe wrote:
> On 10/11/19 2:46 AM, Stefan Hajnoczi wrote:
> > On Wed, Oct 09, 2019 at 02:36:01PM -0600, Jens Axboe wrote:
> >> On 10/9/19 11:46 AM, Stefan Hajnoczi wrote:
> >>> On Wed, Oct 09, 2019 at 05:27:44AM -0600, Jens Axboe wrote:
>  On 10/9/19 3:23 AM, Stefan Hajnoczi wrote:

[PATCH v2] loop: fix no-unmap write-zeroes request behavior

2019-10-11 Thread Darrick J. Wong
From: Darrick J. Wong 

Currently, if the loop device receives a WRITE_ZEROES request, it asks
the underlying filesystem to punch out the range.  This behavior is
correct if unmapping is allowed.  However, a NOUNMAP request means that
the caller forbids us from freeing the storage backing the range, so
punching out the range is incorrect behavior.

To satisfy a NOUNMAP | WRITE_ZEROES request, loop should ask the
underlying filesystem to FALLOC_FL_ZERO_RANGE, which is (according to
the fallocate documentation) required to ensure that the entire range is
backed by real storage, which suffices for our purposes.

Fixes: 19372e2769179dd ("loop: implement REQ_OP_WRITE_ZEROES")
Signed-off-by: Darrick J. Wong 
---
v2: reorganize a little according to hch feedback
---
 drivers/block/loop.c |   31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index f6f77eaa7217..4943d0c5c61c 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -441,6 +441,28 @@ static int lo_discard(struct loop_device *lo, struct request *rq, loff_t pos)
return ret;
 }
 
+static int lo_zeroout(struct loop_device *lo, struct request *rq, loff_t pos)
+{
+   struct file *file = lo->lo_backing_file;
+   int mode = FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE;
+   int ret;
+
+   /*
+* Ask the fs to zero out the blocks, which is supposed to result in
+* space being allocated to the file.
+*/
+   if (!file->f_op->fallocate) {
+   ret = -EOPNOTSUPP;
+   goto out;
+   }
+
+   ret = file->f_op->fallocate(file, mode, pos, blk_rq_bytes(rq));
+   if (unlikely(ret && ret != -EINVAL && ret != -EOPNOTSUPP))
+   ret = -EIO;
+ out:
+   return ret;
+}
+
 static int lo_req_flush(struct loop_device *lo, struct request *rq)
 {
struct file *file = lo->lo_backing_file;
@@ -596,8 +618,15 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
switch (req_op(rq)) {
case REQ_OP_FLUSH:
return lo_req_flush(lo, rq);
-   case REQ_OP_DISCARD:
case REQ_OP_WRITE_ZEROES:
+   /*
+* If the caller doesn't want deallocation, call zeroout to
+* write zeroes to the range.  Otherwise, punch them out.
+*/
+   if (rq->cmd_flags & REQ_NOUNMAP)
+   return lo_zeroout(lo, rq, pos);
+   /* fall through */
+   case REQ_OP_DISCARD:
return lo_discard(lo, rq, pos);
case REQ_OP_WRITE:
if (lo->transfer)
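
For what it's worth, the fallocate call that lo_zeroout() issues can be exercised from userspace to see the same error cases. A Python sketch (Linux-only; the libc name and flag values come from glibc/<linux/falloc.h>, and the error mapping mirrors the patch — everything else is an assumption):

```python
import ctypes
import os

FALLOC_FL_KEEP_SIZE = 0x01   # values from <linux/falloc.h>
FALLOC_FL_ZERO_RANGE = 0x10

_libc = ctypes.CDLL("libc.so.6", use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_longlong, ctypes.c_longlong]
_libc.fallocate.restype = ctypes.c_int

def zero_range(fd, pos, length):
    """Zero [pos, pos+length) without deallocating; 0 on success, else errno.

    Filesystems without FALLOC_FL_ZERO_RANGE support fail with EOPNOTSUPP,
    which is exactly the case the patch (along with EINVAL) keeps from
    being turned into -EIO.
    """
    mode = FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE
    if _libc.fallocate(fd, mode, pos, length) != 0:
        return ctypes.get_errno()
    return 0
```

As in the patch, a caller would treat EINVAL/EOPNOTSUPP as "zeroing this way is unsupported here" and map any other failure to EIO.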


Re: [GIT PULL] Block fixes for 5.4-rc3

2019-10-11 Thread pr-tracker-bot
The pull request you sent on Thu, 10 Oct 2019 20:15:31 -0600:

> git://git.kernel.dk/linux-block.git tags/for-linus-20191010

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/297cb23d4eefbc3043cd2fa5cf577930e695

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [PATCH v2 1/1] blk-mq: fill header with kernel-doc

2019-10-11 Thread Bart Van Assche

On 10/10/19 1:38 PM, André Almeida wrote:

It seems the role of these members is not clear to me. Could you
please check whether these definitions make sense to you?

- hctx->dispatch: This queue is used for requests that are ready to be
dispatched to the hardware but for some reason (e.g. lack of resources,
the hardware is too busy and can't take more requests) could not be sent
to the hardware. As soon as the driver can send new requests, those
queued on this list will be sent first, for fairer dispatch. Since
those requests are at the hctx, they can't be requeued or rescheduled
anymore.

- request_queue->requeue_list: This list is used when it's not possible
to send the request to the associated hctx. This can happen if the
associated CPU or hctx is not available anymore. When requeueing those
requests, it will be possible to send them to new, functioning queues.


Hi André,

The hctx->dispatch description looks mostly fine. Can the following part 
be left out since it looks confusing to me: "Since those requests are at 
the hctx, they can't be requeued or rescheduled anymore."


How about changing the requeue_list description into the following: 
"requests requeued by a call to blk_mq_requeue_request()".


Thanks,

Bart.