Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
On 11/08/2016 05:03 AM, Peter Lieven wrote: > Am 25.10.2016 um 18:12 schrieb Eric Blake: >> On 10/25/2016 09:36 AM, Paolo Bonzini wrote: >>> >>> On 25/10/2016 16:35, Eric Blake wrote: So your argument is that we should always pass down every unaligned less-than-optimum discard request all the way to the hardware, rather than dropping it higher in the stack, even though discard requests are already advisory, in order to leave the hardware as the ultimate decision on whether to ignore the unaligned request? >>> Yes, I agree with Peter as to this. >> Okay, I'll work on patches. I think it counts as bug fix, so appropriate >> even if I miss soft freeze (I'd still like to get NBD write zero support >> into 2.8, since it already missed 2.7, but that one is still awaiting >> review with not much time left). >> > > Hi Eric, > > have you had time to look at this? > If you need help, let me know. Patches posted, but testing help would be appreciated since you have the actual hardware that exhibits the issue. I'm also trying to write a patch to extend the blkdebug driver to share this "feature" of a 15M page, and write a qemu-iotest to make it harder to regress in the future. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
On 11/08/2016 05:03 AM, Peter Lieven wrote: > Am 25.10.2016 um 18:12 schrieb Eric Blake: >> On 10/25/2016 09:36 AM, Paolo Bonzini wrote: >>> >>> On 25/10/2016 16:35, Eric Blake wrote: So your argument is that we should always pass down every unaligned less-than-optimum discard request all the way to the hardware, rather than dropping it higher in the stack, even though discard requests are already advisory, in order to leave the hardware as the ultimate decision on whether to ignore the unaligned request? >>> Yes, I agree with Peter as to this. >> Okay, I'll work on patches. I think it counts as bug fix, so appropriate >> even if I miss soft freeze (I'd still like to get NBD write zero support >> into 2.8, since it already missed 2.7, but that one is still awaiting >> review with not much time left). >> > > Hi Eric, > > have you had time to look at this? > If you need help, let me know. Still on my list. I'm not forgetting it, and it does count as a bug fix so it is safe for inclusion, although I'm trying to get it in before this week is out. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
Am 25.10.2016 um 18:12 schrieb Eric Blake: On 10/25/2016 09:36 AM, Paolo Bonzini wrote: On 25/10/2016 16:35, Eric Blake wrote: So your argument is that we should always pass down every unaligned less-than-optimum discard request all the way to the hardware, rather than dropping it higher in the stack, even though discard requests are already advisory, in order to leave the hardware as the ultimate decision on whether to ignore the unaligned request? Yes, I agree with Peter as to this. Okay, I'll work on patches. I think it counts as bug fix, so appropriate even if I miss soft freeze (I'd still like to get NBD write zero support into 2.8, since it already missed 2.7, but that one is still awaiting review with not much time left). Hi Eric, have you had time to look at this? If you need help, let me know. Peter
Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
On 25/10/2016 16:35, Eric Blake wrote: > So your argument is that we should always pass down every unaligned > less-than-optimum discard request all the way to the hardware, rather > than dropping it higher in the stack, even though discard requests are > already advisory, in order to leave the hardware as the ultimate > decision on whether to ignore the unaligned request? Yes, I agree with Peter as to this. Paolo signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
On 10/25/2016 09:20 AM, Peter Lieven wrote: >> The firmware is probably technically buggy for advertising too large of >> a minimum granularity, if it can piece together smaller requests into a >> larger discard. If discards need to happen at a smaller granularity, >> the firmware (or kernel quirk system) should fix the advertisement to >> the actual granularity that it will honor. I don't see a reason to >> change qemu's current behavior. >> > > The issue is that the optimal unmap granularity is only a hint. > There is no evidence what happens with unaligned requests or requests > that are smaller. They could still lead to a discard operation in the > storage device. It just says if you can send me discards of that size > thats optimal for me. Thats not said that smaller or unaligned requests > have no effect. So your argument is that we should always pass down every unaligned less-than-optimum discard request all the way to the hardware, rather than dropping it higher in the stack, even though discard requests are already advisory, in order to leave the hardware as the ultimate decision on whether to ignore the unaligned request? -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
Am 25.10.2016 um 15:59 schrieb Eric Blake: On 10/25/2016 07:42 AM, Peter Lieven wrote: But hey, that firmware is seriously weird. :) Yes, so you would not change the new implementation? Even if the discard is e.g. 1MB it could theretically be that internally the device has a finer granularity. Its an optimal discard alignment not the minimum required discard size. I think thats a difference. It does not say I can't handle smaller discards. The firmware is probably technically buggy for advertising too large of a minimum granularity, if it can piece together smaller requests into a larger discard. If discards need to happen at a smaller granularity, the firmware (or kernel quirk system) should fix the advertisement to the actual granularity that it will honor. I don't see a reason to change qemu's current behavior. The issue is that the optimal unmap granularity is only a hint. There is no evidence what happens with unaligned requests or requests that are smaller. They could still lead to a discard operation in the storage device. It just says if you can send me discards of that size thats optimal for me. Thats not said that smaller or unaligned requests have no effect. Peter
Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
On 10/25/2016 07:42 AM, Peter Lieven wrote: >> >> But hey, that firmware is seriously weird. :) > > Yes, so you would not change the new implementation? > > Even if the discard is e.g. 1MB it could theretically be that internally > the device has a finer granularity. Its an optimal discard alignment > not the minimum required discard size. I think thats a difference. > It does not say I can't handle smaller discards. The firmware is probably technically buggy for advertising too large of a minimum granularity, if it can piece together smaller requests into a larger discard. If discards need to happen at a smaller granularity, the firmware (or kernel quirk system) should fix the advertisement to the actual granularity that it will honor. I don't see a reason to change qemu's current behavior. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
Am 25.10.2016 um 14:19 schrieb Paolo Bonzini: On 25/10/2016 14:12, Peter Lieven wrote: Am 25.10.2016 um 14:09 schrieb Paolo Bonzini: On 25/10/2016 14:03, Peter Lieven wrote: Am 01.08.2016 um 11:22 schrieb Paolo Bonzini: On 28/07/2016 04:39, Eric Blake wrote: On 07/27/2016 01:25 AM, Fam Zheng wrote: On Thu, 07/21 13:34, Eric Blake wrote: +max_write_zeroes = max_write_zeroes / alignment * alignment; Not using QEMU_ALIGN_DOWN despite patch 3? Looks like I missed that on the rebase. Can fix if there is a reason for a respin. Since Stefan acked this, I'm applying the patch and fixing it to use QEMU_ALIGN_DOWN. Paolo Hi, I came across a sort of regression we introduced with the dropping of head and tail of an unaligned discard. The discard alignment that we use to trim the discard request is just a hint. I learned on the equallogics that a page (which is this unusal 15MB large) is unallocated even if the discard happens in pieces. E.g. in slices of 1MB requests. From my point of view I would like to restore the old behaviour. What do you think? The right logic should be the one in Linux: if splitting a request, and the next starting sector would be misaligned, stop the discard at the previous aligned sector. Otherwise leave everything alone. Just to clarify. I mean the guest would send incremental 1MB discards we would now drop all of them if the alignment is 15MB. Previously, we have sent all of the 1MB requests. Yes. In this case there would be no need at all to split the request, so each request should be passed through. But hey, that firmware is seriously weird. :) Yes, so you would not change the new implementation? Even if the discard is e.g. 1MB it could theretically be that internally the device has a finer granularity. Its an optimal discard alignment not the minimum required discard size. I think thats a difference. It does not say I can't handle smaller discards. Peter
Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
On 25/10/2016 14:12, Peter Lieven wrote: > Am 25.10.2016 um 14:09 schrieb Paolo Bonzini: >> >> On 25/10/2016 14:03, Peter Lieven wrote: >>> Am 01.08.2016 um 11:22 schrieb Paolo Bonzini: On 28/07/2016 04:39, Eric Blake wrote: > On 07/27/2016 01:25 AM, Fam Zheng wrote: >> On Thu, 07/21 13:34, Eric Blake wrote: >>> +max_write_zeroes = max_write_zeroes / alignment * alignment; >> Not using QEMU_ALIGN_DOWN despite patch 3? > Looks like I missed that on the rebase. Can fix if there is a > reason for > a respin. Since Stefan acked this, I'm applying the patch and fixing it to use QEMU_ALIGN_DOWN. Paolo >>> Hi, >>> >>> I came across a sort of regression we introduced with the dropping of >>> head and tail >>> of an unaligned discard. >>> >>> The discard alignment that we use to trim the discard request is just a >>> hint. >>> >>> I learned on the equallogics that a page (which is this unusal 15MB >>> large) is >>> unallocated even if the discard happens in pieces. E.g. in slices of 1MB >>> requests. >>> >>> From my point of view I would like to restore the old behaviour. >>> What do >>> you think? >> The right logic should be the one in Linux: if splitting a request, and >> the next starting sector would be misaligned, stop the discard at the >> previous aligned sector. Otherwise leave everything alone. > > Just to clarify. I mean the guest would send incremental 1MB discards > we would now drop all of them if the alignment is 15MB. Previously, > we have sent all of the 1MB requests. Yes. In this case there would be no need at all to split the request, so each request should be passed through. But hey, that firmware is seriously weird. :) Paolo
Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
Am 25.10.2016 um 14:09 schrieb Paolo Bonzini: On 25/10/2016 14:03, Peter Lieven wrote: Am 01.08.2016 um 11:22 schrieb Paolo Bonzini: On 28/07/2016 04:39, Eric Blake wrote: On 07/27/2016 01:25 AM, Fam Zheng wrote: On Thu, 07/21 13:34, Eric Blake wrote: +max_write_zeroes = max_write_zeroes / alignment * alignment; Not using QEMU_ALIGN_DOWN despite patch 3? Looks like I missed that on the rebase. Can fix if there is a reason for a respin. Since Stefan acked this, I'm applying the patch and fixing it to use QEMU_ALIGN_DOWN. Paolo Hi, I came across a sort of regression we introduced with the dropping of head and tail of an unaligned discard. The discard alignment that we use to trim the discard request is just a hint. I learned on the equallogics that a page (which is this unusal 15MB large) is unallocated even if the discard happens in pieces. E.g. in slices of 1MB requests. From my point of view I would like to restore the old behaviour. What do you think? The right logic should be the one in Linux: if splitting a request, and the next starting sector would be misaligned, stop the discard at the previous aligned sector. Otherwise leave everything alone. Just to clarify. I mean the guest would send incremental 1MB discards we would now drop all of them if the alignment is 15MB. Previously, we have sent all of the 1MB requests. Peter
Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
On 25/10/2016 14:03, Peter Lieven wrote: > Am 01.08.2016 um 11:22 schrieb Paolo Bonzini: >> >> On 28/07/2016 04:39, Eric Blake wrote: >>> On 07/27/2016 01:25 AM, Fam Zheng wrote: On Thu, 07/21 13:34, Eric Blake wrote: > +max_write_zeroes = max_write_zeroes / alignment * alignment; Not using QEMU_ALIGN_DOWN despite patch 3? >>> Looks like I missed that on the rebase. Can fix if there is a reason for >>> a respin. >> Since Stefan acked this, I'm applying the patch and fixing it to use >> QEMU_ALIGN_DOWN. >> >> Paolo > > Hi, > > I came across a sort of regression we introduced with the dropping of > head and tail > of an unaligned discard. > > The discard alignment that we use to trim the discard request is just a > hint. > > I learned on the equallogics that a page (which is this unusal 15MB > large) is > unallocated even if the discard happens in pieces. E.g. in slices of 1MB > requests. > > From my point of view I would like to restore the old behaviour. What do > you think? The right logic should be the one in Linux: if splitting a request, and the next starting sector would be misaligned, stop the discard at the previous aligned sector. Otherwise leave everything alone. Paolo
Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
Am 01.08.2016 um 11:22 schrieb Paolo Bonzini: On 28/07/2016 04:39, Eric Blake wrote: On 07/27/2016 01:25 AM, Fam Zheng wrote: On Thu, 07/21 13:34, Eric Blake wrote: +max_write_zeroes = max_write_zeroes / alignment * alignment; Not using QEMU_ALIGN_DOWN despite patch 3? Looks like I missed that on the rebase. Can fix if there is a reason for a respin. Since Stefan acked this, I'm applying the patch and fixing it to use QEMU_ALIGN_DOWN. Paolo Hi, I came across a sort of regression we introduced with the dropping of head and tail of an unaligned discard. The discard alignment that we use to trim the discard request is just a hint. I learned on the equallogics that a page (which is this unusal 15MB large) is unallocated even if the discard happens in pieces. E.g. in slices of 1MB requests. From my point of view I would like to restore the old behaviour. What do you think? Thanks, Peter
Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
On 28/07/2016 04:39, Eric Blake wrote: > On 07/27/2016 01:25 AM, Fam Zheng wrote: >> On Thu, 07/21 13:34, Eric Blake wrote: >>> +max_write_zeroes = max_write_zeroes / alignment * alignment; >> >> Not using QEMU_ALIGN_DOWN despite patch 3? > > Looks like I missed that on the rebase. Can fix if there is a reason for > a respin. Since Stefan acked this, I'm applying the patch and fixing it to use QEMU_ALIGN_DOWN. Paolo
Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
On 07/27/2016 01:25 AM, Fam Zheng wrote: > On Thu, 07/21 13:34, Eric Blake wrote: >> +max_write_zeroes = max_write_zeroes / alignment * alignment; > > Not using QEMU_ALIGN_DOWN despite patch 3? Looks like I missed that on the rebase. Can fix if there is a reason for a respin. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
On Thu, 07/21 13:34, Eric Blake wrote: > +max_write_zeroes = max_write_zeroes / alignment * alignment; Not using QEMU_ALIGN_DOWN despite patch 3?
Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
On Thu, Jul 21, 2016 at 01:34:48PM -0600, Eric Blake wrote: > Dell Equallogic iSCSI SANs have a very unusual advertised geometry: > > $ iscsi-inq -e 1 -c $((0xb0)) iscsi://XXX/0 > wsnz:0 > maximum compare and write length:1 > optimal transfer length granularity:0 > maximum transfer length:0 > optimal transfer length:0 > maximum prefetch xdread xdwrite transfer length:0 > maximum unmap lba count:30720 > maximum unmap block descriptor count:2 > optimal unmap granularity:30720 > ugavalid:1 > unmap granularity alignment:0 > maximum write same length:30720 > > which says that both the maximum and the optimal discard size > is 15M. It is not immediately apparent if the device allows > discard requests not aligned to the optimal size, nor if it > allows discards at a finer granularity than the optimal size. > > I tried to find details in the SCSI Commands Reference Manual > Rev. A on what valid values of maximum and optimal sizes are > permitted, but while that document mentions a "Block Limits > VPD Page", I couldn't actually find documentation of that page > or what values it would have, or if a SCSI device has an > advertisement of its minimal unmap granularity. So it is not > obvious to me whether the Dell Equallogic device is compliance > with the SCSI specification. > > Fortunately, it is easy enough to support non-power-of-2 sizing, > even if it means we are less efficient than truly possible when > targetting that device (for example, it means that we refuse to > unmap anything that is not a multiple of 15M and aligned to a > 15M boundary, even if the device truly does support a smaller > granularity where unmapping actually works). > > Reported-by: Peter Lieven> Signed-off-by: Eric Blake > > --- > Help in locating the actual specs on what SCSI requires for > page 0xb0 would be nice. But this should at least avoid the > assertion failures that Peter is hitting. I was able to > test this patch using NBD on a hacked up qemu where I made > block/nbd.c report the same block limits, and could confirm > the assert under qemu-io 'w -z 0 40m' and 'discard 0 40m' > pre-patch, as well as the post-patch behavior of splitting > things to 15M alignment ('discard 1M 15M' becomes a no-op > because it is not aligned). But obviously it needs to be > tested on the actual iscsi SAN that triggered the original > report. > --- > include/block/block_int.h | 37 - > block/io.c| 15 +-- > 2 files changed, 29 insertions(+), 23 deletions(-) Acked-by: Stefan Hajnoczi signature.asc Description: PGP signature