Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-11-10 Thread Eric Blake
On 11/08/2016 05:03 AM, Peter Lieven wrote:
> Am 25.10.2016 um 18:12 schrieb Eric Blake:
>> On 10/25/2016 09:36 AM, Paolo Bonzini wrote:
>>>
>>> On 25/10/2016 16:35, Eric Blake wrote:
 So your argument is that we should always pass down every unaligned
 less-than-optimum discard request all the way to the hardware, rather
 than dropping it higher in the stack, even though discard requests are
 already advisory, in order to leave the hardware as the ultimate
 decision on whether to ignore the unaligned request?
>>> Yes, I agree with Peter as to this.
>> Okay, I'll work on patches. I think it counts as bug fix, so appropriate
>> even if I miss soft freeze (I'd still like to get NBD write zero support
>> into 2.8, since it already missed 2.7, but that one is still awaiting
>> review with not much time left).
>>
> 
> Hi Eric,
> 
> have you had time to look at this?
> If you need help, let me know.

Patches posted, but testing help would be appreciated since you have the
actual hardware that exhibits the issue.

I'm also trying to write a patch to extend the blkdebug driver to share
this "feature" of a 15M page, and write a qemu-iotest to make it harder
to regress in the future.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-11-08 Thread Eric Blake
On 11/08/2016 05:03 AM, Peter Lieven wrote:
> Am 25.10.2016 um 18:12 schrieb Eric Blake:
>> On 10/25/2016 09:36 AM, Paolo Bonzini wrote:
>>>
>>> On 25/10/2016 16:35, Eric Blake wrote:
 So your argument is that we should always pass down every unaligned
 less-than-optimum discard request all the way to the hardware, rather
 than dropping it higher in the stack, even though discard requests are
 already advisory, in order to leave the hardware as the ultimate
 decision on whether to ignore the unaligned request?
>>> Yes, I agree with Peter as to this.
>> Okay, I'll work on patches. I think it counts as bug fix, so appropriate
>> even if I miss soft freeze (I'd still like to get NBD write zero support
>> into 2.8, since it already missed 2.7, but that one is still awaiting
>> review with not much time left).
>>
> 
> Hi Eric,
> 
> have you had time to look at this?
> If you need help, let me know.

Still on my list. I'm not forgetting it, and it does count as a bug fix
so it is safe for inclusion, although I'm trying to get it in before
this week is out.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-11-08 Thread Peter Lieven

Am 25.10.2016 um 18:12 schrieb Eric Blake:

On 10/25/2016 09:36 AM, Paolo Bonzini wrote:


On 25/10/2016 16:35, Eric Blake wrote:

So your argument is that we should always pass down every unaligned
less-than-optimum discard request all the way to the hardware, rather
than dropping it higher in the stack, even though discard requests are
already advisory, in order to leave the hardware as the ultimate
decision on whether to ignore the unaligned request?

Yes, I agree with Peter as to this.

Okay, I'll work on patches. I think it counts as bug fix, so appropriate
even if I miss soft freeze (I'd still like to get NBD write zero support
into 2.8, since it already missed 2.7, but that one is still awaiting
review with not much time left).



Hi Eric,

have you had time to look at this?
If you need help, let me know.

Peter



Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-10-25 Thread Paolo Bonzini


On 25/10/2016 16:35, Eric Blake wrote:
> So your argument is that we should always pass down every unaligned
> less-than-optimum discard request all the way to the hardware, rather
> than dropping it higher in the stack, even though discard requests are
> already advisory, in order to leave the hardware as the ultimate
> decision on whether to ignore the unaligned request?

Yes, I agree with Peter as to this.

Paolo



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-10-25 Thread Eric Blake
On 10/25/2016 09:20 AM, Peter Lieven wrote:
>> The firmware is probably technically buggy for advertising too large of
>> a minimum granularity, if it can piece together smaller requests into a
>> larger discard.  If discards need to happen at a smaller granularity,
>> the firmware (or kernel quirk system) should fix the advertisement to
>> the actual granularity that it will honor.  I don't see a reason to
>> change qemu's current behavior.
>>
> 
> The issue is that the optimal unmap granularity is only a hint.
> There is no evidence what happens with unaligned requests or requests
> that are smaller. They could still lead to a discard operation in the
> storage device. It just says if you can send me discards of that size
> thats optimal for me. Thats not said that smaller or unaligned requests
> have no effect.

So your argument is that we should always pass down every unaligned
less-than-optimum discard request all the way to the hardware, rather
than dropping it higher in the stack, even though discard requests are
already advisory, in order to leave the hardware as the ultimate
decision on whether to ignore the unaligned request?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-10-25 Thread Peter Lieven

Am 25.10.2016 um 15:59 schrieb Eric Blake:

On 10/25/2016 07:42 AM, Peter Lieven wrote:

But hey, that firmware is seriously weird. :)

Yes, so you would not change the new implementation?

Even if the discard is e.g. 1MB it could theretically be that internally
the device has a finer granularity. Its an optimal discard alignment
not the minimum required discard size. I think thats a difference.
It does not say I can't handle smaller discards.

The firmware is probably technically buggy for advertising too large of
a minimum granularity, if it can piece together smaller requests into a
larger discard.  If discards need to happen at a smaller granularity,
the firmware (or kernel quirk system) should fix the advertisement to
the actual granularity that it will honor.  I don't see a reason to
change qemu's current behavior.



The issue is that the optimal unmap granularity is only a hint.
There is no evidence what happens with unaligned requests or requests
that are smaller. They could still lead to a discard operation in the
storage device. It just says if you can send me discards of that size
thats optimal for me. Thats not said that smaller or unaligned requests
have no effect.

Peter




Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-10-25 Thread Eric Blake
On 10/25/2016 07:42 AM, Peter Lieven wrote:
>>
>> But hey, that firmware is seriously weird. :)
> 
> Yes, so you would not change the new implementation?
> 
> Even if the discard is e.g. 1MB it could theretically be that internally
> the device has a finer granularity. Its an optimal discard alignment
> not the minimum required discard size. I think thats a difference.
> It does not say I can't handle smaller discards.

The firmware is probably technically buggy for advertising too large of
a minimum granularity, if it can piece together smaller requests into a
larger discard.  If discards need to happen at a smaller granularity,
the firmware (or kernel quirk system) should fix the advertisement to
the actual granularity that it will honor.  I don't see a reason to
change qemu's current behavior.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-10-25 Thread Peter Lieven

Am 25.10.2016 um 14:19 schrieb Paolo Bonzini:


On 25/10/2016 14:12, Peter Lieven wrote:

Am 25.10.2016 um 14:09 schrieb Paolo Bonzini:

On 25/10/2016 14:03, Peter Lieven wrote:

Am 01.08.2016 um 11:22 schrieb Paolo Bonzini:

On 28/07/2016 04:39, Eric Blake wrote:

On 07/27/2016 01:25 AM, Fam Zheng wrote:

On Thu, 07/21 13:34, Eric Blake wrote:

+max_write_zeroes = max_write_zeroes / alignment * alignment;

Not using QEMU_ALIGN_DOWN despite patch 3?

Looks like I missed that on the rebase. Can fix if there is a
reason for
a respin.

Since Stefan acked this, I'm applying the patch and fixing it to use
QEMU_ALIGN_DOWN.

Paolo

Hi,

I came across a sort of regression we introduced with the dropping of
head and tail
of an unaligned discard.

The discard alignment that we use to trim the discard request is just a
hint.

I learned on the equallogics that a page (which is this unusal 15MB
large) is
unallocated even if the discard happens in pieces. E.g. in slices of 1MB
requests.

  From my point of view I would like to restore the old behaviour.
What do
you think?

The right logic should be the one in Linux: if splitting a request, and
the next starting sector would be misaligned, stop the discard at the
previous aligned sector.  Otherwise leave everything alone.

Just to clarify. I mean the guest would send incremental 1MB discards
we would now drop all of them if the alignment is 15MB. Previously,
we have sent all of the 1MB requests.

Yes.  In this case there would be no need at all to split the request,
so each request should be passed through.

But hey, that firmware is seriously weird. :)


Yes, so you would not change the new implementation?

Even if the discard is e.g. 1MB it could theretically be that internally
the device has a finer granularity. Its an optimal discard alignment
not the minimum required discard size. I think thats a difference.
It does not say I can't handle smaller discards.

Peter




Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-10-25 Thread Paolo Bonzini


On 25/10/2016 14:12, Peter Lieven wrote:
> Am 25.10.2016 um 14:09 schrieb Paolo Bonzini:
>>
>> On 25/10/2016 14:03, Peter Lieven wrote:
>>> Am 01.08.2016 um 11:22 schrieb Paolo Bonzini:
 On 28/07/2016 04:39, Eric Blake wrote:
> On 07/27/2016 01:25 AM, Fam Zheng wrote:
>> On Thu, 07/21 13:34, Eric Blake wrote:
>>> +max_write_zeroes = max_write_zeroes / alignment * alignment;
>> Not using QEMU_ALIGN_DOWN despite patch 3?
> Looks like I missed that on the rebase. Can fix if there is a
> reason for
> a respin.
 Since Stefan acked this, I'm applying the patch and fixing it to use
 QEMU_ALIGN_DOWN.

 Paolo
>>> Hi,
>>>
>>> I came across a sort of regression we introduced with the dropping of
>>> head and tail
>>> of an unaligned discard.
>>>
>>> The discard alignment that we use to trim the discard request is just a
>>> hint.
>>>
>>> I learned on the equallogics that a page (which is this unusal 15MB
>>> large) is
>>> unallocated even if the discard happens in pieces. E.g. in slices of 1MB
>>> requests.
>>>
>>>  From my point of view I would like to restore the old behaviour.
>>> What do
>>> you think?
>> The right logic should be the one in Linux: if splitting a request, and
>> the next starting sector would be misaligned, stop the discard at the
>> previous aligned sector.  Otherwise leave everything alone.
> 
> Just to clarify. I mean the guest would send incremental 1MB discards
> we would now drop all of them if the alignment is 15MB. Previously,
> we have sent all of the 1MB requests.

Yes.  In this case there would be no need at all to split the request,
so each request should be passed through.

But hey, that firmware is seriously weird. :)

Paolo



Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-10-25 Thread Peter Lieven

Am 25.10.2016 um 14:09 schrieb Paolo Bonzini:


On 25/10/2016 14:03, Peter Lieven wrote:

Am 01.08.2016 um 11:22 schrieb Paolo Bonzini:

On 28/07/2016 04:39, Eric Blake wrote:

On 07/27/2016 01:25 AM, Fam Zheng wrote:

On Thu, 07/21 13:34, Eric Blake wrote:

+max_write_zeroes = max_write_zeroes / alignment * alignment;

Not using QEMU_ALIGN_DOWN despite patch 3?

Looks like I missed that on the rebase. Can fix if there is a reason for
a respin.

Since Stefan acked this, I'm applying the patch and fixing it to use
QEMU_ALIGN_DOWN.

Paolo

Hi,

I came across a sort of regression we introduced with the dropping of
head and tail
of an unaligned discard.

The discard alignment that we use to trim the discard request is just a
hint.

I learned on the equallogics that a page (which is this unusal 15MB
large) is
unallocated even if the discard happens in pieces. E.g. in slices of 1MB
requests.

 From my point of view I would like to restore the old behaviour. What do
you think?

The right logic should be the one in Linux: if splitting a request, and
the next starting sector would be misaligned, stop the discard at the
previous aligned sector.  Otherwise leave everything alone.


Just to clarify. I mean the guest would send incremental 1MB discards
we would now drop all of them if the alignment is 15MB. Previously,
we have sent all of the 1MB requests.

Peter




Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-10-25 Thread Paolo Bonzini


On 25/10/2016 14:03, Peter Lieven wrote:
> Am 01.08.2016 um 11:22 schrieb Paolo Bonzini:
>>
>> On 28/07/2016 04:39, Eric Blake wrote:
>>> On 07/27/2016 01:25 AM, Fam Zheng wrote:
 On Thu, 07/21 13:34, Eric Blake wrote:
> +max_write_zeroes = max_write_zeroes / alignment * alignment;
 Not using QEMU_ALIGN_DOWN despite patch 3?
>>> Looks like I missed that on the rebase. Can fix if there is a reason for
>>> a respin.
>> Since Stefan acked this, I'm applying the patch and fixing it to use
>> QEMU_ALIGN_DOWN.
>>
>> Paolo
> 
> Hi,
> 
> I came across a sort of regression we introduced with the dropping of
> head and tail
> of an unaligned discard.
> 
> The discard alignment that we use to trim the discard request is just a
> hint.
> 
> I learned on the equallogics that a page (which is this unusal 15MB
> large) is
> unallocated even if the discard happens in pieces. E.g. in slices of 1MB
> requests.
> 
> From my point of view I would like to restore the old behaviour. What do
> you think?

The right logic should be the one in Linux: if splitting a request, and
the next starting sector would be misaligned, stop the discard at the
previous aligned sector.  Otherwise leave everything alone.

Paolo



Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-10-25 Thread Peter Lieven

Am 01.08.2016 um 11:22 schrieb Paolo Bonzini:


On 28/07/2016 04:39, Eric Blake wrote:

On 07/27/2016 01:25 AM, Fam Zheng wrote:

On Thu, 07/21 13:34, Eric Blake wrote:

+max_write_zeroes = max_write_zeroes / alignment * alignment;

Not using QEMU_ALIGN_DOWN despite patch 3?

Looks like I missed that on the rebase. Can fix if there is a reason for
a respin.

Since Stefan acked this, I'm applying the patch and fixing it to use
QEMU_ALIGN_DOWN.

Paolo


Hi,

I came across a sort of regression we introduced with the dropping of head and 
tail
of an unaligned discard.

The discard alignment that we use to trim the discard request is just a hint.

I learned on the equallogics that a page (which is this unusal 15MB large) is
unallocated even if the discard happens in pieces. E.g. in slices of 1MB 
requests.

From my point of view I would like to restore the old behaviour. What do you 
think?

Thanks,
Peter




Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-08-01 Thread Paolo Bonzini


On 28/07/2016 04:39, Eric Blake wrote:
> On 07/27/2016 01:25 AM, Fam Zheng wrote:
>> On Thu, 07/21 13:34, Eric Blake wrote:
>>> +max_write_zeroes = max_write_zeroes / alignment * alignment;
>>
>> Not using QEMU_ALIGN_DOWN despite patch 3?
> 
> Looks like I missed that on the rebase. Can fix if there is a reason for
> a respin.

Since Stefan acked this, I'm applying the patch and fixing it to use
QEMU_ALIGN_DOWN.

Paolo



Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-07-27 Thread Eric Blake
On 07/27/2016 01:25 AM, Fam Zheng wrote:
> On Thu, 07/21 13:34, Eric Blake wrote:
>> +max_write_zeroes = max_write_zeroes / alignment * alignment;
> 
> Not using QEMU_ALIGN_DOWN despite patch 3?

Looks like I missed that on the rebase. Can fix if there is a reason for
a respin.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-07-27 Thread Fam Zheng
On Thu, 07/21 13:34, Eric Blake wrote:
> +max_write_zeroes = max_write_zeroes / alignment * alignment;

Not using QEMU_ALIGN_DOWN despite patch 3?




Re: [Qemu-block] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-07-26 Thread Stefan Hajnoczi
On Thu, Jul 21, 2016 at 01:34:48PM -0600, Eric Blake wrote:
> Dell Equallogic iSCSI SANs have a very unusual advertised geometry:
> 
> $ iscsi-inq -e 1 -c $((0xb0)) iscsi://XXX/0
> wsnz:0
> maximum compare and write length:1
> optimal transfer length granularity:0
> maximum transfer length:0
> optimal transfer length:0
> maximum prefetch xdread xdwrite transfer length:0
> maximum unmap lba count:30720
> maximum unmap block descriptor count:2
> optimal unmap granularity:30720
> ugavalid:1
> unmap granularity alignment:0
> maximum write same length:30720
> 
> which says that both the maximum and the optimal discard size
> is 15M.  It is not immediately apparent if the device allows
> discard requests not aligned to the optimal size, nor if it
> allows discards at a finer granularity than the optimal size.
> 
> I tried to find details in the SCSI Commands Reference Manual
> Rev. A on what valid values of maximum and optimal sizes are
> permitted, but while that document mentions a "Block Limits
> VPD Page", I couldn't actually find documentation of that page
> or what values it would have, or if a SCSI device has an
> advertisement of its minimal unmap granularity.  So it is not
> obvious to me whether the Dell Equallogic device is compliance
> with the SCSI specification.
> 
> Fortunately, it is easy enough to support non-power-of-2 sizing,
> even if it means we are less efficient than truly possible when
> targetting that device (for example, it means that we refuse to
> unmap anything that is not a multiple of 15M and aligned to a
> 15M boundary, even if the device truly does support a smaller
> granularity where unmapping actually works).
> 
> Reported-by: Peter Lieven 
> Signed-off-by: Eric Blake 
> 
> ---
> Help in locating the actual specs on what SCSI requires for
> page 0xb0 would be nice. But this should at least avoid the
> assertion failures that Peter is hitting.  I was able to
> test this patch using NBD on a hacked up qemu where I made
> block/nbd.c report the same block limits, and could confirm
> the assert under qemu-io 'w -z 0 40m' and 'discard 0 40m'
> pre-patch, as well as the post-patch behavior of splitting
> things to 15M alignment ('discard 1M 15M' becomes a no-op
> because it is not aligned).  But obviously it needs to be
> tested on the actual iscsi SAN that triggered the original
> report.
> ---
>  include/block/block_int.h | 37 -
>  block/io.c| 15 +--
>  2 files changed, 29 insertions(+), 23 deletions(-)

Acked-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature