Re: [PATCH 01/23] zfcp: make DIX experimental, disabled, and independent of DIF

2018-11-28 Thread Martin K. Petersen


Steffen,

As I said, I don't have a problem with having module parameters.

> There's one more important thing that has performance impact: We need to
> pack payload and protection data into the same queue of limited
> length. So for the worst case with DIX, we have to use half the size for
> sg_tablesize to get the other half for sg_prot_tablesize.

Interesting. With the exception of NVMe over PCIe, it's kind of unusual
for modern controllers to have real limits in this area.

> This limits the maximum I/O request size and thus throughput. Using
> read_verify and write_generate does not change the tablesizes, as zfcp
> would still announce support for DIF and DIX. With the new zfcp.dif=1
> and zfcp.dix=0, we can use the full sg_tablesize for payload data and
> sg_prot_tablesize=0. (The DIF "overhead" on the fibre still exists of
> course.)
>
> Are there other ways for accomplishing this which I'm not aware of?

You can set your shost->sg_prot_tablesize. There are pathological corner
cases like dd to the raw block device where you'll suffer if there is
not a 1:1 mapping between data and protection segments. But for regular
I/O where the protection is generated over a whole bio at a time, you
can get away with something smaller.

Anyway. I don't have any problems with you making DIX experimental for
zfcp. Just want to make sure it's done for the right reasons (i.e. not
problems in SCSI or the block layer).

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 01/23] zfcp: make DIX experimental, disabled, and independent of DIF

2018-11-22 Thread Steffen Maier

Hi Martin,

On 11/21/2018 07:13 PM, Martin K. Petersen wrote:

Sorry about the delay. Travel got in the way.


No problem.


BDI_CAP_STABLE_WRITES should take care of this. What's the configuration
that fails?


Apologies, if the commit description sounds unfair. I did not mean to
blame anyone. It's just the collection of issues we saw in distros over
the years. Some of the old issues might be fixed with above zfcp patch
or common code changes. Unfortunately, I could not handle the DIX things
we saw. I think, DIF by itself provides a lot of the protection benefit
and was not affected by the encountered issues. We would like to give
users an easy way to operate in such setup.


I don't have a problem with zfcp having a parameter that affects the
host protection mask, the other drivers do that too. However, these
knobs exist exclusively for debugging and testing purposes. They are not
something regular users should twiddle to switch features on or off.

So DIF and DIX should always be enabled in the driver. And there is no
point in ever operating without DIF enabled if the hardware is capable.


Our long term plan is to make the new zfcp.dif (for DIF only) default to 
enabled once we got enough experience about zfcp stability in this mode.



If there is a desire to disable DIX protection for whatever reason
(legacy code doing bad things), do so using the block layer sysfs
knobs. That's where the policy of whether to generate and verify
protection information resides, not in the HBA driver.


Yes, we came up with udev rules to set read_verify and write_generate to 
0 in order to get DIF without DIX. However, this seems complicated for 
users, especially since we always have at least dm-multipath and maybe 
other dm targets such as LVM on top. The setting that matters is on the 
top level block device of some dm (or maybe mdraid) virtual block device 
stack. Getting this right, gets more complicated if there are also disks 
not attached through zfcp, and which may need different settings, so the 
udev rules would need somewhat involved matching. The new zfcp.dif 
parameter makes it simpler because the SCSI disk comes up with the 
desired limits and anything on top automatically inherits these block 
queue limits.


There's one more important thing that has performance impact: We need to 
pack payload and protection data into the same queue of limited length. 
So for the worst case with DIX, we have to use half the size for 
sg_tablesize to get the other half for sg_prot_tablesize. This limits 
the maximum I/O request size and thus throughput. Using read_verify and 
write_generate does not change the tablesizes, as zfcp would still 
announce support for DIF and DIX. With the new zfcp.dif=1 and 
zfcp.dix=0, we can use the full sg_tablesize for payload data and 
sg_prot_tablesize=0. (The DIF "overhead" on the fibre still exists of 
course.)


Are there other ways for accomplishing this which I'm not aware of?


And if there are unaddressed issues in the I/O stack that prevents you
from having integrity enabled, I'd prefer to know about them so they can
be fixed rather than circumventing them through driver module parameter.


Sure.

--
Mit freundlichen Gruessen / Kind regards
Steffen Maier

Linux on IBM Z Development

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294



Re: [PATCH 01/23] zfcp: make DIX experimental, disabled, and independent of DIF

2018-11-21 Thread Martin K. Petersen


Hi Steffen,

Sorry about the delay. Travel got in the way.

>> BDI_CAP_STABLE_WRITES should take care of this. What's the configuration
>> that fails?
>
> Apologies, if the commit description sounds unfair. I did not mean to
> blame anyone. It's just the collection of issues we saw in distros over
> the years. Some of the old issues might be fixed with above zfcp patch
> or common code changes. Unfortunately, I could not handle the DIX things
> we saw. I think, DIF by itself provides a lot of the protection benefit
> and was not affected by the encountered issues. We would like to give
> users an easy way to operate in such setup.

I don't have a problem with zfcp having a parameter that affects the
host protection mask, the other drivers do that too. However, these
knobs exist exclusively for debugging and testing purposes. They are not
something regular users should twiddle to switch features on or off.

So DIF and DIX should always be enabled in the driver. And there is no
point in ever operating without DIF enabled if the hardware is capable.

If there is a desire to disable DIX protection for whatever reason
(legacy code doing bad things), do so using the block layer sysfs
knobs. That's where the policy of whether to generate and verify
protection information resides, not in the HBA driver.

And if there are unaddressed issues in the I/O stack that prevents you
from having integrity enabled, I'd prefer to know about them so they can
be fixed rather than circumventing them through driver module parameter.

Hope that makes sense.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 01/23] zfcp: make DIX experimental, disabled, and independent of DIF

2018-11-09 Thread Steffen Maier

Hi Martin,

On 11/09/2018 03:07 AM, Martin K. Petersen wrote:

There are too many unresolved issues with DIX outside of zfcp such as
wrong protection data on writesame/discard (over device-mapper)


We don't configure protected transfers for anything but read and write
commands. There is currently no protection information generated for
WRITE SAME.



So if you guys are seeing failures, it must be due to zfcp
not handling the scsi_cmnd prot_op/prot_flags or the command PROTECT bit
correctly.


I think we're good since
("scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/s390/scsi?id=71b8e45da51a7b64a23378221c0a5868bd79da4f.

Previously, at least regular (non-recovery) I/O should have been good by 
having checked at least scsi_prot_sg_count().



or due to unstable page writes.


BDI_CAP_STABLE_WRITES should take care of this. What's the configuration
that fails?


Apologies, if the commit description sounds unfair. I did not mean to 
blame anyone. It's just the collection of issues we saw in distros over 
the years. Some of the old issues might be fixed with above zfcp patch 
or common code changes. Unfortunately, I could not handle the DIX things 
we saw. I think, DIF by itself provides a lot of the protection benefit 
and was not affected by the encountered issues. We would like to give 
users an easy way to operate in such setup.



--
Mit freundlichen Gruessen / Kind regards
Steffen Maier

Linux on IBM Z Development

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294



Re: [PATCH 01/23] zfcp: make DIX experimental, disabled, and independent of DIF

2018-11-08 Thread Martin K. Petersen


Steffen,

> There are too many unresolved issues with DIX outside of zfcp such as
> wrong protection data on writesame/discard (over device-mapper)

We don't configure protected transfers for anything but read and write
commands. There is currently no protection information generated for
WRITE SAME. So if you guys are seeing failures, it must be due to zfcp
not handling the scsi_cmnd prot_op/prot_flags or the command PROTECT bit
correctly.

> or due to unstable page writes.

BDI_CAP_STABLE_WRITES should take care of this. What's the configuration
that fails?

-- 
Martin K. Petersen  Oracle Linux Engineering