Re: [PATCH] sd: fix uninitialized variable access in error handling

2016-10-21 Thread Shaun Tancheff
On Fri, Oct 21, 2016 at 10:32 AM, Arnd Bergmann <a...@arndb.de> wrote:
> If sd_zbc_report_zones fails, the check for 'zone_blocks == 0'
> later in the function accesses uninitialized data:
>
> drivers/scsi/sd_zbc.c: In function ‘sd_zbc_read_zones’:
> drivers/scsi/sd_zbc.c:520:7: error: ‘zone_blocks’ may be used uninitialized 
> in this function [-Werror=maybe-uninitialized]
>
> This sets it to zero, which has the desired effect of leaving
> the sd_zbc_read_zones successfully with sdkp->zone_blocks = 0.
>
> Fixes: 89d947561077 ("sd: Implement support for ZBC devices")
> Signed-off-by: Arnd Bergmann <a...@arndb.de>
> ---
>  drivers/scsi/sd_zbc.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
> index 16d3fa62d8ac..d5b3bd915d9e 100644
> --- a/drivers/scsi/sd_zbc.c
> +++ b/drivers/scsi/sd_zbc.c
> @@ -455,8 +455,10 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
>
> /* Do a report zone to get the same field */
> ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, 0);
> -   if (ret)
> +   if (ret) {
> +   zone_blocks = 0;
> goto out;
> +   }
>
> same = buf[4] & 0x0f;
> if (same > 0) {
> --
> 2.9.0
>
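
The pattern this fix applies can be sketched outside the kernel as follows. This is only a minimal illustration, not the sd_zbc.c code: fake_report_zones() and check_zone_size() are hypothetical stand-ins, and the error value is arbitrary. The point is that a variable tested after the out: label must be assigned on every path that jumps there.

```c
#include <stdint.h>

/* Hypothetical stand-in for sd_zbc_report_zones(): non-zero means failure. */
static int fake_report_zones(int fail, uint64_t *first_zone_len)
{
	if (fail)
		return -5;	/* arbitrary negative error code for the sketch */
	*first_zone_len = 524288;
	return 0;
}

/* Returns the zone size in blocks, or 0 when it could not be determined. */
static uint64_t check_zone_size(int fail)
{
	uint64_t zone_blocks;	/* deliberately not initialized here */
	uint64_t len;
	int ret;

	ret = fake_report_zones(fail, &len);
	if (ret) {
		zone_blocks = 0;	/* the fix: define the value before jumping */
		goto out;
	}
	zone_blocks = len;
out:
	return zone_blocks;
}
```

Without the assignment in the error branch, the return after out: would read an indeterminate value, which is what the compiler warning points at.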

Reviewed-by: Shaun Tancheff <shaun.tanch...@seagate.com>
-- 
Shaun Tancheff
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 6/7] sd: Implement support for ZBC devices

2016-10-18 Thread Shaun Tancheff
On Tue, Oct 18, 2016 at 11:58 AM, Jeff Moyer wrote:
> Damien Le Moal writes:
>
>> + if (!is_power_of_2(zone_blocks)) {
>> + if (sdkp->first_scan)
>> + sd_printk(KERN_NOTICE, sdkp,
>> +   "Devices with non power of 2 zone "
>> +   "size are not supported\n");
>> + return -ENODEV;
>> + }
>
> Are power of 2 zone sizes required by the standard?  I see why you've
> done this, but I wonder if we're artificially limiting the
> implementation, and whether there will be valid devices on the market
> that simply won't work with Linux because of this.

The standard does not require power of 2 zones.
That said, I am not aware of any current (or planned) devices with a
zone size that is not a power of 2.
Common zone sizes I am aware of: 256MiB, 128MiB and 1GiB.

Also note that we are excluding the runt zone from the power of 2 expectation.

So conforming devices should (excluding a runt zone):
  - Have zones of the same size.
  - Choose a zone size that is a power of 2.
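
For reference, the power of 2 test itself is cheap. The sketch below is a user-space illustration only; zone_size_is_pow2() and mib_to_sectors() are hypothetical helpers, not the kernel's is_power_of_2(), and the sizes are expressed in 512-byte sectors as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* A non-zero value is a power of 2 when exactly one bit is set. */
static bool zone_size_is_pow2(uint64_t zone_blocks)
{
	return zone_blocks != 0 && (zone_blocks & (zone_blocks - 1)) == 0;
}

/* Zone size in 512-byte sectors for a size given in MiB. */
static uint64_t mib_to_sectors(uint64_t mib)
{
	return mib * 2048;
}
```

All three common sizes listed above (128MiB, 256MiB, 1GiB) pass this test, while something like a 100MiB zone would not.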

--Shaun

> -Jeff
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  
> https://urldefense.proofpoint.com/v2/url?u=http-3A__vger.kernel.org_majordomo-2Dinfo.html=DQIBAg=IGDlg0lD0b-nebmJJ0Kp8A=Wg5NqlNlVTT7Ugl8V50qIHLe856QW0qfG3WVYGOrWzA=A15hLQb19nr4vdRr1Bbbn98FLSj_y-C0VI6FtiA9V_I=rVkinUiv-ZJHIfhlk2VVJM7S2dJtvxOCmwbKMuiOCPU=


Re: [PATCH v5 2/4] fusion: remove iopriority handling

2016-10-13 Thread Shaun Tancheff
On Thu, Oct 13, 2016 at 6:00 PM, Adam Manzanares
 wrote:
> The request priority is now by default coming from the ioc. It was not
> clear what this code was trying to do based upon the iopriority class or
> data. The driver should check that a device supports priorities and use
> them according to the specifications of ioprio.
>
> Signed-off-by: Adam Manzanares 
> ---
>  drivers/message/fusion/mptscsih.c | 5 -
>  1 file changed, 5 deletions(-)
>
> diff --git a/drivers/message/fusion/mptscsih.c 
> b/drivers/message/fusion/mptscsih.c
> index 6c9fc11..4740bb6 100644
> --- a/drivers/message/fusion/mptscsih.c
> +++ b/drivers/message/fusion/mptscsih.c
> @@ -1369,11 +1369,6 @@ mptscsih_qcmd(struct scsi_cmnd *SCpnt)
> if ((vdevice->vtarget->tflags & MPT_TARGET_FLAGS_Q_YES)
> && (SCpnt->device->tagged_supported)) {
> scsictl = scsidir | MPI_SCSIIO_CONTROL_SIMPLEQ;
> -   if (SCpnt->request && SCpnt->request->ioprio) {
> -   if (((SCpnt->request->ioprio & 0x7) == 1) ||
> -   !(SCpnt->request->ioprio & 0x7))
> -   scsictl |= MPI_SCSIIO_CONTROL_HEADOFQ;
> -   }
> } else
> scsictl = scsidir | MPI_SCSIIO_CONTROL_UNTAGGED;

Style-wise, you can also remove the extra parens around
  SCpnt->device->tagged_supported
as well as the now-redundant braces.

Regards,
Shaun

> --
> 2.1.4
>


[PATCH v6 6/7] sd: Implement support for ZBC devices

2016-09-30 Thread Shaun Tancheff
From: Hannes Reinecke <h...@suse.de>

Implement ZBC support functions to setup zoned disks, both
host-managed and host-aware models. Only zoned disks that satisfy
the following conditions are supported:
1) All zones are the same size, with the possible exception of a
   smaller last runt zone.
2) For host-managed disks, reads are unrestricted (reads are not
   failed due to zone or write pointer alignment constraints).
Zoned disks that do not satisfy these 2 conditions are setup with
a capacity of 0 to prevent their use.
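
Condition 1 can be sketched as a simple pass over the reported zone lengths. This is a user-space illustration under stated assumptions, not the check done in sd_zbc.c: zones_are_uniform() is a hypothetical helper and the lengths are assumed to be given in LBA order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * len[] holds the zone lengths in logical blocks, in LBA order.
 * Every zone must match the first one, except that the last zone
 * may be smaller (a runt zone).
 */
static bool zones_are_uniform(const uint64_t *len, size_t nr_zones)
{
	size_t i;

	if (nr_zones == 0 || len[0] == 0)
		return false;

	for (i = 1; i < nr_zones; i++) {
		if (len[i] == len[0])
			continue;
		/* only the last zone may differ, and only by being smaller */
		if (i == nr_zones - 1 && len[i] != 0 && len[i] < len[0])
			continue;
		return false;
	}
	return true;
}
```

A device failing such a check is the kind that would be set up with a capacity of 0 as described above.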

The function sd_zbc_read_zones, called from sd_revalidate_disk,
checks that the device satisfies the above two constraints. This
function may also change the disk capacity previously set by
sd_read_capacity for devices reporting only the capacity of
conventional zones at the beginning of the LBA range (i.e. devices
reporting rc_basis set to 0).

The capacity message output was moved out of sd_read_capacity into
a new function sd_print_capacity to include this eventual capacity
change by sd_zbc_read_zones. This new function also includes a call
to sd_zbc_print_zones to display the number of zones and zone size
of the device.

Signed-off-by: Hannes Reinecke <h...@suse.de>

[Damien: * Removed zone cache support
 * Removed mapping of discard to reset write pointer command
 * Modified sd_zbc_read_zones to include checks that the
   device satisfies the kernel constraints
 * Implemented REPORT ZONES setup and post-processing based
   on code from Shaun Tancheff <shaun.tanch...@seagate.com>
 * Removed confusing use of 512B sector units in functions
   interface]
Signed-off-by: Damien Le Moal <damien.lem...@hgst.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Shaun Tancheff <shaun.tanch...@seagate.com>
Tested-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
Changes from v5:
* Rebased on Jens' for-4.9/block branch (v5 is based on next-20160928).

 drivers/scsi/Makefile |   1 +
 drivers/scsi/sd.c | 141 ---
 drivers/scsi/sd.h |  67 +
 drivers/scsi/sd_zbc.c | 627 ++
 include/scsi/scsi_proto.h |  17 ++
 5 files changed, 820 insertions(+), 33 deletions(-)
 create mode 100644 drivers/scsi/sd_zbc.c

diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index d539798..fabcb6d 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -179,6 +179,7 @@ hv_storvsc-y:= storvsc_drv.o
 
 sd_mod-objs:= sd.o
 sd_mod-$(CONFIG_BLK_DEV_INTEGRITY) += sd_dif.o
+sd_mod-$(CONFIG_BLK_DEV_ZONED) += sd_zbc.o
 
 sr_mod-objs:= sr.o sr_ioctl.o sr_vendor.o
 ncr53c8xx-flags-$(CONFIG_SCSI_ZALON) \
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d3e852a..fb324ac 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -92,6 +92,7 @@ MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_DISK15_MAJOR);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_DISK);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
+MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
 
 #if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
 #define SD_MINORS  16
@@ -162,7 +163,7 @@ cache_type_store(struct device *dev, struct 
device_attribute *attr,
static const char temp[] = "temporary ";
int len;
 
-   if (sdp->type != TYPE_DISK)
+   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
/* no cache control on RBC devices; theoretically they
 * can do it, but there's probably so many exceptions
 * it's not worth the risk */
@@ -261,7 +262,7 @@ allow_restart_store(struct device *dev, struct 
device_attribute *attr,
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
 
-   if (sdp->type != TYPE_DISK)
+   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
return -EINVAL;
 
sdp->allow_restart = simple_strtoul(buf, NULL, 10);
@@ -391,6 +392,11 @@ provisioning_mode_store(struct device *dev, struct 
device_attribute *attr,
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
 
+   if (sd_is_zoned(sdkp)) {
+   sd_config_discard(sdkp, SD_LBP_DISABLE);
+   return count;
+   }
+
if (sdp->type != TYPE_DISK)
return -EINVAL;
 
@@ -458,7 +464,7 @@ max_write_same_blocks_store(struct device *dev, struct 
device_attribute *attr,
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
 
-   if (sdp->type != TYPE_DISK)
+   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
return -EINVAL;
 
err = kstrtoul(buf, 10, &max);
@@ -843,6 +849,12 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
 
BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
 
+   if (sd_is_zoned(sdkp)) {
+   ret = sd_zbc_setup_read_write(cmd);

[PATCH v6 5/7] block: Implement support for zoned block devices

2016-09-30 Thread Shaun Tancheff
From: Hannes Reinecke <h...@suse.de>

Implement zoned block device zone information reporting and reset.
Zone information is reported as struct blk_zone. This implementation
does not differentiate between host-aware and host-managed device
models and is valid for both. Two functions are provided:
blkdev_report_zones for discovering the zone configuration of a
zoned block device, and blkdev_reset_zones for resetting the write
pointer of sequential zones. The helper functions blk_queue_zone_size
and bdev_zone_size are also provided for, as the names suggest,
obtaining the zone size (in 512B sectors) of the zones of the device.
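
With a zone size that is a power of 2, locating the zone containing a sector is plain mask arithmetic. The sketch below is a user-space illustration; sect_t, zone_start() and zone_number() are names chosen for the example, and zone_sectors is assumed to be the zone size in 512B sectors as returned by the helpers above.

```c
#include <stdint.h>

typedef uint64_t sect_t;	/* local stand-in for the kernel's sector_t */

/* First sector of the zone containing `sector` (zone size must be 2^n). */
static sect_t zone_start(sect_t zone_sectors, sect_t sector)
{
	sect_t zone_mask = zone_sectors - 1;

	return sector & ~zone_mask;
}

/* Index of the zone containing `sector`. */
static uint64_t zone_number(sect_t zone_sectors, sect_t sector)
{
	return sector / zone_sectors;
}
```

For example, with 256MiB zones (524288 sectors), sector 524289 belongs to zone 1, which starts at sector 524288.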

Signed-off-by: Hannes Reinecke <h...@suse.de>

[Damien: * Removed the zone cache
 * Implement report zones operation based on earlier proposal
       by Shaun Tancheff <shaun.tanch...@seagate.com>]
Signed-off-by: Damien Le Moal <damien.lem...@hgst.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Martin K. Petersen <martin.peter...@oracle.com>
Reviewed-by: Shaun Tancheff <shaun.tanch...@seagate.com>
Tested-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
Changes from v5:
* Rebased on Jens' for-4.9/block branch (v5 is based on next-20160928).

 block/Kconfig |   8 ++
 block/Makefile|   2 +-
 block/blk-zoned.c | 257 ++
 include/linux/blkdev.h|  31 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned.h | 103 +
 6 files changed, 401 insertions(+), 1 deletion(-)
 create mode 100644 block/blk-zoned.c
 create mode 100644 include/uapi/linux/blkzoned.h

diff --git a/block/Kconfig b/block/Kconfig
index 5136ad4..7bb9bf8 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -89,6 +89,14 @@ config BLK_DEV_INTEGRITY
T10/SCSI Data Integrity Field or the T13/ATA External Path
Protection.  If in doubt, say N.
 
+config BLK_DEV_ZONED
+   bool "Zoned block device support"
+   ---help---
+   Block layer zoned block device support. This option enables
+   support for ZAC/ZBC host-managed and host-aware zoned block devices.
+
+   Say yes here if you have a ZAC or ZBC storage device.
+
 config BLK_DEV_THROTTLING
bool "Block layer bio throttling support"
depends on BLK_CGROUP=y
diff --git a/block/Makefile b/block/Makefile
index 9eda232..4676969 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -22,4 +22,4 @@ obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
 obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
 obj-$(CONFIG_BLK_CMDLINE_PARSER)   += cmdline-parser.o
 obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
-
+obj-$(CONFIG_BLK_DEV_ZONED)+= blk-zoned.o
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
new file mode 100644
index 000..1603573
--- /dev/null
+++ b/block/blk-zoned.c
@@ -0,0 +1,257 @@
+/*
+ * Zoned block device handling
+ *
+ * Copyright (c) 2015, Hannes Reinecke
+ * Copyright (c) 2015, SUSE Linux GmbH
+ *
+ * Copyright (c) 2016, Damien Le Moal
+ * Copyright (c) 2016, Western Digital
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+static inline sector_t blk_zone_start(struct request_queue *q,
+ sector_t sector)
+{
+   sector_t zone_mask = blk_queue_zone_size(q) - 1;
+
+   return sector & ~zone_mask;
+}
+
+/*
+ * Check that a zone report belongs to the partition.
+ * If yes, fix its start sector and write pointer, copy it in the
+ * zone information array and return true. Return false otherwise.
+ */
+static bool blkdev_report_zone(struct block_device *bdev,
+  struct blk_zone *rep,
+  struct blk_zone *zone)
+{
+   sector_t offset = get_start_sect(bdev);
+
+   if (rep->start < offset)
+   return false;
+
+   rep->start -= offset;
+   if (rep->start + rep->len > bdev->bd_part->nr_sects)
+   return false;
+
+   if (rep->type == BLK_ZONE_TYPE_CONVENTIONAL)
+   rep->wp = rep->start + rep->len;
+   else
+   rep->wp -= offset;
+   memcpy(zone, rep, sizeof(struct blk_zone));
+
+   return true;
+}
+
+/**
+ * blkdev_report_zones - Get zones information
+ * @bdev:  Target block device
+ * @sector:Sector from which to report zones
+ * @zones: Array of zone structures where to return the zones information
+ * @nr_zones:  Number of zone structures in the zone array
+ * @gfp_mask:  Memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *Get zone information starting from the zone containing @sector.
+ *The number of zone information reported may be less than the number
+ *requested by @nr_zones. The number of zones actually reported is
+ *returned in @nr_zones.
+ */
+int blkdev_report_zones(struct block_device *bdev,
+   

[PATCH v6 7/7] blk-zoned: implement ioctls

2016-09-30 Thread Shaun Tancheff
Adds the new BLKREPORTZONE and BLKRESETZONE ioctls for respectively
obtaining the zone configuration of a zoned block device and resetting
the write pointer of sequential zones of a zoned block device.

The BLKREPORTZONE ioctl maps directly to a single call of the function
blkdev_report_zones. The zone information result is passed as an array
of struct blk_zone identical to the structure used internally for
processing the REQ_OP_ZONE_REPORT operation.  The BLKRESETZONE ioctl
maps to a single call of the blkdev_reset_zones function.
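
From user space, a BLKREPORTZONE call could look roughly like the sketch below. It assumes the UAPI header added by this series is installed as <linux/blkzoned.h> and that struct blk_zone_report carries the sector and nr_zones fields followed by the zones array, as described above. print_zones() is a hypothetical helper, and the caller is expected to have the privileges the ioctl requires.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

/*
 * Report up to nr_zones zones starting at the zone containing `sector`.
 * Returns the number of zones reported, or -1 on error.
 */
static int print_zones(int fd, unsigned long long sector,
		       unsigned int nr_zones)
{
	struct blk_zone_report *rep;
	unsigned int i;
	int ret = -1;

	/* report header immediately followed by the zone array */
	rep = calloc(1, sizeof(*rep) + nr_zones * sizeof(struct blk_zone));
	if (!rep)
		return -1;

	rep->sector = sector;
	rep->nr_zones = nr_zones;

	if (ioctl(fd, BLKREPORTZONE, rep) == 0) {
		/* nr_zones now holds the number of zones actually reported */
		for (i = 0; i < rep->nr_zones; i++) {
			struct blk_zone *z = &rep->zones[i];

			printf("zone %u: start %llu len %llu wp %llu type %u\n",
			       i,
			       (unsigned long long)z->start,
			       (unsigned long long)z->len,
			       (unsigned long long)z->wp,
			       (unsigned int)z->type);
		}
		ret = (int)rep->nr_zones;
	}

	free(rep);
	return ret;
}
```

The single allocation mirrors the layout the ioctl copies back: the header first, then the array of struct blk_zone right behind it.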

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
Signed-off-by: Damien Le Moal <damien.lem...@hgst.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Martin K. Petersen <martin.peter...@oracle.com>
---
 block/blk-zoned.c | 93 +++
 block/ioctl.c |  4 ++
 include/linux/blkdev.h| 21 ++
 include/uapi/linux/blkzoned.h | 40 +++
 include/uapi/linux/fs.h   |  4 ++
 5 files changed, 162 insertions(+)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 1603573..667f95d 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -255,3 +255,96 @@ int blkdev_reset_zones(struct block_device *bdev,
return 0;
 }
 EXPORT_SYMBOL_GPL(blkdev_reset_zones);
+
+/**
+ * BLKREPORTZONE ioctl processing.
+ * Called from blkdev_ioctl.
+ */
+int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
+ unsigned int cmd, unsigned long arg)
+{
+   void __user *argp = (void __user *)arg;
+   struct request_queue *q;
+   struct blk_zone_report rep;
+   struct blk_zone *zones;
+   int ret;
+
+   if (!argp)
+   return -EINVAL;
+
+   q = bdev_get_queue(bdev);
+   if (!q)
+   return -ENXIO;
+
+   if (!blk_queue_is_zoned(q))
+   return -ENOTTY;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EACCES;
+
+   if (copy_from_user(&rep, argp, sizeof(struct blk_zone_report)))
+   return -EFAULT;
+
+   if (!rep.nr_zones)
+   return -EINVAL;
+
+   zones = kcalloc(rep.nr_zones, sizeof(struct blk_zone), GFP_KERNEL);
+   if (!zones)
+   return -ENOMEM;
+
+   ret = blkdev_report_zones(bdev, rep.sector,
+ zones, &rep.nr_zones,
+ GFP_KERNEL);
+   if (ret)
+   goto out;
+
+   if (copy_to_user(argp, &rep, sizeof(struct blk_zone_report))) {
+   ret = -EFAULT;
+   goto out;
+   }
+
+   if (rep.nr_zones) {
+   if (copy_to_user(argp + sizeof(struct blk_zone_report), zones,
+sizeof(struct blk_zone) * rep.nr_zones))
+   ret = -EFAULT;
+   }
+
+ out:
+   kfree(zones);
+
+   return ret;
+}
+
+/**
+ * BLKRESETZONE ioctl processing.
+ * Called from blkdev_ioctl.
+ */
+int blkdev_reset_zones_ioctl(struct block_device *bdev, fmode_t mode,
+unsigned int cmd, unsigned long arg)
+{
+   void __user *argp = (void __user *)arg;
+   struct request_queue *q;
+   struct blk_zone_range zrange;
+
+   if (!argp)
+   return -EINVAL;
+
+   q = bdev_get_queue(bdev);
+   if (!q)
+   return -ENXIO;
+
+   if (!blk_queue_is_zoned(q))
+   return -ENOTTY;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EACCES;
+
+   if (!(mode & FMODE_WRITE))
+   return -EBADF;
+
+   if (copy_from_user(&zrange, argp, sizeof(struct blk_zone_range)))
+   return -EFAULT;
+
+   return blkdev_reset_zones(bdev, zrange.sector, zrange.nr_sectors,
+ GFP_KERNEL);
+}
diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..448f78a 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -513,6 +513,10 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, 
unsigned cmd,
BLKDEV_DISCARD_SECURE);
case BLKZEROOUT:
return blk_ioctl_zeroout(bdev, mode, arg);
+   case BLKREPORTZONE:
+   return blkdev_report_zones_ioctl(bdev, mode, cmd, arg);
+   case BLKRESETZONE:
+   return blkdev_reset_zones_ioctl(bdev, mode, cmd, arg);
case HDIO_GETGEO:
return blkdev_getgeo(bdev, argp);
case BLKRAGET:
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 252043f..90097dd 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -316,6 +316,27 @@ extern int blkdev_report_zones(struct block_device *bdev,
 extern int blkdev_reset_zones(struct block_device *bdev, sector_t sectors,
  sector_t nr_sectors, gfp_t gfp_mask);
 
+extern int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
+unsigned int cmd, uns

[PATCH v6 3/7] block: update chunk_sectors in blk_stack_limits()

2016-09-30 Thread Shaun Tancheff
From: Hannes Reinecke <h...@suse.de>

Signed-off-by: Hannes Reinecke <h...@suse.com>
Signed-off-by: Damien Le Moal <damien.lem...@hgst.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Martin K. Petersen <martin.peter...@oracle.com>
Reviewed-by: Shaun Tancheff <shaun.tanch...@seagate.com>
Tested-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
 block/blk-settings.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index b1d5b7f..55369a6 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -631,6 +631,10 @@ int blk_stack_limits(struct queue_limits *t, struct 
queue_limits *b,
t->discard_granularity;
}
 
+   if (b->chunk_sectors)
+   t->chunk_sectors = min_not_zero(t->chunk_sectors,
+   b->chunk_sectors);
+
return ret;
 }
 EXPORT_SYMBOL(blk_stack_limits);
-- 
2.9.3
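
Since this patch carries no description, here is a small sketch of what the added hunk does to the stacked limit. It is a user-space illustration: min_not_zero_uint() is a hypothetical function written to behave like the kernel's min_not_zero() macro (the smaller of two values, ignoring a zero), and stack_chunk_sectors() only mirrors the shape of the hunk above.

```c
/* Smaller of a and b, where a zero value means "no limit set". */
static unsigned int min_not_zero_uint(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/* Only a non-zero bottom limit changes the top (stacked) limit. */
static unsigned int stack_chunk_sectors(unsigned int top, unsigned int bottom)
{
	if (bottom)
		top = min_not_zero_uint(top, bottom);
	return top;
}
```

So a stacking device with no chunk_sectors of its own inherits the value of the underlying device, and otherwise the smaller of the two wins.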



[PATCH v6 2/7] blk-sysfs: Add 'chunk_sectors' to sysfs attributes

2016-09-30 Thread Shaun Tancheff
From: Hannes Reinecke <h...@suse.de>

The queue limits already have a 'chunk_sectors' setting, so
we should be presenting it via sysfs.

Signed-off-by: Hannes Reinecke <h...@suse.de>

[Damien: Updated Documentation/ABI/testing/sysfs-block]

Signed-off-by: Damien Le Moal <damien.lem...@hgst.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Martin K. Petersen <martin.peter...@oracle.com>
Reviewed-by: Shaun Tancheff <shaun.tanch...@seagate.com>
Tested-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
 Documentation/ABI/testing/sysfs-block | 13 +
 block/blk-sysfs.c | 11 +++
 2 files changed, 24 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block 
b/Documentation/ABI/testing/sysfs-block
index 75a5055..ee2d5cd 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -251,3 +251,16 @@ Description:
since drive-managed zoned block devices do not support
zone commands, they will be treated as regular block
devices and zoned will report "none".
+
+What:  /sys/block/<disk>/queue/chunk_sectors
+Date:  September 2016
+Contact:   Hannes Reinecke <h...@suse.com>
+Description:
+   chunk_sectors has a different meaning depending on the type
+   of the disk. For a RAID device (dm-raid), chunk_sectors
+   indicates the size in 512B sectors of the RAID volume
+   stripe segment. For a zoned block device, either
+   host-aware or host-managed, chunk_sectors indicates the
+   size in 512B sectors of the zones of the device, with
+   the eventual exception of the last zone of the device
+   which may be smaller.
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index ff9cd9c..488c2e2 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -130,6 +130,11 @@ static ssize_t queue_physical_block_size_show(struct 
request_queue *q, char *pag
return queue_var_show(queue_physical_block_size(q), page);
 }
 
+static ssize_t queue_chunk_sectors_show(struct request_queue *q, char *page)
+{
+   return queue_var_show(q->limits.chunk_sectors, page);
+}
+
 static ssize_t queue_io_min_show(struct request_queue *q, char *page)
 {
return queue_var_show(queue_io_min(q), page);
@@ -455,6 +460,11 @@ static struct queue_sysfs_entry 
queue_physical_block_size_entry = {
.show = queue_physical_block_size_show,
 };
 
+static struct queue_sysfs_entry queue_chunk_sectors_entry = {
+   .attr = {.name = "chunk_sectors", .mode = S_IRUGO },
+   .show = queue_chunk_sectors_show,
+};
+
 static struct queue_sysfs_entry queue_io_min_entry = {
.attr = {.name = "minimum_io_size", .mode = S_IRUGO },
.show = queue_io_min_show,
@@ -555,6 +565,7 @@ static struct attribute *default_attrs[] = {
&queue_hw_sector_size_entry.attr,
&queue_logical_block_size_entry.attr,
&queue_physical_block_size_entry.attr,
+   &queue_chunk_sectors_entry.attr,
&queue_io_min_entry.attr,
&queue_io_opt_entry.attr,
&queue_discard_granularity_entry.attr,
-- 
2.9.3



[PATCH v6 4/7] block: Define zoned block device operations

2016-09-30 Thread Shaun Tancheff
From: Shaun Tancheff <shaun.tanch...@seagate.com>

Define REQ_OP_ZONE_REPORT and REQ_OP_ZONE_RESET for handling zones of
host-managed and host-aware zoned block devices. With these two
new operations, the total number of operations defined reaches 8 and
still fits with the 3 bits definition of REQ_OP_BITS.
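
The counting argument can be checked at compile time. The sketch below is a stand-alone illustration with DEMO_ prefixed names, not the kernel's enum req_op: the last five entries follow the hunk in this patch, while the first three are assumed from the kernel of that time.

```c
#include <stdbool.h>

#define DEMO_OP_BITS 3

enum demo_req_op {
	DEMO_OP_READ,		/* assumed, not shown in the hunk */
	DEMO_OP_WRITE,		/* assumed, not shown in the hunk */
	DEMO_OP_DISCARD,	/* assumed, not shown in the hunk */
	DEMO_OP_SECURE_ERASE,
	DEMO_OP_WRITE_SAME,
	DEMO_OP_FLUSH,
	DEMO_OP_ZONE_REPORT,
	DEMO_OP_ZONE_RESET,
	DEMO_OP_COUNT		/* number of operations, not an operation */
};

/* 3 bits can encode the values 0..7, so at most 8 operations fit. */
_Static_assert(DEMO_OP_COUNT <= (1 << DEMO_OP_BITS),
	       "operation codes no longer fit in DEMO_OP_BITS");

static bool op_fits(unsigned int op)
{
	return op < (1u << DEMO_OP_BITS);
}
```

With 8 operations the space is exactly full, so adding a ninth operation would require widening the bit field.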

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
Signed-off-by: Damien Le Moal <damien.lem...@hgst.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Martin K. Petersen <martin.peter...@oracle.com>
---
 block/blk-core.c  | 4 
 include/linux/blk_types.h | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 14d7c07..e4eda5d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1941,6 +1941,10 @@ generic_make_request_checks(struct bio *bio)
case REQ_OP_WRITE_SAME:
if (!bdev_write_same(bio->bi_bdev))
goto not_supported;
+   case REQ_OP_ZONE_REPORT:
+   case REQ_OP_ZONE_RESET:
+   if (!bdev_is_zoned(bio->bi_bdev))
+   goto not_supported;
break;
default:
break;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index cd395ec..dd50dce 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -243,6 +243,8 @@ enum req_op {
REQ_OP_SECURE_ERASE,/* request to securely erase sectors */
REQ_OP_WRITE_SAME,  /* write same block many times */
REQ_OP_FLUSH,   /* request for cache flush */
+   REQ_OP_ZONE_REPORT, /* Get zone information */
+   REQ_OP_ZONE_RESET,  /* Reset a zone write pointer */
 };
 
 #define REQ_OP_BITS 3
-- 
2.9.3



[PATCH v6 1/7] block: Add 'zoned' queue limit

2016-09-30 Thread Shaun Tancheff
From: Damien Le Moal <damien.lem...@hgst.com>

Add the zoned queue limit to indicate the zoning model of a block device.
Defined values are 0 (BLK_ZONED_NONE) for regular block devices,
1 (BLK_ZONED_HA) for host-aware zone block devices and 2 (BLK_ZONED_HM)
for host-managed zone block devices. The drive-managed model defined
by the standards is not represented here since these block devices do
not provide any command for accessing zone information. Drive-managed
model devices will
be reported as BLK_ZONED_NONE.

The helper functions blk_queue_zoned_model and bdev_zoned_model return
the zoned limit and the functions blk_queue_is_zoned and bdev_is_zoned
return a boolean for callers to test if a block device is zoned.

The zoned attribute is also exported as a string to applications via
sysfs. BLK_ZONED_NONE shows as "none", BLK_ZONED_HA as "host-aware" and
BLK_ZONED_HM as "host-managed".
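
An application reading the sysfs attribute could map the string back to the model as sketched below. This is a user-space illustration only: the DEMO_ enum and parse_zoned_attr() are hypothetical, with the numeric values taken from the description above, and unknown strings are treated as "none" as an assumption.

```c
#include <string.h>

enum demo_zoned_model {
	DEMO_ZONED_NONE = 0,
	DEMO_ZONED_HA = 1,
	DEMO_ZONED_HM = 2
};

/* Parse the content of the zoned attribute, with or without a newline. */
static enum demo_zoned_model parse_zoned_attr(const char *s)
{
	size_t n = strcspn(s, "\n");	/* length without trailing newline */

	if (n == strlen("host-aware") && strncmp(s, "host-aware", n) == 0)
		return DEMO_ZONED_HA;
	if (n == strlen("host-managed") && strncmp(s, "host-managed", n) == 0)
		return DEMO_ZONED_HM;
	return DEMO_ZONED_NONE;
}
```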

Signed-off-by: Damien Le Moal <damien.lem...@hgst.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Martin K. Petersen <martin.peter...@oracle.com>
Reviewed-by: Shaun Tancheff <shaun.tanch...@seagate.com>
Tested-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
 Documentation/ABI/testing/sysfs-block | 16 
 block/blk-settings.c  |  1 +
 block/blk-sysfs.c | 18 ++
 include/linux/blkdev.h| 47 +++
 4 files changed, 82 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block 
b/Documentation/ABI/testing/sysfs-block
index 71d184d..75a5055 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -235,3 +235,19 @@ Description:
write_same_max_bytes is 0, write same is not supported
by the device.
 
+What:  /sys/block/<disk>/queue/zoned
+Date:  September 2016
+Contact:   Damien Le Moal <damien.lem...@hgst.com>
+Description:
+   zoned indicates if the device is a zoned block device
+   and the zone model of the device if it is indeed zoned.
+   The possible values indicated by zoned are "none" for
+   regular block devices and "host-aware" or "host-managed"
+   for zoned block devices. The characteristics of
+   host-aware and host-managed zoned block devices are
+   described in the ZBC (Zoned Block Commands) and ZAC
+   (Zoned Device ATA Command Set) standards. These standards
+   also define the "drive-managed" zone model. However,
+   since drive-managed zoned block devices do not support
+   zone commands, they will be treated as regular block
+   devices and zoned will report "none".
diff --git a/block/blk-settings.c b/block/blk-settings.c
index f679ae1..b1d5b7f 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -107,6 +107,7 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->io_opt = 0;
lim->misaligned = 0;
lim->cluster = 1;
+   lim->zoned = BLK_ZONED_NONE;
 }
 EXPORT_SYMBOL(blk_set_default_limits);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9cc8d7c..ff9cd9c 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -257,6 +257,18 @@ QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
 QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
+static ssize_t queue_zoned_show(struct request_queue *q, char *page)
+{
+   switch (blk_queue_zoned_model(q)) {
+   case BLK_ZONED_HA:
+   return sprintf(page, "host-aware\n");
+   case BLK_ZONED_HM:
+   return sprintf(page, "host-managed\n");
+   default:
+   return sprintf(page, "none\n");
+   }
+}
+
 static ssize_t queue_nomerges_show(struct request_queue *q, char *page)
 {
return queue_var_show((blk_queue_nomerges(q) << 1) |
@@ -485,6 +497,11 @@ static struct queue_sysfs_entry queue_nonrot_entry = {
.store = queue_store_nonrot,
 };
 
+static struct queue_sysfs_entry queue_zoned_entry = {
+   .attr = {.name = "zoned", .mode = S_IRUGO },
+   .show = queue_zoned_show,
+};
+
 static struct queue_sysfs_entry queue_nomerges_entry = {
.attr = {.name = "nomerges", .mode = S_IRUGO | S_IWUSR },
.show = queue_nomerges_show,
@@ -546,6 +563,7 @@ static struct attribute *default_attrs[] = {
&queue_discard_zeroes_data_entry.attr,
&queue_write_same_max_entry.attr,
&queue_nonrot_entry.attr,
+   &queue_zoned_entry.attr,
&queue_nomerges_entry.attr,
&queue_rq_affinity_entry.attr,
&queue_iostats_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..f19e16b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -261,6 +261,15 @@ struct blk_queue_tag 

[PATCH v6 0/7] ZBC / Zoned block device support

2016-09-30 Thread Shaun Tancheff
This series introduces support for zoned block devices. It integrates
earlier submissions by Hannes Reinecke, Damien Le Moal and Shaun Tancheff.
Compared to the previous series version, the code was significantly
simplified by limiting support to zoned devices satisfying the following
conditions:
1) All zones of the device are the same size, with the possible exception
   of a smaller last runt zone.
2) For host-managed disks, reads must be unrestricted (read commands do not
   fail due to zone or write pointer alignment constraints).
Zoned disks that do not satisfy these 2 conditions are ignored.

These 2 conditions allowed dropping the zone information cache implemented
in the previous version. This simplifies the code and also reduces the memory
consumption at run time. Support for zoned devices now only requires one bit
per zone (less than 8KB in total). This bit field is used to write-lock
zones and prevent the concurrent execution of multiple write commands in
the same zone. This avoids write ordering problems at dispatch time, for
both the simple queue and scsi-mq settings.
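
The one bit per zone scheme can be sketched in user space as follows. This is an illustration under stated assumptions, not the sd_zbc.c implementation: it uses C11 atomics where the kernel would use its own bit operations, and zone_bitmap_bytes(), zone_trylock() and zone_unlock() are hypothetical names.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Bytes needed for one bit per zone, rounded up to whole words. */
static size_t zone_bitmap_bytes(size_t nr_zones)
{
	size_t words = (nr_zones + BITS_PER_WORD - 1) / BITS_PER_WORD;

	return words * sizeof(unsigned long);
}

/* Try to take the write lock of zone zno; returns true on success. */
static bool zone_trylock(atomic_ulong *bitmap, size_t zno)
{
	unsigned long mask = 1UL << (zno % BITS_PER_WORD);
	unsigned long old = atomic_fetch_or(&bitmap[zno / BITS_PER_WORD], mask);

	return (old & mask) == 0;
}

/* Release the write lock of zone zno. */
static void zone_unlock(atomic_ulong *bitmap, size_t zno)
{
	unsigned long mask = 1UL << (zno % BITS_PER_WORD);

	atomic_fetch_and(&bitmap[zno / BITS_PER_WORD], ~mask);
}
```

As a sanity check on the size estimate, 65536 zones need 8192 bytes of bitmap, and 65536 zones of 256MiB each already amount to 16TiB of capacity.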

The new operations introduced to support zone manipulation were reduced to
only the two main ZBC/ZAC defined commands: REPORT ZONES (REQ_OP_ZONE_REPORT)
and RESET WRITE POINTER (REQ_OP_ZONE_RESET). This brings the total number of
operations defined to 8, which fits in the 3 bits (REQ_OP_BITS) reserved for
operation code in bio->bi_opf and req->cmd_flags.

Most of the ZBC specific code is kept out of sd.c and implemented in the
new file sd_zbc.c. Similarly, at the block layer, most of the zoned block
device code is implemented in the new blk-zoned.c.

For host-managed zoned block devices, the sequential write constraint of
write pointer zones is exposed to the user. Users of the disk (applications,
file systems or device mappers) must sequentially write to zones. This means
that for raw block device accesses from applications, buffered writes are
unreliable and direct I/Os must be used (or buffered writes with O_SYNC).

Access to zone manipulation operations is also provided to applications
through a set of new ioctls. This allows applications operating on raw
block devices (e.g. mkfs.xxx) to discover a device zone layout and
manipulate zone state.

Changes from v5:
* Rebased on Jens' for-4.9/block branch (v5 is based on next-20160928).

Changes from v4:
* Changed interface of sd_zbc_setup_read_write

Changes from v3:
* Fixed several typos and tabs/spaces
* Added description of zoned and chunk_sectors queue attributes in
  Documentation/ABI/testing/sysfs-block
* Fixed sd_read_capacity call in sd.c to avoid missing information on
  the first pass of a disk scan
* Fixed scsi_disk zone related field to use logical block size unit instead
  of 512B sector unit.

Changes from v2:
* Use kcalloc to allocate zone information array for ioctl
* Export GPL the functions blkdev_report_zones and blkdev_reset_zones
* Shuffled uapi definitions from patch 7 into patch 5


Damien Le Moal (1):
  block: Add 'zoned' queue limit

Hannes Reinecke (4):
  blk-sysfs: Add 'chunk_sectors' to sysfs attributes
  block: update chunk_sectors in blk_stack_limits()
  block: Implement support for zoned block devices
  sd: Implement support for ZBC devices

Shaun Tancheff (2):
  block: Define zoned block device operations
  blk-zoned: implement ioctls

 Documentation/ABI/testing/sysfs-block |  29 ++
 block/Kconfig |   8 +
 block/Makefile|   2 +-
 block/blk-core.c  |   4 +
 block/blk-settings.c  |   5 +
 block/blk-sysfs.c |  29 ++
 block/blk-zoned.c | 350 +++
 block/ioctl.c |   4 +
 drivers/scsi/Makefile |   1 +
 drivers/scsi/sd.c | 141 ++--
 drivers/scsi/sd.h |  67 
 drivers/scsi/sd_zbc.c | 627 ++
 include/linux/blk_types.h |   2 +
 include/linux/blkdev.h|  99 ++
 include/scsi/scsi_proto.h |  17 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned.h | 143 
 include/uapi/linux/fs.h   |   4 +
 18 files changed, 1499 insertions(+), 34 deletions(-)
 create mode 100644 block/blk-zoned.c
 create mode 100644 drivers/scsi/sd_zbc.c
 create mode 100644 include/uapi/linux/blkzoned.h

-- 
2.9.3



Re: [PATCH v5 0/7] ZBC / Zoned block device support

2016-09-30 Thread Shaun Tancheff
Hello Bart,

I rebased this series on Jens' for-4.9/block and will repost.

Thanks!
--Shaun

On Fri, Sep 30, 2016 at 12:18 PM, Bart Van Assche
<bart.vanass...@sandisk.com> wrote:
> On 09/30/2016 09:47 AM, Shaun Tancheff wrote:
>> On Fri, Sep 30, 2016 at 11:10 AM, Bart Van Assche
>> <bart.vanass...@sandisk.com> wrote:
>>> On 09/29/16 21:11, Damien Le Moal wrote:
>>>> This series introduces support for zoned block devices.
>>>
>>> On top of which kernel version do these patches apply? I tried to apply the
>>> whole series to kernel v4.7 but that caused "git am" to complain ...
>>
>> This series is against linux-next tag next-20160928.
>> You should be able to "git am" the series on top of that.
>
> Hello Shaun,
>
> As far as I know linux-next should not be used as a basis for the
> development of a patch series. Unless something has changed I think
> Jens expects a patch series that applies cleanly on top of his
> for-4.9/block branch. But it doesn't seem like this patch series
> applies cleanly on top of that branch:
>
> $ for p in ~/\[PATCH\ v5\ *; do echo "$(basename "$p")"; git am "$p" || break; done
> [PATCH v5 1_7] block: Add 'zoned' queue limit - Damien Le Moal <damien.lem...@hgst.com> - 2016-09-29 2111.eml
> Applying: block: Add 'zoned' queue limit
> [PATCH v5 2_7] blk-sysfs: Add 'chunk_sectors' to sysfs attributes - Damien Le Moal <damien.lem...@hgst.com> - 2016-09-29 2111.eml
> Applying: blk-sysfs: Add 'chunk_sectors' to sysfs attributes
> [PATCH v5 3_7] block: update chunk_sectors in blk_stack_limits() - Damien Le Moal <damien.lem...@hgst.com> - 2016-09-29 2111.eml
> Applying: block: update chunk_sectors in blk_stack_limits()
> [PATCH v5 4_7] block: Define zoned block device operations - Damien Le Moal <damien.lem...@hgst.com> - 2016-09-29 2111.eml
> Applying: block: Define zoned block device operations
> [PATCH v5 5_7] block: Implement support for zoned block devices - Damien Le Moal <damien.lem...@hgst.com> - 2016-09-29 2111.eml
> Applying: block: Implement support for zoned block devices
> error: patch failed: block/Makefile:22
> error: block/Makefile: patch does not apply
> error: patch failed: include/uapi/linux/Kbuild:70
> error: include/uapi/linux/Kbuild: patch does not apply
> Patch failed at 0001 block: Implement support for zoned block devices
> The copy of the patch that failed is found in: .git/rebase-apply/patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
>
> Bart.



-- 
Shaun Tancheff


Re: [PATCH v5 0/7] ZBC / Zoned block device support

2016-09-30 Thread Shaun Tancheff
Hi Bart,

This series is against linux-next tag next-20160928.
You should be able to "git am" the series on top of that.

Thanks!
Shaun

On Fri, Sep 30, 2016 at 11:10 AM, Bart Van Assche
<bart.vanass...@sandisk.com> wrote:
> On 09/29/16 21:11, Damien Le Moal wrote:
>>
>> This series introduces support for zoned block devices.
>
>
> Hi Damien,
>
> On top of which kernel version do these patches apply? I tried to apply the
> whole series to kernel v4.7 but that caused "git am" to complain ...
>
> Thank you,
>
> Bart.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Shaun Tancheff


Re: [PATCH v4 6/7] sd: Implement support for ZBC devices

2016-09-28 Thread Shaun Tancheff
On Wed, Sep 28, 2016 at 3:45 AM, Damien Le Moal <damien.lem...@hgst.com> wrote:
> From: Hannes Reinecke <h...@suse.de>
>
> Implement ZBC support functions to setup zoned disks, both
> host-managed and host-aware models. Only zoned disks that satisfy
> the following conditions are supported:
> 1) All zones are the same size, with the possible exception of a
>    smaller last runt zone.
> 2) For host-managed disks, reads are unrestricted (reads are not
>    failed due to zone or write pointer alignment constraints).
> Zoned disks that do not satisfy these 2 conditions are set up with
> a capacity of 0 to prevent their use.
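
For readers following along, condition 1 can be sketched in plain C as
below. This is an illustration, not the check implemented in sd_zbc.c:
sketch_common_zone_len() is a hypothetical helper that works on a plain
array of zone lengths.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the common zone length if every zone has the same length,
 * allowing only the last zone to be a smaller, non-empty runt.
 * Return 0 if the layout does not satisfy that rule.
 */
static uint64_t sketch_common_zone_len(const uint64_t *len, size_t nr)
{
	size_t i;

	if (nr == 0)
		return 0;

	for (i = 1; i < nr; i++) {
		if (len[i] == len[0])
			continue;
		/* Only the last zone may differ, and only by being smaller. */
		if (i == nr - 1 && len[i] > 0 && len[i] < len[0])
			continue;
		return 0;
	}

	return len[0];
}
```

A layout such as {256, 256, 256, 100} is accepted with a common length of
256, while {256, 128, 256} and {256, 256, 512} are rejected.
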
>
> The function sd_zbc_read_zones, called from sd_revalidate_disk,
> checks that the device satisfies the above two constraints. This
> function may also change the disk capacity previously set by
> sd_read_capacity for devices reporting only the capacity of
> conventional zones at the beginning of the LBA range (i.e. devices
> reporting rc_basis set to 0).
>
> The capacity message output was moved out of sd_read_capacity into
> a new function sd_print_capacity to include this eventual capacity
> change by sd_zbc_read_zones. This new function also includes a call
> to sd_zbc_print_zones to display the number of zones and zone size
> of the device.
>
> Signed-off-by: Hannes Reinecke <h...@suse.de>
>
> [Damien: * Removed zone cache support
>  * Removed mapping of discard to reset write pointer command
>  * Modified sd_zbc_read_zones to include checks that the
>device satisfies the kernel constraints
>  * Implemented REPORT ZONES setup and post-processing based
>on code from Shaun Tancheff <shaun.tanch...@seagate.com>]
> Signed-off-by: Damien Le Moal <damien.lem...@hgst.com>
> ---
>  drivers/scsi/Makefile |   1 +
>  drivers/scsi/sd.c | 143 ---
>  drivers/scsi/sd.h |  70 ++
>  drivers/scsi/sd_zbc.c | 624 
> ++
>  include/scsi/scsi_proto.h |  17 ++
>  5 files changed, 822 insertions(+), 33 deletions(-)
>  create mode 100644 drivers/scsi/sd_zbc.c
>
> diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
> index fc0d9b8..350513c 100644
> --- a/drivers/scsi/Makefile
> +++ b/drivers/scsi/Makefile
> @@ -180,6 +180,7 @@ hv_storvsc-y:= storvsc_drv.o
>
>  sd_mod-objs:= sd.o
>  sd_mod-$(CONFIG_BLK_DEV_INTEGRITY) += sd_dif.o
> +sd_mod-$(CONFIG_BLK_DEV_ZONED) += sd_zbc.o
>
>  sr_mod-objs:= sr.o sr_ioctl.o sr_vendor.o
>  ncr53c8xx-flags-$(CONFIG_SCSI_ZALON) \
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 51e5629..4d63260 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -93,6 +93,7 @@ MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_DISK15_MAJOR);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_DISK);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
> +MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
>
>  #if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
>  #define SD_MINORS  16
> @@ -163,7 +164,7 @@ cache_type_store(struct device *dev, struct 
> device_attribute *attr,
> static const char temp[] = "temporary ";
> int len;
>
> -   if (sdp->type != TYPE_DISK)
> +   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
> /* no cache control on RBC devices; theoretically they
>  * can do it, but there's probably so many exceptions
>  * it's not worth the risk */
> @@ -262,7 +263,7 @@ allow_restart_store(struct device *dev, struct 
> device_attribute *attr,
> if (!capable(CAP_SYS_ADMIN))
> return -EACCES;
>
> -   if (sdp->type != TYPE_DISK)
> +   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
> return -EINVAL;
>
> sdp->allow_restart = simple_strtoul(buf, NULL, 10);
> @@ -392,6 +393,11 @@ provisioning_mode_store(struct device *dev, struct 
> device_attribute *attr,
> if (!capable(CAP_SYS_ADMIN))
> return -EACCES;
>
> +   if (sd_is_zoned(sdkp)) {
> +   sd_config_discard(sdkp, SD_LBP_DISABLE);
> +   return count;
> +   }
> +
> if (sdp->type != TYPE_DISK)
> return -EINVAL;
>
> @@ -459,7 +465,7 @@ max_write_same_blocks_store(struct device *dev, struct 
> device_attribute *attr,
> if (!capable(CAP_SYS_ADMIN))
> return -EACCES;
>
> -   if (sdp->type != TYPE_DISK)
> +   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
> return -EINVAL;
>
>  

Re: [PATCH v4 5/7] block: Implement support for zoned block devices

2016-09-28 Thread Shaun Tancheff
On Wed, Sep 28, 2016 at 3:45 AM, Damien Le Moal <damien.lem...@hgst.com> wrote:
> From: Hannes Reinecke <h...@suse.de>
>
> Implement zoned block device zone information reporting and reset.
> Zone information is reported as struct blk_zone. This implementation
> does not differentiate between host-aware and host-managed device
> models and is valid for both. Two functions are provided:
> blkdev_report_zones for discovering the zone configuration of a
> zoned block device, and blkdev_reset_zones for resetting the write
> pointer of sequential zones. The helper functions blk_queue_zone_size
> and bdev_zone_size are also provided for, as the names suggest,
> obtaining the zone size (in 512B sectors) of the zones of the device.
>
> Signed-off-by: Hannes Reinecke <h...@suse.de>
>
> [Damien: * Removed the zone cache
>  * Implement report zones operation based on earlier proposal
>by Shaun Tancheff <shaun.tanch...@seagate.com>]
> Signed-off-by: Damien Le Moal <damien.lem...@hgst.com>
> ---
>  block/Kconfig |   8 ++
>  block/Makefile|   1 +
>  block/blk-zoned.c | 257 
> ++
>  include/linux/blkdev.h|  31 +
>  include/uapi/linux/Kbuild |   1 +
>  include/uapi/linux/blkzoned.h | 103 +
>  6 files changed, 401 insertions(+)
>  create mode 100644 block/blk-zoned.c
>  create mode 100644 include/uapi/linux/blkzoned.h
>
> diff --git a/block/Kconfig b/block/Kconfig
> index 1d4d624..6b0ad08 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -89,6 +89,14 @@ config BLK_DEV_INTEGRITY
> T10/SCSI Data Integrity Field or the T13/ATA External Path
> Protection.  If in doubt, say N.
>
> +config BLK_DEV_ZONED
> +   bool "Zoned block device support"
> +   ---help---
> +   Block layer zoned block device support. This option enables
> +   support for ZAC/ZBC host-managed and host-aware zoned block devices.
> +
> +   Say yes here if you have a ZAC or ZBC storage device.
> +
>  config BLK_DEV_THROTTLING
> bool "Block layer bio throttling support"
> depends on BLK_CGROUP=y
> diff --git a/block/Makefile b/block/Makefile
> index 36acdd7..9371bc7 100644
> --- a/block/Makefile
> +++ b/block/Makefile
> @@ -22,4 +22,5 @@ obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
>  obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
>  obj-$(CONFIG_BLK_CMDLINE_PARSER)   += cmdline-parser.o
>  obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
> +obj-$(CONFIG_BLK_DEV_ZONED)+= blk-zoned.o
>  obj-$(CONFIG_BLK_MQ_PCI)   += blk-mq-pci.o
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> new file mode 100644
> index 000..1603573
> --- /dev/null
> +++ b/block/blk-zoned.c
> @@ -0,0 +1,257 @@
> +/*
> + * Zoned block device handling
> + *
> + * Copyright (c) 2015, Hannes Reinecke
> + * Copyright (c) 2015, SUSE Linux GmbH
> + *
> + * Copyright (c) 2016, Damien Le Moal
> + * Copyright (c) 2016, Western Digital
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static inline sector_t blk_zone_start(struct request_queue *q,
> + sector_t sector)
> +{
> +   sector_t zone_mask = blk_queue_zone_size(q) - 1;
> +
> +   return sector & ~zone_mask;
> +}
> +
> +/*
> + * Check that a zone report belongs to the partition.
> + * If yes, fix its start sector and write pointer, copy it in the
> + * zone information array and return true. Return false otherwise.
> + */
> +static bool blkdev_report_zone(struct block_device *bdev,
> +  struct blk_zone *rep,
> +  struct blk_zone *zone)
> +{
> +   sector_t offset = get_start_sect(bdev);
> +
> +   if (rep->start < offset)
> +   return false;
> +
> +   rep->start -= offset;
> +   if (rep->start + rep->len > bdev->bd_part->nr_sects)
> +   return false;
> +
> +   if (rep->type == BLK_ZONE_TYPE_CONVENTIONAL)
> +   rep->wp = rep->start + rep->len;
> +   else
> +   rep->wp -= offset;
> +   memcpy(zone, rep, sizeof(struct blk_zone));
> +
> +   return true;
> +}
> +
> +/**
> + * blkdev_report_zones - Get zones information
> + * @bdev:  Target block device
> + * @sector:Sector from which to report zones
> + * @zones: Array of zone structures where to return the zones information
> + * @nr_zones:  Number of zone structures in the zone array
> + * @gfp_mask:

Re: [PATCH v4 3/7] block: update chunk_sectors in blk_stack_limits()

2016-09-28 Thread Shaun Tancheff
On Wed, Sep 28, 2016 at 3:45 AM, Damien Le Moal <damien.lem...@hgst.com> wrote:
> From: Hannes Reinecke <h...@suse.de>
>
> Signed-off-by: Hannes Reinecke <h...@suse.com>
> Signed-off-by: Damien Le Moal <damien.lem...@hgst.com>
> ---
>  block/blk-settings.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index b1d5b7f..55369a6 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -631,6 +631,10 @@ int blk_stack_limits(struct queue_limits *t, struct 
> queue_limits *b,
> t->discard_granularity;
> }
>
> +   if (b->chunk_sectors)
> +   t->chunk_sectors = min_not_zero(t->chunk_sectors,
> +   b->chunk_sectors);
> +
> return ret;
>  }
>  EXPORT_SYMBOL(blk_stack_limits);
> --
> 2.7.4

Reviewed-by: Shaun Tancheff <shaun.tanch...@seagate.com>
Tested-by: Shaun Tancheff <shaun.tanch...@seagate.com>
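
For reference, the effect of the hunk can be modelled in user space as
follows. sketch_min_not_zero() is a hypothetical stand-in written for this
example. It is meant to mirror the usual behaviour of the kernel's
min_not_zero() macro (the smaller of two values, ignoring zeros), so treat
that as an assumption.

```c
/* Smaller of a and b, where 0 means "not set" and is ignored. */
static unsigned int sketch_min_not_zero(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Model of the added lines: the stacked (top) limit only changes when
 * the bottom device actually reports a chunk size.
 */
static unsigned int sketch_stack_chunk_sectors(unsigned int top,
					       unsigned int bottom)
{
	if (bottom)
		top = sketch_min_not_zero(top, bottom);
	return top;
}
```

With this model, stacking on top of a device that has no chunk size leaves
the top value untouched, and stacking two non-zero values keeps the
smaller one.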

>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Shaun Tancheff


Re: [PATCH v4 2/7] blk-sysfs: Add 'chunk_sectors' to sysfs attributes

2016-09-28 Thread Shaun Tancheff
On Wed, Sep 28, 2016 at 3:45 AM, Damien Le Moal <damien.lem...@hgst.com> wrote:
> From: Hannes Reinecke <h...@suse.de>
>
> The queue limits already have a 'chunk_sectors' setting, so
> we should be presenting it via sysfs.
>
> Signed-off-by: Hannes Reinecke <h...@suse.de>
>
> [Damien: Updated Documentation/ABI/testing/sysfs-block]
>
> Signed-off-by: Damien Le Moal <damien.lem...@hgst.com>
> ---
>  Documentation/ABI/testing/sysfs-block | 13 +
>  block/blk-sysfs.c | 11 +++
>  2 files changed, 24 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-block 
> b/Documentation/ABI/testing/sysfs-block
> index 75a5055..ee2d5cd 100644
> --- a/Documentation/ABI/testing/sysfs-block
> +++ b/Documentation/ABI/testing/sysfs-block
> @@ -251,3 +251,16 @@ Description:
> since drive-managed zoned block devices do not support
> zone commands, they will be treated as regular block
> devices and zoned will report "none".
> +
> +What:  /sys/block/<disk>/queue/chunk_sectors
> +Date:  September 2016
> +Contact:   Hannes Reinecke <h...@suse.com>
> +Description:
> +   chunk_sectors has different meaning depending on the type
> +   of the disk. For a RAID device (dm-raid), chunk_sectors
> +   indicates the size in 512B sectors of the RAID volume
> +   stripe segment. For a zoned block device, either
> +   host-aware or host-managed, chunk_sectors indicates the
> +   size of 512B sectors of the zones of the device, with
> +   the eventual exception of the last zone of the device
> +   which may be smaller.
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index ff9cd9c..488c2e2 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -130,6 +130,11 @@ static ssize_t queue_physical_block_size_show(struct 
> request_queue *q, char *pag
> return queue_var_show(queue_physical_block_size(q), page);
>  }
>
> +static ssize_t queue_chunk_sectors_show(struct request_queue *q, char *page)
> +{
> +   return queue_var_show(q->limits.chunk_sectors, page);
> +}
> +
>  static ssize_t queue_io_min_show(struct request_queue *q, char *page)
>  {
> return queue_var_show(queue_io_min(q), page);
> @@ -455,6 +460,11 @@ static struct queue_sysfs_entry 
> queue_physical_block_size_entry = {
> .show = queue_physical_block_size_show,
>  };
>
> +static struct queue_sysfs_entry queue_chunk_sectors_entry = {
> +   .attr = {.name = "chunk_sectors", .mode = S_IRUGO },
> +   .show = queue_chunk_sectors_show,
> +};
> +
>  static struct queue_sysfs_entry queue_io_min_entry = {
> .attr = {.name = "minimum_io_size", .mode = S_IRUGO },
> .show = queue_io_min_show,
> @@ -555,6 +565,7 @@ static struct attribute *default_attrs[] = {
> &queue_hw_sector_size_entry.attr,
> &queue_logical_block_size_entry.attr,
> &queue_physical_block_size_entry.attr,
> +   &queue_chunk_sectors_entry.attr,
> &queue_io_min_entry.attr,
> &queue_io_opt_entry.attr,
> &queue_discard_granularity_entry.attr,
> --
> 2.7.4

Reviewed-by: Shaun Tancheff <shaun.tanch...@seagate.com>
Tested-by: Shaun Tancheff <shaun.tanch...@seagate.com>
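
A user-space consumer of the new attribute could convert the value to
bytes along these lines. This is a sketch: sketch_chunk_bytes() is a
hypothetical helper, and it assumes the attribute text is a decimal count
of 512B sectors followed by a newline, as the documentation hunk above
describes.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Convert the text read from queue/chunk_sectors into a size in bytes.
 * Returns 0 if the text does not start with a decimal number, which is
 * also what an unset chunk size ("0") maps to.
 */
static uint64_t sketch_chunk_bytes(const char *text)
{
	char *end;
	unsigned long long sectors;

	errno = 0;
	sectors = strtoull(text, &end, 10);
	if (errno || end == text)
		return 0;

	/* chunk_sectors is expressed in 512B sectors. */
	return (uint64_t)sectors << 9;
}
```

For example, a value of 524288 sectors corresponds to 256 MiB.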

>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Shaun Tancheff


Re: [PATCH v4 1/7] block: Add 'zoned' queue limit

2016-09-28 Thread Shaun Tancheff
tr,
> &queue_nomerges_entry.attr,
> &queue_rq_affinity_entry.attr,
> &queue_iostats_entry.attr,
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c47c358..f19e16b 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -261,6 +261,15 @@ struct blk_queue_tag {
>  #define BLK_SCSI_MAX_CMDS  (256)
>  #define BLK_SCSI_CMD_PER_LONG  (BLK_SCSI_MAX_CMDS / (sizeof(long) * 8))
>
> +/*
> + * Zoned block device models (zoned limit).
> + */
> +enum blk_zoned_model {
> +   BLK_ZONED_NONE, /* Regular block device */
> +   BLK_ZONED_HA,   /* Host-aware zoned block device */
> +   BLK_ZONED_HM,   /* Host-managed zoned block device */
> +};
> +
>  struct queue_limits {
> unsigned long   bounce_pfn;
> unsigned long   seg_boundary_mask;
> @@ -290,6 +299,7 @@ struct queue_limits {
> unsigned char   cluster;
> unsigned char   discard_zeroes_data;
> unsigned char   raid_partial_stripes_expensive;
> +   enum blk_zoned_modelzoned;
>  };
>
>  struct request_queue {
> @@ -627,6 +637,23 @@ static inline unsigned int blk_queue_cluster(struct 
> request_queue *q)
> return q->limits.cluster;
>  }
>
> +static inline enum blk_zoned_model
> +blk_queue_zoned_model(struct request_queue *q)
> +{
> +   return q->limits.zoned;
> +}
> +
> +static inline bool blk_queue_is_zoned(struct request_queue *q)
> +{
> +   switch (blk_queue_zoned_model(q)) {
> +   case BLK_ZONED_HA:
> +   case BLK_ZONED_HM:
> +   return true;
> +   default:
> +   return false;
> +   }
> +}
> +
>  /*
>   * We regard a request as sync, if either a read or a sync write
>   */
> @@ -1354,6 +1381,26 @@ static inline unsigned int bdev_write_same(struct 
> block_device *bdev)
> return 0;
>  }
>
> +static inline enum blk_zoned_model bdev_zoned_model(struct block_device 
> *bdev)
> +{
> +   struct request_queue *q = bdev_get_queue(bdev);
> +
> +   if (q)
> +   return blk_queue_zoned_model(q);
> +
> +   return BLK_ZONED_NONE;
> +}
> +
> +static inline bool bdev_is_zoned(struct block_device *bdev)
> +{
> +   struct request_queue *q = bdev_get_queue(bdev);
> +
> +   if (q)
> +   return blk_queue_is_zoned(q);
> +
> +   return false;
> +}
> +
>  static inline int queue_dma_alignment(struct request_queue *q)
>  {
> return q ? q->dma_alignment : 511;
> --
> 2.7.4

Reviewed-by: Shaun Tancheff <shaun.tanch...@seagate.com>
Tested-by: Shaun Tancheff <shaun.tanch...@seagate.com>

> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Shaun Tancheff


Re: [PATCH v2 3/7] block: update chunk_sectors in blk_stack_limits()

2016-09-27 Thread Shaun Tancheff
On Mon, Sep 26, 2016 at 6:14 AM, Damien Le Moal <damien.lem...@hgst.com> wrote:
> From: Hannes Reinecke <h...@suse.de>
>
> Signed-off-by: Hannes Reinecke <h...@suse.com>
> Signed-off-by: Damien Le Moal <damien.lem...@hgst.com>
> ---
>  block/blk-settings.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index b1d5b7f..55369a6 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -631,6 +631,10 @@ int blk_stack_limits(struct queue_limits *t, struct 
> queue_limits *b,
> t->discard_granularity;
> }
>
> +   if (b->chunk_sectors)
> +   t->chunk_sectors = min_not_zero(t->chunk_sectors,
> +   b->chunk_sectors);
> +
> return ret;
>  }
>  EXPORT_SYMBOL(blk_stack_limits);
> --
> 2.7.4

Reviewed-by: Shaun Tancheff <shaun.tanch...@seagate.com>
Tested-by: Shaun Tancheff <shaun.tanch...@seagate.com>


> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Shaun Tancheff


Re: [PATCH v2 2/7] blk-sysfs: Add 'chunk_sectors' to sysfs attributes

2016-09-27 Thread Shaun Tancheff
On Mon, Sep 26, 2016 at 6:14 AM, Damien Le Moal <damien.lem...@hgst.com> wrote:
> From: Hannes Reinecke <h...@suse.de>
>
> The queue limits already have a 'chunk_sectors' setting, so
> we should be presenting it via sysfs.
>
> Signed-off-by: Hannes Reinecke <h...@suse.de>
> Signed-off-by: Damien Le Moal <damien.lem...@hgst.com>
> ---
>  block/blk-sysfs.c | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index ff9cd9c..488c2e2 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -130,6 +130,11 @@ static ssize_t queue_physical_block_size_show(struct 
> request_queue *q, char *pag
> return queue_var_show(queue_physical_block_size(q), page);
>  }
>
> +static ssize_t queue_chunk_sectors_show(struct request_queue *q, char *page)
> +{
> +   return queue_var_show(q->limits.chunk_sectors, page);
> +}
> +
>  static ssize_t queue_io_min_show(struct request_queue *q, char *page)
>  {
> return queue_var_show(queue_io_min(q), page);
> @@ -455,6 +460,11 @@ static struct queue_sysfs_entry 
> queue_physical_block_size_entry = {
> .show = queue_physical_block_size_show,
>  };
>
> +static struct queue_sysfs_entry queue_chunk_sectors_entry = {
> +   .attr = {.name = "chunk_sectors", .mode = S_IRUGO },
> +   .show = queue_chunk_sectors_show,
> +};
> +
>  static struct queue_sysfs_entry queue_io_min_entry = {
> .attr = {.name = "minimum_io_size", .mode = S_IRUGO },
> .show = queue_io_min_show,
> @@ -555,6 +565,7 @@ static struct attribute *default_attrs[] = {
> &queue_hw_sector_size_entry.attr,
> &queue_logical_block_size_entry.attr,
> &queue_physical_block_size_entry.attr,
> +   &queue_chunk_sectors_entry.attr,
> &queue_io_min_entry.attr,
> &queue_io_opt_entry.attr,
> &queue_discard_granularity_entry.attr,
> --
> 2.7.4

Reviewed-by: Shaun Tancheff <shaun.tanch...@seagate.com>
Tested-by: Shaun Tancheff <shaun.tanch...@seagate.com>


> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Shaun Tancheff


Re: [PATCH v2 1/7] block: Add 'zoned' queue limit

2016-09-27 Thread Shaun Tancheff
> +{
> +   switch (blk_queue_zoned_model(q)) {
> +   case BLK_ZONED_HA:
> +   case BLK_ZONED_HM:
> +   return true;
> +   default:
> +   return false;
> +   }
> +}
> +
>  /*
>   * We regard a request as sync, if either a read or a sync write
>   */
> @@ -1354,6 +1381,26 @@ static inline unsigned int bdev_write_same(struct 
> block_device *bdev)
> return 0;
>  }
>
> +static inline enum blk_zoned_model bdev_zoned_model(struct block_device 
> *bdev)
> +{
> +   struct request_queue *q = bdev_get_queue(bdev);
> +
> +   if (q)
> +   return blk_queue_zoned_model(q);
> +
> +   return BLK_ZONED_NONE;
> +}
> +
> +static inline bool bdev_is_zoned(struct block_device *bdev)
> +{
> +   struct request_queue *q = bdev_get_queue(bdev);
> +
> +   if (q)
> +   return blk_queue_is_zoned(q);
> +
> +   return false;
> +}
> +
>  static inline int queue_dma_alignment(struct request_queue *q)
>  {
> return q ? q->dma_alignment : 511;
> --
> 2.7.4

Reviewed-by: Shaun Tancheff <shaun.tanch...@seagate.com>
Tested-by: Shaun Tancheff <shaun.tanch...@seagate.com>

> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Shaun Tancheff


Re: [PATCH v2 6/7] sd: Implement support for ZBC devices

2016-09-27 Thread Shaun Tancheff
On Mon, Sep 26, 2016 at 6:14 AM, Damien Le Moal <damien.lem...@hgst.com> wrote:
> From: Hannes Reinecke <h...@suse.de>
>
> Implement ZBC support functions to setup zoned disks, both
> host-managed and host-aware models. Only zoned disks that satisfy
> the following conditions are supported:
> 1) All zones are the same size, with the possible exception of a
>    smaller last runt zone.
> 2) For host-managed disks, reads are unrestricted (reads are not
>    failed due to zone or write pointer alignment constraints).
> Zoned disks that do not satisfy these 2 conditions will be ignored.
>
> The capacity read of the device triggers the zoned block device
> checks. As this needs the zone model of the disk, the call to
> sd_read_capacity is moved after the call to
> sd_read_block_characteristics so that host-aware devices are
> properly detected and initialized. The call to sd_zbc_read_zones
> in sd_read_capacity may change the device capacity obtained with
> the sd_read_capacity_16 function for devices reporting only the
> capacity of conventional zones at the beginning of the LBA range
> (i.e. devices with rc_basis set to 0).
>
> Signed-off-by: Hannes Reinecke <h...@suse.de>
>
> [Damien: * Removed zone cache support
>  * Removed mapping of discard to reset write pointer command
>  * Modified sd_zbc_read_zones to include checks that the
>device satisfies the kernel constraints
>  * Implemented REPORT ZONES setup and post-processing based
>on code from Shaun Tancheff <shaun.tanch...@seagate.com>]
> Signed-off-by: Damien Le Moal <damien.lem...@hgst.com>
> ---
>  drivers/scsi/Makefile |   1 +
>  drivers/scsi/sd.c |  97 ++--
>  drivers/scsi/sd.h |  67 ++
>  drivers/scsi/sd_zbc.c | 586 
> ++
>  include/scsi/scsi_proto.h |  17 ++
>  5 files changed, 754 insertions(+), 14 deletions(-)
>  create mode 100644 drivers/scsi/sd_zbc.c
>
> diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
> index fc0d9b8..350513c 100644
> --- a/drivers/scsi/Makefile
> +++ b/drivers/scsi/Makefile
> @@ -180,6 +180,7 @@ hv_storvsc-y:= storvsc_drv.o
>
>  sd_mod-objs:= sd.o
>  sd_mod-$(CONFIG_BLK_DEV_INTEGRITY) += sd_dif.o
> +sd_mod-$(CONFIG_BLK_DEV_ZONED) += sd_zbc.o
>
>  sr_mod-objs:= sr.o sr_ioctl.o sr_vendor.o
>  ncr53c8xx-flags-$(CONFIG_SCSI_ZALON) \
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 51e5629..4b3523b 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -93,6 +93,7 @@ MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_DISK15_MAJOR);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_DISK);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
> +MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
>
>  #if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
>  #define SD_MINORS  16
> @@ -163,7 +164,7 @@ cache_type_store(struct device *dev, struct 
> device_attribute *attr,
> static const char temp[] = "temporary ";
> int len;
>
> -   if (sdp->type != TYPE_DISK)
> +   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
> /* no cache control on RBC devices; theoretically they
>  * can do it, but there's probably so many exceptions
>  * it's not worth the risk */
> @@ -262,7 +263,7 @@ allow_restart_store(struct device *dev, struct 
> device_attribute *attr,
> if (!capable(CAP_SYS_ADMIN))
> return -EACCES;
>
> -   if (sdp->type != TYPE_DISK)
> +   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
> return -EINVAL;
>
> sdp->allow_restart = simple_strtoul(buf, NULL, 10);
> @@ -392,6 +393,11 @@ provisioning_mode_store(struct device *dev, struct 
> device_attribute *attr,
> if (!capable(CAP_SYS_ADMIN))
> return -EACCES;
>
> +   if (sd_is_zoned(sdkp)) {
> +   sd_config_discard(sdkp, SD_LBP_DISABLE);
> +   return count;
> +   }
> +
> if (sdp->type != TYPE_DISK)
> return -EINVAL;
>
> @@ -459,7 +465,7 @@ max_write_same_blocks_store(struct device *dev, struct 
> device_attribute *attr,
> if (!capable(CAP_SYS_ADMIN))
> return -EACCES;
>
> -   if (sdp->type != TYPE_DISK)
> +   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
> return -EINVAL;
>
> err = kstrtoul(buf, 10, &max);
> @@ -844,6 +850,13 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd 
> *cmd)
>
> BUG_ON(bio_offset(bio) || bio_i

[PATCH v3 7/7] blk-zoned: implement ioctls

2016-09-27 Thread Shaun Tancheff
Add the new BLKREPORTZONE and BLKRESETZONE ioctls for, respectively,
obtaining the zone configuration of a zoned block device and resetting
the write pointer of sequential zones of a zoned block device.

The BLKREPORTZONE ioctl maps directly to a single call of the function
blkdev_report_zones. The zone information result is passed as an array
of struct blk_zone identical to the structure used internally for
processing the REQ_OP_ZONE_REPORT operation.  The BLKRESETZONE ioctl
maps to a single call of the blkdev_reset_zones function.

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
Signed-off-by: Damien Le Moal <damien.lem...@hgst.com>
---
Changes since v2:
 - Changed kzalloc() to kcalloc() per Christoph
 - Added ioctl specific bits to uapi as blkzoned.h is now added in an earlier
   patch.

 block/blk-zoned.c | 93 +++
 block/ioctl.c |  4 ++
 include/linux/blkdev.h| 22 ++
 include/uapi/linux/blkzoned.h | 40 +++
 include/uapi/linux/fs.h   |  4 ++
 5 files changed, 163 insertions(+)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index bc4159d..91f7347 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -240,3 +240,96 @@ int blkdev_reset_zones(struct block_device *bdev,
return 0;
 }
 EXPORT_SYMBOL_GPL(blkdev_reset_zones);
+
+/**
+ * BLKREPORTZONE ioctl processing.
+ * Called from blkdev_ioctl.
+ */
+int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
+ unsigned int cmd, unsigned long arg)
+{
+   void __user *argp = (void __user *)arg;
+   struct request_queue *q;
+   struct blk_zone_report rep;
+   struct blk_zone *zones;
+   int ret;
+
+   if (!argp)
+   return -EINVAL;
+
+   q = bdev_get_queue(bdev);
+   if (!q)
+   return -ENXIO;
+
+   if (!blk_queue_is_zoned(q))
+   return -ENOTTY;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EACCES;
+
+   if (copy_from_user(&rep, argp, sizeof(struct blk_zone_report)))
+   return -EFAULT;
+
+   if (!rep.nr_zones)
+   return -EINVAL;
+
+   zones = kcalloc(rep.nr_zones, sizeof(struct blk_zone), GFP_KERNEL);
+   if (!zones)
+   return -ENOMEM;
+
+   ret = blkdev_report_zones(bdev, rep.sector,
+ zones, &rep.nr_zones,
+ GFP_KERNEL);
+   if (ret)
+   goto out;
+
+   if (copy_to_user(argp, &rep, sizeof(struct blk_zone_report))) {
+   ret = -EFAULT;
+   goto out;
+   }
+
+   if (rep.nr_zones) {
+   if (copy_to_user(argp + sizeof(struct blk_zone_report), zones,
+sizeof(struct blk_zone) * rep.nr_zones))
+   ret = -EFAULT;
+   }
+
+ out:
+   kfree(zones);
+
+   return ret;
+}
+
+/**
+ * BLKRESETZONE ioctl processing.
+ * Called from blkdev_ioctl.
+ */
+int blkdev_reset_zones_ioctl(struct block_device *bdev, fmode_t mode,
+unsigned int cmd, unsigned long arg)
+{
+   void __user *argp = (void __user *)arg;
+   struct request_queue *q;
+   struct blk_zone_range zrange;
+
+   if (!argp)
+   return -EINVAL;
+
+   q = bdev_get_queue(bdev);
+   if (!q)
+   return -ENXIO;
+
+   if (!blk_queue_is_zoned(q))
+   return -ENOTTY;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EACCES;
+
+   if (!(mode & FMODE_WRITE))
+   return -EBADF;
+
+   if (copy_from_user(&zrange, argp, sizeof(struct blk_zone_range)))
+   return -EFAULT;
+
+   return blkdev_reset_zones(bdev, zrange.sector, zrange.nr_sectors,
+ GFP_KERNEL);
+}
diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..448f78a 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -513,6 +513,10 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
BLKDEV_DISCARD_SECURE);
case BLKZEROOUT:
return blk_ioctl_zeroout(bdev, mode, arg);
+   case BLKREPORTZONE:
+   return blkdev_report_zones_ioctl(bdev, mode, cmd, arg);
+   case BLKRESETZONE:
+   return blkdev_reset_zones_ioctl(bdev, mode, cmd, arg);
case HDIO_GETGEO:
return blkdev_getgeo(bdev, argp);
case BLKRAGET:
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6316972..0a75285 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -315,6 +315,28 @@ extern int blkdev_report_zones(struct block_device *,
unsigned int *, gfp_t);
 extern int blkdev_reset_zones(struct block_device *, sector_t,
sector_t, gfp_t);
+
+extern int blkdev_report_zones_ioctl(struct block_d
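
As the handler above shows, BLKREPORTZONE copies the report header back to
the start of the user buffer and then the zone array to
argp + sizeof(struct blk_zone_report). A caller therefore has to size one
contiguous buffer for the header plus nr_zones descriptors. The user-space
sketch below illustrates that layout only. demo_zone_report and demo_zone are
stand-ins invented for the example; the real definitions live in the
blkzoned.h uapi header added earlier in this series, and their full field
lists are not shown in this message.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct blk_zone / struct blk_zone_report. */
struct demo_zone {
	uint64_t start;		/* zone start sector */
	uint64_t len;		/* zone length in sectors */
	uint64_t wp;		/* write pointer position */
};

struct demo_zone_report {
	uint64_t sector;	/* in: sector to start reporting from */
	uint32_t nr_zones;	/* in: array capacity, out: zones reported */
	uint32_t reserved;
};

/* Size of one buffer holding the header followed by nr_zones entries. */
static size_t demo_report_size(uint32_t nr_zones)
{
	return sizeof(struct demo_zone_report) +
	       (size_t)nr_zones * sizeof(struct demo_zone);
}

/* Allocate a zeroed report buffer; the zone array starts right after
 * the header, matching the two copy_to_user() calls in the handler. */
static struct demo_zone_report *demo_report_alloc(uint64_t sector,
						  uint32_t nr_zones,
						  struct demo_zone **zones)
{
	struct demo_zone_report *rep = calloc(1, demo_report_size(nr_zones));

	if (!rep)
		return NULL;
	rep->sector = sector;
	rep->nr_zones = nr_zones;
	*zones = (struct demo_zone *)(rep + 1);
	return rep;
}
```

With the real structures, the returned pointer would be the ioctl argument,
and nr_zones would hold the number of entries actually filled in on return.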

[PATCH v3 5/7] block: Implement support for zoned block devices

2016-09-27 Thread Shaun Tancheff
From: Hannes Reinecke <h...@suse.de>

Implement zone information reporting and zone reset for zoned block
devices. Zone information is reported as struct blk_zone. This
implementation does not differentiate between host-aware and
host-managed device models and is valid for both. Two functions are
provided: blkdev_report_zones for discovering the zone configuration
of a zoned block device, and blkdev_reset_zones for resetting the
write pointer of sequential zones. The helper functions
blk_queue_zone_size and bdev_zone_size are also provided; as their
names suggest, they return the zone size of the device (in 512B
sectors).

Signed-off-by: Hannes Reinecke <h...@suse.de>

[Damien: * Removed the zone cache
 * Implement report zones operation based on earlier proposal
       by Shaun Tancheff <shaun.tanch...@seagate.com>]
Signed-off-by: Damien Le Moal <damien.lem...@hgst.com>
---

Changes from v2:
 - Added EXPORT_SYMBOL_GPL() per Damien
 - Added uapi blkzoned.h earlier and put shared enums/struct directly
   into blkzoned.h

 block/Kconfig |   8 ++
 block/Makefile|   1 +
 block/blk-zoned.c | 242 ++
 include/linux/blkdev.h|  30 ++
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned.h | 103 ++
 6 files changed, 385 insertions(+)
 create mode 100644 block/blk-zoned.c
 create mode 100644 include/uapi/linux/blkzoned.h

diff --git a/block/Kconfig b/block/Kconfig
index 1d4d624..6b0ad08 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -89,6 +89,14 @@ config BLK_DEV_INTEGRITY
T10/SCSI Data Integrity Field or the T13/ATA External Path
Protection.  If in doubt, say N.
 
+config BLK_DEV_ZONED
+   bool "Zoned block device support"
+   ---help---
+   Block layer zoned block device support. This option enables
+   support for ZAC/ZBC host-managed and host-aware zoned block devices.
+
+   Say yes here if you have a ZAC or ZBC storage device.
+
 config BLK_DEV_THROTTLING
bool "Block layer bio throttling support"
depends on BLK_CGROUP=y
diff --git a/block/Makefile b/block/Makefile
index 36acdd7..9371bc7 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -22,4 +22,5 @@ obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
 obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
 obj-$(CONFIG_BLK_CMDLINE_PARSER)   += cmdline-parser.o
 obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
+obj-$(CONFIG_BLK_DEV_ZONED)+= blk-zoned.o
 obj-$(CONFIG_BLK_MQ_PCI)   += blk-mq-pci.o
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
new file mode 100644
index 000..bc4159d
--- /dev/null
+++ b/block/blk-zoned.c
@@ -0,0 +1,242 @@
+/*
+ * Zoned block device handling
+ *
+ * Copyright (c) 2015, Hannes Reinecke
+ * Copyright (c) 2015, SUSE Linux GmbH
+ *
+ * Copyright (c) 2016, Damien Le Moal
+ * Copyright (c) 2016, Western Digital
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+static inline sector_t blk_zone_start(struct request_queue *q,
+ sector_t sector)
+{
+   sector_t zone_mask = blk_queue_zone_size(q) - 1;
+
+   return sector & ~zone_mask;
+}
+
+static inline void blkdev_report_to_zone(struct block_device *bdev,
+void *rep,
+struct blk_zone *zone)
+{
+   sector_t offset = get_start_sect(bdev);
+
+   memcpy(zone, rep, sizeof(struct blk_zone));
+   zone->start -= offset;
+   if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
+   zone->wp = zone->start + zone->len;
+   else
+   zone->wp -= offset;
+}
+
+/**
+ * blkdev_report_zones - Get zones information
+ * @bdev:  Target block device
+ * @sector:Sector from which to report zones
+ * @zones:  Array of zone structures where to return the zones information
+ * @nr_zones:   Number of zone structures in the zone array
+ * @gfp_mask:  Memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *Get zone information starting from the zone containing @sector.
+ *The number of zone information reported may be less than the number
+ *requested by @nr_zones. The number of zones actually reported is
+ *returned in @nr_zones.
+ */
+int blkdev_report_zones(struct block_device *bdev,
+   sector_t sector,
+   struct blk_zone *zones,
+   unsigned int *nr_zones,
+   gfp_t gfp_mask)
+{
+   struct request_queue *q = bdev_get_queue(bdev);
+   struct blk_zone_report_hdr *hdr;
+   unsigned int nrz = *nr_zones;
+   struct page *page;
+   unsigned int nr_rep;
+   size_t rep_bytes;
+   unsigned int nr_pages;
+   struct bio *bio;
+   struct bio_vec *bv;
+   unsigned int i, nz;
+   unsigned int ofst;
+   void *a
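
blk_zone_start() in the patch above rounds a sector down to the start of its
zone with a bit mask, which only works when the zone size is a power of two.
Below is a small user-space sketch of the same arithmetic, with a fallback for
other sizes. zone_start and its parameters are illustrative names, not kernel
API, and the fallback branch is not something the patch implements.

```c
#include <stdint.h>

/* Round sector down to the first sector of the zone that contains it.
 * zone_sectors is the zone size in 512B sectors and must be non-zero. */
static uint64_t zone_start(uint64_t sector, uint64_t zone_sectors)
{
	/* Power of two: same mask trick as blk_zone_start() in the patch. */
	if ((zone_sectors & (zone_sectors - 1)) == 0)
		return sector & ~(zone_sectors - 1);

	/* Any other size needs a modulo instead of a mask. */
	return sector - (sector % zone_sectors);
}
```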

Re: [PATCH v2 7/7] blk-zoned: implement ioctls

2016-09-26 Thread Shaun Tancheff
No objection here.

On Mon, Sep 26, 2016 at 6:30 PM, Damien Le Moal <damien.lem...@hgst.com> wrote:
>
> Christoph,
>
> On 9/27/16 01:37, Christoph Hellwig wrote:
>>> -/*
>>> - * Zone type.
>>> - */
>>> -enum blk_zone_type {
>>> -BLK_ZONE_TYPE_UNKNOWN,
>>> -BLK_ZONE_TYPE_CONVENTIONAL,
>>> -BLK_ZONE_TYPE_SEQWRITE_REQ,
>>> -BLK_ZONE_TYPE_SEQWRITE_PREF,
>>> -};
>>
>> Please don't move this code around after it was added just two
>> patches earlier.  I'd say just split adding the new blkzoned.h
>> uapi header into a patch of its own and add that before the
>> core block code.
>
> Or we could simply merge patches 5 and 7... Even simpler.
> Would that be OK? Shaun, any objection?
>
> Best regards.
>
> --
> Damien Le Moal, Ph.D.
> Sr. Manager, System Software Group, HGST Research,
> HGST, a Western Digital brand
> damien.lem...@hgst.com
> (+81) 0466-98-3593 (ext. 513593)
> 1 kirihara-cho, Fujisawa,
> Kanagawa, 252-0888 Japan
> www.hgst.com



-- 
Shaun Tancheff


Re: [PATCH v2 7/7] blk-zoned: implement ioctls

2016-09-26 Thread Shaun Tancheff
On Mon, Sep 26, 2016 at 11:37 AM, Christoph Hellwig <h...@infradead.org> wrote:
>> + zones = kzalloc(sizeof(struct blk_zone) * rep.nr_zones,
>> + GFP_KERNEL);
>> + if (!zones)
>> + return -ENOMEM;
>
> This should use kcalloc to get us overflow checking for the user
> controlled allocation size.

Ah. yes. Will fix that.
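
For context, the concern is that sizeof(struct blk_zone) * rep.nr_zones is
computed from a user-supplied count and can wrap around, yielding a buffer
smaller than the code later assumes; kcalloc() rejects such a request
instead. The check amounts to something like the following user-space sketch
(checked_array_size is a made-up name for illustration, not a kernel
function):

```c
#include <stddef.h>
#include <stdint.h>

/* Store n * size in *bytes and return 0, or return -1 if the
 * multiplication would not fit in a size_t. */
static int checked_array_size(size_t n, size_t size, size_t *bytes)
{
	if (size != 0 && n > SIZE_MAX / size)
		return -1;
	*bytes = n * size;
	return 0;
}
```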

>> + if (copy_to_user(argp, &rep, sizeof(struct blk_zone_report))) {
>> + ret = -EFAULT;
>> + goto out;
>> + }
>> +
>> + if (rep.nr_zones) {
>> + if (copy_to_user(argp + sizeof(struct blk_zone_report), zones,
>> +  sizeof(struct blk_zone) * rep.nr_zones))
>> + ret = -EFAULT;
>> + }
>
> We could actually do this with a single big copy_to_user.  Not that
> it really matters, though..

Except our source locations are disjoint (stack and kcalloc'd).

>> -/*
>> - * Zone type.
>> - */
>> -enum blk_zone_type {
>> - BLK_ZONE_TYPE_UNKNOWN,
>> - BLK_ZONE_TYPE_CONVENTIONAL,
>> - BLK_ZONE_TYPE_SEQWRITE_REQ,
>> - BLK_ZONE_TYPE_SEQWRITE_PREF,
>> -};
>
> Please don't move this code around after it was added just two
> patches earlier.  I'd say just split adding the new blkzoned.h
> uapi header into a patch of its own and add that before the
> core block code.

Sounds good. Will reshuffle the patchset tonight.

Thanks!
-- 
Shaun Tancheff


Re: UFS API in the kernel

2016-09-26 Thread Shaun Tancheff
On Thu, Sep 22, 2016 at 10:21 AM, Joao Pinto <joao.pi...@synopsys.com> wrote:
> Hi!
>
> I am designing an application intended as a utility for Unipro and UFS
> testing. This application is going to run on top of a recent Linux
> kernel containing the new UFS stack (including the new DWC drivers).
>
> I am considering doing the following:
> a) Create a new config item called CONFIG_UFS_CHARDEV, which creates a
> char device that makes some ioctls available to user-space applications
> b) Create a linux/ufs.h header file that contains the data structure
> declarations needed by user-space applications

I am not very familiar with UFS devices. That said, you should already
have an sgX chardev being created, so you can handle SG_IO requests.
There also appear to be some sysfs entries being created.

So between sg and sysfs you should be able to handle any user-space
out of band requests without resorting to making a new chardev.

Adding more sysfs entries, if you need them, should be fine.

You may find it easier to expand on the existing interfaces than to
get consensus on a new driver and ioctls.

Hope this helps,
Shaun

> Could you please advise me about what the correct approach should be to
> make it as standard as possible and usable in the future?
>
> Thank you very much for your help!
>
> regards,
> Joao



-- 
Shaun Tancheff


Re: [PATCH 9/9] blk-zoned: Add ioctl interface for zone operations

2016-09-20 Thread Shaun Tancheff
On Mon, Sep 19, 2016 at 4:27 PM, Damien Le Moal <damien.lem...@hgst.com> wrote:
> From: Shaun Tancheff <shaun.tanch...@seagate.com>
>
> Adds the new BLKUPDATEZONES, BLKREPORTZONE, BLKRESETZONE,
> BLKOPENZONE, BLKCLOSEZONE and BLKFINISHZONE ioctls.
>
> BLKREPORTZONE implementation uses the device queue zone RB-tree by
> default and no actual command is issued to the device. If the
> application needs access to the untracked zone attributes (non-seq
> flag or reset recommended flag, offline or read-only zone condition,
> etc), BLKUPDATEZONES must be issued first to force an update of the
> cached zone information.
>
> Changelog (Damien):
> * Simplified blkzone descriptor (removed bit-fields and use CPU
>   endianness)
> * Changed report ioctl to operate on single zone instead of an
>   array of blkzone structures.

I think something with this degree of changes from what
I posted should not include my signed-off-by.

I also really don't like forcing the reply to be a single zone. I
think the user should be able to ask for as many or as few as
they would like.

> Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
> Signed-off-by: Damien Le Moal <damien.lem...@hgst.com>
> ---
>  block/blk-zoned.c | 115 ++
>  block/ioctl.c |   8 +++
>  include/linux/blkdev.h|   7 +++
>  include/uapi/linux/Kbuild |   1 +
>  include/uapi/linux/blkzoned.h |  91 +
>  include/uapi/linux/fs.h   |   1 +
>  6 files changed, 223 insertions(+)
>  create mode 100644 include/uapi/linux/blkzoned.h
>
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index a107940..71205c8 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  void blk_init_zones(struct request_queue *q)
>  {
> @@ -336,3 +337,117 @@ int blkdev_finish_zone(struct block_device *bdev,
> return blkdev_issue_zone_action(bdev, sector, REQ_OP_ZONE_FINISH,
> gfp_mask);
>  }
> +
> +static int blkdev_report_zone_ioctl(struct block_device *bdev,
> +   void __user *argp)
> +{
> +   struct blk_zone *zone;
> +   struct blkzone z;
> +
> +   if (copy_from_user(&z, argp, sizeof(struct blkzone)))
> +   return -EFAULT;
> +
> +   zone = blk_lookup_zone(bdev_get_queue(bdev), z.start);
> +   if (!zone)
> +   return -EINVAL;
> +
> +   memset(&z, 0, sizeof(struct blkzone));
> +
> +   blk_lock_zone(zone);
> +
> +   blk_wait_for_zone_update(zone);
> +
> +   z.len = zone->len;
> +   z.start = zone->start;
> +   z.wp = zone->wp;
> +   z.type = zone->type;
> +   z.cond = zone->cond;
> +   z.non_seq = zone->non_seq;
> +   z.reset = zone->reset;
> +
> +   blk_unlock_zone(zone);
> +
> +   if (copy_to_user(argp, &z, sizeof(struct blkzone)))
> +   return -EFAULT;
> +
> +   return 0;
> +}
> +
> +static int blkdev_zone_action_ioctl(struct block_device *bdev,
> +   unsigned cmd, void __user *argp)
> +{
> +   unsigned int op;
> +   u64 sector;
> +
> +   if (get_user(sector, (u64 __user *)argp))
> +   return -EFAULT;
> +
> +   switch (cmd) {
> +   case BLKRESETZONE:
> +   op = REQ_OP_ZONE_RESET;
> +   break;
> +   case BLKOPENZONE:
> +   op = REQ_OP_ZONE_OPEN;
> +   break;
> +   case BLKCLOSEZONE:
> +   op = REQ_OP_ZONE_CLOSE;
> +   break;
> +   case BLKFINISHZONE:
> +   op = REQ_OP_ZONE_FINISH;
> +   break;
> +   }
> +
> +   return blkdev_issue_zone_action(bdev, sector, op, GFP_KERNEL);
> +}
> +
> +/**
> + * Called from blkdev_ioctl.
> + */
> +int blkdev_zone_ioctl(struct block_device *bdev, fmode_t mode,
> + unsigned cmd, unsigned long arg)
> +{
> +   void __user *argp = (void __user *)arg;
> +   struct request_queue *q;
> +   int ret;
> +
> +   if (!argp)
> +   return -EINVAL;
> +
> +   q = bdev_get_queue(bdev);
> +   if (!q)
> +   return -ENXIO;
> +
> +   if (!blk_queue_zoned(q))
> +   return -ENOTTY;
> +
> +   if (!capable(CAP_SYS_ADMIN))
> +   return -EACCES;
> +
> +   switch (cmd) {
> +   case BLKREPORTZONE:
> +   ret = blkdev_report_zone_ioctl(bdev, argp);
> +   break

Re: [PATCH 8/9] sd: Implement support for ZBC devices

2016-09-19 Thread Shaun Tancheff
On Mon, Sep 19, 2016 at 4:27 PM, Damien Le Moal  wrote:
> From: Hannes Reinecke 
>
> Implement ZBC support functions to set up zoned disks and fill the
> block device zone information tree during the device scan. The
> zone information tree is also always updated on disk revalidation.
> This adds support for the REQ_OP_ZONE* operations and also implements
> the new RESET_WP provisioning mode so that discard requests can be
> mapped to the RESET WRITE POINTER command for devices with a constant
> zone size.
>
> The capacity read of the device triggers the zone information read
> for zoned block devices. As this needs the device zone model, the
> call to sd_read_capacity is moved after the call to
> sd_read_block_characteristics so that host-aware devices are
> properly initialized. The call to sd_zbc_read_zones in
> sd_read_capacity may change the device capacity obtained with
> the sd_read_capacity_16 function for devices reporting only the
> capacity of conventional zones at the beginning of the LBA range
> (i.e. devices with rc_basis set to 0).
>
> Signed-off-by: Hannes Reinecke 
> Signed-off-by: Damien Le Moal 
> ---
>  drivers/scsi/Makefile |1 +
>  drivers/scsi/sd.c |  147 --
>  drivers/scsi/sd.h |   68 +++
>  drivers/scsi/sd_zbc.c | 1097 +
>  include/scsi/scsi_proto.h |   17 +
>  5 files changed, 1304 insertions(+), 26 deletions(-)
>  create mode 100644 drivers/scsi/sd_zbc.c
>
> diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
> index d539798..fabcb6d 100644
> --- a/drivers/scsi/Makefile
> +++ b/drivers/scsi/Makefile
> @@ -179,6 +179,7 @@ hv_storvsc-y:= storvsc_drv.o
>
>  sd_mod-objs:= sd.o
>  sd_mod-$(CONFIG_BLK_DEV_INTEGRITY) += sd_dif.o
> +sd_mod-$(CONFIG_BLK_DEV_ZONED) += sd_zbc.o
>
>  sr_mod-objs:= sr.o sr_ioctl.o sr_vendor.o
>  ncr53c8xx-flags-$(CONFIG_SCSI_ZALON) \
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index d3e852a..46b8b78 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -92,6 +92,7 @@ MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_DISK15_MAJOR);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_DISK);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
>  MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
> +MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
>
>  #if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
>  #define SD_MINORS  16
> @@ -99,7 +100,6 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
>  #define SD_MINORS  0
>  #endif
>
> -static void sd_config_discard(struct scsi_disk *, unsigned int);
>  static void sd_config_write_same(struct scsi_disk *);
>  static int  sd_revalidate_disk(struct gendisk *);
>  static void sd_unlock_native_capacity(struct gendisk *disk);
> @@ -162,7 +162,7 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
> static const char temp[] = "temporary ";
> int len;
>
> -   if (sdp->type != TYPE_DISK)
> +   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
> /* no cache control on RBC devices; theoretically they
>  * can do it, but there's probably so many exceptions
>  * it's not worth the risk */
> @@ -261,7 +261,7 @@ allow_restart_store(struct device *dev, struct device_attribute *attr,
> if (!capable(CAP_SYS_ADMIN))
> return -EACCES;
>
> -   if (sdp->type != TYPE_DISK)
> +   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
> return -EINVAL;
>
> sdp->allow_restart = simple_strtoul(buf, NULL, 10);
> @@ -369,6 +369,7 @@ static const char *lbp_mode[] = {
> [SD_LBP_WS16]   = "writesame_16",
> [SD_LBP_WS10]   = "writesame_10",
> [SD_LBP_ZERO]   = "writesame_zero",
> +   [SD_ZBC_RESET_WP]   = "reset_wp",
> [SD_LBP_DISABLE]= "disabled",
>  };
>
> @@ -391,6 +392,13 @@ provisioning_mode_store(struct device *dev, struct device_attribute *attr,
> if (!capable(CAP_SYS_ADMIN))
> return -EACCES;
>
> +   if (sdkp->zoned == 1 || sdp->type == TYPE_ZBC) {
> +   if (!strncmp(buf, lbp_mode[SD_ZBC_RESET_WP], 20)) {
> +   sd_config_discard(sdkp, SD_ZBC_RESET_WP);
> +   return count;
> +   }
> +   return -EINVAL;
> +   }
> if (sdp->type != TYPE_DISK)
> return -EINVAL;
>
> @@ -458,7 +466,7 @@ max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,
> if (!capable(CAP_SYS_ADMIN))
> return -EACCES;
>
> -   if (sdp->type != TYPE_DISK)
> +   if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
> return -EINVAL;
>
> err = kstrtoul(buf, 10, );
> @@ -631,7 +639,7 @@ static unsigned char sd_setup_protect_cmnd(struct scsi_cmnd *scmd,
> return protect;
>  }
>
> -static void 

Re: patch "libata: Add support for SCT Write Same" breaks system

2016-09-09 Thread Shaun Tancheff
On Fri, Sep 9, 2016 at 10:36 AM, Tejun Heo <t...@kernel.org> wrote:
> Hello, Shaun.
>
> On Fri, Sep 09, 2016 at 10:26:44AM -0500, Shaun Tancheff wrote:
>> I'm looking into it now. Let me see if I can reproduce this on any of my
>> hardware.
>>
>> If not, there are a couple of options ... one is to only enable it for
>> ZBC devices, where this is explicitly required by the spec.
>>
>> Or disable it for devices that report TRIM support?
>
> I'd much prefer enabling this only on ZBC devices.  There aren't any
> real benefits to !ZBC devices, right?  Using non-essential features on
> ATA never goes well.

I've posted a patch for !ZBC.

Mike, can you confirm if this works for you?

>
> Thanks.
>
> --
> tejun

Thanks!

(Apologies for the html reply earlier)
-- 
Shaun Tancheff


[PATCH] Some drives failing on SCT Write Same

2016-09-09 Thread Shaun Tancheff
Restrict SCT Write Same support to devices which also support ZAC, where
support is required.

Reported-by: Mike Krinkin <krinkin@gmail.com>
Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
 drivers/ata/libata-scsi.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 2f5487f..9cceb4a 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3562,9 +3562,9 @@ static unsigned int ata_scsiop_maint_in(struct ata_scsi_args *args, u8 *rbuf)
supported = 3;
break;
case WRITE_SAME_16:
-   if (ata_id_sct_write_same(dev->id))
-   supported = 3;
-   break;
+   if (!ata_id_sct_write_same(dev->id))
+   break;
+   /* fallthrough: if SCT ... only enable for ZBC */
case ZBC_IN:
case ZBC_OUT:
if (ata_id_zoned_cap(dev->id) ||
-- 
2.9.3
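
The effect of the fallthrough can be summarised as a small truth table:
WRITE SAME (16) is advertised only when the drive reports SCT Write Same and
is also a zoned device. A sketch of that decision as a pure function follows.
The function name and the boolean inputs are illustrative; the exact zoned
test is cut off in the hunk above and is reduced here to a single flag, and
the sketch assumes the zoned branch reports the same value 3 that the removed
lines used, with 0 as the default.

```c
#include <stdbool.h>

/* Value reported for WRITE_SAME_16 after the patch: 3 when advertised,
 * 0 otherwise (assumed default; not visible in the hunk). */
static unsigned int write_same_16_supported(bool sct_write_same, bool zoned)
{
	if (!sct_write_same)
		return 0;	/* the new early "break" */
	return zoned ? 3 : 0;	/* falls through to the ZBC_IN/ZBC_OUT test */
}
```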



Re: patch "libata: Add support for SCT Write Same" breaks system

2016-09-09 Thread Shaun Tancheff
On Fri, Sep 9, 2016 at 10:36 AM, Tejun Heo <t...@kernel.org> wrote:
> Hello, Shaun.
>
> On Fri, Sep 09, 2016 at 10:26:44AM -0500, Shaun Tancheff wrote:
>> I'm looking into it now. Let me see if I can reproduce this on any of my
>> hardware.
>>
>> If not, there are a couple of options ... one is to only enable it for
>> ZBC devices, where this is explicitly required by the spec.
>>
>> Or disable it for devices that report TRIM support?
>
> I'd much prefer enabling this only on ZBC devices.  There aren't any
> real benefits to !ZBC devices, right?  Using non-essential features on
> ATA never goes well.

Sure I'm fine with that.

I'll move the WRITE SAME support to be conditional on ZBC.
Sending a patch as soon as it's tested.

> Thanks.
>
> --
> tejun

Thanks
--
Shaun


Re: [PATCH v8 1/2 RESEND] Add bio/request flags to issue ZBC/ZAC commands

2016-08-25 Thread Shaun Tancheff
On Thu, Aug 25, 2016 at 9:31 PM, Damien Le Moal <damien.lem...@hgst.com> wrote:
>
> Shaun,
>
> On 8/25/16 05:24, Shaun Tancheff wrote:
>>
>> (RESENDING to include f2fs, fs-devel and dm-devel)
>>
>> Add op flags to access zone information as well as open, close
>> and reset zones:
>>   - REQ_OP_ZONE_REPORT - Query zone information (Report zones)
>>   - REQ_OP_ZONE_OPEN - Explicitly open a zone for writing
>>   - REQ_OP_ZONE_CLOSE - Explicitly close a zone
>>   - REQ_OP_ZONE_FINISH - Explicitly finish a zone
>>   - REQ_OP_ZONE_RESET - Reset Write Pointer to start of zone
>>
>> These op flags can be used to create bio's to control zoned devices
>> through the block layer.
>
>
> I still have a hard time seeing the need for the REQ_OP_ZONE_REPORT
> operation assuming that the device queue will hold a zone information cache,
> Hannes RB-tree or your array type, whichever.
>
> Let's try to think simply here: if the disk user (an FS, a device mapper or
> an application doing raw disk accesses) wants to access the disk zone
> information, why would it need to issue a BIO when calling
> blkdev_lookup_zone would exactly give that information straight out of
> memory (so much faster) ? I thought hard about this, but cannot think of any
> value for the BIO-to-disk option. It seems to me to be equivalent to
> systematically doing a page cache read even if the page cache tells us that
> the page is up-to-date...

Firstly the BIO abstraction here gives a common interface to
getting the zone information and works even for embedded
systems that are not willing / convinced to enable
SCSI_ZBC + BLK_ZONED.

Secondly when SCSI_ZBC + BLK_ZONED are enabled it just
returns from the zone cache [as you can hopefully find
in the second half of this series]. I did add a 'force' option
but it's not intended to be used lightly.

Thirdly it is my belief that BIO abstraction is more easily
adapted to working with [and through] the device mapper
layer(s).

Today we both have the issue where if a file system
supports working with a ZBC device there can be no
device mapper stacked between the file system and
the actual zoned device. This is also true of our respective
device mapper targets.

It is my current belief that teaching the device mapper
layer to include REQ_OP_ZONE* operations is relatively
straightforward and can be done w/o affecting existing
targets that don't specifically need to operate on zones.
Something similar to the way flush is handled currently.
If the target doesn't ask to see zone operations the default
mapping rules apply.

Examples of why I would like to add REQ_OP_ZONE*
support to the device mapper:

I think it would be really neat if I could just do a quick
dm-linear and put a big chunk of SSD in front of dm-zoned
or dm-zdm, as it would be a nice way to boost performance.

Similarly it would enable using dm-linear to stitch together enough
conventional space with a ZBC drive to see if Dave Chinner's
XFS proposal from a couple of years ago could work.

> Moreover, issuing a report zone to the disk may return information that is
> in fact incorrect, as that would not take into account the eventual set of
> write requests that was dispatched but not yet processed by the disk (some
> zone write pointer may be reported with a value lower than what the zone
> cache maintains).

Yes but issuing a zone report to media is not the expected path
when the zone cache is available. It is there to 'force' a re-sync
and it is intended that the user of the call knows that the force
is being applied and wants it to happen. Perhaps I should make
it two flags? One to force a reply from the device and a second
flag to re-sync the zone cache with the result? There is one
piece of information that can only be retrieved by going to the
device and that is the 'non-seq resources' flag and it is only
used by Host Aware devices ... as far as I understand.

> Dealing (and fixing) these inconsistencies would force an update of the
> report zone result using the information of the zone cache, which in itself
> sounds like a good justification of not doing a report zones in the first
> place.

When report zones is just pulling from the zone cache it should
not be a problem. So the normal activity [when SCSI_ZBC +
BLK_ZONED are enabled] should not be introducing any
inconsistencies.

> I am fine with the other operations, and in fact having a BIO interface for
> them to send down to the SCSI layer is better than any other method. It will
> cause them to be seen in sd_init_command, which is the path taken for read
> and write commands too. So all zone cache information checking and updating
> can be done in that single place and serialized with a spinlock. Maintenance
> of the zone cache information becomes very easy.
>
> Any divergence of the zone cache infor

Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()

2016-08-25 Thread Shaun Tancheff
Tom,

In my opinion this patch you submitted is simply making the code less
safe against a buffer overflow without a sufficiently good reason.

In future please comment on other patches as replies to those patches.
Mixing them together is just confusing.

--Shaun


[PATCH v8 2/2 RESEND] Add ioctl to issue ZBC/ZAC commands via block layer

2016-08-24 Thread Shaun Tancheff
(RESENDING to include f2fs, fs-devel and dm-devel)

Add support for ZBC ioctls:
BLKREPORT - Issue Report Zones to device.
BLKZONEACTION - Issue a Zone Action (Close, Finish, Open, or Reset)

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
v8:
 - Changed ioctl for zone actions to a single ioctl that takes 
   a structure including the zone, zone action, all flag, and force option
 - Mapped REQ_META flag to 'force unit access' for zone operations
v6:
 - Added GFP_DMA to gfp mask.
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's

 block/ioctl.c | 149 ++
 include/uapi/linux/blkzoned_api.h |  30 +++-
 include/uapi/linux/fs.h   |   1 +
 3 files changed, 179 insertions(+), 1 deletion(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..d760523 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -194,6 +194,151 @@ int blkdev_reread_part(struct block_device *bdev)
 }
 EXPORT_SYMBOL(blkdev_reread_part);
 
+static int blk_zoned_report_ioctl(struct block_device *bdev, fmode_t mode,
+ void __user *parg)
+{
+   int error = -EFAULT;
+   gfp_t gfp = GFP_KERNEL | GFP_DMA;
+   void *iopg = NULL;
+   struct bdev_zone_report_io *bzrpt = NULL;
+   int order = 0;
+   struct page *pgs = NULL;
+   u32 alloc_size = PAGE_SIZE;
+   unsigned int op_flags = 0;
+   u8 opt = 0;
+
+   if (!(mode & FMODE_READ))
+   return -EBADF;
+
+   iopg = (void *)get_zeroed_page(gfp);
+   if (!iopg) {
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   bzrpt = iopg;
+   if (copy_from_user(bzrpt, parg, sizeof(*bzrpt))) {
+   error = -EFAULT;
+   goto report_zones_out;
+   }
+   if (bzrpt->data.in.return_page_count > alloc_size) {
+   int npages;
+
+   alloc_size = bzrpt->data.in.return_page_count;
+   npages = (alloc_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+   pgs = alloc_pages(gfp, ilog2(npages));
+   if (pgs) {
+   void *mem = page_address(pgs);
+
+   if (!mem) {
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   order = ilog2(npages);
+   memset(mem, 0, alloc_size);
+   memcpy(mem, bzrpt, sizeof(*bzrpt));
+   bzrpt = mem;
+   } else {
+   /* Result requires DMA capable memory */
+   pr_err("Not enough memory available for request.\n");
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   } else {
+   alloc_size = bzrpt->data.in.return_page_count;
+   }
+   if (bzrpt->data.in.force_unit_access)
+   op_flags |= REQ_META;
+   opt = bzrpt->data.in.report_option;
+   error = blkdev_issue_zone_report(bdev, op_flags,
+   bzrpt->data.in.zone_locator_lba, opt,
+   pgs ? pgs : virt_to_page(iopg),
+   alloc_size, GFP_KERNEL);
+   if (error)
+   goto report_zones_out;
+
+   if (pgs) {
+   void *src = bzrpt;
+   u32 off = 0;
+
+   /*
+* When moving a multi-order page with GFP_DMA
+* the copy to user can trap ""
+* so instead we copy out 1 page at a time.
+*/
+   while (off < alloc_size && !error) {
+   u32 len = min_t(u32, PAGE_SIZE, alloc_size - off);
+
+   memcpy(iopg, src + off, len);
+   if (copy_to_user(parg + off, iopg, len))
+   error = -EFAULT;
+   off += len;
+   }
+   } else {
+   if (copy_to_user(parg, iopg, alloc_size))
+   error = -EFAULT;
+   }
+
+report_zones_out:
+   if (pgs)
+   __free_pages(pgs, order);
+   if (iopg)
+   free_page((unsigned long)iopg);
+   return error;
+}
+
+static int blk_zoned_action_ioctl(struct block_device *bdev, fmode_t mode,
+ void __user *parg)
+{
+   unsigned int op = 0;
+   unsigned int op_flags = 0;
+   sector_t lba;
+   struct bdev_zone_action za;
+
+   if (!(mode & FMODE_WRITE))
+   return -EBADF;
+
+   /* When acting on zones we explicitly disallow using a partition. */
+   if (bdev != bdev->bd_contains) {
+   pr_err("%s: All zone operations disallowed on this device\n",
+   __func__);
+   return -EFAULT;
+   }
+
+   if 

[PATCH v8 1/2 RESEND] Add bio/request flags to issue ZBC/ZAC commands

2016-08-24 Thread Shaun Tancheff
(RESENDING to include f2fs, fs-devel and dm-devel)

Add op flags to access zone information as well as open, close
and reset zones:
  - REQ_OP_ZONE_REPORT - Query zone information (Report zones)
  - REQ_OP_ZONE_OPEN - Explicitly open a zone for writing
  - REQ_OP_ZONE_CLOSE - Explicitly close a zone
  - REQ_OP_ZONE_FINISH - Explicitly finish a zone
  - REQ_OP_ZONE_RESET - Reset Write Pointer to start of zone

These op flags can be used to create bio's to control zoned devices
through the block layer.

This is useful for file systems and device mappers that need explicit
control of zoned devices such as Host Managed and Host Aware SMR drives.

Report zones is a device read that requires a buffer.

Open, Close, Finish and Reset are device commands that have no
associated data transfer.
  Open -   Open a zone for writing.
  Close -  Disallow writing to a zone.
  Finish - Disallow writing to a zone and set the WP to the end
           of the zone.
  Reset -  Discard data in a zone and reset the WP to the start
           of the zone.

Sending an LBA of ~0 will attempt to operate on all zones.
This is typically used with Reset to wipe a drive as a Reset
behaves similarly to TRIM in that all data in the zone(s) is deleted.

Report zones currently defaults to reporting on all zones. It is expected
that support for the zone option flag will piggy back on streamid
support. The report option flag is useful as it can reduce the number
of zones in each report, but not critical.
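
For illustration only, the zone actions described above can be modelled in a few lines of userspace C. This is a simplified sketch, not the code in this patch: the struct, the enum names and do_zone_action() are all made up for the example, and the condition handling ignores the finer points of the ZBC state machine. It mainly shows the "LBA of ~0 means all zones" convention.

```c
#include <stddef.h>
#include <stdint.h>

#define ALL_ZONES (~UINT64_C(0)) /* an LBA of ~0 selects every zone */

/* Simplified zone conditions and actions (names are hypothetical). */
enum zone_cond { ZC_EMPTY, ZC_OPEN, ZC_CLOSED, ZC_FULL };
enum zone_action { ZA_OPEN, ZA_CLOSE, ZA_FINISH, ZA_RESET };

struct zone {
	uint64_t start;		/* first sector of the zone */
	uint64_t len;		/* zone length in sectors */
	uint64_t wp;		/* write pointer */
	enum zone_cond cond;
};

static void apply_one(struct zone *z, enum zone_action act)
{
	switch (act) {
	case ZA_OPEN:
		z->cond = ZC_OPEN;
		break;
	case ZA_CLOSE:
		z->cond = ZC_CLOSED;
		break;
	case ZA_FINISH:
		/* No more writes: WP moves to the end of the zone. */
		z->wp = z->start + z->len;
		z->cond = ZC_FULL;
		break;
	case ZA_RESET:
		/* Data is discarded: WP moves back to the start. */
		z->wp = z->start;
		z->cond = ZC_EMPTY;
		break;
	}
}

/* Returns the number of zones acted on; 0 if lba is not a zone start. */
static size_t do_zone_action(struct zone *zones, size_t n, uint64_t lba,
			     enum zone_action act)
{
	size_t i, hit = 0;

	for (i = 0; i < n; i++) {
		if (lba == ALL_ZONES || lba == zones[i].start) {
			apply_one(&zones[i], act);
			hit++;
		}
	}
	return hit;
}
```

Calling do_zone_action(zones, n, ALL_ZONES, ZA_RESET) in this model corresponds to the "wipe a drive" case mentioned above.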

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
v8:
 - Added Finish Zone op
 - Fixed report zones copy to user to work when HARDENED_USERCOPY is enabled
v6:
 - Added GFP_DMA to gfp mask.
v5:
 - In sd_setup_zone_action_cmnd, remove unused vars and fix switch indent
 - In blk-lib fix documentation
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
V3:
 - Rebase on Mike Christie's separate bio operations
 - Update blkzoned_api.h to include report zones PARTIAL bit.
V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Removed include/uapi/linux/fs.h from this patch.

 MAINTAINERS   |   9 ++
 block/blk-lib.c   |  94 
 drivers/scsi/sd.c | 121 +
 drivers/scsi/sd.h |   1 +
 include/linux/bio.h   |   8 +-
 include/linux/blk_types.h |   7 +-
 include/linux/blkdev.h|   1 +
 include/linux/blkzoned_api.h  |  25 ++
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned_api.h | 182 ++
 10 files changed, 447 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

diff --git a/MAINTAINERS b/MAINTAINERS
index a306795..aedf311 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12984,6 +12984,15 @@ F: Documentation/networking/z8530drv.txt
 F: drivers/net/hamradio/*scc.c
 F: drivers/net/hamradio/z8530.h
 
+ZBC AND ZBC BLOCK DEVICES
+M: Shaun Tancheff <shaun.tanch...@seagate.com>
+W: http://seagate.com
+W: https://github.com/Seagate/ZDM-Device-Mapper
+L: linux-bl...@vger.kernel.org
+S: Maintained
+F: include/linux/blkzoned_api.h
+F: include/uapi/linux/blkzoned_api.h
+
 ZBUD COMPRESSED PAGE ALLOCATOR
 M: Seth Jennings <sjenn...@redhat.com>
 L: linux...@kvack.org
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 083e56f..e92bd56 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -266,3 +266,97 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
 }
 EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+/**
+ * blkdev_issue_zone_report - queue a report zones operation
+ * @bdev:  target blockdev
+ * @op_flags:  extra bio rw flags. If unsure, use 0.
+ * @sector:starting sector (report will include this sector).
+ * @opt:   See: zone_report_option, default is 0 (all zones).
+ * @page:  one or more contiguous pages.
+ * @pgsz:  up to size of page in bytes, size of report.
+ * @gfp_mask:  memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *Issue a zone report request for the sectors in question.
+ */
+int blkdev_issue_zone_report(struct block_device *bdev, unsigned int op_flags,
+sector_t sector, u8 opt, struct page *page,
+size_t pgsz, gfp_t gfp_mask)
+{
+   struct bdev_zone_report *conv = page_address(page);
+   struct bio *bio;
+   unsigned int nr_iovecs = 1;
+   int ret = 0;
+
+   if (pgsz < (sizeof(struct bdev_zone_report) +
+   sizeof(struct bdev_zone_descriptor)))
+   return -EINVAL;
+
+   bio = bio_allo

[PATCH v8 0/2 RESEND] Block layer support ZAC/ZBC commands

2016-08-24 Thread Shaun Tancheff
(RESENDING to include f2fs, fs-devel and dm-devel)

Hi Jens,

This series is based on Linus' v4.8-rc2 branch.

As Host Aware drives are becoming available we would like to be able
to make use of such drives. This series is also intended to be
suitable for use by Host Managed drives.

ZBC [and ZAC] drives add new commands for discovering and working
with Zones.

Part one of this series expands the bio/request reserved op size from
3 to 4 bits and then adds op codes for each of the ZBC commands:
   Report zones, close zone, finish zone, open zone and reset zone.

Part two of this series deals with integrating these new bio/request
op's with Hannes' zone cache.

This extends the ZBC support up to the block layer allowing direct
control by file systems or device mapper targets. Also by deferring
the zone handling to the authoritative subsystem there is an overall
lower memory usage for holding the active zone information as well
as clarifying the responsible party for maintaining the write pointer
for each active zone.

By way of example a DM target may have several writes in progress. The sector
(or LBA) for each of those writes will depend on the previous write. While the
drive's write pointer will be updated as writes are completed, the DM target
will be maintaining both where the next write should be scheduled from and
where the write pointer is based on writes completed w/o errors.
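
A rough sketch of that bookkeeping, in plain userspace C. Everything here (struct zone_track and both helpers) is hypothetical and only meant to show the two positions being tracked separately. It assumes writes within a zone complete in order and without errors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-zone bookkeeping a DM target might keep. */
struct zone_track {
	uint64_t start;		/* first sector of the zone */
	uint64_t len;		/* zone length in sectors */
	uint64_t next_lba;	/* where the next write will be scheduled */
	uint64_t wp;		/* write pointer per writes completed w/o errors */
};

/* Reserve nr sectors at the scheduling position; false if they do not fit. */
static bool zone_schedule_write(struct zone_track *z, uint64_t nr,
				uint64_t *lba)
{
	if (nr > z->start + z->len - z->next_lba)
		return false;
	*lba = z->next_lba;
	z->next_lba += nr;
	return true;
}

/* A write of nr sectors completed; advance the confirmed write pointer. */
static void zone_complete_write(struct zone_track *z, uint64_t nr)
{
	z->wp += nr;
}
```

With two writes in flight, next_lba is ahead of wp by the sum of both, which is the gap the target has to account for until the completions arrive.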

Knowing the drive zone topology enables DM targets and file systems to
extend their block allocation schemes and issue write pointer resets (or
discards) that are zone aligned.

A perhaps non-obvious approach is that a conventional drive will
return a zone report descriptor with a single large conventional zone.
This is intended to allow a collection of zoned and non-zoned media to
be stitched together to provide a file system with a zoned device with
conventional space mapped to where it is useful.

Patches for util-linux can be found here:
g...@github.com:stancheff/util-linux.git v2.28.1+biof

https://github.com/stancheff/util-linux/tree/v2.28.1%2Bbiof

This patch is available here:
https://github.com/stancheff/linux/tree/v4.8-rc2%2Bbiof.v8

g...@github.com:stancheff/linux.git v4.8-rc2+biof.v8

v8:
 - Changed zone report to default to reading from zone cache.
 - Changed ioctl for zone commands to support forcing a query or command
   to be sent to media.
 - Fixed report zones copy to user to work when HARDENED_USERCOPY is enabled
v7:
 - Initial support for Hannes' zone cache.
v6:
 - Fix page alloc to include DMA flag for ioctl.
v5:
 - In sd_setup_zone_action_cmnd, remove unused vars and fix switch indent
 - In blk-lib fix documentation
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
 - Dropped ata16 hackery
V3:
 - Rebase on Mike Christie's separate bio operations
 - Update blkzoned_api.h to include report zones PARTIAL bit.
 - Use zoned report reserved bit for ata-passthrough flag.

V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Moved include/uapi/linux/fs.h changes to patch 3
 - Fixed commit message for first patch in series.


Shaun Tancheff (2):
  Add bio/request flags to issue ZBC/ZAC commands
  Add ioctl to issue ZBC/ZAC commands via block layer

 MAINTAINERS   |   9 ++
 block/blk-lib.c   |  94 +
 block/ioctl.c | 149 +++
 drivers/scsi/sd.c | 121 ++
 drivers/scsi/sd.h |   1 +
 include/linux/bio.h   |   8 +-
 include/linux/blk_types.h |   7 +-
 include/linux/blkdev.h|   1 +
 include/linux/blkzoned_api.h  |  25 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned_api.h | 210 ++
 include/uapi/linux/fs.h   |   1 +
 12 files changed, 625 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

-- 
2.9.3



Re: [PATCH v2 2/4] On Discard either do Reset WP or Write Same

2016-08-23 Thread Shaun Tancheff
On Mon, Aug 22, 2016 at 8:25 PM, Damien Le Moal <damien.lem...@hgst.com> wrote:
>
> Shaun,
>
> On 8/23/16 09:22, Shaun Tancheff wrote:
>> On Mon, Aug 22, 2016 at 6:57 PM, Damien Le Moal <damien.lem...@hgst.com> 
>> wrote:

>> Also you may note that in my patch to get Host Aware working
>> with the zone cache I do not include the runt zone in the cache.
>
> Why not ? The RB-tree will handle it just fine (the insert and lookup
> code as Hannes had them was not relying on a constant zone size).

A good point. I didn't pay too much attention while bringing this
forward. I think a few of my hacks may be pointless now. I'll
try to rework it and get rid of the runt check.

>> So as it sits I need this fallback otherwise doing blkdiscard over
>> the whole device ends in an error, as well as mkfs.f2fs et al.
>
> Got it, but I do not see a problem with including it. I have not checked
> the code, but the split of a big discard call into "chunks" should be
> already handling the last chunk and make sure that the operation does
> not exceed the device capacity (in any case, that's easy to fix in the
> sd_zbc_setup_discard code).

Yes I agree the split of big discards does handle the last chunk correctly.

>>> Some 10TB host managed disks out there have 1% conventional zone space,
>>> that is 100GB of capacity. When issuing a "reset all", doing a write
>>> same in these zones will take forever... If the user really wants zeroes
>>> in those zones, let it issue a zeroout.
>>>
>>> I think that it would be a better choice to simply not report
>>> discard_zeroes_data as true and do nothing for conventional zones reset.
>>
>> I think that would be unfortunate for Host Managed but I think it's
>> the right choice for Host Aware at this time. So either we base
>> it on disk type or we have some other config flag added to sysfs.
>
> I do not see any difference between host managed and host aware. Both
> define the same behavior for reset, and both end up in a NOP for
> conventional zone reset (no data "erasure" required by the standard).
> For write pointer zones, reading unwritten LBAs returns the
> initialization pattern, with the exception of host-managed disks with
> the URSWRZ bit set to 0. But that case is covered in sd.c, so the
> behavior is consistent across all models. So why force data zeroing
> when the standards do not mandate it?

Well, you do have a point.
It appears to be only mkfs and similar tools that are really utilizing
discard zeroes data at the moment.

I did a quick test:

mkfs -t ext4 -b 4096 -g 32768 -G 32 \
 -E lazy_itable_init=0,lazy_journal_init=0,offset=0,num_backup_sb=0,packed_meta_blocks=1,discard \
 -O flex_bg,extent,sparse_super2

   - discard zeroes data true - 3 minutes
   - discard zeroes data false - 6 minutes
So for the smaller conventional space on the current HA drive
there is some advantage to enabling discard zeroes data.

However for a larger conventional space you are correct the overall
impact is worse performance.

For some reason I had been assuming that some file systems
used or relied on discard zeroes data during normal operation.
Now that I am looking for that I don't seem to be finding any
evidence of it, so aside from mkfs I don't have as good an
argument for discard zeroes data as I thought I did.

Regards,
Shaun


Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()

2016-08-23 Thread Shaun Tancheff
On Tue, Aug 23, 2016 at 5:17 AM, Tom Yan <tom.t...@gmail.com> wrote:
> Wait a minute. I think you missed or misunderstood something when you
> listen to someone's opinion in that we should switch to sglist buffer.

No, I think I can trust Christoph Hellwig <h...@lst.de>

> I think the danger people referred to is exactly what is revealed when
> the ugly code is removed in this commit (it doesn't mean that the code
> should be kept though).
>
> The original buffer appears to be open:
> buf = page_address(sg_page(scsi_sglist(scmd)));

Which is unsafe.

> While the new buffer you adopted in ata_format_sct_write_same() and
> ata_format_dsm_trim_descr() is of fixed size:

Yes ... it is a temporary response buffer for simulated commands used to
copy data to and from the command sg_list so as not to hold irqs while
modifying the buffer.

> buffer = ((void *)ata_scsi_rbuf);
>
> sctpg = ((void *)ata_scsi_rbuf);
>
> because:
>
> #define ATA_SCSI_RBUF_SIZE  4096
> ...
> static u8 ata_scsi_rbuf[ATA_SCSI_RBUF_SIZE];
>
> So the sglist buffer is always 4096 bytes.

No. The sglist buffer attached to the write same / trim command
is always sdp->sector_size

> And hence you can probably safely use ATA_SCSI_RBUF_SIZE as the buflen
> param in the sg_copy_from_buffer() calls (at least in the case of
> ata_format_sct_write_same()).

No. SCT Write Same has a fixed single 512 byte transfer.

> However, the return value of
> ata_format_dsm_trim_descr() should still always be used_bytes since
> that is needed by the ata taskfile construction.

So long as it does not exceed its sglist/sector_size buffer.

> You may want to check (n_block / 65535 * 8 > ATA_SCSI_RBUF_SIZE). If
> it is true, then perhaps we may want to return 0, and make the SATL
> response with invalid CDB field if we catch that.

No, that is not quite right. You need to check if you are
overflowing either RBUF or sdp->sector_size.
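
Something along these lines, as a standalone sketch rather than a libata patch. The helper names are made up. Note that the number of range entries has to be rounded up: with integer division, n_block / 65535 * 8 under-counts whenever n_block is not a multiple of 65535.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RANGE_MAX 65535u /* blocks one 8-byte TRIM range entry can describe */

/* Bytes of range entries needed for n_block blocks, rounded up. */
static uint64_t trim_bytes_needed(uint64_t n_block)
{
	uint64_t nranges = n_block / RANGE_MAX + (n_block % RANGE_MAX != 0);

	return nranges * 8;
}

/* True if the payload fits in both the bounce buffer and the sglist buffer. */
static bool trim_payload_fits(uint64_t n_block, size_t rbuf_size,
			      size_t sector_size)
{
	size_t limit = rbuf_size < sector_size ? rbuf_size : sector_size;

	return trim_bytes_needed(n_block) <= limit;
}
```

For example, 65536 blocks need two entries (16 bytes), while the truncating expression would give 8.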

> Though IMHO this is really NOT a reason that is strong enough to
> prevent this patch from entering the repo first.

> On 23 August 2016 at 09:36, Tom Yan <tom.t...@gmail.com> wrote:
>> On 23 August 2016 at 09:18, Shaun Tancheff <shaun.tanch...@seagate.com> 
>> wrote:
>>> On Tue, Aug 23, 2016 at 3:37 AM, Tom Yan <tom.t...@gmail.com> wrote:
>>>> On 23 August 2016 at 07:30, Shaun Tancheff <sh...@tancheff.com> wrote:
>>>
>>>> If we really want/need to avoid hitting some real buffer limit (e.g.
>>>> maximum length of scatter/gather list?), then we should in some way
>>>> check n_block against that. If it is too large we then return
>>>> used_bytes = 0 (optionally with some follow-up to add a response to
>>>> such return value or so).
>>>
>>> Yes there is a real buffer limit, I can think of these two options:
>>> 1- Assume the setups from sd_setup_discard_cmnd() and/
>>>or sd_setup_write_same_cmnd() are providing an sglist of
>>>sdp->sector_size via scsi_init_io()
>>
>> That sounds completely wrong. The scatter/gather list we are talking
>> about here has nothing to do with the SCSI or block layer anymore. The
>> SATL has _already_ parsed the SCSI Write Same (16) command and is
>> packing ranges/payload according to that in this stage. If there is
>> any limit it would probably be the max_segment allowed by the host driver
>> (e.g. ahci).
>>
>> It doesn't seem to make sense to me either that we would need to
>> prevent sglist overflow in such level. Doesn't that mean we would need
>> to do the same checking (specifically, as in hard coding checks in all
>> kinds of procedures) in every use of scatter/gather list? That doesn't
>> sound right at all.
>>
>>>
>>> 2- Find (or write) a suitable sg_get_size(sgl, nents) to walk the
>>> sglist and calculate the available buffer size.
>>>
>>> #2 sounds like more fun but I'm not sure it's what people would prefer to 
>>> see.
>>
>> No idea if such thing exists / makes sense at all.
>>
>>>
>>> --
>>> Shaun



-- 
Shaun Tancheff


Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()

2016-08-23 Thread Shaun Tancheff
On Tue, Aug 23, 2016 at 3:37 AM, Tom Yan <tom.t...@gmail.com> wrote:
> On 23 August 2016 at 07:30, Shaun Tancheff <sh...@tancheff.com> wrote:

> If we really want/need to avoid hitting some real buffer limit (e.g.
> maximum length of scatter/gather list?), then we should in some way
> check n_block against that. If it is too large we then return
> used_bytes = 0 (optionally with some follow-up to add a response to
> such return value or so).

Yes there is a real buffer limit, I can think of these two options:
1- Assume the setups from sd_setup_discard_cmnd() and/
   or sd_setup_write_same_cmnd() are providing an sglist of
   sdp->sector_size via scsi_init_io()

2- Find (or write) a suitable sg_get_size(sgl, nents) to walk the
sglist and calculate the available buffer size.
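
For what it's worth, #2 would be little more than a loop. Below is a userspace model of the idea. struct seg is a stand-in I made up for the example; an in-kernel version would presumably walk the real list with for_each_sg() and add up each entry's length instead.

```c
#include <stddef.h>

/* Hypothetical stand-in for one scatter/gather segment. */
struct seg {
	void *addr;
	size_t length;
};

/* Sum the segment lengths: the most a writer may put into the list. */
static size_t seg_list_size(const struct seg *sgl, size_t nents)
{
	size_t total = 0;
	size_t i;

	for (i = 0; i < nents; i++)
		total += sgl[i].length;
	return total;
}
```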

#2 sounds like more fun but I'm not sure it's what people would prefer to see.

--
Shaun


Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()

2016-08-23 Thread Shaun Tancheff
On Mon, Aug 22, 2016 at 3:53 PM, Tom Yan <tom.t...@gmail.com> wrote:
> On 22 August 2016 at 20:32, Shaun Tancheff <shaun.tanch...@seagate.com> wrote:
>> On Mon, Aug 22, 2016 at 3:07 PM, Tom Yan <tom.t...@gmail.com> wrote:
>>> I don't see how that's possible. count / n_block will always be
>>> smaller than 65535 * ATA_MAX_TRIM_RNUM(64) = 4194240. Not to mention
>>> that isn't even a "buffer limit" anyway. By SG_IO do you mean like
>>> SCSI Write Same commands that issued with sg_write_same or so? If
>>> that's the case, that's what exactly commit 5c79097a28c2
>>> ("libata-scsi: reject WRITE SAME (16) with n_block that exceeds
>>> limit") is for.
>>
>> Ah, I see. You are guarding the only user of ata_set_lba_range_entries().
>
> Yup. It is the only right thing to do anyway, that we leave the
> function "open" and guard per context when we use it. Say if
> ata_set_lba_range_entries() is gonna be a function that is shared by
> others, it would only make this commit more important. As I said, we
> did not guard it with a certain buffer limit, but merely redundantly
> guard it with a ("humanized") limit that applies to TRIM only.

But the "humanized" limit is the one you just added and proceeded to
change ata_set_lba_range_entries(). You changed from a buffer size
to use "num" instead and now you want to remove the protection
entirely?

Why not just put this in front of ata_set_lba_range_entries():

if (n_block > 65535 * ATA_MAX_TRIM_RNUM) {
	fp = 2;
	goto invalid_fld;
}

And then restore ata_set_lba_range_entries() to how it looked
before you changed it in commit:

2983860c7 (libata-scsi: avoid repeated calculation of number of TRIM ranges)

Then you can have ata_set_lba_range_entries() take the buffer size ...
something like the following would be fine:

  size = ata_set_lba_range_entries(buf, scmd->device->sector_size,
block, n_block);

Now things are happily protected against both exceeding the b0 limit(s) and
overflowing the sglist buffer.
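
To make that concrete, here is a userspace sketch of a range packer that takes the buffer size. It is not the ata.h helper: the name pack_lba_ranges() and the "return 0 when it does not fit" convention are my assumptions for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRIM_RANGE_MAX 0xffffu /* largest block count one entry can hold */

/*
 * Pack (sector, count) into 8-byte little-endian range entries without
 * writing past buf_size. Returns the bytes used, rounded up to a
 * multiple of 512, or 0 if count is 0 or the entries do not fit.
 */
static size_t pack_lba_ranges(uint8_t *buf, size_t buf_size,
			      uint64_t sector, uint64_t count)
{
	uint64_t nranges = count / TRIM_RANGE_MAX +
			   (count % TRIM_RANGE_MAX != 0);
	uint64_t used = (nranges * 8 + 511) / 512 * 512;
	size_t i = 0;

	if (count == 0 || used > buf_size)
		return 0;

	memset(buf, 0, (size_t)used);
	while (count > 0) {
		uint64_t range = count > TRIM_RANGE_MAX ? TRIM_RANGE_MAX : count;
		/* 48-bit LBA in the low bits, block count in the top 16. */
		uint64_t entry = (sector & 0xffffffffffffULL) | (range << 48);
		int b;

		/* Store little-endian regardless of host byte order. */
		for (b = 0; b < 8; b++)
			buf[i * 8 + b] = (uint8_t)(entry >> (8 * b));
		i++;
		count -= range;
		sector += range;
	}
	return (size_t)used;
}
```

With a 512-byte buffer this accepts up to 65535 * 64 blocks and rejects anything larger, so the limit falls out of the buffer size instead of being a separate constant.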

--
Shaun


Re: [PATCH v2 2/4] On Discard either do Reset WP or Write Same

2016-08-22 Thread Shaun Tancheff
On Mon, Aug 22, 2016 at 6:57 PM, Damien Le Moal <damien.lem...@hgst.com> wrote:
>
> Shaun,
>
> On 8/22/16 13:31, Shaun Tancheff wrote:
> [...]
>> -int sd_zbc_setup_discard(struct scsi_disk *sdkp, struct request *rq,
>> -  sector_t sector, unsigned int num_sectors)
>> +int sd_zbc_setup_discard(struct scsi_cmnd *cmd)
>>  {
>> - struct blk_zone *zone;
>> + struct request *rq = cmd->request;
>> + struct scsi_device *sdp = cmd->device;
>> + struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
>> + sector_t sector = blk_rq_pos(rq);
>> + unsigned int nr_sectors = blk_rq_sectors(rq);
>>   int ret = BLKPREP_OK;
>> + struct blk_zone *zone;
>>   unsigned long flags;
>> + u32 wp_offset;
>> + bool use_write_same = false;
>>
>>   zone = blk_lookup_zone(rq->q, sector);
>> - if (!zone)
>> + if (!zone) {
>> + /* Test for a runt zone before giving up */
>> + if (sdp->type != TYPE_ZBC) {
>> + struct request_queue *q = rq->q;
>> + struct rb_node *node;
>> +
>> + node = rb_last(&q->zones);
>> + if (node)
>> + zone = rb_entry(node, struct blk_zone, node);
>> + if (zone) {
>> + spin_lock_irqsave(&zone->lock, flags);
>> + if ((zone->start + zone->len) <= sector)
>> + goto out;
>> + spin_unlock_irqrestore(&zone->lock, flags);
>> + zone = NULL;
>> + }
>> + }
>>   return BLKPREP_KILL;
>> + }
>
> I do not understand the point of this code here to test for the runt
> zone. As long as sector is within the device maximum capacity (in 512B
> unit), blk_lookup_zone will return the pointer to the zone structure
> containing that sector (the RB-tree does not have any constraint
> regarding zone size). The only case where NULL would be returned is if
> discard is issued super early right after the disk is probed and before
> the zone refresh work has completed. We can certainly protect against
> that by delaying the discard.

As you can see I am not including Host Managed in the
runt check.

Also you may note that in my patch to get Host Aware working
with the zone cache I do not include the runt zone in the cache.
So as it sits I need this fallback, otherwise doing blkdiscard over
the whole device ends in an error, as well as mkfs.f2fs et al.

>>   spin_lock_irqsave(&zone->lock, flags);
>> -
>>   if (zone->state == BLK_ZONE_UNKNOWN ||
>>   zone->state == BLK_ZONE_BUSY) {
>>   sd_zbc_debug_ratelimit(sdkp,
>> -"Discarding zone %zu state %x, 
>> deferring\n",
>> +"Discarding zone %zx state %x, 
>> deferring\n",
>
> Sector values are usually displayed in decimal. Why use Hex here ? At
> least "0x" would be needed to avoid confusion I think.

Yeah, my brain is lazy about converting very large
numbers to powers of 2. So it's much easier to spot
zone alignment here.
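
A small illustration of what I mean, assuming 256 MiB zones (0x80000 sectors of 512 bytes). The zone size and both helper names are assumptions for the example only.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed zone size: 256 MiB in 512-byte sectors. */
#define ZONE_SECTORS UINT64_C(0x80000)

/* A sector is zone aligned when its low 19 bits are clear. */
static bool zone_aligned(uint64_t sector)
{
	return (sector & (ZONE_SECTORS - 1)) == 0;
}

/* Format a sector in decimal and in hex so the two can be compared. */
static int format_sector(char *out, size_t n, uint64_t sector)
{
	return snprintf(out, n, "%" PRIu64 " = 0x%" PRIx64, sector, sector);
}
```

The start of the fourth zone formats as "1572864 = 0x180000". The trailing zeros in the hex form make the alignment obvious at a glance, which the decimal form does not.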



>>  zone->start, zone->state);
>>   ret = BLKPREP_DEFER;
>>   goto out;
>> @@ -406,46 +428,80 @@ int sd_zbc_setup_discard(struct scsi_disk *sdkp, 
>> struct request *rq,
>>   if (zone->state == BLK_ZONE_OFFLINE) {
>>   /* let the drive fail the command */
>>   sd_zbc_debug_ratelimit(sdkp,
>> -"Discarding offline zone %zu\n",
>> +"Discarding offline zone %zx\n",
>>  zone->start);
>>   goto out;
>>   }
>> -
>> - if (!blk_zone_is_smr(zone)) {
>> + if (blk_zone_is_cmr(zone)) {
>> + use_write_same = true;
>>   sd_zbc_debug_ratelimit(sdkp,
>> -"Discarding %s zone %zu\n",
>> -blk_zone_is_cmr(zone) ? "CMR" : 
>> "unknown",
>> +"Discarding CMR zone %zx\n",
>>  zone->start);
>> - ret = BLKPREP_DONE;
>>   goto out;
>>   }
>
> Some 10TB host managed disks out ther

Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()

2016-08-22 Thread Shaun Tancheff
On Mon, Aug 22, 2016 at 3:07 PM, Tom Yan <tom.t...@gmail.com> wrote:
> I don't see how that's possible. count / n_block will always be
> smaller than 65535 * ATA_MAX_TRIM_RNUM(64) = 4194240. Not to mention
> that isn't even a "buffer limit" anyway. By SG_IO do you mean like
> SCSI Write Same commands that issued with sg_write_same or so? If
> that's the case, that's what exactly commit 5c79097a28c2
> ("libata-scsi: reject WRITE SAME (16) with n_block that exceeds
> limit") is for.

Ah, I see. You are guarding the only user of ata_set_lba_range_entries().
Still, if you are going to do that you have to alert any new user that they
must have an appropriately sized buffer to be overwriting.

Would it be better to move it out of ata.h, then, to limit the scope of
accidental misuse?

Regards,
Shaun


Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()

2016-08-22 Thread Shaun Tancheff
On Mon, Aug 22, 2016 at 1:55 PM,   wrote:
> From: Tom Yan 
>
> In commit 5c79097a28c2 ("libata-scsi: reject WRITE SAME (16) with
> n_block that exceeds limit"), it is made sure that
> ata_set_lba_range_entries() will never be called with a request
> size (n_block) that is larger than the number of blocks that a
> 512-byte block TRIM payload can describe (65535 * 64 = 4194240),
> in addition to acknowledging the SCSI/block layer with the same
> limit by advertising it as the Maximum Write Same Length.
>
> Therefore, it is unnecessary to hard code the same limit in
> ata_set_lba_range_entries() itself, which would only cost extra
> maintenance effort. Such effort can be noticed in, for example,
> commit 2983860c7668 ("libata-scsi: avoid repeated calculation of
> number of TRIM ranges").
>
> Signed-off-by: Tom Yan 
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index be9c76c..9b74ecb 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -3322,7 +3322,7 @@ static unsigned int ata_scsi_write_same_xlat(struct 
> ata_queued_cmd *qc)
> buf = page_address(sg_page(scsi_sglist(scmd)));
>
> if (n_block <= 65535 * ATA_MAX_TRIM_RNUM) {
> -   size = ata_set_lba_range_entries(buf, ATA_MAX_TRIM_RNUM, 
> block, n_block);
> +   size = ata_set_lba_range_entries(buf, block, n_block);
> } else {
> fp = 2;
> goto invalid_fld;
> diff --git a/include/linux/ata.h b/include/linux/ata.h
> index adbc812..5e2e9ad 100644
> --- a/include/linux/ata.h
> +++ b/include/linux/ata.h
> @@ -1077,19 +1077,19 @@ static inline void ata_id_to_hd_driveid(u16 *id)
>   * TO NV CACHE PINNED SET.
>   */
>  static inline unsigned ata_set_lba_range_entries(void *_buffer,
> -   unsigned num, u64 sector, unsigned long count)
> +   u64 sector, unsigned long count)
>  {
> __le64 *buffer = _buffer;
> unsigned i = 0, used_bytes;
>
> -   while (i < num) {
> -   u64 entry = sector |
> -   ((u64)(count > 0xffff ? 0xffff : count) << 48);
> +   while (count > 0) {
> +   u64 range, entry;
> +
> +   range = count > 0xffff ? 0xffff : count;
> +   entry = sector | (range << 48);
> buffer[i++] = __cpu_to_le64(entry);
> -   if (count <= 0xffff)
> -   break;
> -   count -= 0xffff;
> -   sector += 0xffff;
> +   count -= range;
> +   sector += range;
> }

I think the problem here is that I can now inject a buffer overflow
via SG_IO.

> used_bytes = ALIGN(i * 8, 512);
> --
> 2.9.3
>


Re: [PATCH 2/2] Migrate zone cache from RB-Tree to arrays of descriptors

2016-08-22 Thread Shaun Tancheff
On Mon, Aug 22, 2016 at 2:11 AM, Hannes Reinecke <h...@suse.de> wrote:
> On 08/22/2016 06:34 AM, Shaun Tancheff wrote:
>> Currently the RB-Tree zone cache is fast and flexible. It does
>> use a rather largish amount of ram. This model reduces the ram
>> required from 120 bytes per zone to 16 bytes per zone with a
>> moderate transformation of the blk_zone_lookup() api.
>>
>> This model is predicated on the belief that most variations
>> on zoned media will follow a pattern of using collections of same
>> sized zones on a single device. Similar to the pattern of erase
>> blocks on flash devices being progressively larger 16K, 64K, ...
>>
>> The goal is to be able to build a descriptor which is both memory
>> efficient, performant, and flexible.
>>
>> Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
>> ---
>>  block/blk-core.c   |2 +-
>>  block/blk-sysfs.c  |   31 +-
>>  block/blk-zoned.c  |  103 +++--
>>  drivers/scsi/sd.c  |5 +-
>>  drivers/scsi/sd.h  |4 +-
>>  drivers/scsi/sd_zbc.c  | 1025 
>> +++-
>>  include/linux/blkdev.h |   82 +++-
>>  7 files changed, 716 insertions(+), 536 deletions(-)

> Have you measured the performance impact here?

As far as actual hardware (HostAware) I am seeing the same
I/O performance. I suspect it's just that below 100k iops the
zone cache just isn't a bottleneck.

> The main idea behind using an RB-tree is that each single element will
> fit in the CPU cache; using an array will prevent that.
> So we will increase the number of cache flushes, and most likely a
> performance penalty, too.
> Hence I'd rather like to see a performance measurement here before going
> down that road.

I think it will have to be a simulated benchmark, if that's okay.

Of course I'm open to suggestions if there is something you have in mind.
-- 
Regards,
Shaun Tancheff


Re: [PATCH 2/2] Migrate zone cache from RB-Tree to arrays of descriptors

2016-08-21 Thread Shaun Tancheff
On Sun, Aug 21, 2016 at 11:34 PM, Shaun Tancheff <sh...@tancheff.com> wrote:
> Currently the RB-Tree zone cache is fast and flexible, but it uses
> a rather large amount of RAM. This model reduces the RAM required
> from 120 bytes per zone to 16 bytes per zone with a moderate
> transformation of the blk_lookup_zone() API.
>
> This model is predicated on the belief that most variations
> on zoned media will follow a pattern of using collections of same
> sized zones on a single device, similar to the pattern of erase
> blocks on flash devices being progressively larger: 16K, 64K, ...
>
> The goal is to be able to build a descriptor which is memory
> efficient, performant, and flexible.
>
> Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
> ---
>  block/blk-core.c   |2 +-
>  block/blk-sysfs.c  |   31 +-
>  block/blk-zoned.c  |  103 +++--
>  drivers/scsi/sd.c  |5 +-
>  drivers/scsi/sd.h  |4 +-
>  drivers/scsi/sd_zbc.c  | 1025 
> +++-
>  include/linux/blkdev.h |   82 +++-
>  7 files changed, 716 insertions(+), 536 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 3a9caf7..3b084a8 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -727,7 +727,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t 
> gfp_mask, int node_id)
> INIT_LIST_HEAD(&q->blkg_list);
>  #endif
>  #ifdef CONFIG_BLK_DEV_ZONED
> -   q->zones = RB_ROOT;
> +   q->zones = NULL;
>  #endif
> INIT_DELAYED_WORK(&q->delay_work, blk_delay_work);
>
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 43f441f..ecbd434 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -232,36 +232,7 @@ static ssize_t queue_max_hw_sectors_show(struct 
> request_queue *q, char *page)
>  #ifdef CONFIG_BLK_DEV_ZONED
>  static ssize_t queue_zoned_show(struct request_queue *q, char *page)
>  {
> -   struct rb_node *node;
> -   struct blk_zone *zone;
> -   ssize_t offset = 0, end = 0;
> -   size_t size = 0, num = 0;
> -   enum blk_zone_type type = BLK_ZONE_TYPE_UNKNOWN;
> -
> -   for (node = rb_first(&q->zones); node; node = rb_next(node)) {
> -   zone = rb_entry(node, struct blk_zone, node);
> -   if (zone->type != type ||
> -   zone->len != size ||
> -   end != zone->start) {
> -   if (size != 0)
> -   offset += sprintf(page + offset, "%zu\n", 
> num);
> -   /* We can only store one page ... */
> -   if (offset + 42 > PAGE_SIZE) {
> -   offset += sprintf(page + offset, "...\n");
> -   return offset;
> -   }
> -   size = zone->len;
> -   type = zone->type;
> -   offset += sprintf(page + offset, "%zu %zu %d ",
> - zone->start, size, type);
> -   num = 0;
> -   end = zone->start + size;
> -   } else
> -   end += zone->len;
> -   num++;
> -   }
> -   offset += sprintf(page + offset, "%zu\n", num);
> -   return offset;
> +   return sprintf(page, "%u\n", q->zones ? 1 : 0);
>  }
>  #endif
>
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index 975e863..338a1af 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -8,63 +8,84 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
>
> -struct blk_zone *blk_lookup_zone(struct request_queue *q, sector_t lba)
> +/**
> + * blk_lookup_zone() - Lookup zones
> + * @q: Request Queue
> + * @sector: Location to lookup
> + * @start: Pointer to starting location zone (OUT)
> + * @len: Pointer to length of zone (OUT)
> + * @lock: Pointer to spinlock of zones in owning descriptor (OUT)
> + */
> +struct blk_zone *blk_lookup_zone(struct request_queue *q, sector_t sector,
> +sector_t *start, sector_t *len,
> +spinlock_t **lock)
>  {
> -   struct rb_root *root = &q->zones;
> -   struct rb_node *node = root->rb_node;
> +   int iter;
> +   struct blk_zone *bzone = NULL;
> +   struct zone_wps *zi = q->zones;
> +
> +   *start = 0;
> +   *len = 0;
> +   *lock = NULL;
> +
> +   if (!q->zones)
> +   goto out;
>
> -   while (node) {
> -   

[PATCH 2/2] Migrate zone cache from RB-Tree to arrays of descriptors

2016-08-21 Thread Shaun Tancheff
Currently the RB-Tree zone cache is fast and flexible, but it uses
a rather large amount of RAM. This model reduces the RAM required
from 120 bytes per zone to 16 bytes per zone with a moderate
transformation of the blk_lookup_zone() API.

This model is predicated on the belief that most variations
on zoned media will follow a pattern of using collections of same
sized zones on a single device, similar to the pattern of erase
blocks on flash devices being progressively larger: 16K, 64K, ...

The goal is to be able to build a descriptor which is memory
efficient, performant, and flexible.

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
 block/blk-core.c   |2 +-
 block/blk-sysfs.c  |   31 +-
 block/blk-zoned.c  |  103 +++--
 drivers/scsi/sd.c  |5 +-
 drivers/scsi/sd.h  |4 +-
 drivers/scsi/sd_zbc.c  | 1025 +++-
 include/linux/blkdev.h |   82 +++-
 7 files changed, 716 insertions(+), 536 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3a9caf7..3b084a8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -727,7 +727,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
INIT_LIST_HEAD(&q->blkg_list);
 #endif
 #ifdef CONFIG_BLK_DEV_ZONED
-   q->zones = RB_ROOT;
+   q->zones = NULL;
 #endif
INIT_DELAYED_WORK(&q->delay_work, blk_delay_work);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 43f441f..ecbd434 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -232,36 +232,7 @@ static ssize_t queue_max_hw_sectors_show(struct 
request_queue *q, char *page)
 #ifdef CONFIG_BLK_DEV_ZONED
 static ssize_t queue_zoned_show(struct request_queue *q, char *page)
 {
-   struct rb_node *node;
-   struct blk_zone *zone;
-   ssize_t offset = 0, end = 0;
-   size_t size = 0, num = 0;
-   enum blk_zone_type type = BLK_ZONE_TYPE_UNKNOWN;
-
-   for (node = rb_first(&q->zones); node; node = rb_next(node)) {
-   zone = rb_entry(node, struct blk_zone, node);
-   if (zone->type != type ||
-   zone->len != size ||
-   end != zone->start) {
-   if (size != 0)
-   offset += sprintf(page + offset, "%zu\n", num);
-   /* We can only store one page ... */
-   if (offset + 42 > PAGE_SIZE) {
-   offset += sprintf(page + offset, "...\n");
-   return offset;
-   }
-   size = zone->len;
-   type = zone->type;
-   offset += sprintf(page + offset, "%zu %zu %d ",
- zone->start, size, type);
-   num = 0;
-   end = zone->start + size;
-   } else
-   end += zone->len;
-   num++;
-   }
-   offset += sprintf(page + offset, "%zu\n", num);
-   return offset;
+   return sprintf(page, "%u\n", q->zones ? 1 : 0);
 }
 #endif
 
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 975e863..338a1af 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -8,63 +8,84 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
-struct blk_zone *blk_lookup_zone(struct request_queue *q, sector_t lba)
+/**
+ * blk_lookup_zone() - Lookup zones
+ * @q: Request Queue
+ * @sector: Location to lookup
+ * @start: Pointer to starting location zone (OUT)
+ * @len: Pointer to length of zone (OUT)
+ * @lock: Pointer to spinlock of zones in owning descriptor (OUT)
+ */
+struct blk_zone *blk_lookup_zone(struct request_queue *q, sector_t sector,
+sector_t *start, sector_t *len,
+spinlock_t **lock)
 {
-   struct rb_root *root = &q->zones;
-   struct rb_node *node = root->rb_node;
+   int iter;
+   struct blk_zone *bzone = NULL;
+   struct zone_wps *zi = q->zones;
+
+   *start = 0;
+   *len = 0;
+   *lock = NULL;
+
+   if (!q->zones)
+   goto out;
 
-   while (node) {
-   struct blk_zone *zone = container_of(node, struct blk_zone,
-node);
+   for (iter = 0; iter < zi->wps_count; iter++) {
+   if (sector >= zi->wps[iter]->start_lba &&
+   sector <  zi->wps[iter]->last_lba) {
+   struct contiguous_wps *wp = zi->wps[iter];
+   u64 index = (sector - wp->start_lba) / wp->zone_size;
 
-   if (lba < zone->start)
-   node = node->rb_left;
-   else if (lba >= zone->start + zone->len)
-   node = node->

[PATCH 1/2] Move ZBC core setup to sd_zbc

2016-08-21 Thread Shaun Tancheff
Move the remaining ZBC specific code to sd_zbc.c

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
 drivers/scsi/sd.c |  65 +--
 drivers/scsi/sd.h |  20 ++
 drivers/scsi/sd_zbc.c | 170 +++---
 3 files changed, 126 insertions(+), 129 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 9a649fa..f144df4 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2244,68 +2244,6 @@ static int sd_read_protection_type(struct scsi_disk 
*sdkp, unsigned char *buffer
return ret;
 }
 
-static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
-{
-   int retval;
-   unsigned char *desc;
-   u32 rep_len;
-   u8 same;
-   u64 zone_len, lba;
-
-   if (sdkp->zoned != 1 && sdkp->device->type != TYPE_ZBC)
-   /*
-* Device managed or normal SCSI disk,
-* no special handling required
-*/
-   return;
-
-   retval = sd_zbc_report_zones(sdkp, buffer, SD_BUF_SIZE,
-0, ZBC_ZONE_REPORTING_OPTION_ALL, false);
-   if (retval < 0)
-   return;
-
-   rep_len = get_unaligned_be32(&buffer[0]);
-   if (rep_len < 64) {
-   sd_printk(KERN_WARNING, sdkp,
- "REPORT ZONES report invalid length %u\n",
- rep_len);
-   return;
-   }
-
-   if (sdkp->rc_basis == 0) {
-   /* The max_lba field is the capacity of a zoned device */
-   lba = get_unaligned_be64(&buffer[8]);
-   if (lba + 1 > sdkp->capacity) {
-   if (sdkp->first_scan)
-   sd_printk(KERN_WARNING, sdkp,
- "Changing capacity from %zu to Max 
LBA+1 %zu\n",
- sdkp->capacity, (sector_t) lba + 1);
-   sdkp->capacity = lba + 1;
-   }
-   }
-
-   /*
-* Adjust 'chunk_sectors' to the zone length if the device
-* supports equal zone sizes.
-*/
-   same = buffer[4] & 0xf;
-   if (same > 3) {
-   sd_printk(KERN_WARNING, sdkp,
- "REPORT ZONES SAME type %d not supported\n", same);
-   return;
-   }
-   /* Read the zone length from the first zone descriptor */
-   desc = &buffer[64];
-   zone_len = get_unaligned_be64(&desc[8]);
-   sdkp->unmap_alignment = zone_len;
-   sdkp->unmap_granularity = zone_len;
-   blk_queue_chunk_sectors(sdkp->disk->queue,
-   logical_to_sectors(sdkp->device, zone_len));
-
-   sd_zbc_setup(sdkp, zone_len, buffer, SD_BUF_SIZE);
-   sd_config_discard(sdkp, SD_ZBC_RESET_WP);
-}
-
 static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device 
*sdp,
struct scsi_sense_hdr *sshdr, int sense_valid,
int the_result)
@@ -2611,7 +2549,8 @@ got_data:
  sdkp->physical_block_size);
sdkp->device->sector_size = sector_size;
 
-   sd_read_zones(sdkp, buffer);
+   if (sd_zbc_config(sdkp, buffer, SD_BUF_SIZE))
+   sd_config_discard(sdkp, SD_ZBC_RESET_WP);
 
{
char cap_str_2[10], cap_str_10[10];
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index adbf3e0..fc766db 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -289,10 +289,6 @@ static inline void sd_dif_complete(struct scsi_cmnd *cmd, 
unsigned int a)
 #define SD_ZBC_WRITE_ERR   2
 
 #ifdef CONFIG_SCSI_ZBC
-
-extern int sd_zbc_report_zones(struct scsi_disk *, unsigned char *, int,
-  sector_t, enum zbc_zone_reporting_options, bool);
-extern int sd_zbc_setup(struct scsi_disk *, u64 zlen, char *buf, int buf_len);
 extern int sd_zbc_setup_zone_report_cmnd(struct scsi_cmnd *cmd, u8 rpt_opt);
 extern int sd_zbc_setup_zone_action(struct scsi_cmnd *cmd);
 extern int sd_zbc_setup_discard(struct scsi_cmnd *cmd);
@@ -303,23 +299,15 @@ extern void sd_zbc_uninit_command(struct scsi_cmnd *cmd);
 extern void sd_zbc_remove(struct scsi_disk *);
 extern void sd_zbc_reset_zones(struct scsi_disk *);
 extern void sd_zbc_update_zones(struct scsi_disk *, sector_t, int, int reason);
+extern bool sd_zbc_config(struct scsi_disk *, void *, size_t);
+
 extern unsigned int sd_zbc_discard_granularity(struct scsi_disk *sdkp);
 
 #else /* CONFIG_SCSI_ZBC */
 
-static inline int sd_zbc_report_zones(struct scsi_disk *sdkp,
- unsigned char *buf, int buf_len,
- sector_t start_sector,
- enum zbc_zone_reporting_options option,
- bool p

[PATCH 0/2] Change zone cache format to use less memory

2016-08-21 Thread Shaun Tancheff
Currently the RB-Tree zone cache is fast and flexible, but it uses
a rather large amount of RAM. This model reduces the RAM required
from 120 bytes per zone to 16 bytes per zone with a moderate
transformation of the blk_lookup_zone() API.

This model is predicated on the belief that most variations
on zoned media will follow a pattern of using collections of same
sized zones on a single device, similar to the pattern of erase
blocks on flash devices being progressively larger: 16K, 64K, ...

The goal is to be able to build a descriptor which is memory
efficient, performant, and flexible.
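
For readers who want the idea without the full diff, here is a rough
userspace sketch of the lookup this series moves to. It is not the
kernel code: the field names start_lba, last_lba and zone_size follow
the blk-zoned.c hunk in patch 2/2, but the struct layout, the
struct zone_hit result type and both function names are assumptions
made for illustration. It also uses a flat array of descriptors where
the patch uses an array of pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One run of equally sized zones (illustrative layout, not the kernel's). */
struct contiguous_wps {
	uint64_t start_lba;  /* first sector covered by this run */
	uint64_t last_lba;   /* one past the last sector covered */
	uint64_t zone_size;  /* sectors per zone, identical within the run */
};

/* Hypothetical result type: which zone a sector landed in. */
struct zone_hit {
	int      found;  /* non-zero when the sector is inside some run */
	size_t   run;    /* index of the matching descriptor */
	uint64_t index;  /* zone index inside that descriptor */
	uint64_t start;  /* first sector of the zone */
	uint64_t len;    /* length of the zone in sectors */
};

/* Linear scan over the runs, then a division to find the zone index. */
static struct zone_hit lookup_zone(const struct contiguous_wps *runs,
				   size_t count, uint64_t sector)
{
	struct zone_hit hit = {0};

	for (size_t i = 0; i < count; i++) {
		const struct contiguous_wps *wp = &runs[i];

		assert(wp->zone_size != 0);
		if (sector >= wp->start_lba && sector < wp->last_lba) {
			hit.found = 1;
			hit.run = i;
			hit.index = (sector - wp->start_lba) / wp->zone_size;
			hit.start = wp->start_lba + hit.index * wp->zone_size;
			hit.len = wp->zone_size;
			return hit;
		}
	}
	return hit;
}

/* Cache footprint for a given per-zone cost (120 vs 16 bytes above). */
static uint64_t cache_bytes(uint64_t zones, uint64_t bytes_per_zone)
{
	return zones * bytes_per_zone;
}
```

The scan is linear in the number of runs rather than the number of
zones, which is why the "collections of same sized zones" assumption
matters: a device with a handful of runs resolves a sector in a few
comparisons and one division.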

Shaun Tancheff (2):
  Move ZBC core setup to sd_zbc
  Migrate zone cache from RB-Tree to arrays of descriptors

 block/blk-core.c   |2 +-
 block/blk-sysfs.c  |   31 +-
 block/blk-zoned.c  |  103 +++--
 drivers/scsi/sd.c  |   66 +--
 drivers/scsi/sd.h  |   20 +-
 drivers/scsi/sd_zbc.c  | 1037 +---
 include/linux/blkdev.h |   82 +++-
 7 files changed, 759 insertions(+), 582 deletions(-)

-- 
2.9.3



[PATCH v2 3/4] Merge ZBC constants

2016-08-21 Thread Shaun Tancheff
Dedupe ZBC/ZAC constants used for reporting options, same code,
zone condition and zone type.

These are all useful to programs consuming zone information from
user space as well so include them in a uapi header.

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
 block/blk-lib.c   |   4 +-
 drivers/scsi/sd.c |   2 +-
 include/linux/blkdev.h|  20 -
 include/scsi/scsi_proto.h |  17 
 include/uapi/linux/blkzoned_api.h | 167 +++---
 5 files changed, 103 insertions(+), 107 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index e92bd56..67b9258 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -316,8 +316,8 @@ int blkdev_issue_zone_report(struct block_device *bdev, 
unsigned int op_flags,
__be64 blksz = cpu_to_be64(bdev->bd_part->nr_sects);
 
conv->maximum_lba = blksz;
-   conv->descriptors[0].type = ZTYP_CONVENTIONAL;
-   conv->descriptors[0].flags = ZCOND_CONVENTIONAL << 4;
+   conv->descriptors[0].type = BLK_ZONE_TYPE_CONVENTIONAL;
+   conv->descriptors[0].flags = BLK_ZONE_NO_WP << 4;
conv->descriptors[0].length = blksz;
conv->descriptors[0].lba_start = 0;
conv->descriptors[0].lba_wptr = blksz;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d5ef6d8..b76ffbb 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1201,7 +1201,7 @@ static int sd_setup_zone_report_cmnd(struct scsi_cmnd 
*cmd)
src = kmap_atomic(bio->bi_io_vec->bv_page);
conv = src + bio->bi_io_vec->bv_offset;
conv->descriptor_count = cpu_to_be32(1);
-   conv->same_field = ZS_ALL_SAME;
+   conv->same_field = BLK_ZONE_SAME_ALL;
conv->maximum_lba = cpu_to_be64(disk->part0.nr_sects);
kunmap_atomic(src);
goto out;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 68198eb..d5cdb5d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -263,26 +263,6 @@ struct blk_queue_tag {
 #define BLK_SCSI_CMD_PER_LONG  (BLK_SCSI_MAX_CMDS / (sizeof(long) * 8))
 
 #ifdef CONFIG_BLK_DEV_ZONED
-enum blk_zone_type {
-   BLK_ZONE_TYPE_UNKNOWN,
-   BLK_ZONE_TYPE_CONVENTIONAL,
-   BLK_ZONE_TYPE_SEQWRITE_REQ,
-   BLK_ZONE_TYPE_SEQWRITE_PREF,
-   BLK_ZONE_TYPE_RESERVED,
-};
-
-enum blk_zone_state {
-   BLK_ZONE_NO_WP,
-   BLK_ZONE_EMPTY,
-   BLK_ZONE_OPEN,
-   BLK_ZONE_OPEN_EXPLICIT,
-   BLK_ZONE_CLOSED,
-   BLK_ZONE_UNKNOWN = 5,
-   BLK_ZONE_READONLY = 0xd,
-   BLK_ZONE_FULL,
-   BLK_ZONE_OFFLINE,
-   BLK_ZONE_BUSY = 0x20,
-};
 
 struct blk_zone {
struct rb_node node;
diff --git a/include/scsi/scsi_proto.h b/include/scsi/scsi_proto.h
index 6ba66e0..d1defd1 100644
--- a/include/scsi/scsi_proto.h
+++ b/include/scsi/scsi_proto.h
@@ -299,21 +299,4 @@ struct scsi_lun {
 #define SCSI_ACCESS_STATE_MASK0x0f
 #define SCSI_ACCESS_STATE_PREFERRED   0x80
 
-/* Reporting options for REPORT ZONES */
-enum zbc_zone_reporting_options {
-   ZBC_ZONE_REPORTING_OPTION_ALL = 0,
-   ZBC_ZONE_REPORTING_OPTION_EMPTY,
-   ZBC_ZONE_REPORTING_OPTION_IMPLICIT_OPEN,
-   ZBC_ZONE_REPORTING_OPTION_EXPLICIT_OPEN,
-   ZBC_ZONE_REPORTING_OPTION_CLOSED,
-   ZBC_ZONE_REPORTING_OPTION_FULL,
-   ZBC_ZONE_REPORTING_OPTION_READONLY,
-   ZBC_ZONE_REPORTING_OPTION_OFFLINE,
-   ZBC_ZONE_REPORTING_OPTION_NEED_RESET_WP = 0x10,
-   ZBC_ZONE_REPORTING_OPTION_NON_SEQWRITE,
-   ZBC_ZONE_REPORTING_OPTION_NON_WP = 0x3f,
-};
-
-#define ZBC_REPORT_ZONE_PARTIAL 0x80
-
 #endif /* _SCSI_PROTO_H_ */
diff --git a/include/uapi/linux/blkzoned_api.h 
b/include/uapi/linux/blkzoned_api.h
index cd81a9f..fa12976 100644
--- a/include/uapi/linux/blkzoned_api.h
+++ b/include/uapi/linux/blkzoned_api.h
@@ -16,97 +16,123 @@
 
 #include 
 
+#define ZBC_REPORT_OPTION_MASK  0x3f
+#define ZBC_REPORT_ZONE_PARTIAL 0x80
+
 /**
  * enum zone_report_option - Report Zones types to be included.
  *
- * @ZOPT_NON_SEQ_AND_RESET: Default (all zones).
- * @ZOPT_ZC1_EMPTY: Zones which are empty.
- * @ZOPT_ZC2_OPEN_IMPLICIT: Zones open but not explicitly opened
- * @ZOPT_ZC3_OPEN_EXPLICIT: Zones opened explicitly
- * @ZOPT_ZC4_CLOSED: Zones closed for writing.
- * @ZOPT_ZC5_FULL: Zones that are full.
- * @ZOPT_ZC6_READ_ONLY: Zones that are read-only
- * @ZOPT_ZC7_OFFLINE: Zones that are offline
- * @ZOPT_RESET: Zones that are empty
- * @ZOPT_NON_SEQ: Zones that have HA media-cache writes pending
- * @ZOPT_NON_WP_ZONES: Zones that do not have Write Pointers (conventional)
- * @ZOPT_PARTIAL_FLAG: Modifies the definition of the Zone List Length field.
+ * @ZBC_ZONE_REPORTING_OPTION_ALL: Default (all zones).
+ * @ZBC_ZONE_REPORTING_OPTION_EMPTY: Zones which 

[PATCH v2 4/4] Integrate ZBC command requests with zone cache.

2016-08-21 Thread Shaun Tancheff
Block layer (bio/request) commands can use or update the
sd_zbc zone cache as appropriate for each command.

Report Zones [REQ_OP_ZONE_REPORT] by default uses the current
zone cache data to generate a device (ZBC spec) formatted response.
REQ_META can also be specified to force the command to the device
and the result will be used to refresh the zone cache.

Reset WP [REQ_OP_ZONE_RESET] by default will attempt to translate
the request into a discard following the SD_ZBC_RESET_WP provisioning
mode. REQ_META can also be specified to force the command to be sent
to the device.

Open, Close and Finish zones, having no other analog, are sent directly
to the device.

On successful completion each zone action will update the zone cache
as appropriate.

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
 block/blk-lib.c   |  16 --
 drivers/scsi/sd.c |  42 +++-
 drivers/scsi/sd.h |  22 +-
 drivers/scsi/sd_zbc.c | 672 +++---
 4 files changed, 698 insertions(+), 54 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 67b9258..8cc5893 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -307,22 +307,6 @@ int blkdev_issue_zone_report(struct block_device *bdev, 
unsigned int op_flags,
bio_set_op_attrs(bio, REQ_OP_ZONE_REPORT, op_flags);
ret = submit_bio_wait(bio);
 
-   /*
-* When our request it nak'd the underlying device maybe conventional
-* so ... report a single conventional zone the size of the device.
-*/
-   if (ret == -EIO && conv->descriptor_count) {
-   /* Adjust the conventional to the size of the partition ... */
-   __be64 blksz = cpu_to_be64(bdev->bd_part->nr_sects);
-
-   conv->maximum_lba = blksz;
-   conv->descriptors[0].type = BLK_ZONE_TYPE_CONVENTIONAL;
-   conv->descriptors[0].flags = BLK_ZONE_NO_WP << 4;
-   conv->descriptors[0].length = blksz;
-   conv->descriptors[0].lba_start = 0;
-   conv->descriptors[0].lba_wptr = blksz;
-   ret = 0;
-   }
bio_put(bio);
return ret;
 }
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index b76ffbb..9a649fa 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1181,9 +1181,10 @@ static int sd_setup_zone_report_cmnd(struct scsi_cmnd 
*cmd)
struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
struct bio *bio = rq->bio;
sector_t sector = blk_rq_pos(rq);
-   struct gendisk *disk = rq->rq_disk;
unsigned int nr_bytes = blk_rq_bytes(rq);
int ret = BLKPREP_KILL;
+   bool is_fua = (rq->cmd_flags & REQ_META) ? true : false;
+   u8 rpt_opt = ZBC_ZONE_REPORTING_OPTION_ALL;
 
WARN_ON(nr_bytes == 0);
 
@@ -1194,18 +1195,35 @@ static int sd_setup_zone_report_cmnd(struct scsi_cmnd 
*cmd)
if (sdkp->zoned != 1 && sdkp->device->type != TYPE_ZBC) {
void *src;
struct bdev_zone_report *conv;
+   __be64 blksz = cpu_to_be64(sdkp->capacity);
 
-   if (nr_bytes < sizeof(struct bdev_zone_report))
+   if (nr_bytes < 512)
goto out;
 
src = kmap_atomic(bio->bi_io_vec->bv_page);
conv = src + bio->bi_io_vec->bv_offset;
conv->descriptor_count = cpu_to_be32(1);
conv->same_field = BLK_ZONE_SAME_ALL;
-   conv->maximum_lba = cpu_to_be64(disk->part0.nr_sects);
+   conv->maximum_lba = blksz;
+   conv->descriptors[0].type = BLK_ZONE_TYPE_CONVENTIONAL;
+   conv->descriptors[0].flags = BLK_ZONE_NO_WP << 4;
+   conv->descriptors[0].length = blksz;
+   conv->descriptors[0].lba_start = 0;
+   conv->descriptors[0].lba_wptr = blksz;
kunmap_atomic(src);
+   ret = BLKPREP_DONE;
goto out;
}
+   /* FUTURE ... when streamid is available */
+   /* rpt_opt = bio_get_streamid(bio); */
+
+   if (!is_fua) {
+   ret = sd_zbc_setup_zone_report_cmnd(cmd, rpt_opt);
+   if (ret == BLKPREP_DONE || ret == BLKPREP_DEFER)
+   goto out;
+   if (ret == BLKPREP_KILL)
+   pr_err("No Zone Cache, query media.\n");
+   }
 
ret = scsi_init_io(cmd);
if (ret != BLKPREP_OK)
@@ -1224,8 +1242,7 @@ static int sd_setup_zone_report_cmnd(struct scsi_cmnd 
*cmd)
cmd->cmnd[1] = ZI_REPORT_ZONES;
put_unaligned_be64(sector, &cmd->cmnd[2]);
put_unaligned_be32(nr_bytes, &cmd->cmnd[10]);
-   /* FUTURE ... when streamid is available */
-   /* cmd->cmnd[14] = bio_get_streamid(bio); */
+   cmd->cmnd[14] = rpt_opt;
 
cmd->sc_data_direction = DMA

[PATCH v2 0/4] Integrate bio/request ZBC ops with zone cache

2016-08-21 Thread Shaun Tancheff
Hi,

As per Christoph's request this patch incorporates Hannes' cache of zone
information.

This approach is to have REQ_OP_ZONE_REPORT return data in the same
format regardless of the availability of the zone cache. So whether the
kernel is built with or without BLK_DEV_ZONED [and SCSI_ZBC],
users of blkdev_issue_zone_report() and/or REQ_OP_ZONE_REPORT bios
will have a consistent data format to digest.

Additionally it seems reasonable to allow the REQ_OP_ZONE_* to
be able to indicate if the command *must* be delivered to the
device [and update the zone cache] accordingly. Here REQ_META is
being used as REQ_FUA can be dropped causing sd_done to be skipped.
Rather than special case the current code I chose to pick an otherwise
non-applicable flag.
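
As a rough model of the routing described here and in the 4/4 commit
message, the decision can be sketched in plain C. This is not the
sd/sd_zbc code: the enum names and route_zone_op() are hypothetical,
and the "force" argument stands in for REQ_META being set on the
request.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the five zone operations in this series. */
enum zone_op { ZONE_REPORT, ZONE_RESET, ZONE_OPEN, ZONE_CLOSE, ZONE_FINISH };

/* Where a request ends up being serviced in this model. */
enum zone_route {
	ROUTE_CACHE,    /* answered from the zone cache */
	ROUTE_DISCARD,  /* translated to a discard (SD_ZBC_RESET_WP mode) */
	ROUTE_DEVICE,   /* sent to the drive */
};

static enum zone_route route_zone_op(enum zone_op op, bool force,
				     bool have_cache)
{
	assert(op >= ZONE_REPORT && op <= ZONE_FINISH);

	/* The force flag always wins: the command must reach the device. */
	if (force)
		return ROUTE_DEVICE;

	switch (op) {
	case ZONE_REPORT:
		/* Without a cache the report falls back to querying media. */
		return have_cache ? ROUTE_CACHE : ROUTE_DEVICE;
	case ZONE_RESET:
		return ROUTE_DISCARD;
	default:
		/* Open, Close and Finish have no other analog. */
		return ROUTE_DEVICE;
	}
}
```

In the real patches a forced report also refreshes the zone cache from
the device response, which this sketch does not model.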

This series is based off of Linus's v4.8-rc2 and builds on top of the
previous series of block layer support:
Add ioctl to issue ZBC/ZAC commands via block layer
Add bio/request flags to issue ZBC/ZAC commands
as well as the series posted by Hannes
sd_zbc: Fix handling of ZBC read after write pointer
sd: Limit messages for ZBC disks capacity change
sd: Implement support for ZBC devices
sd: Implement new RESET_WP provisioning mode
sd: configure ZBC devices
...

Patches for util-linux can be found here:
g...@github.com:stancheff/util-linux.git v2.28.1+biof

https://github.com/stancheff/util-linux/tree/v2.28.1%2Bbiof

This patch is available here:
https://github.com/stancheff/linux/tree/v4.8-rc2%2Bbiof.v9

g...@github.com:stancheff/linux.git v4.8-rc2+biof.v9

v2:
 - Fully integrated bio <-> zone cache [<-> device]
 - Added discard -> write same for conventional zones.
 - Merged disparate constants into a canonical set.

Shaun Tancheff (4):
  Enable support for Seagate HostAware drives (testing).
  On Discard either do Reset WP or Write Same
  Merge ZBC constants
  Integrate ZBC command requests with zone cache.

 block/blk-lib.c   |  16 -
 drivers/scsi/sd.c | 111 +++--
 drivers/scsi/sd.h |  49 ++-
 drivers/scsi/sd_zbc.c | 904 ++
 include/linux/blkdev.h|  22 +-
 include/scsi/scsi_proto.h |  17 -
 include/uapi/linux/blkzoned_api.h | 167 ---
 7 files changed, 1032 insertions(+), 254 deletions(-)

-- 
2.9.3



[PATCH v2 1/4] Enable support for Seagate HostAware drives

2016-08-21 Thread Shaun Tancheff
Seagate drives report a SAME code of 0 due to having:
  - Zones of different types (CMR zones at the low LBA space).
  - Zones of different size (A terminating 'runt' zone in the high
lba space).

Support loading the zone topology into the zone cache.
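
For context, the SAME code lives in the low nibble of byte 4 of the
REPORT ZONES response. A minimal userspace sketch of the header checks
sd_read_zones() performs (after this patch, SAME values 0 through 3 are
accepted) might look like the following. The offsets are the ones used
in sd.c; struct report_hdr, parse_report_hdr() and the byte helpers are
made-up names for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPORT_HDR_LEN 64  /* zone descriptors start at byte 64 */

/* Hypothetical container for the fields sd.c reads from the response. */
struct report_hdr {
	uint32_t rep_len;         /* zone list length, bytes 0..3 */
	uint8_t  same;            /* SAME code, low nibble of byte 4 */
	uint64_t max_lba;         /* maximum LBA, bytes 8..15 */
	uint64_t first_zone_len;  /* bytes 8..15 of the first descriptor */
};

static uint32_t be32_at(const uint8_t *p)
{
	uint32_t v = 0;

	for (int i = 0; i < 4; i++)
		v = (v << 8) | p[i];
	return v;
}

static uint64_t be64_at(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Returns 0 on success, -1 when the response would be rejected. */
static int parse_report_hdr(const uint8_t *buf, size_t len,
			    struct report_hdr *out)
{
	assert(buf != NULL && out != NULL);

	/* Need the header plus bytes 8..15 of the first descriptor. */
	if (len < REPORT_HDR_LEN + 16)
		return -1;

	out->rep_len = be32_at(&buf[0]);
	if (out->rep_len < 64)
		return -1;  /* "REPORT ZONES report invalid length" */

	out->same = buf[4] & 0x0f;
	if (out->same > 3)
		return -1;  /* "SAME type not supported" */

	out->max_lba = be64_at(&buf[8]);
	out->first_zone_len = be64_at(&buf[REPORT_HDR_LEN + 8]);
	return 0;
}
```

With SAME == 0 the first descriptor's length cannot be assumed to hold
for every zone, which is why the topology has to be loaded into the
zone cache instead.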

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
 drivers/scsi/sd.c  |  22 +++---
 drivers/scsi/sd.h  |  20 --
 drivers/scsi/sd_zbc.c  | 183 +++--
 include/linux/blkdev.h |  16 +++--
 4 files changed, 170 insertions(+), 71 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 059a57f..7903e21 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -693,8 +693,13 @@ static void sd_config_discard(struct scsi_disk *sdkp, 
unsigned int mode)
break;
 
case SD_ZBC_RESET_WP:
-   max_blocks = sdkp->unmap_granularity;
q->limits.discard_zeroes_data = 1;
+   q->limits.discard_granularity =
+   sd_zbc_discard_granularity(sdkp);
+
+   max_blocks = min_not_zero(sdkp->unmap_granularity,
+ q->limits.discard_granularity >>
+   ilog2(logical_block_size));
break;
 
case SD_LBP_ZERO:
@@ -1955,13 +1960,12 @@ static int sd_done(struct scsi_cmnd *SCpnt)
good_bytes = blk_rq_bytes(req);
scsi_set_resid(SCpnt, 0);
} else {
-#ifdef CONFIG_SCSI_ZBC
if (op == ZBC_OUT)
/* RESET WRITE POINTER failed */
sd_zbc_update_zones(sdkp,
blk_rq_pos(req),
-   512, true);
-#endif
+   512, SD_ZBC_RESET_WP_ERR);
+
good_bytes = 0;
scsi_set_resid(SCpnt, blk_rq_bytes(req));
}
@@ -2034,7 +2038,6 @@ static int sd_done(struct scsi_cmnd *SCpnt)
good_bytes = blk_rq_bytes(req);
scsi_set_resid(SCpnt, 0);
}
-#ifdef CONFIG_SCSI_ZBC
/*
 * ZBC: Unaligned write command.
 * Write did not start a write pointer position.
@@ -2042,8 +2045,7 @@ static int sd_done(struct scsi_cmnd *SCpnt)
if (sshdr.ascq == 0x04)
sd_zbc_update_zones(sdkp,
blk_rq_pos(req),
-   512, true);
-#endif
+   512, SD_ZBC_WRITE_ERR);
}
break;
default:
@@ -2270,7 +2272,7 @@ static void sd_read_zones(struct scsi_disk *sdkp, 
unsigned char *buffer)
 * supports equal zone sizes.
 */
same = buffer[4] & 0xf;
-   if (same == 0 || same > 3) {
+   if (same > 3) {
sd_printk(KERN_WARNING, sdkp,
  "REPORT ZONES SAME type %d not supported\n", same);
return;
@@ -2282,9 +2284,9 @@ static void sd_read_zones(struct scsi_disk *sdkp, 
unsigned char *buffer)
sdkp->unmap_granularity = zone_len;
blk_queue_chunk_sectors(sdkp->disk->queue,
logical_to_sectors(sdkp->device, zone_len));
-   sd_config_discard(sdkp, SD_ZBC_RESET_WP);
 
-   sd_zbc_setup(sdkp, buffer, SD_BUF_SIZE);
+   sd_zbc_setup(sdkp, zone_len, buffer, SD_BUF_SIZE);
+   sd_config_discard(sdkp, SD_ZBC_RESET_WP);
 }
 
 static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device 
*sdp,
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 6ae4505..ef6c132 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -283,19 +283,24 @@ static inline void sd_dif_complete(struct scsi_cmnd *cmd, 
unsigned int a)
 
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
+
+#define SD_ZBC_INIT0
+#define SD_ZBC_RESET_WP_ERR1
+#define SD_ZBC_WRITE_ERR   2
+
 #ifdef CONFIG_SCSI_ZBC
 
 extern int sd_zbc_report_zones(struct scsi_disk *, unsigned char *, int,
   sector_t, enum zbc_zone_reporting_options, bool);
-extern int sd_zbc_setup(struct scsi_disk *, char *, int);
+extern int sd_zbc_setup(struct scsi_disk *, u64 zlen, char *buf, int buf_len);
 extern void sd_zbc_remove(struct scsi_disk *);
 extern void sd_zbc_reset_zones(struct scsi_disk *);
 extern int sd_zbc_setup_discard(struct scsi_disk *, struct request *,
sector_t, unsigned int);
 extern int sd_zbc_setup_read_write(struct scsi_disk *, struct request *,
   sector_t, unsigned int *);
-extern void sd_zbc_upd

[PATCH v8 2/2] Add ioctl to issue ZBC/ZAC commands via block layer

2016-08-21 Thread Shaun Tancheff
Add support for ZBC ioctls:
BLKREPORT - Issue Report Zones to device.
BLKZONEACTION - Issue a Zone Action (Close, Finish, Open, or Reset)

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
v8:
 - Changed ioctl for zone actions to a single ioctl that takes 
   a structure including the zone, zone action, all flag, and force option
 - Mapped REQ_META flag to 'force unit access' for zone operations
v6:
 - Added GFP_DMA to gfp mask.
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
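
A note on the buffer sizing in blk_zoned_report_ioctl() below: the
requested size is rounded up to whole pages, then ilog2() of that page
count is used as the allocation order. The following userspace sketch
reproduces the arithmetic so the two roundings can be compared. It
assumes 4 KiB pages, and pages_for(), log2_floor() and order_round_up()
are illustrative helper names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* Bytes rounded up to whole pages, as in the ioctl's npages computation. */
static uint32_t pages_for(uint32_t bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Floor of log2(n), which is the behaviour of ilog2() for n > 0. */
static unsigned int log2_floor(uint32_t n)
{
	unsigned int r = 0;

	assert(n != 0);
	while (n >>= 1)
		r++;
	return r;
}

/* Smallest order whose 2^order pages cover npages. */
static unsigned int order_round_up(uint32_t npages)
{
	unsigned int order = 0;

	assert(npages != 0);
	while ((1u << order) < npages)
		order++;
	return order;
}
```

The two orders agree whenever the page count is a power of two. For a
three-page request the floor gives order 1 (two pages) while rounding
up gives order 2 (four pages), so the choice matters for sizes that are
not a power-of-two number of pages.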

 block/ioctl.c | 149 ++
 include/uapi/linux/blkzoned_api.h |  30 +++-
 include/uapi/linux/fs.h   |   1 +
 3 files changed, 179 insertions(+), 1 deletion(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..d760523 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -194,6 +194,151 @@ int blkdev_reread_part(struct block_device *bdev)
 }
 EXPORT_SYMBOL(blkdev_reread_part);
 
+static int blk_zoned_report_ioctl(struct block_device *bdev, fmode_t mode,
+ void __user *parg)
+{
+   int error = -EFAULT;
+   gfp_t gfp = GFP_KERNEL | GFP_DMA;
+   void *iopg = NULL;
+   struct bdev_zone_report_io *bzrpt = NULL;
+   int order = 0;
+   struct page *pgs = NULL;
+   u32 alloc_size = PAGE_SIZE;
+   unsigned int op_flags = 0;
+   u8 opt = 0;
+
+   if (!(mode & FMODE_READ))
+   return -EBADF;
+
+   iopg = (void *)get_zeroed_page(gfp);
+   if (!iopg) {
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   bzrpt = iopg;
+   if (copy_from_user(bzrpt, parg, sizeof(*bzrpt))) {
+   error = -EFAULT;
+   goto report_zones_out;
+   }
+   if (bzrpt->data.in.return_page_count > alloc_size) {
+   int npages;
+
+   alloc_size = bzrpt->data.in.return_page_count;
+   npages = (alloc_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+   pgs = alloc_pages(gfp, ilog2(npages));
+   if (pgs) {
+   void *mem = page_address(pgs);
+
+   if (!mem) {
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   order = ilog2(npages);
+   memset(mem, 0, alloc_size);
+   memcpy(mem, bzrpt, sizeof(*bzrpt));
+   bzrpt = mem;
+   } else {
+   /* Result requires DMA capable memory */
+   pr_err("Not enough memory available for request.\n");
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   } else {
+   alloc_size = bzrpt->data.in.return_page_count;
+   }
+   if (bzrpt->data.in.force_unit_access)
+   op_flags |= REQ_META;
+   opt = bzrpt->data.in.report_option;
+   error = blkdev_issue_zone_report(bdev, op_flags,
+   bzrpt->data.in.zone_locator_lba, opt,
+   pgs ? pgs : virt_to_page(iopg),
+   alloc_size, GFP_KERNEL);
+   if (error)
+   goto report_zones_out;
+
+   if (pgs) {
+   void *src = bzrpt;
+   u32 off = 0;
+
+   /*
+* When moving a multi-order page with GFP_DMA
+* the copy to user can trap ""
+* so instead we copy out 1 page at a time.
+*/
+   while (off < alloc_size && !error) {
+   u32 len = min_t(u32, PAGE_SIZE, alloc_size - off);
+
+   memcpy(iopg, src + off, len);
+   if (copy_to_user(parg + off, iopg, len))
+   error = -EFAULT;
+   off += len;
+   }
+   } else {
+   if (copy_to_user(parg, iopg, alloc_size))
+   error = -EFAULT;
+   }
+
+report_zones_out:
+   if (pgs)
+   __free_pages(pgs, order);
+   if (iopg)
+   free_page((unsigned long)iopg);
+   return error;
+}
+
+static int blk_zoned_action_ioctl(struct block_device *bdev, fmode_t mode,
+ void __user *parg)
+{
+   unsigned int op = 0;
+   unsigned int op_flags = 0;
+   sector_t lba;
+   struct bdev_zone_action za;
+
+   if (!(mode & FMODE_WRITE))
+   return -EBADF;
+
+   /* When acting on zones we explicitly disallow using a partition. */
+   if (bdev != bdev->bd_contains) {
+   pr_err("%s: All zone operations disallowed on this device\n",
+   __func__);
+   return -EFAULT;
+   }
+
+   if (copy_from_user(&za, parg, sizeof(za)))
+   

[PATCH v8 1/2] Add bio/request flags to issue ZBC/ZAC commands

2016-08-21 Thread Shaun Tancheff
Add op flags to access zone information as well as open, close
and reset zones:
  - REQ_OP_ZONE_REPORT - Query zone information (Report zones)
  - REQ_OP_ZONE_OPEN - Explicitly open a zone for writing
  - REQ_OP_ZONE_CLOSE - Explicitly close a zone
  - REQ_OP_ZONE_FINISH - Explicitly finish a zone
  - REQ_OP_ZONE_RESET - Reset Write Pointer to start of zone

These op flags can be used to create bio's to control zoned devices
through the block layer.

This is useful for file systems and device mappers that need explicit
control of zoned devices such as Host Managed and Host Aware SMR drives.

Report zones is a device read that requires a buffer.

Open, Close, Finish and Reset are device commands that have no
associated data transfer.
  Open -   Open a zone for writing.
  Close -  Disallow writing to a zone.
  Finish - Disallow writing a zone and set the WP to the end
   of the zone.
  Reset -  Discard data in a zone and reset the WP to the start
   of the zone.

Sending an LBA of ~0 will attempt to operate on all zones.
This is typically used with Reset to wipe a drive as a Reset
behaves similar to TRIM in that all data in the zone(s) is deleted.

Report zones currently defaults to reporting on all zones. It is expected
that support for the zone option flag will piggyback on streamid
support. The report option flag is useful as it can reduce the number
of zones in each report, but not critical.

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
v8:
 - Added Finish Zone op
 - Fixed report zones copy to user to work when HARDENED_USERCOPY is enabled
v6:
 - Added GFP_DMA to gfp mask.
v5:
 - In sd_setup_zone_action_cmnd, remove unused vars and fix switch indent
 - In blk-lib fix documentation
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
V3:
 - Rebase on Mike Christie's separate bio operations
 - Update blkzoned_api.h to include report zones PARTIAL bit.
V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Removed include/uapi/linux/fs.h from this patch.

 MAINTAINERS   |   9 ++
 block/blk-lib.c   |  94 
 drivers/scsi/sd.c | 121 +
 drivers/scsi/sd.h |   1 +
 include/linux/bio.h   |   8 +-
 include/linux/blk_types.h |   7 +-
 include/linux/blkdev.h|   1 +
 include/linux/blkzoned_api.h  |  25 ++
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned_api.h | 182 ++
 10 files changed, 447 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

diff --git a/MAINTAINERS b/MAINTAINERS
index a306795..aedf311 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12984,6 +12984,15 @@ F: Documentation/networking/z8530drv.txt
 F: drivers/net/hamradio/*scc.c
 F: drivers/net/hamradio/z8530.h
 
+ZBC AND ZBC BLOCK DEVICES
+M: Shaun Tancheff <shaun.tanch...@seagate.com>
+W: http://seagate.com
+W: https://github.com/Seagate/ZDM-Device-Mapper
+L: linux-bl...@vger.kernel.org
+S: Maintained
+F: include/linux/blkzoned_api.h
+F: include/uapi/linux/blkzoned_api.h
+
 ZBUD COMPRESSED PAGE ALLOCATOR
 M: Seth Jennings <sjenn...@redhat.com>
 L: linux...@kvack.org
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 083e56f..e92bd56 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -266,3 +266,97 @@ int blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
 }
 EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+/**
+ * blkdev_issue_zone_report - queue a report zones operation
+ * @bdev:  target blockdev
+ * @op_flags:  extra bio rw flags. If unsure, use 0.
+ * @sector:starting sector (report will include this sector).
+ * @opt:   See: zone_report_option, default is 0 (all zones).
+ * @page:  one or more contiguous pages.
+ * @pgsz:  up to size of page in bytes, size of report.
+ * @gfp_mask:  memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *Issue a zone report request for the sectors in question.
+ */
+int blkdev_issue_zone_report(struct block_device *bdev, unsigned int op_flags,
+sector_t sector, u8 opt, struct page *page,
+size_t pgsz, gfp_t gfp_mask)
+{
+   struct bdev_zone_report *conv = page_address(page);
+   struct bio *bio;
+   unsigned int nr_iovecs = 1;
+   int ret = 0;
+
+   if (pgsz < (sizeof(struct bdev_zone_report) +
+   sizeof(struct bdev_zone_descriptor)))
+   return -EINVAL;
+
+   bio = bio_alloc(gfp_mask, nr_iovecs);
+   if (!bio)
+   return -E

[PATCH v8 0/2] Block layer support ZAC/ZBC commands

2016-08-21 Thread Shaun Tancheff
Hi Jens,

This series is based on linus' v4.8-rc2 branch.

As Host Aware drives are becoming available we would like to be able
to make use of such drives. This series is also intended to be
suitable for use by Host Managed drives.

ZBC [and ZAC] drives add new commands for discovering and working
with Zones.

Part one of this series expands the bio/request reserved op size from
3 to 4 bits and then adds op codes for each of the ZBC commands:
   Report zones, close zone, finish zone, open zone and reset zone.
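As a rough illustration of why the op field has to grow from 3 to 4 bits: a 3-bit field can encode at most 8 distinct op codes, so adding the five zone ops overflows it as soon as more than three ops already exist. The helper name below and the figure of six pre-existing ops are assumptions made for the example, not values taken from the patch.

```c
/* Assumed number of ops defined before this series (illustrative only). */
#define EXISTING_OPS_ASSUMED 6u
/* Report, open, close, finish and reset zone, as listed above. */
#define NEW_ZONE_OPS 5u

/* Smallest number of bits able to encode `count` distinct op codes. */
static unsigned int op_bits_needed(unsigned int count)
{
	unsigned int bits = 0;

	while ((1u << bits) < count)
		bits++;
	return bits;
}
```

With these assumed counts, op_bits_needed(EXISTING_OPS_ASSUMED + NEW_ZONE_OPS) exceeds 3, which is the situation the first patch addresses.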

Part two of this series deals with integrating these new bio/request
op's with Hannes' zone cache.

This extends the ZBC support up to the block layer allowing direct
control by file systems or device mapper targets. Also by deferring
the zone handling to the authoritative subsystem there is an overall
lower memory usage for holding the active zone information as well
as clarifying responsible party for maintaining the write pointer
for each active zone.

By way of example a DM target may have several writes in progress. The sector
(or LBA) for each of those writes will depend on the previous write. While the
drive's write pointer will be updated as writes are completed, the DM target
will be maintaining both where the next write should be scheduled from and
where the write pointer is based on writes completed w/o errors.
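A minimal userspace sketch of that bookkeeping, assuming writes to a zone complete in order. The structure and function names are hypothetical and do not come from any DM target in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-zone state kept by a DM target (illustrative names). */
struct zone_track {
	uint64_t start;      /* first sector of the zone */
	uint64_t len;        /* zone length in sectors */
	uint64_t next_write; /* where the next write will be scheduled */
	uint64_t wp_done;    /* write pointer implied by completed writes */
};

/*
 * Reserve nr sectors at the scheduling position. Returns false when the
 * zone has too little room left; otherwise stores the target sector.
 */
static bool zone_schedule_write(struct zone_track *z, uint64_t nr,
				uint64_t *sector)
{
	if (z->next_write + nr > z->start + z->len)
		return false;
	*sector = z->next_write;
	z->next_write += nr;
	return true;
}

/* Record a write that completed without error (in-order completion assumed). */
static void zone_complete_write(struct zone_track *z, uint64_t nr)
{
	z->wp_done += nr;
}
```

The gap between next_write and wp_done is the amount of data still in flight for the zone.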

Knowing the drive zone topology enables DM targets and file systems to
extend their block allocation schemes and issue write pointer resets (or
discards) that are zone aligned.

A perhaps non-obvious approach is that a conventional drive will
return a zone report descriptor with a single large conventional zone.
This is intended to allow a collection of zoned and non-zoned media to
be stitched together to provide a file system with a zoned device with
conventional space mapped to where it is useful.

Patches for util-linux can be found here:
g...@github.com:stancheff/util-linux.git v2.28.1+biof

https://github.com/stancheff/util-linux/tree/v2.28.1%2Bbiof

This patch is available here:
https://github.com/stancheff/linux/tree/v4.8-rc2%2Bbiof.v8

g...@github.com:stancheff/linux.git v4.8-rc2+biof.v8

v8:
 - Changed zone report to default to reading from zone cache.
 - Changed ioctl for zone commands to support forcing a query or command
   to be sent to media.
 - Fixed report zones copy to user to work when HARDENED_USERCOPY is enabled
v7:
 - Initial support for Hannes' zone cache.
v6:
 - Fix page alloc to include DMA flag for ioctl.
v5:
 - In sd_setup_zone_action_cmnd, remove unused vars and fix switch indent
 - In blk-lib fix documentation
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
 - Dropped ata16 hackery
V3:
 - Rebase on Mike Christie's separate bio operations
 - Update blkzoned_api.h to include report zones PARTIAL bit.
 - Use zoned report reserved bit for ata-passthrough flag.

V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Moved include/uapi/linux/fs.h changes to patch 3
 - Fixed commit message for first patch in series.


Shaun Tancheff (2):
  Add bio/request flags to issue ZBC/ZAC commands
  Add ioctl to issue ZBC/ZAC commands via block layer

 MAINTAINERS   |   9 ++
 block/blk-lib.c   |  94 +
 block/ioctl.c | 149 +++
 drivers/scsi/sd.c | 121 ++
 drivers/scsi/sd.h |   1 +
 include/linux/bio.h   |   8 +-
 include/linux/blk_types.h |   7 +-
 include/linux/blkdev.h|   1 +
 include/linux/blkzoned_api.h  |  25 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned_api.h | 210 ++
 include/uapi/linux/fs.h   |   1 +
 12 files changed, 625 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

-- 
2.9.3



Re: [PATCH v6 0/2] Block layer support ZAC/ZBC commands

2016-08-16 Thread Shaun Tancheff
On Tue, Aug 9, 2016 at 11:38 PM, Damien Le Moal <damien.lem...@hgst.com> wrote:
> Shaun,
>
> On 8/10/16 12:58, Shaun Tancheff wrote:
>>
>> On Tue, Aug 9, 2016 at 3:09 AM, Damien Le Moal <damien.lem...@hgst.com>
>> wrote:
>>>>
>>>> On Aug 9, 2016, at 15:47, Hannes Reinecke <h...@suse.de> wrote:
>>
>>
>> [trim]
>>
>>>>> Since disk type == 0 for everything that isn't HM so I would prefer the
>>>>> sysfs 'zoned' file just report if the drive is HA or HM.
>>>>>
>>>> Okay. So let's put in the 'zoned' attribute the device type:
>>>> 'host-managed', 'host-aware', or 'device managed'.
>>>
>>>
>>> I hacked your patches and simply put a "0" or "1" in the sysfs zoned
>>> file.
>>> Any drive that has ZBC/ZAC command support gets a "1", "0" for everything
>>> else. This means that drive managed models are not exposed as zoned block
>>> devices. For HM vs HA differentiation, an application can look at the
>>> device type file since it is already present.
>>>
>>> We could indeed set the "zoned" file to the device type, but HM drives
>>> and
>>> regular drives will both have "0" in it, so no differentiation possible.
>>> The other choice could be the "zoned" bits defined by ZBC, but these
>>> do not define a value for host managed drives, and the drive managed
>>> value
>>> being not "0" could be confusing too. So I settled for a simple 0/1
>>> boolean.
>>
>>
>> This seems good to me.
>
>
> Another option I forgot is for the "zoned" file to indicate the total number
> of zones of the device, and 0 for a non zoned regular block device. That
> would work as well.

Clearly either is sufficient.

> [...]
>>>
>>> Done: I hacked Shaun ioctl code and added finish zone too. The
>>> difference with Shaun initial code is that the ioctl are propagated down
>>> to
>>> the driver (__blkdev_driver_ioctl -> sd_ioctl) so that there is no need
>>> for
>>> BIO request definition for the zone operations. So a lot less code added.
>>
>>
>> The purpose of the BIO flags is not to enable the ioctls so much as
>> the other way round. Creating BIO op's is to enable issuing ZBC
>> commands from device mapper targets and file systems without some
>> heinous ioctl hacks.
>> Making the resulting block layer interfaces available via ioctls is just a
>> reasonable way to exercise the code ... or that was my intent.
>
>
> Yes, I understood your code. However, since (or if) we keep the zone
> information in the RB-tree cache, there is no need for the report zone
> operation BIO interface. Same for reset write pointer by keeping the mapping
> to discard. blk_lookup_zone can be used in kernel as a report zone BIO
> replacement and works as well for the report zone ioctl implementation. For
> reset, there is blkdev_issue_discard in kernel, and the reset zone ioctl
> becomes equivalent to BLKDISCARD ioctl. These are simple. Open, close and
> finish zone remains. For these, adding the BIO interface seemed an overkill.
> Hence my choice of propagating the ioctl to the driver.
> This is debatable of course, and adding an in-kernel interface is not hard:
> we can implement blk_open_zone, blk_close_zone and blk_finish_zone using
> __blkdev_driver_ioctl. That looks clean to me.

Uh. I would call that "heinous" ioctl hacks myself. Kernel -> User API -> Kernel
is not really a good design IMO.

> Overall, my concern with the BIO based interface for the ZBC commands is
> that it adds one flag for each command, which is not really the philosophy
> of the interface and potentially opens the door for more such
> implementations in the future with new standards and new commands coming up.
> Clearly that is not a sustainable path. So I think that a more specific
> interface for these zone operations is a better choice. That is consistent
> with what happens with the tons of ATA and SCSI commands not actually doing
> data I/Os (mode sense, log pages, SMART, etc). All these do not use BIOs and
> are processed as request REQ_TYPE_BLOCK_PC.

Part of the reason for following on Mike Christie's bio op/flags cleanup was to
make these op's. The advantage of being added as ops is that there is only
1 extra bit needed (not 4 or 5 bits for flags). The other reason for being
promoted into the block layer as commands is because it seems to me
to make sense that these abstractions could be allowed to be passed through
a DM layer and be handled by a files 

Re: [PATCH v6 0/2] Block layer support ZAC/ZBC commands

2016-08-15 Thread Shaun Tancheff
On Mon, Aug 15, 2016 at 11:00 PM, Damien Le Moal <damien.lem...@hgst.com> wrote:
>
> Shaun,
>
>> On Aug 14, 2016, at 09:09, Shaun Tancheff <shaun.tanch...@seagate.com> wrote:
> […]
>>>>
>>> No, surely not.
>>> But one of the _big_ advantages for the RB tree is blkdev_discard().
>>> Without the RB tree any mkfs program will issue a 'discard' for every
>>> sector. We will be able to coalesce those into one discard per zone, but
>>> we still need to issue one for _every_ zone.
>>
>> How can you make coalesce work transparently in the
>> sd layer _without_ keeping some sort of a discard cache along
>> with the zone cache?
>>
>> Currently the block layer's blkdev_issue_discard() is breaking
>> large discard's into nice granular and aligned chunks but it is
>> not preventing small discards nor coalescing them.
>>
>> In the sd layer would there be way to persist or purge an
>> overly large discard cache? What about honoring
>> discard_zeroes_data? Once the discard is completed with
>> discard_zeroes_data you have to return zeroes whenever
>> a discarded sector is read. Isn't that a lot more than just
>> tracking a write pointer? Couldn't a zone have dozens of holes?
>
> My understanding of the standards regarding discard is that it is not
> mandatory and that it is a hint to the drive. The drive can completely
> ignore it if it thinks that is a better choice. I may be wrong on this
> though. Need to check again.

But you are currently setting discard_zeroes_data=1 in your
current patches. I believe that setting discard_zeroes_data=1
effectively promotes discards to being mandatory.

I have a follow on patch to my SCT Write Same series that
handles the CMR zone case in the sd_zbc_setup_discard() handler.

> For reset write pointer, the mapping to discard requires that the calls
> to blkdev_issue_discard be zone aligned for anything to happen. Specify
> less than a zone and nothing will be done. This I think preserve the
> discard semantic.

Oh. If that is the intent then there is just a bug in the handler.
I have pointed out where I believe it to be in my response to
the zone cache patch being posted.

> As for the “discard_zeroes_data” thing, I also think that is a drive
> feature not mandatory. Drives may have it or not, which is consistent
> with the ZBC/ZAC standards regarding reading after write pointer (nothing
> says that zeros have to be returned). In any case, discard of CMR zones
> will be a nop, so for SMR drives, discard_zeroes_data=0 may be a better
> choice.

However I am still curious about discard's being coalesced.

>>> Which is (as indicated) really slow, and easily takes several minutes.
>>> With the RB tree we can short-circuit discards to empty zones, and speed
>>> up processing time dramatically.
>>> Sure we could be moving the logic into mkfs and friends, but that would
>>> require us to change the programs and agree on a library (libzbc?) which
>>> should be handling that.
>>
>> F2FS's mkfs.f2fs is already reading the zone topology via SG_IO ...
>> so I'm not sure your argument is valid here.
>
> This initial SMR support patch is just that: a first try. Jaegeuk
> used SG_IO (in fact copy-paste of parts of libzbc) because the current
> ZBC patch-set has no ioctl API for zone information manipulation. We
> will fix this mkfs.f2fs once we agree on an ioctl interface.

Which again is my point. If mkfs.f2fs wants to speed up its
discard pass by _not_ sending unnecessary Reset WP commands
for zones that are already empty, it has all the information
it needs to do so.
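A sketch of that filtering step, assuming the tool already holds a zone report in memory. The descriptor layout and names are simplified and hypothetical; they are not the kernel, libzbc or mkfs.f2fs structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified zone descriptor taken from a zone report. */
struct zone_info {
	uint64_t start;  /* first LBA of the zone */
	uint64_t wp;     /* current write pointer */
	int sequential;  /* non-zero for sequential-write zones */
};

/*
 * Count the zones that actually need a Reset WP: sequential zones whose
 * write pointer has moved away from the zone start. Conventional zones
 * and already-empty zones are skipped.
 */
static size_t zones_needing_reset(const struct zone_info *zones, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (zones[i].sequential && zones[i].wp != zones[i].start)
			count++;
	return count;
}
```

On a freshly reset drive this count is zero, so the whole reset pass can be skipped without any cache in the sd layer.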

Here it seems to me that the zone cache is _at_best_
doing double work. At worst the zone cache could be
doing the wrong thing _if_ the zone cache got out of sync.
It is certainly possible (however unlikely) that someone was
doing some raw sg activity that is not seen by the sd path.

All I am trying to do is have a discussion about the reasons for
and against having a zone cache: where it works and where it breaks.
This should be entirely technical, but I understand that we have all
spent a lot of time _not_ discussing this for various non-technical
reasons.

So far the only reason I've been able to ascertain is that
Host Managed drives really don't like being stuck with the
URSWRZ and would like to have a software hack to return
MUD rather than ship drives with some weird out-of-the-box
config where the last zone is marked as FINISH'd thereby
returning MUD on reads as per spec.

I understand that it would be a strange state to see on first
boot and likely people would just do a ResetWP and have
weird boot errors, which would probably just make matters
worse.

I just would rather the work around be a bit cleaner and/or
use less 

Re: [PATCH v6 0/2] Block layer support ZAC/ZBC commands

2016-08-14 Thread Shaun Tancheff
On Tue, Aug 9, 2016 at 1:47 AM, Hannes Reinecke <h...@suse.de> wrote:
> On 08/05/2016 10:35 PM, Shaun Tancheff wrote:
>> On Tue, Aug 2, 2016 at 8:29 PM, Damien Le Moal <damien.lem...@hgst.com> 
>> wrote:
>>> Hannes, Shaun,
>>>
>>> Let me add some more comments.
>>>
>>>> On Aug 2, 2016, at 23:35, Hannes Reinecke <h...@suse.de> wrote:
>>>>
>>>> On 08/01/2016 07:07 PM, Shaun Tancheff wrote:
>>>>> On Mon, Aug 1, 2016 at 4:41 AM, Christoph Hellwig <h...@lst.de> wrote:
>>>>>>
>>>>>> Can you please integrate this with Hannes series so that it uses
>>>>>> his cache of the zone information?
>>>>>
>>>>> Adding Hannes and Damien to Cc.
>>>>>
>>>>> Christoph,
>>>>>
>>>>> I can make a patch the marshal Hannes' RB-Tree into to a block report, 
>>>>> that is
>>>>> quite simple. I can even have the open/close/reset zone commands update 
>>>>> the
>>>>> RB-Tree .. the non-private parts anyway. I would prefer to do this around 
>>>>> the
>>>>> CONFIG_SD_ZBC support, offering the existing type of patch for setups 
>>>>> that do
>>>>> not need the RB-Tree to function with zoned media.
>>
>> I have posted patches to integrate with the zone cache, hopefully they
>> make sense.
>>
> [ .. ]
>>>> I have thought about condensing the RB tree information, but then I
>>>> figured that for 'real' SMR handling we cannot assume all zones are of
>>>> fixed size, and hence we need all the information there.
>>>> Any condensing method would assume a given structure of the zones, which
>>>> the standard just doesn't provide.
>>>> Or am I missing something here?

Of course you can condense the zone cache without losing any
information. Here is the layout I used ... I haven't updated the patch
to the latest posted patches but this is the basic idea.

[It was originally done as a follow on of making your zone cache work
 with Seagate's HA drive. I did not include the wp-in-arrays patch
 along with the HA drive support that I sent you in May as you were
 quite terse about RB trees when I tried to discuss this approach with
 you at Vault]

struct blk_zone {
unsigned type:4;
unsigned state:5;
unsigned extra:7;
unsigned wp:40;
void *private_data;
};

struct contiguous_wps {
u64 start_lba;
u64 last_lba; /* or # of blocks */
u64 zone_size; /* size in blocks */
unsigned is_zoned:1;
u32 zone_count;
spinlock_t lock;
struct blk_zone zones[0];
};

struct zone_wps {
u32 wps_count;
struct contiguous_wps **wps;
};

Then in struct request_queue
-struct rb_root zones;
+   struct zone_wps *zones;

For each contiguous chunk of zones you need a descriptor. In the current
drives you need 1 or 2 descriptors.
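To make the descriptor idea concrete, here is a userspace model of the lookup: find the run covering a sector, then derive the zone index by division. It is a sketch under the assumption that every zone in a run has the same non-zero size and that last_lba is an exclusive bound; the names are simplified stand-ins for the structures above, not the kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace model of one contiguous run of equal-sized zones. */
struct wps_run {
	uint64_t start_lba;  /* first block covered by the run */
	uint64_t last_lba;   /* exclusive end of the run (assumption) */
	uint64_t zone_size;  /* blocks per zone, must be non-zero */
};

/*
 * Locate the run and zone containing `sector`.
 * Returns 0 and fills the out parameters on success, -1 if no run
 * covers the sector.
 */
static int wps_lookup(const struct wps_run *runs, size_t nruns,
		      uint64_t sector, size_t *run, uint64_t *index,
		      uint64_t *zone_start)
{
	size_t i;

	for (i = 0; i < nruns; i++) {
		if (sector >= runs[i].start_lba && sector < runs[i].last_lba) {
			*run = i;
			*index = (sector - runs[i].start_lba) /
				 runs[i].zone_size;
			*zone_start = runs[i].start_lba +
				      *index * runs[i].zone_size;
			return 0;
		}
	}
	return -1;
}
```

Because the zone start and length fall out of the arithmetic, the per-zone entry only has to carry state and write pointer, which is where the space saving comes from.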

Here a conventional drive is encapsulated as zoned media with one
drive sized conventional zone.

I have not spent time building an ad-hoc LVM comprised of zoned and
conventional media so it's not all ironed out yet.
I think you can see the advantage of being able to put conventional space
anywhere you would like to work around zoned media not being laid out
in the best manner for your setup.

Yes things start to break down if every other zone is a different size ..

The point being that even with supporting zones that are on the order of
48 bytes in size this saves a lot of space with no loss of information.
I still kind of prefer pushing blk_zone down to a u32 by reducing
the max zone size and dropping the private_data ... but that may
be going a bit too far.

blk_lookup_zone then has an [unfortunate] signature change:


/**
 * blk_lookup_zone() - Lookup zones
 * @q: Request Queue
 * @sector: Location to lookup
 * @start: Starting location zone (OUT: Required)
 * @len: Length of zone (OUT: Required)
 * @lock: Spinlock of zones (OUT: Required)
 */
struct blk_zone *blk_lookup_zone(struct request_queue *q, sector_t sector,
 sector_t *start, sector_t *len,
 spinlock_t **lock)
{
int iter;
struct blk_zone *bzone = NULL;
struct zone_wps *zi = q->zones;

*start = 0;
*len = 0;
*lock = NULL;

if (!q->zones)
goto out;

for (iter = 0; iter < zi->wps_count; iter++) {
if (sector >= zi->wps[iter]->start_lba &&
sector <  zi->wps[iter]->last_lba) {
struct contiguous_wps *wp = zi->wps[iter];
u64 index = (sector - wp-&

Re: [RFC] libata-scsi: make sure Maximum Write Same Length is not too large

2016-08-12 Thread Shaun Tancheff
On Fri, Aug 12, 2016 at 3:56 PM, Martin K. Petersen
<martin.peter...@oracle.com> wrote:
>>>>>> "Tom" == Tom Yan <tom.t...@gmail.com> writes:
>
> Tom,
>
>>> put_unaligned_be64(65535 * ATA_MAX_TRIM_RNUM / (sector_size / 512),
>>> &rbuf[36]);
>
> How many 8-byte ranges fit in a 4096-byte sector?
>
> Tom> So were you trying to pointing out something I am still missing, or
> Tom> were you merely confirming I was right?
>
> I suggest you drop ATA_MAX_TRIM_RNUM and do:
>
> enum {
>  ATA_TRIM_BLOCKS_PER_RANGE = 65535, /* 0xffff blocks per range desc. */
>  ATA_TRIM_RANGE_SIZE_SHIFT = 3, /* range descriptor is 8 bytes */
> };
>
> put_unaligned_be64(ATA_TRIM_BLOCKS_PER_RANGE *
>sector_size >> ATA_TRIM_RANGE_SIZE_SHIFT, &rbuf[36]);
>
> Might be worthwhile to create an ata_max_lba_range_blocks() wrapper.

Ah, I think I am understanding now. When the sector size is 4K the
minimum page sent with WRITE SAME will be 4K.

If so, we also need to fix the write_same SATL code that is working
under the assumption of a 512 byte sector as the largest
guaranteed amount of data in the associated sg pages.
Keying off of sector_size should be straightforward there...
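The arithmetic behind Martin's question can be sketched as follows: each TRIM range descriptor is 8 bytes and covers up to 65535 blocks, so a one-sector payload holds sector_size / 8 descriptors. The macro and function names are adapted from his suggestion for illustration and are not existing libata symbols.

```c
#include <stdint.h>

#define TRIM_BLOCKS_PER_RANGE 65535u /* 0xffff blocks per range descriptor */
#define TRIM_RANGE_SIZE_SHIFT 3      /* a range descriptor is 8 bytes */

/* Blocks that one sector-sized TRIM payload can cover. */
static uint64_t trim_max_blocks(uint32_t sector_size)
{
	/* Number of 8-byte range descriptors that fit in one sector. */
	uint64_t ranges = sector_size >> TRIM_RANGE_SIZE_SHIFT;

	return (uint64_t)TRIM_BLOCKS_PER_RANGE * ranges;
}
```

For a 512-byte sector this gives 64 ranges, and for a 4096-byte sector 512 ranges, which is why the payload size matters once the logical sector is 4K.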

-- 
Shaun Tancheff


Re: [PATCH 3/5] sd: Implement support for ZBC devices

2016-08-12 Thread Shaun Tancheff
> +   spin_lock_irqsave(&zone->lock, flags);
> +
> +   if (zone->state == BLK_ZONE_UNKNOWN ||
> +   zone->state == BLK_ZONE_BUSY) {
> +   sd_zbc_debug_ratelimit(sdkp,
> +  "zone %zu state %x, deferring\n",
> +  zone->start, zone->state);
> +   ret = BLKPREP_DEFER;
> +   goto out;
> +   }
> +   if (zone->state == BLK_ZONE_OFFLINE) {
> +   /* let the drive fail the command */
> +   sd_zbc_debug_ratelimit(sdkp,
> +  "zone %zu offline\n",
> +  zone->start);
> +   goto out;
> +   }
> +
> +   if (rq->cmd_flags & (REQ_WRITE | REQ_WRITE_SAME)) {
> +   if (zone->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
> +   goto out;
> +   if (zone->state == BLK_ZONE_READONLY)
> +   goto out;
> +   if (blk_zone_is_full(zone)) {
> +   sd_zbc_debug(sdkp,
> +"Write to full zone %zu/%zu\n",
> +sector, zone->wp);
> +   ret = BLKPREP_KILL;
> +   goto out;
> +   }
> +   if (zone->wp != sector) {
> +   sd_zbc_debug(sdkp,
> +"Misaligned write %zu/%zu\n",
> +sector, zone->wp);
> +   ret = BLKPREP_KILL;
> +   goto out;
> +   }
> +   zone->wp += num_sectors;
> +   } else if (blk_zone_is_smr(zone) && (zone->wp <= sector)) {
> +   sd_zbc_debug(sdkp,
> +"Read beyond wp %zu/%zu\n",
> +sector, zone->wp);
> +   ret = BLKPREP_DONE;
> +   }
> +
> +out:
> +   spin_unlock_irqrestore(&zone->lock, flags);
> +
> +   return ret;
> +}
> +
> +int sd_zbc_setup(struct scsi_disk *sdkp, char *buf, int buf_len)
> +{
> +   sector_t capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
> +   sector_t last_sector;
> +
> +   if (test_and_set_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags)) {
> +   sdev_printk(KERN_WARNING, sdkp->device,
> +   "zone initialisation already running\n");
> +   return 0;
> +   }
> +
> +   if (!sdkp->zone_work_q) {
> +   char wq_name[32];
> +
> +   sprintf(wq_name, "zbc_wq_%s", sdkp->disk->disk_name);
> +   sdkp->zone_work_q = create_singlethread_workqueue(wq_name);
> +   if (!sdkp->zone_work_q) {
> +   sdev_printk(KERN_WARNING, sdkp->device,
> +   "create zoned disk workqueue failed\n");
> +   return -ENOMEM;
> +   }
> +   } else if (!test_and_set_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags)) {
> +   drain_workqueue(sdkp->zone_work_q);
> +   clear_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags);
> +   }
> +
> +   last_sector = zbc_parse_zones(sdkp, buf, buf_len);
> +   if (last_sector != -1 && last_sector < capacity) {
> +   sd_zbc_update_zones(sdkp, last_sector, SD_ZBC_BUF_SIZE, false);
> +   } else
> +   clear_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags);
> +
> +   return 0;
> +}
> +
> +void sd_zbc_remove(struct scsi_disk *sdkp)
> +{
> +   if (sdkp->zone_work_q) {
> +   if (!test_and_set_bit(SD_ZBC_ZONE_RESET, &sdkp->zone_flags))
> +   drain_workqueue(sdkp->zone_work_q);
> +   clear_bit(SD_ZBC_ZONE_INIT, &sdkp->zone_flags);
> +   destroy_workqueue(sdkp->zone_work_q);
> +   }
> +}
> --
> 1.8.5.6
>



-- 
Shaun Tancheff


[PATCH] Update WRITE_SAME timeout in sd_setup_discard_cmnd

2016-08-11 Thread Shaun Tancheff
In sd_setup_discard_cmnd() there are some discard
methods that fall back to using WRITE_SAME. It
appears that those paths using WRITE_SAME should
also use the SD_WRITE_SAME_TIMEOUT instead of the
default SD_TIMEOUT.

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
I don't have a use case that breaks the current code.
It just seems to me that setups for discard and
write same should be consistent.
---
 drivers/scsi/sd.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d3e852a..3c15f3a 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -722,6 +722,8 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
if (!page)
return BLKPREP_DEFER;
 
+   rq->timeout = SD_TIMEOUT;
+
switch (sdkp->provisioning_mode) {
case SD_LBP_UNMAP:
buf = page_address(page);
@@ -746,6 +748,7 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
put_unaligned_be32(nr_sectors, &cmd->cmnd[10]);
 
len = sdkp->device->sector_size;
+   rq->timeout = SD_WRITE_SAME_TIMEOUT;
break;
 
case SD_LBP_WS10:
@@ -758,6 +761,7 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
put_unaligned_be16(nr_sectors, &cmd->cmnd[7]);
 
len = sdkp->device->sector_size;
+   rq->timeout = SD_WRITE_SAME_TIMEOUT;
break;
 
default:
@@ -766,8 +770,6 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
}
 
rq->completion_data = page;
-   rq->timeout = SD_TIMEOUT;
-
cmd->transfersize = len;
cmd->allowed = SD_MAX_RETRIES;
 
-- 
2.8.1



Re: [RFC] libata-scsi: make sure Maximum Write Same Length is not too large

2016-08-11 Thread Shaun Tancheff
On Thu, Aug 11, 2016 at 3:26 AM,   wrote:
> From: Tom Yan 
>
> Currently we advertise Maximum Write Same Length based on the
> maximum number of sectors that one-block TRIM payload can cover.
> The field is used to derive the discard_max_bytes and
> write_same_max_bytes limits in the block layer, which currently can
> at max be 0xffffffff (32-bit).
>
> However, with an AF 4Kn drive, the derived limits would be 65535 *
> 64 * 4096 = 0x3fffc0000 (34-bit). Therefore, we now divide
> ATA_MAX_TRIM_RNUM by (logical sector size / 512), so that the
> derived limits will not overflow.
>
> The limits are now also consistent among drives with different
> logical sector sizes. (Although that may or may not be what we
> want ultimately when the SCSI / block layer allows larger
> representation in the future.)
>
> Although 4Kn ATA SSDs may not be a thing on the market yet, this
> patch is necessary for forthcoming SCT Write Same translation
> support, which could be available on traditional HDDs where 4Kn is
> already a thing. Also it should not change the current behavior on
> drives with 512-byte logical sectors.
>
> Note: this patch is not about AF 512e drives.
> Signed-off-by: Tom Yan 
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index be9c76c..dcadcaf 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -2295,6 +2295,7 @@ static unsigned int ata_scsiop_inq_89(struct 
> ata_scsi_args *args, u8 *rbuf)
>  static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
>  {
> u16 min_io_sectors;
> +   u32 sector_size;
>
> rbuf[1] = 0xb0;
> rbuf[3] = 0x3c; /* required VPD size with unmap support */
> @@ -2309,17 +2310,27 @@ static unsigned int ata_scsiop_inq_b0(struct 
> ata_scsi_args *args, u8 *rbuf)
> min_io_sectors = 1 << ata_id_log2_per_physical_sector(args->id);
> put_unaligned_be16(min_io_sectors, &rbuf[6]);
>
> -   /*
> -* Optimal unmap granularity.
> -*
> -* The ATA spec doesn't even know about a granularity or alignment
> -* for the TRIM command.  We can leave away most of the unmap related
> -* VPD page entries, but we have specifify a granularity to signal
> -* that we support some form of unmap - in thise case via WRITE SAME
> -* with the unmap bit set.
> -*/
> +   sector_size = ata_id_logical_sector_size(args->id);
> if (ata_id_has_trim(args->id)) {
> -   put_unaligned_be64(65535 * ATA_MAX_TRIM_RNUM, &rbuf[36]);
> +   /*
> +* Maximum write same length.
> +*
> +* Avoid overflow in discard_max_bytes and 
> write_same_max_bytes
> +* with AF 4Kn drives. Also make them consistent among drives
> +* with different logical sector sizes.
> +*/
> +   put_unaligned_be64(65535 * ATA_MAX_TRIM_RNUM /
> +  (sector_size / 512), [36]);

I think the existing fixups in sd_setup_discard_cmnd() and
sd_setup_write_same_cmnd() are 'doing the right thing'.

If I understand the stack correctly:

libata-scsi.c (and sd.c) both report a maximum in terms of 512-byte sectors.
The upper layer stack works (mostly) on a mix of bytes and 512-byte sectors,
agnostic of the underlying hardware ... mostly. There are some bits in the
file systems and block layer that honor a logical block size larger than
512 bytes, in that all I/O generated is a multiple of the logical block
size as per the block device's request_queue / queue_limits.

So even though a 4Kn device could handle an 8x larger I/O because its
logical sector is bigger, that is basically ignored, for convenience.

In the SCSI upper layer, as the commands are being set up, the shift from
512 to 'sector_size' is handled so that the number of device sectors is
matched up to the request:

sector >>= ilog2(sdp->sector_size) - 9;
nr_sectors >>= ilog2(sdp->sector_size) - 9;

So if you correctly report the number of logical sectors here you break
the 'fix' in sd.c.

At least that is my understanding.
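
To make the unit handling concrete, here is a minimal user-space sketch of the arithmetic being discussed. It is not kernel code: ilog2_u32() and sectors_to_logical() are stand-ins written for this illustration, and the value 64 for ATA_MAX_TRIM_RNUM is taken from include/linux/ata.h as I remember it.

```c
#include <assert.h>
#include <stdint.h>

#define ATA_MAX_TRIM_RNUM 64	/* TRIM range entries per 512-byte payload */

/* Stand-in for the kernel's ilog2(); only valid for powers of two. */
static unsigned int ilog2_u32(uint32_t v)
{
	unsigned int r = 0;

	assert(v != 0 && (v & (v - 1)) == 0);
	while (v >>= 1)
		r++;
	return r;
}

/* Same shift as in sd.c: 512-byte sectors -> device logical blocks. */
static uint64_t sectors_to_logical(uint64_t sectors_512, uint32_t sector_size)
{
	assert(sector_size >= 512);
	return sectors_512 >> (ilog2_u32(sector_size) - 9);
}

/* The limit libata reports today, counted in 512-byte sectors. */
static uint64_t max_trim_sectors_512(void)
{
	return 65535ULL * ATA_MAX_TRIM_RNUM;
}
```

With a 4096-byte logical sector the shift is 3, so the current limit of 4194240 sectors becomes 524280 logical blocks by the time sd.c builds the command. If the reported limit were already divided by 8 and then went through the same shift, it would end up as 65535 blocks, which is the double scaling I am worried about.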

> +
> +   /*
> +* Optimal unmap granularity.
> +*
> +* The ATA spec doesn't even know about a granularity or alignment
> +* for the TRIM command.  We can leave away most of the unmap related
> +* VPD page entries, but we have specifify a granularity to signal
> +* that we support some form of unmap - in thise case via WRITE SAME
> +* with the unmap bit set.
> +*/
> put_unaligned_be32(1, &rbuf[28]);
> }
>
> --
> 2.9.2
>

Regards,
Shaun

Re: [PATCH v6 0/2] Block layer support ZAC/ZBC commands

2016-08-09 Thread Shaun Tancheff
> boundaries
> and the discard code will not split and align calls on the zones. But upper
> layers (an FS or a device mapper) can still do all this by themselves if they
> want/can support non-constant zone sizes.
>
> The only exception is drives like the Seagate one with only the last zone of a
> different size. This case is handled exactly as if all zones are the same size
> simply because any operation on the last smaller zone will naturally align as
> the checks of operation size against the drive capacity will do the right
> things.
>
> The ioctls work for all cases (drive with constant zone size or not). This is
> again to allow supporting eventual weird drives at application level. I
> integrated all these ioctl into libzbc block device backend driver and
> everything is fine. Can't tell the difference with direct-to-drive SG_IO
> accesses. But unlike these, the zone ioctls keep the zone information RB-tree
> cache up to date.
>
>>
>> I will be updating my patchset accordingly.
>
> I need to cleanup my code and rebase on top of 4.8-rc1. Let me do this and I
> will send everything for review. If you have any comment on the above, please
> let me know and I will be happy to incorporate changes.
>
> Best regards.
>
>
> 
> Damien Le Moal, Ph.D.
> Sr. Manager, System Software Group, HGST Research,
> HGST, a Western Digital brand
> damien.lem...@hgst.com
> (+81) 0466-98-3593 (ext. 513593)
> 1 kirihara-cho, Fujisawa,
> Kanagawa, 252-0888 Japan
> www.hgst.com
>



-- 
Shaun Tancheff


Re: [PATCH v6 0/2] Block layer support ZAC/ZBC commands

2016-08-09 Thread Shaun Tancheff
On Tue, Aug 9, 2016 at 1:47 AM, Hannes Reinecke <h...@suse.de> wrote:
> On 08/05/2016 10:35 PM, Shaun Tancheff wrote:
>> On Tue, Aug 2, 2016 at 8:29 PM, Damien Le Moal <damien.lem...@hgst.com> wrote:
>>>> On Aug 2, 2016, at 23:35, Hannes Reinecke <h...@suse.de> wrote:
>>>> On 08/01/2016 07:07 PM, Shaun Tancheff wrote:
>>>>> On Mon, Aug 1, 2016 at 4:41 AM, Christoph Hellwig <h...@lst.de> wrote:

[trim]
>> Also the zone report is 'slow' in that there is an overhead for the
>> report itself, but the number of zones per query can be quite large, so
>> 4 or 5 I/Os that run into the hundreds of milliseconds to cache the
>> entire drive isn't really unworkable for something that is used
>> infrequently.
>>
> No, surely not.
> But one of the _big_ advantages for the RB tree is blkdev_discard().
> Without the RB tree any mkfs program will issue a 'discard' for every
> sector. We will be able to coalesce those into one discard per zone, but
> we still need to issue one for _every_ zone.
> Which is (as indicated) really slow, and easily takes several minutes.
> With the RB tree we can short-circuit discards to empty zones, and speed
> up processing time dramatically.
> Sure we could be moving the logic into mkfs and friends, but that would
> require us to change the programs and agree on a library (libzbc?) which
> should be handling that.

Adding an additional library dependency seems overkill for a program
that is already doing ioctls and raw block I/O ... but I would leave that
up to each file system. As it stands, issuing the ioctl and walking the array
of data returned [see blkreport.c] is already quite trivial.

I believe the goal here is for F2FS, and perhaps NILFS, to "just
work" with the DISCARD to Reset WP mapping and zone cache in place.

I am still quite skeptical about other common file systems
"just working" without their respective mkfs et al. being
zone aware and handling the topology of the media at mkfs time.
Perhaps there is something I am unaware of?

[trim]

>> I can add finish zone ... but I really can't think of a use for it, myself.
>>
> Which is not the point. The standard defines this, so clearly someone
> found it a reasonable addendum. So let's add this for completeness.

Agreed and queued for the next version.

Regards,
Shaun


Re: [PATCH v6 0/2] Block layer support ZAC/ZBC commands

2016-08-05 Thread Shaun Tancheff
On Tue, Aug 2, 2016 at 8:29 PM, Damien Le Moal <damien.lem...@hgst.com> wrote:
> Hannes, Shaun,
>
> Let me add some more comments.
>
>> On Aug 2, 2016, at 23:35, Hannes Reinecke <h...@suse.de> wrote:
>>
>> On 08/01/2016 07:07 PM, Shaun Tancheff wrote:
>>> On Mon, Aug 1, 2016 at 4:41 AM, Christoph Hellwig <h...@lst.de> wrote:
>>>>
>>>> Can you please integrate this with Hannes series so that it uses
>>>> his cache of the zone information?
>>>
>>> Adding Hannes and Damien to Cc.
>>>
>>> Christoph,
>>>
>>> I can make a patch to marshal Hannes' RB-Tree into a block report, that
>>> is quite simple. I can even have the open/close/reset zone commands update
>>> the RB-Tree .. the non-private parts anyway. I would prefer to do this
>>> around the CONFIG_SD_ZBC support, offering the existing type of patch for
>>> setups that do not need the RB-Tree to function with zoned media.

I have posted patches to integrate with the zone cache, hopefully they
make sense.

>>>
>>> I do still have concerns with the approach which I have shared in smaller
>>> forums but perhaps I have to bring them to this group.
>>>
>>> First is the memory consumption. This isn't really much of a concern for
>>> large servers with few drives but I think the embedded NAS market will
>>> grumble as well as the large data pods trying to stuff 300+ drives in a
>>> chassis.
>>>
>>> As of now the RB-Tree needs to hold ~30000 zones.
>>> sizeof() reports struct blk_zone to use 120 bytes on x86_64. This yields
>>> around 3.5 MB per zoned drive attached.
>>> Which is fine if it is really needed, but most of it is fixed information
>>> and it can be significantly condensed (I have proposed 8 bytes per zone held
>>> in an array as more than adequate). Worse is that the crucial piece of
>>> information, the current wp needed for scheduling the next write, is mostly
>>> out of date because it is updated only after the write completes and zones
>>> being actively written to must work off of the last location / size that was
>>> submitted, not completed. The work around is for that tracking to be handled
>>> in the private_data member. I am not saying that updating the wp on
>>> completing a write isn’t important, I am saying that the bi_end_io hook is
>>> the existing hook that works just fine.
>>>
>> Which _actually_ is not true; with my patches I'll update the write
>> pointer prior to submit the I/O (on the reasoning that most of the time
>> I/O will succeed) and re-read the zone information if an I/O failed.
>> (Which I'll have to do anyway as after an I/O failure the write pointer
>> status is not clearly defined.)

Apologies for my mis-characterization.

>> I have thought about condensing the RB tree information, but then I
>> figured that for 'real' SMR handling we cannot assume all zones are of
>> fixed size, and hence we need all the information there.
>> Any condensing method would assume a given structure of the zones, which
>> the standard just doesn't provide.
>> Or am I missing something here?
>
> Indeed, the standards do not mandate any particular zone configuration,
> constant zone size, etc. So writing code so that can be handled is certainly
> the right way of doing things. However, if we decide to go forward with
> mapping RESET WRITE POINTER command to DISCARD, then at least a constant
> zone size (minus the last zone as you said) must be assumed, and that
> information can be removed from the entries in the RB tree (as it will be
> saved for the sysfs "zone_size" file anyway). Adding a little code to handle
> that eventual last runt zone with a different size is not a big problem.

>> As for write pointer handling: yes, the write pointer on the zones is
>> not really useful for upper-level usage.
>> Where we do need it is to detect I/O which is crossing the write pointer
>> (eg when doing reads over the entire zone).
>> As per spec you will be getting an I/O error here, so we need to split
>> the I/O on the write pointer to get valid results back.
>
> To be precise here, the I/O splitting will be handled by the block layer
> thanks to the "chunk_sectors" setting. But that relies on a constant zone
> size assumption too.
>
> The RB-tree here is most useful for reads over or after the write pointer as
> this can have different behavior on different drives (URSWRZ bit). The RB-tree
> allows us to hide these differences to upper layers and

Re: [PATCH 37/45] drivers: use req op accessor

2016-08-04 Thread Shaun Tancheff
On Thu, Aug 4, 2016 at 10:46 AM, Christoph Hellwig <h...@infradead.org> wrote:
> On Wed, Aug 03, 2016 at 07:30:29PM -0500, Shaun Tancheff wrote:
>> I think the translation in loop.c is suspicious here:
>>
>> "if use DIO && not (a flush_flag or discard_flag)"
>> should translate to:
>> "if use DIO && not ((a flush_flag) || op == discard)"
>>
>> But in the patch I read:
>> "if use DIO && ((not a flush_flag) || op == discard)"
>>
>> Which would have DIO && discards follow the AIO path?
>
> Indeed.  Sorry for missing out on your patch, I just sent a fix
> in reply to Dave's other report earlier which is pretty similar to
> yours.

No worries. I prefer your switch to an if conditional here.

-- 
Shaun Tancheff


Re: [PATCH 37/45] drivers: use req op accessor

2016-08-03 Thread Shaun Tancheff
On Wed, Aug 3, 2016 at 6:47 PM, Mike Christie <mchri...@redhat.com> wrote:
> On 08/03/2016 05:33 PM, Ross Zwisler wrote:
>> On Sun, Jun 5, 2016 at 1:32 PM,  <mchri...@redhat.com> wrote:
>>> From: Mike Christie <mchri...@redhat.com>
>>>
>>> The req operation REQ_OP is separated from the rq_flag_bits
>>> definition. This converts the block layer drivers to
>>> use req_op to get the op from the request struct.
>>>
>>> Signed-off-by: Mike Christie <mchri...@redhat.com>
>>> ---
>>>  drivers/block/loop.c  |  6 +++---
>>>  drivers/block/mtip32xx/mtip32xx.c |  2 +-
>>>  drivers/block/nbd.c   |  2 +-
>>>  drivers/block/rbd.c   |  4 ++--
>>>  drivers/block/xen-blkfront.c  |  8 +---
>>>  drivers/ide/ide-floppy.c  |  2 +-
>>>  drivers/md/dm.c   |  2 +-
>>>  drivers/mmc/card/block.c  |  7 +++
>>>  drivers/mmc/card/queue.c  |  6 ++
>>
>> Dave Chinner reported a deadlock with XFS + DAX, which I reproduced
>> and bisected to this commit:
>>
>> commit c2df40dfb8c015211ec55f4b1dd0587f875c7b34
>> Author: Mike Christie <mchri...@redhat.com>
>> Date:   Sun Jun 5 14:32:17 2016 -0500
>> drivers: use req op accessor
>>
>> Here are the steps to reproduce the deadlock with a BRD ramdisk:
>>
>> mkfs.xfs -f /dev/ram0
>> mount -o dax /dev/ram0 /mnt/scratch
>
> When using ramdisks, we need the attached patch like in your other bug
> report. I think it will fix some hangs people are seeing.
>
> I do not think that it should cause the failure to run issue you saw
> when doing generic/008 and ext2.
>

I think the translation in loop.c is suspicious here:

"if use DIO && not (a flush_flag or discard_flag)"
should translate to:
"if use DIO && not ((a flush_flag) || op == discard)"

But in the patch I read:
"if use DIO && ((not a flush_flag) || op == discard)"

Which would have DIO && discards follow the AIO path?
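
To spell out the difference, here is a small stand-alone sketch of the two predicates. The function names and the boolean parameters are made up for this illustration; in loop.c the inputs come from lo->use_dio, cmd->rq->cmd_flags and req_op(cmd->rq).

```c
#include <assert.h>
#include <stdbool.h>

/* Intended: direct I/O uses AIO unless the request is a flush or a discard. */
static bool use_aio_intended(bool use_dio, bool is_flush, bool is_discard)
{
	return use_dio && !(is_flush || is_discard);
}

/* As I read the patch: the negation only covers the flush test. */
static bool use_aio_as_patched(bool use_dio, bool is_flush, bool is_discard)
{
	return use_dio && (!is_flush || is_discard);
}
```

The two agree for plain reads and writes and for a flush on its own. They differ for any discard with DIO enabled: the intended form keeps it off the AIO path, the patched form sends it there.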

So I would humbly suggest something like the following
(on top of commit c2df40dfb8c015211ec55f4b1dd0587f875c7b34):
[Please excuse the messed up patch format ... gmail eats tabs]

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index b9b737c..0754d83 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1659,8 +1659,9 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
if (lo->lo_state != Lo_bound)
return -EIO;

-   if (lo->use_dio && (!(cmd->rq->cmd_flags & REQ_FLUSH) ||
-   req_op(cmd->rq) == REQ_OP_DISCARD))
+   if (lo->use_dio && !(
+   (cmd->rq->cmd_flags & REQ_FLUSH) ||
+req_op(cmd->rq) == REQ_OP_DISCARD))
cmd->use_aio = true;
else
cmd->use_aio = false;

-- 
Shaun Tancheff


[PATCH 2/2] Enable support for Seagate HostAware drives (testing).

2016-08-03 Thread Shaun Tancheff
Seagate drives report a SAME code of 0 due to having:
  - Zones of different types (CMR zones at the low LBA space).
  - Zones of different size (A terminating 'runt' zone in the high lba space).

Support loading the zone topology into the sd_zbc zone cache.

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>

Cc: Hannes Reinecke <h...@suse.de>
Cc: Damien Le Moal <damien.lem...@hgst.com>
---
v1:
 - Updated kernel version / re-sync with Hannes' zac.v3 branch.
---
 drivers/scsi/sd.c |  22 
 drivers/scsi/sd.h |  20 +--
 drivers/scsi/sd_zbc.c | 150 ++
 3 files changed, 155 insertions(+), 37 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 7c38975..5fbc599 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -694,8 +694,13 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
break;
 
case SD_ZBC_RESET_WP:
-   max_blocks = sdkp->unmap_granularity;
q->limits.discard_zeroes_data = 1;
+   q->limits.discard_granularity =
+   sd_zbc_discard_granularity(sdkp);
+
+   max_blocks = min_not_zero(sdkp->unmap_granularity,
+ q->limits.discard_granularity >>
+   ilog2(logical_block_size));
break;
 
case SD_LBP_ZERO:
@@ -1952,13 +1957,12 @@ static int sd_done(struct scsi_cmnd *SCpnt)
good_bytes = blk_rq_bytes(req);
scsi_set_resid(SCpnt, 0);
} else {
-#ifdef CONFIG_SCSI_ZBC
if (op == ZBC_OUT)
/* RESET WRITE POINTER failed */
sd_zbc_update_zones(sdkp,
blk_rq_pos(req),
-   512, true);
-#endif
+   512, SD_ZBC_RESET_WP_ERR);
+
good_bytes = 0;
scsi_set_resid(SCpnt, blk_rq_bytes(req));
}
@@ -2031,7 +2035,6 @@ static int sd_done(struct scsi_cmnd *SCpnt)
good_bytes = blk_rq_bytes(req);
scsi_set_resid(SCpnt, 0);
}
-#ifdef CONFIG_SCSI_ZBC
/*
 * ZBC: Unaligned write command.
 * Write did not start a write pointer position.
@@ -2039,8 +2042,7 @@ static int sd_done(struct scsi_cmnd *SCpnt)
if (sshdr.ascq == 0x04)
sd_zbc_update_zones(sdkp,
blk_rq_pos(req),
-   512, true);
-#endif
+   512, SD_ZBC_WRITE_ERR);
}
break;
default:
@@ -2267,7 +2269,7 @@ static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
 * supports equal zone sizes.
 */
same = buffer[4] & 0xf;
-   if (same == 0 || same > 3) {
+   if (same > 3) {
sd_printk(KERN_WARNING, sdkp,
  "REPORT ZONES SAME type %d not supported\n", same);
return;
@@ -2279,9 +2281,9 @@ static void sd_read_zones(struct scsi_disk *sdkp, unsigned char *buffer)
sdkp->unmap_granularity = zone_len;
blk_queue_chunk_sectors(sdkp->disk->queue,
logical_to_sectors(sdkp->device, zone_len));
-   sd_config_discard(sdkp, SD_ZBC_RESET_WP);
 
-   sd_zbc_setup(sdkp, buffer, SD_BUF_SIZE);
+   sd_zbc_setup(sdkp, zone_len, buffer, SD_BUF_SIZE);
+   sd_config_discard(sdkp, SD_ZBC_RESET_WP);
 }
 
 static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device 
*sdp,
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 6ae4505..ef6c132 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -283,19 +283,24 @@ static inline void sd_dif_complete(struct scsi_cmnd *cmd, 
unsigned int a)
 
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
+
+#define SD_ZBC_INIT            0
+#define SD_ZBC_RESET_WP_ERR    1
+#define SD_ZBC_WRITE_ERR       2
+
 #ifdef CONFIG_SCSI_ZBC
 
 extern int sd_zbc_report_zones(struct scsi_disk *, unsigned char *, int,
   sector_t, enum zbc_zone_reporting_options, bool);
-extern int sd_zbc_setup(struct scsi_disk *, char *, int);
+extern int sd_zbc_setup(struct scsi_disk *, u64 zlen, char *buf, int buf_len);
 extern void sd_zbc_remove(struct scsi_disk *);
 extern void sd_zbc_reset_zones(struct scsi_disk *);
 extern int sd_zbc_setup_discard(struct scsi_disk *, struct request *,
sector_t, unsigned int);

[PATCH 1/2] bio/zbc support for zone cache

2016-08-03 Thread Shaun Tancheff
Zone actions (Open/Close/Reset) update zone cache on success.

Add helpers for
- Zone actions to update zone cache
- Zone report to translate cache to ZBC format structs

Update blkreport to pull from zone cache instead of querying media.

Added open explicit and closed states for zone cache

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>

Cc: Hannes Reinecke <h...@suse.de>
Cc: Damien Le Moal <damien.lem...@hgst.com>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Sagi Grimberg <sa...@mellanox.com>
Cc: Mike Christie <mchri...@redhat.com>
Cc: Toshi Kani <toshi.k...@hpe.com>
Cc: Kent Overstreet <kent.overstr...@gmail.com>
Cc: Ming Lei <ming@canonical.com>

---
 block/blk-lib.c|   3 +-
 block/blk-zoned.c  | 190 +
 block/ioctl.c  |  39 +++---
 include/linux/blkdev.h |  14 +++-
 4 files changed, 234 insertions(+), 12 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 6dcdcbf..92898ec 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -6,7 +6,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "blk.h"
 
@@ -358,6 +357,8 @@ int blkdev_issue_zone_action(struct block_device *bdev, unsigned int op,
bio_set_op_attrs(bio, op, op_flags);
ret = submit_bio_wait(bio);
bio_put(bio);
+   if (ret == 0)
+   update_zone_state(bdev, sector, op);
return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_zone_action);
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 975e863..799676b 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -68,3 +68,193 @@ void blk_drop_zones(struct request_queue *q)
q->zones = RB_ROOT;
 }
 EXPORT_SYMBOL_GPL(blk_drop_zones);
+
+static void __set_zone_state(struct blk_zone *zone, int op)
+{
+   if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
+   return;
+
+   switch (op) {
+   case REQ_OP_ZONE_OPEN:
+   zone->state = BLK_ZONE_OPEN_EXPLICIT;
+   break;
+   case REQ_OP_ZONE_CLOSE:
+   zone->state = BLK_ZONE_CLOSED;
+   break;
+   case REQ_OP_ZONE_RESET:
+   zone->wp = zone->start;
+   break;
+   default:
+   WARN_ONCE(1, "%s: invalid op code: %u\n", __func__, op);
+   }
+}
+
+void update_zone_state(struct block_device *bdev, sector_t lba, unsigned int 
op)
+{
+   struct request_queue *q = bdev_get_queue(bdev);
+   struct blk_zone *zone = NULL;
+
+   if (lba == ~0ul) {
+   struct rb_node *node;
+
+   for (node = rb_first(&q->zones); node; node = rb_next(node)) {
+   zone = rb_entry(node, struct blk_zone, node);
+   __set_zone_state(zone, op);
+   }
+   return;
+   }
+   zone = blk_lookup_zone(q, lba);
+   if (zone)
+   __set_zone_state(zone, op);
+}
+EXPORT_SYMBOL_GPL(update_zone_state);
+
+void bzrpt_fill(struct block_device *bdev, struct bdev_zone_report *bzrpt,
+   size_t sz, sector_t lba, u8 opt)
+{
+   u64 clen = ~0ul;
+   struct blk_zone *zone = NULL;
+   struct rb_node *node = NULL;
+   struct request_queue *q = bdev_get_queue(bdev);
+   u32 max_entries = (sz - sizeof(struct bdev_zone_report))
+   /  sizeof(struct bdev_zone_descriptor);
+   u32 entry;
+   int len_diffs = 0;
+   int type_diffs = 0;
+   u8 ctype;
+   u8 same = 0;
+
+   zone = blk_lookup_zone(q, lba);
+   if (zone)
+   node = &zone->node;
+
+   for (entry = 0;
+entry < max_entries && node;
+entry++, node = rb_next(node)) {
+   u64 wp;
+   u8 cond = 0;
+   u8 flgs = 0;
+
+   zone = rb_entry(node, struct blk_zone, node);
+   if (blk_zone_is_cmr(zone))
+   wp = zone->start + zone->len;
+   else
+   wp = zone->wp;
+
+   bzrpt->descriptors[entry].lba_start = cpu_to_be64(zone->start);
+   bzrpt->descriptors[entry].length = cpu_to_be64(zone->len);
+   bzrpt->descriptors[entry].type = zone->type;
+   bzrpt->descriptors[entry].lba_wptr = cpu_to_be64(wp);
+
+   switch (zone->state) {
+   case BLK_ZONE_NO_WP:
+   cond = ZCOND_CONVENTIONAL;
+   break;
+   case BLK_ZONE_OPEN:
+   cond = ZCOND_ZC2_OPEN_IMPLICIT;
+   break;
+   case BLK_ZONE_OPEN_EXPLICIT:
+   cond = ZCOND_ZC3_OPEN_EXPLICIT;
+   break;
+   case BLK_ZONE_CLOSED:
+   cond = ZCOND_ZC4_CLOSED;
+   break;
+   case BLK_ZONE_READONLY:
+   

Re: [PATCH v6 0/2] Block layer support ZAC/ZBC commands

2016-08-01 Thread Shaun Tancheff
On Mon, Aug 1, 2016 at 4:41 AM, Christoph Hellwig <h...@lst.de> wrote:
>
> Can you please integrate this with Hannes series so that it uses
> his cache of the zone information?

Adding Hannes and Damien to Cc.

Christoph,

I can make a patch to marshal Hannes' RB-Tree into a block report, that is
quite simple. I can even have the open/close/reset zone commands update the
RB-Tree .. the non-private parts anyway. I would prefer to do this around the
CONFIG_SD_ZBC support, offering the existing type of patch for setups that do
not need the RB-Tree to function with zoned media.

I do still have concerns with the approach which I have shared in smaller
forums but perhaps I have to bring them to this group.

First is the memory consumption. This isn't really much of a concern for large
servers with few drives but I think the embedded NAS market will grumble as
well as the large data pods trying to stuff 300+ drives in a chassis.

As of now the RB-Tree needs to hold ~30000 zones.
sizeof() reports struct blk_zone to use 120 bytes on x86_64. This yields
around 3.5 MB per zoned drive attached.
Which is fine if it is really needed, but most of it is fixed information
and it can be significantly condensed (I have proposed 8 bytes per zone held
in an array as more than adequate). Worse is that the crucial piece of
information, the current wp needed for scheduling the next write, is mostly
out of date because it is updated only after the write completes and zones
being actively written to must work off of the last location / size that was
submitted, not completed. The work around is for that tracking to be handled
in the private_data member. I am not saying that updating the wp on
completing a write isn’t important, I am saying that the bi_end_io hook is
the existing hook that works just fine.
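
As a back-of-the-envelope check of those numbers, here is a tiny sketch. The zone count of 30,000 is an assumption chosen for illustration (it is roughly what 3.5 MB at 120 bytes per zone implies), and zone_cache_bytes() is a helper invented for this note, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes of zone metadata held for one drive. */
static uint64_t zone_cache_bytes(uint64_t nr_zones, size_t bytes_per_zone)
{
	return nr_zones * (uint64_t)bytes_per_zone;
}

/* Same figure multiplied out over a chassis full of drives. */
static uint64_t chassis_cache_bytes(uint64_t nr_drives, uint64_t nr_zones,
				    size_t bytes_per_zone)
{
	return nr_drives * zone_cache_bytes(nr_zones, bytes_per_zone);
}
```

Under that assumption, 120 bytes per zone comes to 3,600,000 bytes per drive and about 1.08 GB for 300 drives, while 8 bytes per zone comes to 240,000 bytes per drive and 72 MB for the same chassis.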

This all ties into domain responsibility. With the RB-Tree doing half of the
work and the ‘responsible’ domain handling the active path via private_data,
why have the split at all? It seems to be double work to have a second object
tracking the first so that I/O scheduling can function.

Finally, there is the error handling path: when the RB-Tree encounters an error
it attempts to requery the drive topology, virtually guaranteeing that the
private_data is now out-of-sync with the RB-Tree. Again this is something
that can be better encapsulated in the bi_end_io to be informed of the
failed I/O and schedule the appropriate recovery (including re-querying the
zone information of the affected zone(s)).

Anyway those are my concerns and why I am still reluctant to drop this line of
support. I have incorporated Hannes' changes at various points. Hence the
SCT Write Same to attempt to work around some of the flaws in mapping
discard to reset write pointer.

Thanks and Regards,
Shaun




-- 
Shaun Tancheff


Re: [PATCH 1/5] sd: configure ZBC devices

2016-08-01 Thread Shaun Tancheff
  * Adjust 'chunk_sectors' to the zone length if the device
> +* supports equal zone sizes.
> +*/
> +   same = buffer[4] & 0xf;
> +   if (same == 0 || same > 3) {
> +   sd_printk(KERN_WARNING, sdkp,
> + "REPORT ZONES SAME type %d not supported\n", same);
> +   return;
> +   }

It's a bit unfortunate that you abort here. The current Seagate Host Aware
drives must report a SAME code of 0 here due to the final 'runt' zone and so are
not supported by your RB-Tree in the following patches.
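
For reference, a minimal sketch of decoding the SAME field. The meanings of the four values are as I read the ZBC draft; the enum and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SAME field of the REPORT ZONES header: byte 4, low nibble. */
enum zbc_same {
	ZBC_SAME_NONE		= 0, /* zone types and lengths may differ */
	ZBC_SAME_ALL		= 1, /* every zone has the same type and length */
	ZBC_SAME_LAST_DIFFERS	= 2, /* same type, only the last length differs */
	ZBC_SAME_TYPES_DIFFER	= 3, /* same length, zone types differ */
};

static unsigned int report_zones_same(const uint8_t *buf)
{
	return buf[4] & 0x0f;
}

/* True when the header alone promises one zone length for the whole drive
 * (possibly with a shorter final zone). */
static bool same_promises_fixed_len(unsigned int same)
{
	return same >= ZBC_SAME_ALL && same <= ZBC_SAME_TYPES_DIFFER;
}
```

A drive that mixes CMR zones at the low LBAs with a runt zone at the end has to report 0, so same_promises_fixed_len() is false for it even though every zone but the last has the same length. Handling that case means walking the descriptors instead of trusting the header.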

> +   /* Read the zone length from the first zone descriptor */
> +   desc = &buffer[64];
> +   zone_len = logical_to_sectors(sdkp->device,
> + get_unaligned_be64(&desc[8]));
> +   blk_queue_chunk_sectors(sdkp->disk->queue, zone_len);
> +}
> +
>  static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
> struct scsi_sense_hdr *sshdr, int sense_valid,
> int the_result)

-- 
Shaun Tancheff


[PATCH v6 0/2] Block layer support ZAC/ZBC commands

2016-07-29 Thread Shaun Tancheff
Hi Jens,

This series is based on Linus' current tip after the merge of 'for-4.8/core'

As Host Aware drives are becoming available we would like to be able
to make use of such drives. This series is also intended to be suitable
for use by Host Managed drives.

ZAC/ZBC drives add new commands for discovering and working with Zones.

This extends the ZAC/ZBC support up to the block layer, allowing direct control
by file systems or device mapper targets. Also, by deferring the zone handling
to the authoritative subsystem there is an overall lower memory usage for
holding the active zone information, and it is clearer which party is
responsible for maintaining the write pointer for each active zone.
By way of example, a DM target may have several writes in progress. The sector
(or LBA) for each of those writes depends on the previous write. While the
drive's write pointer will be updated as writes are completed, the DM target
will be maintaining both where the next write should be scheduled from and
where the write pointer is, based on writes completed w/o errors.
Knowing the drive's zone topology enables DM targets and file systems to
extend their block allocation schemes and issue write pointer resets (or
discards) that are zone aligned.
A perhaps non-obvious approach is that a conventional drive
returns a zone report descriptor with a single large conventional zone.
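
The DM target example above can be sketched roughly as follows. This is a user-space illustration only: struct zone_wp_track and both helpers are hypothetical names, and it assumes writes within a zone complete in the order they were submitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-zone bookkeeping for a target with writes in flight. */
struct zone_wp_track {
	uint64_t start;		/* first sector of the zone */
	uint64_t len;		/* zone length in sectors */
	uint64_t wp_submitted;	/* where the next write must be issued */
	uint64_t wp_completed;	/* write pointer per error-free completions */
};

/* Reserve nr sectors at the submit pointer.
 * Returns the start sector, or UINT64_MAX if the zone cannot hold them. */
static uint64_t zone_reserve(struct zone_wp_track *z, uint64_t nr)
{
	uint64_t at = z->wp_submitted;

	if (nr > z->start + z->len - at)
		return UINT64_MAX;
	z->wp_submitted = at + nr;
	return at;
}

/* Completion path (bi_end_io in the kernel) for a write without errors. */
static void zone_complete(struct zone_wp_track *z, uint64_t nr)
{
	assert(z->wp_completed + nr <= z->wp_submitted);
	z->wp_completed += nr;
}
```

The point of keeping two pointers is that the second and later writes in flight are placed from wp_submitted, while wp_completed is what should match the drive once the queue drains.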

Patches for util-linux can be found here:
https://github.com/Seagate/ZDM-Device-Mapper/tree/master/patches/util-linux

This patch is available here:
https://github.com/stancheff/linux/tree/zbc.bio.v6

g...@github.com:stancheff/linux.git zbc.bio.v6

v6:
 - Fix page alloc to include DMA flag for ioctl.
v5:
 - In sd_setup_zone_action_cmnd, remove unused vars and fix switch indent
 - In blk-lib fix documentation
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
 - Dropped ata16 hackery
V3:
 - Rebase on Mike Christie's separate bio operations
 - Update blkzoned_api.h to include report zones PARTIAL bit.
 - Use zoned report reserved bit for ata-passthrough flag.

V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Moved include/uapi/linux/fs.h changes to patch 3
 - Fixed commit message for first patch in series.

Shaun Tancheff (2):
  Add bio/request flags to issue ZBC/ZAC commands
  Add ioctl to issue ZBC/ZAC commands via block layer

 MAINTAINERS   |   9 ++
 block/blk-lib.c   |  95 
 block/ioctl.c | 110 +++
 drivers/scsi/sd.c | 118 
 drivers/scsi/sd.h |   1 +
 include/linux/bio.h   |   7 +-
 include/linux/blk_types.h |   6 +-
 include/linux/blkzoned_api.h  |  25 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned_api.h | 220 ++
 include/uapi/linux/fs.h   |   1 +
 11 files changed, 591 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

-- 
2.8.1



[PATCH v6 1/2] Add bio/request flags to issue ZBC/ZAC commands

2016-07-29 Thread Shaun Tancheff
Add op flags to access zone information as well as open, close
and reset zones:
  - REQ_OP_ZONE_REPORT - Query zone information (Report zones)
  - REQ_OP_ZONE_OPEN - Explicitly open a zone for writing
  - REQ_OP_ZONE_CLOSE - Explicitly close a zone
  - REQ_OP_ZONE_RESET - Reset Write Pointer to start of zone

These op flags can be used to create bio's to control zoned devices
through the block layer.

This is useful for filesystems and device mappers that need explicit
control of zoned devices such as Host Managed and Host Aware SMR drives.

Report zones is a device read that requires a buffer.

Open, Close and Reset are device commands that have no associated
data transfer. Sending an LBA of ~0 will attempt to operate on all
zones. This is typically used with Reset to wipe a drive as a Reset
behaves similar to TRIM in that all data in the zone(s) is deleted.
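
A rough user-space sketch of the "LBA of ~0 means all zones" convention, using a flat array in place of whatever the driver or drive uses internally. The struct, the ZONE_ALL macro and zone_reset() are illustrative names, not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZONE_ALL (~0ULL)	/* "LBA of ~0": act on every zone */

struct zone {
	uint64_t start;		/* first sector of the zone */
	uint64_t len;		/* zone length in sectors */
	uint64_t wp;		/* write pointer */
	int conventional;	/* non-zero: no write pointer to reset */
};

/* Reset the zone containing lba, or every zone when lba is ZONE_ALL.
 * Conventional zones are matched but left untouched.
 * Returns the number of zones matched. */
static size_t zone_reset(struct zone *zones, size_t n, uint64_t lba)
{
	size_t i, hit = 0;

	for (i = 0; i < n; i++) {
		struct zone *z = &zones[i];

		if (lba != ZONE_ALL &&
		    (lba < z->start || lba - z->start >= z->len))
			continue;
		if (!z->conventional)
			z->wp = z->start;
		hit++;
	}
	return hit;
}
```

Open and Close would follow the same pattern with a state change in place of the write pointer rewind.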

The Finish zone command is intentionally not implemented as there is no
current use case for that operation.

Report zones currently defaults to reporting on all zones. It is expected
that support for the zone option flag will piggy back on streamid
support. The report option flag is useful as it can reduce the number
of zones in each report, but not critical.

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
v5:
 - In sd_setup_zone_action_cmnd, remove unused vars and fix switch indent
 - In blk-lib fix documentation
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
V3:
 - Rebase on Mike Christie's separate bio operations
 - Update blkzoned_api.h to include report zones PARTIAL bit.
V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Removed include/uapi/linux/fs.h from this patch.
---
 MAINTAINERS   |   9 ++
 block/blk-lib.c   |  95 +
 drivers/scsi/sd.c | 118 +
 drivers/scsi/sd.h |   1 +
 include/linux/bio.h   |   7 +-
 include/linux/blk_types.h |   6 +-
 include/linux/blkzoned_api.h  |  25 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned_api.h | 214 ++
 9 files changed, 474 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 771c31c..32f5598 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12785,6 +12785,15 @@ F: Documentation/networking/z8530drv.txt
 F: drivers/net/hamradio/*scc.c
 F: drivers/net/hamradio/z8530.h
 
+ZBC AND ZBC BLOCK DEVICES
+M: Shaun Tancheff <shaun.tanch...@seagate.com>
+W: http://seagate.com
+W: https://github.com/Seagate/ZDM-Device-Mapper
+L: linux-bl...@vger.kernel.org
+S: Maintained
+F: include/linux/blkzoned_api.h
+F: include/uapi/linux/blkzoned_api.h
+
 ZBUD COMPRESSED PAGE ALLOCATOR
 M: Seth Jennings <sjenn...@redhat.com>
 L: linux...@kvack.org
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 083e56f..6dcdcbf 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "blk.h"
 
@@ -266,3 +267,97 @@ int blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
 }
 EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+/**
+ * blkdev_issue_zone_report - queue a report zones operation
+ * @bdev:  target blockdev
+ * @op_flags:  extra bio rw flags. If unsure, use 0.
+ * @sector:starting sector (report will include this sector).
+ * @opt:   See: zone_report_option, default is 0 (all zones).
+ * @page:  one or more contiguous pages.
+ * @pgsz:  up to size of page in bytes, size of report.
+ * @gfp_mask:  memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *Issue a zone report request for the sectors in question.
+ */
+int blkdev_issue_zone_report(struct block_device *bdev, unsigned int op_flags,
+sector_t sector, u8 opt, struct page *page,
+size_t pgsz, gfp_t gfp_mask)
+{
+   struct bdev_zone_report *conv = page_address(page);
+   struct bio *bio;
+   unsigned int nr_iovecs = 1;
+   int ret = 0;
+
+   if (pgsz < (sizeof(struct bdev_zone_report) +
+   sizeof(struct bdev_zone_descriptor)))
+   return -EINVAL;
+
+   bio = bio_alloc(gfp_mask, nr_iovecs);
+   if (!bio)
+   return -ENOMEM;
+
+   conv->descriptor_count = 0;
+   bio->bi_iter.bi_sector = sector;
+   bio->bi_bdev = bdev;
+   bio->bi_vcnt = 0;
+   bio->bi_iter.bi_size = 0;
+
+   bio_add_page(bio, page, pgsz, 0);
+   bio_set_op_attrs(bio, REQ_OP_ZONE_REPORT

[PATCH v6 2/2] Add ioctl to issue ZBC/ZAC commands via block layer

2016-07-29 Thread Shaun Tancheff
Add support for ZBC ioctl's
BLKREPORT    - Issue Report Zones to device.
BLKOPENZONE  - Issue Zone Action: Open Zone command.
BLKCLOSEZONE - Issue Zone Action: Close Zone command.
BLKRESETZONE - Issue Zone Action: Reset Zone command.
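
The zone action dispatch in this patch can be modelled in userspace
roughly as follows. This is a sketch only: the SKETCH_* command and op
numbers are made up, and the two flags stand in for the FMODE_WRITE
and whole-device (bdev == bdev->bd_contains) checks in the diff.

```c
#include <errno.h>

/* Hypothetical ioctl numbers standing in for BLKOPENZONE and friends. */
enum {
	SKETCH_BLKOPENZONE = 1,
	SKETCH_BLKCLOSEZONE,
	SKETCH_BLKRESETZONE,
};

/* Hypothetical op numbers standing in for REQ_OP_ZONE_*. */
enum {
	SKETCH_OP_ZONE_OPEN = 10,
	SKETCH_OP_ZONE_CLOSE,
	SKETCH_OP_ZONE_RESET,
};

/*
 * Returns the op to issue, or a negative errno:
 *   -EBADF  when the device was not opened for writing,
 *   -EFAULT when the target is a partition rather than the whole device,
 *   -EINVAL for an unknown command.
 */
static int zone_action_op(unsigned int cmd, int writable, int whole_device)
{
	if (!writable)
		return -EBADF;
	if (!whole_device)
		return -EFAULT;

	switch (cmd) {
	case SKETCH_BLKOPENZONE:
		return SKETCH_OP_ZONE_OPEN;
	case SKETCH_BLKCLOSEZONE:
		return SKETCH_OP_ZONE_CLOSE;
	case SKETCH_BLKRESETZONE:
		return SKETCH_OP_ZONE_RESET;
	default:
		return -EINVAL;
	}
}
```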

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
v6:
 - Added GFP_DMA to gfp mask.
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
---
 block/ioctl.c | 110 ++
 include/uapi/linux/blkzoned_api.h |   6 +++
 include/uapi/linux/fs.h   |   1 +
 3 files changed, 117 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..a2a6c2c 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -194,6 +195,109 @@ int blkdev_reread_part(struct block_device *bdev)
 }
 EXPORT_SYMBOL(blkdev_reread_part);
 
+static int blk_zoned_report_ioctl(struct block_device *bdev, fmode_t mode,
+   void __user *parg)
+{
+   int error = -EFAULT;
+   gfp_t gfp = GFP_KERNEL | GFP_DMA;
+   struct bdev_zone_report_io *zone_iodata = NULL;
+   int order = 0;
+   struct page *pgs = NULL;
+   u32 alloc_size = PAGE_SIZE;
+   unsigned long op_flags = 0;
+   u8 opt = 0;
+
+   if (!(mode & FMODE_READ))
+   return -EBADF;
+
+   zone_iodata = (void *)get_zeroed_page(gfp);
+   if (!zone_iodata) {
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   if (copy_from_user(zone_iodata, parg, sizeof(*zone_iodata))) {
+   error = -EFAULT;
+   goto report_zones_out;
+   }
+   if (zone_iodata->data.in.return_page_count > alloc_size) {
+   int npages;
+
+   alloc_size = zone_iodata->data.in.return_page_count;
+   npages = (alloc_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+   pgs = alloc_pages(gfp, ilog2(npages));
+   if (pgs) {
+   void *mem = page_address(pgs);
+
+   if (!mem) {
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   order = ilog2(npages);
+   memset(mem, 0, alloc_size);
+   memcpy(mem, zone_iodata, sizeof(*zone_iodata));
+   free_page((unsigned long)zone_iodata);
+   zone_iodata = mem;
+   } else {
+   /* Result requires DMA capable memory */
+   pr_err("Not enough memory available for request.\n");
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   }
+   opt = zone_iodata->data.in.report_option;
+   error = blkdev_issue_zone_report(bdev, op_flags,
+   zone_iodata->data.in.zone_locator_lba, opt,
+   pgs ? pgs : virt_to_page(zone_iodata),
+   alloc_size, GFP_KERNEL);
+
+   if (error)
+   goto report_zones_out;
+
+   if (copy_to_user(parg, zone_iodata, alloc_size))
+   error = -EFAULT;
+
+report_zones_out:
+   if (pgs)
+   __free_pages(pgs, order);
+   else if (zone_iodata)
+   free_page((unsigned long)zone_iodata);
+   return error;
+}
+
+static int blk_zoned_action_ioctl(struct block_device *bdev, fmode_t mode,
+ unsigned int cmd, unsigned long arg)
+{
+   unsigned int op = 0;
+
+   if (!(mode & FMODE_WRITE))
+   return -EBADF;
+
+   /*
+* When acting on zones we explicitly disallow using a partition.
+*/
+   if (bdev != bdev->bd_contains) {
+   pr_err("%s: All zone operations disallowed on this device\n",
+   __func__);
+   return -EFAULT;
+   }
+
+   switch (cmd) {
+   case BLKOPENZONE:
+   op = REQ_OP_ZONE_OPEN;
+   break;
+   case BLKCLOSEZONE:
+   op = REQ_OP_ZONE_CLOSE;
+   break;
+   case BLKRESETZONE:
+   op = REQ_OP_ZONE_RESET;
+   break;
+   default:
+   pr_err("%s: Unknown action: %u\n", __func__, cmd);
+   return -EINVAL;
+   }
+   return blkdev_issue_zone_action(bdev, op, 0, arg, GFP_KERNEL);
+}
+
 static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
unsigned long arg, unsigned long flags)
 {
@@ -568,6 +672,12 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, 
unsigned cmd,
case BLKTRACESETUP:
case BLKTRACETEARDOWN:
return blk_trace_ioctl(bdev, cmd, argp);
+   case BLKREPORT:
+   return blk_zoned_report_ioctl(bdev, mode, argp);
+   ca

Re: [PATCH v3] Add support for SCT Write Same

2016-06-21 Thread Shaun Tancheff
On Tue, Jun 21, 2016 at 9:43 PM, Martin K. Petersen
<martin.peter...@oracle.com> wrote:
>>>>>> "Shaun" == Shaun Tancheff <sh...@tancheff.com> writes:
>
> Shaun> SATA drives may support write same via SCT. This is useful for
> Shaun> setting the drive contents to a specific pattern (0's).
>
> As indicated a while back, my preference would be for you to add support
> for REPORT SUPPORTED OPERATION CODES. It's fine that you keep the RSOC
> response simple and only list WRITE SAME(10/16). But I want to avoid
> having different heuristics for libata's SCSI-ATA translation and for
> hardware controller ditto.
>
> Shaun> If UNMAP is not set or TRIM is not available
>
> Please do not conflate the two. We have the appropriate fallbacks at the
> block layer. It happens to be the same command descriptor but it is two
> very different implementations at the device level.
>
> If the UNMAP bit is set you need to issue a DSM TRIM. If the device does
> not support TRIM you need to return ILLEGAL REQUEST/INVALID FIELD IN
> CDB.
>
> If the UNMAP bit is not set then it's a regular WRITE SAME and should be
> issued using SCT WRITE SAME. If the device does not support SCT WRITE
> SAME you need to return ILLEGAL REQUEST/INVALID FIELD IN CDB.

Thanks for the clarification and the review.
I will work on support for REPORT SUPPORTED OPERATION CODES and
handle the WRITE SAME following the UNMAP as you described.
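
To check my understanding, the demux described above would look
roughly like this. It is a userspace sketch with made-up names, not
the libata translation code:

```c
#include <stdbool.h>

/* Possible outcomes of translating a SCSI WRITE SAME for an ATA device. */
enum ws_xlat {
	WS_XLAT_DSM_TRIM,	/* issue DSM TRIM */
	WS_XLAT_SCT_WRITE_SAME,	/* issue SCT WRITE SAME */
	WS_XLAT_INVALID_FIELD,	/* ILLEGAL REQUEST / INVALID FIELD IN CDB */
};

/*
 * UNMAP set:   TRIM or fail, never fall back to SCT.
 * UNMAP clear: SCT WRITE SAME or fail, never fall back to TRIM.
 */
static enum ws_xlat write_same_xlat(bool unmap, bool has_trim,
				    bool has_sct_write_same)
{
	if (unmap)
		return has_trim ? WS_XLAT_DSM_TRIM : WS_XLAT_INVALID_FIELD;

	return has_sct_write_same ? WS_XLAT_SCT_WRITE_SAME
				  : WS_XLAT_INVALID_FIELD;
}
```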

Thanks!

> --
> Martin K. Petersen  Oracle Linux Engineering



-- 
Shaun Tancheff


[PATCH v5 1/2] Add bio/request flags for using ZBC/ZAC commands

2016-06-20 Thread Shaun Tancheff
T10 ZBC and T13 ZAC specify operations for Zoned devices.

To access zone information and to open, close and reset zones, add
ops for:
  - Report zones command: REQ_OP_ZONE_REPORT
  - Open zone: REQ_OP_ZONE_OPEN
  - Close zone: REQ_OP_ZONE_CLOSE
  - Reset Write Pointer: REQ_OP_ZONE_RESET
These ops are used to create a struct bio / struct request that issues
ZBC commands.

Report zones is a device read that requires a buffer.
Open, Close and Reset are device commands that have no associated
data transfer.

The Finish zone command is intentionally not implemented as there is no
current use case for that operation.

Report zones currently defaults to reporting on all zones. It is expected
that support for the zone option flag will piggy back on streamid
support. The report option flag is useful as it can reduce the number
of zones in each report, but not critical.

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
v5:
 - In sd_setup_zone_action_cmnd, remove unused vars and fix switch indent
 - In blk-lib fix documentation
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
V3:
 - Rebase on Mike Cristie's separate bio operations
 - Update blkzoned_api.h to include report zones PARTIAL bit.
V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Removed include/uapi/linux/fs.h from this patch.
---
 MAINTAINERS   |   9 ++
 block/blk-lib.c   |  98 +
 drivers/scsi/sd.c | 118 +
 drivers/scsi/sd.h |   1 +
 include/linux/bio.h   |   7 +-
 include/linux/blk_types.h |   6 +-
 include/linux/blkzoned_api.h  |  25 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned_api.h | 214 ++
 9 files changed, 477 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

diff --git a/MAINTAINERS b/MAINTAINERS
index d174e34..280f87b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12815,6 +12815,15 @@ F: Documentation/networking/z8530drv.txt
 F: drivers/net/hamradio/*scc.c
 F: drivers/net/hamradio/z8530.h
 
+ZBC AND ZBC BLOCK DEVICES
+M: Shaun Tancheff <shaun.tanch...@seagate.com>
+W: http://seagate.com
+W: https://github.com/Seagate/ZDM-Device-Mapper
+L: linux-bl...@vger.kernel.org
+S: Maintained
+F: include/linux/blkzoned_api.h
+F: include/uapi/linux/blkzoned_api.h
+
 ZBUD COMPRESSED PAGE ALLOCATOR
 M: Seth Jennings <sjenn...@redhat.com>
 L: linux...@kvack.org
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 8e24f5e..913ac00 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "blk.h"
 
@@ -261,3 +262,100 @@ int blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
 }
 EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+/**
+ * blkdev_issue_zone_report - queue a report zones operation
+ * @bdev:  target blockdev
+ * @op_flags:  extra bio rw flags. If unsure, use 0.
+ * @sector:starting sector (report will include this sector).
+ * @opt:   See: zone_report_option, default is 0 (all zones).
+ * @page:  one or more contiguous pages.
+ * @pgsz:  up to size of page in bytes, size of report.
+ * @gfp_mask:  memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *Issue a zone report request for the sectors in question.
+ */
+int blkdev_issue_zone_report(struct block_device *bdev, unsigned int op_flags,
+sector_t sector, u8 opt, struct page *page,
+size_t pgsz, gfp_t gfp_mask)
+{
+   struct bdev_zone_report *conv = page_address(page);
+   struct bio *bio;
+   unsigned int nr_iovecs = 1;
+   int ret = 0;
+
+   if (pgsz < (sizeof(struct bdev_zone_report) +
+   sizeof(struct bdev_zone_descriptor)))
+   return -EINVAL;
+
+   bio = bio_alloc(gfp_mask, nr_iovecs);
+   if (!bio)
+   return -ENOMEM;
+
+   conv->descriptor_count = 0;
+   bio->bi_iter.bi_sector = sector;
+   bio->bi_bdev = bdev;
+   bio->bi_vcnt = 0;
+   bio->bi_iter.bi_size = 0;
+
+   /* FUTURE ... when streamid is available: */
+   /* bio_set_streamid(bio, opt); */
+
+   bio_add_page(bio, page, pgsz, 0);
+   bio_set_op_attrs(bio, REQ_OP_ZONE_REPORT, op_flags);
+   ret = submit_bio_wait(bio);
+
+   /*
+* When our request is nak'd the underlying device may be conventional
+* so ... report a single conventional zone the size of the device.
+*/
+   if (ret == -EIO && conv->descr

[PATCH v5 2/2] Add ioctl to issue ZBC/ZAC commands via block layer

2016-06-20 Thread Shaun Tancheff
Add new ioctl types:
BLKREPORT    - Issue Report Zones to device.
BLKOPENZONE  - Issue a Zone Action: Open Zone command.
BLKCLOSEZONE - Issue a Zone Action: Close Zone command.
BLKRESETZONE - Issue a Zone Action: Reset Zone command.
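
The report ioctl in the diff below sizes its buffer from the byte
count supplied by the caller and allocates with
order = ilog2(roundup_pow_of_two(npages)). A userspace sketch of that
arithmetic, assuming a 4096-byte page and hypothetical helper names:

```c
#include <stddef.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Number of whole pages needed to hold 'bytes' (rounds up). */
static unsigned int pages_for_bytes(size_t bytes)
{
	return (unsigned int)((bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}

/*
 * Smallest order such that (1 << order) >= npages. This is what
 * ilog2(roundup_pow_of_two(npages)) evaluates to for npages >= 1.
 */
static unsigned int order_for_pages(unsigned int npages)
{
	unsigned int order = 0;

	while ((1u << order) < npages)
		order++;
	return order;
}
```

Rounding up matters here: a plain ilog2(npages) would give order 1
(two pages) for a three-page request, which is smaller than the
buffer that is then zeroed and copied back to the caller.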

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
---
 block/ioctl.c | 110 ++
 include/uapi/linux/blkzoned_api.h |   6 +++
 include/uapi/linux/fs.h   |   1 +
 3 files changed, 117 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..97e685e 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -194,6 +195,109 @@ int blkdev_reread_part(struct block_device *bdev)
 }
 EXPORT_SYMBOL(blkdev_reread_part);
 
+static int blk_zoned_report_ioctl(struct block_device *bdev, fmode_t mode,
+   void __user *parg)
+{
+   int error = -EFAULT;
+   gfp_t gfp = GFP_KERNEL;
+   struct bdev_zone_report_io *zone_iodata = NULL;
+   int order = 0;
+   struct page *pgs = NULL;
+   u32 alloc_size = PAGE_SIZE;
+   unsigned long op_flags = 0;
+   u8 opt = 0;
+
+   if (!(mode & FMODE_READ))
+   return -EBADF;
+
+   zone_iodata = (void *)get_zeroed_page(gfp);
+   if (!zone_iodata) {
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   if (copy_from_user(zone_iodata, parg, sizeof(*zone_iodata))) {
+   error = -EFAULT;
+   goto report_zones_out;
+   }
+   if (zone_iodata->data.in.return_page_count > alloc_size) {
+   int npages;
+
+   alloc_size = zone_iodata->data.in.return_page_count;
+   npages = (alloc_size + PAGE_SIZE - 1) / PAGE_SIZE;
+   order =  ilog2(roundup_pow_of_two(npages));
+   pgs = alloc_pages(gfp, order);
+   if (pgs) {
+   void *mem = page_address(pgs);
+
+   if (!mem) {
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   memset(mem, 0, alloc_size);
+   memcpy(mem, zone_iodata, sizeof(*zone_iodata));
+   free_page((unsigned long)zone_iodata);
+   zone_iodata = mem;
+   } else {
+   /* Result requires DMA capable memory */
+   pr_err("Not enough memory available for request.\n");
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   }
+   opt = zone_iodata->data.in.report_option;
+   error = blkdev_issue_zone_report(bdev, op_flags,
+   zone_iodata->data.in.zone_locator_lba, opt,
+   pgs ? pgs : virt_to_page(zone_iodata),
+   alloc_size, GFP_KERNEL);
+
+   if (error)
+   goto report_zones_out;
+
+   if (copy_to_user(parg, zone_iodata, alloc_size))
+   error = -EFAULT;
+
+report_zones_out:
+   if (pgs)
+   __free_pages(pgs, order);
+   else if (zone_iodata)
+   free_page((unsigned long)zone_iodata);
+   return error;
+}
+
+static int blk_zoned_action_ioctl(struct block_device *bdev, fmode_t mode,
+ unsigned int cmd, unsigned long arg)
+{
+   unsigned int op = 0;
+
+   if (!(mode & FMODE_WRITE))
+   return -EBADF;
+
+   /*
+* When acting on zones we explicitly disallow using a partition.
+*/
+   if (bdev != bdev->bd_contains) {
+   pr_err("%s: All zone operations disallowed on this device\n",
+   __func__);
+   return -EFAULT;
+   }
+
+   switch (cmd) {
+   case BLKOPENZONE:
+   op = REQ_OP_ZONE_OPEN;
+   break;
+   case BLKCLOSEZONE:
+   op = REQ_OP_ZONE_CLOSE;
+   break;
+   case BLKRESETZONE:
+   op = REQ_OP_ZONE_RESET;
+   break;
+   default:
+   pr_err("%s: Unknown action: %u\n", __func__, cmd);
+   return -EINVAL;
+   }
+   return blkdev_issue_zone_action(bdev, op, 0, arg, GFP_KERNEL);
+}
+
 static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
unsigned long arg, unsigned long flags)
 {
@@ -568,6 +672,12 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, 
unsigned cmd,
case BLKTRACESETUP:
case BLKTRACETEARDOWN:
return blk_trace_ioctl(bdev, cmd, argp);
+   case BLKREPORT:
+   return blk_zoned_report_ioctl(bdev, mode, argp);
+   case BLKOPENZONE:
+   case B

[PATCH v5 0/2] Block layer support ZAC/ZBC commands

2016-06-20 Thread Shaun Tancheff
Hi Jens,

This series is on linux-next tag next-20160617.

As Host Aware drives are becoming available we would like to be able
to make use of such drives. This series is also intended to be suitable
for use by Host Managed drives.

ZAC/ZBC drives add new commands for discovering and working with Zones.

This extends the ZAC/ZBC support up to the block layer.

Patches for util-linux can be found here:
https://github.com/Seagate/ZDM-Device-Mapper/tree/master/patches/util-linux

Using BIOs to issue ZBC commands allows DM targets (such as ZDM) or
file-systems such as f2fs, btrfs or nilfs2 to extend their block
allocation schemes and/or issue discards that are zone aligned.

A perhaps non-obvious approach is that a conventional drive will
return a zone report descriptor with a single large conventional zone.
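
As an illustration of that fallback, a report for a conventional drive
could be filled in along these lines. The struct layout and names are
hypothetical and much simpler than the bdev_zone_report in the series;
the zone type value of 1 for "conventional" is my reading of ZBC and
should be treated as an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Assumed ZBC zone type code for a conventional zone. */
#define SKETCH_ZONE_TYPE_CONVENTIONAL 1

/* Hypothetical, simplified zone descriptor. */
struct sketch_zone_descriptor {
	uint64_t start_lba;
	uint64_t length;
	uint8_t type;
};

/* Hypothetical, simplified report with room for one descriptor. */
struct sketch_zone_report {
	uint32_t descriptor_count;
	struct sketch_zone_descriptor zones[1];
};

/* Describe the whole device as one conventional zone. */
static void fill_conventional_report(struct sketch_zone_report *rpt,
				     uint64_t device_sectors)
{
	memset(rpt, 0, sizeof(*rpt));
	rpt->descriptor_count = 1;
	rpt->zones[0].start_lba = 0;
	rpt->zones[0].length = device_sectors;
	rpt->zones[0].type = SKETCH_ZONE_TYPE_CONVENTIONAL;
}
```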

This patch is also at
https://github.com/stancheff/linux.git
g...@github.com:stancheff/linux.git
branch: next-20160617+bio.zbc.v5

v5:
 - In sd_setup_zone_action_cmnd, remove unused vars and fix switch indent
 - In blk-lib fix documentation
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
 - Dropped ata16 hackery
V3:
 - Rebase on Mike Christie's separate bio operations
 - Update blkzoned_api.h to include report zones PARTIAL bit.
 - Use zoned report reserved bit for ata-passthrough flag.

V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Moved include/uapi/linux/fs.h changes to patch 3
 - Fixed commit message for first patch in series.

Shaun Tancheff (2):
  Add bio/request flags for using ZBC/ZAC commands
  Add ioctl to issue ZBC/ZAC commands via block layer

 MAINTAINERS   |   9 ++
 block/blk-lib.c   |  98 +
 block/ioctl.c | 110 +++
 drivers/scsi/sd.c | 118 
 drivers/scsi/sd.h |   1 +
 include/linux/bio.h   |   7 +-
 include/linux/blk_types.h |   6 +-
 include/linux/blkzoned_api.h  |  25 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned_api.h | 220 ++
 include/uapi/linux/fs.h   |   1 +
 11 files changed, 594 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

-- 
2.8.1



[PATCH v3] Add support for SCT Write Same

2016-06-20 Thread Shaun Tancheff
SATA drives may support write same via SCT. This is useful
for setting the drive contents to a specific pattern (0's).

If UNMAP is not set or TRIM is not available then
fall back to SCT WRITE SAME, if it is available.

In this way it would be possible to mimic lbprz for devices that 
support TRIM but fail to zero blocks reliably. For example a
file-system or DM target could issue a write same w/o unmap followed
by a trim.
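
The selection logic in this version of the patch boils down to the
following. This is a userspace sketch with made-up names; the four
inputs stand in for the UNMAP bit, ata_id_has_trim(),
ATA_HORKAGE_NOTRIM and ata_id_sct_write_same() in the diff.

```c
#include <stdbool.h>

enum ws_path {
	WS_PATH_DSM_TRIM,	/* translate to DSM TRIM */
	WS_PATH_SCT_WRITE_SAME,	/* translate to SCT WRITE SAME */
	WS_PATH_INVALID_FIELD,	/* reject the command */
};

static enum ws_path v3_write_same_path(bool unmap, bool has_trim,
				       bool notrim_quirk, bool has_sct)
{
	/* TRIM is only chosen when UNMAP is set and TRIM is advertised. */
	bool use_sct = !(unmap && has_trim);

	/* A NOTRIM quirk overrides whatever the device advertises. */
	if (notrim_quirk)
		use_sct = true;

	/* SCT was selected but the device cannot do it: fail. */
	if (use_sct && !has_sct)
		return WS_PATH_INVALID_FIELD;

	return use_sct ? WS_PATH_SCT_WRITE_SAME : WS_PATH_DSM_TRIM;
}
```

Note that with UNMAP set and no TRIM this falls back to SCT, which is
the fallback described above.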

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
v3:
 - Demux UNMAP/TRIM from WRITE SAME
v2:
 - Remove fugly ata hacking from sd.c
---
 drivers/ata/libata-scsi.c  | 80 +++---
 drivers/scsi/sd.c  |  2 +-
 include/linux/ata.h| 43 +
 include/scsi/scsi_device.h |  1 +
 4 files changed, 106 insertions(+), 20 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index bfec66f..3dcc29e 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1204,6 +1204,9 @@ static int ata_scsi_dev_config(struct scsi_device *sdev,
if (!ata_id_has_unload(dev->id))
dev->flags |= ATA_DFLAG_NO_UNLOAD;
 
+   if (ata_id_sct_write_same(dev->id))
+   sdev->sct_write_same = 1;
+
/* configure max sectors */
blk_queue_max_hw_sectors(q, dev->max_sectors);
 
@@ -3272,6 +3275,7 @@ static unsigned int ata_scsi_write_same_xlat(struct 
ata_queued_cmd *qc)
struct ata_taskfile *tf = &qc->tf;
struct scsi_cmnd *scmd = qc->scsicmd;
struct ata_device *dev = qc->dev;
+   struct scatterlist *sg;
const u8 *cdb = scmd->cmnd;
u64 block;
u32 n_block;
@@ -3279,6 +3283,8 @@ static unsigned int ata_scsi_write_same_xlat(struct 
ata_queued_cmd *qc)
void *buf;
u16 fp;
u8 bp = 0xff;
+   u8 unmap = cdb[1] & 0x8;
+   bool use_sct = (unmap && ata_id_has_trim(dev->id)) ? false : true;
 
/* we may not issue DMA commands if no DMA mode is set */
if (unlikely(!dev->dma_mode))
@@ -3290,8 +3296,14 @@ static unsigned int ata_scsi_write_same_xlat(struct 
ata_queued_cmd *qc)
}
scsi_16_lba_len(cdb, &block, &n_block);
 
-   /* for now we only support WRITE SAME with the unmap bit set */
-   if (unlikely(!(cdb[1] & 0x8))) {
+   /* effectively ignore had_trim if NOTRIM horkage is flagged */
+   if (dev->horkage & ATA_HORKAGE_NOTRIM)
+   use_sct = true;
+
+   /*
+* If use_sct and SCT write same is not available then fail.
+*/
+   if (use_sct && !ata_id_sct_write_same(dev->id)) {
fp = 1;
bp = 3;
goto invalid_fld;
@@ -3304,26 +3316,56 @@ static unsigned int ata_scsi_write_same_xlat(struct 
ata_queued_cmd *qc)
if (!scsi_sg_count(scmd))
goto invalid_param_len;
 
-   buf = page_address(sg_page(scsi_sglist(scmd)));
-   size = ata_set_lba_range_entries(buf, 512, block, n_block);
+   sg = scsi_sglist(scmd);
+   buf = page_address(sg_page(sg)) + sg->offset;
 
-   if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
-   /* Newer devices support queued TRIM commands */
-   tf->protocol = ATA_PROT_NCQ;
-   tf->command = ATA_CMD_FPDMA_SEND;
-   tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f;
-   tf->nsect = qc->tag << 3;
-   tf->hob_feature = (size / 512) >> 8;
-   tf->feature = size / 512;
+   /*
+* If we only have SCT then ignore the state of the unmap request
+* and zero the blocks.
+*/
+   if (use_sct) {
+   u16 *sctpg = buf;
+
+   put_unaligned_le16(0x0002,  &sctpg[0]); /* SCT_ACT_WRITE_SAME */
+   put_unaligned_le16(0x0101,  &sctpg[1]); /* WRITE PTRN FG */
+   put_unaligned_le64(block,   &sctpg[2]);
+   put_unaligned_le64(n_block, &sctpg[6]);
+   put_unaligned_le32(0u,  &sctpg[10]);
 
-   tf->auxiliary = 1;
-   } else {
-   tf->protocol = ATA_PROT_DMA;
tf->hob_feature = 0;
-   tf->feature = ATA_DSM_TRIM;
-   tf->hob_nsect = (size / 512) >> 8;
-   tf->nsect = size / 512;
-   tf->command = ATA_CMD_DSM;
+   tf->feature = 0;
+   tf->hob_nsect = 0;
+   tf->nsect = 1;
+   tf->lbah = 0;
+   tf->lbam = 0;
+   tf->lbal = ATA_CMD_STANDBYNOW1;
+   tf->hob_lbah = 0;
+   tf->hob_lbam = 0;
+   tf->hob_lbal = 0;
+   tf->device = ATA_CMD_STANDBYNOW1;
+   tf->protocol = ATA_PROT_DMA;
+   tf->command = ATA_CMD_WRITE_LOG_DMA_EXT;
+   } else {
+   size = ata_set

[PATCH v3] Add support for Write Same via SCT

2016-06-20 Thread Shaun Tancheff
At some point the method of issuing Write Same for ATA drives changed.
Currently write same is commonly available via SCT so expose the SCT
capabilities and use SCT Write Same if available.

This is useful for zone-based media that prefer to support discard
with lbprz set (i.e. discard zeroes data) by mapping discard operations
to reset write pointer operations. Conventional zones that do not
support reset write pointer can still honor discard-zeroes-data by
issuing a write same over the zone.

After reviewing the commits around the no_write_same heuristics, it
seems that decoupling the status quo would cause more harm than good.
Broken SATL logic around WRITE SAME appears to be quite common, which is
perhaps not surprising as the command appears to have been removed
after ATA-2 and essentially deprecated somewhere around 1997.

Here the approach is to flag a secondary code path that uses SCT to
perform WRITE SAME and then infer the proper action based on the
UNMAP flag and the reported capability.
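
For reference, the SCT key page that the SCT code path fills in has
this shape: an action code, a function code, a 64-bit start LBA, a
64-bit block count and a 32-bit pattern, all little-endian. The
userspace sketch below mirrors the word offsets used in the patch
(0x0002 at word 0, 0x0101 at word 1, LBA at word 2, count at word 6,
pattern at word 10). The helper names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Store the low 'nbytes' bytes of 'value' at 'dst', little-endian. */
static void put_le(uint8_t *dst, uint64_t value, size_t nbytes)
{
	for (size_t i = 0; i < nbytes; i++)
		dst[i] = (uint8_t)(value >> (8 * i));
}

/* Build a 512-byte SCT Write Same key page. */
static void sct_write_same_page(uint8_t page[512], uint64_t lba,
				uint64_t nblocks, uint32_t pattern)
{
	memset(page, 0, 512);
	put_le(page + 0, 0x0002, 2);	/* word 0: action code, Write Same */
	put_le(page + 2, 0x0101, 2);	/* word 1: function code from the patch */
	put_le(page + 4, lba, 8);	/* words 2-5: start LBA */
	put_le(page + 12, nblocks, 8);	/* words 6-9: number of blocks */
	put_le(page + 20, pattern, 4);	/* words 10-11: fill pattern */
}
```

Passing a pattern of 0 gives the "zero the blocks" behaviour the patch
relies on.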

Before this patch the only valid code path is with UNMAP set and 
TRIM available.

With this patch if UNMAP is not set or TRIM is not available then
fall back to SCT WRITE SAME, if it is available.

In this way it would be possible to mimic lbprz for devices that 
support TRIM but fail to zero blocks reliably. For example a
file-system or DM target could issue a write same w/o unmap followed
by a trim.

This patch is also at
https://github.com/stancheff/linux.git
g...@github.com:stancheff/linux.git
branch: v4.7-rc2+sct-write-same.v3

v3:
 - Demux UNMAP/TRIM from WRITE SAME
 - Add offset from scatterlist to page address.
v2:
 - Remove fugly ata hacking from sd.c

Shaun Tancheff (1):
  Add support for SCT Write Same

 drivers/ata/libata-scsi.c  | 80 +++---
 drivers/scsi/sd.c  |  2 +-
 include/linux/ata.h| 43 +
 include/scsi/scsi_device.h |  1 +
 4 files changed, 106 insertions(+), 20 deletions(-)

-- 
2.8.1



[PATCH v4 0/2] Block layer support ZAC/ZBC commands

2016-06-19 Thread Shaun Tancheff
Hi Jens,

This series is on linux-next tag next-20160617.

As Host Aware drives are becoming available we would like to be able
to make use of such drives. This series is also intended to be suitable
for use by Host Managed drives.

ZAC/ZBC drives add new commands for discovering and working with Zones.

This extends the ZAC/ZBC support up to the block layer.

Patches for util-linux can be found here:
https://github.com/Seagate/ZDM-Device-Mapper/tree/master/patches/util-linux

Using BIOs to issue ZBC commands allows DM targets (such as ZDM) or
file-systems such as f2fs, btrfs or nilfs2 to extend their block
allocation schemes and/or issue discards that are zone aligned.

A perhaps non-obvious approach is that a conventional drive will
return a zone report descriptor with a single large conventional zone.

v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
 - Dropped ata16 hackery
V3:
 - Rebase on Mike Christie's separate bio operations
 - Update blkzoned_api.h to include report zones PARTIAL bit.
 - Use zoned report reserved bit for ata-passthrough flag.

V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Moved include/uapi/linux/fs.h changes to patch 3
 - Fixed commit message for first patch in series.

Shaun Tancheff (2):
  Add bio/request flags for using ZBC/ZAC commands
  Add ioctl to issue ZBC/ZAC commands via block layer

 MAINTAINERS   |   9 ++
 block/blk-lib.c   |  97 +
 block/ioctl.c | 110 +++
 drivers/scsi/sd.c | 122 +
 drivers/scsi/sd.h |   1 +
 include/linux/bio.h   |   7 +-
 include/linux/blk_types.h |   6 +-
 include/linux/blkzoned_api.h  |  25 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned_api.h | 220 ++
 include/uapi/linux/fs.h   |   1 +
 11 files changed, 597 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

-- 
2.8.1



[PATCH v4 2/2] Add ioctl to issue ZBC/ZAC commands via block layer

2016-06-19 Thread Shaun Tancheff
Add new ioctl types:
BLKREPORT    - Issue Report Zones to device.
BLKOPENZONE  - Issue a Zone Action: Open Zone command.
BLKCLOSEZONE - Issue a Zone Action: Close Zone command.
BLKRESETZONE - Issue a Zone Action: Reset Zone command.

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
---
 block/ioctl.c | 110 ++
 include/uapi/linux/blkzoned_api.h |   6 +++
 include/uapi/linux/fs.h   |   1 +
 3 files changed, 117 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..97e685e 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -194,6 +195,109 @@ int blkdev_reread_part(struct block_device *bdev)
 }
 EXPORT_SYMBOL(blkdev_reread_part);
 
+static int blk_zoned_report_ioctl(struct block_device *bdev, fmode_t mode,
+   void __user *parg)
+{
+   int error = -EFAULT;
+   gfp_t gfp = GFP_KERNEL;
+   struct bdev_zone_report_io *zone_iodata = NULL;
+   int order = 0;
+   struct page *pgs = NULL;
+   u32 alloc_size = PAGE_SIZE;
+   unsigned long op_flags = 0;
+   u8 opt = 0;
+
+   if (!(mode & FMODE_READ))
+   return -EBADF;
+
+   zone_iodata = (void *)get_zeroed_page(gfp);
+   if (!zone_iodata) {
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   if (copy_from_user(zone_iodata, parg, sizeof(*zone_iodata))) {
+   error = -EFAULT;
+   goto report_zones_out;
+   }
+   if (zone_iodata->data.in.return_page_count > alloc_size) {
+   int npages;
+
+   alloc_size = zone_iodata->data.in.return_page_count;
+   npages = (alloc_size + PAGE_SIZE - 1) / PAGE_SIZE;
+   order =  ilog2(roundup_pow_of_two(npages));
+   pgs = alloc_pages(gfp, order);
+   if (pgs) {
+   void *mem = page_address(pgs);
+
+   if (!mem) {
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   memset(mem, 0, alloc_size);
+   memcpy(mem, zone_iodata, sizeof(*zone_iodata));
+   free_page((unsigned long)zone_iodata);
+   zone_iodata = mem;
+   } else {
+   /* Result requires DMA capable memory */
+   pr_err("Not enough memory available for request.\n");
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   }
+   opt = zone_iodata->data.in.report_option;
+   error = blkdev_issue_zone_report(bdev, op_flags,
+   zone_iodata->data.in.zone_locator_lba, opt,
+   pgs ? pgs : virt_to_page(zone_iodata),
+   alloc_size, GFP_KERNEL);
+
+   if (error)
+   goto report_zones_out;
+
+   if (copy_to_user(parg, zone_iodata, alloc_size))
+   error = -EFAULT;
+
+report_zones_out:
+   if (pgs)
+   __free_pages(pgs, order);
+   else if (zone_iodata)
+   free_page((unsigned long)zone_iodata);
+   return error;
+}
+
+static int blk_zoned_action_ioctl(struct block_device *bdev, fmode_t mode,
+ unsigned int cmd, unsigned long arg)
+{
+   unsigned int op = 0;
+
+   if (!(mode & FMODE_WRITE))
+   return -EBADF;
+
+   /*
+* When acting on zones we explicitly disallow using a partition.
+*/
+   if (bdev != bdev->bd_contains) {
+   pr_err("%s: All zone operations disallowed on this device\n",
+   __func__);
+   return -EFAULT;
+   }
+
+   switch (cmd) {
+   case BLKOPENZONE:
+   op = REQ_OP_ZONE_OPEN;
+   break;
+   case BLKCLOSEZONE:
+   op = REQ_OP_ZONE_CLOSE;
+   break;
+   case BLKRESETZONE:
+   op = REQ_OP_ZONE_RESET;
+   break;
+   default:
+   pr_err("%s: Unknown action: %u\n", __func__, cmd);
+   return -EINVAL;
+   }
+   return blkdev_issue_zone_action(bdev, op, 0, arg, GFP_KERNEL);
+}
+
 static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
unsigned long arg, unsigned long flags)
 {
@@ -568,6 +672,12 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, 
unsigned cmd,
case BLKTRACESETUP:
case BLKTRACETEARDOWN:
return blk_trace_ioctl(bdev, cmd, argp);
+   case BLKREPORT:
+   return blk_zoned_report_ioctl(bdev, mode, argp);
+   case BLKOPENZONE:
+   case B

[PATCH v4 1/2] Add bio/request flags for using ZBC/ZAC commands

2016-06-19 Thread Shaun Tancheff
T10 ZBC and T13 ZAC specify operations for Zoned devices.

To access zone information and to open and close zones, this patch
adds ops for:
  - Report zones command: REQ_OP_ZONE_REPORT
  - Open zone: REQ_OP_ZONE_OPEN
  - Close zone: REQ_OP_ZONE_CLOSE
  - Reset Write Pointer: REQ_OP_ZONE_RESET
to be used to create struct bio / struct request to issue
ZBC commands.

Report zones is a device read that requires a buffer.
Open, Close and Reset are device commands that have no associated
data transfer.
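
The split described above (one op that reads into a buffer, three that
carry no data) can be sketched in plain C. The enum and helper names here
are hypothetical stand-ins for illustration only; they are not the
REQ_OP_* definitions added by this patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the four zone ops; values are illustrative. */
enum zone_op {
	ZONE_OP_REPORT,
	ZONE_OP_OPEN,
	ZONE_OP_CLOSE,
	ZONE_OP_RESET,
};

/* Report zones is a device read and needs a buffer; the others do not. */
static bool zone_op_needs_buffer(enum zone_op op)
{
	return op == ZONE_OP_REPORT;
}
```

A caller would allocate a report buffer only when the helper returns true.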

The Finish zone command is intentionally not implemented as there is no
current use case for that operation.

Report zones currently defaults to reporting on all zones. It is expected
that support for the zone option flag will piggyback on streamid
support. The report option flag is useful, as it can reduce the number
of zones in each report, but it is not critical.

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
V3:
 - Rebase on Mike Christie's separate bio operations
 - Update blkzoned_api.h to include report zones PARTIAL bit.
V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Removed include/uapi/linux/fs.h from this patch.
---
 MAINTAINERS   |   9 ++
 block/blk-lib.c   |  97 +
 drivers/scsi/sd.c | 122 ++
 drivers/scsi/sd.h |   1 +
 include/linux/bio.h   |   7 +-
 include/linux/blk_types.h |   6 +-
 include/linux/blkzoned_api.h  |  25 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned_api.h | 214 ++
 9 files changed, 480 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

diff --git a/MAINTAINERS b/MAINTAINERS
index d174e34..280f87b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12815,6 +12815,15 @@ F: Documentation/networking/z8530drv.txt
 F: drivers/net/hamradio/*scc.c
 F: drivers/net/hamradio/z8530.h
 
+ZBC AND ZBC BLOCK DEVICES
+M: Shaun Tancheff <shaun.tanch...@seagate.com>
+W: http://seagate.com
+W: https://github.com/Seagate/ZDM-Device-Mapper
+L: linux-bl...@vger.kernel.org
+S: Maintained
+F: include/linux/blkzoned_api.h
+F: include/uapi/linux/blkzoned_api.h
+
 ZBUD COMPRESSED PAGE ALLOCATOR
 M: Seth Jennings <sjenn...@redhat.com>
 L: linux...@kvack.org
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 8e24f5e..e6ad31e 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "blk.h"
 
@@ -261,3 +262,99 @@ int blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
 }
 EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+/**
+ * blkdev_issue_zone_report - queue a report zones operation
+ * @bdev:  target blockdev
+ * @op_flags:  extra bio rw flags. If unsure, use 0.
+ * @sector:starting sector (report will include this sector).
+ * @opt:   See: zone_report_option, default is 0 (all zones).
+ * @page:  one or more contiguous pages.
+ * @pgsz:  up to size of page in bytes, size of report.
+ * @gfp_mask:  memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *Issue a zone report request for the sectors in question.
+ */
+int blkdev_issue_zone_report(struct block_device *bdev, unsigned int op_flags,
+sector_t sector, u8 opt, struct page *page,
+size_t pgsz, gfp_t gfp_mask)
+{
+   struct bdev_zone_report *conv = page_address(page);
+   struct bio *bio;
+   unsigned int nr_iovecs = 1;
+   int ret = 0;
+
+   if (pgsz < (sizeof(struct bdev_zone_report) +
+   sizeof(struct bdev_zone_descriptor)))
+   return -EINVAL;
+
+   bio = bio_alloc(gfp_mask, nr_iovecs);
+   if (!bio)
+   return -ENOMEM;
+
+   conv->descriptor_count = 0;
+   bio->bi_iter.bi_sector = sector;
+   bio->bi_bdev = bdev;
+   bio->bi_vcnt = 0;
+   bio->bi_iter.bi_size = 0;
+
+   /* FUTURE ... when streamid is available: */
+   /* bio_set_streamid(bio, opt); */
+
+   bio_add_page(bio, page, pgsz, 0);
+   bio_set_op_attrs(bio, REQ_OP_ZONE_REPORT, op_flags);
+   ret = submit_bio_wait(bio);
+
+   /*
+* When our request is nak'd the underlying device may be conventional
+* so ... report a single conventional zone the size of the device.
+*/
+   if (ret == -EIO && conv->descriptor_count) {
+   /* Adjust the conventional to the size of the partition ... */
+

Re: [PATCH v3 3/3] Add ata pass-through path for ZAC commands.

2016-06-10 Thread Shaun Tancheff
On Fri, Jun 10, 2016 at 2:19 AM, Hannes Reinecke <h...@suse.de> wrote:
> On 06/10/2016 09:10 AM, Shaun Tancheff wrote:
>> The current generation of HBA SAS adapters support connecting SATA
>> drives and perform SCSI<->ATA translations in hardware.
>> Unfortunately the ZBC commands are not being translated (yet).
>>
>> Currently users of SAS controllers can only send ZAC commands via
>> ata pass-through.
>>
>> This method overloads the meaning of REQ_META to direct ZBC commands
>> to construct ZAC equivalent ATA pass through commands.
>> Note also that this approach expects the initiator to deal with the
>> little endian result due to bypassing the normal translation layers.
>>
>> Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
>> ---
>> While this patch isn't the right way to work around hardware that is
>> missing features (mixing ATA commands in SCSI interface code), it
>> may be useful for end users in the near term who have HBA SAS
>> controllers that don't support ZBC <-> ZAC translations.
>>
> And indeed, this patch isn't right.
> It is just for a very specific SAS HBA (mpt2sas/mpt3sas).
> Other SAS HBAs like isci and hisi_sas work just nicely here.

It is good to know that some vendors are on the ball.

> So a translation into a ATA_16 command is _wrong_.
> If you need to do this you'll have to move it into the LLDD itself.
> Or use blacklisting to invoke this behaviour.
> But _not_ in the general code path.

Agreed. Thanks!

> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke   Teamlead Storage & Networking
> h...@suse.de   +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)

-- 
Shaun Tancheff


[PATCH v3 1/3] Add bio/request flags for using ZBC/ZAC commands

2016-06-10 Thread Shaun Tancheff
T10 ZBC and T13 ZAC specify operations for Zoned devices.

To access zone information and to open and close zones, this patch adds
flags for the report zones command (REQ_REPORT_ZONES) and for Open and
Close zone (REQ_OPEN_ZONE and REQ_CLOSE_ZONE), for use by struct bio's
bi_rw and by struct request's cmd_flags.

To reduce the number of additional flags needed REQ_RESET_ZONE shares
the same flag as REQ_REPORT_ZONES and is differentiated by direction.
Report zones is a device read that requires a buffer. Reset is a device
command (WRITE) that has no associated data transfer.
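
A minimal sketch of how one shared flag can be told apart by direction.
The bit positions and names below are hypothetical and chosen only for
illustration; the real values live in include/linux/blk_types.h in this
patch.

```c
/* Hypothetical bit layout, for illustration only. */
#define F_WRITE        (1u << 0)  /* set when the request is a device write */
#define F_REPORT_RESET (1u << 1)  /* one bit shared by report and reset */

enum zone_cmd { ZCMD_NONE, ZCMD_REPORT, ZCMD_RESET };

/* The shared bit means "report" on a read and "reset" on a write. */
static enum zone_cmd decode_zone_cmd(unsigned int flags)
{
	if (!(flags & F_REPORT_RESET))
		return ZCMD_NONE;
	return (flags & F_WRITE) ? ZCMD_RESET : ZCMD_REPORT;
}
```

The cost of saving a flag bit this way is that every consumer has to
check the direction before interpreting the shared bit.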

The Finish zone command is intentionally not implemented as there is no
current use case for that operation.

Report zones currently defaults to reporting on all zones. It is expected
that support for the zone option flag will piggyback on streamid
support. The report option is useful, as it can reduce the number of
zones in each report, but it is not critical.

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
V3:
 - Rebase on Mike Christie's separate bio operations
 - Update blkzoned_api.h to include report zones PARTIAL bit.

V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Removed include/uapi/linux/fs.h from this patch.
---
 MAINTAINERS   |   9 ++
 block/blk-lib.c   |  98 +
 drivers/scsi/sd.c |  99 ++
 drivers/scsi/sd.h |   1 +
 include/linux/bio.h   |   4 +-
 include/linux/blk_types.h |  16 ++-
 include/linux/blkzoned_api.h  |  25 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned_api.h | 214 ++
 9 files changed, 464 insertions(+), 3 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

diff --git a/MAINTAINERS b/MAINTAINERS
index ed42cb6..d9fafa2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12662,6 +12662,15 @@ F: Documentation/networking/z8530drv.txt
 F: drivers/net/hamradio/*scc.c
 F: drivers/net/hamradio/z8530.h
 
+ZBC AND ZBC BLOCK DEVICES
+M: Shaun Tancheff <shaun.tanch...@seagate.com>
+W: http://seagate.com
+W: https://github.com/Seagate/ZDM-Device-Mapper
+L: linux-bl...@vger.kernel.org
+S: Maintained
+F: include/linux/blkzoned_api.h
+F: include/uapi/linux/blkzoned_api.h
+
 ZBUD COMPRESSED PAGE ALLOCATOR
 M: Seth Jennings <sjenn...@redhat.com>
 L: linux...@kvack.org
diff --git a/block/blk-lib.c b/block/blk-lib.c
index ff2a7f0..eda0071 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "blk.h"
 
@@ -252,3 +253,100 @@ int blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
 }
 EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+/**
+ * blkdev_issue_zone_report - queue a report zones operation
+ * @bdev:  target blockdev
+ * @op_flags:  extra bio rw flags. If unsure, use 0.
+ * @sector:starting sector (report will include this sector).
+ * @opt:   See: zone_report_option, default is 0 (all zones).
+ * @page:  one or more contiguous pages.
+ * @pgsz:  up to size of page in bytes, size of report.
+ * @gfp_mask:  memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *Issue a zone report request for the sectors in question.
+ */
+int blkdev_issue_zone_report(struct block_device *bdev, unsigned int op_flags,
+sector_t sector, u8 opt, struct page *page,
+size_t pgsz, gfp_t gfp_mask)
+{
+   struct bdev_zone_report *conv = page_address(page);
+   struct bio *bio;
+   unsigned int nr_iovecs = 1;
+   int ret = 0;
+
+   if (pgsz < (sizeof(struct bdev_zone_report) +
+   sizeof(struct bdev_zone_descriptor)))
+   return -EINVAL;
+
+   bio = bio_alloc(gfp_mask, nr_iovecs);
+   if (!bio)
+   return -ENOMEM;
+
+   conv->descriptor_count = 0;
+   bio->bi_iter.bi_sector = sector;
+   bio->bi_bdev = bdev;
+   bio->bi_vcnt = 0;
+   bio->bi_iter.bi_size = 0;
+
+   op_flags |= REQ_REPORT_ZONES;
+
+   /* FUTURE ... when streamid is available: */
+   /* bio_set_streamid(bio, opt); */
+
+   bio_add_page(bio, page, pgsz, 0);
+   bio_set_op_attrs(bio, REQ_OP_READ, op_flags);
+   ret = submit_bio_wait(bio);
+
+   /*
+* When our request is nak'd the underlying device may be conventional
+* so ... report a single conventional zone the size of the device.
+*/
+   if (ret == -EIO && conv->descriptor_count) {
+   /* Adjust the conventional to the size of the pa

[PATCH v3 2/3] Add ioctl to issue ZBC/ZAC commands via block layer

2016-06-10 Thread Shaun Tancheff
Add new ioctl types:
BLKREPORT    - Issue Report Zones to device.
BLKOPENZONE  - Issue a Zone Action: Open Zone command.
BLKCLOSEZONE - Issue a Zone Action: Close Zone command.
BLKRESETZONE - Issue a Zone Action: Reset Zone command.
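
The BLKREPORT handler in the diff below rounds the requested byte count
up to whole pages and then to a power-of-two page order before calling
alloc_pages(). A userspace sketch of that arithmetic follows. The 4096
byte page size and the helper name are assumptions, and the loop stands
in for the kernel's ilog2(roundup_pow_of_two(npages)).

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Smallest order such that (1 << order) pages hold alloc_size bytes. */
static unsigned int report_alloc_order(size_t alloc_size)
{
	size_t npages = (alloc_size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	unsigned int order = 0;

	while (((size_t)1 << order) < npages)
		order++;
	return order;
}
```

A request for three pages therefore occupies an order-2 (four page)
allocation, which is why the handler tracks the order separately from
alloc_size when freeing.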

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
 block/ioctl.c | 110 ++
 include/uapi/linux/blkzoned_api.h |   6 +++
 include/uapi/linux/fs.h   |   1 +
 3 files changed, 117 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..1e89721 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -194,6 +195,109 @@ int blkdev_reread_part(struct block_device *bdev)
 }
 EXPORT_SYMBOL(blkdev_reread_part);
 
+static int blk_zoned_report_ioctl(struct block_device *bdev, fmode_t mode,
+   void __user *parg)
+{
+   int error = -EFAULT;
+   gfp_t gfp = GFP_KERNEL;
+   struct bdev_zone_report_io *zone_iodata = NULL;
+   int order = 0;
+   struct page *pgs = NULL;
+   u32 alloc_size = PAGE_SIZE;
+   unsigned long op_flags = 0;
+   u8 opt = 0;
+
+   if (!(mode & FMODE_READ))
+   return -EBADF;
+
+   zone_iodata = (void *)get_zeroed_page(gfp);
+   if (!zone_iodata) {
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   if (copy_from_user(zone_iodata, parg, sizeof(*zone_iodata))) {
+   error = -EFAULT;
+   goto report_zones_out;
+   }
+   if (zone_iodata->data.in.return_page_count > alloc_size) {
+   int npages;
+
+   alloc_size = zone_iodata->data.in.return_page_count;
+   npages = (alloc_size + PAGE_SIZE - 1) / PAGE_SIZE;
+   order =  ilog2(roundup_pow_of_two(npages));
+   pgs = alloc_pages(gfp, order);
+   if (pgs) {
+   void *mem = page_address(pgs);
+
+   if (!mem) {
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   memset(mem, 0, alloc_size);
+   memcpy(mem, zone_iodata, sizeof(*zone_iodata));
+   free_page((unsigned long)zone_iodata);
+   zone_iodata = mem;
+   } else {
+   /* Result requires DMA capable memory */
+   pr_err("Not enough memory available for request.\n");
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   }
+   opt = zone_iodata->data.in.report_option;
+   error = blkdev_issue_zone_report(bdev, op_flags,
+   zone_iodata->data.in.zone_locator_lba, opt,
+   pgs ? pgs : virt_to_page(zone_iodata),
+   alloc_size, GFP_KERNEL);
+
+   if (error)
+   goto report_zones_out;
+
+   if (copy_to_user(parg, zone_iodata, alloc_size))
+   error = -EFAULT;
+
+report_zones_out:
+   if (pgs)
+   __free_pages(pgs, order);
+   else if (zone_iodata)
+   free_page((unsigned long)zone_iodata);
+   return error;
+}
+
+static int blk_zoned_action_ioctl(struct block_device *bdev, fmode_t mode,
+ unsigned int cmd, unsigned long arg)
+{
+   unsigned long op_flags = 0;
+
+   if (!(mode & FMODE_WRITE))
+   return -EBADF;
+
+   /*
+* When acting on zones we explicitly disallow using a partition.
+*/
+   if (bdev != bdev->bd_contains) {
+   pr_err("%s: All zone operations disallowed on this device\n",
+   __func__);
+   return -EFAULT;
+   }
+
+   switch (cmd) {
+   case BLKOPENZONE:
+   op_flags |= REQ_OPEN_ZONE;
+   break;
+   case BLKCLOSEZONE:
+   op_flags |= REQ_CLOSE_ZONE;
+   break;
+   case BLKRESETZONE:
+   op_flags |= REQ_RESET_ZONE;
+   break;
+   default:
+   pr_err("%s: Unknown action: %u\n", __func__, cmd);
+   WARN_ON(1);
+   }
+   return blkdev_issue_zone_action(bdev, op_flags, arg, GFP_KERNEL);
+}
+
 static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
unsigned long arg, unsigned long flags)
 {
@@ -568,6 +672,12 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, 
unsigned cmd,
case BLKTRACESETUP:
case BLKTRACETEARDOWN:
return blk_trace_ioctl(bdev, cmd, argp);
+   case BLKREPORT:
+   return blk_zoned_report_ioctl(bdev, mode, argp);
+   case BLKOPENZONE:
+   case BLKCLOSEZONE:
+   case BLKRESETZONE:
+   return blk_zoned_ac

[PATCH v3 0/3] Block layer support ZAC/ZBC commands

2016-06-10 Thread Shaun Tancheff
Hi Jens,

This series is based on your for-next branch.

As Host Aware drives are becoming available we would like to be able
to make use of such drives. This series is also intended to be suitable
for use by Host Managed drives.

ZAC/ZBC drives add new commands for discovering and working with Zones.

This extends the ZAC/ZBC support up to the block layer.

Patches for util-linux can be found here:
https://github.com/Seagate/ZDM-Device-Mapper/tree/master/patches/util-linux

Using BIOs to issue ZBC commands allows DM targets (such as ZDM) or
file-systems such as btrfs or nilfs2 to extend their block allocation
schemes and issue discards that are zone aware.

A perhaps non-obvious approach is that a conventional drive will
return a descriptor with a single large conventional zone.

The last patch, dealing with ata16 passthrough, is to work around HBA SAS
controllers that don't support ZBC. It will be dropped now that firmware
updates are starting to appear.

V3:
 - Rebase on Mike Christie's separate bio operations
 - Update blkzoned_api.h to include report zones PARTIAL bit.
 - Use zoned report reserved bit for ata-passthrough flag.

V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Moved include/uapi/linux/fs.h changes to patch 3
 - Fixed commit message for first patch in series.

Shaun Tancheff (3):
  Add bio/request flags for using ZBC/ZAC commands
  Add ioctl to issue ZBC/ZAC commands via block layer
  Add ata pass-through path for ZAC commands.

 MAINTAINERS   |   9 ++
 block/blk-lib.c   |  98 +
 block/ioctl.c | 142 
 drivers/scsi/sd.c | 139 ++-
 drivers/scsi/sd.h |   1 +
 include/linux/ata.h   |  15 +++
 include/linux/bio.h   |   4 +-
 include/linux/blk_types.h |  16 ++-
 include/linux/blkzoned_api.h  |  25 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned_api.h | 224 ++
 include/uapi/linux/fs.h   |   1 +
 12 files changed, 671 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

-- 
2.8.1



[PATCH v3 3/3] Add ata pass-through path for ZAC commands.

2016-06-10 Thread Shaun Tancheff
The current generation of HBA SAS adapters support connecting SATA
drives and perform SCSI<->ATA translations in hardware.
Unfortunately the ZBC commands are not being translated (yet).

Currently users of SAS controllers can only send ZAC commands via
ata pass-through.

This method overloads the meaning of REQ_META to direct ZBC commands
to construct ZAC equivalent ATA pass through commands.
Note also that this approach expects the initiator to deal with the
little endian result due to bypassing the normal translation layers.
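
For the zone action ioctls, the block/ioctl.c hunk in this patch uses
the low bit of the LBA argument to request pass-through and treats ~0ul
as "all LBAs". A standalone sketch of that decoding is below. The struct
and function names are hypothetical; only the bit logic is taken from
the hunk.

```c
#include <stdbool.h>

/* Hypothetical container for the decoded ioctl argument. */
struct zone_arg {
	unsigned long lba;  /* LBA handed on to the zone action */
	bool ata_pass;      /* true when ATA pass-through was requested */
};

static struct zone_arg decode_zone_arg(unsigned long arg)
{
	struct zone_arg out = { .lba = arg, .ata_pass = false };

	if (arg & 1) {
		/* Low bit set: force pass-through, then strip the bit
		 * unless the caller meant "all LBAs" (~0ul). */
		out.ata_pass = true;
		if (arg != ~0ul)
			out.lba = arg & ~1ul;
	} else if (arg == ~1ul) {
		/* Low bit clear on 0xFF...FE: "all LBAs", normal path. */
		out.lba = ~0ul;
	}
	return out;
}
```

This encoding relies on zone start LBAs being even, so the low bit is
free to carry the flag.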

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
While this patch isn't the right way to work around hardware that is
missing features (mixing ATA commands in SCSI interface code), it
may be useful for end users in the near term who have HBA SAS
controllers that don't support ZBC <-> ZAC translations.

V3:
 - Use zoned report reserved bit for ata-passthrough flag.
v2:
 - Added REQ_META to op_flags if high bit is set in opt.
---
 block/ioctl.c | 34 +++-
 drivers/scsi/sd.c | 68 ++-
 include/linux/ata.h   | 15 +
 include/uapi/linux/blkzoned_api.h |  4 +++
 4 files changed, 105 insertions(+), 16 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index 1e89721..b9dea29 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -244,7 +244,10 @@ static int blk_zoned_report_ioctl(struct block_device 
*bdev, fmode_t mode,
goto report_zones_out;
}
}
-   opt = zone_iodata->data.in.report_option;
+   opt = zone_iodata->data.in.report_option & ~(ZOPT_USE_ATA_PASS);
+   if (zone_iodata->data.in.report_option & ZOPT_USE_ATA_PASS)
+   op_flags |= REQ_META;
+
error = blkdev_issue_zone_report(bdev, op_flags,
zone_iodata->data.in.zone_locator_lba, opt,
pgs ? pgs : virt_to_page(zone_iodata),
@@ -281,6 +284,35 @@ static int blk_zoned_action_ioctl(struct block_device 
*bdev, fmode_t mode,
return -EFAULT;
}
 
+   /*
+* When the low bit is set force ATA passthrough try to work around
+* older SAS HBA controllers that don't support ZBC to ZAC translation.
+*
+* When the low bit is clear follow the normal path but also correct
+* for ~0ul LBA means 'for all lbas'.
+*
+* NB: We should do extra checking here to see if the user specified
+* the entire block device as opposed to a partition of the
+* device
+*/
+   if (arg & 1) {
+   op_flags |= REQ_META;
+   if (arg != ~0ul)
+   arg &= ~1ul; /* ~1 :: 0xFF...FE */
+   } else {
+   if (arg == ~1ul)
+   arg = ~0ul;
+   }
+
+   /*
+* When acting on zones we explicitly disallow using a partition.
+*/
+   if (bdev != bdev->bd_contains) {
+   pr_err("%s: All zone operations disallowed on this device\n",
+   __func__);
+   return -EFAULT;
+   }
+
switch (cmd) {
case BLKOPENZONE:
op_flags |= REQ_OPEN_ZONE;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 241faf5..1a6c5b3 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -53,6 +53,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -1183,12 +1184,28 @@ static int sd_setup_zoned_cmnd(struct scsi_cmnd *cmd)
 
cmd->cmd_len = 16;
memset(cmd->cmnd, 0, cmd->cmd_len);
-   cmd->cmnd[0] = ZBC_IN;
-   cmd->cmnd[1] = ZI_REPORT_ZONES;
-   put_unaligned_be64(sector, &cmd->cmnd[2]);
-   put_unaligned_be32(nr_bytes, &cmd->cmnd[10]);
-   /* FUTURE ... when streamid is available */
-   /* cmd->cmnd[14] = bio_get_streamid(bio); */
+   if (rq->cmd_flags & REQ_META) {
+   cmd->cmnd[0] = ATA_16;
+   cmd->cmnd[1] = (0x6 << 1) | 1;
+   cmd->cmnd[2] = 0x0e;
+   /* FUTURE ... when streamid is available */
+   /* cmd->cmnd[3] = bio_get_streamid(bio); */
+   cmd->cmnd[4] = ATA_SUBCMD_ZAC_MGMT_IN_REPORT_ZONES;
+   cmd->cmnd[5] = ((nr_bytes / 512) >> 8) & 0xff;
+   cmd->cmnd[6] = (nr_bytes / 512) & 0xff;
+
+   _lba_to_cmd_ata(&cmd->cmnd[7], sector);
+
+   cmd->cmnd[13] = 1 << 6;
+   cmd->cmnd[14] = ATA_CMD_ZAC_MGMT_IN;
+   } else {
+   cmd->cmnd[0] = ZBC_IN;
+   cmd->cmnd[1] = ZI_REPORT_ZONES;
+   put_unaligned_b

Re: [PATCH v2] Add support for SCT Write Same

2016-06-09 Thread Shaun Tancheff
On Thu, Jun 9, 2016 at 4:22 AM, Christoph Hellwig <h...@infradead.org> wrote:
>> + if (ata_id_sct_write_same(dev->id))
>> + sdev->sct_write_same = 1;
>> +
>
> What's the point of this flag?  It should simply clear the no_write_same
> flag for this device.  Due to the way how we have both a per-host and
> per-device flag that might not be completely trivial, but untangling
> that mess might be a good idea anyway.

Agreed. It looks like clearing the no_write_same flag mostly works
but the queue limits don't make any sense because they get cleared.
I'll see if I can untangle it.

>> @@ -3305,6 +3308,37 @@ static unsigned int ata_scsi_write_same_xlat(struct 
>> ata_queued_cmd *qc)
>>   goto invalid_param_len;
>>
>>   buf = page_address(sg_page(scsi_sglist(scmd)));
>> +
>> + if (ata_id_sct_write_same(dev->id)) {
>
> Various comments:
>
>  - The plain page_address above looks harmful, how do we know that
>the page is mapped into kernel memory?  This might actually be broken
>already, though.

I think it just happens to work because it's always used with a recently
allocated page. Fixing it to include the (possible) offset is just a good thing.

>  - Why is this below the check that rejects non-unmap WRITE SAME
>commands?

Yeah, tunnel vision. For some reason I was thinking that it was
either WRITE SAME or SCT. But this case is actually TRIM and/or SCT, which
makes demuxing what the user expects a little less clear.

Now I am thinking that the cleanest method is to try and honor the unmap
flag to pick the command path.

If trim is available and unmap is set then you get the current behavior.
else if SCT is available follow the SCT path
else fail with the current error (unmap is not set).

In this way, if your device supports both TRIM and SCT then you can WRITE SAME
or TRIM.
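
A minimal sketch of that selection order, using hypothetical names (this
is not the libata code, just the decision I have in mind):

```c
#include <stdbool.h>

enum ws_path { WS_PATH_TRIM, WS_PATH_SCT, WS_PATH_FAIL };

/*
 * unmap:    the WRITE SAME command has the unmap flag set
 * has_trim: the device supports TRIM
 * has_sct:  the device supports SCT Write Same
 */
static enum ws_path pick_write_same_path(bool unmap, bool has_trim,
					 bool has_sct)
{
	if (unmap && has_trim)
		return WS_PATH_TRIM;  /* current behavior */
	if (has_sct)
		return WS_PATH_SCT;
	return WS_PATH_FAIL;          /* current error: unmap is not set */
}
```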

>  - Shouldn't we still translate discard command to TRIM?  Maybe we
>need a check of the operation in the request structure..
>

-- 
Shaun Tancheff


[PATCH v2] Add support for SCT Write Same

2016-06-09 Thread Shaun Tancheff
SATA drives may support write same via SCT. This is useful
for setting the drive contents to a specific pattern (0's).

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
v2:
 - Remove fugly ata hacking from sd.c
---
 drivers/ata/libata-scsi.c  | 34 ++
 drivers/scsi/sd.c  |  2 +-
 include/linux/ata.h| 43 +++
 include/scsi/scsi_device.h |  1 +
 4 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index bfec66f..b73eace 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1204,6 +1204,9 @@ static int ata_scsi_dev_config(struct scsi_device *sdev,
if (!ata_id_has_unload(dev->id))
dev->flags |= ATA_DFLAG_NO_UNLOAD;
 
+   if (ata_id_sct_write_same(dev->id))
+   sdev->sct_write_same = 1;
+
/* configure max sectors */
blk_queue_max_hw_sectors(q, dev->max_sectors);
 
@@ -3305,6 +3308,37 @@ static unsigned int ata_scsi_write_same_xlat(struct 
ata_queued_cmd *qc)
goto invalid_param_len;
 
buf = page_address(sg_page(scsi_sglist(scmd)));
+
+   if (ata_id_sct_write_same(dev->id)) {
+   u16 *sctpg = buf;
+
+   put_unaligned_le16(0x0002,  &sctpg[0]); /* SCT_ACT_WRITE_SAME */
+   put_unaligned_le16(0x0101,  &sctpg[1]); /* WRITE PTRN FG */
+   put_unaligned_le64(block,   &sctpg[2]);
+   put_unaligned_le64(n_block, &sctpg[6]);
+   put_unaligned_le32(0u,  &sctpg[10]);
+
+   tf->hob_feature = 0;
+   tf->feature = 0;
+   tf->hob_nsect = 0;
+   tf->nsect = 1;
+   tf->lbah = 0;
+   tf->lbam = 0;
+   tf->lbal = ATA_CMD_STANDBYNOW1;
+   tf->hob_lbah = 0;
+   tf->hob_lbam = 0;
+   tf->hob_lbal = 0;
+   tf->device = ATA_CMD_STANDBYNOW1;
+   tf->protocol = ATA_PROT_DMA;
+   tf->command = ATA_CMD_WRITE_LOG_DMA_EXT;
+   tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE |
+ATA_TFLAG_LBA48 | ATA_TFLAG_WRITE;
+
+   ata_qc_set_pc_nbytes(qc);
+
+   return 0;
+   }
+
size = ata_set_lba_range_entries(buf, 512, block, n_block);
 
if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index f459dff..b5ffcd3 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -794,7 +794,7 @@ static void sd_config_write_same(struct scsi_disk *sdkp)
struct request_queue *q = sdkp->disk->queue;
unsigned int logical_block_size = sdkp->device->sector_size;
 
-   if (sdkp->device->no_write_same) {
+   if (sdkp->device->no_write_same && !sdkp->device->sct_write_same) {
sdkp->max_ws_blocks = 0;
goto out;
}
diff --git a/include/linux/ata.h b/include/linux/ata.h
index 99346be..4132de3 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -104,6 +104,7 @@ enum {
ATA_ID_CFA_KEY_MGMT = 162,
ATA_ID_CFA_MODES= 163,
ATA_ID_DATA_SET_MGMT= 169,
+   ATA_ID_SCT_CMD_XPORT= 206,
ATA_ID_ROT_SPEED= 217,
ATA_ID_PIO4 = (1 << 1),
 
@@ -778,6 +779,48 @@ static inline bool ata_id_sense_reporting_enabled(const u16 *id)
 }
 
 /**
+ *
+ * Word: 206 - SCT Command Transport
+ *15:12 - Vendor Specific
+ * 11:6 - Reserved
+ *5 - SCT Command Transport Data Tables supported
+ *4 - SCT Command Transport Features Control supported
+ *3 - SCT Command Transport Error Recovery Control supported
+ *2 - SCT Command Transport Write Same supported
+ *1 - SCT Command Transport Long Sector Access supported
+ *0 - SCT Command Transport supported
+ */
+static inline bool ata_id_sct_data_tables(const u16 *id)
+{
+   return id[ATA_ID_SCT_CMD_XPORT] & (1 << 5) ? true : false;
+}
+
+static inline bool ata_id_sct_features_ctrl(const u16 *id)
+{
+   return id[ATA_ID_SCT_CMD_XPORT] & (1 << 4) ? true : false;
+}
+
+static inline bool ata_id_sct_error_recovery_ctrl(const u16 *id)
+{
+   return id[ATA_ID_SCT_CMD_XPORT] & (1 << 3) ? true : false;
+}
+
+static inline bool ata_id_sct_write_same(const u16 *id)
+{
+   return id[ATA_ID_SCT_CMD_XPORT] & (1 << 2) ? true : false;
+}
+
+static inline bool ata_id_sct_long_sector_access(const u16 *id)
+{
+   return id[ATA_ID_SCT_CMD_XPORT] & (1 << 1) ? true : false;
+}
+
+static inline bool ata_id_sct_supported(const u16 *id)
+{
+   return id[ATA_ID_SCT_CMD_XPORT] & (1 << 0) ? true : false;
+}
+
+/**
  * ata_id_major_version-   get ATA level of drive

[PATCH v2] Add support for Write Same via SCT

2016-06-09 Thread Shaun Tancheff
At some point the method of issuing Write Same for ATA drives changed.
Currently write same is commonly available via SCT so expose the SCT
capabilities and use SCT Write Same if available.
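
As a rough illustration of the capability check, here is a minimal user-space sketch that tests the bits of IDENTIFY DEVICE word 206 as laid out in the patch comment. The names `sct_has` and `sct_write_same_usable` are hypothetical, and requiring bit 0 in addition to bit 2 is an assumption of this sketch (the patch itself only tests bit 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IDENTIFY DEVICE word 206: SCT Command Transport (bit layout per the patch). */
#define ID_WORD_SCT_CMD_XPORT 206

enum sct_feature_bit {
	SCT_BIT_SUPPORTED      = 0, /* SCT Command Transport supported */
	SCT_BIT_LONG_SECTOR    = 1, /* Long Sector Access supported */
	SCT_BIT_WRITE_SAME     = 2, /* Write Same supported */
	SCT_BIT_ERROR_RECOVERY = 3, /* Error Recovery Control supported */
	SCT_BIT_FEATURES_CTRL  = 4, /* Features Control supported */
	SCT_BIT_DATA_TABLES    = 5, /* Data Tables supported */
};

/* Hypothetical helper: test one capability bit in a 256-word IDENTIFY buffer. */
static bool sct_has(const uint16_t *id, enum sct_feature_bit bit)
{
	return ((id[ID_WORD_SCT_CMD_XPORT] >> bit) & 1u) != 0;
}

/*
 * Assumption of this sketch: treat Write Same as usable only when the
 * base SCT bit is also set. The patch checks bit 2 alone.
 */
static bool sct_write_same_usable(const uint16_t *id)
{
	return sct_has(id, SCT_BIT_SUPPORTED) &&
	       sct_has(id, SCT_BIT_WRITE_SAME);
}
```

A caller would pass the 256-word IDENTIFY DEVICE buffer it already holds; a word value of 0x0005 reports both SCT and SCT Write Same.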

This is useful for zone-based media that prefers to support discard
with lbprz set (that is, discard zeroes data) by mapping discard
operations to reset write pointer operations. Conventional zones, which
do not support reset write pointer, can still honor discard zeroes data
by issuing a write same over the zone.
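
For reference, the key page that the translation fills can be sketched in user space as follows. The offsets follow the put_unaligned_* calls in the patch (16-bit words 0, 1, 2, 6 and 10); the helper names and the `pattern` parameter are assumptions of this sketch, since the patch always writes a zero pattern.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SCT_KEY_PAGE_BYTES 512

/* Store little-endian values byte by byte, independent of host endianness. */
static void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

static void put_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical helper: fill one SCT Write Same key page. */
static void sct_fill_write_same(uint8_t *page, uint64_t lba,
				uint64_t count, uint32_t pattern)
{
	memset(page, 0, SCT_KEY_PAGE_BYTES);
	put_le16(page + 0, 0x0002);   /* word 0: action code, write same */
	put_le16(page + 2, 0x0101);   /* word 1: function code, write pattern */
	put_le64(page + 4, lba);      /* words 2-5: starting LBA */
	put_le64(page + 12, count);   /* words 6-9: number of blocks */
	put_le32(page + 20, pattern); /* words 10-11: 32-bit pattern */
}
```

Zeroing a conventional zone would then correspond to a call with the zone start LBA, the zone length in blocks, and a pattern of 0.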

v2:
 - Remove fugly ata hacking from sd.c

Shaun Tancheff (1):
  Add support for SCT Write Same

 drivers/ata/libata-scsi.c  | 34 ++
 drivers/scsi/sd.c  |  2 +-
 include/linux/ata.h| 43 +++
 include/scsi/scsi_device.h |  1 +
 4 files changed, 79 insertions(+), 1 deletion(-)

-- 
2.8.1



Re: [PATCH] Add support for SCT Write Same

2016-06-09 Thread Shaun Tancheff
On Wed, Jun 8, 2016 at 10:39 PM, Martin K. Petersen
<martin.peter...@oracle.com> wrote:
>
> >>>>> "Shaun" == Shaun Tancheff <sh...@tancheff.com> writes:
>
> Shaun,
>
> Shaun> SATA drives may support write same via SCT. This is useful for
> Shaun> setting the drive contents to a specific pattern (0's).
>
> index 428c03e..c5c8424 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -52,6 +52,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>
> No ATA stuff in sd.c, please.
>
>
> @@ -2761,24 +2762,26 @@ static void sd_read_write_same(struct scsi_disk *sdkp, unsigned char *buffer)
>  {
> struct scsi_device *sdev = sdkp->device;
>
> -   if (sdev->host->no_write_same) {
> -   sdev->no_write_same = 1;
> -
> -   return;
> -   }
> -
> if (scsi_report_opcode(sdev, buffer, SD_BUF_SIZE, INQUIRY) < 0) {
> -   /* too large values might cause issues with arcmsr */
> -   int vpd_buf_len = 64;
> -
> sdev->no_report_opcodes = 1;
>
> /* Disable WRITE SAME if REPORT SUPPORTED OPERATION
>  * CODES is unsupported and the device has an ATA
>  * Information VPD page (SAT).
>  */
>
> The above comment tells you how to enable WRITE SAME in libata's
> SCSI-ATA translation.
>
>
> -   if (!scsi_get_vpd_page(sdev, 0x89, buffer, vpd_buf_len))
> +   if (!scsi_get_vpd_page(sdev, 0x89, buffer, SD_BUF_SIZE))
>
> That vpd_buf_len is intentional.



Got it. Looking at it again, none of the hacking here is required.
Posting a cleaner v2 without this.

Thanks!

>
>
> --
> Martin K. Petersen  Oracle Linux Engineering




-- 
Shaun Tancheff


[PATCH v2 3/4] Add ioctl to issue ZBC/ZAC commands via block layer

2016-06-07 Thread Shaun Tancheff
Add new ioctl types:
BLKREPORT    - Issue Report Zones to device.
BLKOPENZONE  - Issue a Zone Action: Open Zone command.
BLKCLOSEZONE - Issue a Zone Action: Close Zone command.
BLKRESETZONE - Issue a Zone Action: Reset Zone command.
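
The BLKREPORT handler in the diff below grows its buffer when the caller asks for more than one page, rounding the page count up to a power-of-two allocation order. A minimal sketch of that arithmetic in plain C, assuming a 4096-byte page (the names are hypothetical; the kernel uses ilog2() and roundup_pow_of_two()):

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Number of whole pages needed to hold `bytes`, rounding up. */
static unsigned int pages_needed(size_t bytes)
{
	return (unsigned int)((bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}

/*
 * Smallest order such that (1 << order) >= npages, which is what
 * ilog2(roundup_pow_of_two(npages)) yields for npages >= 1.
 */
static unsigned int alloc_order(unsigned int npages)
{
	unsigned int order = 0;

	while ((1u << order) < npages)
		order++;
	return order;
}
```

Note that a 5-page request ends up with an order-3 (8-page) allocation, so the rounding can nearly double the memory actually reserved.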

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
V2:
 - Added include/uapi/linux/fs.h
 - Removed REQ_META flag from this patch.

 block/ioctl.c | 110 ++
 include/uapi/linux/blkzoned_api.h |   6 +++
 include/uapi/linux/fs.h   |   1 +
 3 files changed, 117 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..97f45f5 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -194,6 +195,109 @@ int blkdev_reread_part(struct block_device *bdev)
 }
 EXPORT_SYMBOL(blkdev_reread_part);
 
+static int blk_zoned_report_ioctl(struct block_device *bdev, fmode_t mode,
+   void __user *parg)
+{
+   int error = -EFAULT;
+   gfp_t gfp = GFP_KERNEL;
+   struct bdev_zone_report_io *zone_iodata = NULL;
+   int order = 0;
+   struct page *pgs = NULL;
+   u32 alloc_size = PAGE_SIZE;
+   unsigned long op_flags = 0;
+   u8 opt = 0;
+
+   if (!(mode & FMODE_READ))
+   return -EBADF;
+
+   zone_iodata = (void *)get_zeroed_page(gfp);
+   if (!zone_iodata) {
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   if (copy_from_user(zone_iodata, parg, sizeof(*zone_iodata))) {
+   error = -EFAULT;
+   goto report_zones_out;
+   }
+   if (zone_iodata->data.in.return_page_count > alloc_size) {
+   int npages;
+
+   alloc_size = zone_iodata->data.in.return_page_count;
+   npages = (alloc_size + PAGE_SIZE - 1) / PAGE_SIZE;
+   order =  ilog2(roundup_pow_of_two(npages));
+   pgs = alloc_pages(gfp, order);
+   if (pgs) {
+   void *mem = page_address(pgs);
+
+   if (!mem) {
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   memset(mem, 0, alloc_size);
+   memcpy(mem, zone_iodata, sizeof(*zone_iodata));
+   free_page((unsigned long)zone_iodata);
+   zone_iodata = mem;
+   } else {
+   /* Result requires DMA capable memory */
+   pr_err("Not enough memory available for request.\n");
+   error = -ENOMEM;
+   goto report_zones_out;
+   }
+   }
+   opt = zone_iodata->data.in.report_option & 0x7F;
+   error = blkdev_issue_zone_report(bdev, op_flags,
+   zone_iodata->data.in.zone_locator_lba, opt,
+   pgs ? pgs : virt_to_page(zone_iodata),
+   alloc_size, GFP_KERNEL);
+
+   if (error)
+   goto report_zones_out;
+
+   if (copy_to_user(parg, zone_iodata, alloc_size))
+   error = -EFAULT;
+
+report_zones_out:
+   if (pgs)
+   __free_pages(pgs, order);
+   else if (zone_iodata)
+   free_page((unsigned long)zone_iodata);
+   return error;
+}
+
+static int blk_zoned_action_ioctl(struct block_device *bdev, fmode_t mode,
+ unsigned int cmd, unsigned long arg)
+{
+   unsigned long op_flags = 0;
+
+   if (!(mode & FMODE_WRITE))
+   return -EBADF;
+
+   /*
+* When acting on zones we explicitly disallow using a partition.
+*/
+   if (bdev != bdev->bd_contains) {
+   pr_err("%s: All zone operations disallowed on this device\n",
+   __func__);
+   return -EFAULT;
+   }
+
+   switch (cmd) {
+   case BLKOPENZONE:
+   op_flags |= REQ_OPEN_ZONE;
+   break;
+   case BLKCLOSEZONE:
+   op_flags |= REQ_CLOSE_ZONE;
+   break;
+   case BLKRESETZONE:
+   op_flags |= REQ_RESET_ZONE;
+   break;
+   default:
+   pr_err("%s: Unknown action: %u\n", __func__, cmd);
+   WARN_ON(1);
+   }
+   return blkdev_issue_zone_action(bdev, op_flags, arg, GFP_KERNEL);
+}
+
 static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
unsigned long arg, unsigned long flags)
 {
@@ -568,6 +672,12 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
case BLKTRACESETUP:
case BLKTRACETEARDOWN:
return blk_trace_ioctl(bdev, cmd, argp);
+   case BLKREPORT:
+   return blk_zoned_report_ioctl(bdev, mode, argp);
+   case

[PATCH v2 4/4] Add ata pass-through path for ZAC commands.

2016-06-07 Thread Shaun Tancheff
The current generation of HBA SAS adapters support connecting SATA
drives and perform SCSI<->ATA translations in hardware.
Unfortunately the ZBC commands are not being translated (yet).

Currently users of SAS controllers can only send ZAC commands via
ata pass-through.

This method overloads the meaning of REQ_META to direct ZBC commands
to construct ZAC equivalent ATA pass through commands.
Note also that this approach expects the initiator to deal with the
little endian result due to bypassing the normal translation layers.
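
For the zone action ioctls, this patch carries the pass-through request in the low bit of the LBA argument. A minimal user-space sketch of that decoding, assuming a 64-bit argument (the struct and function names are hypothetical, not taken from the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ZONE_ALL_LBAS UINT64_MAX /* ~0: act on all zones */

struct zone_action_arg {
	uint64_t lba;             /* LBA to act on, or ZONE_ALL_LBAS */
	bool use_ata_passthrough; /* set when the low bit requested it */
};

/* Hypothetical helper mirroring the low-bit handling in the patch. */
static struct zone_action_arg decode_zone_action_arg(uint64_t arg)
{
	struct zone_action_arg out = { arg, false };

	if (arg & 1) {
		/* Low bit set: force ATA pass-through. */
		out.use_ata_passthrough = true;
		if (arg != ZONE_ALL_LBAS)
			out.lba = arg & ~UINT64_C(1);
	} else if (arg == (ZONE_ALL_LBAS & ~UINT64_C(1))) {
		/* ~1 on the normal path still means "all LBAs". */
		out.lba = ZONE_ALL_LBAS;
	}
	return out;
}
```

One consequence of this encoding is that ~0 always selects pass-through, while ~1 is how a caller asks for "all LBAs" on the normal path.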

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
v2:
 - Added REQ_META to op_flags if high bit is set in opt.

 block/ioctl.c   | 32 
 drivers/scsi/sd.c   | 70 +
 include/linux/ata.h | 15 
 3 files changed, 102 insertions(+), 15 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index 97f45f5..c853c6f 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -245,6 +245,9 @@ static int blk_zoned_report_ioctl(struct block_device *bdev, fmode_t mode,
}
}
opt = zone_iodata->data.in.report_option & 0x7F;
+   if (zone_iodata->data.in.report_option & ZOPT_USE_ATA_PASS)
+   op_flags |= REQ_META;
+
error = blkdev_issue_zone_report(bdev, op_flags,
zone_iodata->data.in.zone_locator_lba, opt,
pgs ? pgs : virt_to_page(zone_iodata),
@@ -281,6 +284,35 @@ static int blk_zoned_action_ioctl(struct block_device *bdev, fmode_t mode,
return -EFAULT;
}
 
+   /*
+* When the low bit is set force ATA passthrough try to work around
+* older SAS HBA controllers that don't support ZBC to ZAC translation.
+*
+* When the low bit is clear follow the normal path but also correct
+* for ~0ul LBA means 'for all lbas'.
+*
+* NB: We should do extra checking here to see if the user specified
+* the entire block device as opposed to a partition of the
+* device
+*/
+   if (arg & 1) {
+   op_flags |= REQ_META;
+   if (arg != ~0ul)
+   arg &= ~1ul; /* ~1 :: 0xFF...FE */
+   } else {
+   if (arg == ~1ul)
+   arg = ~0ul;
+   }
+
+   /*
+* When acting on zones we explicitly disallow using a partition.
+*/
+   if (bdev != bdev->bd_contains) {
+   pr_err("%s: All zone operations disallowed on this device\n",
+   __func__);
+   return -EFAULT;
+   }
+
switch (cmd) {
case BLKOPENZONE:
op_flags |= REQ_OPEN_ZONE;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 06b54d5..cf96f01 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -53,6 +53,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -1182,12 +1183,29 @@ static int sd_setup_zoned_cmnd(struct scsi_cmnd *cmd)
 
cmd->cmd_len = 16;
memset(cmd->cmnd, 0, cmd->cmd_len);
-   cmd->cmnd[0] = ZBC_IN;
-   cmd->cmnd[1] = ZI_REPORT_ZONES;
-   put_unaligned_be64(sector, &cmd->cmnd[2]);
-   put_unaligned_be32(nr_bytes, &cmd->cmnd[10]);
-   /* FUTURE ... when streamid is available */
-   /* cmd->cmnd[14] = bio_get_streamid(bio); */
+   if (rq->cmd_flags & REQ_META) {
+   cmd->cmnd[0] = ATA_16;
+   cmd->cmnd[1] = (0x6 << 1) | 1;
+   cmd->cmnd[2] = 0x0e;
+   /* FUTURE ... when streamid is available */
+   /* cmd->cmnd[3] = bio_get_streamid(bio); */
+   cmd->cmnd[4] = ATA_SUBCMD_ZAC_MGMT_IN_REPORT_ZONES;
+   cmd->cmnd[5] = ((nr_bytes / 512) >> 8) & 0xff;
+   cmd->cmnd[6] = (nr_bytes / 512) & 0xff;
+
+   _lba_to_cmd_ata(&cmd->cmnd[7], sector);
+
+   cmd->cmnd[13] = 1 << 6;
+   cmd->cmnd[14] = ATA_CMD_ZAC_MGMT_IN;
+   } else {
+   cmd->cmnd[0] = ZBC_IN;
+   cmd->cmnd[1] = ZI_REPORT_ZONES;
+   put_unaligned_be64(sector, &cmd->cmnd[2]);
+   put_unaligned_be32(nr_bytes, &cmd->cmnd[10]);
+   /* FUTURE ... when streamid is available */
+   /* cmd->cmnd[14] = bio_get_streamid(bio); */
+   }
+
cmd->sc_data_direction = DMA_FROM_DEVICE;
cmd->sdb.length = nr_bytes;
cmd->transfersize = sdp->sector_size;
@@ -1208,14 +1226,29 @@ static int sd_setup_zoned_cmnd(struct scsi_cmnd *cmd)
cmd->c

[PATCH v2 2/4] Add bio/request flags for using ZBC/ZAC commands

2016-06-07 Thread Shaun Tancheff
T10 ZBC and T13 ZAC specify operations for Zoned devices.

To be able to access the zone information and open and close zones
adding flags for the report zones command (REQ_REPORT_ZONES) and for
Open and Close zone (REQ_OPEN_ZONE and REQ_CLOSE_ZONE) can be added
for use by struct bio's bi_rw and by struct request's cmd_flags.

To reduce the number of additional flags needed REQ_RESET_ZONE shares
the same flag as REQ_REPORT_ZONES and is differentiated by direction.
Report zones is a device read that requires a buffer. Reset is a device
command (WRITE) that has no associated data transfer.
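
The flag sharing described above can be sketched as follows. The bit positions and names here are hypothetical placeholders, not the values from the patch; only the idea of telling report from reset by the write direction is taken from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define SK_WRITE       (UINT64_C(1) << 0)  /* direction: set for writes */
#define SK_ZONE_SHARED (UINT64_C(1) << 40) /* report zones OR reset zone */
#define SK_OPEN_ZONE   (UINT64_C(1) << 41)
#define SK_CLOSE_ZONE  (UINT64_C(1) << 42)

enum zone_op {
	ZONE_OP_NONE,
	ZONE_OP_REPORT,
	ZONE_OP_RESET,
	ZONE_OP_OPEN,
	ZONE_OP_CLOSE,
};

/* The shared flag is a report when reading and a reset when writing. */
static enum zone_op classify_zone_op(uint64_t flags)
{
	if (flags & SK_ZONE_SHARED)
		return (flags & SK_WRITE) ? ZONE_OP_RESET : ZONE_OP_REPORT;
	if (flags & SK_OPEN_ZONE)
		return ZONE_OP_OPEN;
	if (flags & SK_CLOSE_ZONE)
		return ZONE_OP_CLOSE;
	return ZONE_OP_NONE;
}
```

The saving is one flag bit, at the cost that every consumer has to look at the direction before interpreting the shared bit.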

The Finish zone command is intentionally not implemented as there is no
current use case for that operation.

Report zones currently defaults to reporting on all zones. It is expected
that support for the zone option flag will piggyback on streamid
support. The report option is useful as it can reduce the number of
zones in each report, but not critical.

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---
V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Removed include/uapi/linux/fs.h from this patch.

 MAINTAINERS   |   9 ++
 block/blk-lib.c   |  97 +
 drivers/scsi/sd.c |  99 +-
 drivers/scsi/sd.h |   1 +
 include/linux/blk_types.h |  19 +++-
 include/linux/blkzoned_api.h  |  25 +
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned_api.h | 215 ++
 8 files changed, 461 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

diff --git a/MAINTAINERS b/MAINTAINERS
index ed42cb6..d9fafa2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12662,6 +12662,15 @@ F: Documentation/networking/z8530drv.txt
 F: drivers/net/hamradio/*scc.c
 F: drivers/net/hamradio/z8530.h
 
+ZBC AND ZBC BLOCK DEVICES
+M: Shaun Tancheff <shaun.tanch...@seagate.com>
+W: http://seagate.com
+W: https://github.com/Seagate/ZDM-Device-Mapper
+L: linux-bl...@vger.kernel.org
+S: Maintained
+F: include/linux/blkzoned_api.h
+F: include/uapi/linux/blkzoned_api.h
+
 ZBUD COMPRESSED PAGE ALLOCATOR
 M: Seth Jennings <sjenn...@redhat.com>
 L: linux...@kvack.org
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 23d7f30..ce4168a 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "blk.h"
 
@@ -249,3 +250,99 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
 }
 EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+/**
+ * blkdev_issue_zone_report - queue a report zones operation
+ * @bdev:  target blockdev
+ * @op_flags:  extra bio rw flags. If unsure, use 0.
+ * @sector:starting sector (report will include this sector).
+ * @opt:   See: zone_report_option, default is 0 (all zones).
+ * @page:  one or more contiguous pages.
+ * @pgsz:  up to size of page in bytes, size of report.
+ * @gfp_mask:  memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *Issue a zone report request for the sectors in question.
+ */
+int blkdev_issue_zone_report(struct block_device *bdev, unsigned int op_flags,
+sector_t sector, u8 opt, struct page *page,
+size_t pgsz, gfp_t gfp_mask)
+{
+   struct bdev_zone_report *conv = page_address(page);
+   struct bio *bio;
+   unsigned int nr_iovecs = 1;
+   int ret = 0;
+
+   if (pgsz < (sizeof(struct bdev_zone_report) +
+   sizeof(struct bdev_zone_descriptor)))
+   return -EINVAL;
+
+   bio = bio_alloc(gfp_mask, nr_iovecs);
+   if (!bio)
+   return -ENOMEM;
+
+   conv->descriptor_count = 0;
+   bio->bi_iter.bi_sector = sector;
+   bio->bi_bdev = bdev;
+   bio->bi_vcnt = 0;
+   bio->bi_iter.bi_size = 0;
+
+   op_flags |= REQ_REPORT_ZONES;
+
+   /* FUTURE ... when streamid is available: */
+   /* bio_set_streamid(bio, opt); */
+
+   bio_add_page(bio, page, pgsz, 0);
+   ret = submit_bio_wait(READ | op_flags, bio);
+
+   /*
+* When our request is nak'd the underlying device may be conventional
+* so ... report a single conventional zone the size of the device.
+*/
+   if (ret == -EIO && conv->descriptor_count) {
+   /* Adjust the conventional to the size of the partition ... */
+   __be64 blksz = cpu_to_be64(bdev->bd_part->nr_sects);
+
+   conv->maximum_lba = blksz;
+   conv->descriptors[0].type = ZTYP_CONVENTIONAL;

[PATCH v2 0/4] Block layer support ZAC/ZBC commands

2016-06-07 Thread Shaun Tancheff
As Host Aware drives are becoming available we would like to be able
to make use of such drives. This series is also intended to be suitable
for use by Host Managed drives.

ZAC/ZBC drives add new commands for discovering and working with Zones.

This extends the ZAC/ZBC support up to the block layer.

The first patch in the series is a placeholder for Mike Christie's
"separate operations from flags ..." series:
https://lkml.kernel.org/r/1465155145-10812-1-git-send-email-mchri...@redhat.com
Once that work is completed the first patch can be dropped.

Patches for util-linux can be found here:
https://github.com/Seagate/ZDM-Device-Mapper/tree/master/patches/util-linux

Using BIOs to issue ZBC commands allows DM targets (such as ZDM) or
file-systems such as btrfs or nilfs2 to extend their block allocation
schemes and issue discards that are zone aware.

A perhaps non-obvious design choice is that a conventional drive will
return a descriptor with a single large conventional zone.
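
A minimal sketch of that fallback, using simplified stand-in structures: the real layout lives in include/uapi/linux/blkzoned_api.h and stores multi-byte fields big-endian, which this sketch skips. All `sk_` names and the value of `SK_ZTYP_CONVENTIONAL` are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_ZTYP_CONVENTIONAL 1 /* assumed value for a conventional zone */

/* Simplified, host-endian stand-ins for the report structures. */
struct sk_zone_descriptor {
	uint8_t  type;
	uint64_t start;
	uint64_t length;
};

struct sk_zone_report {
	uint32_t descriptor_count;
	uint64_t maximum_lba;
	struct sk_zone_descriptor descriptors[1];
};

/* Describe the whole device as one conventional zone. */
static void sk_report_single_conventional(struct sk_zone_report *rpt,
					  uint64_t nr_sectors)
{
	memset(rpt, 0, sizeof(*rpt));
	rpt->descriptor_count = 1;
	rpt->maximum_lba = nr_sectors; /* the patch stores the size here */
	rpt->descriptors[0].type = SK_ZTYP_CONVENTIONAL;
	rpt->descriptors[0].start = 0;
	rpt->descriptors[0].length = nr_sectors;
}
```

With this, a caller that only understands zone reports can treat a conventional drive as a degenerate zoned device rather than handling a separate error path.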

This patch is also at
https://github.com/stancheff/linux.git
g...@github.com:stancheff/linux.git
branch: v4.7-rc2+bio.zbc.v2

V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Moved include/uapi/linux/fs.h changes to patch 3
 - Fixed commit message for first patch in series.

Shaun Tancheff (4):
  Losing bits on request.cmd_flags
  Add bio/request flags for using ZBC/ZAC commands
  Add ioctl to issue ZBC/ZAC commands via block layer
  Add ata pass-through path for ZAC commands.

 MAINTAINERS   |   9 ++
 block/blk-core.c  |  17 +--
 block/blk-lib.c   |  97 +
 block/blk-merge.c |   2 +-
 block/blk-mq.c|   2 +-
 block/cfq-iosched.c   |   2 +-
 block/elevator.c  |   4 +-
 block/ioctl.c | 142 
 drivers/scsi/sd.c | 141 +++-
 drivers/scsi/sd.h |   1 +
 include/linux/ata.h   |  15 +++
 include/linux/blk_types.h |  19 +++-
 include/linux/blkzoned_api.h  |  25 +
 include/linux/elevator.h  |   4 +-
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/blkzoned_api.h | 221 ++
 include/uapi/linux/fs.h   |   1 +
 17 files changed, 683 insertions(+), 20 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

-- 
2.8.1



[PATCH v2 1/4] Losing bits on request.cmd_flags

2016-06-07 Thread Shaun Tancheff
In a few places a temporary value narrower than cmd_flags
is used to test for bits and/or build up a new cmd_flags.

Change to use explicit u64 values where appropriate.
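
To illustrate the problem outside the kernel, here is a small sketch of how a flag above bit 31 disappears when a 64-bit flags word passes through a 32-bit temporary. `SK_FLAG_HIGH` is a hypothetical flag, not one of the kernel's REQ_* bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_FLAG_HIGH (UINT64_C(1) << 40) /* hypothetical flag above bit 31 */

/* Narrow temporary: the upper 32 bits are dropped before the test. */
static bool flag_survives_u32(uint64_t cmd_flags)
{
	uint32_t tmp = (uint32_t)cmd_flags;

	return (tmp & SK_FLAG_HIGH) != 0;
}

/* Temporary as wide as the flags word: the test sees every bit. */
static bool flag_survives_u64(uint64_t cmd_flags)
{
	uint64_t tmp = cmd_flags;

	return (tmp & SK_FLAG_HIGH) != 0;
}
```

For an input of `SK_FLAG_HIGH | 1` the first function returns false and the second returns true, which is the class of bug the type changes in this patch avoid.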

This patch is a placeholder for Mike Christie's separate operations ... series:
https://lkml.kernel.org/r/1465155145-10812-1-git-send-email-mchri...@redhat.com

Once Mike's patches are stabilized this patch can be dropped.

Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
---

v2:
 - Fixed commit message

 block/blk-core.c | 17 ++---
 block/blk-merge.c|  2 +-
 block/blk-mq.c   |  2 +-
 block/cfq-iosched.c  |  2 +-
 block/elevator.c |  4 ++--
 include/linux/elevator.h |  4 ++--
 6 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 2475b1c7..945e564 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -959,7 +959,7 @@ static void __freed_request(struct request_list *rl, int sync)
  * A request has just been released.  Account for it, update the full and
  * congestion status, wake up any waiters.   Called under q->queue_lock.
  */
-static void freed_request(struct request_list *rl, unsigned int flags)
+static void freed_request(struct request_list *rl, u64 flags)
 {
struct request_queue *q = rl->q;
int sync = rw_is_sync(flags);
@@ -1054,7 +1054,7 @@ static struct io_context *rq_ioc(struct bio *bio)
 /**
  * __get_request - get a free request
  * @rl: request list to allocate from
- * @rw_flags: RW and SYNC flags
+ * @rw: RW and SYNC flags
  * @bio: bio to allocate request for (can be %NULL)
  * @gfp_mask: allocation mask
  *
@@ -1065,7 +1065,7 @@ static struct io_context *rq_ioc(struct bio *bio)
  * Returns ERR_PTR on failure, with @q->queue_lock held.
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
-static struct request *__get_request(struct request_list *rl, int rw_flags,
+static struct request *__get_request(struct request_list *rl, unsigned long rw,
 struct bio *bio, gfp_t gfp_mask)
 {
struct request_queue *q = rl->q;
@@ -1073,6 +1073,7 @@ static struct request *__get_request(struct request_list *rl, int rw_flags,
struct elevator_type *et = q->elevator->type;
struct io_context *ioc = rq_ioc(bio);
struct io_cq *icq = NULL;
+   u64 rw_flags = rw;
const bool is_sync = rw_is_sync(rw_flags) != 0;
int may_queue;
 
@@ -1237,7 +1238,8 @@ rq_starved:
  * Returns ERR_PTR on failure, with @q->queue_lock held.
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
-static struct request *get_request(struct request_queue *q, int rw_flags,
+static struct request *get_request(struct request_queue *q,
+  unsigned long rw_flags,
   struct bio *bio, gfp_t gfp_mask)
 {
const bool is_sync = rw_is_sync(rw_flags) != 0;
@@ -1490,7 +1492,7 @@ void __blk_put_request(struct request_queue *q, struct request *req)
 * it didn't come out of our reserved rq pools
 */
if (req->cmd_flags & REQ_ALLOCED) {
-   unsigned int flags = req->cmd_flags;
+   u64 flags = req->cmd_flags;
struct request_list *rl = blk_rq_rl(req);
 
BUG_ON(!list_empty(&req->queuelist));
@@ -1712,7 +1714,8 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
 {
const bool sync = !!(bio->bi_rw & REQ_SYNC);
struct blk_plug *plug;
-   int el_ret, rw_flags, where = ELEVATOR_INSERT_SORT;
+   u64 rw_flags;
+   int el_ret, where = ELEVATOR_INSERT_SORT;
struct request *req;
unsigned int request_count = 0;
 
@@ -2246,7 +2249,7 @@ EXPORT_SYMBOL_GPL(blk_insert_cloned_request);
  */
 unsigned int blk_rq_err_bytes(const struct request *rq)
 {
-   unsigned int ff = rq->cmd_flags & REQ_FAILFAST_MASK;
+   u64 ff = rq->cmd_flags & REQ_FAILFAST_MASK;
unsigned int bytes = 0;
struct bio *bio;
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2613531..fec37e1 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -604,7 +604,7 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
  */
 void blk_rq_set_mixed_merge(struct request *rq)
 {
-   unsigned int ff = rq->cmd_flags & REQ_FAILFAST_MASK;
+   u64 ff = rq->cmd_flags & REQ_FAILFAST_MASK;
struct bio *bio;
 
if (rq->cmd_flags & REQ_MIXED_MERGE)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 29cbc1b..db962bc 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -159,7 +159,7 @@ bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
 EXPORT_SYMBOL(blk_mq_can_queue);
 
 static void blk_mq_rq_ctx_init(struct request_queue *q, struct blk_mq_ctx *ctx,
-  struct request 

Re: [PATCH 2/4] Add bio/request flags for using ZBC/ZAC commands

2016-06-06 Thread Shaun Tancheff
On Mon, Jun 6, 2016 at 10:39 AM, Shaun Tancheff <sh...@tancheff.com> wrote:
>
> T10 ZBC and T13 ZAC specify operations for Zoned devices.
>
> To be able to access the zone information and open and close zones
> adding flags for the report zones command (REQ_REPORT_ZONES) and for
> Open and Close zone (REQ_OPEN_ZONE and REQ_CLOSE_ZONE) can be added
> for use by struct bio's bi_rw and by struct request's cmd_flags.
>
> To reduce the number of additional flags needed REQ_RESET_ZONE shares
> the same flag as REQ_REPORT_ZONES and is differentiated by direction.
> Report zones is a device read that requires a buffer. Reset is a device
> command (WRITE) that has no associated data transfer.
>
> The Finish zone command is intentionally not implemented as there is no
> current use case for that operation.
>
> Report zones currently defaults to reporting on all zones. It is expected
> that support for the zone option flag will piggyback on streamid
> support. The report option is useful as it can reduce the number of
> zones in each report, but not critical.
>
> Signed-off-by: Shaun Tancheff <shaun.tanch...@seagate.com>
> ---
>  MAINTAINERS   |   9 ++
>  block/blk-lib.c   |  96 +
>  drivers/scsi/sd.c |  99 +-
>  drivers/scsi/sd.h |   1 +
>  include/linux/blk_types.h |  19 +++-
>  include/linux/blkzoned_api.h  |  25 +
>  include/uapi/linux/Kbuild |   1 +
>  include/uapi/linux/blkzoned_api.h | 215 ++
>  include/uapi/linux/fs.h   |   1 +
>  9 files changed, 461 insertions(+), 5 deletions(-)
>  create mode 100644 include/linux/blkzoned_api.h
>  create mode 100644 include/uapi/linux/blkzoned_api.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7304d2e..0b71a3c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12660,6 +12660,15 @@ F: Documentation/networking/z8530drv.txt
>  F: drivers/net/hamradio/*scc.c
>  F: drivers/net/hamradio/z8530.h
>
> +ZBC AND ZBC BLOCK DEVICES
> +M: Shaun Tancheff <shaun.tanch...@seagate.com>
> +W: http://seagate.com
> +W: https://github.com/Seagate/ZDM-Device-Mapper
> +L: linux-bl...@vger.kernel.org
> +S: Maintained
> +F: include/linux/blkzoned_api.h
> +F: include/uapi/linux/blkzoned_api.h
> +
>  ZBUD COMPRESSED PAGE ALLOCATOR
>  M: Seth Jennings <sjenn...@redhat.com>
>  L: linux...@kvack.org
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 23d7f30..a7f047c 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -6,6 +6,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include "blk.h"
>
> @@ -249,3 +250,98 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
>  }
>  EXPORT_SYMBOL(blkdev_issue_zeroout);
> +
> +/**
> + * blkdev_issue_zone_report - queue a report zones operation
> + * @bdev:  target blockdev
> + * @bi_rw: extra bio rw flags. If unsure, use 0.
> + * @sector:starting sector (report will include this sector).

Missing:
  * @opt: See: zone_report_option, default is 0 (all zones)

> + * @page:  one or more contiguous pages.
> + * @pgsz:  up to size of page in bytes, size of report.
> + * @gfp_mask:  memory allocation flags (for bio_alloc)
> + *
> + * Description:
> + *Issue a zone report request for the sectors in question.
> + */
> +int blkdev_issue_zone_report(struct block_device *bdev, unsigned int bi_rw,
> +sector_t sector, u8 opt, struct page *page,
> +size_t pgsz, gfp_t gfp_mask)
> +{
> +   struct bdev_zone_report *conv = page_address(page);
> +   struct bio *bio;
> +   unsigned int nr_iovecs = 1;
> +   int ret = 0;
> +
> +   if (pgsz < (sizeof(struct bdev_zone_report) +
> +   sizeof(struct bdev_zone_descriptor)))
> +   return -EINVAL;
> +
> +   bio = bio_alloc(gfp_mask, nr_iovecs);
> +   if (!bio)
> +   return -ENOMEM;
> +
> +   conv->descriptor_count = 0;
> +   bio->bi_iter.bi_sector = sector;
> +   bio->bi_bdev = bdev;
> +   bio->bi_vcnt = 0;
> +   bio->bi_iter.bi_size = 0;
> +
> +   bi_rw |= REQ_REPORT_ZONES;
> +
> +   /* FUTURE ... when streamid is avai
