Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-17 Thread Kyle Sanderson
Not to compound upon this again. However if BFQ isn't suitable to
replace CFQ for high I/O workloads (I've yet to see 20k IOPS on any
reasonably sized SAN (SC4020 / v5000, etc)), can't we at-least default
BFQ to become the default I/O scheduler for people otherwise
requesting CFQ? Paolo has had a team of students working on this for
years, even if the otherwise "secret weapon" is mainlined I highly
doubt his work will stop. We're pretty close to fixing hard I/O stalls
in Linux, mainlining being the last major burden.

While I've contributed nothing to BFQ code wise, absolutely let any of
us know if there's anything outstanding to solve hard lockups and I
believe any of us will try our best.

Kyle.

On Sun, Oct 16, 2016 at 12:02 PM, Paolo Valente
 wrote:
>
>> Il giorno 14 ott 2016, alle ore 20:35, Tejun Heo  ha 
>> scritto:
>>
>> Hello, Paolo.
>>
>> On Fri, Oct 14, 2016 at 07:13:41PM +0200, Paolo Valente wrote:
>>> That said, your 'thus' seems a little too strong: "bfq does not yet
>>> handle fast SSDs, thus we need something else".  What about the
>>> millions of devices (and people) still within 10-20 K IOPS, and
>>> experiencing awful latencies and lack of bandwidth guarantees?
>>
>> I'm not objecting to any of that.
>
> Ok, sorry for misunderstanding.  I'm just more and more confused about
> why a readily available, and not proven wrong solution has not yet
> been accepted, if everybody apparently acknowledges the problem.
>
>>  My point just is that bfq, at least
>> as currently implemented, is unfit for certain classes of use cases.
>>
>
> Absolutely correct.
>
 FWIW, it looks like the only way we can implement proportional control
 on highspeed ssds with acceptable overhead
>>>
>>> Maybe not: as I wrote to Viveck in a previous reply, containing
>>> pointers to documentation, we have already achieved twenty millions
>>> of decisions per second with a prototype driving existing
>>> proportional-share packet schedulers (essentially without
>>> modifications).
>>
>> And that doesn't require idling and thus doesn't severely impact
>> utilization?
>>
>
> Nope.  Packets are commonly assumed to be sent asynchronously.
> I guess that discussing the validity of this assumption is out of the
> scope of this thread.
>
> Thanks,
> Paolo
>
 is somehow finding a way to
 calculate the cost of each IO and throttle IOs according to that while
 controlling for latency as necessary.  Slice scheduling with idling
 seems too expensive with highspeed devices with high io depth.
>>>
>>> Yes, that's absolutely true.  I'm already thinking about an idleless
>>> solution.  As I already wrote, I'm willing to help with scheduling in
>>> blk-mq.  I hope there will be the opportunity to find some way to go
>>> at KS.
>>
>> It'd be great to have a proportional control mechanism whose overhead
>> is acceptable.  Unfortunately, we don't have one now and nothing seems
>> right around the corner.  (Mostly) work-conserving throttling would be
>> fiddlier to use but is something which is useful regardless of such
>> proportional control mechanism and can be obtained relatively easily.
>>
>> I don't see why the two approaches would be mutually exclusive.
>>
>> Thanks.
>>
>> --
>> tejun
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-block" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
> --
> Paolo Valente
> Algogroup
> Dipartimento di Scienze Fisiche, Informatiche e Matematiche
> Via Campi 213/B
> 41125 Modena - Italy
> http://algogroup.unimore.it/people/paolo/
>
>
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[TRIVIAL PATCH] nvme : trivial whitespace fix in pci.c

2016-10-17 Thread Stephen Bates
Remove annoying white space damage in pci.c.

Signed-off-by: Stephen Bates 
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a7c6e9d..e2b3243 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -50,7 +50,7 @@
 #define NVME_AQ_DEPTH  256
 #define SQ_SIZE(depth) (depth * sizeof(struct nvme_command))
 #define CQ_SIZE(depth) (depth * sizeof(struct nvme_completion))
-
+
 /*
  * We handle AEN commands ourselves and don't even let the
  * block layer know about them.
--
2.1.4
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v6 2/3] ata: Enabling ATA Command Priorities

2016-10-17 Thread Adam Manzanares
This patch checks to see if an ATA device supports NCQ command priorities.
If so and the user has specified an iocontext that indicates
IO_PRIO_CLASS_RT then we build a tf with a high priority command.

This is done to improve the tail latency of commands that are high
priority by passing priority to the device.

Signed-off-by: Adam Manzanares 
---
 drivers/ata/libata-core.c | 35 ++-
 drivers/ata/libata-scsi.c |  6 +-
 drivers/ata/libata.h  |  2 +-
 include/linux/ata.h   |  6 ++
 include/linux/libata.h| 18 ++
 5 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 223a770..181b530 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -739,6 +739,7 @@ u64 ata_tf_read_block(const struct ata_taskfile *tf, struct 
ata_device *dev)
  * @n_block: Number of blocks
  * @tf_flags: RW/FUA etc...
  * @tag: tag
+ * @class: IO priority class
  *
  * LOCKING:
  * None.
@@ -753,7 +754,7 @@ u64 ata_tf_read_block(const struct ata_taskfile *tf, struct 
ata_device *dev)
  */
 int ata_build_rw_tf(struct ata_taskfile *tf, struct ata_device *dev,
u64 block, u32 n_block, unsigned int tf_flags,
-   unsigned int tag)
+   unsigned int tag, int class)
 {
tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE;
tf->flags |= tf_flags;
@@ -785,6 +786,12 @@ int ata_build_rw_tf(struct ata_taskfile *tf, struct 
ata_device *dev,
tf->device = ATA_LBA;
if (tf->flags & ATA_TFLAG_FUA)
tf->device |= 1 << 7;
+
+   if (ata_ncq_prio_enabled(dev)) {
+   if (class == IOPRIO_CLASS_RT)
+   tf->hob_nsect |= ATA_PRIO_HIGH <<
+ATA_SHIFT_PRIO;
+   }
} else if (dev->flags & ATA_DFLAG_LBA) {
tf->flags |= ATA_TFLAG_LBA;
 
@@ -2156,6 +2163,30 @@ static void ata_dev_config_ncq_non_data(struct 
ata_device *dev)
}
 }
 
+static void ata_dev_config_ncq_prio(struct ata_device *dev)
+{
+   struct ata_port *ap = dev->link->ap;
+   unsigned int err_mask;
+
+   err_mask = ata_read_log_page(dev,
+ATA_LOG_SATA_ID_DEV_DATA,
+ATA_LOG_SATA_SETTINGS,
+ap->sector_buf,
+1);
+   if (err_mask) {
+   ata_dev_dbg(dev,
+   "failed to get Identify Device data, Emask 0x%x\n",
+   err_mask);
+   return;
+   }
+
+   if (ap->sector_buf[ATA_LOG_NCQ_PRIO_OFFSET] & BIT(3))
+   dev->flags |= ATA_DFLAG_NCQ_PRIO;
+   else
+   ata_dev_dbg(dev, "SATA page does not support priority\n");
+
+}
+
 static int ata_dev_config_ncq(struct ata_device *dev,
   char *desc, size_t desc_sz)
 {
@@ -2205,6 +2236,8 @@ static int ata_dev_config_ncq(struct ata_device *dev,
ata_dev_config_ncq_send_recv(dev);
if (ata_id_has_ncq_non_data(dev->id))
ata_dev_config_ncq_non_data(dev);
+   if (ata_id_has_ncq_prio(dev->id))
+   ata_dev_config_ncq_prio(dev);
}
 
return 0;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 9cceb4a..2bccc3c 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "libata.h"
 #include "libata-transport.h"
@@ -1755,6 +1756,8 @@ static unsigned int ata_scsi_rw_xlat(struct 
ata_queued_cmd *qc)
 {
struct scsi_cmnd *scmd = qc->scsicmd;
const u8 *cdb = scmd->cmnd;
+   struct request *rq = scmd->request;
+   int class = IOPRIO_PRIO_CLASS(req_get_ioprio(rq));
unsigned int tf_flags = 0;
u64 block;
u32 n_block;
@@ -1821,7 +1824,8 @@ static unsigned int ata_scsi_rw_xlat(struct 
ata_queued_cmd *qc)
qc->nbytes = n_block * scmd->device->sector_size;
 
rc = ata_build_rw_tf(>tf, qc->dev, block, n_block, tf_flags,
-qc->tag);
+qc->tag, class);
+
if (likely(rc == 0))
return 0;
 
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 3b301a4..8f3a559 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -66,7 +66,7 @@ extern u64 ata_tf_to_lba48(const struct ata_taskfile *tf);
 extern struct ata_queued_cmd *ata_qc_new_init(struct ata_device *dev, int tag);
 extern int ata_build_rw_tf(struct ata_taskfile *tf, struct ata_device *dev,
   u64 block, u32 n_block, unsigned int tf_flags,
-  unsigned int tag);
+  

[PATCH v6 1/3] block: Add iocontext priority to request

2016-10-17 Thread Adam Manzanares
Patch adds an association between iocontext ioprio and the ioprio of a
request. This is done to enable request based drivers the ability to
act on priority information stored in the request. An example being
ATA devices that support command priorities. If the ATA driver discovers
that the device supports command priorities and the request has valid
priority information indicating the request is high priority, then a high
priority command can be sent to the device. This should improve tail
latencies for high priority IO on any device that queues requests
internally and can make use of the priority information stored in the
request.

The ioprio of the request is set in blk_rq_set_prio which takes the
request and the ioc as arguments. If the ioc is valid in blk_rq_set_prio
then the iopriority of the request is set as the iopriority of the ioc.
In init_request_from_bio a check is made to see if the ioprio of the bio
is valid and if so then the request prio comes from the bio.

Signed-off-by: Adam Manzananares 
---
 block/blk-core.c   |  4 +++-
 include/linux/blkdev.h | 14 ++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 14d7c07..361b1b9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1153,6 +1153,7 @@ static struct request *__get_request(struct request_list 
*rl, int op,
 
blk_rq_init(q, rq);
blk_rq_set_rl(rq, rl);
+   blk_rq_set_prio(rq, ioc);
req_set_op_attrs(rq, op, op_flags | REQ_ALLOCED);
 
/* init elvpriv */
@@ -1656,7 +1657,8 @@ void init_request_from_bio(struct request *req, struct 
bio *bio)
 
req->errors = 0;
req->__sector = bio->bi_iter.bi_sector;
-   req->ioprio = bio_prio(bio);
+   if (ioprio_valid(bio_prio(bio)))
+   req->ioprio = bio_prio(bio);
blk_rq_bio_prep(req->q, req, bio);
 }
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..9a0ceaa 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -934,6 +934,20 @@ static inline unsigned int blk_rq_count_bios(struct 
request *rq)
 }
 
 /*
+ * blk_rq_set_prio - associate a request with prio from ioc
+ * @rq: request of interest
+ * @ioc: target iocontext
+ *
+ * Assocate request prio with ioc prio so request based drivers
+ * can leverage priority information.
+ */
+static inline void blk_rq_set_prio(struct request *rq, struct io_context *ioc)
+{
+   if (ioc)
+   rq->ioprio = ioc->ioprio;
+}
+
+/*
  * Request issue related functions.
  */
 extern struct request *blk_peek_request(struct request_queue *q);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] smartpqi: switch to pci_alloc_irq_vectors

2016-10-17 Thread Christoph Hellwig
Hi Don,

did you also have a chance to test the patch and verify that the
queues are properly set up with a smartpqi controller?
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 3/3] smartpqi: switch to pci_alloc_irq_vectors

2016-10-17 Thread Don Brace

> -Original Message-
> From: Johannes Thumshirn [mailto:jthumsh...@suse.de]
> Sent: Monday, October 17, 2016 2:34 AM
> To: Christoph Hellwig
> Cc: martin.peter...@oracle.com; Don Brace; ax...@kernel.dk; linux-
> s...@vger.kernel.org; linux-block@vger.kernel.org
> Subject: Re: [PATCH 3/3] smartpqi: switch to pci_alloc_irq_vectors
> 
> EXTERNAL EMAIL
> 
> 
> On Sat, Oct 15, 2016 at 10:47:21AM +0200, Christoph Hellwig wrote:
> > Which cleans up a lot of the MSI-X handling, and allows us to use the
> > PCI IRQ layer provided vector mapping, which we can then expose to blk-
> mq.
> >
> > Signed-off-by: Christoph Hellwig 
> > ---
> 
> Reviewed-by: Johannes Thumshirn 
> --
> Johannes Thumshirn  Storage
> jthumsh...@suse.de+49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Acked-by: Don Brace 
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] scsi: allow LLDDs to expose the queue mapping to blk-mq

2016-10-17 Thread Christoph Hellwig
On Mon, Oct 17, 2016 at 09:27:03AM +0200, Johannes Thumshirn wrote:
> Shouldn't this hunk go into the previous patch?

Yes, it should, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] blk-mq: export blk_mq_map_queues

2016-10-17 Thread Johannes Thumshirn
On Sat, Oct 15, 2016 at 10:47:19AM +0200, Christoph Hellwig wrote:
> This will allow SCSI to have a single blk_mq_ops structure that either
> lets the LLDD map the queues to PCIe MSIx vectors or use the default.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/blk-mq.h | 1 -
>  include/linux/blk-mq.h | 1 +
>  2 files changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index e5d2524..5347f01 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -38,7 +38,6 @@ void blk_mq_disable_hotplug(void);
>  /*
>   * CPU -> queue mappings
>   */
> -int blk_mq_map_queues(struct blk_mq_tag_set *set);
>  extern int blk_mq_hw_queue_to_node(unsigned int *map, unsigned int);
>  
>  static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 535ab2e..6c0fb25 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -237,6 +237,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q);
>  void blk_mq_freeze_queue_start(struct request_queue *q);
>  int blk_mq_reinit_tagset(struct blk_mq_tag_set *set);
>  
> +int blk_mq_map_queues(struct blk_mq_tag_set *set);
>  void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int 
> nr_hw_queues);
>  
>  /*
> -- 
> 2.1.4

See comment on patch 2, otherwise
Reviewed-by: Johannes Thumshirn 

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html