Re: [PATCH V3 00/11] block-throttle: add .high limit
Not to compound upon this again. However, if BFQ isn't suitable to replace CFQ for high I/O workloads (I've yet to see 20k IOPS on any reasonably sized SAN (SC4020 / v5000, etc.)), can't we at least make BFQ the default I/O scheduler for people otherwise requesting CFQ? Paolo has had a team of students working on this for years; even if the otherwise "secret weapon" is mainlined, I highly doubt his work will stop. We're pretty close to fixing hard I/O stalls in Linux, mainlining being the last major burden. While I've contributed nothing to BFQ code-wise, absolutely let any of us know if there's anything outstanding to solve hard lockups, and I believe any of us will try our best.

Kyle.

On Sun, Oct 16, 2016 at 12:02 PM, Paolo Valente wrote:
>
>> Il giorno 14 ott 2016, alle ore 20:35, Tejun Heo ha scritto:
>>
>> Hello, Paolo.
>>
>> On Fri, Oct 14, 2016 at 07:13:41PM +0200, Paolo Valente wrote:
>>> That said, your 'thus' seems a little too strong: "bfq does not yet
>>> handle fast SSDs, thus we need something else". What about the
>>> millions of devices (and people) still within 10-20 K IOPS, and
>>> experiencing awful latencies and lack of bandwidth guarantees?
>>
>> I'm not objecting to any of that.
>
> Ok, sorry for misunderstanding. I'm just more and more confused about
> why a readily available, and not proven wrong, solution has not yet
> been accepted, if everybody apparently acknowledges the problem.
>
>> My point just is that bfq, at least
>> as currently implemented, is unfit for certain classes of use cases.
>
> Absolutely correct.
>
>>>> FWIW, it looks like the only way we can implement proportional
>>>> control on highspeed ssds with acceptable overhead
>>>
>>> Maybe not: as I wrote to Viveck in a previous reply, containing
>>> pointers to documentation, we have already achieved twenty million
>>> decisions per second with a prototype driving existing
>>> proportional-share packet schedulers (essentially without
>>> modifications).
>>
>> And that doesn't require idling and thus doesn't severely impact
>> utilization?
>
> Nope. Packets are commonly assumed to be sent asynchronously.
> I guess that discussing the validity of this assumption is out of the
> scope of this thread.
>
> Thanks,
> Paolo
>
>>>> is somehow finding a way to calculate the cost of each IO and
>>>> throttle IOs according to that while controlling for latency as
>>>> necessary. Slice scheduling with idling seems too expensive with
>>>> highspeed devices with high io depth.
>>>
>>> Yes, that's absolutely true. I'm already thinking about an idleless
>>> solution. As I already wrote, I'm willing to help with scheduling in
>>> blk-mq. I hope there will be the opportunity to find some way to go
>>> at KS.
>>
>> It'd be great to have a proportional control mechanism whose overhead
>> is acceptable. Unfortunately, we don't have one now and nothing seems
>> right around the corner. (Mostly) work-conserving throttling would be
>> fiddlier to use but is something which is useful regardless of such a
>> proportional control mechanism and can be obtained relatively easily.
>>
>> I don't see why the two approaches would be mutually exclusive.
>>
>> Thanks.
>>
>> --
>> tejun
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-block" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> Paolo Valente
> Algogroup
> Dipartimento di Scienze Fisiche, Informatiche e Matematiche
> Via Campi 213/B
> 41125 Modena - Italy
> http://algogroup.unimore.it/people/paolo/
[TRIVIAL PATCH] nvme: trivial whitespace fix in pci.c
Remove annoying white space damage in pci.c.

Signed-off-by: Stephen Bates
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a7c6e9d..e2b3243 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -50,7 +50,7 @@
 #define NVME_AQ_DEPTH		256
 #define SQ_SIZE(depth)		(depth * sizeof(struct nvme_command))
 #define CQ_SIZE(depth)		(depth * sizeof(struct nvme_completion))
-	
+
 /*
  * We handle AEN commands ourselves and don't even let the
  * block layer know about them.
-- 
2.1.4
[PATCH v6 2/3] ata: Enabling ATA Command Priorities
This patch checks whether an ATA device supports NCQ command priorities. If so, and the user has specified an iocontext that indicates IOPRIO_CLASS_RT, then we build a tf with a high-priority command. This is done to improve the tail latency of high-priority commands by passing the priority on to the device.

Signed-off-by: Adam Manzanares
---
 drivers/ata/libata-core.c | 35 ++-
 drivers/ata/libata-scsi.c |  6 +-
 drivers/ata/libata.h      |  2 +-
 include/linux/ata.h       |  6 ++
 include/linux/libata.h    | 18 ++
 5 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 223a770..181b530 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -739,6 +739,7 @@ u64 ata_tf_read_block(const struct ata_taskfile *tf, struct ata_device *dev)
  *	@n_block: Number of blocks
  *	@tf_flags: RW/FUA etc...
  *	@tag: tag
+ *	@class: IO priority class
  *
  *	LOCKING:
  *	None.
@@ -753,7 +754,7 @@ u64 ata_tf_read_block(const struct ata_taskfile *tf, struct ata_device *dev)
  */
 int ata_build_rw_tf(struct ata_taskfile *tf, struct ata_device *dev,
 		    u64 block, u32 n_block, unsigned int tf_flags,
-		    unsigned int tag)
+		    unsigned int tag, int class)
 {
 	tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE;
 	tf->flags |= tf_flags;
@@ -785,6 +786,12 @@ int ata_build_rw_tf(struct ata_taskfile *tf, struct ata_device *dev,
 			tf->device = ATA_LBA;
 			if (tf->flags & ATA_TFLAG_FUA)
 				tf->device |= 1 << 7;
+
+			if (ata_ncq_prio_enabled(dev)) {
+				if (class == IOPRIO_CLASS_RT)
+					tf->hob_nsect |= ATA_PRIO_HIGH <<
+							 ATA_SHIFT_PRIO;
+			}
 		} else if (dev->flags & ATA_DFLAG_LBA) {
 			tf->flags |= ATA_TFLAG_LBA;
@@ -2156,6 +2163,30 @@ static void ata_dev_config_ncq_non_data(struct ata_device *dev)
 	}
 }
 
+static void ata_dev_config_ncq_prio(struct ata_device *dev)
+{
+	struct ata_port *ap = dev->link->ap;
+	unsigned int err_mask;
+
+	err_mask = ata_read_log_page(dev,
+				     ATA_LOG_SATA_ID_DEV_DATA,
+				     ATA_LOG_SATA_SETTINGS,
+				     ap->sector_buf,
+				     1);
+	if (err_mask) {
+		ata_dev_dbg(dev,
+			    "failed to get Identify Device data, Emask 0x%x\n",
+			    err_mask);
+		return;
+	}
+
+	if (ap->sector_buf[ATA_LOG_NCQ_PRIO_OFFSET] & BIT(3))
+		dev->flags |= ATA_DFLAG_NCQ_PRIO;
+	else
+		ata_dev_dbg(dev, "SATA page does not support priority\n");
+
+}
+
 static int ata_dev_config_ncq(struct ata_device *dev,
 			       char *desc, size_t desc_sz)
 {
@@ -2205,6 +2236,8 @@ static int ata_dev_config_ncq(struct ata_device *dev,
 			ata_dev_config_ncq_send_recv(dev);
 		if (ata_id_has_ncq_non_data(dev->id))
 			ata_dev_config_ncq_non_data(dev);
+		if (ata_id_has_ncq_prio(dev->id))
+			ata_dev_config_ncq_prio(dev);
 	}
 
 	return 0;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 9cceb4a..2bccc3c 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -50,6 +50,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <linux/ioprio.h>
 
 #include "libata.h"
 #include "libata-transport.h"
@@ -1755,6 +1756,8 @@ static unsigned int ata_scsi_rw_xlat(struct ata_queued_cmd *qc)
 {
 	struct scsi_cmnd *scmd = qc->scsicmd;
 	const u8 *cdb = scmd->cmnd;
+	struct request *rq = scmd->request;
+	int class = IOPRIO_PRIO_CLASS(req_get_ioprio(rq));
 	unsigned int tf_flags = 0;
 	u64 block;
 	u32 n_block;
@@ -1821,7 +1824,8 @@ static unsigned int ata_scsi_rw_xlat(struct ata_queued_cmd *qc)
 	qc->nbytes = n_block * scmd->device->sector_size;
 
 	rc = ata_build_rw_tf(&qc->tf, qc->dev, block, n_block, tf_flags,
-			     qc->tag);
+			     qc->tag, class);
+
 	if (likely(rc == 0))
 		return 0;
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 3b301a4..8f3a559 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -66,7 +66,7 @@ extern u64 ata_tf_to_lba48(const struct ata_taskfile *tf);
 extern struct ata_queued_cmd *ata_qc_new_init(struct ata_device *dev, int tag);
 extern int ata_build_rw_tf(struct ata_taskfile *tf, struct ata_device *dev,
 			   u64 block, u32 n_block, unsigned int tf_flags,
-			   unsigned int tag);
+
[PATCH v6 1/3] block: Add iocontext priority to request
Patch adds an association between iocontext ioprio and the ioprio of a request. This is done to give request-based drivers the ability to act on priority information stored in the request. An example is ATA devices that support command priorities: if the ATA driver discovers that the device supports command priorities and the request has valid priority information indicating that the request is high priority, then a high-priority command can be sent to the device. This should improve tail latencies for high-priority IO on any device that queues requests internally and can make use of the priority information stored in the request.

The ioprio of the request is set in blk_rq_set_prio, which takes the request and the ioc as arguments. If the ioc is valid in blk_rq_set_prio, then the iopriority of the request is set as the iopriority of the ioc. In init_request_from_bio a check is made to see if the ioprio of the bio is valid, and if so then the request prio comes from the bio.

Signed-off-by: Adam Manzanares
---
 block/blk-core.c       |  4 +++-
 include/linux/blkdev.h | 14 ++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 14d7c07..361b1b9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1153,6 +1153,7 @@ static struct request *__get_request(struct request_list *rl, int op,
 
 	blk_rq_init(q, rq);
 	blk_rq_set_rl(rq, rl);
+	blk_rq_set_prio(rq, ioc);
 	req_set_op_attrs(rq, op, op_flags | REQ_ALLOCED);
 
 	/* init elvpriv */
@@ -1656,7 +1657,8 @@ void init_request_from_bio(struct request *req, struct bio *bio)
 
 	req->errors = 0;
 	req->__sector = bio->bi_iter.bi_sector;
-	req->ioprio = bio_prio(bio);
+	if (ioprio_valid(bio_prio(bio)))
+		req->ioprio = bio_prio(bio);
 	blk_rq_bio_prep(req->q, req, bio);
 }
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..9a0ceaa 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -934,6 +934,20 @@ static inline unsigned int blk_rq_count_bios(struct request *rq)
 }
 
 /*
+ * blk_rq_set_prio - associate a request with prio from ioc
+ * @rq: request of interest
+ * @ioc: target iocontext
+ *
+ * Associate request prio with ioc prio so request-based drivers
+ * can leverage priority information.
+ */
+static inline void blk_rq_set_prio(struct request *rq, struct io_context *ioc)
+{
+	if (ioc)
+		rq->ioprio = ioc->ioprio;
+}
+
+/*
  * Request issue related functions.
  */
 extern struct request *blk_peek_request(struct request_queue *q);
-- 
2.1.4
Re: [PATCH 3/3] smartpqi: switch to pci_alloc_irq_vectors
Hi Don,

did you also have a chance to test the patch and verify that the queues are properly set up with a smartpqi controller?
RE: [PATCH 3/3] smartpqi: switch to pci_alloc_irq_vectors
> -----Original Message-----
> From: Johannes Thumshirn [mailto:jthumsh...@suse.de]
> Sent: Monday, October 17, 2016 2:34 AM
> To: Christoph Hellwig
> Cc: martin.peter...@oracle.com; Don Brace; ax...@kernel.dk; linux-
> s...@vger.kernel.org; linux-block@vger.kernel.org
> Subject: Re: [PATCH 3/3] smartpqi: switch to pci_alloc_irq_vectors
>
> EXTERNAL EMAIL
>
> On Sat, Oct 15, 2016 at 10:47:21AM +0200, Christoph Hellwig wrote:
> > Which cleans up a lot of the MSI-X handling, and allows us to use the
> > PCI IRQ layer provided vector mapping, which we can then expose to blk-mq.
> >
> > Signed-off-by: Christoph Hellwig
> > ---
>
> Reviewed-by: Johannes Thumshirn
>
> --
> Johannes Thumshirn                                          Storage
> jthumsh...@suse.de                                +49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Acked-by: Don Brace
Re: [PATCH 2/3] scsi: allow LLDDs to expose the queue mapping to blk-mq
On Mon, Oct 17, 2016 at 09:27:03AM +0200, Johannes Thumshirn wrote:
> Shouldn't this hunk go into the previous patch?

Yes, it should, thanks.
Re: [PATCH 1/3] blk-mq: export blk_mq_map_queues
On Sat, Oct 15, 2016 at 10:47:19AM +0200, Christoph Hellwig wrote:
> This will allow SCSI to have a single blk_mq_ops structure that either
> lets the LLDD map the queues to PCIe MSIx vectors or use the default.
>
> Signed-off-by: Christoph Hellwig
> ---
>  block/blk-mq.h         | 1 -
>  include/linux/blk-mq.h | 1 +
>  2 files changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index e5d2524..5347f01 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -38,7 +38,6 @@ void blk_mq_disable_hotplug(void);
>  /*
>   * CPU -> queue mappings
>   */
> -int blk_mq_map_queues(struct blk_mq_tag_set *set);
>  extern int blk_mq_hw_queue_to_node(unsigned int *map, unsigned int);
>
>  static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 535ab2e..6c0fb25 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -237,6 +237,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q);
>  void blk_mq_freeze_queue_start(struct request_queue *q);
>  int blk_mq_reinit_tagset(struct blk_mq_tag_set *set);
>
> +int blk_mq_map_queues(struct blk_mq_tag_set *set);
>  void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues);
>
>  /*
> --
> 2.1.4

See comment on patch 2, otherwise
Reviewed-by: Johannes Thumshirn

-- 
Johannes Thumshirn                                          Storage
jthumsh...@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850