[PATCH] scsi: vmw-pvscsi: return DID_BUS_BUSY for adapter-initated aborts
The vmw_pvscsi driver returns DID_ABORT for commands aborted internally by the adapter, leading to the filesystem going read-only. Change the result to DID_BUS_BUSY, causing the kernel to retry the command. Signed-off-by: Jim Gill --- drivers/scsi/vmw_pvscsi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/vmw_pvscsi.c b/drivers/scsi/vmw_pvscsi.c index c374e3b..777e5f1 100644 --- a/drivers/scsi/vmw_pvscsi.c +++ b/drivers/scsi/vmw_pvscsi.c @@ -609,7 +609,7 @@ static void pvscsi_complete_request(struct pvscsi_adapter *adapter, break; case BTSTAT_ABORTQUEUE: - cmd->result = (DID_ABORT << 16); + cmd->result = (DID_BUS_BUSY << 16); break; case BTSTAT_SCSIPARITY: -- 2.7.4
[PATCH] scsi: vmw-pvscsi: return DID_BUS_BUSY for adapter-initated aborts
The vmw_pvscsi driver returns DID_ABORT for commands aborted internally by the adapter, leading to the filesystem going read-only. Change the result to DID_BUS_BUSY, causing the kernel to retry the command. Signed-off-by: Jim Gill --- drivers/scsi/vmw_pvscsi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/vmw_pvscsi.c b/drivers/scsi/vmw_pvscsi.c index c374e3b..777e5f1 100644 --- a/drivers/scsi/vmw_pvscsi.c +++ b/drivers/scsi/vmw_pvscsi.c @@ -609,7 +609,7 @@ static void pvscsi_complete_request(struct pvscsi_adapter *adapter, break; case BTSTAT_ABORTQUEUE: - cmd->result = (DID_ABORT << 16); + cmd->result = (DID_BUS_BUSY << 16); break; case BTSTAT_SCSIPARITY: -- 2.7.4
Re: [VERY EARLY RFC 00/13] Rework SCSI results handling
Johannes, > here's a early preview of my SCSI results rework so we can eventually > discuss things next week at LSF/MM (it still has compiler errors on > aic7xxx and scsi_debug). Here's what I'd like to see: - Forget about cramming everything into that dreaded u32 result. Bart's suggestion of making it a union is fine but it's perpetuating that these fields are one thing. And they really aren't. I'd rather not be confined to the crusty old model throughout SCSI. We can synthesize the u32 result the few places where it's actually required (i.e. returned to userland). - Create separate scsi_cmnd status fields for host, driver, target (SAM), and transport (msg). And maybe block layer errno as suggested. - Fix the naming so it makes sense. No more DID_SPOT_RUN. Let's have HOST_FOO / DRIVER_BAR / SAM_STAT_BAZ so it's clear who said what. - For the first iteration, maybe we can keep the message goo. But ideally I'd like to see the SPI-specifics moved to the SPI transport class. And then have an opaque transport error field in the scsi_cmnd that other transports can use as well. - If the interface naming is sensible, I don't think we need wrappers. But if it helps the driver conversion, I don't have a problem having them as long as they are self-documenting: scsi_set_command_status(HOST_FOO, DRIVER_SENSE, SAM_STAT_BAZ); is infinitely less error prone than: result = (NON_SENSE << 24) | (DID_RUN << 16) | (SPOT_FORGOT_TO_SHIFT); I'll set aside a slot next week... -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] [v2] scsi: ips: fix firmware timestamps for 32-bit
Arnd, > do_gettimeofday() is deprecated since it will stop working in 2038 on > 32-bit platforms, leading to incorrect times passed to the firmware. > On 64-bit platforms the current code appears to be fine, as the > calculation passes an 8-bit century number into the firmware that can > represent times long in the future (possibly until 25599). > > Using ktime_get_real_seconds() to get a 64-bit seconds value and > time64_to_tm() to convert it into the firmware format greatly > simplifies the ips timekeeping code, makes 32-bit and 64-bit behave > the same way here, and gets us closer to removing the deprecated > interfaces. Applied to 4.18/scsi-queue, thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] [RESEND 2] scsi: esas2r: use ktime_get_real_seconds()
Arnd, > do_gettimeofday() is deprecated because of the y2038 overflow. Here, > we use the result to pass into a 32-bit field in the firmware, which > still risks an overflow, but if the firmware is written to expect > unsigned values, it can at least last until y2106, and there is not > much we can do about it. > > This changes do_gettimeofday() to ktime_get_real_seconds(), which at > least simplifies the code a bit, and avoids the deprecated > interface. I'm adding a comment about the overflow to document what > happens. Applied to 4.18/scsi-queue. Thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH scsi-next] scsi: mvumi: Using module_pci_driver.
YueHaibing, > Remove boilerplate code by using macro module_pci_driver. Applied to 4.18/scsi-queue. Thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH][V2] isci: Fix infinite loop in while loop
Colin, > In the case when the phy_mask is bitwise anded with the phy_index bit > is zero the continue statement currently jumps to the next iteration > of the while loop and phy_index is never actually incremented, > potentially causing an infinite loop if phy_index is less than > SCI_MAX_PHS. Fix this by turning the while loop into a for loop. Applied to 4.17/scsi-fixes. Thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v3 0/6] scsi: handle special return codes for ABORTED COMMAND
Martin, > Here is another attempt to handle the special return codes for ABORTED > COMMAND for certain SCSI devices. Following MKP's recommendation, I've > created two new BLIST flags, simplifying the code in scsi_error.c > compared to the previous versions. Rather than using "free" bits, I > increased the length of blist_flag_t to 64 bit, and used previously > unused bits. I also added checking for obsolete and unused bits. > > For the blist_flag_t size increase, I used sparse to try and avoid > regressions; that necessitated fixing sparse's errors for the current > code first. Much better, thanks for reworking this. Applied to 4.18/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v2] target: Fix Fortify_panic kernel exception
Bryant, > The bug exists in the memcmp in which the length passed in must be > guaranteed to be 1. This bug currently exists because the second > pointer passed in, can be smaller than the cmd->data_length, which > causes a fortify_panic. > > The fix is to use memchr_inv instead to find whether or not a 0 exists > instead of using memcmp. This way you dont have to worry about buffer > overflow which is the reason for the fortify_panic. Clarified the commit description a bit and applied the patch 4.17/scsi-fixes. Thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: target: clean up kernel-doc and add driver-api document
Randy, > This patch series fixes kernel-doc warnings in drivers/target/ and its > header files, then adds a Documentation driver-api chapter for target > driver interfaces. Applied to 4.18/scsi-queue. Thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: Representative Needed.
Good day, I am seeking your concept with great gratitude to present you as a representative to carry out business transactions with a reasonable share upon your interest and cooperation to work with us in trust. If interested please get back. Regards Kingsley --- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus
Re: [PATCH] bsg referencing bus driver module
> This patch isn't applyable because your mailer has changed all the tabs > to spaces. > > I also think there's no need to do it this way. I think what we need > is for fc_bsg_remove() to wait until the bsg queue is drained. It does > look like the author thought this happened otherwise the code wouldn't > have the note. If we fix it that way we can do the same thing in all > the other transport classes that use bsg (which all have a similar > issue). > > James > Thanks, James. Sorry about the tabs; re-sending. On fc_bsg_remove()...: are you suggesting to implement the whole fix in scsi_transport_fc.c? That would be nice, but I do not see how that is possible. Even with the queue drained bsg still holds a reference to the Scsi_Host via bsg_class_device; bsg_class_device itself is referenced on bsg_open and kept around while a user-mode process keeps a handle to bsg. Even if we somehow implement the waiting the call may be stuck forever if the user-mode process keeps the handle. I think handling it via a rererence to the module is more consistent with the way things are done in Linux. You suggested the approach youself back in "Waiting for scsi_host_template release" discussion. >From df939b80d02bf37b21efaaef8ede86cfd39b0cb8 Mon Sep 17 00:00:00 2001 From: Anatoliy Glagolev Date: Thu, 19 Apr 2018 15:06:06 -0600 Subject: [PATCH] bsg referencing parent module Fixing a bug when bsg layer holds the last reference to device when the device's module has been unloaded. Upon dropping the reference the device's release function may touch memory of the unloaded module. --- block/bsg-lib.c | 24 ++-- block/bsg.c | 29 ++--- drivers/scsi/scsi_transport_fc.c | 8 ++-- include/linux/bsg-lib.h | 4 include/linux/bsg.h | 5 + 5 files changed, 63 insertions(+), 7 deletions(-) diff --git a/block/bsg-lib.c b/block/bsg-lib.c index fc2e5ff..90c28fd 100644 --- a/block/bsg-lib.c +++ b/block/bsg-lib.c @@ -309,6 +309,25 @@ struct request_queue *bsg_setup_queue(struct device *dev, const char *name, bsg_job_fn *job_fn, int dd_job_size, void (*release)(struct device *)) { + return bsg_setup_queue_ex(dev, name, job_fn, dd_job_size, release, + NULL); +} +EXPORT_SYMBOL_GPL(bsg_setup_queue); + +/** + * bsg_setup_queue - Create and add the bsg hooks so we can receive requests + * @dev: device to attach bsg device to + * @name: device to give bsg device + * @job_fn: bsg job handler + * @dd_job_size: size of LLD data needed for each job + * @release: @dev release function + * @dev_module: @dev's module + */ +struct request_queue *bsg_setup_queue_ex(struct device *dev, const char *name, + bsg_job_fn *job_fn, int dd_job_size, + void (*release)(struct device *), + struct module *dev_module) +{ struct request_queue *q; int ret; @@ -331,7 +350,8 @@ struct request_queue *bsg_setup_queue(struct device *dev, const char *name, blk_queue_softirq_done(q, bsg_softirq_done); blk_queue_rq_timeout(q, BLK_DEFAULT_SG_TIMEOUT); - ret = bsg_register_queue(q, dev, name, &bsg_transport_ops, release); + ret = bsg_register_queue_ex(q, dev, name, &bsg_transport_ops, release, + dev_module); if (ret) { printk(KERN_ERR "%s: bsg interface failed to " "initialize - register queue\n", dev->kobj.name); @@ -343,4 +363,4 @@ struct request_queue *bsg_setup_queue(struct device *dev, const char *name, blk_cleanup_queue(q); return ERR_PTR(ret); } -EXPORT_SYMBOL_GPL(bsg_setup_queue); +EXPORT_SYMBOL_GPL(bsg_setup_queue_ex); diff --git a/block/bsg.c b/block/bsg.c index defa06c..6920c5b 100644 --- a/block/bsg.c +++ b/block/bsg.c @@ -750,7 +750,8 @@ static struct bsg_device *__bsg_get_device(int minor, struct request_queue *q) return bd; } -static struct bsg_device *bsg_get_device(struct inode *inode, struct file *file) +static struct bsg_device *bsg_get_device(struct inode *inode, struct file *file, + struct bsg_class_device **pbcd) { struct bsg_device *bd; struct bsg_class_device *bcd; @@ -766,6 +767,7 @@ static struct bsg_device *bsg_get_device(struct inode *inode, struct file *file) if (!bcd) return ERR_PTR(-ENODEV); + *pbcd = bcd; bd = __bsg_get_device(iminor(inode), bcd->queue); if (bd) @@ -781,22 +783,34 @@ static struct bsg_device *bsg_get_device(struct inode *inode, struct file *file) static int bsg_open(struct inode *inode, struct file *file) { struct bsg_device *bd; + struct bsg_class_device *bcd; - bd = bsg_get_device(inode, file); + bd = bsg_get_device(inode, file, &bcd); if (IS_ERR(bd)) return PTR_ERR(bd); file->private_data = bd; + if (bcd->parent_module) {
Re: [PATCH 1/2] scsi: st: Replace GFP_ATOMIC with GFP_KERNEL in st_probe
Jia-Ju, > st_probe() is never called in atomic context. Applied patches 1 and 2 to 4.18/scsi-queue. Thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v2 01/14] mpt3sas: Bug fix for big endian systems.
Chaitra, A few comments: > @@ -426,7 +427,7 @@ static void _clone_sg_entries(struct MPT3SAS_ADAPTER *ioc, > dst_addr_phys = _base_get_chain_phys(ioc, > smid, sge_chain_count); > WARN_ON(dst_addr_phys > U32_MAX); > - sgel->Address = (u32)dst_addr_phys; > + sgel->Address = cpu_to_le32((u32)dst_addr_phys); I tend to prefer lower_32_bits() but that's your choice. > @@ -3040,8 +3047,9 @@ mpt3sas_base_map_resources(struct MPT3SAS_ADAPTER *ioc) > } > > for (i = 0; i < ioc->combined_reply_index_count; i++) { > - ioc->replyPostRegisterIndex[i] = (resource_size_t *) > - ((u8 *)&ioc->chip->Doorbell + > + ioc->replyPostRegisterIndex[i] = > + (volatile void __iomem *) > + ((u8 __force *)&ioc->chip->Doorbell + >MPI25_SUP_REPLY_POST_HOST_INDEX_OFFSET + >(i * MPT3_SUP_REPLY_POST_HOST_INDEX_REG_OFFSET)); Do you really need volatile here? The existing resource_size_t didn't imply volatile. Why the double type casts? You've already changed replyPostRegisterIndex to be 'volatile void __iomem **' in the header file. So why not: ioc->replyPostRegisterIndex[i] = &ioc->chip->Doorbell + MPI25_SUP_REPLY_POST_HOST_INDEX_OFFSET + i * MPT3_SUP_REPLY_POST_HOST_INDEX_REG_OFFSET; Also looks like ioc->reply_post_host_index handling a few lines further down could lose the type casts. > @@ -3386,7 +3394,7 @@ _base_put_smid_mpi_ep_scsi_io(struct MPT3SAS_ADAPTER > *ioc, u16 smid, u16 handle) > __le32 *mfp = (__le32 *)mpt3sas_base_get_msg_frame(ioc, smid); > > _clone_sg_entries(ioc, (void *) mfp, smid); > - mpi_req_iomem = (void *)ioc->chip + > + mpi_req_iomem = (void __force *)ioc->chip + > MPI_FRAME_START_OFFSET + (smid * ioc->request_sz); > _base_clone_mpi_to_sys_mem(mpi_req_iomem, (void *)mfp, > ioc->request_sz); Wouldn't it be better to add __iomem to the definition of mpi_req_iomem? > + nvme_encap_request->ErrorResponseBaseAddress = > + cpu_to_le64(ioc->sense_dma & 0xUL); upper_32_bits()? -- Martin K. Petersen Oracle Linux Engineering
Re: usercopy whitelist woe in scsi_sense_cache
Hi. On 20.04.2018 22:23, Kees Cook wrote: I don't know the "how", I only found the "what". :) If you want, grab the reproducer VM linked to earlier in this thread; it'll hit the problem within about 30 seconds of running the reproducer. Just to avoid a possible confusion I should note that I've removed the reproducer from my server, but I can re-upload it if needed. -- Oleksandr Natalenko (post-factum)
Re: usercopy whitelist woe in scsi_sense_cache
On Thu, Apr 19, 2018 at 2:32 AM, Paolo Valente wrote: > I'm missing something here. When the request gets completed in the > first place, the hook bfq_finish_requeue_request gets called, and that > hook clears both ->elv.priv elements (as the request has a non-null > elv.icq). So, when bfq gets the same request again, those elements > must be NULL. What am I getting wrong? > > I have some more concern on this point, but I'll stick to this for the > moment, to not create more confusion. I don't know the "how", I only found the "what". :) If you want, grab the reproducer VM linked to earlier in this thread; it'll hit the problem within about 30 seconds of running the reproducer. -Kees -- Kees Cook Pixel Security
Re: [Patch v2] Storvsc: Select channel based on available percentage of ring buffer to write
Long, > This is a best effort for estimating on how busy the ring buffer is > for that channel, based on available buffer to write in percentage. It > is still possible that at the time of actual ring buffer write, the > space may not be available due to other processes may be writing at > the time. > > Selecting a channel based on how full it is can reduce the possibility > that a ring buffer write will fail, and avoid the situation a channel > is over busy. > > Now it's possible that storvsc can use a smaller ring buffer size > (e.g. 40k bytes) to take advantage of cache locality. Applied to 4.18/scsi-queue. Thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v2] storvsc: Set up correct queue depth values for IDE devices
Long, > If num_cpus=1, we don't have any sub channels. > > The host offers one sub channel for VM with 5 CPUs, after that it offers > an additional sub channel every 4 CPUs. > > The primary channel is always offered. Applied to 4.17/scsi-fixes. Thanks! -- Martin K. Petersen Oracle Linux Engineering
[PATCH] [v2] scsi: ips: fix firmware timestamps for 32-bit
do_gettimeofday() is deprecated since it will stop working in 2038 on 32-bit platforms, leading to incorrect times passed to the firmware. On 64-bit platforms the current code appears to be fine, as the calculation passes an 8-bit century number into the firmware that can represent times long in the future (possibly until 25599). Using ktime_get_real_seconds() to get a 64-bit seconds value and time64_to_tm() to convert it into the firmware format greatly simplifies the ips timekeeping code, makes 32-bit and 64-bit behave the same way here, and gets us closer to removing the deprecated interfaces. Signed-off-by: Arnd Bergmann --- v2: addressed review comments from Finn Thain: - rewrite changelog text - drop now-unused macros - fix incorrect century calculation --- drivers/scsi/ips.c | 78 +++--- drivers/scsi/ips.h | 11 +--- 2 files changed, 17 insertions(+), 72 deletions(-) diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c index a70f36a6d205..0e8a22c38c11 100644 --- a/drivers/scsi/ips.c +++ b/drivers/scsi/ips.c @@ -291,7 +291,7 @@ static void ips_freescb(ips_ha_t *, ips_scb_t *); static void ips_setup_funclist(ips_ha_t *); static void ips_statinit(ips_ha_t *); static void ips_statinit_memio(ips_ha_t *); -static void ips_fix_ffdc_time(ips_ha_t *, ips_scb_t *, time_t); +static void ips_fix_ffdc_time(ips_ha_t *, ips_scb_t *, time64_t); static void ips_ffdc_reset(ips_ha_t *, int); static void ips_ffdc_time(ips_ha_t *); static uint32_t ips_statupd_copperhead(ips_ha_t *); @@ -985,10 +985,7 @@ static int __ips_eh_reset(struct scsi_cmnd *SC) /* FFDC */ if (le32_to_cpu(ha->subsys->param[3]) & 0x30) { - struct timeval tv; - - do_gettimeofday(&tv); - ha->last_ffdc = tv.tv_sec; + ha->last_ffdc = ktime_get_real_seconds(); ha->reset_count++; ips_ffdc_reset(ha, IPS_INTR_IORL); } @@ -2392,7 +2389,6 @@ static int ips_hainit(ips_ha_t * ha) { int i; - struct timeval tv; METHOD_TRACE("ips_hainit", 1); @@ -2407,8 +2403,7 @@ ips_hainit(ips_ha_t * ha) /* Send FFDC */ ha->reset_count = 1; - do_gettimeofday(&tv); - ha->last_ffdc = tv.tv_sec; + ha->last_ffdc = ktime_get_real_seconds(); ips_ffdc_reset(ha, IPS_INTR_IORL); if (!ips_read_config(ha, IPS_INTR_IORL)) { @@ -2548,12 +2543,9 @@ ips_next(ips_ha_t * ha, int intr) if ((ha->subsys->param[3] & 0x30) && (ha->scb_activelist.count == 0)) { - struct timeval tv; - - do_gettimeofday(&tv); - - if (tv.tv_sec - ha->last_ffdc > IPS_SECS_8HOURS) { - ha->last_ffdc = tv.tv_sec; + time64_t now = ktime_get_real_seconds(); + if (now - ha->last_ffdc > IPS_SECS_8HOURS) { + ha->last_ffdc = now; ips_ffdc_time(ha); } } @@ -5988,59 +5980,21 @@ ips_ffdc_time(ips_ha_t * ha) /* */ // static void -ips_fix_ffdc_time(ips_ha_t * ha, ips_scb_t * scb, time_t current_time) +ips_fix_ffdc_time(ips_ha_t * ha, ips_scb_t * scb, time64_t current_time) { - long days; - long rem; - int i; - int year; - int yleap; - int year_lengths[2] = { IPS_DAYS_NORMAL_YEAR, IPS_DAYS_LEAP_YEAR }; - int month_lengths[12][2] = { {31, 31}, - {28, 29}, - {31, 31}, - {30, 30}, - {31, 31}, - {30, 30}, - {31, 31}, - {31, 31}, - {30, 30}, - {31, 31}, - {30, 30}, - {31, 31} - }; + struct tm tm; METHOD_TRACE("ips_fix_ffdc_time", 1); - days = current_time / IPS_SECS_DAY; - rem = current_time % IPS_SECS_DAY; - - scb->cmd.ffdc.hour = (rem / IPS_SECS_HOUR); - rem = rem % IPS_SECS_HOUR; - scb->cmd.ffdc.minute = (rem / IPS_SECS_MIN); - scb->cmd.ffdc.second = (rem % IPS_SECS_MIN); - - year = IPS_EPOCH_YEAR; - while (days < 0 || days >= year_lengths[yleap = IPS_IS_LEAP_YEAR(year)]) { - int newy; - - newy = year + (days / IPS_DAYS_NORMAL_YEAR); - if (days < 0) - --newy; - days -= (newy - year) * IPS_DAYS_NORMAL_YEAR + - IPS_NUM_LEAP_YEARS_THROUGH(newy - 1) - - IPS_NUM_LEAP_YEARS_THROUGH(year - 1); - year = newy; - } - - scb->cmd.ffdc.yearH = year / 100; - scb->cmd.ffdc.yearL = year % 100; - - for (i = 0; days >= month_lengths[i][yleap]; ++i) - days -= month_lengths[i][yleap]; + time64_to_tm(current_time, 0, &tm); - scb->cmd.ffdc.month = i + 1; - scb->cmd.ffdc.day = days + 1; + scb->cmd
[PATCH] [RESEND 2] scsi: esas2r: use ktime_get_real_seconds()
do_gettimeofday() is deprecated because of the y2038 overflow. Here, we use the result to pass into a 32-bit field in the firmware, which still risks an overflow, but if the firmware is written to expect unsigned values, it can at least last until y2106, and there is not much we can do about it. This changes do_gettimeofday() to ktime_get_real_seconds(), which at least simplifies the code a bit, and avoids the deprecated interface. I'm adding a comment about the overflow to document what happens. Signed-off-by: Arnd Bergmann --- Originally submitted in November 2017, no reply. Sent again in January, still no reply. Resending once more now, could you please apply the trivial patch? --- drivers/scsi/esas2r/esas2r_init.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/esas2r/esas2r_init.c b/drivers/scsi/esas2r/esas2r_init.c index 9dffcb28c9b7..9db645dde35e 100644 --- a/drivers/scsi/esas2r/esas2r_init.c +++ b/drivers/scsi/esas2r/esas2r_init.c @@ -1202,8 +1202,6 @@ static bool esas2r_format_init_msg(struct esas2r_adapter *a, case ESAS2R_INIT_MSG_START: case ESAS2R_INIT_MSG_REINIT: { - struct timeval now; - do_gettimeofday(&now); esas2r_hdebug("CFG init"); esas2r_build_cfg_req(a, rq, @@ -1212,7 +1210,8 @@ static bool esas2r_format_init_msg(struct esas2r_adapter *a, NULL); ci = (struct atto_vda_cfg_init *)&rq->vrq->cfg.data.init; ci->sgl_page_size = cpu_to_le32(sgl_page_size); - ci->epoch_time = cpu_to_le32(now.tv_sec); + /* firmware interface overflows in y2106 */ + ci->epoch_time = cpu_to_le32(ktime_get_real_seconds()); rq->flags |= RF_FAILURE_OK; a->init_msg = ESAS2R_INIT_MSG_INIT; break; -- 2.9.0
Re: [Patch v2] Storvsc: Select channel based on available percentage of ring buffer to write
On Thu, 19 Apr 2018 14:54:24 -0700 Long Li wrote: > From: Long Li > > This is a best effort for estimating on how busy the ring buffer is for > that channel, based on available buffer to write in percentage. It is still > possible that at the time of actual ring buffer write, the space may not be > available due to other processes may be writing at the time. > > Selecting a channel based on how full it is can reduce the possibility that > a ring buffer write will fail, and avoid the situation a channel is over > busy. > > Now it's possible that storvsc can use a smaller ring buffer size > (e.g. 40k bytes) to take advantage of cache locality. > > Changes. > v2: Pre-allocate struct cpumask on the heap. > Struct cpumask is a big structure (1k bytes) when CONFIG_NR_CPUS=8192 (default > value when CONFIG_MAXSMP=y). Don't use kernel stack for it by pre-allocating > them using kmalloc when channels are first initialized. > > Signed-off-by: Long Li Reviewed-by: Stephen Hemminger
[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete
https://bugzilla.kernel.org/show_bug.cgi?id=199435 --- Comment #7 from Anthony Hausman (anthonyhaussm...@gmail.com) --- I had a similar stack trace: Apr 20 14:57:18 kernel: INFO: task jbd2/sdt-8:10890 blocked for more than 120 seconds. Apr 20 14:57:18 kernel: Tainted: G OE 4.11.0-14-generic #20~16.04.1-Ubuntu Apr 20 14:57:18 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 20 14:57:18 kernel: jbd2/sdt-8 D0 10890 2 0x Apr 20 14:57:18 kernel: Call Trace: Apr 20 14:57:18 kernel: __schedule+0x3b9/0x8f0 Apr 20 14:57:18 kernel: schedule+0x36/0x80 Apr 20 14:57:18 kernel: jbd2_journal_commit_transaction+0x241/0x1830 Apr 20 14:57:18 kernel: ? update_load_avg+0x84/0x560 Apr 20 14:57:18 kernel: ? update_load_avg+0x84/0x560 Apr 20 14:57:18 kernel: ? dequeue_entity+0xed/0x4c0 Apr 20 14:57:18 kernel: ? wake_atomic_t_function+0x60/0x60 Apr 20 14:57:18 kernel: ? lock_timer_base+0x7d/0xa0 Apr 20 14:57:18 kernel: kjournald2+0xca/0x250 Apr 20 14:57:18 kernel: ? kjournald2+0xca/0x250 Apr 20 14:57:18 kernel: ? wake_atomic_t_function+0x60/0x60 Apr 20 14:57:18 kernel: kthread+0x109/0x140 Apr 20 14:57:18 kernel: ? commit_timeout+0x10/0x10 Apr 20 14:57:18 kernel: ? kthread_create_on_node+0x70/0x70 Apr 20 14:57:18 kernel: ret_from_fork+0x25/0x30 Apr 20 14:57:18 kernel: INFO: task task:13497 blocked for more than 120 seconds. Apr 20 14:57:18 kernel: Tainted: G OE 4.11.0-14-generic #20~16.04.1-Ubuntu Apr 20 14:57:18 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 20 14:57:18 kernel: taskD0 13497 13196 0x Apr 20 14:57:18 kernel: Call Trace: Apr 20 14:57:18 kernel: __schedule+0x3b9/0x8f0 Apr 20 14:57:18 kernel: schedule+0x36/0x80 Apr 20 14:57:18 kernel: rwsem_down_write_failed+0x237/0x3b0 Apr 20 14:57:18 kernel: ? copy_page_to_iter_iovec+0x97/0x170 Apr 20 14:57:18 kernel: call_rwsem_down_write_failed+0x17/0x30 Apr 20 14:57:18 kernel: ? call_rwsem_down_write_failed+0x17/0x30 Apr 20 14:57:18 kernel: down_write+0x2d/0x40 Apr 20 14:57:18 kernel: ext4_file_write_iter+0x70/0x3c0 Apr 20 14:57:18 kernel: ? futex_wake+0x90/0x170 Apr 20 14:57:18 kernel: new_sync_write+0xd3/0x130 Apr 20 14:57:18 kernel: __vfs_write+0x26/0x40 Apr 20 14:57:18 kernel: vfs_write+0xb8/0x1b0 Apr 20 14:57:18 kernel: SyS_pwrite64+0x95/0xb0 Apr 20 14:57:18 kernel: entry_SYSCALL_64_fastpath+0x1e/0xad Apr 20 14:57:18 kernel: RIP: 0033:0x7fa085d92d23 Apr 20 14:57:18 kernel: RSP: 002b:7fa0801acc90 EFLAGS: 0293 ORIG_RAX: 0012 Apr 20 14:57:18 kernel: RAX: ffda RBX: 7fa0480009d0 RCX: 7fa085d92d23 Apr 20 14:57:18 kernel: RDX: 0200 RSI: 7fa004000b30 RDI: 000f Apr 20 14:57:18 kernel: RBP: 7fa0801ad060 R08: 7fa0801acd2c R09: 0001 Apr 20 14:57:18 kernel: R10: 0001f86be000 R11: 0293 R12: 7fa0040014c0 Apr 20 14:57:18 kernel: R13: 7fa004000d80 R14: 002e R15: 7fa0480009d0 Apr 20 14:57:18 kernel: INFO: task task:13499 blocked for more than 120 seconds. Apr 20 14:57:18 kernel: Tainted: G OE 4.11.0-14-generic #20~16.04.1-Ubuntu Apr 20 14:57:18 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 20 14:57:18 kernel: taskD0 13499 13196 0x Apr 20 14:57:18 kernel: Call Trace: Apr 20 14:57:18 kernel: __schedule+0x3b9/0x8f0 Apr 20 14:57:18 kernel: schedule+0x36/0x80 Apr 20 14:57:18 kernel: rwsem_down_write_failed+0x237/0x3b0 Apr 20 14:57:18 kernel: ? copy_page_to_iter_iovec+0x97/0x170 Apr 20 14:57:18 kernel: call_rwsem_down_write_failed+0x17/0x30 Apr 20 14:57:18 kernel: ? call_rwsem_down_write_failed+0x17/0x30 Apr 20 14:57:18 kernel: down_write+0x2d/0x40 Apr 20 14:57:18 kernel: ext4_file_write_iter+0x70/0x3c0 Apr 20 14:57:18 kernel: ? futex_wake+0x90/0x170 Apr 20 14:57:18 kernel: new_sync_write+0xd3/0x130 Apr 20 14:57:18 kernel: __vfs_write+0x26/0x40 Apr 20 14:57:18 kernel: vfs_write+0xb8/0x1b0 Apr 20 14:57:18 kernel: SyS_pwrite64+0x95/0xb0 Apr 20 14:57:18 kernel: entry_SYSCALL_64_fastpath+0x1e/0xad Apr 20 14:57:18 kernel: RIP: 0033:0x7fa085d92d23 Apr 20 14:57:18 kernel: RSP: 002b:7fa07f9abc90 EFLAGS: 0293 ORIG_RAX: 0012 Apr 20 14:57:18 kernel: RAX: ffda RBX: 7f9fac008d00 RCX: 7fa085d92d23 Apr 20 14:57:18 kernel: RDX: 0200 RSI: 7fa0080013b0 RDI: 000f Apr 20 14:57:18 kernel: RBP: 7fa07f9ac060 R08: 7fa07f9abd2c R09: 0001 Apr 20 14:57:18 kernel: R10: 000219541000 R11: 0293 R12: 7fa008001140 Apr 20 14:57:18 kernel: R13: 7fa0080008c0 R14: 002e R15: 7f9fac008d00 Apr 20 14:57:18 kernel: INFO: task task:13510 blocked for more than 120 seconds. Apr 20 14:57:18 kernel: Tainted: G OE 4.11.0-14-generic #20~16.04.1-Ubuntu Apr 20 14:57:18 kernel: "echo 0 > /proc/sys/k
Re: [PATCH] isci: Fix infinite loop in while loop
On 20/04/18 10:45, James Bottomley wrote: > On Fri, 2018-04-20 at 10:03 +0100, Colin King wrote: >> From: Colin Ian King >> >> In the case when the phy_mask is bitwise anded with the >> phy_index bit is zero the continue statement currently jumps >> to the next iteration of the while loop and phy_index is >> never actually incremented, potentially causing an infinite >> loop if phy_index is less than SCI_MAX_PHS. Fix this by >> jumping to the increment of phy_index. >> >> [ The goto is used to save one more level of nesting that >> makes the code far wider than 80 columns. ] > > what's wrong with replacing the while() with a for() that just works > (removing the increment at the end). This is effectively open coding a > for loop anyway, which is a pattern we wouldn't want replicated. > > James > Good point, V2 en-route.
[PATCH][V2] isci: Fix infinite loop in while loop
From: Colin Ian King In the case when the phy_mask is bitwise anded with the phy_index bit is zero the continue statement currently jumps to the next iteration of the while loop and phy_index is never actually incremented, potentially causing an infinite loop if phy_index is less than SCI_MAX_PHS. Fix this by turning the while loop into a for loop. Signed-off-by: Colin Ian King --- drivers/scsi/isci/port_config.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/scsi/isci/port_config.c b/drivers/scsi/isci/port_config.c index edb7be786c65..9e8de1462593 100644 --- a/drivers/scsi/isci/port_config.c +++ b/drivers/scsi/isci/port_config.c @@ -291,7 +291,7 @@ sci_mpc_agent_validate_phy_configuration(struct isci_host *ihost, * Note: We have not moved the current phy_index so we will actually * compare the startting phy with itself. * This is expected and required to add the phy to the port. */ - while (phy_index < SCI_MAX_PHYS) { + for (; phy_index < SCI_MAX_PHYS; phy_index++) { if ((phy_mask & (1 << phy_index)) == 0) continue; sci_phy_get_sas_address(&ihost->phys[phy_index], @@ -311,7 +311,6 @@ sci_mpc_agent_validate_phy_configuration(struct isci_host *ihost, &ihost->phys[phy_index]); assigned_phy_mask |= (1 << phy_index); - phy_index++; } } -- 2.17.0
Re: [PATCH] bsg referencing bus driver module
On Thu, 2018-04-19 at 15:10 -0700, Anatoliy Glagolev wrote: > Updated: rebased on recent Linux, cc-ed maintainers per instructions > in MAINTAINERS file > > From df939b80d02bf37b21efaaef8ede86cfd39b0cb8 Mon Sep 17 00:00:00 > 2001 > From: Anatoliy Glagolev > Date: Thu, 19 Apr 2018 15:06:06 -0600 > Subject: [PATCH] bsg referencing parent module > > Fixing a bug when bsg layer holds the last reference to device > when the device's module has been unloaded. Upon dropping the > reference the device's release function may touch memory of > the unloaded module. > --- > block/bsg-lib.c | 24 ++-- > block/bsg.c | 29 ++--- > drivers/scsi/scsi_transport_fc.c | 8 ++-- > include/linux/bsg-lib.h | 4 > include/linux/bsg.h | 5 + > 5 files changed, 63 insertions(+), 7 deletions(-) > > diff --git a/block/bsg-lib.c b/block/bsg-lib.c > index fc2e5ff..90c28fd 100644 > --- a/block/bsg-lib.c > +++ b/block/bsg-lib.c > @@ -309,6 +309,25 @@ struct request_queue *bsg_setup_queue(struct > device *dev, const char *name, > bsg_job_fn *job_fn, int dd_job_size, > void (*release)(struct device *)) > { > + return bsg_setup_queue_ex(dev, name, job_fn, dd_job_size, release, > + NULL); > +} > +EXPORT_SYMBOL_GPL(bsg_setup_queue); > + > +/** > + * bsg_setup_queue - Create and add the bsg hooks so we can receive > requests > + * @dev: device to attach bsg device to > + * @name: device to give bsg device > + * @job_fn: bsg job handler > + * @dd_job_size: size of LLD data needed for each job > + * @release: @dev release function > + * @dev_module: @dev's module > + */ > +struct request_queue *bsg_setup_queue_ex(struct device *dev, const > char *name, > + bsg_job_fn *job_fn, int dd_job_size, > + void (*release)(struct device *), > + struct module *dev_module) This patch isn't applyable because your mailer has changed all the tabs to spaces. I also think there's no need to do it this way. I think what we need is for fc_bsg_remove() to wait until the bsg queue is drained. It does look like the author thought this happened otherwise the code wouldn't have the note. If we fix it that way we can do the same thing in all the other transport classes that use bsg (which all have a similar issue). James
Re: [PATCH] isci: Fix infinite loop in while loop
On Fri, 2018-04-20 at 10:03 +0100, Colin King wrote: > From: Colin Ian King > > In the case when the phy_mask is bitwise anded with the > phy_index bit is zero the continue statement currently jumps > to the next iteration of the while loop and phy_index is > never actually incremented, potentially causing an infinite > loop if phy_index is less than SCI_MAX_PHS. Fix this by > jumping to the increment of phy_index. > > [ The goto is used to save one more level of nesting that > makes the code far wider than 80 columns. ] what's wrong with replacing the while() with a for() that just works (removing the increment at the end). This is effectively open coding a for loop anyway, which is a pattern we wouldn't want replicated. James
[PATCH] isci: Fix infinite loop in while loop
From: Colin Ian King In the case when the phy_mask is bitwise anded with the phy_index bit is zero the continue statement currently jumps to the next iteration of the while loop and phy_index is never actually incremented, potentially causing an infinite loop if phy_index is less than SCI_MAX_PHS. Fix this by jumping to the increment of phy_index. [ The goto is used to save one more level of nesting that makes the code far wider than 80 columns. ] Fixes: 80aebef7c112 ("[SCSI] isci: Fix a infinite loop.") Signed-off-by: Colin Ian King --- drivers/scsi/isci/port_config.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/isci/port_config.c b/drivers/scsi/isci/port_config.c index edb7be786c65..55dc7c1dbc2b 100644 --- a/drivers/scsi/isci/port_config.c +++ b/drivers/scsi/isci/port_config.c @@ -293,7 +293,7 @@ sci_mpc_agent_validate_phy_configuration(struct isci_host *ihost, * This is expected and required to add the phy to the port. */ while (phy_index < SCI_MAX_PHYS) { if ((phy_mask & (1 << phy_index)) == 0) - continue; + goto next_index; sci_phy_get_sas_address(&ihost->phys[phy_index], &phy_assigned_address); @@ -311,6 +311,7 @@ sci_mpc_agent_validate_phy_configuration(struct isci_host *ihost, &ihost->phys[phy_index]); assigned_phy_mask |= (1 << phy_index); +next_index: phy_index++; } -- 2.17.0
[PATCH 3/3] scsi: avoid to hold host-wide counter of host_busy for scsi_mq
It isn't necessary to check the host depth in scsi_queue_rq() any more since it has been respected by blk-mq before calling scsi_queue_rq() via getting driver tag. Lots of LUNs may attach to same host, and per-host IOPS may reach millions level, so we should avoid to this expensive atomic operations on the hostwide counter in IO path. This patch implemens scsi_host_busy() via blk_mq_tagset_busy_iter() for reading the count of busy IOs for scsi_mq. It is observed that IOPS is increased by 15% in IO test on scsi_debug (32 LUNs, 32 submit queues, 1024 can_queue, libaio/dio) in one dual-socket system. Cc: Omar Sandoval , Cc: "Martin K. Petersen" , Cc: James Bottomley , Cc: Christoph Hellwig , Cc: Don Brace Cc: Kashyap Desai Cc: Mike Snitzer Cc: Hannes Reinecke Cc: Laurence Oberman Signed-off-by: Ming Lei --- drivers/scsi/hosts.c| 24 +++- drivers/scsi/scsi_lib.c | 23 +-- 2 files changed, 40 insertions(+), 7 deletions(-) diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index 69beb30205f1..ad56e2b10ac8 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -564,13 +564,35 @@ struct Scsi_Host *scsi_host_get(struct Scsi_Host *shost) } EXPORT_SYMBOL(scsi_host_get); +struct scsi_host_mq_in_flight { + int cnt; +}; + +static void scsi_host_check_in_flight(struct request *rq, void *data, + bool reserved) +{ + struct scsi_host_mq_in_flight *in_flight = data; + + if (blk_mq_request_started(rq)) + in_flight->cnt++; +} + /** * scsi_host_busy - Return the host busy counter * @shost: Pointer to Scsi_Host to inc. **/ int scsi_host_busy(struct Scsi_Host *shost) { - return atomic_read(&shost->host_busy); + struct scsi_host_mq_in_flight in_flight = { + .cnt = 0, + }; + + if (!shost->use_blk_mq) + return atomic_read(&shost->host_busy); + + blk_mq_tagset_busy_iter(&shost->tag_set, scsi_host_check_in_flight, + &in_flight); + return in_flight.cnt; } EXPORT_SYMBOL(scsi_host_busy); diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 0dfec0dedd5e..dc437c642934 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -345,7 +345,8 @@ static void scsi_dec_host_busy(struct Scsi_Host *shost) unsigned long flags; rcu_read_lock(); - atomic_dec(&shost->host_busy); + if (!shost->use_blk_mq) + atomic_dec(&shost->host_busy); if (unlikely(scsi_host_in_recovery(shost))) { spin_lock_irqsave(shost->host_lock, flags); if (shost->host_failed || shost->host_eh_scheduled) @@ -444,7 +445,12 @@ static inline bool scsi_target_is_busy(struct scsi_target *starget) static inline bool scsi_host_is_busy(struct Scsi_Host *shost) { - if (shost->can_queue > 0 && + /* +* blk-mq can handle host queue busy efficiently via host-wide driver +* tag allocation +*/ + + if (!shost->use_blk_mq && shost->can_queue > 0 && atomic_read(&shost->host_busy) >= shost->can_queue) return true; if (atomic_read(&shost->host_blocked) > 0) @@ -1539,9 +1545,12 @@ static inline int scsi_host_queue_ready(struct request_queue *q, if (scsi_host_in_recovery(shost)) return 0; - busy = atomic_inc_return(&shost->host_busy) - 1; + if (!shost->use_blk_mq) + busy = atomic_inc_return(&shost->host_busy) - 1; + else + busy = 0; if (atomic_read(&shost->host_blocked) > 0) { - if (busy) + if (busy || scsi_host_busy(shost)) goto starved; /* @@ -1555,7 +1564,7 @@ static inline int scsi_host_queue_ready(struct request_queue *q, "unblocking host at zero depth\n")); } - if (shost->can_queue > 0 && busy >= shost->can_queue) + if (!shost->use_blk_mq && shost->can_queue > 0 && busy >= shost->can_queue) goto starved; if (shost->host_self_blocked) goto starved; @@ -1641,7 +1650,9 @@ static void scsi_kill_request(struct request *req, struct request_queue *q) * with the locks as normal issue path does. */ atomic_inc(&sdev->device_busy); - atomic_inc(&shost->host_busy); + + if (!shost->use_blk_mq) + atomic_inc(&shost->host_busy); if (starget->can_queue > 0) atomic_inc(&starget->target_busy); -- 2.9.5