Re: scsi: memory leak in sg_start_req
On 2018-01-09 11:05 AM, Dmitry Vyukov wrote: Hello, syzkaller has found the following memory leak: unreferenced object 0x88004c19 (size 8328): comm "syz-executor", pid 4627, jiffies 4294749150 (age 45.507s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 00 00 00 22 01 00 00 00 00 00 00 04 00 00 00 ..."... backtrace: [<5955b5a9>] kmalloc_order+0x59/0x80 mm/slab_common.c:1124 [<43ae006e>] kmalloc_order_trace+0x1f/0x160 mm/slab_common.c:1133 [] kmalloc_large include/linux/slab.h:433 [inline] [ ] __kmalloc+0x2c4/0x340 mm/slub.c:3751 [ ] kmalloc include/linux/slab.h:504 [inline] [ ] bio_alloc_bioset+0x4d5/0x7e0 block/bio.c:450 [ ] bio_kmalloc include/linux/bio.h:410 [inline] [ ] bio_copy_user_iov+0x2be/0xcb0 block/bio.c:1226 [<1d0b79ed>] __blk_rq_map_user_iov block/blk-map.c:67 [inline] [<1d0b79ed>] blk_rq_map_user_iov+0x2b6/0x7d0 block/blk-map.c:136 [<4200a869>] blk_rq_map_user+0x11e/0x170 block/blk-map.c:166 [<8f21739e>] sg_start_req drivers/scsi/sg.c:1794 [inline] [<8f21739e>] sg_common_write.isra.16+0x14df/0x1ed0 drivers/scsi/sg.c:777 [<093f61e3>] sg_write+0x8a7/0xd7b drivers/scsi/sg.c:677 [ ] __vfs_write+0x10d/0x8f0 fs/read_write.c:480 [<0638f16f>] vfs_write+0x1fd/0x570 fs/read_write.c:544 [<6a7e6867>] SYSC_write fs/read_write.c:589 [inline] [<6a7e6867>] SyS_write+0xfa/0x250 fs/read_write.c:581 can be reproduced with the following program: // autogenerated by syzkaller (http://github.com/google/syzkaller) #include #include #include #include int main() { int fd = open("/dev/sg1", O_RDWR); const char *data = "\xb6\x3d\xb8\x5e\x1e\x8d\x22\x00\x00\x00\x00\x00\x00\x08\xaf\xd6\x1d" "\xcc\x43\x6a\xed\x5e\xd2\xbc\x70\x18\xce\xbc\x9b\x97\xae\x21\x91\x4d" "\x87\x2c\x67\x8c\xe2\x2c\x9b\x16\x0e\x96\xaa\x1f\xae\x1a"; write(fd, data, 0x30); return 0; } if executed in a loop, memory consumption grows infinitely. on upstream commit b2cd1df66037e7c4697c7e40496bf7e4a5e16a2d The seemingly random data that program is sending is asking for a buffer of 2,264,314 bytes which the sg driver procures and waits for the caller to either issue a read() or close() the file or shutdown the program. The test program does none of those expected operations, it simply asks for the same resources again. In my version of your test code (attached), that happens 1,021 times at which point the file handles in that process are exhausted and all subsequent open()s fail with EBADF (as do the write()s). The output from my program was this on one run: # ./sg_syzk_grow First errno=9 [Bad file descriptor] index=1021 done_count=5, err_count=48979, last_errno=9 [Bad file descriptor] # lsscsi -gs [0:0:0:0] disk Linux scsi_debug 0186 /dev/sda /dev/sg0 2.14GB Monitoring that program with 'free' from another terminal I see about 2.5 GBytes of ram "swallowed" almost immediately when the test program runs. When the program exits (about 50 seconds later) as far as I can see all that ram is given back. If you used the same program and wrote to a regular file rather than a sg device, then that program would eventually fill any file system, at the rate of 48 bytes per iteration (given enough file descriptor resources). The sg driver, using its original 1994 interface, deprecated for around 18 years, just gets a system to resource exhaustion quicker. Doug Gilbert // autogenerated by syzkaller (http://github.com/google/syzkaller) #include #include #include #include #include #include #include #include static const char * sg_dev = "/dev/sg0"; int main() { const char *data = "\xb6\x3d\xb8\x5e\x1e\x8d\x22\x00\x00\x00\x00\x00\x00\x08\xaf\xd6\x1d" "\xcc\x43\x6a\xed\x5e\xd2\xbc\x70\x18\xce\xbc\x9b\x97\xae\x21\x91\x4d" "\x87\x2c\x67\x8c\xe2\x2c\x9b\x16\x0e\x96\xaa\x1f\xae\x1a"; int k, res, first_errno, prev_errno, first_err_index; int err_count = 0; int done_count = 0; bool got_1st_errno = false; for (k = 0; k < 1; ++k) { int fd = open(sg_dev, O_RDWR); res = write(fd, data, 0x30); if (res < 0) { if (! got_1st_errno) { got_1st_errno = true; first_errno = errno; first_err_index = done_count + k; printf("First errno=%d [%s] index=%d\n", first_errno, strerror(first_errno), first_err_index); } ++err_count; } } done_count += k; sleep(10); for (k = 0; k < 1; ++k) { int fd = open(sg_dev, O_RDWR); res = write(fd, data, 0x30); if (res < 0) { if (! got_1st_errno) { got_1st_errno = true; first_errno = errno; printf("First errno=%d [%s] index=%d\n", first_errno, strerror(f
Re: [PATCH 00/14] megaraid_sas: driver updates
Shivasharan, > Shivasharan S (14): > megaraid_sas: zero out IOC INIT and stream detection memory > megaraid_sas: memset IOC INIT frame using correct size > megaraid_sas: Return the DCMD status from megasas_get_seq_num > megaraid_sas: Reset ldio_outstanding in megasas_resume > megaraid_sas: Error handling for invalid ldcount provided by firmware > in RAID map > megaraid_sas: unload flag should be set after scsi_remove_host is > called > megaraid_sas: Avoid firing DCMDs while OCR is in progress > megaraid_sas: Use megasas_wait_for_adapter_operational to detect > controller state in IOCTL path > megaraid_sas: Update LD map after populating drv_map driver map copy > megaraid_sas: Selectively apply stream detection based on IO type > megaraid_sas: Expose fw_cmds_outstanding through sysfs > megaraid_sas: re-work DCMD refire code > megaraid_sas: NVME passthru command support > megaraid_sas: driver version upgrade Applied to 1-12,14 to 4.16/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] Change type of third __scsi_queue_insert() argument from int into bool
Bart, > This patch does not change any functionality but makes the SCSI core > source code slightly easier to read. Applied to 4.16/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 1/2] scsi: aacraid: Get correct lun count
Raghava, > The correct lun count needs to be divided by 24, missed it in the previous > patch set. Peculiar number. Applied to 4.16/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 2/2] scsi: aacraid: Delay for rescan worker needs to be 10 seconds
Raghava, > The delay for the rescan worker needs to 10 seconds, missed the HZ in > there. Applied to 4.16/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCHv2] scsi_dh_alua: skip RTPG for devices only supporting active/optimized
Hannes, > For hardware only supporting active/optimized there's no point in > ever re-issuing RTPG as the only new state we can possibly read is > active/optimized. > This avoid spurious errors during path failover on such arrays. Applied to 4.16/scsi-queue. Thanks! -- Martin K. Petersen Oracle Linux Engineering
[no subject]
Schöne Grüße, Ich bin Frau Adrian Irene von Active Lenders Darlehensfirma bekannt als Active Lending Loan®. Wir bieten alle Arten von Darlehen bei 1% Zinssatz. Wenn Sie ein Darlehen benötigen, kontaktieren Sie uns bitte mit den folgenden Informationen. Bitte füllen Sie das untenstehende Formular aus und senden Sie es so schnell wie möglich zurück. Benötigte Menge: . Laufzeit des Darlehens: Der Grund für das Darlehen: . Wir freuen uns darauf, Ihnen zu helfen. Kontaktieren Sie uns per E-Mail: cont...@activeslendinggroup.com Deine Frau Adrian Irene
Re: [PATCH v2 3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device
On Wed, Jan 10, 2018 at 10:18:16AM -0800, Bart Van Assche wrote: > Several SCSI transport and LLD drivers surround code that does not > tolerate concurrent calls of .queuecommand() with scsi_target_block() / > scsi_target_unblock(). These last two functions use > blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request > queues to prevent concurrent .queuecommand() calls. However, that is Actually blk_mq_quiesce_queue() is supposed to disable and drain dispatch, not for prevent concurrent .queuecommand() calls. > not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd(). Given it is error handling, do we need to prevent the .queuecommand() call in scsi_send_eh_cmnd()? Could you share us what the actual issue observed is from user view? > Hence surround the .queuecommand() call from the SCSI error handler with > blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced(). > > Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into > code that calls blk_get_request(), e.g. scsi_execute_req(), is not an > option since scsi_send_eh_cmnd() can be called if all requests are > allocated and if no requests will make progress without aborting any > of these requests. If we need to prevent the .queuecommand() in scsi_send_eh_cmnd(), what do you think of the approach by requeuing the EH command via scsi_mq_requeue_cmd()/scsi_requeue_cmd()? And the EH request will be dispatched finally when the queue becomes unquiesced or the STOPPED is cleared. -- Ming
[PATCH 22/38] scsi: Define usercopy region in scsi_sense_cache slab cache
From: David Windsor SCSI sense buffers, stored in struct scsi_cmnd.sense and therefore contained in the scsi_sense_cache slab cache, need to be copied to/from userspace. cache object allocation: drivers/scsi/scsi_lib.c: scsi_select_sense_cache(...): return ... ? scsi_sense_isadma_cache : scsi_sense_cache scsi_alloc_sense_buffer(...): return kmem_cache_alloc_node(scsi_select_sense_cache(), ...); scsi_init_request(...): ... cmd->sense_buffer = scsi_alloc_sense_buffer(...); ... cmd->req.sense = cmd->sense_buffer example usage trace: block/scsi_ioctl.c: (inline from sg_io) blk_complete_sghdr_rq(...): struct scsi_request *req = scsi_req(rq); ... copy_to_user(..., req->sense, len) scsi_cmd_ioctl(...): sg_io(...); In support of usercopy hardening, this patch defines a region in the scsi_sense_cache slab cache in which userspace copy operations are allowed. This region is known as the slab cache's usercopy region. Slab caches can now check that each dynamically sized copy operation involving cache-managed memory falls entirely within the slab's usercopy region. Signed-off-by: David Windsor [kees: adjust commit log, provide usage trace] Cc: "James E.J. Bottomley" Cc: "Martin K. Petersen" Cc: linux-scsi@vger.kernel.org Signed-off-by: Kees Cook --- drivers/scsi/scsi_lib.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 1cbc497e00bd..164d062c4d94 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -79,14 +79,15 @@ int scsi_init_sense_cache(struct Scsi_Host *shost) if (shost->unchecked_isa_dma) { scsi_sense_isadma_cache = kmem_cache_create("scsi_sense_cache(DMA)", - SCSI_SENSE_BUFFERSIZE, 0, - SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA, NULL); + SCSI_SENSE_BUFFERSIZE, 0, + SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA, NULL); if (!scsi_sense_isadma_cache) ret = -ENOMEM; } else { scsi_sense_cache = - kmem_cache_create("scsi_sense_cache", - SCSI_SENSE_BUFFERSIZE, 0, SLAB_HWCACHE_ALIGN, NULL); + kmem_cache_create_usercopy("scsi_sense_cache", + SCSI_SENSE_BUFFERSIZE, 0, SLAB_HWCACHE_ALIGN, + 0, SCSI_SENSE_BUFFERSIZE, NULL); if (!scsi_sense_cache) ret = -ENOMEM; } -- 2.7.4
Re: HP ProLiant DL360p Gen8 hangs with Linux 4.13+.
On Fri, Jan 5, 2018 at 8:32 AM, Bart Van Assche wrote: > On Thu, 2018-01-04 at 14:32 -0800, Vinson Lee wrote: >> HP ProLiant DL360p Gen8 with Smart Array P420i boots to the login >> prompt and hangs with Linux 4.13 or later. I cannot log in on console >> or SSH into the machine. Linux 4.12 and older boot fine. >> >> I see these messages on the console. >> >> [ 242.843206] INFO: task scsi_eh_2:465 blocked for more than 120 seconds. >> [ 242.877835] Not tainted 4.15.0-041500rc6-generic #201712312330 > > It seems like something got stuck in the block layer. The traditional way to > debug this is to analyze the information that is available under > /sys/kernel/debug/block. However, since login is not possible we can't use > that approach. Would it be possible for you to check whether this has been > resolved in kernel v4.15-rc6, and if not, bisect this? > > Thanks, > > Bart. Hi. The machine still hangs with Linux 4.15-rc6. I did a bisect. The hang is introduced with Linux 4.13-rc1 commit c5cb83bb337c25caae995d992d1cdf9b317f83de "genirq/cpuhotplug: Handle managed IRQs on CPU hotplug". There is a startup script that disables hyperthreading by offlining sibling CPUs. for CPU in $(cut -s -d, -f2 $SYS_PATH/cpu*/topology/thread_siblings_list | sort -un); do echo 0 > /sys/devices/system/cpu/cpu$CPU/online done If the above script is not run, the machine does not hang with Linux 4.13. Cheers, Vinson
[PATCH] Change type of third __scsi_queue_insert() argument from int into bool
This patch does not change any functionality but makes the SCSI core source code slightly easier to read. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Johannes Thumshirn --- drivers/scsi/scsi_lib.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 1c4d885a1d9e..dad450ccd57b 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -164,7 +164,7 @@ static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd) * for a requeue after completion, which should only occur in this * file. */ -static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy) +static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, bool unbusy) { struct scsi_device *device = cmd->device; struct request_queue *q = device->request_queue; @@ -220,7 +220,7 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy) */ void scsi_queue_insert(struct scsi_cmnd *cmd, int reason) { - __scsi_queue_insert(cmd, reason, 1); + __scsi_queue_insert(cmd, reason, true); } @@ -1015,11 +1015,11 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes) break; case ACTION_RETRY: /* Retry the same command immediately */ - __scsi_queue_insert(cmd, SCSI_MLQUEUE_EH_RETRY, 0); + __scsi_queue_insert(cmd, SCSI_MLQUEUE_EH_RETRY, false); break; case ACTION_DELAYED_RETRY: /* Retry the same command after a delay */ - __scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY, 0); + __scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY, false); break; } } -- 2.15.1
RE: [PATCH 13/14] megaraid_sas: NVME passthru command support
Bart et al, Broadcom's Tri-mode HBAs and MegaRAID controllers are capable of connecting with SAS, SATA, NVMe drives, SAS expanders and PCIe switches (with NVMe drives connected behind that) and are capable of creating RAID volumes on top of similar family of drives. In the case of RAID controllers, all of those drives and RAID volumes are exposed to the OS as generic SCSI devices and in the case of HBA only for SAS and SATA the topology is exposed to OS through SAS transport layer and NVMe drives are exposed as generic SCSI devices. The SCSI CDB to specific packet (SATA frames, SSP frames or NVMe) translation occurs in the hardware/firmware. For the OS driver, the interface to interact is common across all the type of devices and it is MPI SCSI IO Request. The NVMe passthru support added in this patch is only for management purpose and will let Broadcom specific management applications to send some direct NVMe commands to the hardware/firmware solely for management purpose. For normal READ/WRITE I/O the preferred path is to issue SCSI command to our hardware/firmware and let it translate to the NVMe. We have many architectural constraints to directly expose NVMe drives to NVMe subsystem for normal I/O usage and management usage and hence we prefer not to go down the path. This patch is just addition of new feature for our management applications (which are common across many OSes) to access a specific type of MPI command to manage NVMe drives connected behind our HBAs (which are non-standard) in a vendor specific way, hence we think this patch is valid to be accepted to megaraid driver. Please let us know if more details are required on the tri-mode controllers. Thanks Sathya -Original Message- From: Linux-nvme [mailto:linux-nvme-boun...@lists.infradead.org] On Behalf Of Douglas Gilbert Sent: Wednesday, January 10, 2018 1:06 PM To: Bart Van Assche; h...@infradead.org; kashyap.de...@broadcom.com; shivasharan.srikanteshw...@broadcom.com Cc: sumit.sax...@broadcom.com; peter.riv...@broadcom.com; linux-n...@lists.infradead.org; linux-scsi@vger.kernel.org Subject: Re: [PATCH 13/14] megaraid_sas: NVME passthru command support On 2018-01-10 11:22 AM, Bart Van Assche wrote: > On Tue, 2018-01-09 at 22:07 +0530, Kashyap Desai wrote: >> Overall NVME support behind MR controller is really a SCSI device. On >> top of that, for MegaRaid, NVME device can be part of Virtual Disk >> and those drive will not be exposed to the driver. User application >> may like to talk to hidden NVME devices (part of VDs). This patch >> will extend the existing interface for megaraid product in the same >> way it is currently supported for other protocols like SMP, SATA pass-through. > > It seems to me like there is a contradiction in the above paragraph: > if some NVMe devices are not exposed to the driver, how can a user > space application ever send NVMe commands to it? I think that he meant that the NVMe physical devices (e.g. SSDs) are not exposed to the upper layers (e.g. the SCSI mid-layer and above). The SCSI subsystem has a no_uld_attach device flag that lets a LLD attach physical devices but the sd driver and hence the block layer do not "see" them. The idea is that maintenance programs like smartmontools can use them via the bsg or sg drivers. The Megaraid driver code does not seem to use no_uld_attach. Does the NVMe subsystem have similar "generic" (i.e. non-block) devices accessible to the user space? > Anyway, has it been considered to implement the NVMe support as an > NVMe transport driver? The upstream kernel already supports NVMe > communication with NVMe PCI devices, NVMe over RDMA and NVMe over FC. > If communication to the NVMe devices behind the MegaRaid controller > would be implemented as an NVMe transport driver then all > functionality of the Linux NVMe driver could be reused, including its sysfs entries. Broadcom already sell "SAS" HBAs that have "tri-mode" phys. That is a phy that can connect to a SAS device (e.g. a SAS expander), a SATA device or a NVMe device. Now if I was Broadcom designing a 24 Gbps SAS-4 next generation expander I would be thinking of using those tri-mode phys on it. But then there is a problem, SAS currently supports 3 protocols: SSP (for SCSI storage and enclosure management (SES)), STP (for SATA storage ) and SMP (for expander management). The problem is how those NVMe commands, status and data cross the wire between the OS HBA (or MegaRaid type controller) and an expander. Solving that might need some lateral thinking. On one hand the NVM Express folks seem to have shelved the idea of a SCSI to NVMe Translation Layer (SNTL) and have not updated an old white paper on the subject. Currently there is no SNTL on Linux (there was but it was removed) or FreeBSD but there is one on Windows. On the other hand I'm informed that recently the same body accepted the SES-3 standard pretty much as-is. That is done with the addition of SES Send and SES Receive commands to NVME-
[PATCH 0/1] scsi_debug: delay stress fix
Bart Van Assche reported that when the scsi_debug driver was being stress tested with fio, changing the delay paremeter via sysfs caused a cascade of oops-es. The fix presented reads the driver wide delay values (jiffies or nanoseconds) once and remembers in the sdebug_defer object which defer method is used and which method has been initialized. This simplifies handling when command aborts occur. This causes a minor changes in semantic: a SCSI command "in flight" is no longer impacted by changing the delay option after it has been scheduled (i.e. while it is waiting for a work queue or a hr timer to exhaust). Douglas Gilbert (1): scsi_debug: delay fix drivers/scsi/scsi_debug.c | 72 ++- 1 file changed, 46 insertions(+), 26 deletions(-) -- 2.14.1
[PATCH 1/1] scsi_debug: delay stress fix
Introduce a state enum into sdebug_defer objects to indicate which, if any, defer method has been used with the associated command. Also add 2 bools to indicate which of the defer methods has been initialized. Those objects are re-used but the initialization only needs to be done once. This simplifies command cancellation handling. Now the delay associated with a deferred response of a command cannot be changed (once started) by changing the delay (and ndelay) parameters in sysfs. Command aborts and driver shutdown are still honoured immediately when received. Signed-off-by: Douglas Gilbert --- drivers/scsi/scsi_debug.c | 72 ++- 1 file changed, 46 insertions(+), 26 deletions(-) diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c index e4f037f0f38b..f5d0098b5a6a 100644 --- a/drivers/scsi/scsi_debug.c +++ b/drivers/scsi/scsi_debug.c @@ -263,12 +263,18 @@ struct sdebug_host_info { #define to_sdebug_host(d) \ container_of(d, struct sdebug_host_info, dev) +enum sdeb_defer_type {SDEB_DEFER_NONE = 0, SDEB_DEFER_HRT = 1, + SDEB_DEFER_WQ = 2}; + struct sdebug_defer { struct hrtimer hrt; struct execute_work ew; int sqa_idx;/* index of sdebug_queue array */ int qc_idx; /* index of sdebug_queued_cmd array within sqa_idx */ int issuing_cpu; + bool init_hrt; + bool init_wq; + enum sdeb_defer_type defer_t; }; struct sdebug_queued_cmd { @@ -3495,6 +3501,7 @@ static void sdebug_q_cmd_complete(struct sdebug_defer *sd_dp) struct scsi_cmnd *scp; struct sdebug_dev_info *devip; + sd_dp->defer_t = SDEB_DEFER_NONE; qc_idx = sd_dp->qc_idx; sqp = sdebug_q_arr + sd_dp->sqa_idx; if (sdebug_statistics) { @@ -3678,13 +3685,14 @@ static void scsi_debug_slave_destroy(struct scsi_device *sdp) } } -static void stop_qc_helper(struct sdebug_defer *sd_dp) +static void stop_qc_helper(struct sdebug_defer *sd_dp, + enum sdeb_defer_type defer_t) { if (!sd_dp) return; - if ((sdebug_jdelay > 0) || (sdebug_ndelay > 0)) + if (defer_t == SDEB_DEFER_HRT) hrtimer_cancel(&sd_dp->hrt); - else if (sdebug_jdelay < 0) + else if (defer_t == SDEB_DEFER_WQ) cancel_work_sync(&sd_dp->ew.work); } @@ -3694,6 +3702,7 @@ static bool stop_queued_cmnd(struct scsi_cmnd *cmnd) { unsigned long iflags; int j, k, qmax, r_qmax; + enum sdeb_defer_type l_defer_t; struct sdebug_queue *sqp; struct sdebug_queued_cmd *sqcp; struct sdebug_dev_info *devip; @@ -3717,8 +3726,13 @@ static bool stop_queued_cmnd(struct scsi_cmnd *cmnd) atomic_dec(&devip->num_in_q); sqcp->a_cmnd = NULL; sd_dp = sqcp->sd_dp; + if (sd_dp) { + l_defer_t = sd_dp->defer_t; + sd_dp->defer_t = SDEB_DEFER_NONE; + } else + l_defer_t = SDEB_DEFER_NONE; spin_unlock_irqrestore(&sqp->qc_lock, iflags); - stop_qc_helper(sd_dp); + stop_qc_helper(sd_dp, l_defer_t); clear_bit(k, sqp->in_use_bm); return true; } @@ -3733,6 +3747,7 @@ static void stop_all_queued(void) { unsigned long iflags; int j, k; + enum sdeb_defer_type l_defer_t; struct sdebug_queue *sqp; struct sdebug_queued_cmd *sqcp; struct sdebug_dev_info *devip; @@ -3751,8 +3766,13 @@ static void stop_all_queued(void) atomic_dec(&devip->num_in_q); sqcp->a_cmnd = NULL; sd_dp = sqcp->sd_dp; + if (sd_dp) { + l_defer_t = sd_dp->defer_t; + sd_dp->defer_t = SDEB_DEFER_NONE; + } else + l_defer_t = SDEB_DEFER_NONE; spin_unlock_irqrestore(&sqp->qc_lock, iflags); - stop_qc_helper(sd_dp); + stop_qc_helper(sd_dp, l_defer_t); clear_bit(k, sqp->in_use_bm); spin_lock_irqsave(&sqp->qc_lock, iflags); } @@ -4003,7 +4023,7 @@ static void setup_inject(struct sdebug_queue *sqp, * SCSI_MLQUEUE_HOST_BUSY if temporarily out of resources. */ static int schedule_resp(struct scsi_cmnd *cmnd, struct sdebug_dev_info *devip, -int scsi_result
Re: [PATCHv2] scsi_dh_alua: skip RTPG for devices only supporting active/optimized
On Fri, 2017-12-22 at 12:52 +0100, Hannes Reinecke wrote: > From: Hannes Reinecke > > For hardware only supporting active/optimized there's no point in > ever re-issuing RTPG as the only new state we can possibly read is > active/optimized. > This avoid spurious errors during path failover on such arrays. Reviewed-by: Bart Van Assche
Re: [PATCH] scsi: bfa: use ARRAY_SIZE for array sizing calculation on array __pciids
Colin, > Use the ARRAY_SIZE macro on array __pciids to determine size of the array. > Improvement suggested by coccinelle. Applied to 4.16/scsi-queue. Thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH][scsi-next] scsi: qla2xxx: remove redundant assignment of d
Colin, > The initialization of d is redundant as this value is never read > and it is overwritten inside the subsequent for-loop. Remove this > redundant assignment. Applied to 4.16/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] libsas: Disable asynchronous aborts for SATA devices
Hannes, > Handling CD-ROM devices from libsas is decidedly odd, as libata relies > on SCSI EH to be started to figure out that no medium is present. So > we cannot do asynchronous aborts for SATA devices. Applied to 4.15/scsi-fixes. Thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [RESEND PATCH 2/2] scsi: qedi: Use zeroing allocator instead of allocator/memset
Himanshu, > Use dma_zalloc_coherent instead of dma_alloc_coherent followed by memset > 0. Applied to 4.16/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [RESEND PATCH 1/2] scsi: bnx2fc: Use zeroing allocator rather than allocator/memset
Himanshu, > Use dma_zalloc_coherent instead of dma_alloc_coherent followed by > memset 0. Applied to 4.16/scsi-queue. Thanks! -- Martin K. Petersen Oracle Linux Engineering
[PATCH 2/2] scsi: aacraid: Delay for rescan worker needs to be 10 seconds
The delay for the rescan worker needs to 10 seconds, missed the HZ in there. Fixes: a1367e4adee207fe (scsi: aacraid: Reschedule host scan in case of failure) Signed-off-by: Raghava Aditya Renukunta --- drivers/scsi/aacraid/aacraid.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h index 3e8bfcf..3ab3231 100644 --- a/drivers/scsi/aacraid/aacraid.h +++ b/drivers/scsi/aacraid/aacraid.h @@ -1340,7 +1340,7 @@ struct fib { #define AAC_DEVTYPE_ARC_RAW2 #define AAC_DEVTYPE_NATIVE_RAW 3 -#define AAC_SAFW_RESCAN_DELAY 10 +#define AAC_SAFW_RESCAN_DELAY (10 * HZ) struct aac_hba_map_info { __le32 rmw_nexus; /* nexus for native HBA devices */ -- 2.9.4
[PATCH 1/2] scsi: aacraid: Get correct lun count
The correct lun count needs to be divided by 24, missed it in the previous patch set. Fixes: 4b00022753550055 (scsi: aacraid: Create helper functions to get lun info) Signed-off-by: Raghava Aditya Renukunta --- drivers/scsi/aacraid/aachba.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c index a2bdd79..4c65991 100644 --- a/drivers/scsi/aacraid/aachba.c +++ b/drivers/scsi/aacraid/aachba.c @@ -1868,7 +1868,7 @@ static int aac_get_safw_ciss_luns(struct aac_dev *dev) static inline u32 aac_get_safw_phys_lun_count(struct aac_dev *dev) { - return get_unaligned_be32(&dev->safw_phys_luns->list_length[0]); + return get_unaligned_be32(&dev->safw_phys_luns->list_length[0])/24; } static inline u32 aac_get_safw_phys_bus(struct aac_dev *dev, int lun) -- 2.9.4
[PATCH v2] storvsc: do not assume SG list is continuous when doing bounce buffers (for 4.1 and prior stable kernels)
From: Long Li The original patch was made for stable 4.1 and was Acked on 08/22/2017, but for some reason it never made it to the stable tree. Change from v1: Changed comment that this patch is for linux-stable 4.1 and all prior stable kernels. storvsc checks the SG list for gaps before passing them to Hyper-v device. If there are gaps, data is copied to a bounce buffer and a continuous data buffer is passed to Hyper-V. The check on gaps assumes SG list is continuous, and not chained. This is not always true. Failing the check may result in incorrect I/O data passed to the Hyper-v device. This code path is not used post Linux 4.1. Signed-off-by: Long Li Acked-by: Martin K. Petersen --- drivers/scsi/storvsc_drv.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index 6c52d14..14dc5c6 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -584,17 +584,18 @@ static int do_bounce_buffer(struct scatterlist *sgl, unsigned int sg_count) for (i = 0; i < sg_count; i++) { if (i == 0) { /* make sure 1st one does not have hole */ - if (sgl[i].offset + sgl[i].length != PAGE_SIZE) + if (sgl->offset + sgl->length != PAGE_SIZE) return i; } else if (i == sg_count - 1) { /* make sure last one does not have hole */ - if (sgl[i].offset != 0) + if (sgl->offset != 0) return i; } else { /* make sure no hole in the middle */ - if (sgl[i].length != PAGE_SIZE || sgl[i].offset != 0) + if (sgl->length != PAGE_SIZE || sgl->offset != 0) return i; } + sgl = sg_next(sgl); } return -1; } -- 2.7.4
Re: [PATCH 13/14] megaraid_sas: NVME passthru command support
On 2018-01-10 11:22 AM, Bart Van Assche wrote: On Tue, 2018-01-09 at 22:07 +0530, Kashyap Desai wrote: Overall NVME support behind MR controller is really a SCSI device. On top of that, for MegaRaid, NVME device can be part of Virtual Disk and those drive will not be exposed to the driver. User application may like to talk to hidden NVME devices (part of VDs). This patch will extend the existing interface for megaraid product in the same way it is currently supported for other protocols like SMP, SATA pass-through. It seems to me like there is a contradiction in the above paragraph: if some NVMe devices are not exposed to the driver, how can a user space application ever send NVMe commands to it? I think that he meant that the NVMe physical devices (e.g. SSDs) are not exposed to the upper layers (e.g. the SCSI mid-layer and above). The SCSI subsystem has a no_uld_attach device flag that lets a LLD attach physical devices but the sd driver and hence the block layer do not "see" them. The idea is that maintenance programs like smartmontools can use them via the bsg or sg drivers. The Megaraid driver code does not seem to use no_uld_attach. Does the NVMe subsystem have similar "generic" (i.e. non-block) devices accessible to the user space? Anyway, has it been considered to implement the NVMe support as an NVMe transport driver? The upstream kernel already supports NVMe communication with NVMe PCI devices, NVMe over RDMA and NVMe over FC. If communication to the NVMe devices behind the MegaRaid controller would be implemented as an NVMe transport driver then all functionality of the Linux NVMe driver could be reused, including its sysfs entries. Broadcom already sell "SAS" HBAs that have "tri-mode" phys. That is a phy that can connect to a SAS device (e.g. a SAS expander), a SATA device or a NVMe device. Now if I was Broadcom designing a 24 Gbps SAS-4 next generation expander I would be thinking of using those tri-mode phys on it. But then there is a problem, SAS currently supports 3 protocols: SSP (for SCSI storage and enclosure management (SES)), STP (for SATA storage ) and SMP (for expander management). The problem is how those NVMe commands, status and data cross the wire between the OS HBA (or MegaRaid type controller) and an expander. Solving that might need some lateral thinking. On one hand the NVM Express folks seem to have shelved the idea of a SCSI to NVMe Translation Layer (SNTL) and have not updated an old white paper on the subject. Currently there is no SNTL on Linux (there was but it was removed) or FreeBSD but there is one on Windows. On the other hand I'm informed that recently the same body accepted the SES-3 standard pretty much as-is. That is done with the addition of SES Send and SES Receive commands to NVME-MI. The library under sg_ses has already been modified to use them (by implementing a specialized SNTL). Doug Gilbert
Re: [PATCH] uas: Unblock scsi-requests on failure to alloc streams in post_reset
Hi, On 10-01-18 16:23, Oliver Neukum wrote: Am Mittwoch, den 10.01.2018, 08:13 +0100 schrieb Hans de Goede: If we return 1 from our post_reset handler, then our disconnect handler will be called immediately afterwards. Since pre_reset blocks all scsi requests our disconnect handler will then hang in the scsi_remove_host call. Hi Hans, it seems to me that the diagnosis is spot on. But why do we keep different code paths at all in this case? I do not see the point of not reporting the reset to the SCSI subsystem, even if we are not operational afterwards. So how about something like this? Sure, works for me :) Regards, Hans From 4d1e26154bc5d09913bfba34d7adc39cce98d20a Mon Sep 17 00:00:00 2001 From: Oliver Neukum Date: Wed, 10 Jan 2018 16:16:03 +0100 Subject: [PATCH] usb: uas: unconditionally bring back host after reset Quoting Hans: If we return 1 from our post_reset handler, then our disconnect handler will be called immediately afterwards. Since pre_reset blocks all scsi requests our disconnect handler will then hang in the scsi_remove_host call. This is esp. bad because our disconnect handler hanging for ever also stops the USB subsys from enumerating any new USB devices, causes commands like lsusb to hang, etc. In practice this happens when unplugging some uas devices because the hub code may see the device as needing a warm-reset and calls usb_reset_device before seeing the disconnect. In this case uas_configure_endpoints fails with -ENODEV. We do not want to print an error for this, so this commit also silences the shost_printk for -ENODEV. ENDQUOTE However, if we do that we better drop any unconditional execution and report to the SCSI subsystem that we have undergone a reset but we are not operational now. Signed-off-by: Oliver Neukum Reported-by: Hans de Goede --- Makefile | 2 +- drivers/usb/storage/uas.c | 7 +++ 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c index 5d04c40ee40a..3b1b9695177a 100644 --- a/drivers/usb/storage/uas.c +++ b/drivers/usb/storage/uas.c @@ -1076,20 +1076,19 @@ static int uas_post_reset(struct usb_interface *intf) return 0; err = uas_configure_endpoints(devinfo); - if (err) { + if (err && err != ENODEV) shost_printk(KERN_ERR, shost, "%s: alloc streams error %d after reset", __func__, err); - return 1; - } + /* we must unblock the host in every case lest we deadlock */ spin_lock_irqsave(shost->host_lock, flags); scsi_report_bus_reset(shost, 0); spin_unlock_irqrestore(shost->host_lock, flags); scsi_unblock_requests(shost); - return 0; + return err ? 1 : 0; } static int uas_suspend(struct usb_interface *intf, pm_message_t message) -- 2.13.6
Re: [PATCH] tcmu: Allow reconfig to handle multiple attributes
On 01/04/2018 10:11 AM, Bryant G. Ly wrote: > This patch allows for multiple attributes to be reconfigured > and handled all in one call as compared to multiple netlinks. > > Example: > set attribute dev_reconfig=dev_config=fbo//home/path:dev_size=2147483648 > I know I suggested this, but I think I was wrong. If we have to support other apps that work in reverse to targetcli+tcmu-runner where the tcmu-runner equivalent app sets things up then contacts the kernel, let's just not do passthrough operations like this for reconfig. There is no need to bring in the kernel. For the initial config we can still do it since we have to maintain compat, but for major reconfigs like this let's just have targetcli contact tcmu-runner, then runner can update the kernel if needed. So in targetcli and runner copy the check_config stuff, and add a reconfig callout that works like it. We then do not have this weird kernel passthrough and do not have to worry about the non targetcl-tcmu-runner type of model. > Signed-off-by: Bryant G. Ly > --- > drivers/target/target_core_user.c | 92 > ++- > include/uapi/linux/target_core_user.h | 1 + > 2 files changed, 92 insertions(+), 1 deletion(-) > > diff --git a/drivers/target/target_core_user.c > b/drivers/target/target_core_user.c > index 4824abf92ed79..619fae5e865f1 100644 > --- a/drivers/target/target_core_user.c > +++ b/drivers/target/target_core_user.c > @@ -152,6 +152,8 @@ struct tcmu_dev { > char dev_config[TCMU_CONFIG_LEN]; > > int nl_reply_supported; > + > + char dev_reconfig[TCMU_CONFIG_LEN * 2]; > }; > > #define TCMU_DEV(_se_dev) container_of(_se_dev, struct tcmu_dev, se_dev) > @@ -1452,6 +1454,10 @@ static int tcmu_netlink_event(struct tcmu_dev *udev, > enum tcmu_genl_cmd cmd, > ret = nla_put_u8(skb, reconfig_attr, > *((u8 *)reconfig_data)); > break; > + case TCMU_ATTR_DEV_RECFG: > + pr_err("Put string into netlink and send it\n"); > + ret = nla_put_string(skb, reconfig_attr, reconfig_data); > + break; > default: > BUG(); > } > @@ -1637,7 +1643,7 @@ static void tcmu_destroy_device(struct se_device *dev) > > enum { > Opt_dev_config, Opt_dev_size, Opt_hw_block_size, Opt_hw_max_sectors, > - Opt_nl_reply_supported, Opt_err, > + Opt_nl_reply_supported, Opt_dev_reconfig, Opt_err, > }; > > static match_table_t tokens = { > @@ -1646,6 +1652,7 @@ static match_table_t tokens = { > {Opt_hw_block_size, "hw_block_size=%u"}, > {Opt_hw_max_sectors, "hw_max_sectors=%u"}, > {Opt_nl_reply_supported, "nl_reply_supported=%d"}, > + {Opt_dev_reconfig, "dev_reconfig=%s"}, > {Opt_err, NULL} > }; > > @@ -1731,6 +1738,14 @@ static ssize_t tcmu_set_configfs_dev_params(struct > se_device *dev, > if (ret < 0) > pr_err("kstrtoint() failed for > nl_reply_supported=\n"); > break; > + case Opt_dev_reconfig: > + arg_p = match_strdup(&args[0]); > + if (!arg_p) { > + ret = -ENOMEM; > + break; > + } > + kfree(arg_p); > + break; > default: > break; > } > @@ -1945,12 +1960,87 @@ static ssize_t tcmu_emulate_write_cache_store(struct > config_item *item, > } > CONFIGFS_ATTR(tcmu_, emulate_write_cache); > > +static ssize_t tcmu_dev_reconfig_show(struct config_item *item, char *page) > +{ > + struct se_dev_attrib *da = container_of(to_config_group(item), > + struct se_dev_attrib, da_group); > + struct tcmu_dev *udev = TCMU_DEV(da->da_dev); > + > + return snprintf(page, PAGE_SIZE, "%s\n", udev->dev_reconfig); > +} > + > +static ssize_t tcmu_dev_reconfig_store(struct config_item *item, > +const char *page, > +size_t count) > +{ > + struct se_dev_attrib *da = container_of(to_config_group(item), > + struct se_dev_attrib, da_group); > + struct tcmu_dev *udev = TCMU_DEV(da->da_dev); > + int token, ret; > + char *orig, *ptr, *opts, *arg_p; > + substring_t args[MAX_OPT_ARGS]; > + > + /* Check if device has been configured before */ > + if (tcmu_dev_configured(udev)) { > + ret = tcmu_netlink_event(udev, TCMU_CMD_RECONFIG_DEVICE, > + TCMU_ATTR_DEV_RECFG, page); > + if (ret) { > + pr_err("Unable to reconfigure device\n"); > + return ret; > + } > + > + opts = kstrdup(page, GFP_KERNEL); > +
[PATCH v2 2/4] block: Introduce blk_start_wait_if_quiesced() and blk_finish_wait_if_quiesced()
Introduce functions that allow block drivers to wait while a request queue is in the quiesced state (blk-mq) or in the stopped state (legacy block layer). The next patch will add calls to these functions in the SCSI core. Signed-off-by: Bart Van Assche Cc: Martin K. Petersen Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Johannes Thumshirn Cc: Ming Lei --- block/blk-core.c | 1 + block/blk-mq.c | 64 ++ include/linux/blk-mq.h | 2 ++ 3 files changed, 67 insertions(+) diff --git a/block/blk-core.c b/block/blk-core.c index c10b4ce95248..06eaea15bae9 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -287,6 +287,7 @@ void blk_start_queue(struct request_queue *q) WARN_ON_ONCE(q->mq_ops); queue_flag_clear(QUEUE_FLAG_STOPPED, q); + wake_up_all(&q->mq_wq); __blk_run_queue(q); } EXPORT_SYMBOL(blk_start_queue); diff --git a/block/blk-mq.c b/block/blk-mq.c index a05ea7e9b415..87455977ad34 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -247,11 +247,75 @@ void blk_mq_unquiesce_queue(struct request_queue *q) queue_flag_clear(QUEUE_FLAG_QUIESCED, q); spin_unlock_irqrestore(q->queue_lock, flags); + wake_up_all(&q->mq_wq); + /* dispatch requests which are inserted during quiescing */ blk_mq_run_hw_queues(q, true); } EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue); +/** + * blk_start_wait_if_quiesced() - wait if a queue is quiesced (blk-mq) or stopped (legacy block layer) + * @q: Request queue pointer. + * + * Some block drivers, e.g. the SCSI core, can bypass the block layer core + * request submission mechanism. Surround such code with + * blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced() to avoid that + * request submission can happen while a queue is quiesced or stopped. + * + * Returns with the RCU read lock held (blk-mq) or with q->queue_lock held + * (legacy block layer). + * + * Notes: + * - Every call of this function must be followed by a call of + * blk_finish_wait_if_quiesced(). + * - This function does not support block drivers whose .queue_rq() + * implementation can sleep (BLK_MQ_F_BLOCKING). + */ +int blk_start_wait_if_quiesced(struct request_queue *q) +{ + struct blk_mq_hw_ctx *hctx; + unsigned int i; + + might_sleep(); + + if (q->mq_ops) { + queue_for_each_hw_ctx(q, hctx, i) + WARN_ON(hctx->flags & BLK_MQ_F_BLOCKING); + + rcu_read_lock(); + while (!blk_queue_dying(q) && blk_queue_quiesced(q)) { + rcu_read_unlock(); + wait_event(q->mq_wq, blk_queue_dying(q) || + !blk_queue_quiesced(q)); + rcu_read_lock(); + } + } else { + spin_lock_irq(q->queue_lock); + wait_event_lock_irq(q->mq_wq, + blk_queue_dying(q) || !blk_queue_stopped(q), + *q->queue_lock); + q->request_fn_active++; + } + return blk_queue_dying(q) ? -ENODEV : 0; +} +EXPORT_SYMBOL(blk_start_wait_if_quiesced); + +/** + * blk_finish_wait_if_quiesced() - counterpart of blk_start_wait_if_quiesced() + * @q: Request queue pointer. + */ +void blk_finish_wait_if_quiesced(struct request_queue *q) +{ + if (q->mq_ops) { + rcu_read_unlock(); + } else { + q->request_fn_active--; + spin_unlock_irq(q->queue_lock); + } +} +EXPORT_SYMBOL(blk_finish_wait_if_quiesced); + void blk_mq_wake_waiters(struct request_queue *q) { struct blk_mq_hw_ctx *hctx; diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 8efcf49796a3..15912cd348b5 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -267,6 +267,8 @@ void blk_mq_start_stopped_hw_queue(struct blk_mq_hw_ctx *hctx, bool async); void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async); void blk_mq_quiesce_queue(struct request_queue *q); void blk_mq_unquiesce_queue(struct request_queue *q); +int blk_start_wait_if_quiesced(struct request_queue *q); +void blk_finish_wait_if_quiesced(struct request_queue *q); void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs); bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async); void blk_mq_run_hw_queues(struct request_queue *q, bool async); -- 2.15.1
[PATCH v2 3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device
Several SCSI transport and LLD drivers surround code that does not tolerate concurrent calls of .queuecommand() with scsi_target_block() / scsi_target_unblock(). These last two functions use blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request queues to prevent concurrent .queuecommand() calls. However, that is not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd(). Hence surround the .queuecommand() call from the SCSI error handler with blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced(). Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into code that calls blk_get_request(), e.g. scsi_execute_req(), is not an option since scsi_send_eh_cmnd() can be called if all requests are allocated and if no requests will make progress without aborting any of these requests. Signed-off-by: Bart Van Assche Reviewed-by: Hannes Reinecke Cc: Martin K. Petersen Cc: Christoph Hellwig Cc: Johannes Thumshirn Cc: Ming Lei --- drivers/scsi/scsi_error.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 62b56de38ae8..f7154ea86715 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -1016,6 +1016,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd, { struct scsi_device *sdev = scmd->device; struct Scsi_Host *shost = sdev->host; + struct request_queue *q = sdev->request_queue; DECLARE_COMPLETION_ONSTACK(done); unsigned long timeleft = timeout; struct scsi_eh_save ses; @@ -1028,7 +1029,9 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd, scsi_log_send(scmd); scmd->scsi_done = scsi_eh_done; + blk_start_wait_if_quiesced(q); rtn = shost->hostt->queuecommand(shost, scmd); + blk_finish_wait_if_quiesced(q); if (rtn) { if (timeleft > stall_for) { scsi_eh_restore_cmnd(scmd, &ses); -- 2.15.1
[PATCH v2 4/4] IB/srp: Fix a sleep-in-invalid-context bug
The previous two patches guarantee that srp_queuecommand() does not get invoked while reconnecting occurs. Hence remove the code from srp_queuecommand() that prevents command queueing while reconnecting. This patch avoids that the following can appear in the kernel log: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747 in_atomic(): 1, irqs_disabled(): 0, pid: 5600, name: scsi_eh_9 1 lock held by scsi_eh_9/5600: #0: (rcu_read_lock){}, at: [] __blk_mq_run_hw_queue+0xf1/0x1e0 Preemption disabled at: [<139badf2>] __blk_mq_delay_run_hw_queue+0x78/0xf0 CPU: 9 PID: 5600 Comm: scsi_eh_9 Tainted: GW4.15.0-rc4-dbg+ #1 Hardware name: Dell Inc. PowerEdge R720/0VWT90, BIOS 2.5.4 01/22/2016 Call Trace: dump_stack+0x67/0x99 ___might_sleep+0x16a/0x250 [ib_srp] __mutex_lock+0x46/0x9d0 srp_queuecommand+0x356/0x420 [ib_srp] scsi_dispatch_cmd+0xf6/0x3f0 scsi_queue_rq+0x4a8/0x5f0 blk_mq_dispatch_rq_list+0x73/0x440 blk_mq_sched_dispatch_requests+0x109/0x1a0 __blk_mq_run_hw_queue+0x131/0x1e0 __blk_mq_delay_run_hw_queue+0x9a/0xf0 blk_mq_run_hw_queue+0xc0/0x1e0 blk_mq_start_hw_queues+0x2c/0x40 scsi_run_queue+0x18e/0x2d0 scsi_run_host_queues+0x22/0x40 scsi_error_handler+0x18d/0x5f0 kthread+0x11c/0x140 ret_from_fork+0x24/0x30 Signed-off-by: Bart Van Assche Reviewed-by: Hannes Reinecke Cc: Jason Gunthorpe Cc: Doug Ledford --- drivers/infiniband/ulp/srp/ib_srp.c | 21 ++--- 1 file changed, 2 insertions(+), 19 deletions(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 972d4b3c5223..670f187ecb91 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -2149,7 +2149,6 @@ static void srp_handle_qp_err(struct ib_cq *cq, struct ib_wc *wc, static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd) { struct srp_target_port *target = host_to_target(shost); - struct srp_rport *rport = target->rport; struct srp_rdma_ch *ch; struct srp_request *req; struct srp_iu *iu; @@ -2159,16 +2158,6 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd) u32 tag; u16 idx; int len, ret; - const bool in_scsi_eh = !in_interrupt() && current == shost->ehandler; - - /* -* The SCSI EH thread is the only context from which srp_queuecommand() -* can get invoked for blocked devices (SDEV_BLOCK / -* SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by -* locking the rport mutex if invoked from inside the SCSI EH. -*/ - if (in_scsi_eh) - mutex_lock(&rport->mutex); scmnd->result = srp_chkready(target->rport); if (unlikely(scmnd->result)) @@ -2230,13 +2219,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd) goto err_unmap; } - ret = 0; - -unlock_rport: - if (in_scsi_eh) - mutex_unlock(&rport->mutex); - - return ret; + return 0; err_unmap: srp_unmap_data(scmnd, ch, req); @@ -2258,7 +2241,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd) ret = SCSI_MLQUEUE_HOST_BUSY; } - goto unlock_rport; + return ret; } /* -- 2.15.1
[PATCH v2 1/4] blk-mq: Rename request_queue.mq_freeze_wq into mq_wq
Rename a waitqueue in struct request_queue since the next patch will add code that uses this waitqueue outside the request queue freezing implementation. Signed-off-by: Bart Van Assche Reviewed-by: Hannes Reinecke Cc: Christoph Hellwig Cc: Johannes Thumshirn Cc: Ming Lei --- block/blk-core.c | 10 +- block/blk-mq.c | 10 +- include/linux/blkdev.h | 2 +- 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index fa1cb95f7f6a..c10b4ce95248 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -377,7 +377,7 @@ void blk_clear_preempt_only(struct request_queue *q) spin_lock_irqsave(q->queue_lock, flags); queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q); - wake_up_all(&q->mq_freeze_wq); + wake_up_all(&q->mq_wq); spin_unlock_irqrestore(q->queue_lock, flags); } EXPORT_SYMBOL_GPL(blk_clear_preempt_only); @@ -648,7 +648,7 @@ void blk_set_queue_dying(struct request_queue *q) } /* Make blk_queue_enter() reexamine the DYING flag. */ - wake_up_all(&q->mq_freeze_wq); + wake_up_all(&q->mq_wq); } EXPORT_SYMBOL_GPL(blk_set_queue_dying); @@ -851,7 +851,7 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags) */ smp_rmb(); - ret = wait_event_interruptible(q->mq_freeze_wq, + ret = wait_event_interruptible(q->mq_wq, (atomic_read(&q->mq_freeze_depth) == 0 && (preempt || !blk_queue_preempt_only(q))) || blk_queue_dying(q)); @@ -872,7 +872,7 @@ static void blk_queue_usage_counter_release(struct percpu_ref *ref) struct request_queue *q = container_of(ref, struct request_queue, q_usage_counter); - wake_up_all(&q->mq_freeze_wq); + wake_up_all(&q->mq_wq); } static void blk_rq_timed_out_timer(struct timer_list *t) @@ -948,7 +948,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) q->bypass_depth = 1; __set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags); - init_waitqueue_head(&q->mq_freeze_wq); + init_waitqueue_head(&q->mq_wq); /* * Init percpu_ref in atomic mode so that it's faster to shutdown. diff --git a/block/blk-mq.c b/block/blk-mq.c index 29f140b4dbf7..a05ea7e9b415 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -137,16 +137,16 @@ EXPORT_SYMBOL_GPL(blk_freeze_queue_start); void blk_mq_freeze_queue_wait(struct request_queue *q) { - wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter)); + wait_event(q->mq_wq, percpu_ref_is_zero(&q->q_usage_counter)); } EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait); int blk_mq_freeze_queue_wait_timeout(struct request_queue *q, unsigned long timeout) { - return wait_event_timeout(q->mq_freeze_wq, - percpu_ref_is_zero(&q->q_usage_counter), - timeout); + return wait_event_timeout(q->mq_wq, + percpu_ref_is_zero(&q->q_usage_counter), + timeout); } EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait_timeout); @@ -185,7 +185,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q) WARN_ON_ONCE(freeze_depth < 0); if (!freeze_depth) { percpu_ref_reinit(&q->q_usage_counter); - wake_up_all(&q->mq_freeze_wq); + wake_up_all(&q->mq_wq); } } EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 50fb1c18ec54..2c74c03a9d5f 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -638,7 +638,7 @@ struct request_queue { struct throtl_data *td; #endif struct rcu_head rcu_head; - wait_queue_head_t mq_freeze_wq; + wait_queue_head_t mq_wq; struct percpu_ref q_usage_counter; struct list_headall_q_node; -- 2.15.1
[PATCH v2 0/4] Make SCSI transport recovery more robust
Hello Jens, A longstanding issue with the SCSI core is that several SCSI transport drivers use scsi_target_block() and scsi_target_unblock() to avoid concurrent .queuecommand() calls during e.g. transport recovery but that this is not sufficient to protect from such calls. Hence this patch series. An additional benefit of this patch series is that it allows to remove an ugly hack from the SRP initiator driver. Please consider this patch series for kernel v4.16. Thanks, Bart. Changes compared to v1: - Renamed blk_wait_if_quiesced() into blk_start_wait_if_quiesced(). - Mentioned in the comment above blk_start_wait_if_quiesced() that every call of this function has to be followed by a call of blk_finish_wait_if_quiesced(). Bart Van Assche (4): blk-mq: Rename request_queue.mq_freeze_wq into mq_wq block: Introduce blk_start_wait_if_quiesced() and blk_finish_wait_if_quiesced() scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device IB/srp: Fix a sleep-in-invalid-context bug block/blk-core.c| 11 +++--- block/blk-mq.c | 74 ++--- drivers/infiniband/ulp/srp/ib_srp.c | 21 +-- drivers/scsi/scsi_error.c | 3 ++ include/linux/blk-mq.h | 2 + include/linux/blkdev.h | 2 +- 6 files changed, 83 insertions(+), 30 deletions(-) -- 2.15.1
Re: scsi: Always retry internal target error
> On 10/17/2017 09:11 AM, Hannes Reinecke wrote: >> EMC Symmetrix is returning 'internal target error' for a variety >> of conditions, most of which will be transient. >> So we should always retry it, even with failfast set. >> Otherwise we'd get spurious path flaps with multipath. >> >> Signed-off-by: Hannes Reinecke >> --- >> drivers/scsi/scsi_error.c | 13 + >> 1 file changed, 13 insertions(+) >> >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c >> index 5086489..5722c2e 100644 >> --- a/drivers/scsi/scsi_error.c >> +++ b/drivers/scsi/scsi_error.c >> @@ -497,6 +497,16 @@ int scsi_check_sense(struct scsi_cmnd *scmd) >> if (sshdr.asc == 0x10) /* DIF */ >> return SUCCESS; >> >> +if (!strncmp(scmd->device->vendor, "EMC", 3) && >> +!strncmp(scmd->device->model, "SYMMETRIX", 9) && >> +(sshdr.asc == 0x44) && (sshdr.ascq == 0x0)) { >> +/* >> + * EMC Symmetrix returns 'Internal target failure' >> + * for a variety of internal issues, all of which >> + * can be recovered by retry. >> + */ >> +return ADD_TO_MLQUEUE; >> +} >> return NEEDS_RETRY; >> case NOT_READY: >> case UNIT_ATTENTION: >> @@ -1846,6 +1856,9 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd) >> rtn = scsi_check_sense(scmd); >> if (rtn == NEEDS_RETRY) >> goto maybe_retry; >> +else if (rtn == ADD_TO_MLQUEUE) >> +/* Always enforce a retry for ADD_TO_MLQUEUE */ >> +rtn = NEEDS_RETRY; >> /* if rtn == FAILED, we have no sense information; >> * returning FAILED will wake the error handler thread >> * to collect the sense and redo the decide >> On 10/18/2017 07:27 AM, Hannes Reinecke wrote: >> Yeah, you are right. >> Will be adding a blacklist entry for this. Is there a new patch ongoing?
Re: [PATCH 13/14] megaraid_sas: NVME passthru command support
On Tue, 2018-01-09 at 22:07 +0530, Kashyap Desai wrote: > Overall NVME support behind MR controller is really a SCSI device. On top > of that, for MegaRaid, NVME device can be part of Virtual Disk and those > drive will not be exposed to the driver. User application may like to talk > to hidden NVME devices (part of VDs). This patch will extend the existing > interface for megaraid product in the same way it is currently supported > for other protocols like SMP, SATA pass-through. It seems to me like there is a contradiction in the above paragraph: if some NVMe devices are not exposed to the driver, how can a user space application ever send NVMe commands to it? Anyway, has it been considered to implement the NVMe support as an NVMe transport driver? The upstream kernel already supports NVMe communication with NVMe PCI devices, NVMe over RDMA and NVMe over FC. If communication to the NVMe devices behind the MegaRaid controller would be implemented as an NVMe transport driver then all functionality of the Linux NVMe driver could be reused, including its sysfs entries. Bart.
Re: [PATCH] uas: Unblock scsi-requests on failure to alloc streams in post_reset
Am Mittwoch, den 10.01.2018, 08:13 +0100 schrieb Hans de Goede: > If we return 1 from our post_reset handler, then our disconnect handler > will be called immediately afterwards. Since pre_reset blocks all scsi > requests our disconnect handler will then hang in the scsi_remove_host > call. Hi Hans, it seems to me that the diagnosis is spot on. But why do we keep different code paths at all in this case? I do not see the point of not reporting the reset to the SCSI subsystem, even if we are not operational afterwards. So how about something like this? >From 4d1e26154bc5d09913bfba34d7adc39cce98d20a Mon Sep 17 00:00:00 2001 From: Oliver Neukum Date: Wed, 10 Jan 2018 16:16:03 +0100 Subject: [PATCH] usb: uas: unconditionally bring back host after reset Quoting Hans: If we return 1 from our post_reset handler, then our disconnect handler will be called immediately afterwards. Since pre_reset blocks all scsi requests our disconnect handler will then hang in the scsi_remove_host call. This is esp. bad because our disconnect handler hanging for ever also stops the USB subsys from enumerating any new USB devices, causes commands like lsusb to hang, etc. In practice this happens when unplugging some uas devices because the hub code may see the device as needing a warm-reset and calls usb_reset_device before seeing the disconnect. In this case uas_configure_endpoints fails with -ENODEV. We do not want to print an error for this, so this commit also silences the shost_printk for -ENODEV. ENDQUOTE However, if we do that we better drop any unconditional execution and report to the SCSI subsystem that we have undergone a reset but we are not operational now. Signed-off-by: Oliver Neukum Reported-by: Hans de Goede --- Makefile | 2 +- drivers/usb/storage/uas.c | 7 +++ 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c index 5d04c40ee40a..3b1b9695177a 100644 --- a/drivers/usb/storage/uas.c +++ b/drivers/usb/storage/uas.c @@ -1076,20 +1076,19 @@ static int uas_post_reset(struct usb_interface *intf) return 0; err = uas_configure_endpoints(devinfo); - if (err) { + if (err && err != ENODEV) shost_printk(KERN_ERR, shost, "%s: alloc streams error %d after reset", __func__, err); - return 1; - } + /* we must unblock the host in every case lest we deadlock */ spin_lock_irqsave(shost->host_lock, flags); scsi_report_bus_reset(shost, 0); spin_unlock_irqrestore(shost->host_lock, flags); scsi_unblock_requests(shost); - return 0; + return err ? 1 : 0; } static int uas_suspend(struct usb_interface *intf, pm_message_t message) -- 2.13.6
Re: [-next PATCH 2/4] treewide: Use DEVICE_ATTR_RW
On 2017-12-20 12:54, Jarkko Nikula wrote: > On Wed, Dec 20, 2017 at 10:32:11AM +0100, Greg Kroah-Hartman wrote: >> On Wed, Dec 20, 2017 at 01:24:44AM -0800, Joe Perches wrote: >>> On Wed, 2017-12-20 at 10:34 +0200, Jarkko Nikula wrote: On Tue, Dec 19, 2017 at 10:15:07AM -0800, Joe Perches wrote: > Convert DEVICE_ATTR uses to DEVICE_ATTR_RW where possible. >>> [] > diff --git a/sound/soc/omap/mcbsp.c b/sound/soc/omap/mcbsp.c >>> [] > @@ -854,7 +854,7 @@ static ssize_t dma_op_mode_store(struct device *dev, > return size; > } > > -static DEVICE_ATTR(dma_op_mode, 0644, dma_op_mode_show, > dma_op_mode_store); > +static DEVICE_ATTR_RW(dma_op_mode); > While I can ack this part here if it helps generic cleanup effort I don't understart would it improve code readability in general? Mode 644 is clear and don't need any grepping but for DEVICE_ATTR_RW() I had to go through all of these files in order to see what does it mean: >> >> Yeah, 644 is "clear", but _RW() is even more clear. Ideally I want to >> get rid of all of the "non-standard" users that set random modes of >> sysfs files, as we get it wrong too many times. Using the "defaults" is >> much better. >> > Fair enough. For the sound/soc/omap/ (Acked-by was missing from my > previous reply): > > Acked-by: Jarkko Nikula And from me to the same file (sound/soc/omap/mcbsp.c): Acked-by: Peter Ujfalusi - Péter Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
Re: [RESEND PATCH 1/2] scsi: bnx2fc: Use zeroing allocator rather than allocator/memset
On Tue, 9 Jan 2018, 4:06am, Himanshu Jha wrote: > Use dma_zalloc_coherent instead of dma_alloc_coherent followed by > memset 0. > > Generated-by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci > > Suggested-by: Luis R. Rodriguez > Signed-off-by: Himanshu Jha > --- > drivers/scsi/bnx2fc/bnx2fc_hwi.c | 60 > +--- > drivers/scsi/bnx2fc/bnx2fc_tgt.c | 51 +++--- > 2 files changed, 47 insertions(+), 64 deletions(-) > Sorry, didn't realize I needed to ack the resend. Acked-by: Chad Dupuis
RE: [PATCH 13/14] megaraid_sas: NVME passthru command support
> -Original Message- > From: Douglas Gilbert [mailto:dgilb...@interlog.com] > Sent: Wednesday, January 10, 2018 2:21 AM > To: Christoph Hellwig; Kashyap Desai > Cc: Shivasharan Srikanteshwara; linux-scsi@vger.kernel.org; Sumit Saxena; > linux-n...@lists.infradead.org; Peter Rivera > Subject: Re: [PATCH 13/14] megaraid_sas: NVME passthru command support > > On 2018-01-09 11:45 AM, Christoph Hellwig wrote: > > On Tue, Jan 09, 2018 at 10:07:28PM +0530, Kashyap Desai wrote: > >> Chris - > >> > >> Overall NVME support behind MR controller is really a SCSI device. On > >> top of that, for MegaRaid, NVME device can be part of Virtual Disk > >> and those drive will not be exposed to the driver. User application > >> may like to talk to hidden NVME devices (part of VDs). This patch > >> will extend the existing interface for megaraid product in the same > >> way it is currently supported for other protocols like SMP, SATA pass- > through. > >> > >> Example - Current smartmon is using megaraid.h (MFI headers) to send > >> SATA pass-through. > >> > >> https://github.com/mirror/smartmontools/blob/master/megaraid.h > > > > And that is exactly the example of why we should have never allowed > > megaraid any private passthrough ioctls to start with. > > Christoph, > Have you tried to do any serious work with and say > compared it with FreeBSD and Microsoft's approach? No prize for guessing > which one is worst (and least extensible). Looks like the Linux > pass-through > was at the end of a ToDo list and was "designed" > at 5 a.m in the morning. > > RAID cards need a pass-through that allows them to address one of many > physical disks behind the virtual disk presented to OS. > Pass-throughs need to have uncommited room for extra parameters that will > be passed through as-is to the RAID LLD. Doug - As you mentioned, I notice the same. This type of issue is common for all RAID controllers vendors. Whatever Christoph mentioned about NVMe type API to be used is possible, but may need extra hit in firmware side to convert Linux NVME API to FW specific OR deal the same in driver. It may come with it's own pros/cons. Also may not fulfil the end goal. For other platforms, we still have to depend upon specialized pass-through code. So having said that Firmware of RAID cannot use only one interface for pass-through and they have to choose specialized pass-through code. NVME-CLI interface is designed for NVME drives attached to block layer. MegaRaid product is design to keep NVME protocol abstracted (much like SATA drives behind SAS controller) and attach those drives/virtual disk to SCSI layer. > > So until Christoph gives an example of how that can be done with > then I would like to see Christoph's objection > ignored. > > > And as a maintainer of smartmontools, I would like to point out that > pretty > well all supported RAIDs, on all platforms need specialized pass-through > code. If upstream community like to enhance nvme-cli type interface in megaraid_sas driver, we may have to come up with one more layer in megaraid_sas driver to convert NVME-API to specialized pass-through code. It is really not simple to fit into existing design as NVME-CLI/API is considering NVME drive associated with nvme.ko modules (/dev/nvmeX). Also we don't have many sysfs entries nvme-cli is looking for NVME device etc.. We don't have way to talk to Physical disks which is part of VD etc.. Specialized pass-through code is better to extend in application like smartmontools etc. > Start by looking at os_linux.cpp and then at the other OSes. And now > smartmontools supports NVMe on most platforms and at the pass-through > level, it is just another one, and not a particularly clean one. > > IMO Intel had their chance on the pass-through front, and blew it. > It is now too late to fix it and that job (impossible ?) should not fall > to > MegaRaid maintainers. > > Douglas Gilbert
RE: [PATCH 13/14] megaraid_sas: NVME passthru command support
> -Original Message- > From: Keith Busch [mailto:keith.bu...@intel.com] > Sent: Wednesday, January 10, 2018 4:53 AM > To: Douglas Gilbert > Cc: Christoph Hellwig; Kashyap Desai; Shivasharan Srikanteshwara; Sumit > Saxena; Peter Rivera; linux-n...@lists.infradead.org; linux- > s...@vger.kernel.org > Subject: Re: [PATCH 13/14] megaraid_sas: NVME passthru command support > > On Tue, Jan 09, 2018 at 03:50:44PM -0500, Douglas Gilbert wrote: > > Have you tried to do any serious work with and > > say compared it with FreeBSD and Microsoft's approach? No prize for > > guessing which one is worst (and least extensible). Looks like the > > Linux pass-through was at the end of a ToDo list and was "designed" > > at 5 a.m in the morning. > > What the heck are you talking about? FreeBSD's NVMe passthrough is near > identical to Linux, and Linux's existed years prior. > > You're not even touching the nvme subsystem, so why are you copying the > linux-nvme mailing list to help you with a non-NVMe device? Please take your > ignorant and dubious claims elsewhere. Keith - As we discussed for mpt3sas driver NVME driver support, there was request to add linux-n...@lists.infradead.org for NVME related discussion. https://marc.info/?l=linux-kernel&m=149874673729467&w=2 As you mentioned, we are not touching NVME subsystem, we can skip to add NVME mailing list for future submission w.r.t NVME drive behind MR (megaraid_sas) and HBA (mpt3sas). All the NVME drives behind MegaRaid controller is SCSI device irrespective of transport. Kashyap
Re: [PATCH] libsas: Disable asynchronous aborts for SATA devices
On 01/10/2018 09:15 AM, Christoph Hellwig wrote: > Looks fine to me for 4.15 and -stable: > > Reviewed-by: Christoph Hellwig > > But we really need to fix this properly in the long run. > Looked into it, but I'm not sure if we can due to the fundamental differences between SCSI and libata EH. And there really is no way on how we can do async aborts on SATA; as soon as we're submitting NCQ commands _all_ commands will be aborted on error and we have to pick up the pieces. We might be handling things a tad better than now (we're always punting the abort to a workqueue, only to figure out from the workqueue function that we should've invoked SCSI EH proper). But I can't really see a better way here. Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
[PATCHv2] libsas: Disable asynchronous aborts for SATA devices
Handling CD-ROM devices from libsas is decidedly odd, as libata relies on SCSI EH to be started to figure out that no medium is present. So we cannot do asynchronous aborts for SATA devices. Fixes: 909657615d9 ("scsi: libsas: allow async aborts") Cc: # 4.12+ Signed-off-by: Hannes Reinecke Reviewed-by: Christoph Hellwig Tested-by: Yves-Alexis Perez --- drivers/scsi/libsas/sas_scsi_host.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c index 6267272..6de9681 100644 --- a/drivers/scsi/libsas/sas_scsi_host.c +++ b/drivers/scsi/libsas/sas_scsi_host.c @@ -487,15 +487,28 @@ static int sas_queue_reset(struct domain_device *dev, int reset_type, int sas_eh_abort_handler(struct scsi_cmnd *cmd) { - int res; + int res = TMF_RESP_FUNC_FAILED; struct sas_task *task = TO_SAS_TASK(cmd); struct Scsi_Host *host = cmd->device->host; + struct domain_device *dev = cmd_to_domain_dev(cmd); struct sas_internal *i = to_sas_internal(host->transportt); + unsigned long flags; if (!i->dft->lldd_abort_task) return FAILED; - res = i->dft->lldd_abort_task(task); + spin_lock_irqsave(host->host_lock, flags); + /* We cannot do async aborts for SATA devices */ + if (dev_is_sata(dev) && !host->host_eh_scheduled) { + spin_unlock_irqrestore(host->host_lock, flags); + return FAILED; + } + spin_unlock_irqrestore(host->host_lock, flags); + + if (task) + res = i->dft->lldd_abort_task(task); + else + SAS_DPRINTK("no task to abort\n"); if (res == TMF_RESP_FUNC_SUCC || res == TMF_RESP_FUNC_COMPLETE) return SUCCESS; -- 1.8.5.6
Re: [PATCH] libsas: Disable asynchronous aborts for SATA devices
On Tue, 2018-01-09 at 16:43 +0100, Hannes Reinecke wrote: > Handling CD-ROM devices from libsas is decidedly odd, as libata > relies on SCSI EH to be started to figure out that no medium is > present. > So we cannot do asynchronous aborts for SATA devices. The box boots fine with this change, thanks! Tested-by: Yves-Alexis Perez -- Yves-Alexis signature.asc Description: This is a digitally signed message part
Re: [PATCH] libsas: Disable asynchronous aborts for SATA devices
Looks fine to me for 4.15 and -stable: Reviewed-by: Christoph Hellwig But we really need to fix this properly in the long run.