Re: [PATCH 3/4] scsi: reintroduce scsi_driver.init_command
On Sun, Mar 30, 2014 at 10:45:04PM -0700, Nicholas A. Bellinger wrote: int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req) { - struct scsi_cmnd *cmd; - int ret = scsi_prep_state_check(sdev, req); - - if (ret != BLKPREP_OK) - return ret; - - cmd = scsi_get_cmd_from_req(sdev, req); - if (unlikely(!cmd)) - return BLKPREP_DEFER; + struct scsi_cmnd *cmd = req-special; Mmm, I thought that req-special was only holding a pointer to a pre-allocated scsi_cmnd during mq operation, no..? For the mq case the block layer sets up req-special. For the old case scsi_get_cmd_from_req sets up req-special the first it is called, and leaves it in there until the command is done. + ret = scsi_prep_state_check(sdev, req); + if (ret != BLKPREP_OK) + goto out; + + cmd = scsi_get_cmd_from_req(sdev, req); + if (unlikely(!cmd)) { + ret = BLKPREP_DEFER; + goto out; + } + From here the req-special pointer may or may not be set depending on mq operation, no..? a) there is no blk-mq support yet in James tree (+ these patches) b) even in my scsi-mq tree this code path won't be called for the mq case c) after scsi_get_cmd_from_req returns req-special will always be set up to point to the scsi_cmnd: static struct scsi_cmnd *scsi_get_cmd_from_req(struct scsi_device *sdev, struct request *req) { struct scsi_cmnd *cmd; if (!req-special) { ... req-special = cmd; } else { cmd = req-special; } /* pull a tag out of the request if we have one */ cmd-tag = req-tag; cmd-request = req; cmd-cmnd = req-cmd; cmd-prot_op = SCSI_PROT_NORMAL; return cmd; } } ret = scsi_setup_fs_cmnd(sdp, rq); if (ret != BLKPREP_OK) And this currently assumes req-special is always set when calling scsi_setup_fs_cmnd() That is the case both before and after this patch. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] scsi: explicitly release bidi buffers
On Mon, Mar 31, 2014 at 12:58:01AM -0500, Mike Christie wrote: Hey, I just wanted to double check that the only time we do bidi requests is when they come through as REQ_TYPE_BLOCK_PC requests. I saw some of the code comments on the init side that indicated that, but there was that scsi_end_request() - scsi_release_buffers() - __scsi_release_buffers() call which then did not make sense because we were passing in 1 for the bidi check argument. We only support BIDI for REQ_TYPE_BLOCK_PC requests, and scsi_io_completion enforces this: if (req-cmd_type == REQ_TYPE_BLOCK_PC) { /* SG_IO ioctl from block level */ ... if (scsi_bidi_cmnd(cmd)) { ... end_io and return } } /* no bidi support for !REQ_TYPE_BLOCK_PC yet */ BUG_ON(blk_bidi_rq(req)); -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] scsi: reintroduce scsi_driver.init_command
On Mon, 2014-03-31 at 08:08 +0200, Christoph Hellwig wrote: On Sun, Mar 30, 2014 at 10:45:04PM -0700, Nicholas A. Bellinger wrote: int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req) { - struct scsi_cmnd *cmd; - int ret = scsi_prep_state_check(sdev, req); - - if (ret != BLKPREP_OK) - return ret; - - cmd = scsi_get_cmd_from_req(sdev, req); - if (unlikely(!cmd)) - return BLKPREP_DEFER; + struct scsi_cmnd *cmd = req-special; Mmm, I thought that req-special was only holding a pointer to a pre-allocated scsi_cmnd during mq operation, no..? For the mq case the block layer sets up req-special. For the old case scsi_get_cmd_from_req sets up req-special the first it is called, and leaves it in there until the command is done. Er, yes of course. Reviewed-by: Nicholas Bellinger n...@linux-iscsi.org -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: misc scsi midlayer updates
Hi Boaz, Hi Christoph Many years ago I have sent these exact patches but no-one cared Including me I guess. I didn't remember your older patches, sorry. I think my: scsi_lib: Remove that __scsi_release_buffers contraption Is more elegant, is layered better and is smaller code. (please consider my version) I very much disagree - the bidi code uses a separate request for it's payload, uses separate functions to set it up at the low-level so mirroring it with a separate teardown makes sense. This also avoids having to do any bidi check at all in the fast path. Also the first patch is some more cleanup you'd like. Doesn't look bad, but not that importan either. The main patch of collapsing scsi_end_request is basically the same. I like the goto version better beause it avoids additional duplication from inside the switch and the bidi path, but it should be functionally equivalent. Please note the 4th patch which is a real BUG, titled: scsi_lib: Can't RETRY scsi_cmnd if some bytes were completed That fix seems very hard to read due to the arithmetic comparism on the enum value. The way I try to understand it is that you never want to retry if ((an error happened) (bytes were completed)) but the explanation should be expanded. [Your patches have been tested within my MQ testing right?] Yes. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] scsi: reintroduce scsi_driver.init_command
On 03/27/2014 11:14 AM, Christoph Hellwig wrote: @@ -1663,6 +1652,8 @@ static int sd_done(struct scsi_cmnd *SCpnt) unsigned char op = SCpnt-cmnd[0]; unsigned char unmap = SCpnt-cmnd[1] 8; + sd_uninit_command(SCpnt); + The above call would free the cmnd-cmnd and set it to null. If then scsi_io_completion was going to do some error processing it looks like it could try to access the scsi_cmnd-cmnd field. With the current code that would not be a problem because the blk unprep callback is not called until the block layer does its request cleanup in blk_finish_request which as you know is after scsi_io_completion/scsi_end_request is done with the cmnd. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] [SCSI] Fix abort state memory problem
On 03/28/2014 06:49 PM, James Bottomley wrote: The abort state is being stored persistently across commands, meaning if the command times out a second time, the error handler thinks an abort is still pending. Fix this by properly resetting the abort flag after the abort is finished. Signed-off-by: James Bottomley jbottom...@parallels.com --- drivers/scsi/scsi_error.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 771c16b..b9f3b07 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -869,10 +869,13 @@ static int scsi_try_bus_device_reset(struct scsi_cmnd *scmd) static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt, struct scsi_cmnd *scmd) { - if (!hostt-eh_abort_handler) - return FAILED; + int retval = FAILED; + + if (hostt-eh_abort_handler) + retval = hostt-eh_abort_handler(scmd); - return hostt-eh_abort_handler(scmd); + scmd-eh_eflags = ~SCSI_EH_ABORT_SCHEDULED; + return retval; } static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd) Hmm. No, I don't think this is correct. SCSI_EH_ABORT_SCHEDULED signifies whether scmd_eh_abort_handler() needs to run. As such it needs to be reset when the workqueue item is triggered. Can you try with this instead? diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 771c16b..9694803 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -120,6 +120,8 @@ scmd_eh_abort_handler(struct work_struct *work) struct scsi_device *sdev = scmd-device; int rtn; + scmd-eh_eflags = ~SCSI_EH_ABORT_SCHEDULED; + if (scsi_host_eh_past_deadline(sdev-host)) { SCSI_LOG_ERROR_RECOVERY(3, scmd_printk(KERN_INFO, scmd, Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH for-3.15] scsi/libiscsi: Fix static checker warning on bh locking
On 03/30/2014 07:26 AM, Or Gerlitz wrote: From: Shlomo Pongratz shlo...@mellanox.com Commit 659743b [SCSI] libiscsi: Reduce locking contention in fast path introduced a new smatch warning on libiscsi.c iscsi_xmit_task() warn: inconsistent returns bottom_half:: locked (1410 [(-61)]) unlocked (1425 [0], 1425 [s32min-(-1),1-s32max]), which we can eliminate by using non bh locking on the nested spin_lock call. Reported-by: Dan Carpenter dan.carpen...@oracle.com Signed-off-by: Shlomo Pongratz shlo...@mellanox.com Signed-off-by: Or Gerlitz ogerl...@mellanox.com --- drivers/scsi/libiscsi.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 5b8605c..5087957 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -1411,9 +1411,9 @@ static int iscsi_xmit_task(struct iscsi_conn *conn) conn-task = NULL; } /* regular RX path uses back_lock */ - spin_lock_bh(conn-session-back_lock); + spin_lock(conn-session-back_lock); __iscsi_put_task(task); - spin_unlock_bh(conn-session-back_lock); + spin_unlock(conn-session-back_lock); return rc; Thanks for cleaning this up Or and Shlomo. Reviewed-by: Mike Christie micha...@cs.wisc.edu -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [SCSI] Fix spurious request sense in error handling
On 03/28/2014 06:50 PM, James Bottomley wrote: We unconditionally execute scsi_eh_get_sense() to make sure all failed commands that should have sense attached, do. However, the routine forgets that some commands, because of the way they fail, will not have any sense code ... we should not bother them with a REQUEST_SENSE command. Fix this by testing to see if we actually got a CHECK_CONDITION return and skip asking for sense if we don't. Tested-by: Alan Stern st...@rowland.harvard.edu Signed-off-by: James Bottomley jbottom...@parallels.com Acked-by: Hannes Reinecke h...@suse.de Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling
On 03/28/2014 08:29 PM, Alan Stern wrote: On Fri, 28 Mar 2014, James Bottomley wrote: This is a set of three patches we agreed to a while ago to eliminate a USB deadlock. I did rewrite the first patch, if it could be reviewed and tested. I tested all three patches under the same conditions as before, and they all worked correctly. In the revised patch 1, the meaning of SCSI_EH_ABORT_SCHEDULED isn't entirely clear. This boils down to two questions, which I don't know the answers to: What should happen in scmd_eh_abort_handler() if scsi_host_eh_past_deadline() returns true and thereby prevents scsi_try_to_abort_cmd() from being called? The flag wouldn't get cleared then. Ah. Correct. But that's due to the first patch being incorrect. Cf my response to the original first patch. What should happen if some other pathway manages to call scsi_try_to_abort_cmd() while scmd-abort_work is still sitting on the work queue? The command would be aborted and the flag would be cleared, but the queued work would remain. Can this ever happen? Not that I could see. A command abort is only ever triggered by the request timeout from the block layer. And the timer is _not_ rearmed once the timeout function (here: scsi_times_out()) is called. Hence I fail to see how it can be called concurrently. Maybe scmd_eh_abort_handler() should check the flag before doing anything. Is there any sort of sychronization to prevent the same incarnation of a command from being aborted twice (or by two different threads at the same time)? If there is, it isn't obvious. See above. scsi_times_out() will only ever called once. What can happen, though, is that _theoretically_ the LLDD might decide to call -done() on a timed out command when scsi_eh_abort_handler() is still pending. (Also, what's going on at the start of scsi_abort_command()? Contrary to what one might expect, the first part of the function _cancels_ a scheduled abort. And it does so without clearing the SCSI_EH_ABORT_SCHEDULED flag.) The original idea was this: SCSI_EH_ABORT_SCHEDULED is sticky _per command_. Point is, any command abort is only ever send for a timed-out command. And the main problem for a timed-out command is that we simply _do not_ know what happened for that command. So _if_ a command timed out, _and_ we've send an abort, _and_ the command times out _again_ we'll be running into an endless loop between timeout and aborting, and never returning the command at all. So to prevent this we should set a marker on that command telling it to _not_ try to abort the command again. Which is what 'SCSI_EH_ABORT_SCHEDULED' was meant for: - A command times out, sets 'SCSI_EH_ABORT_SCHEDULED' - abort _succeeds_ - The same command times out a second time, notifies that SCSI_EH_ABORT_SCHEDULED is set, and doesn't call scsi_eh_abort_command() but rather escalates directly. _However_, I do feel that we've gotten the wrong track with all of this. In you original mail you stated: Basically, usb-storage deadlocks when the SCSI error handler invokes the eh_device_reset_handler callback while a command is running. The command has timed out and will never complete normally, because the device's firmware has crashed. But usb-storage's device-reset routine waits for the current command to finish, which brings everything to a standstill. Question now to you as the USB god: A command abort is only _ever_ send after a command timeout. And the main idea of the command abort is to remove any context the LLDD or the target might have. So by the time the command abort returns SUCCESS we _expect_ the firmware to have cleared that command. If for some reason the firmware isn't capable of doing so, it should return FAILED. So: - Has the command timeout fired? - If so, why hasn't it returned FAILED? - If it had returned SUCCESS, why did the device_reset_handler wait for the command to complete? Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] m68k/atari - atari_scsi: use correct virt/phys translation for DMA buffer
Hi James, If you don't object, I'd like to take this one through the m68k tree, as it depends on a new m68k platform function. On Sun, Mar 30, 2014 at 1:01 AM, Michael Schmitz schmitz...@gmail.com wrote: With new ST-RAM allocation to work also when the kernel is running from FastRAM, use dedicated virt/phys translation for this case. Signed-off-by: Michael Schmitz schm...@debian.org --- drivers/scsi/atari_scsi.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/scsi/atari_scsi.c b/drivers/scsi/atari_scsi.c index 296c936..a8d721f 100644 --- a/drivers/scsi/atari_scsi.c +++ b/drivers/scsi/atari_scsi.c @@ -639,7 +639,7 @@ static int __init atari_scsi_detect(struct scsi_host_template *host) double buffer\n); return 0; } - atari_dma_phys_buffer = virt_to_phys(atari_dma_buffer); + atari_dma_phys_buffer = atari_stram_to_phys(atari_dma_buffer); atari_dma_orig_addr = 0; } #endif -- 1.7.0.4 Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say programmer or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 4/4] m68k/atari - atari_scsi: use correct virt/phys translation for DMA buffer
With the kernel running from FastRAM instead of ST-RAM, none of ST-RAM is mapped by mem_init, and DMA-addressable buffer must be mapped by ioremap. Use platform specific virt/phys translation helpers for this case. Signed-off-by: Michael Schmitz schm...@debian.org Cc: linux-scsi@vger.kernel.org --- drivers/scsi/atari_scsi.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/scsi/atari_scsi.c b/drivers/scsi/atari_scsi.c index 296c936..a8d721f 100644 --- a/drivers/scsi/atari_scsi.c +++ b/drivers/scsi/atari_scsi.c @@ -639,7 +639,7 @@ static int __init atari_scsi_detect(struct scsi_host_template *host) double buffer\n); return 0; } - atari_dma_phys_buffer = virt_to_phys(atari_dma_buffer); + atari_dma_phys_buffer = atari_stram_to_phys(atari_dma_buffer); atari_dma_orig_addr = 0; } #endif -- 1.7.0.4 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] scsi: reintroduce scsi_driver.init_command
On Mon, Mar 31, 2014 at 01:56:13AM -0500, Mike Christie wrote: The above call would free the cmnd-cmnd and set it to null. If then scsi_io_completion was going to do some error processing it looks like it could try to access the scsi_cmnd-cmnd field. With the current code that would not be a problem because the blk unprep callback is not called until the block layer does its request cleanup in blk_finish_request which as you know is after scsi_io_completion/scsi_end_request is done with the cmnd. Right, we should probabl call -uninit_command at that place as well instead of calling it internall in -done. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 19/55] scsi: Mark function as static in isci/phy.c
On Saturday, March 29, 2014 7:04 PM Rashika Kheria rashika.khe...@gmail.com wrote: Mark function as static in isci/phy.c because it is not used outside this file. This eliminates the following warning in isci/phy.c: drivers/scsi/isci/phy.c:672:6: warning: no previous prototype for ‘scu_link_layer_set_txcomsas_timeout’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria rashika.khe...@gmail.com Reviewed-by: Josh Triplett j...@joshtriplett.org Acked-by: Lukasz Dorau lukasz.do...@intel.com --- drivers/scsi/isci/phy.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/isci/phy.c b/drivers/scsi/isci/phy.c index cb87b2e..8e21022 100644 --- a/drivers/scsi/isci/phy.c +++ b/drivers/scsi/isci/phy.c @@ -669,7 +669,8 @@ static const char *phy_event_name(u32 event_code) phy_state_name(state), phy_event_name(code), code) -void scu_link_layer_set_txcomsas_timeout(struct isci_phy *iphy, u32 timeout) +static void scu_link_layer_set_txcomsas_timeout(struct isci_phy *iphy, + u32 timeout) { u32 val; -- 1.7.9.5 N�r��yb�X��ǧv�^�){.n�+{{ay�ʇڙ�,j��f���h���z��w��� ���j:+v���w�j�mzZ+�ݢj��!�i
RE: [PATCH 21/55] scsi: Mark function as static in isci/port.c
On Saturday, March 29, 2014 7:07 PM Rashika Kheria rashika.khe...@gmail.com wrote: Mark function as static in isci/port.c because they are not used outside this file. This eliminates the following warning in isci/port.c: drivers/scsi/isci/port.c:65:13: warning: no previous prototype for ‘port_state_name’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria rashika.khe...@gmail.com Reviewed-by: Josh Triplett j...@joshtriplett.org Acked-by: Lukasz Dorau lukasz.do...@intel.com --- drivers/scsi/isci/port.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/isci/port.c b/drivers/scsi/isci/port.c index 13098b0..225d39f 100644 --- a/drivers/scsi/isci/port.c +++ b/drivers/scsi/isci/port.c @@ -62,7 +62,7 @@ #undef C #define C(a) (#a) -const char *port_state_name(enum sci_port_states state) +static const char *port_state_name(enum sci_port_states state) { static const char * const strings[] = PORT_STATES; -- 1.7.9.5
RE: [PATCH 20/55] scsi: Mark function as static in isci/remote_device.c
On Saturday, March 29, 2014 7:05 PM Rashika Kheria rashika.khe...@gmail.com wrote: Mark function as static in isci/remote_device.c because it is not used outside this file. This eliminates the following warning in isci/remote_device.c: drivers/scsi/isci/remote_device.c:1387:6: warning: no previous prototype for ‘isci_remote_device_wait_for_resume_from_abort’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria rashika.khe...@gmail.com Reviewed-by: Josh Triplett j...@joshtriplett.org Acked-by: Lukasz Dorau lukasz.do...@intel.com --- drivers/scsi/isci/remote_device.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/isci/remote_device.c b/drivers/scsi/isci/remote_device.c index 96a26f4..33033fb 100644 --- a/drivers/scsi/isci/remote_device.c +++ b/drivers/scsi/isci/remote_device.c @@ -1384,7 +1384,7 @@ static bool isci_remote_device_test_resume_done( return done; } -void isci_remote_device_wait_for_resume_from_abort( +static void isci_remote_device_wait_for_resume_from_abort( struct isci_host *ihost, struct isci_remote_device *idev) { -- 1.7.9.5 N�r��yb�X��ǧv�^�){.n�+{{ay�ʇڙ�,j��f���h���z��w��� ���j:+v���w�j�mzZ+�ݢj��!�i
Re: [PATCH 19/55] scsi: Mark function as static in isci/phy.c
On Mon, Mar 31, 2014 at 08:54:49AM +, Dorau, Lukasz wrote: On Saturday, March 29, 2014 7:04 PM Rashika Kheria rashika.khe...@gmail.com wrote: Mark function as static in isci/phy.c because it is not used outside this file. This eliminates the following warning in isci/phy.c: drivers/scsi/isci/phy.c:672:6: warning: no previous prototype for ‘scu_link_layer_set_txcomsas_timeout’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria rashika.khe...@gmail.com Reviewed-by: Josh Triplett j...@joshtriplett.org Acked-by: Lukasz Dorau lukasz.do...@intel.com Since you're a maintainer of the driver in question, can you accept the three patches you acked, or would you suggest that they go through another tree? - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 19/55] scsi: Mark function as static in isci/phy.c
On Monday, March 31, 2014 11:36 AM Josh Triplett j...@joshtriplett.org wrote: On Mon, Mar 31, 2014 at 08:54:49AM +, Dorau, Lukasz wrote: On Saturday, March 29, 2014 7:04 PM Rashika Kheria rashika.khe...@gmail.com wrote: Mark function as static in isci/phy.c because it is not used outside this file. This eliminates the following warning in isci/phy.c: drivers/scsi/isci/phy.c:672:6: warning: no previous prototype for ‘scu_link_layer_set_txcomsas_timeout’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria rashika.khe...@gmail.com Reviewed-by: Josh Triplett j...@joshtriplett.org Acked-by: Lukasz Dorau lukasz.do...@intel.com Since you're a maintainer of the driver in question, can you accept the three patches you acked, or would you suggest that they go through another tree? - Josh Triplett Hi Josh, The isci patches should be accepted by James Bottomley. Lukasz
[Bug 71231] System unresponsable after a lot of LUNs have been added to the system
https://bugzilla.kernel.org/show_bug.cgi?id=71231 --- Comment #10 from Alex alexandernaum...@gmx.de --- The new kernel 3.14 works fine too. So 3.10 and 3.12 have this problem (and my problem is, that I must use the 3.10 line) -- You are receiving this mail because: You are the assignee for the bug. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] scsi: reintroduce scsi_driver.init_command
The above call would free the cmnd-cmnd and set it to null. If then scsi_io_completion was going to do some error processing it looks like it could try to access the scsi_cmnd-cmnd field. With the current code that would not be a problem because the blk unprep callback is not called until the block layer does its request cleanup in blk_finish_request which as you know is after scsi_io_completion/scsi_end_request is done with the cmnd. This incremental patches fixes the issue, and makes sure the uninit calls are nicely paired like the rest of the I/O completion routines after patch 2: diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 48c5c77..8e79612 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -490,8 +490,6 @@ static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd) struct request *req = cmd-request; unsigned long flags; - scsi_uninit_command(cmd); - spin_lock_irqsave(q-queue_lock, flags); blk_unprep_request(req); req-special = NULL; @@ -941,6 +939,7 @@ requeue: /* Unprep the request and put it back at the head of the queue. * A new command will be prepared and issued. */ + scsi_uninit_command(cmd); scsi_release_buffers(cmd); scsi_requeue_command(q, cmd); break; @@ -956,6 +955,7 @@ requeue: return; next_command: + scsi_uninit_command(cmd); scsi_release_buffers(cmd); scsi_next_command(cmd); } diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index d95c4fd..d99cb3f 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -1652,8 +1652,6 @@ static int sd_done(struct scsi_cmnd *SCpnt) unsigned char op = SCpnt-cmnd[0]; unsigned char unmap = SCpnt-cmnd[1] 8; - sd_uninit_command(SCpnt); - if (req-cmd_flags REQ_DISCARD || req-cmd_flags REQ_WRITE_SAME) { if (!result) { good_bytes = blk_rq_bytes(req); -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] [SCSI] Fix abort state memory problem
On Mon, 31 Mar 2014, Hannes Reinecke wrote: On 03/28/2014 06:49 PM, James Bottomley wrote: The abort state is being stored persistently across commands, meaning if the command times out a second time, the error handler thinks an abort is still pending. Fix this by properly resetting the abort flag after the abort is finished. Signed-off-by: James Bottomley jbottom...@parallels.com --- drivers/scsi/scsi_error.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 771c16b..b9f3b07 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -869,10 +869,13 @@ static int scsi_try_bus_device_reset(struct scsi_cmnd *scmd) static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt, struct scsi_cmnd *scmd) { - if (!hostt-eh_abort_handler) - return FAILED; + int retval = FAILED; + + if (hostt-eh_abort_handler) + retval = hostt-eh_abort_handler(scmd); - return hostt-eh_abort_handler(scmd); + scmd-eh_eflags = ~SCSI_EH_ABORT_SCHEDULED; + return retval; } static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd) Hmm. No, I don't think this is correct. SCSI_EH_ABORT_SCHEDULED signifies whether scmd_eh_abort_handler() needs to run. As such it needs to be reset when the workqueue item is triggered. Can you try with this instead? diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 771c16b..9694803 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -120,6 +120,8 @@ scmd_eh_abort_handler(struct work_struct *work) struct scsi_device *sdev = scmd-device; int rtn; + scmd-eh_eflags = ~SCSI_EH_ABORT_SCHEDULED; + if (scsi_host_eh_past_deadline(sdev-host)) { SCSI_LOG_ERROR_RECOVERY(3, scmd_printk(KERN_INFO, scmd, This doesn't seem like a good idea either, because the flag is tested later on in scsi_decide_disposition(). Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling
On Mon, 31 Mar 2014, Hannes Reinecke wrote: On 03/28/2014 08:29 PM, Alan Stern wrote: On Fri, 28 Mar 2014, James Bottomley wrote: This is a set of three patches we agreed to a while ago to eliminate a USB deadlock. I did rewrite the first patch, if it could be reviewed and tested. I tested all three patches under the same conditions as before, and they all worked correctly. In the revised patch 1, the meaning of SCSI_EH_ABORT_SCHEDULED isn't entirely clear. This boils down to two questions, which I don't know the answers to: What should happen in scmd_eh_abort_handler() if scsi_host_eh_past_deadline() returns true and thereby prevents scsi_try_to_abort_cmd() from being called? The flag wouldn't get cleared then. Ah. Correct. But that's due to the first patch being incorrect. Cf my response to the original first patch. See my response to your response. :-) What should happen if some other pathway manages to call scsi_try_to_abort_cmd() while scmd-abort_work is still sitting on the work queue? The command would be aborted and the flag would be cleared, but the queued work would remain. Can this ever happen? Not that I could see. A command abort is only ever triggered by the request timeout from the block layer. And the timer is _not_ rearmed once the timeout function (here: scsi_times_out()) is called. Hence I fail to see how it can be called concurrently. scsi_try_to_abort_cmd() is also called (via a different pathway) when a command sent by the error handler itself times out. I haven't traced through all the different paths to make sure none of them can run concurrently. But I'm willing to take your word for it. Maybe scmd_eh_abort_handler() should check the flag before doing anything. Is there any sort of sychronization to prevent the same incarnation of a command from being aborted twice (or by two different threads at the same time)? If there is, it isn't obvious. See above. scsi_times_out() will only ever called once. What can happen, though, is that _theoretically_ the LLDD might decide to call -done() on a timed out command when scsi_eh_abort_handler() is still pending. That's okay. We can expect the LLDD to have sufficient locking to handle that sort of thing without confusion (usb-storage does, for example). (Also, what's going on at the start of scsi_abort_command()? Contrary to what one might expect, the first part of the function _cancels_ a scheduled abort. And it does so without clearing the SCSI_EH_ABORT_SCHEDULED flag.) The original idea was this: SCSI_EH_ABORT_SCHEDULED is sticky _per command_. Point is, any command abort is only ever send for a timed-out command. And the main problem for a timed-out command is that we simply _do not_ know what happened for that command. So _if_ a command timed out, _and_ we've send an abort, _and_ the command times out _again_ we'll be running into an endless loop between timeout and aborting, and never returning the command at all. So to prevent this we should set a marker on that command telling it to _not_ try to abort the command again. I disagree. We _have_ to abort the command again -- how else can we stop a running command? To prevent the loop you described, we should avoid _retrying_ the command after it is aborted the second time. Which is what 'SCSI_EH_ABORT_SCHEDULED' was meant for: - A command times out, sets 'SCSI_EH_ABORT_SCHEDULED' - abort _succeeds_ - The same command times out a second time, notifies that SCSI_EH_ABORT_SCHEDULED is set, and doesn't call scsi_eh_abort_command() but rather escalates directly. The proper time to escalate is after the command is aborted again, not while the command is still running. The only situation where you might want to escalate while a command is still running would be if you were unable to abort the command. (Hmmm, maybe that's not true for SCSI devices in general. It is true for USB mass-storage, however. Perhaps the reset handlers in usb-storage should be changed so that they will automatically abort a running command before starting the reset.) _However_, I do feel that we've gotten the wrong track with all of this. In you original mail you stated: Basically, usb-storage deadlocks when the SCSI error handler invokes the eh_device_reset_handler callback while a command is running. The command has timed out and will never complete normally, because the device's firmware has crashed. But usb-storage's device-reset routine waits for the current command to finish, which brings everything to a standstill. Question now to you as the USB god: A command abort is only _ever_ send after a command timeout. And the main idea of the command abort is to remove any context the LLDD or the target might have. So by the time the command abort returns SUCCESS we _expect_ the firmware to have cleared that
Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling
On 03/31/2014 03:33 PM, Alan Stern wrote: On Mon, 31 Mar 2014, Hannes Reinecke wrote: On 03/28/2014 08:29 PM, Alan Stern wrote: On Fri, 28 Mar 2014, James Bottomley wrote: This is a set of three patches we agreed to a while ago to eliminate a USB deadlock. I did rewrite the first patch, if it could be reviewed and tested. I tested all three patches under the same conditions as before, and they all worked correctly. In the revised patch 1, the meaning of SCSI_EH_ABORT_SCHEDULED isn't entirely clear. This boils down to two questions, which I don't know the answers to: What should happen in scmd_eh_abort_handler() if scsi_host_eh_past_deadline() returns true and thereby prevents scsi_try_to_abort_cmd() from being called? The flag wouldn't get cleared then. Ah. Correct. But that's due to the first patch being incorrect. Cf my response to the original first patch. See my response to your response. :-) Okay, So I probably should refrain from issueing a response to your response to my response lest infinite recursion happens :-) What should happen if some other pathway manages to call scsi_try_to_abort_cmd() while scmd-abort_work is still sitting on the work queue? The command would be aborted and the flag would be cleared, but the queued work would remain. Can this ever happen? Not that I could see. A command abort is only ever triggered by the request timeout from the block layer. And the timer is _not_ rearmed once the timeout function (here: scsi_times_out()) is called. Hence I fail to see how it can be called concurrently. scsi_try_to_abort_cmd() is also called (via a different pathway) when a command sent by the error handler itself times out. I haven't traced through all the different paths to make sure none of them can run concurrently. But I'm willing to take your word for it. Yes, but that's not calling scsi_abort_command(), but rather invokes scsi_abort_eh_cmnd(). Maybe scmd_eh_abort_handler() should check the flag before doing anything. Is there any sort of sychronization to prevent the same incarnation of a command from being aborted twice (or by two different threads at the same time)? If there is, it isn't obvious. See above. scsi_times_out() will only ever called once. What can happen, though, is that _theoretically_ the LLDD might decide to call -done() on a timed out command when scsi_eh_abort_handler() is still pending. That's okay. We can expect the LLDD to have sufficient locking to handle that sort of thing without confusion (usb-storage does, for example). (Also, what's going on at the start of scsi_abort_command()? Contrary to what one might expect, the first part of the function _cancels_ a scheduled abort. And it does so without clearing the SCSI_EH_ABORT_SCHEDULED flag.) The original idea was this: SCSI_EH_ABORT_SCHEDULED is sticky _per command_. Point is, any command abort is only ever send for a timed-out command. And the main problem for a timed-out command is that we simply _do not_ know what happened for that command. So _if_ a command timed out, _and_ we've send an abort, _and_ the command times out _again_ we'll be running into an endless loop between timeout and aborting, and never returning the command at all. So to prevent this we should set a marker on that command telling it to _not_ try to abort the command again. I disagree. We _have_ to abort the command again -- how else can we stop a running command? To prevent the loop you described, we should avoid _retrying_ the command after it is aborted the second time. The actual question is whether it's worth aborting the same command a second time. In principle any reset (like LUN reset etc) should clear the command, too. And the EH abort functionality is geared around this. If, for some reason, the transport layer / device driver requires a command abort to be send then sure, we need to accommodate for that. Which is what 'SCSI_EH_ABORT_SCHEDULED' was meant for: - A command times out, sets 'SCSI_EH_ABORT_SCHEDULED' - abort _succeeds_ - The same command times out a second time, notifies that SCSI_EH_ABORT_SCHEDULED is set, and doesn't call scsi_eh_abort_command() but rather escalates directly. The proper time to escalate is after the command is aborted again, not while the command is still running. The only situation where you might want to escalate while a command is still running would be if you were unable to abort the command. (Hmmm, maybe that's not true for SCSI devices in general. It is true for USB mass-storage, however. Perhaps the reset handlers in usb-storage should be changed so that they will automatically abort a running command before starting the reset.) As said, yes, in principle you are right. We should be aborting the command a second time, _and then_ starting the escalation. So if the above reasoning is okay then this patch should
[PATCH 4/4] blk-mq: support shared tag maps
--- block/blk-mq-tag.c |2 ++ block/blk-mq.c | 83 +++- block/blk-mq.h |2 ++ include/linux/blk-mq.h | 12 +++ 4 files changed, 91 insertions(+), 8 deletions(-) diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index 108f82b..a7b1888 100644 --- a/block/blk-mq-tag.c +++ b/block/blk-mq-tag.c @@ -121,6 +121,8 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags, if (!tags) return NULL; + kref_init(tags-ref_count); + nr_tags = total_tags - reserved_tags; nr_cache = nr_tags / num_possible_cpus(); diff --git a/block/blk-mq.c b/block/blk-mq.c index f1b5d52..3d63d71 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1051,8 +1051,10 @@ void blk_mq_free_commands(struct request_queue *q, } EXPORT_SYMBOL(blk_mq_free_commands); -static void blk_mq_free_rq_map(struct blk_mq_tags *tags) +static void blk_mq_free_rq_map(struct kref *kref) { + struct blk_mq_tags *tags = + container_of(kref, struct blk_mq_tags, ref_count); struct page *page; while (!list_empty(tags-page_list)) { @@ -1066,6 +1068,17 @@ static void blk_mq_free_rq_map(struct blk_mq_tags *tags) blk_mq_free_tags(tags); } +static void blk_mq_put_rq_map(struct blk_mq_tags *tags) +{ + kref_put(tags-ref_count, blk_mq_free_rq_map); +} + +static struct blk_mq_tags *blk_mq_get_rq_map(struct blk_mq_tags *tags) +{ + kref_get(tags-ref_count); + return tags; +} + static size_t order_to_size(unsigned int order) { size_t ret = PAGE_SIZE; @@ -1144,7 +1157,7 @@ static struct blk_mq_tags *blk_mq_init_rq_map(unsigned int total_tags, fail: pr_warn(%s: failed to allocate requests\n, __func__); - blk_mq_free_rq_map(tags); + blk_mq_put_rq_map(tags); return NULL; } @@ -1178,10 +1191,14 @@ static int blk_mq_init_hw_queues(struct request_queue *q, blk_mq_hctx_notify, hctx); blk_mq_register_cpu_notifier(hctx-cpu_notifier); - hctx-tags = blk_mq_init_rq_map(hctx-queue_depth, - reg-reserved_tags, reg-cmd_size, node); - if (!hctx-tags) - break; + if (reg-shared_tags) { + hctx-tags = blk_mq_get_rq_map(reg-shared_tags-tags[i]); + } else { + hctx-tags = blk_mq_init_rq_map(hctx-queue_depth, + reg-reserved_tags, reg-cmd_size, node); + if (!hctx-tags) + break; + } /* * Allocate space for all possible cpus to avoid allocation in @@ -1221,7 +1238,7 @@ static int blk_mq_init_hw_queues(struct request_queue *q, blk_mq_unregister_cpu_notifier(hctx-cpu_notifier); if (hctx-tags) - blk_mq_free_rq_map(hctx-tags); + blk_mq_put_rq_map(hctx-tags); kfree(hctx-ctxs); } @@ -1399,7 +1416,7 @@ void blk_mq_free_queue(struct request_queue *q) kfree(hctx-ctx_map); kfree(hctx-ctxs); if (hctx-tags) - blk_mq_free_rq_map(hctx-tags); + blk_mq_put_rq_map(hctx-tags); blk_mq_unregister_cpu_notifier(hctx-cpu_notifier); if (q-mq_ops-exit_hctx) q-mq_ops-exit_hctx(hctx, i); @@ -1459,6 +1476,56 @@ static int blk_mq_queue_reinit_notify(struct notifier_block *nb, return NOTIFY_OK; } +struct blk_mq_shared_tags *blk_mq_alloc_shared_tags(struct blk_mq_reg *reg, + int (*init)(void *, struct request *), void *data) +{ + struct blk_mq_shared_tags *shared_tags; + int i, j; + + shared_tags = kmalloc_node(sizeof(*shared_tags) + + reg-nr_hw_queues * sizeof(struct blk_mq_tags), + GFP_KERNEL, reg-numa_node); + if (!shared_tags) + goto out; + + shared_tags-nr_hw_queues = reg-nr_hw_queues; + shared_tags-queue_depth = reg-queue_depth; + for (i = 0; i reg-nr_hw_queues; i++) { + shared_tags-tags[i] = blk_mq_init_rq_map(reg-queue_depth, + reg-reserved_tags, reg-cmd_size, reg-numa_node); + if (!shared_tags-tags[i]) + goto out_unwind; + + for (j = 0; j reg-queue_depth; j++) { + struct request *rq = shared_tags-tags[i]-rqs[j]; + int ret; + + ret = init(data, rq); + BUG_ON(ret); + } + } + + return shared_tags; + +out_unwind: + while (--i = 0) + blk_mq_put_rq_map(shared_tags-tags[i]); +out: + return NULL; +} + +void
[PATCH 2/4] blk-mq: initialize request on allocation
If we want to share tag and request allocation between queues we cannot initialize the request at init/free time, but need to initialize it at allocation time as it might get used for different queues over its lifetime. Signed-off-by: Christoph Hellwig h...@lst.de --- block/blk-mq.c |4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 871acd6..ec0c276 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -82,6 +82,7 @@ static struct request *__blk_mq_alloc_request(struct blk_mq_hw_ctx *hctx, tag = blk_mq_get_tag(hctx-tags, gfp, reserved); if (tag != BLK_MQ_TAG_FAIL) { rq = hctx-rqs[tag]; + blk_rq_init(hctx-queue, rq); rq-tag = tag; return rq; @@ -254,9 +255,7 @@ static void __blk_mq_free_request(struct blk_mq_hw_ctx *hctx, const int tag = rq-tag; struct request_queue *q = rq-q; - blk_rq_init(hctx-queue, rq); blk_mq_put_tag(hctx-tags, tag); - blk_mq_queue_exit(q); } @@ -1128,7 +1127,6 @@ static int blk_mq_init_rq_map(struct blk_mq_hw_ctx *hctx, left -= to_do * rq_size; for (j = 0; j to_do; j++) { hctx-rqs[i] = p; - blk_rq_init(hctx-queue, hctx-rqs[i]); p += rq_size; i++; } -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] blk-mq: move request structures into struct blk_mq_tags
This is in preparation for allowing to share the tags, and thus request allocation between multiple queues. Also remove blk_mq_tag_to_rq, as it was unused and thus untestable. If we need it back it can easil be re-added as a non-inline function. Note that we also now straight out fail queue initialization if we can't allocate tags - keeping track of a reduced queue_depth over a more complex call chain isn't easil possible and this shouldn't happen on an of todays systems. Signed-off-by: Christoph Hellwig h...@lst.de --- block/blk-mq-tag.c | 13 block/blk-mq.c | 84 +--- block/blk-mq.h | 18 +++ include/linux/blk-mq.h |8 - 4 files changed, 61 insertions(+), 62 deletions(-) diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index 83ae96c..108f82b 100644 --- a/block/blk-mq-tag.c +++ b/block/blk-mq-tag.c @@ -7,19 +7,6 @@ #include blk-mq.h #include blk-mq-tag.h -/* - * Per tagged queue (tag address space) map - */ -struct blk_mq_tags { - unsigned int nr_tags; - unsigned int nr_reserved_tags; - unsigned int nr_batch_move; - unsigned int nr_max_cache; - - struct percpu_ida free_tags; - struct percpu_ida reserved_tags; -}; - void blk_mq_wait_for_tags(struct blk_mq_tags *tags) { int tag = blk_mq_get_tag(tags, __GFP_WAIT, false); diff --git a/block/blk-mq.c b/block/blk-mq.c index ec0c276..f1b5d52 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -81,7 +81,7 @@ static struct request *__blk_mq_alloc_request(struct blk_mq_hw_ctx *hctx, tag = blk_mq_get_tag(hctx-tags, gfp, reserved); if (tag != BLK_MQ_TAG_FAIL) { - rq = hctx-rqs[tag]; + rq = hctx-tags-rqs[tag]; blk_rq_init(hctx-queue, rq); rq-tag = tag; @@ -406,7 +406,9 @@ static void blk_mq_timeout_check(void *__data, unsigned long *free_tags) if (tag = hctx-queue_depth) break; - rq = hctx-rqs[tag++]; + rq = hctx-tags-rqs[tag++]; + if (rq-q != hctx-queue) + continue; if (!test_bit(REQ_ATOM_STARTED, rq-atomic_flags)) continue; @@ -993,7 +995,7 @@ static int blk_mq_init_hw_commands(struct blk_mq_hw_ctx *hctx, int ret = 0; for (i = 0; i hctx-queue_depth; i++) { - struct request *rq = hctx-rqs[i]; + struct request *rq = hctx-tags-rqs[i]; ret = init(data, hctx, rq, i); if (ret) @@ -1030,7 +1032,7 @@ static void blk_mq_free_hw_commands(struct blk_mq_hw_ctx *hctx, unsigned int i; for (i = 0; i hctx-queue_depth; i++) { - struct request *rq = hctx-rqs[i]; + struct request *rq = hctx-tags-rqs[i]; free(data, hctx, rq, i); } @@ -1049,20 +1051,19 @@ void blk_mq_free_commands(struct request_queue *q, } EXPORT_SYMBOL(blk_mq_free_commands); -static void blk_mq_free_rq_map(struct blk_mq_hw_ctx *hctx) +static void blk_mq_free_rq_map(struct blk_mq_tags *tags) { struct page *page; - while (!list_empty(hctx-page_list)) { - page = list_first_entry(hctx-page_list, struct page, lru); + while (!list_empty(tags-page_list)) { + page = list_first_entry(tags-page_list, struct page, lru); list_del_init(page-lru); __free_pages(page, page-private); } - kfree(hctx-rqs); + kfree(tags-rqs); - if (hctx-tags) - blk_mq_free_tags(hctx-tags); + blk_mq_free_tags(tags); } static size_t order_to_size(unsigned int order) @@ -1075,28 +1076,35 @@ static size_t order_to_size(unsigned int order) return ret; } -static int blk_mq_init_rq_map(struct blk_mq_hw_ctx *hctx, - unsigned int reserved_tags, int node) +static struct blk_mq_tags *blk_mq_init_rq_map(unsigned int total_tags, + unsigned int reserved_tags, unsigned int cmd_size, int node) { + struct blk_mq_tags *tags; unsigned int i, j, entries_per_page, max_order = 4; size_t rq_size, left; - INIT_LIST_HEAD(hctx-page_list); + tags = blk_mq_init_tags(total_tags, reserved_tags, node); + if (!tags) + return NULL; + + INIT_LIST_HEAD(tags-page_list); - hctx-rqs = kmalloc_node(hctx-queue_depth * sizeof(struct request *), + tags-rqs = kmalloc_node(total_tags * sizeof(struct request *), GFP_KERNEL, node); - if (!hctx-rqs) - return -ENOMEM; + if (!tags-rqs) { + blk_mq_free_tags(tags); + return NULL; + } /* * rq_size is the size of the request plus driver payload, rounded * to the cacheline size */ - rq_size = round_up(sizeof(struct
[PATCH 1/4] blk-mq: stop pre-initializing req-special
We can get at the private data easil using pointer arithmetics. Do so instead of initializing req-special so that we don't rely on the request state in various initialization functions and shave off another few instructions in the fast path. Signed-off-by: Christoph Hellwig h...@lst.de --- block/blk-flush.c | 10 ++ block/blk-mq.c | 15 ++- block/blk-mq.h |1 - drivers/block/null_blk.c |4 ++-- drivers/block/virtio_blk.c |6 +++--- 5 files changed, 9 insertions(+), 27 deletions(-) diff --git a/block/blk-flush.c b/block/blk-flush.c index 43e6b47..9a0c427 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c @@ -306,22 +306,16 @@ static bool blk_kick_flush(struct request_queue *q) */ q-flush_pending_idx ^= 1; + blk_rq_init(q, q-flush_rq); if (q-mq_ops) { - struct blk_mq_ctx *ctx = first_rq-mq_ctx; - struct blk_mq_hw_ctx *hctx = q-mq_ops-map_queue(q, ctx-cpu); - - blk_mq_rq_init(hctx, q-flush_rq); - q-flush_rq-mq_ctx = ctx; - /* * Reuse the tag value from the fist waiting request, * with blk-mq the tag is generated during request * allocation and drivers can rely on it being inside * the range they asked for. */ + q-flush_rq-mq_ctx = first_rq-mq_ctx; q-flush_rq-tag = first_rq-tag; - } else { - blk_rq_init(q, q-flush_rq); } q-flush_rq-cmd_type = REQ_TYPE_FS; diff --git a/block/blk-mq.c b/block/blk-mq.c index 4274ee0..871acd6 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -248,24 +248,13 @@ struct request *blk_mq_alloc_reserved_request(struct request_queue *q, int rw, } EXPORT_SYMBOL(blk_mq_alloc_reserved_request); -/* - * Re-init and set pdu, if we have it - */ -void blk_mq_rq_init(struct blk_mq_hw_ctx *hctx, struct request *rq) -{ - blk_rq_init(hctx-queue, rq); - - if (hctx-cmd_size) - rq-special = blk_mq_rq_to_pdu(rq); -} - static void __blk_mq_free_request(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx, struct request *rq) { const int tag = rq-tag; struct request_queue *q = rq-q; - blk_mq_rq_init(hctx, rq); + blk_rq_init(hctx-queue, rq); blk_mq_put_tag(hctx-tags, tag); blk_mq_queue_exit(q); @@ -1139,7 +1128,7 @@ static int blk_mq_init_rq_map(struct blk_mq_hw_ctx *hctx, left -= to_do * rq_size; for (j = 0; j to_do; j++) { hctx-rqs[i] = p; - blk_mq_rq_init(hctx, hctx-rqs[i]); + blk_rq_init(hctx-queue, hctx-rqs[i]); p += rq_size; i++; } diff --git a/block/blk-mq.h b/block/blk-mq.h index ebbe6ba..238379a 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -27,7 +27,6 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async); void blk_mq_init_flush(struct request_queue *q); void blk_mq_drain_queue(struct request_queue *q); void blk_mq_free_queue(struct request_queue *q); -void blk_mq_rq_init(struct blk_mq_hw_ctx *hctx, struct request *rq); /* * CPU hotplug helpers diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c index 091b9ea..71df69d 100644 --- a/drivers/block/null_blk.c +++ b/drivers/block/null_blk.c @@ -226,7 +226,7 @@ static void null_cmd_end_timer(struct nullb_cmd *cmd) static void null_softirq_done_fn(struct request *rq) { - end_cmd(rq-special); + end_cmd(blk_mq_rq_to_pdu(rq)); } static inline void null_handle_cmd(struct nullb_cmd *cmd) @@ -311,7 +311,7 @@ static void null_request_fn(struct request_queue *q) static int null_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *rq) { - struct nullb_cmd *cmd = rq-special; + struct nullb_cmd *cmd = blk_mq_rq_to_pdu(rq); cmd-rq = rq; cmd-nq = hctx-driver_data; diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 0eace43..11e8f4b 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -112,7 +112,7 @@ static int __virtblk_add_req(struct virtqueue *vq, static inline void virtblk_request_done(struct request *req) { - struct virtblk_req *vbr = req-special; + struct virtblk_req *vbr = blk_mq_rq_to_pdu(req); int error = virtblk_result(vbr); if (req-cmd_type == REQ_TYPE_BLOCK_PC) { @@ -154,7 +154,7 @@ static void virtblk_done(struct virtqueue *vq) static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req) { struct virtio_blk *vblk = hctx-queue-queuedata; - struct virtblk_req *vbr = req-special; + struct virtblk_req *vbr = blk_mq_rq_to_pdu(req); unsigned long flags; unsigned int num; const bool last = (req-cmd_flags REQ_END) != 0;
[RFC] blk-mq: support for shared tags
This series adds support for sharing tags (and thus requests) between multiple request_queues. We'll need this for SCSI, and I think Martin also wants something similar for nvme. Besides the mess with request contructors/destructors the major RFC here is how the blk_mq_alloc_shared_tags API should look like. For now I've been lazy and reused struct blk_mq_reg, but that feels a bit cumbersome. Either a separate blk_mq_tags_reg or just passing the few arguments directly would work fine for me. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling
[lets split the thread] On Mon, 2014-03-31 at 16:37 +0200, Hannes Reinecke wrote: On 03/31/2014 03:33 PM, Alan Stern wrote: On Mon, 31 Mar 2014, Hannes Reinecke wrote: On 03/28/2014 08:29 PM, Alan Stern wrote: On Fri, 28 Mar 2014, James Bottomley wrote: Maybe scmd_eh_abort_handler() should check the flag before doing anything. Is there any sort of sychronization to prevent the same incarnation of a command from being aborted twice (or by two different threads at the same time)? If there is, it isn't obvious. See above. scsi_times_out() will only ever called once. What can happen, though, is that _theoretically_ the LLDD might decide to call -done() on a timed out command when scsi_eh_abort_handler() is still pending. That's okay. We can expect the LLDD to have sufficient locking to handle that sort of thing without confusion (usb-storage does, for example). (Also, what's going on at the start of scsi_abort_command()? Contrary to what one might expect, the first part of the function _cancels_ a scheduled abort. And it does so without clearing the SCSI_EH_ABORT_SCHEDULED flag.) The original idea was this: SCSI_EH_ABORT_SCHEDULED is sticky _per command_. Point is, any command abort is only ever send for a timed-out command. And the main problem for a timed-out command is that we simply _do not_ know what happened for that command. So _if_ a command timed out, _and_ we've send an abort, _and_ the command times out _again_ we'll be running into an endless loop between timeout and aborting, and never returning the command at all. So to prevent this we should set a marker on that command telling it to _not_ try to abort the command again. I disagree. We _have_ to abort the command again -- how else can we stop a running command? To prevent the loop you described, we should avoid _retrying_ the command after it is aborted the second time. The actual question is whether it's worth aborting the same command a second time. In principle any reset (like LUN reset etc) should clear the command, too. And the EH abort functionality is geared around this. If, for some reason, the transport layer / device driver requires a command abort to be send then sure, we need to accommodate for that. We already discussed this (that was my first response too). USB needs all outstanding commands aborted before proceeding to reset. I'm starting to think the actual way to fix this is to reset the abort scheduled only if we send something else, so this might be a better fix. It doesn't matter if we finish a command with abort scheduled because the command then gets freed and the flags cleaned. We can take our time with this because the other two patches, which I can send separately fix the current deadlock (we no longer send an unaborted request sense before the reset), and it can go via rc fixes. James --- diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 771c16b..3cfd2bf 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -1007,6 +1007,12 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd, const unsigned long stall_for = msecs_to_jiffies(100); int rtn; + /* +* We're sending another command we'll need to abort, so reset the +* abort scheduled flag on the real command before we save its state +*/ + scmd-eh_eflags = ~SCSI_EH_ABORT_SCHEDULED; + retry: scsi_eh_prep_cmnd(scmd, ses, cmnd, cmnd_size, sense_bytes); shost-eh_action = done; -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] tcm_qla2xxx: T10-Dif set harware capability
Regards, Quinn Tran On 3/28/14 5:12 PM, sagi grimberg sa...@mellanox.com wrote: On 3/29/2014 2:05 AM, Quinn Tran wrote: Set Protection Type(1,2,3) capabilities, Guarg type (CRC/IPchksm) capabilities bits to let TCM core knows of HW/fabric capabilities. Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org Signed-off-by: Giridhar Malavali giridhar.malav...@qlogic.com --- drivers/scsi/qla2xxx/tcm_qla2xxx.c | 23 +++ drivers/scsi/qla2xxx/tcm_qla2xxx.h | 1 + 2 files changed, 24 insertions(+) diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c index b23a0ff..4d93081 100644 --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c @@ -910,12 +910,20 @@ DEF_QLA_TPG_ATTR_BOOL(demo_mode_login_only); DEF_QLA_TPG_ATTRIB(demo_mode_login_only); QLA_TPG_ATTR(demo_mode_login_only, S_IRUGO | S_IWUSR); +/* + * Define tcm_qla2xxx_tpg_attrib_s_t10dif_force_on + */ +DEF_QLA_TPG_ATTR_BOOL(t10dif_force_on); +DEF_QLA_TPG_ATTRIB(t10dif_force_on); +QLA_TPG_ATTR(t10dif_force_on, S_IRUGO | S_IWUSR); + static struct configfs_attribute *tcm_qla2xxx_tpg_attrib_attrs[] = { tcm_qla2xxx_tpg_attrib_generate_node_acls.attr, tcm_qla2xxx_tpg_attrib_cache_dynamic_acls.attr, tcm_qla2xxx_tpg_attrib_demo_mode_write_protect.attr, tcm_qla2xxx_tpg_attrib_prod_mode_write_protect.attr, tcm_qla2xxx_tpg_attrib_demo_mode_login_only.attr, +tcm_qla2xxx_tpg_attrib_t10dif_force_on.attr, NULL, }; @@ -1049,6 +1057,18 @@ static struct se_portal_group *tcm_qla2xxx_make_tpg( tpg-tpg_attrib.demo_mode_write_protect = 1; tpg-tpg_attrib.cache_dynamic_acls = 1; tpg-tpg_attrib.demo_mode_login_only = 1; +tpg-tpg_attrib.t10dif_force_on = 0; +tpg-se_tpg.fabric_sup_prot_type = 0; +tpg-se_tpg.fabric_sup_guard_type = 0; You can lose guard_type - this is more relevant to the initiator side. QT OK + +if (scsi_host_get_prot(lport-qla_vha-host)) { +tpg-se_tpg.fabric_sup_prot_type = (TARGET_DIF_TYPE0_PROT| +TARGET_DIF_TYPE1_PROT|TARGET_DIF_TYPE2_PROT| +TARGET_DIF_TYPE3_PROT); + +tpg-se_tpg.fabric_sup_guard_type = TARGET_GUARD_CRC| +TARGET_GUARD_IP; +} ret = core_tpg_register(tcm_qla2xxx_fabric_configfs-tf_ops, wwn, tpg-se_tpg, tpg, TRANSPORT_TPG_TYPE_NORMAL); @@ -1127,6 +1147,8 @@ static ssize_t tcm_qla2xxx_npiv_tpg_store_enable( qlt_stop_phase1(vha-vha_tgt.qla_tgt); } +core_tpg_set_fabric_t10dif(se_tpg, tpg-tpg_attrib.t10dif_force_on); + Any way we can get this logic to be shared also with iscsi, srp, etc... all fabrics should be set with t10dif right? so I would imagine it would be better to centralize it right? QT Not sure how you want this logic to be shared. This patch is specific to Qlogic driver registering its capabilities. return count; } @@ -1169,6 +1191,7 @@ static struct se_portal_group *tcm_qla2xxx_npiv_make_tpg( tpg-tpg_attrib.demo_mode_write_protect = 1; tpg-tpg_attrib.cache_dynamic_acls = 1; tpg-tpg_attrib.demo_mode_login_only = 1; +tpg-tpg_attrib.t10dif_force_on = 0; ret = core_tpg_register(tcm_qla2xxx_npiv_fabric_configfs-tf_ops, wwn, tpg-se_tpg, tpg, TRANSPORT_TPG_TYPE_NORMAL); diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.h b/drivers/scsi/qla2xxx/tcm_qla2xxx.h index 33aaac8..62fdce3 100644 --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.h +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.h @@ -28,6 +28,7 @@ struct tcm_qla2xxx_tpg_attrib { int demo_mode_write_protect; int prod_mode_write_protect; int demo_mode_login_only; +int t10dif_force_on; }; struct tcm_qla2xxx_tpg { This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message. attachment: winmail.dat
Re: [PATCH 4/4] target/rd: T10-Dif: RAM disk is allocating more space than required.
Regards, Quinn Tran On 3/28/14 5:22 PM, sagi grimberg sa...@mellanox.com wrote: On 3/29/2014 2:05 AM, Quinn Tran wrote: Ram disk is allocating 8x more space than required for diff data. For large RAM disk test, there is small potential for memory starvation. Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org Signed-off-by: Giridhar Malavali giridhar.malav...@qlogic.com --- drivers/target/target_core_rd.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/target/target_core_rd.c b/drivers/target/target_core_rd.c index 01dda0b..6df177a 100644 --- a/drivers/target/target_core_rd.c +++ b/drivers/target/target_core_rd.c @@ -253,7 +253,11 @@ static int rd_build_prot_space(struct rd_dev *rd_dev, int prot_length) if (rd_dev-rd_flags RDF_NULLIO) return 0; -total_sg_needed = rd_dev-rd_page_count / prot_length; +/* prot_length=8byte dif data + * tot sg needed = rd_page_count * (PGSZ/512) * (prot_length/PGSZ) + pad + * PGSZ canceled each other. + */ +total_sg_needed = (rd_dev-rd_page_count * prot_length / 512) +1; You probably will want to use block_size instead of hard-coding 512. Other then that this makes sense. QT agreed sg_tables = (total_sg_needed / max_sg_per_table) + 1; This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message. attachment: winmail.dat
scsi_debug driver puzzle
Hello I have what to me is a puzzle but is likely a stupid question about the queuecommand interface in the scsi_debug driver. I see the host template set for scsi_debug_queuecommand but in the driver we have the function declared as int scsi_debug_queuecommand_lck So how is this working. Egrep pattern: scsi_debug_queuecommand File Line 0 scsi_debug.c 3551 int scsi_debug_queuecommand_lck(struct scsi_cmnd *SCpnt, done_funct_t done) 1 scsi_debug.c 3900 static DEF_SCSI_QCMD(scsi_debug_queuecommand) 2 scsi_debug.c 3912 .queuecommand = scsi_debug_queuecommand, Thanks Laurence -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] target/core: T10-Dif: check HW support capabilities
Regards, Quinn Tran On 3/28/14 6:24 PM, sagi grimberg sa...@mellanox.com wrote: On 3/29/2014 3:53 AM, Quinn Tran wrote: + +if (dev-dev_attrib.pi_prot_type) { +uint32_t cap[] = { 0, + TARGET_DIF_TYPE1_PROTECTION, + TARGET_DIF_TYPE2_PROTECTION, + TARGET_DIF_TYPE3_PROTECTION}; +uint32_t pt_bits = cap[dev-dev_attrib.pi_prot_type]; +pt_bits = se_tpg-fabric_sup_prot_type; At what point should the fabric fill that? it can vary between portals right? QT Yes, protection mode can vary between portals. When user select the physical function (via fabric_make_tpg), you know the specific portal based on user input and its capability. This is where Qlogic register its capabilities based on specific hardware. I would prefer to do that as a function pointer to explicitly ask the fabric it's support. QT is it still require with previous answer ? Well, I think it is nicer to explicitly ask the fabric at that point instead of checking what it previously set. By the way, I think this patch breaks existing iSER support as iSER doesn't set these bits. Thats why I think it would be a good idea to *explicitly* ask. QT I see. No issue with converting to a callback. +pr_err(dev[%p]: DIF protection mismatch with fabric +(%s). Transport capability bits[0x%x]\n, +dev, se_tpg-se_tpg_wwn-wwn_group.cg_item.ci_name, +se_tpg-fabric_sup_prot_type); +return -EFAULT; Didn't we agree that this is allowed and the target core should compensate on the lack fabric support? QT My recollection was that it's not recommended to have different portals with different supported feature. Well we seem to remember different things... Anyway I think it is better to compensate that in backstore/target-core level, that would be better than silently turn off protection. Martin? Nic? your takes? QT the error return above fail the binding (ln -sf backend disk fabric LUN) between the back disk and the frontend/fabric LUN representation. The failure happens during configuration time. The commented out code imply a silent turn off. Sorry should have clean it out. Also I don't know what rats are hiding here if the backstore is handling IOs in this time... What about cleaning up all the protection resources the backstore driver might be managing? QT hmm. It's a big hammer. I'll let the other folks chime in on this because it affect user's interaction. Nicholas ? Martin? Meaning a SCSI Write without Dif down a none-T10PI portal update the data. The Guard on the disk is now mismatch with the data. A SCSI Read down a T10PI path will run into a Guard failure. By introducing this option, Disk vendor require additional logic to compensate for this. I think it's cheaper to have supported hardware rather than support nightmare. +} +} + if (lun-lun_se_dev != NULL) { pr_err(Port Symlink already exists\n); return -EEXIST; diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c index c036595..9279971 100644 --- a/drivers/target/target_core_tpg.c +++ b/drivers/target/target_core_tpg.c @@ -632,6 +632,15 @@ int core_tpg_set_initiator_node_tag( } EXPORT_SYMBOL(core_tpg_set_initiator_node_tag); +void core_tpg_set_fabric_t10dif( +struct se_portal_group *tpg, +uint32_t fabric_t10dif_force_on) +{ +tpg-fabric_t10dif_force_on = fabric_t10dif_force_on; +} +EXPORT_SYMBOL(core_tpg_set_fabric_t10dif); + Is there a user for this function in this patch? QT I'm on the fence with this piece. Just thinking of a case where DIX is not available for initiator side, but user wants to turn on protection at the link layer. Our test folks would like to cover this case in the future. Not sure I understand. Initiator will send the target data+protection (DIX disabled HBA does INSERT), why does this help? Why should the target fabric care who generated the protection (OS or HBA)? QT Sorry for the confusion. The case I'm trying to get at is Data In Insert and Data out strip.In this case, the protection starts from front end target adapter to the back end storage. In revisit your previous patch, this case is not covered. Sagi. This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message. attachment: winmail.dat
Re: scsi_debug driver puzzle
On 03/31/2014 06:32 PM, Laurence Oberman wrote: File Line 0 scsi_debug.c 3551 int scsi_debug_queuecommand_lck(struct scsi_cmnd *SCpnt, done_funct_t done) 1 scsi_debug.c 3900 static DEF_SCSI_QCMD(scsi_debug_queuecommand) 2 scsi_debug.c 3912 .queuecommand = scsi_debug_queuecommand, Magical scsi_host.h macro as part of the effort to push host_lock down. The macro creates a function named as its argument which takes and drops the shost-host_lock around a call to the real queuecommand function: spin_lock_irqsave(shost-host_lock, irq_flags); scsi_cmd_get_serial(shost, cmd); rc = func_name##_lck (cmd,cmd-scsi_done); spin_unlock_irqrestore(shost-host_lock, irq_flags); http://fpaste.org/90345/88875139/ Cheers, Bryn. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling
On Mon, 31 Mar 2014, Hannes Reinecke wrote: Ah. Correct. But that's due to the first patch being incorrect. Cf my response to the original first patch. See my response to your response. :-) Okay, So I probably should refrain from issueing a response to your response to my response lest infinite recursion happens :-) Indeed. What should happen if some other pathway manages to call scsi_try_to_abort_cmd() while scmd-abort_work is still sitting on the work queue? The command would be aborted and the flag would be cleared, but the queued work would remain. Can this ever happen? Not that I could see. A command abort is only ever triggered by the request timeout from the block layer. And the timer is _not_ rearmed once the timeout function (here: scsi_times_out()) is called. Hence I fail to see how it can be called concurrently. scsi_try_to_abort_cmd() is also called (via a different pathway) when a command sent by the error handler itself times out. I haven't traced through all the different paths to make sure none of them can run concurrently. But I'm willing to take your word for it. Yes, but that's not calling scsi_abort_command(), but rather invokes scsi_abort_eh_cmnd(). Sure. But either way, we end up in scsi_try_to_abort_cmd(), which calls the LLDD's abort handler. Thus leading to the possibility of aborting the same command more than once. The original idea was this: SCSI_EH_ABORT_SCHEDULED is sticky _per command_. Point is, any command abort is only ever send for a timed-out command. And the main problem for a timed-out command is that we simply _do not_ know what happened for that command. So _if_ a command timed out, _and_ we've send an abort, _and_ the command times out _again_ we'll be running into an endless loop between timeout and aborting, and never returning the command at all. So to prevent this we should set a marker on that command telling it to _not_ try to abort the command again. I disagree. We _have_ to abort the command again -- how else can we stop a running command? To prevent the loop you described, we should avoid _retrying_ the command after it is aborted the second time. The actual question is whether it's worth aborting the same command a second time. In principle any reset (like LUN reset etc) should clear the command, too. And the EH abort functionality is geared around this. If, for some reason, the transport layer / device driver requires a command abort to be send then sure, we need to accommodate for that. As James mentioned, with USB a reset does not abort a running command. Instead it waits for the command to finish. (However, this could be changed in usb-storage, if required.) As said, yes, in principle you are right. We should be aborting the command a second time, _and then_ starting the escalation. So if the above reasoning is okay then this patch should be doing the trick: diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 771c16b..0e72374 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -189,6 +189,7 @@ scsi_abort_command(struct scsi_cmnd *scmd) /* * Retry after abort failed, escalate to next level. */ + scmd-eh_eflags = ~SCSI_EH_ABORT_SCHEDULED; SCSI_LOG_ERROR_RECOVERY(3, scmd_printk(KERN_INFO, scmd, scmd %p previous abort failed\n, scmd)); (Beware of line breaks) Can you test with it? I don't understand. This doesn't solve the fundamental problem (namely that you escalate before aborting a running command). All it does is clear the SCSI_EH_ABORT_SCHEDULED flag before escalating. What's needed is something like a 2-bit abort counter. scsi_try_to_abort_cmd() should increment the counter each time it runs, and if scmd_eh_abort_handler() sees that the counter is too high, it should avoid its retry pathway. _Then_ you can move on to something more drastic. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling
On Mon, 31 Mar 2014, James Bottomley wrote: The actual question is whether it's worth aborting the same command a second time. In principle any reset (like LUN reset etc) should clear the command, too. And the EH abort functionality is geared around this. If, for some reason, the transport layer / device driver requires a command abort to be send then sure, we need to accommodate for that. We already discussed this (that was my first response too). USB needs all outstanding commands aborted before proceeding to reset. I'm starting to think the actual way to fix this is to reset the abort scheduled only if we send something else, so this might be a better fix. It doesn't matter if we finish a command with abort scheduled because the command then gets freed and the flags cleaned. We can take our time with this because the other two patches, which I can send separately fix the current deadlock (we no longer send an unaborted request sense before the reset), and it can go via rc fixes. James --- diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 771c16b..3cfd2bf 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -1007,6 +1007,12 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd, const unsigned long stall_for = msecs_to_jiffies(100); int rtn; + /* + * We're sending another command we'll need to abort, so reset the + * abort scheduled flag on the real command before we save its state + */ + scmd-eh_eflags = ~SCSI_EH_ABORT_SCHEDULED; + retry: scsi_eh_prep_cmnd(scmd, ses, cmnd, cmnd_size, sense_bytes); shost-eh_action = done; I don't know if Hannes will like this, but I don't think it is right. This is not the retry pathway that's causing problems; that pathway begins where scmd_eh_abort_handler() calls scsi_queue_insert(). Now, that call is already guarded by this test: if (... (++scmd-retries = scmd-allowed)) { which would seem to prevent the infinite loop that Hannes was concerned about. So maybe the 1/3 patch posted a couple of days ago is the best solution after all. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] target/rd: T10-Dif: RAM disk is allocating more space than required.
Hi Quinn Co, On Mon, 2014-03-31 at 16:15 +, Quinn Tran wrote: On 3/28/14 5:22 PM, sagi grimberg sa...@mellanox.com wrote: On 3/29/2014 2:05 AM, Quinn Tran wrote: Ram disk is allocating 8x more space than required for diff data. For large RAM disk test, there is small potential for memory starvation. Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org Signed-off-by: Giridhar Malavali giridhar.malav...@qlogic.com --- drivers/target/target_core_rd.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/target/target_core_rd.c b/drivers/target/target_core_rd.c index 01dda0b..6df177a 100644 --- a/drivers/target/target_core_rd.c +++ b/drivers/target/target_core_rd.c @@ -253,7 +253,11 @@ static int rd_build_prot_space(struct rd_dev *rd_dev, int prot_length) if (rd_dev-rd_flags RDF_NULLIO) return 0; -total_sg_needed = rd_dev-rd_page_count / prot_length; +/* prot_length=8byte dif data + * tot sg needed = rd_page_count * (PGSZ/512) * (prot_length/PGSZ) + pad + * PGSZ canceled each other. + */ +total_sg_needed = (rd_dev-rd_page_count * prot_length / 512) +1; You probably will want to use block_size instead of hard-coding 512. Other then that this makes sense. QT agreed Applying the following updated patch to target-pending/for-next, along with Sagi's comment wrt to block_size. Also adding your missing Signed-off-by. Please make sure to include these in future patches. ;) Thanks! --nab commit 435b88ba4a38adc39842957610b27a8d0fb4584b Author: Quinn Tran quinn.t...@qlogic.com Date: Fri Mar 28 19:05:27 2014 -0400 target/rd: T10-Dif: RAM disk is allocating more space than required. Ram disk is allocating 8x more space than required for diff data. For large RAM disk test, there is small potential for memory starvation. (Use block_size when calculating total_sg_needed - sagi + nab) Signed-off-by: Giridhar Malavali giridhar.malav...@qlogic.com Signed-off-by: Quinn Tran quinn.t...@qlogic.com Cc: sta...@vger.kernel.org #3.14+ Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org diff --git a/drivers/target/target_core_rd.c b/drivers/target/target_core_rd.c index 66a5aba..b920db3 100644 --- a/drivers/target/target_core_rd.c +++ b/drivers/target/target_core_rd.c @@ -242,7 +242,7 @@ static void rd_release_prot_space(struct rd_dev *rd_dev) rd_dev-sg_prot_count = 0; } -static int rd_build_prot_space(struct rd_dev *rd_dev, int prot_length) +static int rd_build_prot_space(struct rd_dev *rd_dev, int prot_length, int block_size) { struct rd_dev_sg_table *sg_table; u32 total_sg_needed, sg_tables; @@ -252,8 +252,13 @@ static int rd_build_prot_space(struct rd_dev *rd_dev, int prot_length) if (rd_dev-rd_flags RDF_NULLIO) return 0; - - total_sg_needed = rd_dev-rd_page_count / prot_length; + /* +* prot_length=8byte dif data +* tot sg needed = rd_page_count * (PGSZ/block_size) * +* (prot_length/block_size) + pad +* PGSZ canceled each other. +*/ + total_sg_needed = (rd_dev-rd_page_count * prot_length / block_size) + 1; sg_tables = (total_sg_needed / max_sg_per_table) + 1; @@ -606,7 +611,8 @@ static int rd_init_prot(struct se_device *dev) if (!dev-dev_attrib.pi_prot_type) return 0; - return rd_build_prot_space(rd_dev, dev-prot_length); + return rd_build_prot_space(rd_dev, dev-prot_length, + dev-dev_attrib.block_size); } static void rd_free_prot(struct se_device *dev) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] target/core: T10-Dif: check HW support capabilities
On Sat, 2014-03-29 at 04:24 +0300, sagi grimberg wrote: On 3/29/2014 3:53 AM, Quinn Tran wrote: + +if (dev-dev_attrib.pi_prot_type) { +uint32_t cap[] = { 0, + TARGET_DIF_TYPE1_PROTECTION, + TARGET_DIF_TYPE2_PROTECTION, + TARGET_DIF_TYPE3_PROTECTION}; +uint32_t pt_bits = cap[dev-dev_attrib.pi_prot_type]; +pt_bits = se_tpg-fabric_sup_prot_type; At what point should the fabric fill that? it can vary between portals right? QT Yes, protection mode can vary between portals. When user select the physical function (via fabric_make_tpg), you know the specific portal based on user input and its capability. This is where Qlogic register its capabilities based on specific hardware. I would prefer to do that as a function pointer to explicitly ask the fabric it's support. QT is it still require with previous answer ? Well, I think it is nicer to explicitly ask the fabric at that point instead of checking what it previously set. I'm currently working on a series that explicitly queries the fabric for what PI modes are supported, as per our LSF discussion. By the way, I think this patch breaks existing iSER support as iSER doesn't set these bits. Thats why I think it would be a good idea to *explicitly* ask. nod +pr_err(dev[%p]: DIF protection mismatch with fabric +(%s). Transport capability bits[0x%x]\n, +dev, se_tpg-se_tpg_wwn-wwn_group.cg_item.ci_name, +se_tpg-fabric_sup_prot_type); +return -EFAULT; Didn't we agree that this is allowed and the target core should compensate on the lack fabric support? QT My recollection was that it's not recommended to have different portals with different supported feature. Well we seem to remember different things... Anyway I think it is better to compensate that in backstore/target-core level, that would be better than silently turn off protection. Martin? Nic? your takes? At the BoF we concluded that when a backend has PI enabled, but the fabric does not support PI, that target-core should strip off the INQUIRY + other control bits that normally indicate protection, but only on that particular path (eg: session). This would allow different iSCSI network portals to set a se_session bit at login time in order to indicate if/when protection is supported at the fabric nexus level. If it wasn't for iscsi-target / iser-target sharing the same endpoint across different fabrics, the PI information could simply be queried on a per /sys/kernel/config/target/$FABRIC/$WWPN/$TPGT endpoint basis, but alas it's not that simple.. Also I don't know what rats are hiding here if the backstore is handling IOs in this time... What about cleaning up all the protection resources the backstore driver might be managing? Meaning a SCSI Write without Dif down a none-T10PI portal update the data. The Guard on the disk is now mismatch with the data. A SCSI Read down a T10PI path will run into a Guard failure. By introducing this option, Disk vendor require additional logic to compensate for this. I think it's cheaper to have supported hardware rather than support nightmare. +} +} + if (lun-lun_se_dev != NULL) { pr_err(Port Symlink already exists\n); return -EEXIST; diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c index c036595..9279971 100644 --- a/drivers/target/target_core_tpg.c +++ b/drivers/target/target_core_tpg.c @@ -632,6 +632,15 @@ int core_tpg_set_initiator_node_tag( } EXPORT_SYMBOL(core_tpg_set_initiator_node_tag); +void core_tpg_set_fabric_t10dif( +struct se_portal_group *tpg, +uint32_t fabric_t10dif_force_on) +{ +tpg-fabric_t10dif_force_on = fabric_t10dif_force_on; +} +EXPORT_SYMBOL(core_tpg_set_fabric_t10dif); + Is there a user for this function in this patch? QT I'm on the fence with this piece. Just thinking of a case where DIX is not available for initiator side, but user wants to turn on protection at the link layer. Our test folks would like to cover this case in the future. Not sure I understand. Initiator will send the target data+protection (DIX disabled HBA does INSERT), why does this help? Why should the target fabric care who generated the protection (OS or HBA)? So this is the case where the fabric is responsible for doing the WRITE INSERT + READ_STRIP (which could be in hardware or software), but the INQUIRY PROTECT bit + friends still need to be masked (on that particular session) so the initiator does not generate PI information the host side LLD cannot handle. --nab -- To unsubscribe from this list: send the line unsubscribe linux-scsi in
Re: [PATCH 2/4] tcm_qla2xxx: T10-Dif set harware capability
On Mon, 2014-03-31 at 15:38 +, Quinn Tran wrote: On 3/28/14 5:12 PM, sagi grimberg sa...@mellanox.com wrote: On 3/29/2014 2:05 AM, Quinn Tran wrote: Set Protection Type(1,2,3) capabilities, Guarg type (CRC/IPchksm) capabilities bits to let TCM core knows of HW/fabric capabilities. Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org Signed-off-by: Giridhar Malavali giridhar.malav...@qlogic.com --- drivers/scsi/qla2xxx/tcm_qla2xxx.c | 23 +++ drivers/scsi/qla2xxx/tcm_qla2xxx.h | 1 + 2 files changed, 24 insertions(+) diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c index b23a0ff..4d93081 100644 --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c SNIP + +if (scsi_host_get_prot(lport-qla_vha-host)) { +tpg-se_tpg.fabric_sup_prot_type = (TARGET_DIF_TYPE0_PROT| +TARGET_DIF_TYPE1_PROT|TARGET_DIF_TYPE2_PROT| +TARGET_DIF_TYPE3_PROT); + +tpg-se_tpg.fabric_sup_guard_type = TARGET_GUARD_CRC| +TARGET_GUARD_IP; +} ret = core_tpg_register(tcm_qla2xxx_fabric_configfs-tf_ops, wwn, tpg-se_tpg, tpg, TRANSPORT_TPG_TYPE_NORMAL); @@ -1127,6 +1147,8 @@ static ssize_t tcm_qla2xxx_npiv_tpg_store_enable( qlt_stop_phase1(vha-vha_tgt.qla_tgt); } +core_tpg_set_fabric_t10dif(se_tpg, tpg-tpg_attrib.t10dif_force_on); + Any way we can get this logic to be shared also with iscsi, srp, etc... all fabrics should be set with t10dif right? so I would imagine it would be better to centralize it right? QT Not sure how you want this logic to be shared. This patch is specific to Qlogic driver registering its capabilities. I think that Sagi was referring to a target_core_fabric_ops callback to query protection information from the fabric.. As mentioned in the last response, this would work just fine on a /sys/kernel/config/target/$FABRIC/$WWPN/$TPGT context basis, if it wasn't for iscsi-target / iser-target sharing the same endpoint while still allowing different protection modes. --nab -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] target/core: T10-Dif: check HW support capabilities
On Mon, 2014-03-31 at 17:53 +, Quinn Tran wrote: On 3/28/14 6:24 PM, sagi grimberg sa...@mellanox.com wrote: On 3/29/2014 3:53 AM, Quinn Tran wrote: SNIP +} +} + if (lun-lun_se_dev != NULL) { pr_err(Port Symlink already exists\n); return -EEXIST; diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c index c036595..9279971 100644 --- a/drivers/target/target_core_tpg.c +++ b/drivers/target/target_core_tpg.c @@ -632,6 +632,15 @@ int core_tpg_set_initiator_node_tag( } EXPORT_SYMBOL(core_tpg_set_initiator_node_tag); +void core_tpg_set_fabric_t10dif( +struct se_portal_group *tpg, +uint32_t fabric_t10dif_force_on) +{ +tpg-fabric_t10dif_force_on = fabric_t10dif_force_on; +} +EXPORT_SYMBOL(core_tpg_set_fabric_t10dif); + Is there a user for this function in this patch? QT I'm on the fence with this piece. Just thinking of a case where DIX is not available for initiator side, but user wants to turn on protection at the link layer. Our test folks would like to cover this case in the future. Not sure I understand. Initiator will send the target data+protection (DIX disabled HBA does INSERT), why does this help? Why should the target fabric care who generated the protection (OS or HBA)? QT Sorry for the confusion. The case I'm trying to get at is Data In Insert and Data out strip.In this case, the protection starts from front end target adapter to the back end storage. In revisit your previous patch, this case is not covered. nod So for the TARGET_PROT_DIN_INSERT + TARGET_PROT_DOUT_STRIP cases, the target will need to expose INQUIRY PROTECT=1 + other PI related control bits when the fabric supports these modes, regardless of what PI is supported on the backend device. Keeping this configuration in mind as well while coding up the aforementioned series. ;) --nab -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html