Re: [PATCHv5 0/9] New EH command timeout handler

2013-09-03 Thread Jörn Engel
On Mon, 2 September 2013 02:31:29 -0700, Christoph Hellwig wrote: I would defintively love enabling it globally. Agreed. I carry a private patch based on an older version of Hannes' patchset doing just that. Istr there was some problem exactly once on a VM with a virtualized hba. We hit

Re: [RFC][PATCH] Add a flight data recorder for scsi commands

2013-08-28 Thread Jörn Engel
On Tue, 27 August 2013 20:34:43 -0400, Steven Rostedt wrote: On Tue, 27 Aug 2013 18:41:47 -0400 Jörn Engel jo...@logfs.org wrote: without knowing in advance which device it will be I have to trace all devices, because I don't know the interesting one ahead of time. And after

Re: [RFC][PATCH] Add a flight data recorder for scsi commands

2013-08-28 Thread Jörn Engel
On Tue, 27 August 2013 22:09:19 -0700, Nicholas A. Bellinger wrote: On Tue, 2013-08-27 at 18:17 -0400, Jörn Engel wrote: Here is a fun patch in an early state. Essentially I want to trace scsi commands, which has already been done long ago. The problem I have is that I care about all

Re: [RFC][PATCH] Add a flight data recorder for scsi commands

2013-08-28 Thread Jörn Engel
On Wed, 28 August 2013 14:45:14 +0200, Bart Van Assche wrote: Isn't this facility something that overlaps with what Wireshark can already do today ? Wireshark can already capture SCSI traces for several transport types (iSCSI, USB, ...). For other transports, e.g. FC, I'd prefer to see

Re: [RFC][PATCH] Add a flight data recorder for scsi commands

2013-08-27 Thread Jörn Engel
On Tue, 27 August 2013 19:54:22 -0400, Steven Rostedt wrote: On Tue, 27 Aug 2013 18:03:25 -0400 Jörn Engel jo...@logfs.org wrote: Here is a fun patch in an early state. Essentially I want to trace scsi commands, which has already been done long ago. The problem I have is that I care

Re: [PATCH] mpt2sas: don't handle broadcast primitives

2013-07-24 Thread Jörn Engel
On Wed, 24 July 2013 23:42:22 +0300, Baruch Even wrote: On Sat, Jul 20, 2013 at 1:11 AM, Jörn Engel jo...@logfs.org wrote: On Fri, 19 July 2013 18:06:59 -0400, Jörn Engel wrote: The handling of broadcast primitives involves _scsih_block_io_all_device(), which does what the name implies

Re: [PATCH v4 3/4] [SCSI] sg: checking sdp-detached isn't protected when open

2013-07-22 Thread Jörn Engel
On Sat, 20 July 2013 15:53:16 +0800, Xitao Cao wrote: Thanks for your comment. Do I need to update it and resend? Yes, please. Jörn -- When people work hard for you for a pat on the back, you've got to give them that pat. -- Robert Heinlein -- To unsubscribe from this list: send the line

Re: [PATCH v5 0/4] [SCSI] sg: fix race condition in sg_open

2013-07-22 Thread Jörn Engel
On Mon, 22 July 2013 12:40:29 +0800, Vaughan Cao wrote: There is a race when open sg with O_EXCL flag. Also a race may happen between sg_open and sg_remove. Changes from v4: * [3/4] use ERR_PTR series instead of adding another parameter in sg_add_sfp * [4/4] fix conflict for cherry-pick

Re: [PATCH v4 1/4] [SCSI] sg: use rwsem to solve race during exclusive open

2013-07-19 Thread Jörn Engel
On Wed, 17 July 2013 23:34:03 +0800, Vaughan Cao wrote: Date: Wed, 17 Jul 2013 23:34:03 +0800 From: Vaughan Cao vaughan@oracle.com To: jo...@logfs.org Cc: dgilb...@interlog.com, jbottom...@parallels.com, linux-scsi@vger.kernel.org, linux-ker...@vger.kernel.org,

Re: [PATCH v4 4/4] [SCSI] sg: push file descriptor list locking down to per-device locking

2013-07-19 Thread Jörn Engel
On Wed, 17 July 2013 23:34:06 +0800, Vaughan Cao wrote: Date: Wed, 17 Jul 2013 23:34:06 +0800 From: Vaughan Cao vaughan@oracle.com To: jo...@logfs.org Cc: dgilb...@interlog.com, jbottom...@parallels.com, linux-scsi@vger.kernel.org, linux-ker...@vger.kernel.org,

Re: [PATCH v4 2/4] [SCSI] sg: no need sg_open_exclusive_lock

2013-07-19 Thread Jörn Engel
On Wed, 17 July 2013 23:34:04 +0800, Vaughan Cao wrote: Date: Wed, 17 Jul 2013 23:34:04 +0800 From: Vaughan Cao vaughan@oracle.com To: jo...@logfs.org Cc: dgilb...@interlog.com, jbottom...@parallels.com, linux-scsi@vger.kernel.org, linux-ker...@vger.kernel.org,

Re: [PATCH v4 3/4] [SCSI] sg: checking sdp-detached isn't protected when open

2013-07-19 Thread Jörn Engel
On Wed, 17 July 2013 23:34:05 +0800, Vaughan Cao wrote: -static Sg_fd *sg_add_sfp(Sg_device * sdp, int dev); +static Sg_fd *sg_add_sfp(Sg_device * sdp, int dev, int * reason); You can use ERR_PTR and friends instead of adding another parameter. static void sg_remove_sfp(struct kref *);

[PATCH] mpt2sas: don't handle broadcast primitives

2013-07-19 Thread Jörn Engel
The handling of broadcast primitives involves _scsih_block_io_all_device(), which does what the name implies. I have observed cases with 60s of blocking io on all devices, caused by a single bad device. The downsides of this code are obvious, while the upsides are more elusive. Signed-off-by:

Re: [PATCH] mpt2sas: don't handle broadcast primitives

2013-07-19 Thread Jörn Engel
On Fri, 19 July 2013 18:06:59 -0400, Jörn Engel wrote: The handling of broadcast primitives involves _scsih_block_io_all_device(), which does what the name implies. I have observed cases with 60s of blocking io on all devices, caused by a single bad device. The downsides of this code

Re: [PATCH v2 1/1] [SCSI] sg: fix race condition when do exclusive open

2013-07-15 Thread Jörn Engel
On Sun, 7 July 2013 01:24:44 +0800, vaughan wrote: Do you agree that I use a per device spin_lock 'sfd_lock' to protect sfds list and leave sg_index_lock only protect the global sg device lookup? I think it's reasonable for concurrency. That bit looks fine to me. I don't think it matters

Re: [PATCH v3 1/1] [SCSI] sg: fix race condition when do exclusive open

2013-07-15 Thread Jörn Engel
On Mon, 8 July 2013 03:53:49 +0800, vaughan wrote: Use rwsem to aid opens. Exclusive open has to get write lock and non-exclusive open should get read lock. Replace global sg_open_exclusive_lock with a per device lock - sfd_lock. Since sfds list is now protected by the lock owned by the

Re: [PATCH v2 1/1] [SCSI] sg: fix race condition when do exclusive open

2013-07-05 Thread Jörn Engel
Sorry about replying so late. On Mon, 17 June 2013 21:10:53 +0800, vaughan wrote: Rewrite the last patch. Add a new field 'toopen' in sg_device to count ongoing sg_open's. By checking both 'toopen' and 'exclude' marks when do exclusive open, old race conditions can be avoided. Replace

Re: [PATCHv2 0/7] Limit overall SCSI EH runtime

2013-07-02 Thread Jörn Engel
On Tue, 2 July 2013 06:37:05 +, James Bottomley wrote: I don't understand what you're getting at. In a dual HBA situation, whether the second HBA is implicated or not depends on configuration and what the first HBA is doing. If it's just passively lost device state, then the second HBA

Re: [PATCHv2 0/7] Limit overall SCSI EH runtime

2013-07-02 Thread Jörn Engel
On Tue, 2 July 2013 16:33:40 +, James Bottomley wrote: On Tue, 2013-07-02 at 10:58 -0400, Jörn Engel wrote: On Tue, 2 July 2013 06:37:05 +, James Bottomley wrote: I don't understand what you're getting at. In a dual HBA situation, whether the second HBA is implicated

Re: [PATCHv2 0/7] Limit overall SCSI EH runtime

2013-07-01 Thread Jörn Engel
On Mon, 1 July 2013 08:50:48 +0200, Hannes Reinecke wrote: This patchset implements a new 'eh_deadline' attribute to the SCSI host. It will limit the overall SCSI EH runtime by a given timeout. If the timeout is reached all intermediate EH steps will be skipped and host reset will be

Re: [PATCHv2 0/7] Limit overall SCSI EH runtime

2013-07-01 Thread Jörn Engel
On Mon, 1 July 2013 19:23:25 +, James Bottomley wrote: On Mon, 2013-07-01 at 13:44 -0400, Jörn Engel wrote: If a single device is bad, don't ever do a host reset. This isn't a tenable position. Sometimes a device looks bad because the host state for it has gone insane. At that point

Re: [PATCH 3/9] scsi: improved eh timeout handler

2013-06-11 Thread Jörn Engel
On Tue, 11 June 2013 08:18:51 +0200, Hannes Reinecke wrote: On 06/11/2013 01:24 AM, Jörn Engel wrote: On Mon, 10 June 2013 11:19:16 -0400, Jörn Engel wrote: I don't care too much whether we use per-command work items or a single system-global thread. Actually, I do care. We have

Re: [PATCH 3/9] scsi: improved eh timeout handler

2013-06-10 Thread Jörn Engel
On Mon, 10 June 2013 11:00:49 +0200, Hannes Reinecke wrote: On 06/10/2013 10:20 AM, Christoph Hellwig wrote: Why can't we use a work item per command? Linking things into a list just to queue it up to workqueues missed half of the point of the workqueue infrastructure. Hmm. I felt

Re: [PATCH 8/9] mpt2sas: Enable new EH timeout handler

2013-06-10 Thread Jörn Engel
On Mon, 10 June 2013 09:40:57 +0200, Hannes Reinecke wrote: Signed-off-by: Hannes Reinecke h...@suse.de --- drivers/scsi/mpt2sas/mpt2sas_scsih.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c

Re: [PATCH 3/9] scsi: improved eh timeout handler

2013-06-10 Thread Jörn Engel
On Mon, 10 June 2013 09:40:52 +0200, Hannes Reinecke wrote: + + spin_lock_irqsave(sdev-list_lock, flags); + if (list_empty(sdev-eh_abort_list)) + kick_worker = 1; + list_add(scmd-eh_entry, sdev-eh_abort_list); + spin_unlock_irqrestore(sdev-list_lock, flags); +

Re: [PATCH 0/4] New SCSI command timeout handler

2013-06-07 Thread Jörn Engel
On Fri, 7 June 2013 14:54:16 +0800, Ren Mingxin wrote: On 06/06/2013 05:43 PM, Hannes Reinecke wrote: This approach may not work for some LLDDs as you said, but I wonder whether SAS is applicable(whether there will be later patches for SAS). We have tested a derived patch on SAS and it did

Re: [PATCH 3/4] scsi: improved eh timeout handler

2013-06-07 Thread Jörn Engel
On Thu, 6 June 2013 11:43:54 +0200, Hannes Reinecke wrote: + spin_lock_irqsave(sdev-list_lock, flags); + list_for_each_entry_safe(scmd, tmp, sdev-eh_abort_list, eh_entry) { + list_del_init(scmd-eh_entry); + spin_unlock_irqrestore(sdev-list_lock, flags); I

Re: [PATCH 1/9] target: Add transport_cmd_check_stop write_pending bit

2013-06-07 Thread Jörn Engel
On Fri, 7 June 2013 21:34:16 +, Nicholas A. Bellinger wrote: -static int transport_cmd_check_stop(struct se_cmd *cmd, bool remove_from_lists) +static int transport_cmd_check_stop(struct se_cmd *cmd, bool remove_from_lists, + bool write_pending) ... -

Re: [PATCH 1/4] scsi: move initialization of scmd-eh_entry

2013-06-06 Thread Jörn Engel
On Thu, 6 June 2013 11:43:52 +0200, Hannes Reinecke wrote: The 'eh_entry' list might be used even before scsi_softirq_done() is called. Hence we should rather initialize it together with the other eh-related variables. Signed-off-by: Hannes Reinecke h...@suse.de Tested-by: Joern Engel

Re: [PATCH 2/4] blk-timeout: add BLK_EH_SCHEDULED return code

2013-06-06 Thread Jörn Engel
On Thu, 6 June 2013 11:43:53 +0200, Hannes Reinecke wrote: Add a 'BLK_EH_SCHEDULED' return code for blk-timeout to indicate that a delayed error recovery has been initiated. Signed-off-by: Hannes Reinecke h...@suse.de Since the patch I tested diverged from your patches 2-4, this is only

Re: [PATCH 3/4] scsi: improved eh timeout handler

2013-06-06 Thread Jörn Engel
On Thu, 6 June 2013 11:43:54 +0200, Hannes Reinecke wrote: When a command runs into a timeout we need to send an 'ABORT TASK' TMF. This is typically done by the 'eh_abort_handler' LLDD callback. Conceptually, however, this function is a normal SCSI command, so there is no need to enter the

Re: [PATCH 3/4] scsi: improved eh timeout handler

2013-06-06 Thread Jörn Engel
On Thu, 6 June 2013 22:39:14 +0200, Hannes Reinecke wrote: + spin_unlock_irqrestore(sdev-list_lock, flags); + SCSI_LOG_ERROR_RECOVERY(3, + scmd_printk(KERN_INFO, scmd, + aborting command %p\n, scmd)); + rtn =

Re: [PATCH] sg: atomize check and set sdp-exclude in sg_open

2013-06-05 Thread Jörn Engel
On Wed, 5 June 2013 17:18:33 +0800, vaughan wrote: Check and set sdp-exclude should be atomic when set in sg_open(). The patch is line-wrapped. More importantly, it doesn't seem to do what your description indicates it should do. And lastly, does this fix a bug, possibly even one you have a

Re: [PATCH] sg: atomize check and set sdp-exclude in sg_open

2013-06-05 Thread Jörn Engel
On Thu, 6 June 2013 00:16:45 +0800, vaughan wrote: 于 2013年06月05日 21:27, Jörn Engel 写道: On Wed, 5 June 2013 17:18:33 +0800, vaughan wrote: Check and set sdp-exclude should be atomic when set in sg_open(). The patch is line-wrapped. More importantly, it doesn't seem to do It's shorter than

Re: [PATCH 4/4] scsi_transport_fc: FC timeout handler

2013-06-05 Thread Jörn Engel
On Sat, 25 May 2013 11:55:58 +0200, Hannes Reinecke wrote: It should be possible to move the code into scsi_lib and just have small hooks for the individual transports to use this. We have done something a bit tasteless and unconditionally enabled your new error handling for all scsi drivers.

Re: [PATCH 1/4] scsi: move initialization of scmd-eh_entry

2013-05-24 Thread Jörn Engel
On Fri, 24 May 2013 11:50:47 +0200, Hannes Reinecke wrote: The 'eh_entry' list might be used even before scsi_softirq_done() is called. Hence we should rather initialize it together with the other eh-related variables. Signed-off-by: Hannes Reinecke h...@suse.de ---

Re: [PATCH] mpt2sas: prevent double free on error path

2013-01-25 Thread Jörn Engel
On Thu, 24 January 2013 08:51:20 +0100, Bjørn Mork wrote: How about the copy of this code in drivers/scsi/mpt3sas/mpt3sas_scsih.c? Is that safe, or does it need fixing as well? Well spotted, that appears to suffer from the same ailment. Will cook up a second patch for that. Are there any

[PATCH v2] mpt2sas/mpt3sas: prevent double free on error path

2013-01-25 Thread Jörn Engel
Changes since v1: - Also fixes mpt3sas, thanks to Bjørn Mork. - Adds a missing put_sas_device to _scsih_probe_boot_devices - Added a newline - Moved a kref_get outside the spinlock I noticed this one when list_del was called with poisoned list pointers, but the real problem is a double-free (and

[PATCH v3] mpt2sas/mpt3sas: prevent double free on error path

2013-01-25 Thread Jörn Engel
Changes since v1: - Adds another missing put_sas_device to _scsih_probe_boot_devices, proving the need for more coffee before sending out patches. I noticed this one when list_del was called with poisoned list pointers, but the real problem is a double-free (and a use-after-free just before

[PATCH] mpt2sas: prevent double free on error path

2013-01-23 Thread Jörn Engel
I noticed this one when list_del was called with poisoned list pointers, but the real problem is a double-free (and a use-after-free just before that). Both _scsih_probe_boot_devices() and _scsih_sas_device_add() put the sas_device onto a list, thereby giving up control. Next they call

Re: [PATCH] scsi_lib: add NULL check to scsi_setup_blk_pc_cmnd

2012-07-25 Thread Jörn Engel
On Tue, 24 July 2012 09:01:41 +0400, James Bottomley wrote: On Mon, 2012-07-23 at 15:24 -0400, Jörn Engel wrote: On Mon, 23 July 2012 23:45:55 +0400, James Bottomley wrote: Have you checked this with the patches in scsi-misc? There's a series of patches in there that alters the way

[PATCH] scsi_lib: add NULL check to scsi_setup_blk_pc_cmnd

2012-07-23 Thread Jörn Engel
At least two slightly different paths can lead to a NULL pointer dereference in scsi_prep_state_check and have been hit in practice. 1. Call Trace: [812f8c4b] scsi_setup_blk_pc_cmnd+0x2b/0x170 [81305108] sd_prep_fn+0x568/0xdd0 [8121c594] blk_peek_request+0xb4/0x240

Re: [PATCH] scsi_lib: add NULL check to scsi_setup_blk_pc_cmnd

2012-07-23 Thread Jörn Engel
On Mon, 23 July 2012 23:45:55 +0400, James Bottomley wrote: On Mon, 2012-07-23 at 13:32 -0400, Jörn Engel wrote: At least two slightly different paths can lead to a NULL pointer dereference in scsi_prep_state_check and have been hit in practice. Have you checked this with the patches