Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
Il 24/05/2013 03:44, Tejun Heo ha scritto: On Thu, May 23, 2013 at 11:47:25AM +0200, Paolo Bonzini wrote: No no, I'm not talking about it not working for the users - it's just passing the commands, it of course works. I'm doubting about it being a worthy security isolation layer. cdb filtering (of any form really) has always been something on the border. It always was something we got stuck with due to lack of other immediate options. I understood correctly then. :) I agree it's not the best, but it's not completely broken either. It has bugs, and that's what these patches try to fix. The same filtering table being applied to different classes of hardware is a software bug, but my point is that the practive essentially entrusts non-insignificant part of security enforcement to the hardware itself. The variety of hardware in question is very wide and significant portion has historically been known to be flaky. Unproven theory, and contradicted by actual practice. Bugs are more common in the handling of borderline conditions, not in the handling of unimplemented commands. If you want to be secure aginst buggy firmware, the commands you have to block are READ and WRITE. Check out the list of existing USB quirks. I'm wondering whether combining 3 into 4 would be good enough. No, it wouldn't. I learnt it the hard way (by having a patch nacked :)) at http://thread.gmane.org/gmane.linux.kernel/1311887. Of course you can't do that by adding dangerous commands to the existing filtering table which is allowed by default. I'm saying count me out knob could be good enough. Neither is perfect but at least the latter doesn't affect the default cases. You need to allow more commands. The count-me-out knob allows all commands. You cannot always allow all commands. Ergo, you cannot always use the count-me-out knob. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command
On Thu, 2013-05-23 at 15:58 +0200, Paolo Bonzini wrote: Adjust the blk_verify_command function to let it look at per-queue data. This will be done in the next patch. This is not a bug fix. This is an enabler for your complex and to my mind dubious rework of the SG_IO command filter. I'm running out of ways to say please don't mix bug fixes with features, because this redesignating of the original patch set as part 1 and parts 2,3 doesn't satisfy the requirement. Does anyone in the real world actually care about this bug? because if not perhaps we can just remove the confusion and consider this as a feature set. If there's someone who actually cares, please lets just do the bug fix first and argue about the feature later. James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command
Il 24/05/2013 09:36, James Bottomley ha scritto: On Thu, 2013-05-23 at 15:58 +0200, Paolo Bonzini wrote: Adjust the blk_verify_command function to let it look at per-queue data. This will be done in the next patch. This is not a bug fix. This is an enabler for your complex and to my mind dubious rework of the SG_IO command filter. I'm running out of ways to say please don't mix bug fixes with features, because this redesignating of the original patch set as part 1 and parts 2,3 doesn't satisfy the requirement. I made it part 1/2/3 because parts 2/3 depend on part 1. It makes dependency tracking easier, at least in my mind. If you have another solution that does not require passing request_queue to blk_verify_command, I'm all ears. Does anyone in the real world actually care about this bug? Yes, or I would move on and not waste so much time on this. Paolo because if not perhaps we can just remove the confusion and consider this as a feature set. If there's someone who actually cares, please lets just do the bug fix first and argue about the feature later. James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command
On Fri, 2013-05-24 at 09:43 +0200, Paolo Bonzini wrote: Il 24/05/2013 09:36, James Bottomley ha scritto: On Thu, 2013-05-23 at 15:58 +0200, Paolo Bonzini wrote: Adjust the blk_verify_command function to let it look at per-queue data. This will be done in the next patch. This is not a bug fix. This is an enabler for your complex and to my mind dubious rework of the SG_IO command filter. I'm running out of ways to say please don't mix bug fixes with features, because this redesignating of the original patch set as part 1 and parts 2,3 doesn't satisfy the requirement. I made it part 1/2/3 because parts 2/3 depend on part 1. It makes dependency tracking easier, at least in my mind. If you have another solution that does not require passing request_queue to blk_verify_command, I'm all ears. That's a circular response that doesn't answer the question. The actual question is: what is simple fix for the bug that isn't entangled with enabling the SG_IO per device type whitelist feature. Does anyone in the real world actually care about this bug? Yes, or I would move on and not waste so much time on this. Fine, so produce a simple fix for this bug which we can discuss that's not tied to this feature. James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command
Il 24/05/2013 09:50, James Bottomley ha scritto: On Fri, 2013-05-24 at 09:43 +0200, Paolo Bonzini wrote: Il 24/05/2013 09:36, James Bottomley ha scritto: On Thu, 2013-05-23 at 15:58 +0200, Paolo Bonzini wrote: Adjust the blk_verify_command function to let it look at per-queue data. This will be done in the next patch. This is not a bug fix. This is an enabler for your complex and to my mind dubious rework of the SG_IO command filter. I'm running out of ways to say please don't mix bug fixes with features, because this redesignating of the original patch set as part 1 and parts 2,3 doesn't satisfy the requirement. I made it part 1/2/3 because parts 2/3 depend on part 1. It makes dependency tracking easier, at least in my mind. If you have another solution that does not require passing request_queue to blk_verify_command, I'm all ears. That's a circular response that doesn't answer the question. The actual question is: what is simple fix for the bug that isn't entangled with enabling the SG_IO per device type whitelist feature. Does anyone in the real world actually care about this bug? Yes, or I would move on and not waste so much time on this. Fine, so produce a simple fix for this bug which we can discuss that's not tied to this feature. Honestly, I have no idea how this is even possible. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
On Fri, May 24, 2013 at 4:13 PM, Paolo Bonzini pbonz...@redhat.com wrote: The same filtering table being applied to different classes of hardware is a software bug, but my point is that the practive essentially entrusts non-insignificant part of security enforcement to the hardware itself. The variety of hardware in question is very wide and significant portion has historically been known to be flaky. Unproven theory, and contradicted by actual practice. Bugs are more common in the handling of borderline conditions, not in the handling of unimplemented commands. If you want to be secure aginst buggy firmware, the commands you have to block are READ and WRITE. Check out the list of existing USB quirks. Well, I'd actually much prefer disabling CDB whitelisting for all !MMC devices if at all possible. You're basically arguing that because what we have is already broken, it should be okay to break it further. Also, RW commands having more quirks doesn't necessarily indicate that they tend to be more broken. They just get hammered on a lot in various ways so problems on those commands tend to be more noticeable. You need to allow more commands. The count-me-out knob allows all commands. You cannot always allow all commands. Ergo, you cannot always use the count-me-out knob. The thing is that both approaches aren't perfect here so you can make similar type of argument from the other side. If the system wants to give out raw hardware access to VMs, requiring it to delegate the device fully could be reasonable. Not ideal but widening direct command access by default is pretty nasty too. -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command
On Fri, 2013-05-24 at 09:53 +0200, Paolo Bonzini wrote: Il 24/05/2013 09:50, James Bottomley ha scritto: On Fri, 2013-05-24 at 09:43 +0200, Paolo Bonzini wrote: Il 24/05/2013 09:36, James Bottomley ha scritto: On Thu, 2013-05-23 at 15:58 +0200, Paolo Bonzini wrote: Adjust the blk_verify_command function to let it look at per-queue data. This will be done in the next patch. This is not a bug fix. This is an enabler for your complex and to my mind dubious rework of the SG_IO command filter. I'm running out of ways to say please don't mix bug fixes with features, because this redesignating of the original patch set as part 1 and parts 2,3 doesn't satisfy the requirement. I made it part 1/2/3 because parts 2/3 depend on part 1. It makes dependency tracking easier, at least in my mind. If you have another solution that does not require passing request_queue to blk_verify_command, I'm all ears. That's a circular response that doesn't answer the question. The actual question is: what is simple fix for the bug that isn't entangled with enabling the SG_IO per device type whitelist feature. Does anyone in the real world actually care about this bug? Yes, or I would move on and not waste so much time on this. Fine, so produce a simple fix for this bug which we can discuss that's not tied to this feature. Honestly, I have no idea how this is even possible. Really? It looks to me like a simple block on the commands for disk devices in the opcode switch would do it (with a corresponding change to sg.c:sg_allow_access). James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
Il 24/05/2013 10:02, Tejun Heo ha scritto: On Fri, May 24, 2013 at 4:13 PM, Paolo Bonzini pbonz...@redhat.com wrote: The same filtering table being applied to different classes of hardware is a software bug, but my point is that the practive essentially entrusts non-insignificant part of security enforcement to the hardware itself. The variety of hardware in question is very wide and significant portion has historically been known to be flaky. Unproven theory, and contradicted by actual practice. Bugs are more common in the handling of borderline conditions, not in the handling of unimplemented commands. If you want to be secure aginst buggy firmware, the commands you have to block are READ and WRITE. Check out the list of existing USB quirks. Well, I'd actually much prefer disabling CDB whitelisting for all !MMC devices if at all possible. You're basically arguing that because what we have is already broken, it should be okay to break it further. Also, RW commands having more quirks doesn't necessarily indicate that they tend to be more broken. They just get hammered on a lot in various ways so problems on those commands tend to be more noticeable. I agree intuition may not count, and it's perfectly possible that firmware writers forgot a break; or put the wrong location in a jump table, so that unimplemented commands give interesting results. However, the _fact_ is that this might happen anyway with the buttload of commands that are already enabled by the whitelist and that most disks will never implement. You need to allow more commands. The count-me-out knob allows all commands. You cannot always allow all commands. Ergo, you cannot always use the count-me-out knob. The thing is that both approaches aren't perfect here so you can make similar type of argument from the other side. If the system wants to give out raw hardware access to VMs, requiring it to delegate the device fully could be reasonable. No, it is not unfortunately. Allowing to do discards is one thing, allowing to disrupt the settings of a SAN is another. You can only delegate the device fully in these cases: (a) of course, if the guest is trusted; (b) if QEMU is running as a confined user, then you can get by with a userspace whitelist. (Otherwise you're just a ptrace away from arbitrary access, as you pointed out). Unfortunately, there are _real_ problems that this patches fix, such as log spews due to blocked commands, and these happen even if neither of the above conditions is true. Is there anything else I can do? Sure, I can check for the presence of the whitelist and hack the VPD pages to hide features that I know the whitelist will block. Is it the right thing to do? In my opinion no. It makes no sense to have raw device access _disable_ features compared to emulation. Not ideal but widening direct command access by default is pretty nasty too. I actually agree, and that's why I'm trying to balance the widening and restricting. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command
Il 24/05/2013 10:03, James Bottomley ha scritto: Does anyone in the real world actually care about this bug? Yes, or I would move on and not waste so much time on this. Fine, so produce a simple fix for this bug which we can discuss that's not tied to this feature. Honestly, I have no idea how this is even possible. Really? It looks to me like a simple block on the commands for disk devices in the opcode switch would do it (with a corresponding change to sg.c:sg_allow_access). Which switch? What I can do is something like this in blk_verify_command: if (q-sgio_type == TYPE_ROM) return 0; if (rq-cmd[0] == 0xA4) return -EPERM; if (!is_write (req-cmd[0] == ... || rq-cmd[0] == ...)) return -EPERM; But then the particular patch you're replying to is still necessary, and you're slowing down blk_verify_command. It may be fine for stable and -rc, but IMHO it calls for a better implementation in 3.11. (Besides, I did it like this because it is what Tejun suggested). Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
On Fri, May 24, 2013 at 5:31 PM, Paolo Bonzini pbonz...@redhat.com wrote: I agree intuition may not count, and it's perfectly possible that firmware writers forgot a break; or put the wrong location in a jump table, so that unimplemented commands give interesting results. It's not just unimplemented commands. Exposing any new command exposes its borderline problems together with it. However, the _fact_ is that this might happen anyway with the buttload of commands that are already enabled by the whitelist and that most disks will never implement. Yes and that's why the whitelist is generally frowned upon. It's inherently fragile. These devices simply aren't designed and implemented to be exposed to lesser security domains directly. It's true that it's already kinda broken that way but as I wrote before it's a vicious cycle and we don't wanna keep building on top of it. This expansion is gonna increase the usage of whitelisting which will in turn attract further use cases which are likely to call for even more expansion. The thing is that both approaches aren't perfect here so you can make similar type of argument from the other side. If the system wants to give out raw hardware access to VMs, requiring it to delegate the device fully could be reasonable. No, it is not unfortunately. Allowing to do discards is one thing, allowing to disrupt the settings of a SAN is another. You can only delegate the device fully in these cases: (a) of course, if the guest is trusted; (b) if QEMU is running as a confined user, then you can get by with a userspace whitelist. (Otherwise you're just a ptrace away from arbitrary access, as you pointed out). Unfortunately, there are _real_ problems that this patches fix, such as log spews due to blocked commands, and these happen even if neither of the above conditions is true. Is there anything else I can do? Sure, I can check for the presence of the whitelist and hack the VPD pages to hide features that I know the whitelist will block. Is it the right thing to do? In my opinion no. It makes no sense to have raw device access _disable_ features compared to emulation. If the bulk of filtering can be solved with userland whitelisting as a confined user, it should be possible to resolve peripheral problems like log messages in simpler way, no? Can you please elaborate on the log message problem? Who's spewing those messages? -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi: megaraid: check kzalloc
On 2013/5/24 11:20, Santosh Y wrote: On Fri, May 24, 2013 at 7:52 AM, Libo Chen clbchenlibo.c...@huawei.com wrote: we should check kzalloc, avoid to hit oops Signed-off-by: Libo Chen libo.c...@huawei.com --- drivers/scsi/megaraid.c |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c index 846f475..195b095 100644 --- a/drivers/scsi/megaraid.c +++ b/drivers/scsi/megaraid.c @@ -4162,6 +4162,10 @@ mega_internal_command(adapter_t *adapter, megacmd_t *mc, mega_passthru *pthru) sdev = kzalloc(sizeof(struct scsi_device), GFP_KERNEL); scmd-device = sdev; ^^ I guess assigning after the NULL check of 'sdev' is more appropriate. Ah, there is a little different. ok, I will update. Thanks, Libo + if (!scmd-device) { + scsi_free_command(GFP_KERNEL, scmd); + return -ENOMEM; + } memset(adapter-int_cdb, 0, sizeof(adapter-int_cdb)); scmd-cmnd = adapter-int_cdb; -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RESEND] scsi: megaraid: check kzalloc
we should check kzalloc, avoid to hit oops Signed-off-by: Libo Chen libo.c...@huawei.com --- drivers/scsi/megaraid.c |4 1 files changed, 4 insertions(+), 0 deletions(-) instead of checking scmd-device, sdev is more appropriate. diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c index 846f475..6b623cb 100644 --- a/drivers/scsi/megaraid.c +++ b/drivers/scsi/megaraid.c @@ -4161,6 +4161,10 @@ mega_internal_command(adapter_t *adapter, megacmd_t *mc, mega_passthru *pthru) memset(scb, 0, sizeof(scb_t)); sdev = kzalloc(sizeof(struct scsi_device), GFP_KERNEL); + if (sdev) { + scsi_free_command(GFP_KERNEL, scmd); + return -ENOMEM; + } scmd-device = sdev; memset(adapter-int_cdb, 0, sizeof(adapter-int_cdb)); -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
Il 24/05/2013 11:07, Tejun Heo ha scritto: On Fri, May 24, 2013 at 5:31 PM, Paolo Bonzini pbonz...@redhat.com wrote: I agree intuition may not count, and it's perfectly possible that firmware writers forgot a break; or put the wrong location in a jump table, so that unimplemented commands give interesting results. It's not just unimplemented commands. Exposing any new command exposes its borderline problems together with it. For commands that are used by Linux already, the right way to fix the problems is not obscuring the commands from userspace's view. You can hit the same problems with ioctls or even with normal operation of the device. However, the _fact_ is that this might happen anyway with the buttload of commands that are already enabled by the whitelist and that most disks will never implement. Yes and that's why the whitelist is generally frowned upon. It's inherently fragile. These devices simply aren't designed and implemented to be exposed to lesser security domains directly. It's true that it's already kinda broken that way but as I wrote before it's a vicious cycle and we don't wanna keep building on top of it. This expansion is gonna increase the usage of whitelisting which will in turn attract further use cases which are likely to call for even more expansion. And prohibiting the extension of whitelists is gonna increase the usage of unpriv_sgio and less-secure userspace whitelists. Anvil, meet hammer. The thing is that both approaches aren't perfect here so you can make similar type of argument from the other side. If the system wants to give out raw hardware access to VMs, requiring it to delegate the device fully could be reasonable. No, it is not unfortunately. Allowing to do discards is one thing, allowing to disrupt the settings of a SAN is another. You can only delegate the device fully in these cases: (a) of course, if the guest is trusted; (b) if QEMU is running as a confined user; If the bulk of filtering can be solved with userland whitelisting as a confined user, it should be possible to resolve peripheral problems like log messages in simpler way, no? Can you please elaborate on the log message problem? Who's spewing those messages? For example: if (bdev_write_same(bdev)) { unsigned char bdn[BDEVNAME_SIZE]; if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask, ZERO_PAGE(0))) return 0; bdevname(bdev, bdn); pr_err(%s: WRITE SAME failed. Manually zeroing.\n, bdn); } return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask); The device exposes the ability to zero out LUN blocks, but the command is not whitelisted and it fails. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] New FC timeout handler
Hi all, this is the first step towards a new FC error handler. This patch implements a new FC command timeout handler which will be sending command aborts inline without engaging SCSI EH. In addition the commands will be returned directly if the command abort succeeded, cutting down recovery times dramatically. With the original scsi error recovery I got: # time dd if=/dev/zero of=/mnt/test.blk bs=512 count=2048 oflag=sync 2048+0 records in 2048+0 records out 1048576 bytes (1.0 MB) copied, 3.72732 s, 281 kB/s real2m14.475s user0m0.000s sys 0m0.104s with this patchset I got: # time dd if=/dev/zero of=/mnt/test.blk bs=512 count=2048 oflag=sync 2048+0 records in 2048+0 records out 1048576 bytes (1.0 MB) copied, 79.6376 s, 13.2 kB/s real1m19.642s user0m0.000s sys 0m0.100s Test was to disable RSCN on the target port, disable the target port, and then start the 'dd' command as indicated. Comments etc are welcome. Hannes Reinecke (4): scsi: move initialization of scmd-eh_entry blk-timeout: add BLK_EH_SCHEDULED return code scsi: export functions for new fc timeout handler scsi_transport_fc: FC timeout handler drivers/scsi/scsi_error.c| 8 - drivers/scsi/scsi_lib.c | 6 ++-- drivers/scsi/scsi_priv.h | 2 ++ drivers/scsi/scsi_transport_fc.c | 63 +++- include/linux/blkdev.h | 1 + include/scsi/scsi_transport_fc.h | 2 ++ 6 files changed, 78 insertions(+), 4 deletions(-) -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] blk-timeout: add BLK_EH_SCHEDULED return code
Add a 'BLK_EH_SCHEDULED' return code for blk-timeout to indicate that a delayed error recovery has been initiated. Signed-off-by: Hannes Reinecke h...@suse.de --- drivers/scsi/scsi_error.c | 3 +++ include/linux/blkdev.h| 1 + 2 files changed, 4 insertions(+) diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index c1b05a8..bef6799e 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -145,6 +145,9 @@ enum blk_eh_timer_return scsi_times_out(struct request *req) rtn = host-hostt-eh_timed_out(scmd); scmd-result |= DID_TIME_OUT 16; + /* Check for delayed EH scheduling */ + if (rtn == BLK_EH_SCHEDULED) + return BLK_EH_NOT_HANDLED; if (unlikely(rtn == BLK_EH_NOT_HANDLED !scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 78feda9..0a38211 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -238,6 +238,7 @@ enum blk_eh_timer_return { BLK_EH_NOT_HANDLED, BLK_EH_HANDLED, BLK_EH_RESET_TIMER, + BLK_EH_SCHEDULED, }; typedef enum blk_eh_timer_return (rq_timed_out_fn)(struct request *); -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] scsi: move initialization of scmd-eh_entry
The 'eh_entry' list might be used even before scsi_softirq_done() is called. Hence we should rather initialize it together with the other eh-related variables. Signed-off-by: Hannes Reinecke h...@suse.de --- drivers/scsi/scsi_lib.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index c31187d..a5515ec 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -318,6 +318,8 @@ static void scsi_init_cmd_errh(struct scsi_cmnd *cmd) memset(cmd-sense_buffer, 0, SCSI_SENSE_BUFFERSIZE); if (cmd-cmd_len == 0) cmd-cmd_len = scsi_command_size(cmd-cmnd); + INIT_LIST_HEAD(cmd-eh_entry); + cmd-eh_eflags = 0; } void scsi_device_unbusy(struct scsi_device *sdev) @@ -1483,8 +1485,6 @@ static void scsi_softirq_done(struct request *rq) unsigned long wait_for = (cmd-allowed + 1) * rq-timeout; int disposition; - INIT_LIST_HEAD(cmd-eh_entry); - atomic_inc(cmd-device-iodone_cnt); if (cmd-result) atomic_inc(cmd-device-ioerr_cnt); -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] scsi_transport_fc: FC timeout handler
When a command runs into a timeout we need to send an 'ABORT TASK' TMF. This is typically done by the 'eh_abort_handler' LLDD callback. Conceptually, however, this function is a normal SCSI command, so there is no need to enter the error handler. This patch implements a new FC timeout handler which invokes the 'eh_abort_handler' directly when the timeout occurs without entering the error handler. If the 'eh_abort_handler' returns SUCCESS or FAST_IO_FAIL the command will be retried if possible. If no retries are allowed the command will be returned immediately, as we have to assume the TMF succeeded and the command is completed with the LLDD. For any other return code from 'eh_abort_handler' the command will be pushed onto the existing SCSI EH handler, or aborted with an error if that fails. Signed-off-by: Hannes Reinecke h...@suse.de --- drivers/scsi/scsi_transport_fc.c | 63 +++- include/scsi/scsi_transport_fc.h | 2 ++ 2 files changed, 64 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c index e106c27..370b263 100644 --- a/drivers/scsi/scsi_transport_fc.c +++ b/drivers/scsi/scsi_transport_fc.c @@ -2048,6 +2048,51 @@ static int fc_vport_match(struct attribute_container *cont, return i-vport_attr_cont.ac == cont; } +/** + * fc_rport_eh_handler - FC-specific error handler + * @work: remote port on which the error occurred. + */ +static void +fc_rport_eh_handler(struct work_struct *work) +{ + struct fc_rport *rport = + container_of(work, struct fc_rport, rport_eh_work); + struct Scsi_Host *shost = rport_to_shost(rport); + struct scsi_cmnd *scmd, *tmp; + unsigned long flags; + int rtn; + + spin_lock_irqsave(shost-host_lock, flags); + list_for_each_entry_safe(scmd, tmp, rport-eh_work_q, eh_entry) { + list_del_init(scmd-eh_entry); + spin_unlock_irqrestore(shost-host_lock, flags); + dev_printk(KERN_WARNING, rport-dev, + trying to abort scmd %p, scmd); + rtn = scsi_try_to_abort_cmd(shost-hostt, scmd); + if (rtn == SUCCESS || rtn == FAST_IO_FAIL) { + if (!scsi_noretry_cmd(scmd) + (++scmd-retries = scmd-allowed)) { + dev_printk(KERN_WARNING, rport-dev, + retry scmd %p, scmd); + scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY); + } else { + dev_printk(KERN_WARNING, rport-dev, + fast fail scmd %p, scmd); + scmd-result |= DID_TRANSPORT_FAILFAST 16; + scsi_finish_command(scmd); + } + } else { + if (!scsi_eh_scmd_add(scmd, 0)) { + dev_printk(KERN_WARNING, rport-dev, + terminate scmd %p, scmd); + scmd-result |= DID_TIME_OUT 16; + scsi_finish_command(scmd); + } + } + spin_lock_irqsave(shost-host_lock, flags); + } + spin_unlock_irqrestore(shost-host_lock, flags); +} /** * fc_timed_out - FC Transport I/O timeout intercept handler @@ -2075,11 +2120,25 @@ static enum blk_eh_timer_return fc_timed_out(struct scsi_cmnd *scmd) { struct fc_rport *rport = starget_to_rport(scsi_target(scmd-device)); + struct Scsi_Host *shost = rport_to_shost(rport); + unsigned long flags; + int kick_worker = 0; if (rport-port_state == FC_PORTSTATE_BLOCKED) return BLK_EH_RESET_TIMER; - return BLK_EH_NOT_HANDLED; + spin_lock_irqsave(shost-host_lock, flags); + if (list_empty(rport-eh_work_q)) + kick_worker = 1; + list_add(scmd-eh_entry, rport-eh_work_q); + dev_printk(KERN_WARNING, rport-dev, + scmd %p added to eh queue\n, scmd); + spin_unlock_irqrestore(shost-host_lock, flags); + + if (kick_worker) + fc_queue_work(shost, rport-rport_eh_work); + + return BLK_EH_SCHEDULED; } /* @@ -2630,11 +2689,13 @@ fc_rport_create(struct Scsi_Host *shost, int channel, rport-channel = channel; rport-fast_io_fail_tmo = -1; + INIT_LIST_HEAD(rport-eh_work_q); INIT_DELAYED_WORK(rport-dev_loss_work, fc_timeout_deleted_rport); INIT_DELAYED_WORK(rport-fail_io_work, fc_timeout_fail_rport_io); INIT_WORK(rport-scan_work, fc_scsi_scan_rport); INIT_WORK(rport-stgt_delete_work, fc_starget_delete); INIT_WORK(rport-rport_delete_work, fc_rport_final_delete); + INIT_WORK(rport-rport_eh_work, fc_rport_eh_handler);
[PATCH 3/4] scsi: export functions for new fc timeout handler
Export some more functions which are used by the new FC timeout handler. Signed-off-by: Hannes Reinecke h...@suse.de --- drivers/scsi/scsi_error.c | 5 - drivers/scsi/scsi_lib.c | 2 ++ drivers/scsi/scsi_priv.h | 2 ++ 3 files changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index bef6799e..f21ad7c 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -119,6 +119,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag) spin_unlock_irqrestore(shost-host_lock, flags); return ret; } +EXPORT_SYMBOL_GPL(scsi_eh_scmd_add); /** * scsi_times_out - Timeout function for normal scsi commands. @@ -659,13 +660,14 @@ static int scsi_try_bus_device_reset(struct scsi_cmnd *scmd) return rtn; } -static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt, struct scsi_cmnd *scmd) +int scsi_try_to_abort_cmd(struct scsi_host_template *hostt, struct scsi_cmnd *scmd) { if (!hostt-eh_abort_handler) return FAILED; return hostt-eh_abort_handler(scmd); } +EXPORT_SYMBOL_GPL(scsi_try_to_abort_cmd); static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd) { @@ -1412,6 +1414,7 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd) return 0; } +EXPORT_SYMBOL_GPL(scsi_noretry_cmd); /** * scsi_decide_disposition - Disposition a cmd on return from LLD. diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index a5515ec..7306aaa 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -210,6 +210,8 @@ void scsi_queue_insert(struct scsi_cmnd *cmd, int reason) { __scsi_queue_insert(cmd, reason, 1); } +EXPORT_SYMBOL_GPL(scsi_queue_insert); + /** * scsi_execute - insert request and wait for the result * @sdev: scsi device diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h index 8f9a0ca..1d8d7d1 100644 --- a/drivers/scsi/scsi_priv.h +++ b/drivers/scsi/scsi_priv.h @@ -71,6 +71,8 @@ extern int scsi_error_handler(void *host); extern int scsi_decide_disposition(struct scsi_cmnd *cmd); extern void scsi_eh_wakeup(struct Scsi_Host *shost); extern int scsi_eh_scmd_add(struct scsi_cmnd *, int); +extern int scsi_try_to_abort_cmd(struct scsi_host_template *hostt, +struct scsi_cmnd *scmd); void scsi_eh_ready_devs(struct Scsi_Host *shost, struct list_head *work_q, struct list_head *done_q); -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Question: eh_abort_handler() and terminate commands
Hi all, after having posted the first attempt for an updated FC error handler I found that the 'eh_abort_handler()' semantics are a bit odd. Obviously, the 'eh_abort_handler' is called to abort a command. But what _exactly_ is supposed to happen if it returns 'SUCCESS'? Initially one does expect that the command has been aborted. But then the callback itself does _not_ terminate the command, it's rather expected that the caller of 'eh_abort_command' does it. Which leads to the interesting question: What happens with the actual command once eh_abort_handler returns? As normally 'eh_abort_handler' is implemented as a TMF, one does assume that the command itself will be returned by the target with an appropriate status. However, as the upper layer is expected to terminate the command itself, we will never see this status, right? OTOH it also means that the HBA firmware might receive a completion for a command which the upper layer has already completed. Will this completion ever being mirrored to the LLDD? Or discarded by the firmware? And how is one expected to handle the case where the TMF _failed_ on the target? I would rather prefer to have the LLDD terminate the command; this way we at least have a chance of getting a decent status back ... Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/1] ipr: possible irq lock inversion dependency detected
When enable lockdep, seeing possible irq lock inversion dependency detected error. This patch fixes the issue. Signed-off-by: Wen Xiong wenxi...@linux.vnet.ibm.com --- drivers/scsi/ipr.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) Index: b/drivers/scsi/ipr.c === --- a/drivers/scsi/ipr.c2013-05-20 09:55:23.0 -0500 +++ b/drivers/scsi/ipr.c2013-05-20 10:00:34.548380858 -0500 @@ -9408,7 +9408,7 @@ static int ipr_probe_ioa(struct pci_dev void __iomem *ipr_regs; int rc = PCIBIOS_SUCCESSFUL; volatile u32 mask, uproc, interrupts; - unsigned long lock_flags; + unsigned long lock_flags, driver_lock_flags; ENTER; @@ -9631,9 +9631,9 @@ static int ipr_probe_ioa(struct pci_dev } else ioa_cfg-reset = ipr_reset_start_bist; - spin_lock(ipr_driver_lock); + spin_lock_irqsave(ipr_driver_lock, driver_lock_flags); list_add_tail(ioa_cfg-queue, ipr_ioa_head); - spin_unlock(ipr_driver_lock); + spin_unlock_irqrestore(ipr_driver_lock, driver_lock_flags); LEAVE; out: @@ -9716,6 +9716,7 @@ static void __ipr_remove(struct pci_dev unsigned long host_lock_flags = 0; struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev); int i; + unsigned long driver_lock_flags; ENTER; spin_lock_irqsave(ioa_cfg-host-host_lock, host_lock_flags); @@ -9739,9 +9740,9 @@ static void __ipr_remove(struct pci_dev INIT_LIST_HEAD(ioa_cfg-used_res_q); spin_lock_irqsave(ioa_cfg-host-host_lock, host_lock_flags); - spin_lock(ipr_driver_lock); + spin_lock_irqsave(ipr_driver_lock, driver_lock_flags); list_del(ioa_cfg-queue); - spin_unlock(ipr_driver_lock); + spin_unlock_irqrestore(ipr_driver_lock, driver_lock_flags); if (ioa_cfg-sdt_state == ABORT_DUMP) ioa_cfg-sdt_state = WAIT_FOR_DUMP; @@ -10007,12 +10008,12 @@ static int ipr_halt(struct notifier_bloc { struct ipr_cmnd *ipr_cmd; struct ipr_ioa_cfg *ioa_cfg; - unsigned long flags = 0; + unsigned long flags = 0, driver_lock_flags; if (event != SYS_RESTART event != SYS_HALT event != SYS_POWER_OFF) return NOTIFY_DONE; - spin_lock(ipr_driver_lock); + spin_lock_irqsave(ipr_driver_lock, driver_lock_flags); list_for_each_entry(ioa_cfg, ipr_ioa_head, queue) { spin_lock_irqsave(ioa_cfg-host-host_lock, flags); @@ -10030,7 +10031,7 @@ static int ipr_halt(struct notifier_bloc ipr_do_req(ipr_cmd, ipr_halt_done, ipr_timeout, IPR_DEVICE_RESET_TIMEOUT); spin_unlock_irqrestore(ioa_cfg-host-host_lock, flags); } - spin_unlock(ipr_driver_lock); + spin_unlock_irqrestore(ipr_driver_lock, driver_lock_flags); return NOTIFY_OK; } -- -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/1] ipr: possible irq lock inversion dependency detected
Hi All, When enable lockdep, seeing possible irq lock inversion dependency detected error. This patch fixed the issue. Thanks, Wendy -- -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 0/2] block,scsi: fixup blk_get_request dead queue scenarios
[PATCH v5 0/2] block,scsi: fixup blk_get_request dead queue scenarios Changes from v4: - As per James' suggestion, split into two patches: the first adds blk_get_request return value checking to avoid potential oops, the second converts callers and friends to handle ERR_PTR differentiation of -ENOMEM and -ENODEV. Regards, -- Joe -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 1/2] block,scsi: verify return pointer from blk_get_request
From 22307be1bc6e404622b1f074094902e385a1bd30 Mon Sep 17 00:00:00 2001 From: Joe Lawrence joe.lawre...@stratus.com Date: Fri, 24 May 2013 12:39:04 -0400 Subject: [PATCH v5 1/2] block,scsi: verify return pointer from blk_get_request The blk-core dead queue checks introduced in commit 70460571 added an error scenario to blk_get_request that returns NULL if the request queue has been shutdown. This changed the behavior for __GFP_WAIT callers, who should now verify the return value before dereferencing. Signed-off-by: Joe Lawrence joe.lawre...@stratus.com Cc: Jens Axboe ax...@kernel.dk Cc: Jiri Kosina jkos...@suse.cz Cc: James E.J. Bottomley jbottom...@parallels.com Cc: Bart Van Assche bvanass...@acm.org Cc: linux-scsi@vger.kernel.org --- block/scsi_ioctl.c| 9 - drivers/block/paride/pd.c | 2 ++ drivers/block/pktcdvd.c | 2 ++ drivers/scsi/scsi_error.c | 2 ++ 4 files changed, 14 insertions(+), 1 deletion(-) diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index 9a87daa..6c87d4e 100644 --- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -458,6 +458,10 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode, } rq = blk_get_request(q, in_len ? WRITE : READ, __GFP_WAIT); + if (!rq) { + err = -ENODEV; + goto error_free_buffer; + } cmdlen = COMMAND_SIZE(opcode); @@ -530,8 +534,9 @@ out: } error: - kfree(buffer); blk_put_request(rq); +error_free_buffer: + kfree(buffer); return err; } EXPORT_SYMBOL_GPL(sg_scsi_ioctl); @@ -544,6 +549,8 @@ static int __blk_send_generic(struct request_queue *q, struct gendisk *bd_disk, int err; rq = blk_get_request(q, WRITE, __GFP_WAIT); + if (!rq) + return -ENODEV; rq-cmd_type = REQ_TYPE_BLOCK_PC; rq-timeout = BLK_DEFAULT_SG_TIMEOUT; rq-cmd[0] = cmd; diff --git a/drivers/block/paride/pd.c b/drivers/block/paride/pd.c index 831e3ac..fc2ecff 100644 --- a/drivers/block/paride/pd.c +++ b/drivers/block/paride/pd.c @@ -722,6 +722,8 @@ static int pd_special_command(struct pd_unit *disk, int err = 0; rq = blk_get_request(disk-gd-queue, READ, __GFP_WAIT); + if (!rq) + return -ENODEV; rq-cmd_type = REQ_TYPE_SPECIAL; rq-special = func; diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c index 1119042..4a8fb03f 100644 --- a/drivers/block/pktcdvd.c +++ b/drivers/block/pktcdvd.c @@ -711,6 +711,8 @@ static int pkt_generic_packet(struct pktcdvd_device *pd, struct packet_command * rq = blk_get_request(q, (cgc-data_direction == CGC_DATA_WRITE) ? WRITE : READ, __GFP_WAIT); + if (!rq) + return -ENODEV; if (cgc-buflen) { if (blk_rq_map_kern(q, rq, cgc-buffer, cgc-buflen, __GFP_WAIT)) diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index c1b05a8..aa6b83d 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -1635,6 +1635,8 @@ static void scsi_eh_lock_door(struct scsi_device *sdev) * request becomes available */ req = blk_get_request(sdev-request_queue, READ, GFP_KERNEL); + if (!req) + return; req-cmd[0] = ALLOW_MEDIUM_REMOVAL; req-cmd[1] = 0; -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 2/2] block,scsi: convert and handle ERR_PTR from blk_get_request
From 5b26d593807b30f60ed41f6fd5a16a56c3c9a43c Mon Sep 17 00:00:00 2001 From: Joe Lawrence joe.lawre...@stratus.com Date: Fri, 24 May 2013 13:05:09 -0400 Subject: [PATCH v5 2/2] block,scsi: convert and handle ERR_PTR from blk_get_request The blk_get_request function may fail in low-memory conditions or during device removal (even if __GFP_WAIT is set). To distinguish between these errors, modify the blk_get_request call stack to return the appropriate ERR_PTR. Verify that all callers check the return status and consider IS_ERR instead of a simple NULL pointer check. Signed-off-by: Joe Lawrence joe.lawre...@stratus.com Cc: Jens Axboe ax...@kernel.dk Cc: Jiri Kosina jkos...@suse.cz Cc: James E.J. Bottomley jbottom...@parallels.com Cc: Bart Van Assche bvanass...@acm.org Cc: linux-scsi@vger.kernel.org --- block/blk-core.c| 34 ++--- block/bsg.c | 8 +++ block/scsi_ioctl.c | 12 +- drivers/block/paride/pd.c | 4 ++-- drivers/block/pktcdvd.c | 4 ++-- drivers/block/sx8.c | 2 +- drivers/cdrom/cdrom.c | 4 ++-- drivers/ide/ide-park.c | 2 +- drivers/scsi/device_handler/scsi_dh_alua.c | 2 +- drivers/scsi/device_handler/scsi_dh_emc.c | 2 +- drivers/scsi/device_handler/scsi_dh_hp_sw.c | 4 ++-- drivers/scsi/device_handler/scsi_dh_rdac.c | 2 +- drivers/scsi/osd/osd_initiator.c| 4 ++-- drivers/scsi/osst.c | 2 +- drivers/scsi/scsi_error.c | 2 +- drivers/scsi/scsi_lib.c | 2 +- drivers/scsi/scsi_tgt_lib.c | 2 +- drivers/scsi/sg.c | 4 ++-- drivers/scsi/st.c | 2 +- drivers/target/target_core_pscsi.c | 4 ++-- 20 files changed, 51 insertions(+), 51 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index f224d17..9e254e4 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -886,9 +886,9 @@ static struct io_context *rq_ioc(struct bio *bio) * Get a free request from @q. This function may fail under memory * pressure or if @q is dead. * - * Must be callled with @q-queue_lock held and, - * Returns %NULL on failure, with @q-queue_lock held. - * Returns !%NULL on success, with @q-queue_lock *not held*. + * Must be called with @q-queue_lock held and, + * Returns ERR_PTR on failure, with @q-queue_lock held. + * Returns request pointer on success, with @q-queue_lock *not held*. */ static struct request *__get_request(struct request_list *rl, int rw_flags, struct bio *bio, gfp_t gfp_mask) @@ -902,7 +902,7 @@ static struct request *__get_request(struct request_list *rl, int rw_flags, int may_queue; if (unlikely(blk_queue_dying(q))) - return NULL; + return ERR_PTR(-ENODEV); may_queue = elv_may_queue(q, rw_flags); if (may_queue == ELV_MQUEUE_NO) @@ -927,7 +927,7 @@ static struct request *__get_request(struct request_list *rl, int rw_flags, * process is not a batcher, and not * exempted by the IO scheduler */ - return NULL; + return ERR_PTR(-ENOMEM); } } } @@ -945,7 +945,7 @@ static struct request *__get_request(struct request_list *rl, int rw_flags, * allocated with any setting of -nr_requests */ if (rl-count[is_sync] = (3 * q-nr_requests / 2)) - return NULL; + return ERR_PTR(-ENOMEM); q-nr_rqs[is_sync]++; rl-count[is_sync]++; @@ -1050,7 +1050,7 @@ fail_alloc: rq_starved: if (unlikely(rl-count[is_sync] == 0)) rl-starved[is_sync] = 1; - return NULL; + return ERR_PTR(-ENOMEM); } /** @@ -1063,9 +1063,9 @@ rq_starved: * Get a free request from @q. If %__GFP_WAIT is set in @gfp_mask, this * function keeps retrying under memory pressure and fails iff @q is dead. * - * Must be callled with @q-queue_lock held and, - * Returns %NULL on failure, with @q-queue_lock held. - * Returns !%NULL on success, with @q-queue_lock *not held*. + * Must be called with @q-queue_lock held and, + * Returns ERR_PTR on failure, with @q-queue_lock held. + * Returns request pointer on success, with @q-queue_lock *not held*. */ static struct request *get_request(struct request_queue *q, int rw_flags, struct bio *bio, gfp_t gfp_mask) @@ -1078,12 +1078,12 @@ static struct request *get_request(struct request_queue *q, int rw_flags, rl = blk_get_rl(q, bio);/* transferred to @rq on success */ retry:
Re: [PATCH 1/4] scsi: move initialization of scmd-eh_entry
On Fri, 24 May 2013 11:50:47 +0200, Hannes Reinecke wrote: The 'eh_entry' list might be used even before scsi_softirq_done() is called. Hence we should rather initialize it together with the other eh-related variables. Signed-off-by: Hannes Reinecke h...@suse.de --- drivers/scsi/scsi_lib.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index c31187d..a5515ec 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -318,6 +318,8 @@ static void scsi_init_cmd_errh(struct scsi_cmnd *cmd) memset(cmd-sense_buffer, 0, SCSI_SENSE_BUFFERSIZE); if (cmd-cmd_len == 0) cmd-cmd_len = scsi_command_size(cmd-cmnd); + INIT_LIST_HEAD(cmd-eh_entry); + cmd-eh_eflags = 0; I always suspect something subtle going on when a variable is initialized to 0 three lines below a memset. Can line this go away? Jörn -- Those who come seeking peace without a treaty are plotting. -- Sun Tzu -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command
Il 24/05/2013 10:32, Paolo Bonzini ha scritto: Il 24/05/2013 10:03, James Bottomley ha scritto: Does anyone in the real world actually care about this bug? Yes, or I would move on and not waste so much time on this. Fine, so produce a simple fix for this bug which we can discuss that's not tied to this feature. Honestly, I have no idea how this is even possible. Really? It looks to me like a simple block on the commands for disk devices in the opcode switch would do it (with a corresponding change to sg.c:sg_allow_access). Which switch? What I can do is something like this in blk_verify_command: if (q-sgio_type == TYPE_ROM) return 0; if (rq-cmd[0] == 0xA4) return -EPERM; if (!is_write (req-cmd[0] == ... || rq-cmd[0] == ...)) return -EPERM; But then the particular patch you're replying to is still necessary, and you're slowing down blk_verify_command. It may be fine for stable and -rc, but IMHO it calls for a better implementation in 3.11. Ok, so I did a patch along these lines. And it's just as ugly as everything else that I've been posting in these threads. Yes, perhaps it has a redeeming grace in that it is fine for = 3.10, but that's it. Because actually I agree with you. The rework of the SG_IO command filter _is_ dubious to say the least; 300 lines of default, immutable policy don't belong in the kernel. So why am I posting this crap? Because I have to work around the nack of the generic sysfs bitmap patches, which would have beatifully solved all of this. In fact, you had proposed that approach. I posted it in September 2012. Then (as usual) silence for one month until it was quickly dismissed by Tejun. *You and Jens* failed to review patches, and this rathole is what that led to. It's unpleasant for me as it is for everyone else. Yes, you and Jens are busy, we all are. But *you and Jens* are the maintainers. Please make a decision instead of drawing straws, so that we can all go back to our regular business. Paolo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
On Fri, May 24, 2013 at 11:45:33AM +0200, Paolo Bonzini wrote: It's not just unimplemented commands. Exposing any new command exposes its borderline problems together with it. For commands that are used by Linux already, the right way to fix the problems is not obscuring the commands from userspace's view. You can hit the same problems with ioctls or even with normal operation of the device. The kernel is providing an isolation layer between the userland and the devices. It isn't obscuring. We can go through many adjectives but it's still increasing the amount exposed surface and accelerating expansion of cdb filter. And prohibiting the extension of whitelists is gonna increase the usage of unpriv_sgio and less-secure userspace whitelists. Anvil, meet hammer. Delegating full device access is still a fringe use case compared to generic block RW access. Given thta we're expecting to have an extra separation layer albeit in userland, the overall picture doesn't seem to favor extension of whitelists. If the bulk of filtering can be solved with userland whitelisting as a confined user, it should be possible to resolve peripheral problems like log messages in simpler way, no? Can you please elaborate on the log message problem? Who's spewing those messages? For example: if (bdev_write_same(bdev)) { unsigned char bdn[BDEVNAME_SIZE]; if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask, ZERO_PAGE(0))) return 0; bdevname(bdev, bdn); pr_err(%s: WRITE SAME failed. Manually zeroing.\n, bdn); } return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask); The device exposes the ability to zero out LUN blocks, but the command is not whitelisted and it fails. Which can be solved by userland filtering, right? -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Question: eh_abort_handler() and terminate commands
On 5/24/2013 5:57 AM, Hannes Reinecke wrote: Which leads to the interesting question: What happens with the actual command once eh_abort_handler returns? Well, eventually it ends up on the done_q and gets returned up the stack via flush_done_q(). But that wasn't what you were asking? As normally 'eh_abort_handler' is implemented as a TMF, one does assume that the command itself will be returned by the target with an appropriate status. Uh, well you don't get a proper SCSI status on a TMF or a ABTS/ABTX. So basically, the abort just kills processing of the commands. OTOH it also means that the HBA firmware might receive a completion for a command which the upper layer has already completed. Well, I think there is some rule here (scsi_eh.txt, everyone forgets about the command) that by the time the eh_abort_handler() completes you won't get any new scsi_done()s. This doesn't appear to mean that you won't get them while the abort_handler is running. Hence if you look at send_eh_cmnd() you see that the done completion being triggered at any time after the wait_for_completion_timeout() doesn't really do anything useful. The normal abort path completion doesn't appear to care either. Abort success/failure doesn't appear to fundamentally change the eventual return status of the commands. Will this completion ever being mirrored to the LLDD? Or discarded by the firmware? Yes, if for some reason a status comes in for an aborted exchange the HBA firmware rejects it because its against an invalid exchange (or should, the HBA i'm most familiar with does it this way). This is fairly easy to test if you have a jammer, just inject a FCP_RSP_IU into an aborted exchange. And how is one expected to handle the case where the TMF _failed_ on the target? Doesn't the current path eventually just end up doing the lun reset? Whats wrong with that, stop all the IO, let the existing commands complete or timeout then hit the device with the big hammer? If the lun reset succeeds you can pretty much feel safe that everything is aborted. That is assuming you get the correct return from the bus_device_reset(). It is potentially possible for the lun reset to be rejected, and in the case of some of the drivers return success anyway (consider lpfc_sli_issue_iocb_wait). I bet I could corrupt some disk data like that (format unit, abts reject, lun reset reject, continue operation with format unit still running on the target). I would rather prefer to have the LLDD terminate the command; this way we at least have a chance of getting a decent status back ... Well, you might be able to simplify a few things in scsi_* if eh_abort_handler() were more like the windows async cancel IO IRP and didn't block. It simply marks the IO as being canceled and then the completion eventually runs as normal within the devloss timeout. You probably could abort right out of a function in front of scsi_times_out() and avoid the whole error handling queues/blocking/task/etc. Then you use the abort accept/failure out of scsi_done to either queue the command into the current scsi_times_out logic, or you complete it with a timeout. Pretty clean, except for the fact your going to have to rewrite a lot of stuff in the LLDs to assure that they get the abort status returned within a reasonable amount of time. OTOH, the cancel IO model in windows is one of the things people writing IO drivers on that platform despise. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
Martin K. Petersen, on 05/22/2013 09:32 AM wrote: Paolo First of all, I'll note that SG_IO and block-device-specific Paolo ioctls both have their place. My usecase for SG_IO is Paolo virtualization, where I need to pass information from the LUN to Paolo the virtual machine with as much fidelity as possible if I choose Paolo to virtualize at the SCSI level. Now there's your problem! Several people told you way back that the SCSI virt approach was a really poor choice. The SG_IO permissions problem is a classic Doctor, it hurts when I do this. The kernel's fundamental task is to provide abstraction between applications and intricacies of hardware. The right way to solve the problem would have been to provide a better device abstraction built on top of the block/SCSI infrastructure we already have in place. If you need more fidelity, add fidelity to the block layer instead of punching a giant hole through it. I seem to recall that reservations were part of your motivation for going the SCSI route in the first place. A better approach would have been to create a generic reservations mechanism that could be exposed to the guest. And then let the baremetal kernel worry about the appropriate way to communicate with the physical hardware. Just like we've done with reads and writes, discard, write same, etc. Well, any abstraction is good only if it isn't artificial, so solving more problems than creating. Reality is that de facto in the industry _SCSI_ is the abstraction for block/direct access to data. Look around. How many of systems around you after all layers end up to SCSI commands in their storage devices? Linux block layer is purely artificial creature slowly reinventing wheel creating more problems, than solving. It enforces approach, where often impossible means impossible in this interface. For instance, how about copy offload? How about reservations? How about atomic writes? Look at history of barriers and compare then with what can be done in SCSI. It's still worse, because doesn't allow usage of all devices capabilities. Why was it needed to create special blk integrity interface with the only end user - SCSI? Artificial task created - then well solved. Etc, etc. The block layer keeps repeating SCSI. So, maybe, after all, it's better to acknowledge that direct usage of SCSI without any intermediate layers and translations is more productive? And for those minors not using SCSI internally, translate from SCSI to their internal commands? Creating and filling CDB fields for most cases isn't anyhow harder, than creating and feeling bio fields. So, I appreciate work Paolo is doing in this direction. At least, the right thing will be on the virtualization level. I do understand that with all existing baggage replacing block layer by SCSI isn't practical and not proposing it, but let's at least acknowledge limitations of the academic block abstraction. Let's don't make those limitations global walls. Many things better to do using direct SCSI, hence let's do the better way. Vlad -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command
On Fri, 2013-05-24 at 10:32 +0200, Paolo Bonzini wrote: Il 24/05/2013 10:03, James Bottomley ha scritto: Does anyone in the real world actually care about this bug? Yes, or I would move on and not waste so much time on this. Fine, so produce a simple fix for this bug which we can discuss that's not tied to this feature. Honestly, I have no idea how this is even possible. Really? It looks to me like a simple block on the commands for disk devices in the opcode switch would do it (with a corresponding change to sg.c:sg_allow_access). Which switch? What I can do is something like this in blk_verify_command: not in blk_verify_command: outside of it, in the three places it's used. if (q-sgio_type == TYPE_ROM) return 0; if (rq-cmd[0] == 0xA4) return -EPERM; if (!is_write (req-cmd[0] == ... || rq-cmd[0] == ...)) return -EPERM; But then the particular patch you're replying to is still necessary, and you're slowing down blk_verify_command. It's a set if if switches in non performance critical code. It may be fine for stable and -rc, but IMHO it calls for a better implementation in 3.11. What goes into stable should be what goes into the real kernel and it helps separate the bug fix from feature argument. James (Besides, I did it like this because it is what Tejun suggested). -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
On Fri, 2013-05-24 at 17:02 +0900, Tejun Heo wrote: On Fri, May 24, 2013 at 4:13 PM, Paolo Bonzini pbonz...@redhat.com wrote: The same filtering table being applied to different classes of hardware is a software bug, but my point is that the practive essentially entrusts non-insignificant part of security enforcement to the hardware itself. The variety of hardware in question is very wide and significant portion has historically been known to be flaky. Unproven theory, and contradicted by actual practice. Bugs are more common in the handling of borderline conditions, not in the handling of unimplemented commands. If you want to be secure aginst buggy firmware, the commands you have to block are READ and WRITE. Check out the list of existing USB quirks. Well, I'd actually much prefer disabling CDB whitelisting for all !MMC devices if at all possible. I'll go along with this. I'm also wondering what the problem would be if we just allowed all commands on either CAP_SYS_RAWIO or opening the device for write, so we just defer to the filesystem permissions and restricted read only opens to the basic all device opcodes. You're basically arguing that because what we have is already broken, it should be okay to break it further. Also, RW commands having more quirks doesn't necessarily indicate that they tend to be more broken. They just get hammered on a lot in various ways so problems on those commands tend to be more noticeable. I agree with this, so finding a way to get rid of the opcode table seems to be what we need. You need to allow more commands. The count-me-out knob allows all commands. You cannot always allow all commands. Ergo, you cannot always use the count-me-out knob. Do we have a real world example of this? Getting the kernel out of the command filtering business does seem to be a good idea to me. The thing is that both approaches aren't perfect here so you can make similar type of argument from the other side. If the system wants to give out raw hardware access to VMs, requiring it to delegate the device fully could be reasonable. Not ideal but widening direct command access by default is pretty nasty too. James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] scsi_transport_fc: FC timeout handler
This looks like a good start, but why would we make this FC specific? -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
On Fri, May 24, 2013 at 09:35:02PM -0700, James Bottomley wrote: I'll go along with this. I'm also wondering what the problem would be if we just allowed all commands on either CAP_SYS_RAWIO or opening the device for write, so we just defer to the filesystem permissions and restricted read only opens to the basic all device opcodes. I've been out of this area for a bit, but the problem used to be that you could send destructive commands to a partition. The right fix for that would be to not allow SG_IO on partitions at all, just wondering if anything would be broken by this. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html