Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-24 Thread Paolo Bonzini
Il 24/05/2013 03:44, Tejun Heo ha scritto:
 On Thu, May 23, 2013 at 11:47:25AM +0200, Paolo Bonzini wrote:
 No no, I'm not talking about it not working for the users - it's just
 passing the commands, it of course works.  I'm doubting about it being
 a worthy security isolation layer.  cdb filtering (of any form really)
 has always been something on the border.  It always was something we
 got stuck with due to lack of other immediate options.

 I understood correctly then. :)  I agree it's not the best, but it's not
 completely broken either.  It has bugs, and that's what these patches
 try to fix.
 
 The same filtering table being applied to different classes of
 hardware is a software bug, but my point is that the practive
 essentially entrusts non-insignificant part of security enforcement to
 the hardware itself.  The variety of hardware in question is very wide
 and significant portion has historically been known to be flaky.

Unproven theory, and contradicted by actual practice.  Bugs are more
common in the handling of borderline conditions, not in the handling of
unimplemented commands.

If you want to be secure aginst buggy firmware, the commands you have to
block are READ and WRITE.  Check out the list of existing USB quirks.

 I'm wondering whether combining 3 into 4 would be good enough.

 No, it wouldn't.  I learnt it the hard way (by having a patch nacked :))
 at http://thread.gmane.org/gmane.linux.kernel/1311887.
 
 Of course you can't do that by adding dangerous commands to the
 existing filtering table which is allowed by default.  I'm saying
 count me out knob could be good enough.  Neither is perfect but at
 least the latter doesn't affect the default cases.

You need to allow more commands.

The count-me-out knob allows all commands.

You cannot always allow all commands.

Ergo, you cannot always use the count-me-out knob.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command

2013-05-24 Thread James Bottomley
On Thu, 2013-05-23 at 15:58 +0200, Paolo Bonzini wrote:
 Adjust the blk_verify_command function to let it look at per-queue
 data.  This will be done in the next patch.

This is not a bug fix.  This is an enabler for your complex and to my
mind dubious rework of the SG_IO command filter.  I'm running out of
ways to say please don't mix bug fixes with features, because this
redesignating of the original patch set as part 1 and parts 2,3 doesn't
satisfy the requirement.

Does anyone in the real world actually care about this bug?  because if
not perhaps we can just remove the confusion and consider this as a
feature set.  If there's someone who actually cares, please lets just do
the bug fix first and argue about the feature later.

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command

2013-05-24 Thread Paolo Bonzini
Il 24/05/2013 09:36, James Bottomley ha scritto:
 On Thu, 2013-05-23 at 15:58 +0200, Paolo Bonzini wrote:
 Adjust the blk_verify_command function to let it look at per-queue
 data.  This will be done in the next patch.
 
 This is not a bug fix.  This is an enabler for your complex and to my
 mind dubious rework of the SG_IO command filter.  I'm running out of
 ways to say please don't mix bug fixes with features, because this
 redesignating of the original patch set as part 1 and parts 2,3 doesn't
 satisfy the requirement.

I made it part 1/2/3 because parts 2/3 depend on part 1.  It makes
dependency tracking easier, at least in my mind.

If you have another solution that does not require passing request_queue
to blk_verify_command, I'm all ears.

 Does anyone in the real world actually care about this bug?

Yes, or I would move on and not waste so much time on this.

Paolo

 because if
 not perhaps we can just remove the confusion and consider this as a
 feature set.  If there's someone who actually cares, please lets just do
 the bug fix first and argue about the feature later.
 
 James
 
 

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command

2013-05-24 Thread James Bottomley
On Fri, 2013-05-24 at 09:43 +0200, Paolo Bonzini wrote:
 Il 24/05/2013 09:36, James Bottomley ha scritto:
  On Thu, 2013-05-23 at 15:58 +0200, Paolo Bonzini wrote:
  Adjust the blk_verify_command function to let it look at per-queue
  data.  This will be done in the next patch.
  
  This is not a bug fix.  This is an enabler for your complex and to my
  mind dubious rework of the SG_IO command filter.  I'm running out of
  ways to say please don't mix bug fixes with features, because this
  redesignating of the original patch set as part 1 and parts 2,3 doesn't
  satisfy the requirement.
 
 I made it part 1/2/3 because parts 2/3 depend on part 1.  It makes
 dependency tracking easier, at least in my mind.
 
 If you have another solution that does not require passing request_queue
 to blk_verify_command, I'm all ears.

That's a circular response that doesn't answer the question.  The actual
question is: what is simple fix for the bug that isn't entangled with
enabling the SG_IO per device type whitelist feature.

  Does anyone in the real world actually care about this bug?
 
 Yes, or I would move on and not waste so much time on this.

Fine, so produce a simple fix for this bug which we can discuss that's
not tied to this feature.

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command

2013-05-24 Thread Paolo Bonzini
Il 24/05/2013 09:50, James Bottomley ha scritto:
 On Fri, 2013-05-24 at 09:43 +0200, Paolo Bonzini wrote:
 Il 24/05/2013 09:36, James Bottomley ha scritto:
 On Thu, 2013-05-23 at 15:58 +0200, Paolo Bonzini wrote:
 Adjust the blk_verify_command function to let it look at per-queue
 data.  This will be done in the next patch.

 This is not a bug fix.  This is an enabler for your complex and to my
 mind dubious rework of the SG_IO command filter.  I'm running out of
 ways to say please don't mix bug fixes with features, because this
 redesignating of the original patch set as part 1 and parts 2,3 doesn't
 satisfy the requirement.

 I made it part 1/2/3 because parts 2/3 depend on part 1.  It makes
 dependency tracking easier, at least in my mind.

 If you have another solution that does not require passing request_queue
 to blk_verify_command, I'm all ears.
 
 That's a circular response that doesn't answer the question.  The actual
 question is: what is simple fix for the bug that isn't entangled with
 enabling the SG_IO per device type whitelist feature.
 
 Does anyone in the real world actually care about this bug?

 Yes, or I would move on and not waste so much time on this.
 
 Fine, so produce a simple fix for this bug which we can discuss that's
 not tied to this feature.

Honestly, I have no idea how this is even possible.

Paolo

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-24 Thread Tejun Heo
On Fri, May 24, 2013 at 4:13 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 The same filtering table being applied to different classes of
 hardware is a software bug, but my point is that the practive
 essentially entrusts non-insignificant part of security enforcement to
 the hardware itself.  The variety of hardware in question is very wide
 and significant portion has historically been known to be flaky.

 Unproven theory, and contradicted by actual practice.  Bugs are more
 common in the handling of borderline conditions, not in the handling of
 unimplemented commands.

 If you want to be secure aginst buggy firmware, the commands you have to
 block are READ and WRITE.  Check out the list of existing USB quirks.

Well, I'd actually much prefer disabling CDB whitelisting for all !MMC
devices if at all possible. You're basically arguing that because what
we have is already broken, it should be okay to break it further.
Also, RW commands having more quirks doesn't necessarily indicate that
they tend to be more broken. They just get hammered on a lot in
various ways so problems on those commands tend to be more noticeable.

 You need to allow more commands.
 The count-me-out knob allows all commands.
 You cannot always allow all commands.
 Ergo, you cannot always use the count-me-out knob.

The thing is that both approaches aren't perfect here so you can make
similar type of argument from the other side. If the system wants to
give out raw hardware access to VMs, requiring it to delegate the
device fully could be reasonable. Not ideal but widening direct
command access by default is pretty nasty too.

--
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command

2013-05-24 Thread James Bottomley
On Fri, 2013-05-24 at 09:53 +0200, Paolo Bonzini wrote:
 Il 24/05/2013 09:50, James Bottomley ha scritto:
  On Fri, 2013-05-24 at 09:43 +0200, Paolo Bonzini wrote:
  Il 24/05/2013 09:36, James Bottomley ha scritto:
  On Thu, 2013-05-23 at 15:58 +0200, Paolo Bonzini wrote:
  Adjust the blk_verify_command function to let it look at per-queue
  data.  This will be done in the next patch.
 
  This is not a bug fix.  This is an enabler for your complex and to my
  mind dubious rework of the SG_IO command filter.  I'm running out of
  ways to say please don't mix bug fixes with features, because this
  redesignating of the original patch set as part 1 and parts 2,3 doesn't
  satisfy the requirement.
 
  I made it part 1/2/3 because parts 2/3 depend on part 1.  It makes
  dependency tracking easier, at least in my mind.
 
  If you have another solution that does not require passing request_queue
  to blk_verify_command, I'm all ears.
  
  That's a circular response that doesn't answer the question.  The actual
  question is: what is simple fix for the bug that isn't entangled with
  enabling the SG_IO per device type whitelist feature.
  
  Does anyone in the real world actually care about this bug?
 
  Yes, or I would move on and not waste so much time on this.
  
  Fine, so produce a simple fix for this bug which we can discuss that's
  not tied to this feature.
 
 Honestly, I have no idea how this is even possible.

Really?  It looks to me like a simple block on the commands for disk
devices in the opcode switch would do it (with a corresponding change to
sg.c:sg_allow_access).

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-24 Thread Paolo Bonzini
Il 24/05/2013 10:02, Tejun Heo ha scritto:
 On Fri, May 24, 2013 at 4:13 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 The same filtering table being applied to different classes of
 hardware is a software bug, but my point is that the practive
 essentially entrusts non-insignificant part of security enforcement to
 the hardware itself.  The variety of hardware in question is very wide
 and significant portion has historically been known to be flaky.

 Unproven theory, and contradicted by actual practice.  Bugs are more
 common in the handling of borderline conditions, not in the handling of
 unimplemented commands.

 If you want to be secure aginst buggy firmware, the commands you have to
 block are READ and WRITE.  Check out the list of existing USB quirks.
 
 Well, I'd actually much prefer disabling CDB whitelisting for all !MMC
 devices if at all possible. You're basically arguing that because what
 we have is already broken, it should be okay to break it further.
 Also, RW commands having more quirks doesn't necessarily indicate that
 they tend to be more broken. They just get hammered on a lot in
 various ways so problems on those commands tend to be more noticeable.

I agree intuition may not count, and it's perfectly possible that
firmware writers forgot a break; or put the wrong location in a jump
table, so that unimplemented commands give interesting results.

However, the _fact_ is that this might happen anyway with the buttload
of commands that are already enabled by the whitelist and that most
disks will never implement.

 You need to allow more commands.
 The count-me-out knob allows all commands.
 You cannot always allow all commands.
 Ergo, you cannot always use the count-me-out knob.
 
 The thing is that both approaches aren't perfect here so you can make
 similar type of argument from the other side. If the system wants to
 give out raw hardware access to VMs, requiring it to delegate the
 device fully could be reasonable.

No, it is not unfortunately.  Allowing to do discards is one thing,
allowing to disrupt the settings of a SAN is another.  You can only
delegate the device fully in these cases:

(a) of course, if the guest is trusted;

(b) if QEMU is running as a confined user, then you can get by with a
userspace whitelist.  (Otherwise you're just a ptrace away from
arbitrary access, as you pointed out).

Unfortunately, there are _real_ problems that this patches fix, such as
log spews due to blocked commands, and these happen even if neither of
the above conditions is true.

Is there anything else I can do?  Sure, I can check for the presence of
the whitelist and hack the VPD pages to hide features that I know the
whitelist will block.  Is it the right thing to do?  In my opinion no.
It makes no sense to have raw device access _disable_ features compared
to emulation.

 Not ideal but widening direct
 command access by default is pretty nasty too.

I actually agree, and that's why I'm trying to balance the widening and
restricting.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command

2013-05-24 Thread Paolo Bonzini
Il 24/05/2013 10:03, James Bottomley ha scritto:
   Does anyone in the real world actually care about this bug?
  
   Yes, or I would move on and not waste so much time on this.
   
   Fine, so produce a simple fix for this bug which we can discuss that's
   not tied to this feature.
  
  Honestly, I have no idea how this is even possible.
 Really?  It looks to me like a simple block on the commands for disk
 devices in the opcode switch would do it (with a corresponding change to
 sg.c:sg_allow_access).

Which switch?  What I can do is something like this in blk_verify_command:

if (q-sgio_type == TYPE_ROM)
return 0;
if (rq-cmd[0] == 0xA4)
return -EPERM;
if (!is_write 
(req-cmd[0] == ... || rq-cmd[0] == ...))
return -EPERM;

But then the particular patch you're replying to is still necessary,
and you're slowing down blk_verify_command.  It may be fine for stable
and -rc, but IMHO it calls for a better implementation in 3.11.

(Besides, I did it like this because it is what Tejun suggested).

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-24 Thread Tejun Heo
On Fri, May 24, 2013 at 5:31 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 I agree intuition may not count, and it's perfectly possible that
 firmware writers forgot a break; or put the wrong location in a jump
 table, so that unimplemented commands give interesting results.

It's not just unimplemented commands. Exposing any new command exposes
its borderline problems together with it.

 However, the _fact_ is that this might happen anyway with the buttload
 of commands that are already enabled by the whitelist and that most
 disks will never implement.

Yes and that's why the whitelist is generally frowned upon. It's
inherently fragile. These devices simply aren't designed and
implemented to be exposed to lesser security domains directly. It's
true that it's already kinda broken that way but as I wrote before
it's a vicious cycle and we don't wanna keep building on top of it.
This expansion is gonna increase the usage of whitelisting which will
in turn attract further use cases which are likely to call for even
more expansion.

 The thing is that both approaches aren't perfect here so you can make
 similar type of argument from the other side. If the system wants to
 give out raw hardware access to VMs, requiring it to delegate the
 device fully could be reasonable.

 No, it is not unfortunately.  Allowing to do discards is one thing,
 allowing to disrupt the settings of a SAN is another.  You can only
 delegate the device fully in these cases:

 (a) of course, if the guest is trusted;

 (b) if QEMU is running as a confined user, then you can get by with a
 userspace whitelist.  (Otherwise you're just a ptrace away from
 arbitrary access, as you pointed out).

 Unfortunately, there are _real_ problems that this patches fix, such as
 log spews due to blocked commands, and these happen even if neither of
 the above conditions is true.

 Is there anything else I can do?  Sure, I can check for the presence of
 the whitelist and hack the VPD pages to hide features that I know the
 whitelist will block.  Is it the right thing to do?  In my opinion no.
 It makes no sense to have raw device access _disable_ features compared
 to emulation.

If the bulk of filtering can be solved with userland whitelisting as a
confined user, it should be possible to resolve peripheral problems
like log messages in simpler way, no? Can you please elaborate on the
log message problem? Who's spewing those messages?

--
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: megaraid: check kzalloc

2013-05-24 Thread Libo Chen
On 2013/5/24 11:20, Santosh Y wrote:
 On Fri, May 24, 2013 at 7:52 AM, Libo Chen clbchenlibo.c...@huawei.com 
 wrote:

 we should check kzalloc, avoid to hit oops

 Signed-off-by: Libo Chen libo.c...@huawei.com
 ---
  drivers/scsi/megaraid.c |4 
  1 files changed, 4 insertions(+), 0 deletions(-)

 diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c
 index 846f475..195b095 100644
 --- a/drivers/scsi/megaraid.c
 +++ b/drivers/scsi/megaraid.c
 @@ -4162,6 +4162,10 @@ mega_internal_command(adapter_t *adapter, megacmd_t 
 *mc, mega_passthru *pthru)

 sdev = kzalloc(sizeof(struct scsi_device), GFP_KERNEL);
 scmd-device = sdev;
   ^^
 I guess assigning after the NULL check of 'sdev' is more appropriate.

Ah, there is a little different.  ok, I will update.


Thanks,

Libo

 
 +   if (!scmd-device) {
 +   scsi_free_command(GFP_KERNEL, scmd);
 +   return -ENOMEM;
 +   }

 memset(adapter-int_cdb, 0, sizeof(adapter-int_cdb));
 scmd-cmnd = adapter-int_cdb;
 --
 1.7.1

 --
 To unsubscribe from this list: send the line unsubscribe linux-scsi in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 
 


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RESEND] scsi: megaraid: check kzalloc

2013-05-24 Thread Libo Chen

we should check kzalloc, avoid to hit oops

Signed-off-by: Libo Chen libo.c...@huawei.com
---
 drivers/scsi/megaraid.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

instead of checking scmd-device, sdev is more appropriate.

diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c
index 846f475..6b623cb 100644
--- a/drivers/scsi/megaraid.c
+++ b/drivers/scsi/megaraid.c
@@ -4161,6 +4161,10 @@ mega_internal_command(adapter_t *adapter, megacmd_t *mc, 
mega_passthru *pthru)
memset(scb, 0, sizeof(scb_t));

sdev = kzalloc(sizeof(struct scsi_device), GFP_KERNEL);
+   if (sdev) {
+   scsi_free_command(GFP_KERNEL, scmd);
+   return -ENOMEM;
+   }
scmd-device = sdev;

memset(adapter-int_cdb, 0, sizeof(adapter-int_cdb));
-- 
1.7.1



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-24 Thread Paolo Bonzini
Il 24/05/2013 11:07, Tejun Heo ha scritto:
 On Fri, May 24, 2013 at 5:31 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 I agree intuition may not count, and it's perfectly possible that
 firmware writers forgot a break; or put the wrong location in a jump
 table, so that unimplemented commands give interesting results.
 
 It's not just unimplemented commands. Exposing any new command exposes
 its borderline problems together with it.

For commands that are used by Linux already, the right way to fix the
problems is not obscuring the commands from userspace's view.  You can
hit the same problems with ioctls or even with normal operation of the
device.

 However, the _fact_ is that this might happen anyway with the buttload
 of commands that are already enabled by the whitelist and that most
 disks will never implement.
 
 Yes and that's why the whitelist is generally frowned upon. It's
 inherently fragile. These devices simply aren't designed and
 implemented to be exposed to lesser security domains directly. It's
 true that it's already kinda broken that way but as I wrote before
 it's a vicious cycle and we don't wanna keep building on top of it.
 This expansion is gonna increase the usage of whitelisting which will
 in turn attract further use cases which are likely to call for even
 more expansion.

And prohibiting the extension of whitelists is gonna increase the
usage of unpriv_sgio and less-secure userspace whitelists.

Anvil, meet hammer.

 The thing is that both approaches aren't perfect here so you can make
 similar type of argument from the other side. If the system wants to
 give out raw hardware access to VMs, requiring it to delegate the
 device fully could be reasonable.

 No, it is not unfortunately.  Allowing to do discards is one thing,
 allowing to disrupt the settings of a SAN is another.  You can only
 delegate the device fully in these cases:

 (a) of course, if the guest is trusted;
 (b) if QEMU is running as a confined user;
 
 If the bulk of filtering can be solved with userland whitelisting as a
 confined user, it should be possible to resolve peripheral problems
 like log messages in simpler way, no? Can you please elaborate on the
 log message problem? Who's spewing those messages?

For example:

if (bdev_write_same(bdev)) {
unsigned char bdn[BDEVNAME_SIZE];

if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
 ZERO_PAGE(0)))
return 0;

bdevname(bdev, bdn);
pr_err(%s: WRITE SAME failed. Manually zeroing.\n, bdn);
}

return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);

The device exposes the ability to zero out LUN blocks, but the command is
not whitelisted and it fails.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] New FC timeout handler

2013-05-24 Thread Hannes Reinecke
Hi all,

this is the first step towards a new FC error handler.
This patch implements a new FC command timeout handler
which will be sending command aborts inline without
engaging SCSI EH.

In addition the commands will be returned directly
if the command abort succeeded, cutting down recovery
times dramatically.

With the original scsi error recovery I got:
# time dd if=/dev/zero of=/mnt/test.blk bs=512 count=2048 oflag=sync
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB) copied, 3.72732 s, 281 kB/s

real2m14.475s
user0m0.000s
sys 0m0.104s

with this patchset I got:
# time dd if=/dev/zero of=/mnt/test.blk bs=512 count=2048 oflag=sync
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB) copied, 79.6376 s, 13.2 kB/s

real1m19.642s
user0m0.000s
sys 0m0.100s

Test was to disable RSCN on the target port, disable the
target port, and then start the 'dd' command as indicated.

Comments etc are welcome.

Hannes Reinecke (4):
  scsi: move initialization of scmd-eh_entry
  blk-timeout: add BLK_EH_SCHEDULED return code
  scsi: export functions for new fc timeout handler
  scsi_transport_fc: FC timeout handler

 drivers/scsi/scsi_error.c|  8 -
 drivers/scsi/scsi_lib.c  |  6 ++--
 drivers/scsi/scsi_priv.h |  2 ++
 drivers/scsi/scsi_transport_fc.c | 63 +++-
 include/linux/blkdev.h   |  1 +
 include/scsi/scsi_transport_fc.h |  2 ++
 6 files changed, 78 insertions(+), 4 deletions(-)

-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] blk-timeout: add BLK_EH_SCHEDULED return code

2013-05-24 Thread Hannes Reinecke
Add a 'BLK_EH_SCHEDULED' return code for blk-timeout to indicate
that a delayed error recovery has been initiated.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/scsi_error.c | 3 +++
 include/linux/blkdev.h| 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c1b05a8..bef6799e 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -145,6 +145,9 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
rtn = host-hostt-eh_timed_out(scmd);
 
scmd-result |= DID_TIME_OUT  16;
+   /* Check for delayed EH scheduling */
+   if (rtn == BLK_EH_SCHEDULED)
+   return BLK_EH_NOT_HANDLED;
 
if (unlikely(rtn == BLK_EH_NOT_HANDLED 
 !scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD)))
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 78feda9..0a38211 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -238,6 +238,7 @@ enum blk_eh_timer_return {
BLK_EH_NOT_HANDLED,
BLK_EH_HANDLED,
BLK_EH_RESET_TIMER,
+   BLK_EH_SCHEDULED,
 };
 
 typedef enum blk_eh_timer_return (rq_timed_out_fn)(struct request *);
-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] scsi: move initialization of scmd-eh_entry

2013-05-24 Thread Hannes Reinecke
The 'eh_entry' list might be used even before scsi_softirq_done()
is called. Hence we should rather initialize it together with
the other eh-related variables.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/scsi_lib.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c31187d..a5515ec 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -318,6 +318,8 @@ static void scsi_init_cmd_errh(struct scsi_cmnd *cmd)
memset(cmd-sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
if (cmd-cmd_len == 0)
cmd-cmd_len = scsi_command_size(cmd-cmnd);
+   INIT_LIST_HEAD(cmd-eh_entry);
+   cmd-eh_eflags = 0;
 }
 
 void scsi_device_unbusy(struct scsi_device *sdev)
@@ -1483,8 +1485,6 @@ static void scsi_softirq_done(struct request *rq)
unsigned long wait_for = (cmd-allowed + 1) * rq-timeout;
int disposition;
 
-   INIT_LIST_HEAD(cmd-eh_entry);
-
atomic_inc(cmd-device-iodone_cnt);
if (cmd-result)
atomic_inc(cmd-device-ioerr_cnt);
-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] scsi_transport_fc: FC timeout handler

2013-05-24 Thread Hannes Reinecke
When a command runs into a timeout we need to send an 'ABORT TASK'
TMF. This is typically done by the 'eh_abort_handler' LLDD callback.

Conceptually, however, this function is a normal SCSI command, so
there is no need to enter the error handler.

This patch implements a new FC timeout handler which invokes
the 'eh_abort_handler' directly when the timeout occurs without
entering the error handler.

If the 'eh_abort_handler' returns SUCCESS or FAST_IO_FAIL the
command will be retried if possible. If no retries are allowed
the command will be returned immediately, as we have to assume
the TMF succeeded and the command is completed with the LLDD.
For any other return code from 'eh_abort_handler' the command
will be pushed onto the existing SCSI EH handler, or aborted
with an error if that fails.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/scsi_transport_fc.c | 63 +++-
 include/scsi/scsi_transport_fc.h |  2 ++
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index e106c27..370b263 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -2048,6 +2048,51 @@ static int fc_vport_match(struct attribute_container 
*cont,
return i-vport_attr_cont.ac == cont;
 }
 
+/**
+ * fc_rport_eh_handler - FC-specific error handler
+ * @work:  remote port on which the error occurred.
+ */
+static void
+fc_rport_eh_handler(struct work_struct *work)
+{
+   struct fc_rport *rport =
+   container_of(work, struct fc_rport, rport_eh_work);
+   struct Scsi_Host *shost = rport_to_shost(rport);
+   struct scsi_cmnd *scmd, *tmp;
+   unsigned long flags;
+   int rtn;
+
+   spin_lock_irqsave(shost-host_lock, flags);
+   list_for_each_entry_safe(scmd, tmp, rport-eh_work_q, eh_entry) {
+   list_del_init(scmd-eh_entry);
+   spin_unlock_irqrestore(shost-host_lock, flags);
+   dev_printk(KERN_WARNING, rport-dev,
+  trying to abort scmd %p, scmd);
+   rtn = scsi_try_to_abort_cmd(shost-hostt, scmd);
+   if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
+   if (!scsi_noretry_cmd(scmd) 
+   (++scmd-retries = scmd-allowed)) {
+   dev_printk(KERN_WARNING, rport-dev,
+  retry scmd %p, scmd);
+   scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
+   } else {
+   dev_printk(KERN_WARNING, rport-dev,
+  fast fail scmd %p, scmd);
+   scmd-result |= DID_TRANSPORT_FAILFAST  16;
+   scsi_finish_command(scmd);
+   }
+   } else {
+   if (!scsi_eh_scmd_add(scmd, 0)) {
+   dev_printk(KERN_WARNING, rport-dev,
+  terminate scmd %p, scmd);
+   scmd-result |= DID_TIME_OUT  16;
+   scsi_finish_command(scmd);
+   }
+   }
+   spin_lock_irqsave(shost-host_lock, flags);
+   }
+   spin_unlock_irqrestore(shost-host_lock, flags);
+}
 
 /**
  * fc_timed_out - FC Transport I/O timeout intercept handler
@@ -2075,11 +2120,25 @@ static enum blk_eh_timer_return
 fc_timed_out(struct scsi_cmnd *scmd)
 {
struct fc_rport *rport = starget_to_rport(scsi_target(scmd-device));
+   struct Scsi_Host *shost = rport_to_shost(rport);
+   unsigned long flags;
+   int kick_worker = 0;
 
if (rport-port_state == FC_PORTSTATE_BLOCKED)
return BLK_EH_RESET_TIMER;
 
-   return BLK_EH_NOT_HANDLED;
+   spin_lock_irqsave(shost-host_lock, flags);
+   if (list_empty(rport-eh_work_q))
+   kick_worker = 1;
+   list_add(scmd-eh_entry, rport-eh_work_q);
+   dev_printk(KERN_WARNING, rport-dev,
+  scmd %p added to eh queue\n, scmd);
+   spin_unlock_irqrestore(shost-host_lock, flags);
+
+   if (kick_worker)
+   fc_queue_work(shost, rport-rport_eh_work);
+
+   return BLK_EH_SCHEDULED;
 }
 
 /*
@@ -2630,11 +2689,13 @@ fc_rport_create(struct Scsi_Host *shost, int channel,
rport-channel = channel;
rport-fast_io_fail_tmo = -1;
 
+   INIT_LIST_HEAD(rport-eh_work_q);
INIT_DELAYED_WORK(rport-dev_loss_work, fc_timeout_deleted_rport);
INIT_DELAYED_WORK(rport-fail_io_work, fc_timeout_fail_rport_io);
INIT_WORK(rport-scan_work, fc_scsi_scan_rport);
INIT_WORK(rport-stgt_delete_work, fc_starget_delete);
INIT_WORK(rport-rport_delete_work, fc_rport_final_delete);
+   INIT_WORK(rport-rport_eh_work, fc_rport_eh_handler);
 

[PATCH 3/4] scsi: export functions for new fc timeout handler

2013-05-24 Thread Hannes Reinecke
Export some more functions which are used by the new FC timeout
handler.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/scsi_error.c | 5 -
 drivers/scsi/scsi_lib.c   | 2 ++
 drivers/scsi/scsi_priv.h  | 2 ++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index bef6799e..f21ad7c 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -119,6 +119,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
spin_unlock_irqrestore(shost-host_lock, flags);
return ret;
 }
+EXPORT_SYMBOL_GPL(scsi_eh_scmd_add);
 
 /**
  * scsi_times_out - Timeout function for normal scsi commands.
@@ -659,13 +660,14 @@ static int scsi_try_bus_device_reset(struct scsi_cmnd 
*scmd)
return rtn;
 }
 
-static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt, struct 
scsi_cmnd *scmd)
+int scsi_try_to_abort_cmd(struct scsi_host_template *hostt, struct scsi_cmnd 
*scmd)
 {
if (!hostt-eh_abort_handler)
return FAILED;
 
return hostt-eh_abort_handler(scmd);
 }
+EXPORT_SYMBOL_GPL(scsi_try_to_abort_cmd);
 
 static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
 {
@@ -1412,6 +1414,7 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd)
 
return 0;
 }
+EXPORT_SYMBOL_GPL(scsi_noretry_cmd);
 
 /**
  * scsi_decide_disposition - Disposition a cmd on return from LLD.
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a5515ec..7306aaa 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -210,6 +210,8 @@ void scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
 {
__scsi_queue_insert(cmd, reason, 1);
 }
+EXPORT_SYMBOL_GPL(scsi_queue_insert);
+
 /**
  * scsi_execute - insert request and wait for the result
  * @sdev:  scsi device
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 8f9a0ca..1d8d7d1 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -71,6 +71,8 @@ extern int scsi_error_handler(void *host);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
 extern void scsi_eh_wakeup(struct Scsi_Host *shost);
 extern int scsi_eh_scmd_add(struct scsi_cmnd *, int);
+extern int scsi_try_to_abort_cmd(struct scsi_host_template *hostt,
+struct scsi_cmnd *scmd);
 void scsi_eh_ready_devs(struct Scsi_Host *shost,
struct list_head *work_q,
struct list_head *done_q);
-- 
1.7.12.4

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Question: eh_abort_handler() and terminate commands

2013-05-24 Thread Hannes Reinecke
Hi all,

after having posted the first attempt for an updated FC error
handler I found that the 'eh_abort_handler()' semantics are a bit odd.

Obviously, the 'eh_abort_handler' is called to abort a command.
But what _exactly_ is supposed to happen if it returns 'SUCCESS'?
Initially one does expect that the command has been aborted.
But then the callback itself does _not_ terminate the command,
it's rather expected that the caller of 'eh_abort_command' does it.

Which leads to the interesting question:
What happens with the actual command once eh_abort_handler returns?

As normally 'eh_abort_handler' is implemented as a TMF, one does
assume that the command itself will be returned by the target with
an appropriate status.
However, as the upper layer is expected to terminate the command
itself, we will never see this status, right?
OTOH it also means that the HBA firmware might receive a completion
for a command which the upper layer has already completed.
Will this completion ever being mirrored to the LLDD? Or discarded
by the firmware?
And how is one expected to handle the case where the TMF _failed_
on the target?

I would rather prefer to have the LLDD terminate the command; this
way we at least have a chance of getting a decent status back ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/1] ipr: possible irq lock inversion dependency detected

2013-05-24 Thread wenxiong
When enable lockdep, seeing possible irq lock inversion dependency detected
error. This patch fixes the issue.

Signed-off-by: Wen Xiong wenxi...@linux.vnet.ibm.com
---
 drivers/scsi/ipr.c |   17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

Index: b/drivers/scsi/ipr.c
===
--- a/drivers/scsi/ipr.c2013-05-20 09:55:23.0 -0500
+++ b/drivers/scsi/ipr.c2013-05-20 10:00:34.548380858 -0500
@@ -9408,7 +9408,7 @@ static int ipr_probe_ioa(struct pci_dev 
void __iomem *ipr_regs;
int rc = PCIBIOS_SUCCESSFUL;
volatile u32 mask, uproc, interrupts;
-   unsigned long lock_flags;
+   unsigned long lock_flags, driver_lock_flags;
 
ENTER;
 
@@ -9631,9 +9631,9 @@ static int ipr_probe_ioa(struct pci_dev 
} else
ioa_cfg-reset = ipr_reset_start_bist;
 
-   spin_lock(ipr_driver_lock);
+   spin_lock_irqsave(ipr_driver_lock, driver_lock_flags);
list_add_tail(ioa_cfg-queue, ipr_ioa_head);
-   spin_unlock(ipr_driver_lock);
+   spin_unlock_irqrestore(ipr_driver_lock, driver_lock_flags);
 
LEAVE;
 out:
@@ -9716,6 +9716,7 @@ static void __ipr_remove(struct pci_dev 
unsigned long host_lock_flags = 0;
struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);
int i;
+   unsigned long driver_lock_flags;
ENTER;
 
spin_lock_irqsave(ioa_cfg-host-host_lock, host_lock_flags);
@@ -9739,9 +9740,9 @@ static void __ipr_remove(struct pci_dev 
INIT_LIST_HEAD(ioa_cfg-used_res_q);
spin_lock_irqsave(ioa_cfg-host-host_lock, host_lock_flags);
 
-   spin_lock(ipr_driver_lock);
+   spin_lock_irqsave(ipr_driver_lock, driver_lock_flags);
list_del(ioa_cfg-queue);
-   spin_unlock(ipr_driver_lock);
+   spin_unlock_irqrestore(ipr_driver_lock, driver_lock_flags);
 
if (ioa_cfg-sdt_state == ABORT_DUMP)
ioa_cfg-sdt_state = WAIT_FOR_DUMP;
@@ -10007,12 +10008,12 @@ static int ipr_halt(struct notifier_bloc
 {
struct ipr_cmnd *ipr_cmd;
struct ipr_ioa_cfg *ioa_cfg;
-   unsigned long flags = 0;
+   unsigned long flags = 0, driver_lock_flags;
 
if (event != SYS_RESTART  event != SYS_HALT  event != SYS_POWER_OFF)
return NOTIFY_DONE;
 
-   spin_lock(ipr_driver_lock);
+   spin_lock_irqsave(ipr_driver_lock, driver_lock_flags);
 
list_for_each_entry(ioa_cfg, ipr_ioa_head, queue) {
spin_lock_irqsave(ioa_cfg-host-host_lock, flags);
@@ -10030,7 +10031,7 @@ static int ipr_halt(struct notifier_bloc
ipr_do_req(ipr_cmd, ipr_halt_done, ipr_timeout, 
IPR_DEVICE_RESET_TIMEOUT);
spin_unlock_irqrestore(ioa_cfg-host-host_lock, flags);
}
-   spin_unlock(ipr_driver_lock);
+   spin_unlock_irqrestore(ipr_driver_lock, driver_lock_flags);
 
return NOTIFY_OK;
 }

-- 
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/1] ipr: possible irq lock inversion dependency detected

2013-05-24 Thread wenxiong
Hi All,

When enable lockdep, seeing possible irq lock inversion dependency detected
error. This patch fixed the issue.

Thanks,
Wendy
-- 
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 0/2] block,scsi: fixup blk_get_request dead queue scenarios

2013-05-24 Thread Joe Lawrence
[PATCH v5 0/2] block,scsi: fixup blk_get_request dead queue scenarios

Changes from v4:

 - As per James' suggestion, split into two patches: the first adds
   blk_get_request return value checking to avoid potential oops, the
   second converts callers and friends to handle ERR_PTR differentiation
   of -ENOMEM and -ENODEV.

Regards,

-- Joe
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 1/2] block,scsi: verify return pointer from blk_get_request

2013-05-24 Thread Joe Lawrence
From 22307be1bc6e404622b1f074094902e385a1bd30 Mon Sep 17 00:00:00 2001
From: Joe Lawrence joe.lawre...@stratus.com
Date: Fri, 24 May 2013 12:39:04 -0400
Subject: [PATCH v5 1/2] block,scsi: verify return pointer from blk_get_request

The blk-core dead queue checks introduced in commit 70460571 added an
error scenario to blk_get_request that returns NULL if the request queue
has been shutdown. This changed the behavior for __GFP_WAIT callers, who
should now verify the return value before dereferencing.

Signed-off-by: Joe Lawrence joe.lawre...@stratus.com
Cc: Jens Axboe ax...@kernel.dk
Cc: Jiri Kosina jkos...@suse.cz
Cc: James E.J. Bottomley jbottom...@parallels.com
Cc: Bart Van Assche bvanass...@acm.org
Cc: linux-scsi@vger.kernel.org
---
 block/scsi_ioctl.c| 9 -
 drivers/block/paride/pd.c | 2 ++
 drivers/block/pktcdvd.c   | 2 ++
 drivers/scsi/scsi_error.c | 2 ++
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 9a87daa..6c87d4e 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -458,6 +458,10 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk 
*disk, fmode_t mode,
}
 
rq = blk_get_request(q, in_len ? WRITE : READ, __GFP_WAIT);
+   if (!rq) {
+   err = -ENODEV;
+   goto error_free_buffer;
+   }
 
cmdlen = COMMAND_SIZE(opcode);
 
@@ -530,8 +534,9 @@ out:
}

 error:
-   kfree(buffer);
blk_put_request(rq);
+error_free_buffer:
+   kfree(buffer);
return err;
 }
 EXPORT_SYMBOL_GPL(sg_scsi_ioctl);
@@ -544,6 +549,8 @@ static int __blk_send_generic(struct request_queue *q, 
struct gendisk *bd_disk,
int err;
 
rq = blk_get_request(q, WRITE, __GFP_WAIT);
+   if (!rq)
+   return -ENODEV;
rq-cmd_type = REQ_TYPE_BLOCK_PC;
rq-timeout = BLK_DEFAULT_SG_TIMEOUT;
rq-cmd[0] = cmd;
diff --git a/drivers/block/paride/pd.c b/drivers/block/paride/pd.c
index 831e3ac..fc2ecff 100644
--- a/drivers/block/paride/pd.c
+++ b/drivers/block/paride/pd.c
@@ -722,6 +722,8 @@ static int pd_special_command(struct pd_unit *disk,
int err = 0;
 
rq = blk_get_request(disk-gd-queue, READ, __GFP_WAIT);
+   if (!rq)
+   return -ENODEV;
 
rq-cmd_type = REQ_TYPE_SPECIAL;
rq-special = func;
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 1119042..4a8fb03f 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -711,6 +711,8 @@ static int pkt_generic_packet(struct pktcdvd_device *pd, 
struct packet_command *
 
rq = blk_get_request(q, (cgc-data_direction == CGC_DATA_WRITE) ?
 WRITE : READ, __GFP_WAIT);
+   if (!rq)
+   return -ENODEV;
 
if (cgc-buflen) {
if (blk_rq_map_kern(q, rq, cgc-buffer, cgc-buflen, 
__GFP_WAIT))
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c1b05a8..aa6b83d 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1635,6 +1635,8 @@ static void scsi_eh_lock_door(struct scsi_device *sdev)
 * request becomes available
 */
req = blk_get_request(sdev-request_queue, READ, GFP_KERNEL);
+   if (!req)
+   return;
 
req-cmd[0] = ALLOW_MEDIUM_REMOVAL;
req-cmd[1] = 0;
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 2/2] block,scsi: convert and handle ERR_PTR from blk_get_request

2013-05-24 Thread Joe Lawrence
From 5b26d593807b30f60ed41f6fd5a16a56c3c9a43c Mon Sep 17 00:00:00 2001
From: Joe Lawrence joe.lawre...@stratus.com
Date: Fri, 24 May 2013 13:05:09 -0400
Subject: [PATCH v5 2/2] block,scsi: convert and handle ERR_PTR from 
blk_get_request

The blk_get_request function may fail in low-memory conditions or during
device removal (even if __GFP_WAIT is set). To distinguish between these
errors, modify the blk_get_request call stack to return the appropriate
ERR_PTR. Verify that all callers check the return status and consider
IS_ERR instead of a simple NULL pointer check.

Signed-off-by: Joe Lawrence joe.lawre...@stratus.com
Cc: Jens Axboe ax...@kernel.dk
Cc: Jiri Kosina jkos...@suse.cz
Cc: James E.J. Bottomley jbottom...@parallels.com
Cc: Bart Van Assche bvanass...@acm.org
Cc: linux-scsi@vger.kernel.org
---
 block/blk-core.c| 34 ++---
 block/bsg.c |  8 +++
 block/scsi_ioctl.c  | 12 +-
 drivers/block/paride/pd.c   |  4 ++--
 drivers/block/pktcdvd.c |  4 ++--
 drivers/block/sx8.c |  2 +-
 drivers/cdrom/cdrom.c   |  4 ++--
 drivers/ide/ide-park.c  |  2 +-
 drivers/scsi/device_handler/scsi_dh_alua.c  |  2 +-
 drivers/scsi/device_handler/scsi_dh_emc.c   |  2 +-
 drivers/scsi/device_handler/scsi_dh_hp_sw.c |  4 ++--
 drivers/scsi/device_handler/scsi_dh_rdac.c  |  2 +-
 drivers/scsi/osd/osd_initiator.c|  4 ++--
 drivers/scsi/osst.c |  2 +-
 drivers/scsi/scsi_error.c   |  2 +-
 drivers/scsi/scsi_lib.c |  2 +-
 drivers/scsi/scsi_tgt_lib.c |  2 +-
 drivers/scsi/sg.c   |  4 ++--
 drivers/scsi/st.c   |  2 +-
 drivers/target/target_core_pscsi.c  |  4 ++--
 20 files changed, 51 insertions(+), 51 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index f224d17..9e254e4 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -886,9 +886,9 @@ static struct io_context *rq_ioc(struct bio *bio)
  * Get a free request from @q.  This function may fail under memory
  * pressure or if @q is dead.
  *
- * Must be callled with @q-queue_lock held and,
- * Returns %NULL on failure, with @q-queue_lock held.
- * Returns !%NULL on success, with @q-queue_lock *not held*.
+ * Must be called with @q-queue_lock held and,
+ * Returns ERR_PTR on failure, with @q-queue_lock held.
+ * Returns request pointer on success, with @q-queue_lock *not held*.
  */
 static struct request *__get_request(struct request_list *rl, int rw_flags,
 struct bio *bio, gfp_t gfp_mask)
@@ -902,7 +902,7 @@ static struct request *__get_request(struct request_list 
*rl, int rw_flags,
int may_queue;
 
if (unlikely(blk_queue_dying(q)))
-   return NULL;
+   return ERR_PTR(-ENODEV);
 
may_queue = elv_may_queue(q, rw_flags);
if (may_queue == ELV_MQUEUE_NO)
@@ -927,7 +927,7 @@ static struct request *__get_request(struct request_list 
*rl, int rw_flags,
 * process is not a batcher, and not
 * exempted by the IO scheduler
 */
-   return NULL;
+   return ERR_PTR(-ENOMEM);
}
}
}
@@ -945,7 +945,7 @@ static struct request *__get_request(struct request_list 
*rl, int rw_flags,
 * allocated with any setting of -nr_requests
 */
if (rl-count[is_sync] = (3 * q-nr_requests / 2))
-   return NULL;
+   return ERR_PTR(-ENOMEM);
 
q-nr_rqs[is_sync]++;
rl-count[is_sync]++;
@@ -1050,7 +1050,7 @@ fail_alloc:
 rq_starved:
if (unlikely(rl-count[is_sync] == 0))
rl-starved[is_sync] = 1;
-   return NULL;
+   return ERR_PTR(-ENOMEM);
 }
 
 /**
@@ -1063,9 +1063,9 @@ rq_starved:
  * Get a free request from @q.  If %__GFP_WAIT is set in @gfp_mask, this
  * function keeps retrying under memory pressure and fails iff @q is dead.
  *
- * Must be callled with @q-queue_lock held and,
- * Returns %NULL on failure, with @q-queue_lock held.
- * Returns !%NULL on success, with @q-queue_lock *not held*.
+ * Must be called with @q-queue_lock held and,
+ * Returns ERR_PTR on failure, with @q-queue_lock held.
+ * Returns request pointer on success, with @q-queue_lock *not held*.
  */
 static struct request *get_request(struct request_queue *q, int rw_flags,
   struct bio *bio, gfp_t gfp_mask)
@@ -1078,12 +1078,12 @@ static struct request *get_request(struct request_queue 
*q, int rw_flags,
rl = blk_get_rl(q, bio);/* transferred to @rq on success */
 retry:

Re: [PATCH 1/4] scsi: move initialization of scmd-eh_entry

2013-05-24 Thread Jörn Engel
On Fri, 24 May 2013 11:50:47 +0200, Hannes Reinecke wrote:
 
 The 'eh_entry' list might be used even before scsi_softirq_done()
 is called. Hence we should rather initialize it together with
 the other eh-related variables.
 
 Signed-off-by: Hannes Reinecke h...@suse.de
 ---
  drivers/scsi/scsi_lib.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
 index c31187d..a5515ec 100644
 --- a/drivers/scsi/scsi_lib.c
 +++ b/drivers/scsi/scsi_lib.c
 @@ -318,6 +318,8 @@ static void scsi_init_cmd_errh(struct scsi_cmnd *cmd)
   memset(cmd-sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
   if (cmd-cmd_len == 0)
   cmd-cmd_len = scsi_command_size(cmd-cmnd);
 + INIT_LIST_HEAD(cmd-eh_entry);
 + cmd-eh_eflags = 0;

I always suspect something subtle going on when a variable is
initialized to 0 three lines below a memset.  Can line this go away?

Jörn

--
Those who come seeking peace without a treaty are plotting.
-- Sun Tzu
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command

2013-05-24 Thread Paolo Bonzini
Il 24/05/2013 10:32, Paolo Bonzini ha scritto:
 Il 24/05/2013 10:03, James Bottomley ha scritto:
 Does anyone in the real world actually care about this bug?

 Yes, or I would move on and not waste so much time on this.

 Fine, so produce a simple fix for this bug which we can discuss that's
 not tied to this feature.

 Honestly, I have no idea how this is even possible.
 Really?  It looks to me like a simple block on the commands for disk
 devices in the opcode switch would do it (with a corresponding change to
 sg.c:sg_allow_access).
 
 Which switch?  What I can do is something like this in blk_verify_command:
 
 if (q-sgio_type == TYPE_ROM)
   return 0;
   if (rq-cmd[0] == 0xA4)
   return -EPERM;
   if (!is_write 
   (req-cmd[0] == ... || rq-cmd[0] == ...))
   return -EPERM;
 
 But then the particular patch you're replying to is still necessary,
 and you're slowing down blk_verify_command.  It may be fine for stable
 and -rc, but IMHO it calls for a better implementation in 3.11.

Ok, so I did a patch along these lines.  And it's just as ugly as
everything else that I've been posting in these threads.  Yes, perhaps
it has a redeeming grace in that it is fine for = 3.10, but that's it.

Because actually I agree with you.  The rework of the SG_IO command
filter _is_ dubious to say the least; 300 lines of default, immutable
policy don't belong in the kernel.

So why am I posting this crap?  Because I have to work around the nack
of the generic sysfs bitmap patches, which would have beatifully solved
all of this.

In fact, you had proposed that approach.  I posted it in September 2012.
 Then (as usual) silence for one month until it was quickly dismissed by
Tejun.

*You and Jens* failed to review patches, and this rathole is what that
led to.  It's unpleasant for me as it is for everyone else.

Yes, you and Jens are busy, we all are.  But *you and Jens* are the
maintainers.  Please make a decision instead of drawing straws, so that
we can all go back to our regular business.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-24 Thread Tejun Heo
On Fri, May 24, 2013 at 11:45:33AM +0200, Paolo Bonzini wrote:
  It's not just unimplemented commands. Exposing any new command exposes
  its borderline problems together with it.
 
 For commands that are used by Linux already, the right way to fix the
 problems is not obscuring the commands from userspace's view.  You can
 hit the same problems with ioctls or even with normal operation of the
 device.

The kernel is providing an isolation layer between the userland and
the devices.  It isn't obscuring.  We can go through many adjectives
but it's still increasing the amount exposed surface and accelerating
expansion of cdb filter.

 And prohibiting the extension of whitelists is gonna increase the
 usage of unpriv_sgio and less-secure userspace whitelists.
 
 Anvil, meet hammer.

Delegating full device access is still a fringe use case compared to
generic block RW access.  Given thta we're expecting to have an extra
separation layer albeit in userland, the overall picture doesn't seem
to favor extension of whitelists.

  If the bulk of filtering can be solved with userland whitelisting as a
  confined user, it should be possible to resolve peripheral problems
  like log messages in simpler way, no? Can you please elaborate on the
  log message problem? Who's spewing those messages?
 
 For example:
 
 if (bdev_write_same(bdev)) {
 unsigned char bdn[BDEVNAME_SIZE];
 
 if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
  ZERO_PAGE(0)))
 return 0;
 
 bdevname(bdev, bdn);
 pr_err(%s: WRITE SAME failed. Manually zeroing.\n, bdn);
 }
 
 return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
 
 The device exposes the ability to zero out LUN blocks, but the command is
 not whitelisted and it fails.

Which can be solved by userland filtering, right?

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Question: eh_abort_handler() and terminate commands

2013-05-24 Thread Jeremy Linton
On 5/24/2013 5:57 AM, Hannes Reinecke wrote:
 Which leads to the interesting question: What happens with the actual
 command once eh_abort_handler returns?

Well, eventually it ends up on the done_q and gets returned up the 
stack via
flush_done_q(). But that wasn't what you were asking?

 
 As normally 'eh_abort_handler' is implemented as a TMF, one does assume
 that the command itself will be returned by the target with an appropriate
 status.
Uh, well you don't get a proper SCSI status on a TMF or a ABTS/ABTX. 
So
basically, the abort just kills processing of the commands.



 OTOH it also means that the HBA firmware might receive a completion for a
 command which the upper layer has already completed.
Well, I think there is some rule here (scsi_eh.txt, everyone forgets 
about
the command) that by the time the eh_abort_handler() completes you won't get
any new scsi_done()s. This doesn't appear to mean that you won't get them
while the abort_handler is running. Hence if you look at send_eh_cmnd() you
see that the done completion being triggered at any time after the
wait_for_completion_timeout() doesn't really do anything useful. The normal
abort path completion doesn't appear to care either. Abort success/failure
doesn't appear to fundamentally change the eventual return status of the
commands.


 Will this completion ever being mirrored to the LLDD? Or discarded by the
 firmware?

Yes, if for some reason a status comes in for an aborted exchange the 
HBA
firmware rejects it because its against an invalid exchange (or should, the
HBA i'm most familiar with does it this way). This is fairly easy to test if
you have a jammer, just inject a FCP_RSP_IU into an aborted exchange.


 And how is one expected to handle the case where the TMF _failed_ on the
 target?
Doesn't the current path eventually just end up doing the lun reset? 
Whats
wrong with that, stop all the IO, let the existing commands complete or
timeout then hit the device with the big hammer?

If the lun reset succeeds you can pretty much feel safe that everything 
is
aborted. That is assuming you get the correct return from the
bus_device_reset(). It is potentially possible for the lun reset to be
rejected, and in the case of some of the drivers return success anyway
(consider lpfc_sli_issue_iocb_wait). I bet I could corrupt some disk data like
that (format unit, abts reject, lun reset reject, continue operation with
format unit still running on the target).




 I would rather prefer to have the LLDD terminate the command; this way we
 at least have a chance of getting a decent status back ...

Well, you might be able to simplify a few things in scsi_* if
eh_abort_handler() were more like the windows async cancel IO IRP and didn't
block. It simply marks the IO as being canceled and then the completion
eventually runs as normal within the devloss timeout. You probably could abort
right out of a function in front of scsi_times_out() and avoid the whole error
handling queues/blocking/task/etc. Then you use the abort accept/failure out
of scsi_done to either queue the command into the current scsi_times_out
logic, or you complete it with a timeout.

Pretty clean, except for the fact your going to have to rewrite a lot of
stuff in the LLDs to assure that they get the abort status returned within a
reasonable amount of time. OTOH, the cancel IO model in windows is one of the
things people writing IO drivers on that platform despise.


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-24 Thread Vladislav Bolkhovitin
Martin K. Petersen, on 05/22/2013 09:32 AM wrote:
 Paolo First of all, I'll note that SG_IO and block-device-specific
 Paolo ioctls both have their place.  My usecase for SG_IO is
 Paolo virtualization, where I need to pass information from the LUN to
 Paolo the virtual machine with as much fidelity as possible if I choose
 Paolo to virtualize at the SCSI level.  
 
 Now there's your problem! Several people told you way back that the SCSI
 virt approach was a really poor choice. The SG_IO permissions problem is
 a classic Doctor, it hurts when I do this.
 
 The kernel's fundamental task is to provide abstraction between
 applications and intricacies of hardware. The right way to solve the
 problem would have been to provide a better device abstraction built on
 top of the block/SCSI infrastructure we already have in place. If you
 need more fidelity, add fidelity to the block layer instead of punching
 a giant hole through it.
 
 I seem to recall that reservations were part of your motivation for
 going the SCSI route in the first place. A better approach would have
 been to create a generic reservations mechanism that could be exposed to
 the guest. And then let the baremetal kernel worry about the appropriate
 way to communicate with the physical hardware. Just like we've done with
 reads and writes, discard, write same, etc.

Well, any abstraction is good only if it isn't artificial, so solving more 
problems
than creating.

Reality is that de facto in the industry _SCSI_ is the abstraction for 
block/direct
access to data. Look around. How many of systems around you after all layers 
end up to
SCSI commands in their storage devices?

Linux block layer is purely artificial creature slowly reinventing wheel 
creating more
problems, than solving. It enforces approach, where often impossible means
impossible in this interface. For instance, how about copy offload? How about
reservations? How about atomic writes? Look at history of barriers and compare 
then
with what can be done in SCSI. It's still worse, because doesn't allow usage of 
all
devices capabilities. Why was it needed to create special blk integrity 
interface with
the only end user - SCSI? Artificial task created - then well solved. Etc, etc.

The block layer keeps repeating SCSI. So, maybe, after all, it's better to 
acknowledge
that direct usage of SCSI without any intermediate layers and translations is 
more
productive? And for those minors not using SCSI internally, translate from SCSI 
to
their internal commands? Creating and filling CDB fields for most cases isn't 
anyhow
harder, than creating and feeling bio fields.

So, I appreciate work Paolo is doing in this direction. At least, the right 
thing will
be on the virtualization level.

I do understand that with all existing baggage replacing block layer by SCSI 
isn't
practical and not proposing it, but let's at least acknowledge limitations of 
the
academic block abstraction. Let's don't make those limitations global walls. 
Many
things better to do using direct SCSI, hence let's do the better way.

Vlad
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 part1 1/4] sg_io: pass request_queue to blk_verify_command

2013-05-24 Thread James Bottomley
On Fri, 2013-05-24 at 10:32 +0200, Paolo Bonzini wrote:
 Il 24/05/2013 10:03, James Bottomley ha scritto:
Does anyone in the real world actually care about this bug?
   
Yes, or I would move on and not waste so much time on this.

Fine, so produce a simple fix for this bug which we can discuss that's
not tied to this feature.
   
   Honestly, I have no idea how this is even possible.
  Really?  It looks to me like a simple block on the commands for disk
  devices in the opcode switch would do it (with a corresponding change to
  sg.c:sg_allow_access).
 
 Which switch?  What I can do is something like this in blk_verify_command:

not in blk_verify_command: outside of it, in the three places it's used.

 if (q-sgio_type == TYPE_ROM)
   return 0;
   if (rq-cmd[0] == 0xA4)
   return -EPERM;
   if (!is_write 
   (req-cmd[0] == ... || rq-cmd[0] == ...))
   return -EPERM;
 
 But then the particular patch you're replying to is still necessary,
 and you're slowing down blk_verify_command.

It's a set if if switches in non performance critical code.

   It may be fine for stable
 and -rc, but IMHO it calls for a better implementation in 3.11.

What goes into stable should be what goes into the real kernel and it
helps separate the bug fix from feature argument.

James

 (Besides, I did it like this because it is what Tejun suggested).


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-24 Thread James Bottomley
On Fri, 2013-05-24 at 17:02 +0900, Tejun Heo wrote:
 On Fri, May 24, 2013 at 4:13 PM, Paolo Bonzini pbonz...@redhat.com wrote:
  The same filtering table being applied to different classes of
  hardware is a software bug, but my point is that the practive
  essentially entrusts non-insignificant part of security enforcement to
  the hardware itself.  The variety of hardware in question is very wide
  and significant portion has historically been known to be flaky.
 
  Unproven theory, and contradicted by actual practice.  Bugs are more
  common in the handling of borderline conditions, not in the handling of
  unimplemented commands.
 
  If you want to be secure aginst buggy firmware, the commands you have to
  block are READ and WRITE.  Check out the list of existing USB quirks.
 
 Well, I'd actually much prefer disabling CDB whitelisting for all !MMC
 devices if at all possible.

I'll go along with this.  I'm also wondering what the problem would be
if we just allowed all commands on either CAP_SYS_RAWIO or opening the
device for write, so we just defer to the filesystem permissions and
restricted read only opens to the basic all device opcodes.

  You're basically arguing that because what
 we have is already broken, it should be okay to break it further.
 Also, RW commands having more quirks doesn't necessarily indicate that
 they tend to be more broken. They just get hammered on a lot in
 various ways so problems on those commands tend to be more noticeable.

I agree with this, so finding a way to get rid of the opcode table seems
to be what we need.

  You need to allow more commands.
  The count-me-out knob allows all commands.
  You cannot always allow all commands.
  Ergo, you cannot always use the count-me-out knob.

Do we have a real world example of this?  Getting the kernel out of the
command filtering business does seem to be a good idea to me.

 The thing is that both approaches aren't perfect here so you can make
 similar type of argument from the other side. If the system wants to
 give out raw hardware access to VMs, requiring it to delegate the
 device fully could be reasonable. Not ideal but widening direct
 command access by default is pretty nasty too.

James

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] scsi_transport_fc: FC timeout handler

2013-05-24 Thread Christoph Hellwig
This looks like a good start, but why would we make this FC specific?

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-24 Thread Christoph Hellwig
On Fri, May 24, 2013 at 09:35:02PM -0700, James Bottomley wrote:
 I'll go along with this.  I'm also wondering what the problem would be
 if we just allowed all commands on either CAP_SYS_RAWIO or opening the
 device for write, so we just defer to the filesystem permissions and
 restricted read only opens to the basic all device opcodes.

I've been out of this area for a bit, but the problem used to be that
you could send destructive commands to a partition.  The right fix
for that would be to not allow SG_IO on partitions at all, just
wondering if anything would be broken by this.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html