Re: Fail to probe qla2xxx fiber channel card while doing pci hotplug

2012-09-18 Thread Yijing Wang
On 2012/9/19 1:54, Bjorn Helgaas wrote:
 On Mon, Sep 17, 2012 at 6:06 AM, Yijing Wang wangyij...@huawei.com wrote:
 On 2012/9/16 11:30, Bjorn Helgaas wrote:
 On Sat, Sep 15, 2012 at 4:22 AM, Yijing Wang wangyij...@huawei.com wrote:
 Hi all,
I encountered a very strange problem when I hot plug a fiber channel 
 card(using qla2xxx driver).
 I did the hotplug in arch x86 machine, using pciehp driver for hotplug, 
 this platform supports pci hot-plug triggering from both
 sysfs and attention button. If a hot-plug slot is empty when system 
 boot-up, then hotplug FC card in this slot is ok.
 If a hot-plug slot has been embeded a FC card when system boot-up, 
 hot-remove this card is ok, but hot-add this card will fail.
 I used
 #modprobe qla2xxx ql2xextended_error_logging=0x7fff
 to get all probe info. As bellow:

 Can anyone give me any suggestion for this problem?

 It sounds like you did this:

   1) Power down system
   2) Remove FC card from slot
   3) Boot system
   4) Hot-add FC card
   5) Load qla2xxx driver
   6) qla2xxx driver claims FC card
   7) FC card works correctly

   8) Power down system
   9) Install FC card in slot
  10) Boot system
  11) Load qla2xxx driver
  12) qla2xxx driver claims FC card
  13) FC card works correctly
 I rmmod qla2xxx driver here and modprobe qla2xxx 
 ql2xextended_error_logging=0x1e40 again for get errors info
 Also I modprobe pciehp pciehp_debug=1 for getting debug info
  14) Hot-remove card
  15) Hot-add card
  16) qla2xxx driver claims FC card
  17) FC card does not work

 and I assume the dmesg log you included is just from steps 15 and 16
 (correct me if I'm wrong).

 It would be useful to see the entire log showing all these events so
 we can compare the working cases with the non-working one.  If you use
 the pciehp_debug module parameter, we should also see some pciehp
 events that would help me understand that driver.


 Hi Bjorn,
Thanks for your comments very much!

 My steps:
 1) power down system
 2) Install FC card in slot
 3) Boot system
 4) Load qla2xxx driver
 5) qla2xxx driver claims FC card
 6) FC card works correctly(at least probe return ok, I don't know qla2xxx 
 driver much..)
 7) rmmod qla2xxx
 8) modprobe qla2xxx ql2xextended_error_logging=0x1e40(for get errors 
 info)
 9) modprobe pciehp pciehp_debug=1
 10) Hot-remove card
 11) Hot-add card
 12) qla2xxx driver claims FC card fail(probe return fail, setup chip fail)
 --so this is failed situation--

 --continue to hot-add fc card into empty 
 slot(also support pci hp)
 13) Install FC card in empty slot
 14) Hot-add card
 15) qla2xxx driver claims FC card ok (probe return ok)

 btw:
 If fc card firmware version 4.03, everything is ok (hot-plug in any 
 slots(empty or not))
 fc card firmware version is 4.04 or 5.04 , situation as same as 1)---12)
 
 Thanks.  The FW change is a good clue.  If everything works with
 version 4.03, but it doesn't work with version 4.04, it's likely to be
 a FW problem, not a Linux PCI core problem.
 
 Here's what I see from your logs.  In slot 4 (bus 08), the card was
 present before boot, you removed it, re-added it, and it failed after
 being re-added.  Slot 3 (bus 06) was empty at boot, you hot-added a
 card, and it worked.  Here are the resources available on those two
 buses and the boot-time config of the first device in slot 4:
 
   pci :00:07.0: PCI bridge to [bus 06-07]
   pci :00:07.0:   bridge window [io  0xc000-0xcfff]
   pci :00:07.0:   bridge window [mem 0xf900-0xf9ff]
   pci :00:07.0:   bridge window [mem 0xf100-0xf1ff 64bit pref]
   pci :00:09.0: PCI bridge to [bus 08-09]
   pci :00:09.0:   bridge window [io  0xb000-0xbfff]
   pci :00:09.0:   bridge window [mem 0xf800-0xf8ff]
   pci :00:09.0:   bridge window [mem 0xf000-0xf0ff 64bit pref]
   pci :08:00.0: [1077:2532] type 00 class 0x0c0400
   pci :08:00.0: reg 10: [io  0xb100-0xb1ff]
   pci :08:00.0: reg 14: [mem 0xf8084000-0xf8087fff 64bit]
   pci :08:00.0: reg 30: [mem 0xf804-0xf807 pref]
 
 After you remove and re-add the card in slot 4, it starts with
 uninitialized BARs as expected, then we assign resources to it.  It's
 sort of interesting that the BIOS had originally put the ROM (reg 30)
 in the non-prefetchable window, while after the hot-add, Linux places
 it in the prefetchable window.  Either should work, and in fact the
 card you added in slot 3 *does* work with its ROM in the prefetchable
 window.
 
   pci :08:00.0: [1077:2532] type 00 class 0x0c0400
   pci :08:00.0: reg 10: [io  0x-0x00ff]
   pci :08:00.0: reg 14: [mem 0x-0x3fff 64bit]
   pci :08:00.0: reg 30: [mem 0x-0x0003 pref]
   pci :08:00.0: BAR 0: assigned [io  0xb000-0xb0ff]
   pci :08:00.0: BAR 1: assigned [mem 

Re: [PATCH] qla2xxx: Fix endianness of task management response code

2012-09-18 Thread Saurav Kashyap


From: Roland Dreier rol...@purestorage.com

The qla2xxx firmware actually expects the task management response
code in a CTIO IOCB with SCSI status mode 1 to be in little-endian
byte order, ie the response code should be the first byte in the
sense_data[] array.  The old code erroneously byte-swapped the
response code, which puts it in the wrong place on the wire and leads
to initiators thinking every task management request succeeds (since
they see 0 in the byte where they look for the response code).

Cc: Chad Dupuis chad.dup...@qlogic.com
Cc: Arun Easi arun.e...@qlogic.com
Signed-off-by: Roland Dreier rol...@purestorage.com
---
 drivers/scsi/qla2xxx/qla_target.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c
b/drivers/scsi/qla2xxx/qla_target.c
index 5b30132..41b74ba 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -1403,7 +1403,7 @@ static void qlt_24xx_send_task_mgmt_ctio(struct
scsi_qla_host *ha,
   ctio-u.status1.scsi_status =
   __constant_cpu_to_le16(SS_RESPONSE_INFO_LEN_VALID);
   ctio-u.status1.response_len = __constant_cpu_to_le16(8);
-  ((uint32_t *)ctio-u.status1.sense_data)[0] = cpu_to_be32(resp_code);
+  ctio-u.status1.sense_data[0] = resp_code;

   qla2x00_start_iocbs(ha, ha-req);
 }
--
1.7.10.4

Acked-by: Saurav Kashyap saurav.kash...@qlogic.com

Thanks,
~Saurav


This message and any attached documents contain information from QLogic 
Corporation or its wholly-owned subsidiaries that may be confidential. If you 
are not the intended recipient, you may not read, copy, distribute, or use this 
information. If you have received this transmission in error, please notify the 
sender immediately by reply e-mail and then delete this message.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/4] Support runtime power off of HDD

2012-09-18 Thread Aaron Lu
v2:
Introduced a new flag sync_before_stop as scsi_stop =
ata_flush_cache + ata_standby_immediate. This flag specifies
that when stop command is to send to this device, please issue
an additional synchronize cache command to it before the stop
command.
Put device into stop power condition on runtime suspend.

v1:
This patch set is baed on v7 ZPODD patches.

Aaron Lu (4):
  scsi: introduce sync_before_stop flag
  scsi: sd: enter stop power condition on runtime suspend
  scsi: sd: set ready_to_power_off for scsi disk
  libata: acpi: set can_power_off for both ODD and HDD

 drivers/ata/libata-acpi.c  | 25 +
 drivers/ata/libata-scsi.c  |  1 +
 drivers/scsi/sd.c  | 11 +--
 include/scsi/scsi_device.h |  1 +
 4 files changed, 28 insertions(+), 10 deletions(-)

-- 
1.7.12.21.g871e293

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/4] scsi: introduce sync_before_stop flag

2012-09-18 Thread Aaron Lu
When scsi device received stop command, it will take care of its
internal cache before enters stopped power condition. This command is
translated to standby immediate in libata-scsi, but standby doesn't
imply flush cache for ATA device, so to issue stop command to ATA
device, an additional flush cache has to be issued.

Introduce this flag so that when we are to stop the ATA disk in scsi
disk driver, also flush its internal cache.

Signed-off-by: Aaron Lu aaron...@intel.com
---
 include/scsi/scsi_device.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 4712aa1..26c3621 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -160,6 +160,7 @@ struct scsi_device {
unsigned ready_to_power_off:1;  /* Device is ready to be powered off */
unsigned powered_off:1; /* Device is powered off */
unsigned may_power_off:1; /* Power off is allowed by user */
+   unsigned sync_before_stop:1; /* Sync cache before stop */
 
DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events 
*/
struct list_head event_list;/* asserted events */
-- 
1.7.12.21.g871e293

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/4] scsi: sd: enter stop power condition on runtime suspend

2012-09-18 Thread Aaron Lu
On runtime suspend, put device into stop power condition.

For standard scsi device, no need to synchronize cache.
For ata devices, issue synchronize cache before enter standby state
as reflected by the sync_before_stop flag.

Signed-off-by: Aaron Lu aaron...@intel.com
---
 drivers/ata/libata-scsi.c |  1 +
 drivers/scsi/sd.c | 10 --
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 8ec81ca..9307f52 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1117,6 +1117,7 @@ static int ata_scsi_dev_config(struct scsi_device *sdev,
} else {
sdev-sector_size = ata_id_logical_sector_size(dev-id);
sdev-manage_start_stop = 1;
+   sdev-sync_before_stop = 1;
}
 
/*
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 4df73e5..7b0f07a 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2866,18 +2866,24 @@ static int sd_suspend(struct device *dev, pm_message_t 
mesg)
 {
struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
int ret = 0;
+   bool sync;
 
if (!sdkp)
return 0;   /* this can happen */
 
-   if (sdkp-WCE) {
+   if (!sdkp-device-manage_start_stop)
+   sync = !!sdkp-WCE;
+   else
+   sync = sdkp-device-sync_before_stop  !!sdkp-WCE;
+
+   if (sync) {
sd_printk(KERN_NOTICE, sdkp, Synchronizing SCSI cache\n);
ret = sd_sync_cache(sdkp);
if (ret)
goto done;
}
 
-   if ((mesg.event  PM_EVENT_SLEEP)  sdkp-device-manage_start_stop) {
+   if (sdkp-device-manage_start_stop) {
sd_printk(KERN_NOTICE, sdkp, Stopping disk\n);
ret = sd_start_stop_device(sdkp, 0);
}
-- 
1.7.12.21.g871e293

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 4/4] libata: acpi: set can_power_off for both ODD and HDD

2012-09-18 Thread Aaron Lu
Hard disk may also be runtime powered off, so set can_power_off flag
for it too if condition satisfies so that the may_power_off sysfs file
will be created for it to give user a chance to disable runtime power
off.

Signed-off-by: Aaron Lu aaron...@intel.com
Acked-by: Jeff Garzik jgar...@redhat.com
---
 drivers/ata/libata-acpi.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c
index 24347e0..443c3f2 100644
--- a/drivers/ata/libata-acpi.c
+++ b/drivers/ata/libata-acpi.c
@@ -1015,7 +1015,7 @@ static void ata_acpi_add_pm_notifier(struct ata_device 
*dev)
if (ACPI_FAILURE(status))
return;
 
-   if (dev-sdev-can_power_off) {
+   if (dev-class == ATA_DEV_ATAPI  dev-sdev-can_power_off) {
acpi_install_notify_handler(handle, ACPI_SYSTEM_NOTIFY,
ata_acpi_wake_dev, dev);
device_set_run_wake(dev-sdev-sdev_gendev, true);
@@ -1036,7 +1036,7 @@ static void ata_acpi_remove_pm_notifier(struct ata_device 
*dev)
if (ACPI_FAILURE(status))
return;
 
-   if (dev-sdev-can_power_off) {
+   if (dev-class == ATA_DEV_ATAPI  dev-sdev-can_power_off) {
device_set_run_wake(dev-sdev-sdev_gendev, false);
acpi_remove_notify_handler(handle, ACPI_SYSTEM_NOTIFY,
ata_acpi_wake_dev);
@@ -1140,14 +1140,23 @@ static int ata_acpi_bind_device(struct ata_port *ap, 
struct scsi_device *sdev,
 
/*
 * If firmware has _PS3 or _PR3 for this device,
-* and this ata ODD device support device attention,
-* it means this device can be powered off
+* it means this device can be powered off runtime
 */
states = acpi_dev-power.states;
-   if ((states[ACPI_STATE_D3_HOT].flags.valid ||
-   states[ACPI_STATE_D3_COLD].flags.explicit_set) 
-   ata_dev-flags  ATA_DFLAG_DA)
-   sdev-can_power_off = 1;
+   if (states[ACPI_STATE_D3_HOT].flags.valid ||
+   states[ACPI_STATE_D3_COLD].flags.explicit_set) {
+   /*
+* For ODD, it needs to support device attention or
+* it can't be powered up back by user
+*/
+   if (ata_dev-class == ATA_DEV_ATAPI 
+   ata_dev-flags  ATA_DFLAG_DA)
+   sdev-can_power_off = 1;
+
+   /* No requirement for hard disk */
+   if (ata_dev-class == ATA_DEV_ATA)
+   sdev-can_power_off = 1;
+   }
 
return 0;
 }
-- 
1.7.12.21.g871e293

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/4] scsi: sd: set ready_to_power_off for scsi disk

2012-09-18 Thread Aaron Lu
The ready_to_power_off flag is used to give indication to ATA layer
if this device's power can be removed after runtime suspended from the
perspective of scsi driver.

It is introduced to support zero power ODD. When ODD is runtime
suspended, it may not be OK to remove its power.

But for disk, there doesn't exist a scenario that when the disk is
runtime suspended, it can't be powered off.

Signed-off-by: Aaron Lu aaron...@intel.com
---
 drivers/scsi/sd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 7b0f07a..009d30c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2638,6 +2638,7 @@ static void sd_probe_async(void *data, async_cookie_t 
cookie)
 
sd_printk(KERN_NOTICE, sdkp, Attached SCSI %sdisk\n,
  sdp-removable ? removable  : );
+   sdp-ready_to_power_off = 1;
scsi_autopm_put_device(sdp);
put_device(sdkp-dev);
 }
-- 
1.7.12.21.g871e293

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/4] scsi: introduce sync_before_stop flag

2012-09-18 Thread Oliver Neukum
On Tuesday 18 September 2012 15:00:28 Aaron Lu wrote:
 When scsi device received stop command, it will take care of its
 internal cache before enters stopped power condition. This command is
 translated to standby immediate in libata-scsi, but standby doesn't
 imply flush cache for ATA device, so to issue stop command to ATA
 device, an additional flush cache has to be issued.

Why not just set WCE?

Regards
Oliver

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/4] scsi: introduce sync_before_stop flag

2012-09-18 Thread Aaron Lu
On Tue, Sep 18, 2012 at 09:30:11AM +0200, Oliver Neukum wrote:
 On Tuesday 18 September 2012 15:00:28 Aaron Lu wrote:
  When scsi device received stop command, it will take care of its
  internal cache before enters stopped power condition. This command is
  translated to standby immediate in libata-scsi, but standby doesn't
  imply flush cache for ATA device, so to issue stop command to ATA
  device, an additional flush cache has to be issued.
 
 Why not just set WCE?

This flag is used for devices whose WCE bit is set.

This flag means, when we are to issue a scsi stop command, we need to
issue an additional sync cache command first.

For scsi device, per the spec, there is no such need as the device will
take care of its internal cache when going to stop power condition.

But for ata device, the stop command is translated to standby immediate,
and we have to flush the internal cache before enter standby.

-Aaron
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/4] scsi: introduce sync_before_stop flag

2012-09-18 Thread James Bottomley
On Tue, 2012-09-18 at 15:00 +0800, Aaron Lu wrote:
 When scsi device received stop command, it will take care of its
 internal cache before enters stopped power condition. This command is
 translated to standby immediate in libata-scsi, but standby doesn't
 imply flush cache for ATA device, so to issue stop command to ATA
 device, an additional flush cache has to be issued.
 
 Introduce this flag so that when we are to stop the ATA disk in scsi
 disk driver, also flush its internal cache.
 
 Signed-off-by: Aaron Lu aaron...@intel.com
 ---
  include/scsi/scsi_device.h | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
 index 4712aa1..26c3621 100644
 --- a/include/scsi/scsi_device.h
 +++ b/include/scsi/scsi_device.h
 @@ -160,6 +160,7 @@ struct scsi_device {
   unsigned ready_to_power_off:1;  /* Device is ready to be powered off */
   unsigned powered_off:1; /* Device is powered off */
   unsigned may_power_off:1; /* Power off is allowed by user */
 + unsigned sync_before_stop:1; /* Sync cache before stop */

Why do you need this?

Surely it's all conditioned on the WCE flag.  If WCE isn't set, the
cache is write through (or uncached) and there's no need for a sync
before power down.

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/4] scsi: introduce sync_before_stop flag

2012-09-18 Thread Aaron Lu
On Tue, Sep 18, 2012 at 08:56:55AM +0100, James Bottomley wrote:
 On Tue, 2012-09-18 at 15:00 +0800, Aaron Lu wrote:
  When scsi device received stop command, it will take care of its
  internal cache before enters stopped power condition. This command is
  translated to standby immediate in libata-scsi, but standby doesn't
  imply flush cache for ATA device, so to issue stop command to ATA
  device, an additional flush cache has to be issued.
  
  Introduce this flag so that when we are to stop the ATA disk in scsi
  disk driver, also flush its internal cache.
  
  Signed-off-by: Aaron Lu aaron...@intel.com
  ---
   include/scsi/scsi_device.h | 1 +
   1 file changed, 1 insertion(+)
  
  diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
  index 4712aa1..26c3621 100644
  --- a/include/scsi/scsi_device.h
  +++ b/include/scsi/scsi_device.h
  @@ -160,6 +160,7 @@ struct scsi_device {
  unsigned ready_to_power_off:1;  /* Device is ready to be powered off */
  unsigned powered_off:1; /* Device is powered off */
  unsigned may_power_off:1; /* Power off is allowed by user */
  +   unsigned sync_before_stop:1; /* Sync cache before stop */
 
 Why do you need this?
 
 Surely it's all conditioned on the WCE flag.  If WCE isn't set, the
 cache is write through (or uncached) and there's no need for a sync
 before power down.

The set of this flag doesn't mean we will sync cache for sure.

It's only meaningful when WCE is set, and in that case, it means when we
are to send a stop command to the device, do we need to send an
additional flush cache command first?

In sd_suspend, the cache will be synchronized when:
1 For devices do not support start_stop, always;
2 For devices support start_stop, if it is standard scsi device, never;
  and if it is an ata device(reflected by this newly introduced flag),
  always.

The reason for this is, the scsi's stop command = flush_cache + standby
for ata, so I have to introduce this flag to explicitely do a cache
flush.

Makes sense?

-Aaron
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/4] scsi: introduce sync_before_stop flag

2012-09-18 Thread James Bottomley
On Tue, 2012-09-18 at 16:09 +0800, Aaron Lu wrote:
 On Tue, Sep 18, 2012 at 08:56:55AM +0100, James Bottomley wrote:
  On Tue, 2012-09-18 at 15:00 +0800, Aaron Lu wrote:
   When scsi device received stop command, it will take care of its
   internal cache before enters stopped power condition. This command is
   translated to standby immediate in libata-scsi, but standby doesn't
   imply flush cache for ATA device, so to issue stop command to ATA
   device, an additional flush cache has to be issued.
   
   Introduce this flag so that when we are to stop the ATA disk in scsi
   disk driver, also flush its internal cache.
   
   Signed-off-by: Aaron Lu aaron...@intel.com
   ---
include/scsi/scsi_device.h | 1 +
1 file changed, 1 insertion(+)
   
   diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
   index 4712aa1..26c3621 100644
   --- a/include/scsi/scsi_device.h
   +++ b/include/scsi/scsi_device.h
   @@ -160,6 +160,7 @@ struct scsi_device {
 unsigned ready_to_power_off:1;  /* Device is ready to be powered off */
 unsigned powered_off:1; /* Device is powered off */
 unsigned may_power_off:1; /* Power off is allowed by user */
   + unsigned sync_before_stop:1; /* Sync cache before stop */
  
  Why do you need this?
  
  Surely it's all conditioned on the WCE flag.  If WCE isn't set, the
  cache is write through (or uncached) and there's no need for a sync
  before power down.
 
 The set of this flag doesn't mean we will sync cache for sure.
 
 It's only meaningful when WCE is set, and in that case, it means when we
 are to send a stop command to the device, do we need to send an
 additional flush cache command first?

Then surely it indicates support for ACPI power down and it's wrongly
named?

 In sd_suspend, the cache will be synchronized when:
 1 For devices do not support start_stop, always;
 2 For devices support start_stop, if it is standard scsi device, never;
   and if it is an ata device(reflected by this newly introduced flag),
   always.

This doesn't look right to me.  I think it's probably just a layering
violation.  For sd, we need to use the standard SCSI commands.  That
means we treat power states as the SCSI standard says.  The fact that
ATA devices may be required to translate START_STOP_UNIT with STANDBY as
a flush followed by one of the ATA standby commands.  This is very
important: if we construct a libata SATL that doesn't conform to the
standards, things will eventually explode when we try to interact with
devices with their own internal SATL (like the LSI card, or various USB
devices) because eventually we'll make one unexpected interaction too
many.

 The reason for this is, the scsi's stop command = flush_cache +
 standby
 for ata, so I have to introduce this flag to explicitely do a cache
 flush.
 
 Makes sense?

Really, no: your flag equates to this is an ATA device, treat it
differently that's knowledge that's not supposed to be in sd unless
absolutely necessary.

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/4] scsi: introduce sync_before_stop flag

2012-09-18 Thread Aaron Lu
On Tue, Sep 18, 2012 at 09:20:20AM +0100, James Bottomley wrote:
 On Tue, 2012-09-18 at 16:09 +0800, Aaron Lu wrote:
  On Tue, Sep 18, 2012 at 08:56:55AM +0100, James Bottomley wrote:
   On Tue, 2012-09-18 at 15:00 +0800, Aaron Lu wrote:
When scsi device received stop command, it will take care of its
internal cache before enters stopped power condition. This command is
translated to standby immediate in libata-scsi, but standby doesn't
imply flush cache for ATA device, so to issue stop command to ATA
device, an additional flush cache has to be issued.

Introduce this flag so that when we are to stop the ATA disk in scsi
disk driver, also flush its internal cache.

Signed-off-by: Aaron Lu aaron...@intel.com
---
 include/scsi/scsi_device.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 4712aa1..26c3621 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -160,6 +160,7 @@ struct scsi_device {
unsigned ready_to_power_off:1;  /* Device is ready to be 
powered off */
unsigned powered_off:1; /* Device is powered off */
unsigned may_power_off:1; /* Power off is allowed by user */
+   unsigned sync_before_stop:1; /* Sync cache before stop */
   
   Why do you need this?
   
   Surely it's all conditioned on the WCE flag.  If WCE isn't set, the
   cache is write through (or uncached) and there's no need for a sync
   before power down.
  
  The set of this flag doesn't mean we will sync cache for sure.
  
  It's only meaningful when WCE is set, and in that case, it means when we
  are to send a stop command to the device, do we need to send an
  additional flush cache command first?
 
 Then surely it indicates support for ACPI power down and it's wrongly
 named?

It's generic ATA requirement that before standby, cache has to be
flushed and unlike scsi, standby does not imply cache flush.

 
  In sd_suspend, the cache will be synchronized when:
  1 For devices do not support start_stop, always;
  2 For devices support start_stop, if it is standard scsi device, never;
and if it is an ata device(reflected by this newly introduced flag),
always.
 
 This doesn't look right to me.  I think it's probably just a layering
 violation.  For sd, we need to use the standard SCSI commands.  That
 means we treat power states as the SCSI standard says.  The fact that
 ATA devices may be required to translate START_STOP_UNIT with STANDBY as
 a flush followed by one of the ATA standby commands.  This is very
 important: if we construct a libata SATL that doesn't conform to the
 standards, things will eventually explode when we try to interact with
 devices with their own internal SATL (like the LSI card, or various USB
 devices) because eventually we'll make one unexpected interaction too
 many.

I agree that it is better handled in libata's SALT, I tried to do this
but didn't find a good way so I introduced this flag. The SALT is 1-1
mapping, I'm not sure how to handle this 1-2 mapping.

I'll check this again to see how to do it there.

Thanks for your comments.

-Aaron
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 22/24] scsi: eesox: use __iomem pointers for MMIO

2012-09-18 Thread Arnd Bergmann
On Monday 17 September 2012, Russell King - ARM Linux wrote:
 In both of my replies, I've said as x86 does.  We need to follow
 x86's behaviour here, which is as I've quoted - it's not a matter
 that I need to make up my mind - my mind is already well made up.
 That is, we need to follow x86 here.
 
 That is, const volatile void __iomem * for reads, and volatile void
 __iomem * for writes.

Ok, I'll keep the patch in the series then, as it only changes the
pointer that we do an MMIO write on, not the ones for MMIO read.

Arnd

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [V4 PATCH 0/8] csiostor: Chelsio FCoE offload driver submission

2012-09-18 Thread James Bottomley
On Tue, 2012-09-18 at 09:54 +0530, Naresh Kumar Inna wrote:
 Hi James,
 
 Could you please consider merging version V4 of the driver patches, if
 you think they are in good shape now?

Well, definitely not yet; you seem to have missed the memo on readq:

  CC [M]  drivers/scsi/cxgbi/cxgb4i/cxgb4i.o
drivers/scsi/csiostor/csio_hw.c: In function ‘csio_hw_mc_read’:
drivers/scsi/csiostor/csio_hw.c:194:3: error: implicit declaration of
function ‘readq’ [-Werror=implicit-function-declaration]

It's only defined on platforms which can support an atomic 64 bit
operation (which is mostly not any 32 bit platforms) ... so this could
do with compile testing on those.

To see how to handle readq/writeq in the 32 bit case, see the uses in
fnic or qla2xxx

You also have a couple of unnecessary version.h includes.

Since you're a new driver, could you not do a correctly unlocked
queuecommand routine?  You'll find the way you've currently got it coded
(holding the host lock for the entire queuecommand routine) is very
performance detrimental.

You have a lot of locking statements which aren't easy to audit by hand
because there are multiple unlocks.  Things like this:

csio_scan_finished(struct Scsi_Host *shost, unsigned long time)
{
struct csio_lnode *ln = shost_priv(shost);
int rv = 0;

spin_lock_irq(shost-host_lock);
if (!ln-hwp || csio_list_deleted(ln-sm.sm_list)) {
spin_unlock_irq(shost-host_lock);
return 1;
}

rv = csio_scan_done(ln, jiffies, time, csio_max_scan_tmo * HZ,
csio_delta_scan_tmo * HZ);

spin_unlock_irq(shost-host_lock);

return rv;
}

Are better coded as

csio_scan_finished(struct Scsi_Host *shost, unsigned long time)
{
struct csio_lnode *ln = shost_priv(shost);
int rv = 1;

spin_lock_irq(shost-host_lock);
if (!ln-hwp || csio_list_deleted(ln-sm.sm_list))
goto out;

rv = csio_scan_done(ln, jiffies, time, csio_max_scan_tmo * HZ,
csio_delta_scan_tmo * HZ);

 out:
spin_unlock_irq(shost-host_lock);

return rv;
}

It's shorter and the unlock clearly matches the lock.  You could even
invert the if logic and just make the csio_scan_done() conditional on it
avoiding the goto.

I'd also really like the people who understand FC to take a look over
this as well.

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel crash with unsupported DIF protection type

2012-09-18 Thread Hannes Reinecke
On 09/18/2012 11:30 AM, Douglas Gilbert wrote:
 On 12-09-18 11:04 AM, Hannes Reinecke wrote:
 Hi all,

 I recently got my hands on some weird drives, insisting on having
 been formatted with protection type 7:

 # sg_readcap --16 /dev/sdb
 Read Capacity results:
 Protection: prot_en=1, p_type=6, p_i_exponent=0
 Logical block provisioning: lbpme=0, lbprz=0
 Last logical block address=1758174767 (0x68cb9e2f), Number of
 logical blocks=1758174768
 Logical block length=512 bytes
 Logical blocks per physical block exponent=0
 Lowest aligned logical block address=0
 Hence:
 Device size: 900185481216 bytes, 858483.8 MiB, 900.19 GB

 (I know. I've already complained.)
 However, this drive causes a horrible crash:

 sd 0:0:1:0: [sdb] formatted with unsupported protection type 7.
 Disabling disk!
 sd 0:0:1:0: [sdb] 1758174768 512-byte logical blocks: (900
 GB/838 GiB)
 sd 0:0:1:0: [sdb]  Result: hostbyte=DID_OK
 driverbyte=DRIVER_SENSE
 sd 0:0:1:0: [sdb]  Sense Key : Medium Error [current]
 sd 0:0:1:0: [sdb]  Add. Sense: Medium format corrupted
 sd 0:0:1:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
 end_request: I/O error, dev sdb, sector 0
 Buffer I/O error on device sdb, logical block 0

 [ tons and tons of I/O errors ]

 [   15.551689] sd 0:0:1:0: [sdb] Enabling DIX T10-DIF-TYPE1-CRC
 protection
 [   15.561353] [ cut here ]
 [   15.569416] kernel BUG at
 /usr/src/linux-3.0/drivers/scsi/scsi_lib.c:1069!

 There are several odd things happening here:
 - It says 'Disabling disk', which _should_ have set the
capacity to '0':

 drivers/scsi/sd.c:sd_read_protection_type()
 if (type  SD_DIF_TYPE3_PROTECTION) {
 sd_printk(KERN_ERR, sdkp, formatted with unsupported \
   protection type %u. Disabling disk!\n, type);
 sdkp-capacity = 0;
 return;
 }

 - it enables type 1 protection, which it evidently is not.

 I've attached a tentative patch, which allows the system to boot.
 However, I'm not completely happy with that, as the capacity is
 _still_ updated after revalidation:

 sd 0:0:1:0: [sdb] formatted with unsupported protection type 7.
 Disabling disk!
 sd 0:0:1:0: [sdb] 0 512-byte logical blocks: (0 B/0 B)
 sd 0:0:1:0: [sdb] Write Protect is off
 sd 0:0:1:0: [sdb] Mode Sense: cf 00 10 08
 sd 0:0:1:0: [sdb] Write cache: disabled, read cache: enabled,
 supports DPO and FUA
 sd 0:0:1:0: [sdb] 1758174768 512-byte logical blocks: (900 GB/838
 GiB)
 sd 0:0:1:0: [sdb] Attached SCSI disk

 Thoughts?
 
 The Medium format corrupted additional sense qualifier occurs
 (with Seagate disks) when a FORMAT UNIT command is interrupted.
 So maybe, for good measure, that disk also sets the DIF
 protection type to an unsupported value (i.e. 7). So re-doing a
 FORMAT  UNIT may clear that state.
 
Yeah, that's what I'm doing now.

Incidentally:

sg_format manpage says:

   Format with type 1 protection:

  sg_format --format --fmtpinfo=3 --pfu /dev/sdm

but the '--pfu' option requires an argument.
According to sbc3r23 the command should read:

sg_format --format --fmtpinfo=3 --pfu=1 /dev/sdm

Correct?

 Obviously the kernel should not crash when faced with such
 a disk.
 
Precisely.
Especially the 'Disabling disk' no-op is worrying.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel crash with unsupported DIF protection type

2012-09-18 Thread Douglas Gilbert

On 12-09-18 11:35 AM, Hannes Reinecke wrote:

On 09/18/2012 11:30 AM, Douglas Gilbert wrote:

On 12-09-18 11:04 AM, Hannes Reinecke wrote:

Hi all,

I recently got my hands on some weird drives, insisting on having
been formatted with protection type 7:

# sg_readcap --16 /dev/sdb
Read Capacity results:
 Protection: prot_en=1, p_type=6, p_i_exponent=0
 Logical block provisioning: lbpme=0, lbprz=0
 Last logical block address=1758174767 (0x68cb9e2f), Number of
logical blocks=1758174768
 Logical block length=512 bytes
 Logical blocks per physical block exponent=0
 Lowest aligned logical block address=0
Hence:
 Device size: 900185481216 bytes, 858483.8 MiB, 900.19 GB

(I know. I've already complained.)
However, this drive causes a horrible crash:

sd 0:0:1:0: [sdb] formatted with unsupported protection type 7.
Disabling disk!
sd 0:0:1:0: [sdb] 1758174768 512-byte logical blocks: (900
GB/838 GiB)
sd 0:0:1:0: [sdb]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
sd 0:0:1:0: [sdb]  Sense Key : Medium Error [current]
sd 0:0:1:0: [sdb]  Add. Sense: Medium format corrupted
sd 0:0:1:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb, logical block 0

[ tons and tons of I/O errors ]

[   15.551689] sd 0:0:1:0: [sdb] Enabling DIX T10-DIF-TYPE1-CRC
protection
[   15.561353] [ cut here ]
[   15.569416] kernel BUG at
/usr/src/linux-3.0/drivers/scsi/scsi_lib.c:1069!

There are several odd things happening here:
- It says 'Disabling disk', which _should_ have set the
capacity to '0':

drivers/scsi/sd.c:sd_read_protection_type()
 if (type  SD_DIF_TYPE3_PROTECTION) {
 sd_printk(KERN_ERR, sdkp, formatted with unsupported \
   protection type %u. Disabling disk!\n, type);
 sdkp-capacity = 0;
 return;
 }

- it enables type 1 protection, which it evidently is not.

I've attached a tentative patch, which allows the system to boot.
However, I'm not completely happy with that, as the capacity is
_still_ updated after revalidation:

sd 0:0:1:0: [sdb] formatted with unsupported protection type 7.
Disabling disk!
sd 0:0:1:0: [sdb] 0 512-byte logical blocks: (0 B/0 B)
sd 0:0:1:0: [sdb] Write Protect is off
sd 0:0:1:0: [sdb] Mode Sense: cf 00 10 08
sd 0:0:1:0: [sdb] Write cache: disabled, read cache: enabled,
supports DPO and FUA
sd 0:0:1:0: [sdb] 1758174768 512-byte logical blocks: (900 GB/838
GiB)
sd 0:0:1:0: [sdb] Attached SCSI disk

Thoughts?


The Medium format corrupted additional sense qualifier occurs
(with Seagate disks) when a FORMAT UNIT command is interrupted.
So maybe, for good measure, that disk also sets the DIF
protection type to an unsupported value (i.e. 7). So re-doing a
FORMAT  UNIT may clear that state.


Yeah, that's what I'm doing now.

Incidentally:

sg_format manpage says:

Format with type 1 protection:

   sg_format --format --fmtpinfo=3 --pfu /dev/sdm

but the '--pfu' option requires an argument.
According to sbc3r23 the command should read:

sg_format --format --fmtpinfo=3 --pfu=1 /dev/sdm

Correct?


Yes. And it is corrected in the 1.34 betas that I have
sent you :-)


I placed a copy of my latest sg3_utils-1.34 beta
in the News section at the top of this page:
  http://sg.danny.cz/sg

Doug Gilbert


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/8] block: Consolidate command flag and queue limit checks for merges

2012-09-18 Thread Martin K. Petersen
From: Martin K. Petersen martin.peter...@oracle.com

 - blk_check_merge_flags() verifies that cmd_flags / bi_rw are
   compatible. This function is called for both req-req and req-bio
   merging.

 - blk_rq_get_max_sectors() and blk_queue_get_max_sectors() can be used
   to query the maximum sector count for a given request or queue. The
   calls will return the right value from the queue limits given the
   type of command (RW, discard, write same, etc.)

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
---
 block/blk-core.c   |3 +--
 block/blk-merge.c  |   30 --
 include/linux/blkdev.h |   31 +++
 3 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 5cc2929..33eded0 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1866,8 +1866,7 @@ int blk_rq_check_limits(struct request_queue *q, struct 
request *rq)
if (!rq_mergeable(rq))
return 0;
 
-   if (blk_rq_sectors(rq)  queue_max_sectors(q) ||
-   blk_rq_bytes(rq)  queue_max_hw_sectors(q)  9) {
+   if (blk_rq_sectors(rq)  blk_queue_get_max_sectors(q, rq-cmd_flags)) {
printk(KERN_ERR %s: over max size limit.\n, __func__);
return -EIO;
}
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 86710ca..642b862 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -275,14 +275,8 @@ no_merge:
 int ll_back_merge_fn(struct request_queue *q, struct request *req,
 struct bio *bio)
 {
-   unsigned short max_sectors;
-
-   if (unlikely(req-cmd_type == REQ_TYPE_BLOCK_PC))
-   max_sectors = queue_max_hw_sectors(q);
-   else
-   max_sectors = queue_max_sectors(q);
-
-   if (blk_rq_sectors(req) + bio_sectors(bio)  max_sectors) {
+   if (blk_rq_sectors(req) + bio_sectors(bio) 
+   blk_rq_get_max_sectors(req)) {
req-cmd_flags |= REQ_NOMERGE;
if (req == q-last_merge)
q-last_merge = NULL;
@@ -299,15 +293,8 @@ int ll_back_merge_fn(struct request_queue *q, struct 
request *req,
 int ll_front_merge_fn(struct request_queue *q, struct request *req,
  struct bio *bio)
 {
-   unsigned short max_sectors;
-
-   if (unlikely(req-cmd_type == REQ_TYPE_BLOCK_PC))
-   max_sectors = queue_max_hw_sectors(q);
-   else
-   max_sectors = queue_max_sectors(q);
-
-
-   if (blk_rq_sectors(req) + bio_sectors(bio)  max_sectors) {
+   if (blk_rq_sectors(req) + bio_sectors(bio) 
+   blk_rq_get_max_sectors(req)) {
req-cmd_flags |= REQ_NOMERGE;
if (req == q-last_merge)
q-last_merge = NULL;
@@ -338,7 +325,8 @@ static int ll_merge_requests_fn(struct request_queue *q, 
struct request *req,
/*
 * Will it become too large?
 */
-   if ((blk_rq_sectors(req) + blk_rq_sectors(next))  queue_max_sectors(q))
+   if ((blk_rq_sectors(req) + blk_rq_sectors(next)) 
+   blk_rq_get_max_sectors(req))
return 0;
 
total_phys_segments = req-nr_phys_segments + next-nr_phys_segments;
@@ -417,6 +405,9 @@ static int attempt_merge(struct request_queue *q, struct 
request *req,
if (!rq_mergeable(req) || !rq_mergeable(next))
return 0;
 
+   if (!blk_check_merge_flags(req-cmd_flags, next-cmd_flags))
+   return 0;
+
/*
 * not contiguous
 */
@@ -512,6 +503,9 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
if (!rq_mergeable(rq) || !bio_mergeable(bio))
return false;
 
+   if (!blk_check_merge_flags(rq-cmd_flags, bio-bi_rw))
+   return false;
+
/* different data direction or already started, don't merge */
if (bio_data_dir(bio) != rq_data_dir(rq))
return false;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3a6fea7..90f7abe 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -605,6 +605,18 @@ static inline bool rq_mergeable(struct request *rq)
return true;
 }
 
+static inline bool blk_check_merge_flags(unsigned int flags1,
+unsigned int flags2)
+{
+   if ((flags1  REQ_DISCARD) != (flags2  REQ_DISCARD))
+   return false;
+
+   if ((flags1  REQ_SECURE) != (flags2  REQ_SECURE))
+   return false;
+
+   return true;
+}
+
 /*
  * q-prep_rq_fn return values
  */
@@ -800,6 +812,25 @@ static inline unsigned int blk_rq_cur_sectors(const struct 
request *rq)
return blk_rq_cur_bytes(rq)  9;
 }
 
+static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
+unsigned int cmd_flags)
+{
+   if (unlikely(cmd_flags  REQ_DISCARD))
+   return 

[PATCH 4/8] block: Make blkdev_issue_zeroout use WRITE SAME

2012-09-18 Thread Martin K. Petersen
From: Martin K. Petersen martin.peter...@oracle.com

If the device supports WRITE SAME, use that to optimize zeroing of
blocks. If the device does not support WRITE SAME or if the operation
fails, fall back to writing zeroes the old-fashioned way.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
Acked-by: Mike Snitzer snit...@redhat.com
---
 block/blk-lib.c |   30 +-
 1 files changed, 29 insertions(+), 1 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index a062543..9373b58 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -214,7 +214,7 @@ EXPORT_SYMBOL(blkdev_issue_write_same);
  *  Generate and issue number of bios with zerofiled pages.
  */
 
-int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
+int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask)
 {
int ret;
@@ -264,4 +264,32 @@ int blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
 
return ret;
 }
+
+/**
+ * blkdev_issue_zeroout - zero-fill a block range
+ * @bdev:  blockdev to write
+ * @sector:start sector
+ * @nr_sects:  number of sectors to write
+ * @gfp_mask:  memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *  Generate and issue number of bios with zerofiled pages.
+ */
+
+int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
+sector_t nr_sects, gfp_t gfp_mask)
+{
+   if (bdev_write_same(bdev)) {
+   unsigned char bdn[BDEVNAME_SIZE];
+
+   if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
+ZERO_PAGE(0)))
+   return 0;
+
+   bdevname(bdev, bdn);
+   pr_err(%s: WRITE SAME failed. Manually zeroing.\n, bdn);
+   }
+
+   return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
+}
 EXPORT_SYMBOL(blkdev_issue_zeroout);
-- 
1.7.7.6

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/8] block: Clean up special command handling logic

2012-09-18 Thread Martin K. Petersen
From: Martin K. Petersen martin.peter...@oracle.com

Remove special-casing of non-rw fs style requests (discard). The nomerge
flags are consolidated in blk_types.h, and rq_mergeable() and
bio_mergeable() have been modified to use them.

bio_is_rw() is used in place of bio_has_data() a few places. This is
done to to distinguish true reads and writes from other fs type requests
that carry a payload (e.g. write same).

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
Acked-by: Mike Snitzer snit...@redhat.com
---
 block/blk-core.c  |   13 ++---
 block/blk-merge.c |   22 +-
 block/blk.h   |5 ++---
 block/elevator.c  |6 ++
 include/linux/bio.h   |   23 +--
 include/linux/blk_types.h |4 
 include/linux/blkdev.h|   22 ++
 7 files changed, 46 insertions(+), 49 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 2d739ca..5cc2929 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1657,8 +1657,8 @@ generic_make_request_checks(struct bio *bio)
goto end_io;
}
 
-   if (unlikely(!(bio-bi_rw  REQ_DISCARD) 
-nr_sectors  queue_max_hw_sectors(q))) {
+   if (likely(bio_is_rw(bio) 
+  nr_sectors  queue_max_hw_sectors(q))) {
printk(KERN_ERR bio too big device %s (%u  %u)\n,
   bdevname(bio-bi_bdev, b),
   bio_sectors(bio),
@@ -1699,8 +1699,7 @@ generic_make_request_checks(struct bio *bio)
 
if ((bio-bi_rw  REQ_DISCARD) 
(!blk_queue_discard(q) ||
-((bio-bi_rw  REQ_SECURE) 
- !blk_queue_secdiscard(q {
+((bio-bi_rw  REQ_SECURE)  !blk_queue_secdiscard(q {
err = -EOPNOTSUPP;
goto end_io;
}
@@ -1818,7 +1817,7 @@ void submit_bio(int rw, struct bio *bio)
 * If it's a regular read/write or a barrier with data attached,
 * go through the normal accounting stuff before submission.
 */
-   if (bio_has_data(bio)  !(rw  REQ_DISCARD)) {
+   if (bio_has_data(bio)) {
if (rw  WRITE) {
count_vm_events(PGPGOUT, count);
} else {
@@ -1864,7 +1863,7 @@ EXPORT_SYMBOL(submit_bio);
  */
 int blk_rq_check_limits(struct request_queue *q, struct request *rq)
 {
-   if (rq-cmd_flags  REQ_DISCARD)
+   if (!rq_mergeable(rq))
return 0;
 
if (blk_rq_sectors(rq)  queue_max_sectors(q) ||
@@ -2338,7 +2337,7 @@ bool blk_update_request(struct request *req, int error, 
unsigned int nr_bytes)
req-buffer = bio_data(req-bio);
 
/* update sector only for requests with clear definition of sector */
-   if (req-cmd_type == REQ_TYPE_FS || (req-cmd_flags  REQ_DISCARD))
+   if (req-cmd_type == REQ_TYPE_FS)
req-__sector += total_bytes  9;
 
/* mixed attributes always follow the first bio */
diff --git a/block/blk-merge.c b/block/blk-merge.c
index e76279e..86710ca 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -418,18 +418,6 @@ static int attempt_merge(struct request_queue *q, struct 
request *req,
return 0;
 
/*
-* Don't merge file system requests and discard requests
-*/
-   if ((req-cmd_flags  REQ_DISCARD) != (next-cmd_flags  REQ_DISCARD))
-   return 0;
-
-   /*
-* Don't merge discard requests and secure discard requests
-*/
-   if ((req-cmd_flags  REQ_SECURE) != (next-cmd_flags  REQ_SECURE))
-   return 0;
-
-   /*
 * not contiguous
 */
if (blk_rq_pos(req) + blk_rq_sectors(req) != blk_rq_pos(next))
@@ -521,15 +509,7 @@ int blk_attempt_req_merge(struct request_queue *q, struct 
request *rq,
 
 bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
 {
-   if (!rq_mergeable(rq))
-   return false;
-
-   /* don't merge file system requests and discard requests */
-   if ((bio-bi_rw  REQ_DISCARD) != (rq-bio-bi_rw  REQ_DISCARD))
-   return false;
-
-   /* don't merge discard requests and secure discard requests */
-   if ((bio-bi_rw  REQ_SECURE) != (rq-bio-bi_rw  REQ_SECURE))
+   if (!rq_mergeable(rq) || !bio_mergeable(bio))
return false;
 
/* different data direction or already started, don't merge */
diff --git a/block/blk.h b/block/blk.h
index 2a0ea32..ca51543 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -171,14 +171,13 @@ static inline int queue_congestion_off_threshold(struct 
request_queue *q)
  *
  * a) it's attached to a gendisk, and
  * b) the queue had IO stats enabled when this request was started, and
- * c) it's a file system request or a discard request
+ * c) it's a file system request
  */
 static inline int blk_do_io_stat(struct request *rq)
 {
return rq-rq_disk 
   

[PATCH 5/8] block: ioctl to zero block ranges

2012-09-18 Thread Martin K. Petersen
From: Martin K. Petersen martin.peter...@oracle.com

Introduce a BLKZEROOUT ioctl which can be used to clear block ranges by
way of blkdev_issue_zeroout().

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
Acked-by: Mike Snitzer snit...@redhat.com
---
 block/ioctl.c  |   27 +++
 include/linux/fs.h |1 +
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index 4476e0e8..769d296 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -185,6 +185,22 @@ static int blk_ioctl_discard(struct block_device *bdev, 
uint64_t start,
return blkdev_issue_discard(bdev, start, len, GFP_KERNEL, flags);
 }
 
+static int blk_ioctl_zeroout(struct block_device *bdev, uint64_t start,
+uint64_t len)
+{
+   if (start  511)
+   return -EINVAL;
+   if (len  511)
+   return -EINVAL;
+   start = 9;
+   len = 9;
+
+   if (start + len  (i_size_read(bdev-bd_inode)  9))
+   return -EINVAL;
+
+   return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL);
+}
+
 static int put_ushort(unsigned long arg, unsigned short val)
 {
return put_user(val, (unsigned short __user *)arg);
@@ -300,6 +316,17 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, 
unsigned cmd,
return blk_ioctl_discard(bdev, range[0], range[1],
 cmd == BLKSECDISCARD);
}
+   case BLKZEROOUT: {
+   uint64_t range[2];
+
+   if (!(mode  FMODE_WRITE))
+   return -EBADF;
+
+   if (copy_from_user(range, (void __user *)arg, sizeof(range)))
+   return -EFAULT;
+
+   return blk_ioctl_zeroout(bdev, range[0], range[1]);
+   }
 
case HDIO_GETGEO: {
struct hd_geometry geo;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index aa11047..bd6f6e7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -335,6 +335,7 @@ struct inodes_stat_t {
 #define BLKDISCARDZEROES _IO(0x12,124)
 #define BLKSECDISCARD _IO(0x12,125)
 #define BLKROTATIONAL _IO(0x12,126)
+#define BLKZEROOUT _IO(0x12,127)
 
 #define BMAP_IOCTL 1   /* obsolete - kept for compatibility */
 #define FIBMAP_IO(0x00,1)  /* bmap access */
-- 
1.7.7.6

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/8] scsi: Add a report opcode helper

2012-09-18 Thread Martin K. Petersen
From: Martin K. Petersen martin.peter...@oracle.com

The REPORT SUPPORTED OPERATION CODES command can be used to query
whether a given opcode is supported by a device. Add a helper function
that allows us to look up commands.

We only issue RSOC if the device reports compliance with SPC-3 or
later. But to err on the side of caution we disable the command for ATA,
FireWire and USB.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
Acked-by: Mike Snitzer snit...@redhat.com
---
 drivers/ata/libata-scsi.c  |1 +
 drivers/firewire/sbp2.c|1 +
 drivers/scsi/scsi.c|   45 
 drivers/usb/storage/scsiglue.c |3 ++
 include/scsi/scsi_device.h |3 ++
 5 files changed, 53 insertions(+), 0 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 8ec81ca..86d43b2 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1052,6 +1052,7 @@ static void ata_scsi_sdev_config(struct scsi_device *sdev)
 {
sdev-use_10_for_rw = 1;
sdev-use_10_for_ms = 1;
+   sdev-no_report_opcodes = 1;
 
/* Schedule policy is determined by -qc_defer() callback and
 * it needs to see every deferred qc.  Set dev_blocked to 1 to
diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c
index 1162d6b..f82e9d4 100644
--- a/drivers/firewire/sbp2.c
+++ b/drivers/firewire/sbp2.c
@@ -1546,6 +1546,7 @@ static int sbp2_scsi_slave_configure(struct scsi_device 
*sdev)
struct sbp2_logical_unit *lu = sdev-hostdata;
 
sdev-use_10_for_rw = 1;
+   sdev-no_report_opcodes = 1;
 
if (sbp2_param_exclusive_login)
sdev-manage_start_stop = 1;
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 2936b44..2c0d0ec 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -55,6 +55,7 @@
 #include linux/cpu.h
 #include linux/mutex.h
 #include linux/async.h
+#include asm/unaligned.h
 
 #include scsi/scsi.h
 #include scsi/scsi_cmnd.h
@@ -1062,6 +1063,50 @@ int scsi_get_vpd_page(struct scsi_device *sdev, u8 page, 
unsigned char *buf,
 EXPORT_SYMBOL_GPL(scsi_get_vpd_page);
 
 /**
+ * scsi_report_opcode - Find out if a given command opcode is supported
+ * @sdev:  scsi device to query
+ * @buffer:scratch buffer (must be at least 20 bytes long)
+ * @len:   length of buffer
+ * @opcode:opcode for command to look up
+ *
+ * Uses the REPORT SUPPORTED OPERATION CODES to look up the given
+ * opcode. Returns 0 if RSOC fails or if the command opcode is
+ * unsupported. Returns 1 if the device claims to support the command.
+ */
+int scsi_report_opcode(struct scsi_device *sdev, unsigned char *buffer,
+  unsigned int len, unsigned char opcode)
+{
+   unsigned char cmd[16];
+   struct scsi_sense_hdr sshdr;
+   int result;
+
+   if (sdev-no_report_opcodes || sdev-scsi_level  SCSI_SPC_3)
+   return 0;
+
+   memset(cmd, 0, 16);
+   cmd[0] = MAINTENANCE_IN;
+   cmd[1] = MI_REPORT_SUPPORTED_OPERATION_CODES;
+   cmd[2] = 1; /* One command format */
+   cmd[3] = opcode;
+   put_unaligned_be32(len, cmd[6]);
+   memset(buffer, 0, len);
+
+   result = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buffer, len,
+ sshdr, 30 * HZ, 3, NULL);
+
+   if (result  scsi_sense_valid(sshdr) 
+   sshdr.sense_key == ILLEGAL_REQUEST 
+   (sshdr.asc == 0x20 || sshdr.asc == 0x24)  sshdr.ascq == 0x00)
+   return 0;
+
+   if ((buffer[1]  3) == 3) /* Command supported */
+   return 1;
+
+   return 0;
+}
+EXPORT_SYMBOL(scsi_report_opcode);
+
+/**
  * scsi_device_get  -  get an additional reference to a scsi_device
  * @sdev:  device to get a reference to
  *
diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index a3d5436..6ab376a 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -186,6 +186,9 @@ static int slave_configure(struct scsi_device *sdev)
/* Some devices don't handle VPD pages correctly */
sdev-skip_vpd_pages = 1;
 
+   /* Do not attempt to use REPORT SUPPORTED OPERATION CODES */
+   sdev-no_report_opcodes = 1;
+
/* Some disks return the total number of blocks in response
 * to READ CAPACITY rather than the highest block number.
 * If this device makes that mistake, tell the sd driver. */
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 9895f69..7d0b4f6 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -135,6 +135,7 @@ struct scsi_device {
 * because we did a bus reset. */
unsigned use_10_for_rw:1; /* first try 10-byte read / write */
unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
+   unsigned 

[PATCH 3/8] block: Implement support for WRITE SAME

2012-09-18 Thread Martin K. Petersen
From: Martin K. Petersen martin.peter...@oracle.com

The WRITE SAME command supported on some SCSI devices allows the same
block to be efficiently replicated throughout a block range. Only a
single logical block is transferred from the host and the storage device
writes the same data to all blocks described by the I/O.

This patch implements support for WRITE SAME in the block layer. The
blkdev_issue_write_same() function can be used by filesystems and block
drivers to replicate a buffer across a block range. This can be used to
efficiently initialize software RAID devices, etc.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
Acked-by: Mike Snitzer snit...@redhat.com
---
 Documentation/ABI/testing/sysfs-block |   14 ++
 block/blk-core.c  |   14 +-
 block/blk-lib.c   |   74 +
 block/blk-merge.c |9 
 block/blk-settings.c  |   16 +++
 block/blk-sysfs.c |   13 ++
 drivers/md/raid0.c|1 +
 fs/bio.c  |9 +++-
 include/linux/bio.h   |3 +
 include/linux/blk_types.h |5 ++-
 include/linux/blkdev.h|   29 +
 11 files changed, 181 insertions(+), 6 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block 
b/Documentation/ABI/testing/sysfs-block
index c1eb41c..279da08 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -206,3 +206,17 @@ Description:
when a discarded area is read the discard_zeroes_data
parameter will be set to one. Otherwise it will be 0 and
the result of reading a discarded area is undefined.
+
+What:  /sys/block/disk/queue/write_same_max_bytes
+Date:  January 2012
+Contact:   Martin K. Petersen martin.peter...@oracle.com
+Description:
+   Some devices support a write same operation in which a
+   single data block can be written to a range of several
+   contiguous blocks on storage. This can be used to wipe
+   areas on disk or to initialize drives in a RAID
+   configuration. write_same_max_bytes indicates how many
+   bytes can be written in a single write same command. If
+   write_same_max_bytes is 0, write same is not supported
+   by the device.
+
diff --git a/block/blk-core.c b/block/blk-core.c
index 33eded0..3b08054 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1704,6 +1704,11 @@ generic_make_request_checks(struct bio *bio)
goto end_io;
}
 
+   if (bio-bi_rw  REQ_WRITE_SAME  !bdev_write_same(bio-bi_bdev)) {
+   err = -EOPNOTSUPP;
+   goto end_io;
+   }
+
/*
 * Various block parts want %current-io_context and lazy ioc
 * allocation ends up trading a lot of pain for a small amount of
@@ -1809,8 +1814,6 @@ EXPORT_SYMBOL(generic_make_request);
  */
 void submit_bio(int rw, struct bio *bio)
 {
-   int count = bio_sectors(bio);
-
bio-bi_rw |= rw;
 
/*
@@ -1818,6 +1821,13 @@ void submit_bio(int rw, struct bio *bio)
 * go through the normal accounting stuff before submission.
 */
if (bio_has_data(bio)) {
+   unsigned int count;
+
+   if (unlikely(rw  REQ_WRITE_SAME))
+   count = bdev_logical_block_size(bio-bi_bdev)  9;
+   else
+   count = bio_sectors(bio);
+
if (rw  WRITE) {
count_vm_events(PGPGOUT, count);
} else {
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 19cc761..a062543 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -130,6 +130,80 @@ int blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
 EXPORT_SYMBOL(blkdev_issue_discard);
 
 /**
+ * blkdev_issue_write_same - queue a write same operation
+ * @bdev:  target blockdev
+ * @sector:start sector
+ * @nr_sects:  number of sectors to write
+ * @gfp_mask:  memory allocation flags (for bio_alloc)
+ * @page:  page containing data to write
+ *
+ * Description:
+ *Issue a write same request for the sectors in question.
+ */
+int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
+   sector_t nr_sects, gfp_t gfp_mask,
+   struct page *page)
+{
+   DECLARE_COMPLETION_ONSTACK(wait);
+   struct request_queue *q = bdev_get_queue(bdev);
+   unsigned int max_write_same_sectors;
+   struct bio_batch bb;
+   struct bio *bio;
+   int ret = 0;
+
+   if (!q)
+   return -ENXIO;
+
+   max_write_same_sectors = q-limits.max_write_same_sectors;
+
+   if (max_write_same_sectors == 0)
+   return -EOPNOTSUPP;
+
+   atomic_set(bb.done, 1);
+   

[PATCH 8/8] sd: Implement support for WRITE SAME

2012-09-18 Thread Martin K. Petersen
From: Martin K. Petersen martin.peter...@oracle.com

Implement support for WRITE SAME(10) and WRITE SAME(16) in the SCSI disk
driver.

 - We set the default maximum to 0x because there are several
   devices out there that only support two-byte block counts even with
   WRITE SAME(16). We only enable transfers bigger than 0x if the
   device explicitly reports MAXIMUM WRITE SAME LENGTH in the BLOCK
   LIMITS VPD.

 - max_write_same_blocks can be overriden per-device basis in sysfs.

 - The UNMAP discovery heuristics remain unchanged but the discard
   limits are tweaked to match the real WRITE SAME commands.

 - In the error handling logic we now distinguish between WRITE SAME
   with and without UNMAP set.


The discovery process heuristics are:

 - If the device reports a SCSI level of SPC-3 or greater we'll issue
   READ SUPPORTED OPERATION CODES to find out whether WRITE SAME(16) is
   supported. If that's the case we will use it.

 - If the device supports the block limits VPD and reports a MAXIMUM
   WRITE SAME LENGTH bigger than 0x we will use WRITE SAME(16).

 - Otherwise we will use WRITE SAME(10) unless the target LBA is beyond
   0x or the block count exceeds 0x.

 - no_write_same is set for ATA, FireWire and USB.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
Acked-by: Mike Snitzer snit...@redhat.com
Acked-by: Jeff Garzik jgar...@redhat.com
---
 drivers/ata/libata-scsi.c  |1 +
 drivers/firewire/sbp2.c|1 +
 drivers/scsi/scsi_lib.c|   22 -
 drivers/scsi/sd.c  |  172 +---
 drivers/scsi/sd.h  |7 ++
 drivers/usb/storage/scsiglue.c |3 +
 include/scsi/scsi_device.h |1 +
 7 files changed, 191 insertions(+), 16 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 86d43b2..002f1b7 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1053,6 +1053,7 @@ static void ata_scsi_sdev_config(struct scsi_device *sdev)
sdev-use_10_for_rw = 1;
sdev-use_10_for_ms = 1;
sdev-no_report_opcodes = 1;
+   sdev-no_write_same = 1;
 
/* Schedule policy is determined by -qc_defer() callback and
 * it needs to see every deferred qc.  Set dev_blocked to 1 to
diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c
index f82e9d4..bb1b392 100644
--- a/drivers/firewire/sbp2.c
+++ b/drivers/firewire/sbp2.c
@@ -1547,6 +1547,7 @@ static int sbp2_scsi_slave_configure(struct scsi_device 
*sdev)
 
sdev-use_10_for_rw = 1;
sdev-no_report_opcodes = 1;
+   sdev-no_write_same = 1;
 
if (sbp2_param_exclusive_login)
sdev-manage_start_stop = 1;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ffd7773..874d41b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -897,11 +897,23 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned 
int good_bytes)
action = ACTION_FAIL;
error = -EILSEQ;
/* INVALID COMMAND OPCODE or INVALID FIELD IN CDB */
-   } else if ((sshdr.asc == 0x20 || sshdr.asc == 0x24) 
-  (cmd-cmnd[0] == UNMAP ||
-   cmd-cmnd[0] == WRITE_SAME_16 ||
-   cmd-cmnd[0] == WRITE_SAME)) {
-   description = Discard failure;
+   } else if (sshdr.asc == 0x20 || sshdr.asc == 0x24) {
+   switch (cmd-cmnd[0]) {
+   case UNMAP:
+   description = Discard failure;
+   break;
+   case WRITE_SAME:
+   case WRITE_SAME_16:
+   if (cmd-cmnd[1]  0x8)
+   description = Discard failure;
+   else
+   description =
+   Write same failure;
+   break;
+   default:
+   description = Invalid command failure;
+   break;
+   }
action = ACTION_FAIL;
error = -EREMOTEIO;
} else
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index e100b5c..1cf2d5d 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -99,6 +99,7 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
 #endif
 
 static void sd_config_discard(struct scsi_disk *, unsigned int);
+static void sd_config_write_same(struct scsi_disk *);
 static int  sd_revalidate_disk(struct gendisk *);
 static void 

Re: [V4 PATCH 0/8] csiostor: Chelsio FCoE offload driver submission

2012-09-18 Thread Naresh Kumar Inna
On 9/18/2012 2:06 PM, James Bottomley wrote:
 On Tue, 2012-09-18 at 09:54, Naresh Kumar Inna wrote:
 Hi James,

 Could you please consider merging version V4 of the driver patches, if
 you think they are in good shape now?
 
 Well, definitely not yet; you seem to have missed the memo on readq:
 
   CC [M]  drivers/scsi/cxgbi/cxgb4i/cxgb4i.o
 drivers/scsi/csiostor/csio_hw.c: In function csio_hw_mc_read:
 drivers/scsi/csiostor/csio_hw.c:194:3: error: implicit declaration of
 function readq [-Werror=implicit-function-declaration]
 
 It's only defined on platforms which can support an atomic 64 bit
 operation (which is mostly not any 32 bit platforms) ... so this could
 do with compile testing on those.
 
 To see how to handle readq/writeq in the 32 bit case, see the uses in
 fnic or qla2xxx
 

Thanks for reviewing. I will fix up readq/writeq, as well as other
32-bit compilation issues, if any.

 You also have a couple of unnecessary version.h includes.
 

I will get rid of them.

 Since you're a new driver, could you not do a correctly unlocked
 queuecommand routine?  You'll find the way you've currently got it coded
 (holding the host lock for the entire queuecommand routine) is very
 performance detrimental.
 

Yes, I am aware of that. However, some of this code was written and
tested before the lock-less queuecommand came into existence. Going the
lock-less route would require me to test the driver thoroughly again. It
definitely is in my to-do list, but I would like to take that up after
the initial submission goes through. Would that be OK?

 You have a lot of locking statements which aren't easy to audit by hand
 because there are multiple unlocks.  Things like this:
 
 csio_scan_finished(struct Scsi_Host *shost, unsigned long time)
 {
   struct csio_lnode *ln = shost_priv(shost);
   int rv = 0;
 
   spin_lock_irq(shost-host_lock);
   if (!ln-hwp || csio_list_deleted(ln-sm.sm_list)) {
   spin_unlock_irq(shost-host_lock);
   return 1;
   }
 
   rv = csio_scan_done(ln, jiffies, time, csio_max_scan_tmo * HZ,
   csio_delta_scan_tmo * HZ);
 
   spin_unlock_irq(shost-host_lock);
 
   return rv;
 }
 
 Are better coded as
 
 csio_scan_finished(struct Scsi_Host *shost, unsigned long time)
 {
   struct csio_lnode *ln = shost_priv(shost);
   int rv = 1;
 
   spin_lock_irq(shost-host_lock);
   if (!ln-hwp || csio_list_deleted(ln-sm.sm_list))
   goto out;
 
   rv = csio_scan_done(ln, jiffies, time, csio_max_scan_tmo * HZ,
   csio_delta_scan_tmo * HZ);
 
  out:
   spin_unlock_irq(shost-host_lock);
 
   return rv;
 }
 
 It's shorter and the unlock clearly matches the lock.  You could even
 invert the if logic and just make the csio_scan_done() conditional on it
 avoiding the goto.
 

I will try to minimize the lock/unlock mismatch instances per your
suggestions, if not eliminate them altogether.

 I'd also really like the people who understand FC to take a look over
 this as well.
 

Sure.

Thanks,
Naresh.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fail to probe qla2xxx fiber channel card while doing pci hotplug

2012-09-18 Thread Bjorn Helgaas
On Mon, Sep 17, 2012 at 6:06 AM, Yijing Wang wangyij...@huawei.com wrote:
 On 2012/9/16 11:30, Bjorn Helgaas wrote:
 On Sat, Sep 15, 2012 at 4:22 AM, Yijing Wang wangyij...@huawei.com wrote:
 Hi all,
I encountered a very strange problem when I hot plug a fiber channel 
 card(using qla2xxx driver).
 I did the hotplug in arch x86 machine, using pciehp driver for hotplug, 
 this platform supports pci hot-plug triggering from both
 sysfs and attention button. If a hot-plug slot is empty when system 
 boot-up, then hotplug FC card in this slot is ok.
 If a hot-plug slot has been embeded a FC card when system boot-up, 
 hot-remove this card is ok, but hot-add this card will fail.
 I used
 #modprobe qla2xxx ql2xextended_error_logging=0x7fff
 to get all probe info. As bellow:

 Can anyone give me any suggestion for this problem?

 It sounds like you did this:

   1) Power down system
   2) Remove FC card from slot
   3) Boot system
   4) Hot-add FC card
   5) Load qla2xxx driver
   6) qla2xxx driver claims FC card
   7) FC card works correctly

   8) Power down system
   9) Install FC card in slot
  10) Boot system
  11) Load qla2xxx driver
  12) qla2xxx driver claims FC card
  13) FC card works correctly
 I rmmod qla2xxx driver here and modprobe qla2xxx 
 ql2xextended_error_logging=0x1e40 again for get errors info
 Also I modprobe pciehp pciehp_debug=1 for getting debug info
  14) Hot-remove card
  15) Hot-add card
  16) qla2xxx driver claims FC card
  17) FC card does not work

 and I assume the dmesg log you included is just from steps 15 and 16
 (correct me if I'm wrong).

 It would be useful to see the entire log showing all these events so
 we can compare the working cases with the non-working one.  If you use
 the pciehp_debug module parameter, we should also see some pciehp
 events that would help me understand that driver.


 Hi Bjorn,
Thanks for your comments very much!

 My steps:
 1) power down system
 2) Install FC card in slot
 3) Boot system
 4) Load qla2xxx driver
 5) qla2xxx driver claims FC card
 6) FC card works correctly(at least probe return ok, I don't know qla2xxx 
 driver much..)
 7) rmmod qla2xxx
 8) modprobe qla2xxx ql2xextended_error_logging=0x1e40(for get errors info)
 9) modprobe pciehp pciehp_debug=1
 10) Hot-remove card
 11) Hot-add card
 12) qla2xxx driver claims FC card fail(probe return fail, setup chip fail)
 --so this is failed situation--

 --continue to hot-add fc card into empty 
 slot(also support pci hp)
 13) Install FC card in empty slot
 14) Hot-add card
 15) qla2xxx driver claims FC card ok (probe return ok)

 btw:
 If fc card firmware version 4.03, everything is ok (hot-plug in any 
 slots(empty or not))
 fc card firmware version is 4.04 or 5.04 , situation as same as 1)---12)

Thanks.  The FW change is a good clue.  If everything works with
version 4.03, but it doesn't work with version 4.04, it's likely to be
a FW problem, not a Linux PCI core problem.

Here's what I see from your logs.  In slot 4 (bus 08), the card was
present before boot, you removed it, re-added it, and it failed after
being re-added.  Slot 3 (bus 06) was empty at boot, you hot-added a
card, and it worked.  Here are the resources available on those two
buses and the boot-time config of the first device in slot 4:

  pci :00:07.0: PCI bridge to [bus 06-07]
  pci :00:07.0:   bridge window [io  0xc000-0xcfff]
  pci :00:07.0:   bridge window [mem 0xf900-0xf9ff]
  pci :00:07.0:   bridge window [mem 0xf100-0xf1ff 64bit pref]
  pci :00:09.0: PCI bridge to [bus 08-09]
  pci :00:09.0:   bridge window [io  0xb000-0xbfff]
  pci :00:09.0:   bridge window [mem 0xf800-0xf8ff]
  pci :00:09.0:   bridge window [mem 0xf000-0xf0ff 64bit pref]
  pci :08:00.0: [1077:2532] type 00 class 0x0c0400
  pci :08:00.0: reg 10: [io  0xb100-0xb1ff]
  pci :08:00.0: reg 14: [mem 0xf8084000-0xf8087fff 64bit]
  pci :08:00.0: reg 30: [mem 0xf804-0xf807 pref]

After you remove and re-add the card in slot 4, it starts with
uninitialized BARs as expected, then we assign resources to it.  It's
sort of interesting that the BIOS had originally put the ROM (reg 30)
in the non-prefetchable window, while after the hot-add, Linux places
it in the prefetchable window.  Either should work, and in fact the
card you added in slot 3 *does* work with its ROM in the prefetchable
window.

  pci :08:00.0: [1077:2532] type 00 class 0x0c0400
  pci :08:00.0: reg 10: [io  0x-0x00ff]
  pci :08:00.0: reg 14: [mem 0x-0x3fff 64bit]
  pci :08:00.0: reg 30: [mem 0x-0x0003 pref]
  pci :08:00.0: BAR 0: assigned [io  0xb000-0xb0ff]
  pci :08:00.0: BAR 1: assigned [mem 0xf800-0xf8003fff 64bit]
  pci :08:00.0: BAR 6: assigned [mem 

Re: [PATCH 2/8] block: Consolidate command flag and queue limit checks for merges

2012-09-18 Thread Mike Snitzer
On Tue, Sep 18 2012 at 12:19pm -0400,
Martin K. Petersen martin.peter...@oracle.com wrote:

 From: Martin K. Petersen martin.peter...@oracle.com
 
  - blk_check_merge_flags() verifies that cmd_flags / bi_rw are
compatible. This function is called for both req-req and req-bio
merging.
 
  - blk_rq_get_max_sectors() and blk_queue_get_max_sectors() can be used
to query the maximum sector count for a given request or queue. The
calls will return the right value from the queue limits given the
type of command (RW, discard, write same, etc.)
 
 Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
 ---
  block/blk-core.c   |3 +--
  block/blk-merge.c  |   30 --
  include/linux/blkdev.h |   31 +++
  3 files changed, 44 insertions(+), 20 deletions(-)
 
 diff --git a/block/blk-core.c b/block/blk-core.c
 index 5cc2929..33eded0 100644
 --- a/block/blk-core.c
 +++ b/block/blk-core.c
 @@ -1866,8 +1866,7 @@ int blk_rq_check_limits(struct request_queue *q, struct 
 request *rq)
   if (!rq_mergeable(rq))
   return 0;
  
 - if (blk_rq_sectors(rq)  queue_max_sectors(q) ||
 - blk_rq_bytes(rq)  queue_max_hw_sectors(q)  9) {
 + if (blk_rq_sectors(rq)  blk_queue_get_max_sectors(q, rq-cmd_flags)) {
   printk(KERN_ERR %s: over max size limit.\n, __func__);
   return -EIO;
   }
 diff --git a/block/blk-merge.c b/block/blk-merge.c
 index 86710ca..642b862 100644
 --- a/block/blk-merge.c
 +++ b/block/blk-merge.c
 @@ -275,14 +275,8 @@ no_merge:
  int ll_back_merge_fn(struct request_queue *q, struct request *req,
struct bio *bio)
  {
 - unsigned short max_sectors;
 -
 - if (unlikely(req-cmd_type == REQ_TYPE_BLOCK_PC))
 - max_sectors = queue_max_hw_sectors(q);
 - else
 - max_sectors = queue_max_sectors(q);
 -
 - if (blk_rq_sectors(req) + bio_sectors(bio)  max_sectors) {
 + if (blk_rq_sectors(req) + bio_sectors(bio) 
 + blk_rq_get_max_sectors(req)) {
   req-cmd_flags |= REQ_NOMERGE;
   if (req == q-last_merge)
   q-last_merge = NULL;
 @@ -299,15 +293,8 @@ int ll_back_merge_fn(struct request_queue *q, struct 
 request *req,
  int ll_front_merge_fn(struct request_queue *q, struct request *req,
 struct bio *bio)
  {
 - unsigned short max_sectors;
 -
 - if (unlikely(req-cmd_type == REQ_TYPE_BLOCK_PC))
 - max_sectors = queue_max_hw_sectors(q);
 - else
 - max_sectors = queue_max_sectors(q);
 -
 -
 - if (blk_rq_sectors(req) + bio_sectors(bio)  max_sectors) {
 + if (blk_rq_sectors(req) + bio_sectors(bio) 
 + blk_rq_get_max_sectors(req)) {
   req-cmd_flags |= REQ_NOMERGE;
   if (req == q-last_merge)
   q-last_merge = NULL;
 @@ -338,7 +325,8 @@ static int ll_merge_requests_fn(struct request_queue *q, 
 struct request *req,
   /*
* Will it become too large?
*/
 - if ((blk_rq_sectors(req) + blk_rq_sectors(next))  queue_max_sectors(q))
 + if ((blk_rq_sectors(req) + blk_rq_sectors(next)) 
 + blk_rq_get_max_sectors(req))
   return 0;
  
   total_phys_segments = req-nr_phys_segments + next-nr_phys_segments;
 @@ -417,6 +405,9 @@ static int attempt_merge(struct request_queue *q, struct 
 request *req,
   if (!rq_mergeable(req) || !rq_mergeable(next))
   return 0;
  
 + if (!blk_check_merge_flags(req-cmd_flags, next-cmd_flags))
 + return 0;
 +
   /*
* not contiguous
*/
 @@ -512,6 +503,9 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
   if (!rq_mergeable(rq) || !bio_mergeable(bio))
   return false;
  
 + if (!blk_check_merge_flags(rq-cmd_flags, bio-bi_rw))
 + return false;
 +
   /* different data direction or already started, don't merge */
   if (bio_data_dir(bio) != rq_data_dir(rq))
   return false;
 diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
 index 3a6fea7..90f7abe 100644
 --- a/include/linux/blkdev.h
 +++ b/include/linux/blkdev.h
 @@ -605,6 +605,18 @@ static inline bool rq_mergeable(struct request *rq)
   return true;
  }
  
 +static inline bool blk_check_merge_flags(unsigned int flags1,
 +  unsigned int flags2)
 +{
 + if ((flags1  REQ_DISCARD) != (flags2  REQ_DISCARD))
 + return false;
 +
 + if ((flags1  REQ_SECURE) != (flags2  REQ_SECURE))
 + return false;
 +
 + return true;
 +}
 +
  /*
   * q-prep_rq_fn return values
   */
 @@ -800,6 +812,25 @@ static inline unsigned int blk_rq_cur_sectors(const 
 struct request *rq)
   return blk_rq_cur_bytes(rq)  9;
  }
  
 +static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
 +  unsigned int 

[PATCH] qla2xxx: Fix endianness of task management response code

2012-09-18 Thread Roland Dreier
From: Roland Dreier rol...@purestorage.com

The qla2xxx firmware actually expects the task management response
code in a CTIO IOCB with SCSI status mode 1 to be in little-endian
byte order, ie the response code should be the first byte in the
sense_data[] array.  The old code erroneously byte-swapped the
response code, which puts it in the wrong place on the wire and leads
to initiators thinking every task management request succeeds (since
they see 0 in the byte where they look for the response code).

Cc: Chad Dupuis chad.dup...@qlogic.com
Cc: Arun Easi arun.e...@qlogic.com
Signed-off-by: Roland Dreier rol...@purestorage.com
---
 drivers/scsi/qla2xxx/qla_target.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c 
b/drivers/scsi/qla2xxx/qla_target.c
index 5b30132..41b74ba 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -1403,7 +1403,7 @@ static void qlt_24xx_send_task_mgmt_ctio(struct 
scsi_qla_host *ha,
ctio-u.status1.scsi_status =
__constant_cpu_to_le16(SS_RESPONSE_INFO_LEN_VALID);
ctio-u.status1.response_len = __constant_cpu_to_le16(8);
-   ((uint32_t *)ctio-u.status1.sense_data)[0] = cpu_to_be32(resp_code);
+   ctio-u.status1.sense_data[0] = resp_code;
 
qla2x00_start_iocbs(ha, ha-req);
 }
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qla2xxx: Fix endianness of task management response code

2012-09-18 Thread Nicholas A. Bellinger
On Tue, 2012-09-18 at 15:10 -0700, Roland Dreier wrote:
 From: Roland Dreier rol...@purestorage.com
 
 The qla2xxx firmware actually expects the task management response
 code in a CTIO IOCB with SCSI status mode 1 to be in little-endian
 byte order, ie the response code should be the first byte in the
 sense_data[] array.  The old code erroneously byte-swapped the
 response code, which puts it in the wrong place on the wire and leads
 to initiators thinking every task management request succeeds (since
 they see 0 in the byte where they look for the response code).
 
 Cc: Chad Dupuis chad.dup...@qlogic.com
 Cc: Arun Easi arun.e...@qlogic.com
 Signed-off-by: Roland Dreier rol...@purestorage.com
 ---

Very nice catch here Roland!

  drivers/scsi/qla2xxx/qla_target.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/drivers/scsi/qla2xxx/qla_target.c 
 b/drivers/scsi/qla2xxx/qla_target.c
 index 5b30132..41b74ba 100644
 --- a/drivers/scsi/qla2xxx/qla_target.c
 +++ b/drivers/scsi/qla2xxx/qla_target.c
 @@ -1403,7 +1403,7 @@ static void qlt_24xx_send_task_mgmt_ctio(struct 
 scsi_qla_host *ha,
   ctio-u.status1.scsi_status =
   __constant_cpu_to_le16(SS_RESPONSE_INFO_LEN_VALID);
   ctio-u.status1.response_len = __constant_cpu_to_le16(8);
 - ((uint32_t *)ctio-u.status1.sense_data)[0] = cpu_to_be32(resp_code);
 + ctio-u.status1.sense_data[0] = resp_code;
  
   qla2x00_start_iocbs(ha, ha-req);
  }

Dropping this into queue for the moment, and will CC' to stable - move
into for-next once Chad  Co. have a chance to give their ACK.

Thank you!

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 47701] New: When too many disks fall out at the same time, RCU hangs

2012-09-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=47701

   Summary: When too many disks fall out at the same time, RCU
hangs
   Product: IO/Storage
   Version: 2.5
Kernel Version: 3.5.4
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: SCSI
AssignedTo: linux-scsi@vger.kernel.org
ReportedBy: sgunder...@bigfoot.com
Regression: No


Hi,

For whatever reason, I lost all of my disks at the same time (I guess a SAS
cable fell out; I'll know tomorrow). As was expected, I/O on the machine was
not so happy afterwards; what was not so expected, was the following output on
the serial console a minute or so after:

[292657.601264] INFO: rcu_sched self-detected stall on CPU[292657.602441] INFO:
rcu_sched detected stalls on CPUs/tasks: { 6} (detected by 4, t=60002 jiffies)
[292657.602444] INFO: Stall ended before state dump start

[292657.620982]  {[292657.622964]  6}  (t=60024 jiffies)
Pid: 7800, comm: kworker/u:9 Not tainted 3.5.4 #1
[292657.631337] Call Trace:
[292657.634069]  IRQ  [8108ecaf] __rcu_pending+0xbd/0x417
[292657.640471]  [8108f0de] rcu_check_callbacks+0xd5/0x138
[292657.646765]  [81041b73] update_process_times+0x3c/0x73
[292657.653050]  [81072076] tick_sched_timer+0x6a/0x93
[292657.658999]  [8105301e] __run_hrtimer+0xb3/0x13e
[292657.664763]  [8107200c] ? tick_nohz_handler+0xd3/0xd3
[292657.670972]  [8103b7a3] ? __do_softirq+0x16c/0x182
[292657.676916]  [810533d5] hrtimer_interrupt+0xce/0x1b0
[292657.683030]  [8101ad3d] smp_apic_timer_interrupt+0x81/0x94
[292657.689676]  [81377447] apic_timer_interrupt+0x67/0x70
[292657.695965]  EOI  [813703dd] ?
_raw_spin_unlock_irqrestore+0x9/0xb
[292657.703695]  [8123daaa] scsi_remove_target+0x137/0x153
[292657.709985]  [812425dc] sas_rphy_remove+0x25/0x4e
[292657.715841]  [81242616] sas_rphy_delete+0x11/0x1e
[292657.721699]  [81242648] sas_port_delete+0x25/0x11a
[292657.727644]  [8136de53] ? mutex_unlock+0x9/0xb
[292657.733254]  [a0020fd5] mpt2sas_transport_port_remove+0x16f/0x190
[mpt2sas]
[292657.741576]  [a001a70b] _scsih_remove_device+0x58/0x84 [mpt2sas]
[292657.748731]  [a001a7f4] _scsih_device_remove_by_handle+0xbd/0xc6
[mpt2sas]
[292657.756960]  [a001c5bb]
_scsih_sas_topology_change_event+0x422/0x46d [mpt2sas]
[292657.765531]  [81064556] ? idle_balance+0xde/0x10c
[292657.771395]  [a001e098] ? _scsih_abort+0x1c1/0x1c1 [mpt2sas]
[292657.778212]  [a001e38d] _firmware_event_work+0x2f5/0x920
[mpt2sas]
[292657.785547]  [81042089] ? add_timer+0x17/0x1a
[292657.791058]  [8104bc64] ? queue_delayed_work_on+0xda/0xe8
[292657.797607]  [a001e098] ? _scsih_abort+0x1c1/0x1c1 [mpt2sas]
[292657.804418]  [8104c641] process_one_work+0x253/0x3c5
[292657.810534]  [8104cbb7] worker_thread+0x1d4/0x34d
[292657.816394]  [8104c9e3] ? rescuer_thread+0x230/0x230
[292657.822511]  [810500bb] kthread+0x84/0x8c
[292657.827675]  [81377c94] kernel_thread_helper+0x4/0x10
[292657.833875]  [81050037] ? kthread_freezable_should_stop+0x58/0x58
[292657.841117]  [81377c90] ? gs_change+0xb/0xb

I'm reporting this primarily because it could cause problems in some other
context (say, when only one or two disks disappear); of course, for me in this
case, it wouldn't matter if I/O was properly rejected or the entire machine
hanged, it's useless anyway.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html