Re: Fail to probe qla2xxx fiber channel card while doing pci hotplug
On 2012/9/19 1:54, Bjorn Helgaas wrote: On Mon, Sep 17, 2012 at 6:06 AM, Yijing Wang wangyij...@huawei.com wrote: On 2012/9/16 11:30, Bjorn Helgaas wrote: On Sat, Sep 15, 2012 at 4:22 AM, Yijing Wang wangyij...@huawei.com wrote: Hi all, I encountered a very strange problem when I hot plug a fiber channel card(using qla2xxx driver). I did the hotplug in arch x86 machine, using pciehp driver for hotplug, this platform supports pci hot-plug triggering from both sysfs and attention button. If a hot-plug slot is empty when system boot-up, then hotplug FC card in this slot is ok. If a hot-plug slot has been embeded a FC card when system boot-up, hot-remove this card is ok, but hot-add this card will fail. I used #modprobe qla2xxx ql2xextended_error_logging=0x7fff to get all probe info. As bellow: Can anyone give me any suggestion for this problem? It sounds like you did this: 1) Power down system 2) Remove FC card from slot 3) Boot system 4) Hot-add FC card 5) Load qla2xxx driver 6) qla2xxx driver claims FC card 7) FC card works correctly 8) Power down system 9) Install FC card in slot 10) Boot system 11) Load qla2xxx driver 12) qla2xxx driver claims FC card 13) FC card works correctly I rmmod qla2xxx driver here and modprobe qla2xxx ql2xextended_error_logging=0x1e40 again for get errors info Also I modprobe pciehp pciehp_debug=1 for getting debug info 14) Hot-remove card 15) Hot-add card 16) qla2xxx driver claims FC card 17) FC card does not work and I assume the dmesg log you included is just from steps 15 and 16 (correct me if I'm wrong). It would be useful to see the entire log showing all these events so we can compare the working cases with the non-working one. If you use the pciehp_debug module parameter, we should also see some pciehp events that would help me understand that driver. Hi Bjorn, Thanks for your comments very much! My steps: 1) power down system 2) Install FC card in slot 3) Boot system 4) Load qla2xxx driver 5) qla2xxx driver claims FC card 6) FC card works correctly(at least probe return ok, I don't know qla2xxx driver much..) 7) rmmod qla2xxx 8) modprobe qla2xxx ql2xextended_error_logging=0x1e40(for get errors info) 9) modprobe pciehp pciehp_debug=1 10) Hot-remove card 11) Hot-add card 12) qla2xxx driver claims FC card fail(probe return fail, setup chip fail) --so this is failed situation-- --continue to hot-add fc card into empty slot(also support pci hp) 13) Install FC card in empty slot 14) Hot-add card 15) qla2xxx driver claims FC card ok (probe return ok) btw: If fc card firmware version 4.03, everything is ok (hot-plug in any slots(empty or not)) fc card firmware version is 4.04 or 5.04 , situation as same as 1)---12) Thanks. The FW change is a good clue. If everything works with version 4.03, but it doesn't work with version 4.04, it's likely to be a FW problem, not a Linux PCI core problem. Here's what I see from your logs. In slot 4 (bus 08), the card was present before boot, you removed it, re-added it, and it failed after being re-added. Slot 3 (bus 06) was empty at boot, you hot-added a card, and it worked. Here are the resources available on those two buses and the boot-time config of the first device in slot 4: pci :00:07.0: PCI bridge to [bus 06-07] pci :00:07.0: bridge window [io 0xc000-0xcfff] pci :00:07.0: bridge window [mem 0xf900-0xf9ff] pci :00:07.0: bridge window [mem 0xf100-0xf1ff 64bit pref] pci :00:09.0: PCI bridge to [bus 08-09] pci :00:09.0: bridge window [io 0xb000-0xbfff] pci :00:09.0: bridge window [mem 0xf800-0xf8ff] pci :00:09.0: bridge window [mem 0xf000-0xf0ff 64bit pref] pci :08:00.0: [1077:2532] type 00 class 0x0c0400 pci :08:00.0: reg 10: [io 0xb100-0xb1ff] pci :08:00.0: reg 14: [mem 0xf8084000-0xf8087fff 64bit] pci :08:00.0: reg 30: [mem 0xf804-0xf807 pref] After you remove and re-add the card in slot 4, it starts with uninitialized BARs as expected, then we assign resources to it. It's sort of interesting that the BIOS had originally put the ROM (reg 30) in the non-prefetchable window, while after the hot-add, Linux places it in the prefetchable window. Either should work, and in fact the card you added in slot 3 *does* work with its ROM in the prefetchable window. pci :08:00.0: [1077:2532] type 00 class 0x0c0400 pci :08:00.0: reg 10: [io 0x-0x00ff] pci :08:00.0: reg 14: [mem 0x-0x3fff 64bit] pci :08:00.0: reg 30: [mem 0x-0x0003 pref] pci :08:00.0: BAR 0: assigned [io 0xb000-0xb0ff] pci :08:00.0: BAR 1: assigned [mem
Re: [PATCH] qla2xxx: Fix endianness of task management response code
From: Roland Dreier rol...@purestorage.com The qla2xxx firmware actually expects the task management response code in a CTIO IOCB with SCSI status mode 1 to be in little-endian byte order, ie the response code should be the first byte in the sense_data[] array. The old code erroneously byte-swapped the response code, which puts it in the wrong place on the wire and leads to initiators thinking every task management request succeeds (since they see 0 in the byte where they look for the response code). Cc: Chad Dupuis chad.dup...@qlogic.com Cc: Arun Easi arun.e...@qlogic.com Signed-off-by: Roland Dreier rol...@purestorage.com --- drivers/scsi/qla2xxx/qla_target.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c index 5b30132..41b74ba 100644 --- a/drivers/scsi/qla2xxx/qla_target.c +++ b/drivers/scsi/qla2xxx/qla_target.c @@ -1403,7 +1403,7 @@ static void qlt_24xx_send_task_mgmt_ctio(struct scsi_qla_host *ha, ctio-u.status1.scsi_status = __constant_cpu_to_le16(SS_RESPONSE_INFO_LEN_VALID); ctio-u.status1.response_len = __constant_cpu_to_le16(8); - ((uint32_t *)ctio-u.status1.sense_data)[0] = cpu_to_be32(resp_code); + ctio-u.status1.sense_data[0] = resp_code; qla2x00_start_iocbs(ha, ha-req); } -- 1.7.10.4 Acked-by: Saurav Kashyap saurav.kash...@qlogic.com Thanks, ~Saurav This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 0/4] Support runtime power off of HDD
v2: Introduced a new flag sync_before_stop as scsi_stop = ata_flush_cache + ata_standby_immediate. This flag specifies that when stop command is to send to this device, please issue an additional synchronize cache command to it before the stop command. Put device into stop power condition on runtime suspend. v1: This patch set is baed on v7 ZPODD patches. Aaron Lu (4): scsi: introduce sync_before_stop flag scsi: sd: enter stop power condition on runtime suspend scsi: sd: set ready_to_power_off for scsi disk libata: acpi: set can_power_off for both ODD and HDD drivers/ata/libata-acpi.c | 25 + drivers/ata/libata-scsi.c | 1 + drivers/scsi/sd.c | 11 +-- include/scsi/scsi_device.h | 1 + 4 files changed, 28 insertions(+), 10 deletions(-) -- 1.7.12.21.g871e293 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 1/4] scsi: introduce sync_before_stop flag
When scsi device received stop command, it will take care of its internal cache before enters stopped power condition. This command is translated to standby immediate in libata-scsi, but standby doesn't imply flush cache for ATA device, so to issue stop command to ATA device, an additional flush cache has to be issued. Introduce this flag so that when we are to stop the ATA disk in scsi disk driver, also flush its internal cache. Signed-off-by: Aaron Lu aaron...@intel.com --- include/scsi/scsi_device.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index 4712aa1..26c3621 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -160,6 +160,7 @@ struct scsi_device { unsigned ready_to_power_off:1; /* Device is ready to be powered off */ unsigned powered_off:1; /* Device is powered off */ unsigned may_power_off:1; /* Power off is allowed by user */ + unsigned sync_before_stop:1; /* Sync cache before stop */ DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */ struct list_head event_list;/* asserted events */ -- 1.7.12.21.g871e293 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/4] scsi: sd: enter stop power condition on runtime suspend
On runtime suspend, put device into stop power condition. For standard scsi device, no need to synchronize cache. For ata devices, issue synchronize cache before enter standby state as reflected by the sync_before_stop flag. Signed-off-by: Aaron Lu aaron...@intel.com --- drivers/ata/libata-scsi.c | 1 + drivers/scsi/sd.c | 10 -- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index 8ec81ca..9307f52 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -1117,6 +1117,7 @@ static int ata_scsi_dev_config(struct scsi_device *sdev, } else { sdev-sector_size = ata_id_logical_sector_size(dev-id); sdev-manage_start_stop = 1; + sdev-sync_before_stop = 1; } /* diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 4df73e5..7b0f07a 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -2866,18 +2866,24 @@ static int sd_suspend(struct device *dev, pm_message_t mesg) { struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev); int ret = 0; + bool sync; if (!sdkp) return 0; /* this can happen */ - if (sdkp-WCE) { + if (!sdkp-device-manage_start_stop) + sync = !!sdkp-WCE; + else + sync = sdkp-device-sync_before_stop !!sdkp-WCE; + + if (sync) { sd_printk(KERN_NOTICE, sdkp, Synchronizing SCSI cache\n); ret = sd_sync_cache(sdkp); if (ret) goto done; } - if ((mesg.event PM_EVENT_SLEEP) sdkp-device-manage_start_stop) { + if (sdkp-device-manage_start_stop) { sd_printk(KERN_NOTICE, sdkp, Stopping disk\n); ret = sd_start_stop_device(sdkp, 0); } -- 1.7.12.21.g871e293 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 4/4] libata: acpi: set can_power_off for both ODD and HDD
Hard disk may also be runtime powered off, so set can_power_off flag for it too if condition satisfies so that the may_power_off sysfs file will be created for it to give user a chance to disable runtime power off. Signed-off-by: Aaron Lu aaron...@intel.com Acked-by: Jeff Garzik jgar...@redhat.com --- drivers/ata/libata-acpi.c | 25 + 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c index 24347e0..443c3f2 100644 --- a/drivers/ata/libata-acpi.c +++ b/drivers/ata/libata-acpi.c @@ -1015,7 +1015,7 @@ static void ata_acpi_add_pm_notifier(struct ata_device *dev) if (ACPI_FAILURE(status)) return; - if (dev-sdev-can_power_off) { + if (dev-class == ATA_DEV_ATAPI dev-sdev-can_power_off) { acpi_install_notify_handler(handle, ACPI_SYSTEM_NOTIFY, ata_acpi_wake_dev, dev); device_set_run_wake(dev-sdev-sdev_gendev, true); @@ -1036,7 +1036,7 @@ static void ata_acpi_remove_pm_notifier(struct ata_device *dev) if (ACPI_FAILURE(status)) return; - if (dev-sdev-can_power_off) { + if (dev-class == ATA_DEV_ATAPI dev-sdev-can_power_off) { device_set_run_wake(dev-sdev-sdev_gendev, false); acpi_remove_notify_handler(handle, ACPI_SYSTEM_NOTIFY, ata_acpi_wake_dev); @@ -1140,14 +1140,23 @@ static int ata_acpi_bind_device(struct ata_port *ap, struct scsi_device *sdev, /* * If firmware has _PS3 or _PR3 for this device, -* and this ata ODD device support device attention, -* it means this device can be powered off +* it means this device can be powered off runtime */ states = acpi_dev-power.states; - if ((states[ACPI_STATE_D3_HOT].flags.valid || - states[ACPI_STATE_D3_COLD].flags.explicit_set) - ata_dev-flags ATA_DFLAG_DA) - sdev-can_power_off = 1; + if (states[ACPI_STATE_D3_HOT].flags.valid || + states[ACPI_STATE_D3_COLD].flags.explicit_set) { + /* +* For ODD, it needs to support device attention or +* it can't be powered up back by user +*/ + if (ata_dev-class == ATA_DEV_ATAPI + ata_dev-flags ATA_DFLAG_DA) + sdev-can_power_off = 1; + + /* No requirement for hard disk */ + if (ata_dev-class == ATA_DEV_ATA) + sdev-can_power_off = 1; + } return 0; } -- 1.7.12.21.g871e293 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 3/4] scsi: sd: set ready_to_power_off for scsi disk
The ready_to_power_off flag is used to give indication to ATA layer if this device's power can be removed after runtime suspended from the perspective of scsi driver. It is introduced to support zero power ODD. When ODD is runtime suspended, it may not be OK to remove its power. But for disk, there doesn't exist a scenario that when the disk is runtime suspended, it can't be powered off. Signed-off-by: Aaron Lu aaron...@intel.com --- drivers/scsi/sd.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 7b0f07a..009d30c 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -2638,6 +2638,7 @@ static void sd_probe_async(void *data, async_cookie_t cookie) sd_printk(KERN_NOTICE, sdkp, Attached SCSI %sdisk\n, sdp-removable ? removable : ); + sdp-ready_to_power_off = 1; scsi_autopm_put_device(sdp); put_device(sdkp-dev); } -- 1.7.12.21.g871e293 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/4] scsi: introduce sync_before_stop flag
On Tuesday 18 September 2012 15:00:28 Aaron Lu wrote: When scsi device received stop command, it will take care of its internal cache before enters stopped power condition. This command is translated to standby immediate in libata-scsi, but standby doesn't imply flush cache for ATA device, so to issue stop command to ATA device, an additional flush cache has to be issued. Why not just set WCE? Regards Oliver -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/4] scsi: introduce sync_before_stop flag
On Tue, Sep 18, 2012 at 09:30:11AM +0200, Oliver Neukum wrote: On Tuesday 18 September 2012 15:00:28 Aaron Lu wrote: When scsi device received stop command, it will take care of its internal cache before enters stopped power condition. This command is translated to standby immediate in libata-scsi, but standby doesn't imply flush cache for ATA device, so to issue stop command to ATA device, an additional flush cache has to be issued. Why not just set WCE? This flag is used for devices whose WCE bit is set. This flag means, when we are to issue a scsi stop command, we need to issue an additional sync cache command first. For scsi device, per the spec, there is no such need as the device will take care of its internal cache when going to stop power condition. But for ata device, the stop command is translated to standby immediate, and we have to flush the internal cache before enter standby. -Aaron -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/4] scsi: introduce sync_before_stop flag
On Tue, 2012-09-18 at 15:00 +0800, Aaron Lu wrote: When scsi device received stop command, it will take care of its internal cache before enters stopped power condition. This command is translated to standby immediate in libata-scsi, but standby doesn't imply flush cache for ATA device, so to issue stop command to ATA device, an additional flush cache has to be issued. Introduce this flag so that when we are to stop the ATA disk in scsi disk driver, also flush its internal cache. Signed-off-by: Aaron Lu aaron...@intel.com --- include/scsi/scsi_device.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index 4712aa1..26c3621 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -160,6 +160,7 @@ struct scsi_device { unsigned ready_to_power_off:1; /* Device is ready to be powered off */ unsigned powered_off:1; /* Device is powered off */ unsigned may_power_off:1; /* Power off is allowed by user */ + unsigned sync_before_stop:1; /* Sync cache before stop */ Why do you need this? Surely it's all conditioned on the WCE flag. If WCE isn't set, the cache is write through (or uncached) and there's no need for a sync before power down. James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/4] scsi: introduce sync_before_stop flag
On Tue, Sep 18, 2012 at 08:56:55AM +0100, James Bottomley wrote: On Tue, 2012-09-18 at 15:00 +0800, Aaron Lu wrote: When scsi device received stop command, it will take care of its internal cache before enters stopped power condition. This command is translated to standby immediate in libata-scsi, but standby doesn't imply flush cache for ATA device, so to issue stop command to ATA device, an additional flush cache has to be issued. Introduce this flag so that when we are to stop the ATA disk in scsi disk driver, also flush its internal cache. Signed-off-by: Aaron Lu aaron...@intel.com --- include/scsi/scsi_device.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index 4712aa1..26c3621 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -160,6 +160,7 @@ struct scsi_device { unsigned ready_to_power_off:1; /* Device is ready to be powered off */ unsigned powered_off:1; /* Device is powered off */ unsigned may_power_off:1; /* Power off is allowed by user */ + unsigned sync_before_stop:1; /* Sync cache before stop */ Why do you need this? Surely it's all conditioned on the WCE flag. If WCE isn't set, the cache is write through (or uncached) and there's no need for a sync before power down. The set of this flag doesn't mean we will sync cache for sure. It's only meaningful when WCE is set, and in that case, it means when we are to send a stop command to the device, do we need to send an additional flush cache command first? In sd_suspend, the cache will be synchronized when: 1 For devices do not support start_stop, always; 2 For devices support start_stop, if it is standard scsi device, never; and if it is an ata device(reflected by this newly introduced flag), always. The reason for this is, the scsi's stop command = flush_cache + standby for ata, so I have to introduce this flag to explicitely do a cache flush. Makes sense? -Aaron -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/4] scsi: introduce sync_before_stop flag
On Tue, 2012-09-18 at 16:09 +0800, Aaron Lu wrote: On Tue, Sep 18, 2012 at 08:56:55AM +0100, James Bottomley wrote: On Tue, 2012-09-18 at 15:00 +0800, Aaron Lu wrote: When scsi device received stop command, it will take care of its internal cache before enters stopped power condition. This command is translated to standby immediate in libata-scsi, but standby doesn't imply flush cache for ATA device, so to issue stop command to ATA device, an additional flush cache has to be issued. Introduce this flag so that when we are to stop the ATA disk in scsi disk driver, also flush its internal cache. Signed-off-by: Aaron Lu aaron...@intel.com --- include/scsi/scsi_device.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index 4712aa1..26c3621 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -160,6 +160,7 @@ struct scsi_device { unsigned ready_to_power_off:1; /* Device is ready to be powered off */ unsigned powered_off:1; /* Device is powered off */ unsigned may_power_off:1; /* Power off is allowed by user */ + unsigned sync_before_stop:1; /* Sync cache before stop */ Why do you need this? Surely it's all conditioned on the WCE flag. If WCE isn't set, the cache is write through (or uncached) and there's no need for a sync before power down. The set of this flag doesn't mean we will sync cache for sure. It's only meaningful when WCE is set, and in that case, it means when we are to send a stop command to the device, do we need to send an additional flush cache command first? Then surely it indicates support for ACPI power down and it's wrongly named? In sd_suspend, the cache will be synchronized when: 1 For devices do not support start_stop, always; 2 For devices support start_stop, if it is standard scsi device, never; and if it is an ata device(reflected by this newly introduced flag), always. This doesn't look right to me. I think it's probably just a layering violation. For sd, we need to use the standard SCSI commands. That means we treat power states as the SCSI standard says. The fact that ATA devices may be required to translate START_STOP_UNIT with STANDBY as a flush followed by one of the ATA standby commands. This is very important: if we construct a libata SATL that doesn't conform to the standards, things will eventually explode when we try to interact with devices with their own internal SATL (like the LSI card, or various USB devices) because eventually we'll make one unexpected interaction too many. The reason for this is, the scsi's stop command = flush_cache + standby for ata, so I have to introduce this flag to explicitely do a cache flush. Makes sense? Really, no: your flag equates to this is an ATA device, treat it differently that's knowledge that's not supposed to be in sd unless absolutely necessary. James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/4] scsi: introduce sync_before_stop flag
On Tue, Sep 18, 2012 at 09:20:20AM +0100, James Bottomley wrote: On Tue, 2012-09-18 at 16:09 +0800, Aaron Lu wrote: On Tue, Sep 18, 2012 at 08:56:55AM +0100, James Bottomley wrote: On Tue, 2012-09-18 at 15:00 +0800, Aaron Lu wrote: When scsi device received stop command, it will take care of its internal cache before enters stopped power condition. This command is translated to standby immediate in libata-scsi, but standby doesn't imply flush cache for ATA device, so to issue stop command to ATA device, an additional flush cache has to be issued. Introduce this flag so that when we are to stop the ATA disk in scsi disk driver, also flush its internal cache. Signed-off-by: Aaron Lu aaron...@intel.com --- include/scsi/scsi_device.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index 4712aa1..26c3621 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -160,6 +160,7 @@ struct scsi_device { unsigned ready_to_power_off:1; /* Device is ready to be powered off */ unsigned powered_off:1; /* Device is powered off */ unsigned may_power_off:1; /* Power off is allowed by user */ + unsigned sync_before_stop:1; /* Sync cache before stop */ Why do you need this? Surely it's all conditioned on the WCE flag. If WCE isn't set, the cache is write through (or uncached) and there's no need for a sync before power down. The set of this flag doesn't mean we will sync cache for sure. It's only meaningful when WCE is set, and in that case, it means when we are to send a stop command to the device, do we need to send an additional flush cache command first? Then surely it indicates support for ACPI power down and it's wrongly named? It's generic ATA requirement that before standby, cache has to be flushed and unlike scsi, standby does not imply cache flush. In sd_suspend, the cache will be synchronized when: 1 For devices do not support start_stop, always; 2 For devices support start_stop, if it is standard scsi device, never; and if it is an ata device(reflected by this newly introduced flag), always. This doesn't look right to me. I think it's probably just a layering violation. For sd, we need to use the standard SCSI commands. That means we treat power states as the SCSI standard says. The fact that ATA devices may be required to translate START_STOP_UNIT with STANDBY as a flush followed by one of the ATA standby commands. This is very important: if we construct a libata SATL that doesn't conform to the standards, things will eventually explode when we try to interact with devices with their own internal SATL (like the LSI card, or various USB devices) because eventually we'll make one unexpected interaction too many. I agree that it is better handled in libata's SALT, I tried to do this but didn't find a good way so I introduced this flag. The SALT is 1-1 mapping, I'm not sure how to handle this 1-2 mapping. I'll check this again to see how to do it there. Thanks for your comments. -Aaron -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 22/24] scsi: eesox: use __iomem pointers for MMIO
On Monday 17 September 2012, Russell King - ARM Linux wrote: In both of my replies, I've said as x86 does. We need to follow x86's behaviour here, which is as I've quoted - it's not a matter that I need to make up my mind - my mind is already well made up. That is, we need to follow x86 here. That is, const volatile void __iomem * for reads, and volatile void __iomem * for writes. Ok, I'll keep the patch in the series then, as it only changes the pointer that we do an MMIO write on, not the ones for MMIO read. Arnd -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [V4 PATCH 0/8] csiostor: Chelsio FCoE offload driver submission
On Tue, 2012-09-18 at 09:54 +0530, Naresh Kumar Inna wrote: Hi James, Could you please consider merging version V4 of the driver patches, if you think they are in good shape now? Well, definitely not yet; you seem to have missed the memo on readq: CC [M] drivers/scsi/cxgbi/cxgb4i/cxgb4i.o drivers/scsi/csiostor/csio_hw.c: In function ‘csio_hw_mc_read’: drivers/scsi/csiostor/csio_hw.c:194:3: error: implicit declaration of function ‘readq’ [-Werror=implicit-function-declaration] It's only defined on platforms which can support an atomic 64 bit operation (which is mostly not any 32 bit platforms) ... so this could do with compile testing on those. To see how to handle readq/writeq in the 32 bit case, see the uses in fnic or qla2xxx You also have a couple of unnecessary version.h includes. Since you're a new driver, could you not do a correctly unlocked queuecommand routine? You'll find the way you've currently got it coded (holding the host lock for the entire queuecommand routine) is very performance detrimental. You have a lot of locking statements which aren't easy to audit by hand because there are multiple unlocks. Things like this: csio_scan_finished(struct Scsi_Host *shost, unsigned long time) { struct csio_lnode *ln = shost_priv(shost); int rv = 0; spin_lock_irq(shost-host_lock); if (!ln-hwp || csio_list_deleted(ln-sm.sm_list)) { spin_unlock_irq(shost-host_lock); return 1; } rv = csio_scan_done(ln, jiffies, time, csio_max_scan_tmo * HZ, csio_delta_scan_tmo * HZ); spin_unlock_irq(shost-host_lock); return rv; } Are better coded as csio_scan_finished(struct Scsi_Host *shost, unsigned long time) { struct csio_lnode *ln = shost_priv(shost); int rv = 1; spin_lock_irq(shost-host_lock); if (!ln-hwp || csio_list_deleted(ln-sm.sm_list)) goto out; rv = csio_scan_done(ln, jiffies, time, csio_max_scan_tmo * HZ, csio_delta_scan_tmo * HZ); out: spin_unlock_irq(shost-host_lock); return rv; } It's shorter and the unlock clearly matches the lock. You could even invert the if logic and just make the csio_scan_done() conditional on it avoiding the goto. I'd also really like the people who understand FC to take a look over this as well. James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Kernel crash with unsupported DIF protection type
On 09/18/2012 11:30 AM, Douglas Gilbert wrote: On 12-09-18 11:04 AM, Hannes Reinecke wrote: Hi all, I recently got my hands on some weird drives, insisting on having been formatted with protection type 7: # sg_readcap --16 /dev/sdb Read Capacity results: Protection: prot_en=1, p_type=6, p_i_exponent=0 Logical block provisioning: lbpme=0, lbprz=0 Last logical block address=1758174767 (0x68cb9e2f), Number of logical blocks=1758174768 Logical block length=512 bytes Logical blocks per physical block exponent=0 Lowest aligned logical block address=0 Hence: Device size: 900185481216 bytes, 858483.8 MiB, 900.19 GB (I know. I've already complained.) However, this drive causes a horrible crash: sd 0:0:1:0: [sdb] formatted with unsupported protection type 7. Disabling disk! sd 0:0:1:0: [sdb] 1758174768 512-byte logical blocks: (900 GB/838 GiB) sd 0:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE sd 0:0:1:0: [sdb] Sense Key : Medium Error [current] sd 0:0:1:0: [sdb] Add. Sense: Medium format corrupted sd 0:0:1:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00 end_request: I/O error, dev sdb, sector 0 Buffer I/O error on device sdb, logical block 0 [ tons and tons of I/O errors ] [ 15.551689] sd 0:0:1:0: [sdb] Enabling DIX T10-DIF-TYPE1-CRC protection [ 15.561353] [ cut here ] [ 15.569416] kernel BUG at /usr/src/linux-3.0/drivers/scsi/scsi_lib.c:1069! There are several odd things happening here: - It says 'Disabling disk', which _should_ have set the capacity to '0': drivers/scsi/sd.c:sd_read_protection_type() if (type SD_DIF_TYPE3_PROTECTION) { sd_printk(KERN_ERR, sdkp, formatted with unsupported \ protection type %u. Disabling disk!\n, type); sdkp-capacity = 0; return; } - it enables type 1 protection, which it evidently is not. I've attached a tentative patch, which allows the system to boot. However, I'm not completely happy with that, as the capacity is _still_ updated after revalidation: sd 0:0:1:0: [sdb] formatted with unsupported protection type 7. Disabling disk! sd 0:0:1:0: [sdb] 0 512-byte logical blocks: (0 B/0 B) sd 0:0:1:0: [sdb] Write Protect is off sd 0:0:1:0: [sdb] Mode Sense: cf 00 10 08 sd 0:0:1:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA sd 0:0:1:0: [sdb] 1758174768 512-byte logical blocks: (900 GB/838 GiB) sd 0:0:1:0: [sdb] Attached SCSI disk Thoughts? The Medium format corrupted additional sense qualifier occurs (with Seagate disks) when a FORMAT UNIT command is interrupted. So maybe, for good measure, that disk also sets the DIF protection type to an unsupported value (i.e. 7). So re-doing a FORMAT UNIT may clear that state. Yeah, that's what I'm doing now. Incidentally: sg_format manpage says: Format with type 1 protection: sg_format --format --fmtpinfo=3 --pfu /dev/sdm but the '--pfu' option requires an argument. According to sbc3r23 the command should read: sg_format --format --fmtpinfo=3 --pfu=1 /dev/sdm Correct? Obviously the kernel should not crash when faced with such a disk. Precisely. Especially the 'Disabling disk' no-op is worrying. Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Kernel crash with unsupported DIF protection type
On 12-09-18 11:35 AM, Hannes Reinecke wrote: On 09/18/2012 11:30 AM, Douglas Gilbert wrote: On 12-09-18 11:04 AM, Hannes Reinecke wrote: Hi all, I recently got my hands on some weird drives, insisting on having been formatted with protection type 7: # sg_readcap --16 /dev/sdb Read Capacity results: Protection: prot_en=1, p_type=6, p_i_exponent=0 Logical block provisioning: lbpme=0, lbprz=0 Last logical block address=1758174767 (0x68cb9e2f), Number of logical blocks=1758174768 Logical block length=512 bytes Logical blocks per physical block exponent=0 Lowest aligned logical block address=0 Hence: Device size: 900185481216 bytes, 858483.8 MiB, 900.19 GB (I know. I've already complained.) However, this drive causes a horrible crash: sd 0:0:1:0: [sdb] formatted with unsupported protection type 7. Disabling disk! sd 0:0:1:0: [sdb] 1758174768 512-byte logical blocks: (900 GB/838 GiB) sd 0:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE sd 0:0:1:0: [sdb] Sense Key : Medium Error [current] sd 0:0:1:0: [sdb] Add. Sense: Medium format corrupted sd 0:0:1:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00 end_request: I/O error, dev sdb, sector 0 Buffer I/O error on device sdb, logical block 0 [ tons and tons of I/O errors ] [ 15.551689] sd 0:0:1:0: [sdb] Enabling DIX T10-DIF-TYPE1-CRC protection [ 15.561353] [ cut here ] [ 15.569416] kernel BUG at /usr/src/linux-3.0/drivers/scsi/scsi_lib.c:1069! There are several odd things happening here: - It says 'Disabling disk', which _should_ have set the capacity to '0': drivers/scsi/sd.c:sd_read_protection_type() if (type SD_DIF_TYPE3_PROTECTION) { sd_printk(KERN_ERR, sdkp, formatted with unsupported \ protection type %u. Disabling disk!\n, type); sdkp-capacity = 0; return; } - it enables type 1 protection, which it evidently is not. I've attached a tentative patch, which allows the system to boot. However, I'm not completely happy with that, as the capacity is _still_ updated after revalidation: sd 0:0:1:0: [sdb] formatted with unsupported protection type 7. Disabling disk! sd 0:0:1:0: [sdb] 0 512-byte logical blocks: (0 B/0 B) sd 0:0:1:0: [sdb] Write Protect is off sd 0:0:1:0: [sdb] Mode Sense: cf 00 10 08 sd 0:0:1:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA sd 0:0:1:0: [sdb] 1758174768 512-byte logical blocks: (900 GB/838 GiB) sd 0:0:1:0: [sdb] Attached SCSI disk Thoughts? The Medium format corrupted additional sense qualifier occurs (with Seagate disks) when a FORMAT UNIT command is interrupted. So maybe, for good measure, that disk also sets the DIF protection type to an unsupported value (i.e. 7). So re-doing a FORMAT UNIT may clear that state. Yeah, that's what I'm doing now. Incidentally: sg_format manpage says: Format with type 1 protection: sg_format --format --fmtpinfo=3 --pfu /dev/sdm but the '--pfu' option requires an argument. According to sbc3r23 the command should read: sg_format --format --fmtpinfo=3 --pfu=1 /dev/sdm Correct? Yes. And it is corrected in the 1.34 betas that I have sent you :-) I placed a copy of my latest sg3_utils-1.34 beta in the News section at the top of this page: http://sg.danny.cz/sg Doug Gilbert -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/8] block: Consolidate command flag and queue limit checks for merges
From: Martin K. Petersen martin.peter...@oracle.com - blk_check_merge_flags() verifies that cmd_flags / bi_rw are compatible. This function is called for both req-req and req-bio merging. - blk_rq_get_max_sectors() and blk_queue_get_max_sectors() can be used to query the maximum sector count for a given request or queue. The calls will return the right value from the queue limits given the type of command (RW, discard, write same, etc.) Signed-off-by: Martin K. Petersen martin.peter...@oracle.com --- block/blk-core.c |3 +-- block/blk-merge.c | 30 -- include/linux/blkdev.h | 31 +++ 3 files changed, 44 insertions(+), 20 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 5cc2929..33eded0 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1866,8 +1866,7 @@ int blk_rq_check_limits(struct request_queue *q, struct request *rq) if (!rq_mergeable(rq)) return 0; - if (blk_rq_sectors(rq) queue_max_sectors(q) || - blk_rq_bytes(rq) queue_max_hw_sectors(q) 9) { + if (blk_rq_sectors(rq) blk_queue_get_max_sectors(q, rq-cmd_flags)) { printk(KERN_ERR %s: over max size limit.\n, __func__); return -EIO; } diff --git a/block/blk-merge.c b/block/blk-merge.c index 86710ca..642b862 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -275,14 +275,8 @@ no_merge: int ll_back_merge_fn(struct request_queue *q, struct request *req, struct bio *bio) { - unsigned short max_sectors; - - if (unlikely(req-cmd_type == REQ_TYPE_BLOCK_PC)) - max_sectors = queue_max_hw_sectors(q); - else - max_sectors = queue_max_sectors(q); - - if (blk_rq_sectors(req) + bio_sectors(bio) max_sectors) { + if (blk_rq_sectors(req) + bio_sectors(bio) + blk_rq_get_max_sectors(req)) { req-cmd_flags |= REQ_NOMERGE; if (req == q-last_merge) q-last_merge = NULL; @@ -299,15 +293,8 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req, int ll_front_merge_fn(struct request_queue *q, struct request *req, struct bio *bio) { - unsigned short max_sectors; - - if (unlikely(req-cmd_type == REQ_TYPE_BLOCK_PC)) - max_sectors = queue_max_hw_sectors(q); - else - max_sectors = queue_max_sectors(q); - - - if (blk_rq_sectors(req) + bio_sectors(bio) max_sectors) { + if (blk_rq_sectors(req) + bio_sectors(bio) + blk_rq_get_max_sectors(req)) { req-cmd_flags |= REQ_NOMERGE; if (req == q-last_merge) q-last_merge = NULL; @@ -338,7 +325,8 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req, /* * Will it become too large? */ - if ((blk_rq_sectors(req) + blk_rq_sectors(next)) queue_max_sectors(q)) + if ((blk_rq_sectors(req) + blk_rq_sectors(next)) + blk_rq_get_max_sectors(req)) return 0; total_phys_segments = req-nr_phys_segments + next-nr_phys_segments; @@ -417,6 +405,9 @@ static int attempt_merge(struct request_queue *q, struct request *req, if (!rq_mergeable(req) || !rq_mergeable(next)) return 0; + if (!blk_check_merge_flags(req-cmd_flags, next-cmd_flags)) + return 0; + /* * not contiguous */ @@ -512,6 +503,9 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio) if (!rq_mergeable(rq) || !bio_mergeable(bio)) return false; + if (!blk_check_merge_flags(rq-cmd_flags, bio-bi_rw)) + return false; + /* different data direction or already started, don't merge */ if (bio_data_dir(bio) != rq_data_dir(rq)) return false; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 3a6fea7..90f7abe 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -605,6 +605,18 @@ static inline bool rq_mergeable(struct request *rq) return true; } +static inline bool blk_check_merge_flags(unsigned int flags1, +unsigned int flags2) +{ + if ((flags1 REQ_DISCARD) != (flags2 REQ_DISCARD)) + return false; + + if ((flags1 REQ_SECURE) != (flags2 REQ_SECURE)) + return false; + + return true; +} + /* * q-prep_rq_fn return values */ @@ -800,6 +812,25 @@ static inline unsigned int blk_rq_cur_sectors(const struct request *rq) return blk_rq_cur_bytes(rq) 9; } +static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q, +unsigned int cmd_flags) +{ + if (unlikely(cmd_flags REQ_DISCARD)) + return
[PATCH 4/8] block: Make blkdev_issue_zeroout use WRITE SAME
From: Martin K. Petersen martin.peter...@oracle.com If the device supports WRITE SAME, use that to optimize zeroing of blocks. If the device does not support WRITE SAME or if the operation fails, fall back to writing zeroes the old-fashioned way. Signed-off-by: Martin K. Petersen martin.peter...@oracle.com Acked-by: Mike Snitzer snit...@redhat.com --- block/blk-lib.c | 30 +- 1 files changed, 29 insertions(+), 1 deletions(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index a062543..9373b58 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -214,7 +214,7 @@ EXPORT_SYMBOL(blkdev_issue_write_same); * Generate and issue number of bios with zerofiled pages. */ -int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, +int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask) { int ret; @@ -264,4 +264,32 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, return ret; } + +/** + * blkdev_issue_zeroout - zero-fill a block range + * @bdev: blockdev to write + * @sector:start sector + * @nr_sects: number of sectors to write + * @gfp_mask: memory allocation flags (for bio_alloc) + * + * Description: + * Generate and issue number of bios with zerofiled pages. + */ + +int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, +sector_t nr_sects, gfp_t gfp_mask) +{ + if (bdev_write_same(bdev)) { + unsigned char bdn[BDEVNAME_SIZE]; + + if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask, +ZERO_PAGE(0))) + return 0; + + bdevname(bdev, bdn); + pr_err(%s: WRITE SAME failed. Manually zeroing.\n, bdn); + } + + return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask); +} EXPORT_SYMBOL(blkdev_issue_zeroout); -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/8] block: Clean up special command handling logic
From: Martin K. Petersen martin.peter...@oracle.com Remove special-casing of non-rw fs style requests (discard). The nomerge flags are consolidated in blk_types.h, and rq_mergeable() and bio_mergeable() have been modified to use them. bio_is_rw() is used in place of bio_has_data() a few places. This is done to to distinguish true reads and writes from other fs type requests that carry a payload (e.g. write same). Signed-off-by: Martin K. Petersen martin.peter...@oracle.com Acked-by: Mike Snitzer snit...@redhat.com --- block/blk-core.c | 13 ++--- block/blk-merge.c | 22 +- block/blk.h |5 ++--- block/elevator.c |6 ++ include/linux/bio.h | 23 +-- include/linux/blk_types.h |4 include/linux/blkdev.h| 22 ++ 7 files changed, 46 insertions(+), 49 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 2d739ca..5cc2929 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1657,8 +1657,8 @@ generic_make_request_checks(struct bio *bio) goto end_io; } - if (unlikely(!(bio-bi_rw REQ_DISCARD) -nr_sectors queue_max_hw_sectors(q))) { + if (likely(bio_is_rw(bio) + nr_sectors queue_max_hw_sectors(q))) { printk(KERN_ERR bio too big device %s (%u %u)\n, bdevname(bio-bi_bdev, b), bio_sectors(bio), @@ -1699,8 +1699,7 @@ generic_make_request_checks(struct bio *bio) if ((bio-bi_rw REQ_DISCARD) (!blk_queue_discard(q) || -((bio-bi_rw REQ_SECURE) - !blk_queue_secdiscard(q { +((bio-bi_rw REQ_SECURE) !blk_queue_secdiscard(q { err = -EOPNOTSUPP; goto end_io; } @@ -1818,7 +1817,7 @@ void submit_bio(int rw, struct bio *bio) * If it's a regular read/write or a barrier with data attached, * go through the normal accounting stuff before submission. */ - if (bio_has_data(bio) !(rw REQ_DISCARD)) { + if (bio_has_data(bio)) { if (rw WRITE) { count_vm_events(PGPGOUT, count); } else { @@ -1864,7 +1863,7 @@ EXPORT_SYMBOL(submit_bio); */ int blk_rq_check_limits(struct request_queue *q, struct request *rq) { - if (rq-cmd_flags REQ_DISCARD) + if (!rq_mergeable(rq)) return 0; if (blk_rq_sectors(rq) queue_max_sectors(q) || @@ -2338,7 +2337,7 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes) req-buffer = bio_data(req-bio); /* update sector only for requests with clear definition of sector */ - if (req-cmd_type == REQ_TYPE_FS || (req-cmd_flags REQ_DISCARD)) + if (req-cmd_type == REQ_TYPE_FS) req-__sector += total_bytes 9; /* mixed attributes always follow the first bio */ diff --git a/block/blk-merge.c b/block/blk-merge.c index e76279e..86710ca 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -418,18 +418,6 @@ static int attempt_merge(struct request_queue *q, struct request *req, return 0; /* -* Don't merge file system requests and discard requests -*/ - if ((req-cmd_flags REQ_DISCARD) != (next-cmd_flags REQ_DISCARD)) - return 0; - - /* -* Don't merge discard requests and secure discard requests -*/ - if ((req-cmd_flags REQ_SECURE) != (next-cmd_flags REQ_SECURE)) - return 0; - - /* * not contiguous */ if (blk_rq_pos(req) + blk_rq_sectors(req) != blk_rq_pos(next)) @@ -521,15 +509,7 @@ int blk_attempt_req_merge(struct request_queue *q, struct request *rq, bool blk_rq_merge_ok(struct request *rq, struct bio *bio) { - if (!rq_mergeable(rq)) - return false; - - /* don't merge file system requests and discard requests */ - if ((bio-bi_rw REQ_DISCARD) != (rq-bio-bi_rw REQ_DISCARD)) - return false; - - /* don't merge discard requests and secure discard requests */ - if ((bio-bi_rw REQ_SECURE) != (rq-bio-bi_rw REQ_SECURE)) + if (!rq_mergeable(rq) || !bio_mergeable(bio)) return false; /* different data direction or already started, don't merge */ diff --git a/block/blk.h b/block/blk.h index 2a0ea32..ca51543 100644 --- a/block/blk.h +++ b/block/blk.h @@ -171,14 +171,13 @@ static inline int queue_congestion_off_threshold(struct request_queue *q) * * a) it's attached to a gendisk, and * b) the queue had IO stats enabled when this request was started, and - * c) it's a file system request or a discard request + * c) it's a file system request */ static inline int blk_do_io_stat(struct request *rq) { return rq-rq_disk
[PATCH 5/8] block: ioctl to zero block ranges
From: Martin K. Petersen martin.peter...@oracle.com Introduce a BLKZEROOUT ioctl which can be used to clear block ranges by way of blkdev_issue_zeroout(). Signed-off-by: Martin K. Petersen martin.peter...@oracle.com Acked-by: Mike Snitzer snit...@redhat.com --- block/ioctl.c | 27 +++ include/linux/fs.h |1 + 2 files changed, 28 insertions(+), 0 deletions(-) diff --git a/block/ioctl.c b/block/ioctl.c index 4476e0e8..769d296 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -185,6 +185,22 @@ static int blk_ioctl_discard(struct block_device *bdev, uint64_t start, return blkdev_issue_discard(bdev, start, len, GFP_KERNEL, flags); } +static int blk_ioctl_zeroout(struct block_device *bdev, uint64_t start, +uint64_t len) +{ + if (start 511) + return -EINVAL; + if (len 511) + return -EINVAL; + start = 9; + len = 9; + + if (start + len (i_size_read(bdev-bd_inode) 9)) + return -EINVAL; + + return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL); +} + static int put_ushort(unsigned long arg, unsigned short val) { return put_user(val, (unsigned short __user *)arg); @@ -300,6 +316,17 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd, return blk_ioctl_discard(bdev, range[0], range[1], cmd == BLKSECDISCARD); } + case BLKZEROOUT: { + uint64_t range[2]; + + if (!(mode FMODE_WRITE)) + return -EBADF; + + if (copy_from_user(range, (void __user *)arg, sizeof(range))) + return -EFAULT; + + return blk_ioctl_zeroout(bdev, range[0], range[1]); + } case HDIO_GETGEO: { struct hd_geometry geo; diff --git a/include/linux/fs.h b/include/linux/fs.h index aa11047..bd6f6e7 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -335,6 +335,7 @@ struct inodes_stat_t { #define BLKDISCARDZEROES _IO(0x12,124) #define BLKSECDISCARD _IO(0x12,125) #define BLKROTATIONAL _IO(0x12,126) +#define BLKZEROOUT _IO(0x12,127) #define BMAP_IOCTL 1 /* obsolete - kept for compatibility */ #define FIBMAP_IO(0x00,1) /* bmap access */ -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/8] scsi: Add a report opcode helper
From: Martin K. Petersen martin.peter...@oracle.com The REPORT SUPPORTED OPERATION CODES command can be used to query whether a given opcode is supported by a device. Add a helper function that allows us to look up commands. We only issue RSOC if the device reports compliance with SPC-3 or later. But to err on the side of caution we disable the command for ATA, FireWire and USB. Signed-off-by: Martin K. Petersen martin.peter...@oracle.com Acked-by: Mike Snitzer snit...@redhat.com --- drivers/ata/libata-scsi.c |1 + drivers/firewire/sbp2.c|1 + drivers/scsi/scsi.c| 45 drivers/usb/storage/scsiglue.c |3 ++ include/scsi/scsi_device.h |3 ++ 5 files changed, 53 insertions(+), 0 deletions(-) diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index 8ec81ca..86d43b2 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -1052,6 +1052,7 @@ static void ata_scsi_sdev_config(struct scsi_device *sdev) { sdev-use_10_for_rw = 1; sdev-use_10_for_ms = 1; + sdev-no_report_opcodes = 1; /* Schedule policy is determined by -qc_defer() callback and * it needs to see every deferred qc. Set dev_blocked to 1 to diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c index 1162d6b..f82e9d4 100644 --- a/drivers/firewire/sbp2.c +++ b/drivers/firewire/sbp2.c @@ -1546,6 +1546,7 @@ static int sbp2_scsi_slave_configure(struct scsi_device *sdev) struct sbp2_logical_unit *lu = sdev-hostdata; sdev-use_10_for_rw = 1; + sdev-no_report_opcodes = 1; if (sbp2_param_exclusive_login) sdev-manage_start_stop = 1; diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 2936b44..2c0d0ec 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -55,6 +55,7 @@ #include linux/cpu.h #include linux/mutex.h #include linux/async.h +#include asm/unaligned.h #include scsi/scsi.h #include scsi/scsi_cmnd.h @@ -1062,6 +1063,50 @@ int scsi_get_vpd_page(struct scsi_device *sdev, u8 page, unsigned char *buf, EXPORT_SYMBOL_GPL(scsi_get_vpd_page); /** + * scsi_report_opcode - Find out if a given command opcode is supported + * @sdev: scsi device to query + * @buffer:scratch buffer (must be at least 20 bytes long) + * @len: length of buffer + * @opcode:opcode for command to look up + * + * Uses the REPORT SUPPORTED OPERATION CODES to look up the given + * opcode. Returns 0 if RSOC fails or if the command opcode is + * unsupported. Returns 1 if the device claims to support the command. + */ +int scsi_report_opcode(struct scsi_device *sdev, unsigned char *buffer, + unsigned int len, unsigned char opcode) +{ + unsigned char cmd[16]; + struct scsi_sense_hdr sshdr; + int result; + + if (sdev-no_report_opcodes || sdev-scsi_level SCSI_SPC_3) + return 0; + + memset(cmd, 0, 16); + cmd[0] = MAINTENANCE_IN; + cmd[1] = MI_REPORT_SUPPORTED_OPERATION_CODES; + cmd[2] = 1; /* One command format */ + cmd[3] = opcode; + put_unaligned_be32(len, cmd[6]); + memset(buffer, 0, len); + + result = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buffer, len, + sshdr, 30 * HZ, 3, NULL); + + if (result scsi_sense_valid(sshdr) + sshdr.sense_key == ILLEGAL_REQUEST + (sshdr.asc == 0x20 || sshdr.asc == 0x24) sshdr.ascq == 0x00) + return 0; + + if ((buffer[1] 3) == 3) /* Command supported */ + return 1; + + return 0; +} +EXPORT_SYMBOL(scsi_report_opcode); + +/** * scsi_device_get - get an additional reference to a scsi_device * @sdev: device to get a reference to * diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c index a3d5436..6ab376a 100644 --- a/drivers/usb/storage/scsiglue.c +++ b/drivers/usb/storage/scsiglue.c @@ -186,6 +186,9 @@ static int slave_configure(struct scsi_device *sdev) /* Some devices don't handle VPD pages correctly */ sdev-skip_vpd_pages = 1; + /* Do not attempt to use REPORT SUPPORTED OPERATION CODES */ + sdev-no_report_opcodes = 1; + /* Some disks return the total number of blocks in response * to READ CAPACITY rather than the highest block number. * If this device makes that mistake, tell the sd driver. */ diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index 9895f69..7d0b4f6 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -135,6 +135,7 @@ struct scsi_device { * because we did a bus reset. */ unsigned use_10_for_rw:1; /* first try 10-byte read / write */ unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */ + unsigned
[PATCH 3/8] block: Implement support for WRITE SAME
From: Martin K. Petersen martin.peter...@oracle.com The WRITE SAME command supported on some SCSI devices allows the same block to be efficiently replicated throughout a block range. Only a single logical block is transferred from the host and the storage device writes the same data to all blocks described by the I/O. This patch implements support for WRITE SAME in the block layer. The blkdev_issue_write_same() function can be used by filesystems and block drivers to replicate a buffer across a block range. This can be used to efficiently initialize software RAID devices, etc. Signed-off-by: Martin K. Petersen martin.peter...@oracle.com Acked-by: Mike Snitzer snit...@redhat.com --- Documentation/ABI/testing/sysfs-block | 14 ++ block/blk-core.c | 14 +- block/blk-lib.c | 74 + block/blk-merge.c |9 block/blk-settings.c | 16 +++ block/blk-sysfs.c | 13 ++ drivers/md/raid0.c|1 + fs/bio.c |9 +++- include/linux/bio.h |3 + include/linux/blk_types.h |5 ++- include/linux/blkdev.h| 29 + 11 files changed, 181 insertions(+), 6 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block index c1eb41c..279da08 100644 --- a/Documentation/ABI/testing/sysfs-block +++ b/Documentation/ABI/testing/sysfs-block @@ -206,3 +206,17 @@ Description: when a discarded area is read the discard_zeroes_data parameter will be set to one. Otherwise it will be 0 and the result of reading a discarded area is undefined. + +What: /sys/block/disk/queue/write_same_max_bytes +Date: January 2012 +Contact: Martin K. Petersen martin.peter...@oracle.com +Description: + Some devices support a write same operation in which a + single data block can be written to a range of several + contiguous blocks on storage. This can be used to wipe + areas on disk or to initialize drives in a RAID + configuration. write_same_max_bytes indicates how many + bytes can be written in a single write same command. If + write_same_max_bytes is 0, write same is not supported + by the device. + diff --git a/block/blk-core.c b/block/blk-core.c index 33eded0..3b08054 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1704,6 +1704,11 @@ generic_make_request_checks(struct bio *bio) goto end_io; } + if (bio-bi_rw REQ_WRITE_SAME !bdev_write_same(bio-bi_bdev)) { + err = -EOPNOTSUPP; + goto end_io; + } + /* * Various block parts want %current-io_context and lazy ioc * allocation ends up trading a lot of pain for a small amount of @@ -1809,8 +1814,6 @@ EXPORT_SYMBOL(generic_make_request); */ void submit_bio(int rw, struct bio *bio) { - int count = bio_sectors(bio); - bio-bi_rw |= rw; /* @@ -1818,6 +1821,13 @@ void submit_bio(int rw, struct bio *bio) * go through the normal accounting stuff before submission. */ if (bio_has_data(bio)) { + unsigned int count; + + if (unlikely(rw REQ_WRITE_SAME)) + count = bdev_logical_block_size(bio-bi_bdev) 9; + else + count = bio_sectors(bio); + if (rw WRITE) { count_vm_events(PGPGOUT, count); } else { diff --git a/block/blk-lib.c b/block/blk-lib.c index 19cc761..a062543 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -130,6 +130,80 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, EXPORT_SYMBOL(blkdev_issue_discard); /** + * blkdev_issue_write_same - queue a write same operation + * @bdev: target blockdev + * @sector:start sector + * @nr_sects: number of sectors to write + * @gfp_mask: memory allocation flags (for bio_alloc) + * @page: page containing data to write + * + * Description: + *Issue a write same request for the sectors in question. + */ +int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, + sector_t nr_sects, gfp_t gfp_mask, + struct page *page) +{ + DECLARE_COMPLETION_ONSTACK(wait); + struct request_queue *q = bdev_get_queue(bdev); + unsigned int max_write_same_sectors; + struct bio_batch bb; + struct bio *bio; + int ret = 0; + + if (!q) + return -ENXIO; + + max_write_same_sectors = q-limits.max_write_same_sectors; + + if (max_write_same_sectors == 0) + return -EOPNOTSUPP; + + atomic_set(bb.done, 1); +
[PATCH 8/8] sd: Implement support for WRITE SAME
From: Martin K. Petersen martin.peter...@oracle.com Implement support for WRITE SAME(10) and WRITE SAME(16) in the SCSI disk driver. - We set the default maximum to 0x because there are several devices out there that only support two-byte block counts even with WRITE SAME(16). We only enable transfers bigger than 0x if the device explicitly reports MAXIMUM WRITE SAME LENGTH in the BLOCK LIMITS VPD. - max_write_same_blocks can be overriden per-device basis in sysfs. - The UNMAP discovery heuristics remain unchanged but the discard limits are tweaked to match the real WRITE SAME commands. - In the error handling logic we now distinguish between WRITE SAME with and without UNMAP set. The discovery process heuristics are: - If the device reports a SCSI level of SPC-3 or greater we'll issue READ SUPPORTED OPERATION CODES to find out whether WRITE SAME(16) is supported. If that's the case we will use it. - If the device supports the block limits VPD and reports a MAXIMUM WRITE SAME LENGTH bigger than 0x we will use WRITE SAME(16). - Otherwise we will use WRITE SAME(10) unless the target LBA is beyond 0x or the block count exceeds 0x. - no_write_same is set for ATA, FireWire and USB. Signed-off-by: Martin K. Petersen martin.peter...@oracle.com Acked-by: Mike Snitzer snit...@redhat.com Acked-by: Jeff Garzik jgar...@redhat.com --- drivers/ata/libata-scsi.c |1 + drivers/firewire/sbp2.c|1 + drivers/scsi/scsi_lib.c| 22 - drivers/scsi/sd.c | 172 +--- drivers/scsi/sd.h |7 ++ drivers/usb/storage/scsiglue.c |3 + include/scsi/scsi_device.h |1 + 7 files changed, 191 insertions(+), 16 deletions(-) diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index 86d43b2..002f1b7 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -1053,6 +1053,7 @@ static void ata_scsi_sdev_config(struct scsi_device *sdev) sdev-use_10_for_rw = 1; sdev-use_10_for_ms = 1; sdev-no_report_opcodes = 1; + sdev-no_write_same = 1; /* Schedule policy is determined by -qc_defer() callback and * it needs to see every deferred qc. Set dev_blocked to 1 to diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c index f82e9d4..bb1b392 100644 --- a/drivers/firewire/sbp2.c +++ b/drivers/firewire/sbp2.c @@ -1547,6 +1547,7 @@ static int sbp2_scsi_slave_configure(struct scsi_device *sdev) sdev-use_10_for_rw = 1; sdev-no_report_opcodes = 1; + sdev-no_write_same = 1; if (sbp2_param_exclusive_login) sdev-manage_start_stop = 1; diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index ffd7773..874d41b 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -897,11 +897,23 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes) action = ACTION_FAIL; error = -EILSEQ; /* INVALID COMMAND OPCODE or INVALID FIELD IN CDB */ - } else if ((sshdr.asc == 0x20 || sshdr.asc == 0x24) - (cmd-cmnd[0] == UNMAP || - cmd-cmnd[0] == WRITE_SAME_16 || - cmd-cmnd[0] == WRITE_SAME)) { - description = Discard failure; + } else if (sshdr.asc == 0x20 || sshdr.asc == 0x24) { + switch (cmd-cmnd[0]) { + case UNMAP: + description = Discard failure; + break; + case WRITE_SAME: + case WRITE_SAME_16: + if (cmd-cmnd[1] 0x8) + description = Discard failure; + else + description = + Write same failure; + break; + default: + description = Invalid command failure; + break; + } action = ACTION_FAIL; error = -EREMOTEIO; } else diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index e100b5c..1cf2d5d 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -99,6 +99,7 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC); #endif static void sd_config_discard(struct scsi_disk *, unsigned int); +static void sd_config_write_same(struct scsi_disk *); static int sd_revalidate_disk(struct gendisk *); static void
Re: [V4 PATCH 0/8] csiostor: Chelsio FCoE offload driver submission
On 9/18/2012 2:06 PM, James Bottomley wrote: On Tue, 2012-09-18 at 09:54, Naresh Kumar Inna wrote: Hi James, Could you please consider merging version V4 of the driver patches, if you think they are in good shape now? Well, definitely not yet; you seem to have missed the memo on readq: CC [M] drivers/scsi/cxgbi/cxgb4i/cxgb4i.o drivers/scsi/csiostor/csio_hw.c: In function csio_hw_mc_read: drivers/scsi/csiostor/csio_hw.c:194:3: error: implicit declaration of function readq [-Werror=implicit-function-declaration] It's only defined on platforms which can support an atomic 64 bit operation (which is mostly not any 32 bit platforms) ... so this could do with compile testing on those. To see how to handle readq/writeq in the 32 bit case, see the uses in fnic or qla2xxx Thanks for reviewing. I will fix up readq/writeq, as well as other 32-bit compilation issues, if any. You also have a couple of unnecessary version.h includes. I will get rid of them. Since you're a new driver, could you not do a correctly unlocked queuecommand routine? You'll find the way you've currently got it coded (holding the host lock for the entire queuecommand routine) is very performance detrimental. Yes, I am aware of that. However, some of this code was written and tested before the lock-less queuecommand came into existence. Going the lock-less route would require me to test the driver thoroughly again. It definitely is in my to-do list, but I would like to take that up after the initial submission goes through. Would that be OK? You have a lot of locking statements which aren't easy to audit by hand because there are multiple unlocks. Things like this: csio_scan_finished(struct Scsi_Host *shost, unsigned long time) { struct csio_lnode *ln = shost_priv(shost); int rv = 0; spin_lock_irq(shost-host_lock); if (!ln-hwp || csio_list_deleted(ln-sm.sm_list)) { spin_unlock_irq(shost-host_lock); return 1; } rv = csio_scan_done(ln, jiffies, time, csio_max_scan_tmo * HZ, csio_delta_scan_tmo * HZ); spin_unlock_irq(shost-host_lock); return rv; } Are better coded as csio_scan_finished(struct Scsi_Host *shost, unsigned long time) { struct csio_lnode *ln = shost_priv(shost); int rv = 1; spin_lock_irq(shost-host_lock); if (!ln-hwp || csio_list_deleted(ln-sm.sm_list)) goto out; rv = csio_scan_done(ln, jiffies, time, csio_max_scan_tmo * HZ, csio_delta_scan_tmo * HZ); out: spin_unlock_irq(shost-host_lock); return rv; } It's shorter and the unlock clearly matches the lock. You could even invert the if logic and just make the csio_scan_done() conditional on it avoiding the goto. I will try to minimize the lock/unlock mismatch instances per your suggestions, if not eliminate them altogether. I'd also really like the people who understand FC to take a look over this as well. Sure. Thanks, Naresh. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fail to probe qla2xxx fiber channel card while doing pci hotplug
On Mon, Sep 17, 2012 at 6:06 AM, Yijing Wang wangyij...@huawei.com wrote: On 2012/9/16 11:30, Bjorn Helgaas wrote: On Sat, Sep 15, 2012 at 4:22 AM, Yijing Wang wangyij...@huawei.com wrote: Hi all, I encountered a very strange problem when I hot plug a fiber channel card(using qla2xxx driver). I did the hotplug in arch x86 machine, using pciehp driver for hotplug, this platform supports pci hot-plug triggering from both sysfs and attention button. If a hot-plug slot is empty when system boot-up, then hotplug FC card in this slot is ok. If a hot-plug slot has been embeded a FC card when system boot-up, hot-remove this card is ok, but hot-add this card will fail. I used #modprobe qla2xxx ql2xextended_error_logging=0x7fff to get all probe info. As bellow: Can anyone give me any suggestion for this problem? It sounds like you did this: 1) Power down system 2) Remove FC card from slot 3) Boot system 4) Hot-add FC card 5) Load qla2xxx driver 6) qla2xxx driver claims FC card 7) FC card works correctly 8) Power down system 9) Install FC card in slot 10) Boot system 11) Load qla2xxx driver 12) qla2xxx driver claims FC card 13) FC card works correctly I rmmod qla2xxx driver here and modprobe qla2xxx ql2xextended_error_logging=0x1e40 again for get errors info Also I modprobe pciehp pciehp_debug=1 for getting debug info 14) Hot-remove card 15) Hot-add card 16) qla2xxx driver claims FC card 17) FC card does not work and I assume the dmesg log you included is just from steps 15 and 16 (correct me if I'm wrong). It would be useful to see the entire log showing all these events so we can compare the working cases with the non-working one. If you use the pciehp_debug module parameter, we should also see some pciehp events that would help me understand that driver. Hi Bjorn, Thanks for your comments very much! My steps: 1) power down system 2) Install FC card in slot 3) Boot system 4) Load qla2xxx driver 5) qla2xxx driver claims FC card 6) FC card works correctly(at least probe return ok, I don't know qla2xxx driver much..) 7) rmmod qla2xxx 8) modprobe qla2xxx ql2xextended_error_logging=0x1e40(for get errors info) 9) modprobe pciehp pciehp_debug=1 10) Hot-remove card 11) Hot-add card 12) qla2xxx driver claims FC card fail(probe return fail, setup chip fail) --so this is failed situation-- --continue to hot-add fc card into empty slot(also support pci hp) 13) Install FC card in empty slot 14) Hot-add card 15) qla2xxx driver claims FC card ok (probe return ok) btw: If fc card firmware version 4.03, everything is ok (hot-plug in any slots(empty or not)) fc card firmware version is 4.04 or 5.04 , situation as same as 1)---12) Thanks. The FW change is a good clue. If everything works with version 4.03, but it doesn't work with version 4.04, it's likely to be a FW problem, not a Linux PCI core problem. Here's what I see from your logs. In slot 4 (bus 08), the card was present before boot, you removed it, re-added it, and it failed after being re-added. Slot 3 (bus 06) was empty at boot, you hot-added a card, and it worked. Here are the resources available on those two buses and the boot-time config of the first device in slot 4: pci :00:07.0: PCI bridge to [bus 06-07] pci :00:07.0: bridge window [io 0xc000-0xcfff] pci :00:07.0: bridge window [mem 0xf900-0xf9ff] pci :00:07.0: bridge window [mem 0xf100-0xf1ff 64bit pref] pci :00:09.0: PCI bridge to [bus 08-09] pci :00:09.0: bridge window [io 0xb000-0xbfff] pci :00:09.0: bridge window [mem 0xf800-0xf8ff] pci :00:09.0: bridge window [mem 0xf000-0xf0ff 64bit pref] pci :08:00.0: [1077:2532] type 00 class 0x0c0400 pci :08:00.0: reg 10: [io 0xb100-0xb1ff] pci :08:00.0: reg 14: [mem 0xf8084000-0xf8087fff 64bit] pci :08:00.0: reg 30: [mem 0xf804-0xf807 pref] After you remove and re-add the card in slot 4, it starts with uninitialized BARs as expected, then we assign resources to it. It's sort of interesting that the BIOS had originally put the ROM (reg 30) in the non-prefetchable window, while after the hot-add, Linux places it in the prefetchable window. Either should work, and in fact the card you added in slot 3 *does* work with its ROM in the prefetchable window. pci :08:00.0: [1077:2532] type 00 class 0x0c0400 pci :08:00.0: reg 10: [io 0x-0x00ff] pci :08:00.0: reg 14: [mem 0x-0x3fff 64bit] pci :08:00.0: reg 30: [mem 0x-0x0003 pref] pci :08:00.0: BAR 0: assigned [io 0xb000-0xb0ff] pci :08:00.0: BAR 1: assigned [mem 0xf800-0xf8003fff 64bit] pci :08:00.0: BAR 6: assigned [mem
Re: [PATCH 2/8] block: Consolidate command flag and queue limit checks for merges
On Tue, Sep 18 2012 at 12:19pm -0400, Martin K. Petersen martin.peter...@oracle.com wrote: From: Martin K. Petersen martin.peter...@oracle.com - blk_check_merge_flags() verifies that cmd_flags / bi_rw are compatible. This function is called for both req-req and req-bio merging. - blk_rq_get_max_sectors() and blk_queue_get_max_sectors() can be used to query the maximum sector count for a given request or queue. The calls will return the right value from the queue limits given the type of command (RW, discard, write same, etc.) Signed-off-by: Martin K. Petersen martin.peter...@oracle.com --- block/blk-core.c |3 +-- block/blk-merge.c | 30 -- include/linux/blkdev.h | 31 +++ 3 files changed, 44 insertions(+), 20 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 5cc2929..33eded0 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1866,8 +1866,7 @@ int blk_rq_check_limits(struct request_queue *q, struct request *rq) if (!rq_mergeable(rq)) return 0; - if (blk_rq_sectors(rq) queue_max_sectors(q) || - blk_rq_bytes(rq) queue_max_hw_sectors(q) 9) { + if (blk_rq_sectors(rq) blk_queue_get_max_sectors(q, rq-cmd_flags)) { printk(KERN_ERR %s: over max size limit.\n, __func__); return -EIO; } diff --git a/block/blk-merge.c b/block/blk-merge.c index 86710ca..642b862 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -275,14 +275,8 @@ no_merge: int ll_back_merge_fn(struct request_queue *q, struct request *req, struct bio *bio) { - unsigned short max_sectors; - - if (unlikely(req-cmd_type == REQ_TYPE_BLOCK_PC)) - max_sectors = queue_max_hw_sectors(q); - else - max_sectors = queue_max_sectors(q); - - if (blk_rq_sectors(req) + bio_sectors(bio) max_sectors) { + if (blk_rq_sectors(req) + bio_sectors(bio) + blk_rq_get_max_sectors(req)) { req-cmd_flags |= REQ_NOMERGE; if (req == q-last_merge) q-last_merge = NULL; @@ -299,15 +293,8 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req, int ll_front_merge_fn(struct request_queue *q, struct request *req, struct bio *bio) { - unsigned short max_sectors; - - if (unlikely(req-cmd_type == REQ_TYPE_BLOCK_PC)) - max_sectors = queue_max_hw_sectors(q); - else - max_sectors = queue_max_sectors(q); - - - if (blk_rq_sectors(req) + bio_sectors(bio) max_sectors) { + if (blk_rq_sectors(req) + bio_sectors(bio) + blk_rq_get_max_sectors(req)) { req-cmd_flags |= REQ_NOMERGE; if (req == q-last_merge) q-last_merge = NULL; @@ -338,7 +325,8 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req, /* * Will it become too large? */ - if ((blk_rq_sectors(req) + blk_rq_sectors(next)) queue_max_sectors(q)) + if ((blk_rq_sectors(req) + blk_rq_sectors(next)) + blk_rq_get_max_sectors(req)) return 0; total_phys_segments = req-nr_phys_segments + next-nr_phys_segments; @@ -417,6 +405,9 @@ static int attempt_merge(struct request_queue *q, struct request *req, if (!rq_mergeable(req) || !rq_mergeable(next)) return 0; + if (!blk_check_merge_flags(req-cmd_flags, next-cmd_flags)) + return 0; + /* * not contiguous */ @@ -512,6 +503,9 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio) if (!rq_mergeable(rq) || !bio_mergeable(bio)) return false; + if (!blk_check_merge_flags(rq-cmd_flags, bio-bi_rw)) + return false; + /* different data direction or already started, don't merge */ if (bio_data_dir(bio) != rq_data_dir(rq)) return false; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 3a6fea7..90f7abe 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -605,6 +605,18 @@ static inline bool rq_mergeable(struct request *rq) return true; } +static inline bool blk_check_merge_flags(unsigned int flags1, + unsigned int flags2) +{ + if ((flags1 REQ_DISCARD) != (flags2 REQ_DISCARD)) + return false; + + if ((flags1 REQ_SECURE) != (flags2 REQ_SECURE)) + return false; + + return true; +} + /* * q-prep_rq_fn return values */ @@ -800,6 +812,25 @@ static inline unsigned int blk_rq_cur_sectors(const struct request *rq) return blk_rq_cur_bytes(rq) 9; } +static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q, + unsigned int
[PATCH] qla2xxx: Fix endianness of task management response code
From: Roland Dreier rol...@purestorage.com The qla2xxx firmware actually expects the task management response code in a CTIO IOCB with SCSI status mode 1 to be in little-endian byte order, ie the response code should be the first byte in the sense_data[] array. The old code erroneously byte-swapped the response code, which puts it in the wrong place on the wire and leads to initiators thinking every task management request succeeds (since they see 0 in the byte where they look for the response code). Cc: Chad Dupuis chad.dup...@qlogic.com Cc: Arun Easi arun.e...@qlogic.com Signed-off-by: Roland Dreier rol...@purestorage.com --- drivers/scsi/qla2xxx/qla_target.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c index 5b30132..41b74ba 100644 --- a/drivers/scsi/qla2xxx/qla_target.c +++ b/drivers/scsi/qla2xxx/qla_target.c @@ -1403,7 +1403,7 @@ static void qlt_24xx_send_task_mgmt_ctio(struct scsi_qla_host *ha, ctio-u.status1.scsi_status = __constant_cpu_to_le16(SS_RESPONSE_INFO_LEN_VALID); ctio-u.status1.response_len = __constant_cpu_to_le16(8); - ((uint32_t *)ctio-u.status1.sense_data)[0] = cpu_to_be32(resp_code); + ctio-u.status1.sense_data[0] = resp_code; qla2x00_start_iocbs(ha, ha-req); } -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] qla2xxx: Fix endianness of task management response code
On Tue, 2012-09-18 at 15:10 -0700, Roland Dreier wrote: From: Roland Dreier rol...@purestorage.com The qla2xxx firmware actually expects the task management response code in a CTIO IOCB with SCSI status mode 1 to be in little-endian byte order, ie the response code should be the first byte in the sense_data[] array. The old code erroneously byte-swapped the response code, which puts it in the wrong place on the wire and leads to initiators thinking every task management request succeeds (since they see 0 in the byte where they look for the response code). Cc: Chad Dupuis chad.dup...@qlogic.com Cc: Arun Easi arun.e...@qlogic.com Signed-off-by: Roland Dreier rol...@purestorage.com --- Very nice catch here Roland! drivers/scsi/qla2xxx/qla_target.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c index 5b30132..41b74ba 100644 --- a/drivers/scsi/qla2xxx/qla_target.c +++ b/drivers/scsi/qla2xxx/qla_target.c @@ -1403,7 +1403,7 @@ static void qlt_24xx_send_task_mgmt_ctio(struct scsi_qla_host *ha, ctio-u.status1.scsi_status = __constant_cpu_to_le16(SS_RESPONSE_INFO_LEN_VALID); ctio-u.status1.response_len = __constant_cpu_to_le16(8); - ((uint32_t *)ctio-u.status1.sense_data)[0] = cpu_to_be32(resp_code); + ctio-u.status1.sense_data[0] = resp_code; qla2x00_start_iocbs(ha, ha-req); } Dropping this into queue for the moment, and will CC' to stable - move into for-next once Chad Co. have a chance to give their ACK. Thank you! -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 47701] New: When too many disks fall out at the same time, RCU hangs
https://bugzilla.kernel.org/show_bug.cgi?id=47701 Summary: When too many disks fall out at the same time, RCU hangs Product: IO/Storage Version: 2.5 Kernel Version: 3.5.4 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: SCSI AssignedTo: linux-scsi@vger.kernel.org ReportedBy: sgunder...@bigfoot.com Regression: No Hi, For whatever reason, I lost all of my disks at the same time (I guess a SAS cable fell out; I'll know tomorrow). As was expected, I/O on the machine was not so happy afterwards; what was not so expected, was the following output on the serial console a minute or so after: [292657.601264] INFO: rcu_sched self-detected stall on CPU[292657.602441] INFO: rcu_sched detected stalls on CPUs/tasks: { 6} (detected by 4, t=60002 jiffies) [292657.602444] INFO: Stall ended before state dump start [292657.620982] {[292657.622964] 6} (t=60024 jiffies) Pid: 7800, comm: kworker/u:9 Not tainted 3.5.4 #1 [292657.631337] Call Trace: [292657.634069] IRQ [8108ecaf] __rcu_pending+0xbd/0x417 [292657.640471] [8108f0de] rcu_check_callbacks+0xd5/0x138 [292657.646765] [81041b73] update_process_times+0x3c/0x73 [292657.653050] [81072076] tick_sched_timer+0x6a/0x93 [292657.658999] [8105301e] __run_hrtimer+0xb3/0x13e [292657.664763] [8107200c] ? tick_nohz_handler+0xd3/0xd3 [292657.670972] [8103b7a3] ? __do_softirq+0x16c/0x182 [292657.676916] [810533d5] hrtimer_interrupt+0xce/0x1b0 [292657.683030] [8101ad3d] smp_apic_timer_interrupt+0x81/0x94 [292657.689676] [81377447] apic_timer_interrupt+0x67/0x70 [292657.695965] EOI [813703dd] ? _raw_spin_unlock_irqrestore+0x9/0xb [292657.703695] [8123daaa] scsi_remove_target+0x137/0x153 [292657.709985] [812425dc] sas_rphy_remove+0x25/0x4e [292657.715841] [81242616] sas_rphy_delete+0x11/0x1e [292657.721699] [81242648] sas_port_delete+0x25/0x11a [292657.727644] [8136de53] ? mutex_unlock+0x9/0xb [292657.733254] [a0020fd5] mpt2sas_transport_port_remove+0x16f/0x190 [mpt2sas] [292657.741576] [a001a70b] _scsih_remove_device+0x58/0x84 [mpt2sas] [292657.748731] [a001a7f4] _scsih_device_remove_by_handle+0xbd/0xc6 [mpt2sas] [292657.756960] [a001c5bb] _scsih_sas_topology_change_event+0x422/0x46d [mpt2sas] [292657.765531] [81064556] ? idle_balance+0xde/0x10c [292657.771395] [a001e098] ? _scsih_abort+0x1c1/0x1c1 [mpt2sas] [292657.778212] [a001e38d] _firmware_event_work+0x2f5/0x920 [mpt2sas] [292657.785547] [81042089] ? add_timer+0x17/0x1a [292657.791058] [8104bc64] ? queue_delayed_work_on+0xda/0xe8 [292657.797607] [a001e098] ? _scsih_abort+0x1c1/0x1c1 [mpt2sas] [292657.804418] [8104c641] process_one_work+0x253/0x3c5 [292657.810534] [8104cbb7] worker_thread+0x1d4/0x34d [292657.816394] [8104c9e3] ? rescuer_thread+0x230/0x230 [292657.822511] [810500bb] kthread+0x84/0x8c [292657.827675] [81377c94] kernel_thread_helper+0x4/0x10 [292657.833875] [81050037] ? kthread_freezable_should_stop+0x58/0x58 [292657.841117] [81377c90] ? gs_change+0xb/0xb I'm reporting this primarily because it could cause problems in some other context (say, when only one or two disks disappear); of course, for me in this case, it wouldn't matter if I/O was properly rejected or the entire machine hanged, it's useless anyway. -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html