Re: [PATCH] libsas: don't treat underrun as an error on SMP tasks
On Sat, 29 Dec 2007 11:49:53 -0600
James Bottomley [EMAIL PROTECTED] wrote:

> All SMP tasks sent through bsg generate messages like:
>
> sas: smp_execute_task: task to dev 500605b01450 response: 0x0 status 0x81
>
> Three times (because the task gets retried).
>
> Firstly, don't retry either overrun or underrun (the data buffer isn't
> going to change size) and secondly, just report the underrun but don't
> set an error for it. This is necessary so bsg can report back the
> residual.
>
> James
>
> diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
> index 76555b1..1578059 100644
> --- a/drivers/scsi/libsas/sas_expander.c
> +++ b/drivers/scsi/libsas/sas_expander.c
> @@ -109,6 +109,16 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
>  		    task->task_status.stat == SAM_GOOD) {
>  			res = 0;
>  			break;
> +		} if (task->task_status.resp == SAS_TASK_COMPLETE &&
> +		      task->task_status.stat == SAS_DATA_UNDERRUN) {
> +			/* no error, but return the number of bytes of
> +			 * underrun */
> +			res = task->task_status.residual;
> +			break;
> +		} if (task->task_status.resp == SAS_TASK_COMPLETE &&
> +		      task->task_status.stat == SAS_DATA_OVERRUN) {
> +			res = -EMSGSIZE;
> +			break;
>  		} else {
>  			SAS_DPRINTK("%s: task to dev %016llx response: 0x%x status 0x%x\n",
>  				    __FUNCTION__,
> @@ -1924,6 +1934,11 @@ int sas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
>
>  	ret = smp_execute_task(dev, bio_data(req->bio), req->data_len,
>  			       bio_data(rsp->bio), rsp->data_len);
> +	if (ret > 0) {
> +		/* positive number is the untransferred residual */
> +		rsp->data_len = ret;
> +		ret = 0;
> +	}

Would it be better to update dout_resid too (on success, we can set
req->data_len to zero, I think)?

Here's a patch to do the same thing for mpt sas, an updated version of:

http://marc.info/?l=linux-scsi&m=119811872823947&w=2

--
From: FUJITA Tomonori [EMAIL PROTECTED]
Subject: [PATCH] mpt fusion: mptsas_smp_handler updates resid

This patch fixes mptsas_smp_handler to update both din_resid and
dout_resid on success, so that bsg can report back the residual.

Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED]
---
 drivers/message/fusion/mptsas.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index e4c94f9..f77b329 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -1343,6 +1343,8 @@ static int mptsas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
 		smprep = (SmpPassthroughReply_t *)ioc->sas_mgmt.reply;
 		memcpy(req->sense, smprep, sizeof(*smprep));
 		req->sense_len = sizeof(*smprep);
+		req->data_len = 0;
+		rsp->data_len -= smprep->ResponseDataLength;
 	} else {
 		printk(MYIOC_s_ERR_FMT "%s: smp passthru reply failed to be returned\n",
 		    ioc->name, __FUNCTION__);
--
1.5.3.4

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
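Editor's sketch of the return convention discussed in the libsas patch above: 0 for success, a positive value for the underrun residual, and -EMSGSIZE for overrun, with the caller turning a positive result into a residual rather than an error. This is a user-space illustration only. The enum names, the struct, and both function names are hypothetical stand-ins, not the kernel's definitions, and the -EIO fallback for "anything else" is an assumption made for the sketch.

```c
#include <errno.h>

/* Hypothetical stand-ins for the libsas response/status codes;
 * the names and values are illustrative, not the kernel's. */
enum sketch_resp { SKETCH_TASK_COMPLETE, SKETCH_TASK_UNDELIVERED };
enum sketch_stat { SKETCH_GOOD, SKETCH_UNDERRUN, SKETCH_OVERRUN, SKETCH_OTHER };

struct sketch_status {
	enum sketch_resp resp;
	enum sketch_stat stat;
	int residual;	/* bytes of the response buffer left unfilled */
};

/* Map a task status to the convention in the patch description:
 * 0 on success, positive residual on underrun, -EMSGSIZE on overrun.
 * Everything else becomes -EIO here (assumption for this sketch). */
int sketch_classify(const struct sketch_status *ts)
{
	if (ts->resp == SKETCH_TASK_COMPLETE && ts->stat == SKETCH_GOOD)
		return 0;
	if (ts->resp == SKETCH_TASK_COMPLETE && ts->stat == SKETCH_UNDERRUN)
		return ts->residual;
	if (ts->resp == SKETCH_TASK_COMPLETE && ts->stat == SKETCH_OVERRUN)
		return -EMSGSIZE;
	return -EIO;
}

/* Caller side: a positive result is recorded as the residual and the
 * request is reported as successful; negative results pass through. */
int sketch_finish(int ret, unsigned int *rsp_resid)
{
	if (ret > 0) {
		*rsp_resid = (unsigned int)ret;
		ret = 0;
	}
	return ret;
}
```

Overloading one int this way keeps the error path unchanged for existing callers, at the cost that every caller must remember that a positive value is not a failure.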
Re: Adaptec 2410SA, aacraid, management
In article [EMAIL PROTECTED],
Giacomo Di Ciocco [EMAIL PROTECTED] wrote:

> Hello subscribers,
> is there any utility to query/manage the controller's status/features?
>
> I'm using this card for a RAID 10 array made of four 300GB SATA disks,
> on Debian sarge with kernel 2.6.19.1-grsec.
> It runs on a dual Opteron Tyan GT24 Transport and operates principally
> as a mail and web server.

You can download the 'storage manager' software from the Adaptec site.
It includes GUI and CLI (arcconf) tools.

http://linux.adaptec.com/2007/11/15/how-to-monitor-the-status-of-arrays-in-ubuntu-710/

Mike.
Re: Maybe Sorry for that but where should i write .)
On Sat, Dec 29 2007 at 3:50 +0200, thanatos [EMAIL PROTECTED] wrote:

> I have an Initio SATA controller:
>
> 00:11.0 SATA controller: Initio Corporation INI-1623 PCI SATA-II Controller (rev 02) (prog-if 00 [Vendor specific])
>         Subsystem: Initio Corporation INI-1623 PCI SATA-II Controller
>         Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 11
>         I/O ports at dc00 [size=256]
>         I/O ports at d800 [size=256]
>         I/O ports at d400 [size=256]
>         I/O ports at d000 [size=256]
>         I/O ports at cc00 [size=256]
>         Memory at c000 (32-bit, non-prefetchable) [size=4K]
>         Expansion ROM at cff8 [disabled] [size=256K]
>         Capabilities: [dc] Power Management version 2
>
> and a 320 GB HDD that is not yet usable because of LBA48:
>
> ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata3.00: HPA unlocked: 625142448 -> 625142448, native 18446744072354982576
> ata3.00: ATA-8: SAMSUNG HD321KJ, CP100-12, max UDMA7
> ata3.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 0/32)
> ata3.00: ERROR: This driver doesn't support LBA48 yet and may cause data corruption on such devices. Disabling.
> ata3.00: disabled
>
> I'm still on kernel 2.6.22. I had a look at the changelogs for 2.6.23
> and even 2.6.24 and found nothing about this. Is there a roadmap?
> If I can help in any way by testing a patch, I'm here.

CC: the linux-scsi mailing list.

Personally I have nothing to do with this driver; I only sent a fix for
a patch in 2.6.23.

Boaz
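Editor's note on why this drive trips the LBA48 check: 28-bit LBA can address at most 2^28 sectors, and the log above reports 625142448 sectors. A minimal sketch of that arithmetic follows, assuming 512-byte logical sectors. The function and macro names are hypothetical and are not taken from libata or the Initio driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* 28-bit LBA can address at most 2^28 sectors. */
#define SKETCH_LBA28_MAX_SECTORS (UINT64_C(1) << 28)
/* Assumed logical sector size for this illustration. */
#define SKETCH_SECTOR_SIZE UINT64_C(512)

/* True when the device has more sectors than 28-bit LBA can reach. */
bool sketch_needs_lba48(uint64_t sectors)
{
	return sectors > SKETCH_LBA28_MAX_SECTORS;
}

/* Capacity in bytes for a given sector count. */
uint64_t sketch_capacity_bytes(uint64_t sectors)
{
	return sectors * SKETCH_SECTOR_SIZE;
}
```

With the sector count from the log, the capacity works out to 320072933376 bytes (about 320 GB), well above the roughly 137 GB that 2^28 sectors of 512 bytes can cover, which is consistent with the driver disabling the device.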
Re: [PATCH] libsas: don't treat underrun as an error on SMP tasks
On Sun, 2007-12-30 at 19:34 +0900, FUJITA Tomonori wrote:
> On Sat, 29 Dec 2007 11:49:53 -0600
> James Bottomley [EMAIL PROTECTED] wrote:
>
> > All SMP tasks sent through bsg generate messages like:
> >
> > sas: smp_execute_task: task to dev 500605b01450 response: 0x0 status 0x81
> >
> > Three times (because the task gets retried).
> >
> > Firstly, don't retry either overrun or underrun (the data buffer isn't
> > going to change size) and secondly, just report the underrun but don't
> > set an error for it. This is necessary so bsg can report back the
> > residual.
> >
> > James
> >
> > diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
> > index 76555b1..1578059 100644
> > --- a/drivers/scsi/libsas/sas_expander.c
> > +++ b/drivers/scsi/libsas/sas_expander.c
> > @@ -109,6 +109,16 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
> >  		    task->task_status.stat == SAM_GOOD) {
> >  			res = 0;
> >  			break;
> > +		} if (task->task_status.resp == SAS_TASK_COMPLETE &&
> > +		      task->task_status.stat == SAS_DATA_UNDERRUN) {
> > +			/* no error, but return the number of bytes of
> > +			 * underrun */
> > +			res = task->task_status.residual;
> > +			break;
> > +		} if (task->task_status.resp == SAS_TASK_COMPLETE &&
> > +		      task->task_status.stat == SAS_DATA_OVERRUN) {
> > +			res = -EMSGSIZE;
> > +			break;
> >  		} else {
> >  			SAS_DPRINTK("%s: task to dev %016llx response: 0x%x status 0x%x\n",
> >  				    __FUNCTION__,
> > @@ -1924,6 +1934,11 @@ int sas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
> >
> >  	ret = smp_execute_task(dev, bio_data(req->bio), req->data_len,
> >  			       bio_data(rsp->bio), rsp->data_len);
> > +	if (ret > 0) {
> > +		/* positive number is the untransferred residual */
> > +		rsp->data_len = ret;
> > +		ret = 0;
> > +	}
>
> Would it be better to update dout_resid too (on success, we can set
> req->data_len to zero, I think)?

Yes, I'll add that.

> Here's a patch to do the same thing for mpt sas, an updated version of:
>
> http://marc.info/?l=linux-scsi&m=119811872823947&w=2

And this.

James

> --
> From: FUJITA Tomonori [EMAIL PROTECTED]
> Subject: [PATCH] mpt fusion: mptsas_smp_handler updates resid
>
> This patch fixes mptsas_smp_handler to update both din_resid and
> dout_resid on success, so that bsg can report back the residual.
>
> Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED]
> ---
>  drivers/message/fusion/mptsas.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
> index e4c94f9..f77b329 100644
> --- a/drivers/message/fusion/mptsas.c
> +++ b/drivers/message/fusion/mptsas.c
> @@ -1343,6 +1343,8 @@ static int mptsas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
>  		smprep = (SmpPassthroughReply_t *)ioc->sas_mgmt.reply;
>  		memcpy(req->sense, smprep, sizeof(*smprep));
>  		req->sense_len = sizeof(*smprep);
> +		req->data_len = 0;
> +		rsp->data_len -= smprep->ResponseDataLength;
>  	} else {
>  		printk(MYIOC_s_ERR_FMT "%s: smp passthru reply failed to be returned\n",
>  		    ioc->name, __FUNCTION__);
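Editor's sketch of the residual bookkeeping in the mptsas hunk quoted above: on success the outbound residual becomes zero and the inbound residual is the response buffer size minus the bytes the device actually returned. The struct and function below are hypothetical user-space stand-ins. Only the two assignments mirror the patch, and the bounds check is an addition for the sketch.

```c
#include <assert.h>

/* Hypothetical stand-in for the request fields the patch touches:
 * data_len holds the buffer size on entry and the residual on exit. */
struct sketch_xfer {
	unsigned int data_len;
};

/* Apply the two updates from the patch after a successful passthrough.
 * response_len plays the role of smprep->ResponseDataLength. */
void sketch_update_residuals(struct sketch_xfer *req,
			     struct sketch_xfer *rsp,
			     unsigned int response_len)
{
	/* Guard against unsigned wrap-around (not part of the patch). */
	assert(response_len <= rsp->data_len);

	req->data_len = 0;		/* the whole request was sent */
	rsp->data_len -= response_len;	/* unfilled part of the response buffer */
}
```

For example, a 1024-byte response buffer that receives a 1000-byte reply ends with `rsp->data_len == 24`, which is the value bsg would then report as the inbound residual.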
[PATCH] libsas: don't use made up error codes
This is bad for two reasons:

1. If they're returned to outside applications, no-one knows what they
   mean.
2. Eventually they'll clash with the ever expanding standard error
   codes.

The problem error code in question is ETASK. I've replaced this by
ECOMM (communications error on send), a network error code that seems to
most closely relay what ETASK meant.

James

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 0829b55..adc47d4 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -500,7 +500,7 @@ static int sas_execute_task(struct sas_task *task, void *buffer, int size,
 			goto ex_err;
 		}
 		wait_for_completion(&task->completion);
-		res = -ETASK;
+		res = -ECOMM;
 		if (task->task_state_flags & SAS_TASK_STATE_ABORTED) {
 			int res2;
 			SAS_DPRINTK("task aborted, flags:0x%x\n",
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 8aeaad9..aefd865 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -96,7 +96,7 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
 		}
 
 		wait_for_completion(&task->completion);
-		res = -ETASK;
+		res = -ECOMM;
 		if ((task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
 			SAS_DPRINTK("smp task timed out or aborted\n");
 			i->dft->lldd_abort_task(task);
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 93248cd..a075f13 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -91,8 +91,6 @@ enum discover_event {
 
 /* -- Expander Devices -- */
 
-#define ETASK 0xFA
-
 #define to_dom_device(_obj) container_of(_obj, struct domain_device, dev_obj)
 #define to_dev_attr(_attr) container_of(_attr, struct domain_dev_attribute,\
 					attr)
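Editor's sketch of the first point in the message above: a standard errno value can be decoded by any application through the C library, whereas a private value such as 0xFA has no agreed meaning outside the driver. This is a user-space illustration that assumes a Linux <errno.h> where ECOMM is defined. SKETCH_LEGACY_ETASK and sketch_describe_result are hypothetical names.

```c
#include <errno.h>
#include <string.h>

/* The private value removed by the patch, kept here only for comparison. */
#define SKETCH_LEGACY_ETASK 0xFA

/* Describe a kernel-style result: zero or positive means success,
 * negative is a negated errno that the C library can decode. */
const char *sketch_describe_result(int res)
{
	if (res >= 0)
		return "success";
	return strerror(-res);
}
```

Calling `sketch_describe_result(-ECOMM)` yields the C library's text for ECOMM, while `sketch_describe_result(-SKETCH_LEGACY_ETASK)` yields whatever the library prints for a number it does not recognise, which is the problem the patch description points at.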
PROBLEM: BUG: null pointer deref., segfaults
[1.] One line summary of the problem:

Using dd on a broken HDD causes a kernel NULL pointer dereference.

[2.] Full description of the problem/report:

I have a broken HDD (unreadable sector). While dd-ing it onto another
HDD of the same size, I get a kernel-level error. The first time it is a
NULL pointer dereference; a few minutes later it is a BUG in mm/slab.c.
After that, iostat shows no more I/O and dd is shown as a kernel-space
process ([dd] in ps). The system becomes unstable, processes segfault,
and even reboot does not work.

Dmesg output:

BUG: unable to handle kernel NULL pointer dereference at virtual address 002a
 printing eip:
c016b8ce
*pde =
Oops: 0002 [#1]
Modules linked in: ac battery dm_crypt dm_snapshot dm_mirror dm_mod joydev tsdev usbhid sd_mod fan rtc evdev thermal processor button psmouse serio_raw pcspkr via_agp agpgart i2c_viapro ehci_hcd hpt366 uhci_hcd i2c_core usbcore sata_sil
CPU:    0
EIP:    0060:[c016b8ce]    Not tainted VLI
EFLAGS: 00010213   (2.6.23.12 #1)
EIP is at drop_buffers+0x5a/0xd5
eax: 002e   ebx: d6e300f0   ecx: 0026   edx: d6e30118
esi: c126e140   edi: d6e300f0   ebp: d6e300f0   esp: c149bdd8
ds: 007b   es: 007b   fs:   gs:   ss: 0068
Process kswapd0 (pid: 163, ti=c149a000 task=dff6f030 task.ti=c149a000)
Stack: c1274860 c1117260 df1152fc c149be00 c126e140 0001 c149bf84 c016b987 c126e140 df1152fc c0141f72 043e 0011 c149bf14 0120 0004 0004 0001 c12dc440
Call Trace:
 [c016b987] try_to_free_buffers+0x3e/0x6c
 [c0141f72] shrink_page_list+0x414/0x4fc
 [c01415a7] isolate_lru_pages+0x44/0x17f
 [c0142128] shrink_inactive_list+0xce/0x265
 [c0117f98] check_preempt_curr_fair+0x52/0x56
 [c014237d] shrink_zone+0xbe/0xe2
 [c014278d] kswapd+0x251/0x3b8
 [c0127a69] autoremove_wake_function+0x0/0x35
 [c014253c] kswapd+0x0/0x3b8
 [c0127913] kthread+0x36/0x5b
 [c01278dd] kthread+0x0/0x5b
 [c010468f] kernel_thread_helper+0x7/0x10
 ===
Code: 14 8b 02 83 e0 06 0b 42 34 74 07 31 c0 e9 8c 00 00 00 8b 52 04 39 fa 75 d5 89 fb 8b 4b 28 8d 53 28 8b 6b 04 39 d1 74 53 8b 42 04 89 41 04 89 08 89 52 04 83 7b 30 00 89 53 28 75 29 c7 44 24 0c
EIP: [c016b8ce] drop_buffers+0x5a/0xd5 SS:ESP 0068:c149bdd8

Then later:

------------[ cut here ]------------
kernel BUG at mm/slab.c:2983!
invalid opcode: [#2]
Modules linked in: ac battery dm_crypt dm_snapshot dm_mirror dm_mod joydev tsdev usbhid sd_mod fan rtc evdev thermal processor button psmouse serio_raw pcspkr via_agp agpgart i2c_viapro ehci_hcd hpt366 uhci_hcd i2c_core usbcore sata_sil
CPU:    0
EIP:    0060:[c01501fc]    Tainted: G D VLI
EFLAGS: 00010046   (2.6.23.12 #1)
EIP is at cache_alloc_refill+0xe0/0x3ec
eax: 0043   ebx: 0020   ecx: dffe7da0   edx: dffe7da0
esi: d4686000   edi: dffedd20   ebp: dffe9200   esp: de49dcc8
ds: 007b   es: 007b   fs:   gs: 0033   ss: 0068
Process dd (pid: 1888, ti=de49c000 task=df7dd030 task.ti=de49c000)
Stack: 0043 8050 dffe7da0 001a c0262880 d67d79e0 c013d8d3 c016b777 c1126900 dffe7da0 0282 8050 c01500e2 c1126900 1000 c016b75d c1126900 c016bda5 0001 c1126900 f000
Call Trace:
 [c0262880] submit_bio+0xa5/0xac
 [c013d8d3] mempool_alloc+0x1c/0x93
 [c016b777] alloc_buffer_head+0x2a/0x2e
 [c01500e2] kmem_cache_alloc+0x2b/0x65
 [c016b75d] alloc_buffer_head+0x10/0x2e
 [c016bda5] alloc_page_buffers+0x2d/0xbb
 [c016be43] create_empty_buffers+0x10/0x6b
 [c016dce5] block_read_full_page+0x40/0x296
 [c017014f] blkdev_get_block+0x0/0x42
 [c014054f] __do_page_cache_readahead+0x16c/0x1bf
 [c0140714] ondemand_readahead+0x48/0xf1
 [c013bbf8] do_generic_mapping_read+0x10d/0x3c8
 [c013d2ca] generic_file_aio_read+0x12d/0x159
 [c013b5b7] file_read_actor+0x0/0xca
 [c01528a4] do_sync_read+0xc6/0x109
 [c0139bbe] handle_IRQ_event+0x1a/0x3f
 [c0127a69] autoremove_wake_function+0x0/0x35
 [c012d12a] clockevents_program_event+0x9c/0xa3
 [c01527de] do_sync_read+0x0/0x109
 [c01530bb] vfs_read+0xa6/0x128
 [c01533db] sys_read+0x41/0x67
 [c0103afe] sysenter_past_esp+0x5f/0x85
 ===
Code: be 00 00 00 8b 37 39 fe 75 15 8b 77 10 8d 47 10 c7 47 30 01 00 00 00 39 c6 0f 84 9a 00 00 00 8b 54 24 08 8b 42 1c 39 46 10 72 2d 0f 0b eb fe 8b 44 24 08 8b 5e 14 8b 4d 00 8b 50 10 8b 04 24 0f
EIP: [c01501fc] cache_alloc_refill+0xe0/0x3ec SS:ESP 0068:de49dcc8

[3.] Keywords (i.e., modules, networking, kernel):

dd null pointer dereference segfault

[4.] Kernel version (from /proc/version):

Linux version 2.6.23.12 ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 Thu Dec 27 13:22:18 CET 2007

[5.] Output of Oops.. message (if applicable) with symbolic information
     resolved (see Documentation/oops-tracing.txt)

sorry:)

[6.] A small shell script or example program which triggers the
     problem (if possible)

dd if=/dev/sdd of=/dev/sdb bs=65536

where sdd is the
Re: PROBLEM: BUG: null pointer deref., segfaults
From the backtrace, this doesn't seem to be a SCSI or IDE problem. It
might be a block-layer bug, or a VM problem. I've cc'd the VM people to
see what they think.

On Mon, Dec 31, 2007 at 12:06:00AM +0100, Erno Kovacs wrote:
> [1.] One line summary of the problem:
>
> Using dd on a broken HDD causes a kernel NULL pointer dereference.
>
> [2.] Full description of the problem/report:
>
> I have a broken HDD (unreadable sector). While dd-ing it onto another
> HDD of the same size, I get a kernel-level error. The first time it is a
> NULL pointer dereference; a few minutes later it is a BUG in mm/slab.c.
> After that, iostat shows no more I/O and dd is shown as a kernel-space
> process ([dd] in ps). The system becomes unstable, processes segfault,
> and even reboot does not work.
>
> Dmesg output:
>
> BUG: unable to handle kernel NULL pointer dereference at virtual address 002a
>  printing eip:
> c016b8ce
> *pde =
> Oops: 0002 [#1]
> Modules linked in: ac battery dm_crypt dm_snapshot dm_mirror dm_mod joydev tsdev usbhid sd_mod fan rtc evdev thermal processor button psmouse serio_raw pcspkr via_agp agpgart i2c_viapro ehci_hcd hpt366 uhci_hcd i2c_core usbcore sata_sil
> CPU:    0
> EIP:    0060:[c016b8ce]    Not tainted VLI
> EFLAGS: 00010213   (2.6.23.12 #1)
> EIP is at drop_buffers+0x5a/0xd5
> eax: 002e   ebx: d6e300f0   ecx: 0026   edx: d6e30118
> esi: c126e140   edi: d6e300f0   ebp: d6e300f0   esp: c149bdd8
> ds: 007b   es: 007b   fs:   gs:   ss: 0068
> Process kswapd0 (pid: 163, ti=c149a000 task=dff6f030 task.ti=c149a000)
> Stack: c1274860 c1117260 df1152fc c149be00 c126e140 0001 c149bf84 c016b987 c126e140 df1152fc c0141f72 043e 0011 c149bf14 0120 0004 0004 0001 c12dc440
> Call Trace:
>  [c016b987] try_to_free_buffers+0x3e/0x6c
>  [c0141f72] shrink_page_list+0x414/0x4fc
>  [c01415a7] isolate_lru_pages+0x44/0x17f
>  [c0142128] shrink_inactive_list+0xce/0x265
>  [c0117f98] check_preempt_curr_fair+0x52/0x56
>  [c014237d] shrink_zone+0xbe/0xe2
>  [c014278d] kswapd+0x251/0x3b8
>  [c0127a69] autoremove_wake_function+0x0/0x35
>  [c014253c] kswapd+0x0/0x3b8
>  [c0127913] kthread+0x36/0x5b
>  [c01278dd] kthread+0x0/0x5b
>  [c010468f] kernel_thread_helper+0x7/0x10
>  ===
> Code: 14 8b 02 83 e0 06 0b 42 34 74 07 31 c0 e9 8c 00 00 00 8b 52 04 39 fa 75 d5 89 fb 8b 4b 28 8d 53 28 8b 6b 04 39 d1 74 53 8b 42 04 89 41 04 89 08 89 52 04 83 7b 30 00 89 53 28 75 29 c7 44 24 0c
> EIP: [c016b8ce] drop_buffers+0x5a/0xd5 SS:ESP 0068:c149bdd8
>
> Then later:
>
> ------------[ cut here ]------------
> kernel BUG at mm/slab.c:2983!
> invalid opcode: [#2]
> Modules linked in: ac battery dm_crypt dm_snapshot dm_mirror dm_mod joydev tsdev usbhid sd_mod fan rtc evdev thermal processor button psmouse serio_raw pcspkr via_agp agpgart i2c_viapro ehci_hcd hpt366 uhci_hcd i2c_core usbcore sata_sil
> CPU:    0
> EIP:    0060:[c01501fc]    Tainted: G D VLI
> EFLAGS: 00010046   (2.6.23.12 #1)
> EIP is at cache_alloc_refill+0xe0/0x3ec
> eax: 0043   ebx: 0020   ecx: dffe7da0   edx: dffe7da0
> esi: d4686000   edi: dffedd20   ebp: dffe9200   esp: de49dcc8
> ds: 007b   es: 007b   fs:   gs: 0033   ss: 0068
> Process dd (pid: 1888, ti=de49c000 task=df7dd030 task.ti=de49c000)
> Stack: 0043 8050 dffe7da0 001a c0262880 d67d79e0 c013d8d3 c016b777 c1126900 dffe7da0 0282 8050 c01500e2 c1126900 1000 c016b75d c1126900 c016bda5 0001 c1126900 f000
> Call Trace:
>  [c0262880] submit_bio+0xa5/0xac
>  [c013d8d3] mempool_alloc+0x1c/0x93
>  [c016b777] alloc_buffer_head+0x2a/0x2e
>  [c01500e2] kmem_cache_alloc+0x2b/0x65
>  [c016b75d] alloc_buffer_head+0x10/0x2e
>  [c016bda5] alloc_page_buffers+0x2d/0xbb
>  [c016be43] create_empty_buffers+0x10/0x6b
>  [c016dce5] block_read_full_page+0x40/0x296
>  [c017014f] blkdev_get_block+0x0/0x42
>  [c014054f] __do_page_cache_readahead+0x16c/0x1bf
>  [c0140714] ondemand_readahead+0x48/0xf1
>  [c013bbf8] do_generic_mapping_read+0x10d/0x3c8
>  [c013d2ca] generic_file_aio_read+0x12d/0x159
>  [c013b5b7] file_read_actor+0x0/0xca
>  [c01528a4] do_sync_read+0xc6/0x109
>  [c0139bbe] handle_IRQ_event+0x1a/0x3f
>  [c0127a69] autoremove_wake_function+0x0/0x35
>  [c012d12a] clockevents_program_event+0x9c/0xa3
>  [c01527de] do_sync_read+0x0/0x109
>  [c01530bb] vfs_read+0xa6/0x128
>  [c01533db] sys_read+0x41/0x67
>  [c0103afe] sysenter_past_esp+0x5f/0x85
>  ===
> Code: be 00 00 00 8b 37 39 fe 75 15 8b 77 10 8d 47 10 c7 47 30 01 00 00 00 39 c6 0f 84 9a 00 00 00 8b 54 24 08 8b 42 1c 39 46 10 72 2d 0f 0b eb fe 8b 44 24 08 8b 5e 14 8b 4d 00 8b 50 10 8b 04 24 0f
> EIP: [c01501fc] cache_alloc_refill+0xe0/0x3ec SS:ESP 0068:de49dcc8
>
> [3.] Keywords (i.e., modules, networking, kernel):
>
> dd null pointer dereference segfault
>
> [4.] Kernel version (from /proc/version):
>
> Linux version 2.6.23.12 ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115