Re: PATA Sil680 Command Timeout on ARM XScale
Since primary channel and secondary channel share the same IRQ, the ISR could be called to service one or both channels. So I would think it's normal to see irq trap traces when both channels are in IO operation, correct? I have another question in regard to ata_host_intr() function in libata-core.c. For PIO read/write, the status of interrupt pin was not checked before moving the host state machine. Sil680 spec. recommend checking IDE channel interupt (bit 11 in the IDEx Task File Timing and Config + Status register) though. Could someone explain why interrupt status does not need to be checked for PIO? I'm still troubleshooting the command timeout issue on my test hardware, I will repeat the same test on i386 hardware. In the mean time, I would appreciate any suggestions or known information to isolate the issue. Thanks, Fajun On 3/13/07, Fajun Chen [EMAIL PROTECTED] wrote: On 3/13/07, Alan Cox [EMAIL PROTECTED] wrote: above 1.5. Two timers are used to track command timeout in our test software. The one in the user space is set to 6 seconds using alarm() call while the one in the kernel (scsi timer) is set to 5 seconds. These timeout values are probably too low to be realistic, but the issue here is not about the timeout itself but to understand why it is A lot of drive commands seem to be set up on a seven second worst case always user space timer expired before kernel timer. Since kernel timer uses jiffies to track time, does this imply a kernel bug where the time interrupts were lost or delay somehow? Do you know any know problems related to command timeout in PATA Sil680? Alarm() is also handled by the same jiffies logic, so I suspect a bug in your test environment ? I enabled ata_irq_trap and did the same test again. The kernel timer caught the timeout (10 seconds) this time along with the irq trap traces below. What's the cause of these idle irqs? [42949560.15] SCSI device sdb: drive cache: write back [ 85.57] ata1: irq trap [ 85.82] ata2: irq trap [ 92.12] abnormal status 0xD0 [ 92.12] ata1: irq trap [ 92.92] ata2: irq trap [ 98.75] ata1: irq trap [ 100.26] abnormal status 0xD0 [ 100.26] ata2: irq trap [ 105.54] ata1: irq trap [ 108.05] ata1: irq trap [ 110.62] ata1: irq trap [ 113.13] ata1: irq trap [ 115.53] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen [ 115.53] ata1.00: (BMDMA stat 0x0) [ 115.53] ata1.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout) [ 115.53] ata1: soft resetting port [ 115.57] ata1.00: configured for UDMA/100 [ 115.57] sg_cmd_done: sg0, pack_id=2706, res=0x802, dur=10040 ms [ 115.57] ata1: EH complete [ 115.58] SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB) [ 115.58] sda: Write Protect is off [ 115.58] sda: Mode Sense: 00 3a 00 00 [ 115.58] SCSI device sda: drive cache: write back ... Thanks, Fajun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PATA Sil680 Command Timeout on ARM XScale
Fajun Chen wrote: Since primary channel and secondary channel share the same IRQ, the ISR could be called to service one or both channels. So I would think it's normal to see irq trap traces when both channels are in IO operation, correct? The irq trap code only occurs after a certain number of unhandled interrupts. I have another question in regard to ata_host_intr() function in libata-core.c. For PIO read/write, the status of interrupt pin was not checked before moving the host state machine. Sil680 spec. recommend checking IDE channel interupt (bit 11 in the IDEx Task File Timing and Config + Status register) though. Could someone explain why interrupt status does not need to be checked for PIO? Reading the Status register (as opposed to AltStatus) should clear the interrupt condition, on standard hardware. Jeff - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
PATA Sil680 Command Timeout on ARM XScale
Hi Folks, We have a command timeout with Sil680 controller on ARM XScale. The kernel is 2.6.18-rc2 and libata 2.00 with preemptive enabled. Similar problem observed as well with kernel preemptive disabled. ATA pass through and sg are used. Heavy IO test was ran on both channels of Sil680 and the system was pretty loaded where the load average was above 1.5. Two timers are used to track command timeout in our test software. The one in the user space is set to 6 seconds using alarm() call while the one in the kernel (scsi timer) is set to 5 seconds. These timeout values are probably too low to be realistic, but the issue here is not about the timeout itself but to understand why it is always user space timer expired before kernel timer. Since kernel timer uses jiffies to track time, does this imply a kernel bug where the time interrupts were lost or delay somehow? Do you know any know problems related to command timeout in PATA Sil680? Thanks, Fajun User space trace: Cmd 4276 timed out after 7.260137 secs: start time 1173775439.409099 secs, timed out at 1173775446.669236 secs [Tue Mar 13 08:44:06 2007]: Test: Random Write Sectors Extended LBA Low:0 LBA High: 10 ... Num Cmds: 4277 Num_Failed_Cmds:1 ... Status: Fail [Error 401: Command timeout] Dmesg log ~ $ dmesg .77] Calling initcall 0xc001ebb4: inet_diag_init+0x0/0x80() [42949375.77] Calling initcall 0xc001ec34: tcp_diag_init+0x0/0x1c() [42949375.77] Calling initcall 0xc001ec50: bictcp_register+0x0/0x1c() [42949375.77] TCP bic registered [42949375.77] Calling initcall 0xc001ee2c: af_unix_init+0x0/0x80() [42949375.77] NET: Registered protocol family 1 [42949375.77] Calling initcall 0xc001eeac: packet_init+0x0/0x70() [42949375.77] NET: Registered protocol family 17 [42949375.77] Calling initcall 0xc0012a88: clocksource_done_booting+0x0/0x24() [42949375.77] Calling initcall 0xc0019ed4: seqgen_init+0x0/0x1c() [42949375.77] Calling initcall 0xc001ba44: early_uart_console_switch+0x0/0x90() [42949375.77] Calling initcall 0xc013a150: net_random_reseed+0x0/0x38() [42949375.77] RAMDISK: Compressed image found at block 0 [42949378.95] VFS: Mounted root (ext2 filesystem). [42949378.96] Freeing init memory: 104K [42949549.17] ata1: soft resetting port [42949549.25] ata1.00: ATA-6, max UDMA/100, 78140160 sectors: LBA48 [42949549.25] ata1.00: configured for UDMA/100 [42949549.25] ata1: EH complete [42949549.25] Vendor: ATA Model: ST940813AMRev: 3.02 [42949549.25] Type: Direct-Access ANSI SCSI revision: 05 [42949549.26] SCSI device sda: 78140160 512-byte hdwr sectors (40008 MB) [42949549.26] sda: Write Protect is off [42949549.26] sda: Mode Sense: 00 3a 00 00 [42949549.26] SCSI device sda: drive cache: write back [42949549.27] SCSI device sda: 78140160 512-byte hdwr sectors (40008 MB) [42949549.27] sda: Write Protect is off [42949549.27] sda: Mode Sense: 00 3a 00 00 [42949549.27] SCSI device sda: drive cache: write back [42949549.27] sda: unknown partition table [42949549.29] sd 0:0:0:0: Attached scsi disk sda [42949549.29] sd 0:0:0:0: Attached scsi generic sg0 type 0 [42949549.32] ata1: soft resetting port [42949549.38] ata1.00: configured for UDMA/100 [42949549.38] ata1: EH complete [42949549.38] SCSI device sda: 78140160 512-byte hdwr sectors (40008 MB) [42949549.38] sda: Write Protect is off [42949549.38] sda: Mode Sense: 00 3a 00 00 [42949549.39] SCSI device sda: drive cache: write back [42949559.28] ata2: soft resetting port [42949559.42] ata2.00: ATA-6, max UDMA/100, 78140160 sectors: LBA48 [42949559.42] ata2.00: configured for UDMA/100 [42949559.42] ata2: EH complete [42949559.42] Vendor: ATA Model: ST94811A Rev: 3.07 [42949559.42] Type: Direct-Access ANSI SCSI revision: 05 [42949559.43] SCSI device sdb: 78140160 512-byte hdwr sectors (40008 MB) [42949559.43] sdb: Write Protect is off [42949559.43] sdb: Mode Sense: 00 3a 00 00 [42949559.43] SCSI device sdb: drive cache: write back [42949559.43] SCSI device sdb: 78140160 512-byte hdwr sectors (40008 MB) [42949559.44] sdb: Write Protect is off [42949559.44] sdb: Mode Sense: 00 3a 00 00 [42949559.44] SCSI device sdb: drive cache: write back [42949559.44] sdb: unknown partition table [42949559.46] sd 1:0:0:0: Attached scsi disk sdb [42949559.46] sd 1:0:0:0: Attached scsi generic sg1 type 0 [ 643.23] NWFPE: ntpd[38] takes exception 0001 at c002d514 from 0001d308 [ 711.23] NWFPE: ntpd[38] takes exception 0001 at c002d514 from 0001d308 [ 777.22] NWFPE: ntpd[38] takes exception 0001 at c002d514 from 0001d308 [ 841.30] NWFPE: ntpd[38] takes exception 0001 at c002d514 from
Re: PATA Sil680 Command Timeout on ARM XScale
above 1.5. Two timers are used to track command timeout in our test software. The one in the user space is set to 6 seconds using alarm() call while the one in the kernel (scsi timer) is set to 5 seconds. These timeout values are probably too low to be realistic, but the issue here is not about the timeout itself but to understand why it is A lot of drive commands seem to be set up on a seven second worst case always user space timer expired before kernel timer. Since kernel timer uses jiffies to track time, does this imply a kernel bug where the time interrupts were lost or delay somehow? Do you know any know problems related to command timeout in PATA Sil680? Alarm() is also handled by the same jiffies logic, so I suspect a bug in your test environment ? - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PATA Sil680 Command Timeout on ARM XScale
On 3/13/07, Alan Cox [EMAIL PROTECTED] wrote: above 1.5. Two timers are used to track command timeout in our test software. The one in the user space is set to 6 seconds using alarm() call while the one in the kernel (scsi timer) is set to 5 seconds. These timeout values are probably too low to be realistic, but the issue here is not about the timeout itself but to understand why it is A lot of drive commands seem to be set up on a seven second worst case always user space timer expired before kernel timer. Since kernel timer uses jiffies to track time, does this imply a kernel bug where the time interrupts were lost or delay somehow? Do you know any know problems related to command timeout in PATA Sil680? Alarm() is also handled by the same jiffies logic, so I suspect a bug in your test environment ? I enabled ata_irq_trap and did the same test again. The kernel timer caught the timeout (10 seconds) this time along with the irq trap traces below. What's the cause of these idle irqs? [42949560.15] SCSI device sdb: drive cache: write back [ 85.57] ata1: irq trap [ 85.82] ata2: irq trap [ 92.12] abnormal status 0xD0 [ 92.12] ata1: irq trap [ 92.92] ata2: irq trap [ 98.75] ata1: irq trap [ 100.26] abnormal status 0xD0 [ 100.26] ata2: irq trap [ 105.54] ata1: irq trap [ 108.05] ata1: irq trap [ 110.62] ata1: irq trap [ 113.13] ata1: irq trap [ 115.53] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen [ 115.53] ata1.00: (BMDMA stat 0x0) [ 115.53] ata1.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout) [ 115.53] ata1: soft resetting port [ 115.57] ata1.00: configured for UDMA/100 [ 115.57] sg_cmd_done: sg0, pack_id=2706, res=0x802, dur=10040 ms [ 115.57] ata1: EH complete [ 115.58] SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB) [ 115.58] sda: Write Protect is off [ 115.58] sda: Mode Sense: 00 3a 00 00 [ 115.58] SCSI device sda: drive cache: write back ... Thanks, Fajun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html