Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-08-26 Thread Tejun Heo
Tejun Heo wrote:
> Andreas John wrote:
>> 00:00.0 Host bridge [0600]: ATI Technologies Inc Unknown device [1002:7910]
> 
> Okidoki, another ATI host bridge which can't forward MSI writes up to the
> CPU.  I'll blacklist it.  Thanks.

Scrap that.  I got confused.  You weren't using MSI.

In the log, you have a detection failure on ata2.00.  What's attached to
the port?

Regarding the timeout on /dev/sda: most disks have a dip switch which forces
the drive to limit itself to 1.5Gbps.  If you turn the dip switch on, is it
any better?

-- 
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-08-26 Thread Ulrich
Hi,


my system has an Nvidia "nForce 630A MCP" chipset.

(Asrock ALiveNF7G-HDready mainboard)


If it helps, I've uploaded the output of "lspci -vvxxx" to:

http://datenparkplatz.de/DiesUndDas/lspci-vvxxx.output.txt.


Best wishes,
Ulrich


Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-08-26 Thread Andreas John
Hm,
it seems we are the victim of a SATA NCQ horkage:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg08936.html

Disabling NCQ seems to work on the SB600. Samsung already ships a newer
edition ("403" instead of "401"), but there is no NCQ fix for the
401 edition. With Samsung's HUTIL you cannot disable NCQ on the disk
itself. Sadly we have about 20 of the broken SAMSUNG HD401LJ here :/

Interestingly, the NCQ horkage does not affect the platforms which are
ICH7-based. We replugged both hard disks into a Core 2 Quad/ICH7-based
system without any change in software, and it works without lockups. Is
that simply luck? According to dmesg, NCQ is activated on the ICH7.
Is it currently only possible to apply ATA_HORKAGE_NONCQ generally? Or can
that be done per chipset?
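On the per-chipset question: as far as I can tell, libata's device blacklist (ata_device_blacklist in libata-core.c) matches only on the drive's IDENTIFY model and firmware revision strings, not on the host controller, so a NONCQ entry there applies on every chipset. The standalone sketch below mimics that kind of lookup. The table, the HORKAGE_NONCQ value and the HD401LJ entry are hypothetical illustrations, not libata's actual contents.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag value; libata's real ATA_HORKAGE_NONCQ is a bit in the
 * device's horkage mask, but do not rely on this number. */
#define HORKAGE_NONCQ (1UL << 2)

/* Same shape as a libata blacklist entry: model, firmware rev, quirks. */
struct blacklist_entry {
	const char *model_num;   /* IDENTIFY model string */
	const char *model_rev;   /* firmware revision; NULL matches any */
	unsigned long horkage;   /* quirk flags to apply */
};

/* Hypothetical example table; note there is no host controller field. */
static const struct blacklist_entry blacklist[] = {
	{ "SAMSUNG HD401LJ", NULL, HORKAGE_NONCQ },
	{ NULL, NULL, 0 }                          /* terminator */
};

/* Return the quirk flags for a drive, or 0 if it is not listed. */
unsigned long lookup_horkage(const char *model_num, const char *model_rev)
{
	const struct blacklist_entry *e;

	for (e = blacklist; e->model_num != NULL; e++) {
		if (strcmp(e->model_num, model_num) != 0)
			continue;
		if (e->model_rev != NULL && strcmp(e->model_rev, model_rev) != 0)
			continue;
		return e->horkage;
	}
	return 0;
}
```

Because the key is the drive alone, the same answer comes back whether the disk sits on an SB600 or an ICH7; a per-controller decision would presumably have to be made in the host driver instead.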

Best Regards,
Andreas

@Ulrich: On what kind of chipset did you observe the horkage?

Andreas John schrieb:
> Hi,
> as requested by Tejun, here come dmesg and lspci -nn. Additionally I
> found something in dmesg (see 1.), probably caused by the low-level error.
> 
> Best Regards,
> Andreas
> 
> 1.)
> [81739.06] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET
> driverbyte=DRIVER_OK,SUGGEST_OK
> [81739.06] end_request: I/O error, dev sdb, sector 0
> 
> 2.) lspci -nn
> [EMAIL PROTECTED]:~# lspci -nn
> 00:00.0 Host bridge [0600]: ATI Technologies Inc Unknown device [1002:7910]
> 00:01.0 PCI bridge [0604]: ATI Technologies Inc Unknown device [1002:7912]
> 00:12.0 SATA controller [0106]: ATI Technologies Inc SB600 Non-Raid-5
> SATA [1002:4380]
> 00:13.0 USB Controller [0c03]: ATI Technologies Inc SB600 USB (OHCI0)
> [1002:4387]
> 00:13.1 USB Controller [0c03]: ATI Technologies Inc SB600 USB (OHCI1)
[SNIP]


Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-08-26 Thread Tejun Heo
Andreas John wrote:
> 00:00.0 Host bridge [0600]: ATI Technologies Inc Unknown device [1002:7910]

Okidoki, another ATI host bridge which can't forward MSI writes up to the
CPU.  I'll blacklist it.  Thanks.

-- 
tejun


Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-08-23 Thread Andreas John
Hi,
as requested by Tejun, here come dmesg and lspci -nn. Additionally I
found something in dmesg (see 1.), probably caused by the low-level error.

Best Regards,
Andreas

1.)
[81739.06] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET
driverbyte=DRIVER_OK,SUGGEST_OK
[81739.06] end_request: I/O error, dev sdb, sector 0

2.) lspci -nn
[EMAIL PROTECTED]:~# lspci -nn
00:00.0 Host bridge [0600]: ATI Technologies Inc Unknown device [1002:7910]
00:01.0 PCI bridge [0604]: ATI Technologies Inc Unknown device [1002:7912]
00:12.0 SATA controller [0106]: ATI Technologies Inc SB600 Non-Raid-5
SATA [1002:4380]
00:13.0 USB Controller [0c03]: ATI Technologies Inc SB600 USB (OHCI0)
[1002:4387]
00:13.1 USB Controller [0c03]: ATI Technologies Inc SB600 USB (OHCI1)
[1002:4388]
00:13.2 USB Controller [0c03]: ATI Technologies Inc SB600 USB (OHCI2)
[1002:4389]
00:13.3 USB Controller [0c03]: ATI Technologies Inc SB600 USB (OHCI3)
[1002:438a]
00:13.4 USB Controller [0c03]: ATI Technologies Inc SB600 USB (OHCI4)
[1002:438b]
00:13.5 USB Controller [0c03]: ATI Technologies Inc SB600 USB Controller
(EHCI) [1002:4386]
00:14.0 SMBus [0c05]: ATI Technologies Inc SB600 SMBus [1002:4385] (rev 14)
00:14.1 IDE interface [0101]: ATI Technologies Inc SB600 IDE [1002:438c]
00:14.2 Audio device [0403]: ATI Technologies Inc SB600 Azalia [1002:4383]
00:14.3 ISA bridge [0601]: ATI Technologies Inc SB600 PCI to LPC Bridge
[1002:438d]
00:14.4 PCI bridge [0604]: ATI Technologies Inc SB600 PCI to PCI Bridge
[1002:4384]
00:18.0 Host bridge [0600]: Advanced Micro Devices [AMD] K8
[Athlon64/Opteron] HyperTransport Technology Configuration [1022:1100]
00:18.1 Host bridge [0600]: Advanced Micro Devices [AMD] K8
[Athlon64/Opteron] Address Map [1022:1101]
00:18.2 Host bridge [0600]: Advanced Micro Devices [AMD] K8
[Athlon64/Opteron] DRAM Controller [1022:1102]
00:18.3 Host bridge [0600]: Advanced Micro Devices [AMD] K8
[Athlon64/Opteron] Miscellaneous Control [1022:1103]
01:05.0 VGA compatible controller [0300]: ATI Technologies Inc Unknown
device [1002:791e]
02:0f.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
RTL-8169SC Gigabit Ethernet [10ec:8167] (rev 10)
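In an `lspci -nn` listing like this, the bracketed groups carry the useful numbers: `[0106]` is the PCI class (SATA controller) and `[1002:4380]` is vendor:device (ATI SB600 SATA). Conke Hu notes elsewhere in this thread that SB600 RAID and non-RAID modes share device ID 0x4380 and differ only in class code, so both fields are worth extracting. A minimal C sketch, assuming only the line format seen above; `struct pci_ids`, `parse_lspci_nn()` and `is_ati_sb600_sata()` are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type for one `lspci -nn` line. */
struct pci_ids {
	unsigned int class_code;  /* e.g. 0x0106 = SATA controller */
	unsigned int vendor;      /* e.g. 0x1002 = ATI */
	unsigned int device;      /* e.g. 0x4380 = SB600 SATA */
};

/*
 * Parse "<slot> <name> [cccc]: <description> [vvvv:dddd] ..." as printed by
 * `lspci -nn`.  The class code is the first "[cccc]:" group; the IDs are the
 * first "[vvvv:dddd]" group after it.  Other bracketed text such as "[AMD]"
 * or "[Athlon64/Opteron]" fails both patterns and is skipped.
 * Returns 0 on success, -1 if either group is missing.
 */
int parse_lspci_nn(const char *line, struct pci_ids *out)
{
	int have_class = 0;
	const char *p;

	for (p = strchr(line, '['); p != NULL; p = strchr(p + 1, '[')) {
		unsigned int a, b;
		int n = 0;  /* set by %n only if the whole pattern matched */

		if (!have_class) {
			if (sscanf(p, "[%4x]:%n", &a, &n) == 1 && n > 0) {
				out->class_code = a;
				have_class = 1;
			}
			continue;
		}
		if (sscanf(p, "[%4x:%4x]%n", &a, &b, &n) == 2 && n > 0) {
			out->vendor = a;
			out->device = b;
			return 0;
		}
	}
	return -1;
}

/* True for the SB600 SATA function, whatever its class code says. */
int is_ati_sb600_sata(const struct pci_ids *ids)
{
	return ids->vendor == 0x1002 && ids->device == 0x4380;
}
```

Feeding it the joined `00:12.0` line above yields class 0x0106 and IDs 1002:4380.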


3.) A "fresh" dmesg
[EMAIL PROTECTED]:~# dmesg
[0.00] Linux version 2.6.22-9-server ([EMAIL PROTECTED]) (gcc
version 4.1.3 20070718 (prerelease) (Ubuntu 4.1.2-14ubuntu1)) #1 SMP Fri
Aug 3 01:19:51 GMT 2007 (Ubuntu 2.6.22-9.25-server)
[0.00] BIOS-provided physical RAM map:
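[0.00]  BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
[0.00]  BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
[0.00]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[0.00]  BIOS-e820: 0000000000100000 - 000000007dee0000 (usable)
[0.00]  BIOS-e820: 000000007dee0000 - 000000007dee3000 (ACPI NVS)
[0.00]  BIOS-e820: 000000007dee3000 - 000000007def0000 (ACPI data)
[0.00]  BIOS-e820: 000000007def0000 - 000000007df00000 (reserved)
[0.00]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[0.00]  BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)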
[0.00] 1118MB HIGHMEM available.
[0.00] 896MB LOWMEM available.
[0.00] found SMP MP-table at 000f4fa0
[0.00] NX (Execute Disable) protection: active
[0.00] Entering add_active_range(0, 0, 515808) 0 entries of 256 used
[0.00] Zone PFN ranges:
[0.00]   DMA 0 -> 4096
[0.00]   Normal   4096 ->   229376
[0.00]   HighMem229376 ->   515808
[0.00] early_node_map[1] active PFN ranges
[0.00] 0:0 ->   515808
[0.00] On node 0 totalpages: 515808
[0.00]   DMA zone: 32 pages used for memmap
[0.00]   DMA zone: 0 pages reserved
[0.00]   DMA zone: 4064 pages, LIFO batch:0
[0.00]   Normal zone: 1760 pages used for memmap
[0.00]   Normal zone: 223520 pages, LIFO batch:31
[0.00]   HighMem zone: 2237 pages used for memmap
[0.00]   HighMem zone: 284195 pages, LIFO batch:31
[0.00] DMI 2.3 present.
[0.00] ACPI: RSDP 000F6940, 0014 (r0 GBT   )
[0.00] ACPI: RSDT 7DEE3000, 0034 (r1 GBTGBTUACPI 42302E31
GBTU  1010101)
[0.00] ACPI: FACP 7DEE3040, 0074 (r1 GBTGBTUACPI 42302E31
GBTU  1010101)
[0.00] ACPI: DSDT 7DEE30C0, 39B1 (r1 GBTAWRDACPI 1000
MSFT  10C)
[0.00] ACPI: FACS 7DEE0000, 0040
[0.00] ACPI: SSDT 7DEE6B00, 01C4 (r1 PTLTD  POWERNOW1
LTP1)
[0.00] ACPI: MCFG 7DEE6D00, 003C (r1 GBTGBTUACPI 42302E31
GBTU  1010101)
[0.00] ACPI: APIC 7DEE6A80, 0068 (r1 GBTGBTUACPI 42302E31
GBTU  1010101)
[0.00] ATI board detected. Disabling timer routing over 8254.
[0.00] ACPI: PM-Timer IO Port: 0x4008
[0.00] ACPI: Local APIC address 0xfee00000
[0.00] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[0.00] Processor #0 15:11 APIC version 16
[


Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-08-22 Thread Tejun Heo
Andreas John wrote:
> Hm,
> I should add that on 2.6.22-amd64 (ubuntu gutsy) the log entry is as
> follows:
> 
> 8<--
> ata2.00 exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x2 frozen
> ata2.00 tag 0 cmd 0xea Emask 0x44 stat 0x40 err 0x0 timeout
> 1st FIS failed
> 8<--

Please post full kernel boot log and the result of 'lspci -nn'.

-- 
tejun


Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-08-22 Thread Andreas John
Hm,
I should add that on 2.6.22-amd64 (ubuntu gutsy) the log entry is as
follows:

8<--
ata2.00 exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x2 frozen
ata2.00 tag 0 cmd 0xea Emask 0x44 stat 0x40 err 0x0 timeout
1st FIS failed
8<--
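In this log, `SErr 0x800` has only bit 11 of the SATA SError register set: the bit libata calls SERR_INTERNAL, which the AHCI text quoted at the bottom of this mail describes as "E  Internal error". That is the bit the SB600 workaround is about. A small decoder for such values, as a sketch: the table covers only a subset of SError bits, named after libata's SERR_* constants, and `serror_decode()` is a hypothetical helper, not a libata function.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A subset of SError bits, named after libata's SERR_* constants. */
struct bit_name {
	uint32_t bit;
	const char *name;
};

static const struct bit_name serr_bits[] = {
	{ 1u << 0,  "SERR_DATA_RECOVERED" },
	{ 1u << 1,  "SERR_COMM_RECOVERED" },
	{ 1u << 8,  "SERR_DATA" },
	{ 1u << 9,  "SERR_PERSISTENT" },
	{ 1u << 10, "SERR_PROTOCOL" },
	{ 1u << 11, "SERR_INTERNAL" },   /* "E": host bus adapter internal error */
	{ 1u << 16, "SERR_PHYRDY_CHG" },
	{ 1u << 26, "SERR_DEV_XCHG" },
};

/*
 * Write the names of the known bits set in serr into buf, separated by '|'.
 * Any remaining unknown bits are appended as one hex value.  The output is
 * truncated (but always NUL-terminated) if buf is too small.  Returns buf.
 */
char *serror_decode(uint32_t serr, char *buf, size_t len)
{
	size_t i, used = 0;
	uint32_t known = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (i = 0; i < sizeof(serr_bits) / sizeof(serr_bits[0]); i++) {
		known |= serr_bits[i].bit;
		if (serr & serr_bits[i].bit) {
			int w = snprintf(buf + used, len - used, "%s%s",
					 used ? "|" : "", serr_bits[i].name);
			if (w < 0 || (size_t)w >= len - used)
				return buf;  /* out of space */
			used += (size_t)w;
		}
	}
	if (serr & ~known)
		snprintf(buf + used, len - used, "%s0x%x", used ? "|" : "",
			 (unsigned int)(serr & ~known));
	return buf;
}
```

For the value above, `serror_decode(0x800, buf, sizeof buf)` yields "SERR_INTERNAL".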

rgds,
Andreas


Andreas John schrieb:
> Hi SB600-folks,
> 
> we bought some AMD690/SB600-based mobos and are trying to get them working. I
> followed the patches on LKML and switched from the Debian Etch 2.6.18-x
> kernel to 2.6.22, just to ensure that all patches are already applied.
> But we still have strange errors/lockups, and we found a way to reproduce
> them: simply run checkarray --all and do some dd if=/dev/sda in
> parallel. We notice the load average going up and then boom ... lockup,
> softraid broken:
> 
> ---<8
> ata2.00: exception Emask 0x0 SAct 0X2 SErr 0x= action 0x0
> ata2.00: (irq_stat 0x4008)
> ata2.00: cmd 60/00:00:00:69:71/01:00:06:00:00/40 tag 0 cdb 0x0 data
> 131072 in
> ---<8
> 
> This appears with ahci. If I switch to atiixp, I only see the CD-ROM and
> one hard disk; the second does not appear at all, and (depending on the
> BIOS setup setting: ahci->sata, native ide, legacy ide) only the CD-ROM
> appears.
> 
> I might note that I first ran into this trouble on amd64 with 4 GB RAM.
> Then I switched back to 2 GB, and then to i386 / 2 GB. The error message
> above is from the i386 / 2 GB variant, but all of them suffer from this
> strange SATA pain; I am not 100% sure whether the log entries read the same
> or are only similar. I also tried pci=nomsi a few times, but I was still
> able to trigger the bug. I might also note that I noticed the problem on
> amd64 and it was simple to trigger there, but with the checkarray --all
> trick I was also able to trigger it on i386.
> 
> Is there anything else I can test? If you provide a patch, I will
> gladly test it.
> 
> best regards,
> Andreas
> 
> 
> Conke Hu schrieb:
>> On 3/15/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
>>> Conke Hu wrote:
>>> >> E  Internal error: The host bus adapter experienced an internal error
>>> >> that caused the operation to fail and may have put the host bus adapter
>>> >> into an error state. Host software should reset the interface before
>>> >> re-trying the operation. If the condition persists, the host bus adapter
>>> >> may suffer from a design issue rendering it incompatible with the
>>> >> attached device.
>>> >>
>>> >
>>> > Yes, I saw this too :) and I am contacting the hardware engineers to
>>> > check if there is any hardware bug.
>>> > But even if this were a hardware bug that could be fixed, we would
>>> > still need this patch, since many SB600 boards have already come onto
>>> > the market and those ASICs can never be fixed :(
>>>
>>> Yeap, we certainly need the workaround.  I was just having a little fun.
>>>  :-)
>>>
>>> >> 4381 isn't affected while 4380 is?
>>> >
>>> > I have never seen such an ID, and plan to remove 0x4381.
>>> > The patch which added the PCI IDs was not sent out by me. I
>>> > checked all SB600 boards and found no 0x4381 controller, only
>>> > 0x4380. In fact, SB600 RAID and non-RAID share the same PCI
>>> > device ID and differ only in class code.
>>>
>>> I see.
>>>
>>> >> Anyways, Conke Hu, can you please take a look at my patch from a month
>>> >> ago?  It's almost identical but SERR_INTERNAL is always ignored on both
>>> >> SB600 PCI IDs, which I think is safer.  Does this fix what you're seeing?
>>> >>
>>> >
>>> > I just read your patch. Another difference is that my patch ignores
>>> > SERR_INTERNAL only when the command is ATAPI and IRQ_TF_ERR occurs. In
>>> > other cases, I think, we'd better not ignore SERR_INTERNAL. Right?
>>>
>>> Yeah, I noticed the difference.  I don't really care but I was thinking
>>> that SERR_INTERNAL might be set in other similar situations too, e.g.
>>> TF error from ATA device or what not, so I thought it would be safer to
>>> ignore the bit altogether.  You probably need to consult your hardware
>>> people about when exactly the bit misbehaves but unless proven
>>> otherwise, I'd prefer to always ignore the bit.  Also, please rename the
>>> enum constant and flag name.
>>>
>>
>> Thank you, Tejun!
>> I was discussing with our HW designers on this topic. It is a HW
>> design issue and will be fixed in SB700, the next generation of
>> AMD/ATI southbridge.
>>
>> The correct workaround/solution for SB600 SATA is:
>> 1. ignore SERR_INTERNAL for both ATA and ATAPI devices (as you suggested :p ).
>> 2. ignore SERR_INTERNAL only on IRQ_TF_ERR.
>>
>> I'll re-create the patch.
>>
>> Conke

Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-08-22 Thread Andreas John
Hi SB600-folks,

we bought some AMD690/SB600-based mobos and are trying to get them working. I
followed the patches on LKML and switched from the Debian Etch 2.6.18-x
kernel to 2.6.22, just to ensure that all patches are already applied.
But we still have strange errors/lockups, and we found a way to reproduce
them: simply run checkarray --all and do some dd if=/dev/sda in
parallel. We notice the load average going up and then boom ... lockup,
softraid broken:

---<8
ata2.00: exception Emask 0x0 SAct 0X2 SErr 0x= action 0x0
ata2.00: (irq_stat 0x4008)
ata2.00: cmd 60/00:00:00:69:71/01:00:06:00:00/40 tag 0 cdb 0x0 data
131072 in
---<8
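The `cmd` line in this report is a dump of the ATA taskfile. Read with the field order assumed below (my understanding of how libata's error handler prints it), opcode 0x60 is READ FPDMA QUEUED, i.e. an NCQ read, and the FEATURE bytes 00 (low) and 01 (high) give 0x100 = 256 sectors, which matches the reported `data 131072` (256 * 512 bytes). So the failing command was an NCQ read, consistent with the NCQ horkage suspected elsewhere in this thread. `struct tf_dump` and the helper functions are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* NCQ opcodes (ATA READ/WRITE FPDMA QUEUED). */
#define ATA_CMD_FPDMA_READ  0x60
#define ATA_CMD_FPDMA_WRITE 0x61

/* Fields of the "cmd" dump, in the assumed order:
 * command/feature:nsect:lbal:lbam:lbah/
 * hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device */
struct tf_dump {
	unsigned int command, feature, nsect, lbal, lbam, lbah;
	unsigned int hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
	unsigned int device;
};

/* Parse e.g. "60/00:00:00:69:71/01:00:06:00:00/40".  Returns 0 on success. */
int parse_tf_dump(const char *s, struct tf_dump *tf)
{
	int n = sscanf(s, "%2x/%2x:%2x:%2x:%2x:%2x/%2x:%2x:%2x:%2x:%2x/%2x",
		       &tf->command, &tf->feature, &tf->nsect,
		       &tf->lbal, &tf->lbam, &tf->lbah,
		       &tf->hob_feature, &tf->hob_nsect,
		       &tf->hob_lbal, &tf->hob_lbam, &tf->hob_lbah,
		       &tf->device);
	return n == 12 ? 0 : -1;
}

/* 48-bit LBA: low bytes in lbal/lbam/lbah, high bytes in the hob_* copies. */
uint64_t tf_lba48(const struct tf_dump *tf)
{
	return (uint64_t)tf->lbal |
	       (uint64_t)tf->lbam << 8 |
	       (uint64_t)tf->lbah << 16 |
	       (uint64_t)tf->hob_lbal << 24 |
	       (uint64_t)tf->hob_lbam << 32 |
	       (uint64_t)tf->hob_lbah << 40;
}

/* NCQ commands carry the sector count in FEATURE (0 means 65536)... */
unsigned int tf_ncq_sectors(const struct tf_dump *tf)
{
	unsigned int n = (tf->hob_feature << 8) | tf->feature;

	return n ? n : 65536;
}

/* ...and the queue tag in bits 7:3 of the sector count register. */
unsigned int tf_ncq_tag(const struct tf_dump *tf)
{
	return (tf->nsect >> 3) & 0x1f;
}
```

For the dump above this gives tag 0 (matching `tag 0`) and a starting LBA of 0x06716900 = 108095744.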

This appears with ahci. If I switch to atiixp, I only see the CD-ROM and
one hard disk; the second does not appear at all, and (depending on the
BIOS setup setting: ahci->sata, native ide, legacy ide) only the CD-ROM
appears.

I might note that I first ran into this trouble on amd64 with 4 GB RAM.
Then I switched back to 2 GB, and then to i386 / 2 GB. The error message
above is from the i386 / 2 GB variant, but all of them suffer from this
strange SATA pain; I am not 100% sure whether the log entries read the same
or are only similar. I also tried pci=nomsi a few times, but I was still
able to trigger the bug. I might also note that I noticed the problem on
amd64 and it was simple to trigger there, but with the checkarray --all
trick I was also able to trigger it on i386.

Is there anything else I can test? If you provide a patch, I will
gladly test it.

best regards,
Andreas


Conke Hu schrieb:
> On 3/15/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
>> Conke Hu wrote:
>> >> E  Internal error: The host bus adapter experienced an internal error
>> >> that caused the operation to fail and may have put the host bus adapter
>> >> into an error state. Host software should reset the interface before
>> >> re-trying the operation. If the condition persists, the host bus adapter
>> >> may suffer from a design issue rendering it incompatible with the
>> >> attached device.
>> >>
>> >
>> > Yes, I saw this too :) and I am contacting the hardware engineers to
>> > check if there is any hardware bug.
>> > But even if this were a hardware bug that could be fixed, we would
>> > still need this patch, since many SB600 boards have already come onto
>> > the market and those ASICs can never be fixed :(
>>
>> Yeap, we certainly need the workaround.  I was just having a little fun.
>>  :-)
>>
>> >> 4381 isn't affected while 4380 is?
>> >
>> > I have never seen such an ID, and plan to remove 0x4381.
>> > The patch which added the PCI IDs was not sent out by me. I
>> > checked all SB600 boards and found no 0x4381 controller, only
>> > 0x4380. In fact, SB600 RAID and non-RAID share the same PCI
>> > device ID and differ only in class code.
>>
>> I see.
>>
>> >> Anyways, Conke Hu, can you please take a look at my patch from a month
>> >> ago?  It's almost identical but SERR_INTERNAL is always ignored on both
>> >> SB600 PCI IDs, which I think is safer.  Does this fix what you're seeing?
>> >>
>> >
>> > I just read your patch. Another difference is that my patch ignores
>> > SERR_INTERNAL only when the command is ATAPI and IRQ_TF_ERR occurs. In
>> > other cases, I think, we'd better not ignore SERR_INTERNAL. Right?
>>
>> Yeah, I noticed the difference.  I don't really care but I was thinking
>> that SERR_INTERNAL might be set in other similar situations too, e.g.
>> TF error from ATA device or what not, so I thought it would be safer to
>> ignore the bit altogether.  You probably need to consult your hardware
>> people about when exactly the bit misbehaves but unless proven
>> otherwise, I'd prefer to always ignore the bit.  Also, please rename the
>> enum constant and flag name.
>>
> 
> Thank you, Tejun!
> I was discussing with our HW designers on this topic. It is a HW
> design issue and will be fixed in SB700, the next generation of
> AMD/ATI southbridge.
> 
> The correct workaround/solution for SB600 SATA is:
> 1. ignore SERR_INTERNAL for both ATA and ATAPI devices (as you suggested :p ).
> 2. ignore SERR_INTERNAL only on IRQ_TF_ERR.
> 
> I'll re-create the patch.
> 
> Conke
> 
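Conke Hu's final rule (ignore SERR_INTERNAL for both ATA and ATAPI, but only when IRQ_TF_ERR is raised) reduces to a small mask on the SError value. The sketch below illustrates that rule under stated assumptions; it is not the ahci.c patch that was eventually merged. The two bit positions are the ones I believe ahci.c and libata use for PORT_IRQ_TF_ERR and SERR_INTERNAL, and FLAG_IGN_SERR_INTERNAL and sb600_filter_serror() are hypothetical names.

```c
#include <stdint.h>

/* Bit positions as I understand ahci.c / libata use them (assumptions). */
#define PORT_IRQ_TF_ERR        (1u << 30)  /* AHCI PxIS.TFES: task file error */
#define SERR_INTERNAL          (1u << 11)  /* SError "E": HBA internal error */
/* Hypothetical per-port quirk flag for SB600-style controllers. */
#define FLAG_IGN_SERR_INTERNAL (1u << 0)

/*
 * Return the SError value the error handler should act on.  For both ATA
 * and ATAPI commands, drop SERR_INTERNAL, but only when the same interrupt
 * also reports a task file error (IRQ_TF_ERR).  All other bits pass through.
 */
uint32_t sb600_filter_serror(uint32_t port_flags, uint32_t irq_stat,
			     uint32_t serror)
{
	if ((port_flags & FLAG_IGN_SERR_INTERNAL) &&
	    (irq_stat & PORT_IRQ_TF_ERR))
		serror &= ~SERR_INTERNAL;
	return serror;
}
```

This sits between the two earlier proposals: narrower than Tejun's "always ignore the bit", broader than the first patch, which ignored it only for ATAPI commands.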

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-08-22 Thread Andreas John
Hi SB600-folks,

we bought some AMD690/sb600 based mobos and try go get them working. I
followed the patches on LKML and switched from Debian Etch 2.6.18-x
kernel to 2.6.22, just to ensure that all patches are already applied.
But we still have strange errors/lockups and we found a way to reproduce
them: simply run checkarry --all and do some dd if=/dev/sda 
parallely. We notive load avg going up and then boom ... lockup,
softraid broken:

---8
ata2.00: exception Emask 0x0 SAct 0X2 SErr 0x= action 0x0
ata2.00: (irq_stat 0x4008)
ata2.00: cmd 60/00:00:00:69:71/01:00:06:00:00/40 tag 0 cdb 0x0 data
131072 in
---8

This appears with ahci. If I switch to atiixp I only see the cdrom and
one harddisk, the second does not appear at all and -depending on the
setting in BIOS setup ahci-sata, native ide, legacy ide- only the cdrom
appears.

I might note that I first ran into that trouble on amd64 with 4GB RAM.
Then I swicthed back to 2 GB and back to i386 / 2 GB. The error message
above is from the i386 / 2 GB variant, but all suffer from this strange
sata pain, I am not 100% sure, if the log entriea read the same of onyl
similar. I also tried pci=nomsi some times, but I was still able to
trigger the bug. I might also note, that I noticed the problem on amd64
arch and it was simply to trigger it there, but with the checkarry --all
trick I was also able to trigger it on i386.

Is there anything I can further test? I you provide a patch, I will
glady test it.

best regards,
Andreas


Conke Hu schrieb:
 On 3/15/07, Tejun Heo [EMAIL PROTECTED] wrote:
 Conke Hu wrote:
  E  Internal error: The host bus adapter experienced an internal error
  that caused the operation to fail and may have put the host bus
 adapter
  into an error state. Host software should reset the interface before
  re-trying the operation. If the condition persists, the host bus
 adapter
  may suffer from a design issue rendering it incompatible with the
  attached device.
 
 
  Yes, I saw this too :) and I am contacting the hardware engineers to
  check if there is any hardware bug.
  But, even though this were a hardware bug and could be fixed, we would
  still need this patch since many SB600 boards have already come into
  the market and those ASICs can never be fixed :(

 Yeap, we certainly need the workaround.  I was just having a little fun.
  :-)

  4381 isn't affected while 4380 is?
 
  I never see such an ID, and plan to remove 0x4381.
  The patch which added the PCI IDs was not sent out by myself. I
  checked all SB600 boards, and not found any 0x4381 controller, only
  0x4380 instead. In fact, SB600 RAID and Non-RAID share the same PCI
  device ID, only with class code different.

 I see.

  Anyways, Conke Hu, can you please take a look at my patch from a month
  ago?  It's almost identical but SERR_INTERNAL is always ignored on
 both
  SB600 PCI IDs, which I think is safer.  Does this fix what you're
 seeing?
 
 
  I just read your patch. Another difference is that my patch ignores
  SERR_INTERNAL only when the command is ATAPI and IRQ_TF_ERR occurs. In
  other cases, I think, we'd better not ignore the SERR_INTERNEL. Right?

 Yeah, I noticed the difference.  I don't really care but I was thinking
 that SERR_INTERNAL might be set in other similar situations too.  e.g.
 TF error from ATA device or what not, so I thought it would be safer to
 ignore the bit altogether.  You probably need to consult your hardware
 people about when exactly the bit misbehaves but unless proven
 otherwise, I'd prefer to always ignore the bit.  Also, please rename the
 enum constant and flag name.

 
 Thank you, Tejun!
 I was discussing with our HW designers on this topic. It is a HW
 design issue and will be fixed in SB700, the next generation of
 AMD/ATI southbridge.
 
 The correct walkaround/solution for SB600 SATA is:
 1. ignore SERR_INTERNAL for both ATA and ATAPI device (as you suggested
 :p ).
 2. ignore SERR_INTERNAL only on IRQ_TF_ERR.
 
 I'll re-create the patch.
 
 Conke
 -
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-08-22 Thread Andreas John
Hm,
I should add that on 2.6.22-amd64 (ubuntu gutsy) the log entry is as
follows:

8--
ata2.00 excetion Emask 0x40 SAct 0x0 SErr 0x800 action 0x2 frozen
ata2.00 tag 0 cmd 0xea Emask 0x44 stat 0x40 err 0x0 timeout
1st FIS failed
8--

rgds,
Andreas


Andreas John schrieb:
 Hi SB600-folks,
 
 we bought some AMD690/SB600-based mobos and are trying to get them working. I
 followed the patches on LKML and switched from the Debian Etch 2.6.18-x
 kernel to 2.6.22, just to make sure that all patches are applied.
 But we still see strange errors/lockups, and we found a way to reproduce
 them: simply run checkarray --all and some dd if=/dev/sda
 in parallel. We notice the load average going up and then, boom ... lockup,
 software RAID broken:
 
 ---8
 ata2.00: exception Emask 0x0 SAct 0X2 SErr 0x= action 0x0
 ata2.00: (irq_stat 0x4008)
 ata2.00: cmd 60/00:00:00:69:71/01:00:06:00:00/40 tag 0 cdb 0x0 data
 131072 in
 ---8
 
 This appears with ahci. If I switch to atiixp, I only see the CD-ROM and
 one hard disk; the second does not appear at all, and depending on the
 BIOS setup setting (AHCI SATA, native IDE, legacy IDE) only the CD-ROM
 appears.
 
 I should note that I first ran into this trouble on amd64 with 4 GB RAM.
 Then I switched to 2 GB, and then to i386 / 2 GB. The error message
 above is from the i386 / 2 GB variant, but all of them suffer from this
 strange SATA pain; I am not 100% sure whether the log entries read the
 same or are only similar. I also tried pci=nomsi a few times, but I was
 still able to trigger the bug. I should also note that I first noticed the
 problem on the amd64 arch, where it was simple to trigger, but with the
 checkarray --all trick I was also able to trigger it on i386.
 
 Is there anything else I can test? If you provide a patch, I will
 gladly test it.
 
 best regards,
 Andreas
 
 

Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-08-22 Thread Tejun Heo
Andreas John wrote:
 Hm,
 I should add that on 2.6.22-amd64 (ubuntu gutsy) the log entry is as
 follows:
 
 8--
 ata2.00 exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x2 frozen
 ata2.00 tag 0 cmd 0xea Emask 0x44 stat 0x40 err 0x0 timeout
 1st FIS failed
 8--

Please post the full kernel boot log and the output of 'lspci -nn'.

-- 
tejun


Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-03-27 Thread Conke Hu

On 3/15/07, Tejun Heo <[EMAIL PROTECTED]> wrote:

Conke Hu wrote:
>> E  Internal error: The host bus adapter experienced an internal error
>> that caused the operation to fail and may have put the host bus adapter
>> into an error state. Host software should reset the interface before
>> re-trying the operation. If the condition persists, the host bus adapter
>> may suffer from a design issue rendering it incompatible with the
>> attached device.
>>
>
> Yes, I saw this too :) and I am contacting the hardware engineers to
> check if there is any hardware bug.
> But even if this were a hardware bug that could be fixed, we would
> still need this patch, since many SB600 boards are already on
> the market and those ASICs can never be fixed :(

Yeap, we certainly need the workaround.  I was just having a little fun.
 :-)

>> 4381 isn't affected while 4380 is?
>
> I have never seen such an ID, and plan to remove 0x4381.
> The patch which added the PCI IDs was not sent by me. I
> checked all SB600 boards and found no 0x4381 controller, only
> 0x4380. In fact, SB600 RAID and non-RAID share the same PCI
> device ID and differ only in class code.

I see.

>> Anyways, Conke Hu, can you please take a look at my patch from a month
>> ago?  It's almost identical but SERR_INTERNAL is always ignored on both
>> SB600 PCI IDs, which I think is safer.  Does this fix what you're seeing?
>>
>
> I just read your patch. Another difference is that my patch ignores
> SERR_INTERNAL only when the command is ATAPI and IRQ_TF_ERR occurs. In
> other cases, I think, we'd better not ignore SERR_INTERNAL. Right?

Yeah, I noticed the difference.  I don't really care but I was thinking
that SERR_INTERNAL might be set in other similar situations too.  e.g.
TF error from ATA device or what not, so I thought it would be safer to
ignore the bit altogether.  You probably need to consult your hardware
people about when exactly the bit misbehaves but unless proven
otherwise, I'd prefer to always ignore the bit.  Also, please rename the
enum constant and flag name.



Thank you, Tejun!
I discussed this topic with our HW designers. It is a HW
design issue and will be fixed in SB700, the next generation of
AMD/ATI southbridge.

The correct workaround/solution for SB600 SATA is:
1. ignore SERR_INTERNAL for both ATA and ATAPI device (as you suggested :p ).
2. ignore SERR_INTERNAL only on IRQ_TF_ERR.

I'll re-create the patch.

Conke


Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-03-15 Thread Tejun Heo
Conke Hu wrote:
>> E  Internal error: The host bus adapter experienced an internal error
>> that caused the operation to fail and may have put the host bus adapter
>> into an error state. Host software should reset the interface before
>> re-trying the operation. If the condition persists, the host bus adapter
>> may suffer from a design issue rendering it incompatible with the
>> attached device.
>>
> 
> Yes, I saw this too :) and I am contacting the hardware engineers to
> check if there is any hardware bug.
> But even if this were a hardware bug that could be fixed, we would
> still need this patch, since many SB600 boards are already on
> the market and those ASICs can never be fixed :(

Yeap, we certainly need the workaround.  I was just having a little fun.
 :-)

>> 4381 isn't affected while 4380 is?
> 
> I have never seen such an ID, and plan to remove 0x4381.
> The patch which added the PCI IDs was not sent by me. I
> checked all SB600 boards and found no 0x4381 controller, only
> 0x4380. In fact, SB600 RAID and non-RAID share the same PCI
> device ID and differ only in class code.

I see.

>> Anyways, Conke Hu, can you please take a look at my patch from a month
>> ago?  It's almost identical but SERR_INTERNAL is always ignored on both
>> SB600 PCI IDs, which I think is safer.  Does this fix what you're seeing?
>>
> 
> I just read your patch. Another difference is that my patch ignores
> SERR_INTERNAL only when the command is ATAPI and IRQ_TF_ERR occurs. In
> other cases, I think, we'd better not ignore SERR_INTERNAL. Right?

Yeah, I noticed the difference.  I don't really care but I was thinking
that SERR_INTERNAL might be set in other similar situations too.  e.g.
TF error from ATA device or what not, so I thought it would be safer to
ignore the bit altogether.  You probably need to consult your hardware
people about when exactly the bit misbehaves but unless proven
otherwise, I'd prefer to always ignore the bit.  Also, please rename the
enum constant and flag name.

Thanks.

-- 
tejun


Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-03-15 Thread Conke Hu

On 3/14/07, Tejun Heo <[EMAIL PROTECTED]> wrote:

Hello,

Conke Hu wrote:
>When there is no media in SATA CD/DVD drive or media is not ready,
> the AHCI controller fails to execute the ATAPI commands TEST_UNIT_READY,
> READ_CAPACITY or READ_TOC and reports PORT_IRQ_TF_ERR. But the ATI SB600
> SATA controller sets the SERR_INTERNAL bit in the SError register at the
> same time, which is unnecessary. This patch simply ignores the internal
> error in that case. Without this patch, the ahci error handler reports
> many errors like the following:

Whoa, SERR_INTERNAL on ATAPI check condition?  Just for fun, here's what
the spec says about SERR_INTERNAL.



When the media is not ready, TEST_UNIT_READY fails with ahci irq
status == 0x4001 (IRQ_TF_ERROR) and serror == SERR_INTERNAL. The
ahci error handler then calls atapi_eh_request_sense() and sets
ATA_QCFLAG_SENSE_VALID. REQUEST_SENSE executes successfully,
atapi_qc_complete() sets result = SAM_STAT_CHECK_CONDITION, and
the whole TEST_UNIT_READY request is done and returns.



E  Internal error: The host bus adapter experienced an internal error
that caused the operation to fail and may have put the host bus adapter
into an error state. Host software should reset the interface before
re-trying the operation. If the condition persists, the host bus adapter
may suffer from a design issue rendering it incompatible with the
attached device.



Yes, I saw this too :) and I am contacting the hardware engineers to
check if there is any hardware bug.
But even if this were a hardware bug that could be fixed, we would
still need this patch, since many SB600 boards are already on
the market and those ASICs can never be fixed :(
So, if there are no errors in this patch, could Jeff please apply it ASAP?



Anyways thanks for fixing this.  Just a few comments.

> --- linux-2.6.21-rc3-git8/drivers/ata/ahci.c.orig	2007-03-25 20:57:31.0 +0800
> +++ linux-2.6.21-rc3-git8/drivers/ata/ahci.c	2007-03-25 21:03:54.0 +0800
> @@ -80,6 +80,7 @@ enum {
> board_ahci_pi= 1,
> board_ahci_vt8251= 2,
> board_ahci_ign_iferr= 3,
> +board_ahci_ati= 4,
>
> /* global controller registers */
> HOST_CAP= 0x00, /* host capabilities */
> @@ -168,6 +169,7 @@ enum {
> AHCI_FLAG_NO_NCQ= (1 << 24),
> AHCI_FLAG_IGN_IRQ_IF_ERR= (1 << 25), /* ignore IRQ_IF_ERR */
> AHCI_FLAG_HONOR_PI= (1 << 26), /* honor PORTS_IMPL */
> +AHCI_FLAG_TF_ERR_FIX= (1 << 27), /* ignore INTERNAL ERROR on IRQ_TF_ERROR */

Can we use board_ahci_ign_interr and AHCI_FLAG_IGN_SERR_INTERNAL to keep
it more consistent with the other IGN flag?

> -{ PCI_VDEVICE(ATI, 0x4380), board_ahci }, /* ATI SB600 non-raid */
> +{ PCI_VDEVICE(ATI, 0x4380), board_ahci_ati }, /* ATI SB600 non-raid */
> { PCI_VDEVICE(ATI, 0x4381), board_ahci }, /* ATI SB600 raid */

4381 isn't affected while 4380 is?


I have never seen such an ID, and plan to remove 0x4381.
The patch which added the PCI IDs was not sent by me. I
checked all SB600 boards and found no 0x4381 controller, only
0x4380. In fact, SB600 RAID and non-RAID share the same PCI
device ID and differ only in class code.




Hmmm... Okay, this is weird.  I'm feeling very strong deja vu.

Well, I must be getting Alzheimer's.  I did almost the same patch a month
ago and was waiting for verification to properly submit the patch.

  http://thread.gmane.org/gmane.linux.ide/16049/focus=16437

Anyways, Conke Hu, can you please take a look at my patch from a month
ago?  It's almost identical but SERR_INTERNAL is always ignored on both
SB600 PCI IDs, which I think is safer.  Does this fix what you're seeing?



I just read your patch. Another difference is that my patch ignores
SERR_INTERNAL only when the command is ATAPI and IRQ_TF_ERR occurs. In
other cases, I think, we'd better not ignore SERR_INTERNAL. Right?
Here are the details:
// your patch:
+   if (ap->flags & AHCI_FLAG_IGN_SERR_INTERNAL)
+   serr &= ~SERR_INTERNAL;

// mine:
-   if (irq_stat & PORT_IRQ_TF_ERR)
+   if (irq_stat & PORT_IRQ_TF_ERR) {
  err_mask |= AC_ERR_DEV;
+
+   /* some controllers set INTERNAL ERROR on ATAPI IRQ_TF_ERROR, ignore it */
+   if ((serror & SERR_INTERNAL) &&
+(ap->flags & AHCI_FLAG_TF_ERR_FIX) &&
+ qc && qc->dev->class == ATA_DEV_ATAPI) {
+   serror &= ~SERR_INTERNAL;
+   }
+   }

Tejun, you did me a great favor; thank you so much, and thanks for your
previous help, too :)

Conke


Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-03-14 Thread Tejun Heo
Hello,

Conke Hu wrote:
>When there is no media in SATA CD/DVD drive or media is not ready,
> the AHCI controller fails to execute the ATAPI commands TEST_UNIT_READY,
> READ_CAPACITY or READ_TOC and reports PORT_IRQ_TF_ERR. But the ATI SB600
> SATA controller sets the SERR_INTERNAL bit in the SError register at the
> same time, which is unnecessary. This patch simply ignores the internal
> error in that case. Without this patch, the ahci error handler reports
> many errors like the following:

Whoa, SERR_INTERNAL on ATAPI check condition?  Just for fun, here's what
the spec says about SERR_INTERNAL.

E  Internal error: The host bus adapter experienced an internal error
that caused the operation to fail and may have put the host bus adapter
into an error state. Host software should reset the interface before
re-trying the operation. If the condition persists, the host bus adapter
may suffer from a design issue rendering it incompatible with the
attached device.

Anyways thanks for fixing this.  Just a few comments.

> --- linux-2.6.21-rc3-git8/drivers/ata/ahci.c.orig	2007-03-25 20:57:31.0 +0800
> +++ linux-2.6.21-rc3-git8/drivers/ata/ahci.c	2007-03-25 21:03:54.0 +0800
> @@ -80,6 +80,7 @@ enum {
> board_ahci_pi= 1,
> board_ahci_vt8251= 2,
> board_ahci_ign_iferr= 3,
> +board_ahci_ati= 4,
> 
> /* global controller registers */
> HOST_CAP= 0x00, /* host capabilities */
> @@ -168,6 +169,7 @@ enum {
> AHCI_FLAG_NO_NCQ= (1 << 24),
> AHCI_FLAG_IGN_IRQ_IF_ERR= (1 << 25), /* ignore IRQ_IF_ERR */
> AHCI_FLAG_HONOR_PI= (1 << 26), /* honor PORTS_IMPL */
> +AHCI_FLAG_TF_ERR_FIX= (1 << 27), /* ignore INTERNAL ERROR on IRQ_TF_ERROR */

Can we use board_ahci_ign_interr and AHCI_FLAG_IGN_SERR_INTERNAL to keep
it more consistent with the other IGN flag?

> -{ PCI_VDEVICE(ATI, 0x4380), board_ahci }, /* ATI SB600 non-raid */
> +{ PCI_VDEVICE(ATI, 0x4380), board_ahci_ati }, /* ATI SB600 non-raid */
> { PCI_VDEVICE(ATI, 0x4381), board_ahci }, /* ATI SB600 raid */

4381 isn't affected while 4380 is?

Hmmm... Okay, this is weird.  I'm feeling very strong deja vu.

Well, I must be getting Alzheimer's.  I did almost the same patch a month
ago and was waiting for verification to properly submit the patch.

  http://thread.gmane.org/gmane.linux.ide/16049/focus=16437

Anyways, Conke Hu, can you please take a look at my patch from a month
ago?  It's almost identical but SERR_INTERNAL is always ignored on both
SB600 PCI IDs, which I think is safer.  Does this fix what you're seeing?

Thanks.

-- 
tejun


[PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-03-14 Thread Conke Hu

   When there is no media in SATA CD/DVD drive or media is not ready,
the AHCI controller fails to execute the ATAPI commands TEST_UNIT_READY,
READ_CAPACITY or READ_TOC and reports PORT_IRQ_TF_ERR. But the ATI SB600
SATA controller sets the SERR_INTERNAL bit in the SError register at the
same time, which is unnecessary. This patch simply ignores the internal
error in that case. Without this patch, the ahci error handler reports
many errors like the following:
--- cut from dmesg ---
ata9: soft resetting port
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x2
ata9.00: (irq_stat 0x4001)
ata9.00: cmd a0/00:00:00:00:20/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
res 51/24:03:00:00:20/00:00:00:00:00/a0 Emask 0x40 (internal error)
ata9: soft resetting port
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x2
ata9.00: (irq_stat 0x4001)
ata9.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x43 data 12 in
res 51/24:03:00:00:00/00:00:00:00:00/a0 Emask 0x40 (internal error)
--- end cut ---

Signed-off-by: Conke Hu <[EMAIL PROTECTED]>

--- linux-2.6.21-rc3-git8/drivers/ata/ahci.c.orig	2007-03-25 20:57:31.0 +0800
+++ linux-2.6.21-rc3-git8/drivers/ata/ahci.c	2007-03-25 21:03:54.0 +0800
@@ -80,6 +80,7 @@ enum {
board_ahci_pi   = 1,
board_ahci_vt8251   = 2,
board_ahci_ign_iferr= 3,
+   board_ahci_ati  = 4,

/* global controller registers */
HOST_CAP= 0x00, /* host capabilities */
@@ -168,6 +169,7 @@ enum {
AHCI_FLAG_NO_NCQ= (1 << 24),
AHCI_FLAG_IGN_IRQ_IF_ERR= (1 << 25), /* ignore IRQ_IF_ERR */
AHCI_FLAG_HONOR_PI  = (1 << 26), /* honor PORTS_IMPL */
+   AHCI_FLAG_TF_ERR_FIX= (1 << 27), /* ignore INTERNAL ERROR on IRQ_TF_ERROR */
};

struct ahci_cmd_hdr {
@@ -362,6 +364,16 @@ static const struct ata_port_info ahci_p
.udma_mask  = 0x7f, /* udma0-6 ; FIXME */
.port_ops   = &ahci_ops,
},
+   /* board_ahci_ati */
+   {
+   .sht= &ahci_sht,
+   .flags  = ATA_FLAG_SATA | ATA_FLAG_NO_LEGACY |
+ ATA_FLAG_MMIO | ATA_FLAG_PIO_DMA |
+ ATA_FLAG_SKIP_D2H_BSY | AHCI_FLAG_TF_ERR_FIX,
+   .pio_mask   = 0x1f, /* pio0-4 */
+   .udma_mask  = 0x7f, /* udma0-6 ; FIXME */
+   .port_ops   = &ahci_ops,
+   },  
};

static const struct pci_device_id ahci_pci_tbl[] = {
@@ -399,7 +411,7 @@ static const struct pci_device_id ahci_p
  PCI_CLASS_STORAGE_SATA_AHCI, 0xff, board_ahci_ign_iferr },

/* ATI */
-   { PCI_VDEVICE(ATI, 0x4380), board_ahci }, /* ATI SB600 non-raid */
+   { PCI_VDEVICE(ATI, 0x4380), board_ahci_ati }, /* ATI SB600 non-raid */
{ PCI_VDEVICE(ATI, 0x4381), board_ahci }, /* ATI SB600 raid */

/* VIA */
@@ -1063,12 +1075,22 @@ static void ahci_error_intr(struct ata_p
/* analyze @irq_stat */
ata_ehi_push_desc(ehi, "irq_stat 0x%08x", irq_stat);

+   qc = ata_qc_from_tag(ap, ap->active_tag);
+
/* some controllers set IRQ_IF_ERR on device errors, ignore it */
if (ap->flags & AHCI_FLAG_IGN_IRQ_IF_ERR)
irq_stat &= ~PORT_IRQ_IF_ERR;

-   if (irq_stat & PORT_IRQ_TF_ERR)
+   if (irq_stat & PORT_IRQ_TF_ERR) {
err_mask |= AC_ERR_DEV;
+   
+   /* some controllers set INTERNAL ERROR on ATAPI IRQ_TF_ERROR, ignore it */
+   if ((serror & SERR_INTERNAL) &&
+(ap->flags & AHCI_FLAG_TF_ERR_FIX) &&
+ qc && qc->dev->class == ATA_DEV_ATAPI) {
+   serror &= ~SERR_INTERNAL;
+   }
+   }

if (irq_stat & (PORT_IRQ_HBUS_ERR | PORT_IRQ_HBUS_DATA_ERR)) {
err_mask |= AC_ERR_HOST_BUS;
@@ -1100,7 +1122,6 @@ static void ahci_error_intr(struct ata_p
ehi->serror |= serror;
ehi->action |= action;

-   qc = ata_qc_from_tag(ap, ap->active_tag);
if (qc)
qc->err_mask |= err_mask;
else


[PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-03-14 Thread Conke Hu

   When there is no media in a SATA CD/DVD drive, or the media is not
ready, the AHCI controller fails to execute the ATAPI commands
TEST_UNIT_READY, READ_CAPACITY or READ_TOC and reports PORT_IRQ_TF_ERR.
But the ATI SB600 SATA controller also sets the SERR_INTERNAL bit in the
error register at the same time, which is not necessary. This patch
simply ignores the INTERNAL ERROR in that case. Without this patch, the
ahci error handler reports many errors like the following:
--- cut from dmesg ---
ata9: soft resetting port
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x2
ata9.00: (irq_stat 0x4001)
ata9.00: cmd a0/00:00:00:00:20/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
res 51/24:03:00:00:20/00:00:00:00:00/a0 Emask 0x40 (internal error)
ata9: soft resetting port
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x2
ata9.00: (irq_stat 0x4001)
ata9.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x43 data 12 in
res 51/24:03:00:00:00/00:00:00:00:00/a0 Emask 0x40 (internal error)
--- end cut ---

Signed-off-by: Conke Hu <[EMAIL PROTECTED]>

--- linux-2.6.21-rc3-git8/drivers/ata/ahci.c.orig   2007-03-25 20:57:31.0 +0800
+++ linux-2.6.21-rc3-git8/drivers/ata/ahci.c   2007-03-25 21:03:54.0 +0800
@@ -80,6 +80,7 @@ enum {
board_ahci_pi   = 1,
board_ahci_vt8251   = 2,
board_ahci_ign_iferr= 3,
+   board_ahci_ati  = 4,

/* global controller registers */
HOST_CAP= 0x00, /* host capabilities */
@@ -168,6 +169,7 @@ enum {
AHCI_FLAG_NO_NCQ            = (1 << 24),
AHCI_FLAG_IGN_IRQ_IF_ERR    = (1 << 25), /* ignore IRQ_IF_ERR */
AHCI_FLAG_HONOR_PI          = (1 << 26), /* honor PORTS_IMPL */
+   AHCI_FLAG_TF_ERR_FIX    = (1 << 27), /* ignore INTERNAL ERROR on IRQ_TF_ERROR */
};

struct ahci_cmd_hdr {
@@ -362,6 +364,16 @@ static const struct ata_port_info ahci_p
.udma_mask  = 0x7f, /* udma0-6 ; FIXME */
.port_ops   = &ahci_ops,
},
+   /* board_ahci_ati */
+   {
+   .sht        = &ahci_sht,
+   .flags      = ATA_FLAG_SATA | ATA_FLAG_NO_LEGACY |
+                 ATA_FLAG_MMIO | ATA_FLAG_PIO_DMA |
+                 ATA_FLAG_SKIP_D2H_BSY | AHCI_FLAG_TF_ERR_FIX,
+   .pio_mask   = 0x1f, /* pio0-4 */
+   .udma_mask  = 0x7f, /* udma0-6 ; FIXME */
+   .port_ops   = &ahci_ops,
+   },  
};

static const struct pci_device_id ahci_pci_tbl[] = {
@@ -399,7 +411,7 @@ static const struct pci_device_id ahci_p
  PCI_CLASS_STORAGE_SATA_AHCI, 0xff, board_ahci_ign_iferr },

/* ATI */
-   { PCI_VDEVICE(ATI, 0x4380), board_ahci }, /* ATI SB600 non-raid */
+   { PCI_VDEVICE(ATI, 0x4380), board_ahci_ati }, /* ATI SB600 non-raid */
{ PCI_VDEVICE(ATI, 0x4381), board_ahci }, /* ATI SB600 raid */

/* VIA */
@@ -1063,12 +1075,22 @@ static void ahci_error_intr(struct ata_p
/* analyze @irq_stat */
ata_ehi_push_desc(ehi, "irq_stat 0x%08x", irq_stat);

+   qc = ata_qc_from_tag(ap, ap->active_tag);
+
/* some controllers set IRQ_IF_ERR on device errors, ignore it */
if (ap->flags & AHCI_FLAG_IGN_IRQ_IF_ERR)
irq_stat &= ~PORT_IRQ_IF_ERR;

-   if (irq_stat & PORT_IRQ_TF_ERR)
+   if (irq_stat & PORT_IRQ_TF_ERR) {
err_mask |= AC_ERR_DEV;
+
+   /* some controllers set INTERNAL ERROR on ATAPI IRQ_TF_ERROR, ignore it */
+   if ((serror & SERR_INTERNAL) &&
+       (ap->flags & AHCI_FLAG_TF_ERR_FIX) &&
+       qc && qc->dev->class == ATA_DEV_ATAPI) {
+   serror &= ~SERR_INTERNAL;
+   }
+   }

if (irq_stat & (PORT_IRQ_HBUS_ERR | PORT_IRQ_HBUS_DATA_ERR)) {
err_mask |= AC_ERR_HOST_BUS;
@@ -1100,7 +1122,6 @@ static void ahci_error_intr(struct ata_p
ehi->serror |= serror;
ehi->action |= action;

-   qc = ata_qc_from_tag(ap, ap->active_tag);
if (qc)
qc->err_mask |= err_mask;
else


Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

2007-03-14 Thread Tejun Heo
Hello,

Conke Hu wrote:
> When there is no media in a SATA CD/DVD drive, or the media is not
> ready, the AHCI controller fails to execute the ATAPI commands
> TEST_UNIT_READY, READ_CAPACITY or READ_TOC and reports PORT_IRQ_TF_ERR.
> But the ATI SB600 SATA controller also sets the SERR_INTERNAL bit in the
> error register at the same time, which is not necessary. This patch
> simply ignores the INTERNAL ERROR in that case. Without this patch, the
> ahci error handler reports many errors like the following:

Whoa, SERR_INTERNAL on ATAPI check condition?  Just for fun, here's what
the spec says about SERR_INTERNAL.

E  Internal error: The host bus adapter experienced an internal error
that caused the operation to fail and may have put the host bus adapter
into an error state. Host software should reset the interface before
re-trying the operation. If the condition persists, the host bus adapter
may suffer from a design issue rendering it incompatible with the
attached device.

Anyway, thanks for fixing this.  Just a few comments.

> --- linux-2.6.21-rc3-git8/drivers/ata/ahci.c.orig   2007-03-25 20:57:31.0 +0800
> +++ linux-2.6.21-rc3-git8/drivers/ata/ahci.c   2007-03-25 21:03:54.0 +0800
> @@ -80,6 +80,7 @@ enum {
> board_ahci_pi = 1,
> board_ahci_vt8251 = 2,
> board_ahci_ign_iferr = 3,
> +board_ahci_ati = 4,
>
> /* global controller registers */
> HOST_CAP = 0x00, /* host capabilities */
> @@ -168,6 +169,7 @@ enum {
> AHCI_FLAG_NO_NCQ = (1 << 24),
> AHCI_FLAG_IGN_IRQ_IF_ERR = (1 << 25), /* ignore IRQ_IF_ERR */
> AHCI_FLAG_HONOR_PI = (1 << 26), /* honor PORTS_IMPL */
> +AHCI_FLAG_TF_ERR_FIX = (1 << 27), /* ignore INTERNAL ERROR on IRQ_TF_ERROR */

Can we use board_ahci_ign_interr and AHCI_FLAG_IGN_SERR_INTERNAL to keep
it more consistent with the other IGN flag?

> -{ PCI_VDEVICE(ATI, 0x4380), board_ahci }, /* ATI SB600 non-raid */
> +{ PCI_VDEVICE(ATI, 0x4380), board_ahci_ati }, /* ATI SB600 non-raid */
> { PCI_VDEVICE(ATI, 0x4381), board_ahci }, /* ATI SB600 raid */

4381 isn't affected while 4380 is?

Hmmm... Okay, this is weird.  I'm feeling very strong deja vu.

Well, I must be getting Alzheimer's.  I did almost the same patch a month
ago and was waiting for verification to properly submit the patch.

  http://thread.gmane.org/gmane.linux.ide/16049/focus=16437

Anyway, Conke Hu, can you please take a look at my patch from a month
ago?  It's almost identical but SERR_INTERNAL is always ignored on both
SB600 PCI IDs, which I think is safer.  Does this fix what you're seeing?

Thanks.

-- 
tejun