Re: Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
On Wed, Jan 12, 2011 at 08:32:12PM -0500, Brad wrote:
> The following diff is ported from NetBSD (the workaround originated
> from OpenSolaris) to work around the issue of data corruption with
> the ALI M5229 IDE chipset when using UltraDMA. The same workaround is
> also used by FreeBSD/Linux. This chipset is found in some sparc64
> systems such as the Blade 100 and Netra X1.
>
> I don't have any such systems, but I went digging for this, being
> curious why the nasty hack was added to the kernel configs to disable
> UltraDMA to work around this bug, thus penalizing other IDE/SATA
> controllers that could be in the same system.
>
> If you have one of the mentioned systems please test this.

Ok, with a bit more digging I think I found out why the workaround is
not functioning correctly. In rev 1.90 of wdc.c jsg@ added the
infrastructure to allow for the reset callback, but part of it was
reverted by miod@ in rev 1.93 due to a NULL pointer dereference on
some systems, and no one went back and fixed it. I brought over the
fix for this issue from NetBSD.

This needs testing on any IDE/SATA controllers.

Index: wdc.c
===================================================================
RCS file: /home/cvs/src/sys/dev/ic/wdc.c,v
retrieving revision 1.109
diff -u -p -r1.109 wdc.c
--- wdc.c	21 Sep 2010 03:33:32 -0000	1.109
+++ wdc.c	23 Jan 2011 19:46:03 -0000
@@ -589,6 +589,9 @@ wdcprobe(struct channel_softc *chp)
 	int savedmask = wdcdebug_mask;
 #endif
 
+	if (chp->wdc->reset == NULL)
+		chp->wdc->reset = wdc_do_reset;
+
 	if (chp->_vtbl == 0) {
 		int s = splbio();
 		chp->_vtbl = &wdc_default_vtbl;
@@ -628,7 +631,7 @@ wdcprobe(struct channel_softc *chp)
 	}
 
 	/* reset the channel */
-	wdc_do_reset(chp);
+	chp->wdc->reset(chp);
 
 	ret_value = __wdcwait_reset(chp, ret_value);
 	WDCDEBUG_PRINT(("%s:%d: after reset, ret_value=0x%d\n",

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.
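The wdc.c change above boils down to "install a default reset routine if
the chip driver did not supply one, then always call through the
pointer". A minimal stand-alone sketch of that pattern follows. All the
names here (struct controller, struct channel, generic_reset,
quirk_reset, probe_channel) are hypothetical stand-ins for illustration,
not the kernel's own structures or functions.

```c
#include <assert.h>
#include <stddef.h>

struct channel;

/* Hypothetical, simplified stand-in for the controller softc. */
struct controller {
	void (*reset)(struct channel *);	/* optional chip-specific hook */
};

/* Hypothetical, simplified stand-in for a channel. */
struct channel {
	struct controller *wdc;
	int generic_resets;	/* call counters, for illustration only */
	int quirk_resets;
};

/* Generic reset, playing the role wdc_do_reset() has in the diff. */
void
generic_reset(struct channel *chp)
{
	chp->generic_resets++;
}

/* Chip-specific reset, playing the role acer_do_reset() has in the diff. */
void
quirk_reset(struct channel *chp)
{
	generic_reset(chp);	/* the normal reset comes first */
	chp->quirk_resets++;	/* then the extra chip-specific step */
}

/* Probe path: fill in the default if needed, then call via the pointer. */
void
probe_channel(struct channel *chp)
{
	assert(chp != NULL && chp->wdc != NULL);

	if (chp->wdc->reset == NULL)
		chp->wdc->reset = generic_reset;

	chp->wdc->reset(chp);
}
```

Filling in the default at probe time means the call site never has to
test for NULL, which is the dereference the partial revert left
unguarded.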
Re: Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
On Sun, Jan 23, 2011 at 04:27:15PM -0500, Brad wrote:
> Ok, with a bit more digging I think I found out why the workaround is
> not functioning correctly. In rev 1.90 of wdc.c jsg@ added the
> infrastructure to allow for the reset callback, but part of it was
> reverted by miod@ in rev 1.93 due to a NULL pointer dereference on
> some systems, and no one went back and fixed it. I brought over the
> fix for this issue from NetBSD.
>
> This needs testing on any IDE/SATA controllers.

FWIW, I tried this patch together with the workaround and a printf
added to acer_do_reset on my Blade 150. I'm seeing that

- acer_do_reset is being called
- there are still read errors just after the resets:

Jan 23 23:16:12 gilda /bsd: acer_do_reset
Jan 23 23:16:13 gilda /bsd: wd0a: DMA error reading fsbn 14097936 of 14097936-14097967 (wd0 bn 14097936; cn 3455 tn 6 sn 6), retrying
Jan 23 23:16:13 gilda /bsd: wd0: soft error (corrected)
Jan 24 01:30:30 gilda /bsd: acer_do_reset
Jan 24 01:30:31 gilda /bsd: wd0a: DMA error reading fsbn 10239004 of 10239004-10239035 (wd0 bn 10239004; cn 2509 tn 8 sn 244), retrying
Jan 24 01:30:31 gilda /bsd: wd0: soft error (corrected)
Jan 24 01:30:35 gilda /bsd: acer_do_reset
Jan 24 01:30:37 gilda /bsd: wd0a: DMA error reading fsbn 10366820 of 10366820-10366851 (wd0 bn 10366820; cn 2540 tn 14 sn 50), retrying
Jan 24 01:30:37 gilda /bsd: wd0: soft error (corrected)

As far as I'm concerned, I'd stick to UDMA2 for this stupid controller
rather than wasting more time on this.
-- 
Matthieu Herrb
Re: Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
On Thursday 20 January 2011 20:22:52 Ted Unangst wrote:
> I know miod suggested the ifdef, but is there any benefit? Is there
> any reason to believe whatever this bug is doesn't affect the same
> silicon on i386? (Or that the same rev is different silicon?)
>
> On the flipside, besides being a little slower, is there any harm to
> i386 by enabling the cap?

I don't see any benefit, and it just adds an ugly ifdef in MI code. The
hw erratum isn't specific to any arch; it's just that the chip is a lot
more common in sparc64 systems than in i386 ones.

No harm, just slower.
Re: Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
On Wed, Jan 12, 2011 at 08:32:12PM -0500, Brad wrote:
> The following diff is ported from NetBSD (the workaround originated
> from OpenSolaris) to work around the issue of data corruption with
> the ALI M5229 IDE chipset when using UltraDMA. The same workaround is
> also used by FreeBSD/Linux. This chipset is found in some sparc64
> systems such as the Blade 100 and Netra X1.
>
> I don't have any such systems, but I went digging for this, being
> curious why the nasty hack was added to the kernel configs to disable
> UltraDMA to work around this bug, thus penalizing other IDE/SATA
> controllers that could be in the same system.
>
> If you have one of the mentioned systems please test this.

Here is the alternate workaround for the time being. Please test.

Index: dev/pci/pciide.c
===================================================================
RCS file: /home/cvs/src/sys/dev/pci/pciide.c,v
retrieving revision 1.323
diff -u -p -r1.323 pciide.c
--- dev/pci/pciide.c	18 Nov 2010 18:12:52 -0000	1.323
+++ dev/pci/pciide.c	21 Jan 2011 00:19:49 -0000
@@ -5639,6 +5639,10 @@ acer_chip_map(struct pciide_softc *sc, s
 		sc->sc_wdcdev.cap |= WDC_CAPABILITY_UDMA;
 		if (rev >= 0xC4)
 			sc->sc_wdcdev.UDMA_cap = 5;
+#ifdef __sparc64__
+		else if (rev == 0xC3)
+			sc->sc_wdcdev.UDMA_cap = 2;
+#endif
 		else if (rev >= 0xC2)
 			sc->sc_wdcdev.UDMA_cap = 4;
 		else
Index: arch/sparc64/conf/GENERIC
===================================================================
RCS file: /home/cvs/src/sys/arch/sparc64/conf/GENERIC,v
retrieving revision 1.262
diff -u -p -r1.262 GENERIC
--- arch/sparc64/conf/GENERIC	8 Jan 2011 11:56:30 -0000	1.262
+++ arch/sparc64/conf/GENERIC	15 Jan 2011 18:44:11 -0000
@@ -382,7 +382,7 @@ stty*	at spif?
 sbpp*	at spif?
 
 pciide*	at pci? flags 0x0000
-wd*	at pciide? flags 0x0a00
+wd*	at pciide? flags 0x0000
 atapiscsi* at pciide?
 scsibus* at atapiscsi?
 
Index: arch/sparc64/conf/RAMDISK
===================================================================
RCS file: /home/cvs/src/sys/arch/sparc64/conf/RAMDISK,v
retrieving revision 1.98
diff -u -p -r1.98 RAMDISK
--- arch/sparc64/conf/RAMDISK	19 Apr 2010 10:44:33 -0000	1.98
+++ arch/sparc64/conf/RAMDISK	11 Jan 2011 23:41:13 -0000
@@ -151,7 +151,7 @@ ti*	at sbus?
 gem*	at sbus?
 
 pciide*	at pci? flags 0x0000
-wd*	at pciide? flags 0x0a00
+wd*	at pciide? flags 0x0000
 atapiscsi* at pciide?
 scsibus* at atapiscsi?
 
Index: arch/sparc64/conf/RAMDISKU5
===================================================================
RCS file: /home/cvs/src/sys/arch/sparc64/conf/RAMDISKU5,v
retrieving revision 1.16
diff -u -p -r1.16 RAMDISKU5
--- arch/sparc64/conf/RAMDISKU5	24 Jun 2009 11:38:40 -0000	1.16
+++ arch/sparc64/conf/RAMDISKU5	11 Jan 2011 23:41:30 -0000
@@ -55,7 +55,7 @@ pcons0	at mainbus0		# PROM console
 timer*	at mainbus0		# Timer chip (some systems)
 
 pciide*	at pci? flags 0x0000
-wd*	at pciide? flags 0x0a00
+wd*	at pciide? flags 0x0000
 atapiscsi* at pciide?
 scsibus* at atapiscsi?
 
Re: Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
On Thu, Jan 20, 2011 at 7:57 PM, Brad <b...@comstyle.com> wrote:
> Here is the alternate workaround for the time being.

> +#ifdef __sparc64__
> +		else if (rev == 0xC3)
> +			sc->sc_wdcdev.UDMA_cap = 2;
> +#endif

I know miod suggested the ifdef, but is there any benefit? Is there
any reason to believe whatever this bug is doesn't affect the same
silicon on i386? (Or that the same rev is different silicon?)

On the flipside, besides being a little slower, is there any harm to
i386 by enabling the cap?
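To make the two options in this discussion concrete, here is a small
stand-alone sketch of the revision-to-mode selection. It assumes the
comparisons in the quoted driver code are ">=" (the operators were lost
in the archived diff), and it replaces the compile-time #ifdef with a
run-time parameter purely for illustration. The function name
acer_udma_cap and the cap_broken_rev parameter are hypothetical, not
part of pciide.c.

```c
#include <assert.h>
#include <stdint.h>

#define ACER_REV_BROKEN	0xC3	/* revision reported in this thread */

/*
 * Return the highest Ultra-DMA mode to allow for a controller revision.
 * cap_broken_rev is a hypothetical knob: non-zero models "apply the
 * rev 0xC3 cap" (on sparc64 only, or everywhere), zero models the
 * driver without the extra case.
 */
int
acer_udma_cap(uint8_t rev, int cap_broken_rev)
{
	int cap;

	if (rev >= 0xC4)
		cap = 5;
	else if (cap_broken_rev && rev == ACER_REV_BROKEN)
		cap = 2;
	else if (rev >= 0xC2)
		cap = 4;
	else
		cap = 2;

	assert(cap >= 0 && cap <= 6);	/* Ultra-DMA modes run from 0 to 6 */
	return cap;
}
```

With the knob set, a rev 0xC3 part drops from mode 4 to mode 2 while
every other revision keeps its current cap, which is why the only cost
on i386 would be speed.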
Re: Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
> Date: Sun, 16 Jan 2011 18:18:19 +0100
> From: Matthieu Herrb <matthieu.he...@laas.fr>
>
> I redid a make build with just that. It finished ok without errors.
>
> *but* I noticed about a dozen errors like this one during the build,
> concerning random block numbers:
>
> wd0a: DMA error reading fsbn 12543712 of 12543712-12543743 (wd0 bn 12543712; cn 3074 tn 7 sn 7), retrying
> wd0: soft error (corrected)
>
> and worse, there were about the same number during the first build
> with Brad's full patch. A previous build last week with no patches at
> all caused no errors.
>
> I've restarted a build with no patches to confirm that it's not dying
> hw.
>
> My conclusion for now is that not only is UDMA 4 still not good, but
> the patch doesn't make it better.

Thanks Matthieu,

As far as I'm concerned the diff that started this thread is off the
table for now.

Earlier in this thread somebody suggested restricting the pciide
downgrade to Ultra-DMA mode 2 to just the broken Acer Labs controller.
That is actually really easy to do. The big question is whether we
want to do this just on sparc64 (and add an ugly #ifdef __sparc64__ in
otherwise MI code), or if we should do this for all rev 0xc3 Acer Labs
M5229 controllers. There is at least one of those in dmesglog, on a
Pentium III machine with a disk attached that currently does Ultra-DMA
mode 4.

Opinions?
Re: Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
> Earlier in this thread somebody suggested restricting the pciide
> downgrade to Ultra-DMA mode 2 to just the broken Acer Labs controller.
> That is actually really easy to do. The big question is whether we
> want to do this just on sparc64 (and add an ugly #ifdef __sparc64__ in
> otherwise MI code), or if we should do this for all rev 0xc3 Acer Labs
> M5229 controllers. There is at least one of those in dmesglog, on a
> Pentium III machine with a disk attached that currently does Ultra-DMA
> mode 4.
>
> Opinions?

Even though this might be a bad idea, for now I'd prefer such a change
to be #ifdef __sparc64__.

Miod
Re: Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
On Sunday 16 January 2011 16:44:47 Mark Kettenis wrote:
> > Date: Sun, 16 Jan 2011 18:18:19 +0100
> > From: Matthieu Herrb <matthieu.he...@laas.fr>
> >
> > I redid a make build with just that. It finished ok without errors.
> >
> > *but* I noticed about a dozen errors like this one during the build,
> > concerning random block numbers:
> >
> > wd0a: DMA error reading fsbn 12543712 of 12543712-12543743 (wd0 bn 12543712; cn 3074 tn 7 sn 7), retrying
> > wd0: soft error (corrected)
> >
> > and worse, there were about the same number during the first build
> > with Brad's full patch. A previous build last week with no patches
> > at all caused no errors.
> >
> > I've restarted a build with no patches to confirm that it's not
> > dying hw.
> >
> > My conclusion for now is that not only is UDMA 4 still not good, but
> > the patch doesn't make it better.
>
> Thanks Matthieu,
>
> As far as I'm concerned the diff that started this thread is off the
> table for now.
>
> Earlier in this thread somebody suggested restricting the pciide
> downgrade to Ultra-DMA mode 2 to just the broken Acer Labs controller.
> That is actually really easy to do. The big question is whether we
> want to do this just on sparc64 (and add an ugly #ifdef __sparc64__ in
> otherwise MI code), or if we should do this for all rev 0xc3 Acer Labs
> M5229 controllers. There is at least one of those in dmesglog, on a
> Pentium III machine with a disk attached that currently does Ultra-DMA
> mode 4.
>
> Opinions?

I have no preference as to which particular workaround is used, as long
as something is done in the pciide(4) driver instead of the kernel
config. I didn't know that a similar diff had been tested in private
before; otherwise I wouldn't have posted it in the first place and
would have gone down the route of restricting, within the driver, the
Ultra-DMA mode allowed for that particular revision.
Re: Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
> Date: Fri, 14 Jan 2011 18:56:07 +0100
> From: Alexander Schrijver <alexander.schrij...@gmail.com>
>
> > The big question of course is whether it will survive a make build
> > with the change that removes the restriction of only using
> > Ultra-DMA up to mode 2, but without the fixes in pciide.c. Beware,
> > that might actually eat your filesystem.
>
> I'm doing this right now. I'm running a make build as we speak. I'm
> not sure when it should fuck up the filesystem.

Me neither, that's what makes this diff so difficult to test properly.

> diff i used:

That's the right diff to use for this test.

Thanks Alexander!
Re: Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
On Sat, Jan 15, 2011 at 12:17:54PM +0100, Mark Kettenis wrote:
> > Date: Fri, 14 Jan 2011 18:56:07 +0100
> > From: Alexander Schrijver <alexander.schrij...@gmail.com>
> >
> > > The big question of course is whether it will survive a make build
> > > with the change that removes the restriction of only using
> > > Ultra-DMA up to mode 2, but without the fixes in pciide.c. Beware,
> > > that might actually eat your filesystem.
> >
> > I'm doing this right now. I'm running a make build as we speak. I'm
> > not sure when it should fuck up the filesystem.
>
> Me neither, that's what makes this diff so difficult to test properly.

So, make build failed with the following message. I'm not sure what is
to blame here. I forgot to upgrade packages to the latest -current.
AFAIK the build process isn't dependent on external packages, but I'm
not entirely sure whether it could accidentally use an external
package. Looking at the commit logs, nothing happened in /gnu/ in the
last week, so it probably isn't the tree. The boot filesystem checks
went fine. Nothing new in dmesg either.

I'll run it again tomorrow with a properly upgraded system and a fresh
/usr/src/.

I don't know anything about the perl build system so I can't really
investigate that. cvs diff says none of the files have changed in the
perl directory.

The message with some stuff removed:

Running Makefile.PL in cpan/Encode
[ ... stuff removed ... ]
cp Encode/_PM.e2x ../../lib/Encode/_PM.e2x
cp lib/Encode/CJKConstants.pm ../../lib/Encode/CJKConstants.pm
make: don't know how to make ExtUtils/xsubpp. Stop in /usr/obj/gnu/usr.bin/perl/cpan/Encode.
Unsuccessful make(cpan/Encode): code=512 at make_ext.pl line 449.
*** Error code 25
Stop in /usr/src/gnu/usr.bin/perl/obj (line 695 of makefile).
*** Error code 1
Stop in /usr/src/gnu/usr.bin/perl (line 81 of /usr/src/gnu/usr.bin/perl/Makefile.bsd-wrapper).
*** Error code 1
Stop in /usr/src/gnu/usr.bin (line 48 of /usr/share/mk/bsd.subdir.mk).
*** Error code 1
Stop in /usr/src/gnu (line 48 of /usr/share/mk/bsd.subdir.mk).
*** Error code 1
Stop in /usr/src (line 48 of /usr/share/mk/bsd.subdir.mk).
*** Error code 1
Stop in /usr/src (line 74 of Makefile).

complete message:

===> gnu/usr.bin/perl
cd /usr/src/gnu/usr.bin/perl/obj && exec /bin/sh cflags.SH
Extracting cflags (with variable substitutions)
cd /usr/src/gnu/usr.bin/perl/obj && exec /bin/sh makeaperl.SH
Extracting makeaperl (with variable substitutions)
cd /usr/src/gnu/usr.bin/perl/obj && exec /bin/sh myconfig.SH
Extracting myconfig (with variable substitutions)
cd /usr/src/gnu/usr.bin/perl/obj && exec /bin/sh Policy_sh.SH
Extracting Policy.sh (with variable substitutions)
cd /usr/src/gnu/usr.bin/perl/obj/pod && exec /bin/sh Makefile.SH
Extracting pod/Makefile (with variable substitutions)
cd /usr/src/gnu/usr.bin/perl/obj/x2p && exec /bin/sh cflags.SH
Extracting x2p/cflags (with variable substitutions)
cc -O2 -pipe -g -fno-strict-aliasing -fno-delete-null-pointer-checks -DPERL_CORE -DPERL_RANDOM_DEVICE=/dev/arandom -I. -c /usr/src/gnu/usr.bin/perl/gv.c -o gv.o
cc -O2 -pipe -g -fno-strict-aliasing -fno-delete-null-pointer-checks -DPERL_CORE -DPERL_RANDOM_DEVICE=/dev/arandom -I. -c /usr/src/gnu/usr.bin/perl/toke.c -o toke.o
cc -O2 -pipe -g -fno-strict-aliasing -fno-delete-null-pointer-checks -DPERL_CORE -DPERL_RANDOM_DEVICE=/dev/arandom -I. -c perly.c -o perly.o
cc -O2 -pipe -g -fno-strict-aliasing -fno-delete-null-pointer-checks -DPERL_CORE -DPERL_RANDOM_DEVICE=/dev/arandom -I. -c /usr/src/gnu/usr.bin/perl/pad.c -o pad.o
cc -O2 -pipe -g -fno-strict-aliasing -fno-delete-null-pointer-checks -DPERL_CORE -DPERL_RANDOM_DEVICE=/dev/arandom -I. -c /usr/src/gnu/usr.bin/perl/regcomp.c -o regcomp.o
cc -O2 -pipe -g -fno-strict-aliasing -fno-delete-null-pointer-checks -DPERL_CORE -DPERL_RANDOM_DEVICE=/dev/arandom -I. -c /usr/src/gnu/usr.bin/perl/dump.c -o dump.o
cc -O2 -pipe -g -fno-strict-aliasing -fno-delete-null-pointer-checks -DPERL_CORE -DPERL_RANDOM_DEVICE=/dev/arandom -I. -c /usr/src/gnu/usr.bin/perl/util.c -o util.o
cc -O2 -pipe -g -fno-strict-aliasing -fno-delete-null-pointer-checks -DPERL_CORE -DPERL_RANDOM_DEVICE=/dev/arandom -I. -c /usr/src/gnu/usr.bin/perl/mg.c -o mg.o
cc -O2 -pipe -g -fno-strict-aliasing -fno-delete-null-pointer-checks -DPERL_CORE -DPERL_RANDOM_DEVICE=/dev/arandom -I. -c /usr/src/gnu/usr.bin/perl/reentr.c -o reentr.o
cc -O2 -pipe -g -fno-strict-aliasing -fno-delete-null-pointer-checks -DPERL_CORE -DPERL_RANDOM_DEVICE=/dev/arandom -I. -c /usr/src/gnu/usr.bin/perl/mro.c -o mro.o
cc -O2 -pipe -g -fno-strict-aliasing -fno-delete-null-pointer-checks -DPERL_CORE -DPERL_RANDOM_DEVICE=/dev/arandom -I. -c /usr/src/gnu/usr.bin/perl/hv.c -o hv.o
cc -O2 -pipe -g -fno-strict-aliasing -fno-delete-null-pointer-checks -DPERL_CORE -DPERL_RANDOM_DEVICE=/dev/arandom -I. -c /usr/src/gnu/usr.bin/perl/av.c -o av.o
cc -O2 -pipe -g -fno-strict-aliasing -fno-delete-null-pointer-checks -DPERL_CORE -DPERL_RANDOM_DEVICE=/dev/arandom -I. -c
Re: Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
On Wed, Jan 12, 2011 at 08:32:12PM -0500, Brad wrote:
> The following diff is ported from NetBSD (the workaround originated
> from OpenSolaris) to work around the issue of data corruption with
> the ALI M5229 IDE chipset when using UltraDMA. The same workaround is
> also used by FreeBSD/Linux. This chipset is found in some sparc64
> systems such as the Blade 100 and Netra X1.
>
> I don't have any such systems, but I went digging for this, being
> curious why the nasty hack was added to the kernel configs to disable
> UltraDMA to work around this bug, thus penalizing other IDE/SATA
> controllers that could be in the same system.
>
> If you have one of the mentioned systems please test this.

My Blade 150, which has this controller, seems to survive a make build
with this.

Before the patch:

pciide0 at pci0 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide0: using ivec 0x7cc for native-PCI interrupt
wd0 at pciide0 channel 0 drive 0: <WDC WD400BB-22DEA0>
atapiscsi0 at pciide0 channel 0 drive 1
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
cd0(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 disabled (no drives)

after:

pciide0 at pci0 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide0: using ivec 0x7cc for native-PCI interrupt
wd0 at pciide0 channel 0 drive 0: <WDC WD400BB-22DEA0>
atapiscsi0 at pciide0 channel 0 drive 1
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 4
cd0(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 disabled (no drives)
-- 
Matthieu Herrb
Re: Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
> Date: Fri, 14 Jan 2011 09:00:09 +0100
> From: Matthieu Herrb <matthieu.he...@laas.fr>
>
> On Wed, Jan 12, 2011 at 08:32:12PM -0500, Brad wrote:
> > The following diff is ported from NetBSD (the workaround originated
> > from OpenSolaris) to work around the issue of data corruption with
> > the ALI M5229 IDE chipset when using UltraDMA. The same workaround
> > is also used by FreeBSD/Linux. This chipset is found in some sparc64
> > systems such as the Blade 100 and Netra X1.
> >
> > I don't have any such systems, but I went digging for this, being
> > curious why the nasty hack was added to the kernel configs to
> > disable UltraDMA to work around this bug, thus penalizing other
> > IDE/SATA controllers that could be in the same system.
> >
> > If you have one of the mentioned systems please test this.
>
> My Blade 150, which has this controller, seems to survive a make build
> with this.

The big question of course is whether it will survive a make build
with the change that removes the restriction of only using Ultra-DMA
up to mode 2, but without the fixes in pciide.c. Beware, that might
actually eat your filesystem.
Re: Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
On 14 January 2011 09:11, Mark Kettenis <mark.kette...@xs4all.nl> wrote:
>> Date: Fri, 14 Jan 2011 09:00:09 +0100
>> From: Matthieu Herrb <matthieu.he...@laas.fr>
>>
>> On Wed, Jan 12, 2011 at 08:32:12PM -0500, Brad wrote:
>> > The following diff is ported from NetBSD (the workaround originated
>> > from OpenSolaris) to work around the issue of data corruption with
>> > the ALI M5229 IDE chipset when using UltraDMA. The same workaround
>> > is also used by FreeBSD/Linux. This chipset is found in some
>> > sparc64 systems such as the Blade 100 and Netra X1.
>> >
>> > I don't have any such systems, but I went digging for this, being
>> > curious why the nasty hack was added to the kernel configs to
>> > disable UltraDMA to work around this bug, thus penalizing other
>> > IDE/SATA controllers that could be in the same system.
>> >
>> > If you have one of the mentioned systems please test this.
>>
>> My Blade 150, which has this controller, seems to survive a make
>> build with this.
>
> The big question of course is whether it will survive a make build
> with the change that removes the restriction of only using Ultra-DMA
> up to mode 2, but without the fixes in pciide.c. Beware, that might
> actually eat your filesystem.

Even so, can't we force UDMA 2 for that chipset only, considering the
fix in pciide isn't enough?

The thing is, I ran into this problem with a sili 3114 (pciide) in an
Ultra 5 and had to manually change the wd flags to get past UDMA2.
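For readers wondering what the "wd flags" value 0x0a00 in the kernel
configs encodes, here is a minimal decoding sketch. It assumes the
layout as I understand it from the wd(4) manual page: the lowest nibble
selects the PIO mode, the next the DMA mode and the third the Ultra-DMA
mode; within a nibble the low three bits are the mode and the top bit
means "use this setting"; for DMA and Ultra-DMA a nibble of 0xf means
"disabled". The names wd_decode_field, struct wd_mode and the WD_MODE_*
constants are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* How one 4-bit field of the flags is to be interpreted. */
enum wd_mode_kind { WD_MODE_DEFAULT, WD_MODE_FORCED, WD_MODE_DISABLED };

struct wd_mode {
	enum wd_mode_kind kind;
	int mode;		/* meaningful only when kind == WD_MODE_FORCED */
};

/*
 * Decode one field of the flags.
 * shift selects the field: 0 = PIO, 4 = DMA, 8 = Ultra-DMA.
 */
struct wd_mode
wd_decode_field(unsigned int flags, int shift)
{
	struct wd_mode m = { WD_MODE_DEFAULT, -1 };
	unsigned int nibble;

	assert(shift == 0 || shift == 4 || shift == 8);
	nibble = (flags >> shift) & 0xf;

	if (nibble == 0xf && shift != 0) {
		/* all ones: DMA or Ultra-DMA switched off entirely */
		m.kind = WD_MODE_DISABLED;
	} else if (nibble & 0x8) {
		/* top bit set: force the mode held in the low three bits */
		m.kind = WD_MODE_FORCED;
		m.mode = (int)(nibble & 0x7);
	}
	return m;
}
```

Under that reading, 0x0a00 forces Ultra-DMA mode 2 for every wd(4)
device attached to any pciide(4), which is the collateral slowdown
described in the message above, while 0x0000 leaves all three fields at
whatever the driver negotiates.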
Re: Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
> The big question of course is whether it will survive a make build
> with the change that removes the restriction of only using Ultra-DMA
> up to mode 2, but without the fixes in pciide.c. Beware, that might
> actually eat your filesystem.

I'm doing this right now. I'm running a make build as we speak. I'm
not sure when it should fuck up the filesystem.

before:

pciide0 at pci0 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide0: using ivec 0x7cc for native-PCI interrupt
wd0 at pciide0 channel 0 drive 0: <ST340014A>
wd0: 16-sector PIO, LBA48, 38166MB, 78165360 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
wd1 at pciide0 channel 1 drive 0: <ST340014A>
wd1: 16-sector PIO, LBA48, 38166MB, 78165360 sectors
wd1(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2

after:

pciide0: using ivec 0x7cc for native-PCI interrupt
wd0 at pciide0 channel 0 drive 0: <ST340014A>
wd0: 16-sector PIO, LBA48, 38166MB, 78165360 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 4
wd1 at pciide0 channel 1 drive 0: <ST340014A>
wd1: 16-sector PIO, LBA48, 38166MB, 78165360 sectors
wd1(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 4

diff i used:

Index: arch/sparc64/conf/GENERIC
===================================================================
RCS file: /home/cvs/src/sys/arch/sparc64/conf/GENERIC,v
retrieving revision 1.261
diff -u -p -r1.261 GENERIC
--- arch/sparc64/conf/GENERIC	12 Dec 2010 14:33:57 -0000	1.261
+++ arch/sparc64/conf/GENERIC	11 Jan 2011 23:41:06 -0000
@@ -381,7 +381,7 @@ stty*	at spif?
 sbpp*	at spif?
 
 pciide*	at pci? flags 0x0000
-wd*	at pciide? flags 0x0a00
+wd*	at pciide? flags 0x0000
 atapiscsi* at pciide?
 scsibus* at atapiscsi?
 
Index: arch/sparc64/conf/RAMDISK
===================================================================
RCS file: /home/cvs/src/sys/arch/sparc64/conf/RAMDISK,v
retrieving revision 1.98
diff -u -p -r1.98 RAMDISK
--- arch/sparc64/conf/RAMDISK	19 Apr 2010 10:44:33 -0000	1.98
+++ arch/sparc64/conf/RAMDISK	11 Jan 2011 23:41:13 -0000
@@ -151,7 +151,7 @@ ti*	at sbus?
 gem*	at sbus?
 
 pciide*	at pci? flags 0x0000
-wd*	at pciide? flags 0x0a00
+wd*	at pciide? flags 0x0000
 atapiscsi* at pciide?
 scsibus* at atapiscsi?
 
Index: arch/sparc64/conf/RAMDISKU5
===================================================================
RCS file: /home/cvs/src/sys/arch/sparc64/conf/RAMDISKU5,v
retrieving revision 1.16
diff -u -p -r1.16 RAMDISKU5
--- arch/sparc64/conf/RAMDISKU5	24 Jun 2009 11:38:40 -0000	1.16
+++ arch/sparc64/conf/RAMDISKU5	11 Jan 2011 23:41:30 -0000
@@ -55,7 +55,7 @@ pcons0	at mainbus0		# PROM console
 timer*	at mainbus0		# Timer chip (some systems)
 
 pciide*	at pci? flags 0x0000
-wd*	at pciide? flags 0x0a00
+wd*	at pciide? flags 0x0000
 atapiscsi* at pciide?
 scsibus* at atapiscsi?
 
Re: Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
On Wed, Jan 12, 2011 at 08:32:12PM -0500, Brad wrote:
> The following diff is ported from NetBSD (the workaround originated
> from OpenSolaris) to work around the issue of data corruption with
> the ALI M5229 IDE chipset when using UltraDMA. The same workaround is
> also used by FreeBSD/Linux. This chipset is found in some sparc64
> systems such as the Blade 100 and Netra X1.

As well as Blade 1500, Fire T200, Fire v120 and Fire v210.

> I don't have any such systems, but I went digging for this, being
> curious why the nasty hack was added to the kernel configs to disable
> UltraDMA to work around this bug, thus penalizing other IDE/SATA
> controllers that could be in the same system.
>
> If you have one of the mentioned systems please test this.
>
> Index: dev/pci/pciide.c
> ===================================================================
> RCS file: /home/cvs/src/sys/dev/pci/pciide.c,v
> retrieving revision 1.323
> diff -u -p -r1.323 pciide.c
> --- dev/pci/pciide.c	18 Nov 2010 18:12:52 -0000	1.323
> +++ dev/pci/pciide.c	13 Jan 2011 00:22:14 -0000
> @@ -212,6 +212,8 @@ void natsemi_irqack(struct channel_softc
>  void ns_scx200_chip_map(struct pciide_softc *, struct pci_attach_args *);
>  void ns_scx200_setup_channel(struct channel_softc *);
>  
> +int  acer_pcib_match(struct pci_attach_args *);
> +void acer_do_reset(struct channel_softc *);
>  void acer_chip_map(struct pciide_softc *, struct pci_attach_args *);
>  void acer_setup_channel(struct channel_softc *);
>  int  acer_pci_intr(void *);
> @@ -289,6 +291,11 @@ struct pciide_product_desc {
>  	void (*chip_map)(struct pciide_softc *, struct pci_attach_args *);
>  };
>  
> +struct pciide_acer_softc {
> +	struct pciide_softc pciide_sc;
> +	struct pci_attach_args pcib_pa;
> +};
> +
>  /* Flags for ide_flags */
>  #define IDE_PCI_CLASS_OVERRIDE 0x0001	/* accept even if class != pciide */
>  #define IDE_16BIT_IOSPACE      0x0002	/* I/O space BARS ignore upper word */
> @@ -5619,10 +5626,27 @@ ns_scx200_setup_channel(struct channel_s
>  	pciide_print_modes(cp);
>  }
>  
> +int
> +acer_pcib_match(struct pci_attach_args *pa)
> +{
> +	/*
> +	 * We need to access the PCI config space of the pcib, see
> +	 * acer_do_reset().
> +	 */
> +	if (PCI_CLASS(pa->pa_class) == PCI_CLASS_BRIDGE &&
> +	    PCI_SUBCLASS(pa->pa_class) == PCI_SUBCLASS_BRIDGE_ISA &&
> +	    PCI_VENDOR(pa->pa_id) == PCI_VENDOR_ALI &&
> +	    PCI_PRODUCT(pa->pa_id) == PCI_PRODUCT_ALI_M1533)
> +		return (1);
> +
> +	return (0);
> +}
> +
>  void
>  acer_chip_map(struct pciide_softc *sc, struct pci_attach_args *pa)
>  {
>  	struct pciide_channel *cp;
> +	struct pciide_acer_softc *acer_sc = (struct pciide_acer_softc *)sc;
>  	int channel;
>  	pcireg_t cr, interface;
>  	bus_size_t cmdsize, ctlsize;
> @@ -5684,6 +5708,12 @@ acer_chip_map(struct pciide_softc *sc, s
>  	    pciide_pci_read(sc->sc_pc, sc->sc_tag, ACER_0x4B)
>  	    | ACER_0x4B_CDETECT);
>  
> +	if (rev == 0xC3) {
> +		/* Install reset bug workaround */
> +		if (pci_find_device(&acer_sc->pcib_pa, acer_pcib_match))
> +			sc->sc_wdcdev.reset = acer_do_reset;
> +	}
> +
>  	for (channel = 0; channel < sc->sc_wdcdev.nchannels; channel++) {
>  		cp = &sc->pciide_channels[channel];
>  		if (pciide_chansetup(sc, channel, interface) == 0)
> @@ -5713,6 +5743,31 @@ acer_chip_map(struct pciide_softc *sc, s
>  		}
>  		acer_setup_channel(&cp->wdc_channel);
>  	}
> +}
> +
> +void
> +acer_do_reset(struct channel_softc *chp)
> +{
> +	struct pciide_channel *cp = (struct pciide_channel *)chp;
> +	struct pciide_softc *sc = (struct pciide_softc *)cp->wdc_channel.wdc;
> +	struct pciide_acer_softc *acer_sc = (struct pciide_acer_softc *)sc;
> +	u_int8_t reg;
> +
> +	/*
> +	 * From OpenSolaris: after a reset we need to disable/enable the
> +	 * corresponding channel, or data corruption will occur in
> +	 * UltraDMA modes.
> +	 */
> +
> +	wdc_do_reset(chp);
> +
> +	reg = pciide_pci_read(acer_sc->pcib_pa.pa_pc, acer_sc->pcib_pa.pa_tag,
> +	    ACER_PCIB_CTRL);
> +	pciide_pci_write(acer_sc->pcib_pa.pa_pc, acer_sc->pcib_pa.pa_tag,
> +	    ACER_PCIB_CTRL, reg & ~ACER_PCIB_CTRL_ENCHAN(chp->channel));
> +	delay(1000);
> +	pciide_pci_write(acer_sc->pcib_pa.pa_pc, acer_sc->pcib_pa.pa_tag,
> +	    ACER_PCIB_CTRL, reg);
>  }
>  
>  void
> Index: dev/pci/pciide_acer_reg.h
> ===================================================================
> RCS file: /home/cvs/src/sys/dev/pci/pciide_acer_reg.h,v
> retrieving revision 1.8
> diff -u -p -r1.8 pciide_acer_reg.h
> --- dev/pci/pciide_acer_reg.h	23 Jul 2010 07:47:13 -0000	1.8
> +++ dev/pci/pciide_acer_reg.h	12 Jan 2011 05:14:26 -0000
> @@ -89,6 +89,10 @@
>  #define ACER_0x79_REVC2_EN	0x4
>  #define ACER_0x79_EN		0x2
>  
> +/* OpenSolaris: channel enable/disable in the PCI-ISA bridge */
> +#define ACER_PCIB_CTRL	0x58
> +#define ACER_PCIB_CTRL_ENCHAN(chan)	(0x4 << (chan))
> +
>  /*
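The core of acer_do_reset() in the diff above is a read-modify-write on
one control register: clear the channel's enable bit, wait, then write
the saved value back. A minimal user-space sketch of just that bit
manipulation follows, assuming the macro reads (0x4 << (chan)) and the
mask is applied with "& ~" (both operators were lost in the archived
copy). The struct fake_bridge, fake_write and toggle_channel names are
hypothetical; a small in-memory register stands in for the PCI config
space accesses.

```c
#include <assert.h>
#include <stdint.h>

/* Per-channel enable bit, as defined in the diff above. */
#define ACER_PCIB_CTRL_ENCHAN(chan)	(0x4 << (chan))

/* Hypothetical fake register standing in for ACER_PCIB_CTRL on the bridge. */
struct fake_bridge {
	uint8_t ctrl;		/* current register value */
	int writes;		/* number of writes performed */
	uint8_t history[4];	/* values written, in order */
};

/* Record a write to the fake register. */
void
fake_write(struct fake_bridge *b, uint8_t v)
{
	assert(b->writes < 4);
	b->history[b->writes++] = v;
	b->ctrl = v;
}

/* Disable and then re-enable one channel, restoring the saved value. */
void
toggle_channel(struct fake_bridge *b, int chan)
{
	uint8_t reg;

	assert(chan == 0 || chan == 1);

	reg = b->ctrl;		/* save the current value */
	fake_write(b, (uint8_t)(reg & ~ACER_PCIB_CTRL_ENCHAN(chan)));
	/* the driver code waits here, with delay(1000) */
	fake_write(b, reg);	/* restore: channel enabled again */
}
```

Writing back the saved value, rather than setting the bit again, leaves
every other bit in the register exactly as it was found.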
Re: Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
On Thu, Jan 13, 2011 at 09:02:26AM +0100, Jasper Lievisse Adriaanse wrote:
On Wed, Jan 12, 2011 at 08:32:12PM -0500, Brad wrote:

The following diff is ported from NetBSD (the workaround originated from OpenSolaris) to work around the data corruption issue with the ALI M5229 IDE chipset when using UltraDMA. The same workaround is also used by FreeBSD/Linux. This chipset is found in some sparc64 systems such as the Blade 100 and Netra X1.

As well as Blade 1500, Fire T200, Fire v120 and Fire v210.

I don't have any such systems, but I went digging because I was curious why the nasty hack was added to the kernel configs to disable UltraDMA to work around this bug, thus penalizing other IDE/SATA controllers that could be in the same system. If you have one of the mentioned systems, please test this.

Fwiw, my X1 is still working fine.

Index: dev/pci/pciide.c
===================================================================
RCS file: /home/cvs/src/sys/dev/pci/pciide.c,v
retrieving revision 1.323
diff -u -p -r1.323 pciide.c
--- dev/pci/pciide.c	18 Nov 2010 18:12:52 -0000	1.323
+++ dev/pci/pciide.c	13 Jan 2011 00:22:14 -0000
@@ -212,6 +212,8 @@ void natsemi_irqack(struct channel_softc
 void ns_scx200_chip_map(struct pciide_softc *, struct pci_attach_args *);
 void ns_scx200_setup_channel(struct channel_softc *);
 
+int acer_pcib_match(struct pci_attach_args *);
+void acer_do_reset(struct channel_softc *);
 void acer_chip_map(struct pciide_softc *, struct pci_attach_args *);
 void acer_setup_channel(struct channel_softc *);
 int acer_pci_intr(void *);
@@ -289,6 +291,11 @@ struct pciide_product_desc {
 	void (*chip_map)(struct pciide_softc *, struct pci_attach_args *);
 };
 
+struct pciide_acer_softc {
+	struct pciide_softc pciide_sc;
+	struct pci_attach_args pcib_pa;
+};
+
 /* Flags for ide_flags */
 #define IDE_PCI_CLASS_OVERRIDE 0x0001	/* accept even if class != pciide */
 #define IDE_16BIT_IOSPACE 0x0002	/* I/O space BARS ignore upper word */
@@ -5619,10 +5626,27 @@ ns_scx200_setup_channel(struct channel_s
 	pciide_print_modes(cp);
 }
 
+int
+acer_pcib_match(struct pci_attach_args *pa)
+{
+	/*
+	 * We need to access the PCI config space of the pcib, see
+	 * acer_do_reset().
+	 */
+	if (PCI_CLASS(pa->pa_class) == PCI_CLASS_BRIDGE &&
+	    PCI_SUBCLASS(pa->pa_class) == PCI_SUBCLASS_BRIDGE_ISA &&
+	    PCI_VENDOR(pa->pa_id) == PCI_VENDOR_ALI &&
+	    PCI_PRODUCT(pa->pa_id) == PCI_PRODUCT_ALI_M1533)
+		return (1);
+
+	return (0);
+}
+
 void
 acer_chip_map(struct pciide_softc *sc, struct pci_attach_args *pa)
 {
 	struct pciide_channel *cp;
+	struct pciide_acer_softc *acer_sc = (struct pciide_acer_softc *)sc;
 	int channel;
 	pcireg_t cr, interface;
 	bus_size_t cmdsize, ctlsize;
@@ -5684,6 +5708,12 @@ acer_chip_map(struct pciide_softc *sc, s
 	    pciide_pci_read(sc->sc_pc, sc->sc_tag, ACER_0x4B)
 	    | ACER_0x4B_CDETECT);
 
+	if (rev == 0xC3) {
+		/* Install reset bug workaround */
+		if (pci_find_device(&acer_sc->pcib_pa, acer_pcib_match))
+			sc->sc_wdcdev.reset = acer_do_reset;
+	}
+
 	for (channel = 0; channel < sc->sc_wdcdev.nchannels; channel++) {
 		cp = &sc->pciide_channels[channel];
 		if (pciide_chansetup(sc, channel, interface) == 0)
@@ -5713,6 +5743,31 @@ acer_chip_map(struct pciide_softc *sc, s
 		}
 		acer_setup_channel(&cp->wdc_channel);
 	}
+}
+
+void
+acer_do_reset(struct channel_softc *chp)
+{
+	struct pciide_channel *cp = (struct pciide_channel *)chp;
+	struct pciide_softc *sc = (struct pciide_softc *)cp->wdc_channel.wdc;
+	struct pciide_acer_softc *acer_sc = (struct pciide_acer_softc *)sc;
+	u_int8_t reg;
+
+	/*
+	 * From OpenSolaris: after a reset we need to disable/enable the
+	 * corresponding channel, or data corruption will occur in
+	 * UltraDMA modes.
+	 */
+
+	wdc_do_reset(chp);
+
+	reg = pciide_pci_read(acer_sc->pcib_pa.pa_pc, acer_sc->pcib_pa.pa_tag,
+	    ACER_PCIB_CTRL);
+	pciide_pci_write(acer_sc->pcib_pa.pa_pc, acer_sc->pcib_pa.pa_tag,
+	    ACER_PCIB_CTRL, reg & ~ACER_PCIB_CTRL_ENCHAN(chp->channel));
+	delay(1000);
+	pciide_pci_write(acer_sc->pcib_pa.pa_pc, acer_sc->pcib_pa.pa_tag,
+	    ACER_PCIB_CTRL, reg);
 }
 
 void
Index: dev/pci/pciide_acer_reg.h
===================================================================
RCS file: /home/cvs/src/sys/dev/pci/pciide_acer_reg.h,v
retrieving revision 1.8
diff -u -p -r1.8 pciide_acer_reg.h
--- dev/pci/pciide_acer_reg.h	23 Jul 2010 07:47:13 -0000	1.8
+++ dev/pci/pciide_acer_reg.h	12 Jan 2011 05:14:26 -0000
@@ -89,6 +89,10 @@
 #define ACER_0x79_REVC2_EN	0x4
 #define ACER_0x79_EN	0x2
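For anyone testing, the disable/enable step in acer_do_reset() comes down to a save, clear-one-bit, wait, restore sequence on a single control byte. Below is a minimal userland sketch of that arithmetic only. The struct fake_bridge type and the toggle_channel() helper are hypothetical stand-ins, not kernel code, and the bit layout (0x4 shifted left by the channel number) is an assumption about how ACER_PCIB_CTRL_ENCHAN() expands.

```c
#include <stdint.h>

/* Assumed bit layout: channel 0 -> 0x4, channel 1 -> 0x8. */
#define CTRL_ENCHAN(chan)	(0x4 << (chan))

/* Hypothetical stand-in for one byte of the bridge's PCI config space. */
struct fake_bridge {
	uint8_t ctrl;		/* current register value */
	uint8_t last_disabled;	/* value written during the disable step */
};

/* Disable then re-enable one channel, leaving all other bits untouched. */
static void
toggle_channel(struct fake_bridge *b, int chan)
{
	uint8_t reg = b->ctrl;	/* save the current value */

	/* Clear only this channel's enable bit. */
	b->ctrl = (uint8_t)(reg & ~CTRL_ENCHAN(chan));
	b->last_disabled = b->ctrl;

	/* The driver waits here (delay(1000)); omitted in this sketch. */

	b->ctrl = reg;		/* restore the saved value */
}
```

The point of masking with the complement is that only the enable bit of the channel being reset is dropped; the other channel keeps running while the first one is cycled.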
Re: Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
The reset callback to wdc was added for this, but it didn't help some systems with the problem, so the pciide bits never went in. If someone has a system that is known to need the workaround, this can certainly be looked into again though.

On Wed, Jan 12, 2011 at 08:32:12PM -0500, Brad wrote:

The following diff is ported from NetBSD (the workaround originated from OpenSolaris) to work around the data corruption issue with the ALI M5229 IDE chipset when using UltraDMA. The same workaround is also used by FreeBSD/Linux. This chipset is found in some sparc64 systems such as the Blade 100 and Netra X1.

I don't have any such systems, but I went digging because I was curious why the nasty hack was added to the kernel configs to disable UltraDMA to work around this bug, thus penalizing other IDE/SATA controllers that could be in the same system. If you have one of the mentioned systems, please test this.
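As a rough illustration of the reset callback idea mentioned above: the controller carries an optional function pointer, the probe path falls back to the generic reset when no chip driver installed one, and a chip-specific hook runs the generic reset before its own extra step. This is only a sketch under those assumptions. The struct and function names here (controller, channel, generic_reset, quirk_reset, probe_channel) are hypothetical and do not match the real wdc structures.

```c
#include <stddef.h>

struct channel;

/* Hypothetical controller state with an optional chip-specific reset hook. */
struct controller {
	void (*reset)(struct channel *);
};

/* Hypothetical channel; the counters exist only to observe the call path. */
struct channel {
	struct controller *ctrl;
	int generic_resets;
	int quirk_resets;
};

/* Stand-in for the generic reset routine. */
static void
generic_reset(struct channel *ch)
{
	ch->generic_resets++;
}

/* Chip-specific reset: generic reset first, then the extra quirk step. */
static void
quirk_reset(struct channel *ch)
{
	generic_reset(ch);
	ch->quirk_resets++;
}

/* Probe path: install the default if nothing was set, then call the hook. */
static void
probe_channel(struct channel *ch)
{
	if (ch->ctrl->reset == NULL)
		ch->ctrl->reset = generic_reset;
	ch->ctrl->reset(ch);
}
```

Installing the default in the probe path means the call through the pointer is never made with a NULL hook, while controllers that never set one behave exactly as before.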
Workaround for data corruption issue with ALI M5229 IDE chip used with Sun Blade 100/Netra X1.
The following diff is ported from NetBSD (the workaround originated from OpenSolaris) to work around the data corruption issue with the ALI M5229 IDE chipset when using UltraDMA. The same workaround is also used by FreeBSD/Linux. This chipset is found in some sparc64 systems such as the Blade 100 and Netra X1.

I don't have any such systems, but I went digging because I was curious why the nasty hack was added to the kernel configs to disable UltraDMA to work around this bug, thus penalizing other IDE/SATA controllers that could be in the same system. If you have one of the mentioned systems, please test this.

Index: dev/pci/pciide.c
===================================================================
RCS file: /home/cvs/src/sys/dev/pci/pciide.c,v
retrieving revision 1.323
diff -u -p -r1.323 pciide.c
--- dev/pci/pciide.c	18 Nov 2010 18:12:52 -0000	1.323
+++ dev/pci/pciide.c	13 Jan 2011 00:22:14 -0000
@@ -212,6 +212,8 @@ void natsemi_irqack(struct channel_softc
 void ns_scx200_chip_map(struct pciide_softc *, struct pci_attach_args *);
 void ns_scx200_setup_channel(struct channel_softc *);
 
+int acer_pcib_match(struct pci_attach_args *);
+void acer_do_reset(struct channel_softc *);
 void acer_chip_map(struct pciide_softc *, struct pci_attach_args *);
 void acer_setup_channel(struct channel_softc *);
 int acer_pci_intr(void *);
@@ -289,6 +291,11 @@ struct pciide_product_desc {
 	void (*chip_map)(struct pciide_softc *, struct pci_attach_args *);
 };
 
+struct pciide_acer_softc {
+	struct pciide_softc pciide_sc;
+	struct pci_attach_args pcib_pa;
+};
+
 /* Flags for ide_flags */
 #define IDE_PCI_CLASS_OVERRIDE 0x0001	/* accept even if class != pciide */
 #define IDE_16BIT_IOSPACE 0x0002	/* I/O space BARS ignore upper word */
@@ -5619,10 +5626,27 @@ ns_scx200_setup_channel(struct channel_s
 	pciide_print_modes(cp);
 }
 
+int
+acer_pcib_match(struct pci_attach_args *pa)
+{
+	/*
+	 * We need to access the PCI config space of the pcib, see
+	 * acer_do_reset().
+	 */
+	if (PCI_CLASS(pa->pa_class) == PCI_CLASS_BRIDGE &&
+	    PCI_SUBCLASS(pa->pa_class) == PCI_SUBCLASS_BRIDGE_ISA &&
+	    PCI_VENDOR(pa->pa_id) == PCI_VENDOR_ALI &&
+	    PCI_PRODUCT(pa->pa_id) == PCI_PRODUCT_ALI_M1533)
+		return (1);
+
+	return (0);
+}
+
 void
 acer_chip_map(struct pciide_softc *sc, struct pci_attach_args *pa)
 {
 	struct pciide_channel *cp;
+	struct pciide_acer_softc *acer_sc = (struct pciide_acer_softc *)sc;
 	int channel;
 	pcireg_t cr, interface;
 	bus_size_t cmdsize, ctlsize;
@@ -5684,6 +5708,12 @@ acer_chip_map(struct pciide_softc *sc, s
 	    pciide_pci_read(sc->sc_pc, sc->sc_tag, ACER_0x4B)
 	    | ACER_0x4B_CDETECT);
 
+	if (rev == 0xC3) {
+		/* Install reset bug workaround */
+		if (pci_find_device(&acer_sc->pcib_pa, acer_pcib_match))
+			sc->sc_wdcdev.reset = acer_do_reset;
+	}
+
 	for (channel = 0; channel < sc->sc_wdcdev.nchannels; channel++) {
 		cp = &sc->pciide_channels[channel];
 		if (pciide_chansetup(sc, channel, interface) == 0)
@@ -5713,6 +5743,31 @@ acer_chip_map(struct pciide_softc *sc, s
 		}
 		acer_setup_channel(&cp->wdc_channel);
 	}
+}
+
+void
+acer_do_reset(struct channel_softc *chp)
+{
+	struct pciide_channel *cp = (struct pciide_channel *)chp;
+	struct pciide_softc *sc = (struct pciide_softc *)cp->wdc_channel.wdc;
+	struct pciide_acer_softc *acer_sc = (struct pciide_acer_softc *)sc;
+	u_int8_t reg;
+
+	/*
+	 * From OpenSolaris: after a reset we need to disable/enable the
+	 * corresponding channel, or data corruption will occur in
+	 * UltraDMA modes.
+	 */
+
+	wdc_do_reset(chp);
+
+	reg = pciide_pci_read(acer_sc->pcib_pa.pa_pc, acer_sc->pcib_pa.pa_tag,
+	    ACER_PCIB_CTRL);
+	pciide_pci_write(acer_sc->pcib_pa.pa_pc, acer_sc->pcib_pa.pa_tag,
+	    ACER_PCIB_CTRL, reg & ~ACER_PCIB_CTRL_ENCHAN(chp->channel));
+	delay(1000);
+	pciide_pci_write(acer_sc->pcib_pa.pa_pc, acer_sc->pcib_pa.pa_tag,
+	    ACER_PCIB_CTRL, reg);
 }
 
 void
Index: dev/pci/pciide_acer_reg.h
===================================================================
RCS file: /home/cvs/src/sys/dev/pci/pciide_acer_reg.h,v
retrieving revision 1.8
diff -u -p -r1.8 pciide_acer_reg.h
--- dev/pci/pciide_acer_reg.h	23 Jul 2010 07:47:13 -0000	1.8
+++ dev/pci/pciide_acer_reg.h	12 Jan 2011 05:14:26 -0000
@@ -89,6 +89,10 @@
 #define ACER_0x79_REVC2_EN	0x4
 #define ACER_0x79_EN	0x2
 
+/* OpenSolaris: channel enable/disable in the PCI-ISA bridge */
+#define ACER_PCIB_CTRL	0x58
+#define ACER_PCIB_CTRL_ENCHAN(chan)	(0x4 << (chan))
+
 /*
  * IDE bus frequency (1 byte)
  * This should be setup by the BIOS - can we rely on this ?
Index: arch/sparc64/conf/GENERIC