Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-12-09 Thread Matthew D. Fuller
On Mon, Nov 19, 2007 at 09:02:33AM +0100 I heard the voice of
Søren Schmidt, and lo! it spake thus:
 
 I'd like to get the final verdict of the attached patch and if it
 fixes the problem or not.

Behind the curve, as usual, I just upgraded one of my systems that's
had the problem in the past to RELENG_7 (which has the fix).  It's
since moved a bunch of data and done a bunch of builds without a hint
of trouble, so looks good to me.


-- 
Matthew Fuller (MF4839)   |  [EMAIL PROTECTED]
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-20 Thread Joao Barros
On 11/20/07, Thierry Herbelot [EMAIL PROTECTED] wrote:
 On Tuesday 20 November 2007, Ari Suutari wrote:

  I have Promise TX2 (PDC20575). It didn't work with 7.0 betas
  before, but with this patch things run as well as they did
  on 6.x.
 
   Ari S.

 Hello,

 Has anyone an idea why the Promise controllers seemed to work correctly under
 6.x, then have issues with 7.0 ? (more precisely : was the existing bug not
 triggered by the 6.x kernel ?)


Apparently not all Promise controllers are/were affected. I've been
running CURRENT since Pawel committed ZFS with an onboard Promise:

atapci0: Promise PDC20319 SATA150 controller port
0xb000-0xb03f,0xb400-0xb40f,0xb800-0xb87f mem
0xfc024000-0xfc024fff,0xfc00-0xfc01 irq 23 at device 4.0 on
pci4
ar0: 305245MB Promise Fasttrak RAID0 (stripe 64 KB) status: READY
ar1: 305245MB Promise Fasttrak RAID0 (stripe 64 KB) status: READY

[EMAIL PROTECTED]:4:4:0: class=0x010400 card=0x80f51043 chip=0x3319105a
rev=0x02 hdr=0x00
vendor = 'Promise Technology Inc'
device = 'PDC20319(??) FastTrak SATA150 TX4 Controller'
class  = mass storage
subclass   = RAID

The only problem I have, and I'm filing a PR for it, is that when booting
from CD with the controller enabled, the BTX loader just reboots.

-- 
Joao Barros


Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-20 Thread Ari Suutari


Hi,

On Mon, Nov 19, 2007 at 09:02:33 +0100, Søren Schmidt wrote:

> Hi All!
>
> I'd like to get the final verdict of the attached patch and if it fixes
> the problem or not.
>
> Please test and report; it's a bit urgent if it needs to get into R7 :)

I have Promise TX2 (PDC20575). It didn't work with 7.0 betas
before, but with this patch things run as well as they did
on 6.x.

Ari S.




Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-20 Thread Thierry Herbelot
On Tuesday 20 November 2007, Ari Suutari wrote:

 I have Promise TX2 (PDC20575). It didn't work with 7.0 betas
 before, but with this patch things run as well as they did
 on 6.x.

  Ari S.

Hello,

Does anyone have an idea why the Promise controllers seemed to work correctly
under 6.x but have issues with 7.0? (More precisely: was the existing bug not
triggered by the 6.x kernel?)

Thierry


Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-20 Thread Søren Schmidt

Joao Barros wrote:

> On 11/20/07, Thierry Herbelot [EMAIL PROTECTED] wrote:
>> On Tuesday 20 November 2007, Ari Suutari wrote:
>>
>>> I have Promise TX2 (PDC20575). It didn't work with 7.0 betas
>>> before, but with this patch things run as well as they did
>>> on 6.x.
>>>
>>>  Ari S.
>>
>> Hello,
>>
>> Has anyone an idea why the Promise controllers seemed to work correctly under
>> 6.x, then have issues with 7.0 ? (more precisely : was the existing bug not
>> triggered by the 6.x kernel ?)

The problem is in the Promise HW, so it is bound to happen on 6.x as well.
Note that it just leads to data corruption, not necessarily hangs/lockups.

> Apparently not all Promise controllers are/were affected. I've been
> running CURRENT since Pawel committed ZFS with an onboard Promise:
>
> atapci0: Promise PDC20319 SATA150 controller port
> 0xb000-0xb03f,0xb400-0xb40f,0xb800-0xb87f mem
> 0xfc024000-0xfc024fff,0xfc00-0xfc01 irq 23 at device 4.0 on
> pci4
> ar0: 305245MB Promise Fasttrak RAID0 (stripe 64 KB) status: READY
> ar1: 305245MB Promise Fasttrak RAID0 (stripe 64 KB) status: READY

No, only the newer Gen2 chips are affected; the older ones should be safe.

> [EMAIL PROTECTED]:4:4:0: class=0x010400 card=0x80f51043 chip=0x3319105a
> rev=0x02 hdr=0x00
> vendor = 'Promise Technology Inc'
> device = 'PDC20319(??) FastTrak SATA150 TX4 Controller'
> class  = mass storage
> subclass   = RAID
>
> The only problem I have and I'm filling a pr for that, is when booting
> from CD with the controller enabled, the BTX loader just reboots.

That's at least not an ATA problem :)

-Søren




Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-19 Thread Søren Schmidt

Hi All!

I'd like to get the final verdict of the attached patch and if it fixes 
the problem or not.


Please test and report; it's a bit urgent if it needs to get into R7 :)


-Søren
? promise-fix2
? promise-fix3
Index: ata-chipset.c
===
RCS file: /home/ncvs/src/sys/dev/ata/ata-chipset.c,v
retrieving revision 1.202.2.2
diff -u -r1.202.2.2 ata-chipset.c
--- ata-chipset.c   31 Oct 2007 19:59:53 -  1.202.2.2
+++ ata-chipset.c   18 Nov 2007 11:54:59 -
@@ -142,6 +142,7 @@
 static int ata_promise_mio_command(struct ata_request *request);
 static void ata_promise_mio_reset(device_t dev);
 static void ata_promise_mio_dmainit(device_t dev);
+static void ata_promise_mio_setprd(void *xsc, bus_dma_segment_t *segs, int nsegs, int error);
 static void ata_promise_mio_setmode(device_t dev, int mode);
 static void ata_promise_sx4_intr(void *data);
 static int ata_promise_sx4_command(struct ata_request *request);
@@ -792,6 +793,7 @@
prd[i].dbc = htole32((segs[i].ds_len - 1) & ATA_AHCI_PRD_MASK);
}
 }
+KASSERT(nsegs <= ATA_DMA_ENTRIES, ("too many DMA segment entries\n"));
 args->nsegs = nsegs;
 }
 
@@ -2760,6 +2762,8 @@
prd[i].addrhi = htole32((u_int64_t)segs[i].ds_addr >> 32);
 }
 prd[i - 1].count |= htole32(ATA_DMA_EOT);
+KASSERT(nsegs <= ATA_DMA_ENTRIES, ("too many DMA segment entries\n"));
+args->nsegs = nsegs;
 }
 
 static void
@@ -3288,9 +3292,13 @@
/* prime fake interrupt register */
ATA_OUTL(ctlr->r_res2, fake_reg, 0x);
 
-   /* clear SATA status */
+   /* clear SATA status and unmask interrupts */
ATA_OUTL(ctlr->r_res2, stat_reg, 0x00ff);
 
+   /* enable long burst length on gen2 chips */
+   if ((ctlr->chip->cfg2 == PRSATA2) || (ctlr->chip->cfg2 == PRCMBO2))
+   ATA_OUTL(ctlr->r_res2, 0x44, ATA_INL(ctlr->r_res2, 0x44) | 0x2000);
+
ctlr->allocate = ata_promise_mio_allocate;
ctlr->reset = ata_promise_mio_reset;
ctlr->dmainit = ata_promise_mio_dmainit;
@@ -3778,8 +3786,42 @@
 static void
 ata_promise_mio_dmainit(device_t dev)
 {
+struct ata_channel *ch = device_get_softc(dev);
+
 /* note start and stop are not used here */
 ata_dmainit(dev);
+if (ch->dma)
+   ch->dma->setprd = ata_promise_mio_setprd;
+}
+
+
+#define MAXLASTSGSIZE (32 * sizeof(u_int32_t))
+static void
+ata_promise_mio_setprd(void *xsc, bus_dma_segment_t *segs, int nsegs, int error)
+{
+struct ata_dmasetprd_args *args = xsc;
+struct ata_dma_prdentry *prd = args->dmatab;
+int i;
+
+if ((args->error = error))
+   return;
+
+for (i = 0; i < nsegs; i++) {
+   prd[i].addr = htole32(segs[i].ds_addr);
+   prd[i].count = htole32(segs[i].ds_len);
+}
+if (segs[i - 1].ds_len > MAXLASTSGSIZE) {
+   //printf("split last SG element of %u\n", segs[i - 1].ds_len);
+   prd[i - 1].count = htole32(segs[i - 1].ds_len - MAXLASTSGSIZE);
+   prd[i].count = htole32(MAXLASTSGSIZE);
+   prd[i].addr = htole32(segs[i - 1].ds_addr +
+ (segs[i - 1].ds_len - MAXLASTSGSIZE));
+   nsegs++;
+   i++;
+}
+prd[i - 1].count |= htole32(ATA_DMA_EOT);
+KASSERT(nsegs <= ATA_DMA_ENTRIES, ("too many DMA segment entries\n"));
+args->nsegs = nsegs;
 }
 
 static void
@@ -4849,6 +4891,8 @@
prd[i].count = htole32(segs[i].ds_len);
 }
 prd[i - 1].control = htole32(ATA_DMA_EOT);
+KASSERT(nsegs <= ATA_DMA_ENTRIES, ("too many DMA segment entries\n"));
+args->nsegs = nsegs;
 }
 
 static void
Index: ata-dma.c
===
RCS file: /home/ncvs/src/sys/dev/ata/ata-dma.c,v
retrieving revision 1.147
diff -u -r1.147 ata-dma.c
--- ata-dma.c   8 Apr 2007 21:53:52 -   1.147
+++ ata-dma.c   18 Nov 2007 11:54:59 -
@@ -213,6 +213,7 @@
prd[i].count = htole32(segs[i].ds_len);
 }
 prd[i - 1].count |= htole32(ATA_DMA_EOT);
+KASSERT(nsegs <= ATA_DMA_ENTRIES, ("too many DMA segment entries\n"));
 args->nsegs = nsegs;
 }
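
For readers following the workaround, here is a minimal user-space sketch of
the trailing-PRD split that the patch above performs. It is not the driver
code: the struct names, build_prd() and the PRD_EOT value are stand-ins
assumed for illustration, byte-order conversion (htole32) is left out, and
the 128-byte limit is the 32-dword figure used in the patch. The detail worth
noting is that the tail entry starts at addr + (len - limit), so the two
entries together still cover the original segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types; the names are hypothetical. */
struct seg { uint32_t addr; uint32_t len; };        /* like bus_dma_segment_t */
struct prd_entry { uint32_t addr; uint32_t count; }; /* like ata_dma_prdentry */

#define PRD_EOT        0x80000000u  /* end-of-table flag, assumed value */
#define MAX_LAST_BYTES (32u * 4u)   /* 32 dwords, as in the patch above */

/*
 * Fill prd[] from segs[]. If the last segment is longer than
 * MAX_LAST_BYTES, split it so that the final entry is exactly
 * MAX_LAST_BYTES long. prd[] must have room for nsegs + 1 entries.
 * Returns the number of PRD entries written.
 */
size_t
build_prd(const struct seg *segs, size_t nsegs, struct prd_entry *prd)
{
    size_t i;

    assert(nsegs > 0);
    for (i = 0; i < nsegs; i++) {
        prd[i].addr = segs[i].addr;
        prd[i].count = segs[i].len;
    }
    if (segs[i - 1].len > MAX_LAST_BYTES) {
        uint32_t head = segs[i - 1].len - MAX_LAST_BYTES;

        prd[i - 1].count = head;               /* shortened head part */
        prd[i].addr = segs[i - 1].addr + head; /* tail starts after head */
        prd[i].count = MAX_LAST_BYTES;
        i++;
    }
    prd[i - 1].count |= PRD_EOT;               /* mark the last entry */
    return i;
}
```

With a 512-byte last segment at 0x3000, this yields a 384-byte entry at
0x3000 followed by a 128-byte entry at 0x3180 carrying the end-of-table flag.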
 

Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-19 Thread Ulf Lilleengen
On Mon, Nov 19, 2007 at 09:02:33 +0100, Søren Schmidt wrote:
 Hi All!
 
 I'd like to get the final verdict of the attached patch and if it fixes 
 the problem or not.
 
 Please test and report, its a bit urgent if it need to get into R7 :)
 
 
Hi!

I'm sorry I wasn't able to test this earlier, but my office was locked during
the weekend and I was therefore not able to test until today. 

But the good news is, it works. I get no error messages when reading or writing
data to the drives anymore, and the partition table is correctly read so that
the correct device nodes show up. This should definitely go into 7.0 IMHO if
no bugs show up.

Thanks!

-- 
Ulf Lilleengen


Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-19 Thread Thierry Herbelot
On Monday 19 November 2007, Søren Schmidt wrote:
 Hi All!

 I'd like to get the final verdict of the attached patch and if it fixes
 the problem or not.

 Please test and report, its a bit urgent if it need to get into R7 :)


 -Søren

Hello SoS,

From what I read, it seems that the latest promise-fix3 patch is the same as the
previous promise-fix2, except for a cosmetic change.

Then I'd say go for it, as I was happy with promise-fix2.

Thanks

TfH


Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-16 Thread Ulf Lilleengen
On Fri, Nov 02, 2007 at 01:34:51 +0300, Alexander Sabourenkov wrote:
 Hello.
 
 I have ported the workaround for the hardware bug that causes data
 corruption on Promise SATA300 TX4 cards to RELENG_7.
 
 Bug description:
 SATA300 TX4 hardware chokes if last PRD entry (in a dma transfer) is
 larger than 164 bytes. This was found while analysing vendor-supplied
 linux driver.
 
 Workaround:
 Split trailing PRD entry if it's larger than 164 bytes.
 
 Two supplied patches do fix problem on my machine.
 
 There is, however, a style problem with them. It seems like PRD entry
 count is limited at 256. I have not found a good way to guarantee that
 one entry is always available to do the split, thus the ugly solution of
 patching ata-dma.c.
 
 
 Patches, patched and original files are at http://lxnt.info/tx4/freebsd/.
 
Hi,

I tried the patch, but I end up with the partition table being incorrectly
read (probably) on the drives connected to my TX4 card. Normally, there's
one partition on the drive, but when I apply the patch, the drive provider
(ad6) is all that shows up in /dev. 

When I revert the patch, the partition (ad6s1) shows up in /dev again.

I applied both the ata-chipset patch and ata-dma patch to a RELENG_7 system.

-- 
Ulf Lilleengen


Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-16 Thread Søren Schmidt

Ulf Lilleengen wrote:

> I tried the patch, but I end up with the partition table being incorrectly
> read (probably) on the drives connected to my TX4 card. Normally, there's
> one partition on the drive, but when I apply the patch, the drive provider
> (ad6) is all that shows up in /dev.
>
> When I revert the patch, the partition (ad6s1) shows up in /dev again.
>
> I applied both the ata-chipset patch and ata-dma patch to a RELENG_7 system.

You should try the attached official patch and let me know if that 
helps, thanks!


-Søren
? promise-fix2
Index: ata-chipset.c
===
RCS file: /home/ncvs/src/sys/dev/ata/ata-chipset.c,v
retrieving revision 1.202.2.2
diff -u -r1.202.2.2 ata-chipset.c
--- ata-chipset.c   31 Oct 2007 19:59:53 -  1.202.2.2
+++ ata-chipset.c   11 Nov 2007 17:08:49 -
@@ -142,6 +142,7 @@
 static int ata_promise_mio_command(struct ata_request *request);
 static void ata_promise_mio_reset(device_t dev);
 static void ata_promise_mio_dmainit(device_t dev);
+static void ata_promise_mio_setprd(void *xsc, bus_dma_segment_t *segs, int nsegs, int error);
 static void ata_promise_mio_setmode(device_t dev, int mode);
 static void ata_promise_sx4_intr(void *data);
 static int ata_promise_sx4_command(struct ata_request *request);
@@ -792,6 +793,7 @@
prd[i].dbc = htole32((segs[i].ds_len - 1) & ATA_AHCI_PRD_MASK);
}
 }
+KASSERT(nsegs <= ATA_DMA_ENTRIES, ("too many DMA segment entries\n"));
 args->nsegs = nsegs;
 }
 
@@ -2760,6 +2762,8 @@
prd[i].addrhi = htole32((u_int64_t)segs[i].ds_addr >> 32);
 }
 prd[i - 1].count |= htole32(ATA_DMA_EOT);
+KASSERT(nsegs <= ATA_DMA_ENTRIES, ("too many DMA segment entries\n"));
+args->nsegs = nsegs;
 }
 
 static void
@@ -3288,9 +3292,13 @@
/* prime fake interrupt register */
ATA_OUTL(ctlr->r_res2, fake_reg, 0x);
 
-   /* clear SATA status */
+   /* clear SATA status and unmask interrupts */
ATA_OUTL(ctlr->r_res2, stat_reg, 0x00ff);
 
+   /* enable long burst length on gen2 chips */
+   if ((ctlr->chip->cfg2 == PRSATA2) || (ctlr->chip->cfg2 == PRCMBO2))
+   ATA_OUTL(ctlr->r_res2, 0x44, ATA_INL(ctlr->r_res2, 0x44) | 0x2000);
+
ctlr->allocate = ata_promise_mio_allocate;
ctlr->reset = ata_promise_mio_reset;
ctlr->dmainit = ata_promise_mio_dmainit;
@@ -3778,8 +3786,42 @@
 static void
 ata_promise_mio_dmainit(device_t dev)
 {
+struct ata_channel *ch = device_get_softc(dev);
+
 /* note start and stop are not used here */
 ata_dmainit(dev);
+if (ch->dma)
+   ch->dma->setprd = ata_promise_mio_setprd;
+}
+
+
+#define MAXLASTSGSIZE (32 * sizeof(u_int32_t))
+static void
+ata_promise_mio_setprd(void *xsc, bus_dma_segment_t *segs, int nsegs, int error)
+{
+struct ata_dmasetprd_args *args = xsc;
+struct ata_dma_prdentry *prd = args->dmatab;
+int i;
+
+if ((args->error = error))
+   return;
+
+for (i = 0; i < nsegs; i++) {
+   prd[i].addr = htole32(segs[i].ds_addr);
+   prd[i].count = htole32(segs[i].ds_len);
+}
+if (segs[i - 1].ds_len > MAXLASTSGSIZE) {
+   //printf("split last SG element of %u\n", segs[i - 1].ds_len);
+   prd[i - 1].count = htole32(segs[i - 1].ds_len - MAXLASTSGSIZE);
+   prd[i].count = htole32(MAXLASTSGSIZE);
+   prd[i].addr = htole32(segs[i - 1].ds_addr +
+ (segs[i - 1].ds_len - MAXLASTSGSIZE));
+   nsegs++;
+   i++;
+}
+prd[i - 1].count |= htole32(ATA_DMA_EOT);
+KASSERT(nsegs <= ATA_DMA_ENTRIES, ("too many DMA segment entries\n"));
+args->nsegs = nsegs;
 }
 
 static void
@@ -4849,6 +4891,8 @@
prd[i].count = htole32(segs[i].ds_len);
 }
 prd[i - 1].control = htole32(ATA_DMA_EOT);
+KASSERT(nsegs <= ATA_DMA_ENTRIES, ("too many DMA segment entries\n"));
+args->nsegs = nsegs;
 }
 
 static void
Index: ata-dma.c
===
RCS file: /home/ncvs/src/sys/dev/ata/ata-dma.c,v
retrieving revision 1.147
diff -u -r1.147 ata-dma.c
--- ata-dma.c   8 Apr 2007 21:53:52 -   1.147
+++ ata-dma.c   11 Nov 2007 17:08:49 -
@@ -213,6 +213,7 @@
prd[i].count = htole32(segs[i].ds_len);
 }
 prd[i - 1].count |= htole32(ATA_DMA_EOT);
+KASSERT(nsegs <= ATA_DMA_ENTRIES, ("too many DMA segment entries\n"));
 args->nsegs = nsegs;
 }
 

Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-09 Thread Alexander Sabourenkov

Roman Kurakin wrote:

> By the way, is there any chance to get RAID5 working with this controller?

Software only.


--

./lxnt



Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-09 Thread Roman Kurakin

By the way, is there any chance to get RAID5 working with this controller?

rik




Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-03 Thread Ender

Arno J. Klaassen wrote:

Hello,

Alexander Sabourenkov [EMAIL PROTECTED] writes:

  

Hello.

I have ported the workaround for the hardware bug that causes data
corruption on Promise SATA300 TX4 cards to RELENG_7.

Bug description:
SATA300 TX4 hardware chokes if last PRD entry (in a dma transfer) is
larger than 164 bytes. This was found while analysing vendor-supplied
linux driver.

Workaround:
Split trailing PRD entry if it's larger than 164 bytes.

Two supplied patches do fix problem on my machine.




Definitely an improvement, but not sufficient (for my setup):

amd64-releng_6 on an ASUS A8V UP (the box ran rock-stable
for years on i386-releng_5 with the same hardware apart from the TX4 and
drives)

from dmesg :

atapci0: Promise PDC40718 SATA300 controller port 0xe000-0xe07f,0xd800-0xd8ff 
mem 0xfbb0-0xfbb00fff,0xfba0-0xfba1 irq 18 at device 13.0 on pci0
ata2: ATA channel 0 on atapci0
ata3: ATA channel 1 on atapci0
ata4: ATA channel 2 on atapci0
ata5: ATA channel 3 on atapci0
atapci1: VIA 6420 SATA150 controller port 
0xd400-0xd407,0xd000-0xd003,0xc800-0xc807,0xc400-0xc403,0xc000-0xc00f,0xb800-0xb8ff 
irq 20 at device 15.0 on pci0
ata6: ATA channel 0 on atapci1
ata7: ATA channel 1 on atapci1
atapci2: VIA 8237 UDMA133 controller port 
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 15.1 on pci0
ata0: ATA channel 0 on atapci2
ata1: ATA channel 1 on atapci2

[ ... ]

ad0: 38166MB Seagate ST3402111A 3.AAJ at ata0-master UDMA100
ad6: 476940MB WDC WD5000AAKS-00TMA0 12.01C01 at ata3-master SATA300
ad12: 305245MB WDC WD3200JD-22KLB0 08.05J08 at ata6-master SATA150

booting from ad0 and simple gconcat over ad6 and ad12.

Improvement : I now can fsck /dev/concat/data without
ad6 being detached

Persistent problem: when I rsync an NFS-mounted disk to /dev/concat/data,
I get the following after a few gigabytes of data have been transferred:

Nov  2 16:39:55 charlotte kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error 
(retrying request) LBA=268435392
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - 
completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA retrying (0 retries 
left) LBA=268435392
Nov  2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA 
status=ffBUSY,READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR 
error=ffICRC,UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA,ILLEGAL_LENGTH
 LBA=268435392
Nov  2 16:40:50 charlotte kernel: 
g_vfs_done():concat/data[WRITE(offset=137438920704, length=131072)]error = 5
Nov  2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA48 retrying (1 retry 
left) LBA=268435648
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - WRITE_DMA48 UDMA ICRC error 
(retrying request) LBA=268435648
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - 
completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA48 timed out 
LBA=268435648
Nov  2 16:40:50 charlotte kernel: 
g_vfs_done():concat/data[WRITE(offset=137439051776, length=131072)]error = 5
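
The LBAs in the log above can be cross-checked against the g_vfs_done() byte
offsets. A small sketch, assuming 512-byte sectors and that ad6 is the first
member of the concat so that provider offsets map directly onto ad6 (both are
assumptions, not stated in the thread); the function names are made up for
illustration. It also checks whether a request would run past the 28-bit LBA
range. Both failing writes sit right at that boundary, which may or may not be
relevant to the failure.

```c
#include <assert.h>
#include <stdint.h>

/* Number of sectors addressable with a 28-bit LBA. */
#define LBA28_LIMIT (UINT64_C(1) << 28)

/* Convert a byte offset on the disk to a sector number (LBA). */
uint64_t
offset_to_lba(uint64_t offset, uint32_t sector_size)
{
    assert(sector_size > 0 && offset % sector_size == 0);
    return offset / sector_size;
}

/* Nonzero if the range [lba, lba + nsectors) extends past 28-bit LBA. */
int
exceeds_lba28(uint64_t lba, uint32_t nsectors)
{
    return lba + nsectors > LBA28_LIMIT;
}
```

For the first failing write, offset_to_lba(137438920704, 512) gives 268435392,
matching the LBA in the log, and a 131072-byte (256-sector) request starting
there ends beyond sector 2^28 = 268435456.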

...

I will test again with #define PDC_MAXLASTSGSIZE 32*4 (just to see
if that makes a difference)

Regards, Arno

  


Just a guess here: I bet that patch helped, but there are compound
problems regarding SATA on amd64 in 7.x. Do a quick search for [sata]
(especially g_vfs_done) in the PR database. Hopefully this removed a
layer of bugs so the other ones are easier to fix.







Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-03 Thread Alexander Sabourenkov
Arno J. Klaassen wrote:

 Rather than the marginal HW part, it seems, for me, closely related to
 MB/BIOS (as well (Alexander apparently has about the same setup as I
 have for this test)):
 

[...]

 
 I vaguely remember from another PR that the Promise card does
 something with PCI-bursting which fbsd does not detect and/or
 handle correctly (and beyond my simple skills as dumb tester, but
 maybe the linux-sources contain a clue about that as well).
 

Analysis of chip initialization in the vendor-supplied, Linux and FreeBSD
drivers shows that the FreeBSD driver:
- does not enable something called 'BMR_BURST',
- performs hotplug init in one write (instead of two read-modify-writes),
- does an extra write (offset 0x54) which is not done in the other drivers.

Analysis text: http://lxnt.info/tx4/chipinit.text

Patch with ported chipinit (dangerous to use with anything from Promise
other than sata300 tx4 !!):
http://lxnt.info/tx4/freebsd/chipinit.patch (cumulative)
http://lxnt.info/tx4/freebsd/ata-chipset.c+chipinit (patched source)

Note two things:
1. I have not compiled or tested this patch. Please do.
2. I may have missed this bug because I'm frequently rebooting between
Linux and FreeBSD, and what the Linux driver initialized may have persisted
across the reboots.
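
The difference between a plain write and a read-modify-write, as mentioned in
the list above, can be illustrated with a simulated register file. This is
only a sketch: regs[], reg_read(), reg_write() and reg_set_bits() are
hypothetical stand-ins for a driver's memory-mapped register accessors, not
the real ATA_INL/ATA_OUTL macros.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS 64  /* size of the simulated register window, arbitrary */

/* Simulated 32-bit registers standing in for a memory-mapped BAR. */
static uint32_t regs[NREGS];

/* Read the 32-bit register at byte offset off. */
uint32_t
reg_read(uint32_t off)
{
    assert(off % 4 == 0 && off / 4 < NREGS);
    return regs[off / 4];
}

/* Overwrite the whole 32-bit register at byte offset off. */
void
reg_write(uint32_t off, uint32_t value)
{
    assert(off % 4 == 0 && off / 4 < NREGS);
    regs[off / 4] = value;
}

/* Read-modify-write: set the given bits and leave all others untouched. */
void
reg_set_bits(uint32_t off, uint32_t bits)
{
    reg_write(off, reg_read(off) | bits);
}
```

reg_set_bits(0x44, 0x2000) keeps whatever bits were already set at that
offset, whereas reg_write(0x44, 0x2000) would clear them.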


-- 

./lxnt


Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-02 Thread Søren Schmidt

Søren Schmidt wrote:

> Good catch!
>
> However, from my quick glimpse at the Promise sources the limit seems
> to be 32 dwords, i.e. 32*4 = 128 bytes.
> I'll investigate further and ask Promise for the gory details, stay
> tuned...
> I don't think the PRD count limitation is a real problem; I've never
> seen that long a list, and IIRC we never do more than 64K transfers in
> one go (yet).
>
> Anyhow I need to get checks in for that, not just here...
>
> Give me a few days and I'll get this figured out for 7-rel...

Oh, and I forgot: do you have a surefire way to reproduce the problem so
the fix can be tested?

I've never been able to trigger this problem myself so far.


-Søren



Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-02 Thread Søren Schmidt

Alexander Sabourenkov wrote:

> Hello.
>
> I have ported the workaround for the hardware bug that causes data
> corruption on Promise SATA300 TX4 cards to RELENG_7.
>
> Bug description:
> SATA300 TX4 hardware chokes if last PRD entry (in a dma transfer) is
> larger than 164 bytes. This was found while analysing vendor-supplied
> linux driver.
>
> Workaround:
> Split trailing PRD entry if it's larger than 164 bytes.
>
> Two supplied patches do fix problem on my machine.
>
> There is, however, a style problem with them. It seems like PRD entry
> count is limited at 256. I have not found a good way to guarantee that
> one entry is always available to do the split, thus the ugly solution of
> patching ata-dma.c.

Good catch!

However, from my quick glimpse at the Promise sources the limit seems to
be 32 dwords, i.e. 32*4 = 128 bytes.

I'll investigate further and ask Promise for the gory details, stay tuned...
I don't think the PRD count limitation is a real problem; I've never seen
that long a list, and IIRC we never do more than 64K transfers in one go
(yet).
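
To put rough numbers on the PRD-count concern, here is a sketch of the
worst-case segment count for a single transfer. It assumes 4 KiB pages and
that every page the buffer touches ends up as its own DMA segment (both are
assumptions for illustration, and worst_case_segs() is a made-up name). For a
64K transfer this gives 17 segments, or 18 with the extra entry needed for
the split, far below the 256-entry limit mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Worst-case number of page-bounded segments for a buffer of len bytes
 * that may start at any offset within a page. The worst case is a buffer
 * starting on the last byte of a page: one 1-byte segment, then the
 * remaining len - 1 bytes spread over ceil((len - 1) / page_size) pages.
 */
size_t
worst_case_segs(size_t len, size_t page_size)
{
    assert(len > 0 && page_size > 0);
    return 1 + (len - 1 + page_size - 1) / page_size;
}
```

worst_case_segs(65536, 4096) evaluates to 17 under these assumptions.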

Anyhow I need to get checks in for that not just here...

Give me a few days and I'll get this figured out for 7-rel...

-Søren


Patches, patched and original files are at http://lxnt.info/tx4/freebsd/.


--- ata-chipset.c.orig  2007-11-02 01:05:49.0 +0300
+++ ata-chipset.c   2007-11-02 01:05:49.0 +0300
@@ -142,6 +142,7 @@
 static int ata_promise_mio_command(struct ata_request *request);
 static void ata_promise_mio_reset(device_t dev);
 static void ata_promise_mio_dmainit(device_t dev);
+static void ata_promise_mio_dmasetprd(void *xsc, bus_dma_segment_t *segs, int nsegs, int error);
 static void ata_promise_mio_setmode(device_t dev, int mode);
 static void ata_promise_sx4_intr(void *data);
 static int ata_promise_sx4_command(struct ata_request *request);
@@ -185,7 +186,6 @@
 static int ata_check_80pin(device_t dev, int mode);
 static int ata_mode2idx(int mode);

-
 /*
  * generic ATA support functions
  */
@@ -3759,8 +3759,44 @@
 static void
 ata_promise_mio_dmainit(device_t dev)
 {
+struct ata_channel *ch = device_get_softc(dev);
+   
 /* note start and stop are not used here */
 ata_dmainit(dev);
+
+if (ch->dma)
+   ch->dma->setprd = ata_promise_mio_dmasetprd;
+}
+
+static void
+ata_promise_mio_dmasetprd(void *xsc, bus_dma_segment_t *segs, int nsegs, int error)
+{
+#define PDC_MAXLASTSGSIZE 41*4
+struct ata_dmasetprd_args *args = xsc;
+struct ata_dma_prdentry *prd = args->dmatab;
+int i;
+
+if ((args->error = error))
+   return;
+
+for (i = 0; i < nsegs; i++) {
+   prd[i].addr = htole32(segs[i].ds_addr);
+   prd[i].count = htole32(segs[i].ds_len);
+}
+
+if (segs[i - 1].ds_len > PDC_MAXLASTSGSIZE) {
+   /*
+   printf("splitting trailing PRD of %ld (limit %d)\n", segs[i - 1].ds_len, PDC_MAXLASTSGSIZE);
+   */
+   prd[i - 1].count = htole32(segs[i - 1].ds_len - PDC_MAXLASTSGSIZE);
+   prd[i].count = htole32(PDC_MAXLASTSGSIZE);
+   prd[i].addr = htole32(segs[i - 1].ds_addr + PDC_MAXLASTSGSIZE);
+   i ++;
+   nsegs ++;
+}
+
+prd[i - 1].count |= htole32(ATA_DMA_EOT);
+args->nsegs = nsegs;
 }

 static void

--- ata-dma.c.orig  2007-11-02 01:05:53.0 +0300
+++ ata-dma.c   2007-11-02 01:05:53.0 +0300
@@ -113,7 +113,7 @@
 if (bus_dma_tag_create(ch->dma->dmatag, ch->dma->alignment, ch->dma->boundary,
   ch->dma->max_address, BUS_SPACE_MAXADDR,
   NULL, NULL, ch->dma->max_iosize,
-  ATA_DMA_ENTRIES, ch->dma->segsize,
+  ATA_DMA_ENTRIES - 1, ch->dma->segsize,
   0, NULL, NULL, &ch->dma->data_tag))
goto error;


  





Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-02 Thread Alexander Sabourenkov

Søren Schmidt wrote:

> Søren Schmidt wrote:
>
>> Good catch!
>>
>> However, from my quick glimpse at the Promise sources the limit seems
>> to be 32 dwords, i.e. 32*4 = 128 bytes.

Please see the driver named 4_sataii150-300_linux2.6-src_x86-64_v1.01.0.23

>> I'll investigate further and ask Promise for the gory details, stay
>> tuned...
>> I don't think the PRD count limitation is a real problem; I've never
>> seen that long a list, and IIRC we never do more than 64K transfers in
>> one go (yet).

In (current) practice, yes, but the check should be there even if only to
document the limit.

>> Anyhow I need to get checks in for that, not just here...
>>
>> Give me a few days and I'll get this figured out for 7-rel...
> Oh, and I forgot: do you have a surefire way to reproduce the problem so
> the fix can be tested?

dd if=/dev/ad8 of=/dev/null bs=1048576 count=1000 works every time.

I have tested it on my home machine:

Without the patch, the first timeouts and errors appear about 10 seconds
into the read.

With the patch, a read of the entire disk (320G) completed without errors.

Previous tests of the analogous Linux driver fix showed no errors and no data
corruption on two write-whole-drive, read-whole-drive cycles.
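
The dd test above amounts to a sequential read in 1 MiB blocks with the data
thrown away. A minimal C sketch of the same idea, in case dd is not
convenient; read_blocks() is a made-up helper, not part of any tool mentioned
in this thread, and it simply stops at the first read error instead of
retrying.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read up to count blocks of bs bytes from path and discard the data.
 * Returns the total number of bytes read, or -1 if the file cannot be
 * opened, the buffer cannot be allocated, or a read fails.
 */
long long
read_blocks(const char *path, size_t bs, size_t count)
{
    long long total = 0;
    char *buf;
    int fd;

    assert(bs > 0);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    buf = malloc(bs);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        ssize_t n = read(fd, buf, bs);

        if (n < 0) {        /* I/O error: report failure */
            total = -1;
            break;
        }
        if (n == 0)         /* end of file or device */
            break;
        total += n;
    }
    free(buf);
    close(fd);
    return total;
}
```

A call such as read_blocks("/dev/ad8", 1048576, 1000) would correspond to the
dd invocation quoted above; a return value of -1 indicates a failed read.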




> I've never been able to trigger this problem myself so far.

Seems like the bug is highly configuration-dependent, or
PCI-chipset-dependent, or just present in some production runs and not others.


--

./lxnt


Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-02 Thread Arno J. Klaassen
Hello,

Alexander Sabourenkov [EMAIL PROTECTED] writes:

 Hello.
 
 I have ported the workaround for the hardware bug that causes data
 corruption on Promise SATA300 TX4 cards to RELENG_7.
 
 Bug description:
 SATA300 TX4 hardware chokes if last PRD entry (in a dma transfer) is
 larger than 164 bytes. This was found while analysing vendor-supplied
 linux driver.
 
 Workaround:
 Split trailing PRD entry if it's larger than 164 bytes.
 
 Two supplied patches do fix problem on my machine.


Definitely an improvement, but not sufficient (for my setup):

amd64-releng_6 on an ASUS A8V UP (the box ran rock-stable
for years on i386-releng_5 with the same hardware apart from the TX4 and
drives)

from dmesg :

atapci0: Promise PDC40718 SATA300 controller port 0xe000-0xe07f,0xd800-0xd8ff 
mem 0xfbb0-0xfbb00fff,0xfba0-0xfba1 irq 18 at device 13.0 on pci0
ata2: ATA channel 0 on atapci0
ata3: ATA channel 1 on atapci0
ata4: ATA channel 2 on atapci0
ata5: ATA channel 3 on atapci0
atapci1: VIA 6420 SATA150 controller port 
0xd400-0xd407,0xd000-0xd003,0xc800-0xc807,0xc400-0xc403,0xc000-0xc00f,0xb800-0xb8ff
 irq 20 at device 15.0 on pci0
ata6: ATA channel 0 on atapci1
ata7: ATA channel 1 on atapci1
atapci2: VIA 8237 UDMA133 controller port 
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 15.1 on pci0
ata0: ATA channel 0 on atapci2
ata1: ATA channel 1 on atapci2

[ ... ]

ad0: 38166MB Seagate ST3402111A 3.AAJ at ata0-master UDMA100
ad6: 476940MB WDC WD5000AAKS-00TMA0 12.01C01 at ata3-master SATA300
ad12: 305245MB WDC WD3200JD-22KLB0 08.05J08 at ata6-master SATA150

booting from ad0 and simple gconcat over ad6 and ad12.

Improvement : I now can fsck /dev/concat/data without
ad6 being detached

Persistent problem : when I rsync an NFS-mounted disk to /dev/concat/data,
I get the following after a few gigabytes of data have been transferred :

Nov  2 16:39:55 charlotte kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error 
(retrying request) LBA=268435392
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - 
completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA retrying (0 retries 
left) LBA=268435392
Nov  2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA 
status=ff<BUSY,READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> 
error=ff<ICRC,UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA,ILLEGAL_LENGTH>
 LBA=268435392
Nov  2 16:40:50 charlotte kernel: 
g_vfs_done():concat/data[WRITE(offset=137438920704, length=131072)]error = 5
Nov  2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA48 retrying (1 retry 
left) LBA=268435648
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - WRITE_DMA48 UDMA ICRC error 
(retrying request) LBA=268435648
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - 
completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA48 timed out 
LBA=268435648
Nov  2 16:40:50 charlotte kernel: 
g_vfs_done():concat/data[WRITE(offset=137439051776, length=131072)]error = 5

...

I will test again with #define PDC_MAXLASTSGSIZE 32*4 (just to see
if that makes a difference)

Regards, Arno


Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-02 Thread Alexander Sabourenkov
Arno J. Klaassen wrote:
 definitely an improvement, but not sufficient (for my setup ) :
 
 amd64-releng_6 on an ASUS A8V UP (box ran rock-stable
 for years i386-releng_5 with same hardware apart TX4 and
 drives)
 
 from dmesg :
 

Setup is identical to mine, except for the drives.
http://lxnt.info/tx4/freebsd/dmesg.text

 
 Improvement : I now can fsck /dev/concat/data without
 ad6 being detached

It was that bad? wow.

 Persistent problem : when I rsync an NFS-mounted disk to /dev/concat/data,
 I get the following after a few gigabytes of data have been transferred :
 

That's strange. Are you sure the cables, PSU and line power are ok?
Back in October, upgrading the PSU halved the error count for me (under Linux).

 
 I will test again with #define PDC_MAXLASTSGSIZE 32*4 (just to see
 if that makes a difference)
 

Please do.

-- 

./lxnt


Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-02 Thread Arno J. Klaassen
Alexander Sabourenkov [EMAIL PROTECTED] writes:

 Arno J. Klaassen wrote:
  definitely an improvement, but not sufficient (for my setup ) :
  
  amd64-releng_6 on an ASUS A8V UP (box ran rock-stable
  for years i386-releng_5 with same hardware apart TX4 and
  drives)
  
  from dmesg :
  
 
 Setup is identical to mine, except for the drives.
 http://lxnt.info/tx4/freebsd/dmesg.text
 
  
  Improvement : I now can fsck /dev/concat/data without
  ad6 being detached
 
 It was that bad? wow.


yop (often even beyond repair ... )

  Persistent problem : when I rsync an NFS-mounted disk to /dev/concat/data,
  I get the following after a few gigabytes of data have been transferred :
  
 
 That's strange. Are you sure cables, PSU and line power are ok?
 Back in October upgrading PSU halved the error count for me (under linux).

I could try, but I don't believe in it : there are just three disks and an
extra controller instead of the two disks it used to run with ...
  
  I will test again with #define PDC_MAXLASTSGSIZE 32*4 (just to see
  if that makes a difference)
  
 
 Please do.

Well, it does : no more scary messages about DMA SETFEATURES etc., though
it now ends in a panic ...

the end of my /var/log/messages (I turned on your printf as well ) :

Nov  2 22:59:11 charlotte kernel: splitting trailing PRD of 2048 (limit 128)
Nov  2 22:59:11 charlotte last message repeated 15 times
Nov  2 22:59:11 charlotte kernel: splitting trailing PRD of 4096 (limit 128)
Nov  2 22:59:11 charlotte kernel: splitting trailing PRD of 2048 (limit 128)
Nov  2 22:59:11 charlotte kernel: splitting trailing PRD of 2048 (limit 128)
Nov  2 22:59:11 charlotte kernel: splitting trailing PRD of 4096 (limit 128)
Nov  2 22:59:11 charlotte last message repeated 11 times
Nov  2 22:59:11 charlotte kernel: splitting trailing PRD of 2048 (limit 128)
Nov  2 22:59:11 charlotte kernel: splitting trailing PRD of 4096 (limit 128)
Nov  2 23:01:18 charlotte syslogd: kernel boot file is /boot/kernel/kernel
Nov  2 23:01:18 charlotte kernel: splitting trailing PRD of 4096 (limit 128)
Nov  2 23:01:18 charlotte last message repeated 17 times
Nov  2 23:01:18 charlotte kernel: Copyright (c) 1992-2007 The FreeBSD Project.
Nov  2 23:01:18 charlotte kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 
1989, 1991, 1992, 1993, 1994
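
The log lines above report trailing PRD entries of 2048 and 4096 bytes being split against a limit of 128 bytes (32*4). As a small worked example of that arithmetic, here is a sketch that assumes the split keeps the last `limit` bytes as the new final entry. `struct prd_split` and `split_trailing` are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Lengths of the two entries produced by splitting a trailing PRD. */
struct prd_split {
	uint32_t head_len;	/* bytes left in the original last entry */
	uint32_t tail_len;	/* bytes in the new final entry; 0 if no split */
};

/*
 * Split a trailing entry of len bytes so that the final entry is at most
 * limit bytes long.  Entries that already fit are left untouched.
 */
struct prd_split
split_trailing(uint32_t len, uint32_t limit)
{
	struct prd_split s = { len, 0 };

	if (len > limit) {
		s.head_len = len - limit;
		s.tail_len = limit;
	}
	return (s);
}
```

With a limit of 128, a 2048-byte entry becomes 1920 + 128 and a 4096-byte entry becomes 3968 + 128; with the original limit of 41*4 = 164, a 4096-byte entry would become 3932 + 164.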


And for the panic :

panic: ffs_clusteralloc: map mismatch
Uptime: 35m27s
Dumping 1023 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 1023MB (261808 pages) 1007 991 975 959 943 927 911 895 879 863 847 
831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 
511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 
191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:172
172 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:172
#1  0x0004 in ?? ()
#2  0x8025e233 in boot (howto=260)
at /files/bsd/src6/sys/kern/kern_shutdown.c:409
#3  0x8025e836 in panic (fmt=0xff00305bebe0 )
at /files/bsd/src6/sys/kern/kern_shutdown.c:565
#4  0x8037ab26 in ffs_clusteralloc (ip=0xff00241ae900, cg=9425, 
bpref=0, len=5) at /files/bsd/src6/sys/ufs/ffs/ffs_alloc.c:1663
#5  0x803769a8 in ffs_hashalloc (ip=0xff00241ae900, cg=395, 
pref=0, size=5, allocator=0x8037a650 ffs_clusteralloc)
at /files/bsd/src6/sys/ufs/ffs/ffs_alloc.c:1281
#6  0x8037841a in ffs_reallocblks (ap=0x0)
at /files/bsd/src6/sys/ufs/ffs/ffs_alloc.c:778
#7  0x8042496d in VOP_REALLOCBLKS_APV (vop=0x0, a=0x0)
at vnode_if.c:2056
#8  0x802bd70c in cluster_write (vp=0xff0015904ba0, 
bp=0x9e74ea10, filesize=81920, seqcount=17) at vnode_if.h:1052
#9  0x8039662f in ffs_write (ap=0xad243a30)
at /files/bsd/src6/sys/ufs/ffs/ffs_vnops.c:763
#10 0x804251fb in VOP_WRITE_APV (vop=0x805ad880, 
a=0xad243a30) at vnode_if.c:698
#11 0x802d9bca in vn_write (fp=0xff002e86da50, 
uio=0xad243b50, active_cred=0x0, flags=0, td=0xff00305bebe0)
at vnode_if.h:372
#12 0x802894d7 in dofilewrite (td=0xff00305bebe0, fd=1, 
fp=0xff002e86da50, auio=0xad243b50, offset=0, flags=0)
at file.h:253
#13 0x80289840 in kern_writev (td=0xff00305bebe0, fd=1, 
auio=0xad243b50) at /files/bsd/src6/sys/kern/sys_generic.c:402
#14 0x80289938 in write (td=0x0, uap=0x0)
at /files/bsd/src6/sys/kern/sys_generic.c:326
#15 0x803e0b21 in syscall (frame=
  {tf_rdi = 1, tf_rsi = 277012480, tf_rdx = 262144, tf_rcx = 262144, tf_r8 
= 262144, tf_r9 = 3219503195, tf_rax = 4, tf_rbx = 277012480, tf_rbp = 32768, 
tf_r10 = 1669914800, tf_r11 = 2860306816, tf_r12 = 0, tf_r13 = 1, tf_r14 = 
6326848, tf_r15 = 0, tf_trapno = 12, tf_addr = 277270528, tf_flags = 12, tf_err 
= 2, tf_rip = 34367373196, tf_cs = 43, tf_rflags = 518, tf_rsp = 
140737488337304, tf_ss = 35}) at 

Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-02 Thread Søren Schmidt

Arno J. Klaassen wrote:

definitely an improvement, but not sufficient (for my setup ) :

amd64-releng_6 on an ASUS A8V UP (box ran rock-stable
for years i386-releng_5 with same hardware apart TX4 and
drives)

from dmesg :

atapci0: Promise PDC40718 SATA300 controller port 0xe000-0xe07f,0xd800-0xd8ff 
mem 0xfbb0-0xfbb00fff,0xfba0-0xfba1 irq 18 at device 13.0 on pci0
ata2: ATA channel 0 on atapci0
ata3: ATA channel 1 on atapci0
ata4: ATA channel 2 on atapci0
ata5: ATA channel 3 on atapci0
atapci1: VIA 6420 SATA150 controller port 
0xd400-0xd407,0xd000-0xd003,0xc800-0xc807,0xc400-0xc403,0xc000-0xc00f,0xb800-0xb8ff 
irq 20 at device 15.0 on pci0
ata6: ATA channel 0 on atapci1
ata7: ATA channel 1 on atapci1
atapci2: VIA 8237 UDMA133 controller port 
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 15.1 on pci0
ata0: ATA channel 0 on atapci2
ata1: ATA channel 1 on atapci2

[ ... ]

ad0: 38166MB Seagate ST3402111A 3.AAJ at ata0-master UDMA100
ad6: 476940MB WDC WD5000AAKS-00TMA0 12.01C01 at ata3-master SATA300
ad12: 305245MB WDC WD3200JD-22KLB0 08.05J08 at ata6-master SATA150

booting from ad0 and simple gconcat over ad6 and ad12.

Improvement : I now can fsck /dev/concat/data without
ad6 being detached

Persistent problem : when I rsync an NFS-mounted disk to /dev/concat/data,
I get the following after a few gigabytes of data have been transferred :

Nov  2 16:39:55 charlotte kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error 
(retrying request) LBA=268435392
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - 
completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA retrying (0 retries 
left) LBA=268435392
Nov  2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA 
status=ff<BUSY,READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> 
error=ff<ICRC,UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA,ILLEGAL_LENGTH>
 LBA=268435392
Nov  2 16:40:50 charlotte kernel: 
g_vfs_done():concat/data[WRITE(offset=137438920704, length=131072)]error = 5
Nov  2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA48 retrying (1 retry 
left) LBA=268435648
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - WRITE_DMA48 UDMA ICRC error 
(retrying request) LBA=268435648
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE 
taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - 
completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA48 timed out 
LBA=268435648
Nov  2 16:40:50 charlotte kernel: 
g_vfs_done():concat/data[WRITE(offset=137439051776, length=131072)]error = 5

...

I will test again with #define PDC_MAXLASTSGSIZE 32*4 (just to see
if that makes a difference)
  
One thing to try is to lose any geom raid; if raid is needed, use ataraid 
instead.


I'm shuffling boards and controllers here to try to reproduce it; so far 
no luck, it just works(tm). It seems to depend quite heavily on the 
right combination of possibly marginal HW.


-Søren




Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-02 Thread Arno J. Klaassen
Hello,

  [ ... ]
  I will test again with #define PDC_MAXLASTSGSIZE 32*4 (just to see
  if that makes a difference)
 
 One thing to try is to lose any geom raid; if raid is needed, use ataraid
 instead.

Nope : I did a newfs on ad6 (the disk on the Promise TX4), and an rsync
onto it then panics the same way as the geom_concat case did.


 I'm shuffling boards and controllers here to try to reproduce it; so far
 no luck, it just works(tm). It seems to depend quite heavily on the
 right combination of possibly marginal HW.

Rather than marginal HW, it seems to me to be closely related to the
MB/BIOS as well (Alexander apparently has about the same setup as I
have for this test):

A while ago (using releng_6) I tried the same setup on three different
MBs: ahd controller + SCSI boot disk, plus the TX4 with three disks in
geom_mirror. Results :

  - on ASUS A8? board (I use plenty of them without the slightest
problem for years; not really expensive but not marginal IMHO) :
just look at it and it would crash (g_vfs_done)

  - on Tyan S28?? : rock stable, unable to crash however
hard I tried

  - on some MSI K8 (which I usually run Vista on for testing; this one I
really bought as cheap as possible) : would run OK, even 
under rather heavy load, but when pushed really hard it 
finally delivers the lovely g_vfs_done ...

I vaguely remember from another PR that the Promise card does
something with PCI-bursting which fbsd does not detect and/or
handle correctly (and beyond my simple skills as dumb tester, but
maybe the linux-sources contain a clue about that as well).

Regards and thanx for your efforts

Arno


Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-01 Thread Alexander Sabourenkov
Hello.

I have ported the workaround for the hardware bug that causes data
corruption on Promise SATA300 TX4 cards to RELENG_7.

Bug description:
SATA300 TX4 hardware chokes if last PRD entry (in a dma transfer) is
larger than 164 bytes. This was found while analysing vendor-supplied
linux driver.

Workaround:
Split trailing PRD entry if it's larger than 164 bytes.

Two supplied patches do fix problem on my machine.

There is, however, a style problem with them. It seems that the PRD entry
count is limited to 256. I have not found a good way to guarantee that
one entry is always available to do the split, hence the ugly solution of
patching ata-dma.c.
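
To make the workaround and the entry-count concern concrete, here is a self-contained sketch of the idea outside the kernel. The names and constants (`struct seg`, `struct prd`, `PRD_ENTRIES`, `PRD_EOT`, `LAST_SG_MAX`, `build_prd`) are stand-ins invented for this example and are not the ata(4) definitions; byte-order conversion is left out. It assumes that the new tail entry has to start exactly where the shortened entry ends, so that the two together cover the original segment.

```c
#include <stddef.h>
#include <stdint.h>

#define PRD_ENTRIES	256		/* assumed table size (the limit above) */
#define PRD_EOT		0x80000000u	/* assumed end-of-table flag in count */
#define LAST_SG_MAX	(41 * 4)	/* 164 bytes */

struct seg { uint32_t addr; uint32_t len; };	/* one DMA segment */
struct prd { uint32_t addr; uint32_t count; };	/* one PRD table entry */

/*
 * Copy nsegs segments into prd[] and split the last one if it is longer
 * than LAST_SG_MAX.  Returns the number of entries used, or -1 if there
 * are no segments or the table has no room for the extra entry.
 */
int
build_prd(struct prd *prd, const struct seg *segs, int nsegs)
{
	int i;

	if (nsegs < 1 || nsegs > PRD_ENTRIES)
		return (-1);

	for (i = 0; i < nsegs; i++) {
		prd[i].addr = segs[i].addr;
		prd[i].count = segs[i].len;
	}

	if (segs[nsegs - 1].len > LAST_SG_MAX) {
		uint32_t head = segs[nsegs - 1].len - LAST_SG_MAX;

		if (nsegs == PRD_ENTRIES)
			return (-1);	/* no spare slot for the split */
		prd[nsegs - 1].count = head;
		prd[nsegs].addr = segs[nsegs - 1].addr + head;
		prd[nsegs].count = LAST_SG_MAX;
		nsegs++;
	}

	/* Mark whichever entry is now last as the end of the table. */
	prd[nsegs - 1].count |= PRD_EOT;
	return (nsegs);
}
```

Reserving one slot up front, which is what passing ATA_DMA_ENTRIES - 1 to bus_dma_tag_create() in the ata-dma.c change is meant to do, keeps the "no spare slot" branch from ever being reached.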


Patches, patched and original files are at http://lxnt.info/tx4/freebsd/.


--- ata-chipset.c.orig  2007-11-02 01:05:49.0 +0300
+++ ata-chipset.c   2007-11-02 01:05:49.0 +0300
@@ -142,6 +142,7 @@
 static int ata_promise_mio_command(struct ata_request *request);
 static void ata_promise_mio_reset(device_t dev);
 static void ata_promise_mio_dmainit(device_t dev);
+static void ata_promise_mio_dmasetprd(void *xsc, bus_dma_segment_t *segs, int nsegs, int error);
 static void ata_promise_mio_setmode(device_t dev, int mode);
 static void ata_promise_sx4_intr(void *data);
 static int ata_promise_sx4_command(struct ata_request *request);
@@ -185,7 +186,6 @@
 static int ata_check_80pin(device_t dev, int mode);
 static int ata_mode2idx(int mode);

-
 /*
  * generic ATA support functions
  */
@@ -3759,8 +3759,44 @@
 static void
 ata_promise_mio_dmainit(device_t dev)
 {
+    struct ata_channel *ch = device_get_softc(dev);
+
     /* note start and stop are not used here */
     ata_dmainit(dev);
+
+    if (ch->dma)
+        ch->dma->setprd = ata_promise_mio_dmasetprd;
+}
+
+static void
+ata_promise_mio_dmasetprd(void *xsc, bus_dma_segment_t *segs, int nsegs, int error)
+{
+#define PDC_MAXLASTSGSIZE 41*4
+    struct ata_dmasetprd_args *args = xsc;
+    struct ata_dma_prdentry *prd = args->dmatab;
+    int i;
+
+    if ((args->error = error))
+        return;
+
+    for (i = 0; i < nsegs; i++) {
+        prd[i].addr = htole32(segs[i].ds_addr);
+        prd[i].count = htole32(segs[i].ds_len);
+    }
+
+    if (segs[i - 1].ds_len > PDC_MAXLASTSGSIZE) {
+        /*
+        printf("splitting trailing PRD of %ld (limit %d)\n", segs[i - 1].ds_len, PDC_MAXLASTSGSIZE);
+        */
+        prd[i - 1].count = htole32(segs[i - 1].ds_len - PDC_MAXLASTSGSIZE);
+        prd[i].count = htole32(PDC_MAXLASTSGSIZE);
+        prd[i].addr = htole32(segs[i - 1].ds_addr + (segs[i - 1].ds_len - PDC_MAXLASTSGSIZE));
+        i ++;
+        nsegs ++;
+    }
+
+    prd[i - 1].count |= htole32(ATA_DMA_EOT);
+    args->nsegs = nsegs;
 }

 static void

--- ata-dma.c.orig  2007-11-02 01:05:53.0 +0300
+++ ata-dma.c   2007-11-02 01:05:53.0 +0300
@@ -113,7 +113,7 @@
     if (bus_dma_tag_create(ch->dma->dmatag,ch->dma->alignment,ch->dma->boundary,
                            ch->dma->max_address, BUS_SPACE_MAXADDR,
                            NULL, NULL, ch->dma->max_iosize,
-                           ATA_DMA_ENTRIES, ch->dma->segsize,
+                           ATA_DMA_ENTRIES - 1, ch->dma->segsize,
                            0, NULL, NULL, &ch->dma->data_tag))
         goto error;


-- 

./lxnt