Re: floating point operations
On Thu, Nov 01, 2007 at 10:40:04AM +1100, James Healy wrote:
> The remaining op is not easily converted to fixed point math, and we're wondering what impact a single flop on the receipt of each ACK will have. We don't have a strong understanding of the amount of overhead involved in executing a flop instead of an int op on modern hardware.

A single floating point operation in the kernel means that the kernel must be adapted to allow floating point within it: saving userland FP state somewhere between kernel entry and the FLOP, handling pesky exceptions, etc. The problem is not the number of FLOPs in the kernel; the problem is that the kernel is not currently set up to allow any floating point within it.

This topic came up last week and I suggest you have a look at the thread starting at:
http://lists.freebsd.org/pipermail/freebsd-hackers/2007-October/022037.html

That said, I'm intrigued as to what operation you are stuck on. I'm having trouble visualising what you might be doing that gets stuck on a single FP instruction.

-- 
Peter
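Since floating point is not available in the kernel, the usual alternative is scaled-integer (fixed-point) arithmetic. The sketch below only illustrates that idea; the thread does not say which operation James is stuck on. The Q16.16 layout and the names `fx_t`, `fx_from_ratio`, `fx_mul` and `fx_to_uint` are assumptions made for this example, not anything taken from his code.

```c
#include <stdint.h>

/* Hypothetical unsigned Q16.16 type: 16 integer bits, 16 fraction bits. */
typedef uint32_t fx_t;
#define FX_SHIFT 16

/* Build a fixed-point value from the ratio num/den (den must be non-zero). */
static fx_t
fx_from_ratio(uint32_t num, uint32_t den)
{
	return (fx_t)(((uint64_t)num << FX_SHIFT) / den);
}

/* Multiply two fixed-point values through a 64-bit intermediate. */
static fx_t
fx_mul(fx_t a, fx_t b)
{
	return (fx_t)(((uint64_t)a * b) >> FX_SHIFT);
}

/* Convert back to an integer, truncating the fraction. */
static uint32_t
fx_to_uint(fx_t v)
{
	return v >> FX_SHIFT;
}
```

For instance, scaling a value of 1000 by 7/8 becomes `fx_to_uint(fx_mul(fx_from_ratio(7, 8), fx_from_ratio(1000, 1)))`, using only integer instructions. Operations such as roots or logarithms need more work (tables or iteration), which may be the kind of op that is "not easily converted".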
Re: Useful tools missing from /rescue
On Thu, Nov 01, 2007 at 08:53:39AM -0700, David O'Brien wrote:
> On Thu, Oct 18, 2007 at 02:04:21AM +0400, Yar Tikhiy wrote:
> > On Mon, Oct 15, 2007 at 10:38:26AM -0700, David O'Brien wrote:
> > > I guess I'm not creative enough in the ways I've screwed up my systems and needed tools from /rescue. 8-)
> >
> > Just try to installworld FreeBSD/amd64 over a running FreeBSD/i386. ;-)
>
> I strongly feel that shouldn't be supported on a live system. So to me

We already got that possibility for free along with src/Makefile.inc1#1.590, so no particular effort is needed to support it.

> it shouldn't be an excuse to put a duplicated copy of /usr/[s]bin into /rescue.

It's an exaggeration. Most of /usr/[s]bin isn't in /rescue yet. :-)

> It is a delicate thing to get right - and there are easy ways to do it today: Boot from disc1; mount / and /usr; mv /mnt/etc /mnt/etc.hold; rm -rf the bits in bin,sbin,libexec; then run the install.sh from the disc1; mv /mnt/etc /mnt/etc.new ; mv /mnt/etc.hold /mnt/etc

One of the things I love FreeBSD for is being able to do things in different ways and to choose such a way depending on the case. :-) E.g., one may want to go from CURRENT/arch1 to CURRENT/arch2 without having to install a binary release or snapshot for arch2 first.

-- 
Yar
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.
Søren Schmidt wrote:
> Good catch! However, from my quick glimpse at the Promise sources the limit seems to be 32 dwords, i.e. 32*4 = 128 bytes.
> I'll investigate further and ask Promise for the gory details, stay tuned...
> I don't think the PRD count limitation is a real problem; I've never seen that long a list and IIRC we never do more than 64K transfers in one go (yet).
> Anyhow I need to get checks in for that, not just here... Give me a few days and I'll get this figured out for 7-rel...

Oh, and I forgot: do you have a surefire way to reproduce the problem so the fix can be tested? I've never been able to trigger this problem myself so far.

-Søren
Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.
Alexander Sabourenkov wrote:
> Hello.
>
> I have ported the workaround for the hardware bug that causes data corruption on Promise SATA300 TX4 cards to RELENG_7.
>
> Bug description: SATA300 TX4 hardware chokes if the last PRD entry (in a DMA transfer) is larger than 164 bytes. This was found while analysing the vendor-supplied Linux driver.
>
> Workaround: Split the trailing PRD entry if it's larger than 164 bytes.
>
> The two supplied patches do fix the problem on my machine.
>
> There is, however, a style problem with them. It seems that the PRD entry count is limited to 256. I have not found a good way to guarantee that one entry is always available to do the split, thus the ugly solution of patching ata-dma.c.

Good catch! However, from my quick glimpse at the Promise sources the limit seems to be 32 dwords, i.e. 32*4 = 128 bytes.

I'll investigate further and ask Promise for the gory details, stay tuned...

I don't think the PRD count limitation is a real problem; I've never seen that long a list and IIRC we never do more than 64K transfers in one go (yet).

Anyhow I need to get checks in for that, not just here... Give me a few days and I'll get this figured out for 7-rel...

-Søren

> Patches, patched and original files are at http://lxnt.info/tx4/freebsd/.
--- ata-chipset.c.orig 2007-11-02 01:05:49.0 +0300
+++ ata-chipset.c 2007-11-02 01:05:49.0 +0300
@@ -142,6 +142,7 @@
 static int ata_promise_mio_command(struct ata_request *request);
 static void ata_promise_mio_reset(device_t dev);
 static void ata_promise_mio_dmainit(device_t dev);
+static void ata_promise_mio_dmasetprd(void *xsc, bus_dma_segment_t *segs, int nsegs, int error);
 static void ata_promise_mio_setmode(device_t dev, int mode);
 static void ata_promise_sx4_intr(void *data);
 static int ata_promise_sx4_command(struct ata_request *request);
@@ -185,7 +186,6 @@
 static int ata_check_80pin(device_t dev, int mode);
 static int ata_mode2idx(int mode);
 
-
 /*
  * generic ATA support functions
  */
@@ -3759,8 +3759,44 @@
 static void
 ata_promise_mio_dmainit(device_t dev)
 {
+    struct ata_channel *ch = device_get_softc(dev);
+
     /* note start and stop are not used here */
     ata_dmainit(dev);
+
+    if (ch->dma)
+        ch->dma->setprd = ata_promise_mio_dmasetprd;
+}
+
+static void
+ata_promise_mio_dmasetprd(void *xsc, bus_dma_segment_t *segs, int nsegs, int error)
+{
+#define PDC_MAXLASTSGSIZE 41*4
+    struct ata_dmasetprd_args *args = xsc;
+    struct ata_dma_prdentry *prd = args->dmatab;
+    int i;
+
+    if ((args->error = error))
+        return;
+
+    for (i = 0; i < nsegs; i++) {
+        prd[i].addr = htole32(segs[i].ds_addr);
+        prd[i].count = htole32(segs[i].ds_len);
+    }
+
+    if (segs[i - 1].ds_len > PDC_MAXLASTSGSIZE) {
+        /*
+        printf("splitting trailing PRD of %ld (limit %d)\n", segs[i - 1].ds_len, PDC_MAXLASTSGSIZE);
+        */
+        prd[i - 1].count = htole32(segs[i - 1].ds_len - PDC_MAXLASTSGSIZE);
+        prd[i].count = htole32(PDC_MAXLASTSGSIZE);
+        prd[i].addr = htole32(segs[i - 1].ds_addr + PDC_MAXLASTSGSIZE);
+        i ++;
+        nsegs ++;
+    }
+
+    prd[i - 1].count |= htole32(ATA_DMA_EOT);
+    args->nsegs = nsegs;
 }
 
 static void
--- ata-dma.c.orig 2007-11-02 01:05:53.0 +0300
+++ ata-dma.c 2007-11-02 01:05:53.0 +0300
@@ -113,7 +113,7 @@
     if (bus_dma_tag_create(ch->dma->dmatag,ch->dma->alignment,ch->dma->boundary,
                            ch->dma->max_address, BUS_SPACE_MAXADDR,
                            NULL, NULL, ch->dma->max_iosize,
-                           ATA_DMA_ENTRIES, ch->dma->segsize,
+                           ATA_DMA_ENTRIES - 1, ch->dma->segsize,
                            0, NULL, NULL, &ch->dma->data_tag))
         goto error;
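The workaround amounts to splitting the final scatter/gather entry so that the new last entry is no longer than the limit. The stand-alone sketch below restates that step outside the kernel; `struct seg`, `split_last_seg` and the `limit` parameter are hypothetical names used only here. One deliberate difference is worth checking against the patch: the posted hunk sets the new entry's address to `ds_addr + PDC_MAXLASTSGSIZE`, whereas for the two pieces to cover the original segment contiguously the tail has to start at `addr + (len - limit)`, which is what the sketch uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one PRD / scatter-gather entry. */
struct seg {
	uint32_t addr;	/* start address of the segment */
	uint32_t len;	/* length in bytes */
};

/*
 * If the last of n entries is longer than limit bytes, split it in two so
 * that the new final entry is exactly limit bytes long. The caller must
 * provide room for n + 1 entries. Returns the resulting entry count.
 */
static size_t
split_last_seg(struct seg *segs, size_t n, uint32_t limit)
{
	uint32_t head;

	if (n == 0 || segs[n - 1].len <= limit)
		return n;

	head = segs[n - 1].len - limit;

	/* The tail starts where the shortened head ends. */
	segs[n].addr = segs[n - 1].addr + head;
	segs[n].len = limit;
	segs[n - 1].len = head;
	return n + 1;
}
```

The need for a spare slot at `segs[n]` is the reason the table must keep one entry in reserve, which is what the `ATA_DMA_ENTRIES - 1` change in ata-dma.c is for.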
Re: indefinite wait buffer patch
On Thu, Nov 01, 2007 at 09:20:42PM +0100, Arno J. Klaassen wrote:
> Hello,
>
> while slowly testing releng_7, I remembered that for about two years I have had the attached diff in my releng_6 sources (patch recreated against releng_7 with low timeouts for debugging):
>
> it addresses the situation when one creates a huge swap space on a (relatively) slow disk subsystem: e.g. for scientific computing it sometimes makes sense to have, say, 8G swap for 2G main memory if you know you're treating N less-than-2G matrices and the process is CPU-bound for quite a while on one matrix before switching to the other.
>
> But then, when switching from one matrix to another, dmesg gets flooded by "indefinite wait buffer" messages.
>
> The attached patch shows in fact that the wait buffer is never indefinite (unless there is a real HW problem, probably) and linearly increases the timeout to match reality.

I think this is mostly good. See below.

> The last chunk is just to prevent a panic at reboot when there is so much data swapped out that it doesn't get treated before 'reboot-finish-time-out'.

This chunk would cause (non-silent) data corruption. Besides reboot, the code is used when swap is turned off on a live system.

> Index: sys/vm/swap_pager.c
> ===
> RCS file: /home/ncvs/src/sys/vm/swap_pager.c,v
> retrieving revision 1.295
> diff -u -r1.295 swap_pager.c
> --- sys/vm/swap_pager.c 5 Aug 2007 21:04:32 - 1.295
> +++ sys/vm/swap_pager.c 1 Nov 2007 18:59:18 -
> @@ -941,6 +941,10 @@
...
> + static int timo_secs = TIMO_START;
...
> + if (retry*TIMO_CHUNK > timo_secs) {
> + timo_secs = retry*TIMO_CHUNK;

Imagine that two buffers got the timeout on swap-in simultaneously. I think that making timo_secs a local variable instead would be right. Also, a timeout reading one buffer shall not increase the timeout for swapping in another one.
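The suggestion to keep the timeout in a local variable derived from the retry count could look roughly like the following. This is a user-space sketch, not the swap pager code: the values chosen for `TIMO_START` and `TIMO_CHUNK` and the helper name `wait_timeout_secs` are assumptions, since the patch's own definitions are elided in the quoted diff.

```c
/* Assumed values; the real definitions are not shown in the thread. */
#define TIMO_START	20	/* initial timeout, in seconds */
#define TIMO_CHUNK	10	/* increment per retry, in seconds */

/*
 * Timeout to use for the given retry of one buffer. It grows linearly
 * with the retry count and depends on no shared state, so a slow read of
 * one buffer cannot lengthen the wait for another.
 */
static int
wait_timeout_secs(int retry)
{
	int timo_secs = TIMO_START;

	if (retry * TIMO_CHUNK > timo_secs)
		timo_secs = retry * TIMO_CHUNK;
	return timo_secs;
}
```

Because the result is computed from the caller's own retry counter, two buffers timing out at the same moment never race on a shared `static` value.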
Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.
Søren Schmidt wrote:
> Søren Schmidt wrote:
> > Good catch! However from my quick glimpse at the Promise sources the limit seems to be 32 dwords, i.e. 32*4 = 128 bytes.

Please see the driver named 4_sataii150-300_linux2.6-src_x86-64_v1.01.0.23

> > I'll investigate further and ask Promise for the gory details, stay tuned...
> > I don't think the PRD count limitation is a real problem, I've never seen that long a list and IIRC we never do more than 64K transfers in one go (yet).

In (current) practice, yes, but the check should be there even if only to document the limit.

> > Anyhow I need to get checks in for that, not just here... Give me a few days and I'll get this figured out for 7-rel...
>
> Oh, and I forgot, do you have a surefire way to reproduce the problem so the fix can be tested?

dd if=/dev/ad8 of=/dev/null bs=1048576 count=1000

works every time. I have tested it on my home machine:
without the patch, the first timeouts and errors appear about 10 seconds into the read;
with the patch, a read of the entire disk (320G) completed without errors.

Previous tests of the analogous Linux driver fix showed no errors and no data corruption on two write-whole-drive, read-whole-drive cycles.

> I've never been able to trigger this problem myself so far.

Seems like the bug is highly configuration-dependent, or PCI-chipset-dependent, or just present in some production runs and not others.

-- 
./lxnt
Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.
Hello,

Alexander Sabourenkov [EMAIL PROTECTED] writes:
> Hello.
>
> I have ported the workaround for the hardware bug that causes data corruption on Promise SATA300 TX4 cards to RELENG_7.
>
> Bug description: SATA300 TX4 hardware chokes if the last PRD entry (in a DMA transfer) is larger than 164 bytes. This was found while analysing the vendor-supplied Linux driver.
>
> Workaround: Split the trailing PRD entry if it's larger than 164 bytes.
>
> The two supplied patches do fix the problem on my machine.

Definitely an improvement, but not sufficient (for my setup):

amd64-releng_6 on an ASUS A8V UP (the box ran rock-stable for years on i386-releng_5 with the same hardware apart from the TX4 and drives)

from dmesg:

atapci0: <Promise PDC40718 SATA300 controller> port 0xe000-0xe07f,0xd800-0xd8ff mem 0xfbb0-0xfbb00fff,0xfba0-0xfba1 irq 18 at device 13.0 on pci0
ata2: <ATA channel 0> on atapci0
ata3: <ATA channel 1> on atapci0
ata4: <ATA channel 2> on atapci0
ata5: <ATA channel 3> on atapci0
atapci1: <VIA 6420 SATA150 controller> port 0xd400-0xd407,0xd000-0xd003,0xc800-0xc807,0xc400-0xc403,0xc000-0xc00f,0xb800-0xb8ff irq 20 at device 15.0 on pci0
ata6: <ATA channel 0> on atapci1
ata7: <ATA channel 1> on atapci1
atapci2: <VIA 8237 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 15.1 on pci0
ata0: <ATA channel 0> on atapci2
ata1: <ATA channel 1> on atapci2
[ ... ]
ad0: 38166MB <Seagate ST3402111A 3.AAJ> at ata0-master UDMA100
ad6: 476940MB <WDC WD5000AAKS-00TMA0 12.01C01> at ata3-master SATA300
ad12: 305245MB <WDC WD3200JD-22KLB0 08.05J08> at ata6-master SATA150

booting from ad0 and simple gconcat over ad6 and ad12.

Improvement: I can now fsck /dev/concat/data without ad6 being detached.

Persistent problem: when I rsync an nfs-mounted disk to /dev/concat/data, after a few gigabytes of data have been transferred I get:

Nov 2 16:39:55 charlotte kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=268435392
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435392
Nov 2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA status=ff<BUSY,READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> error=ff<ICRC,UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA,ILLEGAL_LENGTH> LBA=268435392
Nov 2 16:40:50 charlotte kernel: g_vfs_done():concat/data[WRITE(offset=137438920704, length=131072)]error = 5
Nov 2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=268435648
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=268435648
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA48 timed out LBA=268435648
Nov 2 16:40:50 charlotte kernel: g_vfs_done():concat/data[WRITE(offset=137439051776, length=131072)]error = 5
...

I will test again with #define PDC_MAXLASTSGSIZE 32*4 (just to see if that makes a difference).

Regards,
Arno
Re: indefinite wait buffer patch
Since eyeballs are in swap_pager.c - is the putpages panic string mislabeled:

swap_pager_putpages(vm_object_t object, vm_page_t *m, int count,
    boolean_t sync, int *rtvals)
{
        int i;
        int n = 0;

        if (count && m[0]->object != object) {
                panic("swap_pager_getpages: object mismatch %p/%p",
                                  putpages

--Mark Tinguely.
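One way to keep such messages from drifting out of sync with the function they live in is to let the compiler supply the name through the C99 `__func__` identifier. The sketch below is illustrative only: `format_mismatch` and `example_putpages` are hypothetical user-space helpers, and it formats into a buffer with `snprintf` rather than calling the kernel's `panic`.

```c
#include <stdio.h>

/* Format an "object mismatch" message prefixed with the caller's name. */
static int
format_mismatch(char *buf, size_t len, const char *func,
    void *have, void *want)
{
	return snprintf(buf, len, "%s: object mismatch %p/%p", func, have, want);
}

/* Hypothetical caller: __func__ evaluates to "example_putpages" here. */
static int
example_putpages(char *buf, size_t len, void *have, void *want)
{
	return format_mismatch(buf, len, __func__, have, want);
}
```

With this pattern a copied-and-pasted check picks up the new function's name automatically, so a getpages/putpages mix-up of the kind spotted here cannot occur in the prefix.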
Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.
Arno J. Klaassen wrote:
> definitely an improvement, but not sufficient (for my setup):
> amd64-releng_6 on an ASUS A8V UP (box ran rock-stable for years on i386-releng_5 with the same hardware apart from the TX4 and drives)
> from dmesg:

Setup is identical to mine, except for the drives.
http://lxnt.info/tx4/freebsd/dmesg.text

> Improvement: I can now fsck /dev/concat/data without ad6 being detached

It was that bad? Wow.

> Persistent problem: when I rsync an nfs-mounted disk to /dev/concat/data, after a few gigabytes of data have been transferred I get:

That's strange. Are you sure cables, PSU and line power are OK? Back in October, upgrading the PSU halved the error count for me (under Linux).

> I will test again with #define PDC_MAXLASTSGSIZE 32*4 (just to see if that makes a difference)

Please do.

-- 
./lxnt
Re: indefinite wait buffer patch
On Fri, 2 Nov 2007, 12:59-0500, Mark Tinguely wrote:
> Since eyeballs are in swap_pager.c - is the putpages panic string mislabeled:
>
> swap_pager_putpages(vm_object_t object, vm_page_t *m, int count,
>     boolean_t sync, int *rtvals)
> {
>         int i;
>         int n = 0;
>
>         if (count && m[0]->object != object) {
>                 panic("swap_pager_getpages: object mismatch %p/%p",
>                                   putpages

Just fixed. Thanks.

-- 
Maxim Konovalov
Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.
Alexander Sabourenkov [EMAIL PROTECTED] writes:
> Arno J. Klaassen wrote:
> > definitely an improvement, but not sufficient (for my setup):
> > amd64-releng_6 on an ASUS A8V UP (box ran rock-stable for years on i386-releng_5 with the same hardware apart from the TX4 and drives)
> > from dmesg:
>
> Setup is identical to mine, except for the drives.
> http://lxnt.info/tx4/freebsd/dmesg.text
>
> > Improvement: I can now fsck /dev/concat/data without ad6 being detached
>
> It was that bad? Wow.

Yep (often even beyond repair ...)

> > Persistent problem: when I rsync an nfs-mounted disk to /dev/concat/data, after a few gigabytes of data have been transferred I get:
>
> That's strange. Are you sure cables, PSU and line power are OK? Back in October, upgrading the PSU halved the error count for me (under Linux).

I could try, but I don't believe in it: there are just three disks and an extra controller instead of the two disks it used to run with ...

> > I will test again with #define PDC_MAXLASTSGSIZE 32*4 (just to see if that makes a difference)
>
> Please do.

Well, it does: no more scary messages about DMA, SETFEATURES etc., though it now ends in a panic ...

The end of my /var/log/messages (I turned on your printf as well):

Nov 2 22:59:11 charlotte kernel: splitting trailing PRD of 2048 (limit 128)
Nov 2 22:59:11 charlotte last message repeated 15 times
Nov 2 22:59:11 charlotte kernel: splitting trailing PRD of 4096 (limit 128)
Nov 2 22:59:11 charlotte kernel: splitting trailing PRD of 2048 (limit 128)
Nov 2 22:59:11 charlotte kernel: splitting trailing PRD of 2048 (limit 128)
Nov 2 22:59:11 charlotte kernel: splitting trailing PRD of 4096 (limit 128)
Nov 2 22:59:11 charlotte last message repeated 11 times
Nov 2 22:59:11 charlotte kernel: splitting trailing PRD of 2048 (limit 128)
Nov 2 22:59:11 charlotte kernel: splitting trailing PRD of 4096 (limit 128)
Nov 2 23:01:18 charlotte syslogd: kernel boot file is /boot/kernel/kernel
Nov 2 23:01:18 charlotte kernel: splitting trailing PRD of 4096 (limit 128)
Nov 2 23:01:18 charlotte last message repeated 17 times
Nov 2 23:01:18 charlotte kernel: Copyright (c) 1992-2007 The FreeBSD Project.
Nov 2 23:01:18 charlotte kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994

And for the panic:

panic: ffs_clusteralloc: map mismatch
Uptime: 35m27s
Dumping 1023 MB (2 chunks)
chunk 0: 1MB (159 pages) ... ok
chunk 1: 1023MB (261808 pages) 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0 doadump () at pcpu.h:172
172 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) where
#0 doadump () at pcpu.h:172
#1 0x0004 in ?? ()
#2 0x8025e233 in boot (howto=260) at /files/bsd/src6/sys/kern/kern_shutdown.c:409
#3 0x8025e836 in panic (fmt=0xff00305bebe0 ) at /files/bsd/src6/sys/kern/kern_shutdown.c:565
#4 0x8037ab26 in ffs_clusteralloc (ip=0xff00241ae900, cg=9425, bpref=0, len=5) at /files/bsd/src6/sys/ufs/ffs/ffs_alloc.c:1663
#5 0x803769a8 in ffs_hashalloc (ip=0xff00241ae900, cg=395, pref=0, size=5, allocator=0x8037a650 <ffs_clusteralloc>) at /files/bsd/src6/sys/ufs/ffs/ffs_alloc.c:1281
#6 0x8037841a in ffs_reallocblks (ap=0x0) at /files/bsd/src6/sys/ufs/ffs/ffs_alloc.c:778
#7 0x8042496d in VOP_REALLOCBLKS_APV (vop=0x0, a=0x0) at vnode_if.c:2056
#8 0x802bd70c in cluster_write (vp=0xff0015904ba0, bp=0x9e74ea10, filesize=81920, seqcount=17) at vnode_if.h:1052
#9 0x8039662f in ffs_write (ap=0xad243a30) at /files/bsd/src6/sys/ufs/ffs/ffs_vnops.c:763
#10 0x804251fb in VOP_WRITE_APV (vop=0x805ad880, a=0xad243a30) at vnode_if.c:698
#11 0x802d9bca in vn_write (fp=0xff002e86da50, uio=0xad243b50, active_cred=0x0, flags=0, td=0xff00305bebe0) at vnode_if.h:372
#12 0x802894d7 in dofilewrite (td=0xff00305bebe0, fd=1, fp=0xff002e86da50, auio=0xad243b50, offset=0, flags=0) at file.h:253
#13 0x80289840 in kern_writev (td=0xff00305bebe0, fd=1, auio=0xad243b50) at /files/bsd/src6/sys/kern/sys_generic.c:402
#14 0x80289938 in write (td=0x0, uap=0x0) at /files/bsd/src6/sys/kern/sys_generic.c:326
#15 0x803e0b21 in syscall (frame= {tf_rdi = 1, tf_rsi = 277012480, tf_rdx = 262144, tf_rcx = 262144, tf_r8 = 262144, tf_r9 = 3219503195, tf_rax = 4, tf_rbx = 277012480, tf_rbp = 32768, tf_r10 = 1669914800, tf_r11 = 2860306816, tf_r12 = 0, tf_r13 = 1, tf_r14 = 6326848, tf_r15 = 0, tf_trapno = 12, tf_addr = 277270528, tf_flags = 12, tf_err = 2, tf_rip = 34367373196, tf_cs = 43, tf_rflags = 518, tf_rsp = 140737488337304, tf_ss = 35}) at
Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.
Arno J. Klaassen wrote:
> definitely an improvement, but not sufficient (for my setup):
>
> amd64-releng_6 on an ASUS A8V UP (box ran rock-stable for years on i386-releng_5 with the same hardware apart from the TX4 and drives)
>
> [ ... dmesg output snipped ... ]
>
> booting from ad0 and simple gconcat over ad6 and ad12.
>
> Improvement: I can now fsck /dev/concat/data without ad6 being detached
>
> Persistent problem: when I rsync an nfs-mounted disk to /dev/concat/data, after a few gigabytes of data have been transferred I get:
>
> [ ... WRITE_DMA / SETFEATURES timeout and g_vfs_done error log snipped ... ]
>
> I will test again with #define PDC_MAXLASTSGSIZE 32*4 (just to see if that makes a difference)

One thing to try is to lose any geom raid; if raid is needed, use ataraid instead.

I'm shuffling boards and controllers here to try to reproduce; so far no luck, it just works(tm). It seems to depend quite heavily on the right combination of possibly marginal HW.

-Søren
Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.
Hello,

[ ... ]

> > I will test again with #define PDC_MAXLASTSGSIZE 32*4 (just to see if that makes a difference)
>
> One thing to try is to lose any geom raid; if raid is needed, use ataraid instead.

Nope: I did a newfs on ad6 (the disk on the Promise TX4), and an rsync onto it then panics the same way as the geom_concat case did.

> I'm shuffling boards and controllers here to try to reproduce; so far no luck, it just works(tm). It seems to depend quite heavily on the right combination of possibly marginal HW.

Rather than the marginal HW part, it seems, for me, closely related to the MB/BIOS as well (Alexander apparently has about the same setup as I have for this test). A while ago (using releng_6) I tried the same setup on three different MBs: ahd controller + SCSI boot disk, a TX4, and three disks in geom_mirror. Results:

- on an ASUS A8? board (I have used plenty of them without the slightest problem for years; not really expensive but not marginal IMHO): just look at it and it would crash (g_vfs_done)
- on a Tyan S28??: rock stable, unable to crash it however hard I tried
- on some MSI K8 (which I usually run Vista on for testing; this one I really bought as cheap as possible): would run OK, even under rather heavy load, but when pushed really hard it finally delivers the lovely g_vfs_done ...

I vaguely remember from another PR that the Promise card does something with PCI bursting which FreeBSD does not detect and/or handle correctly (that is beyond my simple skills as a dumb tester, but maybe the Linux sources contain a clue about that as well).

Regards, and thanks for your efforts,
Arno