Re: Followup to "fallback to PIO mode" on dual processor AMD systems
On Fri, Jan 03, 2003 at 06:36:29AM +1100, Bruce Evans wrote:
>
> The fallback is clearly wrong because it turns isolated media errors
> into pessimized i/o for the whole disk at best, system hangs during
> resets next best, and system crashes at worst. I keep a disk with bad
> media on line for testing some of this, and zap the fallback using the
> following patch (hope this is complete; it was edited from a larger
> patch).

Perhaps the right answer is to test uptime and do the fallback if the
error happens in the first minute, at least for permanently-mounted
disks. In any case, retries in the current mode should be exhausted
first.

--
Barney Wolff         http://www.databus.com/bwresume.pdf
I'm available by contract or FT, in the NYC metro area or via the 'Net.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-questions" in the body of the message
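[Editor's sketch] The policy Barney Wolff suggests above (exhaust retries in the current mode first, then fall back only if the error happens within the first minute of uptime) could look roughly like the following. This is only an illustration of the idea, not code from the ata driver: the function name, the retry budget of 3, and the way uptime is passed in are all assumptions, and the "permanently-mounted disks" condition is not modeled.

```c
#include <stdbool.h>

/* Hypothetical policy knobs; neither value comes from the thread or the driver. */
#define MAX_DMA_RETRIES      3   /* assumed retry budget in the current DMA mode */
#define BOOT_WINDOW_SECONDS  60  /* "the first minute" of uptime */

/*
 * Decide whether a failed DMA request should trigger a fallback to PIO.
 *
 * retries_done   - retries already attempted in the current DMA mode
 * uptime_seconds - seconds since boot (the caller supplies this)
 *
 * Returns true only when the retry budget is spent AND the system is
 * still inside the boot window.
 */
bool
should_fall_back_to_pio(int retries_done, long uptime_seconds)
{
    /* Retries in the current mode should be exhausted first. */
    if (retries_done < MAX_DMA_RETRIES)
        return false;

    /* Past the boot window, report the error instead of degrading the disk. */
    return uptime_seconds < BOOT_WINDOW_SECONDS;
}
```

With these assumed values, a request that has failed three times ten seconds after boot would fall back to PIO, while the same failure an hour into uptime would be reported as an error and the disk would stay in DMA mode.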
Re: Followup to "fallback to PIO mode" on dual processor AMD systems
Quoting Bruce Evans <[EMAIL PROTECTED]>:

> On Thu, 2 Jan 2003, Bruce Campbell wrote:
>
> > At present, I don't suspect bad media because the error message is
> > "WRITE command timeout tag=0 serv=0" which doesn't suggest a specific
> > sector/track etc, and running with UDMA33 instead of UDMA100 makes the
> > problem appear to vanish.
>
> The fallback is clearly wrong because it turns isolated media errors
> into pessimized i/o for the whole disk at best, system hangs during
> resets next best, and system crashes at worst. I keep a disk with bad
> media on line for testing some of this, and zap the fallback using the
> following patch (hope this is complete; it was edited from a larger
> patch).

Thanks for the patch.

Under moderate load, I am seeing occasional instances of:

/kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
/kernel: ata0: resetting devices .. done

and everything keeps on working normally via DMA, i.e. it does not drop
to PIO.

The more menacing case is this:

Dec 30 23:26:59 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
Dec 30 23:26:59 /kernel: ata0: resetting devices .. done
Dec 30 23:26:59 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
Dec 30 23:27:00 /kernel: ata0: resetting devices .. done
Dec 30 23:27:00 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
Dec 30 23:27:00 /kernel: ata0: resetting devices .. done
Dec 30 23:27:00 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
Dec 30 23:27:00 /kernel: ad0: timeout waiting for cmd=ef s=d0 e=00
Dec 30 23:27:00 /kernel: ad0: trying fallback to PIO mode
Dec 30 23:27:00 /kernel: ata0: resetting devices .. done

So it appears it would no longer work with DMA, but it would work with
PIO. If it is manually set back to UDMA with the atacontrol command, it
times out again and falls back to PIO. After a soft reboot, however, all
is well again.

>
> %%%
> Index: ata-disk.c
> ===
> RCS file: /home/ncvs/src/sys/dev/ata/ata-disk.c,v
> retrieving revision 1.139
> diff -u -2 -r1.139 ata-disk.c
> --- ata-disk.c 17 Dec 2002 16:26:22 - 1.139
> +++ ata-disk.c 18 Dec 2002 01:03:37 -
> @@ -597,5 +606,5 @@
>      else {
>          ata_dmainit(adp->device, ata_pmode(adp->device->param), -1, -1);
> -        printf(" falling back to PIO mode\n");
> +        printf(" NOT falling back to PIO mode\n");
>      }
>      TAILQ_INSERT_HEAD(&adp->device->channel->ata_queue, request, chain);
> @@ -603,4 +612,5 @@
>      }
>
> +#if 0
>      /* if using DMA, try once again in PIO mode */
>      if (request->flags & ADR_F_DMA_USED) {
> @@ -613,4 +623,5 @@
>          return ATA_OP_FINISHED;
>      }
> +#endif
>
>      request->flags |= ADR_F_ERROR;
> %%%
>
> Bruce
>

--
Bruce Campbell
Engineering Computing
CPH-2374B
University of Waterloo
(519)888-4567 ext 5889
Re: Followup to "fallback to PIO mode" on dual processor AMD systems
On Thu, 2 Jan 2003, Bruce Campbell wrote:

> At present, I don't suspect bad media because the error message is
> "WRITE command timeout tag=0 serv=0" which doesn't suggest a specific
> sector/track etc, and running with UDMA33 instead of UDMA100 makes the
> problem appear to vanish.

The fallback is clearly wrong because it turns isolated media errors
into pessimized i/o for the whole disk at best, system hangs during
resets next best, and system crashes at worst. I keep a disk with bad
media on line for testing some of this, and zap the fallback using the
following patch (hope this is complete; it was edited from a larger
patch).

%%%
Index: ata-disk.c
===
RCS file: /home/ncvs/src/sys/dev/ata/ata-disk.c,v
retrieving revision 1.139
diff -u -2 -r1.139 ata-disk.c
--- ata-disk.c 17 Dec 2002 16:26:22 - 1.139
+++ ata-disk.c 18 Dec 2002 01:03:37 -
@@ -597,5 +606,5 @@
     else {
         ata_dmainit(adp->device, ata_pmode(adp->device->param), -1, -1);
-        printf(" falling back to PIO mode\n");
+        printf(" NOT falling back to PIO mode\n");
     }
     TAILQ_INSERT_HEAD(&adp->device->channel->ata_queue, request, chain);
@@ -603,4 +612,5 @@
     }

+#if 0
     /* if using DMA, try once again in PIO mode */
     if (request->flags & ADR_F_DMA_USED) {
@@ -613,4 +623,5 @@
         return ATA_OP_FINISHED;
     }
+#endif

     request->flags |= ADR_F_ERROR;
%%%

Bruce
Re: Followup to 'fallback to PIO mode' on dual processor AMD systems
Bruce Campbell said:
>
> - try UDMA100 with the drives directly attached (ie. no removable tray)
> - maybe try a non onboard IDE controller

Yes, I would recommend a PCI IDE controller, such as the Promise ATA/100
or Promise ATA/66. Also be sure your IDE cables are 18" and not 24" or
32"; some people like to go crazy with overly long IDE cables. Sometimes,
with cables longer than 18", I get CRC errors (but nothing fatal).

> - shuffle the disks to see if the problems follow the disks or not
>
> At present, I don't suspect bad media because the error message is "WRITE
> command timeout tag=0 serv=0" which doesn't suggest a specific
> sector/track etc, and running with UDMA33 instead of UDMA100 makes the
> problem appear to vanish.

I read your burn-in procedures. A couple of additions I'd recommend:

CPUBurn:
http://users.ev1.net/~redelm/

I've only tried it on Linux, but the page lists *BSD too. This package
also includes a memory tester. I usually run 1 CPUburn process per CPU
and as many memory testers as I have RAM. If you try to load too many,
the newest process will segfault (since it can't allocate memory), which
is harmless. Run this for at least 24 hours.

memtest86:
http://www.memtest86.com/

When you boot it, go to the options screen, turn on all tests, and run
it through once or twice. With your system I'd expect 1 pass of all
tests to be done in about 20 hours.

Most of my servers that run IDE have DMA/33 controllers; the few that
have faster ones all use Promise ATA/100 cards or 3ware 6800 series RAID
cards. I haven't trusted recent AMD/VIA/Intel IDE chips for a while.

nate