Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA
This is a reply to a very old thread. I decided to reply because 1. nobody has mentioned the real cause of the problem yet (some answers were misleading or even outright wrong), 2. I've experienced the same problem in the past few weeks, and 3. my findings might be useful for other people who google for the symptoms (like me) and stumble across this thread.

The drive in question seems to be very popular, especially in low-end private servers and home machines. It is very reliable; I still have these and similar ones in production. The drive of mine that exhibited the problem recently is this:

ad0: 24405MB IBM DJNA-352500 J51OA30K at ata0-master UDMA66

It sits in a small server running DNS, SMTP, WWW and other services for several private domains. Most of the time the load is very low.

Now for the actual problem:

V.I.Victor idmc_v...@intgdev.com wrote:
For the last 4-days, our (otherwise OK) 5.4-RELEASE machine has been reporting:
Feb 12 12:08:05 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 00:08:51 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 12:09:38 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2963331
Feb 14 00:10:24 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2705947
Feb 14 12:11:09 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2706335
Feb 15 00:12:02 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2832383
Feb 15 12:12:57 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=139839
Feb 16 00:13:50 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391
Feb 16 12:14:36 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391
The system was created Jan 08 and, prior to the above, the ad0: timeout had only been reported twice:
Jan 25 11:43:34 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=17920255
Feb 6 11:59:42 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2832383
[...]
ad0: 14664MB IBM-DJNA-351520/J56OA30K [29795/16/63] at ata0-master UDMA66

First of all: the disk is *not* dying. SMART won't reveal anything. The behaviour is perfectly normal for IBM-DJNA-3* type disks. When those disks are used in continuous operation (24/7), they go into an automatic maintenance mode after 6 days. This is a kind of short self-test and recalibration to ensure reliable continuous operation. It is repeated every 6 days ad infinitum.

Note that there are almost exactly 12 days between your Jan 25 and Feb 6 incidents, and almost exactly 6 days between the Feb 6 and Feb 12 incidents. An automatic maintenance on Jan 31 apparently finished successfully without a timeout message.

Normally the drive waits until it detects an idle period, then performs the maintenance, then continues normal operation. Maintenance mode involves a short spin-down / spin-up cycle. However, if the drive receives a command during spin-down, it aborts maintenance mode, spins up (which takes a few seconds and might cause a timeout in the operating system), then performs the command, and RETRIES MAINTENANCE AFTER 12 HOURS. That's where your timeout messages every 12 hours come from. This is not in any way harmful. Eventually the maintenance will succeed (i.e. the idle period is long enough for it to finish), and then you won't get timeout messages for at least 6 days.

You mentioned that the problem appeared (and disappeared) when you set the machine's clock. This is easy to explain, too. The hard disk has its own clock, which is not synchronized with the system clock; it starts counting from zero when the disk is powered up. By changing the system's clock, you shift the offset between it and the drive's clock. That means that periodic activity happens at different times relative to the drive's clock. Such periodic activity includes cron jobs and other things. For example, sendmail's queue runner wakes up every 30 minutes by default.
Many other daemons also perform periodic activity. Any of it can start in the middle of the idle period that the drive chose for its maintenance, thus interrupting the maintenance as described above. If the offset between the system's clock and the drive's clock changes, chances are that such periodic activity will happen at different times from the drive's point of view, so the likelihood that the drive can complete its maintenance changes (for better or worse).

Unfortunately there is no way to configure or disable this maintenance mode. The only way to somewhat control it is to periodically force a spin-down (the ATA standby command) when you know that the drive is idle. This usually requires unmounting the filesystems, though, because otherwise you can't guarantee that they will stay idle for long enough.

You can read IBM's official documentation here:
Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA
[...]
First of all: The disk is *not* dying. SMART won't reveal anything. The behaviour is perfectly normal for IBM-DJNA-3* type disks. When those disks are used in continuous operation (24/7), they will go into automatic maintenance mode after 6 days. This is kind of a short self-test and recalibration to ensure reliable continuous operation. It will be repeated after another 6 days ad infinitum.

It's been over 3.5 years since my original post -- imagine my surprise! The drive's still running (24/7) and still reporting the same retries. Because of the pattern of the retries, I never really thought that the drive was bad. But, until now, I never knew why it was happening. Thanks *very* much for the info!
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org
Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA
On Sun, 19 Feb 2006, Mike Tancsa wrote:
On Sun, 19 Feb 2006 22:21:04 +, in sentex.lists.freebsd.questions you wrote:
On Thu, 16 Feb 2006, Mike Tancsa wrote:
For the last 4-days, our (otherwise OK) 5.4-RELEASE machine has been reporting:
Feb 12 12:08:05 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 00:08:51 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 12:09:38 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2963331
Feb 14 00:10:24 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2705947
So -- can anyone help track this down?
It sounds like a hardware issue. Install /usr/ports/sysutils/smartmontools and ask the drive to see what's up.
I installed 'smartmontools' but haven't used it yet. I've been waiting to see what happens -- the problem simply stopped. There have been no ad0: TIMEOUT messages for 3 days.
The errors get logged in the drive, so you don't have to wait for more errors to happen. Start it running now so you can see if any of the bad counters are changing, as well as to ask the drive what it was. My guess is you have some bad sectors the drive remapped.

OK. No problems found... And -- still -- no more ad0: TIMEOUTs.

But I'm not really surprised. As mentioned in the original post, a 2-gig file had been created, approximately midway through the TIMEOUT report period, that presumably moved past any bad-sector patches. Plus, since the drive is (was) storing email, writing logs, etc. 24 hours a day, it seems improbable that bad sectors would only show up every 12 hours.

Although I'm uncomfortable with magic fixes, I wonder if there's more than a coincidental connection between setting the date and the reports starting and stopping.
Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA
I had a drive dying and it showed up just like this. It turned out that the daily scripts that scan for file changes, etc., and my backup script were tickling a bad sector of the disk. Have you run the smartctl -t long /dev/ad0 command to have it perform a full self-test? You normally have to let that run for a while, then take another look at the SMART error log to see if anything showed up. Mine ended up having an error that the drive could not self-correct.

As to why you're able to write a 2-gig file without a problem: if you have some binary, config file, man page, etc. sitting on those bad spots, you wouldn't be writing to those blocks. Any time a security script iterates through them, it would tickle that block, causing an error.

Another possibility is that you have a bad IDE cable.

Hopefully that is of some use.

Jerry
http://www.networkstrike.com

V.I.Victor wrote:
[...]
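The SMART checks suggested above can be scripted. The following is a minimal Python sketch, assuming smartmontools (sysutils/smartmontools) is installed and that the disk is /dev/ad0 as in this thread; the helper names smartctl_commands and run_step are hypothetical. It only wraps standard smartctl options: -t long starts the extended self-test, -l selftest and -l error read the drive's self-test and error logs, and -A prints the attribute counters.

```python
import shutil
import subprocess

DEVICE = "/dev/ad0"  # the disk discussed in this thread; adjust for your system


def smartctl_commands(device: str = DEVICE) -> dict:
    """Hypothetical helper: map each diagnostic step to a smartctl argument list."""
    return {
        # start the extended (long) self-test; it runs inside the drive
        "start_long_test": ["smartctl", "-t", "long", device],
        # results of completed self-tests
        "selftest_log": ["smartctl", "-l", "selftest", device],
        # errors the drive has recorded in its own error log
        "error_log": ["smartctl", "-l", "error", device],
        # vendor attributes, e.g. reallocated-sector counters
        "attributes": ["smartctl", "-A", device],
    }


def run_step(step: str, device: str = DEVICE) -> str:
    """Run one step and return smartctl's text output."""
    if shutil.which("smartctl") is None:
        raise RuntimeError("smartctl not found; install sysutils/smartmontools")
    cmd = smartctl_commands(device)[step]
    # smartctl reports drive status through bits in its exit status, so a
    # non-zero status is not necessarily a failure; hence no check=True.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout
```

Typical use: call run_step("start_long_test"), wait until the test has finished (smartctl prints an estimate when it starts the test), then read run_step("selftest_log") and run_step("error_log").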
Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA
On Thu, 16 Feb 2006, Mike Tancsa wrote:
For the last 4-days, our (otherwise OK) 5.4-RELEASE machine has been reporting:
Feb 12 12:08:05 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 00:08:51 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 12:09:38 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2963331
Feb 14 00:10:24 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2705947
So -- can anyone help track this down?
It sounds like a hardware issue. Install /usr/ports/sysutils/smartmontools and ask the drive to see what's up.

I installed 'smartmontools' but haven't used it yet. I've been waiting to see what happens -- the problem simply stopped. There have been no ad0: TIMEOUT messages for 3 days. The only thing done outside of the ordinary, prior to the messages stopping, was to set the date. It's probably a coincidence, but setting the date was also the last thing done before the messages started.
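The clock-offset explanation given in the reply about the drive's maintenance mode can be illustrated with a toy model. This is a hypothetical Python sketch, not a model of the drive's actual firmware: the window length and job times are made-up values, and it assumes the periodic jobs are scheduled by wall-clock time of day, as cron jobs are. It checks whether any job fires inside a maintenance window that is fixed on the drive's power-on clock, for a given offset between the system clock and the drive clock.

```python
from typing import Sequence

MINUTES_PER_DAY = 24 * 60


def window_interrupted(window_start: int, duration: int, offset: int,
                       job_times: Sequence[int]) -> bool:
    """Return True if a daily job fires during the drive's maintenance window.

    window_start -- start of the window, in minutes on the drive's own clock
                    (which counts from power-up)
    duration     -- hypothetical length of the window, in minutes
    offset       -- system clock minus drive clock, in minutes; setting the
                    system date changes this value
    job_times    -- times of day (minutes after midnight, system clock) at
                    which periodic jobs touch the disk
    """
    # Where the window falls on the system clock, as a time of day.
    start = (window_start + offset) % MINUTES_PER_DAY
    for job in job_times:
        # Minutes from the window start until the job next fires, wrapping
        # past midnight; the job lands inside the window if this is < duration.
        if (job - start) % MINUTES_PER_DAY < duration:
            return True
    return False


if __name__ == "__main__":
    jobs = [0, 720]  # hypothetical jobs at 00:00 and 12:00 system time
    print(window_interrupted(718, 5, 0, jobs))        # True: 11:58-12:03 hits the 12:00 job
    print(window_interrupted(718 + 720, 5, 0, jobs))  # True: the 12-hour retry hits 00:00
    print(window_interrupted(718, 5, 10, jobs))       # False: shifted to 12:08-12:13
```

In this toy model, a window that collides with a job keeps colliding on every 12-hour retry until the offset changes, which is consistent with the timeouts starting and stopping when the date was set.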
Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA
On Sun, 19 Feb 2006 22:21:04 +, in sentex.lists.freebsd.questions you wrote:
On Thu, 16 Feb 2006, Mike Tancsa wrote:
For the last 4-days, our (otherwise OK) 5.4-RELEASE machine has been reporting:
Feb 12 12:08:05 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 00:08:51 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 12:09:38 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2963331
Feb 14 00:10:24 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2705947
So -- can anyone help track this down?
It sounds like a hardware issue. Install /usr/ports/sysutils/smartmontools and ask the drive to see what's up.
I installed 'smartmontools' but haven't used it yet. I've been waiting to see what happens -- the problem simply stopped. There have been no ad0: TIMEOUT messages for 3 days.

The errors get logged in the drive, so you don't have to wait for more errors to happen. Start it running now so you can see if any of the bad counters are changing, as well as to ask the drive what it was. My guess is you have some bad sectors the drive remapped.

---Mike
Mike Tancsa, Sentex communications http://www.sentex.net
Providing Internet Access since 1994
[EMAIL PROTECTED], (http://www.tancsa.com)
Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA
V.I.Victor [EMAIL PROTECTED] wrote:
For the last 4-days, our (otherwise OK) 5.4-RELEASE machine has been reporting:
Feb 12 12:08:05 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 00:08:51 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 12:09:38 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2963331
Feb 14 00:10:24 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2705947
Feb 14 12:11:09 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2706335
Feb 15 00:12:02 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2832383
Feb 15 12:12:57 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=139839
Feb 16 00:13:50 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391
Feb 16 12:14:36 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391
The system was created Jan 08 and, prior to the above, the ad0: timeout had only been reported twice:
Jan 25 11:43:34 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=17920255
Feb 6 11:59:42 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2832383

If smartd doesn't report any problems, try changing the ATA cable. That worked for me once, although my messages didn't come every 12 hours, and the problem later resulted in a disconnected system disk, leading to a crash.

Fabian
--
http://www.fabiankeil.de/
Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA
Fabian Keil wrote:
V.I.Victor [EMAIL PROTECTED] wrote:
For the last 4-days, our (otherwise OK) 5.4-RELEASE machine has been reporting:
Feb 12 12:08:05 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 00:08:51 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 12:09:38 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2963331
Feb 14 00:10:24 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2705947
Feb 14 12:11:09 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2706335
Feb 15 00:12:02 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2832383
Feb 15 12:12:57 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=139839
Feb 16 00:13:50 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391
Feb 16 12:14:36 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391
The system was created Jan 08 and, prior to the above, the ad0: timeout had only been reported twice:
Jan 25 11:43:34 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=17920255
Feb 6 11:59:42 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2832383

Hi,

I hate sending out a "me too" message, but here we are. It's not a production machine, so I was going to let it run its course. The messages stopped appearing after a couple of days. I reviewed my logs and couldn't find anything more revealing, and my system appeared to be functioning well. Likely unrelated, but my new activities at the time were experimenting with some new SAMBA shares and setting up the CAM driver. There is likely no correlation, but I thought I'd mention it.

--Duane
Every 12-hrs -- ad0: TIMEOUT - WRITE DMA
For the last 4-days, our (otherwise OK) 5.4-RELEASE machine has been reporting:
Feb 12 12:08:05 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 00:08:51 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 12:09:38 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2963331
Feb 14 00:10:24 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2705947
Feb 14 12:11:09 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2706335
Feb 15 00:12:02 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2832383
Feb 15 12:12:57 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=139839
Feb 16 00:13:50 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391
Feb 16 12:14:36 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391

The system was created Jan 08 and, prior to the above, the ad0: timeout had only been reported twice:
Jan 25 11:43:34 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=17920255
Feb 6 11:59:42 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2832383

Before asking here, I did several searches for possible causes. I think I've eliminated disk spin-down and bad-block mapping: just before the Feb 15, 12:12 period a 2-gig file was created, leaving the disk 'spinning' and bad blocks presumably bypassed. Another item I found said that some IBM drives recalibrate every 25 hours. Interesting concept, but a different period and without previous history. Lastly, several items referred to changing PREEMPTION but never seemed to reach a final conclusion. I also checked the cron log and found nothing running at the timeout times.

So -- can anyone help track this down?

Final note: the hardware is an old, resurrected Win98 machine running 24/7 and is used only for email processing. I installed it primarily as a proof of concept, so it can be replaced if necessary.
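The 12-day, 6-day and 12-hour spacing discussed in this thread can be checked directly against the timestamps above. A minimal Python sketch: syslog timestamps carry no year, so 2006 (the year of the thread) is assumed, as are English month abbreviations (the default C locale). It prints the gap between consecutive timeout messages.

```python
from datetime import datetime, timedelta
from typing import List

# Timestamps of the "ad0: TIMEOUT - WRITE_DMA" messages quoted above.
TIMEOUTS = [
    "Jan 25 11:43:34",
    "Feb 6 11:59:42",
    "Feb 12 12:08:05",
    "Feb 13 00:08:51",
    "Feb 13 12:09:38",
    "Feb 14 00:10:24",
    "Feb 14 12:11:09",
    "Feb 15 00:12:02",
    "Feb 15 12:12:57",
    "Feb 16 00:13:50",
    "Feb 16 12:14:36",
]


def parse(stamp: str, year: int = 2006) -> datetime:
    """Parse a syslog-style 'Mon DD HH:MM:SS' stamp, supplying the missing year."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def gaps(stamps: List[str]) -> List[timedelta]:
    """Time elapsed between each pair of consecutive log entries."""
    times = [parse(s) for s in stamps]
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for stamp, gap in zip(TIMEOUTS[1:], gaps(TIMEOUTS)):
        print(f"{stamp:>15}  +{gap}")
```

For this log the first two gaps come out as 12 days, 0:16:08 and 6 days, 0:08:23, and every later gap is 12 hours plus 45 to 55 seconds, which fits a 12-hour retry cycle rather than the 25-hour period mentioned above.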
Some specifics:
FreeBSD 5.4-RELEASE #0: Sun May 8 10:21:06 UTC 2005
CPU: AMD-K7(tm) Processor (598.84-MHz 686-class CPU)
  Origin = AuthenticAMD Id = 0x612 Stepping = 2
  Features=0x81f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,MMX>
  AMD Features=0xc040<AMIE,DSP,3DNow!>
real memory = 134152192 (127 MB)
avail memory = 121630720 (115 MB)
ACPI disabled by blacklist. Contact your BIOS vendor.
ad0: 14664MB IBM-DJNA-351520/J56OA30K [29795/16/63] at ata0-master UDMA66
acd0: CDROM CREATIVE CD5233E/C2.05 at ata1-master PIO4

Thanks for any help!
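The cron-log check described in the original post can also be automated. This is a hypothetical sketch, not a tool from the thread: it assumes syslog-style lines whose first three fields are month, day and time (as in the kernel messages above and in FreeBSD's /var/log/cron), takes the year as a parameter because syslog omits it, and reports any logged events within a few minutes of each timeout.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


def parse_syslog_time(line: str, year: int) -> datetime:
    """Parse the 'Mon DD HH:MM:SS' prefix of a syslog line, supplying the year."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def events_near(timeouts: Iterable[datetime], events: Iterable[datetime],
                window: timedelta = timedelta(minutes=5)) -> Dict[datetime, List[datetime]]:
    """Map each timeout to the events logged within +/- window of it."""
    ordered = sorted(events)
    return {t: [e for e in ordered if abs(e - t) <= window] for t in timeouts}
```

For example, parse the timeout lines from /var/log/messages and the lines from /var/log/cron with parse_syslog_time, then pass both lists to events_near. An empty list for every timeout corresponds to the "nothing running at the timeout times" result reported above. Note that this only covers cron; as explained elsewhere in the thread, daemons such as sendmail run their own periodic activity that never appears in the cron log.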
Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA
On Thu, 16 Feb 2006 20:10:31 +, in sentex.lists.freebsd.questions you wrote:
For the last 4-days, our (otherwise OK) 5.4-RELEASE machine has been reporting:
Feb 12 12:08:05 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 00:08:51 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 12:09:38 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2963331
Feb 14 00:10:24 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2705947
So -- can anyone help track this down?

It sounds like a hardware issue. Install /usr/ports/sysutils/smartmontools and ask the drive to see what's up.

---Mike