Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA

2009-10-04 Thread Oliver Fromme
This is a reply to a very old thread.

I decided to reply because

 1. nobody has mentioned the real cause of the problem yet
(some answers were misleading or even outright wrong),

 2. I've experienced the same problem in the past few weeks,

 3. my findings might be useful for other people who are
googling for the symptoms (like me) and stumble across
this thread.

The drive in question seems to be very popular, especially
in low-end private servers and home machines.  It is very
reliable; I still have these and similar ones in production.
The drive of mine that exhibited the problem recently is
this:

ad0: 24405MB IBM DJNA-352500 J51OA30K at ata0-master UDMA66

It is powering a small server running DNS, SMTP, WWW and
other things for several private domains.  The load is very
low, most of the time.

Now for the actual problem:

V.I.Victor idmc_v...@intgdev.com wrote:
  For the last 4-days, our (otherwise OK) 5.4-RELEASE machine has been
  reporting:
  
  Feb 12 12:08:05 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
  Feb 13 00:08:51 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
  Feb 13 12:09:38 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2963331
  Feb 14 00:10:24 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2705947
  Feb 14 12:11:09 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2706335
  Feb 15 00:12:02 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2832383
  Feb 15 12:12:57 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=139839
  Feb 16 00:13:50 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391
  Feb 16 12:14:36 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391
  
  The system was created Jan 08 and, prior to the above, the ad0: timeout had
  only been reported twice:
  
  Jan 25 11:43:34 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=17920255
  Feb 6 11:59:42 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2832383
  [...]
  ad0: 14664MB IBM-DJNA-351520/J56OA30K [29795/16/63] at ata0-master UDMA66

First of all:  The disk is *not* dying.  SMART won't reveal anything.
The behaviour is perfectly normal for IBM-DJNA-3* type disks.

When those disks are used in continuous operation (24/7), they
will go into automatic maintenance mode after 6 days.  This is
kind of a short self-test and recalibration to ensure reliable
continuous operation.  It will be repeated after another 6 days
ad infinitum.

Note that there are exactly 12 days between your Jan 25 and Feb 6
incidents, and exactly 6 days between Feb 6 and Feb 12 incidents.
An automatic maintenance on Jan 31 apparently finished successfully
without a timeout message.
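
The date arithmetic is easy to check.  The following is a minimal
sketch in Python; the year 2006 is an assumption taken from the date
of the original post (the log lines carry no year), and the variable
names are only illustrative.

```python
from datetime import datetime

# Timestamps copied from the quoted log; the year (2006) is an assumption
# based on the date of the original post.
incidents = [
    datetime(2006, 1, 25, 11, 43, 34),
    datetime(2006, 2, 6, 11, 59, 42),
    datetime(2006, 2, 12, 12, 8, 5),
]

# Gap between consecutive incidents, rounded to whole days.
gaps_in_days = [
    round((later - earlier).total_seconds() / 86400)
    for earlier, later in zip(incidents, incidents[1:])
]

print(gaps_in_days)
```

This prints [12, 6].  The gaps are not exact to the second (they are
off by roughly 16 and 8 minutes respectively), but they match whole
multiples of 6 days closely.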

Normally the drive will wait until it detects an idle period,
then perform the maintenance, then continue normal operation.
Maintenance mode involves a short spin down / spin up cycle.

However, if the drive receives a command during spin down, it
will abort maintenance mode, spin up (which takes a few seconds
and might cause a timeout to the operating system), then
perform the command, and RETRY MAINTENANCE AFTER 12 HOURS.

So that's where your timeout messages every 12 hours come from.
This is not in any way harmful.  Eventually the maintenance
will succeed (i.e. the idle period is long enough to finish),
then you won't get timeout messages anymore for at least 6 days.
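
The 12-hour rhythm can be read straight off the quoted log.  A small
Python sketch, assuming the year 2006 (the log does not include one)
and an English locale for the month names; parse_timestamp is a helper
made up for this example:

```python
from datetime import datetime, timedelta

# Log lines taken from the quoted report, one entry per line.
LOG = """\
Feb 12 12:08:05 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 00:08:51 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 12:09:38 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2963331
Feb 14 00:10:24 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2705947
Feb 14 12:11:09 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2706335
Feb 15 00:12:02 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2832383
Feb 15 12:12:57 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=139839
Feb 16 00:13:50 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391
Feb 16 12:14:36 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391
"""

def parse_timestamp(line, year=2006):
    # The first 15 characters hold the syslog-style "Mon DD HH:MM:SS" stamp.
    # The year is an assumption; it is not present in the log.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

stamps = [parse_timestamp(line) for line in LOG.splitlines()]

# Time between each timeout and the next one.
intervals = [later - earlier for earlier, later in zip(stamps, stamps[1:])]

# How far each interval exceeds exactly 12 hours, in seconds.
excess = [int((i - timedelta(hours=12)).total_seconds()) for i in intervals]

for interval in intervals:
    print(interval)
```

Every interval comes out as 12 hours plus 45 to 55 seconds.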

You mentioned that the problem appeared (and disappeared) when
you set the machine's clock.  This is easy to explain, too.
The hard disk has its own clock which is not synchronized with
the system clock.  It starts counting from zero when the disk
is powered up.  By changing the system's clock, you shift the
offset between it and the drive's clock.

That means that periodic activity will happen at different times,
relative to the drive's clock.  Such periodic activity includes
cron jobs and other things.  For example, sendmail's queue runner
wakes up every 30 minutes by default.  Many other daemons also
perform periodic activity.  All of that can happen to start in
the middle of the idle period that the drive chose to use for its
maintenance, thus interrupting maintenance, as described above.

If the offset between the system's clock and the drive's clock
changes, chances are that such periodic activity will happen at
different times, from the point of view of the drive, so the
likelihood that the drive can complete its maintenance changes
(better or worse).
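
To illustrate why the offset matters, here is a toy model in Python.
It is not taken from IBM's documentation: it assumes a single periodic
job that fires whenever the system clock is a multiple of 30 minutes,
and it assumes that maintenance needs 60 uninterrupted seconds, a
made-up figure chosen only for the example.  The function and variable
names are hypothetical as well.

```python
PERIOD = 30 * 60   # job interval in seconds (30 minutes)
NEEDED = 60        # hypothetical: idle seconds the maintenance needs

def quiet_seconds(drive_time, offset, period=PERIOD):
    """Idle seconds between the start of maintenance and the next job.

    drive_time: drive-clock time (seconds since power-up) when
                maintenance starts
    offset:     system clock minus drive clock, in seconds
    """
    system_time = drive_time + offset
    # Distance to the next multiple of `period`; 0 if a job fires right now.
    return (-system_time) % period

six_days = 6 * 24 * 3600   # drive-clock time of a maintenance attempt

# Two example offsets, e.g. before and after the system clock was set.
for offset in (1790, 100):
    idle = quiet_seconds(six_days, offset)
    outcome = "completes" if idle >= NEEDED else "is interrupted"
    print(f"offset {offset} s: {idle} s idle, maintenance {outcome}")
```

With an offset of 1790 seconds the job fires 10 seconds after the
drive starts its maintenance, so the maintenance is interrupted; with
an offset of 100 seconds the drive gets 1700 idle seconds and the
maintenance completes.  The drive did exactly the same thing in both
cases; only the offset changed.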

Unfortunately there is no way to configure or disable that
maintenance mode.  The only way to somewhat control it is to
periodically enforce a spin-down (standby ATA command) when
you know that the drive is idle.  This usually requires
unmounting the filesystems, though, because otherwise you can't
guarantee that they will be idle for long enough.

You can read IBM's official documentation here:


Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA

2009-10-04 Thread V.I.Victor

[...]
First of all:  The disk is *not* dying.  SMART won't reveal anything.
The behaviour is perfectly normal for IBM-DJNA-3* type disks.

When those disks are used in continuous operation (24/7), they
will go into automatic maintenance mode after 6 days.  This is
kind of a short self-test and recalibration to ensure reliable
continuous operation.  It will be repeated after another 6 days
ad infinitum.


It's been over 3.5 years since my original post -- imagine my surprise!

The drive's still running (24/7) and still reporting the same retries. Because 
of the pattern of the retries, I never really thought that the drive was bad.  
But, until now, I never knew why it was happening.

Thanks *very* much for the info!



___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA

2006-02-20 Thread V.I.Victor
On Sun, 19 Feb 2006, Mike Tancsa wrote:

 On Sun, 19 Feb 2006 22:21:04 +, in sentex.lists.freebsd.questions
 you wrote:

 On Thu, 16 Feb 2006, Mike Tancsa wrote:

 For the last 4-days, our (otherwise OK) 5.4-RELEASE machine has been
 reporting:

 Feb 12 12:08:05 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
 Feb 13 00:08:51 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
 Feb 13 12:09:38 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2963331
 Feb 14 00:10:24 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2705947

 So -- can anyone help track this down?


 It sounds like a hardware issue. Install
 /usr/ports/sysutils/smartmontools and ask the drive to see what's up.

 I installed 'smartmontools' but haven't used it yet. I've been waiting to
 see what happens -- the problem simply stopped. There've been no ad0:
 TIMEOUT messages for 3-days.

 The errors get logged in the drive so you don't have to wait for more
 errors to happen. Start it running now so you can see if any of the
 bad counters are changing as well as to ask the drive what it was.
 My guess is you have some bad sectors the drive remapped.

OK. No problems found... And -- still -- no more ad0: TIMEOUTs

But, I'm not really surprised. As mentioned in the original post, a
2-gig file had been created that presumably moved-past any bad
sector patches; approx. midway during the TIMEOUT report period.

Plus -- since the drive is (was) storing email, writing logs, etc.
24-hrs a day, it seems improbable that bad-sectors would only show-up
every 12-hrs.

Although I'm uncomfortable with magic-fixes, I wonder if there's
more than a coincidental connection between setting the date and the
reports starting and stopping.







Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA

2006-02-20 Thread Jerry Bell
I had a drive dying and it showed up just like this - it turned out to 
be the daily scripts that scan for file changes, etc, and my backup 
script were tickling a bad sector of the disk.  Have you run the
smartctl -t long /dev/ad0 command to have it perform a full self test?  
You normally have to let that run for a while, then take another look at 
the smart error log to see if anything showed up.  Mine ended up having 
an error that the drive could not self correct. 

As to why you're able to write a 2 gig file without a problem - if you 
have some binary or config file or man file, etc sitting on those bad 
spots, you wouldn't be writing to those blocks.  Anytime a security 
script iterates through them, they would be tickling that block, causing 
an error.


Another possibility is that you have a bad IDE cable.

Hopefully that is of some use.

Jerry
http://www.networkstrike.com



Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA

2006-02-19 Thread V.I.Victor
On Thu, 16 Feb 2006, Mike Tancsa wrote:

 For the last 4-days, our (otherwise OK) 5.4-RELEASE machine has been
 reporting:

 Feb 12 12:08:05 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
 Feb 13 00:08:51 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
 Feb 13 12:09:38 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2963331
 Feb 14 00:10:24 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2705947

 So -- can anyone help track this down?


 It sounds like a hardware issue. Install
 /usr/ports/sysutils/smartmontools and ask the drive to see what's up.

I installed 'smartmontools' but haven't used it yet. I've been waiting to
see what happens -- the problem simply stopped. There've been no ad0:
TIMEOUT messages for 3-days.

The only thing done outside of the ordinary, prior to the messages
stopping, was to set the date. It's probably a coincidence but setting
the date was also the last thing done before the messages started.





Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA

2006-02-19 Thread Mike Tancsa
On Sun, 19 Feb 2006 22:21:04 +, in sentex.lists.freebsd.questions
you wrote:

On Thu, 16 Feb 2006, Mike Tancsa wrote:

 For the last 4-days, our (otherwise OK) 5.4-RELEASE machine has been
 reporting:

 Feb 12 12:08:05 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
 Feb 13 00:08:51 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
 Feb 13 12:09:38 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2963331
 Feb 14 00:10:24 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2705947

 So -- can anyone help track this down?


 It sounds like a hardware issue. Install
 /usr/ports/sysutils/smartmontools and ask the drive to see what's up.

I installed 'smartmontools' but haven't used it yet. I've been waiting to
see what happens -- the problem simply stopped. There've been no ad0:
TIMEOUT messages for 3-days.

The errors get logged in the drive so you don't have to wait for more
errors to happen.  Start it running now so you can see if any of the
bad counters are changing as well as to ask the drive what it was.
My guess is you have some bad sectors the drive remapped.

---Mike

Mike Tancsa, Sentex communications http://www.sentex.net
Providing Internet Access since 1994
[EMAIL PROTECTED], (http://www.tancsa.com)


Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA

2006-02-17 Thread Fabian Keil
V.I.Victor [EMAIL PROTECTED] wrote:

 For the last 4-days, our (otherwise OK) 5.4-RELEASE machine has been
 reporting:
 
 Feb 12 12:08:05 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
 Feb 13 00:08:51 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
 Feb 13 12:09:38 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2963331
 Feb 14 00:10:24 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2705947
 Feb 14 12:11:09 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2706335
 Feb 15 00:12:02 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2832383
 Feb 15 12:12:57 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=139839
 Feb 16 00:13:50 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391
 Feb 16 12:14:36 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391
 
 The system was created Jan 08 and, prior to the above, the ad0: timeout had
 only been reported twice:
 
 Jan 25 11:43:34 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=17920255
 Feb 6 11:59:42 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2832383

If smartd doesn't report any problems try changing the ATA cable.

Worked for me once, although my messages didn't come every 12 hours
and would later result in a disconnected system disk leading to a
crash.

Fabian
-- 
http://www.fabiankeil.de/




Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA

2006-02-17 Thread Duane Whitty

Fabian Keil wrote:

V.I.Victor [EMAIL PROTECTED] wrote:


For the last 4-days, our (otherwise OK) 5.4-RELEASE machine has been
reporting:

Feb 12 12:08:05 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 00:08:51 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 12:09:38 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2963331
Feb 14 00:10:24 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2705947
Feb 14 12:11:09 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2706335
Feb 15 00:12:02 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2832383
Feb 15 12:12:57 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=139839
Feb 16 00:13:50 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391
Feb 16 12:14:36 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391

The system was created Jan 08 and, prior to the above, the ad0: timeout had
only been reported twice:

Jan 25 11:43:34 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=17920255
Feb 6 11:59:42 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2832383



Hi,

I hate sending out a me too message but yet here 
we are.


It's not a production machine so I was going to 
let it run its course.  The messages stopped 
appearing after a couple of days.


I reviewed my logs and couldn't find anything more 
revealing and my system appeared to be functioning 
well.


Likely unrelated but my new activities at the time 
were experimenting with some new SAMBA shares and 
setting-up the CAM driver.  There is likely no 
correlation but I thought I'd mention it.



--Duane


Every 12-hrs -- ad0: TIMEOUT - WRITE DMA

2006-02-16 Thread V.I.Victor
For the last 4-days, our (otherwise OK) 5.4-RELEASE machine has been
reporting:

Feb 12 12:08:05 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 00:08:51 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 12:09:38 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2963331
Feb 14 00:10:24 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2705947
Feb 14 12:11:09 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2706335
Feb 15 00:12:02 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2832383
Feb 15 12:12:57 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=139839
Feb 16 00:13:50 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391
Feb 16 12:14:36 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=131391

The system was created Jan 08 and, prior to the above, the ad0: timeout had
only been reported twice:

Jan 25 11:43:34 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=17920255
Feb 6 11:59:42 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2832383


Before asking here, I did several searches for possible causes. I think
I've eliminated disk spin-down and bad-block-mapping -- just before the Feb
15, 12:12 period a 2-gig file was created; leaving the disk 'spinning' and
bad-blocks presumably bypassed.

Another found item said that some IBM drives recalibrate every 25-hours.
Interesting concept, but a different period and without previous history.

Lastly, several items referred to changing PREEMPTION but never seemed to reach
a final conclusion.

I also checked the cron log and found nothing running at the timeout times.


So -- can anyone help track this down?


Final note: the hardware is an old, resurrected Win98 machine running 24/7
and is used only for email processing. I installed it primarily as a proof
of concept, so it can be replaced if necessary.

Some specifics:

FreeBSD 5.4-RELEASE #0: Sun May 8 10:21:06 UTC 2005
CPU: AMD-K7(tm) Processor (598.84-MHz 686-class CPU)
Origin = AuthenticAMD Id = 0x612 Stepping = 2
Features=0x81f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,MMX>
AMD Features=0xc040<AMIE,DSP,3DNow!>
real memory = 134152192 (127 MB)
avail memory = 121630720 (115 MB)
ACPI disabled by blacklist. Contact your BIOS vendor.
ad0: 14664MB IBM-DJNA-351520/J56OA30K [29795/16/63] at ata0-master UDMA66
acd0: CDROM CREATIVE CD5233E/C2.05 at ata1-master PIO4

Thanks for any help!






Re: Every 12-hrs -- ad0: TIMEOUT - WRITE DMA

2006-02-16 Thread Mike Tancsa
On Thu, 16 Feb 2006 20:10:31 +, in sentex.lists.freebsd.questions
you wrote:

For the last 4-days, our (otherwise OK) 5.4-RELEASE machine has been
reporting:

Feb 12 12:08:05 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 00:08:51 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2701279
Feb 13 12:09:38 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2963331
Feb 14 00:10:24 : ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2705947

So -- can anyone help track this down?


It sounds like a hardware issue.  Install
/usr/ports/sysutils/smartmontools and ask the drive to see what's up.

---Mike

Mike Tancsa, Sentex communications http://www.sentex.net
Providing Internet Access since 1994
[EMAIL PROTECTED], (http://www.tancsa.com)