RE: WRITE_DMA errors on SATA drive under 5.3-RELEASE

2005-02-28 Thread Ted Mittelstaedt


 -----Original Message-----
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]] On Behalf Of Anthony Atkielski
 Sent: Sunday, February 27, 2005 2:10 PM
 To: freebsd-questions@freebsd.org
 Subject: Re: WRITE_DMA errors on SATA drive under 5.3-RELEASE


 Mike Tancsa writes:

  Could be a bad sector on the drive, or bad cable. Hard to say.  Try
  /usr/ports/sysutils/smartmontools/
 
  It can read all sorts of info off the drive and help you narrow down
  what the problem might be.

 Wow!  That is a very cool tool.  There's even a Windows port so I can
 use it on my XP machine.

 The two SATA drives show no errors.  The older IDE drive (which contains
 the filesystem root) shows the stuff below.  There have been over 1000
 read errors over the lifetime of the disk, but the disk had some hard
 times back in December when it was in my overheated old server, so that
 might account for part of that.  The most recent errors look like they
 might correlate with what I saw today (unfortunately, I'm not sure how
 to interpret them):

Rule of thumb on IDE hard drives, if they show more than a few errors
with a tool like smartmon, they need to be thrown in the garbage.

Heat is the number one enemy of hard drives.  If this drive overheated,
particularly over a long time period, resistance values and semiconductor
values can shift, permanently, in the electronics of the drive.  So even
if the heads and platters are still good, you're on borrowed time with the
circuit board.  And since it's the circuit board that's dodgy, the drive
surface isn't failing, so the problems aren't going to register with
S.M.A.R.T.

Despite S.M.A.R.T., the vast majority of IDE hard drives that fail, fail
without warning.

Ted

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: WRITE_DMA errors on SATA drive under 5.3-RELEASE

2005-02-28 Thread Anthony Atkielski
Ted Mittelstaedt writes:

 Rule of thumb on IDE hard drives, if they show more than a few errors
 with a tool like smartmon, they need to be thrown in the garbage.

Seems prudent to me, but right now I don't have the budget to replace
this drive (yes, 40 GB IDE drives are cheap, but I don't have even
that).

-- 
Anthony




WRITE_DMA errors on SATA drive under 5.3-RELEASE

2005-02-27 Thread Anthony Atkielski
I've gotten two messages like the ones below today on my production server
(5.3-RELEASE):

messages:Feb 27 14:48:17 freebie kernel: ad10: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=4848803
messages:Feb 27 14:48:17 freebie kernel: ad10: FAILURE - WRITE_DMA timed out

What do these messages mean?  The referenced drive is one of two identical SATA
drives on the server; it holds /tmp and /var.  I don't recall seeing
these messages before.

Is there a way to work backwards from the LBA to the filesystem so that
I can see which file was being referenced when this occurred?

-- 
Anthony




Re: WRITE_DMA errors on SATA drive under 5.3-RELEASE

2005-02-27 Thread cpghost
On Sun, Feb 27, 2005 at 03:53:30PM +0100, Anthony Atkielski wrote:
 messages:Feb 27 14:48:17 freebie kernel: ad10: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=4848803
 messages:Feb 27 14:48:17 freebie kernel: ad10: FAILURE - WRITE_DMA timed out

[...]

 Is there a way to work backwards from the LBA to the filesystem so that
 I can see which file was being referenced when this occurred?

Theoretically, one could use 'fsdb -r' in a scripted manner, to
generate a mapping of file names to blocks (relative to the partition
of the file system you are mapping). Once you have the blocks, you'll
need to do some arithmetic to map those blocks to LBA address ranges
(perhaps via GEOM or using data in disklabels). Finally, you'll have
to locate the range for a particular LBA address and work backwards
up to the inode #, and then to the filename(s) that link to that inode.
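
The arithmetic step could look something like the Python sketch below. It
assumes 512-byte sectors, and the partition start, partition length and
file system block size are made-up illustration values; the real ones would
have to be read from the disklabel and from the file system itself. It only
shows the address conversion and is not a working lookup tool.

```python
SECTOR_SIZE = 512  # bytes per LBA sector; an assumption, typical for ATA disks


def lba_to_fs_block(lba, part_start_lba, part_sectors, fs_block_size):
    """Map an absolute disk LBA to a block number relative to one partition.

    Returns None when the LBA lies outside the partition.
    """
    if not (part_start_lba <= lba < part_start_lba + part_sectors):
        return None
    byte_offset = (lba - part_start_lba) * SECTOR_SIZE
    return byte_offset // fs_block_size


# Hypothetical layout: the partition starts at sector 4194304, is 8388608
# sectors long, and the file system numbers its blocks in 2048-byte units.
print(lba_to_fs_block(4848803, 4194304, 8388608, 2048))
```

The resulting block number is only meaningful for the partition whose
offsets were plugged in, so the conversion has to be repeated per partition.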

Perhaps there's already a system utility or port for this? It would be
really useful!


Cheers,
-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/


Re: WRITE_DMA errors on SATA drive under 5.3-RELEASE

2005-02-27 Thread Anthony Atkielski
[EMAIL PROTECTED] writes:

 Theoretically, one could use 'fsdb -r' in a scripted manner, to
 generate a mapping of file names to blocks (relative to the partition
 of the file system you are mapping). Once you have the blocks, you'll
 need to do some arithmetic to map those blocks to LBA address ranges
 (perhaps via GEOM or using data in disklabels). Finally, you'll have
 to locate the range for a particular LBA address and work backwards
 up to the inode #, and then to the filename(s) that link to that inode.

Sounds complicated.  Surely I'm not the first person to wish for such a
utility ... in UNIXland, there seems to be a command for just about
every conceivable purpose (?).

 Perhaps there's already a system utility or port for this? It would be
 really useful!

I'm mainly worried about exactly what the system was trying to write at
the time.  It's not clear from the message whether the write succeeded
or not.

-- 
Anthony




Re: WRITE_DMA errors on SATA drive under 5.3-RELEASE

2005-02-27 Thread cpghost
On Sun, Feb 27, 2005 at 05:19:32PM +0100, Anthony Atkielski wrote:
 [EMAIL PROTECTED] writes:
 
  Theoretically, one could use 'fsdb -r' in a scripted manner, to
  generate a mapping of file names to blocks (relative to the partition
  of the file system you are mapping). Once you have the blocks, you'll
  need to do some arithmetic to map those blocks to LBA address ranges
  (perhaps via GEOM or using data in disklabels). Finally, you'll have
  to locate the range for a particular LBA address and work backwards
  up to the inode #, and then to the filename(s) that link to that inode.
 
 Sounds complicated.  Surely I'm not the first person to wish for such a
 utility ... in UNIXland, there seems to be a command for just about
 every conceivable purpose (?).

Or you could write the missing ones :-).

Actually, it's not that hard. You need three mappings:

1. (lba address, (filesystem, block #))
2. ((filesystem, block #), (filesystem, inode #))
3. ((filesystem, inode #), (list of filenames linking to inode #))

Each of those mappings could be computed and displayed by a single
utility. Combining all three into an lba2filenames program would
then be trivial.
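
To illustrate how the three mappings would chain together, here is a small
Python sketch. The lookup tables are hand-filled toy data (the file system
name, block number, inode number and path are all invented); in a real tool
each table would be produced by its own utility as described above.

```python
def lba2filenames(lba, lba_to_block, block_to_inode, inode_to_names):
    """Chain the three mappings; return [] if any step has no entry."""
    fs_block = lba_to_block.get(lba)          # 1. LBA -> (filesystem, block #)
    if fs_block is None:
        return []
    fs_inode = block_to_inode.get(fs_block)   # 2. (fs, block #) -> (fs, inode #)
    if fs_inode is None:
        return []
    return inode_to_names.get(fs_inode, [])   # 3. (fs, inode #) -> filenames


# Toy data, invented purely for illustration.
lba_to_block = {4848803: ("/var", 163624)}
block_to_inode = {("/var", 163624): ("/var", 4711)}
inode_to_names = {("/var", 4711): ["/var/log/messages"]}

print(lba2filenames(4848803, lba_to_block, block_to_inode, inode_to_names))
```

Keeping the three tables separate mirrors the idea of three small utilities:
each one can be tested and replaced without touching the other two.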

  Perhaps there's already a system utility or port for this? It would be
  really useful!
 
 I'm mainly worried about exactly what the system was trying to write at
 the time.  It's not clear from the message whether the write succeeded
 or not.

Yes, that's exactly my concern too.


-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/


Re: WRITE_DMA errors on SATA drive under 5.3-RELEASE

2005-02-27 Thread Mike Tancsa
On Sun, 27 Feb 2005 15:53:30 +0100, in sentex.lists.freebsd.questions
you wrote:

I've gotten two messages like the ones below today on my production server
(5.3-RELEASE):

messages:Feb 27 14:48:17 freebie kernel: ad10: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=4848803
messages:Feb 27 14:48:17 freebie kernel: ad10: FAILURE - WRITE_DMA timed out

Could be a bad sector on the drive, or bad cable. Hard to say.  Try
/usr/ports/sysutils/smartmontools/

It can read all sorts of info off the drive and help you narrow down
what the problem might be.
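
One cheap check alongside that is whether the timeouts keep hitting the same
LBA. The Python sketch below tallies TIMEOUT events per device and LBA. The
regular expression is inferred only from the two sample lines quoted above,
so other drivers or releases may word their messages differently. Repeated
hits on one LBA might point toward the media and scattered LBAs toward
something else, but that is a rule of thumb, not a diagnosis.

```python
import re
from collections import Counter

# Pattern inferred only from the sample lines quoted in this thread.
LBA_EVENT = re.compile(r"kernel: (ad\d+): TIMEOUT - (\w+) .*LBA=(\d+)")


def tally_timeouts(lines):
    """Count TIMEOUT events per (device, LBA) pair."""
    counts = Counter()
    for line in lines:
        m = LBA_EVENT.search(line)
        if m:
            counts[(m.group(1), int(m.group(3)))] += 1
    return counts


log = [
    "Feb 27 14:48:17 freebie kernel: ad10: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=4848803",
    "Feb 27 14:48:17 freebie kernel: ad10: FAILURE - WRITE_DMA timed out",
]
for (dev, lba), n in tally_timeouts(log).most_common():
    print(dev, lba, n)
```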


---Mike

Mike Tancsa, Sentex communications http://www.sentex.net
Providing Internet Access since 1994
[EMAIL PROTECTED], (http://www.tancsa.com)


Re: WRITE_DMA errors on SATA drive under 5.3-RELEASE

2005-02-27 Thread Anthony Atkielski
[EMAIL PROTECTED] writes:

 Actually, it's not that hard. You need three mappings:

 1. (lba address, (filesystem, block #))
 2. ((filesystem, block #), (filesystem, inode #))
 3. ((filesystem, inode #), (list of filenames linking to inode #))

Seems like it would be straightforward with adequate documentation.

-- 
Anthony




Re: WRITE_DMA errors on SATA drive under 5.3-RELEASE

2005-02-27 Thread Anthony Atkielski
Mike Tancsa writes:

 Could be a bad sector on the drive, or bad cable. Hard to say.  Try
 /usr/ports/sysutils/smartmontools/

 It can read all sorts of info off the drive and help you narrow down
 what the problem might be.

Wow!  That is a very cool tool.  There's even a Windows port so I can
use it on my XP machine.

The two SATA drives show no errors.  The older IDE drive (which contains
the filesystem root) shows the stuff below.  There have been over 1000
read errors over the lifetime of the disk, but the disk had some hard
times back in December when it was in my overheated old server, so that
might account for part of that.  The most recent errors look like they
might correlate with what I saw today (unfortunately, I'm not sure how
to interpret them):

==
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG SV4002H
Serial Number:    0413J1FR932555
Firmware Version: QP100-07
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 1
Local Time is:    Sun Feb 27 22:52:54 2005 CET

== WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

The SMART RETURN STATUS return value (smartmontools -H option/Directive)
 can not be retrieved with this version of ATAng, please do not rely on this value
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (1560) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off
                                        support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   8) minutes.

SMART Attributes Data Structure revision number: 9
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000a   100   100   000    Old_age   Always       -       1050
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       55
  5 Reallocated_Sector_Ct   0x0033   253   253   009    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0024   253   253   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       2968364
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       54
194 Temperature_Celsius     0x0022   175   145   000    Old_age   Always       -       21
197 Current_Pending_Sector  0x0033   253   253   009    Pre-fail  Always       -       0
198 Offline_Uncorrectable   0x0031   253   253   009    Pre-fail  Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000b   100   100   051    Pre-fail  Always       -       0
201 Soft_Read_Error_Rate    0x000b   100   100   051    Pre-fail  Always       -       1

SMART Error Log Version: 1
Warning: ATA error count 22 inconsistent with error log pointer 4

ATA Error Count: 22 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number 

Re: WRITE_DMA errors on SATA drive under 5.3-RELEASE

2005-02-27 Thread Mike Tancsa
On Sun, 27 Feb 2005 23:09:50 +0100, in sentex.lists.freebsd.questions
you wrote:

Mike Tancsa writes:

 Could be a bad sector on the drive, or bad cable. Hard to say.  Try
 /usr/ports/sysutils/smartmontools/

 It can read all sorts of info off the drive and help you narrow down
 what the problem might be.


The two SATA drives show no errors.  The older IDE drive (which contains
the filesystem root) shows the stuff below.  There have been over 1000

Device does not support Selective Self Tests/Logging


Try running some of the self-tests on the SATA drives, and run the
monitoring daemon as well. With any luck, it will provide a little more
information about the error condition you are seeing.
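
When comparing attribute tables from several drives, a small script can
help. The Python sketch below parses attribute rows in smartctl's usual
one-line-per-attribute layout and lists attributes whose normalized VALUE is
at or below THRESH, or whose raw count is nonzero for a few sector-related
attributes. Which attributes to watch is an assumption made for this
example, not something smartmontools prescribes.

```python
# Attribute names treated as worrying when their raw count is nonzero.
# This selection is an assumption made for the example.
WATCHED_RAW = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
               "Offline_Uncorrectable"}


def parse_attributes(text):
    """Parse smartctl attribute rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # A data row has at least 10 columns and starts with a numeric ID.
        if len(parts) >= 10 and parts[0].isdigit():
            rows.append({
                "id": int(parts[0]),
                "name": parts[1],
                "value": int(parts[3]),
                "thresh": int(parts[5]),
                "type": parts[6],
                "raw": parts[9],
            })
    return rows


def suspicious(rows):
    """Return names of attributes that look bad under the rules above."""
    flagged = []
    for r in rows:
        below = r["value"] <= r["thresh"]
        raw_hit = (r["name"] in WATCHED_RAW and r["raw"].isdigit()
                   and int(r["raw"]) > 0)
        if below or raw_hit:
            flagged.append(r["name"])
    return flagged


sample = """\
  1 Raw_Read_Error_Rate     0x000a   100   100   000    Old_age   Always       -       1050
  5 Reallocated_Sector_Ct   0x0033   253   253   009    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0033   253   253   009    Pre-fail  Always       -       0
"""
print(suspicious(parse_attributes(sample)))
```

An empty result here only means none of these particular rules fired; it
says nothing about faults that the attributes do not capture.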

---Mike

Mike Tancsa, Sentex communications http://www.sentex.net
Providing Internet Access since 1994
[EMAIL PROTECTED], (http://www.tancsa.com)


Re: WRITE_DMA errors on SATA drive under 5.3-RELEASE

2005-02-27 Thread Garance A Drosihn
At 3:53 PM +0100 2/27/05, Anthony Atkielski wrote:
 I've gotten two messages like the ones below today on my
 production server (5.3-RELEASE):

 ... kernel: ad10: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=4848803
 ... kernel: ad10: FAILURE - WRITE_DMA timed out

 What do these messages mean?  The referenced drive is one of
 two identical SATA drives on the server; it holds /tmp and /var.
 I don't recall seeing these messages before.

 Is there a way to work backwards from the LBA to the filesystem
 so that I can see which file was being referenced when this
 occurred?

First question: which SATA controller are you using?  And what is
the make/model of the hard drives that you are using?

Note: There have been several different threads on different mailing
lists from users having WRITE_DMA errors similar to this.  At least
some of the problem is in the code which handles disk I/O.  The
developer who works the most on that code is in the middle of a
fairly major set of improvements to it, as is described in the
thread with a subject of:

    UPDATE2: ATA mkIII first official patches - please test!

on the freebsd-current and freebsd-stable mailing lists.  That major
set of improvements is still being tested, but it does solve some
ATA/SATA issues for many users.  Which issues you are running into
will depend on which SATA controller you have, and the make/model
of SATA hard-disks that you have attached to the controller.

I realize that none of that info really helps you right now, but
I just thought I would say that it may be you're not having any
hardware problems.  Or at least, not on the disk itself.  It might
be a problem with the disk-controller, or it might be fairly minor
timing-problems that come up under certain kinds of load.

Of course, it still *could* be your hard disk...  Also note that I
am not an expert on hard disks or disk I/O.  It's just that I've
suffered through many similar problems, and I know that Søren has
been working on the newer, improved code for handling ATA/SATA.

--
Garance Alistair Drosehn            =   [EMAIL PROTECTED]
Senior Systems Programmer           or  [EMAIL PROTECTED]
Rensselaer Polytechnic Institute    or  [EMAIL PROTECTED]