Re: READ_DMA48 error interpretation

2007-02-08 Thread Ian Smith

On Wed, 7 Feb 2007, Richard Lynch wrote:
[I've tried to snip away a lot of stuff, without losing any context...]

I'll prune a bit too, but will backtrack to earlier context, so thanks.

On Tue, February 6, 2007 2:50 am, Ian Smith wrote:
On Mon, 5 Feb 2007 01:13:31 -0600 (CST) Richard Lynch [EMAIL PROTECTED]
wrote:
On Tue, January 16, 2007 3:21 pm, Chuck Swiger wrote:
On Jan 16, 2007, at 1:13 PM, Richard Lynch wrote:
...
+ad1: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=404955007
+ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=404955007
+g_vfs_done():ad1s1[READ(offset=207336931328, length=16384)]error = 5

Looks like a not ready error maybe. The only value in your ad1.txt that
looks like it's ever been anywhere near any error threshold is ID# 11,
Calibration_Retry_Count, and its current value is fine. Power glitch?
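
For what it's worth, the status= and error= numbers in the FAILURE lines above look like the ATA status and error registers in hex, with the set bits spelled out after them. Here is a rough Python sketch of that decoding. Only READY, DSC, ERROR and NID_NOT_FOUND are confirmed by the quoted log; the other bit names are assumptions taken from the usual ATA register layout, so treat the tables as illustrative rather than authoritative.

```python
# Sketch: decode the hex status= / error= values from a FAILURE line.
# Only READY, DSC, ERROR and NID_NOT_FOUND appear in the quoted log;
# every other name below is an assumption about the ATA register layout.

STATUS_BITS = [
    (0x80, "BUSY"),
    (0x40, "READY"),
    (0x20, "DMA_READY"),
    (0x10, "DSC"),
    (0x08, "DRQ"),
    (0x04, "CORRECTABLE"),
    (0x02, "INDEX"),
    (0x01, "ERROR"),
]

ERROR_BITS = [
    (0x80, "ICRC"),
    (0x40, "UNCORRECTABLE"),
    (0x20, "MEDIA_CHANGED"),
    (0x10, "NID_NOT_FOUND"),
    (0x08, "MEDIA_CHANGE_REQUEST"),
    (0x04, "ABORTED"),
    (0x02, "NO_MEDIA"),
    (0x01, "ILLEGAL_LENGTH"),
]


def decode(hex_text, table):
    """Return the names of the bits set in a hex register value."""
    value = int(hex_text, 16)
    return [name for mask, name in table if value & mask]


print("status=51 ->", decode("51", STATUS_BITS))
print("error=10  ->", decode("10", ERROR_BITS))
```

Read this way, 0x51 is 0x40 + 0x10 + 0x01, which matches the READY, DSC, ERROR list in the log, and 0x10 in the error register is the single NID_NOT_FOUND bit. As far as I know that bit is the ATA "ID not found" condition, meaning the drive could not locate the requested sector address.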

Are you getting any other hard looking errors in /var/log/messages? Is
fsck happy? It never hurts to run 'fsck -n' whenever you feel the urge.

Try installing the sysutils/smartmontools port and run a drive
self-

I ran the short test on the problem drives, and it said everything was
fine.

I'll try the long test at a later date.

Only your ad3.txt referred to below shows a (short) test having been
completed and logged. You might check the smartctl -a results after
running at least short tests initially (looks like the long ones will
take 4-5 hours for your 4 drives) as Chuck has since suggested.

#2. Sequences like this show up a fair amount:
Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 152 to 153
Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 153 to 152
Device: /dev/ad0, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 251 to 250

I'm not sure what level of logging you have smartd doing here, but these
small changes of value, especially up and down by 1 and a long way from
any error threshold, look like excessive and fairly trivial detail
(debug-level, perhaps?), i.e. most likely nothing of any concern.
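
If the volume of these messages is the annoyance, one way to separate the plus-or-minus-1 jitter from anything larger is to parse the lines and look at the size of each change. A minimal Python sketch follows, assuming the log lines have the exact shape quoted above; the min_delta cut-off of 5 is an arbitrary, hypothetical value chosen for illustration, not a smartmontools setting.

```python
import re

# Matches the smartd "attribute changed" lines quoted in this thread.
CHANGE_RE = re.compile(
    r"Device: (?P<dev>[^,\s]+), SMART (?P<kind>\w+) Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)


def parse_change(line):
    """Return a dict describing one attribute change, or None if no match."""
    m = CHANGE_RE.search(line)
    if m is None:
        return None
    old, new = int(m.group("old")), int(m.group("new"))
    return {
        "dev": m.group("dev"),
        "kind": m.group("kind"),
        "id": int(m.group("id")),
        "name": m.group("name"),
        "old": old,
        "new": new,
        "delta": new - old,
    }


def notable(changes, min_delta=5):
    """Keep only changes of at least min_delta (hypothetical cut-off)."""
    return [c for c in changes if abs(c["delta"]) >= min_delta]


LINES = [
    "Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 152 to 153",
    "Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 153 to 152",
    "Device: /dev/ad0, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 251 to 250",
]

changes = [c for c in map(parse_change, LINES) if c is not None]
for c in changes:
    print(c["dev"], c["name"], c["delta"])
print("notable:", notable(changes))
```

All three quoted changes are a single step up or down, so nothing survives the filter, which fits the reading that these are noise rather than a trend.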

I suggest reading man smartctl under '-A, --attributes' and then you'll
know as much as I do about what these may mean, and maybe worry less ..

Here are all the smartctl -a outputs:

http://l-i-e.com/ad0.txt
http://l-i-e.com/ad1.txt
http://l-i-e.com/ad2.txt
http://l-i-e.com/ad3.txt

ad3 is giving the most errors...
ad1 gives a fair amount though

Do you mean according to that fine-detail attribute changes logging? Or
real read/write/seek etc errors being logged to messages?

And the ad0 and ad2 seem to be giving the spinup errors.

None of those reports seem to indicate any problems really, though if
anyone else cares to peek and notices any anomalies, I'm all eyes.

As for temperatures, the readings for all 4 drives seem very cool, but
then it is winter over there .. Temperature Celsius for ad0 to ad3 being
36, 27, 22 and 18 degrees C, each present and worst value well clear of
error thresholds .. did you interpret those values as temperatures?

ad0 is pretty much full
ad1 is the one I'm filling up currently
ad2 and ad3 have no actual content on them yet, but will soon

All the drives are kind of in an old PC tower (XT? AT???), except the
outer casing is, errr, not there... Just the framework.

Might be worth checking that your power supply is up to handling 4 big
drives, but they weren't running more than mildly warm when reported.

ad2 and ad3 are in one of these Thermaltake iCage things:
http://www.performance-pcs.com/catalog/index.php?main_page=product_info&cPath=257&products_id=3533
which converts the old-school floppy drive[s] bay into an IDE bay, and
puts a big honking fan blowing on them.

These too were running nice and cool, 22 and 18C, when reported. Cf my
40GB laptop drive (at smartctl version 5.36 [i386-portbld-freebsd5.5],
rather more recent than your 5.33 freebsd6.0) this afternoon:

194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 40
(Lifetime Min/Max 13/49)
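
To make the columns in that attribute line explicit, here is a small Python sketch that splits it into named fields. The field names are an assumption based on the usual header of the smartctl attribute table (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); the line itself is the one quoted above, joined onto a single line.

```python
def parse_attribute(line):
    """Split one smartctl attribute row into named fields.

    Column names are assumed from the usual smartctl -A table header.
    """
    fields = line.split(None, 9)  # the raw value may itself contain spaces
    if len(fields) < 10:
        raise ValueError("not a complete attribute row: %r" % line)
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "flag": fields[2],
        "value": int(fields[3]),   # normalized value, not a physical unit
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "type": fields[6],
        "updated": fields[7],
        "when_failed": fields[8],
        "raw": fields[9],
    }


row = parse_attribute(
    "194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 40 "
    "(Lifetime Min/Max 13/49)"
)
raw_temperature = int(row["raw"].split()[0])
print(row["name"], "normalized:", row["value"], "threshold:", row["thresh"])
print("raw reading:", raw_temperature, "C")
```

The point of separating the fields is that the normalized VALUE column (100 here) and the raw reading (40) are different numbers; on this drive it is the raw value that reads as degrees Celsius.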

I'm not claiming it's good enough but I tried.

I left the iCage bay between them empty for airflow/cooling.

ad0 and ad1 are in the usual IDE bay of a tower.
I have a fan in there, but without the cover to shape the airflow,
perhaps that is not doing much useful...

Perhaps it wasn't properly warmed up when you ran those reports, but on
the data you've provided you don't have any sort of temperature problem.

I can touch the exposed front and back top (above IDE cable) and lay
my finger along it. It's hot but not like, ouch hot :-)

Over 70C or so is too hot to touch except momentarily. You're cool :)

I don't think it's 100C+ hot, as that's boiling -- but perhaps the
thermometer is somewhere inside or...

Seems more likely, though, that that number is Fahrenheit (sp?) and
not

Re: READ_DMA48 error interpretation

2007-02-08 Thread Garrett Cooper

Ian Smith wrote:

On Wed, 7 Feb 2007, Richard Lynch wrote:
[I've tried to snip away a lot of stuff, without losing any context...]

I'll prune a bit too, but will backtrack to earlier context, so thanks.

Are you getting any other hard looking errors in /var/log/messages? Is
fsck happy? It never hurts to run 'fsck -n' whenever you feel the urge.

Try installing the sysutils/smartmontools port and run a drive
self-

I ran the short test on the problem drives, and it said everything was
fine.

I'll try the long test at a later date.

I suggest reading man smartctl under '-A, --attributes' and then you'll
know as much as I do about what these may mean, and maybe worry less ..

Here are all the smartctl -a outputs:

http://l-i-e.com/ad0.txt
http://l-i-e.com/ad1.txt
http://l-i-e.com/ad2.txt
http://l-i-e.com/ad3.txt

ad3 is giving the most errors...

ad1 gives a fair amount though

Do you mean according to that fine-detail attribute changes logging? Or
real read/write/seek etc errors being logged to messages?

And the ad0 and ad2 seem to be giving the spinup errors.

None of those reports seem to indicate any problems really, though if
anyone else cares to peek and notices any anomalies, I'm all eyes.

ad0 is pretty much full
ad1 is the one I'm filling up currently
ad2 and ad3 have no actual content on them yet, but will soon

All the drives are kind of in an old PC tower (XT? AT???), except the
outer casing is, errr, not there... Just the framework.

Might be worth checking that your power supply is up to handling 4 big
drives, but they weren't running more than mildly warm when reported.

ad2 and ad3 are in one of these Thermaltake iCage things:

http://www.performance-pcs.com/catalog/index.php?main_page=product_info&cPath=257&products_id=3533
which converts the old-school floppy drive[s] bay into an IDE bay, and
puts a big honking fan blowing on them.

194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 40
(Lifetime Min/Max 13/49)

I'm not claiming it's good enough but I tried.

I left the iCage bay between them empty for airflow/cooling.

ad0 and ad1 are in the usual IDE bay of a tower.

I have a fan in there, but without the cover to shape the airflow,
perhaps that is not doing much useful...

Perhaps it wasn't properly warmed up when you ran those reports, but on
the data you've provided you don't have any sort of temperature problem.

I can touch the exposed front and back top (above IDE cable) and lay
my finger along it. It's hot but not like, ouch hot :-)

Over 70C or so is too hot to touch except momentarily. You're cool :)

I don't think it's 100C+ hot, as that's boiling -- but perhaps the
thermometer is somewhere inside or...

Seems more likely, though, that that number is

Re: READ_DMA48 error interpretation

2007-02-07 Thread Richard Lynch

[I've tried to snip away a lot of stuff, without losing any context...]

On Tue, February 6, 2007 2:50 am, Ian Smith wrote:
In freebsd-questions Digest, Vol 164, Issue 1
At Message: 19
On Mon, 5 Feb 2007 01:13:31 -0600 (CST) Richard Lynch [EMAIL PROTECTED]
wrote:
On Tue, January 16, 2007 3:21 pm, Chuck Swiger wrote:
On Jan 16, 2007, at 1:13 PM, Richard Lynch wrote:

...
+ad1: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=404955007
+ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=404955007
+g_vfs_done():ad1s1[READ(offset=207336931328, length=16384)]error = 5

Try installing the sysutils/smartmontools port and run a drive
self-

I ran the short test on the problem drives, and it said everything was
fine.

I'll try the long test at a later date.

Show us the result of 'smartctl -a drive' after a test or two.

It'd be more useful to see these within the context shown by smartctl -a

Whoops!

I did miss that step, didn't I?

Sorry!

Here are all the smartctl -a outputs:

http://l-i-e.com/ad0.txt
http://l-i-e.com/ad1.txt
http://l-i-e.com/ad2.txt
http://l-i-e.com/ad3.txt

ad3 is giving the most errors...
ad1 gives a fair amount though
And the ad0 and ad2 seem to be giving the spinup errors.

ad0 is pretty much full
ad1 is the one I'm filling up currently
ad2 and ad3 have no actual content on them yet, but will soon

All the drives are kind of in an old PC tower (XT? AT???), except the
outer casing is, errr, not there... Just the framework.

I'm not claiming it's good enough but I tried.

I left the iCage bay between them empty for airflow/cooling.

ad0 and ad1 are in the usual IDE bay of a tower.
I have a fan in there, but without the cover to shape the airflow,
perhaps that is not doing much useful...

I can touch the exposed front and back top (above IDE cable) and lay
my finger along it. It's hot but not like, ouch hot :-)

I don't think it's 100C+ hot, as that's boiling -- but perhaps the
thermometer is somewhere inside or...

Seems more likely, though, that that number is Fahrenheit (sp?) and
not Celsius..

I didn't even realize it said C, and thought it was F...

Still seemed pretty dang hot to me.

I could haul in the outer casing and slap it on though, if needed. I
think. That big ol' fan might make that kinda hard. Oh well. I've
got tin-snips somewhere around here... :-v

Oh, here's a rather long excerpt of the log in case there's minutiae
within it that I've failed to include:
http://l-i-e.com/smartd.log

The output of smartctl -a for one or two of your drives would likely be
much more indicative. I don't claim to be an expert in this at all, but
some of us might spot any obvious anomalies.

I sure appreciate the time y'all are taking on this!

I am definitely not a hardware guy, as you have probably already
surmised. :-)

--
Some people have a gift link here.
Know what I want?
I want you to buy a CD from some starving artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]

Re: READ_DMA48 error interpretation

2007-02-07 Thread Chuck Swiger


Richard Lynch wrote:

[I've tried to snip away a lot of stuff, without losing any context...]

[ ...trimming away context good, people can go back and read the thread... ]


I can touch the exposed front and back top (above IDE cable) and lay
my finger along it.  It's hot but not like, ouch hot :-)


You're not seeing any reallocated sectors and you're not seeing UDMA errors 
(ie, in the cabling).  For lack of any better guesses, I'd gather that your 
drives are running above normal temps and aren't reading data perfectly, but 
are doing well enough that the built-in ECC is managing to deal with the issues.



I don't think it's 100C+ hot, as that's boiling -- but perhaps the
thermometer is somewhere inside or...


On a good day, the thermometers actually provide a real, calibrated, accurate 
result...but many drives don't even come close.



The output of smartctl -a for one or two of your drives would likely
be much more indicative.  I don't claim to be an expert in this at all,
but some of us might spot any obvious anomalies.


I sure appreciate the time y'all are taking on this!

I am definitely not a hardware guy, as you have probably already
surmised. :-)


You should actually run smartctl -t long /dev/ad0 and repeat for all of the 
devices, and then re-check the smartctl -a output.  Might be better not to 
run the self-tests all at once, come to think of it.
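
The one-drive-at-a-time approach can be written down as a simple plan. The Python sketch below only builds and prints the commands in order; it does not run them, and it assumes the four device names mentioned in this thread. The smartctl invocations are the two quoted above (-t long to start a test, -a to read the results); how long to wait between drives is left to whoever runs it.

```python
# Sketch: print a one-drive-at-a-time self-test plan.
# Device names are the four from this thread; nothing is executed here.

DRIVES = ["/dev/ad0", "/dev/ad1", "/dev/ad2", "/dev/ad3"]


def selftest_plan(drives, test="long"):
    """Return (start_test, show_results) argument lists, one pair per drive."""
    plan = []
    for dev in drives:
        start = ["smartctl", "-t", test, dev]
        report = ["smartctl", "-a", dev]
        plan.append((start, report))
    return plan


for start, report in selftest_plan(DRIVES):
    print(" ".join(start))
    print("  (wait for the self-test to finish before continuing)")
    print(" ".join(report))
```

Keeping each start and report command paired per drive makes it harder to kick off the next test before the previous one has been checked.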


--
-Chuck



Re: READ_DMA48 error interpretation

2007-02-06 Thread Ian Smith

In freebsd-questions Digest, Vol 164, Issue 1
At Message: 19
On Mon, 5 Feb 2007 01:13:31 -0600 (CST) Richard Lynch [EMAIL PROTECTED] wrote:
  On Tue, January 16, 2007 3:21 pm, Chuck Swiger wrote:
   On Jan 16, 2007, at 1:13 PM, Richard Lynch wrote:
   I know the messages below mean the hard drive or IDE cards are
   having
   problems.  But is this like RED ALERT or more like YELLOW or what?
  ...
   +ad1: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=404955007
   +ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=404955007
   +g_vfs_done():ad1s1[READ(offset=207336931328, length=16384)]error = 5
  
   If you have current backups, it's a yellow alert.  Otherwise...
  
   And what do I do about it?
  
   umount and fsck everything a lot?

Once should do :)  It's possible to have read errors, from a write error
say on unclean power removal, that don't indicate a drive fault at all.

   swap cards/drives around until it stops?
   Ignore it and pray?

The latter is or at least was listed as a backup strategy in the docs :)
 
   Try installing the sysutils/smartmontools port and run a drive self-
   test.  That will give you a much better assessment of the state of
   the drive and whether it is likely to completely fail in the next 24
   hours...
  
  I ran the short test on the problem drives, and it said everything was
  fine.
  
  I'll try the long test at a later date.

Show us the result of 'smartctl -a drive' after a test or two.

  Meanwhile, I turned on the smartd daemon, and am seeing two issues in
  the logs...
  
  #1. The drive temperatures seem ridiculously high to this naive
  reader, but what do I know?...
  110 to 190 Celsius?  Yikes...  Or maybe that's normal?
  How hot is too hot?

As [EMAIL PROTECTED] pointed out, 100C is too hot.  I don't believe
those 110 to 190 numbers at all and suspect a drive would melt down at
anything near that.  Maybe these are Fahrenheit temperatures?
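
To put numbers on the Fahrenheit guess, here is the conversion for the two ends of the reported range, as a short Python sketch. It only assumes the 110 and 190 figures from the report; whether they really are Fahrenheit readings is exactly the open question.

```python
def fahrenheit_to_celsius(f):
    """Standard conversion: C = (F - 32) * 5 / 9."""
    return (f - 32) * 5 / 9


for f in (110, 190):
    print(f"{f} F = {fahrenheit_to_celsius(f):.1f} C")
```

That works out to roughly 43 C at the low end and 88 C at the high end, so a Fahrenheit reading would make the low figure plausible while the high one would still be very hot for a drive.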

While perryh's advice about airflow and enclosures etc was spot on, I
suspect you need to check whether your particular drives may need some
corrective parameters if not fully covered by the smartctl database, as
some tend to do.  There are hints about this in smartctl(8) -v option.

  #2. Sequences like this show up a fair amount:
  Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed
  from 152 to 153
  Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed
  from 153 to 152
  Device: /dev/ad0, SMART Prefailure Attribute: 8 Seek_Time_Performance
  changed from 251 to 250

It'd be more useful to see these within the context shown by smartctl -a

  So is the real problem just that the drives are spun down and can't
  spin up fast enough? I can probably live with the consequences of
  that, and just go on with life -- The occasional HTTP request for an
  audio file will fail the first time, and they have to hit reload.
  
  This box is the fail-safe roll-over server for audio files that are
  all up online somewhere else managed by a professional (not me), so
  it's no surprise that the rare time-out on the real server also ends
  up with a drive spin up and failed request on the backup.  Kind of
  annoying, I guess, to an end user, but forcing the drives to always be
  spinning is probably not a Good Idea.

I don't know about that; while I wouldn't worry too much about spin-up
times unless it's a major annoyance to clients, I've always subscribed
to drives lasting much longer if left spinning.  The server delivering
this mail has spun its old IBM DTLA-something drive 24h/365d for nearly
9 years now, despite no aircon in a hot climate (up to ~45C in summer).

  Oh, here's a rather long excerpt of the log in case there's minutiae
  within it that I've failed to include:
  http://l-i-e.com/smartd.log

The output of smartctl -a for one or two of your drives would likely be
much more indicative.  I don't claim to be an expert in this at all, but
some of us might spot any obvious anomalies.

Cheers, Ian


Re: READ_DMA48 error interpretation

2007-02-05 Thread perryh

 #1. The drive temperatures seem ridiculously high to this naive
 reader, but what do I know?...
 110 to 190 Celsius?  Yikes...  Or maybe that's normal?
 How hot is too hot?

I'd think if you can't hold onto them with bare hands they are too
hot.  100C is *way* too hot.  It's a wonder they are working at all.

If they are in an enclosure, clean the air filter (if any), make
sure the fan(s) is/are running (and actually moving air); add a fan
if there isn't one.  If they are not in an enclosure, (or if they're
in a big one, like the same box with the rest of the system) add a
fan either blowing on them or drawing air over them.

Re: READ_DMA48 error interpretation

2007-02-04 Thread Richard Lynch

On Tue, January 16, 2007 3:21 pm, Chuck Swiger wrote:
 On Jan 16, 2007, at 1:13 PM, Richard Lynch wrote:
 I know the messages below mean the hard drive or IDE cards are
 having
 problems.  But is this like RED ALERT or more like YELLOW or what?
...
 +ad1: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=404955007
 +ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=404955007
 +g_vfs_done():ad1s1[READ(offset=207336931328, length=16384)]error = 5

 If you have current backups, it's a yellow alert.  Otherwise...

 And what do I do about it?

 umount and fsck everything a lot?
 swap cards/drives around until it stops?
 Ignore it and pray?

 Try installing the sysutils/smartmontools port and run a drive self-
 test.  That will give you a much better assessment of the state of
 the drive and whether it is likely to completely fail in the next 24
 hours...

I ran the short test on the problem drives, and it said everything was
fine.

I'll try the long test at a later date.

Meanwhile, I turned on the smartd daemon, and am seeing two issues in
the logs...

#1. The drive temperatures seem ridiculously high to this naive
reader, but what do I know?...
110 to 190 Celsius?  Yikes...  Or maybe that's normal?
How hot is too hot?

#2. Sequences like this show up a fair amount:
Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed
from 152 to 153
Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed
from 153 to 152
Device: /dev/ad0, SMART Prefailure Attribute: 8 Seek_Time_Performance
changed from 251 to 250

So is the real problem just that the drives are spun down and can't
spin up fast enough? I can probably live with the consequences of
that, and just go on with life -- The occasional HTTP request for an
audio file will fail the first time, and they have to hit reload.

This box is the fail-safe roll-over server for audio files that are
all up online somewhere else managed by a professional (not me), so
it's no surprise that the rare time-out on the real server also ends
up with a drive spin up and failed request on the backup.  Kind of
annoying, I guess, to an end user, but forcing the drives to always be
spinning is probably not a Good Idea.

Oh, here's a rather long excerpt of the log in case there's minutiae
within it that I've failed to include:
http://l-i-e.com/smartd.log

Any help in interpreting these results is most appreciated!

THANKS!!!

-- 
Some people have a gift link here.
Know what I want?
I want you to buy a CD from some starving artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?


READ_DMA48 error interpretation

2007-01-16 Thread Richard Lynch

I know the messages below mean the hard drive or IDE cards are having
problems.

But is this like RED ALERT or more like YELLOW or what?

And what do I do about it?

umount and fsck everything a lot?

swap cards/drives around until it stops?

Ignore it and pray?

All the content is already copied to a second box, plus on CD, and
none of it is crucial data, so if I lose a LITTLE data by ignoring
this, I'm okay.

If the whole thing wipes out, that would be bad.

These drives are often spun down, as they are not accessed very often
-- it's the roll-over fall-back audio server in a cobbled-together
system I won't describe, as you'll just laugh at me. :-)  Is it
possible that these are just from the drives spinning up too slowly?

+ad1: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=404955007
+ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=404955007
+g_vfs_done():ad1s1[READ(offset=207336931328, length=16384)]error = 5
+ad1: TIMEOUT - READ_DMA retrying (1 retry left) LBA=106507715
+ad1: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=324791875
+ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=324791875
+g_vfs_done():ad1s1[READ(offset=166293407744, length=2048)]error = 5
+ad1: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=325168415
+ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=325168415
+g_vfs_done():ad1s1[READ(offset=166486196224, length=16384)]error = 5
+ad1: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=400062279
+ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=400062279
+g_vfs_done():ad1s1[READ(offset=204831854592, length=4096)]error = 5
+ad1: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=387991903
+ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=387991903
+g_vfs_done():ad1s1[READ(offset=198651822080, length=16384)]error = 5
+ad3: TIMEOUT - READ_DMA retrying (1 retry left) LBA=287
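
A side note on how the numbers in these lines relate: the LBA in each FAILURE line and the offset in the g_vfs_done line that follows it appear to describe the same place on the disk. The Python sketch below checks that arithmetic for the five pairs in the log. It assumes 512-byte sectors, and it reads the constant difference as the sector where the ad1s1 slice starts; both are inferences from the numbers, not something the log states.

```python
SECTOR_SIZE = 512  # bytes per sector; an assumption, not stated in the log

# (LBA, offset) pairs copied from the FAILURE / g_vfs_done lines above.
PAIRS = [
    (404955007, 207336931328),
    (324791875, 166293407744),
    (325168415, 166486196224),
    (400062279, 204831854592),
    (387991903, 198651822080),
]


def slice_start_sector(lba, offset):
    """Sectors between the whole-disk LBA and the slice-relative offset."""
    if offset % SECTOR_SIZE:
        raise ValueError("offset is not a whole number of sectors")
    return lba - offset // SECTOR_SIZE


for lba, offset in PAIRS:
    print(f"LBA {lba}  offset {offset}  difference {slice_start_sector(lba, offset)} sectors")
```

Every pair differs by the same 63 sectors, which is consistent with a slice that begins at sector 63. So each failed filesystem read starts exactly at the LBA the drive reported, rather than somewhere unrelated.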

-- 
Some people have a gift link here.
Know what I want?
I want you to buy a CD from some starving artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?


Re: READ_DMA48 error interpretation

2007-01-16 Thread Chuck Swiger


On Jan 16, 2007, at 1:13 PM, Richard Lynch wrote:

I know the messages below mean the hard drive or IDE cards are having
problems.  But is this like RED ALERT or more like YELLOW or what?


If you have current backups, it's a yellow alert.  Otherwise...


And what do I do about it?

umount and fsck everything a lot?
swap cards/drives around until it stops?
Ignore it and pray?


Try installing the sysutils/smartmontools port and run a drive self- 
test.  That will give you a much better assessment of the state of  
the drive and whether it is likely to completely fail in the next 24  
hours...


--
-Chuck

