Re: This has begun to annoy me...

2008-02-01 Thread Josef Grosch
On Fri, Feb 01, 2008 at 09:08:53PM +0100, Wojciech Puchar wrote:
> >+ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4570911
> >+ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=42958719
> >+ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113343327
> >+ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=117315327
> >
> >It looks like disk write errors to me, but I'm not sure.
> >
> yes it is.
> 
> use ports/sysutils/smartmontools
> 
> to make sure.
> 
> print the SMART output and/or these errors and request a replacement from
> your shop.


The disk is failing. Smartmontools, which is a great package, will only
tell you what you already know, that the disk is failing. I would copy
everything important off that disk _NOW_ then go and get a replacement
disk TODAY. It's possible that the disk may spin for another 2 months or it
might burst into flames in the next 20 minutes. Either way the disk is
telling you, "I'm sick".

Also, whatever you do, do not power down that machine until you have
copied off the data. When you power cycle machines with a disk in this state
there is a very good chance the disk will not spin back up.

I see this all the time at my job. 


Good Luck

Josef

-- 
Josef Grosch   | Another day closer to a | FreeBSD 6.3
[EMAIL PROTECTED] |   Micro$oft free world  | Berkeley, Ca.




Re: This has begun to annoy me...

2008-02-01 Thread Wojciech Puchar


> > use ports/sysutils/smartmontools
>
> What do you use for SATA RAID drives?  (/dev/mfi[num])

I don't use "hardware" RAID at all.
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: This has begun to annoy me...

2008-02-01 Thread Paul Schmehl
--On Friday, February 01, 2008 21:08:53 +0100 Wojciech Puchar
<[EMAIL PROTECTED]> wrote:

> > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4570911
> > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=42958719
> > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113343327
> > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=117315327
> >
> > It looks like disk write errors to me, but I'm not sure.
>
> yes it is.
>
> use ports/sysutils/smartmontools

What do you use for SATA RAID drives?  (/dev/mfi[num])

--
Paul Schmehl ([EMAIL PROTECTED])
Senior Information Security Analyst
The University of Texas at Dallas
http://www.utdallas.edu/ir/security/



Re: This has begun to annoy me...

2008-02-01 Thread James Harrison
On Fri, 2008-02-01 at 22:37 +0100, Wojciech Puchar wrote:
> > I have a laptop doing the same thing, but even with the occasional spontaneous
> > reboot, it is still more reliable than Windows, and as I haven't had time to
> > mess with it, it stays in there. I will probably run dban on the drive and
> > see if that makes it happy.
> 
> what is dban?
Darik's Boot and Nuke:

http://dban.sourceforge.net/




Re: This has begun to annoy me...

2008-02-01 Thread Wojciech Puchar

> I have a laptop doing the same thing, but even with the occasional spontaneous
> reboot, it is still more reliable than Windows, and as I haven't had time to
> mess with it, it stays in there. I will probably run dban on the drive and
> see if that makes it happy.

What is dban?


Re: This has begun to annoy me...

2008-02-01 Thread perlcat
On Friday 01 February 2008 14:23:27 Wojciech Puchar wrote:
> > You could try changing the disk for another.  If you get similar messages
> > with the other disk too, then it is probably not the disk which is at
> > fault.
>
> smartmontools gets its error info from the DRIVE directly, so it's easy to
> tell whether it was a drive media error or a drive communication error!

I have a laptop doing the same thing, but even with the occasional spontaneous
reboot, it is still more reliable than Windows, and as I haven't had time to
mess with it, it stays in there. I will probably run dban on the drive and
see if that makes it happy.



Re: This has begun to annoy me...

2008-02-01 Thread Wojciech Puchar

> You could try changing the disk for another.  If you get similar messages with
> the other disk too, then it is probably not the disk which is at fault.

smartmontools gets its error info from the DRIVE directly, so it's easy to
tell whether it was a drive media error or a drive communication error!
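
To illustrate the media-versus-communication distinction, here is a rough
Python sketch that sorts a few SMART attributes into the two groups. It
assumes the attribute table layout printed by `smartctl -A /dev/ad6` and the
common attribute names Reallocated_Sector_Ct, Current_Pending_Sector,
Offline_Uncorrectable (media) and UDMA_CRC_Error_Count (link). Names vary by
drive vendor, so treat the sets as assumptions. The sample table is made up
for illustration and is NOT output from the drive discussed in this thread.

```python
# Attribute names are assumptions; vendors label them differently.
MEDIA_ATTRS = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}
LINK_ATTRS = {"UDMA_CRC_Error_Count"}


def classify_smart(attr_text):
    """Split raw SMART counters into media-related and link-related dicts."""
    media, link = {}, {}
    for line in attr_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if not raw.isdigit():
            continue
        if name in MEDIA_ATTRS:
            media[name] = int(raw)
        elif name in LINK_ATTRS:
            link[name] = int(raw)
    return media, link


def verdict(media, link):
    """Return a coarse label based on which counters are non-zero."""
    has_media = any(v > 0 for v in media.values())
    has_link = any(v > 0 for v in link.values())
    if has_media and has_link:
        return "both"
    if has_media:
        return "media"
    if has_link:
        return "communication"
    return "neither"


# Hypothetical sample, invented for this example only.
SAMPLE_ATTRS = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    media, link = classify_smart(SAMPLE_ATTRS)
    print(verdict(media, link), media, link)
```

Non-zero reallocated or pending sector counts point at the platters, while a
climbing CRC error count points at the cable or controller. A clean table
does not prove the drive is healthy.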









Re: This has begun to annoy me...

2008-02-01 Thread Mel
On Friday 01 February 2008 20:40:32 Kurt Buff wrote:
> I've been getting this in my daily security email from one of my boxes
> for quite a while, and have been ignoring it, because of workload.
>
> However, it's finally annoyed me enough to pursue it.
>
> What would the significance of the following section be?
>
> zsquid.mycompany.com kernel log messages:
> +++ /tmp/security.4blejPLW  Fri Feb  1 03:02:08 2008
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4662143
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113995359
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=112970015
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4668319
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4849151
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=115527359
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113714335
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113715199
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4570911
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=42958719
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113343327
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=117315327
>
> It looks like disk write errors to me, but I'm not sure.
>

Disk errors (as in bad sectors) or a bad cable.
-- 
Mel


Re: This has begun to annoy me...

2008-02-01 Thread Bill Moran
In response to "Kurt Buff" <[EMAIL PROTECTED]>:

> On Feb 1, 2008 11:46 AM, Bill Moran <[EMAIL PROTECTED]> wrote:
> > In response to "Kurt Buff" <[EMAIL PROTECTED]>:
> >
> >
> > > I've been getting this in my daily security email from one of my boxes
> > > for quite a while, and have been ignoring it, because of workload.
> > >
> > > However, it's finally annoyed me enough to pursue it.
> > >
> > > What would the significance of the following section be?
> > >
> > > zsquid.mycompany.com kernel log messages:
> > > +++ /tmp/security.4blejPLW  Fri Feb  1 03:02:08 2008
> > > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4662143
> > > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113995359
> > > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=112970015
> > > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4668319
> > > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4849151
> > > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=115527359
> > > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113714335
> > > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113715199
> > > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4570911
> > > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=42958719
> > > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113343327
> > > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=117315327
> > >
> > > It looks like disk write errors to me, but I'm not sure.
> > >
> > > Thoughts?
> >
> > Looks like a failing disk to me.
> >
> > Depending on the importance of the data on that drive, make sure you're
> > getting backups and get a new drive on order.
> >
> > If it's always done this, it could also be faulty or non-standard hardware
> > (such as the SATA controller).
> 
> That's pretty much what I expected.
> 
> It's done this since I've been running the machine, about six months
> now. Given your comment, I expect it's an issue with the controller.

I recommend you _not_ trust that system.  In my experience, systems with
weird errors like this can suddenly lose your data for no reason.

-- 
Bill Moran
http://www.potentialtech.com


Re: This has begun to annoy me...

2008-02-01 Thread Erik Trulsson
On Fri, Feb 01, 2008 at 11:40:32AM -0800, Kurt Buff wrote:
> I've been getting this in my daily security email from one of my boxes
> for quite a while, and have been ignoring it, because of workload.
> 
> However, it's finally annoyed me enough to pursue it.
> 
> What would the significance of the following section be?
> 
> zsquid.mycompany.com kernel log messages:
> +++ /tmp/security.4blejPLW  Fri Feb  1 03:02:08 2008
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4662143
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113995359
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=112970015
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4668319
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4849151
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=115527359
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113714335
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113715199
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4570911
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=42958719
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113343327
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=117315327
> 
> It looks like disk write errors to me, but I'm not sure.
> 
> Thoughts?

Not so much write *errors* as write *problems*.
What seems to have happened is that some write requests do not finish
fast enough so the ATA driver gets a timeout, but retries the operation
and it works on the second try.  (If it hadn't worked on the second try
either then there should have been much scarier messages in the log.)
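
To see how many retried timeouts there are and where on the disk they fall,
the log lines can be tallied with a short script. This is only a sketch in
Python: the regular expression is written against the exact message format
quoted in this thread, and the "retries left" plural form is an assumption
about how the driver words other retry counts.

```python
import re
from collections import Counter

# Matches lines such as:
# "+ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4662143"
TIMEOUT_RE = re.compile(
    r"(?P<dev>ad\d+): TIMEOUT - (?P<op>\w+) retrying "
    r"\((?P<left>\d+) retr(?:y|ies) left\) LBA=(?P<lba>\d+)"
)


def summarize_timeouts(lines):
    """Count retried timeouts per (device, operation) and collect their LBAs."""
    counts = Counter()
    lbas = {}
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m is None:
            continue  # not a retry message
        key = (m.group("dev"), m.group("op"))
        counts[key] += 1
        lbas.setdefault(key, []).append(int(m.group("lba")))
    return counts, lbas


# The twelve lines from the original report in this thread.
SAMPLE = """\
+ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4662143
+ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113995359
+ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=112970015
+ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4668319
+ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4849151
+ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=115527359
+ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113714335
+ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113715199
+ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4570911
+ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=42958719
+ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113343327
+ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=117315327
"""

if __name__ == "__main__":
    counts, lbas = summarize_timeouts(SAMPLE.splitlines())
    for (dev, op), n in counts.items():
        seen = sorted(lbas[(dev, op)])
        print(f"{dev} {op}: {n} retried timeouts, "
              f"LBA range {seen[0]}..{seen[-1]}")
```

If the LBAs cluster tightly in one region that hints at a bad patch on the
media. Timeouts scattered over the whole disk, as they appear to be here,
fit the cable, controller, or driver-timeout explanations at least as well.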


It might be a disk going bad: when the disk detects write problems it has
to remap the bad blocks, which takes extra time and causes timeouts.

It might be because of sub-standard cabling which causes errors (but then
there ought to have been problems with reading as well as writing.)

It might be because FreeBSD's ATA driver uses a timeout value which is a bit
too aggressive (too low) for your controller/disk combination.


You could try changing the disk for another.  If you get similar messages with
the other disk too, then it is probably not the disk which is at fault.




-- 

Erik Trulsson
[EMAIL PROTECTED]


Re: This has begun to annoy me...

2008-02-01 Thread Wojciech Puchar


> That's pretty much what I expected.
>
> It's done this since I've been running the machine, about six months
> now. Given your comment, I expect it's an issue with the controller.

With the controller, it would have started immediately, not after 6 months,
I think.

But please use smartmontools.

It is a good idea to use it always, not only when problems start to appear.


Re: This has begun to annoy me...

2008-02-01 Thread Wojciech Puchar

> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4570911
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=42958719
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113343327
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=117315327
>
> It looks like disk write errors to me, but I'm not sure.

yes it is.

use ports/sysutils/smartmontools

to make sure.

print the SMART output and/or these errors and request a replacement from
your shop.



Re: This has begun to annoy me...

2008-02-01 Thread Kurt Buff
On Feb 1, 2008 11:46 AM, Bill Moran <[EMAIL PROTECTED]> wrote:
> In response to "Kurt Buff" <[EMAIL PROTECTED]>:
>
>
> > I've been getting this in my daily security email from one of my boxes
> > for quite a while, and have been ignoring it, because of workload.
> >
> > However, it's finally annoyed me enough to pursue it.
> >
> > What would the significance of the following section be?
> >
> > zsquid.mycompany.com kernel log messages:
> > +++ /tmp/security.4blejPLW  Fri Feb  1 03:02:08 2008
> > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4662143
> > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113995359
> > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=112970015
> > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4668319
> > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4849151
> > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=115527359
> > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113714335
> > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113715199
> > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4570911
> > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=42958719
> > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113343327
> > +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=117315327
> >
> > It looks like disk write errors to me, but I'm not sure.
> >
> > Thoughts?
>
> Looks like a failing disk to me.
>
> Depending on the importance of the data on that drive, make sure you're
> getting backups and get a new drive on order.
>
> If it's always done this, it could also be faulty or non-standard hardware
> (such as the SATA controller).

That's pretty much what I expected.

It's done this since I've been running the machine, about six months
now. Given your comment, I expect it's an issue with the controller.

Thanks,

Kurt


Re: This has begun to annoy me...

2008-02-01 Thread Bill Moran
In response to "Kurt Buff" <[EMAIL PROTECTED]>:

> I've been getting this in my daily security email from one of my boxes
> for quite a while, and have been ignoring it, because of workload.
> 
> However, it's finally annoyed me enough to pursue it.
> 
> What would the significance of the following section be?
> 
> zsquid.mycompany.com kernel log messages:
> +++ /tmp/security.4blejPLW  Fri Feb  1 03:02:08 2008
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4662143
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113995359
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=112970015
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4668319
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4849151
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=115527359
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113714335
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113715199
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4570911
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=42958719
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=113343327
> +ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=117315327
> 
> It looks like disk write errors to me, but I'm not sure.
> 
> Thoughts?

Looks like a failing disk to me.

Depending on the importance of the data on that drive, make sure you're
getting backups and get a new drive on order.

If it's always done this, it could also be faulty or non-standard hardware
(such as the SATA controller).

-- 
Bill Moran
http://www.potentialtech.com