
ZFS on 8.1 - various problems after a disk failure.

2011-06-13 Thread Howard Jones

I have a FreeBSD 8.2 server at home with 4 2TB drives in it running ZFS
with a raidz pool. Some time ago, I had a disk fail. Initially it wasn't
totally obvious the disk had failed so I ran a 'zpool scrub' on the
pool, which threw up a lot of errors, and also produced a lot of sense
errors, making it obvious I had a dead disk.

I replaced the disk, then ran zpool replace zjumbo ad4 ad4 to replace
the bad disk in-place, and start a resilver.

Now I have a few problems:
1) The old ad4 is still listed, even after several scrub/resilvers.
Shouldn't it go away?
2) Although I lost a whole directory with ~1TB of music, the space
allocated to that directory is still around according to df.
3) I have another bunch of files that appear in directory listings, but
I get Illegal byte sequence errors when trying to read them (with
anything - du, file, wc).

I have backups of most of the stuff on the pool (although it'd be nice
to recover the more recent data), but how do I get out of this situation
without nuking the site from orbit? (my current plan) Firstly, to get a
reliable representation of what's actually on the filesystem, and for
bonus points, getting back some of the data that should be intact (only
one disk in the set was actually bad, right?).

Here's my current zpool status. Thanks in advance for any pointers!

Howie

# zpool status
  pool: zjumbo
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 10h57m with 15190 errors on Thu May 19 09:26:59 2011
config:

        NAME           STATE     READ WRITE CKSUM
        zjumbo         DEGRADED     0     0  199K
          raidz1       DEGRADED     0     0  792K
            replacing  DEGRADED     0     0     0
              ad4/old  UNAVAIL      0 16.1M     0  cannot open
              ad4      ONLINE       0     0     0  1.15T resilvered
            ad6        ONLINE       0     0     0  677M resilvered
            ad8        ONLINE       0     0     0  660M resilvered
            ad10       ONLINE       0     0     0  535M resilvered

errors: 15190 data errors, use '-v' for a list


___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org

Re: hard disk failure - now what?

2009-08-27 Thread Sebastian Seidl


George Davidovich wrote:
 On Wed, Aug 26, 2009 at 04:45:40PM -0400, Jerry McAllister wrote:
  On Wed, Aug 26, 2009 at 10:23:47PM +0200, Roland Smith wrote:
   I remember this special non-conductive 3M fluid that can be used to
   cool electronics. A group of hackers dunked a complete PC minus the
   case and power supply in this stuff. The fluid itself was cooled
   with liquid nitrogen. They overclocked it something wicked. Not very
   practical though. :-)

  A number of supercomputers from Cray and Control Data and maybe some
  other places used this sort of thing on some experimental systems.  I
  don't know if any ever were put in to commercial production.  They
  submerged whole boards in to it and then supercooled the fluid.  I
  don't remember the chemical names.

 I do, but have no idea why.

 http://en.wikipedia.org/wiki/Perfluorohexane

  The fluid was a relative of Freon and held sufficient levels of oxygen
  to support lung breathers.  They used to have a tank with a live mouse
  submerged in it bouncing around and seeming to have no trouble not
  choking or drowning.

  A variation of it was also researched as a blood substitute for some
  special medical needs.  I don't know how far that went.  I know it
  is not all fantasy because I saw the live mouse.

 I believe you.  I saw a similar scene in a movie, so I already knew it
 had to be true.  Bonus points for anyone that can add to this thread's
 collection of off-topic but semi-interesting trivia and name the movie.

  I didn't try the blood substitute.

 How do you save a drowning mouse?
 Use mouse to mouse resuscitation.

 Thanks, I'll be here all week.  Try the veal instead.

If the freezer doesn't work, I suggest finding an identical drive and
replacing the electronics board. That has worked for many damaged drives.


Regards,
Sebastian Seidl


Re: hard disk failure - now what?

2009-08-27 Thread Mark Stapper

Gary Gatten wrote:
 Naw, I don't recall the POST error exactly, but from what I remember it
 couldn't find a boot device.  Could've been the controller, but from
 what I recall I swapped the drive (later) and all was good.  I really
 don't recall though - I could've put the bad drive in a good laptop
 and fixed it that way - really don't recall details.  Wish I could fix
 some other problems by throwing them in a freezer!
   
Some try to solve their marital problems with a freezer... and an axe ;-)




Re: hard disk failure - now what?

2009-08-26 Thread cpghost

On Mon, Aug 24, 2009 at 02:51:41PM -0600, Tim Judd wrote:
  Buy spinrite, no matter what.
 
 It's OS/FS independent.  It works on the bits stored on the magnetic
 platters, NOT on a filesystem.  TiVo, Linux, BSD and Mac OSX drives
 are treated the same.  Bits on a magnetic platter.  Its recovery
 stems from the randomization and movement of the head to the sector in
 question that allows it to salvage any bits it can (for example, other
 recovery will abandon 512bytes if 1 bit cannot be read.  spinrite will
 recover 512bytes-1bit to a hard drive's spare sector once spinrite
 says "I'm done working with this sector.")  It leads to a very
 high success rate.

(Disclaimer: I'm not familiar with spinrite.)

512bytes-1bit may be read back, but you can't be sure that those are
the correct bytes! IIRC, sectors are usually protected by some kind of
ECC. Simply ignoring the ECC and reading raw magnetic data will all
too often result in corrupt sectors.

Of course, if you have out-of-band error correction or at least error
detection mechanisms (like .PAR or md5/sha1 checksums), raw magnetic
recovery is better than nothing, if you're desperate.
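
To illustrate the point about out-of-band error detection, here is a minimal shell sketch. It assumes nothing about spinrite; the file names are made up for the example, and the POSIX `cksum` utility stands in for the md5/sha1 tools mentioned above. A checksum recorded while the data was known to be good is compared with the checksum of a "recovered" copy that has one wrong byte.

```shell
# Sketch only: scratch files stand in for real data (hypothetical names).
workdir=$(mktemp -d)
printf 'important data\n' > "$workdir/original.dat"

# Record a checksum while the data is known to be good.
# Reading from stdin makes cksum print only "<crc> <size>".
good_sum=$(cksum < "$workdir/original.dat")

# Simulate a recovered sector: same length, one byte wrong.
printf 'important dbta\n' > "$workdir/recovered.dat"
recovered_sum=$(cksum < "$workdir/recovered.dat")

# Compare the stored checksum with the one from the recovered copy.
if [ "$good_sum" = "$recovered_sum" ]; then
    verdict=intact
else
    verdict=corrupt
fi
echo "recovered copy is $verdict"
```

Without the stored checksum there would be no way to tell that the recovered copy differs from the original, which is the concern raised above.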

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/

Re: hard disk failure - now what?

2009-08-26 Thread Jerry

On Wed, 26 Aug 2009 18:10:38 +0200
cpghost cpgh...@cordula.ws wrote:

 On Mon, Aug 24, 2009 at 02:51:41PM -0600, Tim Judd wrote:
   Buy spinrite, no matter what.
  
  It's OS/FS independent.  It works on the bits stored on the magnetic
  platters, NOT on a filesystem.  TiVo, Linux, BSD and Mac OSX drives
  are treated the same.  Bits on a magnetic platter.  Its recovery
  stems from the randomization and movement of the head to the sector
  in question that allows it to salvage any bits it can (for example,
  other recovery will abandon 512bytes if 1 bit cannot be read.
  spinrite will recover 512bytes-1bit to a hard drive's spare sector
  once spinrite says "I'm done working with this sector.")  It leads
  to a very high success rate.
 
 (Disclaimer: I'm not familiar with spinrite.)
 
 512bytes-1bit may be read back, but you can't be sure that those are
 the correct bytes! IIRC, sectors are usually protected by some kind of
 ECC. Simply ignoring the ECC and reading raw magnetic data will all
 too often result in corrupt sectors.
 
 Of course, if you have out-of-band error correction or at least error
 detection mechanisms (like .PAR or md5/sha1 checksums), raw magnetic
 recovery is better than nothing, if you're desperate.
 
 -cpghost.

I have used Spinrite several times with excellent results. In fact, I
recently used it to recover a laptop drive that had become unusable.

Spinrite tries to turn off ECC if possible. It is not the cheapest
product; however, it works better than anything else I have tried on
bonked discs. Use it on its highest recovery level and it will recover
the drive, although it may take a while.

http://www.grc.com/intro.htm

-- 
Jerry
ges...@yahoo.com

Lord, defend me from my friends; I can account for my enemies.

Charles D'Hericault

Re: hard disk failure - now what?

2009-08-26 Thread Roland Smith

On Tue, Aug 25, 2009 at 11:46:50PM -0600, Kelly Martin wrote:
 plugging the drive in and accessing it, I heard those tell-tale signs
 of hard drive failure: clicks and pops and other unusual noises, so I
 know that it has some damage. I hate those sounds, having heard them
 on failing drives too many times before.

If the drive is that bad, it is doubtful that dd or ddrescue will be able to
get a good copy.

  My question: what kind of checks and/or repair tools should I run on
  the damaged drive after it's mounted?
 
  As others have mentioned, first make a copy (with the disk unmounted) of the
  partitions on that disk with dd, saving them to another drive. That way you
  can experiment with the data without further deterioration of the
  original.
 
 I ran dd and it took over 20 hours to complete. In fact it just
 finished this evening, after running all day. Lots of FAILURE errors
 were reported along the way, enough to fill two console screens or
 more. And of course to complicate things I didn't have a spare drive
 as an output device that was the *same size*, so I used a smaller
 drive thinking that it wouldn't matter since the source drive wasn't
 full anyway. I have no idea if data is scattered around on the FFS
 filesystem such that cloning a mostly empty, larger drive onto
 something smaller might lose data... I searched Google and couldn't
 find the answer, so I proceeded anyway. It doesn't matter now though,
 as I have a new drive now and another plan.

Using dd you make a block-for-block copy; dd doesn't know about filesystems.
You could pipe the output from dd through a compression program like gzip or
bzip2. That could yield a smaller image. But you'd have to uncompress it in
order to use it.
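
A minimal sketch of that pipeline, under stated assumptions: a scratch file of zeros stands in for the mostly empty partition (on the real system the input would be a device node such as /dev/ad4s1f), and all file names are made up for the example.

```shell
# Sketch only: a 1 MiB file of zeros stands in for a mostly empty partition.
workdir=$(mktemp -d)
src="$workdir/partition.img"
dd if=/dev/zero of="$src" bs=1024 count=1024 2>/dev/null

# Block-for-block copy, compressed on the fly.
dd if="$src" bs=65536 2>/dev/null | gzip -c > "$workdir/partition.img.gz"

# The image has to be uncompressed again before it can be checked or mounted.
gzip -dc "$workdir/partition.img.gz" > "$workdir/restored.img"

orig_size=$(wc -c < "$src" | tr -d ' ')
gz_size=$(wc -c < "$workdir/partition.img.gz" | tr -d ' ')
if cmp -s "$src" "$workdir/restored.img"; then
    roundtrip=ok
else
    roundtrip=differs
fi
echo "original: $orig_size bytes, compressed: $gz_size bytes, round trip: $roundtrip"
```

How much smaller the image gets depends entirely on the data: unused space that was never written tends to compress well, while already-compressed files hardly shrink at all.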

Or you could try just copying the filesystems separately. E.g. copy from
ad4s1f instead of the whole ad4. That way you can split the data over several
files which you can store in different places.

 I'm going to try dd a second time, but this time I'll use ddrescue as
 some people suggested and I'll make the target drive an
 identical-sized 500 Gbyte drive, which I purchased today. I imagine it
 will take a long time to create this cloned disk... hopefully with
 fewer errors than dd gave me, though we'll see.
 
I hope you get a good copy, but it doesn't sound too likely. I'm not a hardware
expert, but if the disk is really breaking down in the hardware or
electronics, it is not inconceivable that even reading might further
deteriorate it. If you do not get a good 1:1 copy, you'll have extra errors in
your data! Depending on the options you give dd, it will either skip blocks
with errors or fill them with zeroes or other characters. See the part of the
dd manual page that describes the 'noerror' conversion.
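
As a rough illustration of the 'noerror' conversion combined with 'sync', here is a small sketch. A scratch file stands in for the failing partition (the real input would be a device node such as /dev/ad0s1d, and the file names here are made up). Since a regular file produces no read errors, the sketch only shows the padding behaviour: every short input block is filled with NUL bytes up to the block size, which is also what happens to a block that fails to read.

```shell
# Sketch only: a scratch file stands in for the failing partition.
workdir=$(mktemp -d)
src="$workdir/source.img"
img="$workdir/copy.img"

# Create a 1000-byte source, deliberately not a multiple of the block size.
dd if=/dev/zero of="$src" bs=1000 count=1 2>/dev/null

# noerror: keep going after a read error instead of stopping.
# sync:    pad every short or failed input block with NUL bytes up to bs,
#          so later data stays at its original offset in the image.
dd if="$src" of="$img" bs=512 conv=noerror,sync 2>/dev/null

img_size=$(wc -c < "$img" | tr -d ' ')
echo "image size: $img_size bytes"
```

The copy is 1024 bytes, two full 512-byte blocks, even though the source holds only 1000. Using 'noerror' without 'sync' would drop unreadable blocks instead, shifting everything after them.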

 Indeed some of the partitions seem to be beyond repair. In particular
 my /var partition is totally fubar'ed. When using fsck_ffs I got all
 sorts of errors when trying to repair the partition, things like:
 
 BAD SUPER BLOCK: VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE
 So I used the -b option suggested in the man page, fsck_ffs -y -b 160
 /dev/ad0s1d and it ran and fixed a few things, but then stopped with
 the following error:
 
 fsck_ufs: cannot alloc 4294967292 bytes for inoinfo

The meaning of the errors is explained in Appendix A of "Fsck - The UNIX File
System Check Program". You can find it as
/usr/share/doc/smm/03.fsck/paper.ascii.gz

 MySQL databases are normally stored in /var/db/mysql. But then I
 remembered my MySQL server was actually running in a Jail environment,
 and therefore it was located at /usr/jails/myjail/var/db/mysql instead
 of /var/db/mysql, and therefore the jailed MySQL database was on a
 totally different partition. Lucky! And I was also very lucky that I
 could mount the large /usr partition in read-only mode and copy off
 the most critical files I needed, starting with the database. No
 errors on that part of the disk so far, at least with the few critical
 files I've copied over. Whew!

Congratulations!
 
 Until just a few minutes ago I didn't think there'd be a happy ending.
 But I've got the most critical data copied over now, the rest can
 wait. I'm going to go run dd a second time (well, ddrescue) now and
 then start work on the copy once it finishes, in a day or two.

Time to start thinking about a solid backup strategy as well. :-)


Roland
-- 
R.F.Smith   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)



Re: hard disk failure - now what?

2009-08-26 Thread George Davidovich

On Wed, Aug 26, 2009 at 08:07:41PM +0200, Roland Smith wrote:
 On Tue, Aug 25, 2009 at 11:46:50PM -0600, Kelly Martin wrote:
  plugging the drive in and accessing it, I heard those tell-tale
  signs of hard drive failure: clicks and pops and other unusual
  noises, so I know that it has some damage. I hate those sounds,
  having heard them on failing drives too many times before.
 
 If the drive is that bad, it is doubtful that dd or ddrescue will be
 able to get a good copy.

Probably true.  I hesitate to suggest this, but sticking the drive in a
freezer (preferably in a ziplock bag) for a few hours or overnight
might help.  Stories from people claiming "I swear it works!" go back
years.

To the extent it does work, it might give Kelly enough time to attempt
recovery.  If more time is required, he can try and find a creative
workaround for the 5 meter max length for USB cables.  Also,
experimenting with dry ice or acetone baths might prove to be
interesting, or at least educational. ;-)

-- 
George

Re: hard disk failure - now what?

2009-08-26 Thread Roland Smith

On Wed, Aug 26, 2009 at 12:13:48PM -0700, George Davidovich wrote:
snip
  If the drive is that bad, it is doubtful that dd or ddrescue will be
  able to get a good copy.
 
 Probably true.  I hesitate to suggest this, but sticking the drive in a
 freezer (preferably in a ziplock bag) for a few hours or overnight
 might help.  Stories from people claiming "I swear it works!" go back
 years.

Interesting.

 To the extent it does work, it might give Kelly enough time to attempt
 recovery.  If more time is required, he can try and find a creative
 workaround for the 5 meter max length for USB cables.  Also,
 experimenting with dry ice or acetone baths might prove to be
 interesting, or at least educational. ;-)

Acetone and electronics are _not_ a good mix! Acetone is extremely
flammable. It evaporates easily and can form explosive mixtures in air over a
wide range of concentrations. Not to mention that it would degrade/destroy
printed circuit boards; acetone breaks down the resin that binds the glass
fibers in the laminates! Not as fast as N-Methyl-2-pyrrolidone, but fast
enough.

I remember this special non-conductive 3M fluid that can be used to cool
electronics. A group of hackers dunked a complete PC minus the case and power
supply in this stuff. The fluid itself was cooled with liquid nitrogen. They
overclocked it something wicked. Not very practical though. :-)

Roland
-- 
R.F.Smith   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)



Re: hard disk failure - now what?

2009-08-26 Thread Jerry McAllister

On Wed, Aug 26, 2009 at 10:23:47PM +0200, Roland Smith wrote:

 On Wed, Aug 26, 2009 at 12:13:48PM -0700, George Davidovich wrote:
 snip
   If the drive is that bad, it is doubtful that dd or ddrescue will be
   able to get a good copy.
  
  Probably true.  I hesitate to suggest this, but sticking the drive in a
  freezer (preferably in a ziplock bag) for a few hours or overnight
  might help.  Stories from people claiming "I swear it works!" go back
  years.
 
 Interesting.
 
  To the extent it does work, it might give Kelly enough time to attempt
  recovery.  If more time is required, he can try and find a creative
  workaround for the 5 meter max length for USB cables.  Also,
  experimenting with dry ice or acetone baths might prove to be
  interesting, or at least educational. ;-)
 
 
 I remember this special non-conductive 3M fluid that can be used to cool
 electronics. A group of hackers dunked a complete PC minus the case and power
 supply in this stuff. The fluid itself was cooled with liquid nitrogen. They
 overclocked it something wicked. Not very practical though. :-)

A number of supercomputers from Cray and Control Data and maybe some
other places used this sort of thing on some experimental systems.  I
don't know if any ever were put in to commercial production.  They submerged
whole boards in to it and then supercooled the fluid.  I don't remember
the chemical names.

The fluid was a relative of Freon and held sufficient levels of oxygen
to support lung breathers.  They used to have a tank with a live mouse
submerged in it bouncing around and seeming to have no trouble not
choking or drowning.  A variation of it was also researched as a blood
substitute for some special medical needs.  I don't know how far that
went.  I know it is not all fantasy because I saw the live mouse.
I didn't try the blood substitute.

jerry


 
 Roland
 -- 
 R.F.Smith   http://www.xs4all.nl/~rsmith/
 [plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
 pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)



Re: hard disk failure - now what?

2009-08-26 Thread Polytropon

On Wed, 26 Aug 2009 12:13:48 -0700, George Davidovich free...@optimis.net 
wrote:
 Probably true.  I hesitate to suggest this, but sticking the drive in a
 freezer (preferably in a ziplock bag) for a few hours or overnight
 might help.  Stories from people claiming "I swear it works!" go back
 years.

I heard a similar suggestion from a guy who tried to get the
protection code out of a car radio. :-)

 To the extent it does work, it might give Kelly enough time to attempt
 recovery.  If more time is required, he can try and find a creative
 workaround for the 5 meter max length for USB cables.

5 meters? I always thought USB is specified for 2 meters only.
I've never seen a 5-meter-long USB cable, by the way.





-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

Re: hard disk failure - now what?

2009-08-26 Thread George Davidovich

On Wed, Aug 26, 2009 at 04:45:40PM -0400, Jerry McAllister wrote:
 On Wed, Aug 26, 2009 at 10:23:47PM +0200, Roland Smith wrote:
 
  I remember this special non-conductive 3M fluid that can be used to
  cool electronics. A group of hackers dunked a complete PC minus the
  case and power supply in this stuff. The fluid itself was cooled
  with liquid nitrogen. They overclocked it something wicked. Not very
  practical though. :-)
 
 A number of supercomputers from Cray and Control Data and maybe some
 other places used this sort of thing on some experimental systems.  I
 don't know if any ever were put in to commercial production.  They
 submerged whole boards in to it and then supercooled the fluid.  I
 don't remember the chemical names.

I do, but have no idea why.

http://en.wikipedia.org/wiki/Perfluorohexane

 The fluid was a relative of Freon and held sufficient levels of oxygen
 to support lung breathers.  They used to have a tank with a live mouse
 submerged in it bouncing around and seeming to have no trouble not
 choking or drowning.

 A variation of it was also researched as a blood substitute for some
 special medical needs.  I don't know how far that went.  I know it
 is not all fantasy because I saw the live mouse.

I believe you.  I saw a similar scene in a movie, so I already knew it
had to be true.  Bonus points for anyone that can add to this thread's
collection of off-topic but semi-interesting trivia and name the movie. 

 I didn't try the blood substitute.

How do you save a drowning mouse?
Use mouse to mouse resuscitation.

Thanks, I'll be here all week.  Try the veal instead.

-- 
George

Re: hard disk failure - now what?

2009-08-26 Thread Scott Schappell


On Aug 26, 2009, at 14:14:51, George Davidovich wrote:

I believe you.  I saw a similar scene in a movie, so I already knew it
had to be true.  Bonus points for anyone that can add to this thread's
collection of off-topic but semi-interesting trivia and name the  
movie.


What is The Abyss for 1000, Alex?

:)

Scott

RE: hard disk failure - now what?

2009-08-26 Thread Gary Gatten

I had a laptop years ago that started to die, but seemed to work OK when
first removed from a cold car.  After an hour or so it would die.  I
eventually put it in the freezer long enough to get what I needed off
the drive, so in some cases I would agree that cold is good!

-Original Message-
From: owner-freebsd-questi...@freebsd.org
[mailto:owner-freebsd-questi...@freebsd.org] On Behalf Of Polytropon
Sent: Wednesday, August 26, 2009 4:13 PM
To: George Davidovich
Cc: freebsd-questions@freebsd.org
Subject: Re: hard disk failure - now what?

On Wed, 26 Aug 2009 12:13:48 -0700, George Davidovich
free...@optimis.net wrote:
 Probably true.  I hesitate to suggest this, but sticking the drive in a
 freezer (preferably in a ziplock bag) for a few hours or overnight
 might help.  Stories from people claiming "I swear it works!" go back
 years.

I heard a similar suggestion from a guy who tried to get the
protection code out of a car radio. :-)

 To the extent it does work, it might give Kelly enough time to attempt
 recovery.  If more time is required, he can try and find a creative
 workaround for the 5 meter max length for USB cables.

5 meters? I always thought USB is specified for 2 meters only.
I've never seen a 5-meter-long USB cable, by the way.

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

Re: hard disk failure - now what?

2009-08-26 Thread Jerry McAllister

On Wed, Aug 26, 2009 at 02:14:51PM -0700, George Davidovich wrote:

  
  A number of supercomputers from Cray and Control Data and maybe some
  other places used this sort of thing on some experimental systems.  I
  don't know if any ever were put in to commercial production.  They
  submerged whole boards in to it and then supercooled the fluid.  I
  don't remember the chemical names.
 
 I do, but have no idea why.
 
 http://en.wikipedia.org/wiki/Perfluorohexane
 
  The fluid was a relative of Freon and held sufficient levels of oxygen
  to support lung breathers.  They used to have a tank with a live mouse
  submerged in it bouncing around and seeming to have no trouble not
  choking or drowning.
 
  A variation of it was also researched as a blood substitute for some
  special medical needs.  I don't know how far that went.  I know it
  is not all fantasy because I saw the live mouse.
 
 I believe you.  I saw a similar scene in a movie, so I already knew it
 had to be true.  Bonus points for anyone that can add to this thread's
 collection of off-topic but semi-interesting trivia and name the movie. 

I vaguely remember a movie with it in, but I saw it in
person at Cray headquarters back when.

 
  I didn't try the blood substitute.
 
   How do you save a drowning mouse?
   Use mouse to mouse resuscitation.
 
 Thanks, I'll be here all week.  Try the veal instead.

Only with the asparagus.

jerry

 
 -- 
 George

Re: hard disk failure - now what?

2009-08-26 Thread Polytropon

On Wed, 26 Aug 2009 16:30:59 -0500, Gary Gatten ggat...@waddell.com wrote:
 I had a laptop years ago that started to die, but seemed to work OK when
 first removed from a cold car.  After an hour or so it would die.  I
 eventually put it in the freezer long enough to get what I needed off
 the drive, so in some cases I would agree that cold is good!

That really sounds like a thermal problem (defective cooling)...



-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org

RE: hard disk failure - now what?

2009-08-26 Thread Gary Gatten

Naw, I don't recall the POST error exactly, but from what I remember it
couldn't find a boot device.  Could've been the controller, but from
what I recall I swapped the drive (later) and all was good.  I really
don't recall though - I could've put the bad drive in a good laptop
and fixed it that way - really don't recall details.  Wish I could fix
some other problems by throwing them in a freezer!

-Original Message-
From: Polytropon [mailto:free...@edvax.de] 
Sent: Wednesday, August 26, 2009 5:54 PM
To: Gary Gatten
Cc: George Davidovich; freebsd-questions@freebsd.org
Subject: Re: hard disk failure - now what?

On Wed, 26 Aug 2009 16:30:59 -0500, Gary Gatten ggat...@waddell.com wrote:
 I had a laptop years ago that started to die, but seemed to work OK when
 first removed from a cold car.  After an hour or so it would die.  I
 eventually put it in the freezer long enough to get what I needed off
 the drive, so in some cases I would agree that cold is good!

That really sounds like a thermal problem (defective cooling)...



-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...






Re: hard disk failure - now what?

2009-08-26 Thread Polytropon

On Wed, 26 Aug 2009 20:07:41 +0200, Roland Smith rsm...@xs4all.nl wrote:
 If the drive is that bad, it is doubtful that dd or ddrescue will be able to
 get a good copy.

There's an additional problem: Let's assume dd creates a 1:1 copy
of the file system in its actual state - nobody guarantees that
this file system is fully intact, or can be repaired.

I have (!) the problem myself that I got the dd copy from the partition
holding my home directory just fine, but the file system itself is
damaged to such a degree that fsck_ffs cannot repair it. At least, I
could get data off it - EXCEPT my home directory, sadly. But that's
not a (physical) disk problem, but a file system related one.



 Using dd you make a block-for block copy; dd doesn't know about filesystems.
 You could pipe the output from dd through a compression program like gzip or
 bzip2. That could yield a smaller image. But you'd have to uncompress it in
 order to use it.

I'm often told that hard disks are cheap today, and it's much
more relaxing operating on a plain image than on a compressed
one.




 Or you could try just copying the filesystems separately. E.g. copy from
 ad4s1f instead of the whole ad4. That way you can split the data over several
 files which you can store in different places.

That is the encouraged method. If you have separate file
systems, that's pretty much the optimal case. For example, you don't need
to mess around with a 20 GB /tmp partition if you are willing
to lose its data.



 I hope you get a good copy, but it doesn't sound too likely. I'm not a 
 hardware
 expert, but if the disk is really breaking down in the hardware or
 electronics, it is not inconceivable that even reading might further
 deteriorate it.

With hardware defects like this that cause growing problems,
it's wise to get the data (1st) as fast as possible and (2nd)
as accurately as possible - before the disk completely dies.

Even then, it's still possible to recover data, e. g. by
mounting the disks (the platters) into another drive
unit. But if the disks themselves are defective...


 If you do not get a good 1:1 copy, you'll have extra errors in
 your data! Depending on the options you give dd, it will either skip blocks
 with errors or fill them with zeroes or other characters. See the part of the
 dd manual page that describes the 'noerror' conversion.

As far as I remember, dd_rescue or ddrescue can handle such
problems. In case of errors, they retry and keep reading.



  fsck_ufs: cannot alloc 4294967292 bytes for inoinfo
 
 The meaning of the errors is explained in Appendix A of "Fsck - The UNIX File
 System Check Program". You can find it as
 /usr/share/doc/smm/03.fsck/paper.ascii.gz

When I tried to repair my defective partition in another system
with less RAM, I got a similar error:

cannot alloc 1073796864 bytes for inoinfo

The real (usual) error is

fsck_4.2bsd: bad inode number 306176 to nextinode

It seems that more RAM is needed to store information.
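
As a quick sanity check on the two allocation sizes quoted above, the sketch below converts them to GiB and also shows how each would read as a signed 32-bit number. The first value is exactly 2^32 - 4, which is what -4 looks like when treated as unsigned; that may hint at a size computed from damaged metadata rather than a genuine 4 GiB requirement, but nothing in the thread confirms this. The helper name is an assumption.

```python
def describe_alloc(nbytes):
    """Return (size in GiB, value reinterpreted as a signed 32-bit integer)."""
    gib = nbytes / 2**30
    as_signed32 = nbytes - 2**32 if nbytes >= 2**31 else nbytes
    return gib, as_signed32


for n in (4294967292, 1073796864):
    gib, signed = describe_alloc(n)
    print(f"{n} bytes = {gib:.2f} GiB, as signed 32-bit: {signed}")
```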



 Time to start thinking about a solid backup strategy as well. :-)

The correct time to do so is BEFORE you start storing data. :-)



-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org

Re: hard disk failure - now what?

2009-08-26 Thread Al Plant


Gary Gatten wrote:

I had a laptop years ago that started to die, but seemed to work OK when
first removed from a cold car.  After an hour or so it would die.  I
eventually put it in the freezer long enough to get what I needed off
the drive, so in some cases I would agree that cold is good!

-Original Message-
From: owner-freebsd-questi...@freebsd.org
[mailto:owner-freebsd-questi...@freebsd.org] On Behalf Of Polytropon
Sent: Wednesday, August 26, 2009 4:13 PM
To: George Davidovich
Cc: freebsd-questions@freebsd.org
Subject: Re: hard disk failure - now what?

On Wed, 26 Aug 2009 12:13:48 -0700, George Davidovich
free...@optimis.net wrote:

Probably true.  I hesitate to suggest this, but sticking the drive in a
freezer (preferably in a ziplock bag) for a few hours or overnight
might help.  Stories from people claiming "I swear it works!" go back
years.


I heard a similar suggestion from a guy who tried to get the
protection code out of a car radio. :-)




To the extent it does work, it might give Kelly enough time to attempt
recovery.  If more time is required, he can try and find a creative
workaround for the 5 meter max length for USB cables. 


5 meters? I always thought USB is specified for 2 meters only.
I've never seen a 5 meters long USB cable, by the way.






Aloha,

Off Topic but very funny as well as interesting.

I have a 15-meter USB cable that I bought online and have used for a
small video camera, and it works OK.



~Al Plant - Honolulu, Hawaii -  Phone:  808-284-2740
  + http://hawaiidakine.com + http://freebsdinfo.org +
  + http://aloha50.net   - Supporting - FreeBSD 6.* - 7.* - 8.* +
   email: n...@hdk5.net 
All that's really worth doing is what we do for others.- Lewis Carrol


Re: hard disk failure - now what?

2009-08-26 Thread Roland Smith

On Thu, Aug 27, 2009 at 01:03:58AM +0200, Polytropon wrote:
 On Wed, 26 Aug 2009 20:07:41 +0200, Roland Smith rsm...@xs4all.nl wrote:
  If the drive is that bad, it is doubtful whether dd or ddrescue will be able to
  get a good copy.
 
 There's an additional problem: Let's assume dd creates an 1:1 copy
 of the file system in its actual state - nobody guarantees that
 this file system is fully intact, or can be repaired.

Certainly. If filesystem data is missing, there is only so much that fsck_ufs
can do about it.
 
  Using dd you make a block-for-block copy; dd doesn't know about filesystems.
  You could pipe the output from dd through a compression program like gzip or
  bzip2. That could yield a smaller image. But you'd have to uncompress it in
  order to use it.
 
 I'm often told that hard disks are cheap today, and it's much
 more relaxing operating on a plain image than on a compressed
 one.

Of course. But if you are operating under restricted space constraints...

  I hope you get a good copy, but it doesn't sound too likely. I'm not a
  hardware expert, but if the disk is really breaking down in the hardware
  or electronics, it is not inconceivable that even reading might further
  deteriorate it.
 
 In case of hardware defects that cause growing problems,
 it's wise to get the data (1st) as fast as possible and (2nd)
 as accurately as possible - before the disk completely dies.

And (3rd) in as few tries as possible!

 In such a case, it's still possible to recover data, e.g. by
 mounting the disks (the cylinders or platters) into another drive
 unit. But if the disks themselves are defective...

I wonder if that is still possible with current drives? My impression was
(from a paper that I can't locate ATM) that data densities are so high that it
is extremely difficult to read the data with a different arm/head assembly than
the one it was written with.

  Time to start thinking about a solid backup strategy as well. :-)
 
 The correct time to do so is BEFORE you start storing data. :-)

Very true! But since the lack of backups was what got the OP in this mess in
the first place...

Roland
-- 
R.F.Smith   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)



Re: hard disk failure - now what?

2009-08-25 Thread perryh

Lowell Gilbert freebsd-questions-lo...@be-well.ilk.org wrote:
 Kelly Martin kellymar...@gmail.com writes:
  I just experienced a hard drive failure on one of my
  FreeBSD 7.2 production servers with no backup!
...
 First, try copying the entire disk, *without* mounting it.

Yep.

 Use dd(1) to get a copy of the whole disk.  I believe that
 conv=noerror may be necessary.

Much better:  use sysutils/ddrescue, which was written
specifically to deal with this sort of situation.

Re: hard disk failure - now what?

2009-08-25 Thread Jerry McAllister

On Mon, Aug 24, 2009 at 10:26:11PM +0200, Polytropon wrote:

 On Mon, 24 Aug 2009 12:29:19 -0600, Kelly Martin kellymar...@gmail.com 
 wrote:
  My question: what kind of checks and/or repair tools should I run on
  the damaged drive after it's mounted? Or should I mount it as
  read-only and start backing it up?
 
 Thou shalt not manipulate thy file systems while they are mounted. :-)
 Perform an fsck on the partitions first, then mount them ro. Copy
 the files you need.
 
 In case you can't reach essential files, you have the chance to
 use forensic tools to get them.
 
 Finally, keep in mind that for further diagnostics and restore
 operations it's always wise not to use the original file systems,
 i. e. the original disk. Make dd copies of the partitions onto
 a working disk and use them instead. Luckily, most operations
 work on plain files as well as on block device specials.

dd will barf on bad bits too.
You can tinker to make it skip over the bad block, but it
won't read it.   

jerry


 
  I am hoping most of my data is
  still there, but also don't want to damage it further.
 
 Good idea. This encourages you to follow the advice given above.
 
 
 
  I desperately
  need to salvage the data, what do the kind people on this list
  recommend?
 
 BACKUPS!!! =^_^=
 
 
 
 -- 
 Polytropon
 Magdeburg, Germany
 Happy FreeBSD user since 4.0
 Andra moi ennepe, Mousa, ...

Re: hard disk failure - now what?

2009-08-25 Thread Lowell Gilbert

per...@pluto.rain.com writes:

 Lowell Gilbert freebsd-questions-lo...@be-well.ilk.org wrote:
 Kelly Martin kellymar...@gmail.com writes:
  I just experienced a hard drive failure on one of my
  FreeBSD 7.2 production servers with no backup!
 ...
 First, try copying the entire disk, *without* mounting it.

 Yep.

 Use dd(1) to get a copy of the whole disk.  I believe that
 conv=noerror may be necessary.

 Much better:  use sysutils/ddrescue, which was written
 specifically to deal with this sort of situation.

Excellent suggestion.
-- 
Lowell Gilbert, embedded/networking software engineer, Boston area
http://be-well.ilk.org/~lowell/

Re: hard disk failure - now what?

2009-08-25 Thread Polytropon

On Tue, 25 Aug 2009 11:04:38 -0400, Jerry McAllister jerr...@msu.edu wrote:
 dd will barf on bad bits too.
 You can tinker to make it skip over the bad block, but it
 won't read it.   

As it has been suggested, there are interesting tools in the
ports collection. I'll post my famous list again. Among them,
note ddrescue and dd_rescue. But base system tools such as the
fetch program can help.


System:
dd
fsck_ffs
clri
fsdb
fetch -rR device
recoverdisk (!)

Ports:
ddrescue
dd_rescue
ffs2recov
magicrescue
testdisk
The Sleuth Kit:
fls
dls
ils
autopsy
scan_ffs
recoverjpeg
foremost
photorec

Those programs are not ordered in any way.


-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

Re: hard disk failure - now what?

2009-08-25 Thread Kelly Martin

First, thanks to everyone for the really great replies. Many
suggestions were quite helpful and have kept me on track. I'll quote a
couple of people and then add some comments below.

On Mon, Aug 24, 2009 at 4:32 PM, Roland Smith rsm...@xs4all.nl wrote:
 It _could_ just be a bad or improperly connected SATA cable. Try changing or
 re-seating the cable.

I thought of that too, but no luck.

 Read errors cannot damage your data, but write errors can! Immediately stop
 all writing to the disk. Re-mount the partitions on that disk as read-only, or
 unmount them.

That was a consensus among everyone who replied, so I made that step
#1. I mounted the partitions read-only and crossed my fingers. Trying
to check the integrity of the data, or even get directory listings was
another matter, as I got various strange errors... which told me I
quite likely had some data loss.

 To see if a disk really is broken, install sysutils/smartmontools, and run
 'smartctl -a' on the disk. If you see errors in its report (e.g. reallocated
 sectors), the disk is dying and should be unplugged to prevent it from getting
 worse.

That's a good idea and I'll try to use it in the future. After
plugging the drive in and accessing it, I heard those tell-tale signs
of hard drive failure: clicks and pops and other unusual noises, so I
know that it has some damage. I hate those sounds, having heard them
on failing drives too many times before.


 My question: what kind of checks and/or repair tools should I run on
 the damaged drive after it's mounted?

 As others have mentioned, first make a copy (with the disk unmounted) of the
 partitions on that disk with dd, saving them to another drive. That way you
 can experiment with the data without further deterioration of the
 original.

I ran dd and it took over 20 hours to complete. In fact it just
finished this evening, after running all day. Lots of FAILURE errors
were reported along the way, enough to fill two console screens or
more. And of course to complicate things I didn't have a spare drive
as an output device that was the *same size*, so I used a smaller
drive thinking that it wouldn't matter since the source drive wasn't
full anyway. I have no idea if data is scattered around on the FFS
filesystem such that cloning a mostly empty, larger drive onto
something smaller might lose data... I searched Google and couldn't
find the answer, so I proceeded anyway. It doesn't matter now though,
as I have a new drive now and another plan.

 You can use this disk image e.g. as a vnode-backed memory disk, see
 mdconfig(8). If you cannot get a good copy of the disk partitions it might be
 a good idea to get a quote from a professional hard drive data recovery
 company to do that for you. I've never had occasion to try this (hooray for
 backups) but I've heard it can be quite expensive. :-/

I'm going to try dd a second time, but this time I'll use ddrescue as
some people suggested and I'll make the target drive an
identical-sized 500 Gbyte drive, which I purchased today. I imagine it
will take a long time to create this cloned disk... hopefully with
fewer errors than dd gave me, though we'll see.

 Try using fsck_ffs on (copies of) the disk image to see if that can restore
 the damage. If the damage is beyond repair for fsck_ffs, you have a real
 problem. Of course if you have a good disk image, your data is still
 there, but you might have to use a forensics program like sysutils/sleuthkit
 or hexdump to try and piece files together. And even then you cannot be sure
 that there is no corrupted data in the files themselves. Good luck with that. 
 :-(

Indeed some of the partitions seem to be beyond repair. In particular
my /var partition is totally fubar'ed. When using fsck_ffs I got all
sorts of errors when trying to repair the partition, things like:

BAD SUPER BLOCK: VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE

So I used the -b option suggested in the man page (fsck_ffs -y -b 160
/dev/ad0s1d). It ran and fixed a few things, but then stopped with
the following error:

fsck_ufs: cannot alloc 4294967292 bytes for inoinfo

The worst part of all is that the /var partition would normally be
okay to lose if it didn't have my MySQL database on it - the most
important data on the server. I just about choked down a golf ball
when I discovered my /var partition was in such rough shape and I
might be forced to use real recovery tools, or hire a professional for
$$$, or be out-of-luck.

MySQL databases are normally stored in /var/db/mysql. But then I
remembered my MySQL server was actually running in a Jail environment,
and therefore it was located at /usr/jails/myjail/var/db/mysql instead
of /var/db/mysql, and therefore the jailed MySQL database was on a
totally different partition. Lucky! And I was also very lucky that I
could mount the large /usr partition in read-only mode and copy off
the most critical files I needed, starting with the database. No
errors on that part of the disk so

hard disk failure - now what?

2009-08-24 Thread Kelly Martin

I just experienced a hard drive failure on one of my FreeBSD 7.2
production servers with no backup! I am so mad at myself for not
backing up!! Now it's a salvage operation. Here are the type of errors
I was getting on the console, over-and-over:

ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=441633503
ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout -
completing request directly
ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout -
completing request directly
ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
ad4: FAILURE - WRITE_DMA48 timed out LBA=441633375
g_vfs_done():ad4s1f[WRITE(offset=216338284544, length=16384)]error = 5

I could still login to the machine (after an eternity) but got lots of
read/write errors along the way.  The offset shown in the errors kept
changing, so I thought it was a hardware eSATA controller issue
instead of a bad sector on the drive -  I replaced the motherboard,
but the problem persisted. So I bought a new hard drive and have
re-installed FreeBSD 7.2 on it. I'd like to plug in the old hard drive
today, mount it and salvage as much as I can... especially the
database files, config files, etc.
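
The console messages above mix two kinds of position: the ad4 lines give a sector number (LBA) on the whole disk, while the GEOM line gives a byte offset within the named partition (ad4s1f). As a minimal sketch, assuming 512-byte sectors, the following pulls the fields out of the GEOM line and converts the offset and length to sectors; error 5 is EIO, a generic I/O error. The regular expression and helper name are illustrative assumptions, not an official parser.

```python
import re

SECTOR = 512  # assumed logical sector size

# Matches the "dev[OP(offset=..., length=...)]error = N" part of the line.
GEOM_ERR = re.compile(
    r"(?P<dev>\w+)\[(?P<op>READ|WRITE)\(offset=(?P<offset>\d+), "
    r"length=(?P<length>\d+)\)\]error = (?P<errno>\d+)"
)


def parse_geom_error(line):
    """Return the fields of a GEOM I/O error line, or None if it doesn't match."""
    m = GEOM_ERR.search(line)
    if m is None:
        return None
    offset = int(m.group("offset"))
    length = int(m.group("length"))
    return {
        "device": m.group("dev"),
        "op": m.group("op"),
        "first_sector": offset // SECTOR,  # sector within the partition
        "sectors": length // SECTOR,
        "errno": int(m.group("errno")),
    }
```

Collecting these offsets over time shows whether failures cluster in one region (suggesting bad sectors) or are spread across the disk (which points more toward cabling, controller or drive electronics), which is the distinction being guessed at here.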

My question: what kind of checks and/or repair tools should I run on
the damaged drive after it's mounted? Or should I mount it as
read-only and start backing it up? I am hoping most of my data is
still there, but also don't want to damage it further. I desperately
need to salvage the data, what do the kind people on this list
recommend?

thanks,
kelly

Re: hard disk failure - now what?

2009-08-24 Thread Tim Judd

On 8/24/09, Kelly Martin kellymar...@gmail.com wrote:
 I just experienced a hard drive failure on one of my FreeBSD 7.2
 production servers with no backup! I am so mad at myself for not
 backing up!! Now it's a salvage operation. Here are the type of errors
 I was getting on the console, over-and-over:

 ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=441633503
 ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout -
 completing request directly
 ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout -
 completing request directly
 ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
 ad4: FAILURE - WRITE_DMA48 timed out LBA=441633375
 g_vfs_done():ad4s1f[WRITE(offset=216338284544, length=16384)]error = 5

 I could still login to the machine (after an eternity) but got lots of
 read/write errors along the way.  The offset shown in the errors kept
 changing, so I thought it was a hardware eSATA controller issue
 instead of a bad sector on the drive -  I replaced the motherboard,
 but the problem persisted. So I bought a new hard drive and have
 re-installed FreeBSD 7.2 on it. I'd like to plug in the old hard drive
 today, mount it and salvage as much as I can... especially the
 database files, config files, etc.

 My question: what kind of checks and/or repair tools should I run on
 the damaged drive after it's mounted? Or should I mount it as
 read-only and start backing it up? I am hoping most of my data is
 still there, but also don't want to damage it further. I desperately
 need to salvage the data, what do the kind people on this list
 recommend?

 thanks,
 kelly


If I were you, I'd get a copy of spinrite (from grc.com) and always keep
it handy.  It can be risky on a drive that is already failing.  Here's what
I'd do:

Buy spinrite, no matter what.

Slave the bad drive and mount it read-only. Even if the FS is dirty,
mount it read-only; no fsck.
Copy the data you can (if any).
Reboot and run spinrite on the bad drive, deepest analysis (level 4 or
5) [may take days or weeks; there are even reports of months].
Re-slave the bad drive to the system, fsck and mount read-only.
Compare and copy any additional data you can, if applicable.

Scrap/destroy the drive if it has sensitive data.  I crack open the
drive and dismantle the HDD platters from the spindle, break the
read-write head ribbon cable, and remove the circuit board on the
drive when I destroy drives.

Each component should be recycled (being the responsible citizen),
maybe on separate runs to remove the possibility of someone nosy
getting into your stuff.

Re: hard disk failure - now what?

2009-08-24 Thread Lowell Gilbert

Kelly Martin kellymar...@gmail.com writes:

 I just experienced a hard drive failure on one of my FreeBSD 7.2
 production servers with no backup! I am so mad at myself for not
 backing up!! Now it's a salvage operation. Here are the type of errors
 I was getting on the console, over-and-over:

 ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=441633503
 ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout -
 completing request directly
 ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout -
 completing request directly
 ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
 ad4: FAILURE - WRITE_DMA48 timed out LBA=441633375
 g_vfs_done():ad4s1f[WRITE(offset=216338284544, length=16384)]error = 5

 I could still login to the machine (after an eternity) but got lots of
 read/write errors along the way.  The offset shown in the errors kept
 changing, so I thought it was a hardware eSATA controller issue
 instead of a bad sector on the drive -  I replaced the motherboard,
 but the problem persisted. So I bought a new hard drive and have
 re-installed FreeBSD 7.2 on it. I'd like to plug in the old hard drive
 today, mount it and salvage as much as I can... especially the
 database files, config files, etc.

 My question: what kind of checks and/or repair tools should I run on
 the damaged drive after it's mounted? Or should I mount it as
 read-only and start backing it up? I am hoping most of my data is
 still there, but also don't want to damage it further. I desperately
 need to salvage the data, what do the kind people on this list
 recommend?

First, try copying the entire disk, *without* mounting it.  Use dd(1) to
get a copy of the whole disk.  I believe that conv=noerror may be necessary.

-- 
Lowell Gilbert, embedded/networking software engineer, Boston area
http://be-well.ilk.org/~lowell/

Re: hard disk failure - now what?

2009-08-24 Thread Polytropon

On Mon, 24 Aug 2009 12:29:19 -0600, Kelly Martin kellymar...@gmail.com wrote:
 My question: what kind of checks and/or repair tools should I run on
 the damaged drive after it's mounted? Or should I mount it as
 read-only and start backing it up?

Thou shalt not manipulate thy file systems while they are mounted. :-)
Perform an fsck on the partitions first, then mount them ro. Copy
the files you need.

In case you can't reach essential files, you have the chance to
use forensic tools to get them.

Finally, keep in mind that for further diagnostics and restore
operations it's always wise not to use the original file systems,
i. e. the original disk. Make dd copies of the partitions onto
a working disk and use them instead. Luckily, most operations
work on plain files as well as on block device specials.



 I am hoping most of my data is
 still there, but also don't want to damage it further.

Good idea. This encourages you to follow the advice given above.



 I desperately
 need to salvage the data, what do the kind people on this list
 recommend?

BACKUPS!!! =^_^=



-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

Re: hard disk failure - now what?

2009-08-24 Thread Polytropon

On Mon, 24 Aug 2009 14:13:22 -0600, Tim Judd taj...@gmail.com wrote:
 If I were you, get a copy of spinrite (from grc.com) and always keep
 it handy.  It can be risky on a drive already failing.  Here's what
 I'd do
 
 Buy spinrite, no matter what.

Is it really such a good tool? From researching my own problems, I
found that common recovery tools are R-Studio and UFS Explorer. Neither
runs natively on BSD, but the first one offers a bootable
CD. Without buying, you can run the diagnostics mode in full.
For recovery, you need to buy the program.

The Spinrite web page reads as follows:

The industry's #1 hard drive data recovery
software is NOW COMPATIBLE with NTFS,
FAT, Linux, and ALL OTHER file systems!

What? Linux and other file systems?

Is this just marketing, in order to look good to the not very
educated ones? Or do they not know what they're talking about?

In fact, I will keep an eye on this program. Maybe it can help me
get my data back (inode defect of $HOME entry). I'm reading their
web page some more right now.



 slave the bad drive, read-only mount..  even if the FS is dirty,
 read-only.. no fsck.

You can at least do one fsck run without any modification options,
like a read only file system check. This of course can - like
any read operation on the disk - be risky if the disk is fast
degrading, simply by using it.





-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

Re: hard disk failure - now what?

2009-08-24 Thread Tim Judd

On 8/24/09, Polytropon free...@edvax.de wrote:
 On Mon, 24 Aug 2009 14:13:22 -0600, Tim Judd taj...@gmail.com wrote:
 If I were you, get a copy of spinrite (from grc.com) and always keep
 it handy.  It can be risky on a drive already failing.  Here's what
 I'd do

 Buy spinrite, no matter what.

 Is it really such a good tool? From researching my own problems, I
 found that common recovery tools are R-Studio and UFS Explorer. Neither
 runs natively on BSD, but the first one offers a bootable
 CD. Without buying, you can run the diagnostics mode in full.
 For recovery, you need to buy the program.

 The Spinrite web page reads as follows:

   The industry's #1 hard drive data recovery
   software is NOW COMPATIBLE with NTFS,
   FAT, Linux, and ALL OTHER file systems!

It's OS/FS independent.  It works on the bits stored on the magnetic
platters, NOT on a filesystem.  TiVo, Linux, BSD and Mac OS X drives
are treated the same.  Bits on a magnetic platter.  Its recovery
stems from the randomization and movement of the head to the sector in
question, which allows it to salvage any bits it can (for example, other
recovery tools will abandon 512 bytes if 1 bit cannot be read; spinrite will
recover 512 bytes minus 1 bit to one of the hard drive's spare sectors once
spinrite says "I'm done working with this sector").  It leads to a very
high success rate.


 What? Linux and other file systems?

 Is this just marketing, in order to look good to the not very
 educated ones? Or do they not know what they're talking about?

 In fact, I will keep an eye on this program. Maybe it can help me
 get my data back (inode defect of $HOME entry). I'm reading their
 web page some more right now.


Again, it works on the bits.  If it's a bit problem, it will do its best
to fix the problem, unless it's a hardware defect and the data cannot be
relocated.  If enough sectors are relocated, and the drive has run out
of spare sectors, it's time to scrap the drive anyway.


 slave the bad drive, read-only mount..  even if the FS is dirty,
 read-only.. no fsck.

 You can at least do one fsck run without any modification options,
 like a read only file system check. This of course can - like
 any read operation on the disk - be risky if the disk is fast
 degrading, simply by using it.


Which is why I recommend against making changes to the disk until a
spinrite run has completed.


Personally, I set up spinrite to be net-bootable (not officially
supported).  I can write a walkthrough for people who want to net-boot
it.  I won't provide spinrite, of course.


I currently netboot:
  FreeBSD
  memtest86
  spinrite

with no changes to my setup any time I want to boot anything.



 --
 Polytropon
 Magdeburg, Germany
 Happy FreeBSD user since 4.0
 Andra moi ennepe, Mousa, ...


Re: hard disk failure - now what?

2009-08-24 Thread Polytropon

On Mon, 24 Aug 2009 14:51:41 -0600, Tim Judd taj...@gmail.com wrote:
 It's OS/FS independent.  it works on the bits stored on the magnetic
 platters, NOT on a filesystem.

Ah, I see. So it's primarily intended for diagnosing and recovering
from physically defective disks. Good to know, because there are
times when you need to do exactly this. So it's much more hardware
oriented than the usual candidates for recovery programs.

So the strange mention of Linux and other file systems just
seems to be of a marketing nature. :-)





-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

Re: hard disk failure - now what?

2009-08-24 Thread Tim Judd

On 8/24/09, Polytropon free...@edvax.de wrote:
 On Mon, 24 Aug 2009 14:51:41 -0600, Tim Judd taj...@gmail.com wrote:
 It's OS/FS independent.  it works on the bits stored on the magnetic
 platters, NOT on a filesystem.

 Ah, I see. So it's primarily intended for diagnosing and recovering
 from physically defective disks. Good to know, because there are
 times when you exactly need to do this. So it's much more hardware
 oriented than the usual candidates for recovery programs.

 So the strange mentioning of Linux and other file systems just
 seems to be of a marketing nature. :-)

Whatever you would like to call it, I find it an accurate description of
the product, and it avoids false advertising.


Not just diagnostics and recovery; it's for preventive maintenance
and healthy operations too.  Most people who use it are in a
diagnostics and recovery situation, but if you always use it as preventive
maintenance, you'll never need to use it for diagnostics and recovery.


People complain about it: "I keep running spinrite, but it never finds
problems!"  Exactly, it's doing its job and not having to
recover.  It's doing the work the drive needs to swap out bad sectors
and everything.



 --
 Polytropon
 Magdeburg, Germany
 Happy FreeBSD user since 4.0
 Andra moi ennepe, Mousa, ...


Re: hard disk failure - now what?

2009-08-24 Thread Polytropon

On Mon, 24 Aug 2009 15:32:05 -0600, Tim Judd taj...@gmail.com wrote:
 Not just diagnostics and recovery; it's for preventive maintenance
 and healthy operations too.  Most people who use it are in a
 diagnostics and recovery situation, but if you always use it as preventive
 maintenance, you'll never need to use it for diagnostics and recovery.
 
 People complain about it: "I keep running spinrite, but it never finds
 problems!"  Exactly, it's doing its job and not having to
 recover.  It's doing the work the drive needs to swap out bad sectors
 and everything.

Well, and its price is not as high as most recovery tools.
So prevention is cheaper than intervention here. :-)



-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

Re: hard disk failure - now what?

2009-08-24 Thread Roland Smith

On Mon, Aug 24, 2009 at 12:29:19PM -0600, Kelly Martin wrote:
 I just experienced a hard drive failure on one of my FreeBSD 7.2
 production servers with no backup! I am so mad at myself for not
 backing up!!

Welcome to the club. :-)

 Now it's a salvage operation. Here are the type of errors
 I was getting on the console, over-and-over:
 
 ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=441633503
 ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout -
 completing request directly
 ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout -
 completing request directly
 ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
 ad4: FAILURE - WRITE_DMA48 timed out LBA=441633375
 g_vfs_done():ad4s1f[WRITE(offset=216338284544, length=16384)]error = 5

It _could_ just be a bad or improperly connected SATA cable. Try changing or
re-seating the cable.

Read errors cannot damage your data, but write errors can! Immediately stop
all writing to the disk. Re-mount the partitions on that disk as read-only, or
unmount them.

To see if a disk really is broken, install sysutils/smartmontools, and run
'smartctl -a' on the disk. If you see errors in its report (e.g. reallocated
sectors), the disk is dying and should be unplugged to prevent it from getting
worse.
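
A minimal sketch of the two steps above, assuming the failing disk is ad4 (as in the console errors) and that the affected filesystem is mounted at /usr, which is only a guess since the original post names the partition but not the mount point. The run helper is a hypothetical wrapper, not part of any tool: it prints each command and executes it only when DRY_RUN=0 is set.

```sh
#!/bin/sh
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

DISK=ad4      # failing disk, as named in the console errors
MNT=/usr      # ASSUMED mount point of the affected partition (ad4s1f)

stop_writes_and_check() {
    # Downgrade the mounted filesystem to read-only so nothing more is written.
    run mount -u -o ro "$MNT"
    # Print the full SMART report (smartctl comes from sysutils/smartmontools).
    run smartctl -a "/dev/$DISK"
}

stop_writes_and_check
```

The read-only remount may be refused while files on the filesystem are still open for writing; in that case stopping the services that write there, or unmounting, is the safer route.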

 My question: what kind of checks and/or repair tools should I run on
 the damaged drive after it's mounted?

As others have mentioned, first make a copy (with the disk unmounted) of the
partitions on that disk with dd, saving them to another drive. That way you
can experiment with the data without further deterioration of the
original. You can use this disk image e.g. as a vnode-backed memory disk, see
mdconfig(8). If you cannot get a good copy of the disk partitions it might be
a good idea to get a quote from a professional hard drive data recovery
company to do that for you. I've never had occasion to try this (hooray for
backups) but I've heard it can be quite expensive. :-/
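
As a rough sketch of the imaging step, assuming the damaged partition is ad4s1f (from the console errors) and that /backup is a hypothetical directory on a different, healthy drive with enough free space. The run helper is a hypothetical wrapper that only prints the commands unless DRY_RUN=0 is set.

```sh
#!/bin/sh
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

SRC=/dev/ad4s1f            # partition named in the console errors
IMG=/backup/ad4s1f.img     # HYPOTHETICAL image path on another drive

image_partition() {
    # Copy the unmounted partition. noerror keeps going past read errors;
    # sync pads short or failed input blocks with zeros so offsets stay aligned.
    run dd if="$SRC" of="$IMG" bs=64k conv=noerror,sync
    # Attach the image as a vnode-backed memory disk, /dev/md0.
    run mdconfig -a -t vnode -f "$IMG" -u 0
}

image_partition
```

With noerror,sync a read error costs the whole input block, so a smaller bs loses less data around each bad sector at the price of a slower copy.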

Try using fsck_ffs on (copies of) the disk image to see if that can repair
the damage. If the damage is beyond repair for fsck_ffs, you have a real
problem. Of course if you have a good disk image, your data is still
there, but you might have to use a forensics program like sysutils/sleuthkit
or hexdump to try and piece files together. And even then you cannot be sure
that there is no corrupted data in the files themselves. Good luck with that.
:-(
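
One way the fsck step might look, assuming a partition image already exists at /backup/ad4s1f.img (a hypothetical path) and that memory disk unit 1 is free. The point of the sketch is that fsck_ffs only ever touches a scratch copy, so the pristine image can be copied again if the repair makes things worse. The run helper is a hypothetical wrapper that only prints the commands unless DRY_RUN=0 is set.

```sh
#!/bin/sh
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

IMG=/backup/ad4s1f.img         # HYPOTHETICAL pristine image, never modified
WORK=/backup/ad4s1f.work.img   # HYPOTHETICAL scratch copy for fsck to modify

fsck_copy() {
    # Work on a copy so the pristine image stays untouched.
    run cp "$IMG" "$WORK"
    # Attach the copy as /dev/md1.
    run mdconfig -a -t vnode -f "$WORK" -u 1
    # Let fsck_ffs attempt all repairs without prompting.
    run fsck_ffs -y /dev/md1
}

fsck_copy
```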


Roland
-- 
R.F.Smith   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)



Re: Booting to root on gmirror with disk failure, is it even possible?

2007-09-05 Thread Tobias Ernst

Modulok wrote:

 Before I invest significantly more time into my current gmirror
 issues, I have but two simple questions for anyone out there:
 
 1. Has anyone used gmirror for the root partition and been able to
 successfully boot with one failed (or un-plugged) disk? It's the
 latter part of the question that is the real issue for me. I'm just
 looking for a confirmed "it's possible".

Yes, it is possible. IBM xSeries 346, FreeBSD 6.2-RELEASE, amd64. U360
hard drives. More specs are available from IBM. Using gmirror because we
only have an Adaptec HostRAID (aka FakeRAID) controller and not a
real ServeRAID, i.e. our SCSI controller basically has no useful RAID
capabilities built in.

My test case is to unplug any one disk while the system is running.
(Don't do this with your system unless your hardware is specified for
hot plugging!). FreeBSD detects a bus reset, marks the gmirror as
degraded and continues operating normally, and I can also reboot the
degraded gmirror without any problems.

The more conservative test case is to power down the system, unplug any
one disk, and restart the system. No problems with that either.

In fact, the absolutely robust behaviour of gmirror was one of my key
arguments for switching from Linux to FreeBSD :-).

Of course there are a zillion ways to fail your hard disk, and there
could be cases where one hard disk might start behaving erratically, and
gmirror might not be able to detect all such cases and might try to
continue using the failed disk. This could theoretically lead to some
nasty data integrity issues in the worst case. But this is true for any
RAID, even when implemented in hardware IMO.

Regards
Tobias

-- 
Universität Stuttgart|Fakultät für Architektur und Stadtplanung|casinoIT
70174 Stuttgart Geschwister-Scholl-Straße 24D
T +49 (0)711 121-4228 F +49 (0)711 121-4276
E [EMAIL PROTECTED]  I http://www.casino.uni-stuttgart.de

Re: Booting to root on gmirror with disk failure, is it even possible?

2007-09-05 Thread Eric Crist


On Sep 4, 2007, at 9:31 PM, Modulok wrote:

 Before I invest significantly more time into my current gmirror
 issues, I have but two simple questions for anyone out there:
 
 1. Has anyone used gmirror for the root partition and been able to
 successfully boot with one failed (or un-plugged) disk? It's the
 latter part of the question that is the real issue for me. I'm just
 looking for a confirmed "it's possible".
 
 2. If yes, what version of FreeBSD, what brand/model of hard disks,
 and what mainboard was used?

We have been using gmirror on some Dell systems for a while now, and
we put it through its paces before we deployed it to production.  We
pulled drives while the system was running, rebooted, the works.

We found gmirror to be pretty fault tolerant and were not able to get
it to fail.  When we pulled the main drive, the system was always able
to successfully boot from the second drive.  Rebuilding was always
possible, as well.
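
For reference, a rebuild after swapping a pulled or dead drive might look roughly like this. It is only a sketch: gm0 is the mirror name from Modulok's setup and da1 is a hypothetical name for the replacement disk, not anything taken from the Dell systems described here. The run helper is a hypothetical wrapper that only prints the commands unless DRY_RUN=0 is set.

```sh
#!/bin/sh
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

MIRROR=gm0      # mirror name from Modulok's setup
NEWDISK=da1     # HYPOTHETICAL device name of the replacement disk

rebuild_mirror() {
    # Show the current state; a mirror missing a disk reports DEGRADED.
    run gmirror status "$MIRROR"
    # Drop the record of the component that is no longer connected.
    run gmirror forget "$MIRROR"
    # Add the replacement disk; gmirror then synchronizes it.
    run gmirror insert "$MIRROR" "/dev/$NEWDISK"
    # Check progress of the rebuild.
    run gmirror status "$MIRROR"
}

rebuild_mirror
```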


Our tests were done on older Dell PowerEdge 1650s with Fujitsu SCSI
drives.  I don't know specifically what model/manufacturer the
motherboard is.

If there are any other questions, feel free to ask!
-
Eric F Crist
Secure Computing Networks



Booting to root on gmirror with disk failure, is it even possible?

2007-09-04 Thread Modulok

Before I invest significantly more time into my current gmirror
issues, I have but two simple questions for anyone out there:

1. Has anyone used gmirror for the root partition and been able to
successfully boot with one failed (or un-plugged) disk? It's the
latter part of the question that is the real issue for me. I'm just
looking for a confirmed "it's possible".

2. If yes, what version of FreeBSD, what brand/model of hard disks,
and what mainboard was used?

Thanks.
-Modulok-

Re: Booting to root on gmirror with disk failure, is it even possible?

2007-09-04 Thread Wojciech Puchar



 1. Has anyone used gmirror for the root partition and been able to
 successfully boot with one failed (or un-plugged) disk? It's the
 latter part of the question that is the real issue for me. I'm just
 looking for a confirmed "it's possible".

Yes, it is. But that is with a disk unplugged; with a failed disk it may
not work, depending on how the disk has failed.

 2. If yes, what version of FreeBSD, what brand/model of hard disks,
 and what mainboard was used?

Any.

Re: root on gmirror and disk failure...

2007-09-02 Thread Matthew Seaman

Modulok wrote:

 The provider, /dev/mirror/gm0, is not being created. This is
 significant for obvious reasons: no mirror means no provider which
 means no root. The module /boot/kernel/geom_mirror.ko is loaded, as I
 have manually loaded it via the loader(8) prompt and attempted the
 simulated failure again with identical results. It's the provider that
 isn't being created, for whatever reason.
 
 Suggestions? (Other than purchasing a hardware RAID card).

I've seen these symptoms before.  It appears to be a feature of
certain motherboards.  If you're lucky there will be some BIOS
options you can toggle to make it behave better -- usually to do
with telling the motherboard *not* to do any sort of RAID stuff itself.

Otherwise, look for BIOS updates, or switch to a different
motherboard.  Systems supplied with hot swap drives (meaning SATA
rather than IDE) tend to work better.  Or install a hardware RAID
controller.

I've also a feeling that there's a behavioural difference between
'only one drive present' and 'one working drive and a blank disk'
but haven't really tested that theory out, as we solved the original
problem by other means.

Cheers,

Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.   7 Priory Courtyard
  Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey Ramsgate
  Kent, CT11 9PW

root on gmirror and disk failure...

2007-09-01 Thread Modulok

BACKGROUND
I'm experimenting with gmirror on the root partition on a test system
running FreeBSD 6.1-RELEASE. I have two physical disks on the same IDE
header (master and slave). This was set up following the tutorial in
the handbook. The system boots and mounts root from /dev/mirror/gm0s1a
as expected. The reported status of the mirror was active, complete,
up and running with both disks synced. Everything is as it should be.

PROBLEM
When I simulate a disk failure on the slave drive (the first consumer
in the mirror), by disconnecting the power input and booting the
system, the following occurs:
1. The BIOS complains of a disk error, but otherwise continues to boot.
2. The master boot record is found and boot is executed.
3. boot complains, letting me know the default disk it boots from
is gone and prompts for manual specification. This is fine and I
specify the alternate disk to boot from.
4. loader(8) is found and presents its menu, it starts the
automatic boot sequence with the default kernel.
5. It successfully finds the kernel and executes it.
6. The kernel outputs "Trying to mount root from
ufs:/dev/mirror/gm0s1a", fails, and I am prompted for the location of a
root partition. I digress just a moment here as the system appears to
hang, as keyboard input is not detected and at this point in the boot
phase I have no other metric from which to judge whether it is
actually hung or not. The keyboard was detected prior to this
failure, as reported in the kernel's output, and the keyboard is known
to work on this system. So that's weird, but even if I could type
in something, it wouldn't matter because it's looking in the correct
location for root, it's just that location doesn't exist...moving
right along.

The provider, /dev/mirror/gm0, is not being created. This is
significant for obvious reasons: no mirror means no provider which
means no root. The module /boot/kernel/geom_mirror.ko is loaded, as I
have manually loaded it via the loader(8) prompt and attempted the
simulated failure again with identical results. It's the provider that
isn't being created, for whatever reason.

Suggestions? (Other than purchasing a hardware RAID card).
-Modulok-

Re: Hard Disk failure

2004-10-09 Thread Mike Woods

Dean Hollister wrote:

 Yep, the kernel reports it cannot read a couple sectors at bootup.
 
 Is it just a case of fdisk'ing/label'ing the new drive with a standard
 MBR, setting up the filesystems and copying to them. Then the new
 drive should just boot normally?

Pretty much, I've done it a few times and never had a problem, 'tis also
a good time to make any changes to your partition structure :)

Remember to make any changes to the fstab that might need doing, like
moved partitions or differing device names!


Mike Woods
IT Technician

Re: Hard Disk failure

2004-10-09 Thread Lowell Gilbert

Mike Woods [EMAIL PROTECTED] writes:

 Dean Hollister wrote:
 
  Yep, the kernel reports it cannot read a couple sectors at bootup.
 
  Is it just a case of fdisk'ing/label'ing the new drive with a
  standard MBR, setting up the filesystems and copying to them. Then
  the new drive should just boot normally?
 
 Pretty much, I've done it a few times and never had a problem, 'tis
 also a good time to make any changes to your partition structure :)
 
 Remember to make any changes to the fstab that might need doing like
 moved partitions or differing device names!

It's also worth pointing out that you really want to do the copy with
dump(8) and restore(8) to get file flags and special files copied
properly...  
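
A sketch of what that copy could look like for the /usr case in this thread, with the old filesystem mounted read-only. The partition name ad1s1f and the /mnt mount point are assumptions for illustration; the new drive is taken to be already sliced and labelled. The run helper is a hypothetical wrapper that only prints the commands unless DRY_RUN=0 is set.

```sh
#!/bin/sh
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

SRC=/usr        # filesystem on the failing drive, mounted read-only
NEW=ad1s1f      # HYPOTHETICAL partition on the new drive
MNT=/mnt        # ASSUMED temporary mount point for the new filesystem

copy_fs() {
    # Create and mount the new filesystem.
    run newfs "/dev/$NEW"
    run mount "/dev/$NEW" "$MNT"
    # Level 0 dump to stdout, restored into the new filesystem.
    # The pipeline is printed as one line and only runs when DRY_RUN=0.
    echo "+ cd $MNT && dump -0af - $SRC | restore -rf -"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        ( cd "$MNT" && dump -0af - "$SRC" | restore -rf - )
    fi
}

copy_fs
```

Since the source disk has bad sectors, dump may still report read errors for the affected blocks, so the files it complains about are worth checking afterwards.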

Re: Hard Disk failure

2004-10-09 Thread Danny MacMillan

On Fri, Oct 08, 2004 at 04:07:27PM -0600, Dean Hollister wrote:
 
 Dear All,
 
 A quick question, and I've searched the FAQ/Handbook to no avail...
 
 One of the machines I maintain has developed bad sectors on its /usr 
 filesystem. I can mount the filesystem R/O, so is it possible to install a 
 new drive, partition it in an identical fashion to the faulty drive and 
 copy the filesystems across to the new drive and then boot from the new 
 drive?
 
 Is there a walkthrough on the best way to do this?
 

This sounds like what you're looking for:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/disks.html#NEW-HUGE-DISK

I've never done this but it seems to be the traditional recommendation for
this sort of thing.

 Regards,
 
 d.

-- 
Danny

Re: Hard Disk failure

2004-10-08 Thread Benjamin P. Keating

Hey Dean,

Everything is a file in the UNIX world, so copying over file for file
is no problem. Be sure that you preserve permissions (aka 'archive
mode'); preserving ownership and permissions is vital. ``cp'' should
do everything you need in this case.

Are you sure there are bad sectors? Can you attach your dmesg output
(just the relevant section please).

HTH
Ben


On Sat, 9 Oct 2004 06:07:27 +0800 (WST), Dean Hollister
[EMAIL PROTECTED] wrote:
 
 Dear All,
 
 A quick question, and I've searched the FAQ/Handbook to no avail...
 
 One of the machines I maintain has developed bad sectors on its /usr
 filesystem. I can mount the filesystem R/O, so is it possible to install a
 new drive, partition it in an identical fashion to the faulty drive and
 copy the filesystems across to the new drive and then boot from the new
 drive?
 
 Is there a walkthrough on the best way to do this?
 
 Regards,
 
 d.

firewire disk failure

2003-11-15 Thread Ryan Clancey

about a month ago, i bought a maxtor external firewire disk.  it worked
great for the majority of that time.  today, though, it appears to have
failed quite spectacularly.  my machine was hung, and after rebooting, i
ran fsck on the disk, and got the following error:

(da0:sbp0:0:0:0): READ(10). CDB: 28 08 cb 24 95 0 0 1 0
(da0:sbp0:0:0:0): CAM Status: SCSI Status Error
(da0:sbp0:0:0:0): SCSI Status: Check Condition
(da0:sbp0:0:0:0): MEDIUM ERROR asc:4b,0
(da0:sbp0:0:0:0): Data phase error
(da0:sbp0:0:0:0): Retrying Command (per Sense Data)

repeated 5 times, then a bunch of fsck filesystem errors.  rinse,
lather, repeat, ad nauseam.

my question is this: what is the most likely culprit?  the kernel (5.1),
the disk, the enclosure, or the firewire card (d-link dfb-a5)?  has
anyone had problems with any of these components?  my concern is, which
one needs to be replaced?  the disk most certainly, but i don't want to
do that and find it failing again next month.

thanks for any input.
-ryan clancey


Disk Failure?

2003-07-15 Thread Stephen Bader

Whenever I try to dump /var on one of our machines, I receive the
following errors from dump:

  DUMP: 30.11% done, finished in 1:56
  DUMP: read error from /dev/da1s1e: Invalid argument: [block -1245853416]: count=16384
  DUMP: read error from /dev/da1s1e: Invalid argument: [sector -1245853416]: count=512
  DUMP: read error from /dev/da1s1e: Invalid argument: [sector -1245853415]: count=512
  DUMP: read error from /dev/da1s1e: Invalid argument: [sector -1245853414]: count=512

It prints the above error about 500 times, and then completes the dump
(but I'm sure the data is not all there, due to the above error).

My question is, is this an indicator of a failing disk, or just a disk
that needs to be fsck'd? This is a very busy machine, and I'm also curious
if this may be caused by the fact that logs and email are being written to
the partition while the dump is going on, and if that would cause the
above error.

TIA!

-Steve



Re: Disk failure (hardware or config problem?) Vinum

2003-03-03 Thread Greg 'groggy' Lehey

[Format recovered--see http://www.lemis.com/email/email-format.html]

Log output wrapped.

On Tuesday, 25 February 2003 at  0:24:27 -0300, David Feig wrote:
 I have been playing with Vinum and my first serious experiment resulted
 in a serious failure. I am not sure if my hard drive just chose this
 moment to fail or if it is a configuration problem but I can't seem to
 do anything with the drive anymore:

 ...
 Feb 24 21:37:17 hoho /kernel: ad7s1e: hard error writing fsbn 8 (ad7s1 bn 8; cn 0 tn 
 0 sn 8)ad7s1e: hard error writing fsbn 8 (ad7s1 bn 8; cn 0 tn 0 sn 8) status=51 
 error=04

That's a hardware error.

Greg
--
When replying to this message, please copy the original recipients.
If you don't, I may ignore the reply or reply to the original recipients.
For more information, see http://www.lemis.com/questions.html
See complete headers for address and phone numbers
Please note: we block mail from major spammers, notably yahoo.com.
See http://www.lemis.com/yahoospam.html for further details.



Disk failure (hardware or config problem?) Vinum

2003-02-24 Thread David Feig

I have been playing with Vinum and my first serious experiment resulted 
in a serious failure. I am not sure if my hard drive just chose this 
moment to fail or if it is a configuration problem but I can't seem to 
do anything with the drive anymore:

I have a 40 G IBM drive that I have not really ever used. I configured
it with Vinum to be a single volume with a single plex and a single
drive (can't get much simpler than that.) After newfs'ing it I proceeded
to copy about 35G of data to it. While doing other things my system
suddenly froze. I could not switch to a virtual terminal. The machine
was no longer functioning as a gateway but I could still ping it. I had
no other option than to reboot.

On reboot fsck successfully cleaned up various inconsistencies on various
partitions (not the vinum volume.)

The last message in /var/log/messages was:

Feb 24 07:23:47 hoho /kernel: pid 756 (kdeinit), uid 1000 on /: file
system full

Although none of my file systems are full after reboot.

The drive ad7 still shows up in my boot messages. When I started vinum
I got this in /var/log/messages:



Feb 24 21:36:47 hoho /kernel: vinum: loaded
Feb 24 21:37:17 hoho /kernel: vinum: drive a is up
Feb 24 21:37:17 hoho /kernel: vinum: simplevinum.p0.s0 is up
Feb 24 21:37:17 hoho /kernel: vinum: simplevinum.p0 is up
Feb 24 21:37:17 hoho /kernel: vinum: simplevinum is up
Feb 24 21:37:17 hoho /kernel: ad7s1e: hard error writing fsbn 8 (ad7s1 
bn 8; cn 0 tn 0 sn 8)ad7s1e: hard error writing fsbn 8 (ad7s1 bn 8; cn 
0 tn 0 sn 8) status=51 error=04
Feb 24 21:37:17 hoho /kernel: ad7: DMA problem fallback to PIO mode
Feb 24 21:37:17 hoho /kernel: vinum: Can't write config to /dev/ad7s1e, 
error 5
Feb 24 21:37:17 hoho /kernel: vinum: drive a is down
Feb 24 21:37:17 hoho /kernel: vinum: simplevinum.p0.s0 is crashed
Feb 24 21:37:17 hoho /kernel: vinum: simplevinum.p0 is faulty
Feb 24 21:37:17 hoho /kernel: vinum: simplevinum is down

I then tried to reconfigure the drive: 

su-2.05# dd if=/dev/zero of=/dev/ad7 bs=1k count=1
dd: /dev/ad7: Input/output error
1+0 records in
0+0 records out
0 bytes transferred in 0.014675 secs (0 bytes/sec)

with the /var/log/messages output of:
Feb 25 00:16:50 hoho /kernel: ad7: hard error writing fsbn 0 of 0-1 
(ad7 bn 0; cn 0 tn 0 sn 0) status=51 error=04

Note, I have an identical drive on the same IDE controller and it is 
still working fine, so I don't think it is a controller problem. 

Is my drive toast?

-- David 


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-questions in the body of the message
