
Re: dealing with a failing drive

2007-11-25 Thread David Newman

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 11/25/07 9:08 PM, Ted Mittelstaedt wrote:

> There are two physical disks in the server.  bus 1 target 0 and
> bus 1 target 1.  Those ARE the physical disks.  If one of them
> has failed instead of:
> 
>  Sync, Ultra2, Wide - Configured in a logical volume.
> 
> you will see something like:
> 
>  Sync, Ultra2, Wide - Unconfigured
> 
> or nothing at all.

Cool, thanks. Your output and mine are virtually identical.

Now I get what you mean by running idacontrol periodically and grokking
the output to verify both disks are still in the array.

> 
> It is normal for idacontrol to generate soft write errors.  The
> developer knows about this.  There's really no easy way to make
> it not happen.  It doesn't hurt anything, however.

OK, good to know.

thanks much!

dn

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.3 (Darwin)

iD8DBQFHSlXUyPxGVjntI4IRAlbxAJ0aZDSOeyrTIoEVtKOZd5UMbDMx9QCdHP8I
TAh9zWa+2cUlE5Qh2qfks2Y=
=iEK3
-END PGP SIGNATURE-
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

RE: dealing with a failing drive

2007-11-25 Thread Ted Mittelstaedt

Are we looking at the same output?

Here's the output of idacontrol show from one of my DL360 servers:

mail# idacontrol show
cmd_show_all()
[Compaq Integrated Array controller]
  Controller uptime: 301 hours 54 minutes 22 seconds
   Firmware Version: 1.50 (running) 1.50 (ROM)
Revision -
   Hardware: 2
  Marketing: A
 SCSI bus count: 2
 Max drives per bus: 16
Maximum request: 65535 blocks

Logical drive 0: 17359MB (35553120 sectors), blocksize=512
 Status: Logical drive ok
   Mode: Mirroring (RAID1)
   Drive ID: 
Drive Label:
bus 1 target 0 lun 0:
enclosure 0, bay 0, connector 2J
 direct-access
17361MB (35556888 512 byte sectors, 1088 reserved)
Sync, Ultra2, Wide - Configured in a logical volume.
bus 1 target 1 lun 0:
enclosure 0, bay 1, connector 2J
 direct-access
17361MB (35556888 512 byte sectors, 1088 reserved)
Sync, Ultra2, Wide - Configured in a logical volume.
bus 1 target 7 lun 0:
enclosure 0, bay 7, connector 2J
 non-disk
Async
mail#

There are two physical disks in the server: bus 1 target 0 and
bus 1 target 1.  Those ARE the physical disks.  If one of them
has failed, then instead of:

 Sync, Ultra2, Wide - Configured in a logical volume.

you will see something like:

 Sync, Ultra2, Wide - Unconfigured

or nothing at all.
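
A minimal sketch of that check in /bin/sh, assuming the wording in the output above stays the same across idacontrol versions. The function names and the expected count of 2 are illustrative, not part of idacontrol:

```sh
#!/bin/sh
# Count the physical drives that the controller reports as array members.
# Reads `idacontrol show` output on stdin.
count_configured() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep going.
    grep -c 'Configured in a logical volume' || true
}

# check_members EXPECTED
# Succeeds only if exactly EXPECTED drives are configured in a logical volume.
# A failed drive shows up as "Unconfigured" or is missing, so the count drops.
check_members() {
    expected=$1
    found=$(count_configured)
    if [ "$found" -eq "$expected" ]; then
        echo "ok: $found of $expected drives configured"
    else
        echo "DEGRADED: $found of $expected drives configured"
        return 1
    fi
}
```

On the two-disk mirror above this would be run as `idacontrol show | check_members 2`.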

It is normal for idacontrol to generate soft write errors.  The
developer knows about this.  There's really no easy way to make
it not happen.  It doesn't hurt anything, however.

If the RAID card itself is flakey you can't really tell it from
software.  Even the Windows RAID utilities that HP/Compaq supplies
won't tell you this.

The "by the book" way of troubleshooting these servers is that if you get
a disk failure, you immediately swap the disk.  Then if the failure
happens again and you're pretty sure it's not the disk, you down the
server, boot it into Compaq Diagnostics and let it run for a day or so.

It is not uncommon to end up with several additional hard drives
that you don't need in the process of identifying a bad RAID card
in a server.  We have all done it, it is part of the territory.  If
you cannot afford it, stay away from these servers.  Remember these
servers are designed for a medium to large corporation that has
a lot of resources.

To give you a typical scenario, a couple weeks ago one of our mailservers
running on a Proliant 1600R started freezing up.  I had the admin
pull the entire disk array and put the disks into our backup server,
that went online in place of the original server, and the original
server was pulled and put on a test bench.  About a week later the
admin finally discovered the processor board had worked its way
almost out of the socket, after much hair-pulling, running of
diagnostics, and so on.

Ted

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of David Newman
> Sent: Sunday, November 25, 2007 2:58 PM
> To: Ted Mittelstaedt
> Cc: freebsd-questions@freebsd.org
> Subject: Re: dealing with a failing drive
>
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On 11/24/07 12:39 PM, Ted Mittelstaedt wrote:
> > The output of idacontrol show will show if one of the
> > hard disks in the SmartArray has failed.  Your choice with
> > a hardware array is to either run it with redundancy or not.
> > (ie: raid5 or mirroring or striping)  You have to choose
> > which is more important for you.
> >
> > IMHO it is very foolish to stripe an array that you have
> > critical data on and assume that you can predict a failure
> > of a disk using smart or other monitoring, and replace it
> > in advance of a failure.  If your concern is redundancy, then
> > add more disks to the array and create a raid 5 or a mirror.
> > Then ignore all the predictive junk and let the array card
> > concern itself with detecting if a drive has failed.  Run
> > idacontrol periodically out of a script that checks for a
> > failure of a disk and e-mails you if there is one.
>
> Thanks, this is good advice, but it doesn't answer the specific
> questions I had:
>
> 1. How to diagnose the health of a *physical* disk that's part of a RAID
> array (RAID1, in this case) in an old Compaq Proliant server?
>
> 2. Is it normal for idacontrol to generate soft write errors?
>
> Backstory here is that Proliant server #1 generated beaucoup hard and
> soft read and write errors and eventually locked up. I thought it was
> one of the disks but replacing one at a time didn't help. So I took both
> disks and put them in identical Proliant server #2. Ergo, I would
> conclude server #1's RAID controller flaked

Re: dealing with a failing drive

2007-11-25 Thread Bob Richards

On Sun, 25 Nov 2007 08:45:46 +
Matthew Seaman <[EMAIL PROTECTED]> wrote:

> sysutils/aaccli  aaccli-1.0  Adaptec SCSI RAID administration
>

As I said in my previous post, this is EXACTLY what was wanted.

Installation of aaccli was a snap. My only problem was the total lack
of documentation: no man page, no info file. Capturing the "help"
screens within the CLI was useful, but pretty incomplete.

I found an Adaptec doc, describing their cli-sata-scsi-iug program;
http://download.adaptec.com/pdfs/installation_guides/cli-sata-scsi-iug.pdf 

This seems to be exactly what aaccli is. 

Since I usually do this sort of work outside of X, at the console, I
converted the Adaptec PDF file into a text file using pdftotext. The
ridiculous copyright restrictions on this file prevent me from
producing a man page or an info file for redistribution as part of the
port!

So, if anyone wants either the PDF file or the converted text file, I
would be glad to email it. Just send an email to
[EMAIL PROTECTED] and ask for
either my /usr/local/share/cli/cli-sata-scsi-iug.pdf or my
/usr/local/share/cli/cli-sata-scsi-iug.txt.

Bob

-- 
  _
 /o\
// \\ The ASCII
\\ // Ribbon Campaign
 \V/  Against HTML
 /A\  eMail!
// \\

Re: dealing with a failing drive

2007-11-25 Thread David Newman

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 11/24/07 12:39 PM, Ted Mittelstaedt wrote:
> The output of idacontrol show will show if one of the
> hard disks in the SmartArray has failed.  Your choice with
> a hardware array is to either run it with redundancy or not.
> (ie: raid5 or mirroring or striping)  You have to choose 
> which is more important for you.
> 
> IMHO it is very foolish to stripe an array that you have
> critical data on and assume that you can predict a failure
> of a disk using smart or other monitoring, and replace it
> in advance of a failure.  If your concern is redundancy, then
> add more disks to the array and create a raid 5 or a mirror.
> Then ignore all the predictive junk and let the array card
> concern itself with detecting if a drive has failed.  Run
> idacontrol periodically out of a script that checks for a
> failure of a disk and e-mails you if there is one.

Thanks, this is good advice, but it doesn't answer the specific
questions I had:

1. How to diagnose the health of a *physical* disk that's part of a RAID
array (RAID1, in this case) in an old Compaq Proliant server?

2. Is it normal for idacontrol to generate soft write errors?

Backstory here is that Proliant server #1 generated beaucoup hard and
soft read and write errors and eventually locked up. I thought it was
one of the disks but replacing one at a time didn't help. So I took both
disks and put them in identical Proliant server #2. Ergo, I would
conclude server #1's RAID controller flaked out.

idacontrol is useful for telling the health of the logical disk. What it
doesn't tell me (or maybe I just don't see it) is whether the physical
disks are ok, and those "soft write errors" concern me. I had a failure
situation, and need to figure out whether just the controller was bad or
whether I need to replace at least one disk too.

Thanks again!

dn

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.3 (Darwin)

iD8DBQFHSf39yPxGVjntI4IRAp1yAJ4vMV9FkeaBsHRr/Z5WpCL27wJ3tACfS+pT
3UVlscnQUZhe8ulHksKDWsY=
=Om7/
-END PGP SIGNATURE-

Re: dealing with a failing drive

2007-11-25 Thread Bob Richards

On Sun, 25 Nov 2007 08:45:46 +
Matthew Seaman <[EMAIL PROTECTED]> wrote:

>... it's a rebadged Adaptec RAID controller using
> the aac

Wonderful; I can now look into and play with the RAID system without
taking the OS off-line and going to the bios.

Thanks!
Bob

-- 
  _
 /o\
// \\ The ASCII
\\ // Ribbon Campaign
 \V/  Against HTML
 /A\  eMail!
// \\

Re: dealing with a failing drive

2007-11-25 Thread Matthew Seaman

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Bob Richards wrote:

> I have a similar issue, only it is with a Dell server which has 6 SCSI
> drives in a hardware raid array. The controller is a Dell PERC 2/Si.
> 
> Is there an equivalent monitor utility for this as well? I am currently
> running: FreeBSD 6.1-RELEASE-p20 #2.

If that's a rebadged LSI MegaRAID card and uses the amr driver under
FreeBSD, then there are two packages that may be of interest:

sysutils/amrstat  amrstat-20070216  Utility for LSI Logic's MegaRAID RAID controllers
sysutils/megarc   megarc-1.51       LSI Logic's MegaRAID controlling software

On the other hand, if it's a rebadged Adaptec RAID controller using
the aac driver under FreeBSD then you want:

sysutils/aaccli  aaccli-1.0  Adaptec SCSI RAID administration tool

Cheers,

Matthew

- -- 
Dr Matthew J Seaman MA, D.Phil.   7 Priory Courtyard
  Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey Ramsgate
  Kent, CT11 9PW
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.4 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHSTY68Mjk52CukIwRCPEqAJ9Pc4YyFagh7y9jmA2SPOUv7+2bJgCfd21K
IGMSIdhSznOl9WTms5Oc0NI=
=JgO2
-END PGP SIGNATURE-

Re: dealing with a failing drive

2007-11-24 Thread Bob Richards


> Compaq uses several RAID cards most are under the so-called
> "SmartArray" using the ida driver.  If this is yours, you can
> use a utility called "idacontrol" that can monitor the array,

Interesting discussion!

I have a similar issue, only it is with a Dell server which has 6 SCSI
drives in a hardware raid array. The controller is a Dell PERC 2/Si.

Is there an equivalent monitor utility for this as well? I am currently
running: FreeBSD 6.1-RELEASE-p20 #2.

TIA

Bob



RE: dealing with a failing drive

2007-11-24 Thread Ted Mittelstaedt

The output of idacontrol show will show if one of the
hard disks in the SmartArray has failed.  Your choice with
a hardware array is to either run it with redundancy or not
(i.e. RAID 5 or mirroring versus striping).  You have to choose
which is more important for you.

IMHO it is very foolish to stripe an array that you have
critical data on and assume that you can predict a failure
of a disk using smart or other monitoring, and replace it
in advance of a failure.  If your concern is redundancy, then
add more disks to the array and create a raid 5 or a mirror.
Then ignore all the predictive junk and let the array card
concern itself with detecting if a drive has failed.  Run
idacontrol periodically out of a script that checks for a
failure of a disk and e-mails you if there is one.
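
One way such a script could look, as a sketch only. It assumes the "Status: Logical drive ok" line that idacontrol prints for a healthy array and the stock mail(1) command; the function name and the recipient address are placeholders:

```sh
#!/bin/sh
# alert_if_degraded RECIPIENT
# Reads `idacontrol show` output on stdin. If any "Status:" line is not
# "Logical drive ok", or no Status line is present at all, mails the details
# to RECIPIENT and returns 1. Returns 0 and sends nothing when all is well.
alert_if_degraded() {
    status=$(grep 'Status:' || true)
    if [ -z "$status" ]; then
        bad='no Status line in idacontrol output'
    else
        bad=$(printf '%s\n' "$status" | grep -v 'Logical drive ok' || true)
    fi
    if [ -z "$bad" ]; then
        return 0
    fi
    printf '%s\n' "$bad" | mail -s "RAID problem on $(hostname)" "$1"
    return 1
}
```

Saved in a script whose last line is `idacontrol show | alert_if_degraded you@example.org`, this can be run from cron, for instance once an hour.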

Ted

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of David Newman
> Sent: Monday, November 19, 2007 8:44 AM
> To: freebsd-questions@freebsd.org
> Subject: Re: dealing with a failing drive
> 
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> On 11/18/07 11:30 PM, Ted Mittelstaedt wrote:
> > Hi David,  apologies to Jerry for jumping in.
> > 
> > Compaq uses several RAID cards most are under the so-called
> > "SmartArray" using the ida driver.  If this is yours, you can
> > use a utility called "idacontrol" that can monitor the array,
> 
> Hi Ted,
> 
> Thanks much for this info. I'm pleased to report that idacontrol thinks
> the logical array is in good shape. (This is on an identical server; I
> moved both disks from a RAID1 array there after the first server started
> reporting write and read errors.)
> 
> 
> > NOTE:
> > 
> > The smart utility only works on SATA or ATA/IDE drives, not SCSI.
> 
> Yes. I've heard it said that "SMART isn't."
> 
> This Proliant DL320 server uses a SmartArray controller and SCSI disks.
> SMART or not, is there a way of monitoring the health of the physical
> disks from within FreeBSD?
> 
> thanks again!
> 
> dn
> 
> 
> 
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.3 (Darwin)
> 
> iD8DBQFHQb1WyPxGVjntI4IRAhZwAKCzS4yKRyeJZDXm2pq+aIL8VMBKQQCfUpq3
> +eThP189Kav2DSRVAgDdbDI=
> =coqi
> -END PGP SIGNATURE-

Re: dealing with a failing drive

2007-11-22 Thread David Newman

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 11/18/07 11:30 PM, Ted Mittelstaedt wrote:

> idacontrol show | grep "Status"
> 
> IF status is fully up it will say:
> 
> Status: Logical drive ok

And that's what it does say. So far so good...

...but then each time I run idacontrol I get this in /var/log/messages:

Nov 21 17:01:30 mail kernel: ida0: soft error
Nov 21 17:01:36 mail last message repeated 59 times

Does this mean the controller is OK and the disks are dying? Or is it
expected behavior with idacontrol? Or something else?

thanks

dn

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.3 (Darwin)

iD8DBQFHRNeLyPxGVjntI4IRAkigAJ41KeUVpDfNab6f/F/eHcSCrJLMrwCdHLos
eYOqGGn8K3RV1l/okGwuYp4=
=U4Tx
-END PGP SIGNATURE-

Re: Failing Drive

2007-11-19 Thread Chad Gross

On Nov 16, 2007 5:05 PM, Douglas Rodriguez <[EMAIL PROTECTED]> wrote:
> I've been getting the following message repeating continuously:
>
> ad1: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=216026367
> g_vfs_done():ad1s1[READ(offset = 110605467648, length = 16384)]error=5
> ad1: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=216026367
> g_vfs_done():ad1s1[READ(offset = 110605467648, length = 16384)]error=5
> ad1: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=216026367
> g_vfs_done():ad1s1[READ(offset = 110605467648, length = 16384)]error=5
> 
>
> The same thing repeats every so often.  What does this mean?  I've read
> other threads (Drives Dieing) about possibly shutting down dma or
> reinstalling the system, but is that the best solution to this kind of
> problem?
>
> Thanks.
>
> ~Doug
>
>

One of the first things you can do is install sysutils/smartmontools.
This package gives you access to the S.M.A.R.T. functionality of your
drives. Of course, your drives need to support S.M.A.R.T. and have it
enabled. After installing, you can check whether your drives support it
by using the smartctl command. This is also the command that you will
use to run tests and check the results.
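
For reference, a sketch of the usual smartctl sequence, using the ad1 device from the original post. The helper function is hypothetical and only understands the health line that smartctl prints for ATA drives:

```sh
#!/bin/sh
# Typical smartmontools invocations (run as root; /dev/ad1 is the disk
# named in the log messages quoted above):
#
#   smartctl -i /dev/ad1            # identity; shows whether SMART is supported and enabled
#   smartctl -s on /dev/ad1         # enable SMART if it is supported but off
#   smartctl -t short /dev/ad1      # start a short self-test
#   smartctl -l selftest /dev/ad1   # read the self-test log afterwards
#   smartctl -H /dev/ad1            # overall health verdict

# smart_health_ok: reads `smartctl -H` output on stdin and succeeds only
# if the ATA overall-health line says PASSED.
smart_health_ok() {
    grep -q 'overall-health self-assessment test result: PASSED'
}
```

For example: `smartctl -H /dev/ad1 | smart_health_ok || echo "ad1 is reporting trouble"`.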

Check out their homepage for more info: http://smartmontools.sourceforge.net/

Regards

-- 
Chad M. Gross

Re: dealing with a failing drive

2007-11-19 Thread David Newman

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 11/18/07 11:30 PM, Ted Mittelstaedt wrote:
> Hi David,  apologies to Jerry for jumping in.
> 
> Compaq uses several RAID cards most are under the so-called
> "SmartArray" using the ida driver.  If this is yours, you can
> use a utility called "idacontrol" that can monitor the array,

Hi Ted,

Thanks much for this info. I'm pleased to report that idacontrol thinks
the logical array is in good shape. (This is on an identical server; I
moved both disks from a RAID1 array there after the first server started
reporting write and read errors.)

> NOTE:
> 
> The smart utility only works on SATA or ATA/IDE drives, not SCSI.

Yes. I've heard it said that "SMART isn't."

This Proliant DL320 server uses a SmartArray controller and SCSI disks.
SMART or not, is there a way of monitoring the health of the physical
disks from within FreeBSD?

thanks again!

dn

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.3 (Darwin)

iD8DBQFHQb1WyPxGVjntI4IRAhZwAKCzS4yKRyeJZDXm2pq+aIL8VMBKQQCfUpq3
+eThP189Kav2DSRVAgDdbDI=
=coqi
-END PGP SIGNATURE-

RE: dealing with a failing drive

2007-11-18 Thread Ted Mittelstaedt



> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Jerry
> McAllister
> Sent: Monday, November 12, 2007 8:04 AM
> To: David Newman
> Cc: freebsd-questions@freebsd.org
> Subject: Re: dealing with a failing drive
> 
> 
> On Sat, Nov 10, 2007 at 05:22:06PM -0800, David Newman wrote:

> > I vaguely remember trying about a year ago to load a SMART utility from
> > the ports collection but it wouldn't work on drives in a RAID array.
> > 
> > Is there some other way to:
> > 
> > a) diagnose/fix the errant disk here?
> > b) monitor the health of disks on a Compaq controller so it doesn't get
> > to this point to begin with?
> > 

Hi David,  apologies to Jerry for jumping in.

Compaq uses several RAID cards; most fall under the so-called
"SmartArray" name and use the ida driver.  If this is yours, you can
use a utility called "idacontrol" to monitor the array.
Here are the instructions for using it.  You will need the usr.sbin
sources (/usr/src/usr.sbin) installed:


) Install idacontrol


# Create a scratch directory for the tarball (-p: no error if it already exists)
mkdir -p /usr/ports/distfiles/manual-build
cd /usr/ports/distfiles/manual-build
fetch ftp://ftp.jurai.net/users/winter/idacontrol.tar

# Unpack into the source tree
cd /usr/src
tar xf /usr/ports/distfiles/manual-build/idacontrol.tar

cd /usr/src/usr.sbin/idacontrol

# Edit the makefile: change the variable NOMAN to NO_MAN
vi makefile
make obj && make depend && make && make install

cd

idacontrol show | grep "Status"

If the array is fully up it will say:

Status: Logical drive ok

If the array is degraded it will show one of several other messages instead.


More on PR i386/70482
and on thread:
http://lists.freebsd.org/pipermail/freebsd-scsi/2005-September/002009.html


NOTE:

The smart utility only works on SATA or ATA/IDE drives, not SCSI.

Ted

Re: Failing Drive

2007-11-16 Thread Kevin Kinsey


Douglas Rodriguez wrote:

I've been getting the following message repeating continuously:

ad1: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=216026367
g_vfs_done():ad1s1[READ(offset = 110605467648, length = 16384)]error=5
ad1: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=216026367
g_vfs_done():ad1s1[READ(offset = 110605467648, length = 16384)]error=5
ad1: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=216026367
g_vfs_done():ad1s1[READ(offset = 110605467648, length = 16384)]error=5


The same thing repeats every so often.  What does this mean?  I've read
other threads (Drives Dieing) about possibly shutting down dma or
reinstalling the system, but is that the best solution to this kind of
problem?



Backup, backup, backup ;-)

You'll need a Real Expert(tm) to help on the ILLEGAL_LENGTH error, but
I've seen UNCORRECTABLE plenty.  Keep in mind that it may cost some time
and energy to find out; apart from a bad disk, it could be a bad disk *controller*.
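
The numbers in the quoted log can at least be cross-checked. A small sketch, assuming 512-byte sectors and that slice ad1s1 starts at sector 63 (the traditional MBR layout; the thread does not say):

```sh
#!/bin/sh
# offset_to_lba BYTE_OFFSET SLICE_START
# Convert a byte offset inside a slice (as printed by g_vfs_done) into an
# absolute sector number on the disk (as printed in the READ_DMA line).
offset_to_lba() {
    echo $(( $1 / 512 + $2 ))
}

offset_to_lba 110605467648 63    # prints 216026367
```

Under those assumptions the result matches the LBA in the READ_DMA line, which suggests the failure is at the first sector of the 16384-byte read, that is, one stubborn spot on the disk rather than errors scattered around.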

I bought two new HDDs recently because of similar problems, but all of
them are now working fine on a new motherboard :-/

Sorry no help here :-/

Kevin Kinsey
--
Recursion: n. See Recursion.
-- Random Shack Data Processing Dictionary

Failing Drive

2007-11-16 Thread Douglas Rodriguez

I've been getting the following message repeating continuously:

ad1: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=216026367
g_vfs_done():ad1s1[READ(offset = 110605467648, length = 16384)]error=5
ad1: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=216026367
g_vfs_done():ad1s1[READ(offset = 110605467648, length = 16384)]error=5
ad1: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=216026367
g_vfs_done():ad1s1[READ(offset = 110605467648, length = 16384)]error=5


The same thing repeats every so often.  What does this mean?  I've read
other threads (Drives Dieing) about possibly shutting down dma or
reinstalling the system, but is that the best solution to this kind of
problem?

Thanks.

~Doug


Re: dealing with a failing drive

2007-11-14 Thread jdow

From: "Jerry McAllister" <[EMAIL PROTECTED]>
Sent: Monday, November 12, 2007 12:53

> On Mon, Nov 12, 2007 at 09:26:38AM -0800, David Newman wrote:
>
> > On 11/12/07 8:14 AM, Jerry McAllister wrote:
> >
> > > An update: After doing what you suggest (leaving in the "good" disk,
> > > adding a new disk, RAID rebuilding) I still got soft write errors --
> > > with *either one* of the disks I tried.
> > >
> > > Then I tried putting both disks in an identical server and they came up
> > > fine, no read or write errors.
> > >
> > > Ergo, the bad RAID controller is bad and the disks may be OK.
> > >
> > >> Probably not.
> > >> Generally, if the RAID controller is bad, you will see errors
> > >> all over and not it just one place, tho I suppose it is possible.
> > >> Check and see what it reports as error locations and see if they
> > >> move around any.
> >
> > Jerry, thanks for your response.
> >
> > After 36 hours of running the same disks in a different, identical
> > machine there hasn't been a single read or write error. I'm hardly a
> > storage expert but from the evidence I have I'm inclined to believe the
> > root cause was a bad RAID controller and not failed disks.
>
> That is not much proof.
> The different machine would probably be accessing the disks in
> a different way, either slightly different positioning or using
> different space.   Also, 36 hours is not really much time.

Dn, I have had a Promise controller that was bad. I kept getting errors
at one specific location on two disks out of three on a RAID 5. The
system continued to operate. When I finally spent the time to nail it
down to the controller I found the Promise people more than anxious to
get the beast for a postmortem. It had been bad for me from day one. It
would take about a week to a month for the problem to appear. After the
6th disk showing the problem at the same block number the coin dropped
in my sometimes overly slow mind.

{^_-}Joanne

Re: dealing with a failing drive

2007-11-14 Thread jdow


From: "David Newman" <[EMAIL PROTECTED]>

> -BEGIN PGP SIGNED MESSAGE-
>
> On 11/12/07 8:14 AM, Jerry McAllister wrote:
>
> > An update: After doing what you suggest (leaving in the "good" disk,
> > adding a new disk, RAID rebuilding) I still got soft write errors --
> > with *either one* of the disks I tried.
> >
> > Then I tried putting both disks in an identical server and they came up
> > fine, no read or write errors.
> >
> > Ergo, the bad RAID controller is bad and the disks may be OK.
> >
> >> Probably not.
> >> Generally, if the RAID controller is bad, you will see errors
> >> all over and not it just one place, tho I suppose it is possible.
> >> Check and see what it reports as error locations and see if they
> >> move around any.
>
> Jerry, thanks for your response.
>
> After 36 hours of running the same disks in a different, identical
> machine there hasn't been a single read or write error. I'm hardly a
> storage expert but from the evidence I have I'm inclined to believe the
> root cause was a bad RAID controller and not failed disks.
>
> I'm aware of CLI tools to monitor 3Ware SATA RAID controllers. Anyone
> know if there are similar tools for HP/Compaq SCSI RAID controllers?


Bad cable? Iffy power supply? Examine each step the data and power
take for possible hitches. You might even have an overheated and
weakened power connector on a drive. If it's not making solid contact
it can give you headaches.

{^_^}

Re: dealing with a failing drive

2007-11-12 Thread Jerry McAllister

On Mon, Nov 12, 2007 at 09:26:38AM -0800, David Newman wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> On 11/12/07 8:14 AM, Jerry McAllister wrote:
> 
> > An update: After doing what you suggest (leaving in the "good" disk,
> > adding a new disk, RAID rebuilding) I still got soft write errors --
> > with *either one* of the disks I tried.
> > 
> > Then I tried putting both disks in an identical server and they came up
> > fine, no read or write errors.
> > 
> > Ergo, the bad RAID controller is bad and the disks may be OK.
> > 
> >> Probably not.
> >> Generally, if the RAID controller is bad, you will see errors
> >> all over and not it just one place, tho I suppose it is possible.
> >> Check and see what it reports as error locations and see if they
> >> move around any.
> 
> Jerry, thanks for your response.
> 
> After 36 hours of running the same disks in a different, identical
> machine there hasn't been a single read or write error. I'm hardly a
> storage expert but from the evidence I have I'm inclined to believe the
> root cause was a bad RAID controller and not failed disks.

That is not much proof. 
The different machine would probably be accessing the disks in
a different way, either slightly different positioning or using
different space.   Also, 36 hours is not really much time.

It could be you are right, but disks have a way of starting small
in errors and then avalanching on you with accelerating volume
of errors just when you begin to feel safe.

You could be right, but is the price of a disk worth it - the
price of a new RAID controller, for that matter?   Replace them
both.

jerry

> 
> I'm aware of CLI tools to monitor 3Ware SATA RAID controllers. Anyone
> know if there are similar tools for HP/Compaq SCSI RAID controllers?
> 
> thanks
> 
> dn
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.3 (Darwin)
> 
> iD8DBQFHOIzOyPxGVjntI4IRAmMWAJ4grMR6mcL/j9qbcGY/fJfDEqv3KgCg8BVW
> wcHVDkZPykFcQzVYnp8mx+g=
> =8rws
> -END PGP SIGNATURE-

Re: dealing with a failing drive

2007-11-12 Thread David Newman

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 11/12/07 8:14 AM, Jerry McAllister wrote:

> An update: After doing what you suggest (leaving in the "good" disk,
> adding a new disk, RAID rebuilding) I still got soft write errors --
> with *either one* of the disks I tried.
> 
> Then I tried putting both disks in an identical server and they came up
> fine, no read or write errors.
> 
> Ergo, the bad RAID controller is bad and the disks may be OK.
> 
>> Probably not.
>> Generally, if the RAID controller is bad, you will see errors
>> all over and not it just one place, tho I suppose it is possible.
>> Check and see what it reports as error locations and see if they
>> move around any.

Jerry, thanks for your response.

After 36 hours of running the same disks in a different, identical
machine there hasn't been a single read or write error. I'm hardly a
storage expert but from the evidence I have I'm inclined to believe the
root cause was a bad RAID controller and not failed disks.

I'm aware of CLI tools to monitor 3Ware SATA RAID controllers. Anyone
know if there are similar tools for HP/Compaq SCSI RAID controllers?

thanks

dn
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.3 (Darwin)

iD8DBQFHOIzOyPxGVjntI4IRAmMWAJ4grMR6mcL/j9qbcGY/fJfDEqv3KgCg8BVW
wcHVDkZPykFcQzVYnp8mx+g=
=8rws
-END PGP SIGNATURE-

Re: dealing with a failing drive

2007-11-12 Thread Jerry McAllister

On Sun, Nov 11, 2007 at 07:56:52AM -0800, David Newman wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> On 11/10/07 9:09 PM, Modulok wrote:
> >>> I'd welcome suggestions on how (or whether) to try to revive a SCSI
> > drive that's failing.
> > 
> > It depends on how valuable the data on the array is, and more
> > importantly, how much funding you have at your disposal to fix the
> > problem. If it were me, I would set aside the bad disk, connect a new
> > disk to the card and re-synchronize the array. (Assuming one of the
> > members still retains a good copy of the data.) Afterwards I would
> > destroy, or toss the existing disk in the trash can (depending on the
> > sensitivity of the data stored on it.)
> 
> Thanks for your reply.
> 
> An update: After doing what you suggest (leaving in the "good" disk,
> adding a new disk, RAID rebuilding) I still got soft write errors --
> with *either one* of the disks I tried.
> 
> Then I tried putting both disks in an identical server and they came up
> fine, no read or write errors.
> 
> Ergo, the bad RAID controller is bad and the disks may be OK.

Probably not.
Generally, if the RAID controller is bad, you will see errors
all over and not just in one place, though I suppose it is possible.
Check what it reports as error locations and see if they
move around any.

A soft error is usually one that can be corrected within the limits
of rereads and any error correction that the system is using.  It
may be that the error was introduced while the problems with the old
disk were occurring, so that an error was written to the other,
supposedly good disk and then mirrored to the new disk - errors can
be preserved by mirroring too.

Having said that, I don't know where this error is from.  Try reading
the data that is in the spot getting the error, rewriting it, and then
reading it back from the new location.  It is pretty hard to single out
and rewrite one specific block on modern systems because the sector
numbers you see are logical, not physical.  Although you would expect
the same sector number to be in the same place from one write to the
next, if that sector gets remapped due to an error, then it will
actually be at a different physical location the next time and you
don't really prove anything.  But it is worth experimenting with if
you want.

You can dd from and to any sector on the partition by carefully
using skip counts and block counts.   But, you have to figure out
the location (sector number) first.
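
As a rough sketch of the dd approach described above (an illustration, not something posted in the thread): the helper names read_sector and write_sector are made up, 512-byte sectors are assumed, and the device name in the comments is only a placeholder. It is safest to try this on a scratch image file before pointing it at a real partition.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative), assuming 512-byte sectors.

# read_sector SRC SECTOR OUT: copy one sector out of a device or image file.
read_sector() {
    # skip= counts input blocks of size bs, so skip=N starts at sector N
    dd if="$1" of="$3" bs=512 skip="$2" count=1 2>/dev/null
}

# write_sector IN DST SECTOR: write one saved sector back at the same offset.
write_sector() {
    # seek= counts output blocks; conv=notrunc stops dd from truncating
    # a regular file after the block it writes
    dd if="$1" of="$2" bs=512 seek="$3" count=1 conv=notrunc 2>/dev/null
}

# Example (device name and sector number are placeholders):
#   read_sector  /dev/idad0s1f 123456 /tmp/sector.bin
#   write_sector /tmp/sector.bin /dev/idad0s1f 123456
```

Reading the sector and writing the same bytes back is the "read, rewrite, reread" experiment; as noted above, a remapped sector means the result does not prove much either way.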

Good luck,

jerry

> 
> >>> Is there some other way to:
> >>> b) monitor the health of disks on a Compaq controller so it doesn't
> >>> get to this point to begin with?
> > 
> > There are various tools out there that attempt to 'monitor' the
> > condition of disk drives to try to predict when failure is imminent.
> > For valuable data, it is safer to set up a mirror and simply toss out
> > bad disks as they fail. For extremely valuable data, use a 3-disk
> > array. With a 3-disk setup you will still be covered in the event that
> > an additional disk craps out during the re-sync.
> > 
> > To quote google's article on disk failure, regarding SMART:
> 
> Right, I've heard it said that "SMART isn't."
> 
> Nonetheless, I'd appreciate any suggestions to monitor the health of
> disks -- and RAID controllers too -- on HP Proliant servers running FreeBSD.
> 
> thanks again.
> 
> dn
> 
> 

Re: dealing with a failing drive

2007-11-12 Thread Jerry McAllister

On Sat, Nov 10, 2007 at 05:22:06PM -0800, David Newman wrote:

> I'd welcome suggestions on how (or whether) to try to revive a SCSI
> drive that's failing.

To answer 'whether': don't.  Get your data off it as soon as
possible, and nuke it if it has anything sensitive on it at all.

If it is a mirror or RAID5 then you should be able to just replace it, but
otherwise, back it up immediately and quit using it.

Generally, if you start seeing a regular hard error, the drive
is on its last legs.  The errors only increase.  You may be
able to do things to get past this one error, but more will be
coming.

So the answer to 'how' is also: don't.

jerry

> 
> This is on FreeBSD 6.2-RELENG on a Compaq Proliant DL320, onboard RAID
> and two SCSI drives in a RAID1 array.
> 
> Today this system rebooted and hung on Compaq's "what do you want the
> RAID controller to do?" message. I told it to fix any errors.
> 
> When I brought the system back up (after running fsck in single-user
> mode), the log had lots of errors like this:
> 
> Nov 10 09:00:40 mail kernel: ida0: hard write error
> Nov 10 09:00:40 mail kernel: ida0: invalid request
> Nov 10 09:01:48 mail last message repeated 35 times
> Nov 10 09:03:49 mail last message repeated 571 times
> Nov 10 09:12:27 mail last message repeated 796 times
> 
> I vaguely remember trying about a year ago to load a SMART utility from
> the ports collection but it wouldn't work on drives in a RAID array.
> 
> Is there some other way to:
> 
> a) diagnose/fix the errant disk here?
> b) monitor the health of disks on a Compaq controller so it doesn't get
> to this point to begin with?
> 
> thanks in advance
> 
> dn
> 

Re: dealing with a failing drive

2007-11-11 Thread David Newman

On 11/10/07 9:09 PM, Modulok wrote:
>>> I'd welcome suggestions on how (or whether) to try to revive a SCSI
>>> drive that's failing.
> 
> It depends on how valuable the data on the array is, and more
> importantly, how much funding you have at your disposal to fix the
> problem. If it were me, I would set aside the bad disk, connect a new
> disk to the card and re-synchronize the array. (Assuming one of the
> members still retains a good copy of the data.) Afterwards I would
> destroy, or toss the existing disk in the trash can (depending on the
> sensitivity of the data stored on it.)

Thanks for your reply.

An update: After doing what you suggest (leaving in the "good" disk,
adding a new disk, RAID rebuilding) I still got soft write errors --
with *either one* of the disks I tried.

Then I tried putting both disks in an identical server and they came up
fine, no read or write errors.

Ergo, the RAID controller is bad and the disks may be OK.

>>> Is there some other way to:
>>> b) monitor the health of disks on a Compaq controller so it doesn't
>>> get to this point to begin with?
> 
> There are various tools out there that attempt to 'monitor' the
> condition of disk drives to try to predict when failure is imminent.
> For valuable data, it is safer to set up a mirror and simply toss out
> bad disks as they fail. For extremely valuable data, use a 3-disk
> array. With a 3-disk setup you will still be covered in the event that
> an additional disk craps out during the re-sync.
> 
> To quote google's article on disk failure, regarding SMART:

Right, I've heard it said that "SMART isn't."

Nonetheless, I'd appreciate any suggestions to monitor the health of
disks -- and RAID controllers too -- on HP Proliant servers running FreeBSD.

thanks again.

dn



Re: dealing with a failing drive

2007-11-10 Thread Modulok

>> I'd welcome suggestions on how (or whether) to try to revive a SCSI
>> drive that's failing.

It depends on how valuable the data on the array is, and more
importantly, how much funding you have at your disposal to fix the
problem. If it were me, I would set aside the bad disk, connect a new
disk to the card and re-synchronize the array. (Assuming one of the
members still retains a good copy of the data.) Afterwards I would
destroy, or toss the existing disk in the trash can (depending on the
sensitivity of the data stored on it.)

>> Is there some other way to:
>> b) monitor the health of disks on a Compaq controller so it doesn't
>> get to this point to begin with?

There are various tools out there that attempt to 'monitor' the
condition of disk drives to try to predict when failure is imminent.
For valuable data, it is safer to set up a mirror and simply toss out
bad disks as they fail. For extremely valuable data, use a 3-disk
array. With a 3-disk setup you will still be covered in the event that
an additional disk craps out during the re-sync.

To quote google's article on disk failure, regarding SMART:

"...we find that failure prediction models based on SMART parameters
alone are likely to be severely limited in the prediction accuracy,
given that a large fraction of our failed drives have shown no SMART
error signals whatsoever. This result suggests that SMART models are
more useful in predicting trends for large aggregate populations than
for individual components."

http://labs.google.com/papers/disk_failures.pdf

My 2 cents.
-Modulok-


dealing with a failing drive

2007-11-10 Thread David Newman

I'd welcome suggestions on how (or whether) to try to revive a SCSI
drive that's failing.

This is on FreeBSD 6.2-RELENG on a Compaq Proliant DL320, onboard RAID
and two SCSI drives in a RAID1 array.

Today this system rebooted and hung on Compaq's "what do you want the
RAID controller to do?" message. I told it to fix any errors.

When I brought the system back up (after running fsck in single-user
mode), the log had lots of errors like this:

Nov 10 09:00:40 mail kernel: ida0: hard write error
Nov 10 09:00:40 mail kernel: ida0: invalid request
Nov 10 09:01:48 mail last message repeated 35 times
Nov 10 09:03:49 mail last message repeated 571 times
Nov 10 09:12:27 mail last message repeated 796 times
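
Log excerpts like the one above can be tallied to see how fast the errors are piling up. The following is only a sketch: the function name tally_ida_errors is invented, the controller name ida0 is taken from the lines above, and it assumes each "last message repeated N times" line refers to the ida0 message logged immediately before it.

```shell
#!/bin/sh
# tally_ida_errors LOGFILE: count ida0 kernel messages in a syslog-format
# file, expanding "last message repeated N times" lines (hypothetical helper).
tally_ida_errors() {
    awk '
        /kernel: ida0: / {
            msg = $0
            sub(/^.*kernel: ida0: /, "", msg)   # keep only the message text
            last = msg
            count[msg]++
            next
        }
        /last message repeated [0-9]+ times/ && last != "" {
            count[last] += $(NF-1)              # N is the next-to-last field
            next
        }
        { last = "" }                           # any other line breaks the run
        END { for (m in count) printf "%d %s\n", count[m], m }
    ' "$1" | sort -rn
}

# Example (path is an assumption; adjust to where kernel messages are logged):
#   tally_ida_errors /var/log/messages
```

Run against the five lines quoted above, this reports 1403 occurrences of "invalid request" and 1 of "hard write error".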

I vaguely remember trying about a year ago to load a SMART utility from
the ports collection but it wouldn't work on drives in a RAID array.

Is there some other way to:

a) diagnose/fix the errant disk here?
b) monitor the health of disks on a Compaq controller so it doesn't get
to this point to begin with?

thanks in advance

dn






RE: replacing failing drive

2007-04-11 Thread Ruben

Hi Dave, 

You could prepare the replacement drive offline and test it first. Provided
you have a generic kernel, you can do this on any piece of hardware you have
lying around. By the way, there is no need to install anything. Check out a
previous answer I wrote; it's about changing RAID levels, but the concept is
pretty much the same:

http://lists.freebsd.org/pipermail/freebsd-questions/2005-July/092529.html

Good luck, 

Ruben 


replacing failing drive

2007-04-11 Thread Dave


Hello,
I've got a drive that I'm not sure is failing. It is making an
occasional clicking noise, which is getting more frequent. I installed
smartmontools and tried to start them; output below:


# smartctl -a /dev/ad0
smartctl version 5.37 [i386-portbld-freebsd6.1] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

# /usr/local/etc/rc.d/smartd start
Starting smartd.
(pass0:vpo0:0:5:0): INQUIRY. CDB: 12 0 0 0 24 0
(pass0:vpo0:0:5:0): CAM Status: Command timeout
(pass0:vpo0:0:5:0): INQUIRY. CDB: 12 0 0 0 40 0
(pass0:vpo0:0:5:0): CAM Status: Command timeout
(pass0:vpo0:0:5:0): Vendor Specific Command. CDB: 85 8 e 0 0 0 1 0 0 0 0 0 0 0 ec 0
(pass0:vpo0:0:5:0): CAM Status: Command timeout
(pass0:vpo0:0:5:0): Vendor Specific Command. CDB: 85 8 e 0 0 0 1 0 0 0 0 0 0 0 a1 0
(pass0:vpo0:0:5:0): CAM Status: Command timeout

Does this mean this drive is failing? I've got another identical drive that
I run smartd on, and it doesn't have any issues picking up its SMART ID or
running tests on it. This is on a 6.2 box. If this drive is failing, I'd
like to drop in another one with minimum downtime. Could someone check my
procedure:


1. Install new drive as slave
2. Use sysinstall to partition the new drive (I only use a single partition)
3. Use sysinstall to create BSD labels and give them the same values as the
   master drive
4. Use sysinstall to install the boot manager on the slave drive
5. Use dump/restore to copy all data onto the slave drive
6. Power down the box, remove old master drive, set new drive to master, and
   reboot


Thanks.
Dave.


Re: retrieving data from a failing drive

2002-11-11 Thread Mike Hogsett


> Is there any way I can force the FS to be marked clean, or to mount a
> dirty filesystem (possibly in read-only mode)?

Read the mount(8) man page, specifically `-f' and `-r' options.
I hope you have backups!
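
A sketch of what that might look like, offered as an untested illustration rather than a recipe: the device and mount point are placeholders, and the wrapper name rescue_mount is made up. Per mount(8), -r mounts the filesystem read-only and -f is the force option. The wrapper only prints the command unless RUN=1 is set, since the real mount needs root and a failing disk deserves a second look before anything touches it.

```shell
#!/bin/sh
# rescue_mount DEV DIR: print the forced read-only mount command;
# run it for real only when RUN=1 is set (hypothetical wrapper).
rescue_mount() {
    if [ "${RUN:-0}" = "1" ]; then
        mount -r -f "$1" "$2"
    else
        echo "mount -r -f $1 $2"
    fi
}

# Example (device and directory are placeholders):
#   rescue_mount /dev/ad0s1f /mnt/rescue
```

Once the filesystem is mounted read-only, copy off whatever is readable before running anything else against the drive.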

 - Mike

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-questions" in the body of the message

retrieving data from a failing drive

2002-11-11 Thread Mark Miller

Hi,

I have a 20G /usr partition (IDE drive) that is reporting hard errors at a
certain sector.  I've run fsck -y many times, and each time it hits the
bad sector, it falls back from DMA to PIO mode, and finally exits, saying
"The filesystem is still marked dirty, please run fsck again."

I'm looking around for a new drive that I can restore to, but I don't know
how to read data off a dirty filesystem.  Is there any way I can force the
FS to be marked clean, or to mount a dirty filesystem (possibly in
read-only mode)?  OTOH, would something like netbsd's g4u (ghost for unix)
help me out here?

TIA
Mark Miller

