[zfs-discuss] ZPOOL Metadata / Data Error - Help

2009-10-04 Thread Bruno Sousa

Hi all!

I have a serious problem with a server, and I'm hoping that someone 
can help me understand what's wrong.
Basically, I have a server with a pool of 6 disks, and after a zpool 
scrub I got this message:


errors: Permanent errors have been detected in the following files:

 metadata:0x0
 metadata:0x15

The OpenSolaris version is 5.11 snv_101b (yes, I know, quite old). 
This server has been up and running for more than 4 months with weekly 
zpool scrubs, and only now did I get this message.


Here are some extra details about the system:

1 - I can still access the data in the pool, but I don't know whether I can 
access all of it, or whether some of it is corrupted

2 - nothing was changed in the hardware
3 - all the disks are ST31000340NS-SN06, Seagate 1TB 7,200 rpm 
enterprise class, firmware SN06
4 - all the disks are connected to an LSI Logic SAS1068E HBA, which is 
connected to a JBOD chassis (Supermicro)

5 - the server is a SUN X2200 Dual-Core
6 - using lsiutil and querying "Display phy counters", I see:
   Expander (Handle 0009) Phy 21:  Link Up
     Invalid DWord Count                 1,171
     Running Disparity Error Count         937
     Loss of DWord Synch Count               0
     Phy Reset Problem Count                 0

   Expander (Handle 0009) Phy 22:  Link Up
     Invalid DWord Count             2,110,435
     Running Disparity Error Count     855,781
     Loss of DWord Synch Count               3
     Phy Reset Problem Count                 0

   Expander (Handle 0009) Phy 23:  Link Up
     Invalid DWord Count               740,029
     Running Disparity Error Count     716,196
     Loss of DWord Synch Count               1
     Phy Reset Problem Count                 0

   Expander (Handle 0009) Phy 24:  Link Up
     Invalid DWord Count               705,870
     Running Disparity Error Count     692,280
     Loss of DWord Synch Count               1
     Phy Reset Problem Count                 0

   Expander (Handle 0009) Phy 25:  Link Up
     Invalid DWord Count               698,935
     Running Disparity Error Count     667,148
     Loss of DWord Synch Count               1
     Phy Reset Problem Count                 0
7 - /var/log/messages shows:
   o  SCSI transport failed: reason 'reset': retrying command
   o  SCSI transport failed: reason 'reset': giving up
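
For anyone who wants to compare the phy counters from item 6 across links, here is a minimal Python sketch that parses lsiutil-style text and lists the phys with the highest counts. The function names (parse_phy_counters, noisy_phys), the embedded sample (a shortened copy of the counters above) and the threshold value are assumptions made for illustration; they are not part of lsiutil, and the threshold is arbitrary.

```python
import re

# Shortened copy of the counters quoted in item 6 (two phys only).
SAMPLE = """\
Expander (Handle 0009) Phy 21:  Link Up
 Invalid DWord Count   1,171
 Running Disparity Error Count   937
 Loss of DWord Synch Count 0
 Phy Reset Problem Count   0
Expander (Handle 0009) Phy 22:  Link Up
 Invalid DWord Count   2,110,435
 Running Disparity Error Count   855,781
 Loss of DWord Synch Count 3
 Phy Reset Problem Count   0
"""

PHY_RE = re.compile(r"Expander \(Handle (\w+)\) Phy (\d+):")
COUNTER_RE = re.compile(r"^\s*(.+?)\s+([\d,]+)\s*$")


def parse_phy_counters(text):
    """Return {phy_number: {counter_name: value}} from lsiutil-style text."""
    phys = {}
    current = None
    for line in text.splitlines():
        header = PHY_RE.search(line)
        if header:
            current = int(header.group(2))
            phys[current] = {}
            continue
        counter = COUNTER_RE.match(line)
        if counter and current is not None:
            name, value = counter.groups()
            # Counts are printed with thousands separators, e.g. "2,110,435".
            phys[current][name] = int(value.replace(",", ""))
    return phys


def noisy_phys(phys, threshold=100_000):
    """Phys whose Invalid DWord Count exceeds an arbitrary threshold."""
    return sorted(
        phy for phy, counts in phys.items()
        if counts.get("Invalid DWord Count", 0) > threshold
    )


counters = parse_phy_counters(SAMPLE)
for phy in noisy_phys(counters):
    print(phy, counters[phy]["Invalid DWord Count"])
```

Running the same parse on two lsiutil dumps taken some time apart would show whether the counts are still growing or are left over from an earlier event.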

Maybe I'm wrong, but it seems like the disks started to report errors? 
I can't tell whether all the data is accessible and valid because the 
pool is quite big, as shown here:


NAME    SIZE   USED   AVAIL  CAP  HEALTH  ALTROOT
POOL01  2.72T  1.71T  1.01T  62%  ONLINE  -


It may be that I have been suffering from this problem for some time, 
but the LSI HBA never reported any errors, and I assumed that ZFS was 
built to deal with exactly this kind of problem: silent data 
corruption.
I would like to understand whether the problems started with a high load 
on the LSI HBA that led to timeouts and therefore disk errors, or whether 
the OpenSolaris driver for the LSI HBA was overloaded, resulting in disk 
errors and LSI HBA errors...

Any clue as to what led to what?

Even more important: did I lose data? Or is ZFS reporting errors caused 
by the disk driver errors, so that the existing data is okay and only new 
data may be affected? Is the zpool metadata recoverable?
My biggest concern is to know whether my pool is corrupted and, if so, 
how I can fix the zpool metadata problem.


Thanks for all your time,

Bruno

r...@server01:/# zpool status -vx
pool: POOL01
state: ONLINE
status: One or more devices has experienced an error resulting in data
 corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
 entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: none requested
config:

 NAME         STATE   READ WRITE CKSUM
 POOL01       ONLINE     0     0     0
   mirror     ONLINE     0     0     0
     c5t9d0   ONLINE     0     0     0
     c5t10d0  ONLINE     0     0     0
   mirror     ONLINE     0     0     0
     c5t11d0  ONLINE     0     0     0
     c5t12d0  ONLINE     0     0     0
   mirror     ONLINE     0     0     0
     c5t13d0  ONLINE     0     0     0
     c5t14d0  ONLINE     0     0     0
errors: Permanent errors have been detected in the following files:

 metadata:0x0
 metadata:0x15
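
To track whether the error list changes between scrubs, the "errors:" section of the status output can be pulled out with a few lines of Python. This is only a sketch: the function names (permanent_errors, split_metadata) are made up for illustration, the sample is a trimmed copy of the output above, and the bracketed "<metadata>:" prefix is an assumption about how the entries look before the list archive strips the angle brackets.

```python
# Trimmed copy of the `zpool status -vx` output quoted above.
SAMPLE_STATUS = """\
pool: POOL01
state: ONLINE
errors: Permanent errors have been detected in the following files:

 metadata:0x0
 metadata:0x15
"""


def permanent_errors(status_text):
    """Collect the entries listed after the 'errors: Permanent errors' line."""
    entries = []
    in_errors = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("errors: Permanent errors"):
            in_errors = True
        elif stripped.startswith("pool:"):
            # A new pool section starts; stop collecting for the previous one.
            in_errors = False
        elif in_errors and stripped:
            entries.append(stripped)
    return entries


def split_metadata(entries):
    """Separate pool metadata entries from ordinary file paths."""
    # "<metadata>:" is an assumed spelling of the unmangled prefix.
    prefixes = ("metadata:", "<metadata>:")
    meta = [e for e in entries if e.startswith(prefixes)]
    files = [e for e in entries if not e.startswith(prefixes)]
    return meta, files


entries = permanent_errors(SAMPLE_STATUS)
meta, files = split_metadata(entries)
print("metadata entries:", meta)
print("file entries:", files)
```

Saving the returned list after each scrub and comparing it with the previous one shows whether entries are being added, staying the same, or disappearing.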



Re: [zfs-discuss] ZPOOL Metadata / Data Error - Help

2009-10-04 Thread dick hoogendijk

Bruno Sousa wrote:

Action: Restore the file in question if possible. Otherwise restore the
  entire pool from backup.
  metadata:0x0
  metadata:0x15


Hmm, and what file(s) would this be?

--
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | SunOS 10u7 5/09 | OpenSolaris 2010.02 b123
+ All that's really worth doing is what we do for others (Lewis Carrol)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZPOOL Metadata / Data Error - Help

2009-10-04 Thread Rob Logan


Action: Restore the file in question if possible. Otherwise restore the
 entire pool from backup.
 metadata:0x0
 metadata:0x15


Bet it's in a snapshot that looks to have been destroyed already. Try:

zpool clear POOL01
zpool scrub POOL01
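
If the two commands above need to be scripted, a small Python wrapper might look like the sketch below. The function name clear_and_scrub and its run parameter are hypothetical and exist only so the sequence can be dry-run without touching a pool; the zpool subcommands themselves are the ones quoted above.

```python
import subprocess


def clear_and_scrub(pool, run=subprocess.run):
    """Issue `zpool clear` and then `zpool scrub` for the given pool.

    `run` defaults to subprocess.run; pass a stand-in to dry-run the sequence.
    Returns the list of commands that were issued, in order.
    """
    commands = [
        ["zpool", "clear", pool],
        ["zpool", "scrub", pool],
    ]
    for cmd in commands:
        # check=True makes subprocess.run raise if a command exits non-zero,
        # so the scrub is not started when the clear fails.
        run(cmd, check=True)
    return commands


# Dry run: print the commands instead of executing them.
clear_and_scrub("POOL01", run=lambda cmd, check: print(" ".join(cmd)))
```

The scrub runs in the background, so the error list only reflects the new pass once `zpool status -v POOL01` reports that the scrub has completed.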

