[zfs-discuss] Permanent errors

2010-01-12 Thread epiq
Hello!

Can anybody help me with this problem:

j...@opensolaris:~# zpool status -v  
  pool: green
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h8m, 5.08% done, 2h42m to go
config:

        NAME        STATE     READ WRITE CKSUM
        green       ONLINE       0     0   117
          c3d0      ONLINE       0     0   234
          c4d0      ONLINE       0     0   234

errors: Permanent errors have been detected in the following files:

/green/home/epiq/vid/2resolume/400x300/people/deti/primorsk - manoly - 
rebenok na vode.avi
/green/home/epiq/mus/!!![Labels]!!!/playhouse/[playcd007] captain 
comatose - going out/11_captain comatose - $100 (live).mp3
green/home:0x1451a
green/home:0x1cd29
green/home:0x14537
green/home:0x1454e
green/home:0x14577
green/home:0x1458e
green/home:0x14599
/green/home/epiq/vid/!!!incoming/Mar`ja_Iskusnica.avi
/green/home/epiq/mus/easy
green/home:0x144ec
green/home:0x144f9
green/home:0x144fc

As you can see from the output, my ZFS pool has permanent errors on some files 
and directories - but how can I clear these errors? I ran a scrub once and most 
of the errors went away, but some remain, and I can't delete them:


j...@opensolaris:~# rm -rf /green/home/epiq/mus/easy
rm: Unable to remove directory /green/home/epiq/mus/easy: Directory not empty
j...@opensolaris:~# rmdir --ignore-fail-on-non-empty /green/home/epiq/mus/easy
rmdir: illegal option -- ignore-fail-on-non-empty
Usage: rmdir [-ps] dirname ...
j...@opensolaris:~# rm -rf 
/green/home/epiq/vid/2resolume/400x300/people/deti/primorsk - manoly - rebenok 
na vode.avi
j...@opensolaris:~# ls -la 
/green/home/epiq/vid/2resolume/400x300/people/deti/primorsk - manoly - rebenok 
na vode.avi
-rw-r--r--   1 101  staff18221286 Oct 25  2008 
/green/home/epiq/vid/2resolume/400x300/people/deti/primorsk - manoly - rebenok 
na vode.avi

While googling around this problem I found the zdb command:

j...@opensolaris:~# zdb -d green/home 0x1451a 0x1cd29 0x14537 0x1454e 0x14577 
0x1458e 0x14599 0x144ec 0x144f9 0x144fc
Dataset green/home [ZPL], ID 40, cr_txg 325, 906G, 116136 objects

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
     83226    2    16K   128K  12.6M  12.6M  100.00  ZFS plain file
    118057    4    16K   128K  5.22G  5.22G  100.00  ZFS plain file
     83255    2    16K   128K  7.63M  7.62M  100.00  ZFS plain file
     83278    2    16K   128K  4.25M  4.25M  100.00  ZFS plain file
     83319    2    16K   128K  5.38M  5.38M  100.00  ZFS plain file
     83342    2    16K   128K  5.50M  5.50M  100.00  ZFS plain file
     83353    2    16K   128K  6.13M  6.12M  100.00  ZFS plain file
     83180    2    16K   128K  8.26M  8.25M  100.00  ZFS plain file
     83193    2    16K   128K  6.01M  6.00M  100.00  ZFS plain file
     83196    2    16K   128K  8.63M  8.62M  100.00  ZFS plain file


But how can this help me?

With best wishes, Epiq.


Re: [zfs-discuss] Permanent errors

2010-01-12 Thread Cindy Swearingen

Hi--

The best approach is to correct the issues that are causing these
problems in the first place. The fmdump -eV command will identify
the hardware problems that caused the checksum errors and the corrupted
files.
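
For example (the exact output depends on your hardware):

# fmdump -e     (one summary line per error event)
# fmdump -eV    (full detail for each event)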

You might be able to use some combination of zpool scrub, zpool clear,
and removing these corrupted files manually, but unless the failing
disks (?) are repaired, you could potentially lose more data.
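
Roughly, that sequence would look something like this (a sketch only,
untested, using your pool name and a placeholder path):

# zpool scrub green
  (wait for the scrub to complete, then check zpool status -v green)
# rm /green/path/to/corrupted-file
  (repeat for each file that is still listed)
# zpool clear green
# zpool scrub green
  (a second scrub gives ZFS a chance to drop the stale error-log entries)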

We have a pool recovery feature in build 128, but it will not work
in this scenario since the pool is online (it's importable).

If you can't repair the underlying hardware issues and you have no
backup of this data, then you might consider reviewing the steps
that are described here:

http://www.solarisinternals.com/wiki/index.php/ZFS_forensics_scrollback_script

We haven't had a chance to look at this script closely or test it yet so
the usual caveats apply.

Thanks,

Cindy





Re: [zfs-discuss] Permanent errors

2010-01-12 Thread epiq
Cindy, thank you for the answer, but I need to explain some details. This pool 
is new hardware for my system - 2x1TB WD Green hard drives - but the data on it 
was copied from an old pool of 9x300GB hard drives with hardware problems. While 
I was copying the data there were many errors, but in the end I see this: 

j...@opensolaris:~# fmdump -eV 
fmdump: failed to open /var/fm/fmd/errlog: No such file or directory

There are no READ or WRITE errors on the new pool - only CKSUM. As I understand 
it, that points to errors during the transfer rather than a problem with the new 
hardware. Now I only need to clear these permanent errors on the new pool, so 
that I can later restore the files from backups.


Re: [zfs-discuss] Permanent errors

2010-01-12 Thread Cindy Swearingen

Hi,

I think you are saying that you copied the data onto this system from a
previous system with hardware problems. It looks like the data that was
copied was corrupt, which is causing the permanent errors on the new
system (?)

The manual removal of the corrupt files, zpool scrub and zpool clear
might work, but I don't have experience with this many errors on a
non-redundant config.

If you have a clean backup of this data, you might consider destroying
the green/home dataset, clearing the pool errors, recreating green/home,
and restoring the known, good data.
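
Roughly (a sketch only - please double-check before running anything
destructive):

# zfs destroy -r green/home
# zpool clear green
# zpool scrub green
# zfs create green/home
  (then restore /green/home from the known good backup)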

Maybe someone else can suggest a better approach...

Cindy




Re: [zfs-discuss] Permanent errors on two files

2009-12-06 Thread Gary Mills
On Fri, Dec 04, 2009 at 02:52:47PM -0700, Cindy Swearingen wrote:
> If space/dcc is a dataset, is it mounted? ZFS might not be able to
> print the filenames if the dataset is not mounted, but I'm not sure
> if this is why only object numbers are displayed.

Yes, it's mounted and is quite an active filesystem.

> I would also check fmdump -eV to see how frequently the hardware
> has had problems.

That shows ZFS checksum errors in July, but nothing since that time.
There were also DIMM errors before that, starting in June.  We
replaced the failed DIMMs, also in July.  This is an X4450 with ECC
memory.  There were no disk errors reported.  I suppose we can blame
the memory.

-- 
-Gary Mills--Unix Group--Computer and Network Services-


Re: [zfs-discuss] Permanent errors on two files

2009-12-06 Thread Gary Mills
On Sat, Dec 05, 2009 at 01:52:12AM +0300, Victor Latushkin wrote:
> On Dec 5, 2009, at 0:52, Cindy Swearingen cindy.swearin...@sun.com wrote:
>
> > The zpool status -v command will generally print out filenames, dnode
> > object numbers, or identify metadata corruption problems. These look
> > like object numbers, because they are large, rather than metadata
> > objects, but an expert will have to comment.
>
> Yes, these are object numbers, and the most likely reason they are not
> turned into filenames is that the corresponding files no longer exist.

That seems to be the case:

# zdb -d space/dcc 0x11e887 0xba25aa
Dataset space/dcc [ZPL], ID 21, cr_txg 19, 20.5G, 3672408 objects

> So I'd run a scrub another time; if the files are gone and there is no
> other corruption, the scrub will reset the error log and zpool status
> should become clean.

That worked.  After the scrub, there are no errors reported.

> > You might be able to identify these object numbers with zdb, but
> > I'm not sure how to do that.
>
> You can try to use zdb this way to check if these objects still exist:
>
> zdb -d space/dcc 0x11e887 0xba25aa

-- 
-Gary Mills--Unix Group--Computer and Network Services-


Re: [zfs-discuss] Permanent errors on two files

2009-12-04 Thread Cindy Swearingen

Hi Gary,

To answer your questions, the hardware read some data and ZFS detected
a problem with the checksums in this dataset and reported this problem.
ZFS can do this regardless of ZFS redundancy.

I don't think a scrub will fix these permanent errors, but it depends
on the corruption. If it's data, but not redundant and no copies=2,
then probably not. If it's metadata, then multiple copies exist, but
it depends on the extent of the corruption.
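
For reference, copies is a per-dataset property, set with something like:

# zfs set copies=2 space/dcc

It only applies to data written after the property is set, so it would not
have protected the existing files.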

If space/dcc is a dataset, is it mounted? ZFS might not be able to
print the filenames if the dataset is not mounted, but I'm not sure
if this is why only object numbers are displayed.

The zpool status -v command will generally print out filenames, dnode
object numbers, or identify metadata corruption problems. These look
like object numbers, because they are large, rather than metadata
objects, but an expert will have to comment.

You might be able to identify these object numbers with zdb, but
I'm not sure how to do that.

I would also check fmdump -eV to see how frequently the hardware
has had problems.

Cindy


On 12/04/09 12:19, Gary Mills wrote:

I just noticed this today:

# zpool status -v
  pool: space
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        space       ONLINE       0     0     0
          c0t1d0    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

space/dcc:0x11e887

space/dcc:0xba25aa

The device here is a hardware mirror of two 146-gig SAS drives.
How can ZFS detect errors when it has no redundancy?  How do I
determine what files these are?  Will a scrub fix it?  This is a
production system, so I want to be careful.

It's running Solaris 10 5/09 s10x_u7wos_08 X86.




Re: [zfs-discuss] Permanent errors on two files

2009-12-04 Thread Victor Latushkin
On Dec 5, 2009, at 0:52, Cindy Swearingen cindy.swearin...@sun.com wrote:

> Hi Gary,
>
> To answer your questions, the hardware read some data and ZFS detected
> a problem with the checksums in this dataset and reported this problem.
> ZFS can do this regardless of ZFS redundancy.
>
> I don't think a scrub will fix these permanent errors, but it depends
> on the corruption. If it's data, but not redundant and no copies=2,
> then probably not. If it's metadata, then multiple copies exist, but
> it depends on the extent of the corruption.
>
> If space/dcc is a dataset, is it mounted? ZFS might not be able to
> print the filenames if the dataset is not mounted, but I'm not sure
> if this is why only object numbers are displayed.
>
> The zpool status -v command will generally print out filenames, dnode
> object numbers, or identify metadata corruption problems. These look
> like object numbers, because they are large, rather than metadata
> objects, but an expert will have to comment.


Yes, these are object numbers, and the most likely reason they are not
turned into filenames is that the corresponding files no longer exist.


So I'd run a scrub another time; if the files are gone and there is no
other corruption, the scrub will reset the error log and zpool status
should become clean.
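
I.e. something like:

# zpool scrub space
  (wait for it to finish)
# zpool status -v space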


> You might be able to identify these object numbers with zdb, but
> I'm not sure how to do that.



You can try to use zdb this way to check if these objects still exist:

zdb -d space/dcc 0x11e887 0xba25aa

Victor



[zfs-discuss] Permanent errors on filesystem (opensolaris 2008.05)

2008-10-05 Thread Emmanuel
Hi

I am looking for guidance on the following zfs setup and error:
- opensolaris 2008.05 running as guest in vmware server - ubuntu host
- system has run flawlessly as an NFS file server for some months now. Single 
zpool (called 'tank'), 2 vdevs each as raid-Z, about 10 filesystems (one of 
them called 'mail')
- after a power surge that caused a reboot, opensolaris became unable to mount 
the pool

Using the opensolaris cd as a rescue disk, I discovered a permanent error 
(ZFS-8000-8A) quoting tank/mail:<0x0> as the location of the error (that is 
the name of the filesystem itself and not a specific file). The FS contains 
maildir archives, probably on the order of 10,000 files. 

The pool comes out clean of a scrub.

Googling, I tried to unmount / mount to possibly replay the log (ZIL) in case 
the transactions didn't play through entirely. Same negative result.

Reading http://docs.sun.com/app/docs/doc/819-5461/gbbwl?a=view, there is a case 
mentioning monkey/dnode:<0x0> that seems close enough. Is that really the 
case? If so, how do I 'move' the data as the solution proposes?

As you imagine, I'd like to rescue files so any alternative hint is welcome.

Thanks.


Re: [zfs-discuss] Permanent errors on filesystem (opensolaris 2008.05)

2008-10-05 Thread Emmanuel
Reading through the post, the error message didn't come through properly. It is 
tank/mail:<0x0> (with less-than and greater-than signs on either side of the 0).
Also, the 4 disks (2 vdevs x 2 for raid-z) are physical SATA disks dedicated to 
the vmware image.

Thanks.