Re: [zfs-discuss] ZFS non-zero checksum and permanent error with deleted file

2009-11-04 Thread Steven Samuel Cole

Thank you very much for your reply! :-)

Trevor Pretty wrote:

Steven

I had a similar problem back in 2006 when I was first playing with ZFS. 
Jeff Bonwick sent me this. It may (or may not) help. I'm not sure if the 
number is still the inode. If it is, please let zfs-discuss know.




 I've a non-mirrored zfs file systems which shows the status below. I saw
 the thread in the archives about working this out but it looks like ZFS
 messages have changed. How do I find out what file(s) this is?
 [...]
 errors: The following persistent errors have been detected:
 
   DATASET  OBJECT  RANGE
   LOCAL    28905   3262251008-3262382080
  


I realize this is a bit lame, but currently the answer is:

find /LOCAL -mount -inum 28905

And yes, we do indeed plan to automate this.   ;-) 


Jeff


Did your output come from a Solaris system ?

I couldn't find anything about a -mount parameter in the find man page, 
what does it do ?


[u...@host ~]$ sudo zpool status -v zpool01
  ...
errors: Permanent errors have been detected in the following files:

zpool01:0x3736a


[u...@host ~]$ sudo find /mnt/zpool01/ -inum 3736a
find: -inum: 3736a: illegal trailing character
[u...@host ~]$ sudo find /mnt/zpool01/ -inum 0x3736a
find: -inum: 0x3736a: illegal trailing character

Apparently, the -inum parameter needs a decimal number:

[u...@host ~]$ sudo find /mnt/zpool01/ -inum 226154
[u...@host ~]$
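
For anyone following along, the conversion from the hexadecimal object number to the decimal value that -inum expects can be done in the shell. This is only a minimal sketch: the variable names are made up for illustration, and the find line is left commented out because it needs the pool mounted. As far as I can tell, -xdev is the closest FreeBSD find counterpart to the Solaris -mount option (stay on one file system), but treat that as an assumption and check find(1).

```shell
#!/bin/sh
# Object number as printed by 'zpool status -v' (zpool01:0x3736a).
obj_hex=0x3736a

# printf accepts C-style hexadecimal constants for the %d conversion.
obj_dec=$(printf '%d' "$obj_hex")
echo "$obj_dec"

# Hypothetical follow-up, not run here: search only the pool's own file system.
# sudo find /mnt/zpool01/ -xdev -inum "$obj_dec"
```

If the file has already been deleted, find will of course print nothing, which matches the empty result above.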

How could find ever find anything ? The file at that inode was deleted 
after all. And even if it did find anything, what would I do with the 
result ?

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: [zfs-discuss] ZFS non-zero checksum and permanent error with deleted file

2009-11-04 Thread Steven Samuel Cole

Bob Friesenhahn wrote:

On Thu, 5 Nov 2009, Steven Samuel Cole wrote:


Definitely do

  zpool scrub zpool01

to see if there is any other decay.


I have done that prior to getting the status, several times actually, and 
tried to indicate that in my OP. IIRC, all checksum counts are zero after 
clearing; after scrubbing, the checksum error count goes back up to 4. The 
error is not cleared, though.


Strange.  I do recall that there was one OpenSolaris development release 
which did produce spurious checksum errors which looked weird like 
that.  Hopefully you are not using that particular release.


I am using ZFS as it comes with the official FreeBSD 7.2 64bit, no 
patches, no dev releases, all binary out of the box, nothing self-built. 
IIRC, that's ZFS version 6.



Your 'zpool status' output did not indicate that a scrub was done.


You are correct, my mistake. I reproduced the 3 zpool command lines in 
my OP from memory. I have gone through many clear/scrub/status, 
export/import, wash/rinse/repeat cycles now; the 'last scrub' info must 
have got lost in one of them. A scrub on that pool takes ~8 hours, so I 
refrained from running it again just for demonstration purposes.


Hmmm. Just as I want to double-check, I get this:

[u...@host ~]$ sudo zpool history
History for 'zpool01':
2008-05-31.22:16:22 zpool create -m /mnt/zpool01 zpool01 raidz1 ad12 ad14 ad16 ad18

2008-12-28.15:06:54 zpool import zpool01
2008-12-28.18:37:42 zpool export zpool01
2008-12-28.18:51:39 zpool import zpool01
2009-01-05.17:31:51 zpool export zpool01
2009-01-05.19:55:27 zpool import -d /dev/disk/by-id zpool01
2009-08-25.00:50:31 zpool clear zpool01
Assertion failed: ((null)), function nvlist_lookup_string(records[i], 
ZPOOL_HIST_CMD, cmdstr) == 0, file 
/usr/src/cddl/sbin/zpool/../../../cddl/contrib/opensolaris/cmd/zpool/zpool_main.c, 
line 3338.

Abort trap: 6 (core dumped)

Sigh. Maybe I should take that as another indication that something is 
just not right and I should rebuild the pool; otherwise there'll always 
be that nagging doubt about whether my data is actually safe...




Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/




ZFS non-zero checksum and permanent error with deleted file

2009-11-03 Thread Steven Samuel Cole

Hello,

I couldn't find a dedicated FreeBSD/ZFS mailing list, so I hope this is 
the right place to ask.


I'd like some advice on whether I can rely on one of my ZFS pools:

[u...@host ~]$ sudo zpool clear zpool01
  ...
[u...@host ~]$ sudo zpool scrub zpool01
  ...
[u...@host ~]$ sudo zpool status -v zpool01
  pool: zpool01
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
zpool01     ONLINE       0     0     4
  raidz1    ONLINE       0     0     4
    ad12    ONLINE       0     0     0
    ad14    ONLINE       0     0     0
    ad16    ONLINE       0     0     0
    ad18    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

zpool01:0x3736a


How can there be an error in a file that does not seem to exist ?
How can I clear / recover from the error ?

I have read the corresponding documentation and done the obligatory 
research, but so far, the only option I can see is a full destroy/create 
cycle - which seems like overkill, considering the pool size and the fact 
that there seems to be only one (deleted ?) file involved.


[u...@host ~]$ df -h /mnt/zpool01/
Filesystem    Size    Used   Avail Capacity  Mounted on
zpool01       1.3T    1.2T    133G    90%    /mnt/zpool01

[u...@host ~]$ uname -a
FreeBSD host.domain 7.2-RELEASE FreeBSD 7.2-RELEASE #0: Fri May  1 07:18:07 UTC 2009 r...@driscoll.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64


Cheers,

ssc


Re: error message when starting smartd: FAILURE - SMART status=51 ...

2008-06-21 Thread Steven Samuel Cole

Uwe Laverenz wrote:

On Sat, Jun 21, 2008 at 01:38:25PM +1200, Steven Samuel Cole wrote:

Also, the disks are SATA300, the controller supports SATA150 only; there 
is a jumper on the disks that limits them to SATA150 which I removed. 
Could that be relevant ?


Yes, it could be relevant. Several controllers have shown problems without
this jumper in the past (VIA, 3ware...).

Uwe



Hey Uwe,

thanks for your reply :-)

I put the jumpers back in, but unfortunately, the messages persist :-(

Cheers,

Steve


error message when starting smartd: FAILURE - SMART status=51 ...

2008-06-20 Thread Steven Samuel Cole

Hello,

I see an error message every time I boot my AMD64 FreeBSD 7.0 system or 
when I restart smartd. These are the dmesg lines that seem relevant to 
the issue (shortened for clarity):


kernel: FreeBSD 7.0-RELEASE #0: Fri Jun  6 22:06:44 NZST 2008
kernel: CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 5200+ (2611.86-MHz K8-class CPU)
kernel: Origin = AuthenticAMD  Id = 0x60fb2  Stepping = 2
kernel: usable memory = 8576704512 (8179 MB)
   ...
kernel: atapci2: SiI SiI 3114 SATA150 controller port 0xd480-0xd487,0xd400-0xd403,0xd080-0xd087,0xd000-0xd003,0xcc00-0xcc0f mem 0xddeff400-0xddeff7ff irq 18 at device 10.0 on pci1
   ...
kernel: ata6: ATA channel 0 on atapci2
kernel: ata7: ATA channel 1 on atapci2
kernel: ata8: ATA channel 2 on atapci2
kernel: ata9: ATA channel 3 on atapci2
   ...
kernel: ad12: 476940MB Seagate ST3500320AS SD15 at ata6-master SATA150
kernel: ad14: 476940MB Seagate ST3500320AS SD15 at ata7-master SATA150
kernel: ad16: 476940MB Seagate ST3500320AS SD15 at ata8-master SATA150
kernel: ad18: 476940MB Seagate ST3500320AS SD15 at ata9-master SATA150

I am happy to provide more info if that helps.
I am using the latest version of smartmontools (5.38).

The actual error messages are:

kernel: ad12: FAILURE - SMART status=51<READY,DSC,ERROR> error=4<ABORTED>
kernel: ad14: FAILURE - SMART status=51<READY,DSC,ERROR> error=4<ABORTED>
kernel: ad16: FAILURE - SMART status=51<READY,DSC,ERROR> error=4<ABORTED>
kernel: ad18: FAILURE - SMART status=51<READY,DSC,ERROR> error=4<ABORTED>
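
For reference, the status and error values in these messages appear to be hexadecimal bit masks of the ATA status and error registers, with the set bits spelled out next to them. The sketch below decodes them by hand. The function name is made up, the flag names simply follow the kernel message, and only a subset of the register bits is covered, so take it as an illustration and not as what the driver actually does.

```shell
#!/bin/sh
# Decode a few ATA status register bits into the names used in the log.
decode_ata_status() {
    status=$(( $1 ))
    flags=""
    [ $(( status & 0x80 )) -ne 0 ] && flags="$flags BUSY"
    [ $(( status & 0x40 )) -ne 0 ] && flags="$flags READY"
    [ $(( status & 0x10 )) -ne 0 ] && flags="$flags DSC"
    [ $(( status & 0x08 )) -ne 0 ] && flags="$flags DRQ"
    [ $(( status & 0x01 )) -ne 0 ] && flags="$flags ERROR"
    # Strip the leading separator before printing.
    echo "${flags# }"
}

decoded=$(decode_ata_status 0x51)
echo "status=51: $decoded"

# Bit 2 (0x04) of the ATA error register means the command was aborted.
error=$(( 0x4 ))
if [ $(( error & 0x04 )) -ne 0 ]; then
    aborted=yes
else
    aborted=no
fi
echo "error=4: aborted=$aborted"
```

Read this way, the drive is ready and simply aborted the SMART command it was sent, which by itself does not say the disk is failing.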

Apart from that, the disks seem to be working fine.
I am running a ZFS pool on them and they perform great, no worries 
whatsoever; I am simply unsure what these error messages mean and whether 
they should worry me.


Also, the disks are SATA300, the controller supports SATA150 only; there 
is a jumper on the disks that limits them to SATA150 which I removed. 
Could that be relevant ?


Thank you very much for your attention.

Kind regards,

Steve