Re: [zfs-discuss] ZFS problems which scrub can't find?
Do you use any form of compression? I changed compression from none to gzip-9, got some message about changing properties of the boot pool (or fs), copied and moved all files under /usr and /etc to force compression of the existing data, rebooted, and - guess what message I got. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
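As a rough sketch of that sequence (the dataset name below is borrowed from the boot entry elsewhere in this thread and may not match the actual setup; rewriting live system directories like this is risky, and only files that are rewritten pick up the new compression setting):
# zfs set compression=gzip-9 rpool/ROOT/opensolaris-15
# cd /etc && find . -type f -exec sh -c 'cp -p "$1" "$1.tmp" && mv "$1.tmp" "$1"' _ {} \;
(and likewise under /usr)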
Re: [zfs-discuss] ZFS problems which scrub can't find?
Off the lists, someone suggested to me that the Inconsistent filesystem may be the boot archive and not the ZFS filesystem (though I still don't know what's wrong with booting b99). Regardless, I tried rebuilding the boot_archive with bootadm update-archive -vf and verified it by mounting it and peeking inside. I also tried both with and without /etc/hostid. I still get the same behavior. Any thoughts? Thanks in advance, - Matt [EMAIL PROTECTED] wrote: Hi, After a recent pkg image-update to OpenSolaris build 100, my system booted once and now will no longer boot. After exhausting other options, I am left wondering if there is some kind of ZFS issue a scrub won't find. The current behavior is that it will load GRUB, but trying to boot the most recent boot environment (b100 based) I get Error 16: Inconsistent filesystem structure. The pool has gone through two scrubs from a livecd based on b101a without finding anything wrong. If I select the previous boot environment (b99 based), I get a kernel panic. I've tried replacing the /etc/hostid based on a hunch from one of the engineers working on Indiana and ZFS boot. I also tried rebuilding the boot_archive and reloading the GRUB based on build 100. I then tried reloading the build 99 grub to hopefully get to where I could boot build 99. No luck with any of these thus far. More below, and some comments in this bug: http://defect.opensolaris.org/bz/show_bug.cgi?id=3965, though may need to be a separate bug. I'd appreciate any suggestions and be glad to gather any data to diagnose this if possible. == Screen when trying to boot b100 after boot menu == Booting 'opensolaris-15' bootfs rpool/ROOT/opensolaris-15 kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS loading '/platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS' ... cpu: 'GenuineIntel' family 6 model 15 step 11 [BIOS accepted mixed-mode target setting!] [Multiboot-kludge, loadaddr=0xbffe38, text-and-data=0x1931a8, bss=0x0, entry=0xc0] '/platform/i86pc/kernel/amd64/unix -B zfs-bootfs=rpool/391,bootpath=[EMAIL PROTECTED],0/pci1179,[EMAIL PROTECTED],2/[EMAIL PROTECTED],0:a,diskdevid=id1,[EMAIL PROTECTED]/a' is loaded module$ /platform/i86pc/$ISADIR/boot_archive loading '/platform/i86pc/$ISADIR/boot_archive' ... Error 16: Inconsistent filesystem structure Press any key to continue... 
== Booting b99 == (by selecting the grub entry from the GRUB menu and adding -kd then doing a :c to continue I get the following stack trace) debug_enter+37 () panicsys+40b () vpanic+15d () panic+9c () (lines above typed in from ::stack, lines below typed in from when it dropped into the debugger) unix:die+ea () unix:trap+3d0 () unix:cmntrap+e9 () unix:mutex_owner_running+d () genunix:lokuppnat+bc () genunix:vn_removeat+7c () genunix:vn_remove_28 () zfs:spa_config_write+18d () zfs:spa_config_sync+102 () zfs:spa_open_common+24b () zfs:spa_open+1c () zfs:dsl_dsobj_to_dsname+37 () zfs:zfs_parse_bootfs+68 () zfs:zfs_mountroot+10a () genunxi:fsop_mountroot+1a () genunix:rootconf+d5 () genunix:vfs_mountroot+65 () genunix:main+e6 () unix:_locore_start+92 () panic: entering debugger (no dump device, continue to reboot) Loaded modules: [ scsi_vhci uppc sd zfs specfs pcplusmp cpu.generic ] kmdb: target stopped at: kmdb_enter+0xb: movq %rax,%rdi == Output from zdb == LABEL 0 version=10 name='rpool' state=1 txg=327816 pool_guid=6981480028020800083 hostid=95693 hostname='opensolaris' top_guid=5199095267524632419 guid=5199095267524632419 vdev_tree type='disk' id=0 guid=5199095267524632419 path='/dev/dsk/c4t0d0s0' devid='id1,[EMAIL PROTECTED]/a' phys_path='/[EMAIL PROTECTED],0/pci1179,[EMAIL PROTECTED],2/[EMAIL PROTECTED],0:a' whole_disk=0 metaslab_array=14 metaslab_shift=29 ashift=9 asize=90374406144 is_log=0 DTL=161 LABEL 1 version=10 name='rpool' state=1 txg=327816 pool_guid=6981480028020800083 hostid=95693 hostname='opensolaris' top_guid=5199095267524632419 guid=5199095267524632419 vdev_tree type='disk' id=0 guid=5199095267524632419 path='/dev/dsk/c4t0d0s0' devid='id1,[EMAIL PROTECTED]/a' phys_path='/[EMAIL PROTECTED],0/pci1179,[EMAIL PROTECTED],2/[EMAIL PROTECTED],0:a' whole_disk=0 metaslab_array=14 metaslab_shift=29 ashift=9 asize=90374406144 is_log=0 DTL=161 LABEL 2 version=10 name='rpool' state=1 txg=327816
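For anyone who wants to repeat the boot_archive check described above, a sketch from a live CD, assuming the root pool is mounted at /a (the exact alternate root used originally is not stated) and a 64-bit archive; whether the archive mounts as ufs or hsfs depends on how it was built:
# bootadm update-archive -vf -R /a
# lofiadm -a /a/platform/i86pc/amd64/boot_archive
# mount -F ufs -o ro /dev/lofi/1 /mnt      (try -F hsfs if ufs fails)
# ls /mnt
# umount /mnt; lofiadm -d /dev/lofi/1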
[zfs-discuss] ZFS problems which scrub can't find?
Hi, After a recent pkg image-update to OpenSolaris build 100, my system booted once and now will no longer boot. After exhausting other options, I am left wondering if there is some kind of ZFS issue a scrub won't find. The current behavior is that it will load GRUB, but trying to boot the most recent boot environment (b100 based) I get Error 16: Inconsistent filesystem structure. The pool has gone through two scrubs from a livecd based on b101a without finding anything wrong. If I select the previous boot environment (b99 based), I get a kernel panic. I've tried replacing the /etc/hostid based on a hunch from one of the engineers working on Indiana and ZFS boot. I also tried rebuilding the boot_archive and reloading the GRUB based on build 100. I then tried reloading the build 99 grub to hopefully get to where I could boot build 99. No luck with any of these thus far. More below, and some comments in this bug: http://defect.opensolaris.org/bz/show_bug.cgi?id=3965, though may need to be a separate bug. I'd appreciate any suggestions and be glad to gather any data to diagnose this if possible. == Screen when trying to boot b100 after boot menu == Booting 'opensolaris-15' bootfs rpool/ROOT/opensolaris-15 kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS loading '/platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS' ... cpu: 'GenuineIntel' family 6 model 15 step 11 [BIOS accepted mixed-mode target setting!] [Multiboot-kludge, loadaddr=0xbffe38, text-and-data=0x1931a8, bss=0x0, entry=0xc0] '/platform/i86pc/kernel/amd64/unix -B zfs-bootfs=rpool/391,bootpath=[EMAIL PROTECTED],0/pci1179,[EMAIL PROTECTED],2/[EMAIL PROTECTED],0:a,diskdevid=id1,[EMAIL PROTECTED]/a' is loaded module$ /platform/i86pc/$ISADIR/boot_archive loading '/platform/i86pc/$ISADIR/boot_archive' ... Error 16: Inconsistent filesystem structure Press any key to continue... 
== Booting b99 == (by selecting the grub entry from the GRUB menu and adding -kd then doing a :c to continue I get the following stack trace) debug_enter+37 () panicsys+40b () vpanic+15d () panic+9c () (lines above typed in from ::stack, lines below typed in from when it dropped into the debugger) unix:die+ea () unix:trap+3d0 () unix:cmntrap+e9 () unix:mutex_owner_running+d () genunix:lokuppnat+bc () genunix:vn_removeat+7c () genunix:vn_remove_28 () zfs:spa_config_write+18d () zfs:spa_config_sync+102 () zfs:spa_open_common+24b () zfs:spa_open+1c () zfs:dsl_dsobj_to_dsname+37 () zfs:zfs_parse_bootfs+68 () zfs:zfs_mountroot+10a () genunxi:fsop_mountroot+1a () genunix:rootconf+d5 () genunix:vfs_mountroot+65 () genunix:main+e6 () unix:_locore_start+92 () panic: entering debugger (no dump device, continue to reboot) Loaded modules: [ scsi_vhci uppc sd zfs specfs pcplusmp cpu.generic ] kmdb: target stopped at: kmdb_enter+0xb: movq %rax,%rdi == Output from zdb == LABEL 0 version=10 name='rpool' state=1 txg=327816 pool_guid=6981480028020800083 hostid=95693 hostname='opensolaris' top_guid=5199095267524632419 guid=5199095267524632419 vdev_tree type='disk' id=0 guid=5199095267524632419 path='/dev/dsk/c4t0d0s0' devid='id1,[EMAIL PROTECTED]/a' phys_path='/[EMAIL PROTECTED],0/pci1179,[EMAIL PROTECTED],2/[EMAIL PROTECTED],0:a' whole_disk=0 metaslab_array=14 metaslab_shift=29 ashift=9 asize=90374406144 is_log=0 DTL=161 LABEL 1 version=10 name='rpool' state=1 txg=327816 pool_guid=6981480028020800083 hostid=95693 hostname='opensolaris' top_guid=5199095267524632419 guid=5199095267524632419 vdev_tree type='disk' id=0 guid=5199095267524632419 path='/dev/dsk/c4t0d0s0' devid='id1,[EMAIL PROTECTED]/a' phys_path='/[EMAIL PROTECTED],0/pci1179,[EMAIL PROTECTED],2/[EMAIL PROTECTED],0:a' whole_disk=0 metaslab_array=14 metaslab_shift=29 ashift=9 asize=90374406144 is_log=0 DTL=161 LABEL 2 version=10 name='rpool' state=1 txg=327816 pool_guid=6981480028020800083 hostid=95693 hostname='opensolaris' top_guid=5199095267524632419 guid=5199095267524632419 vdev_tree type='disk' id=0 guid=5199095267524632419 path='/dev/dsk/c4t0d0s0' devid='id1,[EMAIL PROTECTED]/a' phys_path='/[EMAIL PROTECTED],0/pci1179,[EMAIL PROTECTED],2/[EMAIL PROTECTED],0:a' whole_disk=0 metaslab_array=14 metaslab_shift=29 ashift=9 asize=90374406144 is_log=0 DTL=161
Re: [zfs-discuss] ZFS Problems under vmware
Raw Device Mapping is a feature of ESX 2.5 and above which allows a guest OS to have access to a LUN on a fibre or iSCSI SAN. See http://www.vmware.com/pdf/esx25_rawdevicemapping.pdf for more details. You may be able to do something similar with the raw disks under Workstation; see http://www.vmware.com/support/reference/linux/osonpartition_linux.html Since I added the RDM to one of my guest OSes, all of them have started working using virtual disks after running: zpool export tank; zpool import -f tank. Maybe adding the RDM changed some behaviour of ESX, or maybe I just got lucky. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Problems under vmware
I am seeing the same problem using a separate virtual disk for the pool. This is happening with Solaris 10 U3, U4 and U5. SCSI reservations are known to be an issue with clustered Solaris: http://blogs.sun.com/SC/entry/clustering_solaris_guests_that_run I wonder if this is the same problem. Maybe we have to use Raw Device Mapping (RDM) to get zfs to work under vmware. Anthony Worrall This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Problems under vmware
Added a vdev using RDM and that seems to be stable over reboots; however, the pools based on a virtual disk now also seem to be stable after doing an export and import -f. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS problems with USB Storage devices
Hi Ricardo, I'll try that. Thanks (Obrigado) Paulo Soeiro On 6/5/08, Ricardo M. Correia [EMAIL PROTECTED] wrote: On Ter, 2008-06-03 at 23:33 +0100, Paulo Soeiro wrote: 6)Remove and attached the usb sticks: zpool status pool: myPool state: UNAVAIL status: One or more devices could not be used because the label is missing or invalid. There are insufficient replicas for the pool to continue functioning. action: Destroy and re-create the pool from a backup source. see: http://www.sun.com/msg/ZFS-8000-5E scrub: none requested config: NAME STATE READ WRITE CKSUM myPool UNAVAIL 0 0 0 insufficient replicas mirror UNAVAIL 0 0 0 insufficient replicas c6t0d0p0 FAULTED 0 0 0 corrupted data c7t0d0p0 FAULTED 0 0 0 corrupted data This could be a problem of USB devices getting renumbered (or something to that effect). Try doing zpool export myPool and zpool import myPool at this point, it should work fine and you should be able to get your data back. Cheers, Ricardo -- *Ricardo Manuel Correia* Lustre Engineering *Sun Microsystems, Inc.* Portugal Phone +351.214134023 / x58723 Mobile +351.912590825 Email [EMAIL PROTECTED] 6g_top.gif___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS problems with USB Storage devices
On Ter, 2008-06-03 at 23:33 +0100, Paulo Soeiro wrote: 6)Remove and attached the usb sticks: zpool status pool: myPool state: UNAVAIL status: One or more devices could not be used because the label is missing or invalid. There are insufficient replicas for the pool to continue functioning. action: Destroy and re-create the pool from a backup source. see: http://www.sun.com/msg/ZFS-8000-5E scrub: none requested config: NAME STATE READ WRITE CKSUM myPool UNAVAIL 0 0 0 insufficient replicas mirror UNAVAIL 0 0 0 insufficient replicas c6t0d0p0 FAULTED 0 0 0 corrupted data c7t0d0p0 FAULTED 0 0 0 corrupted data This could be a problem of USB devices getting renumbered (or something to that effect). Try doing zpool export myPool and zpool import myPool at this point, it should work fine and you should be able to get your data back. Cheers, Ricardo -- Ricardo Manuel Correia Lustre Engineering Sun Microsystems, Inc. Portugal Phone +351.214134023 / x58723 Mobile +351.912590825 Email [EMAIL PROTECTED] attachment: 6g_top.gif___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
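A sketch of that recovery sequence, using the pool name from this thread; if a plain import does not see the pool after the sticks re-enumerate, pointing import at the device directory is sometimes needed:
# zpool export myPool
# zpool import myPool
# zpool import -d /dev/dsk myPool      (only if the previous import cannot find it)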
Re: [zfs-discuss] ZFS problems with USB Storage devices
Did the same test again and here is the result: 1) zpool create myPool mirror c6t0d0p0 c7t0d0p0 2) -bash-3.2# zfs create myPool/myfs -bash-3.2# zpool status pool: myPool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM myPool ONLINE 0 0 0 mirror ONLINE 0 0 0 c6t0d0p0 ONLINE 0 0 0 c7t0d0p0 ONLINE 0 0 0 errors: No known data errors pool: rpool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 c5t0d0s0 ONLINE 0 0 0 errors: No known data errors 3)Copy a file to /myPool/myfs ls -ltrh total 369687 -rwxr-xr-x 1 root root 184M Jun 3 22:38 test.bin 4)Copy a second file cp test.bin test2.bin And shutdown Startup 5) -bash-3.2# zpool status pool: myPool state: UNAVAIL status: One or more devices could not be opened. There are insufficient replicas for the pool to continue functioning. action: Attach the missing device and online it using 'zpool online'. see: http://www.sun.com/msg/ZFS-8000-3C scrub: none requested config: NAME STATE READ WRITE CKSUM myPool UNAVAIL 0 0 0 insufficient replicas mirror UNAVAIL 0 0 0 insufficient replicas c6t0d0p0 UNAVAIL 0 0 0 cannot open c7t0d0p0 UNAVAIL 0 0 0 cannot open pool: rpool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 c5t0d0s0 ONLINE 0 0 0 errors: No known data errors 6)Remove and attached the usb sticks: zpool status pool: myPool state: UNAVAIL status: One or more devices could not be used because the label is missing or invalid. There are insufficient replicas for the pool to continue functioning. action: Destroy and re-create the pool from a backup source. see: http://www.sun.com/msg/ZFS-8000-5E scrub: none requested config: NAME STATE READ WRITE CKSUM myPool UNAVAIL 0 0 0 insufficient replicas mirror UNAVAIL 0 0 0 insufficient replicas c6t0d0p0 FAULTED 0 0 0 corrupted data c7t0d0p0 FAULTED 0 0 0 corrupted data pool: rpool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 c5t0d0s0 ONLINE 0 0 0 errors: No known data errors --- So it's not a hub problem, but it seems to be a zfs usb storage problem. I just hope zfs works fine on hardisks. Because it's not working on usb sticks. It would be nice somebody from SUN could fix this problem... Thanks Regards Paulo On Tue, Jun 3, 2008 at 8:19 PM, Paulo Soeiro [EMAIL PROTECTED] wrote: I'll try the same without the hub. Thanks Regards Paulo On 6/2/08, Thommy M. [EMAIL PROTECTED] wrote: Paulo Soeiro wrote: Greetings, I was experimenting with zfs, and i made the following test, i shutdown the computer during a write operation in a mirrored usb storage filesystem. 
Here is my configuration NGS USB 2.0 Minihub 4 3 USB Silicom Power Storage Pens 1 GB each These are the ports: hub devices /---\ | port 2 | port 1 | | c10t0d0p0 | c9t0d0p0 | - | port 4 | port 4 | | c12t0d0p0 | c11t0d0p0| \/ Here is the problem: 1)First i create a mirror with port2 and port1 devices zpool create myPool mirror c10t0d0p0 c9t0d0p0 -bash-3.2# zpool status pool: myPool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM myPool ONLINE 0 0 0 mirror ONLINE 0 0 0 c10t0d0p0 ONLINE 0 0 0 c9t0d0p0 ONLINE 0 0 0 errors: No known data errors pool: rpool state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM rpool ONLINE 0 0 0 c5t0d0s0 ONLINE 0 0 0 errors: No known data errors 2)zfs create myPool/myfs 3)created a random file (file.txt - more or less 100MB size) digest -a md5 file.txt 3f9d17531d6103ec75ba9762cb250b4c 4)While making a second copy of the file: cp file.txt test I've shutdown the computer while the file was being copied. And restarted the computer again. And here is the result: -bash-3.2# zpool status pool: myPool state: UNAVAIL status: One or more devices could not be used because the label is missing or invalid. There are insufficient replicas for the pool to continue functioning. action: Destroy and re-create the pool from a backup source. see: http://www.sun.com/msg/ZFS-8000-5E scrub: none requested config: NAME STATE READ WRITE CKSUM myPool UNAVAIL 0 0 0 insufficient replicas mirror UNAVAIL 0 0 0 insufficient replicas c12t0d0p0 OFFLINE 0 0 0 c9t0d0p0 FAULTED 0 0 0 corrupted data pool: rpool state: ONLINE scrub:
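If the device-renumbering theory from this thread applies here, two quick checks after replugging the sticks (standard OpenSolaris commands, nothing pool-specific assumed) are to list what names the devices came back under and which pools ZFS can see on them:
# rmformat -l          (lists removable media and their current cXtYdZ device names)
# zpool import         (with no arguments, lists pools found on disk that could be imported)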
Re: [zfs-discuss] ZFS problems with USB Storage devices
This test was done without the hub: On Tue, Jun 3, 2008 at 11:33 PM, Paulo Soeiro [EMAIL PROTECTED] wrote: Did the same test again and here is the result: 1) zpool create myPool mirror c6t0d0p0 c7t0d0p0 2) -bash-3.2# zfs create myPool/myfs -bash-3.2# zpool status pool: myPool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM myPool ONLINE 0 0 0 mirror ONLINE 0 0 0 c6t0d0p0 ONLINE 0 0 0 c7t0d0p0 ONLINE 0 0 0 errors: No known data errors pool: rpool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 c5t0d0s0 ONLINE 0 0 0 errors: No known data errors 3)Copy a file to /myPool/myfs ls -ltrh total 369687 -rwxr-xr-x 1 root root 184M Jun 3 22:38 test.bin 4)Copy a second file cp test.bin test2.bin And shutdown Startup 5) -bash-3.2# zpool status pool: myPool state: UNAVAIL status: One or more devices could not be opened. There are insufficient replicas for the pool to continue functioning. action: Attach the missing device and online it using 'zpool online'. see: http://www.sun.com/msg/ZFS-8000-3C scrub: none requested config: NAME STATE READ WRITE CKSUM myPool UNAVAIL 0 0 0 insufficient replicas mirror UNAVAIL 0 0 0 insufficient replicas c6t0d0p0 UNAVAIL 0 0 0 cannot open c7t0d0p0 UNAVAIL 0 0 0 cannot open pool: rpool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 c5t0d0s0 ONLINE 0 0 0 errors: No known data errors 6)Remove and attached the usb sticks: zpool status pool: myPool state: UNAVAIL status: One or more devices could not be used because the label is missing or invalid. There are insufficient replicas for the pool to continue functioning. action: Destroy and re-create the pool from a backup source. see: http://www.sun.com/msg/ZFS-8000-5E scrub: none requested config: NAME STATE READ WRITE CKSUM myPool UNAVAIL 0 0 0 insufficient replicas mirror UNAVAIL 0 0 0 insufficient replicas c6t0d0p0 FAULTED 0 0 0 corrupted data c7t0d0p0 FAULTED 0 0 0 corrupted data pool: rpool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 c5t0d0s0 ONLINE 0 0 0 errors: No known data errors --- So it's not a hub problem, but it seems to be a zfs usb storage problem. I just hope zfs works fine on hardisks. Because it's not working on usb sticks. It would be nice somebody from SUN could fix this problem... Thanks Regards Paulo On Tue, Jun 3, 2008 at 8:19 PM, Paulo Soeiro [EMAIL PROTECTED] wrote: I'll try the same without the hub. Thanks Regards Paulo On 6/2/08, Thommy M. [EMAIL PROTECTED] wrote: Paulo Soeiro wrote: Greetings, I was experimenting with zfs, and i made the following test, i shutdown the computer during a write operation in a mirrored usb storage filesystem. 
Here is my configuration NGS USB 2.0 Minihub 4 3 USB Silicom Power Storage Pens 1 GB each These are the ports: hub devices /---\ | port 2 | port 1 | | c10t0d0p0 | c9t0d0p0 | - | port 4 | port 4 | | c12t0d0p0 | c11t0d0p0| \/ Here is the problem: 1)First i create a mirror with port2 and port1 devices zpool create myPool mirror c10t0d0p0 c9t0d0p0 -bash-3.2# zpool status pool: myPool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM myPool ONLINE 0 0 0 mirror ONLINE 0 0 0 c10t0d0p0 ONLINE 0 0 0 c9t0d0p0 ONLINE 0 0 0 errors: No known data errors pool: rpool state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM rpool ONLINE 0 0 0 c5t0d0s0 ONLINE 0 0 0 errors: No known data errors 2)zfs create myPool/myfs 3)created a random file (file.txt - more or less 100MB size) digest -a md5 file.txt 3f9d17531d6103ec75ba9762cb250b4c 4)While making a second copy of the file: cp file.txt test I've shutdown the computer while the file was being copied. And restarted the computer again. And here is the result: -bash-3.2# zpool status pool: myPool state: UNAVAIL status: One or more devices could not be used because the label is missing or invalid. There are insufficient replicas for the pool to continue functioning. action: Destroy and re-create the pool from a backup source. see: http://www.sun.com/msg/ZFS-8000-5E scrub: none requested config: NAME STATE READ WRITE CKSUM myPool UNAVAIL 0 0 0 insufficient replicas mirror UNAVAIL
Re: [zfs-discuss] ZFS problems with USB Storage devices
On Jun 3, 2008, at 18:34, Paulo Soeiro wrote: This test was done without the hub: FWIW, I bought 9 microSD's and 9 USB controller units for them from NewEgg to replicate the famous ZFS demo video, and I had problems getting them working with OpenSolaris (on VMWare on OSX, in this case). After getting frustrated and thinking about it for a while, I decided to test each MicroSD card and controller independently (using dd) and one of the adapters turned out to be flakey at just writing zeros. It also happened to be the #0 adapter which threw me off for a while, since that's where I started. So, then I was still having problems (but I had tested the remaining units), so I went home for the weekend, and left them plugged into their hubs (i-rocks brand seems OK so far), and came back to a system log full of a second adapter dropping out several times over the weekend (though it survived a quick dd). Taking it off the hub, it did the same thing for me if I waited long enough (10 minutes or so - I assume it was getting warmed up). I've also had to replace a server mobo which had a faulty USB implementation (Compaq brand, one of the early USB2.0 chips). Just food for thought - there's a lot to go wrong before ZFS sees it and USB gear isn't always well-made. -Bill - Bill McGonigle, Owner Work: 603.448.4440 BFC Computing, LLC Home: 603.448.1668 [EMAIL PROTECTED] Cell: 603.252.2606 http://www.bfccomputing.com/ Page: 603.442.1833 Blog: http://blog.bfccomputing.com/ VCard: http://bfccomputing.com/vcard/bill.vcf ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
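A sketch of that kind of per-device dd test; the device name below is just one of the ones from this thread, and writing to it destroys whatever is on the stick, so only do this on a device you intend to re-create the pool on afterwards:
# dd if=/dev/zero of=/dev/rdsk/c6t0d0p0 bs=1024k count=100
# dd if=/dev/rdsk/c6t0d0p0 of=/dev/null bs=1024k count=100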
Re: [zfs-discuss] ZFS problems with USB Storage devices
Paulo Soeiro wrote: Greetings, I was experimenting with zfs, and i made the following test, i shutdown the computer during a write operation in a mirrored usb storage filesystem. Here is my configuration NGS USB 2.0 Minihub 4 3 USB Silicom Power Storage Pens 1 GB each These are the ports: hub devices /---\ | port 2 | port 1 | | c10t0d0p0 | c9t0d0p0 | - | port 4 | port 4 | | c12t0d0p0 | c11t0d0p0| \/ Here is the problem: 1)First i create a mirror with port2 and port1 devices zpool create myPool mirror c10t0d0p0 c9t0d0p0 -bash-3.2# zpool status pool: myPool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM myPool ONLINE 0 0 0 mirror ONLINE 0 0 0 c10t0d0p0 ONLINE 0 0 0 c9t0d0p0 ONLINE 0 0 0 errors: No known data errors pool: rpool state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM rpool ONLINE 0 0 0 c5t0d0s0 ONLINE 0 0 0 errors: No known data errors 2)zfs create myPool/myfs 3)created a random file (file.txt - more or less 100MB size) digest -a md5 file.txt 3f9d17531d6103ec75ba9762cb250b4c 4)While making a second copy of the file: cp file.txt test I've shutdown the computer while the file was being copied. And restarted the computer again. And here is the result: -bash-3.2# zpool status pool: myPool state: UNAVAIL status: One or more devices could not be used because the label is missing or invalid. There are insufficient replicas for the pool to continue functioning. action: Destroy and re-create the pool from a backup source. see: http://www.sun.com/msg/ZFS-8000-5E scrub: none requested config: NAME STATE READ WRITE CKSUM myPool UNAVAIL 0 0 0 insufficient replicas mirror UNAVAIL 0 0 0 insufficient replicas c12t0d0p0 OFFLINE 0 0 0 c9t0d0p0 FAULTED 0 0 0 corrupted data pool: rpool state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM rpool ONLINE 0 0 0 c5t0d0s0 ONLINE 0 0 0 errors: No known data errors --- I was expecting that only one of the files was corrupted, not the all the filesystem. This looks exactly like the problem I had (thread USB stick unavailable after restart) and the answer I got was that you can't relay on the HUB ... I haven't tried another HUB yet but will eventually test the Adaptec XHub 4 (AUH-4000) which is on the HCL list... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS problems with USB Storage devices
Thommy, If I read correctly your post stated that the pools did not automount on startup, not that they would go corrupt. It seems to me that Paulo is actually experiencing a corrupt fs -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Thommy M. Sent: 02 June 2008 13:19 To: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] ZFS problems with USB Storage devices Paulo Soeiro wrote: Greetings, I was experimenting with zfs, and i made the following test, i shutdown the computer during a write operation in a mirrored usb storage filesystem. Here is my configuration NGS USB 2.0 Minihub 4 3 USB Silicom Power Storage Pens 1 GB each These are the ports: hub devices /---\ | port 2 | port 1 | | c10t0d0p0 | c9t0d0p0 | - | port 4 | port 4 | | c12t0d0p0 | c11t0d0p0| \/ Here is the problem: 1)First i create a mirror with port2 and port1 devices zpool create myPool mirror c10t0d0p0 c9t0d0p0 -bash-3.2# zpool status pool: myPool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM myPool ONLINE 0 0 0 mirror ONLINE 0 0 0 c10t0d0p0 ONLINE 0 0 0 c9t0d0p0 ONLINE 0 0 0 errors: No known data errors pool: rpool state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM rpool ONLINE 0 0 0 c5t0d0s0 ONLINE 0 0 0 errors: No known data errors 2)zfs create myPool/myfs 3)created a random file (file.txt - more or less 100MB size) digest -a md5 file.txt 3f9d17531d6103ec75ba9762cb250b4c 4)While making a second copy of the file: cp file.txt test I've shutdown the computer while the file was being copied. And restarted the computer again. And here is the result: -bash-3.2# zpool status pool: myPool state: UNAVAIL status: One or more devices could not be used because the label is missing or invalid. There are insufficient replicas for the pool to continue functioning. action: Destroy and re-create the pool from a backup source. see: http://www.sun.com/msg/ZFS-8000-5E scrub: none requested config: NAME STATE READ WRITE CKSUM myPool UNAVAIL 0 0 0 insufficient replicas mirror UNAVAIL 0 0 0 insufficient replicas c12t0d0p0 OFFLINE 0 0 0 c9t0d0p0 FAULTED 0 0 0 corrupted data pool: rpool state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM rpool ONLINE 0 0 0 c5t0d0s0 ONLINE 0 0 0 errors: No known data errors --- I was expecting that only one of the files was corrupted, not the all the filesystem. This looks exactly like the problem I had (thread USB stick unavailable after restart) and the answer I got was that you can't relay on the HUB ... I haven't tried another HUB yet but will eventually test the Adaptec XHub 4 (AUH-4000) which is on the HCL list... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss smime.p7s Description: S/MIME cryptographic signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS problems with USB Storage devices
Justin Vassallo wrote: Thommy, If I read correctly your post stated that the pools did not automount on startup, not that they would go corrupt. It seems to me that Paulo is actually experiencing a corrupt fs Nah, I also had indications of corrupted data if you read my posts. But the data was there after I fiddled with the sticks and exported/imported the pool. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS problems with USB Storage devices
Greetings, I was experimenting with zfs, and i made the following test, i shutdown the computer during a write operation in a mirrored usb storage filesystem. Here is my configuration NGS USB 2.0 Minihub 4 3 USB Silicom Power Storage Pens 1 GB each These are the ports: hub devices /---\ | port 2 | port 1 | | c10t0d0p0 | c9t0d0p0 | - | port 4 | port 4 | | c12t0d0p0 | c11t0d0p0| \/ Here is the problem: 1)First i create a mirror with port2 and port1 devices zpool create myPool mirror c10t0d0p0 c9t0d0p0 -bash-3.2# zpool status pool: myPool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM myPool ONLINE 0 0 0 mirror ONLINE 0 0 0 c10t0d0p0 ONLINE 0 0 0 c9t0d0p0 ONLINE 0 0 0 errors: No known data errors pool: rpool state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM rpool ONLINE 0 0 0 c5t0d0s0 ONLINE 0 0 0 errors: No known data errors 2)zfs create myPool/myfs 3)created a random file (file.txt - more or less 100MB size) digest -a md5 file.txt 3f9d17531d6103ec75ba9762cb250b4c 4)While making a second copy of the file: cp file.txt test I've shutdown the computer while the file was being copied. And restarted the computer again. And here is the result: -bash-3.2# zpool status pool: myPool state: UNAVAIL status: One or more devices could not be used because the label is missing or invalid. There are insufficient replicas for the pool to continue functioning. action: Destroy and re-create the pool from a backup source. see: http://www.sun.com/msg/ZFS-8000-5E scrub: none requested config: NAME STATE READ WRITE CKSUM myPool UNAVAIL 0 0 0 insufficient replicas mirror UNAVAIL 0 0 0 insufficient replicas c12t0d0p0 OFFLINE 0 0 0 c9t0d0p0 FAULTED 0 0 0 corrupted data pool: rpool state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM rpool ONLINE 0 0 0 c5t0d0s0 ONLINE 0 0 0 errors: No known data errors --- I was expecting that only one of the files was corrupted, not the all the filesystem. Thanks Regards Paulo ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Problems under vmware
Hello, I'm having exactly the same situation on one VM, and not on another VM on the same infrastructure. The only difference is that on the failing VM I initially created the pool with one name and then changed the mountpoint to another name. Did you find a solution to the issue? Should I consider going back to UFS on this infrastructure? Thanx a lot Gabriele. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS Problems under vmware
I have a test bed S10U5 system running under vmware ESX that has a weird problem. I have a single virtual disk, with some slices allocated as UFS filesystem for the operating system, and s7 as a ZFS pool. Whenever I reboot, the pool fails to open: May 8 17:32:30 niblet fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-CS, TYPE: Fault, VER: 1, SEVERITY: Major May 8 17:32:30 niblet EVENT-TIME: Thu May 8 17:32:30 PDT 2008 May 8 17:32:30 niblet PLATFORM: VMware Virtual Platform, CSN: VMware-50 35 75 0b a3 b3 e5 d4-38 3f 00 7a 10 c0 e2 d7, HOSTNAME: niblet May 8 17:32:30 niblet SOURCE: zfs-diagnosis, REV: 1.0 May 8 17:32:30 niblet EVENT-ID: f163d843-694d-4659-81e8-aa15bb72e2e0 May 8 17:32:30 niblet DESC: A ZFS pool failed to open. Refer to http://sun.com/msg/ZFS-8000-CS for more information. May 8 17:32:30 niblet AUTO-RESPONSE: No automated response will occur. May 8 17:32:30 niblet IMPACT: The pool data is unavailable May 8 17:32:30 niblet REC-ACTION: Run 'zpool status -x' and either attach the missing device or May 8 17:32:30 niblet restore from backup. According to 'zpool status', the device could not be opened: [EMAIL PROTECTED] ~ # zpool status pool: ospool state: UNAVAIL status: One or more devices could not be opened. There are insufficient replicas for the pool to continue functioning. action: Attach the missing device and online it using 'zpool online'. see: http://www.sun.com/msg/ZFS-8000-D3 scrub: none requested config: NAMESTATE READ WRITE CKSUM ospool UNAVAIL 0 0 0 insufficient replicas c1t0d0s7 UNAVAIL 0 0 0 cannot open However, according to format, the device is perfectly accessible, and format even indicates that slice 7 is an active pool: [EMAIL PROTECTED] ~ # format Searching for disks...done AVAILABLE DISK SELECTIONS: 0. c1t0d0 DEFAULT cyl 4092 alt 2 hd 128 sec 32 /[EMAIL PROTECTED],0/pci1000,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 Specify disk (enter its number): 0 selecting c1t0d0 [disk formatted] Warning: Current Disk has mounted partitions. /dev/dsk/c1t0d0s0 is currently mounted on /. Please see umount(1M). /dev/dsk/c1t0d0s1 is currently used by swap. Please see swap(1M). /dev/dsk/c1t0d0s3 is currently mounted on /usr. Please see umount(1M). /dev/dsk/c1t0d0s4 is currently mounted on /var. Please see umount(1M). /dev/dsk/c1t0d0s5 is currently mounted on /opt. Please see umount(1M). /dev/dsk/c1t0d0s6 is currently mounted on /home. Please see umount(1M). /dev/dsk/c1t0d0s7 is part of active ZFS pool ospool. Please see zpool(1M). Trying to import it does not find it: [EMAIL PROTECTED] ~ # zpool import no pools available to import Exporting it works fine: [EMAIL PROTECTED] ~ # zpool export ospool But then the import indicates that the pool may still be in use: [EMAIL PROTECTED] ~ # zpool import ospool cannot import 'ospool': pool may be in use from other system Adding the -f flag imports successfully: [EMAIL PROTECTED] ~ # zpool import -f ospool [EMAIL PROTECTED] ~ # zpool status pool: ospool state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM ospool ONLINE 0 0 0 c1t0d0s7 ONLINE 0 0 0 errors: No known data errors And then everything works perfectly fine, until I reboot again, at which point the cycle repeats. I have a similar test bed running on actual x4100 hardware that doesn't exhibit this problem. Any idea what's going on here? -- Paul B. 
Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/ Operating Systems and Network Analyst | [EMAIL PROTECTED] California State Polytechnic University | Pomona CA 91768 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
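Since the import complains that the pool may be in use from another system, one thing that might be worth comparing is the hostid the running kernel reports against the hostid recorded in the vdev label; a sketch, using the slice name from the message above:
# hostid
# zdb -l /dev/dsk/c1t0d0s7 | grep hostid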
Re: [zfs-discuss] ZFS problems in dCache
On Wed, Aug 01, 2007 at 09:49:26AM -0700, Sergey Chechelnitskiy wrote: Hi Sergey, I have a flat directory with a lot of small files inside. And I have a java application that reads all these files when it starts. If this directory is located on ZFS the application starts fast (15 mins) when the number of files is around 300,000 and starts very slow (more than 24 hours) when the number of files is around 400,000. The question is why ? Let's set aside the question why this application is designed this way. I still needed to run this application. So, I installed a linux box with XFS, mounted this XFS directory to the Solaris box and moved my flat directory there. Then my application started fast ( 30 mins) even if the number of files (in the linux operated XFS directory mounted thru NSF to the Solaris box) was 400,000 or more. Basicly, what I want to do is to run this application on a Solaris box. Now I cannot do it. Just a rough guess - this might be a Solaris threading problem. See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6518490 So perhaps starting the app with -XX:-UseThreadPriorities may help ... Regards, jel. -- Otto-von-Guericke University http://www.cs.uni-magdeburg.de/ Department of Computer Science Geb. 29 R 027, Universitaetsplatz 2 39106 Magdeburg, Germany Tel: +49 391 67 12768 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
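If anyone wants to try that suggestion, the flag just has to end up on the JVM command line that launches the dCache pool; where dCache actually picks up extra JVM options depends on its startup scripts, so the line below is only an illustration:
java -XX:-UseThreadPriorities <usual dCache pool options and classpath>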
Re: [zfs-discuss] ZFS problems in dCache
We have the same issue (using dCache on Thumpers, data on ZFS). A workaround has been to move the directory to a local UFS filesystem created with a low nbpi parameter. However, this is not a solution. Doesn't look like a threading problem; thanks anyway, Jens! This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS problems in dCache
I think I am having the same problem using a different application (Windchill). zfs is consuming huge amounts of memory and the system (a T2000) is performing poorly. Occasionally it will take a long time (several hours) to do a snapshot. Normally a snapshot will take a second or two. The application will allow me to break the one directory which has almost 600,000 files into several directories. I am in the process of doing this now. I never thought it was a good idea to have that many files in one directory. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
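For whoever ends up splitting such a directory by hand, a generic sketch that fans files out into subdirectories named after the first two characters of each file name; the layout Windchill (or dCache) actually expects is up to the application, so treat the paths and the two-character scheme here as purely illustrative, and note that with hundreds of thousands of files the shell glob may need to be replaced with find:
# cd /pool/flatdir
# for f in *; do d=$(printf '%s\n' "$f" | cut -c1-2); mkdir -p ../split/"$d" && mv "$f" ../split/"$d"/; done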
Re: [zfs-discuss] ZFS problems in dCache
Boyd Adamson [EMAIL PROTECTED] wrote: Or alternatively, are you comparing ZFS(Fuse) on Linux with XFS on Linux? That doesn't seem to make sense since the userspace implementation will always suffer. Someone has just mentioned that all of UFS, ZFS and XFS are available on FreeBSD. Are you using that platform? That information would be useful too. FreeBSD does not use what Solaris calls UFS. Both Solaris and FreeBSD did start with the same filesystem code, but Sun started enhancing UFS in the late 1980's while BSD did not adopt those changes. Later, BSD started its own fork of the filesystem code. Filesystem performance thus cannot be compared. Jörg -- EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin [EMAIL PROTECTED](uni) [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS problems in dCache
On 01/08/2007, at 7:50 PM, Joerg Schilling wrote: Boyd Adamson [EMAIL PROTECTED] wrote: Or alternatively, are you comparing ZFS(Fuse) on Linux with XFS on Linux? That doesn't seem to make sense since the userspace implementation will always suffer. Someone has just mentioned that all of UFS, ZFS and XFS are available on FreeBSD. Are you using that platform? That information would be useful too. FreeBSD does not use what Solaris calls UFS. Both Solaris and FreeBSD did start with the same filesystem code but Sun did start enhancing UFD in the late 1980's while BSD did not take over the changes. Later BSD started a fork on the filesystemcode. Filesystem performance thus cannot be compared. I'm aware of that, but they still call it UFS. I'm trying to determine what the OP is asking. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS problems in dCache
On 01/08/2007, at 7:50 PM, Joerg Schilling wrote: Boyd Adamson [EMAIL PROTECTED] wrote: Or alternatively, are you comparing ZFS(Fuse) on Linux with XFS on Linux? That doesn't seem to make sense since the userspace implementation will always suffer. Someone has just mentioned that all of UFS, ZFS and XFS are available on FreeBSD. Are you using that platform? That information would be useful too. FreeBSD does not use what Solaris calls UFS. Both Solaris and FreeBSD did start with the same filesystem code but Sun did start enhancing UFD in the late 1980's while BSD did not take over the changes. Later BSD started a fork on the filesystemcode. Filesystem performance thus cannot be compared. I'm aware of that, but they still call it UFS. I'm trying to determine what the OP is asking. I seem to remember many daemons that used large grouping of files such as this changing to a split out directory tree starting in the late 80's to avoid slow stat issues. Is this type of design (tossing 300k+ files into one flat directory) becoming more acceptable again? -Wade ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS problems in dCache
Hi All, Thank you for answers. I am not really comparing anything. I have a flat directory with a lot of small files inside. And I have a java application that reads all these files when it starts. If this directory is located on ZFS the application starts fast (15 mins) when the number of files is around 300,000 and starts very slow (more than 24 hours) when the number of files is around 400,000. The question is why ? Let's set aside the question why this application is designed this way. I still needed to run this application. So, I installed a linux box with XFS, mounted this XFS directory to the Solaris box and moved my flat directory there. Then my application started fast ( 30 mins) even if the number of files (in the linux operated XFS directory mounted thru NSF to the Solaris box) was 400,000 or more. Basicly, what I want to do is to run this application on a Solaris box. Now I cannot do it. Thanks, Sergey On August 1, 2007 08:15 am, [EMAIL PROTECTED] wrote: On 01/08/2007, at 7:50 PM, Joerg Schilling wrote: Boyd Adamson [EMAIL PROTECTED] wrote: Or alternatively, are you comparing ZFS(Fuse) on Linux with XFS on Linux? That doesn't seem to make sense since the userspace implementation will always suffer. Someone has just mentioned that all of UFS, ZFS and XFS are available on FreeBSD. Are you using that platform? That information would be useful too. FreeBSD does not use what Solaris calls UFS. Both Solaris and FreeBSD did start with the same filesystem code but Sun did start enhancing UFD in the late 1980's while BSD did not take over the changes. Later BSD started a fork on the filesystemcode. Filesystem performance thus cannot be compared. I'm aware of that, but they still call it UFS. I'm trying to determine what the OP is asking. I seem to remember many daemons that used large grouping of files such as this changing to a split out directory tree starting in the late 80's to avoid slow stat issues. Is this type of design (tossing 300k+ files into one flat directory) becoming more acceptable again? -Wade ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS problems in dCache
Hi All, We have a problem running a scientific application, dCache, on ZFS. dCache is java-based software that stores huge datasets in pools. One dCache pool consists of two directories, pool/data and pool/control. The real data goes into pool/data/. For each file in pool/data/ the pool/control/ directory contains two small files, one 23 bytes and one 989 bytes. When a dCache pool starts it consecutively reads all the files in the control/ directory. We run a pool on ZFS. When we have approx 300,000 files in control/ the pool startup time is about 12-15 minutes. When we have approx 350,000 files in control/ the pool startup time increases to 70 minutes. If we set up a new zfs pool with the smallest possible blocksize and move control/ there, the startup time decreases to 40 minutes (in the case of 350,000 files). But if we run the same pool on XFS the startup time is only 15 minutes. Could you suggest how to reconfigure ZFS to decrease the startup time? When we have approx 400,000 files in control/ we were not able to start the pool in 24 hours. UFS did not work either in this case, but XFS worked. What could be the problem? Thank you, -- -- Best Regards, Sergey Chechelnitskiy ([EMAIL PROTECTED]) WestGrid/SFU ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
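For reference, the smallest-blocksize experiment mentioned above corresponds to something like the following; the dataset names are only examples, and recordsize affects newly written files only, so the control/ files have to be copied into the new dataset rather than just renamed into it:
# zfs create -o recordsize=512 tank/control-small     (512 bytes is the smallest accepted recordsize)
# cp -rp /pool/control/. /tank/control-small/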
Re: [zfs-discuss] ZFS problems in dCache
Sergey Chechelnitskiy [EMAIL PROTECTED] writes: Hi All, We have a problem running a scientific application dCache on ZFS. dCache is a java based software that allows to store huge datasets in pools. One dCache pool consists of two directories pool/data and pool/control. The real data goes into pool/data/ For each file in pool/data/ the pool/control/ directory contains two small files, one is 23 bytes, another one is 989 bytes. When dcache pool starts it consecutively reads all the files in control/ directory. We run a pool on ZFS. When we have approx 300,000 files in control/ the pool startup time is about 12-15 minutes. When we have approx 350,000 files in control/ the pool startup time increases to 70 minutes. If we setup a new zfs pool with the smalles possible blocksize and move control/ there the startup time decreases to 40 minutes (in case of 350,000 files). But if we run the same pool on XFS the startup time is only 15 minutes. Could you suggest to reconfigure ZFS to decrease the startup time. When we have approx 400,000 files in control/ we were not able to start the pool in 24 hours. UFS did not work either in this case, but XFS worked. What could be the problem ? Thank you, I'm not sure I understand what you're comparing. Is there an XFS implementation for Solaris that I don't know about? Are you comparing ZFS on Solaris vs XFS on Linux? If that's the case it seems there is much more that's different than just the filesystem. Or alternatively, are you comparing ZFS(Fuse) on Linux with XFS on Linux? That doesn't seem to make sense since the userspace implementation will always suffer. Someone has just mentioned that all of UFS, ZFS and XFS are available on FreeBSD. Are you using that platform? That information would be useful too. Boyd ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re[2]: [zfs-discuss] ZFS problems
Hello James, Saturday, November 18, 2006, 11:34:52 AM, you wrote: JM as far as I can see, your setup does not meet the minimum JM redundancy requirements for a Raid-Z, which is 3 devices. JM Since you only have 2 devices you are out on a limb. Actually, only two disks for raid-z is fine and you get redundancy. However, it would make more sense to do a mirror with just two disks - performance would be better and the available space would be the same. -- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
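In concrete terms, the two layouts being compared, written out with the device names from the original post (only one of these would actually be created):
# zpool create amber mirror c4d0 c5d0     (two-way mirror: same usable space, better performance)
# zpool create amber raidz c4d0 c5d0      (two-device raidz: legal, but effectively a more expensive mirror)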
Re: [zfs-discuss] ZFS problems
David Dyer-Bennet wrote: On 11/26/06, Al Hopper [EMAIL PROTECTED] wrote: [4] I proposed this solution to a user on the [EMAIL PROTECTED] list - and it resolved his problem. His problem - the system would reset after getting about 1/2 way through a Solaris install. The installer was simply acting as a good system exerciser and heating up his CPU until it glitched out. After he removed the CPU fan and cleaned up his heatsink - he loaded up Solaris successfully. I just identified and fixed exactly this symptom on my mother's Windows system, in fact; it'd get half-way through an install, then start getting flakier and flakier, and fairly soon refuse to boot at all. This made me think heat, and on examination the fan on the CPU cooler wasn't spinning *at all*. It's less than two years old -- but one of the three wires seems to be broken off right at the fan, so that may be the problem. It's not seized up physically, though it's a bit stiff. Anyway, while the software here isn't Solaris, the basic diagnostic issue is the same. This kind of thing is remarkably common, in fact! Yep, the top 4 things that tend to break are: fans, power supplies, disks, and memory (in no particular order). The enterprise-class systems should monitor the fan speed and alert when they are not operating normally. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS problems
First of all I would like to thank everyone for their replies/help. This machine has been running for two years under Linux, but for the last two or three months has had Nexenta Solaris on it. This machine has never once crashed. I rebooted with a Knoppix disk in and ran memtest86. Within 30 minutes it counted several hundred errors which, after cleaning the connections, still occurred in the same locations. I replaced the RAM module and retested with no errors. My md5sums all verified that no data was lost, making me very happy. I did a zpool scrub which came back 100% clean. I still don't understand how the machine ran reliably with bad RAM. That being said, a few days later I did a zpool status and saw 20 checksum errors on one drive and 30 errors on the other. Does anyone have any idea why I have to do zpool export amber followed by zpool import amber for my zpool to be mounted on reboot? zfs set mountpoint does nothing. BTW to answer some other concerns, the Seasonic supply is 400 watts with a guaranteed minimum efficiency of 80%. Using a Kill A Watt meter I have about 120 watts power consumption. The machine is on a UPS. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
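A sketch of how that md5 verification can be scripted, assuming a checksum list was saved beforehand and the GNU md5sum that Nexenta ships (the file and directory names here are only placeholders; on stock Solaris, digest -a md5 does the per-file part):
# cd /amber && md5sum -c checksums.md5 | grep -v ': OK$'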
[zfs-discuss] ZFS problems
I'm new to this group, so hello everyone! I am having some issues with my Nexenta system I set up about two months ago as a zfs/zraid server. I have two new Maxtor 500GB Sata drives and an Adaptec controller which I believe has a Silicon Image chipset. Also I have a Seasonic 80+ power supply, so the power should be as clean as you can get. I had an issue with Nexenta where I had to reinstall, and since then everytime I reboot I have to type zpool export amber zpool import amber to get my zfs volume mounted. A week ago I noticed a couple of CKSUM errors when I did a zpool status, so I did a zpool scrub. This is the output after: # zpool status pool: amber state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://www.sun.com/msg/ZFS-8000-9P scrub: scrub completed with 0 errors on Mon Nov 13 04:49:35 2006 config: NAMESTATE READ WRITE CKSUM amber ONLINE 0 0 0 raidz1ONLINE 0 0 0 c4d0ONLINE 0 051 c5d0ONLINE 0 041 errors: No known data errors I have md5sums on a lot of the files and it looks like maybe 5% of my files are corrupted. Does anyone have any ideas? I was under the impression that zfs was pretty reliable but I guess with any software it needs time to get the bugs ironed out. Michael ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS problems
On 11/18/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: ... scrub: scrub completed with 0 errors on Mon Nov 13 04:49:35 2006 config: NAMESTATE READ WRITE CKSUM amber ONLINE 0 0 0 raidz1ONLINE 0 0 0 c4d0ONLINE 0 051 c5d0ONLINE 0 041 errors: No known data errors I have md5sums on a lot of the files and it looks like maybe 5% of my files are corrupted. Does anyone have any ideas? Michael, as far as I can see, your setup does not meet the minimum redundancy requirements for a Raid-Z, which is 3 devices. Since you only have 2 devices you are out on a limb. Please read the manpage for the zpool command and pay close attention to the restrictions in the section on raidz. I was under the impression that zfs was pretty reliable but I guess with any software it needs time to get the bugs ironed out. ZFS is reliable. I use it - mirrored - at home. If I was going to use raidz or raidz2 I would make sure that I followed the instructions in the manpage about the number of devices I need in order to guarantee redundancy and thus reliability, rather than making an assumption. You should also check the output of iostat -En and see whether your devices are listed there with any error counts. James C. McPherson -- Solaris kernel software engineer, system admin and troubleshooter http://www.jmcp.homeunix.com/blog Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
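The error-count check suggested at the end is a one-liner; iostat -En prints soft, hard and transport error counters per device, so filtering on the word Errors gives a quick per-device summary:
# iostat -En | grep Errors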
Re: [zfs-discuss] ZFS problems
Hi Michael. Based on the output, there should be no user-visible file corruption. ZFS saw a bunch of checksum errors on the disk, but was able to recover in every instance. While 2-disk RAID-Z is really a fancy (and slightly more expensive, CPU-wise) way of doing mirroring, at no point should your data be at risk. I've been working on ZFS a long time, and if what you say is true, it will be the first instance I have ever seen (or heard) of such a phenomenon. I strongly doubt that somehow ZFS returned corrupted data without knowing about it. How are you sure that some application on your box didn't modify the contents of the files? --Bill On Sat, Nov 18, 2006 at 02:01:39AM -0800, [EMAIL PROTECTED] wrote: I'm new to this group, so hello everyone! I am having some issues with my Nexenta system I set up about two months ago as a zfs/zraid server. I have two new Maxtor 500GB Sata drives and an Adaptec controller which I believe has a Silicon Image chipset. Also I have a Seasonic 80+ power supply, so the power should be as clean as you can get. I had an issue with Nexenta where I had to reinstall, and since then everytime I reboot I have to type zpool export amber zpool import amber to get my zfs volume mounted. A week ago I noticed a couple of CKSUM errors when I did a zpool status, so I did a zpool scrub. This is the output after: # zpool status pool: amber state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://www.sun.com/msg/ZFS-8000-9P scrub: scrub completed with 0 errors on Mon Nov 13 04:49:35 2006 config: NAMESTATE READ WRITE CKSUM amber ONLINE 0 0 0 raidz1ONLINE 0 0 0 c4d0ONLINE 0 051 c5d0ONLINE 0 041 errors: No known data errors I have md5sums on a lot of the files and it looks like maybe 5% of my files are corrupted. Does anyone have any ideas? I was under the impression that zfs was pretty reliable but I guess with any software it needs time to get the bugs ironed out. Michael ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS problems
On 18-Nov-06, at 2:01 PM, Bill Moore wrote: Hi Michael. Based on the output, there should be no user-visible file corruption. ZFS saw a bunch of checksum errors on the disk, but was able to recover in every instance. While 2-disk RAID-Z is really a fancy (and slightly more expensive, CPU-wise) way of doing mirroring, at no point should your data be at risk. I've been working on ZFS a long time, and if what you say is true, it will be the first instance I have ever seen (or heard) of such a phenomenon. I strongly doubt that somehow ZFS returned corrupted data without knowing about it. Also, I'd check your RAM. --Toby How are you sure that some application on your box didn't modify the contents of the files? --Bill On Sat, Nov 18, 2006 at 02:01:39AM -0800, [EMAIL PROTECTED] wrote: I'm new to this group, so hello everyone! I am having some issues with my Nexenta system I set up about two months ago as a zfs/ zraid server. I have two new Maxtor 500GB Sata drives and an Adaptec controller which I believe has a Silicon Image chipset. Also I have a Seasonic 80+ power supply, so the power should be as clean as you can get. I had an issue with Nexenta where I had to reinstall, and since then everytime I reboot I have to type zpool export amber zpool import amber to get my zfs volume mounted. A week ago I noticed a couple of CKSUM errors when I did a zpool status, so I did a zpool scrub. This is the output after: # zpool status pool: amber state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://www.sun.com/msg/ZFS-8000-9P scrub: scrub completed with 0 errors on Mon Nov 13 04:49:35 2006 config: NAMESTATE READ WRITE CKSUM amber ONLINE 0 0 0 raidz1ONLINE 0 0 0 c4d0ONLINE 0 051 c5d0ONLINE 0 041 errors: No known data errors I have md5sums on a lot of the files and it looks like maybe 5% of my files are corrupted. Does anyone have any ideas? I was under the impression that zfs was pretty reliable but I guess with any software it needs time to get the bugs ironed out. Michael ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS problems
On Sat, 18 Nov 2006 [EMAIL PROTECTED] wrote: I'm new to this group, so hello everyone! I am having some issues with Welcome! my Nexenta system I set up about two months ago as a zfs/zraid server. I have two new Maxtor 500GB Sata drives and an Adaptec controller which I believe has a Silicon Image chipset. Also I have a Seasonic 80+ power supply, so the power should be as clean as you can get. I had an issue Just wondering (out loud) if your PSU is capable of meeting the demands of your current hardware - including the zfs related disk drives you just added and if the system is on a UPS. Just questions for you to answer and off topic for this list. But you'll see that this thought process is relevant to your particular problem - see more below. with Nexenta where I had to reinstall, and since then everytime I reboot I have to type zpool export amber zpool import amber to get my zfs volume mounted. A week ago I noticed a couple of CKSUM errors when I did a zpool status, so I did a zpool scrub. This is the output after: # zpool status pool: amber state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://www.sun.com/msg/ZFS-8000-9P scrub: scrub completed with 0 errors on Mon Nov 13 04:49:35 2006 config: NAMESTATE READ WRITE CKSUM amber ONLINE 0 0 0 raidz1ONLINE 0 0 0 c4d0ONLINE 0 051 c5d0ONLINE 0 041 errors: No known data errors I have md5sums on a lot of the files and it looks like maybe 5% of my files are corrupted. Does anyone have any ideas? I was under the impression that zfs was pretty reliable but I guess with any software it needs time to get the bugs ironed out. [ I've seen the response where one astute list participate noticed you're running a 2-way raidz device, when the documentation clearly states that the mimimum raidz volume consists of 3 devices ] Going back to zero day (my terminology) for ZFS, when it was first integrated, if you read the zfs related blogs, you'll realize that zfs is arguably one of the most extensively tested bodies of software _ever_ added to (Open)Solaris. If there was a basic issue with zfs, like you describe above, zfs would never have been integrated (into (Open)Solaris). You can imagine that there were a lot of willing zfs testers (please can I be on the beta test...)[0] - but there were also a few cases of this issue has *got* to be ZFS related - because there were no other _rational_ explanations. One such case is mentioned here: http://blogs.sun.com/roller/page/elowe?anchor=zfs_saves_the_day_ta I would suggest that you look for some basic hardware problems within your system. The first place to start is to download/burn a copy of the Ultimate Boot CD ROM (UBCD) [1] and run the latest version of memtest memtest86 for 24 hours. It's likely that you have hardware issues. Please keep the list informed [0] including this author who built hardware specifically to eval/test/use ZFS and get it into production ASAP to solve a business storage problem for $6k instead of $30k to $40k. [1] http://www.ultimatebootcd.com/ Regards, Al Hopper Logical Approach Inc, Plano, TX. 
[EMAIL PROTECTED] Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005 OpenSolaris Governing Board (OGB) Member - Feb 2006 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS problems
[ I've seen the response where one astute list participate noticed you're running a 2-way raidz device, when the documentation clearly states that the mimimum raidz volume consists of 3 devices ] Not very astute. The documentation clearly states that the minimum is 2 devices. zpool(1M): A raidz group with N disks of size X can hold approxi- mately (N-1)*Xbytes and can withstand one device failing before data integrity is compromised. The minimum number of devices in a raidz group is 2. The recommended number is between 3 and 9. If the minimum were actually 3, this configuration wouldn't work at all. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss