[zfs-discuss] zfs snapshot stalled?
Hi all,

I've just seen something weird. On a zpool which looks a bit busy right now (~100 read ops/s, 100 kB/s) I started a zfs snapshot about an hour ago. Until now, taking a snapshot usually took a few seconds at most, even for largish ~TByte file systems. I don't know if the read I/Os are related to the snapshot itself or if another user is causing them, since I had not looked prior to taking the snapshot.

My remaining questions after searching the web: (1) Is it common that snapshots can take this long? (2) Is there a way to stop it if one assumes something went wrong? I.e. is there a special signal I could send it?

Thanks for any hint
Carsten
--
Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics
Callinstrasse 38, 30167 Hannover, Germany
Phone/Fax: +49 511 762-17185 / -17193
http://www.top500.org/system/9234 | http://www.top500.org/connfam/6/list/31
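A quick way to see whether the reads belong to the snapshot or to other users is to watch per-vdev I/O while the command runs and to peek at what the zfs process itself is doing. A minimal sketch, assuming the pool is called tank and the stuck process id is 1234 (both hypothetical):

# zpool iostat -v tank 5        # watch read/write ops per vdev while the snapshot runs
# ps -ef | grep 'zfs snapshot'  # find the pid of the snapshot command
# truss -p 1234                 # see which system call it is blocked in
# pstack 1234                   # user stack of the blocked process

If truss shows the process sitting in a single ioctl, it is most likely waiting for the pool's current transaction group to sync rather than looping.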
Re: [zfs-discuss] Best practice recommendations for backing up to ZFS
If you are using rsync already, I would run it on the server in daemon mode. And there are Windows clients that support the rsync protocol.
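For anyone wanting to try this, a minimal sketch of a daemon-mode setup on the ZFS box; the module name, path, uid and address range are made-up examples:

# cat /etc/rsyncd.conf
[backup]
    path = /tank/backup
    read only = no
    uid = backup
    gid = backup
    hosts allow = 192.168.1.0/24

# rsync --daemon --config=/etc/rsyncd.conf

A Linux or Windows client can then push with something like:

rsync -av --delete /home/me/ 192.168.1.10::backup/me/

(Exact client-side syntax depends on the rsync port you use on Windows, e.g. cwRsync.)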
[zfs-discuss] ZFS Success Stories
Hi all,

I have built out an 8TB SAN at home using OpenSolaris + ZFS. I have yet to put it into 'production' as a lot of the issues raised on this mailing list are putting me off trusting my data to the platform right now.

Over time I have stored my personal data on NetWare and now NT, and this solution has been 100% reliable for the last 12 years. Never a single problem (nor have I had any issues with NTFS with the tens of thousands of spindles I've worked with over the years).

I appreciate that 99% of the time people only comment if they have a problem, which is why I think it'd be nice if some people who have successfully implemented ZFS, including making various use of the features (recovery, replacing disks, etc.), could just reply to this post with a sentence or paragraph detailing how great it is for them. I'm not necessarily interested in very small implementations of one or two disks that haven't changed config since the first day they were installed, but more in setups that are 'organic' and have changed/been administered over time (to show functionality of the tools, resilience of the platform, etc.).

Of course, I guess a lot of people who may have never had a problem wouldn't even be signed up on this list! :-)

Thanks!
Re: [zfs-discuss] zfs snapshot stalled?
Hi again,

Brief update: the process ended successfully (at least a snapshot was created) after close to 2 hrs. Since the load is still the same as before taking the snapshot, I blame other users' processes reading from that array for the long snapshot duration.

Carsten Aulbert wrote:
> My remaining questions after searching the web: (1) Is it common that snapshots can take this long? (2) Is there a way to stop it if one assumes something went wrong? I.e. is there a special signal I could send it? Thanks for any hint Carsten
Re: [zfs-discuss] Improving zfs send performance
Richard Elling wrote:
>> Keep in mind that this is for Solaris 10 not opensolaris.
> Keep in mind that any changes required for Solaris 10 will first be available in OpenSolaris, including any changes which may have already been implemented.

Indeed. For example, less than a week ago a fix for the following two CRs (along with some others) was put back into Solaris Nevada:

6333409 traversal code should be able to issue multiple reads in parallel
6418042 want traversal in depth-first pre-order for quicker 'zfs send'

This should have a positive impact on 'zfs send' performance.

Wbr,
victor
Re: [zfs-discuss] Zpool import not working - I broke my pool...
If you have any backups of your boot volume, I found that the pool can be mounted on boot provided it's still listed in your /etc/zfs/zpool.cache file. I've moved to OpenSolaris now purely so I can take snapshots of my boot volume and back up that file.

The relevant bug that needs fixing is this one, but I've no idea how long it might take before that fix is done.

http://bugs.opensolaris.org/view_bug.do?bug_id=6733267
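A rough sketch of the kind of backup being described, assuming an OpenSolaris root pool named rpool with a boot environment rpool/ROOT/opensolaris and a backup dataset mounted at /rpool/backup (all hypothetical names):

# zfs snapshot rpool/ROOT/opensolaris@pre-import
# cp /etc/zfs/zpool.cache /rpool/backup/zpool.cache.$(date +%Y%m%d)

The idea, per the post above, is that if the data pool later refuses to import, rolling back the boot environment (or restoring the saved zpool.cache) lets the system pick the pool up at boot from the cached configuration instead of going through 'zpool import'.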
Re: [zfs-discuss] Recover after disk labels failure
Is there a way to recover from this problem? I'm pretty sure the data is still OK, it's just the labels that got corrupted by the controller or zfs. :( And this is confirmed by zdb, after a looong wait for the comparison of data and checksums: no data errors.
Re: [zfs-discuss] Lost Disk Space
No takers? :)

benr.
Re: [zfs-discuss] Lost Disk Space
Hello there... I did see that already, and talked with some guys without an answer too... ;-) Actually, this week I did not see a discrepancy between the tools, but the pool information was wrong (space used). Exporting/importing, scrubbing, etc. did not solve it. I know that zfs is async in its status reporting ;-), but only after a reboot was the status OK again.

ps.: b89

Leal.
Re: [zfs-discuss] Slow zpool import with b98
I'm also seeing a very slow import on the 2008.11 build 98 prerelease. I have the following setup: a striped zpool of 2 mirrors; both mirrors have 1 local disk and 1 iSCSI disk. I was testing a setup with iSCSI boot (Windows Vista) with gPXE boot; every client was booted from an iSCSI-exposed zvol. I had a lot of snapshots: I enabled the automatic snapshot service last week, configured it to create a snapshot every 15 min, and kept all snapshots. Today (after 4 days) my first server crashed. I'm right now importing the pool on my second node to test failover, but the import has already been running for more than 45 min :-( If I had known it would take that much time, I would have configured it to remove old snapshots.

K
Re: [zfs-discuss] Slow zpool import with b98
I'm also seeing a slow import on the 2008.11 build 98 prerelease, but my situation is a little different. I have the following setup: a striped zpool of 2 mirrors; both mirrors have 1 local disk and 1 iSCSI disk. I was testing iSCSI boot (Windows Vista) with gPXE boot; every client was booted from an iSCSI-exposed zvol and was connected via CIFS to 1 zfs filesystem. I had a lot of snapshots: I enabled the automatic snapshot service last week, configured it to create a snapshot every 15 min for every zvol and zfs filesystem, and configured it to keep all snapshots. Today (after running well for 4 days) my first server crashed. I'm right now importing the pool on my second node to test failover, but the import has already been running for more than 45 min :-(

So keeping all those snapshots is a dangerous situation, since it takes so much time to import the zpool on another node.

K
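For anyone in the same situation, old automatic snapshots can be pruned on the exporting node before a planned failover. A rough sketch, assuming the auto-snapshots carry 'zfs-auto-snap' in their names (adjust the pattern to your naming scheme; the pattern and the cut-off are assumptions):

# zfs list -H -t snapshot -o name | grep 'zfs-auto-snap' > /tmp/old-snaps
# for s in $(cat /tmp/old-snaps); do zfs destroy $s; done

Thinning the snapshot list ahead of time may make the subsequent import on the other node noticeably faster.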
[zfs-discuss] unloading zfs module
Is it possible to unload the zfs module without rebooting the computer? I am making some changes in the zfs kernel code, then compiling it. I then want to reload the newly rebuilt module without rebooting. I tried modunload, but it always gives "can't unload the module: Device busy". What can be the possible solution?
Re: [zfs-discuss] How can i make my zpool as faulted.
Yuvraj,

I see that you are using files as disks. You could write a few random bytes to one of the files and that would induce corruption. To make a particular disk faulty you could mv the file to a new name. Also, you can explore zinject from the zfs test suite; it probably has a way to induce a fault.

Thanks and regards,
Sanjeev

yuvraj wrote:
> Hi Sanjeev, I am herewith giving all the details of my zpool by firing #zpool status on the command line. Please go through the same and help me out. Thanks in advance.
> Regards, Yuvraj Balkrishna Jadhav.
> ==
> # zpool status
>   pool: mypool1
>  state: ONLINE
>  scrub: none requested
> config:
>         NAME        STATE     READ WRITE CKSUM
>         mypool1     ONLINE       0     0     0
>           /disk1    ONLINE       0     0     0
>           /disk2    ONLINE       0     0     0
> errors: No known data errors
>
>   pool: zpool21
>  state: ONLINE
>  scrub: scrub completed with 0 errors on Sat Oct 18 13:01:52 2008
> config:
>         NAME        STATE     READ WRITE CKSUM
>         zpool21     ONLINE       0     0     0
>           /disk3    ONLINE       0     0     0
>           /disk4    ONLINE       0     0     0
> errors: No known data errors
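A minimal sketch of the "write a few random bytes" approach for a file-backed vdev, using the layout shown above (/disk1 in mypool1); the offsets and counts are arbitrary examples. Writing into the middle of the file avoids clobbering the vdev labels at the start and end, so the pool should stay importable but a scrub should report checksum errors:

# dd if=/dev/urandom of=/disk1 bs=512 count=2048 seek=100000 conv=notrunc
# zpool scrub mypool1
# zpool status -v mypool1     # should show CKSUM errors on /disk1 after the scrub

To simulate an outright missing device instead, mv /disk1 aside and then scrub or export/import the pool; the vdev should then show up as UNAVAIL/FAULTED.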
Re: [zfs-discuss] unloading zfs module
shelly wrote:
> Is it possible to unload zfs module without rebooting the computer. I am making some changes in zfs kernel code. then compiling it. i then want to reload the newly rebuilt module without rebooting. I tried modunload. But its always givin can't unload the module: Device busy. What can be the possible solution

If your root/boot filesystem is zfs then you won't be able to unload the module. If your root/boot filesystem is nfs or ufs then you can unload it like this:

svcadm disable -t fmd
svcadm disable -t sysevent
pools=`zpool list -H -o name`
for p in $pools ; do
        zpool export $p
done
sleep 5
modunload -i 0
modinfo | grep zfs

If you do NOT see any output from modinfo then the unload succeeded; if you DO see output from modinfo it did not succeed.

--
Darren J Moffat
Re: [zfs-discuss] Improving zfs send performance
On Mon, Oct 20, 2008 at 1:52 AM, Victor Latushkin [EMAIL PROTECTED] wrote:
> Indeed. For example, less than a week ago a fix for the following two CRs (along with some others) was put back into Solaris Nevada:
> 6333409 traversal code should be able to issue multiple reads in parallel
> 6418042 want traversal in depth-first pre-order for quicker 'zfs send'

That is helpful, Victor. Does anyone have a full list of CRs that I can provide to Sun support? I have tried searching the bugs database, but I didn't even find those two on my own.
Re: [zfs-discuss] [storage-discuss] ZFS Success Stories
On Mon, Oct 20, 2008 at 03:10, gm_sjo [EMAIL PROTECTED] wrote:
> I appreciate 99% of the time people only comment if they have a problem, which is why I think it'd be nice for some people who have successfully implemented ZFS, including making various use of the features (recovery, replacing disks, etc), could just reply to this post with a sentence or paragraph detailing how great it is for them.

My initial test of zfs was with a few IDE disks which I had found flaky on other platforms (md5 mismatches, that kind of thing). I put them all in a non-redundant pool, and loaded some data on the pool. Then I let it sit in the corner serving NFS for a couple weeks, scrubbed the pool every once in a while, and watched the error counters. It confirmed what I'd seen: these disks gave off errors spontaneously. This was a good start: the first time I'd seen a storage stack that had the audacity to complain about problems with its hardware.

So I upgraded, and put in known-good disks. I started with a mirrored pair of 750s, then added another pair, then added a pair of log disks. At each step, things moved smoothly, and speed increased.

I've also helped my brother set up a Solaris/ZFS setup, on a bit larger scale but with a more static configuration. He started with Linux, md raid, and XFS, using RAID 5 on 8 320GB disks and a Supermicro AOC-SAT2-MV8 Marvell controller. Unfortunately, he lost basically the entire array due to corruption in some layer of the stack. So I suggested ZFS as an alternative. This was around build 67 of Nevada. He put his 8 disks in a raidz pool. About a year ago, he bought six 500GB disks and another Marvell controller, made a new raidz vdev (in a new pool) out of them, and added six of the 320GB disks in another vdev. A month or so ago, he bought six 1TB disks, made a new pool out of them, and moved all his data over to it.

At each step of the way, he upgraded to solve a problem. Moving from Linux to Solaris was because it had better drivers for the Marvell-based card. Adding the 500GB disks was because he was out of space, and the reason we didn't just add another vdev to the existing pool is that his case only has room for 13 disks. Finally, the 320GB disks have started returning checksum errors, so he wanted to get them out of the pool. The system as a whole has been very reliable, but due to some ZFS limitations (no vdev removal, no stripe width changing) a new pool has been needed at each stage.

My experiences with ZFS at home have been very positive, but I also use it at work. I'm concerned about the speed of zfs send and with being able to remove vdevs before I will recommend it unilaterally for work purposes, but despite these issues I have a couple pools in production: one serving mail, one serving user home directories, and one serving data for research groups. We have had no problems with these pools, but I keep an eye on the backup logs for them. I hope that eventually such careful watching will not be necessary.

Will
Re: [zfs-discuss] Best practice recommendations for backing up to ZFS Fileserver
[EMAIL PROTECTED] wrote on 10/19/2008 01:59:29 AM:

> Ares Drake wrote:
> Greetings. I am currently looking into setting up a better backup solution for our family. I own a ZFS fileserver with a 5x500GB raidz. I want to back up data (not the OS itself) from multiple PCs running Linux or Windows XP. The Linux boxes are connected via 1000Mbit, the Windows machines either via gigabit as well or 54Mbit WPA-encrypted WLAN. So far I've set up sharing via NFS on the Solaris box and it works well from both Linux and Windows (via SFU).
> Anyone have a similar setup, recommendations, or maybe something I could use as an idea?

I store all the data worth keeping on my Windows boxes on either iSCSI volumes exported from a ZFS server, or SMB shares from the same server. That way, nothing has to be copied and they never run out of space.

I use rsync with the inplace-update setting and a block size matching the zfs store. I call it from the zfs server and snap after it finishes. This seems to allow for a massive amount of snaps, as only changed blocks are updated. I make sure to --exclude volatile and non-important files such as swap. I then share the snapshot directory via Samba with vfs shadowcopy enabled to allow for restores from any of the snaps via XP's shadow copy interface. 3x raw storage on my zfs server allows me about 1 year of staggered snapshots -- your mileage will vary with data volatility and snap frequency.

-Wade
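A rough sketch of the rsync-then-snapshot cycle described above, run from the ZFS server. The dataset name (tank/backup/pc1), the remote rsync module (pc1::cdrive) and the exclude list are made-up examples; --block-size=131072 matches the default 128K zfs recordsize:

# rsync -a --inplace --block-size=131072 \
    --exclude 'pagefile.sys' --exclude 'hiberfil.sys' \
    pc1::cdrive/ /tank/backup/pc1/ \
  && zfs snapshot tank/backup/pc1@$(date +%Y%m%d-%H%M)

Because --inplace rewrites only the changed blocks of each file, every snapshot shares most of its blocks with the previous one, which is what makes keeping a year of snapshots affordable.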
Re: [zfs-discuss] ZFS scalability in terms of file system count (or lack thereof) in S10U6
Paul B. Henson wrote:
> <snip>
> At about 5000 filesystems, it starts taking over 30 seconds to create/delete additional filesystems. At 7848, over a minute:
>
> # time zfs create export/user/test
> real    1m22.950s
> user    1m12.268s
> sys     0m10.184s
>
> I did a little experiment with truss:
>
> # truss -c zfs create export/user/test2
> syscall               seconds    calls  errors
> _exit                    .000        1
> read                     .004      892
> open                     .023       67       2
> close                    .001       80
> brk                      .006      653
> getpid                   .037     8598
> mount                    .006        1
> sysi86                   .000        1
> ioctl                 115.534 31303678    7920
> execve                   .000        1
> fcntl                    .000       18
> openat                   .000        2
> mkdir                    .000        1
> getppriv                 .000        1
> getprivimplinfo          .000        1
> issetugid                .000        4
> sigaction                .000        1
> sigfillset               .000        1
> getcontext               .000        1
> setustack                .000        1
> mmap                     .000       78
> munmap                   .000       28
> xstat                    .000       65      21
> lxstat                   .000        1       1
> getrlimit                .000        1
> memcntl                  .000       16
> sysconfig                .000        5
> lwp_sigmask              .000        2
> lwp_private              .000        1
> llseek                   .084    15819
> door_info                .000       13
> door_call                .103     8391
> schedctl                 .000        1
> resolvepath              .000       19
> getdents64               .000        4
> stat64                   .000        3
> fstat64                  .000       98
> zone_getattr             .000        1
> zone_lookup              .000        2
>                      --------   ------    ----
> sys totals:           115.804 31338551    7944
> usr time:             107.174
> elapsed:              897.670
>
> and it seems the majority of time is spent in ioctl calls, specifically:
> ioctl(16, MNTIOC_GETMNTENT, 0x08045A60) = 0

Yes, the implementation of the above ioctl walks the list of mounted filesystems 'vfslist' [in this case it walks ~5000 nodes of a linked list before the ioctl returns]. This in-kernel traversal of the filesystems is taking time.

> Interestingly, I tested creating 6 filesystems simultaneously, which took a total of only three minutes, rather than 9 minutes had they been created sequentially. I'm not sure how parallelizable I can make an identity management provisioning system though.
>
> Was I mistaken about the increased scalability that was going to be available? Is there anything I could configure differently to improve this performance? We are going to need about 30,000 filesystems to cover our

You could set 'zfs set mountpoint=none pool-name' and then create the filesystems under the pool-name. [In my experiments the number of ioctls went down drastically.] You could then set a mountpoint for the pool and then issue a 'zfs mount -a'.

Pramod

> faculty, staff, students, and group project directories. We do have 5 x4500's which will be allocated to the task, so about 6000 filesystems per. Depending on what time of the quarter it is, our identity management system can create hundreds up to thousands of accounts, and when we purge accounts quarterly we typically delete 10,000 or so. Currently those jobs only take 2-6 hours; with this level of performance from ZFS they would take days if not over a week :(.
>
> Thanks for any suggestions. What is the internal recommendation on maximum number of file systems per server?
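A minimal sketch of the workaround Pramod describes, assuming the parent dataset is export/user (dataset names and user list are made-up examples):

# zfs set mountpoint=none export/user          # children inherit 'none', so creates skip the mount step
# for u in alice bob carol; do
>     zfs create export/user/$u
> done
# zfs set mountpoint=/export/user export/user  # re-establish the inherited mountpoints
# zfs mount -a                                 # mount anything still unmounted in one pass

This mainly helps bulk creation; for steady-state create/delete traffic the per-operation mount enumeration still applies, as Paul notes in his follow-up.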
Re: [zfs-discuss] zfs cp hangs when the mirrors are removed ..
Hi Richard,

Richard Elling wrote:
> Karthik Krishnamoorthy wrote:
>> We did try with this zpool set failmode=continue pool option and the wait option before running the cp command and pulling out the mirrors, and in both cases there was a hang, and I have a core dump of the hang as well.
>
> You have to wait for the I/O drivers to declare that the device is dead. This can be up to several minutes, depending on the driver.

Okay, the customer indicated they didn't see a hang when they ran the same test with UFS.

>> Any pointers to the bug opening process ?
>
> http://bugs.opensolaris.org, or bugster if you have an account. Be sure to indicate which drivers you are using, as this is not likely a ZFS bug, per se. Output from prtconf -D should be a minimum.

I have the core dump of the hang. Will make that available as well.

Thanks and regards,
Karthik

> -- richard

On 10/15/08 22:27, Neil Perrin wrote:
> On 10/15/08 23:12, Karthik Krishnamoorthy wrote:
>> Neil, Thanks for the quick suggestion, the hang seems to happen even with the zpool set failmode=continue pool option. Any other way to recover from the hang?
>
> You should set the property before you remove the devices. This should prevent the hang. It isn't used to recover from it. If you did do that then it seems like a bug somewhere in ZFS or the IO stack below it. In which case you should file a bug. Neil.

On 10/15/08 22:03, Neil Perrin wrote:
> Karthik, The pool failmode property as implemented governs the behaviour when all the devices needed are unavailable. The default behaviour is to wait (block) until the IO can continue - perhaps by re-enabling the device(s). The behaviour you expected can be achieved by zpool set failmode=continue pool, as shown in the link you indicated below. Neil.

On 10/15/08 22:38, Karthik Krishnamoorthy wrote:
> Hello All,
>
> Summary: cp command for a mirrored zfs pool hung when all the disks in the mirrored pool were unavailable.
>
> Detailed description: The cp command (copying a 1GB file from nfs to zfs) hung when all the disks in the mirrored pool (both c1t0d9 and c2t0d9) were removed physically.
>
>         NAME        STATE     READ WRITE CKSUM
>         test        ONLINE       0     0     0
>           mirror    ONLINE       0     0     0
>             c1t0d9  ONLINE       0     0     0
>             c2t0d9  ONLINE       0     0     0
>
> We think that if all the disks in the pool are unavailable, the cp command should fail with an error (not cause a hang).
>
> Our request: Please investigate the root cause of this issue.
>
> How to reproduce:
> 1. create a zfs mirrored pool
> 2. execute a cp command from somewhere to the zfs mirrored pool
> 3. remove both disks physically while the cp command is working => hang happens (cp never returns and we can't kill the cp command)
>
> One engineer pointed me to this page http://opensolaris.org/os/community/arc/caselog/2007/567/onepager/ and indicated that if all the mirrors are removed zfs enters a hang-like state to prevent the kernel from going into a panic, and that this type of feature would be an RFE.
>
> My questions are: Is there any documentation of the mirror configuration of zfs that explains what happens when the underlying drivers detect problems in one of the mirror devices? It seems that the traditional view of a mirror (RAID-1) would expect the mirror to proceed without interruption, and that does not seem to be the case in ZFS. What is the purpose of the mirror in zfs? Is it more like an instant backup? If so, what can the user do to recover when there is an IO error on one of the devices?
> Appreciate any pointers and help,
> Thanks and regards,
> Karthik
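For anyone wanting to reproduce Neil's suggestion, a minimal sketch (pool name 'test' as in the report; the property must be set before the devices disappear):

# zpool set failmode=continue test
# zpool get failmode test      # verify: should report 'continue'

With failmode=continue, I/O to a pool whose devices have all gone away should return EIO instead of blocking; with the default failmode=wait, processes block until the devices return and the pool is cleared. The report above suggests a possible bug if a hang still occurs with continue set beforehand.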
[zfs-discuss] Windows XP nfs client poor performance
Greetings. I have an X4500 with an 8TB RAIDZ datapool, currently 75% full. I have it carved up into several filesystems. I share out two of the filesystems: /datapool/data4 (approx 1.5TB) and /datapool/data5 (approx 3.5TB). The data is imagery, and the primary application on the PCs is Socetset. The clients are Windows XP Pro, and I use Services for UNIX (SFU) to mount the nfs shares from the thumper.

When a client PC accesses files from data4, they come across quickly. When the same client accesses files from data5, the transfer rate slows to a crawl, and sometimes the application times out. The only difference I can see is the size of the volume; the data is all of the same type. I could find no references to any limitations on the volume size of nfs shares or mounts. It seems inconsistent and difficult to duplicate. I plan to begin a more in-depth troubleshooting of the problem with dtrace.

Has anyone seen anything like this before? Thanks.

-Bob Bencze
[zfs-discuss] RAID-Z True Volume Size?
Hi all, I have a little question. With the RAID-Z rules, what is the true usable disk space? Is there a calculation like for any other RAID (e.g. RAID5 = number of disks - 1 for parity)? Thank you for your help; I searched everywhere on the web and didn't find my answer...
Re: [zfs-discuss] My 500-gig ZFS is gone: insufficient replicas, corrupted data
Eugene Gladchenko wrote:
> Hi, I'm running FreeBSD 7.1-PRERELEASE with a 500-gig ZFS drive. Recently I've encountered a FreeBSD problem (PR kern/128083) and decided to update the motherboard BIOS. It looked like the update went right, but after that I was shocked to see my ZFS destroyed! Rolling the BIOS back did not help. Now it looks like this:
>
> # zpool status
>   pool: tank
>  state: UNAVAIL
> status: One or more devices could not be used because the label is missing or invalid. There are insufficient replicas for the pool to continue functioning.
> action: Destroy and re-create the pool from a backup source.
>    see: http://www.sun.com/msg/ZFS-8000-5E
>  scrub: none requested
> config:
>         NAME        STATE     READ WRITE CKSUM
>         tank        UNAVAIL      0     0     0  insufficient replicas
>           ad4       UNAVAIL      0     0     0  corrupted data
>
> # zdb -l /dev/ad4
> LABEL 0
>     version=6
>     name='tank'
>     state=0
>     txg=4
>     pool_guid=12069359268725642778
>     hostid=2719189110
>     hostname='home.gladchenko.ru'
>     top_guid=5515037892630596686
>     guid=5515037892630596686
>     vdev_tree
>         type='disk'
>         id=0
>         guid=5515037892630596686
>         path='/dev/ad4'
>         devid='ad:5QM0WF9G'
>         whole_disk=0
>         metaslab_array=14
>         metaslab_shift=32
>         ashift=9
>         asize=500103118848
> LABEL 1
>     (same contents as LABEL 0)
> LABEL 2
> failed to unpack label 2
> LABEL 3
> failed to unpack label 3

This would occur if the beginning of the partition was intact, but the end is not. Causes for the latter include:
1. partition table changes (or vtoc for SMI labels)
2. something overwrote data at the end

If the cause is #1, then restoring the partition should work. If the cause is #2, then the data may be gone. Note: ZFS can import a pool with one working label, but if more of the data is actually unavailable or overwritten, then it may not be able to get to a consistent state.
-- richard

> I've tried to import the problem pool into OpenSolaris 2008.05 with no success:
>
> # zpool import
>   pool: tank
>     id: 12069359268725642778
>  state: UNAVAIL
> status: The pool was last accessed by another system.
> action: The pool cannot be imported due to damaged devices or data.
>    see: http://www.sun.com/msg/ZFS-8000-EY
> config:
>         tank        UNAVAIL      0     0     0  insufficient replicas
>           c3d0s2    UNAVAIL      0     0     0  corrupted data
>
> Is there a way to recover my files from this broken pool? Maybe at least some of them? The drive was 4/5 full. :( I would appreciate any help.
>
> p.s. I already bought another drive of the same size yesterday. My next ZFS experience definitely will be a mirrored one.
Re: [zfs-discuss] RAID-Z True Volume Size?
William Saadi wrote:
> Hi all, I have a little question. With the RAID-Z rules, what is the true usable disk space?

It depends on what data you write to it, how the writes are done, and what compression or redundancy parameters are set.

> Is there a calculation like for any other RAID (e.g. RAID5 = number of disks - 1 for parity)?

In general, by default, yes, this will work. raidz approximates RAID-5, raidz2 approximates RAID-6.
-- richard
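As a worked example of that rule of thumb (disk sizes are just for illustration): a raidz vdev of N disks of size S gives roughly (N-1) x S of usable space, and raidz2 roughly (N-2) x S. So 5 x 500GB in a single raidz vdev yields about 4 x 500GB = 2TB usable, and the same disks in raidz2 about 1.5TB, before accounting for metadata, compression, the copies property and the small amount of space ZFS reserves for itself.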
Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...
On Thu, Oct 16, 2008 at 03:50:19PM +0800, Gray Carper wrote:
> Sidenote: Today we made eight network/iSCSI related tweaks that, in aggregate, have resulted in dramatic performance improvements (some I just hadn't gotten around to yet, others suggested by Sun's Mertol Ozyoney)...
> - disabling the Nagle algorithm on the head node
> - setting each iSCSI target block size to match the ZFS record size of 128K
> - disabling thin provisioning on the iSCSI targets
> - enabling jumbo frames everywhere (each switch and NIC)
> - raising ddi_msix_alloc_limit to 8
> - raising ip_soft_rings_cnt to 16
> - raising tcp_deferred_acks_max to 16
> - raising tcp_local_dacks_max to 16

Can you tell us which of those changes made the most dramatic improvement? I have a similar situation here, with a 2-TB ZFS pool on a T2000 using iSCSI to a NetApp file server. Is there any way to tell in advance if any of those changes will make a difference? Many of them seem to be server resources. How can I determine their current usage?

--
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
Re: [zfs-discuss] Windows XP nfs client poor performance
I've had serious problems trying to get Windows to run as an NFS server with SFU. On a modern raid array I can't get it above 4MB/s transfer rates. It's slow enough that virtual machines running off it time out almost every time I try to boot them. Oddly, it worked OK when I used an old IDE disk - I got about 20MB/s out of it then. But it's not good when an IDE disk can outperform a raid array capable of around 500MB/s.

It's probably not directly related to the issues you're having, but my own experience of SFU would make me want to repeat your tests with a Linux client before assuming it's a Solaris or ZFS problem.
Re: [zfs-discuss] [storage-discuss] ZFS Success Stories
We have 135 TB capacity with about 75 TB in use on zfs-based storage. zfs use started about 2 years ago and has grown from there. This spans 9 SAN appliances, with 5 head nodes, and 2 more recent servers running zfs on JBOD with vdevs made up of raidz2. So far, the experience has been very positive. Never lost a bit of data. We scrub weekly, and I've started sleeping better at night. I have also read the horror stories, but we aren't seeing them here.

We did have some performance issues, especially involving the SAN storage on more heavily used systems, but enabling the cache on the SAN devices without pushing fsync through to disk basically fixed that. Your zfs layout can profoundly affect performance, which is a downside. It's best to test your setup under an approximately realistic workload to balance capacity with performance before deploying.

BTW, most of our zfs deployment is on Solaris 10 {u4,u5}, but two large servers are on OpenSolaris snv_86. The OpenSolaris servers seem to be considerably faster, and more feature rich, without any reliability issues, so far.

Jon

gm_sjo wrote:
> I appreciate 99% of the time people only comment if they have a problem, which is why I think it'd be nice for some people who have successfully implemented ZFS, including making various use of the features (recovery, replacing disks, etc), could just reply to this post with a sentence or paragraph detailing how great it is for them. [...]

--
Jonathan Loran - IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146 [EMAIL PROTECTED]
AST:7731^29u18e3
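A minimal sketch of the weekly-scrub habit mentioned above, assuming a pool named tank and a Sunday 03:00 window (both made-up details); add it to root's crontab with 'crontab -e':

0 3 * * 0 /usr/sbin/zpool scrub tank

Checking 'zpool status -x' afterwards (manually or from another cron job) shows whether the scrub turned up any checksum errors worth investigating.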
Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...
Gary,

> Sidenote: Today we made eight network/iSCSI related tweaks that, in aggregate, have resulted in dramatic performance improvements (some I just hadn't gotten around to yet, others suggested by Sun's Mertol Ozyoney)...
> - disabling the Nagle algorithm on the head node
> - setting each iSCSI target block size to match the ZFS record size of 128K
> - disabling thin provisioning on the iSCSI targets
> - enabling jumbo frames everywhere (each switch and NIC)
> - raising ddi_msix_alloc_limit to 8
> - raising ip_soft_rings_cnt to 16
> - raising tcp_deferred_acks_max to 16
> - raising tcp_local_dacks_max to 16
>
> Can you tell us which of those changes made the most dramatic improvement?

- disabling the Nagle algorithm on the head node

This will have a dramatic effect on most I/Os, except for large sequential writes.

- setting each iSCSI target block size to match the ZFS record size of 128K
- enabling jumbo frames everywhere (each switch and NIC)

These will have a positive effect for large writes, both sequential and random.

- disabling thin provisioning on the iSCSI targets

This only has a benefit for file-based or dsk-based backing stores. If one uses rdsk backing stores of any type, this is not an issue.

Jim

> I have a similar situation here, with a 2-TB ZFS pool on a T2000 using iSCSI to a NetApp file server. Is there any way to tell in advance if any of those changes will make a difference? Many of them seem to be server resources. How can I determine their current usage?
>
> --
> -Gary Mills--Unix Support--U of M Academic Computing and Networking-

Jim Dunham
Storage Platform Software Group
Sun Microsystems, Inc.
Re: [zfs-discuss] Windows XP nfs client poor performance
On Mon, Oct 20, 2008 at 9:29 AM, Bob Bencze [EMAIL PROTECTED] wrote:
> Greetings. I have a X4500 with an 8TB RAIDZ datapool, currently 75% full. [...] Has anyone seen anything like this before? Thanks.
> -Bob Bencze

SFU NFS is often slow, but tunable. Here is something you might find handy to squeeze some speed out of it:

http://technet.microsoft.com/en-us/library/bb463205.aspx

HTH
--
Brent Jones
[EMAIL PROTECTED]
[zfs-discuss] Setting per-file record size / querying fs/file record size?
I've a report that the mismatch between SQLite3's default block size and ZFS' causes some performance problems for Thunderbird users. It'd be great if there was an API by which SQLite3 could set its block size to match the hosting filesystem, or where it could set the DB file's record size to match the SQLite3/app default block size (1KB). Is there such an API? If not, is there an RFE I could add a call record to?

Nico
--
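As far as I know there is no per-file interface today; the per-dataset knob is the usual workaround. A sketch, assuming a hypothetical dataset tank/thunderbird that holds only the SQLite databases:

# zfs get recordsize tank/thunderbird       # default is 128K
# zfs set recordsize=1k tank/thunderbird    # match SQLite3's 1KB page size

Note that recordsize only affects blocks written after the change and applies to the whole dataset, which is exactly why a per-file or query-the-filesystem API would be nicer.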
Re: [zfs-discuss] ZFS scalability in terms of file system count (or lack thereof) in S10U6
On Mon, 20 Oct 2008, Pramod Batni wrote:
> Yes, the implementation of the above ioctl walks the list of mounted filesystems 'vfslist' [in this case it walks 5000 nodes of a linked list before the ioctl returns]. This in-kernel traversal of the filesystems is taking time.

Hmm, O(n) :(... I guess that is the implementation of getmntent(3C)? Why does creating a new ZFS filesystem require enumerating all existing ones?

> You could set 'zfs set mountpoint=none pool-name' and then create the filesystems under the pool-name. [In my experiments the number of ioctls went down drastically.] You could then set a mountpoint for the pool and then issue a 'zfs mount -a'.

That would work for an initial mass creation, but we are going to need to create and delete fairly large numbers of file systems over time, so this workaround would not help for that.

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | [EMAIL PROTECTED]
California State Polytechnic University | Pomona CA 91768
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
A couple of updates:

Installed OpenSolaris on a PowerEdge 1850 with a single network card and the default iscsitarget configuration (no special tweaks or tpgt settings); vmotion was about 10 percent successful before I received write errors on disk. That's 10 percent better than the PowerEdge 1900 iscsitarget.

The GUIDs are set by VMware when the iSCSI initiator connects to the OpenSolaris target. Therefore I have no control over what the GUIDs are, and from my observations it doesn't matter whether the GUIDs are identical. Unless there is a bug in VMware and GUIDs.

I have followed the instructions to delete the backing stores and the zfs partitions and start anew. I even went as far as rebooting the machine after I created a single LUN and connected it to the VMware initiator. I then repeated the same steps when creating the second LUN. Overall, VMware determined the GUID # of the iSCSI target.

Right now I am applying a ton of VMware patches that have iSCSI connectivity repairs and other security updates. I will be reverting to a Linux iSCSI target model if the patches do not work, to check whether the physical machines or the networking have an abnormality that may be causing problems. I'll be submitting more updates as I continue testing!

cliff notes: nothing has worked so far :(
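For reference, the kind of zvol-backed target being tested can be set up either with the shareiscsi property or with iscsitadm directly; a minimal sketch with made-up names and sizes (which path applies depends on your build, and the exact option set should be verified against your release):

# zfs create -V 100g tank/vmfs01
# zfs set shareiscsi=on tank/vmfs01
# iscsitadm list target -v          # should show the target IQN, LUN and backing store

Comparing that listing against what the ESX initiator reports for the device can help confirm whether the two sides agree on which LUN is which.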
Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...
Hey, Jim! Thanks so much for the excellent assist on this - much better than I could have ever answered it! I thought I'd add a little bit on the other four...

- raising ddi_msix_alloc_limit to 8

For PCI cards that use up to 8 interrupts, which our 10GbE adapters do. The previous value of 2 could cause some CPU interrupt bottlenecks. So far, this has been more of a preventative measure - we haven't seen a case where this really made any performance impact.

- raising ip_soft_rings_cnt to 16

This increases the number of kernel threads associated with packet processing and is specifically meant to reduce the latency in handling 10GbE. This showed a small performance improvement.

- raising tcp_deferred_acks_max to 16

This reduces the number of ACK packets sent, thus reducing the overall TCP overhead. This showed a small performance improvement.

- raising tcp_local_dacks_max to 16

This also slows down ACK packets and showed a tiny performance improvement.

Overall, we have found these four settings to not make a whole lot of difference, but every little bit helps. ;) The four that Jim went through were much more impactful, particularly the enabling of jumbo frames and the disabling of the Nagle algorithm.

-Gray

On Tue, Oct 21, 2008 at 4:21 AM, Jim Dunham [EMAIL PROTECTED] wrote:
> [...]

--
Gray Carper
MSIS Technical Services
University of Michigan Medical School
[EMAIL PROTECTED] | skype: graycarper | 734.418.8506
http://www.umms.med.umich.edu/msis/
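On Gary's question about checking current values: the ndd-style tunables can be read and set at runtime, while the others live in /etc/system and need a reboot. A rough sketch (treat the exact parameter names as assumptions to verify against your release):

# ndd -get /dev/tcp tcp_naglim_def           # current Nagle limit; setting it to 1 effectively disables Nagle
# ndd -set /dev/tcp tcp_naglim_def 1
# ndd -get /dev/tcp tcp_deferred_acks_max
# ndd -set /dev/tcp tcp_deferred_acks_max 16

The interrupt and soft-ring tweaks are /etc/system entries, taking effect after a reboot:

set ddi_msix_alloc_limit=8
set ip:ip_soft_rings_cnt=16

ndd changes do not persist across reboots, so whatever turns out to help is usually also added to a startup script or /etc/system.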