Re: [zfs-discuss] partitioned cache devices
Andrew Werchowiecki wrote: Thanks for the info about slices, I may give that a go later on. I'm not keen on that because I have clear evidence (as in zpools set up this way, right now, working, without issue) that GPT partitions of the style shown above work, and I want to see why it doesn't work in my setup rather than simply ignoring it and moving on. Didn't you read Richard's post? You can have only one Solaris partition at a time. Your original example failed when you tried to add a second. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] partitioned cache devices
Andrew Werchowiecki wrote: Hi all, I'm having some trouble with adding cache drives to a zpool, anyone got any ideas? muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2 Password: cannot open '/dev/dsk/c25t10d1p2': I/O error muslimwookie@Pyzee:~$ I have two SSDs in the system, and I've created an 8GB partition on each drive for use as a mirrored write cache. I also have the remainder of each drive partitioned for use as the read-only cache. However, when attempting to add it I get the error above. Create one 100% Solaris partition and then use format to create two slices. -- Ian.
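Ian's suggestion might look something like the sketch below. The device names, sector offsets and slice sizes are hypothetical and would need adjusting for the actual SSD geometry; format's interactive partition menu is the more usual way to lay out the slices than fmthard.

```shell
# Hypothetical sketch of the one-partition, two-slice approach.
# Device names (c25t10d1, c25t11d1) and sector values are examples only.

# Write a single 100% SOLARIS2 fdisk partition on each SSD:
fdisk -B /dev/rdsk/c25t10d1p0
fdisk -B /dev/rdsk/c25t11d1p0

# Carve two slices per drive: s0 ~8GB for the mirrored log,
# s1 for the remainder (start/size are in sectors -- adjust for
# your disk, or use format's partition menu instead):
cat > vtoc.txt <<'EOF'
0 4 00 256 16777216
1 4 00 16777472 217970688
EOF
fmthard -s vtoc.txt /dev/rdsk/c25t10d1s2
fmthard -s vtoc.txt /dev/rdsk/c25t11d1s2

# Add the slices (not the pN fdisk partitions) to the pool:
zpool add aggr0 log mirror c25t10d1s0 c25t11d1s0
zpool add aggr0 cache c25t10d1s1 c25t11d1s1
```

Note that cache devices cannot be mirrored, so the two s1 slices are added individually.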
Re: [zfs-discuss] ZFS Distro Advice
Robert Milkowski wrote: Solaris 11.1 (free for non-prod use). But a ticking bomb if you use a cache device. -- Ian.
Re: [zfs-discuss] ZFS Distro Advice
Robert Milkowski wrote: Robert Milkowski wrote: Solaris 11.1 (free for non-prod use). But a ticking bomb if you use a cache device. It's been fixed in an SRU (although this is only for customers with a support contract - still, it will be in 11.2 as well). Then, I'm sure there are other bugs which are fixed in S11 and not in Illumos (and vice-versa). There may well be, but in seven-plus years of using ZFS, this was the first one to cost me a pool. -- Ian.
Re: [zfs-discuss] SVM ZFS
Alfredo De Luca wrote: On Wed, Feb 27, 2013 at 10:36 AM, Paul Kraus p...@kraus-haus.org wrote: On Feb 26, 2013, at 6:19 PM, Jim Klimov jimkli...@cos.ru wrote: Ah, I forgot to mention - ufsdump|ufsrestore was at some time also a recommended way of such a transition ;) The last time I looked at using ufsdump/ufsrestore for this, ufsrestore was NOT aware of ZFS ACL semantics. That was under Solaris 10, but I would be surprised if the ufsrestore code has changed since then. What about Solaris Live Upgrade? It's been a long time, but I'm sure LU only supports UFS to ZFS migration for the root pool. -- Ian.
Re: [zfs-discuss] ZFS Distro Advice
Bob Friesenhahn wrote: On Tue, 26 Feb 2013, Richard Elling wrote: Consider using different policies for different data. For traditional file systems, you had relatively few policy options: readonly, nosuid, quota, etc. With ZFS, dedup and compression are also policy options. In your case, dedup for your media is not likely to be a good policy, but dedup for your backups could be a win (unless you're using something that already doesn't back up duplicate data - e.g. most backup utilities). A way to approach this is to think of your directory structure and create file systems to match the policies. For example: I am finding that rsync with the right options (to directly block-overwrite) plus zfs snapshots is providing me with pretty amazing deduplication for backups without even enabling deduplication in zfs. Now backup storage goes a very long way. We do the same for all of our legacy operating system backups. Take a snapshot then do an rsync: an excellent way of maintaining incremental backups for those. -- Ian.
Re: [zfs-discuss] ZFS Distro Advice
Bob Friesenhahn wrote: On Wed, 27 Feb 2013, Ian Collins wrote: I am finding that rsync with the right options (to directly block-overwrite) plus zfs snapshots is providing me with pretty amazing deduplication for backups without even enabling deduplication in zfs. Now backup storage goes a very long way. We do the same for all of our legacy operating system backups. Take a snapshot then do an rsync: an excellent way of maintaining incremental backups for those. Magic rsync options used: -a --inplace --no-whole-file --delete-excluded This causes rsync to overwrite the file blocks in place rather than writing to a new temporary file first. As a result, zfs COW produces primitive deduplication of at least the unchanged blocks (by writing nothing) while writing new COW blocks for the changed blocks. Do these options impact performance or reduce the incremental stream sizes? I just use -a --delete and the snapshots don't take up much space (compared with the incremental stream sizes). -- Ian.
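A minimal sketch of the cycle Bob describes, with hypothetical pool, path and host names:

```shell
# Hypothetical sketch: snapshot yesterday's state, then let rsync
# block-overwrite in place so ZFS COW only stores changed blocks.
zfs snapshot backup/legacy@$(date +%Y-%m-%d)
rsync -a --inplace --no-whole-file --delete-excluded \
    root@legacyhost:/export/ /backup/legacy/
```

--no-whole-file matters mainly for local copies, where rsync would otherwise skip its delta algorithm and rewrite whole files; over a remote connection delta transfer is already the default.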
Re: [zfs-discuss] Is there performance penalty when adding vdev to existing pool
Peter Wood wrote: I'm using OpenIndiana 151a7, zpool v28, zfs v5. When I bought my storage servers I intentionally left hdd slots available so I can add another vdev when needed and delay immediate expenses. After reading some posts on the mailing list I'm getting concerned about degrading performance due to unequal distribution of data among the vdevs. I still have a chance to migrate the data away, add all drives, rebuild the pools and start fresh. Before going down that road I was hoping to hear your opinion on what would be the best way to handle this. System: Supermicro with 36 hdd bays. 28 bays filled with 3TB SAS 7.2K enterprise drives. 8 bays available to add another vdev to the pool. Pool configuration: snip # Will adding another vdev hurt the performance? How full is the pool? When I've added (or grown an existing) vdev, I used zfs send to make a copy of a suitably large filesystem, then deleted the original and renamed the copy. I had to do this a couple of times to redistribute data, but it saved a lot of down time. -- Ian.
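The copy-then-rename shuffle Ian mentions can be sketched as follows (dataset names are hypothetical, and writers to the filesystem must be quiesced before the final swap):

```shell
# Copy a large filesystem within the pool; new writes land
# preferentially on the emptier (new) vdev.
zfs snapshot pool01/data@move
zfs send pool01/data@move | zfs receive pool01/data.new
# Stop anything using pool01/data, then swap the names:
zfs destroy -r pool01/data
zfs rename pool01/data.new pool01/data
```

Repeating this for a couple of the largest filesystems spreads existing data across all vdevs without rebuilding the pool.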
Re: [zfs-discuss] Is there performance penalty when adding vdev to existing pool
Bob Friesenhahn wrote: On Thu, 21 Feb 2013, Sašo Kiselkov wrote: On 02/21/2013 12:27 AM, Peter Wood wrote: Will adding another vdev hurt the performance? In general, the answer is: no. ZFS will try to balance writes to top-level vdevs in a fashion that assures even data distribution. If your data is equally likely to be hit in all places, then you will not incur any performance penalties. If, OTOH, newer data is more likely to be hit than old data, then yes, newer data will be served from fewer spindles. In that case it is possible to do a send/receive of the affected datasets into new locations and then rename them. You have this reversed. The older data is served from fewer spindles than data written after the new vdev is added. Performance with the newer data should be improved. Not if the pool is close to full, when new data will end up on fewer spindles (the new or extended vdev). -- Ian.
Re: [zfs-discuss] Is there performance penalty when adding vdev to existing pool
Peter Wood wrote: Currently the pool is about 20% full: # zpool list pool01 NAME SIZE ALLOC FREE EXPANDSZ CAP DEDUP HEALTH ALTROOT pool01 65.2T 15.4T 49.9T - 23% 1.00x ONLINE - # So you will be about 15% full after adding a new vdev. Unless you are likely to get too close to filling the enlarged pool, you will probably be OK performance-wise. The old data access times will be no worse, the new data better. If you can spread some of your old data around after adding the new vdev, do so. -- Ian.
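As a rough check on that estimate, assuming the 8 free bays take the same 3TB drives and add about 21.8T of raw capacity as zpool list reports it:

```shell
# Back-of-envelope fill level after expansion, using the zpool list
# figures above plus an assumed ~21.8T for the new 8-drive vdev.
awk 'BEGIN {
    alloc = 15.4           # T allocated (ALLOC column)
    size  = 65.2 + 21.8    # current SIZE plus the new vdev
    printf "%.1f%%\n", 100 * alloc / size
}'
# → 17.7%
```

So "about 15%" is in the right ballpark; the exact figure depends on how much of the new vdev's raw space survives parity.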
Re: [zfs-discuss] zfs-discuss mailing list opensolaris EOL
Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote: From: Tim Cook [mailto:t...@cook.ms] We can agree to disagree. I think you're still operating under the auspices of Oracle wanting to have an open discussion. This is patently false. I'm just going to respond to this by saying thank you, Cindy, Casper, Neil, and others, for all the help over the years. I think we all agree it was cooler when opensolaris was open, but things are beyond our control, so be it. Moving forward, I don't expect Oracle to be any more open than MS or Apple or Google, which is to say, I understand there's stuff you can't talk about, and support you can't give freely or openly. But to the extent you're still able to discuss publicly known things, thank you. +1. -- Ian.
Re: [zfs-discuss] HELP! RPool problem
Sašo Kiselkov wrote: On 02/16/2013 09:49 PM, John D Groenveld wrote: Boot with the kernel debugger so you can see the panic. Sadly, though, without access to the source code, all he can do at that point is log a support ticket with Oracle (assuming he has paid his support fees) and hope it will get picked up by somebody there. People on this list have few, if any, ways of helping out. If he can boot from a recent install media and import the pool, that's a pretty good indicator that the problem has been fixed. He can then upgrade to whatever he booted with (which could be OI or Solaris 11.1) and recover his data. -- Ian.
Re: [zfs-discuss] zfs-discuss mailing list opensolaris EOL
Toby Thain wrote: Signed up, thanks. The ZFS list has been very high value and I thank everyone whose wisdom I have enjoyed, especially people like you Sašo, Mr Elling, Mr Friesenhahn, Mr Harvey, the distinguished Sun and Oracle engineers who post here, and many others. Let the Illumos list thrive. This list certainly has been high value for ZFS users (I think I subscribed the day it started!). One of its main advantages is that it has been platform agnostic. We see Solaris, Illumos, BSD and more recently ZFS on Linux questions all given the same respect. I do hope we can get another, platform agnostic, home for this list. -- Ian.
Re: [zfs-discuss] zfs-discuss mailing list opensolaris EOL
Richard Elling wrote: On Feb 16, 2013, at 10:16 PM, Bryan Horstmann-Allen b...@mirrorshades.net wrote: On 2013-02-17 18:40:47, Ian Collins wrote: One of its main advantages is that it has been platform agnostic. We see Solaris, Illumos, BSD and more recently ZFS on Linux questions all given the same respect. I do hope we can get another, platform agnostic, home for this list. As the guy who provides the illumos mailing list services, and as someone who has deeply vested interests in seeing ZFS thrive on all platforms, I'm happy to suggest that we'd welcome all comers on z...@lists.illumos.org. +1 Me too. One list is certainly better than none! -- Ian.
Re: [zfs-discuss] Slow zfs writes
Ram Chander wrote: Hi Roy, You are right. So it looks like a re-distribution issue. Initially there were two vdevs with 24 disks (disks 0-23) for close to a year. After which we added 24 more disks and created additional vdevs. The initial vdevs are filled up and so write speed declined. Now how to find files that are present in a vdev or a disk? That way I can remove and re-copy back to distribute data. Any other way to solve this? The only way is to avoid the problem in the first place by not mixing vdev sizes in a pool. -- Ian.
Re: [zfs-discuss] Slow zfs writes
Jim Klimov wrote: On 2013-02-12 10:32, Ian Collins wrote: Ram Chander wrote: Hi Roy, You are right. So it looks like a re-distribution issue. Initially there were two vdevs with 24 disks (disks 0-23) for close to a year. After which we added 24 more disks and created additional vdevs. The initial vdevs are filled up and so write speed declined. Now how to find files that are present in a vdev or a disk? That way I can remove and re-copy back to distribute data. Any other way to solve this? The only way is to avoid the problem in the first place by not mixing vdev sizes in a pool. I was a bit quick off the mark there - I didn't notice that some vdevs were older than others. Well, that imbalance is there - in the zpool status printout we see raidz1 top-level vdevs of size 5, 5, 12, 7, 7, 7 disks and some 5 spares - which seems to sum up to 48 ;) The vdev sizes are about (including parity space) 14, 14, 22, 19, 19, 19TB respectively and 127TB total. So even if the data is balanced, the performance of this pool will still start to degrade once ~84TB (about 2/3 full) are used. So the only viable long term solution is a rebuild, or putting bigger drives in the two smallest vdevs. In the short term, when I've had similar issues I used zfs send to copy a large filesystem within the pool, then renamed the copy to the original name and deleted the original. This can be repeated until you have an acceptable distribution. One last thing: unless this is some form of backup pool, or the data on it isn't important, avoid single-parity raidz vdevs in such a large pool! -- Ian.
[zfs-discuss] Bizarre receive error
I recently had to recover a lot of data from my backup pool, which is on a Solaris 11 system. I'm now sending regular snapshots back to the pool and all was well until the pool became nearly full. I then started getting receive failures: receiving incremental stream of tank/vbox/windows@Wednesday_1800 into backup/vbox/windows@Wednesday_1800 zfs_receive: Can't mount a version 6 file system on a version 33 pool. Pool must be upgraded to mount this file system. When I freed up space on the pool, the errors stopped: receiving incremental stream of tank/vbox/windows@Wednesday_1800 into backup/vbox/windows@Wednesday_1800 received 380MB stream in 18 seconds (21.1MB/sec) On the Solaris 11.1 sender: zfs get -H version tank/vbox/windows tank/vbox/windows version 5 - Odd! I assume an error code was being misreported. -- Ian.
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
Jim Klimov wrote: On 2013-01-23 09:41, casper@oracle.com wrote: Yes and no: the system reserves a lot of additional memory (Solaris doesn't over-commit swap) and swap is needed to support those reservations. Also, some pages are dirtied early on and never touched again; those pages should not be kept in memory. I believe, by the symptoms, that this is what happens often in particular to Java processes (app-servers and such) - I do regularly see these have large VM sizes and much (3x) smaller RSS sizes. Being swapped out is probably the best thing that can be done to most Java processes :) -- Ian.
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
Darren J Moffat wrote: It is a mechanism for part of the storage system above the disk (e.g. ZFS) to inform the disk that it is no longer using a given set of blocks. This is useful when using an SSD - see Saso's excellent response on that. However it can also be very useful when your disk is an iSCSI LUN. It allows the filesystem layer (e.g. ZFS or NTFS, etc.), when on an iSCSI LUN that advertises SCSI UNMAP, to tell the target there are blocks in that LUN it isn't using any more (e.g. it just deleted some blocks). That is something I have been waiting a long time for! I have to run a periodic "fill the pool with zeros" cycle on a couple of iSCSI backed pools to reclaim free space. I guess the big question is: do Oracle storage appliances advertise SCSI UNMAP? -- Ian.
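The workaround Ian alludes to is roughly the following (pool name hypothetical); it only helps if the backing store compresses or detects all-zero blocks, which is what makes the space reclaimable on the target:

```shell
# Fill the pool's free space with zeros, then delete the fill file.
# This runs until the pool is nearly full (dd exits on ENOSPC), so
# do it in a quiet period and watch free space while it runs.
dd if=/dev/zero of=/iscsipool/zerofill bs=1M
sync
rm /iscsipool/zerofill
```

With SCSI UNMAP support this whole cycle becomes unnecessary, since the initiator tells the target directly which blocks were freed.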
[zfs-discuss] Odd snapshots exposed in Solaris 11.1
Since upgrading to Solaris 11.1, I've started seeing snapshots like tank/vbox/shares%VMs appearing with zfs list -t snapshot. I thought snapshots with a % in their name were private objects created during a send/receive operation. These snapshots don't have many properties: zfs get all tank/vbox/shares%VMs NAME PROPERTY VALUE SOURCE tank/vbox/shares%VMs creation Tue Jan 15 9:15 2013 - tank/vbox/shares%VMs mountpoint /vbox/shares - tank/vbox/shares%VMs share.* ... local tank/vbox/shares%VMs zoned off default Which is causing one of my scripts grief. Does anyone know why these are showing up? -- Ian.
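Until the cause is known, a script can simply skip the %-named entries. A sketch with sample data inlined (in practice the list would come from zfs list -H -t snapshot -o name):

```shell
# Filter %-named pseudo-snapshots out of a snapshot listing.
# The here-document stands in for real zfs list output.
cat <<'EOF' | grep -v '%'
tank/vbox/shares@Monday_1800
tank/vbox/shares%VMs
tank/vbox/windows@Wednesday_1800
EOF
# → tank/vbox/shares@Monday_1800
#   tank/vbox/windows@Wednesday_1800
```

This is safe because % is not a legal character in user-created snapshot names, so nothing legitimate is dropped.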
Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)
Cindy Swearingen wrote: Hi Jamie, Yes, that is correct. The S11u1 version of this bug is: https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=15852599 and has this notation, which means Solaris 11.1 SRU 3.4: Changeset pushed to build 0.175.1.3.0.4.0 Hello Cindy, I really, really hope this will be a public update. Within a week of upgrading to 11.1 I hit this bug and had to rebuild my main pool. I'm still restoring backups. Without this fix, 11.1 is a bomb waiting to go off! -- Ian.
Re: [zfs-discuss] Zpool error in metadata:0x0
Jim Klimov wrote: I've had this error on my pool since over a year ago, when I posted and asked about it. The general consensus was that this is only fixable by recreation of the pool, and that if things don't die right away, the problem may be benign (i.e. in some first blocks of the MOS that are in practice written once and not really used nor relied upon). In detailed zpool status this error shows as: metadata:0x0 By analogy to other errors in unnamed files, this was deemed to be the MOS dataset, object number 0. Unlike you, I haven't had the time or patience to dig deeper into this! The only times I have seen this error are in iSCSI pools when the target machine's pool became full, causing bizarre errors in the iSCSI client pools. Once the underlying problem was fixed and the pools imported and exported, the error went away. This might enable you to recreate the error for testing. -- Ian.
Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools
Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Jim Klimov I really hope someone better versed in compression - like Saso - would chime in to say whether gzip-9 vs. lzjb (or lz4) sucks in terms of read-speeds from the pools. My HDD-based assumption is in general that the less data you read (or write) on platters - the better, and the spare CPU cycles can usually take the hit. Oh, I can definitely field that one - The lzjb compression (the default compression as long as you just turn compression on without specifying any other detail) is very fast compression, similar to lzo. It generally has no noticeable CPU overhead, but it saves you a lot of time and space for highly repetitive things like text files (source code) and sparse zero-filled files and stuff like that. I personally always enable this. compression=on zlib (gzip) is more powerful, but *way* slower. Even the fastest level, gzip-1, uses enough CPU cycles that you probably will be CPU limited rather than IO limited. I haven't seen that for a long time. When gzip compression was first introduced, it would cause writes on a Thumper to be CPU bound. It was all but unusable on that machine. Today with better threading, I barely notice the overhead on the same box. There are very few situations where this option is better than the default lzjb. That part I do agree with! -- Ian.
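Since compression is a per-dataset policy, the two can be mixed in one pool. A sketch with hypothetical dataset names:

```shell
zfs set compression=on pool/src          # lzjb: cheap, near-free CPU
zfs set compression=gzip-9 pool/archive  # slow writes, best ratio
zfs get -r compressratio pool            # see what each choice bought you
```

The compressratio property is the easy way to check whether gzip-9 actually earns its CPU cost on a given workload before committing to it.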
Re: [zfs-discuss] ZFS Appliance as a general-purpose server question
On 11/23/12 05:50, Jim Klimov wrote: On 2012-11-22 17:31, Darren J Moffat wrote: Is it possible to use the ZFS Storage appliances in a similar way, and fire up a Solaris zone (or a few) directly on the box for general-purpose software; or to shell-script administrative tasks such as the backup archive management in the global zone (if that concept still applies) as is done on their current Solaris-based box? No, it is a true appliance; it might look like it has Solaris underneath but it is just based on Solaris. You can script administrative tasks, but not using bash/ksh style scripting - you use the ZFSSA's own scripting language. So, the only supported (or even possible) way is indeed to use it as a NAS for file or block IO from another head running the database or application servers?.. Yes. I wonder if it would make weird sense to get the boxes, forfeit the cool-looking Fishworks, and install Solaris/OI/Nexenta/whatever to get the most flexibility and bang for a buck from the owned hardware... Or, rather, shop for the equivalent non-appliance servers... As Tim Cook says, that would be a very expensive option. I'm sure Oracle dropped the Thumper line because it competed head on with the appliances and gave way more flexibility. If you are experienced with Solaris and ZFS, you will find using appliances very frustrating! You can't use the OS as you would like and you have to go through support when you would otherwise fix things yourself. In my part of the world, that isn't much fun. Buy an equivalent JBOD and head unit and pretend you have a new Thumper. -- Ian.
Re: [zfs-discuss] Intel DC S3700
On 11/14/12 12:28, Jim Klimov wrote: On 2012-11-13 22:56, Mauricio Tavares wrote: Trying again: Intel just released those drives. Any thoughts on how nicely they will play in a zfs/hardware raid setup? Seems interesting - fast, assumed reliable and consistent in its IOPS (according to marketing talk), addresses power loss reliability (acc. to datasheet): * Endurance Rating - 10 drive writes/day over 5 years while running JESD218 standard * The Intel SSD DC S3700 supports testing of the power loss capacitor, which can be monitored using the following SMART attribute: (175, AFh). snip All in all, I can't come up with anything offensive against it quickly ;) One possible nit regards the ratings being geared towards 4KB blocks (which is not unusual with SSDs), so it may be further from announced performance with other block sizes - i.e. when caching ZFS metadata. I can't help thinking these drives would be overkill for an ARC device. All of the expensive controller hardware is geared to boosting random write IOPS, which is somewhat wasted on a write-slowly, read-often device. The enhancements would be good for a ZIL, but the smallest drive is at least an order of magnitude too big... -- Ian.
[zfs-discuss] Woeful performance from an iSCSI pool
I look after a remote server that has two iSCSI pools. The volumes for each pool are sparse volumes, and a while back the target's storage became full, causing weird and wonderful corruption issues until they managed to free some space. Since then, one pool has been reasonably OK, but the other has terrible performance receiving snapshots. Despite both iSCSI devices using the same IP connection, iostat shows one with reasonable service times while the other shows really high (up to 9 seconds) service times and 100% busy. This kills performance for snapshots with many random file removals and additions. I'm currently zero filling the bad pool to recover space on the target storage to see if that improves matters. Has anyone else seen similar behaviour with previously degraded iSCSI pools? -- Ian.
Re: [zfs-discuss] Woeful performance from an iSCSI pool
On 11/22/12 10:15, Ian Collins wrote: I look after a remote server that has two iSCSI pools. The volumes for each pool are sparse volumes, and a while back the target's storage became full, causing weird and wonderful corruption issues until they managed to free some space. Since then, one pool has been reasonably OK, but the other has terrible performance receiving snapshots. Despite both iSCSI devices using the same IP connection, iostat shows one with reasonable service times while the other shows really high (up to 9 seconds) service times and 100% busy. This kills performance for snapshots with many random file removals and additions. I'm currently zero filling the bad pool to recover space on the target storage to see if that improves matters. Has anyone else seen similar behaviour with previously degraded iSCSI pools? As a data point, both pools are being zero filled with dd. A 30 second iostat sample shows one device getting more than double the write throughput of the other: r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device 0.2 64.0 0.0 50.1 0.0 5.6 0.7 87.9 4 64 c0t600144F096C94AC74ECD96F20001d0 5.6 44.9 0.0 18.2 0.0 5.8 0.3 115.7 2 76 c0t600144F096C94AC74FF354B2d0 -- Ian.
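For what it's worth, the Mw/s columns in that sample put the gap at:

```shell
# Ratio of the two devices' write throughput (Mw/s) from the sample.
awk 'BEGIN { printf "%.2fx\n", 50.1 / 18.2 }'
# → 2.75x
```

So the healthy device is pushing well over double the writes of the sick one, while the sick one also shows the higher service time (115.7 ms vs 87.9 ms).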
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
On 11/14/12 15:20, Dan Swartzendruber wrote: Well, I think I give up for now. I spent quite a few hours over the last couple of days trying to get the GNOME desktop working on bare-metal OI, followed by VirtualBox. Supposedly that works in headless mode with RDP for management, but nothing but fail for me. Found quite a few posts on various forums of people complaining that RDP with external auth doesn't work (or not reliably), and that was my experience. The final straw was when I rebooted the OI server as part of cleaning things up, and... it hung. Last line in the verbose boot log is 'ucode0 is /pseudo/ucode@0'. I power-cycled it to no avail. Even tried a backup BE from hours earlier, to no avail. Likely whatever was bunged happened prior to that. If I could get something that ran like Xen or KVM reliably for a headless setup, I'd be willing to give it a try, but for now, no... SmartOS. -- Ian.
Re: [zfs-discuss] Strange mount -a problem in Solaris 11.1
On 10/31/12 23:35, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Ian Collins I have a recently upgraded (to Solaris 11.1) test system that fails to mount its filesystems on boot. Running zfs mount -a results in the odd error: #zfs mount -a internal error Invalid argument truss shows the last call as ioctl(3, ZFS_IOC_OBJECT_STATS, 0xF706BBB0) The system boots up fine in the original BE. The root (only) pool is on a single drive. Any ideas? devfsadm -Cv; rm /etc/zfs/zpool.cache; init 6 That was a big enough stick to fix it. Nasty bug nonetheless. -- Ian.
[zfs-discuss] Strange mount -a problem in Solaris 11.1
I have a recently upgraded (to Solaris 11.1) test system that fails to mount its filesystems on boot. Running zfs mount -a results in the odd error: #zfs mount -a internal error Invalid argument truss shows the last call as ioctl(3, ZFS_IOC_OBJECT_STATS, 0xF706BBB0) The system boots up fine in the original BE. The root (only) pool is on a single drive. Any ideas? -- Ian.
Re: [zfs-discuss] zfs send to older version
On 10/18/12 21:09, Michel Jansens wrote: Hi, I've been using a Solaris 10 update 9 machine for some time to replicate filesystems from different servers through zfs send|ssh zfs receive. This was done to store disaster recovery pools. The DR zpools are made from sparse files (to allow for easy/efficient backup to tape). Now I've installed a Solaris 11 machine and a SmartOS one. When I try to replicate the pools from those machines, I get an error because the newer filesystem/pool versions have features/properties not supported on Solaris 10u9. Is there a way (apart from rsync) to send a snapshot from a newer zpool to an older one? You have to create pools/filesystems with the older versions used by the destination machine. -- Ian.
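A sketch of Ian's suggestion; the version numbers below are placeholders, since the exact values depend on what the Solaris 10u9 receiver supports (check zpool upgrade -v and zfs upgrade -v there), and the device name is hypothetical:

```shell
# On the newer sender, pin pool and filesystem versions to levels
# the old receiver understands (22 and 4 are illustrative only):
zpool create -o version=22 -O version=4 tank c0t1d0
zfs create -o version=4 tank/replica
```

Streams sent from datasets created this way carry the old on-disk versions, so the 10u9 machine can receive them.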
Re: [zfs-discuss] ZFS best practice for FreeBSD?
On 10/13/12 22:13, Jim Klimov wrote: 2012-10-13 0:41, Ian Collins wrote: On 10/13/12 02:12, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote: There are at least a couple of solid reasons *in favor* of partitioning. #1 It seems common, at least to me, that I'll build a server with, let's say, 12 disk slots, and we'll be using 2T disks or something like that. The OS itself only takes like 30G, which means if I don't partition, I'm wasting 1.99T on each of the first two disks. As a result, when installing the OS, I always partition rpool down to ~80G or 100G, and I will always add the second partitions of the first disks to the main data pool. How do you provision a spare in that situation? Technically - you can lay out the spare disks similarly and attach the partitions or slices as spares for pools. I probably didn't make myself clear, so I'll try again! Assuming the intention is to get the most storage from your drives: if you add the remainder of the space on the drives you have partitioned for the root pool to the main pool, giving a mix of device sizes in the pool, how do you provision a spare? That's why I have never done this. I use whole drives everywhere and, as you mention further down, use the spare space in the root pool for scratch filesystems. However, in servers I've seen there were predominantly different layout designs: 1) Dedicated root disks/mirrors - small enough for rpool/swap tasks, nowadays perhaps SSDs or CF cards - especially if care was taken to use the rpool device mostly for reads and place all writes like swap and logs onto other pools; 2) For smaller machines with 2 or 4 disks, a partition (slice) is made for rpool sized about 10-20GB, and the rest is for data pool vdevs. In the case of 4-disk machines, the rpool can be a two-way mirror and the other couple of disks can host swap and/or dump in an SVM or ZFS mirror, for example.
The data pool components are identically sized and form a mirror, raid10 or a raidz1; rarely a raidz2 - that is assumed to have better resilience to loss of ANY two disks than a raid10, which is only resilient to loss of the CORRECT two disks (from different mirrors). 3) For today's computers with all disks being big, I'd also make a smallish rpool, a large data pool on separate disks, and use the extra space on the disks with rpool for something else - be it swap in an SVM-mirrored partition, a scratch pool for incoming data or tests, etc. Most of the systems I have built this year are 2U boxes with 8 to 12 (2TB) drives. I expect these are very common at the moment. I use your third option, but I tend to just create a big rpool mirror and add a scratch filesystem rather than partitioning the drives. -- Ian.
Re: [zfs-discuss] Using L2ARC on an AdHoc basis.
On 10/14/12 10:02, Michael Armstrong wrote: Hi Guys, I have a portable pool, i.e. one that I carry around in an enclosure. However, any SSD I add for L2ARC will not be carried around... meaning the cache drive will become unavailable from time to time. My question is: will random removal of the cache drive put the pool into a degraded state or affect the integrity of the pool at all? Additionally, how adversely will this affect warm-up... Or will moving the enclosure between machines with and without cache just automatically work, and offer benefits when cache is available, and less benefit when it isn't? Why bother with cache devices at all if you are moving the pool around? As you hinted above, the cache can take a while to warm up and become useful. You should zpool remove the cache device before exporting the pool. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS best practice for FreeBSD?
On 10/13/12 02:12, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote: There are at least a couple of solid reasons *in favor* of partitioning. #1 It seems common, at least to me, that I'll build a server with let's say, 12 disk slots, and we'll be using 2T disks or something like that. The OS itself only takes like 30G which means if I don't partition, I'm wasting 1.99T on each of the first two disks. As a result, when installing the OS, I always partition rpool down to ~80G or 100G, and I will always add the second partitions of the first disks to the main data pool. How do you provision a spare in that situation? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question
On 10/08/12 20:08, Tiernan OToole wrote: Ok, so, after reading a bit more of this discussion and after playing around at the weekend, I have a couple of questions to ask... 1: Do my pools need to be the same? For example, the pool in the datacenter is two 1TB drives in a mirror. In house I have five 200GB virtual drives in RAIDZ1, giving 800GB usable. If I am backing up stuff to the home server, can I still do a ZFS send, even though the underlying system is different? Yes you can, just make sure you have enough space! 2: If I give out a partition as an iSCSI LUN, can this be sent with zfs send as normal, or is there any difference? It can be sent as normal. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question
On 10/05/12 21:36, Jim Klimov wrote: 2012-10-05 11:17, Tiernan OToole wrote: Also, as a follow up question, but slightly unrelated, when it comes to the ZFS send, I could use SSH to do the send, directly to the machine... Or I could upload the compressed, and possibly encrypted, dump to the server... Which, for resume-ability and speed, would be suggested? And if I were to go with an upload option, any suggestions on what I should use? As for this, the answer depends on network bandwidth, reliability, and snapshot file size - ultimately, on the probability and retry cost of an error during transmission. Many posters on the list strongly object to using files as storage for snapshot streams, because in reliability this is (or may be) worse than a single-disk pool and bitrot on it - a single-bit error in a snapshot file can render it and all newer snapshots invalid and un-importable. Still, given enough scratch space on the sending and receiving sides and a bad (slow, glitchy) network in-between, I did go with compressed files of zfs-send streams (perhaps handling recursion myself and using smaller files of one snapshot each - YMMV). For compression on multi-CPU senders I can strongly suggest pigz --fast $filename (I did have problems in pigz-1.7.1 compressing several files with one command, maybe that's fixed now). If you're tight on space/transfer size more than on CPU, you can try other parallel algos - pbzip2, p7zip, etc. Likewise, you can also pass the file into an encryptor of your choice. I do have to suffer a slow, glitchy WAN to a remote server and rather than send stream files, I broke the data on the remote server into a more fine-grained set of filesystems than I would normally. In this case, I made the directories under what would have been the leaf filesystems filesystems themselves. By spreading the data over more filesystems, the individual incremental sends are smaller, so there is less data to resend if the link burps during a transfer. -- Ian. 
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
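Jim's stream-to-file approach can be sketched in shell. This is only an illustration: the pool and snapshot names are hypothetical, a plain file stands in for the output of zfs send so the sketch runs anywhere, and gzip -1 stands in for pigz --fast on systems where pigz is not installed.

```shell
# Sketch: capture a send stream to a compressed file for later,
# resumable upload. 'zfs send -i tank/data@a tank/data@b' would
# normally feed this pipe; a plain file stands in here.
printf 'pretend-zfs-stream-data\n' > /tmp/stream_in   # stand-in for zfs send output

# Compress on the way to scratch space (pigz --fast parallelises this
# step on multi-CPU senders; gzip -1 is the single-threaded equivalent).
gzip -1 -c /tmp/stream_in > /tmp/tank_data_b.zfs.gz

# After the upload, the receiving side would run something like:
#   gunzip -c tank_data_b.zfs.gz | zfs receive backup/data
gunzip -c /tmp/tank_data_b.zfs.gz
```

Keeping one snapshot per file, as Jim suggests, limits the blast radius of a corrupt file: a single-bit error then invalidates one incremental rather than the whole chain.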
Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question
On 10/06/12 07:57, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Frank Cusack On Fri, Oct 5, 2012 at 3:17 AM, Ian Collinsi...@ianshome.com wrote: I do have to suffer a slow, glitchy WAN to a remote server and rather than send stream files, I broke the data on the remote server into a more fine grained set of filesystems than I would do normally. In this case, I made the directories under what would have been the leaf filesystems filesystems themselves. Meaning you also broke the data on the LOCAL server into the same set of more granular filesystems? Or is it now possible to zfs send a subdirectory of a filesystem? zfs create instead of mkdir As Ian said - he didn't zfs send subdirs, he made filesystems where he otherwise would have used subdirs. That's right. I do have a lot of what would appear to be unnecessary filesystems, but after losing the WAN 3 days into a large transfer, a change of tactic was required! -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Should clearing the share property on a clone unshare the origin?
I've noticed on a Solaris 11 system that when I clone a filesystem and change the share property:

# zfs clone -p -o atime=off filesystem@snapshot clone
# zfs set -c share=name=old share clone
# zfs set share=name=new NFS share clone
# zfs set sharenfs=on clone

the origin filesystem is no longer shared (the clone is successfully shared). The share and sharenfs properties on the origin filesystem are unchanged. I have to run zfs share on the origin filesystem to restore the share. Feature or a bug? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] all in one server
On 09/19/12 02:38 AM, Sašo Kiselkov wrote: On 09/18/2012 04:31 PM, Eugen Leitl wrote: Can I actually have a year's worth of snapshots in zfs without too much performance degradation? Each additional dataset (not sure about snapshots, though) increases boot times slightly, however, I've seen pools with several hundred datasets without any serious issues, so yes, it is possible. Be prepared, though, that the data volumes might be substantial (depending on your overall data turn-around per unit time between the snapshots). The boot overhead for many (in my case 1200) filesystems isn't as bad as it was. On our original Thumper I had to amalgamate all our user home directories into one filesystem due to slow boot. Now I have split them again to send over a slow WAN... Large numbers of snapshots (10's of thousands) don't appear to impact boot times. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Zvol vs zfs send/zfs receive
On 09/15/12 04:46 PM, Dave Pooser wrote: I need a bit of a sanity check here. 1) I have a RAIDZ2 of 8 1TB drives, so 6TB usable, running on an ancient version of OpenSolaris (snv_134 I think). On that zpool (miniraid) I have a zvol (RichRAID) that's using almost the whole FS. It's shared out via COMSTAR Fibre Channel target mode. I'd like to move that zvol to a newer server with a larger zpool. Sounds like a job for ZFS send/receive, right? 2) Since ZFS send/receive is snapshot-based I need to create a snapshot. Unfortunately I did not realize that zvols require disk space sufficient to duplicate the zvol, and my zpool wasn't big enough. To do what? A snapshot only starts to consume space when data in the filesystem/volume changes. After a false start (zpool add is dangerous when low on sleep) I added a 250GB mirror and a pair of 3TB mirrors to miniraid and was able to successfully snapshot the zvol: miniraid/RichRAID@exportable (I ended up booting off an OI 151a5 USB stick to make that work, since I don't believe snv_134 could handle a 3TB disk). 3) Now it's easy, right? I enabled root login via SSH on the new host, which is running a zpool archive1 consisting of a single RAIDZ2 of 3TB drives using ashift=12, and did a ZFS send: zfs send miniraid/RichRAID@exportable | ssh root@newhost zfs receive archive1/RichRAID It asked for the root password, I gave it that password, and it was off and running. GigE ain't super fast, but I've got time. The problem: so far the send/recv appears to have copied 6.25TB of 5.34TB. That... doesn't look right. (Comparing zfs list -t snapshot and looking at the 5.34 ref for the snapshot vs zfs list on the new system and looking at space used.) Is this a problem? Should I be panicking yet? No. Do you have compression on one side but not the other? Either way, let things run to completion. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] scripting incremental replication data streams
On 09/13/12 07:44 AM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote: I send a replication data stream from one host to another (and receive). I discovered that after receiving, I need to remove the auto-snapshot property on the receiving side, and set the readonly property on the receiving side, to prevent accidental changes (including auto-snapshots). Question #1: Actually, do I need to remove the auto-snapshot on the receiving side? Or is it sufficient to simply set the readonly property? Will the readonly property prevent auto-snapshots from occurring? So then, sometime later, I want to send an incremental replication stream. I need to name an incremental source snap on the sending side... which needs to be the latest matching snap that exists on both sides. Question #2: What's the best way to find the latest matching snap on both the source and destination? At present, it seems, I'll have to build a list of sender snaps, and a list of receiver snaps, and parse and search them, till I find the latest one that exists in both. For shell scripting, this is very non-trivial. That's pretty much how I do it. Get the two (sorted) sets of snapshots, remove those that only exist on the remote end (ageing) and send those that only exist locally. The first incremental pair will be the last common snapshot and the first unique local snapshot. I haven't tried this in a script, but it's quite straightforward in C++ using the standard library set container and algorithms. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
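The set logic above can also be sketched in plain shell with comm, which is less non-trivial than it first appears. Everything here is illustrative: the snapshot names are made up, and in practice each list would come from something like zfs list -H -o name -t snapshot on each host (note comm needs sorted input, and lexical order only matches creation order if your snapshot names sort that way, e.g. timestamped names).

```shell
# Two sorted snapshot lists - local and remote - standing in for the
# output of 'zfs list -H -o name -t snapshot' on each host.
printf 'tank@2012-09-01\ntank@2012-09-02\ntank@2012-09-03\n' > /tmp/local_snaps
printf 'tank@2012-09-01\ntank@2012-09-02\n'                  > /tmp/remote_snaps

# comm -12: lines common to both files; the last one is the latest
# matching snapshot, i.e. the incremental source.
latest_common=$(comm -12 /tmp/local_snaps /tmp/remote_snaps | tail -n 1)

# comm -23: lines only in the local list - snapshots still to send.
# comm -13 would give remote-only snapshots (candidates for ageing out).
to_send=$(comm -23 /tmp/local_snaps /tmp/remote_snaps)

echo "incremental source: $latest_common"   # -> tank@2012-09-02
echo "still to send: $to_send"              # -> tank@2012-09-03
```

From there the incremental send would pair $latest_common with each newer local snapshot in turn.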
Re: [zfs-discuss] scripting incremental replication data streams
On 09/13/12 10:23 AM, Timothy Coalson wrote: Unless I'm missing something, they didn't solve the matching snapshots thing yet, from their site: To Do: Additional error handling for mismatched snapshots (last destination snap no longer exists on the source) - walk backwards through the remote snaps until a common snapshot is found and destroy non-matching remote snapshots. That's what I do as part of my destroy snapshots not on the source check. Over many years of managing various distributed systems, I've discovered the apparently simple tends to get complex! -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] what have you been buying for slog and l2arc?
On 08/ 4/12 09:50 PM, Eugen Leitl wrote: On Fri, Aug 03, 2012 at 08:39:55PM -0500, Bob Friesenhahn wrote: Extreme write IOPS claims in consumer SSDs are normally based on large write caches which can lose even more data if there is a power failure. Intel 311 with a good UPS would seem to be a reasonable tradeoff. The 313 series looks like a consumer price SLC drive aimed at the recent trend in windows cache drives. Should be worth a look. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Benefits of enabling compression in ZFS for the zones
On 07/10/12 09:25 PM, Jordi Espasa Clofent wrote: Hi all, By default I'm using ZFS for all the zones:

admjoresp@cyd-caszonesrv-15:~$ zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
opt                         4.77G  45.9G   285M  /opt
opt/zones                   4.49G  45.9G    29K  /opt/zones
opt/zones/glad-gm02-ftcl01   367M  45.9G   367M  /opt/zones/glad-gm02-ftcl01
opt/zones/glad-gp02-ftcl01   502M  45.9G   502M  /opt/zones/glad-gp02-ftcl01
opt/zones/glad-gp02-ftcl02  1.21G  45.9G  1.21G  /opt/zones/glad-gp02-ftcl02
opt/zones/mbd-tcasino-02     257M  45.9G   257M  /opt/zones/mbd-tcasino-02
opt/zones/mbd-tcasino-04     281M  45.9G   281M  /opt/zones/mbd-tcasino-04
opt/zones/mbfd-gp02-ftcl01   501M  45.9G   501M  /opt/zones/mbfd-gp02-ftcl01
opt/zones/mbfd-gp02-ftcl02   475M  45.9G   475M  /opt/zones/mbfd-gp02-ftcl02
opt/zones/mbhd-gp02-ftcl01   475M  45.9G   475M  /opt/zones/mbhd-gp02-ftcl01
opt/zones/mbhd-gp02-ftcl02   507M  45.9G   507M  /opt/zones/mbhd-gp02-ftcl02

However, I have the compression disabled in all of them. According to this Oracle whitepaper http://www.oracle.com/technetwork/server-storage/solaris10/solaris-zfs-in-containers-wp-167903.pdf: The next example demonstrates the compression property. If compression is enabled, Oracle Solaris ZFS will transparently compress all of the data before it is written to disk. The benefits of compression are both saved disk space and possible write speed improvements. What exactly does POSSIBLE write speed improvements mean? With compression enabled, less data has to be written to disk, so N bytes write in roughly N/(compression ratio) time. On most systems, the performance cost of compressing and uncompressing data is relatively low. As you can see above I don't usually have any room problems, so if I'm going to enable the compression flag it has to be because of the write speed improvements. I always enable compression by default and only turn it off for filesystems I know hold incompressible data such as media files. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
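The write-speed argument boils down to fewer bytes hitting the disk. A rough stand-alone illustration (gzip standing in for ZFS's in-line compression; the sample data and the ratio it achieves are obviously artificial):

```shell
# Highly repetitive data, the kind that compresses well (logs, text).
yes 'web server log line, much the same every time' | head -n 1000 > /tmp/sample
gzip -1 -c /tmp/sample > /tmp/sample.gz

orig=$(wc -c < /tmp/sample | tr -d ' ')
comp=$(wc -c < /tmp/sample.gz | tr -d ' ')
echo "original: $orig bytes, compressed: $comp bytes"

# If the on-disk write is the bottleneck, writing $comp bytes instead
# of $orig bytes is where the "possible" speed-up comes from.
[ "$comp" -lt "$orig" ] && echo 'compressed copy is smaller'
```

For incompressible data (media files, already-compressed archives) the ratio approaches 1 and the CPU spent compressing buys nothing, which is why such filesystems are the exception to enabling compression by default.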
Re: [zfs-discuss] Scenario sanity check
On 07/10/12 05:26 AM, Brian Wilson wrote: Yep, thanks, and to answer Ian with more detail on what TruCopy does. TruCopy mirrors between the two storage arrays, with software running on the arrays, and keeps a list of dirty/changed 'tracks' while the mirror is split. I think they call it something other than 'tracks' for HDS, but, whatever. When it resyncs the mirrors it sets the target luns read-only (which is why I export the zpools first), and the source array reads the changed tracks, and writes them across dedicated mirror ports and fibre links to the target array's dedicated mirror ports, which then brings the target luns up to synchronized. So, yes, like Richard says, there is IO, but it's isolated to the arrays, and it's scheduled as lower priority on the source array than production traffic. For example it can take an hour or more to re-synchronize a particularly busy 250 GB lun. (though you can do more than one at a time without it taking longer or impacting production any more unless you choke the mirror links, which we do our best not to do) That lower priority, dedicated ports on the arrays, etc, all makes the impact on the production storage luns from the production server as unnoticeable as I can make it in my environment. Thank you for the background on TruCopy. Reading the above, it looks like you can have a pretty long time without a true copy! I guess my view on replication is you are always going to have X number of I/O operations, and how dense they are depends on how up to date you want your copy to be. What I still don't understand is why a service interruption is preferable to a wee bit more I/O? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Scenario sanity check
On 07/ 7/12 08:34 AM, Brian Wilson wrote: Hello, I'd like a sanity check from people more knowledgeable than myself. I'm managing backups on a production system. Previously I was using another volume manager and filesystem on Solaris, and I've just switched to using ZFS. My model is:

- Production Server A
- Test Server B
- Mirrored storage arrays (HDS TruCopy if it matters)
- Backup software (TSM)

Production server A sees the live volumes. Test Server B sees the TruCopy mirrors of the live volumes (it sees the second storage array, the production server sees the primary array).

1. Production server A shuts down zone C, and exports the zpools for zone C.
2. Production server A splits the mirror to the secondary storage array, leaving the mirror writable.
3. Production server A re-imports the pools for zone C, and boots zone C.
4. Test Server B imports the ZFS pool using -R /backup.
5. Backup software backs up the mounted mirror volumes on Test Server B.
6. Later in the day after the backups finish, a script exports the ZFS pools on test server B, and re-establishes the TruCopy mirror between the storage arrays.

That looks awfully complicated. Why don't you just clone a snapshot and back up the clone? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Scenario sanity check
On 07/ 7/12 11:29 AM, Brian Wilson wrote: On 07/ 6/12 04:17 PM, Ian Collins wrote: On 07/ 7/12 08:34 AM, Brian Wilson wrote: Hello, I'd like a sanity check from people more knowledgeable than myself. I'm managing backups on a production system. Previously I was using another volume manager and filesystem on Solaris, and I've just switched to using ZFS. My model is - Production Server A Test Server B Mirrored storage arrays (HDS TruCopy if it matters) Backup software (TSM) Production server A sees the live volumes. Test Server B sees the TruCopy mirrors of the live volumes. (it sees the second storage array, the production server sees the primary array) Production server A shuts down zone C, and exports the zpools for zone C. Production server A splits the mirror to secondary storage array, leaving the mirror writable. Production server A re-imports the pools for zone C, and boots zone C. Test Server B imports the ZFS pool using -R /backup. Backup software backs up the mounted mirror volumes on Test Server B. Later in the day after the backups finish, a script exports the ZFS pools on test server B, and re-establishes the TruCopy mirror between the storage arrays. That looks awfully complicated. Why don't you just clone a snapshot and back up the clone? Taking a snapshot and cloning incurs IO. Backing up the clone incurs a lot more IO reading off the disks and going over the network. These aren't acceptable costs in my situation. So splitting a mirror and reconnecting it doesn't incur I/O? The solution is complicated if you're starting from scratch. I'm working in an environment that already had all the pieces in place (offsite synchronous mirroring, a test server to mount stuff up on, scripts that automated the storage array mirror management, etc). It was setup that way specifically to accomplish short downtime outages for cold backups with minimal or no IO hit to production. 
So while it's complicated, when it was put together it was also the most obvious thing to do to drop my backup window to almost nothing, and keep all the IO from the backup from impacting production. And like I said, with a different volume manager, it's been rock solid for years. So, to ask the sanity check more specifically - Is it reasonable to expect ZFS pools to be exported, have their luns change underneath, then later import the same pool on those changed drives again? If you were splitting ZFS mirrors to read data from one half all would be sweet (and you wouldn't have to export the pool). I guess the question here is what does TruCopy do under the hood when you re-connect the mirror? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sol11 missing snapshot facility
On 07/ 5/12 06:52 PM, Carsten John wrote: Hello everybody, for some reason I cannot find the zfs-autosnapshot service facility any more. I already reinstalled time-slider, but it refuses to start: RuntimeError: Error reading SMF schedule instances Details: ['/usr/bin/svcs', '-H', '-o', 'state', 'svc:/system/filesystem/zfs/auto-snapshot:monthly'] failed with exit code 1 svcs: Pattern 'svc:/system/filesystem/zfs/auto-snapshot:monthly' doesn't match any instances Have you looked with svcs -a?

# svcs -a | grep zfs
disabled Jul_02 svc:/system/filesystem/zfs/auto-snapshot:daily
disabled Jul_02 svc:/system/filesystem/zfs/auto-snapshot:frequent
disabled Jul_02 svc:/system/filesystem/zfs/auto-snapshot:hourly
disabled Jul_02 svc:/system/filesystem/zfs/auto-snapshot:monthly
disabled Jul_02 svc:/system/filesystem/zfs/auto-snapshot:weekly
disabled Jul_02 svc:/application/time-slider/plugin:zfs-send

-- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sol11 missing snapshot facility
On 07/ 5/12 09:25 PM, Carsten John wrote: Hi Ian, yes, I already checked that: svcs -a | grep zfs disabled 11:50:39 svc:/application/time-slider/plugin:zfs-send is the only service I get listed. Odd. How did you install? Is the manifest there (/lib/svc/manifest/system/filesystem/auto-snapshot.xml)? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sol11 missing snapshot facility
On 07/ 5/12 11:32 PM, Carsten John wrote: -Original message- To: Carsten Johncj...@mpi-bremen.de; CC: zfs-discuss@opensolaris.org; From: Ian Collinsi...@ianshome.com Sent: Thu 05-07-2012 11:35 Subject:Re: [zfs-discuss] Sol11 missing snapshot facility On 07/ 5/12 09:25 PM, Carsten John wrote: Hi Ian, yes, I already checked that: svcs -a | grep zfs disabled 11:50:39 svc:/application/time-slider/plugin:zfs-send is the only service I get listed. Odd. How did you install? Is the manifest there (/lib/svc/manifest/system/filesystem/auto-snapshot.xml)? Hi Ian, I installed from CD/DVD, but it might have been in a rush, as I needed to replace a broken machine as quick as possible. The manifest is there: ls /lib/svc/manifest/system/filesystem/ . .. auto-snapshot.xml autofs.xml local-fs.xml minimal-fs.xml rmvolmgr.xml root-fs.xml ufs-quota.xml usr-fs.xml Running svcadm restart manifest-import should load it, or give you some idea why it won't load. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?
On 05/29/12 08:42 AM, Richard Elling wrote: On May 28, 2012, at 2:48 AM, Ian Collins wrote: On 05/28/12 08:55 PM, Sašo Kiselkov wrote: .. If the drives show up at all, chances are you only need to work around the power-up issue in Dell HDD firmware. Here's what I had to do to get the drives going in my R515: /kernel/drv/sd.conf sd-config-list = SEAGATE ST3300657SS, power-condition:false, SEAGATE ST2000NM0001, power-condition:false; (that's for Seagate 300GB 15k SAS and 2TB 7k2 SAS drives, depending on your drive model the strings might differ) How would that work when the drive type is unknown (to format)? I assumed if sd knows the type, so will format. I haven't looked at the code recently, but if it is the same parser as used elsewhere, then a partial match should work. Can someone try it out and report back to the list? sd-config-list = SEAGATE ST, power-condition:false; Well I finally got back to testing this box... Yes, that shorthand fixes the power-up issue (tested from a cold start). -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
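For reference, the sd-config-list lines in this thread lost their quoting somewhere in transit. A stanza in /kernel/drv/sd.conf would look roughly like this (my reconstruction; the vendor ID field is conventionally padded to eight characters, so check the exact strings your drives report before relying on it):

```
# /kernel/drv/sd.conf - tell sd not to send power-condition start/stop
# to these drives (works around the power-up issue in Dell HDD firmware).
sd-config-list = "SEAGATE ST3300657SS", "power-condition:false",
                 "SEAGATE ST2000NM0001", "power-condition:false";

# Or, using the partial match confirmed above:
# sd-config-list = "SEAGATE ST", "power-condition:false";
```

A reboot (or sd driver reload) is needed for sd.conf changes to take effect.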
Re: [zfs-discuss] Very sick iSCSI pool
On 07/ 1/12 08:57 PM, Ian Collins wrote: On 07/ 1/12 10:20 AM, Fajar A. Nugraha wrote: On Sun, Jul 1, 2012 at 4:18 AM, Ian Collinsi...@ianshome.com wrote: On 06/30/12 03:01 AM, Richard Elling wrote: Hi Ian, Chapter 7 of the DTrace book has some examples of how to look at iSCSI target and initiator behaviour. Thanks Richard, I 'll have a look. I'm assuming the pool is hosed? Before making that assumption, I'd try something simple first: - reading from the imported iscsi disk (e.g. with dd) to make sure it's not iscsi-related problem - import the disk in another host, and try to read the disk again, to make sure it's not client-specific problem - possibly restart the iscsi server, just to make sure Booting the initiator host from a live DVD image and attempting to import the pool gives the same error report. The pool's data appears to be recoverable when I import it read only. The storage appliance is so full they can't delete files from it! Now that shouldn't have caused problems with a fixed sized volume, but who knows? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Very sick iSCSI pool
On 07/ 1/12 10:20 AM, Fajar A. Nugraha wrote: On Sun, Jul 1, 2012 at 4:18 AM, Ian Collinsi...@ianshome.com wrote: On 06/30/12 03:01 AM, Richard Elling wrote: Hi Ian, Chapter 7 of the DTrace book has some examples of how to look at iSCSI target and initiator behaviour. Thanks Richard, I'll have a look. I'm assuming the pool is hosed? Before making that assumption, I'd try something simple first: - reading from the imported iscsi disk (e.g. with dd) to make sure it's not an iscsi-related problem - import the disk in another host, and try to read the disk again, to make sure it's not a client-specific problem - possibly restart the iscsi server, just to make sure Booting the initiator host from a live DVD image and attempting to import the pool gives the same error report. I suspect the problem is with your oracle storage appliance. But since you say there's no errors there, then the simple tests should make sure whether it's a client, disk, or zfs problem. So did I. I'll get the admin for that system to dig a little deeper and export a new volume to see if I can create a new pool. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Very sick iSCSI pool
On 06/30/12 03:01 AM, Richard Elling wrote: Hi Ian, Chapter 7 of the DTrace book has some examples of how to look at iSCSI target and initiator behaviour. Thanks Richard, I 'll have a look. I'm assuming the pool is hosed? -- richard On Jun 28, 2012, at 10:47 PM, Ian Collins wrote: I'm trying to work out the case a remedy for a very sick iSCSI pool on a Solaris 11 host. The volume is exported from an Oracle storage appliance and there are no errors reported there. The host has no entries in its logs relating to the network connections. Any zfs or zpool commands the change the state of the pool (such as zfs mount or zpool export) hang and can't be killed. fmadm faulty reports: Jun 27 14:04:24 536fb2ad-1fca-c8b2-fc7d-f5a4a94c165d ZFS-8000-FD Major Host: taitaklsc01 Platform: SUN-FIRE-X4170-M2-SERVER Chassis_id : 1142FMM02N Product_sn : 1142FMM02N Fault class : fault.fs.zfs.vdev.io Affects : zfs://pool=fileserver/vdev=68c1bdefa6f97db8 faulted but still in service Problem in : zfs://pool=fileserver/vdev=68c1bdefa6f97db8 faulted but still in service Description : The number of I/O errors associated with a ZFS device exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information. The zpool status paints a very gloomy picture: pool: fileserver state: ONLINE status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. scan: resilver in progress since Fri Jun 29 11:59:59 2012 858K scanned out of 15.7T at 43/s, (scan is slow, no estimated time) 567K resilvered, 0.00% done config: NAME STATE READ WRITE CKSUM fileserver ONLINE 0 1.16M 0 c0t600144F096C94AC74ECD96F20001d0 ONLINE 0 1.16M 0 (resilvering) errors: 1557164 data errors, use '-v' for a list Any ideas how to determine the cause of the problem and remedy it? -- Ian. 
___ zfs-discuss mailing list zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- ZFS Performance and Training richard.ell...@richardelling.com mailto:richard.ell...@richardelling.com +1-760-896-4422 -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Very sick iSCSI pool
I'm trying to work out the cause of, and a remedy for, a very sick iSCSI pool on a Solaris 11 host. The volume is exported from an Oracle storage appliance and there are no errors reported there. The host has no entries in its logs relating to the network connections. Any zfs or zpool commands that change the state of the pool (such as zfs mount or zpool export) hang and can't be killed. fmadm faulty reports:

Jun 27 14:04:24 536fb2ad-1fca-c8b2-fc7d-f5a4a94c165d ZFS-8000-FD Major
Host: taitaklsc01 Platform: SUN-FIRE-X4170-M2-SERVER Chassis_id : 1142FMM02N Product_sn : 1142FMM02N
Fault class : fault.fs.zfs.vdev.io
Affects : zfs://pool=fileserver/vdev=68c1bdefa6f97db8 faulted but still in service
Problem in : zfs://pool=fileserver/vdev=68c1bdefa6f97db8 faulted but still in service
Description : The number of I/O errors associated with a ZFS device exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.

The zpool status paints a very gloomy picture:

pool: fileserver
state: ONLINE
status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Fri Jun 29 11:59:59 2012
858K scanned out of 15.7T at 43/s, (scan is slow, no estimated time)
567K resilvered, 0.00% done
config:

NAME                               STATE   READ  WRITE  CKSUM
fileserver                         ONLINE     0  1.16M      0
c0t600144F096C94AC74ECD96F20001d0  ONLINE     0  1.16M      0  (resilvering)

errors: 1557164 data errors, use '-v' for a list

Any ideas how to determine the cause of the problem and remedy it? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?
On 05/ 7/12 04:08 PM, Ian Collins wrote: On 05/ 7/12 03:42 PM, Greg Mason wrote: I am currently trying to get two of these things running Illumian. I don't have any particular performance requirements, so I'm thinking of using some sort of supported hypervisor, (either RHEL and KVM or VMware ESXi) to get around the driver support issues, and passing the disks through to an Illumian guest. The H310 does indeed support pass-through (the non-raid mode), but one thing to keep in mind is that I was only able to configure a single boot disk. I configured the rear two drives into a hardware raid 1 and set the virtual disk as the boot disk so that I can still boot the system if an OS disk fails. Once Illumos is better supported on the R720 and the PERC H310, I plan to get rid of the hypervisor silliness and run Illumos on bare metal. Thank you for the feedback Greg. Using a hypervisor layer is our fall-back position. My next attempt would be SmartOs if I can't get the cards swapped (the R720 currently has a Broadcom 5720 NIC). To follow up, the H310 appears to be useless in non-raid mode. The drives do show up in Solaris 11 format, but they show up as unknown, unformatted drives. One oddity is the box has two SATA SSDs which also show up the card's BIOS, but present OK to Solaris. I'd like to re-FLASH the cards, but I don't think Dell would be too happy with me doing that on an evaluation system... -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?
On 05/28/12 08:55 PM, Sašo Kiselkov wrote: On 05/28/2012 10:48 AM, Ian Collins wrote: To follow up, the H310 appears to be useless in non-raid mode. The drives do show up in Solaris 11 format, but they show up as unknown, unformatted drives. One oddity is the box has two SATA SSDs which also show up the card's BIOS, but present OK to Solaris. I'd like to re-FLASH the cards, but I don't think Dell would be too happy with me doing that on an evaluation system... If the drives show up at all, chances are you only need to work around the power-up issue in Dell HDD firmware. Here's what I had to do to get the drives going in my R515: /kernel/drv/sd.conf sd-config-list = SEAGATE ST3300657SS, power-condition:false, SEAGATE ST2000NM0001, power-condition:false; (that's for Seagate 300GB 15k SAS and 2TB 7k2 SAS drives, depending on your drive model the strings might differ) How would that work when the drive type is unknown (to format)? I assumed if sd knows the type, so will format. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?
On 05/28/12 10:53 PM, Sašo Kiselkov wrote: On 05/28/2012 11:48 AM, Ian Collins wrote: On 05/28/12 08:55 PM, Sašo Kiselkov wrote: On 05/28/2012 10:48 AM, Ian Collins wrote: To follow up, the H310 appears to be useless in non-raid mode. The drives do show up in Solaris 11 format, but they show up as unknown, unformatted drives. One oddity is the box has two SATA SSDs which also show up the card's BIOS, but present OK to Solaris. I'd like to re-FLASH the cards, but I don't think Dell would be too happy with me doing that on an evaluation system... If the drives show up at all, chances are you only need to work around the power-up issue in Dell HDD firmware. Here's what I had to do to get the drives going in my R515: /kernel/drv/sd.conf sd-config-list = SEAGATE ST3300657SS, power-condition:false, SEAGATE ST2000NM0001, power-condition:false; (that's for Seagate 300GB 15k SAS and 2TB 7k2 SAS drives, depending on your drive model the strings might differ) How would that work when the drive type is unknown (to format)? I assumed if sd knows the type, so will format. Simply take out the drive and have a look at the label. Tricky when the machine is on a different continent! Joking aside, *I* know what the drive is, the OS as far as I can tell doesn't. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?
On 05/28/12 11:01 PM, Sašo Kiselkov wrote: On 05/28/2012 12:59 PM, Ian Collins wrote: On 05/28/12 10:53 PM, Sašo Kiselkov wrote: On 05/28/2012 11:48 AM, Ian Collins wrote: On 05/28/12 08:55 PM, Sašo Kiselkov wrote: On 05/28/2012 10:48 AM, Ian Collins wrote: To follow up, the H310 appears to be useless in non-raid mode. The drives do show up in Solaris 11 format, but they show up as unknown, unformatted drives. One oddity is the box has two SATA SSDs which also show up the card's BIOS, but present OK to Solaris. I'd like to re-FLASH the cards, but I don't think Dell would be too happy with me doing that on an evaluation system... If the drives show up at all, chances are you only need to work around the power-up issue in Dell HDD firmware. Here's what I had to do to get the drives going in my R515: /kernel/drv/sd.conf sd-config-list = SEAGATE ST3300657SS, power-condition:false, SEAGATE ST2000NM0001, power-condition:false; (that's for Seagate 300GB 15k SAS and 2TB 7k2 SAS drives, depending on your drive model the strings might differ) How would that work when the drive type is unknown (to format)? I assumed if sd knows the type, so will format. Simply take out the drive and have a look at the label. Tricky when the machine is on a different continent! Joking aside, *I* know what the drive is, the OS as far as I can tell doesn't. Can you have a look at your /var/adm/messages or dmesg to check whether the OS is complaining about failed to power up on the relevant drives? If yes, then the above fix should work for you, all you need to do is determine the exact manufacturer and model to enter into sd.conf and reload the driver via update_drv -vf sd. Yes I do see that warning for the non-raid drives. The problem is I'm booting from a remote ISO image, so I can't alter /kernel/drv/sd.conf. I'll play more tomorrow, typing on a remote console inside an RDP session running in a VNC session on a virtual machine is interesting :) -- Ian. 
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
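Sašo's workaround above can be made concrete. This is a sketch only: the two drive ID strings are the Seagate models from his R515, so substitute the vendor/product inquiry strings your own drives report (the vendor field is padded to eight characters). The fragment is written to a temporary file here purely for illustration; on a live system the entry belongs in /kernel/drv/sd.conf, after which the driver is reloaded with update_drv, exactly as described in the thread.

```shell
# Hypothetical sd-config-list fragment disabling power-condition handling
# for two example Seagate SAS drives; written to /tmp for illustration.
cat > /tmp/sd.conf.fragment <<'EOF'
sd-config-list =
    "SEAGATE ST3300657SS",  "power-condition:false",
    "SEAGATE ST2000NM0001", "power-condition:false";
EOF

cat /tmp/sd.conf.fragment
# On the live system, append the entry to /kernel/drv/sd.conf and reload:
#   update_drv -vf sd
```

Note that, as discussed below, this only helps once you can edit sd.conf at all; a system booted from a read-only remote ISO image can't take the change.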
Re: [zfs-discuss] need information about ZFS API
On 05/29/12 08:32 AM, Richard Elling wrote: Hi Dhiraj, On May 27, 2012, at 11:28 PM, Dhiraj Bhandare wrote: Hi All, I would like to create a sample application for ZFS using C++/C and libzfs. I am very new to ZFS and would like some information about the ZFS API. Even some sample code will be useful. Looking for help and constructive suggestions. libzfs is a private interface (see the Solaris man page for attributes); it was not designed to be used directly by external programmers. I can't comment on what Oracle might or might not be doing, but for the open source community, there is a project underway called libzfs_core that is developing a stable library for external consumers. For more info, see http://smartos.org/2012/01/13/the-future-of-libzfs/ That's good news. It's a shame it wasn't announced here. I'm one of the many Matt refers to in the presentation who have been using libzfs. I started using it pretty much the week ZFS shipped and it was a long while after that I discovered the private nature of the interface (through a posting here asking for an enhancement!). Since then I have been using a thin wrapper to decouple my applications from changes to the API. Generally the API has been stable for basic operations such as iteration and accessing properties. Not so for send and receive! I have a simple (150 line) C++ wrapper that supports iteration and property access I'm happy to share if anyone is interested. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Hard Drive Choice Question
On 05/17/12 02:53 AM, Paul Kraus wrote: I have a small server at home (HP Proliant Micro N36) that I use for file, DNS, DHCP, etc. services. I currently have a zpool of four mirrored 1 TB Seagate ES2 SATA drives. Well, it was a zpool of four until last night when one of the drives died. ZFS did its job and all the data is still OK. The drive is still under warranty and is going back to Seagate, but it raised an issue. I want to pick up a spare drive or two so that I don't have to wait for shipping delays when a drive fails. I was just going to pick up another 1 TB ES2 or two, but I find that those drives are no longer available (I bought mine in 2009, warranty is up in 2014). What do people like today for 7x24 operation SATA drives? I am willing to consider 2TB, but don't really need the extra capacity (but if that is all the market offers, I don't have to use the other half :-) I found a Seagate Constellation ES 2 TB for about $350 (which is more than I really want to spend, I got the ES2 1TB drives for about $130 when I bought them). I have been sticking with Seagate as I am comfortable with them, but am willing to look at others. The only thing I insist on is that the drive be rated for 7x24 operation. I wouldn't be too fussed about a 7x24 rating in a home server. I still have a set of 10 regular Seagate drives I bought in 2007 that were spinning non stop for four years in a very hostile environment (my garage!). They simply refuse to die and I'm still using them in various test systems. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Unexpected error adding a cache device to existing pool
On a Solaris 11 system I have a pool that was originally built with a log and a cache device on a single SSD. The SSD died and I realised I should have a mirrored log, so I've just tried to replace the log and cache with a pair of SSDs. Adding the log was OK:

zpool add -f export log mirror c10t3d0s0 c10t4d0s0

But adding the cache fails:

zpool add -f export cache c10t3d0s1 c10t4d0s1
invalid vdev specification
the following errors must be manually repaired:
/dev/dsk/c10t3d0s2 is part of active ZFS pool export. Please see zpool(1M).
/dev/dsk/c10t3d0s1 overlaps with /dev/dsk/c10t3d0s2

Now that looks impossible to repair, s2 can't be removed. The SSD partition table is:

Total disk cylinders available: 19932 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0 unassigned    wm       0 -  2674      16.00GB    (2675/0/0)   33555200
  1 unassigned    wm    2675 - 19931     103.22GB    (17257/0/0) 216471808
  2     backup    wu       0 - 19931     119.22GB    (19932/0/0) 250027008

Is there a solution? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Unexpected error adding a cache device to existing pool
On 05/14/12 10:32 PM, Carson Gaspar wrote: On 5/14/12 2:02 AM, Ian Collins wrote: Adding the log was OK: zpool add -f export log mirror c10t3d0s0 c10t4d0s0 But adding the cache fails: zpool add -f export cache c10t3d0s1 c10t4d0s1 invalid vdev specification the following errors must be manually repaired: /dev/dsk/c10t3d0s2 is part of active ZFS pool export. Please see zpool(1M). /dev/dsk/c10t3d0s1 overlaps with /dev/dsk/c10t3d0s2 The only solution I know of is to get rid of the whole-disk slice s2 from the disk label. I ended up using prtvtoc to dump the table, editing it by hand, and feeding it to fmthard. You could also try making s0 start at cylinder 1 instead of zero, so zpool doesn't see a magic number on s2, but I don't know if that will be enough. Thank you for the suggestions Carson. Making s0 start at cylinder 1 did the trick. I'm sure I didn't have to do that when I originally built the pool, but that was back on Solaris 11 Express. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
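Carson's prtvtoc/fmthard route can be sketched. The device names are the ones from this thread and the geometry comes from the partition table in the original post (33555200 blocks over 2675 cylinders gives 12544 sectors per cylinder); the awk one-liner is an illustration of mine, not something the thread used. Slice 0 is moved to start one cylinder in, so zpool no longer finds a label at the same offset as the backup slice s2.

```shell
# On the live box the flow would be:
#   prtvtoc /dev/rdsk/c10t3d0s2 > /tmp/vtoc.txt
#   ...edit /tmp/vtoc.txt...
#   fmthard -s /tmp/vtoc.txt /dev/rdsk/c10t3d0s2
# The edit itself, demonstrated on a captured table (fields are
# partition, tag, flags, first sector, sector count, last sector):
cat > /tmp/vtoc.txt <<'EOF'
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector
       0      0    00          0  33555200  33555199
       1      0    00   33555200 216471808 250027007
       2      5    01          0 250027008 250027007
EOF

# Shift slice 0 up by one cylinder (12544 sectors) and shrink its count
# so its last sector is unchanged; comment lines pass through untouched.
awk '/^\*/ { print; next }
     $1 == "0" { $4 += 12544; $5 -= 12544 }
     { print }' /tmp/vtoc.txt
```

A VTOC entry stays self-consistent as long as first + count - 1 equals the last sector, which the edit above preserves.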
Re: [zfs-discuss] Strange hang during snapshot receive
On 05/11/12 02:01 AM, Mike Gerdts wrote: On Thu, May 10, 2012 at 5:37 AM, Ian Collins <i...@ianshome.com> wrote: I have an application I have been using to manage data replication for a number of years. Recently we started using a new machine as a staging server (not that new, an x4540) running Solaris 11 with a single pool built from 7x6 drive raidz. No dedup and no reported errors. On that box and nowhere else I see empty snapshots taking 17 or 18 seconds to write. Everywhere else they return in under a second. Have you installed any SRUs? If not, you could be seeing: The machine was at SRU 3. 7060894 zfs recv is excruciatingly slow which is fixed in Solaris 11 SRU 5. Thanks Mike, that appears to be it. Updating to SRU 6 fixed the issue. If you are using zones and are using any https pkg(5) origins (such as https://pkg.oracle.com/solaris/support), I suggest reading https://forums.oracle.com/forums/thread.jspa?threadID=2380689&tstart=15 before updating to SRU 6 (SRU 5 is fine, however). The fix for the problem mentioned in that forums thread should show up in an upcoming SRU via CR 7157313. Luckily I have a local repository! -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Strange hang during snapshot receive
I have an application I have been using to manage data replication for a number of years. Recently we started using a new machine as a staging server (not that new, an x4540) running Solaris 11 with a single pool built from 7x6 drive raidz. No dedup and no reported errors. On that box and nowhere else I see empty snapshots taking 17 or 18 seconds to write. Everywhere else they return in under a second. Using truss and the last published source code, it looks like the pause is between a printf and the call to zfs_ioctl and there aren't any other function calls between them:

100.5124  0.0004  open("/dev/zfs", O_RDWR|O_EXCL)             = 10
100.7582  0.0001  read(7, "\0\0\0\0\0\0\0\0ACCBBAF5".., 312)  = 312
100.7586  0.0000  read(7, 0x080464F8, 0)                      = 0
100.7591  0.0000  time()                                      = 1336628656
100.7653  0.0035  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040CF0)  = 0
100.7699  0.0022  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040900)  = 0
100.7740  0.0016  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040580)  = 0
100.7787  0.0026  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x080405B0)  = 0
100.7794  0.0001  write(1, " r e c e i v i n g   i n".., 75)  = 75
118.3551  0.6927  ioctl(8, ZFS_IOC_RECV, 0x08042570)          = 0
118.3596  0.0010  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040900)  = 0
118.3598  0.0000  time()                                      = 1336628673
118.3600  0.0000  write(1, " r e c e i v e d   3 1 2".., 45)  = 45

zpool iostat (1 second interval) for the period is:

tank  12.5T  6.58T  175    0  271K      0
tank  12.5T  6.58T  176    0  299K      0
tank  12.5T  6.58T  189    0  259K      0
tank  12.5T  6.58T  156    0  231K      0
tank  12.5T  6.58T  170    0  243K      0
tank  12.5T  6.58T  252    0  295K      0
tank  12.5T  6.58T  179    0  200K      0
tank  12.5T  6.58T  214    0  258K      0
tank  12.5T  6.58T  165    0  210K      0
tank  12.5T  6.58T  154    0  178K      0
tank  12.5T  6.58T  186    0  221K      0
tank  12.5T  6.58T  184    0  215K      0
tank  12.5T  6.58T  218    0  248K      0
tank  12.5T  6.58T  175    0  228K      0
tank  12.5T  6.58T  146    0  194K      0
tank  12.5T  6.58T   99  258  209K  1.50M
tank  12.5T  6.58T  196  296  294K  1.31M
tank  12.5T  6.58T  188  130  229K   776K

Can anyone offer any insight or further debugging tips? Thanks. -- Ian. 
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
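Stalls like the one above can be pulled out of a timestamped truss log mechanically rather than by eye. The awk helper below is an illustration of mine, not something the thread used: given trace lines whose first column is an elapsed-time stamp, it reports the largest gap between consecutive calls (here, the ~17.6s sitting in front of ZFS_IOC_RECV). The sample file reproduces three lines from the trace above.

```shell
# Sample trace lines (first column: elapsed seconds since trace start).
cat > /tmp/truss.sample <<'EOF'
100.7794 0.0001 write(1, " r e c e i v i n g   i n".., 75) = 75
118.3551 0.6927 ioctl(8, ZFS_IOC_RECV, 0x08042570) = 0
118.3596 0.0010 ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040900) = 0
EOF

# Find the largest gap between consecutive calls and the call it precedes.
awk 'NR > 1 { gap = $1 - prev; if (gap > max) { max = gap; call = $0 } }
     { prev = $1 }
     END { printf "largest gap: %.4fs before: %s\n", max, call }' /tmp/truss.sample
```

Run against a full `truss -d` capture this immediately names the slow call instead of requiring a scroll through thousands of lines.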
Re: [zfs-discuss] Hung zfs destroy
On 05/ 8/12 08:36 AM, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Ian Collins On a Solaris 11 (SR3) system I have a zfs destroy process that appears to be doing nothing and can't be killed. It has used 5 seconds of CPU in a day and a half, but truss -p won't attach. No data appears to have been removed. The dataset (but not the pool) is busy. I thought this was an old problem that was fixed long ago in Solaris 10 (I had several temporary patches over the years), but it appears to be alive and well. How big is your dataset? Small, 15GB. On what type of disks/pool? A single iSCSI volume. zfs destroy does indeed take time (unlike zpool destroy.) A couple of days might be normal expected behavior, depending on your configuration. You didn't specify if you have dedup... Dedup will greatly hurt your zfs destroy speed, too. I've yet to find a system with enough RAM to make dedup worthwhile! After 5 days, a grand total of 1.2GB has been removed and the process responded to kill -9 and exited... I just re-ran the command and it completed in 2 seconds. Well odd. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Has anyone used a Dell with a PERC H310?
I'm trying to configure a DELL R720 (not a pleasant experience) which has an H710p card fitted. The H710p definitely doesn't support JBOD, but the H310 looks like it might (the data sheet mentions non-RAID). Has anyone used one with ZFS? Thanks, -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?
On 05/ 7/12 03:42 PM, Greg Mason wrote: I am currently trying to get two of these things running Illumian. I don't have any particular performance requirements, so I'm thinking of using some sort of supported hypervisor, (either RHEL and KVM or VMware ESXi) to get around the driver support issues, and passing the disks through to an Illumian guest. The H310 does indeed support pass-through (the non-raid mode), but one thing to keep in mind is that I was only able to configure a single boot disk. I configured the rear two drives into a hardware raid 1 and set the virtual disk as the boot disk so that I can still boot the system if an OS disk fails. Once Illumos is better supported on the R720 and the PERC H310, I plan to get rid of the hypervisor silliness and run Illumos on bare metal. Thank you for the feedback Greg. Using a hypervisor layer is our fall-back position. My next attempt would be SmartOs if I can't get the cards swapped (the R720 currently has a Broadcom 5720 NIC). -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Hung zfs destroy
On a Solaris 11 (SR3) system I have a zfs destroy process that appears to be doing nothing and can't be killed. It has used 5 seconds of CPU in a day and a half, but truss -p won't attach. No data appears to have been removed. The dataset (but not the pool) is busy. I thought this was an old problem that was fixed long ago in Solaris 10 (I had several temporary patches over the years), but it appears to be alive and well. Any hints? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] cluster vs nfs
On 04/26/12 10:12 PM, Jim Klimov wrote: On 2012-04-26 2:20, Ian Collins wrote: On 04/26/12 09:54 AM, Bob Friesenhahn wrote: On Wed, 25 Apr 2012, Rich Teer wrote: Perhaps I'm being overly simplistic, but in this scenario, what would prevent one from having, on a single file server, /exports/nodes/node[0-15], and then having each node NFS-mount /exports/nodes from the server? Much simplier than your example, and all data is available on all machines/nodes. This solution would limit bandwidth to that available from that single server. With the cluster approach, the objective is for each machine in the cluster to primarily access files which are stored locally. Whole files could be moved as necessary. Distributed software building faces similar issues, but I've found once the common files have been read (and cached) by each node, network traffic becomes one way (to the file server). I guess that topology works well when most access to shared data is read. Which reminds me: older Solarises used to have a nifty-looking (via descriptions) cachefs, apparently to speed up NFS clients and reduce traffic, which we did not get to really use in real life. AFAIK Oracle EOLed it for Solaris 11, and I am not sure it is in illumos either. I don't think it even made it into Solaris 10.. I used to use it with Solaris 8 back in the days when 100Mb switches were exotic! Does caching in current Solaris/illumos NFS client replace those benefits, or did the project have some merits of its own (like caching into local storage of client, so that the cache was not empty after reboot)? It did have local backing store, but my current desktop has more RAM than that Solaris 8 box had disk and my network is 100 times faster, so it doesn't really matter any more. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] cluster vs nfs
On 04/26/12 09:54 AM, Bob Friesenhahn wrote: On Wed, 25 Apr 2012, Rich Teer wrote: Perhaps I'm being overly simplistic, but in this scenario, what would prevent one from having, on a single file server, /exports/nodes/node[0-15], and then having each node NFS-mount /exports/nodes from the server? Much simplier than your example, and all data is available on all machines/nodes. This solution would limit bandwidth to that available from that single server. With the cluster approach, the objective is for each machine in the cluster to primarily access files which are stored locally. Whole files could be moved as necessary. Distributed software building faces similar issues, but I've found once the common files have been read (and cached) by each node, network traffic becomes one way (to the file server). I guess that topology works well when most access to shared data is read. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] cluster vs nfs
On 04/26/12 10:34 AM, Paul Archer wrote: 2:34pm, Rich Teer wrote: On Wed, 25 Apr 2012, Paul Archer wrote: Simple. With a distributed FS, all nodes mount from a single DFS. With NFS, each node would have to mount from each other node. With 16 nodes, that's what, 240 mounts? Not to mention your data is in 16 different mounts/directory structures, instead of being in a unified filespace. Perhaps I'm being overly simplistic, but in this scenario, what would prevent one from having, on a single file server, /exports/nodes/node[0-15], and then having each node NFS-mount /exports/nodes from the server? Much simplier than your example, and all data is available on all machines/nodes. That assumes the data set will fit on one machine, and that machine won't be a performance bottleneck. Aren't those general considerations when specifying a file server? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Two disks giving errors in a raidz pool, advice needed
On 04/23/12 01:47 PM, Manuel Ryan wrote: Hello, I have looked around this mailing list and other virtual spaces and I wasn't able to find a situation similar to this weird one. I have a 6 disk raidz zfs15 pool. After a scrub, the status of the pool and all disks still shows up as ONLINE, but two of the disks are starting to give me errors and I do have fatal data corruption. The disks seem to be failing differently: disk 2 has 78 (not growing) read errors, 43k (growing) write errors and 3 (not growing) checksum errors. disk 5 has 0 read errors, 0 write errors but 7.4k checksum errors (growing). Data corruption is around 22k files. I plan to replace both disks. Which disk do you think should be replaced first to lose as little data as possible? I was thinking of replacing disk 5 first as it seems to have a lot of silent data corruption, so maybe it's a bad idea to use its output to replace disk 2. Also, checksum and read errors on disk 2 do not seem to be growing as I used the pool to backup data (corrupted files could not be accessed, but a lot of files were fine) but write errors are growing extremely fast. So reading uncorrupted data from disk 2 seems to be working but writing to it seems to be problematic. Do you guys also think I should change disk 5 first, or am I missing something? If it were my data, I'd set the pool read only, backup, rebuild and restore. You do risk further data loss (maybe even pool loss) while the new drive is resilvering. I would only use raidz for unimportant data, or for a copy of data from a more robust pool. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
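The "set the pool read only, backup, rebuild and restore" path suggested above can be sketched as a sequence of commands. Pool names here are hypothetical and the plan is printed rather than executed, so each step can be reviewed first; note the snapshot is taken before the pool is re-imported read-only (a read-only pool can't take new snapshots), and the read-only import is the mechanism for freezing writes during the copy.

```shell
# Dry-run plan for rescuing a sick raidz pool onto a healthy one.
# "tank" and "rescuepool" are placeholders.
pool=tank backup=rescuepool
cat > /tmp/rescue.plan <<EOF
zfs snapshot -r $pool@rescue
zpool export $pool
zpool import -o readonly=on $pool
zfs send -R $pool@rescue | zfs receive -Fd $backup
# verify $backup, destroy and recreate $pool on new disks (mirrors), then:
zfs send -R $backup/$pool@rescue | zfs receive -Fd $pool
EOF

cat /tmp/rescue.plan
```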
[zfs-discuss] Improving snapshot write performance
I use an application with a fairly large receive data buffer (256MB) to replicate data between sites. I have noticed the buffer becoming completely full when receiving snapshots for some filesystems, even over a slow (~2MB/sec) WAN connection. I assume this is due to the changes being widely scattered. Is there any way to improve this situation? Thanks, -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Improving snapshot write performance
On 04/12/12 04:17 AM, Richard Elling wrote: On Apr 11, 2012, at 1:34 AM, Ian Collins wrote: I use an application with a fairly large receive data buffer (256MB) to replicate data between sites. I have noticed the buffer becoming completely full when receiving snapshots for some filesystems, even over a slow (~2MB/sec) WAN connection. I assume this is due to the changes being widely scattered. Widely scattered on the sending side, receiving side should be mostly contiguous... That's what I originally thought. unless you are mostly full or there is some other cause of slow writes. The usual disk-oriented performance analysis will show if this is the case. Most likely, something else is going on here. Odd. The pool is a single iSCSI volume exported from a 7320 and there is 18TB free. I see the same issues with local replications on our LAN. The filesystems that appear to write slowly are ones containing many small files, such as office documents. Over the WAN, the receive buffer high water mark is usually the TCP receive window size, except for the apparently slow filesystems. I'll add some more diagnostics. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Improving snapshot write performance
On 04/12/12 09:00 AM, Jim Klimov wrote: 2012-04-11 23:55, Ian Collins wrote: Odd. The pool is a single iSCSI volume exported from a 7320 and there is 18TB free. Lame question: is that 18TB free on the pool inside the iSCSI volume, or on the backing pool on the 7320? I mean that as far as the external pool is concerned, the zvol's blocks are allocated - even if the internal pool considers them deleted but did not zero them out and/or TRIM them explicitly. Thus there may be lags due to fragmentation on the backing external pool (physical on the 7320), especially if it is not very free and/or its free space is already too heavily fragmented into many small bubbles. I'll check, but I see the same effect with local replications as well. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Improving snapshot write performance
On 04/12/12 09:51 AM, Peter Jeremy wrote: On 2012-Apr-11 18:34:42 +1000, Ian Collins <i...@ianshome.com> wrote: I use an application with a fairly large receive data buffer (256MB) to replicate data between sites. I have noticed the buffer becoming completely full when receiving snapshots for some filesystems, even over a slow (~2MB/sec) WAN connection. I assume this is due to the changes being widely scattered. As Richard pointed out, the write side should be mostly contiguous. Is there any way to improve this situation? Is the target pool nearly full (so ZFS is spending lots of time searching for free space)? Do you have dedupe enabled on the target pool? This would force ZFS to search the DDT to write blocks - this will be expensive, especially if you don't have enough RAM. Do you have a high compression level (gzip or gzip-N) on the target filesystems, without enough CPU horsepower? Do you have a dying (or dead) disk in the target pool? No to all of the above! -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Puzzling problem with zfs receive exit status
On 03/29/12 10:46 PM, Borja Marcos wrote: Hello, I hope someone has an idea. I have a replication program that copies a dataset from one server to another one. The replication mechanism is the obvious one, of course:

zfs send -Ri snapshot(n-1) snapshot(n) > file
scp the file to the remote machine

(I do it this way instead of using a pipeline so that a network error won't interrupt a receive data stream) and on the remote machine:

zfs receive -Fd pool

It's been working perfectly for months, no issues. However, yesterday we began to see something weird: the zfs receive being executed on the remote machine is exiting with an exit status of 1, even though the replication is finished, and I see the copied snapshots on the remote machine. Any ideas? It's really puzzling. It seems that the replication is working (a zfs list -t snapshot shows the new snapshots correctly applied to the dataset) but I'm afraid there's some kind of corruption. Does zfs receive produce any warnings? Have you tried adding -v? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Receive failing with invalid backup stream error
On 03/10/12 01:48 AM, Jim Klimov wrote: 2012-03-09 9:24, Ian Collins wrote: I sent the snapshot to a file, coped the file to the remote host and piped the file into zfs receive. That worked and I was able to send further snapshots with ssh. Odd. Is it possible that in case of zfs send ... | ssh | zfs recv piping, the two ZFS processes can have some sort of dialog and misunderstanding in your case; while zfs-sending to a file has no dialog and some commonly-working default format/assumptions? As a wild guess, two systems might have different opinions for example regarding dedup during dialog, while it was not even considered when passing through files? Both systems are identical (same hardware, same SRU). The receive also fails if the output of the libzfs zfs_send() function is connected through a socket to zfs_receive() on the other box. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/receive script
On 03/10/12 02:48 AM, Cameron Hanover wrote: On Mar 6, 2012, at 8:26 AM, Carsten John wrote: Hello everybody, I set up a script to replicate all zfs filesystems (some 300 user home directories in this case) within a given pool to a mirror machine. The basic idea is to send the snapshots incrementally if the corresponding snapshot exists on the remote side, or send a complete snapshot if no corresponding previous snapshot is available. The setup basically works, but from time to time (within a run over all filesystems) I get error messages like: cannot receive new filesystem stream: dataset is busy or cannot receive incremental filesystem stream: dataset is busy I've seen similar error messages from a script I've written, as well. Mine does create a lock file and won't run if a `zfs send` is already in progress. My only guess is that the second (or third, or...) filesystem starts sending to the receiving host before the latter has fully finished the `zfs recv` process. I've considered putting a 5 second pause between successive processes, but the errors are intermittent enough that it's pretty low on my to-do list. I have also seen the same issue (a long time ago) and the application I use for replication still has a one second pause between sends to fix the problem. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Receive failing with invalid backup stream error
On 03/ 3/12 11:57 AM, Ian Collins wrote: Hello, I am having problems sending some snapshots between two fully up to date Solaris 11 systems:

zfs send -i tank/live/fs@20120226_0705 tank/live/fs@20120226_1105 | \
    ssh remote zfs receive -vd fileserver/live
receiving incremental stream of tank/live/fs@20120226_1105 into fileserver/live/fs@20120226_1105
cannot receive incremental stream: invalid backup stream

Both pools and filesystems are at the latest revision. Most of the other filesystems in the pool can be sent without issues. The filesystem was upgraded yesterday, which is when the problems started. The snapshots are from 26/02. Other filesystems that were upgraded yesterday receive fine, so I don't think the problem is directly related to the upgrade. Any ideas? I haven't had a solution from support yet, but I do have a workaround if anyone else encounters the same problem. I sent the snapshot to a file, copied the file to the remote host and piped the file into zfs receive. That worked and I was able to send further snapshots with ssh. Odd. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
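The file-based workaround can be sketched concretely. Dataset and host names are the ones from the thread; the cksum step is an addition of mine, not something the thread mentions, but once the stream lives in a file it costs little to confirm it survived the copy before feeding it to zfs receive.

```shell
# On the real hosts the flow would be:
#   zfs send -i tank/live/fs@20120226_0705 tank/live/fs@20120226_1105 \
#       > /tmp/fs.zstream
#   cksum /tmp/fs.zstream
#   scp /tmp/fs.zstream remote:/tmp/
#   ssh remote cksum /tmp/fs.zstream   # should match the local sum
#   ssh remote 'zfs receive -vd fileserver/live < /tmp/fs.zstream'
#
# The comparison itself, demonstrated on a stand-in file:
printf 'stand-in stream\n' > /tmp/fs.zstream
local_sum=$(cksum < /tmp/fs.zstream)
remote_sum=$(cksum < /tmp/fs.zstream)   # would run via ssh on the real remote
[ "$local_sum" = "$remote_sum" ] && echo "stream intact"   # -> stream intact
```

A side benefit, noted by Borja elsewhere in the archive, is that a network error during the copy can no longer interrupt an in-flight receive.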
[zfs-discuss] Receive failing with invalid backup stream error
Hello, I am having problems sending some snapshots between two fully up to date Solaris 11 systems:

zfs send -i tank/live/fs@20120226_0705 tank/live/fs@20120226_1105 | \
    ssh remote zfs receive -vd fileserver/live
receiving incremental stream of tank/live/fs@20120226_1105 into fileserver/live/fs@20120226_1105
cannot receive incremental stream: invalid backup stream

Both pools and filesystems are at the latest revision. Most of the other filesystems in the pool can be sent without issues. The filesystem was upgraded yesterday, which is when the problems started. The snapshots are from 26/02. Other filesystems that were upgraded yesterday receive fine, so I don't think the problem is directly related to the upgrade. Any ideas? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs diff performance
On 02/28/12 12:53 PM, Ulrich Graef wrote: Hi Ian, On 26.02.12 23:42, Ian Collins wrote: I had high hopes of significant performance gains using zfs diff in Solaris 11 compared to my home-brew stat-based version in Solaris 10. However the results I have seen so far have been disappointing. Testing on a reasonably sized filesystem (4TB), a diff that listed 41k changes took 77 minutes. I haven't tried my old tool, but I would expect the same diff to take a couple of hours. Size does not matter (at least here). How many files do you have, and do you have enough cache in main memory (25% of ARC) or cache device (set to metadata only)? Last time I looked, about 10 million files. If you are able to manage that every dnode (512 bytes) is in the ARC or the L2ARC then your compare will fly! When you are doing too much other stuff (do you do I/O? Do you have applications running?), it will move dnode data out of direct access and the compare needs to read a lot from disk. There was a send running from the same pool. You are comparing a measurement with a guess. That is not a valid test. The guess is based on the last time I ran my old diff tool. The box is well specified, an x4270 with 96G of RAM and a FLASH accelerator card used for log and cache. Number of files/size of files is missing. As I said, about 10 million, of various sizes from bytes to gigabytes. How much of the pool is used (in %)? 63% Perhaps the recordsize is lowered, then? How much is used for the cache? Did you set secondarycache=metadata? No. Is your burn-in long enough that all the metadata is on fast devices? How large is your L2ARC? 72GB. What is running in parallel to your test? What is the disk configuration (you know: disks are slow)? A stripe of 5 two-way mirrors. Do you use de-duplication (does not directly harm the performance, but needs memory and slows down zfs diff through that)? No dedup! Tell me the hit rates of the cache (metadata and data in ARC and L2ARC). Good? 
I'll have to check next time I run a diff. Raidz or mirror? Are there any ways to improve diff performance? Yes. Mainly memory. Or use fewer files. Tell that to the users! -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs diff performance
I had high hopes of significant performance gains using zfs diff in Solaris 11 compared to my home-brew stat based version in Solaris 10. However the results I have seen so far have been disappointing. Testing on a reasonably sized filesystem (4TB), a diff that listed 41k changes took 77 minutes. I haven't tried my old tool, but I would expect the same diff to take a couple of hours. The box is well specified, an x4270 with 96G of RAM and a FLASH accelerator card used for log and cache. Are there any ways to improve diff performance? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
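For context, a stat-based snapshot diff of the kind mentioned above can be sketched roughly as follows (an illustration only, not the actual home-brew tool; it walks two snapshot directory trees and reports files whose size or mtime differ, which is essentially the work zfs diff replaces):

```python
import os

def stat_diff(old_root, new_root):
    """Toy stat-based diff: report files added or modified between two
    snapshot trees. Deletions would need a second walk of old_root."""
    changed = []
    for dirpath, _, files in os.walk(new_root):
        rel = os.path.relpath(dirpath, new_root)
        for name in files:
            new_path = os.path.join(dirpath, name)
            old_path = os.path.join(old_root, rel, name)
            entry = os.path.normpath(os.path.join(rel, name))
            try:
                old_st = os.stat(old_path)
            except FileNotFoundError:
                changed.append(("added", entry))     # new file in new snapshot
                continue
            new_st = os.stat(new_path)
            if (old_st.st_size, old_st.st_mtime) != (new_st.st_size, new_st.st_mtime):
                changed.append(("modified", entry))  # size or mtime changed
    return changed
```

Walking ~10 million files this way stats every file on both sides, which is why dnode/metadata caching dominates the run time for either approach.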
Re: [zfs-discuss] Server upgrade
On 02/17/12 03:54 AM, Edward Ned Harvey wrote: If you consider paying for solaris - at Oracle, you just pay them for An OS and they don't care which one you use. Could be oracle linux, solaris, or solaris express. I would recommend solaris 11 express based on personal experience. It gets bugfixes and new features sooner than commercial solaris. Solaris 11 express is long gone. You don't just pay them for An OS. Compare the sensible support pricing for their Linux offering with the ridiculous price for Solaris. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Strange send failure
Hello, I'm attempting a dry run of sending the root data set of a zone from one Solaris 11 host to another: sudo zfs send -r rpool/zoneRoot/zone@to_send | sudo ssh remote zfs receive -ven fileserver/zones But I'm seeing cannot receive: stream has unsupported feature, feature flags = 24 The source pool version is 31, the remote pool version is 33. Both the source filesystem and its parent on the remote box are version 5. I've never seen this before, any clues? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup
On 12/ 9/11 12:39 AM, Darren J Moffat wrote: On 12/07/11 20:48, Mertol Ozyoney wrote: Unfortunately the answer is no. Neither l1 nor l2 cache is dedup aware. The only vendor i know that can do this is Netapp In fact, most of our functions, like replication, are not dedup aware. For example, technically it's possible to optimize our replication so that it does not send data chunks if a data chunk with the same checksum exists on the target, without enabling dedup on target and source. We already do that with 'zfs send -D': -D Perform dedup processing on the stream. Deduplicated streams cannot be received on systems that do not support the stream deduplication feature. Is there any more published information on how this feature works? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
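As a rough illustration of the idea behind 'zfs send -D' (a toy sketch, not the actual implementation): blocks whose checksum has already been seen are replaced in the stream by a reference, so duplicate data crosses the wire only once.

```python
import hashlib

def dedup_stream(blocks):
    """Replace repeated blocks with ("ref", checksum) records."""
    seen = set()
    out = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest in seen:
            out.append(("ref", digest))   # duplicate: send only a reference
        else:
            seen.add(digest)
            out.append(("data", block))   # first occurrence: send the data
    return out

stream = dedup_stream([b"aaa", b"bbb", b"aaa"])
print([kind for kind, _ in stream])  # -> ['data', 'data', 'ref']
```

The receive side has to keep an equivalent checksum table to resolve the references, which is why such streams can only be received on systems that support the stream deduplication feature.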
Re: [zfs-discuss] First zone creation - getting ZFS error
On 12/ 9/11 11:37 AM, Betsy Schwartz wrote: On Dec 7, 2011, at 9:50 PM, Ian Collins i...@ianshome.com wrote: On 12/ 7/11 05:12 AM, Mark Creamer wrote: Since the zfs dataset datastore/zones is created, I don't understand what the error is trying to get me to do. Do I have to do: zfs create datastore/zones/zonemaster before I can create a zone in that path? That's not in the documentation, so I didn't want to do anything until someone can point out my error for me. Thanks for your help! You shouldn't have to, but it won't do any harm. If you don't get any further, try zones-discuss. I would also try it without the /zones mountpoint. Putting the zone root dir on an alternate mountpoint caused problems for us. Try creating /datastore/zones for a zone root home, or just make the zones in /datastore Solaris seems to get very easily confused when zone root is anything out of the ordinary (and it really bites you at patch time!) It shouldn't. On all my systems, I have:

NAME            USED  AVAIL  REFER  MOUNTPOINT
rpool/zoneRoot 11.6G   214G    40K  /zoneRoot

-- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] First zone creation - getting ZFS error
On 12/ 7/11 05:12 AM, Mark Creamer wrote: I'm running OI 151a. I'm trying to create a zone for the first time, and am getting an error about zfs. I'm logged in as me, then su - to root before running these commands. I have a pool called datastore, mounted at /datastore. Per the wiki document http://wiki.openindiana.org/oi/Building+in+zones, I first created the zfs file system (note that the command syntax in the document appears to be wrong, so I did the options I wanted separately):

zfs create datastore/zones
zfs set compression=on datastore/zones
zfs set mountpoint=/zones datastore/zones

zfs list shows:

NAME                         USED  AVAIL  REFER  MOUNTPOINT
datastore                   28.5M  7.13T  57.9K  /datastore
datastore/dbdata            28.1M  7.13T  28.1M  /datastore/dbdata
datastore/zones             55.9K  7.13T  55.9K  /zones
rpool                       27.6G   201G    45K  /rpool
rpool/ROOT                  2.89G   201G    31K  legacy
rpool/ROOT/openindiana      2.89G   201G  2.86G  /
rpool/dump                  12.0G   201G  12.0G  -
rpool/export                5.53M   201G    32K  /export
rpool/export/home           5.50M   201G    32K  /export/home
rpool/export/home/mcreamer  5.47M   201G  5.47M  /export/home/mcreamer
rpool/swap                  12.8G   213G   137M  -

Then I went about creating the zone:

zonecfg -z zonemaster
create
set autoboot=true
set zonepath=/zones/zonemaster
set ip-type=exclusive
add net
set physical=vnic0
end
exit

That all goes fine, then zoneadm -z zonemaster install returns:

ERROR: the zonepath must be a ZFS dataset. The parent directory of the zonepath must be a ZFS dataset so that the zonepath ZFS dataset can be created properly.

That's odd, it should have worked. Since the zfs dataset datastore/zones is created, I don't understand what the error is trying to get me to do. Do I have to do zfs create datastore/zones/zonemaster before I can create a zone in that path? That's not in the documentation, so I didn't want to do anything until someone can point out my error for me. Thanks for your help! You shouldn't have to, but it won't do any harm. If you don't get any further, try zones-discuss. -- Ian. 
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
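The validation behind the install error above can be modeled roughly like this (a simplified toy, not zoneadm's actual code; the function name and mountpoint table are made up for illustration): install requires the parent directory of the zonepath to be the mountpoint of a ZFS dataset. By that model Mark's setup looks correct, since datastore/zones is mounted at /zones, which is what makes the error surprising.

```python
import os

def zonepath_parent_is_dataset(zonepath, dataset_mountpoints):
    """True if the parent directory of zonepath is a ZFS dataset mountpoint."""
    parent = os.path.dirname(zonepath.rstrip("/"))
    return parent in dataset_mountpoints.values()

# Mountpoints as reported by 'zfs list' in the message above.
mounts = {"datastore": "/datastore", "datastore/zones": "/zones"}
print(zonepath_parent_is_dataset("/zones/zonemaster", mounts))  # -> True
```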
[zfs-discuss] Confusing zfs error message
I was trying to destroy a filesystem and was baffled by the following error:

zfs destroy -r rpool/test/opt
cannot destroy 'rpool/test/opt/csw@2001_1405': dataset already exists
zfs destroy -r rpool/test/opt/csw@2001_1405
cannot destroy 'rpool/test/opt/csw@2001_1405': snapshot is cloned

It turns out there was a zfs receive writing to the filesystem. A more sensible error would have been "dataset is busy". -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Compression
On 11/23/11 04:58 PM, Jim Klimov wrote: 2011-11-23 7:39, Matt Breitbach wrote: So I'm looking at files on my ZFS volume that are compressed, and I'm wondering to myself, self, are the values shown here the size on disk, or are they the pre-compressed values. Google gives me no great results on the first few pages, so I headed here. Alas, I can't give a good hint about VMWare - which values it uses. But here are some numbers it might see (likely du or ls sizes are in play): Locally on a ZFS-enabled system you can use ls to normally list your files. This would show you the logical POSIX file size, including any referenced-but-not-allocated sparse blocks (logical size = big, physical size = zero), etc. Basically, this just gives a range of byte numbers that you can address in the file, and depending on the underlying FS all or not all of these bytes are backed by physical storage 1:1. If you use du on the ZFS filesystem, you'll see the logical storage size, which takes into account compression and sparse bytes. So the du size should be not greater than ls size. It can be significantly bigger:

ls -sh x
   2 x
du -sh x
1K   x

-- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
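The gap between logical and allocated size is easy to reproduce on any POSIX filesystem (a generic illustration, not ZFS-specific; on ZFS, compression and per-file block allocation move the same two numbers, as in the example above):

```python
import os

path = "/tmp/sparse_demo"
with open(path, "wb") as f:
    f.seek(1048575)   # seek to 1 MiB - 1 without writing anything
    f.write(b"\0")    # one real byte; logical size becomes 1 MiB

st = os.stat(path)
print(st.st_size)          # logical (ls) size: 1048576 bytes
print(st.st_blocks * 512)  # allocated (du) bytes: usually far less
```

The first number is what ls reports, the second is what du reports, and nothing forces them to agree in either direction.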
Re: [zfs-discuss] slow zfs send/recv speed
On 11/16/11 01:01 PM, Eric D. Mudama wrote: On Wed, Nov 16 at 3:05, Anatoly wrote: Good day, The speed of send/recv is around 30-60 MBytes/s for initial send and 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk to 100+ disks in a pool, but the speed doesn't vary to any degree. As I understand it, 'zfs send' is the limiting factor. I did tests by sending to /dev/null. It worked out too slow and absolutely not scalable. None of cpu/memory/disk activity were at peak load, so there is room for improvement. My belief is that initial/incremental may be affecting it because of initial versus incremental efficiency of the data layout in the pools, not because of something inherent in the send/recv process itself. There are various send/recv improvements (e.g. don't use SSH as a tunnel) but even that shouldn't be capping you at 17MBytes/sec. My incrementals get me ~35MB/s consistently. Each incremental is 10-50GB worth of transfer. While my incremental sizes are much smaller, the rates I see for dense (large blocks of changes, such as media files) incrementals are about the same. I do see much lower rates for more scattered changes (such as filesystems with documents). -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
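The send-to-/dev/null test described above is easy to put a number on. A generic sketch (the zfs command line in the comment is illustrative and assumes a pool/snapshot name; any byte-producing command works):

```python
import subprocess
import time

def measure_throughput(cmd, bufsize=1 << 20):
    """Run cmd, discard its stdout, and return the observed rate in MB/s."""
    start = time.perf_counter()
    total = 0
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        while True:
            chunk = proc.stdout.read(bufsize)
            if not chunk:
                break
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against div-by-zero
    return total / (1024 * 1024) / elapsed

# e.g. measure_throughput(["zfs", "send", "tank/fs@snap"])
print("%.0f MB/s" % measure_throughput(["head", "-c", "10485760", "/dev/zero"]))
```

Measuring the sender in isolation like this separates "zfs send is slow" from "ssh or the receiver is slow", which is the distinction the thread is after.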
Re: [zfs-discuss] Hare receiving snapshots become slower?
On 11/14/11 04:00 AM, Jeff Savit wrote: On 11/12/2011 03:04 PM, Ian Collins wrote: It turns out this was a problem with e1000g interfaces. When we swapped over to an igb port, the problem went away. Ian, could you summarize what the e1000g problem was? It might be interesting or useful for the list. If you don't want to do that, but are willing to tell me off-list that would be appreciated. (Just out of curiosity). I was seeing high latency (2-4 seconds each) when sending a large number of small snapshots, say a series of incremental snapshots for a filesystem that hadn't changed. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Hare receiving snapshots become slower?
On 09/30/11 08:12 AM, Ian Collins wrote: On 09/30/11 08:03 AM, Bob Friesenhahn wrote: On Fri, 30 Sep 2011, Ian Collins wrote: Slowing down replication is not a good move! Do you prefer pool corruption? ;-) Probably they fixed a dire bug and this is the cost of the fix. Could be. I think I'll raise a support case to find out why. This is making it difficult for me to meet a replication guarantee. It turns out this was a problem with e1000g interfaces. When we swapped over to an igb port, the problem went away. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] how to set up solaris os and cache within one SSD
On 11/11/11 08:52 PM, darkblue wrote: 2011/11/11 Ian Collins i...@ianshome.com On 11/11/11 02:42 AM, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of darkblue 1 * XEON 5606 1 * supermicro X8DT3-LN4F 6 * 4G RECC RAM 22 * WD RE3 1T harddisk 4 * intel 320 (160G) SSD 1 * supermicro 846E1-900B chassis I just want to say, this isn't supported hardware, and although many people will say they do this without problem, I've heard just as many people (including myself) saying it's unstable that way. I've never had issues with Supermicro boards. I'm using a similar model and everything on the board is supported. I recommend buying either the oracle hardware or the nexenta on whatever they recommend for hardware. Definitely DO NOT run the free version of solaris without updates and expect it to be reliable. That's a bit strong. Yes I do regularly update my supported (Oracle) systems, but I've never had problems with my own build Solaris Express systems. I waste far more time on (now luckily legacy) fully supported Solaris 10 boxes! what does it mean? Solaris 10 live upgrade is a pain in the arse! It gets confused when you have lots of filesystems, clones and zones. I am going to install solaris 10 u10 on this server. Is there any compatibility problem? And which version of solaris or solaris-derived OS do you suggest for building storage with the above hardware? I'm running 11 Express now, upgrading to Solaris 11 this weekend. Unless you have good reason to use Solaris 10, use Solaris 11 or OpenIndiana. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] how to set up solaris os and cache within one SSD
On 11/11/11 02:42 AM, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of darkblue 1 * XEON 5606 1 * supermicro X8DT3-LN4F 6 * 4G RECC RAM 22 * WD RE3 1T harddisk 4 * intel 320 (160G) SSD 1 * supermicro 846E1-900B chassis I just want to say, this isn't supported hardware, and although many people will say they do this without problem, I've heard just as many people (including myself) saying it's unstable that way. I've never had issues with Supermicro boards. I'm using a similar model and everything on the board is supported. I recommend buying either the oracle hardware or the nexenta on whatever they recommend for hardware. Definitely DO NOT run the free version of solaris without updates and expect it to be reliable. That's a bit strong. Yes I do regularly update my supported (Oracle) systems, but I've never had problems with my own build Solaris Express systems. I waste far more time on (now luckily legacy) fully supported Solaris 10 boxes! -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Stream versions in Solaris 10.
On 11/ 5/11 02:37 PM, Matthew Ahrens wrote: On Wed, Oct 19, 2011 at 1:52 AM, Ian Collins i...@ianshome.com wrote: I just tried sending from an oi151a system to a Solaris 10 backup server and the server barfed with zfs_receive: stream is unsupported version 17 I can't find any documentation linking stream version to release, so does anyone know the Update 10 stream version? The stream version here is actually the zfs send stream version, which is different from the zpool (SPA) and zfs (ZPL) version numbers. 17 is DMU_BACKUP_FEATURE_SA_SPILL (16) + DMU_SUBSTREAM (1). The SA_SPILL feature is enabled when sending a filesystem of version 5 (System attributes) or later. So the problem is that you are sending a version 5 zfs filesystem to a system that does not support filesystem version 5. Thank you Matt. Are these DMU details documented anywhere? I'm familiar with the SPA and ZPL defines in zfs.h. Odd coincidence: I was reading your blog when this reply came through! -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
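Matt's arithmetic can be written out explicitly (a sketch using only the two flag values from his explanation; the real stream header defines further feature bits that are omitted here):

```python
# The send stream "version" is a bit field, not a sequential number.
DMU_SUBSTREAM = 1                     # a plain, single-filesystem stream
DMU_BACKUP_FEATURE_SA_SPILL = 1 << 4  # set when sending zfs version >= 5

def decode_stream_version(version):
    """List which of the known flags are set in a send stream version."""
    flags = []
    if version & DMU_SUBSTREAM:
        flags.append("DMU_SUBSTREAM")
    if version & DMU_BACKUP_FEATURE_SA_SPILL:
        flags.append("DMU_BACKUP_FEATURE_SA_SPILL")
    return flags

print(decode_stream_version(17))
# -> ['DMU_SUBSTREAM', 'DMU_BACKUP_FEATURE_SA_SPILL']
```

So a receiver that predates filesystem version 5 rejects the stream as "unsupported version 17" even though its pool version may otherwise be high enough.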
Re: [zfs-discuss] Log disk with all ssd pool?
On 10/28/11 07:04 PM, Mark Wolek wrote: Still kicking around this idea and didn’t see it addressed in any of the threads before the forum closed. If one made an all ssd pool, would a log/cache drive just slow you down? Would zil slow you down? I would guess not, you would still be spreading your IOPs. I haven't tried an all SSD pool, but I have tried adding a lump of spinning rust as a log to a pool of identical drives and it did give a small improvement to NFS performance. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Stream versions in Solaris 10.
I just tried sending from an oi151a system to a Solaris 10 backup server and the server barfed with zfs_receive: stream is unsupported version 17 I can't find any documentation linking stream version to release, so does anyone know the Update 10 stream version? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] about btrfs and zfs
On 10/19/11 03:12 AM, Paul Kraus wrote: On Tue, Oct 18, 2011 at 9:13 AM, Darren J Moffat darr...@opensolaris.org wrote: On 10/18/11 14:04, Jim Klimov wrote: 2011-10-18 16:26, Darren J Moffat wrote: ZFS does slightly bias new vdevs for new writes so that we will get to a more even spread. It doesn't go and move already written blocks onto the new vdevs though. So while there isn't an admin interface to rebalancing, ZFS does do something in this area. This is implemented in metaslab_alloc_dva() http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/metaslab.c See lines 1356-1378 And the admin interface would be what exactly?.. As I said there isn't one, because that isn't how it works today; it is all automatic and only for new writes. I was pointing out that ZFS does do 'something', not that it had an exactly matching feature. I have done a poor man's rebalance by copying data after adding devices. I know this is not a substitute for a real online rebalance, but it gets the job done (if you can take the data offline, I do it a small chunk at a time). I do the same. Whether you do the balance by hand or the filesystem does it, the data still has to be moved around, which can be resource intensive. I'd rather do that at a time of my choosing. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
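The bias toward new vdevs can be illustrated with a toy allocator (a deliberate simplification, not the metaslab_alloc_dva() logic, which weights allocations rather than strictly picking the emptiest vdev):

```python
def allocate(free, nblocks):
    """Toy allocator: place each block on the vdev with the most free space."""
    free = list(free)
    counts = [0] * len(free)
    for _ in range(nblocks):
        i = max(range(len(free)), key=lambda j: free[j])  # emptiest vdev wins
        free[i] -= 1
        counts[i] += 1
    return counts

# Existing vdev with 20 free blocks, newly added vdev with 100:
print(allocate([20, 100], 60))  # -> [0, 60]: new writes favor the new vdev
```

Existing blocks stay where they are either way, which is why copying the data yourself remains the only way to rebalance what is already written.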