[zfs-discuss] zfs send/receive of an entire pool
Hi,

I have a zfs filesystem that I'd like to move to another host. It's part of a
pool called space, which is mounted at /space and has several child
filesystems. The first hurdle I came across was that zfs send only works on
snapshots, so I create one:

# zfs snapshot -r [EMAIL PROTECTED]
# zfs list -t snapshot
NAME                                             USED  AVAIL  REFER  MOUNTPOINT
[EMAIL PROTECTED]                                    0      -  25.9G  -
space/[EMAIL PROTECTED]                              0      -    31K  -
space/[EMAIL PROTECTED]                           924K      -  52.4G  -
space/[EMAIL PROTECTED]                              0      -    38K  -
space/freebsd/[EMAIL PROTECTED]                      0      -    36K  -
space/freebsd/[EMAIL PROTECTED]                      0      -  4.11G  -
space/[EMAIL PROTECTED]                              0      -  47.6G  -
space/[EMAIL PROTECTED]                           352K      -  14.7G  -
space/netboot/[EMAIL PROTECTED]                      0      -  95.5M  -
space/netboot/manduba-freebsd/[EMAIL PROTECTED]      0      -    36K  -
space/netboot/manduba-freebsd/[EMAIL PROTECTED]      0      -   327M  -
space/netboot/manduba-freebsd/[EMAIL PROTECTED]      0      -    36K  -
space/[EMAIL PROTECTED]                           234K      -   167G  -

On the destination, I have created a zpool, again called space and mounted at
/space. However, I can't work out how to send [EMAIL PROTECTED] to the new
machine:

# zfs send [EMAIL PROTECTED] | ssh musundo zfs recv -vn -d space
cannot receive: destination 'space' exists
# zfs send [EMAIL PROTECTED] | ssh musundo zfs recv -vn space
cannot receive: destination 'space' exists
# zfs send [EMAIL PROTECTED] | ssh musundo zfs recv -vn space2
cannot receive: destination does not exist
# zfs send [EMAIL PROTECTED] | ssh musundo zfs recv -vn space/space2
would receive full stream of [EMAIL PROTECTED] into space/[EMAIL PROTECTED]
# zfs send [EMAIL PROTECTED] | ssh musundo zfs recv -vn [EMAIL PROTECTED]
cannot receive: destination 'space' exists
# zfs send [EMAIL PROTECTED] | ssh musundo zfs recv -vn [EMAIL PROTECTED]
cannot receive: destination does not exist

What am I missing here? I can't recv to space, because it exists, but I can't
make it not exist since it's the root filesystem of the pool. Do I have to
send each filesystem individually and rsync up the root fs?

Thanks,

James Andrewartha
Re: [zfs-discuss] zfs send/receive of an entire pool
On Thu, 2008-01-17 at 09:29 -0800, Richard Elling wrote:
> You don't say which version of ZFS you are running, but what you want is
> the -R option for zfs send. See also the example of send usage in the
> zfs(1m) man page.

Sorry, I'm running SXCE nv75. I can't see any mention of send -R in the man
page. Ah, it's PSARC/2007/574 and nv77. I'm not convinced it'll solve my
problem (sending the root filesystem of a pool), but I'll upgrade and give it
a shot.

Thanks,

James Andrewartha
Re: [zfs-discuss] ? Removing a disk from a ZFS Storage Pool
Dave Lowenstein wrote:
> Couldn't we move fixing "panic the system if it can't find a lun" up to
> the front of the line? That one really sucks.

That's controlled by the failmode property of the zpool, added in PSARC
2007/567 which was integrated in b77.

--
James Andrewartha
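For anyone wanting to change the behaviour once they're on b77 or later, a
minimal sketch - the pool name here is hypothetical, and the valid values
(wait, continue, panic) are described in zpool(1m):

  # zpool set failmode=continue tank
  # zpool get failmode tank

The default is wait, which blocks I/O until the device comes back rather than
panicking the system.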
Re: [zfs-discuss] zfs send/receive of an entire pool
James Andrewartha wrote:
> On Thu, 2008-01-17 at 09:29 -0800, Richard Elling wrote:
>> You don't say which version of ZFS you are running, but what you want is
>> the -R option for zfs send. See also the example of send usage in the
>> zfs(1m) man page.
>
> Sorry, I'm running SXCE nv75. I can't see any mention of send -R in the
> man page. Ah, it's PSARC/2007/574 and nv77. I'm not convinced it'll solve
> my problem (sending the root filesystem of a pool), but I'll upgrade and
> give it a shot.

It did in fact do exactly what I wanted. For the record, here are the
commands I used:

zfs snapshot -r [EMAIL PROTECTED]
zfs send -R [EMAIL PROTECTED] | ssh musundo zfs recv -vFd space

And later, to catch up further changes:

zfs snapshot -r [EMAIL PROTECTED]
zfs send -Ri @musundo [EMAIL PROTECTED] | ssh musundo zfs recv -vFd space

In both cases the -F was necessary.

--
James Andrewartha
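A quick sanity check on the receiving side, if it helps anyone following
along (same host and pool names as above):

  # ssh musundo zfs list -r -t snapshot space

which should list the same descendent filesystems and snapshots as the
source pool.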
Re: [zfs-discuss] ZFS Project Hardware
Erik Trimble wrote:
> On a related note - does anyone know of a good Solaris-supported 4+ port
> SATA card for PCI-Express? Preferably 1x or 4x slots...

From what I can tell, all the vendors are only making SAS controllers for
PCIe with more than 4 ports. Since SAS supports SATA, I guess they don't see
much point in doing SATA-only controllers. For example, the LSI SAS3081E-R is
$260 for 8 SAS ports on 8x PCIe, which is somewhat more expensive than the
almost equivalent PCI-X LSI SAS3080X-R, which is as low as $180.

For those downthread looking for full RAID controllers with battery-backed
RAM, Areca (who formerly specialised in SATA controllers) now do SAS RAID at
reasonable prices, and have Solaris drivers.

--
James Andrewartha
Re: [zfs-discuss] ZFS Project Hardware
On Wed, 2008-05-28 at 10:34 -0600, Keith Bierman wrote:
> On May 28, 2008, at 10:27 AM, Richard Elling wrote:
>> Since the mechanics are the same, the difference is in the electronics
>
> In my very distant past, I did QA work for an electronic component
> manufacturer. Even parts which were identical were expected to behave
> quite differently ... based on population statistics. That is, the HighRel
> MilSpec parts were from batches with no failures (even under very harsh
> conditions beyond the normal operating mode, and all tests to destruction
> showed only the expected failure modes) and the hobbyist grade components
> were those whose cohort *failed* all the testing (and destructive testing
> could highlight abnormal failure modes). I don't know that drive builders
> do the same thing, but I'd kinda expect it.

Seagate's ES.2 has a higher MTBF than the equivalent consumer drive, so
you're probably right. Western Digital's RE2 series (which my work uses)
comes with a 5 year warranty, compared to 3 years for the consumer versions.
The RE2 also have firmware with Time-Limited Error Recovery, which reports
errors promptly, letting the higher-level RAID do data recovery. Both have
improved vibration tolerance through firmware tweaks. And if you want 10krpm,
I think WD's VelociRaptor counts.

http://www.techreport.com/articles.x/13732
http://www.techreport.com/articles.x/13253
http://www.techreport.com/articles.x/14583

http://www.storagereview.com/ is promising some SSD benchmarks soon.

James Andrewartha
Re: [zfs-discuss] zfs send and recordsize
Peter Boros wrote:
> I perform a snapshot and a zfs send on a filesystem with a recordsize of
> 16k, and redirect the output to a plain file. Later, I use "cat sentfs |
> zfs receive otherpool/filesystem". In this case the new filesystem's
> recordsize will be the default 128k again. The other filesystem attributes
> (for example atime) are reverted to defaults too. Okay, I can set these
> later, but I can't set the recordsize for existing files. Are there any
> solutions for this problem? This is the case on Solaris 10u5 and on
> Nevada b91 too.

My impression is you should change the recordsize on the first filesystem
before performing the zfs send. This will then be used for all files when you
receive the filesystem. I haven't tested this with recordsize, but I did with
compression and I imagine recordsize (and others) will behave the same way.

--
James Andrewartha
Re: [zfs-discuss] zfs send and recordsize
Peter Boros wrote:
> Hi James,
>
> Of course, changing the recordsize was the first thing I did, after I
> created the original filesystem. I copied some files on it, made a
> snapshot, and then performed the zfs send (with the decreased recordsize).
> After I performed a zfs receive, the recordsize was the default (128k) on
> the new filesystem.

Ah, I was using the -R option to zfs send, which does what you want. It's
been in since nv77, PSARC/2007/574. To quote the manpage:

     -R    Generate a replication stream package, which will replicate the
           specified filesystem, and all descendant file systems, up to the
           named snapshot. When received, all properties, snapshots,
           descendent file systems, and clones are preserved.

           If the -i or -I flags are used in conjunction with the -R flag,
           an incremental replication stream is generated. The current
           values of properties, and current snapshot and file system names
           are set when the stream is received. If the -F flag is specified
           when this stream is received, snapshots and file systems that do
           not exist on the sending side are destroyed.

--
James Andrewartha
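A rough sketch of the -R workflow, with made-up pool and filesystem names -
I haven't re-run this exact sequence, so treat it as illustrative rather than
verified:

  # zfs set recordsize=16k space/data
    (copy the files in, so they're written with 16k blocks)
  # zfs snapshot -r space/data@move
  # zfs send -R space/data@move | zfs receive -F otherpool/data
  # zfs get recordsize otherpool/data

With -R the last command should report 16K; with a plain send the received
filesystem falls back to the 128k default, as you saw.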
Re: [zfs-discuss] ZFS over multiple iSCSI targets
Tuomas Leikola wrote:
> On Mon, Sep 8, 2008 at 8:35 PM, Miles Nordin [EMAIL PROTECTED] wrote:
>> ps iSCSI with respect to write barriers?
>
> +1. Does anyone even know of a good way to actually test it? So far it
> seems the only way to know if your OS is breaking write barriers is to
> trade gossip and guess.
>
> Write a program that writes backwards (every other block to avoid write
> merges) with and without O_DSYNC, measure speed. I think you can also
> deduce driver and drive cache flush correctness by calculating the best
> theoretical correct speed (which should be really slow, one write per disc
> spin) - this has been on my TODO list for ages.. :(

Does the perl script at http://brad.livejournal.com/2116715.html do what you
want?

--
James Andrewartha
Re: [zfs-discuss] web interface not showing up
mike wrote:
> On Sun, Sep 21, 2008 at 11:49 PM, Volker A. Brandt [EMAIL PROTECTED] wrote:
>> Hmmm... I run Solaris 10/sparc U4. My /usr/java points to
>> jdk/jdk1.5.0_16. I am using Firefox 2.0.0.16. Works For Me(TM) ;-)
>> Sorry, can't help you any further. Maybe a question for desktop-discuss?
>
> it's a java error on the server side, not client side (although there is a
> javascript error in every browser i tried it in, but probably unrelated or
> an error due to the java not executing properly)
>
> anyway - you did help me at least get the webconsole running. the zfs
> admin piece of it though is throwing the java error...

Can you post the java error to the list? Do you have gzip compression or
aclinherit properties set on your filesystems, hitting bug 6715550?

http://mail.opensolaris.org/pipermail/zfs-discuss/2008-June/048457.html
http://mail.opensolaris.org/pipermail/zfs-discuss/2008-June/048550.html

--
James Andrewartha
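A quick way to check for that, assuming your pool is called tank (adjust to
whatever yours is named):

  # zfs get -r -o name,property,value compression,aclinherit tank

Any dataset showing a gzip compression value or a locally-set aclinherit is a
candidate for tripping that bug.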
[zfs-discuss] rename(2), atomicity, crashes and fsync()
Hi all,

Recently there's been discussion [1] in the Linux community about how
filesystems should deal with rename(2), particularly in the case of a crash.
ext4 was found, after a crash, to truncate files that had been written with
open("foo.tmp"), write(), close() and then rename("foo.tmp", "foo"). This is
because ext4 uses delayed allocation and may not write the contents to disk
immediately, but commits metadata changes quite frequently. So when
rename("foo.tmp", "foo") is committed to disk, it has a length of zero, which
is later updated when the data is written to disk. This means that after a
crash, foo is zero-length, and both the new and the old data have been lost,
which is undesirable. This doesn't happen with ext3's default settings,
because ext3 writes data to disk before metadata (which has its own
performance problems - see Firefox 3 and fsync [2]).

Ted Ts'o's (the main author of ext3 and ext4) response is that applications
which perform open(), write(), close(), rename() in the expectation that they
will either get the old data or the new data, but not no data at all, are
broken, and instead should call open(), write(), fsync(), close(), rename().
Most other people are arguing that POSIX says rename(2) is atomic, and while
POSIX doesn't specify crash recovery, returning no data at all after a crash
is clearly wrong, and excessive use of fsync is overkill and
counter-productive (Ted later proposes a yes-I-really-mean-it flag for
fsync). I've omitted a lot of detail, but I think this is the core of the
argument.

Now the question I have is: how does ZFS deal with open(), write(), close(),
rename() in the case of a crash? Will it always return the new data or the
old data, or will it sometimes return no data? Is returning no data
defensible, either under POSIX or common sense? Comments about other
filesystems, eg UFS, are also welcome. As a counter-point, XFS (written by
SGI) is notorious for data loss after a crash, but its authors defend the
behaviour as POSIX-compliant.

Note this is purely a technical discussion - I'm not interested in replies
saying ?FS is a better filesystem in general, or in GPL vs CDDL licensing.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781?comments=all
http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/
http://lwn.net/Articles/323169/
http://mjg59.livejournal.com/108257.html
http://lwn.net/Articles/323464/
http://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
http://lwn.net/Articles/323752/ *
http://lwn.net/Articles/322823/ *
* are currently subscriber-only, email me for a free link if you'd like to
read them
[2] http://lwn.net/Articles/283745/

--
James Andrewartha | Sysadmin
Data Analysis Australia Pty Ltd | STRATEGIC INFORMATION CONSULTANTS
97 Broadway, Nedlands, Western Australia, 6009
PO Box 3258, Broadway Nedlands, WA, 6009
T: +61 8 9386 3304 | F: +61 8 9386 3202 | I: http://www.daa.com.au
Re: [zfs-discuss] [storage-discuss] Supermicro AOC-SASLP-MV8
myxi...@googlemail.com wrote:
> Bouncing a thread from the device drivers list:
> http://opensolaris.org/jive/thread.jspa?messageID=357176
>
> Does anybody know if OpenSolaris will support this new Supermicro card,
> based on the Marvell 88SE6480 chipset? It's a true PCI Express 8 port JBOD
> SAS/SATA controller with pricing apparently around $125. If it works with
> OpenSolaris it sounds pretty much perfect.

The Linux support for the 6480 builds on the 6440 mvsas support, so I don't
think marvell88sx would work, and there doesn't seem to be a Marvell SAS
driver for Solaris at all, so I'd say it's not supported.
http://www.hardforum.com/showthread.php?t=1397855 has a fair few people
testing it out, but mostly under Windows.

--
James Andrewartha
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Joerg Schilling wrote:
> I would be interested to see an open(2) flag that tells the system that I
> will read a file that I opened exactly once in native order. This could
> tell the system to do read ahead and to later mark the pages as
> immediately reusable. This would make star even faster than it is now.

Are you aware of posix_fadvise(2) and madvise(2)?

--
James Andrewartha
Re: [zfs-discuss] surprisingly poor performance
James Lever wrote:
> We also have a PERC 6/E w/512MB BBWC to test with or fall back to if we go
> with a Linux solution.

Have you tried putting the slog on this controller, either as an SSD or
regular disk? It's supported by the mega_sas driver, x86 and amd64 only.

--
James Andrewartha | Sysadmin
Data Analysis Australia Pty Ltd
Re: [zfs-discuss] surprisingly poor performance
James Lever wrote:
> On 07/07/2009, at 8:20 PM, James Andrewartha wrote:
>> Have you tried putting the slog on this controller, either as an SSD or
>> regular disk? It's supported by the mega_sas driver, x86 and amd64 only.
>
> What exactly are you suggesting here? Configure one disk on this array as
> a dedicated ZIL? Would that improve performance any over using all disks
> with an internal ZIL?

I was mainly thinking about using the battery-backed write cache to eliminate
the NFS latency. There's not much difference between an internal and a
dedicated ZIL if the disks are the same and on the same controller -
dedicated ZIL wins come from using SSDs and battery-backed cache.
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Log_Devices

> Is there a way to disable the write barrier in ZFS in the way you can with
> Linux filesystems (-o barrier=0)? Would this make any difference?

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes
might help if the RAID card is still flushing to disk when ZFS asks it to,
even though the data is safe in the battery-backed cache.

--
James Andrewartha | Sysadmin
Data Analysis Australia Pty Ltd
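Roughly what I mean, as a sketch with placeholder pool and device names:

  # zpool add tank log c2t1d0

puts the ZIL on a single BBWC-backed LUN. And if the controller honours cache
flushes all the way to disk despite the battery, the Evil Tuning Guide's
/etc/system tunable is:

  set zfs:zfs_nocacheflush = 1

but only use that when every device in every pool sits behind non-volatile
cache.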
Re: [zfs-discuss] Finding SATA cards for ZFS; was Lundman home NAS
Jorgen Lundman wrote:
> The mv8 is a marvell based chipset, and it appears there are no Solaris
> drivers for it. There doesn't appear to be any movement from Sun or
> marvell to provide any either.

Do you mean specifically Marvell 6480 drivers? I use both the DAC-SATA-MV8
and AOC-SAT2-MV8, which use the Marvell MV88SX and work very well in Solaris
(package SUNWmv88sx). They're PCI-X SATA cards; the AOC-SASLP-MV8 is a PCIe
SAS card and has no (Open)Solaris driver.

--
James Andrewartha
[zfs-discuss] Sun Flash Accelerator F20
I'm surprised no-one else has posted about this - part of the Sun Oracle
Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 or 96 GB of
SLC, a built-in SAS controller and a super-capacitor for cache protection.
http://www.sun.com/storage/disk_systems/sss/f20/specs.xml

There's no pricing on the webpage though - does anyone know how it compares
in price to a logzilla?

--
James Andrewartha
Re: [zfs-discuss] zfs inotify?
Carson Gaspar wrote:
> On 10/26/09 5:33 PM, p...@paularcher.org wrote:
>> I can't find much on gam_server on Solaris (couldn't find too much on it
>> at all, really), and port_create is apparently a system call. (I'm not a
>> developer--if I can't write it in BASH, Perl, or Ruby, I can't write it.)
>> I appreciate the suggestions, but I need something a little more
>> pret-a-porter.
>
> Your Google-fu needs work ;-)
>
> Main Gamin page: http://www.gnome.org/~veillard/gamin/index.html

Actually, I found this page, which has this gem: "At this point Gamin is
fairly tied to Linux, portability is not a primary goal at this stage but if
you have portability patches they are welcome."

Much has changed since that text was written, including support for the event
completion framework (port_create() and friends, introduced with Sol 10) on
Solaris, thus the recommendation for gam_server / gamin.

$ nm /usr/lib/gam_server | grep port_create
[458] | 134589544| 0|FUNC |GLOB |0|UNDEF |port_create

The patch for port_create has never gone upstream, however, while gvfs uses
glib's gio, which has backends for inotify, solaris, fam and win32.

--
James Andrewartha
Re: [zfs-discuss] freeNAS moves to Linux from FreeBSD
Bob Friesenhahn wrote:
> On Mon, 7 Dec 2009, Michael DeMan (OA) wrote:
>> Args for FreeBSD + ZFS:
>> - Limited budget
>> - We are familiar with managing FreeBSD.
>> - We are familiar with tuning FreeBSD.
>> - Licensing model
>>
>> Args against OpenSolaris + ZFS:
>> - Hardware compatibility
>> - Lack of knowledge for tuning and associated costs for training staff to
>>   learn 'yet one more operating system' they need to support.
>> - Licensing model
>
> If you think about it a little bit, you will see that there is no
> significant difference in the licensing model between FreeBSD+ZFS and
> OpenSolaris+ZFS. It is not possible to be a little bit pregnant. Either
> one is pregnant, or one is not.

There is a huge difference practically - OpenSolaris has no free security
updates for stable releases, unlike FreeBSD. And I'm sure you don't recommend
running /dev in production. This is offtopic, and isn't specifically related
to CDDL vs BSD, just how Sun chooses to do things. Sure, there have been
claims (since before 2008.05) that it might happen some day, but until
2009.06 users can freely get a non-vulnerable Firefox or Samba, or fixes for
various network kernel panics, the claims are meaningless.
http://mail.opensolaris.org/pipermail/opensolaris-help/2009-November/015824.html

--
James Andrewartha