Re: [zfs-discuss] Deduplication Memory Requirements
Hi,

On 05/ 5/11 03:02 PM, Edward Ned Harvey wrote:
> From: Garrett D'Amore [mailto:garr...@nexenta.com]
>
>> We have customers using dedup with lots of vm images... in one extreme case they are getting dedup ratios of over 200:1!
>
> I assume you're talking about a situation where there is an initial VM image, and then to clone the machine, the customers copy the VM, correct? If that is correct, have you considered ZFS cloning instead? When I said dedup wasn't good for VMs, what I'm talking about is: if there is data inside the VM which is cloned... For example, if somebody logs into the guest OS and then does a cp operation... Then dedup on the host is unlikely to be able to recognize that data as cloned data inside the virtual disk.

ZFS cloning and ZFS dedup solve two problems that are related, but different:

- Through cloning, a lot of space can be saved in situations where it is known beforehand that data is going to be used multiple times from multiple different views. Virtualization is a perfect example of this.

- Through dedup, space can be saved in situations where the duplicate nature of data is not known, or not known beforehand. Again, in virtualization scenarios, this could be common modifications to VM images that are performed multiple times but not anticipated, such as extra software, OS patches, or simply many users saving the same files to their local desktops.

To go back to the cp example: if someone logs into a VM that is backed by ZFS with dedup enabled, then copies a file, the extra space that the copy takes will be minimal. The act of copying the file breaks down into a series of blocks that will be recognized as duplicate blocks. This is completely independent of the clone nature of the underlying VM's backing store.

But I agree that the biggest savings are to be expected from cloning first, as they typically translate into n GB (for the base image) x # of users, which is a _lot_.
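The block-level point above can be illustrated with a short simulation (a sketch, not ZFS code: the block size, SHA-256 checksumming, and sample data are illustrative assumptions, not what ZFS actually does internally):

```python
import hashlib
import os

BLOCK_SIZE = 128 * 1024  # illustrative; 128K is a common ZFS recordsize

def blocks(data, size=BLOCK_SIZE):
    # Split a byte stream into fixed-size blocks -- a simplified stand-in
    # for ZFS operating on blocks rather than on whole files.
    return [data[i:i + size] for i in range(0, len(data), size)]

def dedup_ratio(streams):
    # Ratio of blocks logically written to unique blocks actually stored.
    written, unique = 0, set()
    for data in streams:
        for b in blocks(data):
            written += 1
            unique.add(hashlib.sha256(b).hexdigest())
    return written / len(unique)

# A 256 KiB file and a copy of it made inside the guest: every block of
# the copy hashes to an already-known checksum, so it costs almost nothing.
original = os.urandom(256 * 1024)
print(dedup_ratio([original]))            # 1.0 -- no duplicates yet
print(dedup_ratio([original, original]))  # 2.0 -- the copy dedups entirely
```

The point of the sketch: the checksum table never sees file names, only block contents, which is exactly why a cp inside the guest still dedups against the clone's backing store.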
Dedup is still the icing on the cake for all those data blocks that were unforeseen. And that can be a lot, too, as everyone who has seen cluttered desktops full of downloaded files can probably confirm.

Cheers,
Constantin

--
Constantin Gonzalez Schmitz, Sales Consultant, Oracle Hardware Presales Germany
Phone: +49 89 460 08 25 91 | Mobile: +49 172 834 90 30
Blog: http://constantin.glez.de/ | Twitter: zalez
ORACLE Deutschland B.V. & Co. KG, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Hauptverwaltung: Riesstraße 25, D-80992 München
Registergericht: Amtsgericht München, HRA 95603
Komplementärin: ORACLE Deutschland Verwaltung B.V., Hertogswetering 163/167, 3543 AS Utrecht
Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
Geschäftsführer: Jürgen Kunz, Marcel van de Molen, Alexander van der Ven
Oracle is committed to developing practices and products that help protect the environment

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Deleting large amounts of files
Hi,

> Is there a way to see which files have been deduped, so I can copy them again and un-dedupe them?

Unfortunately, that's not easy (I've tried it :) ).

The issue is that the dedup table (which knows which blocks have been deduped) doesn't know about files. And if you pull block pointers for deduped blocks from the dedup table, you'll need to backtrack from there through the filesystem structure to figure out which files are associated with those blocks. (Remember: deduplication happens at the block level, not the file level.)

So, in order to compile a list of deduped _files_, one would need to extract the list of deduped _blocks_ from the dedup table, then chase the pointers from the root of the zpool to the blocks in order to figure out which files they're associated with.

Unless there's a different way that I'm not aware of (and I hope someone can correct me here), the only way to do that is to run a scrub-like process and build up a table of files and their blocks.

Cheers,
Constantin
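For illustration, a rough user-space approximation of such a scrub-like pass might look like this (an assumption-laden sketch: it hashes file contents at a fixed block size from user space, so it can only guess at what the real dedup table contains, and it cannot see the actual on-disk block pointers):

```python
import hashlib
import os

BLOCK_SIZE = 128 * 1024  # illustrative stand-in for the dataset recordsize

def files_sharing_blocks(root):
    # Walk a tree and build a checksum -> set-of-files map: the brute-force,
    # scrub-like pass described above, done naively from user space. Files
    # that share a block checksum are candidates for having been deduped
    # against each other.
    owners = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                while True:
                    block = f.read(BLOCK_SIZE)
                    if not block:
                        break
                    digest = hashlib.sha256(block).hexdigest()
                    owners.setdefault(digest, set()).add(path)
    # Keep only checksums seen in more than one file.
    return {d: paths for d, paths in owners.items() if len(paths) > 1}
```

Running this over a dataset and then copying the flagged files would be the "copy them again to un-dedupe them" workaround, at the cost of reading everything once, which is exactly the scrub-like expense described above.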
Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
Hi Tim,

thanks for sharing your dedup experience. Especially for virtualization, having a good pool of experience will help a lot of people.

So you see a dedup ratio of 1.29 for two installations of Windows Server 2008 on the same ZFS backing store, if I understand you correctly. What dedup ratios do you see for the third, fourth and fifth server installation?

Also, maybe dedup is not the only way to save space. What compression ratio do you get? And: have you tried setting up one Windows system, then setting up the next one based on a ZFS clone of the first one?

Hope this helps,
Constantin

On 04/23/10 08:13 PM, tim Kries wrote:
> Dedup is a key element for my purpose, because i am planning a central repository for like 150 Windows Server 2008 (R2) servers, which would take a lot less storage if they dedup right.

--
Sent from OpenSolaris, http://www.opensolaris.org/
Constantin Gonzalez, Sun Microsystems GmbH, Germany
Principal Field Technologist
Blog: constantin.glez.de | Twitter: @zalez
Tel.: +49 89/4 60 08-25 91
Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Jürgen Kunz
Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies
Hi,

I agree 100% with Chris. Notice the "on their own" part of the original post: yes, nobody wants to run zfs send or (s)tar by hand. That's why Chris's script is so useful: you set it up, forget it, and it gets the job done for 80% of home users.

On another note, I was positively surprised by the availability of CrashPlan for OpenSolaris: http://crashplan.com/

Their free service lets you back up your stuff to a friend's system over the net in an encrypted way; the paid-for service uses CrashPlan's data centers at less than Amazon S3 pricing. While this may not be everyone's solution, I find it significant that they explicitly support OpenSolaris. This either means they're OpenSolaris fans, or that they see potential in OpenSolaris home server users.

Cheers,
Constantin

On 03/20/10 01:31 PM, Chris Gerhard wrote:
>> I'll say it again: neither 'zfs send' nor (s)tar is an enterprise (or even home) backup system on their own; one or both can be components of the full solution.
>
> Up to a point. zfs send | zfs receive does make a very good back up scheme for the home user with a moderate amount of storage. Especially when the entire back up will fit on a single drive, which I think would cover the majority of home users. Using external drives and incremental zfs streams allows for extremely quick back ups of large amounts of data. It certainly does for me.
>
> http://chrisgerhard.wordpress.com/2007/06/01/rolling-incremental-backups/
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
Hi,

I'm using 2 x 1.5 TB drives from Samsung (EcoGreen, I believe) in my current home server. One reported 14 read errors a few weeks ago, roughly 6 months after install, which went away during the next scrub/resilver.

This reminded me to order a 3rd drive, a 2.0 TB WD20EADS from Western Digital, and I now have a 3-way mirror, which is effectively a 2-way mirror with its hot spare already synced in. The idea behind notching up the capacity is threefold:

- No "sorry, this disk happens to have 1 block too few" problems on attach.

- When the 1.5 TB disks _really_ break, I'll just order another 2 TB one and use the opportunity to upgrade pool capacity. Since at least one of the 1.5 TB drives will still be attached, there won't be any slightly-smaller-drive problems either when attaching the second 2 TB drive.

- After building in 2 bigger drives, it becomes easy to figure out which of the drives to phase out: just go for the smaller drives. This solves the headache of trying to figure out the right drive to pull when you replace drives that aren't hot spares and don't have blinking lights.

Frankly, I don't care whether the Samsung or the WD drives are better or worse; they're both consumer drives and they're both dirt cheap. Just assume that they'll break soon (since you're probably using them more intensely than their designed purpose) and make sure their replacements are already there. It also helps to mix vendors, so one glitch that affects multiple disks in the same batch won't affect your setup too much. (And yes, I broke that rule with my initial 2 Samsung drives, but I'm now glad I have both vendors :)).

Hope this helps,
Constantin

Simon Breden wrote:
> I see also that Samsung have very recently released the HD203WI 2TB 4-platter model. It seems to have good customer ratings so far at newegg.com, but currently there are only 13 reviews so it's a bit early to tell if it's reliable. Has anyone tried this model with ZFS?
> Cheers, Simon
> http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/
[zfs-discuss] Setting default user/group quotas?
Hi,

first of all, many thanks to those who made user/group quotas possible. This is a huge improvement for many users of ZFS!

While presenting on this new feature at the Munich OpenSolaris User Group meeting yesterday, a question came up that I couldn't find an answer for: can you set a default user/group quota?

Apparently,

  zfs set userquota@user1=5G tank/home/user1

is the only way to set user quotas, and the @user1 part seems to be mandatory, at least according to the snv_126 version of the ZFS manpage and to my attempts with ZFS:

  The {user|group}{used|quota}@ properties must be appended with a
  user or group specifier of one of these forms:
      POSIX name      (eg: matt)
      POSIX id        (eg: 126829)
      SMB n...@domain (eg: m...@sun)
      SMB SID         (eg: S-1-234-567-89)

Imagine a system that needs to handle thousands of users. Setting quotas individually for all of these users would quickly become unwieldy, in a similar manner to the unwieldiness of having a filesystem per user, which was the reason to introduce user/group quotas in the first place.

IMHO, it would be useful to have something like:

  zfs set userquota=5G tank/home

meaning that all users who don't have an individual user quota assigned to them would see a default 5G quota.

I haven't found an RFE for this yet. Is this planned? Should I file an RFE? Or did I overlook something?

Thanks,
Constantin
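To make the proposed fallback concrete, here is a tiny sketch of the resolution order the RFE asks for (hypothetical: this is not how ZFS behaves today, and the function and variable names are made up for illustration):

```python
def effective_quota(user, per_user_quotas, default_quota=None):
    # Resolution order the RFE implies: an explicit userquota@<user>
    # setting wins; otherwise the filesystem-wide default (if any)
    # applies; otherwise the user is unlimited (None).
    if user in per_user_quotas:
        return per_user_quotas[user]
    return default_quota

per_user = {"alice": 20 * 2**30}  # alice has an explicit 20G quota
default = 5 * 2**30               # the hypothetical 'zfs set userquota=5G tank/home'

print(effective_quota("alice", per_user, default))  # alice's explicit quota wins
print(effective_quota("bob", per_user, default))    # bob falls back to the 5G default
```

The attraction of these semantics is that a site with thousands of users sets one property on tank/home and only overrides the exceptions.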
Re: [zfs-discuss] Setting default user/group quotas?
Hi,

>> IMHO, it would be useful to have something like: zfs set userquota=5G tank/home ...
>
> I think that would be a great feature.

Thanks. I just created CR 6902902 to track this. I hope it becomes viewable soon here:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6902902

Cheers,
Constantin
Re: [zfs-discuss] ZFS commands hang after several zfs receives
Hi,

I think I've run into the same issue on OpenSolaris 2009.06. Does anybody know when this issue will be solved in OpenSolaris? What's the BugID?

Thanks,
Constantin

Gary Mills wrote:
> On Tue, Sep 15, 2009 at 08:48:20PM +1200, Ian Collins wrote:
>> Ian Collins wrote:
>>> I have a case open for this problem on Solaris 10u7. The case has been identified and I've just received an IDR, which I will test next week. I've been told the issue is fixed in update 8, but I'm not sure if there is an nv fix target. I'll post back once I've abused a test system for a while.
>> The IDR I was sent appears to have fixed the problem. I have been abusing the box for a couple of weeks without any lockups. Roll on update 8!
> Was that IDR140221-17? That one fixed a deadlock bug for us back in May.
Re: [zfs-discuss] ZFS Crypto Updates [PSARC/2009/443 FastTrack timeout 08/24/2009]
Hi,

Brian Hechinger wrote:
> On Tue, Aug 18, 2009 at 12:37:23AM +0100, Robert Milkowski wrote:
>> Hi Darren, thank you for the update. Have you got any ETA (build number) for the crypto project?
> Also, is there any word on if this will support the hardware crypto stuff in the VIA CPUs natively? That would be nice. :)

ZFS Crypto uses the Solaris Cryptographic Framework to do the actual encryption work, so ZFS is agnostic to any hardware crypto acceleration.

The Cryptographic Framework project on OpenSolaris.org is looking for help in implementing VIA Padlock support for the Solaris Cryptographic Framework:

http://www.opensolaris.org/os/project/crypto/inprogress/padlock/

Cheers,
Constantin
Re: [zfs-discuss] Motherboard for home zfs/solaris file server
Hi,

thank you so much for this post. This is exactly what I was looking for. I've been eyeing the M3A76-CM board, but will now look at the 78 and M4A boards as well.

Actually, not that many Asus M3A, let alone M4A, boards show up yet on the OpenSolaris HCL, so I'd like to encourage everyone to share their hardware experience by clicking on the "submit hardware" link on:

http://www.sun.com/bigadmin/hcl/data/os/

I've done it a couple of times, and it's really just a matter of 5-10 minutes in which you can help others know whether a certain component works, and whether a special driver or /etc/driver_aliases setting is required.

I'm also interested in getting the power down. Right now, I have the Athlon X2 5050e (45W TDP) on my list, but I'd also like to know more about the possibilities of the Athlon II X2 250 and whether it has better potential for power savings.

Neal, the M3A78 seems to have a RealTek RTL8111/8168B NIC chip. I pulled this off a Gentoo wiki, because strangely this information doesn't show up on the Asus website.

Also, thanks for the CF-to-PATA hint for the root pool mirror. Will try to find fast CF cards to boot from. The performance problems you see when writing may be related to master/slave issues, but I'm not a good enough PC tweaker to back that up.

Cheers,
Constantin

F. Wessels wrote:
> Hi, I'm using Asus M3A78 boards (with the SB700) for OpenSolaris and M2A* boards (with the SB600) for Linux, some of them with 4 x 1 GB and others with 4 x 2 GB ECC memory. ECC faults will be detected and reported. I tested it with a small tungsten light: by moving the light source slowly towards the memory banks you heat them up in a controlled way, and at a certain point bit flips will occur.
>
> I recommend you go for an M4A board, since they support up to 16 GB. I don't know if you can run OpenSolaris without a video card after installation; I think you can disable the "halt on no video card" option in the BIOS, but Simon Breden had some trouble with it, see his home server blog. You can go for one of the three M4A boards with a 780G onboard; those will give you 2 PCIe x16 connectors. I don't think the onboard NIC is supported; I always put an Intel (e1000) in, just to prevent any trouble.
>
> I don't have any trouble with the SB700 in AHCI mode. Hotplugging works like a charm. Transferring a couple of GBs over eSATA takes considerably less time than via USB. I have a PATA-to-dual-CF adapter and two industrial 16 GB CF cards as a mirrored root pool. It takes forever to install Nevada, at least 14 hours; I suspect the CF cards lack caches. But I don't update that regularly, still on snv_104. And I have 2 mirrors and a hot spare; the sixth port is an eSATA port I use to transfer large amounts of data.
>
> This system consumes about 73 watts idle and 82 under I/O load. (5 disks, a separate NIC, 8 GB RAM and a BE-2400, all using just 73 watts!!!) Please note that frequency scaling is only supported on the K10 architecture, but don't expect too much power saving from it; a lower voltage yields far greater savings than a lower frequency. In September I'll do a post about the aforementioned M4A boards and an LSI SAS controller in one of the PCIe x16 slots.
Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis
Hi,

>> - The ZIL exists on a per-filesystem basis in ZFS. Is there an RFE already that asks for the ability to disable the ZIL on a per-filesystem basis?
>
> Yes: 6280630 zil synchronicity

Good, thanks for the pointer!

> Though personally I've been unhappy with the exposure that zil_disable has got. It was originally meant for debug purposes only. So providing an official way to make synchronous behaviour asynchronous is, to me, dangerous.

IMHO, the need here is to give admins control over the way they want their file servers to behave. In this particular case, the admin argues that he knows what he's doing, that he doesn't want his NFS server to behave more strongly than a local filesystem, and that he deserves control over that behaviour.

Ideally, there would be an NFS option that lets customers choose whether they want to honor COMMIT requests or not. Disabling the ZIL on a per-filesystem basis is only the second-best solution, but since that CR already exists, it seems to be the more realistic route.

Thanks,
Constantin
Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis
Hi,

Bob Friesenhahn wrote:
> On Wed, 22 Oct 2008, Neil Perrin wrote:
>> On 10/22/08 10:26, Constantin Gonzalez wrote:
>>> 3. Disable ZIL[1]. This is of course evil, but one customer pointed out to me that if a tar xvf were writing locally to a ZFS file system, the writes wouldn't be synchronous either, so there's no point in forcing NFS users to having a better availability experience at the expense of performance.
> The conclusion reached here is quite seriously wrong and no Sun employee should suggest it to a customer.

I'm not suggesting it to any customer. Actually, I argued quite a long time with the customer, trying to convince him that slow but correct is better. The conclusion above is a conscious decision by the customer: he says that he does not want NFS to turn any write into a synchronous write, and he's happy if all writes are asynchronous, because in this case the NFS server is a backup-to-disk device, and if power fails he simply restarts the backup, since he has the data in multiple copies anyway.

> If the system writing to a local filesystem reboots, then the applications which were running are also lost and will see the new filesystem state when they are restarted. If an NFS server spontaneously reboots, the applications on the many clients are still running and the client systems are using cached data. This means that clients could do very bad things if the filesystem state (as seen by NFS) is suddenly not consistent. One of the joys of NFS is that the client continues unhindered once the server returns.

Yes, we're both aware of this. In this particular situation, the customer would restart his backup job (and thus the client application) in case the server dies.

Thanks for pointing out the difference; this is indeed an important distinction.

Cheers,
Constantin
Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis
Hi,

yes, using slogs is the best solution. Meanwhile, using mirrored slogs built from other servers' RAM disks running on UPSes seems like an interesting idea, if UPS-backed RAM is deemed reliable enough for the purposes of the NFS server.

Thanks for suggesting this!

Cheers,
Constantin

Ross wrote:
> Well, it might be even more of a bodge than disabling the ZIL, but how about:
> - Create a 512MB ramdisk, use that for the ZIL
> - Buy a Micro Memory nvram PCI card for £100 or so.
> - Wait 3-6 months, hopefully buy a fully supported PCI-e SSD to replace the Micro Memory card.
>
> The ramdisk isn't an ideal solution, but provided you don't export the pool with it offline, it does work. We used it as a stop-gap solution for a couple of weeks while waiting for a Micro Memory nvram card. Our reasoning was that our server's on a UPS, and we figured that if something crashed badly enough to take out something like the UPS, the motherboard, etc., we'd be losing data anyway. We just made sure we had good backups in case the pool got corrupted, and crossed our fingers.
>
> The reason I say wait 3-6 months is that there's a huge amount of activity with SSDs at the moment. Sun said that they were planning to have flash storage launched by Christmas, so I figure there's a fair chance that we'll see some supported PCIe cards by next spring.
Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis
Hi,

Bob Friesenhahn wrote:
> On Thu, 23 Oct 2008, Constantin Gonzalez wrote:
>> Yes, we're both aware of this. In this particular situation, the customer would restart his backup job (and thus the client application) in case the server dies.
> So it is ok for this customer if their backup becomes silently corrupted and the backup software continues running? Consider that some of the backup files may have missing or corrupted data in the middle. Your customer is quite dedicated in that he will monitor the situation very well and remember to reboot the backup system, correct any corrupted files, and restart the backup software whenever the server panics and reboots.

This is what the customer told me. He uses rsync, and he is ok with restarting the rsync whenever the NFS server restarts.

> A properly built server should be able to handle NFS writes at gigabit wire-speed.

I'm advocating for a properly built system, believe me :).

Cheers,
Constantin
[zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis
Hi,

on a busy NFS server, performance tends to be very modest for large amounts of small files, due to the well-known effects of ZFS and the ZIL honoring the NFS COMMIT operation[1].

For the mature sysadmin who knows what (s)he does, there are three possibilities:

1. Live with it. Hard, if you see 10x less performance than you could and your users complain a lot.

2. Use a flash disk for the ZIL, a slog. This can add considerable extra cost, especially if you're using an X4500/X4540 and can't swap out fast SAS drives for cheap SATA drives to free the budget for flash ZIL drives.[2]

3. Disable the ZIL[1]. This is of course evil, but one customer pointed out to me that if a tar xvf were writing locally to a ZFS file system, the writes wouldn't be synchronous either, so there's no point in forcing NFS users into a better availability experience at the expense of performance.

So, if the sysadmin draws the informed and conscious conclusion that (s)he doesn't want to honor NFS COMMIT operations, what options are less disruptive than disabling the ZIL completely?

- I checked the NFS tunables in http://dlc.sun.com/osol/docs/content/SOLTUNEPARAMREF/chapter3-1.html but could not find one that would disable COMMIT honoring. Is there already an RFE asking for a share option that disables the translation of COMMIT into synchronous writes?

- The ZIL exists on a per-filesystem basis in ZFS. Is there an RFE already that asks for the ability to disable the ZIL on a per-filesystem basis? Once admins start to disable the ZIL for whole pools because the extra performance is too tempting, wouldn't it be the lesser evil to let them disable it on a per-filesystem basis?

Comments?

Cheers,
Constantin

[1]: http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine
[2]: http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on
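The cost of honoring each COMMIT can be felt with a small local experiment (a sketch, not ZFS- or NFS-specific: fsync-per-write merely stands in for a server flushing each COMMIT to stable storage, and the block count and sizes are arbitrary):

```python
import os
import tempfile
import time

def write_blocks(path, n_blocks, block, sync_each=False):
    # With sync_each=True, fsync after every write: roughly the durability
    # an NFS server must provide for each COMMIT. With sync_each=False,
    # data just lands in the page cache, like a local 'tar xvf'.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    start = time.perf_counter()
    try:
        for _ in range(n_blocks):
            os.write(fd, block)
            if sync_each:
                os.fsync(fd)  # block until the data is on stable storage
    finally:
        os.close(fd)
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as d:
    block = b"x" * 8192
    t_async = write_blocks(os.path.join(d, "async"), 500, block)
    t_sync = write_blocks(os.path.join(d, "sync"), 500, block, sync_each=True)
    print(f"async: {t_async:.4f}s  fsync-per-write: {t_sync:.4f}s")
```

On rotating disks without a slog, the per-fsync variant is typically dramatically slower, which is the 10x gap item 1 above alludes to; a fast slog device narrows it, which is the whole point of option 2.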
Re: [zfs-discuss] RFE: Start with desired end state in mind...
Hi,

great, thank you. So ZFS isn't picky about finding the target fs already created and attributed when replicating data into it. This is very cool!

Best regards,
Constantin

Darren J Moffat wrote:
> Constantin Gonzalez wrote:
>> Hi Darren, thank you for the clarification, I didn't know that.
> See the man page for zfs(1) where the -R option for send is discussed.
>> Back to Brad's RFE, what would one need to do to send a stream from a compressed filesystem to one with a different compression setting, if the source file system has the compression attribute set to a specific algorithm (i.e. not inherited)?
> $ zfs create -o compression=gzip-1 tank/gz1
> # put in your data
> $ zfs snapshot tank/[EMAIL PROTECTED]
> $ zfs create -o compression=gzip-9 tank/gz9
> $ zfs send tank/[EMAIL PROTECTED] | zfs recv -d tank/gz9
>> Will leaving out -R just create a new, but plain unencrypted fs on the receiving side?
> Depends on inheritance.
>> What if one wants to replicate a whole package of filesystems via -R, but change properties on the receiving side before it happens?
> If they are all getting the same properties, use inheritance; if they aren't, then you (by the very nature of what you want to do) need to precreate them with the appropriate options.
Re: [zfs-discuss] RFE: Start with desired end state in mind...
Hi Darren,

thank you for the clarification, I didn't know that.

See the man page for zfs(1) where the -R option for send is discussed.

oh, this is new. Thank you for bringing us -R.

Back to Brad's RFE, what would one need to do to send a stream from a compressed filesystem to one with a different compression setting, if the source file system has the compression attribute set to a specific algorithm (i.e. not inherited)? Will leaving out -R just create a new, but plain uncompressed fs on the receiving side? What if one wants to replicate a whole package of filesystems via -R, but change properties on the receiving side before it happens?

Best regards, Constantin

But for the sake of implementing the RFE, one could extend the ZFS send/receive framework with a module that permits manipulation of the data on the fly, specifically in order to allow for things like recompression, en/decryption, change of attributes at the dataset level, etc. No need, this already works this way.
Re: [zfs-discuss] ZFS with Memory Sticks
Hi Paul,

# fdisk -E /dev/rdsk/c7t0d0s2

then

# zpool create -f Radical-Vol /dev/dsk/c7t0d0

should work. The warnings you see are just there to double-check that you don't overwrite a previously used pool, which you might regret. -f overrules that.

Hope this helps, Constantin
Re: [zfs-discuss] ZFS with Memory Sticks
Hi,

# /usr/sbin/zpool import
  pool: Radical-Vol
    id: 3051993120652382125
 state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:
        Radical-Vol  UNAVAIL  insufficient replicas
          c7t0d0s0   UNAVAIL  corrupted data

ok, ZFS did recognize the disk, but the pool is corrupted. Did you remove the stick without exporting the pool first?

Following your command:

$ /opt/sfw/bin/sudo /usr/sbin/zpool status
  pool: Rad_Disk_1
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the pool will no longer be accessible on older software versions.
 scrub: none requested
config:
        NAME        STATE  READ WRITE CKSUM
        Rad_Disk_1  ONLINE    0     0     0
          c0t1d0    ONLINE    0     0     0
errors: No known data errors

But this pool should be accessible, since you can zpool status it. Have you checked zfs get all Rad_Disk_1? Does it show mount points and whether it should be mounted?

But this device works currently on my Solaris PC's, the W2100z and a laptop of mine.

Strange. Maybe it's a USB issue. Have you checked http://www.sun.com/io_technologies/usb/USB-Faq.html#Storage, especially #19?

Best regards, Constantin
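The symptom above is consistent with a pool that was never exported before the media was pulled. As a hedged reminder, the clean hand-off of a pool on removable media can be sketched as below. The sketch only prints the commands it would run (the pool name is taken from the output above); drop the echo to execute them for real.

```shell
# Dry-run sketch: clean hand-off of a pool on removable media.
# Only prints the intended commands; remove the echo to run them.
handoff_plan() {
    echo "zpool export Radical-Vol"   # on the old host, before unplugging
    echo "zpool import Radical-Vol"   # on the new host, after plugging in
}
handoff_plan
```

Exporting first flushes all state and marks the pool as cleanly released, so the import on the other machine does not need -f.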
Re: [zfs-discuss] ZFS with Memory Sticks
Hi Paul,

yes, ZFS is platform agnostic, and I know it works in SANs. For the USB stick case, you may have run into labeling issues. Maybe Solaris SPARC did not recognize the x64-type label on the disk (which is strange, because it should...).

Did you try making sure that ZFS creates an EFI label on the disk? You can check this by running zpool status: the devices should then look like c6t0d0, without the s0 part. If you want to force this, you can create an EFI label on the USB disk by hand with fdisk -E /dev/rdsk/cxtxdx.

Hope this helps, Constantin

Paul Gress wrote: OK, I've been putting off this question for a while now, but it's eating at me, so I can't hold off any more. I have a nice 8 gig memory stick I've formatted with the ZFS file system. Works great on all my Solaris PC's, but refuses to work on my SPARC processor. So I've formatted it on my SPARC machine (Blade 2500); works great there now, but not on my PC's. Re-formatted it on my PC, doesn't work on SPARC, and so on and so on. I thought it was a file system that could go back and forth between both architectures. So when will this compatibility be here, or if it's possible now, what is the secret? Paul
Re: [zfs-discuss] Re: Best practice for moving FS between pool on same machine?
Hi,

Chris Quenelle wrote: Thanks, Constantin! That sounds like the right answer for me. Can I use send and/or snapshot at the pool level? Or do I have to use it on one filesystem at a time? I couldn't quite figure this out from the man pages.

the ZFS team is working on a zfs send -r (recursive) option to be able to recursively send and receive hierarchies of ZFS filesystems in one go, including pools. Until then, you'll need to do it one filesystem at a time.

This is not always trivial: if you send a full snapshot, then an incremental one, and the target filesystem is mounted, you'll likely get an error that the target filesystem was modified. Make sure the target filesystems are unmounted, and ideally marked as unmountable, while performing the send/receives. Also, you may want to use the -F option to receive, which forces a rollback of the target filesystem to the most recent snapshot.

I've written a script to do all of this, but it's only "works on my system" certified. I'd like to get some feedback and validation before I post it on my blog, so anyone, let me know if you want to try it out.

Best regards, Constantin
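Until a recursive send option is available, the one-filesystem-at-a-time procedure can be sketched as a small loop. This is a dry-run sketch only: it prints the commands it would execute rather than running them, and the pool names, filesystem list and snapshot name are all invented for illustration.

```shell
# Dry-run sketch of a per-filesystem send/receive loop.
# SRC, DST, the filesystem list and the snapshot name are hypothetical.
SRC=oldpool
DST=newpool
SNAP=migrate1

plan_sends() {
    for fs in home home/docs home/media; do
        echo "zfs snapshot $SRC/$fs@$SNAP"
        # -F forces the (unmounted) target back to its last snapshot first
        echo "zfs send $SRC/$fs@$SNAP | zfs receive -F $DST/$fs"
    done
}
plan_sends
```

Replacing the echo with eval would turn the plan into the real migration; printing first makes it easy to review the command list before touching any pool.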
Re: [zfs-discuss] Best practice for moving FS between pool on same machine?
Hi Chris,

What is the best (meaning fastest) way to move a large file system from one pool to another pool on the same machine. I have a machine with two pools. One pool currently has all my data (4 filesystems), but it's misconfigured. Another pool is configured correctly, and I want to move the file systems to the new pool. Should I use 'rsync' or 'zfs send'?

zfs send/receive is the fastest and most efficient way. I've used it multiple times on my home server until I had my configuration right :).

What happened is I forgot I couldn't incrementally add raid devices. I want to end up with two raidz(x4) vdevs in the same pool. Here's what I have now:

For this reason, I decided to go with mirrors. Yes, they use more raw storage space, but they are also much more flexible to expand: just add two disks when the pool is full and you're done. If you have a lot of disks or can afford to add disks 4-5 at a time, then RAID-Z may be as easy to do, but remember that two-disk failures in RAID-5 variants can be quite common; you may want RAID-Z2 instead.

1. move data to dbxpool2
2. remount using dbxpool2
3. destroy dbxpool1
4. create new proper raidz vdev inside dbxpool2 using devices from dbxpool1

Add:

0. Snapshot data in dbxpool1 so you can use zfs send/receive

Then the above should work fine.

I'm constrained by trying to minimize the downtime for the group of people using this as their file server. So I ended up with an ad-hoc assignment of devices. I'm not worried about optimizing my controller traffic at the moment.

Ok. If you want to really be thorough, I'd recommend:

0. Run a backup, just in case. It never hurts.
1. Do a snapshot of dbxpool1.
2. zfs send/receive dbxpool1 to dbxpool2. (This happens while users are still using dbxpool1, so no downtime.)
3. Unmount dbxpool1.
4. Do a second snapshot of dbxpool1.
5. Do an incremental zfs send/receive of dbxpool1 to dbxpool2. (This should take only a small amount of time.)
6. Mount dbxpool2 where dbxpool1 used to be.
7. Check everything is fine with the newly mounted pool.
8. Destroy dbxpool1.
9. Use disks from dbxpool1 to expand dbxpool2 (be careful :) ).

You might want to exercise the above steps on an extra spare disk with two pools, just to gain some confidence before doing it in production. I have a script that automatically does 1-6 that is looking for beta testers. If you're interested, let me know.

Hope this helps, Constantin
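Spelled out as commands, the low-downtime steps 1-6 might look like the sketch below. Nothing is executed here: the function only prints the intended commands, the snapshot names are invented, and the final mountpoint path is a placeholder.

```shell
# Dry-run sketch of the low-downtime migration (steps 1-6 above).
# Snapshot names and the mountpoint path are illustrative assumptions.
migration_plan() {
    SRC=dbxpool1
    DST=dbxpool2
    echo "zfs snapshot -r $SRC@migrate1"                                  # 1. first snapshot
    echo "zfs send $SRC@migrate1 | zfs receive -F $DST"                   # 2. bulk copy while users still work
    echo "zfs unmount $SRC"                                               # 3. downtime starts here
    echo "zfs snapshot -r $SRC@migrate2"                                  # 4. catch-up snapshot
    echo "zfs send -i $SRC@migrate1 $SRC@migrate2 | zfs receive -F $DST"  # 5. small incremental send
    echo "zfs set mountpoint=/export/dbx $DST"                            # 6. take over the old location
}
migration_plan
```

The incremental send in step 5 only carries the changes made since step 1, which is why the actual downtime window stays short.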
Re: [zfs-discuss] ZFS Scalability/performance
Hi,

I'm quite interested in ZFS, like everybody else I suppose, and am about to install FBSD with ZFS.

welcome to ZFS!

Anyway, back to business :) I have a whole bunch of different sized disks/speeds. E.g. 3 300GB disks @ 40MB/s, a 320GB disk @ 60MB/s, 3 120GB disks @ 50MB/s and so on. RAID-Z and ZFS claim to be uber scalable and all that, but would it 'just work' with a setup like that too?

Yes. If you dump a set of variable-size disks into a mirror or RAID-Z configuration, you'll get the same result as if each of them had the smallest of their sizes. The pool will then grow when exchanging smaller disks with larger ones. I used to run a ZFS pool on 1x250GB, 1x200GB, 1x85GB and 1x80GB the following way:

- Set up an 80 GB slice on all 4 disks and make a 4-disk RAID-Z vdev.
- Set up a 5 GB slice on the 250, 200 and 85 GB disks and make a 3-disk RAID-Z.
- Set up a 115 GB slice on the 200 and the 250 GB disk and make a 2-disk mirror.
- Concatenate all 3 vdevs into one pool. (You need zpool add -f for that.)

Not something to be done on a professional production system, but it worked for my home setup just fine. The remaining 50GB from the 250GB drive then went into a scratch pool. Kinda like playing Tetris with RAID-Z...

Later, I decided that using just paired disks as mirrors is really more flexible and easier to expand, since disk space is cheap.

Hope this helps, Constantin
Re: [zfs-discuss] ZFS Scalability/performance
Hi,

How are paired mirrors more flexible?

well, I'm talking of a small home system. If the pool gets full, the way to expand with RAID-Z would be to add 3+ disks (typically 4-5). With mirrors only, you just add two. So in my case it's just about the granularity of expansion. The reasoning is that of the three factors reliability, performance and space, I value them in this order. Space comes last, since disk space is cheap. If I had a bigger number of disks (12+), I'd be using them in RAID-Z2 sets (4+2 plus 4+2 etc.). There, the speed is ok and the reliability is ok, and so I can use RAID-Z2 instead of mirroring to get some extra space as well.

Right now, I have a 3-disk RAID-5 running with the Linux DM driver. One of the most recent additions was RAID-5 expansion, so I could pop in a matching disk and expand my RAID-5 to 4 disks instead of 3 (which is always interesting, as you're cutting down on your parity loss). I think though that in RAID-5 you shouldn't put in more than 6-8 disks afaik, so I wouldn't be expanding this endlessly. So how would this translate to ZFS?

I have learned so far that ZFS does not yet support rearranging the disk configuration. Right now, you can expand a single disk to a mirror, or an n-way mirror to an n+1-way mirror. RAID-Z vdevs can't be changed right now. But you can add more disks to a pool by adding more vdevs (you have a 1+1 mirror, add another 1+1 pair and get more space; have a 3+2 RAID-Z2 and add another 5+2 RAID-Z2, etc.).

basically it is raid + LVM. e.g. the mirrored raid-z pairs go into the pool, just like one would use LVM to bind all the raid pairs. The difference being, I suppose, that you can't use a ZFS mirror/RAID-Z without having a pool to use it from?

Here's the basic idea:

- You first construct vdevs from disks: one disk can be one vdev. A 1+1 mirror can be a vdev, too. An n+1 or n+2 RAID-Z (RAID-Z2) set can be a vdev, too.
- Then you concatenate vdevs to create a pool. Pools can be extended by adding more vdevs.
- Then you create ZFS file systems that draw their block usage from the resources supplied by the pool. Very flexible.

Wondering now is if I can simply add a new disk to my raid-z and have it 'just work', e.g. the raid-z would be expanded to use the new disk (partition of matching size).

If you have a RAID-Z based pool in ZFS, you can add another group of disks that are organized in a RAID-Z manner (a vdev) to expand the storage capacity of the pool.

Hope this clarifies things a bit. And yes, please check out the admin guide and the other collateral available on ZFS. It's full of new concepts, and one needs some getting used to to explore all the possibilities.

Cheers, Constantin
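The disks-to-vdevs-to-pool-to-filesystems layering can be illustrated with a short command sequence. This is a printed sketch, not executed, and the pool name and device names (c1t0d0 etc.) are placeholders.

```shell
# Sketch of the vdev/pool layering; pool and device names are placeholders.
# Only prints the commands; drop the echo to run them for real.
pool_plan() {
    echo "zpool create tank mirror c1t0d0 c1t1d0"                    # vdev 1: a 1+1 mirror
    echo "zpool add tank mirror c2t0d0 c2t1d0"                       # expand: add a second mirror vdev
    echo "zpool add tank raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0"  # or a 3+2 RAID-Z2 vdev
    echo "zfs create tank/home"                                      # filesystems draw blocks from the whole pool
}
pool_plan
```

Note that each zpool add grows the pool by a whole vdev at once; individual vdevs keep their shape, which is exactly the granularity-of-expansion point made above.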
Re: [zfs-discuss] ZFS Scalability/performance
Hi Mike,

If I was to plan for a 16-disk ZFS-based system, you would probably suggest me to configure it as something like 5+1, 4+1, 4+1, all RAID-Z (I don't need the double parity concept). I would prefer something like 15+1 :) I want ZFS to be able to detect and correct errors, but I do not need to squeeze all the performance out of it (I'll be using it as a home storage server for my DVDs and other audio/video stuff, so only a few clients at the most streaming off of it).

this is possible. ZFS in theory does not significantly limit n, and 15+1 is indeed possible. But for a number of reasons (among them performance), people generally advise to use no more than 10+1.

A lot of ZFS configuration wisdom can be found in the Solaris Internals ZFS Best Practices Guide wiki at:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Richard Elling has done a great job of thoroughly analyzing different reliability concepts for ZFS in his blog. One good introduction is the following entry:

http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance

That may help you find the right tradeoff between space and reliability.

Hope this helps, Constantin
[zfs-discuss] New german white paper on ZFS
Hi,

if you understand German or want to brush it up a little, I have a new ZFS white paper in German for you:

http://blogs.sun.com/constantin/entry/new_zfs_white_paper_in

Since there's already so much collateral on ZFS in English, I thought it's time for some localized stuff for my country. There are also some new ZFS slides that go with it, also in German. Let me know if you have any suggestions.

Hope this helps, Constantin
[zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?
Hi,

I'm a big fan of Live Upgrade. I'm also a big fan of ZFS boot. The latter is more important for me. And yes, I'm looking forward to both being integrated with each other. Meanwhile, what is the best way to upgrade a post-b61 system that is booted from ZFS? I'm thinking:

1. Boot from ZFS.
2. Use Tim's excellent multiple boot datasets script to create a new cloned ZFS boot environment: http://blogs.sun.com/timf/entry/an_easy_way_to_manage
3. Loopback-mount the new OS ISO image.
4. Run the installer from the loopbacked ISO image in upgrade mode on the clone.
5. Mark the clone to be booted the next time.
6. Reboot into the upgraded OS.

Questions:

- How exactly do I do step 4? Before, luupgrade did everything for me; now what manpage do I need to do this?
- Did I forget something above? I'm ok with losing some logfiles and stuff that may have changed between the clone and the reboot, but is there anything else?
- Did someone already blog about this and I haven't noticed yet?

Cheers, Constantin
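For step 3, the loopback mount would typically go through lofiadm(1M) and an hsfs mount. The sketch below only prints the commands (the ISO path and the lofi device number are assumptions; lofiadm reports the actual device on attach).

```shell
# Dry-run sketch of step 3: loopback-mount the OS ISO image.
# The ISO path and /dev/lofi/1 are hypothetical placeholders.
iso_plan() {
    echo "lofiadm -a /export/isos/osol.iso"      # attach; prints a device such as /dev/lofi/1
    echo "mount -F hsfs -o ro /dev/lofi/1 /mnt"  # mount the attached image read-only
}
iso_plan
```

After that, the installer on the mounted image would be run against the cloned boot environment (step 4), which is exactly the open question above.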
Re: [zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?
Hi,

Our upgrade story isn't great right now. In the meantime, you might check out Tim Haley's blog entry on using bfu with ZFS root: http://blogs.sun.com/timh/entry/friday_fun_with_bfu_and -- lori

thanks. But doesn't Live Upgrade just start the installer from the new OS DVD with the right options? Can't I just do that too?

Cheers, Constantin
Re: [zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?
Hi Malachi,

Malachi de Ælfweald wrote: I'm actually wondering the same thing, because I have b62 with the ZFS bits, but need the snapshot -r functionality.

you're lucky, it's already there. From my b62 machine's man zfs:

zfs snapshot [-r] [EMAIL PROTECTED]|[EMAIL PROTECTED]

    Creates a snapshot with the given name. See the Snapshots section for details.

    -r  Recursively create snapshots of all descendant datasets. Snapshots are taken atomically, so that all recursive snapshots correspond to the same moment in time.

Or did you mean send -r?

Best regards, Constantin
[zfs-discuss] A big Thank You to the ZFS team!
Hi,

I just got ZFS boot up and running on my laptop. This being a major milestone in the history of ZFS, I thought I'd reflect a bit on what ZFS has brought to my life so far:

- I've been using ZFS at home and on my laptop since January 2006 for mission-critical purposes:
  - Backups of my wife's and my own Macs.
  - Storing family photos (I have a baby now, so they _are_ mission critical :) ).
  - Storing my ca. 400 CDs that were carefully ripped and metadata'ed, which took a lot of work.
  - Providing fast and reliable storage for my PVR.
  - And of course all the rough stuff that happens to laptops on the road.
- ZFS has already saved me from bit rot once. I could see that it fixed a bad block during a weekly scrub. What a great feeling to know that your data is much safer than it was before, and to be able to see how and when it is being protected! It is kinda weird to talk to customers about adopting ZFS while knowing that my family pictures at home are probably stored more safely than their company data...
- ZFS enabled me to take a bunch of differently sized drives that had been lying around somewhere and turn them into an easy-to-manage, consistent and redundant pool of storage that effortlessly handles very diverse workloads (file server, audio streaming, video streaming).
- During the frequent migrations (couldn't make up my mind at first on how to slice and dice my 4 disks), zfs send/receive has been my best friend. It enabled me to painlessly migrate whole filesystems between pools in minutes. I'm now writing a script to further automate recursive and updating zfs send/receive orgies for backups and other purposes.
- Disk storage is cheap, and thanks to ZFS it became reliable at zero cost. Therefore, I can snapshot a lot, not think about whether to delete stuff or not, or simply delete stuff I don't need now, while knowing it is still preserved in my snapshots.
- As a result of all of this, I learned a great deal about Solaris 10 and its other features, which is a big help in my day-to-day job.

I know there's still a lot to do and that we're still working on some bugs, but I can safely say that ZFS is the best thing that has happened to my data so far. So here's a big THANK YOU! to the ZFS team for making all of this and more possible for my little home system.

Down the road, I've now migrated my pools to external mirrored USB disks (mirrored because it's fast and lowers complexity; USB because it's pluggable and host-independent), and I'm thinking of how to back them up (I realize I still need a backup) onto other external disks or preferably another system. Again, zfs send/receive will be my friend here. ZFS boot on my home server is the other next big thing, enabling me to mirror my root file system more reliably than SVM can, while saving space for Live Upgrade and enabling other cool stuff. I'm also thinking of using iSCSI zvols as Mac OS X storage for audio/video editing and whole-disk backups, but that requires some waiting until the Mac OS X iSCSI support has matured a bit.

And then I can start to really archive stuff: older backups that sit on CDs and are threatened by CD rot, old photo CDs that have been sitting there and hopefully haven't begun to rot yet, maybe scanning in some older photos, migrating my CD collection to a lossless format, etc.

This sounds like I've been drinking too much koolaid, and I probably have, but I guess all the above points remain valid even if I didn't work for Sun. So please take this email as being written by a private ZFS user and not a Sun employee.

So, again, thank you so much, ZFS team, and keep up the good work!

Best regards, Constantin
[zfs-discuss] ZFS boot: 3 smaller glitches with console, /etc/dfs/sharetab and /dev/random
Hi,

I've now gone through both the OpenSolaris instructions:

http://www.opensolaris.org/os/community/zfs/boot/zfsboot-manual/

and Tim Foster's script:

http://blogs.sun.com/timf/entry/zfs_bootable_datasets_happily_rumbling

for making my laptop ZFS-bootable. Both work well, and here's a big THANK YOU to the ZFS boot team! There seem to be 3 smaller glitches with these approaches:

1. The instructions on opensolaris.org assume that one wants console output to show up in /dev/tty. This may be true for a server, but it isn't for a laptop or workstation user. Therefore, I suggest someone explain that these settings are optional, as not everybody knows that they can be left out.

2. After going through the zfs-bootification, Solaris complains on reboot that /etc/dfs/sharetab is missing. Somehow this seems to have fallen through the cracks of the find command. Well, touching /etc/dfs/sharetab just fixes the issue.

3. But here's a more serious one: while booting, Solaris complains:

Apr 19 15:00:37 foeni kcf: [ID 415456 kern.warning] WARNING: No randomness provider enabled for /dev/random. Use cryptoadm(1M) to enable a provider.

Somehow, /dev/random and/or its counterpart in /devices seems to have suffered from the migration procedure. Does anybody know how to fix the /dev/random issue? I'm not very fluent in cryptoadm(1M), and some superficial reading of its manpage did not enlighten me too much (cryptoadm list -p claims all is well...).

Best regards, and again, congratulations to the ZFS boot team! Constantin
[zfs-discuss] Who modified my ZFS receive destination?
Hi, I'm currently migrating a filesystem from one pool to the other through a series of zfs send/receive commands in order to preserve all snapshots. But at some point, zfs receive says cannot receive: destination has been modified since most recent snapshot. I am pretty sure nobody changed anything at my destination filesystem and I also tried rolling back to an earlier snapshot on the destination filesystem to make it clean again. Here's an excerpt of the snapshots on my source filesystem: # zfs list -rt snapshot pelotillehue/constant NAME USED AVAIL REFER MOUNTPOINT pelotillehue/[EMAIL PROTECTED] 236K - 33.6G - pelotillehue/[EMAIL PROTECTED] 747K - 46.0G - pelotillehue/[EMAIL PROTECTED]:nobackup-2006-11-22-00:00:06 3.07G - 116G - pelotillehue/[EMAIL PROTECTED]:nobackup-2006-11-29-00:00:00 18.9M - 115G - pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-01-00:00:03 10.9M - 115G - pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-08-00:00:00 606M - 105G - pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-15-00:00:01 167M - 105G - pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-22-00:00:00 5.31M - 105G - pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-29-00:00:01 1.90M - 105G - pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01 1.26M - 105G - pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 15.2M - 109G - pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-15-00:00:00 17.5M - 109G - ... 
(further lines omitted) On the destination filesystem, snapshots have been replicated through zfs send/receive up to the 2007-01-01 snapshot, so I do the following: # zfs send -i pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01 pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 | zfs receive santiago/home/constant This worked, but now, only seconds later: # zfs send -i pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs receive santiago/home/constant cannot receive: destination has been modified since most recent snapshot Fails. So I try rolling back to the 2007-01-08 snapshot on the destination filesystem to be clean again, but: # zfs rollback santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 # zfs send -i pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs receive santiago/home/constant cannot receive: destination has been modified since most recent snapshot Hmm, why does ZFS think my destination has been modified, although I didn't do anything? 
Another peculiar thing: zfs list on the destination snapshots says:

# zfs list -rt snapshot santiago/home/constant
NAME                                                            USED  AVAIL  REFER  MOUNTPOINT
santiago/home/[EMAIL PROTECTED]                                 189K      -  33.6G  -
santiago/home/[EMAIL PROTECTED]                                 670K      -  46.0G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-11-22-00:00:06   3.07G      -   116G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-11-29-00:00:00   18.4M      -   115G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-01-00:00:03   10.5M      -   115G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-08-00:00:00    603M      -   105G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-15-00:00:01    163M      -   105G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-22-00:00:00   4.87M      -   105G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-29-00:00:01   1.79M      -   106G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01   1.16M      -   106G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00     57K      -   109G  -

Note that the USED column for the 2007-01-08 snapshot says 57K on the destination, but 15.2M on the source. Could it be that the reception of the 2007-01-08 snapshot failed and ZFS didn't notice? I've tried this multiple times, including destroying snapshots and rolling back on the destination to the 2007-01-01 state, so what you see above is already a second try of the same. The other values vary too, but only slightly.

Compression is enabled on both pools. The source pool was scrubbed on Monday with no known data errors, and the destination pool is brand new and being scrubbed as we speak.

Best regards,
Constantin

--
Constantin Gonzalez                      Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91               http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
Summary: [zfs-discuss] Poor man's backup by attaching/detaching mirror drives on a _striped_ pool?
Hi,

here's a quick summary of the answers I've seen so far:

- Splitting mirrors is a current practice with traditional volume management. The goal is to quickly and effortlessly create a clone of a storage volume/pool.
- Splitting mirrors with ZFS can be done, but it has to be done the hard way: resilver, then unplug the disk, then try to import it somewhere else. zpool detach would render the detached disk unimportable.
- Another, cleaner way of splitting a mirror would be to export the pool, then disconnect one drive, then re-import again. After that, the disconnected drive needs to be zpool detach'ed from the mother pool, while the clone can then be imported and its missing mirror halves detached as well. But this involves unmounting the pool, so it can't be done without downtime.
- The supported alternative would be zfs snapshot, then zfs send/receive, but this introduces the complexity of snapshot management, which makes it less simple and thus less appealing to the clone-addicted admin.
- There's an RFE for supporting splitting mirrors: 5097228 http://bugs.opensolaris.org/view_bug.do?bug_id=5097228

IMHO, we should investigate whether something like zpool clone would be useful. It could be implemented as a script that recursively snapshots the source pool, then zfs send/receives it to the destination pool, then copies all properties. But the actual reason people do mirror splitting in the first place is its simplicity. A zpool clone or a zpool send/receive command would be even simpler and less error-prone than the tradition of splitting mirrors, plus it could be implemented more efficiently and more reliably than a script, thus bringing real additional value to administrators. Maybe zpool clone or zpool send/receive would be the better way of implementing 5097228 in the first place?
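Such a zpool clone script could be sketched as below. This is a dry-run illustration only (it prints the commands rather than executing them); the pool names and filesystem list are hypothetical, and a real version would also have to copy properties and cope with pre-existing snapshots:

```shell
# Dry-run sketch of a scripted "zpool clone": one recursive snapshot,
# then a send/receive per filesystem. All names are placeholders.
SRCPOOL=tank
DSTPOOL=tankclone
# In practice: FSLIST=$(zfs list -H -o name -r $SRCPOOL)
FSLIST="tank tank/home tank/home/constant"

echo "zfs snapshot -r $SRCPOOL@clone"
for fs in $FSLIST; do
    # map tank/home/constant -> tankclone/home/constant
    dest="$DSTPOOL${fs#$SRCPOOL}"
    echo "zfs send $fs@clone | zfs receive $dest"
done
```

Remove the echo (or pipe the output through sh) to actually run it.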
Best regards,
Constantin
Re: [zfs-discuss] Who modified my ZFS receive destination?
Hi Trev,

Trevor Watson wrote:
> Hi Constantin, I had the same problem, and the solution was to make sure that the filesystem is not mounted on the destination system when you perform the zfs recv (zfs set mountpoint=none santiago/home).

Thanks! This time it worked:

# zfs unmount santiago/home/constant
# zfs rollback santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00
# zfs send -i pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 \
    pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs receive santiago/home/constant
#

Still, this is kind of strange. It means that when doing send/receive on a regular basis (as in weekly, daily, hourly, or minutely cron jobs), we'll need to zfs unmount, then zfs rollback to the last snapshot a lot to be sure, or keep any replicated filesystems unmounted _all_ the time.

Best regards,
Constantin
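For cron-driven replication, the sequence that finally worked can be wrapped like this (a dry-run sketch with placeholder names; remove the echos to run it for real). As an aside, newer zfs builds have a receive -F option that is supposed to roll the destination back automatically before receiving, which might make the explicit rollback unnecessary; check whether your build has it:

```shell
# Dry-run wrapper around the unmount/rollback/receive sequence.
# DSTFS and LAST are placeholders; the incremental 'zfs send -i' stream
# is expected on stdin of the receive.
DSTFS=dstpool/home/constant
LAST=snap-01    # most recent snapshot common to source and destination

echo "zfs unmount $DSTFS"
echo "zfs rollback $DSTFS@$LAST"
echo "zfs receive $DSTFS"
echo "zfs mount $DSTFS"
```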
Re: [zfs-discuss] ZFS vs. rmvolmgr
Hi,

sorry, I needed to be more clear. Here's what I did:

1. Connect a USB storage device (a disk) to the machine.
2. Find the USB device through rmformat.
3. Try zpool create on that device. It fails with: can't open /dev/rdsk/cNt0d0p0, device busy
4. svcadm disable rmvolmgr
5. Now zpool create works with that device and the pool gets created.
6. svcadm enable rmvolmgr
7. After that, everything works as expected; the device stays under control of the pool.

>> can't open /dev/rdsk/cNt0d0p0, device busy
> Do you remember exactly what command/operation resulted in this error?

See above, it comes right after trying to create a zpool on that device.

> It is something that tries to open the device exclusively.

So after ZFS opens the device exclusively, hald and rmvolmgr will ignore it? What happens at boot time, is ZFS then quicker in grabbing the device than hald and rmvolmgr are?

>> So far, I've just said svcadm disable -t rmvolmgr, did my thing, then said svcadm enable rmvolmgr.
> This can't possibly be true, because rmvolmgr does not open devices.

Hmm. I really remember having done the above. Actually, I'd been pulling some hairs out trying to create zpools on external devices until I got the idea of disabling rmvolmgr; then it worked.

> You'd need to also disable the 'hal' service. Run fuser on your device and you'll see it's one of the hal addons that keeps it open.

Perhaps something that depended on rmvolmgr released the device after I disabled the service?

>> For instance, I'm now running several USB disks with ZFS pools on them, and even after restarting rmvolmgr or rebooting, ZFS, the disks and rmvolmgr get along with each other just fine.
> I'm confused here. In the beginning you said that something got in the way, but now you're saying they get along just fine. Could you clarify?

After creating the pool, the device belongs to ZFS, and ZFS seems to be able to grab the device before anybody else.
> One possible workaround would be to match against the USB disk's serial number and tell hal to ignore it using an fdi(4) file. For instance, find your USB disk in lshal(1M) output, it will look like this:
>
> udi = '/org/freedesktop/Hal/devices/pci_0_0/pci1028_12c_1d_7/storage_5_0'
>   usb_device.serial = 'DEF1061F7B62'  (string)
>   usb_device.product_id = 26672  (0x6830)  (int)
>   usb_device.vendor_id = 1204  (0x4b4)  (int)
>   usb_device.vendor = 'Cypress Semiconductor'  (string)
>   usb_device.product = 'USB2.0 Storage Device'  (string)
>   info.bus = 'usb_device'  (string)
>   info.solaris.driver = 'scsa2usb'  (string)
>   solaris.devfs_path = '/[EMAIL PROTECTED],0/pci1028,[EMAIL PROTECTED],7/[EMAIL PROTECTED]'  (string)
>
> You want to match an object with this usb_device.serial property and set the info.ignore property to true.

Thanks, this sounds just like what I was looking for. So the correct way of having a zpool out of external USB drives is to:

1. Attach the drives.
2. Find their USB serial numbers with lshal.
3. Set up an fdi file that matches the disks and tells hal to ignore them.

The naming of the file /etc/hal/fdi/preprobe/30user/10-ignore-usb.fdi sounds like init.d-style directory and file naming, is this correct?

Best regards,
Constantin
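Based on Artem's description, the fdi(4) file he had in mind would presumably look something like the sketch below. The serial number is the one from the lshal excerpt above, but the match/merge syntax is my reconstruction, so verify it against fdi(4) before relying on it:

```shell
# Sketch of an fdi(4) snippet that tells hal to ignore a disk by its
# USB serial number. The real location would be
# /etc/hal/fdi/preprobe/30user/10-ignore-usb.fdi; written to /tmp here
# for illustration only.
cat > /tmp/10-ignore-usb.fdi <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<deviceinfo version="0.2">
  <device>
    <match key="usb_device.serial" string="DEF1061F7B62">
      <merge key="info.ignore" type="bool">true</merge>
    </match>
  </device>
</deviceinfo>
EOF
```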
Re: [zfs-discuss] Poor man's backup by attaching/detaching mirror drives on a _striped_ pool?
Hi Mark,

Mark J Musante wrote:
> On Tue, 10 Apr 2007, Constantin Gonzalez wrote:
>> Has anybody tried it yet with a striped mirror? What if the pool is composed out of two mirrors? Can I attach devices to both mirrors, let them resilver, then detach them and import the pool from those?
> You'd want to export them, not detach them. Detaching will overwrite the vdev labels and make it un-importable.

thank you for the export/import idea. It does sound cleaner from a ZFS perspective, but comes at the expense of temporarily unmounting the filesystems.

So, instead of detaching, would unplugging, then detaching work? I'm thinking something like this:

- zpool create tank mirror dev1 dev2 dev3
- {physically move dev3 to new box}
- zpool detach tank dev3

On the new box:

- zpool import tank
- zpool detach tank dev1
- zpool detach tank dev2

This should work for one disk, and I assume this would also work for multiple disks?

Thinking along similar lines, would it be a useful RFE to allow asymmetric mirroring like this:

- dev1, dev2 are both 250GB, dev3 is 500GB
- zpool create tank mirror dev1,dev2 dev3

This means that half of dev3 would mirror dev1, the other half would mirror dev2, and dev1,dev2 is a regular stripe. The utility of this would be for cases where customers have set up mirrors, then need to replace disks or upgrade the mirror after a long time, when bigger disks are easier to get than smaller ones, while reusing the older disks.

Best regards,
Constantin
Re: [zfs-discuss] Re: Poor man's backup by attaching/detaching mirror
Hi,

> How would you access the data on that device? Presumably, zpool import.

Yes. This is basically what everyone does today with mirrors, isn't it? :-)

Sure, this may not be pretty, but it's what customers are doing all the time with regular mirrors, because it's quick, easy and reliable.

Cheers,
Constantin
[zfs-discuss] ZFS vs. rmvolmgr
Hi,

while playing around with ZFS and USB memory sticks or USB hard disks, rmvolmgr tends to get in the way, which results in a "can't open /dev/rdsk/cNt0d0p0, device busy" error.

So far, I've just said svcadm disable -t rmvolmgr, done my thing, then said svcadm enable rmvolmgr. Is there a more elegant approach that tells rmvolmgr to leave certain devices alone on a per-disk basis?

For instance, I'm now running several USB disks with ZFS pools on them, and even after restarting rmvolmgr or rebooting, ZFS, the disks and rmvolmgr get along with each other just fine. What and how does ZFS tell rmvolmgr that a particular set of disks belongs to ZFS and should not be treated as removable?

Best regards,
Constantin
Re: [zfs-discuss] Setting up for zfsboot
Hi,

>> - RAID-Z is _very_ slow when one disk is broken.
> Do you have data on this? The reconstruction should be relatively cheap, especially when compared with the initial disk access. Also, what is your definition of broken? Does this mean the device appears as FAULTED in the pool status, or that the drive is present and not responding? If it's the latter, this will be fixed by my upcoming FMA work.

Sorry, the _very_ may be exaggerated, and it depends much on the load of the system and the config. I'm referring to a couple of posts and anecdotal experience from colleagues. This means that indeed "slow" or "very slow" may be a mixture of reconstruction overhead and device timeout issues. So it's nice to see that the upcoming FMA code will fix some of the slowness issues.

Did anybody measure how much CPU overhead RAID-Z and RAID-Z2 parity computation induces, both for writes and for reads (assuming a data disk is broken)? This data would be useful when arguing for a software RAID scheme in front of hardware-RAID-addicted customers.

Best regards,
Constantin
Re: [zfs-discuss] Setting up for zfsboot
Hi,

> Now that zfsboot is becoming available, I'm wondering how to put it to use. Imagine a system with 4 identical disks. Of course I'd like to use

You lucky one :).

> raidz, but zfsboot doesn't do raidz. What if I were to partition the drives, such that I have 4 small partitions that make up a zfsboot partition (4 way mirror), and the remainder of each drive becomes part of a raidz?

Sounds good. Performance will suffer a bit, as ZFS thinks it has two pools with 4 spindles each, but it should still perform better than the same setup on a UFS basis. You may also want to have two 2-way mirrors instead and keep the second for other purposes, such as scratch space for ZFS migrations or as spare disks for other stuff.

> Do I still have the advantages of having the whole disk 'owned' by zfs, even though it's split into two parts?

I'm pretty sure that this is not the case:

- ZFS has no guarantee that someone won't do something else with that other partition, so it can't assume the right to turn on the write cache for the whole disk.
- Yes, it could be smart and realize that it does have the whole disk, only split up across two pools, but I assume that this is not your typical enterprise-class configuration, so it probably didn't get implemented that way.

I'd say that not being able to benefit from the disk drive's cache is not as bad in the face of ZFS' other advantages, so you can probably live with that.

> Swap would probably have to go on a zvol - would that be best placed on the n-way mirror, or on the raidz?

I'd place it onto the mirror for performance reasons. Also, it feels cleaner to have all your OS stuff on one pool and all your user/app/data stuff on another. This is also recommended by the ZFS Best Practices wiki on www.solarisinternals.com.

Now back to the 4-disk RAID-Z: Does it have to be RAID-Z? Maybe you might want to reconsider and use two 2-way mirrors:

- RAID-Z is slow when writing; you basically get only one disk's bandwidth.
  (Yes, with variable block sizes this might be slightly better...)
- RAID-Z is _very_ slow when one disk is broken.
- Using mirrors is more convenient for growing the pool: when you run out of space, you add two disks and get better performance too. No need to buy 4 extra disks for another RAID-Z set.
- When using disks, you need to weigh availability, performance and space. Of the three, space is the cheapest, so it's best to sacrifice space and get better availability and better performance in return.

Hope this helps,
Constantin
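For illustration, here is what the two layouts under discussion would look like as zpool commands. Device and slice names are made up, and the commands are only echoed, not executed:

```shell
# Dry-run sketch: small slice 0 on each of the four disks for the
# mirrored boot pool, big slice 1 for the data pool. Device names
# are placeholders.
BOOT="zpool create rpool mirror c1t0d0s0 c1t1d0s0 c1t2d0s0 c1t3d0s0"
DATA_MIRROR="zpool create tank mirror c1t0d0s1 c1t1d0s1 mirror c1t2d0s1 c1t3d0s1"
DATA_RAIDZ="zpool create tank raidz c1t0d0s1 c1t1d0s1 c1t2d0s1 c1t3d0s1"

echo "$BOOT"
echo "$DATA_MIRROR"    # two 2-way mirrors, as recommended above
echo "$DATA_RAIDZ"     # the raidz alternative on the same slices
```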
Re: [zfs-discuss] Setting up for zfsboot
Hi,

Manoj Joseph wrote:
> Can write-cache not be turned on manually as the user is sure that it is only ZFS that is using the entire disk?

Yes, it can be turned on. But I don't know whether ZFS would then know about it. I'd still feel more comfortable with it being turned off unless ZFS itself turns it on. But maybe someone from the ZFS team can clarify this.

Cheers,
Constantin
Re: [zfs-discuss] Pathological ZFS performance
> /product.asp?item=N82E16812156010
>
> It feels kind of nuts, but I have to think this would perform better than what I have now. This would cost me the one SATA drive I'm using now in a smaller pool.
>
> Rob T
Re: [zfs-discuss] Migrating a pool
Hi Matt,

cool, thank you for doing this! I'll still write my script, since today my two shiny new 320GB USB disks will arrive :). I'll add the feature to first send all current snapshots, then bring down the services that depend on the filesystem, unmount the old fs, send a final incremental snapshot, then zfs set mountpoint=x on the new filesystem, then bring up the services again. Hope this works as I imagine.

Cheers,
Constantin

Matthew Ahrens wrote:
> Constantin Gonzalez wrote:
>> What is the most elegant way of migrating all filesystems to the new pool, including snapshots? Can I do a master snapshot of the whole pool, including sub-filesystems and their snapshots, then send/receive them to the new pool? Or do I have to write a script that will individually snapshot all filesystems within my old pool, then run a send (-i) orgy?
> Unfortunately, you will need to make/find a script to do the various 'zfs send -i' to send each snapshot of each filesystem. I am working on 'zfs send -r', which will make this a snap:
>
> # zfs snapshot -r [EMAIL PROTECTED]
> # zfs send -r [EMAIL PROTECTED] | zfs recv ...
>
> You'll also be able to do 'zfs send -r -i @yesterday [EMAIL PROTECTED]'. See RFE 6421958.
>
> --matt
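Until zfs send -r integrates, the per-filesystem script Matt mentions boils down to a full send of the oldest snapshot followed by one incremental per later snapshot. A dry-run sketch for a single filesystem (all names are placeholders; in practice the snapshot list would come from zfs list, oldest first):

```shell
# Dry-run sketch: replicate one filesystem including all its snapshots.
# FS, DST and SNAPS are placeholders; remove the echo to run for real.
FS=oldpool/home
DST=newpool/home
SNAPS="monday tuesday wednesday"   # oldest first

prev=""
for s in $SNAPS; do
    if [ -z "$prev" ]; then
        echo "zfs send $FS@$s | zfs receive $DST"              # full stream first
    else
        echo "zfs send -i $FS@$prev $FS@$s | zfs receive $DST" # then incrementals
    fi
    prev=$s
done
```

Wrapping this in an outer loop over `zfs list -H -o name -r oldpool` gives the whole-pool migration.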
[zfs-discuss] Migrating a pool
Hi,

soon it'll be time to migrate my patchwork pool onto a real pair of mirrored (albeit USB-based) external disks. Today I have about half a dozen filesystems in the old pool, plus dozens of snapshots thanks to Tim Bray's excellent SMF snapshotting service.

What is the most elegant way of migrating all filesystems to the new pool, including snapshots? Can I do a master snapshot of the whole pool, including sub-filesystems and their snapshots, then send/receive them to the new pool? Or do I have to write a script that will individually snapshot all filesystems within my old pool, then run a send (-i) orgy?

Best regards,
Constantin
Re: [zfs-discuss] Migrating a pool
Hi,

>> Today I have about half a dozen filesystems in the old pool plus dozens of snapshots thanks to Tim Bray's excellent SMF snapshotting service.

I'm sorry, I mixed up Tim's last name. The fine guy who wrote the SMF snapshot service is Tim Foster. And here's the link:

http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_0_8

There doesn't seem to be an easy answer to the original question of how to migrate a complete pool. Writing a script with a snapshot send/receive party seems to be the only approach.

I wish I could zfs snapshot the pool, then zfs send pool | zfs receive dest, and all blocks would be transferred as they are, including all embedded snapshots. Is that already an RFE?

Best regards,
Constantin
Re: [zfs-discuss] 2-way mirror or RAIDZ?
Hi,

> I have a shiny new Ultra 40 running S10U3 with 2 x 250Gb disks.

Congratulations, this is a great machine!

> I want to make best use of the available disk space and have some level of redundancy without impacting performance too much. What I am trying to figure out is: would it be better to have a simple mirror of an identical 200Gb slice from each disk, or split each disk into 2 x 80Gb slices plus one extra 80Gb slice on one of the disks to make a 4 + 1 RAIDZ configuration?

You probably want to mirror the OS slice of the disk to protect your OS and its configuration from the loss of a whole disk. Do it with SVM today and upgrade to a bootable ZFS mirror in the future. The OS slice only needs to be 5GB in size if you follow the standard recommendation, but 10GB is probably a safe and easy-to-remember bet, leaving you some extra space for apps etc.

Plan to be able to live-upgrade into new OS versions. You could break up the mirror to do so, but this is kind of complicated and error-prone. Disk space is cheap, so I'd rather recommend you reserve two slices per disk for creating two mirrored boot environments, so you can LU back and forth.

For swap, allocate an extra slice per disk and of course mirror swap too. 1GB of swap should be sufficient.

Now you can use the rest for ZFS. Having only two physical disks, there is no good reason to do anything other than mirroring. If you created 4+1 slices for RAID-Z, you would always lose the whole pool if one disk broke. Not good. You could play Russian roulette by having 2+3 slices and RAID-Z2 and hoping that the right disk fails, but that isn't good practice either, and it wouldn't buy you any redundant space; it would just leave an extra unprotected scratch slice.

So go for the mirror: it gives you good performance and fewer headaches. If you can spare the money, try increasing the number of disks.
You'd still need to mirror the boot and swap slices, but then you would be able to use a real RAID-Z config for the rest, enabling you to leverage more disk capacity at a good redundancy/performance compromise.

Hope this helps,
Constantin
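As a concrete sketch of that recommendation: root mirrored with SVM and the remaining big slices given to ZFS. Device names and metadevice numbers are placeholders, the commands are only echoed, and the metainit/metattach usage should be checked against their man pages before use:

```shell
# Dry-run sketch: SVM mirror for the root slice, ZFS mirror on the
# remaining big slices. All names are placeholders; swap slices would
# be mirrored the same way as root.
echo "metainit -f d11 1 1 c0t0d0s0"   # root submirror, disk 0
echo "metainit d12 1 1 c1t0d0s0"      # root submirror, disk 1
echo "metainit d10 -m d11"            # one-sided root mirror
echo "metattach d10 d12"              # attach the second half
echo "zpool create tank mirror c0t0d0s7 c1t0d0s7"
```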
Re: [zfs-discuss] FYI: ZFS on USB sticks (from Germany)
Hi,

Artem: Thanks. And yes, Peter S. is a great actor!

Christian Mueller wrote:
> who is peter stormare? (sorry, i'm from old europe...)

As usual, Wikipedia knows it: http://en.wikipedia.org/wiki/Peter_Stormare

And he's European too :). Great actor, great movies. I particularly like Constantine, not just because of the name, of course :)

Our budget is quite limited at the moment, but after the 1,000,000th view on YouTube/Google Video we might want to reconsider our cast for the next episode :). But first, we need to get the English version finished...

Cheers,
Constantin

> thx, bye, christian
>
> Artem Kachitchkine schrieb:
>>> Brilliant video, guys.
>> Totally agreed, great work. Boy, would I like to see Peter Stormare in that video %)
>> -Artem.
Re: [zfs-discuss] FYI: ZFS on USB sticks (from Germany)
Hi Richard,

Richard Elling wrote:
> FYI, here is an interesting blog on using ZFS with a dozen USB drives from Constantin. http://blogs.sun.com/solarium/entry/solaris_zfs_auf_12_usb

Thank you for spotting it :). We're working on translating the video (hope we get the lip-syncing right...) and will then re-release it in an English version. BTW, we've now hosted the video on YouTube so it can be embedded in the blog.

Of course, I'll then write an English version of the blog entry with the tech details. Please hang on for a week or two... :)

Best regards,
Constantin
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
Hi, I need to be a little bit more precise in how I formulate comments: 1. Yes, zpool remove is a desirable feature, no doubt about that. 2. Most of the cases where customers ask for zpool remove can be solved with zfs send/receive or with zpool replace. Think Pareto's 80-20 rule. 2a. The cost of doing 2., including extra scratch storage space or scheduling related work into planned downtimes, is smaller than the cost of not using ZFS at all. 2b. Even in the remaining 20% of cases (figuratively speaking, YMMV) where zpool remove would be the only solution, I feel that the cost of sacrificing the extra storage space that would have become available through zpool remove is smaller than the cost of the project not benefiting from the rest of ZFS' features. 3. Bottom line: Everybody wants zpool remove as early as possible, but IMHO this is not an objective barrier to entry for ZFS. Note my use of the word "objective". I do feel that we have to implement zpool remove for subjective reasons, but that is a non-technical matter. Is this an agreeable summary of the situation? Best regards, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Client Solutions http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
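A sketch of the send/receive workaround from point 2, assuming a scratch pool is available; the pool and filesystem names here are made up for illustration:

```shell
# copy the filesystem to scratch storage...
zfs snapshot tank/data@migrate
zfs send tank/data@migrate | zfs receive scratch/data

# ...destroy and re-create 'tank' with the desired device layout
# during a planned downtime, then send the data back:
zfs snapshot scratch/data@return
zfs send scratch/data@return | zfs receive tank/data
```

This is the "extra scratch storage plus planned downtime" cost that point 2a weighs against not using ZFS at all.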
Re: [zfs-discuss] Re: Re: can I use zfs on just a partition?
Hi, When you do the initial install, how do you do the slicing? Just create something like:

  /         10G
  swap       2G
  /altroot  10G
  /zfs      rest of disk

Yes. Or do you just create the first three slices and leave the rest of the disk untouched? I understand the concept at this point, just trying to explain to a third party exactly what they need to do to prep the system disk for me :) No. You need to be able to tell ZFS what to use. Hence, if your pool is created at the slice level, you need to create a slice for it. So the above is the way to go. And yes, you should only do this on laptops and other machines where you only have 1 disk or are otherwise very disk-limited :). Best regards, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Client Solutions http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
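To make the prep instructions concrete, a minimal sketch; the device and slice numbers are examples and depend on how format(1M) lays out your disk:

```shell
# after slicing the disk with format(1M) so that, say, s0=/,
# s1=swap, s3=/altroot and s4=rest of disk, create the pool on
# the dedicated slice:
zpool create tank c0d0s4

# filesystems inside the pool need no further slicing:
zfs create tank/zones
```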
Re: [zfs-discuss] Re: can I use zfs on just a partition?
Hi Tim, Essentially I'd like to have / and swap on the first 60GB of the disk, then use the remaining 100GB as a ZFS partition to set up zones on. Obviously the snapshots are extremely useful in such a setup :) Does my plan sound feasible from both a usability and performance standpoint? Yes, it works; I do it on my laptop all the time:

# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0d0 DEFAULT cyl 48451 alt 2 hd 64 sec 63
          /[EMAIL PROTECTED],0/[EMAIL PROTECTED],1/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
Specify disk (enter its number): 0
selecting c0d0
Controller working list found
[disk formatted, defect list found]
Warning: Current Disk has mounted partitions.
/dev/dsk/c0d0s0 is currently mounted on /. Please see umount(1M).
/dev/dsk/c0d0s1 is currently used by swap. Please see swap(1M).
/dev/dsk/c0d0s3 is part of active ZFS pool poolchen. Please see zpool(1M).
/dev/dsk/c0d0s4 is in use for live upgrade /. Please see ludelete(1M).

c0d0s5 is also free and can be used as a third live upgrade partition. My recommendation: use at least 2 slices for the OS so you can enjoy Live Upgrade, one for swap, and the rest for ZFS. Performance-wise this is of course not optimal, but perfectly feasible. I have an Acer Ferrari 4000, which is known to have a slow disk, but it still works great for what I do (email, web, Solaris demos, presentations, occasional video). More complicated things are possible as well. The following blog entry: http://blogs.sun.com/solarium/entry/tetris_spielen_mit_zfs (sorry, it's German) illustrates how my 4 disks at home are sliced in order to get OS partitions on multiple disks, swap, and as much ZFS space as possible at acceptable redundancy despite differently-sized disks. Check out the graphic in the above entry to see what I mean. It works great (but I had to use -f with zpool create :) ) and gives me enough performance for all my home-serving needs. 
Hope this helps, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Client Solutions http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] poor NFS/ZFS performance
Hi, I haven't followed all the details in this discussion, but it seems to me that it all breaks down to: - NFS on ZFS is slow due to NFS being very conservative, sending ACKs to clients only after writes have definitely been committed to disk. - Therefore, the problem is not that much ZFS-specific; it's just a conscious focus on data correctness vs. speed on ZFS/NFS' part. - Currently known workarounds include: - Sacrifice correctness for speed by disabling the ZIL or using a less conservative network file system. - Optimize NFS/ZFS to get as much speed as possible within the constraints of the NFS protocol. But one aspect I haven't seen so far is: How can we optimize ZFS on a more hardware-oriented level to both achieve good NFS speeds and still preserve the NFS level of correctness? One possibility might be to give the ZFS pool enough spindles so it can comfortably handle many small IOs fast enough for them not to become NFS commit bottlenecks. This may require some tweaking on the ZFS side so it doesn't queue up write IOs for too long, so as not to delay commits more than necessary. Has anyone investigated this branch, or am I too simplistic in my view of the underlying root of the problem? Best regards, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Client Solutions http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
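For reference, the two workaround families sketched as commands. Disk names are examples, and disabling the ZIL sacrifices the very correctness discussed above, so it is only for data you can afford to lose; on builds of that era it was a global /etc/system tunable, not a per-filesystem property:

```shell
# (a) sacrifice correctness: disable the ZIL system-wide
#     (reboot required; unsafe for anything you care about)
echo "set zfs:zil_disable = 1" >> /etc/system

# (b) throw spindles at it: many small vdevs can absorb many
#     small synchronous NFS writes in parallel
zpool create tank mirror c1t0d0 c2t0d0 mirror c1t1d0 c2t1d0 \
                  mirror c1t2d0 c2t2d0
```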
Re: [zfs-discuss] poor NFS/ZFS performance
Hi Roch, thanks, now I understand the issue better :). Nope. NFS is slow for single-threaded tar extract. The conservative approach of NFS is needed with the NFS protocol in order to ensure client-side data integrity. Nothing ZFS related. ... NFS is plenty fast in a throughput context (not that it does not need work). The complaints we have here are about single-threaded code. OK, then it's just a single-threaded client request-latency issue, which (as increasingly often) software vendors need to realize. The proper way to deal with this, then, is to multi-thread at the application layer. Reminds me of many UltraSPARC T1 issues, which sit neither in hardware nor in the OS, but in the way applications have been developed for years :). Best regards, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Client Solutions http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
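To make the "multi-thread at the application layer" point concrete, a crude sketch: instead of one sequential copy (where every synchronous NFS commit stalls the single thread), the top-level directories are copied by parallel background jobs, so each job absorbs its own commit latency. The paths here are temporary directories standing in for the NFS mount:

```shell
# stage some data locally (stands in for an extracted tar archive)
SRC=$(mktemp -d); DEST=$(mktemp -d)   # DEST would be the NFS mount
mkdir -p "$SRC/a" "$SRC/b"
echo one > "$SRC/a/f1"
echo two > "$SRC/b/f2"

# copy each top-level directory in its own background job
for d in "$SRC"/*; do
    cp -r "$d" "$DEST/" &
done
wait
ls "$DEST"
```

The same idea applies to tar itself: extract locally first, then fan out the copies.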
Re: [zfs-discuss] Newbie questions about drive problems
Hi, I have 3 drives. The first one will be the primary/boot drive under UFS. The 2 others will become a mirrored pool with ZFS. Now, if I have a problem with the boot drive (hardware or software), is all the data on my mirrored pool ok? How can I restore this pool? When I create the pool, do I need to save the properties? All metadata for the pool is stored inside the pool. If the boot disk fails in any way, all pool data is safe. Worst case might be that you have to reinstall everything on the boot disk. After that, you just say zpool import to get your pool back and everything will be ok. What happens when a drive crashes while ZFS is writing some data on a raidz pool? If the crash occurs in the middle of a write operation, then the new data blocks will not be valid. ZFS will then revert back to the state before writing the new set of blocks. Therefore you'll have 100% data integrity, but of course the new blocks that were being written to the pool will be lost. Does the pool go to the degraded state or the faulted state? No, the pool will come up as online. The degraded state is only for devices that aren't accessible any more, and the faulted state is for pools that do not have enough valid devices to be complete. Hope this helps, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Client Solutions http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
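The recovery sketch, with a made-up pool name; since all metadata lives in the pool itself, this is all that should be needed after the boot disk is rebuilt:

```shell
# after reinstalling the OS on a replacement boot disk:
zpool import            # scans attached devices, lists importable pools
zpool import mypool     # brings the mirrored pool back online
zpool status mypool     # verify both mirror halves are ONLINE
```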
[zfs-discuss] ZFS Load-balancing over vdevs vs. real disks?
Hi, my ZFS pool for my home server is a bit unusual:

  pool: pelotillehue
 state: ONLINE
 scrub: scrub completed with 0 errors on Mon Aug 21 06:10:13 2006
config:

        NAME          STATE     READ WRITE CKSUM
        pelotillehue  ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0d1s5    ONLINE       0     0     0
            c1d0s5    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c0d0s3    ONLINE       0     0     0
            c0d1s3    ONLINE       0     0     0
            c1d0s3    ONLINE       0     0     0
            c1d1s3    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c0d1s4    ONLINE       0     0     0
            c1d0s4    ONLINE       0     0     0
            c1d1s4    ONLINE       0     0     0

The reason is simple: I have 4 differently-sized disks (80, 80, 200, 250 GB; it's a home server, so I crammed whatever I could find elsewhere into that box :) ) and my goal was to create the biggest pool possible while retaining some level of redundancy. The above config therefore groups the biggest slices that can be created on all four disks into the 4-disk RAID-Z vdev, then the biggest slices that can be created on 3 disks into the 3-disk RAID-Z; then two large slices remain, which are mirrored. It's like playing Tetris with disk slices... But the pool can tolerate 1 broken disk and it gave me maximum storage capacity, so be it. This means that we have one pool with 3 vdevs that access up to 3 different slices on the same physical disk. Question: Does ZFS consider the underlying physical disks when load-balancing, or does it only load-balance across vdevs, thereby potentially overloading physical disks with up to 3 parallel requests per physical disk at once? I'm pretty sure ZFS is very intelligent and will do the right thing, but a confirmation would be nice here. Best regards, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Client Solutions http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
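One way to watch the answer empirically is per-vdev I/O statistics while the pool is under load; uneven load on the slices of one physical disk would show up here:

```shell
# reads/writes broken down per vdev (and thus per slice),
# sampled every 5 seconds:
zpool iostat -v pelotillehue 5
```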
Re: [zfs-discuss] Proposal: user-defined properties
Hi Eric, this is a great proposal and I'm sure it is going to help administrators a lot. One small question below: Any property which contains a colon (':') is defined as a 'user property'. The name can contain alphanumeric characters, plus the following special characters: ':', '-', '.', '_'. User properties are always strings, and are always inherited. No additional validation is done on the contents. Properties are set and retrieved through the standard mechanisms: 'zfs set', 'zfs get', and 'zfs inherit'.

# zfs list -o name,local:department
NAME      LOCAL:DEPARTMENT
test      12345
test/foo  12345
# zfs set local:department=67890 test/foo
# zfs inherit local:department test
# zfs get -s local -r all test
NAME      PROPERTY          VALUE  SOURCE
test/foo  local:department  12345  local
# zfs list -o name,local:department
NAME      LOCAL:DEPARTMENT
test      -
test/foo  12345

The example suggests that properties may be case-insensitive (the property is set as local:department but shows up in the column header as LOCAL:DEPARTMENT). Is that the case (sorry for the pun)? If so, that should be noted in the user-defined property definition, just for clarity. Best regards, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Client Solutions http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
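One way to settle the case question empirically, reusing the proposal's example dataset names (this is a sketch, not part of the proposal):

```shell
zfs set local:department=12345 test
zfs get local:department test    # works per the proposal
zfs get LOCAL:DEPARTMENT test    # succeeds only if names are
                                 # case-insensitive
```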
Re: [zfs-discuss] Home Server with ZFS
Hi, What I don't know is what happens if the boot disk dies. Can I replace it, install Solaris again, and get it to see the ZFS mirror? As I understand it, this should be possible, but I haven't tried it and I'm not an expert Solaris admin. Some ZFS info is stored in a persistent file on your system disk, and you may have to do a little dance to get around that. It's worth researching and practicing in advance :-). IIRC, ZFS has all relevant information stored inside the pool. So you should be able to install a new OS onto the replacement disk, then say zpool import (possibly with -d and the devices where the mirror lives) to re-import the pool. But I haven't really tried it myself :). All in all, ZFS is an excellent choice for a home server. I use ZFS as video storage for a digital set-top box (quotas are really handy here), as storage for my music collection, as backup storage for important data (including photos), etc. I'm currently juggling around 4 differently-sized disks into a new config with the goal of getting as much storage as possible out of them at a minimum level of redundancy. An interesting, Tetris-like calculation exercise that I'd be happy to blog about when I'm done. Feel free to visit my blog for how to set up your home server as a ZFS iTunes streaming server :). Best regards, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Client Solutions http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Enabling compression/encryption on a populated filesystem
Hi, there might be value in a zpool scrub -r (as in "re-write blocks") beyond the prior discussion on encryption and compression. For instance, a bit that is just about to rot might not be detected by a regular zpool scrub, but it would be rewritten by a re-writing scrub. It would also exercise the writing muscles of disks that don't see a lot of writing, such as archives or system disks, thereby detecting any degradation that affects writing of data. Of course the re-writing must be 100% safe, but that can be done with COW quite easily. Then, admins would for instance run a zpool scrub every week and maybe a zpool scrub -r every month or so. Just my 2 cents, Constantin Luke Scharf wrote: Darren J Moffat wrote: But the real thing is: how do you tell the admin it's done and the filesystem is now safe? With compression you don't generally care if some old stuff didn't compress (and with the current implementation it has to compress a certain amount or it gets written uncompressed anyway). With encryption the human admin really needs to be told. As a sysadmin, I'd be happy with another scrub-type command. Something with the following meaning: "Reapply all block-level properties, such as compression, encryption, and checksum, to every block in the volume." Have the admin come back tomorrow and run 'zpool status' to see if it's done. Mad props if I can do this on a live filesystem (like the other ZFS commands, which also get mad props for being good tools). A natural command for this would be something like zfs blockscrub tank/volume. Also, zpool blockscrub tank would make sense to me as well, even though it might touch more data. Of course, it's easy for me to just say this, since I'm not thinking about the implementation very deeply... 
-Luke ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Client Solutions http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
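Until something like zpool scrub -r or Luke's blockscrub exists, send/receive can serve as a rough (space-hungry, not in-place) approximation; the names here are made up:

```shell
# receive re-writes every block with the properties (compression,
# checksum, ...) in effect on the receiving filesystem:
zfs snapshot tank/data@rewrite
zfs send tank/data@rewrite | zfs receive tank/data.new
# verify the copy, then rename tank/data.new into place and
# destroy the old filesystem
```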
Re: [zfs-discuss] [raidz] file not removed: No space left on device
Hi Eric, Eric Schrock wrote: You don't need to grow the pool. You should always be able to truncate the file without consuming more space, provided you don't have snapshots. Mark has a set of fixes in testing which do a much better job of estimating space, allowing us to always unlink files in full pools (provided there are no snapshots, of course). This provides much more logical behavior by reserving some extra slop. Is this a planned but not yet implemented functionality, or why did Tatjana see the "not able to rm" behaviour? Or should she use unlink(1M) in these cases? Best regards, Constantin - Eric On Mon, Jul 03, 2006 at 02:23:06PM +0200, Constantin Gonzalez wrote: Hi, of course, the reason for this is the copy-on-write approach: ZFS has to write new blocks first before the modification of the FS structure can reflect the state with the deleted blocks removed. The only way out of this is of course to grow the pool. Once ZFS learns how to free up vdevs, this may become a better solution, because you can then shrink the pool again after the rm-ing. I expect many customers to run into similar problems and I've already gotten a number of "what if the pool is full" questions. My answer has always been "no file system should be filled up more than 90%" for a number of reasons, but in practice this is hard to ensure. Perhaps this is a good opportunity for an RFE: ZFS should reserve enough blocks in a pool in order to always be able to rm and destroy stuff. Best regards, Constantin P.S.: Most US Sun employees are on vacation this week, so don't be alarmed if the really good answers take some time :). -- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Client Solutions http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
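A sketch of the truncate-then-remove dance Eric describes, which should work on a full pool once the space-estimation fixes are in (the file name is taken from Tatjana's transcript):

```shell
# truncation frees the file's data blocks without allocating new
# ones, so it can succeed even on a 100% full pool (no snapshots):
> debug.log          # same effect as: cat /dev/null > debug.log
rm debug.log         # removing the now-empty file frees the rest
```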
Re: [zfs-discuss] [raidz] file not removed: No space left on device
Hi, of course, the reason for this is the copy-on-write approach: ZFS has to write new blocks first before the modification of the FS structure can reflect the state with the deleted blocks removed. The only way out of this is of course to grow the pool. Once ZFS learns how to free up vdevs, this may become a better solution, because you can then shrink the pool again after the rm-ing. I expect many customers to run into similar problems and I've already gotten a number of "what if the pool is full" questions. My answer has always been "no file system should be filled up more than 90%" for a number of reasons, but in practice this is hard to ensure. Perhaps this is a good opportunity for an RFE: ZFS should reserve enough blocks in a pool in order to always be able to rm and destroy stuff. Best regards, Constantin P.S.: Most US Sun employees are on vacation this week, so don't be alarmed if the really good answers take some time :). Tatjana S Heuser wrote: On a system still running nv_30, I've a small RaidZ filled to the brim:

2 3 [EMAIL PROTECTED] pts/9 ~ 78# uname -a
SunOS mir 5.11 snv_30 sun4u sparc SUNW,UltraAX-MP
0 3 [EMAIL PROTECTED] pts/9 ~ 50# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
mirpool1          33.6G      0   137K  /mirpool1
mirpool1/home     12.3G      0  12.3G  /export/home
mirpool1/install  12.9G      0  12.9G  /export/install
mirpool1/local    1.86G      0  1.86G  /usr/local
mirpool1/opt      4.76G      0  4.76G  /opt
mirpool1/sfw       752M      0   752M  /usr/sfw

Trying to free some space is meeting a lot of reluctance, though:

0 3 [EMAIL PROTECTED] pts/9 ~ 51# rm debug.log
rm: debug.log not removed: No space left on device
0 3 [EMAIL PROTECTED] pts/9 ~ 55# rm -f debug.log
2 3 [EMAIL PROTECTED] pts/9 ~ 56# ls -l debug.log
-rw-r--r-- 1 th1224 2027048 Jun 29 23:24 debug.log
0 3 [EMAIL PROTECTED] pts/9 ~ 58# : > debug.log
debug.log: No space left on device. 
0 3 [EMAIL PROTECTED] pts/9 ~ 63# ls -l debug.log
-rw-r--r-- 1 th1224 2027048 Jun 29 23:24 debug.log

There are no snapshots, so removing/clearing the files /should/ be a way to free some space there. Of course this is the same filesystem where zdb dumps core - see: *Synopsis*: zdb dumps core - bad checksum http://bt2ws.central.sun.com/CrPrint?id=6437157 *Change Request ID*: 6437157 (zpool reports the RaidZ pool as healthy while zdb crashes with a 'bad checksum' message.) This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Client Solutions http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] add_install_client and ZFS and SMF incompatibility
Hi, I just set up an install server on my notebook, and of course all the installer data is on a ZFS volume. I love zfs set compression=on! It seems that the standard ./add_install_client script from the S10U2 Tools directory creates an entry in /etc/vfstab for a loopback mount of the Solaris miniroot into the /tftpboot directory. Unfortunately, at boot time (I'm using Nevada build 39), the mount_all script tries to mount the loopback mount from /etc/vfstab before ZFS gets its filesystems mounted. So the SMF filesystem/local method fails, and I have to either mount all ZFS filesystems by hand and then re-run mount_all, or replace the vfstab entry with a simple symlink. Which only works until you say add_install_client the next time. Is this a known issue? Best regards, Constantin -- Constantin Gonzalez, Sun Microsystems GmbH, Germany Platform Technology Group, Client Solutions http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
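A sketch of the symlink workaround; the miniroot path and the tftpboot entry name below are hypothetical, so check what add_install_client actually wrote into /etc/vfstab and /tftpboot on your system:

```shell
# 1. drop the lofs line that add_install_client added to /etc/vfstab
grep -v '/tftpboot' /etc/vfstab > /tmp/vfstab.$$ && \
    cp /tmp/vfstab.$$ /etc/vfstab

# 2. replace the loopback mount with a symlink into the ZFS volume,
#    which resolves fine once ZFS has mounted its filesystems
ln -s /export/install/media/boot /tftpboot/miniroot
```

As noted above, the next add_install_client run will re-create the vfstab entry, so this has to be repeated.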